What is polynomial regression?

Polynomial regression is a statistical technique used to model the relationship between a dependent variable and one or more independent variables. In contrast to simple linear regression, which assumes a linear relationship between the variables, polynomial regression uses higher-order terms, such as squares and cubes, to capture non-linear patterns.

When to use polynomial regression?

Polynomial regression is useful when the relationship between the variables exhibits curvature or when a linear model fails to capture the data adequately. It allows us to capture complex phenomena like diminishing returns, saturation effects, or exponential growth. Polynomial regression has applications in physics, economics, social sciences, and engineering.

Polynomial function

A polynomial function of degree $n$ is expressed as:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \epsilon$$

Here, $y$ represents the dependent variable, $x$ denotes the independent variable, $\epsilon$ is the error term, and $\beta_0, \beta_1, \beta_2, \ldots, \beta_n$ are the coefficients to be estimated. The model's complexity is determined by the polynomial's degree, with higher degrees allowing for more flexible curve fitting.
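As a minimal sketch of how the degree controls this flexibility, the following snippet fits polynomials of increasing degree to synthetic, illustrative data using NumPy's polyfit:

import numpy as np

# Synthetic quadratic data with noise (illustrative values only)
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0, 2.0, size=x.shape)

# Fit polynomials of increasing degree; deg controls model flexibility
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, deg=degree)  # highest-degree coefficient first
    print(f"degree {degree}: coefficients =", np.round(coeffs, 2))

A degree of 1 is ordinary linear regression; higher degrees add curvature at the cost of a greater risk of overfitting.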

Evaluation metrics for polynomial regression

When evaluating the performance of a polynomial regression model, several metrics can be used to assess its accuracy and goodness of fit. Common evaluation metrics for polynomial regression include:

Mean squared error

The mean squared error is a commonly used evaluation metric in regression tasks. It computes the average squared difference between the predicted and actual values.
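For $n$ observations with actual values $y_i$ and predicted values $\hat{y}_i$, the standard definition is:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$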

A lower MSE value indicates better model performance, with zero representing a perfect match between predicted and actual values.

Root mean square error

Root mean square error (RMSE) is an evaluation metric commonly used in regression tasks to measure the average magnitude of errors between predicted and actual values. It measures how well the predictions of the model match the true values.
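It is simply the square root of MSE:

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$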

RMSE and MSE always rank models identically; the key difference between the two lies in the scale of the error values. Because of the square root, RMSE is expressed in the same units as the target variable, which makes it easier to interpret.

R-squared ($R^2$)

It indicates the proportion of the dependent variable's variance that can be explained by the independent variables in the model. It ranges between 0 and 1, and higher values indicate a better fit:

$$R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

Here, SSR is the sum of squared residuals, SST is the total sum of squares, $\hat{y}_i$ is the predicted value of $y$, and $\bar{y}$ is the mean value of $y$.

However, R-squared alone may not capture the complexity of non-linear relationships accurately.

Code example

Here's an example of how to evaluate mean squared error (MSE), root mean square error (RMSE), and R-squared ($R^2$) using Python:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 6, 8, 10])

poly_features = PolynomialFeatures(degree=2)
x_poly = poly_features.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)

y_pred = model.predict(x_poly)

mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, y_pred)

print("Mean squared error (MSE):", mse)
print("Root mean square error (RMSE):", rmse)
print("R-squared (R2):", r2)

Explanation

Line 6–7: Create an array x as the input data and y as the corresponding target variable.

Line 9–10: PolynomialFeatures generates polynomial features up to degree 2, transforming x into x_poly.
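For instance, with the default include_bias=True, the input value x = 3 is expanded into the row [1, 3, 9], i.e., $[x^0, x^1, x^2]$.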

Line 12–13: Create a linear regression model and fit it to the polynomial features x_poly and the target variable y.

Line 15: Predict y using the fitted model and x_poly.

Line 17–19: Evaluate MSE, RMSE, and R-squared between the true values y and the predicted values y_pred.
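Since the sample data is perfectly linear (y = 2x), the quadratic model can reproduce it exactly, so the printed MSE and RMSE should be approximately 0 and the R-squared should be 1.0, up to floating-point error.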

Real-world applications

Polynomial regression finds applications across various domains. Here are a few examples:

  • Manufacturing and engineering: Polynomial regression finds applications in optimizing manufacturing and engineering processes. It allows for modeling the relationships between input variables (e.g., temperature, pressure, time) and output parameters (e.g., product quality, efficiency).

  • Biological and medical research: Polynomial regression is applied in biological and medical research to analyze non-linear relationships between variables, such as gene expression levels and disease progression. It aids in understanding complex biological systems, identifying biomarkers, and developing treatment strategies.

  • Energy consumption forecasting: Energy providers actively employ polynomial regression to forecast energy consumption patterns by considering weather conditions, time of day, and historical data. This proactive approach enables them to optimize resource allocation, effectively manage demand, and promote energy efficiency. 


Conclusion

Polynomial regression allows us to model non-linear relationships between variables. While linear regression assumes a straight-line relationship, polynomial regression accommodates curves and bends, capturing complex patterns in the data.
