Ridge regression introduces an L2 regularization term to the coefficients of the regression model. This penalty term is based on the squared values of the coefficients $w_j$.
In ridge regression, the model aims to minimize the sum of squared coefficients of the features in addition to minimizing the errors between the actual and the predicted output. The sum of squared coefficients is scaled by a regularization parameter α that controls the strength of regularization. Mathematically, we can write the objective function for ridge regression as follows:
$$L_{\text{Ridge}} = L + \alpha \sum_{j=1}^{p} w_j^2$$
where $p$ denotes the number of features or predictors in our model.
The penalty term $\alpha \sum_{j=1}^{p} w_j^2$ prevents the model from assigning excessively large weights to any specific predictor, which reduces the model's sensitivity to noisy or unimportant variables.
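As a quick illustration, here is a sketch using scikit-learn's `Ridge` on synthetic data invented for this example: when two features are nearly collinear, ordinary least squares can assign large opposing weights, while the L2 penalty keeps them moderate.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (invented for illustration): the second feature is a
# nearly collinear copy of the first, which destabilizes plain OLS.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(100, 1))
X = np.hstack([x0, x0 + 0.01 * rng.normal(size=(100, 1))])
y = 3 * x0.ravel() + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # alpha scales the L2 penalty

print("OLS:  ", ols.coef_)
print("Ridge:", ridge.coef_)  # weights stay moderate and sum to roughly 3
```

Increasing `alpha` shrinks the coefficients further; `alpha=0` recovers ordinary least squares.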
Lasso regression
In addition to multicollinearity, a large number of features also makes the model complex. To address this, Lasso regression adds an L1 penalty term to the linear regression objective function, based on the sum of the absolute values of the coefficients.
This penalty promotes sparsity in the coefficient vector: by driving some coefficients to exactly zero, it effectively performs feature selection.
$$L_{\text{Lasso}} = L + \alpha \sum_{j=1}^{p} |w_j|$$
Lasso regression differs from Ridge regression in terms of the penalty term. In Lasso regression, the penalty tends to identify the less important features and shrink the corresponding coefficients to exactly zero.
This “zeroing out” of less relevant features simplifies the model and helps it focus only on the most influential features, which enhances the model's accuracy and its ability to generalize to new, unseen data.
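A small sketch makes the zeroing-out visible (scikit-learn's `Lasso` on synthetic data invented here, where only two of five features matter):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of five features influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # coefficients of the three irrelevant features are exactly 0
```

The nonzero entries of `coef_` identify the selected features, so the fitted model doubles as a feature-selection step.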
Elastic Net regression
Elastic Net regression combines both L1 and L2 penalties in the objective function as follows:
$$L_{\text{Elastic}} = L + \alpha \left[ L1_{\text{ratio}} \sum_{j=1}^{p} |w_j| + \frac{1}{2}\left(1 - L1_{\text{ratio}}\right) \sum_{j=1}^{p} w_j^2 \right]$$
Here, the $L1_{\text{ratio}}$ parameter balances the two penalties, allowing the model to leverage the strengths of both Lasso and Ridge regression.
Setting $L1_{\text{ratio}} = 1$ makes Elastic Net regression identical to Lasso regression, while $L1_{\text{ratio}} = 0$ gives all the weight to the Ridge penalty term.
Elastic Net regression performs particularly well on high-dimensional datasets with correlated features.
Implementation in Python
Now that we have seen how regularization works, let’s implement a regression model to predict the price of a house based on its size, number of rooms, and location.
We will start by importing the necessary libraries and methods. We will then load the dataset and divide it into training and testing datasets. Let's see how it can be done in Python.
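A minimal sketch of those steps follows. The dataset here is synthetic, standing in for the housing data described above, and the column names `size`, `rooms`, and `location` are assumptions; in practice the data would be loaded from a file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing dataset; in practice this would
# come from a file, e.g. via pd.read_csv(...)
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "size": rng.uniform(50, 250, size=n),    # e.g. square meters
    "rooms": rng.integers(1, 6, size=n),
    "location": rng.integers(0, 3, size=n),  # encoded neighborhood id
})
df["price"] = (2000 * df["size"] + 15000 * df["rooms"]
               + 10000 * df["location"] + rng.normal(scale=5000, size=n))

X = df[["size", "rooms", "location"]]
y = df["price"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out data
```

Swapping `Ridge` for `Lasso` or `ElasticNet` changes only the model line, since all three share the same `fit`/`score` interface.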