
What are Regularization Techniques in Regression

Najeeb Ul Hassan
Jan 22, 2024
8 min read

Machine learning empowers computers to learn and make accurate predictions based on patterns in the data. We start by preparing the data and splitting it into training and test sets. We then select an appropriate model that best describes our problem. With the model in place, we adjust its parameters using the training data such that the model fits well. We then assess the performance of the trained model using the test data and appropriate evaluation metrics. Based on the evaluation results, we fine-tune the model’s hyperparameters to optimize performance. Finally, we deploy the trained and optimized model to generate predictions on new, unseen data.

Machine learning workflow

This process aims to fine-tune the model to perform well on new unseen data and make accurate predictions.

Regression

Regression is a statistical tool that describes the relationship between a dependent variable and one or more independent variables. This helps us understand how changes in one variable lead to changes in another.

Let’s take an example of linear regression, where we wish to model a dependent variable $y$ based on an independent variable $x$. The independent variables are the dataset’s features used to predict the dependent variable. Mathematically, we can write a linear regression model as follows:

y = w \cdot x + w_0

Here, $w_0$ is the y-intercept and $w$ is the coefficient that represents the change in $y$ for a one-unit change in $x$. During the training phase, we find the optimal values of $w_0$ and $w$ such that the regression equation fits the data. This process is called optimization, and it minimizes a specified objective function. The objective function guides the optimization process by providing a quantitative measure of how well the model is performing. In linear regression, this objective function is the mean squared error (MSE). The MSE is the average of the squared differences between the predicted values $\hat{y}_i$ and the actual values $y_i$. It can be written as follows:

L = \frac{1}{N} \sum_{i=1}^N (\hat{y}_i - y_i)^2
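For instance, here is a minimal sketch of how the MSE above can be computed with NumPy; the actual and predicted values are made-up numbers used purely for illustration:

import numpy as np

# Hypothetical actual and predicted values for illustration
y_actual = np.array([3.0, 5.0, 7.5, 9.0])
y_predicted = np.array([2.8, 5.4, 7.0, 9.5])

# Mean squared error: average of the squared differences
mse = np.mean((y_predicted - y_actual) ** 2)
print(mse)  # 0.175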

Regularization

Regression models can sometimes suffer from overfitting: when a model becomes too complex, it tries to fit every detail in the available features during the training phase and, as a result, also captures the noise in the training data. When we apply the same model to new, unseen data, it fails to generalize well.

Overfitting: the model fits too closely to the training dataset

Take the example of a fitness tracker that monitors the progress of an athlete training for a marathon. To predict the marathon completion time, the fitness tracker records quantities like sleep hours, calorie intake, and running distance, and employs a regression model to predict the completion time from these features. However, the tracker might also capture irrelevant or noisy data caused by GPS inaccuracies or occasional outliers in calorie intake. These inaccuracies and outliers negatively impact the performance of the regression model. As a result, the model performs well during training but fails to provide accurate results in the evaluation phase, when new, unseen terrain is presented to it on the actual marathon day.

One way to address overfitting is to apply regularization techniques. Regularization controls the model’s complexity by adding a penalty term to the model’s loss function, thereby discouraging overly complex representations. It keeps the focus on the most relevant features and prevents the model from getting distracted by irrelevant or noisy details.

In simple terms, regularization is like having a knowledgeable guide that helps keep the focus on the important features and provides more accurate predictions for unseen data.

Let’s look at the commonly used regularization techniques.

Ridge regression

Let’s assume we wish to build a linear regression model using multiple features, also known as multiple linear regression. We aim to find the best line that minimizes the difference between the observed and predicted value of the dependent variable based on multiple input features.

Now, assume that some of the input features are highly correlated, a phenomenon commonly known as multicollinearity. Recalling our example of predicting the marathon completion time, calorie intake and running distance might be highly correlated. This makes it difficult for the model to determine the individual effect of each correlated feature on the target variable, which can result in inaccurate regression coefficients.

The marathon completion time based on the input features
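One quick way to check for such multicollinearity is to compute the correlation between candidate features. The following sketch uses made-up calorie and distance values, not real tracker data, purely to illustrate the check:

import numpy as np

# Hypothetical feature columns for five training days (illustrative values only)
calorie_intake = np.array([2500, 2800, 3100, 3400, 3700])
running_distance = np.array([10.0, 12.5, 14.0, 16.5, 18.0])

# A correlation coefficient close to 1 (or -1) signals multicollinearity
print(np.corrcoef(calorie_intake, running_distance)[0, 1])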

Ridge regression introduces an L2 regularization term on the coefficients of the regression model. This penalty term is based on the squared values of the coefficients $w_j$.

In ridge regression, the model aims to minimize the sum of squared coefficients of the features in addition to minimizing the errors between the actual and the predicted output. The sum of squared coefficients is scaled by a regularization parameter α\alpha that controls the strength of regularization. Mathematically, we can write the objective function for ridge regression as follows:

L_{\text{Ridge}} = L + \alpha \sum_{j=1}^p w_j^2

Here, $p$ denotes the number of features, or predictors, in our model.

The penalty term $\alpha \sum_{j=1}^p w_j^2$ prevents the model from assigning excessively large weights to any specific predictor, which reduces the model’s sensitivity to noisy or unimportant variables.
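To make the penalty concrete, here is a small sketch that evaluates the Ridge objective for a hypothetical coefficient vector and MSE value; the numbers are illustrative and not taken from any dataset:

import numpy as np

alpha = 1.0                      # regularization strength
w = np.array([0.5, -2.0, 3.0])   # hypothetical coefficients w_1..w_p
mse = 4.2                        # hypothetical MSE term L

# Ridge objective: L + alpha * sum of squared coefficients
l2_penalty = alpha * np.sum(w ** 2)
ridge_objective = mse + l2_penalty
print(l2_penalty, ridge_objective)  # 13.25 17.45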

Lasso regression

In addition to multicollinearity, a large number of features can also make the model complex. To deal with this, Lasso regression adds an L1 penalty term to the linear regression objective function based on the sum of the absolute values of the coefficients. This penalty promotes sparsity in the coefficients, effectively performing feature selection by driving some of them to exactly zero.

L_{\text{Lasso}} = L + \alpha \sum_{j=1}^p |w_j|

Lasso regression differs from Ridge regression in terms of the penalty term. In Lasso regression, the penalty tends to identify the less important features and shrink the corresponding coefficients to exactly zero.

This “zeroing out” of less relevant features in Lasso regression simplifies the model and helps the model focus only on the most influential features. This enhances the model’s accuracy and ability to generalize to new unseen data.
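This sparsity is easy to observe on synthetic data. The sketch below, with illustrative dataset sizes and regularization strength, fits a Lasso model to data in which only a few features carry signal and counts how many coefficients end up exactly zero:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 20 features, but only 5 carry signal
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Count how many coefficients were shrunk to exactly zero
print(np.sum(lasso.coef_ == 0), "of", lasso.coef_.size, "coefficients are zero")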

Elastic Net regression

Elastic Net regression combines both L1 and L2 penalties in the objective function as follows:

L_{\text{Elastic}} = L + \alpha \left[ L1_{\text{ratio}} \sum_{j=1}^p |w_j| + \frac{1}{2}(1 - L1_{\text{ratio}}) \sum_{j=1}^p w_j^2 \right]

Here, $L1_{\text{ratio}}$ balances the strengths of the Lasso and Ridge penalties. Setting $L1_{\text{ratio}} = 1$ makes Elastic Net regression the same as Lasso regression, while $L1_{\text{ratio}} = 0$ gives all the weight to the Ridge regression penalty term. Elastic Net regression performs exceptionally well on datasets with high-dimensional and correlated features.
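As a quick numerical check of this blending, the sketch below evaluates the bracketed penalty term for a hypothetical coefficient vector at the two extreme values of $L1_{\text{ratio}}$:

import numpy as np

w = np.array([0.5, -2.0, 3.0])  # hypothetical coefficients
alpha = 1.0

def elastic_penalty(w, alpha, l1_ratio):
    # alpha * [ l1_ratio * sum|w_j| + 0.5 * (1 - l1_ratio) * sum w_j^2 ]
    return alpha * (l1_ratio * np.sum(np.abs(w))
                    + 0.5 * (1 - l1_ratio) * np.sum(w ** 2))

print(elastic_penalty(w, alpha, l1_ratio=1.0))  # 5.5   -> pure L1 (Lasso) penalty
print(elastic_penalty(w, alpha, l1_ratio=0.0))  # 6.625 -> pure L2 (Ridge) penalty, scaled by 1/2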

Implementation in Python

Now that we have seen how regularization works, let’s implement a regression model to predict the price of a house based on its size, number of rooms, and location.

We will start by importing the necessary libraries and methods. We will then load the dataset and divide it into training and testing data sets. Let’s see how it can be done in Python.

from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split

# Load the Boston Housing dataset
boston = load_boston()
X = boston.data
y = boston.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  • Line 1: We import the Boston housing dataset.
  • Line 2: We import the methods for linear, Ridge, Lasso, and Elastic Net regressions.
  • Line 3: We import the train_test_split function from the sklearn library to split the data.
  • Lines 6–8: We load the Boston housing dataset into features X and the dependent variable y.
  • Line 11: We split the dataset using test_size=0.2 to select 80% of the data for training and the remaining 20% for testing purposes (a quick shape check follows below).

Note: The load_boston loader was deprecated in scikit-learn 1.0 and removed in version 1.2, so running this code requires an older scikit-learn release (or swapping in a comparable dataset, such as fetch_california_housing).
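Continuing the snippet above, a quick optional check (not part of the original code) is to print the shapes of the resulting arrays to confirm the 80/20 split:

# For the 506-sample Boston dataset, this should show around
# 404 training rows and 102 test rows
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)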

Ridge regression

Now, let’s apply the linear and Ridge regression models on our dataset to predict the price of the house.

# Fit a Linear Regression model
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)

# Fit a Ridge Regression model
# The value of alpha gives the regularization strength
ridge_reg = Ridge(alpha=1)
ridge_reg.fit(X_train, y_train)

print("The coefficient matrix for Linear regression")
print(linear_reg.coef_)
print("The coefficient matrix for Ridge regression")
print(ridge_reg.coef_)

Here, we apply linear and Ridge regression models in lines 2–3 and lines 7–8, respectively. We also print the resulting coefficient matrix of the two models. The coefficient matrix consists of the coefficients $w_j$ that define the relationship between the $j^{\text{th}}$ independent variable and our dependent variable.
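Continuing the snippet above, one optional way to quantify the shrinkage is to compare the overall size of the two coefficient vectors; a smaller norm for Ridge indicates stronger shrinkage:

import numpy as np

# Compare the L2 norms of the two coefficient vectors
print("L2 norm of linear regression coefficients:", np.linalg.norm(linear_reg.coef_))
print("L2 norm of Ridge regression coefficients: ", np.linalg.norm(ridge_reg.coef_))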

Note that the coefficients of the Ridge regression model are very similar to those of the linear regression model. This suggests that the features are not highly correlated.

Lasso regression

Now, let’s apply the Lasso regression model to predict the house price and compare the coefficients with linear regression.

# Fit a Linear Regression model
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)

# Fit a Lasso Regression model
# The value of alpha gives the regularization strength
lasso_reg = Lasso(alpha=1.0)
lasso_reg.fit(X_train, y_train)

print("The coefficient matrix for Linear regression")
print(linear_reg.coef_)
print("The coefficient matrix for Lasso regression")
print(lasso_reg.coef_)

Note that the Lasso regression model zeros out some of the coefficients, effectively performing feature selection. This eliminates the less effective features and reduces the risk of overfitting.
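Continuing the snippet above, an optional check makes this feature selection explicit by counting how many Lasso coefficients are exactly zero:

import numpy as np

# Features with a zero coefficient are effectively dropped from the model
print("Zeroed-out coefficients:", np.sum(lasso_reg.coef_ == 0), "of", lasso_reg.coef_.size)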

Elastic Net regression

Now, let’s also see the effect of applying Elastic Net regression on the data.

# Fit a Linear Regression model
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)

# Fit an Elastic Net Regression model
# The value of alpha gives the regularization strength and l1_ratio gives the balance between Ridge and Lasso
elastic_net = ElasticNet(alpha=1.0, l1_ratio=0.5)
elastic_net.fit(X_train, y_train)

print("The coefficient matrix for Linear regression")
print(linear_reg.coef_)
print("The coefficient matrix for Elastic Net regression")
print(elastic_net.coef_)

Here, $L1_{\text{ratio}}$ is set to $0.5$, giving an equal balance of the strengths of both the Lasso and Ridge penalties. Similar to the Lasso regression, some of the coefficients are zero, eliminating their effect in calculating the output dependent variable.

Note: Go ahead and see how the coefficients change by changing the value of $L1_{\text{ratio}}$. Remember, a value of $L1_{\text{ratio}}$ closer to $1$ makes the Elastic Net regression behave more like the Lasso regression, and an $L1_{\text{ratio}}$ closer to $0$ gives more weight to the Ridge regression penalty term.
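As a starting point for that experiment, here is a small sketch (the specific $L1_{\text{ratio}}$ values are illustrative) that refits the model for several values of l1_ratio and reports how many coefficients are zeroed out each time:

import numpy as np
from sklearn.linear_model import ElasticNet

# Closer to 1 -> more Lasso-like (more zeros); closer to 0 -> more Ridge-like (fewer zeros)
for l1_ratio in [0.1, 0.5, 0.9]:
    model = ElasticNet(alpha=1.0, l1_ratio=l1_ratio)
    model.fit(X_train, y_train)
    zeros = np.sum(model.coef_ == 0)
    print(f"l1_ratio={l1_ratio}: {zeros} zero coefficients")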

Further reading

This blog has briefly introduced commonly used regularization techniques and their implementation in Python. Regularization plays a crucial role in preventing overfitting and improves the generalization of models. We encourage you to explore these techniques further to gain a deeper understanding. Additionally, you can check out the following courses on Educative:

Mastering Machine Learning Theory and Practice

The machine learning field is rapidly advancing today due to the availability of large datasets and the ability to process big data efficiently. Moreover, several new techniques have produced groundbreaking results for standard machine learning problems. This course provides a detailed description of different machine learning algorithms and techniques, including regression, deep learning, reinforcement learning, Bayes nets, support vector machines (SVMs), and decision trees. The course also offers sufficient mathematical details for a deeper understanding of how different techniques work. An overview of the Python programming language and the fundamental theoretical aspects of ML, including probability theory and optimization, is also included. The course contains several practical coding exercises as well. By the end of the course, you will have a deep understanding of different machine-learning methods and the ability to choose the right method for different applications.

36hrs
Beginner
109 Playgrounds
10 Quizzes

Become a Machine Learning Engineer

Start your journey to becoming a machine learning engineer by mastering the fundamentals of coding with Python. Learn machine learning techniques, data manipulation, and visualization. As you progress, you'll explore object-oriented programming and the machine learning process, gaining hands-on experience with machine learning algorithms and tools like scikit-learn. Tackle practical projects, including predicting auto insurance payments and customer segmentation using K-means clustering. Finally, explore the deep learning models with convolutional neural networks and apply your skills to an AI-powered image colorization project.

105hrs
Beginner
17 Challenges
11 Quizzes

Data Science Interview Handbook

This course will increase your skills to crack the data science or machine learning interview. You will cover all the most common data science and ML concepts coupled with relevant interview questions. You will start by covering Python basics as well as the most widely used algorithms and data structures. From there, you will move on to more advanced topics like feature engineering, unsupervised learning, as well as neural networks and deep learning. This course takes a non-traditional approach to interview prep, in that it focuses on data science fundamentals instead of open-ended questions. In all, this course will get you ready for data science interviews. By the time you finish this course, you will have reviewed all the major concepts in data science and will have a good idea of what interview questions you can expect.

9hrs
Intermediate
140 Playgrounds
128 Quizzes

Frequently Asked Questions

What is regularization in linear regression?

Regularization manages the complexity of a linear regression model by imposing penalties on coefficients that are non-essential or irrelevant to predictive accuracy. Through regularization, the variance of the model is effectively reduced, safeguarding against overfitting and improving the model’s resilience to noise and outliers.


  
