Regularization is a technique frequently employed in machine learning and artificial intelligence to ensure that trained models generalize beyond the specific data they were trained on. It helps prevent overfitting so that the model performs effectively on new, unseen data. The two most commonly used types, L1 and L2 regularization, are discussed in the sections below.
When a model is overfitted, it performs exceptionally well on the data it was trained on, but its accuracy drops significantly on unseen data. Regularization techniques help strike the right balance between fitting the training data and generalizing to new data.
Regularization accomplishes this by reducing the importance of certain features during training, or by removing them from the model entirely.
A regularized function is simpler and less prone to overfitting: instead of tracing every fluctuation in the training data, it captures the overall trend, reducing complexity and improving generalization.
To update the weights while training the model, we minimize the cost function associated with a regression model. A cost function measures how well a machine learning model predicts the target values for a dataset. In regularization, we add another parameter, the regularization parameter $\lambda$, which scales a penalty on the model's weights:

$$\text{Cost} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \times (\text{penalty on weights})$$

When $\lambda = 0$, the penalty term vanishes and the model is trained with the ordinary, unregularized cost function, so overfitting is not addressed.

When $\lambda$ is very large, the penalty term dominates the cost function and the weights are driven toward zero, which oversimplifies the model and can lead to underfitting.
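As a minimal sketch of this idea (not any particular library's implementation), the snippet below computes a mean-squared-error cost plus a penalty scaled by $\lambda$; the names `regularized_cost` and `lam` are illustrative:

```python
import numpy as np

def regularized_cost(y_true, y_pred, weights, lam, penalty="l2"):
    """Mean squared error plus a regularization penalty scaled by lam."""
    mse = np.mean((y_true - y_pred) ** 2)
    if penalty == "l1":
        reg = lam * np.sum(np.abs(weights))  # L1: sum of absolute weights
    else:
        reg = lam * np.sum(weights ** 2)     # L2: sum of squared weights
    return mse + reg

w = np.array([0.5, -1.2, 3.0])
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.8, 3.2])
print(regularized_cost(y_true, y_pred, w, lam=0.0))  # plain MSE: the penalty vanishes
print(regularized_cost(y_true, y_pred, w, lam=0.1))  # larger lam penalizes large weights more
```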
In the given diagram, four features are represented: swimmers, temp, stock_price, and watched_jaws. At the top, the total number of features remaining in the model is depicted, gradually decreasing from 4 to 0. As the regularization parameter ($\lambda$) increases, the weights of the least important features are shrunk to zero one by one, until no features remain in the model.
L1, or Lasso, regression simplifies models by ultimately shrinking parameters to zero as the regularization parameter $\lambda$ grows. The updated cost function with L1 regularization is

$$\text{Cost} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{m}|w_j|$$

L1 therefore enables feature selection: it assigns zero weight to unimportant input features and non-zero weight to valuable ones. It outputs a sparse solution in which most features have zero weight, as the sketch below demonstrates.
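The snippet below is a hedged sketch of this behavior: it fits scikit-learn's `Lasso` (whose `alpha` parameter plays the role of $\lambda$) on synthetic stand-ins for the diagram's four features; the data is made up purely for illustration. As `alpha` grows, fewer features keep non-zero weights, mirroring the diagram's count dropping from 4 to 0:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins for the diagram's features (illustrative only).
features = ["swimmers", "temp", "stock_price", "watched_jaws"]
X = rng.normal(size=(n, 4))
# The target depends strongly on two features and only weakly on the rest.
y = (3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.3 * X[:, 2] + 0.1 * X[:, 3]
     + rng.normal(scale=0.5, size=n))

for alpha in [0.01, 0.1, 0.5, 1.0, 5.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    kept = [f for f, w in zip(features, model.coef_) if w != 0.0]
    print(f"alpha={alpha}: {len(kept)} non-zero weights -> {kept}")
```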
L2, or Ridge, regression lessens the impact of features during model training: it makes the weights small but not zero. The updated cost function with L2 regularization is

$$\text{Cost} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{m}w_j^2$$
L2 works better when all the input features have a strong impact on the output and the weights assigned to them are of roughly the same magnitude, as the sketch below illustrates.
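As a small illustration, again assuming scikit-learn and synthetic data, `Ridge` shrinks all the weights relative to an unregularized fit but leaves none exactly at zero:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
# All four features contribute to the target with similar magnitude.
y = X @ np.array([1.0, 0.9, 1.1, 1.0]) + rng.normal(scale=0.5, size=200)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print("unregularized:", np.round(plain.coef_, 3))
print("ridge:        ", np.round(ridge.coef_, 3))  # smaller, but none exactly zero
```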
Both L1 and L2 regularization are used when training ML models; the choice depends on the use case. L1 is more helpful for high-dimensional data, whereas L2 is more useful when every feature should contribute to the output, each with a varying degree of importance. They can also be used in combination, which is a third type of regularization called Elastic Net regularization, sketched below.
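A brief sketch of the combined approach, assuming scikit-learn's `ElasticNet`, where `l1_ratio` controls the mix between the L1 and L2 penalties:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
# Only the first two features actually drive the target.
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

# l1_ratio=0.5 weights the L1 and L2 penalties equally.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 3))  # noise-feature weights shrink toward or exactly to zero
```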
The differences between L1 and L2 regularization are summarized in the table below.
| L1 or Lasso Regression | L2 or Ridge Regression |
| --- | --- |
| Penalizes the sum of the absolute values of the weights | Penalizes the sum of the squared weights |
| Sparse solution | Non-sparse solution |
| Robust to outliers | Not robust to outliers |
| Cannot learn complex patterns | Can learn complex patterns |
| Built-in feature selection | No feature selection |
| Reduces noise | Unable to reduce noise |
Regularization is used to enhance a model's ability to generalize by preventing overfitting. It adds a penalty term, scaled by the regularization parameter $\lambda$, to the cost function, shrinking the model's weights during training.