Regularization

Learn what regularization is and how it affects validation and training error.

Regularization is a technique to reduce the variance of a model. One such way is restricting the parameters to a subset of the parameter space. Reduction in variance turns out to be a prevention of overfitting.

Why not choose a simple model?

Starting with a model that’s too simple and gradually increasing its complexity by monitoring its performance on the testing data is one solution. Regularization does the reverse, that is, starting with a complex model and decreasing its complexity.

Press + to interact

The answer to this question of why not choose a simple model has to do more with implementation than theory. Regularization is more systematically implementable compared to increasing the model complexity gradually. Furthermore, different regularization methods offer different ways to reduce the variance of the model, where one way might be better than the other for a task at hand.

Shrinkage method

In shrinkage-based regularization, the parameters are restricted to stay close to zero (shrink to zero). One way is to apply this restriction explicitly while minimizing the loss, that is, minimize L(fw(x),y)L(f_{\bold w}(\bold x),\bold y) subject to \alpha_1 < w_1 < \beta_1, \alpha_2 < w_2 < \beta_2,\dots,\alpha_k < w_k < \beta_k, where LL is a loss function and w=(w1,w2,,wk)\bold w=(w_1,w_2,\dots,w_k). It’s hard to come up with lower and upper limits for the parameters, and the limits might differ for different parameters. Since the goal is to shrink the parameters as much as possible while minimizing the loss LL, it’s natural to couple the shrinkage of parameters and the loss in a single objective:

minw{L(fw(x),y)+R(w)}\min_{\bold w}\{L(f_{\bold w}(\bold x),\bold y) + R(\bold w)\}

Here, RR is a shrinkage function.
> Note: The goal is to minimize the loss LL while shrinking the parameters w\bold w as much as possible.

Shrinkage functions

There are several choices of shrinkage functions. The most popular are the L2 norm and the L1 norm. L2 norm of a vector w ...