Regularize the Model
Learn about the L1 and L2 regularization techniques.
L1 and L2 regularization
L1 and L2 are two of the most common methods of regularizing a decision boundary. Regularization is the technical name for the operation that we informally called "smoothing out." L1 and L2 work similarly and have mostly similar effects. Once we get into advanced ML territory, we may want to look deeper into their relative merits, but for our purposes in this course, we follow a simple rule: either pick randomly between L1 and L2, or try both and see which one works better.
Let’s see how L1 and L2 work.
How L1 and L2 work
L1 and L2 rely on the same idea: they add a regularization term to the neural network's loss. For example, here's the loss augmented by L1 regularization:
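In standard notation, with λ (lambda) as the regularization constant and wᵢ ranging over all the weights in the network (these symbols are a common convention rather than anything specific to our network):

$$
\text{Loss}_{L1} = \text{Loss}_{\text{cross-entropy}} + \lambda \sum_i \lvert w_i \rvert
$$

L2 regularization has the same shape, but it uses the squares of the weights, $\lambda \sum_i w_i^2$, instead of their absolute values.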
In the case of our neural network, the non-regularized loss is the cross-entropy loss. To that original loss, L1 adds the sum of the absolute values of all the weights in the network, multiplied by a constant called lambda (or λ, in symbols).
Lambda is a new hyperparameter that we can use to tune the amount of regularization in the network. The higher the value of lambda, the higher the impact of the regularization term. If lambda is 0, the entire regularization term becomes 0, and we fall back to a non-regularized neural network.
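To make the role of lambda concrete, here is a minimal sketch in plain NumPy (the function and variable names are hypothetical, chosen for illustration rather than taken from the course's code) that adds the L1 term to an already-computed cross-entropy value:

```python
import numpy as np

def l1_regularized_loss(cross_entropy_loss, weights, lam):
    # L1 term: lambda times the sum of the absolute values of every weight.
    l1_term = lam * sum(np.sum(np.abs(w)) for w in weights)
    return cross_entropy_loss + l1_term

# Two made-up weight arrays standing in for the network's layers.
weights = [np.array([[0.5, -1.2], [2.0, 0.1]]), np.array([0.3, -0.7])]

print(l1_regularized_loss(0.9, weights, lam=0.01))  # 0.948: penalty added
print(l1_regularized_loss(0.9, weights, lam=0.0))   # 0.9: back to the plain loss
```

With lambda set to zero, the function returns the original cross-entropy unchanged, which is exactly the fallback to a non-regularized network described above.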
To understand what the regularization term does to the network, remember that the entire point of training is to minimize the loss. Now that we have added that term, the absolute values of the weights have become part of the loss. That means that the gradient descent algorithm will automatically try to keep the weights small so that the loss can also stay small.
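To see why gradient descent ends up shrinking the weights, look at the gradient of the L1 term on its own: the derivative of λ·|w| with respect to each weight is λ·sign(w), so every update step nudges each weight toward zero by a fixed amount. A tiny sketch, with arbitrary made-up values for lambda and the learning rate:

```python
import numpy as np

lam = 0.1   # regularization strength (lambda), chosen arbitrarily here
lr = 0.5    # learning rate, also arbitrary
w = np.array([2.0, -1.5, 0.4])

# The derivative of lam * |w| is lam * sign(w), so this update moves
# every weight toward zero by lr * lam = 0.05.
l1_gradient = lam * np.sign(w)
w = w - lr * l1_gradient
print(w)  # [ 1.95 -1.45  0.35]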