Ridge and Lasso Regression

Learn about ridge and lasso regression, how they compare, and why the intersection of their regularization contours with the MSE contours matters.

Without regularization, we only have to optimize the training loss, but with regularization, we need to optimize both the training loss (fit to the data) and the regularization function (smoothness).
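
As a sketch of this trade-off, the regularized objective can be written as below, where $L$ is the loss, $R$ is the regularizer, $f_{\bold w}$ is the model, the sum runs over the $n$ training examples $(\bold x_i, y_i)$, and $\lambda \ge 0$ (a symbol introduced here for illustration) controls how strongly smoothness is enforced:

$$
\min_{\bold w} \; \sum_{i=1}^{n} L\big(y_i, f_{\bold w}(\bold x_i)\big) + \lambda\, R(\bold w)
$$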

Ridge regression

When the model $f_{\bold w}$ is linear in parameters, the loss function $L$ is the squared loss, and the regularization function $R$ is the L2 norm, then the regression problem is known as ridge regression.
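
Putting these three choices together, the ridge objective takes the following form (a sketch; $\lambda \ge 0$ is the regularization strength, and $\bold w^\top \bold x_i$ is the linear model's prediction for the $i$-th example):

$$
\min_{\bold w} \; \sum_{i=1}^{n} \big(y_i - \bold w^\top \bold x_i\big)^2 + \lambda \|\bold w\|_2^2
$$

Because this objective is quadratic in $\bold w$, it has a closed-form minimizer. A minimal NumPy sketch, assuming a design matrix `X` of shape `(n, d)`, a target vector `y`, and no separate intercept term (`ridge_fit` and `lam` are illustrative names, not from the text):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge solution: w = (X^T X + lam * I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Illustrative usage on synthetic data (shapes and values are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=100)
print(ridge_fit(X, y, lam=0.5))
```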

Linear model

Let $D=\{(\bold x_i, y_i) \mid 1 \le i \le n\}$ be a training data set for regression with a single real target, that is, $y_i \in \R$. When the number of features is $d$, that is, $\bold x_i=(x_{i1},x_{i2},\dots,x_{id})$ ...