Regularization

Learn what regularization is and how it affects validation and training error.

We'll cover the following...

Why not choose a simple model?
Shrinkage method
- Shrinkage functions
  - Implementation of L2 norm
  - Implementation of L1 norm
Regularization parameter
- Understanding λ for model regularization
- Implementation for λ

Why not choose a simple model?

The answer to why we don't simply choose a simpler model has more to do with implementation than with theory. Regularization can be applied more systematically than gradually increasing the model complexity. Furthermore, different regularization methods reduce the variance of the model in different ways, and one method might suit a given task better than another.

Shrinkage method

In shrinkage-based regularization, the parameters are restricted to stay close to zero (that is, they are shrunk toward zero). One way is to apply this restriction explicitly while minimizing the loss: minimize $L(f_{\bold w}(\bold x),\bold y)$ subject to $\alpha_1 < w_1 < \beta_1, \alpha_2 < w_2 < \beta_2,\dots,\alpha_k < w_k < \beta_k$, where $L$ is a loss function and $\bold w=(w_1,w_2,\dots,w_k)$. However, it's hard to come up with lower and upper limits for the parameters, and the limits might differ from one parameter to another. Since the goal is to shrink the parameters as much as possible while minimizing the loss $L$, it's natural to couple the shrinkage of the parameters and the loss in a single objective:

$$\min_{\bold w}\{L(f_{\bold w}(\bold x),\bold y) + R(\bold w)\}$$

Here, $R$ is a shrinkage function.
> Note: The goal is to minimize the loss $L$ while shrinking the parameters $\bold w$ as much as possible.
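
To make the coupled objective concrete, here is a minimal sketch in plain Python. It assumes a linear model for $f_{\bold w}$, mean squared error for $L$, and the sum of squared parameters for $R$. All three choices, along with the names `predict`, `mse_loss`, `squared_l2`, and `objective`, are illustrative assumptions rather than definitions from this lesson.

```python
def predict(w, x):
    """Linear model f_w(x) = w . x (an assumed model form for illustration)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse_loss(w, X, y):
    """Mean squared error, standing in for the generic loss L."""
    n = len(y)
    return sum((predict(w, xi) - yi) ** 2 for xi, yi in zip(X, y)) / n


def squared_l2(w):
    """One possible shrinkage function R(w): the sum of squared parameters."""
    return sum(wi ** 2 for wi in w)


def objective(w, X, y, R=squared_l2):
    """The single objective L(f_w(x), y) + R(w)."""
    return mse_loss(w, X, y) + R(w)


# Toy data generated exactly by w = (1, 2), so w_exact has zero loss.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
y = [5.0, 4.0, 9.0]

w_exact = [1.0, 2.0]   # fits the data perfectly but has larger parameters
w_small = [0.9, 1.8]   # slightly worse fit, smaller parameters

print("exact fit :", objective(w_exact, X, y))
print("shrunken  :", objective(w_small, X, y))
```

On this toy data, the shrunken vector scores lower on the combined objective even though its loss is no longer zero, which is the trade-off the objective encodes.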

Shrinkage functions

There are several choices of shrinkage functions. The most popular are the L2 norm and the L1 norm. L2 norm of a vector $w$ ...
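
As a rough illustration of the two shrinkage functions named above, the following sketch computes both norms in plain Python. It assumes the standard definitions $\|\bold w\|_2=\sqrt{\sum_i w_i^2}$ and $\|\bold w\|_1=\sum_i |w_i|$. The function names `l2_norm` and `l1_norm` are hypothetical, and a given course or library may use the squared L2 norm as the penalty instead.

```python
import math


def l2_norm(w):
    """L2 (Euclidean) norm: square root of the sum of squared entries."""
    return math.sqrt(sum(wi ** 2 for wi in w))


def l1_norm(w):
    """L1 norm: sum of the absolute values of the entries."""
    return sum(abs(wi) for wi in w)


w = [3.0, -4.0]
print(l2_norm(w))  # 5.0
print(l1_norm(w))  # 7.0
```

Both functions return zero only when every parameter is zero, so adding either one to the loss penalizes parameters for moving away from zero.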
