The Motivation for Regularization

Learn how overfitting and underfitting are related to the bias-variance trade-off.

What is regularization?

Regularization is a technique used in machine learning to prevent overfitting and improve the generalization of a trained model. The main idea is to add a penalty term to the model's cost function, typically one that grows with the magnitude of the model's coefficients. This discourages the model from fitting overly complex or redundant patterns that are specific to the training data and unlikely to generalize well to new, unseen data.
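To make the idea concrete, here is a minimal sketch in Python of a logistic regression cost function with an L2 penalty added. The function name penalized_log_loss and the strength parameter lam are illustrative, not from any particular library. Because the penalty grows with the squared magnitude of the weights, minimizing the total cost pulls the weights toward smaller values:

    import numpy as np

    def penalized_log_loss(w, X, y, lam):
        """Logistic regression log loss plus an L2 penalty on the weights."""
        p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted probabilities
        log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        penalty = lam * np.sum(w ** 2)    # L2 penalty; lam controls its strength
        return log_loss + penalty

Setting lam to zero recovers the ordinary, unregularized cost, while larger values of lam shrink the weights more aggressively, which is why regularization is also called shrinkage.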

The bias-variance trade-off

We can extend the basic logistic regression model that we have learned about by using regularization, also called shrinkage. In fact, every logistic regression that you have fit so far in scikit-learn has used some amount of regularization, because regularization is enabled by default in the logistic regression model object. However, until now, we have ignored it.
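You can verify this by inspecting a freshly constructed model object. As of recent scikit-learn versions, LogisticRegression applies an L2 penalty by default, with its strength controlled by the inverse-regularization parameter C:

    from sklearn.linear_model import LogisticRegression

    lr = LogisticRegression()
    print(lr.penalty)  # 'l2': an L2 penalty is applied by default
    print(lr.C)        # 1.0: C is the inverse of the regularization strength

Note that smaller values of C mean stronger regularization, which is the opposite convention from the lam parameter sketched above.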

As you learn about these concepts in greater depth, you will also become familiar with a few foundational concepts in machine learning: overfitting, underfitting, and the bias-variance trade-off. A model is said to overfit the training data if its performance on the training data (for example, the ROC AUC) is substantially better than its performance on new, unseen data, such as a held-out test set.
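As a quick illustration of how you might check for this, one common approach is to compare the ROC AUC on the training set against the ROC AUC on a held-out test set. The sketch below uses synthetic data from make_classification purely for convenience:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic data purely for illustration
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)

    # A training AUC far above the test AUC is a symptom of overfitting
    train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"train AUC: {train_auc:.3f}  test AUC: {test_auc:.3f}")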
