Information Leakage

In this lesson, learn how information leakage can produce machine learning models that overfit.

We'll cover the following...

What is information leakage?
Cross-validation information leakage

What is information leakage?

Information leakage occurs when a machine learning algorithm has access to information about future data during training. A model trained with leaked information appears to predict better than it will on genuinely new data, so evaluation metrics (e.g., accuracy) overestimate the model’s usefulness.

Holdout test sets and cross-validation simulate future data by withholding part of the data from model training (e.g., the validation fold in each round of cross-validation). Leakage occurs when information from the withheld data reaches the training process anyway. There are two common sources of information leakage in practice:

...
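
To make the effect on metrics concrete, here is a minimal base R sketch of one well-known form of leakage: choosing features with the holdout rows included. It is an illustration under stated assumptions, not necessarily one of the sources this lesson lists. The data is simulated noise, and the sizes (`n`, `p`, `k`), the seed, and the helper functions `select_features()` and `holdout_accuracy()` are all hypothetical names made up for this example.

```r
set.seed(42)  # arbitrary seed, used only for reproducibility

n <- 200   # rows (hypothetical data set size)
p <- 5000  # pure-noise features
k <- 20    # features kept after selection

# Labels and features are independent random noise, so no model can
# genuinely predict y from X: honest accuracy should sit near 0.5.
y <- sample(c(-1, 1), n, replace = TRUE)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

train_rows <- sample(n, n / 2)
holdout_rows <- setdiff(seq_len(n), train_rows)

# Keep the k features most correlated with y on the given rows.
select_features <- function(rows) {
  strength <- abs(cor(X[rows, , drop = FALSE], y[rows]))[, 1]
  order(strength, decreasing = TRUE)[seq_len(k)]
}

# Fit a deliberately simple classifier on the training rows only (one
# sign per feature), then measure its accuracy on the holdout rows.
holdout_accuracy <- function(features) {
  signs <- sign(cor(X[train_rows, features], y[train_rows]))[, 1]
  scores <- X[holdout_rows, features] %*% signs
  mean(sign(scores) == y[holdout_rows])
}

leaky_features  <- select_features(seq_len(n))  # holdout rows leak in
honest_features <- select_features(train_rows)  # training rows only

leaky_accuracy  <- holdout_accuracy(leaky_features)
honest_accuracy <- holdout_accuracy(honest_features)

cat("Holdout accuracy, leaky selection: ", leaky_accuracy, "\n")
cat("Holdout accuracy, honest selection:", honest_accuracy, "\n")
```

In both cases the classifier itself is fit on the training rows only. The only difference is whether the feature selection step saw the holdout rows, which is enough to inflate the holdout accuracy even though the data contains no real signal.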
