Information Leakage
Learn how information leakage can produce machine learning models that overfit in this lesson.
We'll cover the following...
What is information leakage?
Information leakage occurs when a machine learning algorithm has access to information about future data during the training process. Information leakage produces models with better predictions than expected, leading to metrics (e.g., accuracy) that overestimate a model’s usefulness.
Test holdout sets and cross-validation simulate future data by withholding the information contained in the data during model training (e.g., validation folds in cross-validation). There are two common sources of information leakage in practice: