Detecting Data Drift

Learn how to identify data drift with statistical and algorithmic methods.

Data drift is potentially harmful to an ML algorithm in deployment. As the underlying data changes, the predictions can become skewed—or worse, biased. In this lesson, we cover commonly used theoretical methods for identifying data drift.

Statistical methods

Statistical methods tend to be fast and cheap to implement. They are simple mathematical formulations that rely on hypothesis tests to detect drift at a chosen significance level.

Kolmogorov-Smirnov

The two-sample Kolmogorov-Smirnov (KS) test is a statistical hypothesis test with the following hypotheses:

  • $H_0$: The two samples are drawn from the same distribution.

  • $H_a$: The two samples are drawn from different distributions.

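For two samples of sizes $n$ and $m$, the statistic is computed as

$$D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|$$

where $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of the first and second samples.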
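To make the test concrete, here is a minimal Python sketch, not a production implementation. It computes the two-sample KS statistic as the largest absolute gap between the two samples' empirical CDFs. The function names `ks_statistic` and `drift_detected`, the `reference`/`current` argument names, and the default `alpha=0.05` are assumptions made for illustration. The decision rule uses the standard large-sample approximation for the critical value, so it should be treated as a rough guide for small samples.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values,
    # so it is enough to evaluate the gap at those points.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / n  # fraction of reference values <= x
        f_cur = bisect_right(cur, x) / m  # fraction of current values <= x
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_detected(reference, current, alpha=0.05):
    """Reject H_0 when the statistic exceeds the approximate critical value.

    Assumption: large-sample approximation
    D_crit = sqrt(-ln(alpha / 2) / 2) * sqrt((n + m) / (n * m)).
    """
    d = ks_statistic(reference, current)
    n, m = len(reference), len(current)
    critical = sqrt(-0.5 * log(alpha / 2)) * sqrt((n + m) / (n * m))
    return d > critical


if __name__ == "__main__":
    training = list(range(100))              # hypothetical reference feature values
    production = [x + 50 for x in training]  # same shape, shifted by 50
    print(ks_statistic(training, production))
    print(drift_detected(training, production))
```

In practice, a library routine such as `scipy.stats.ks_2samp` is likely a better choice, since it also returns a p-value. The hand-written version is only meant to show what the statistic measures.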
