Detecting Data Drift
Learn how to identify data drift with statistical and algorithmic methods.
Data drift can harm an ML model in deployment: as the underlying data changes, the model's predictions can become skewed or, worse, biased. In this lesson, we cover the theory behind commonly used methods for identifying data drift.
Statistical methods
Statistical methods tend to be fast and low-effort to apply. They are simple mathematical formulations that rely on hypothesis tests to detect drift at a chosen confidence level.
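To make "confidence level" concrete, here is a generic sketch of the decision rule, not tied to any one test. A test run at significance level α (confidence level 1 − α) computes a p-value from the reference and current samples and flags drift when that p-value is small enough, where $H_0$ is the null hypothesis that no drift has occurred:

```latex
\text{flag drift (reject } H_0\text{)} \iff p \le \alpha,
\qquad \text{confidence level} = 1 - \alpha
```

For example, α = 0.05 corresponds to a 95% confidence level: a p-value of 0.01 would flag drift, while a p-value of 0.20 would not.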
Kolmogorov-Smirnov
The two-sample Kolmogorov-Smirnov (KS) test is a statistical hypothesis test with the following hypotheses:
$H_0$: The two samples come from the same distribution.
$H_1$: The two samples are drawn from different distributions.
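The following is a minimal, standard-library-only sketch of the two-sample KS test, assuming numeric samples and a default significance level of 0.05. The function names `ks_statistic`, `ks_p_value`, and `ks_drift_test` are hypothetical, chosen for illustration. The statistic D is the largest absolute gap between the two empirical CDFs. The p-value comes from the asymptotic Kolmogorov distribution with effective sample size n·m/(n + m), so it is only an approximation and is least reliable for small samples. In practice, `scipy.stats.ks_2samp` implements this test.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D: the largest absolute gap between the
    empirical CDFs (ECDFs) of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    # The ECDFs only jump at observed values, so evaluating both ECDFs at
    # every point of the pooled sample is enough to find the maximum gap.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / n  # fraction of sample_a that is <= x
        ecdf_b = bisect_right(b, x) / m  # fraction of sample_b that is <= x
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


def ks_p_value(d, n, m, terms=20):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    lam = math.sqrt(n * m / (n + m)) * d  # scale D by the effective sample size
    if lam < 1e-8:
        return 1.0  # no measurable gap: no evidence against H0
    if lam < 1.18:
        # For small lambda, this series for the Kolmogorov CDF converges fast.
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
            for k in range(1, terms + 1)
        )
        return max(0.0, 1.0 - cdf)
    # For larger lambda, the alternating series for P(K > lambda) converges fast.
    q = 2 * sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, q))


def ks_drift_test(reference, current, alpha=0.05):
    """Flag drift when the KS p-value is at or below alpha (reject H0)."""
    d = ks_statistic(reference, current)
    p = ks_p_value(d, len(reference), len(current))
    return d, p, p <= alpha
```

For example, `ks_drift_test(list(range(50)), list(range(100, 150)))` compares two samples with no overlap at all, so D is 1.0, the p-value is vanishingly small, and drift is flagged. The two series branches are a numerical-stability choice: each form of the Kolmogorov distribution needs only a few terms on its own side of the λ ≈ 1.18 cutoff.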
For two samples of size