Data leakage in machine learning

Depending on the nature of the data set, it is possible that the target variable has a distribution that is very similar for both data sets (the training and the test). However, such a case may not hold true in real-life scenarios. The model can learn how the probability of each target variable changes according to the moment in time. Thus, any feature included in the data set, that is related to time, may be a potential threat of data leakage.

Therefore, the first approach to counter data leakage in time series forecasting is to remove all the features that relate to time.

Free Resources

Learn in-demand tech skills in half the time

PRODUCTS

Mock Interview

New

Courses

Skill Paths

Projects

Assessments

TRENDING TOPICS

Learn to Code

Tech Interview Prep

Generative AI

Data Science

Machine Learning

GitHub Students Scholarship

Early Access Courses

Blind 75

Layoffs

Pricing

For Individuals

Try for Free

Gift a Subscription

CONTRIBUTE

Become an Author

Become an Affiliate

Earn Referral Credits

RESOURCES

Blog

Cheatsheets

Webinars

Answers

ABOUT US

Our Team

Careers

Hiring

Frequently Asked Questions

Press

LEGAL

Cookie Policy

Business Terms of Service

Data Processing Agreement

INTERVIEW PREP COURSES

Grokking the Modern System Design Interview

Grokking the Product Architecture Design Interview

Grokking the Coding Interview Patterns

Machine Learning System Design

Data leakage in machine learning

Time series forecasting