What is overfitting in machine learning?

Explanation

In the diagram above, the blue line shows the loss on the training set and the orange line shows the loss on the validation set, both plotted against model complexity.

The training set

Let’s consider the blue line.

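As the complexity of the model increases, the training loss decreases, because a more complex model can fit the training data more closely. Conversely, a simpler model has a higher training loss.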

The validation set

Now, let’s consider the orange line.

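As we move from left to right, the validation loss falls until it reaches a certain point. If we keep moving right from that point, the loss increases.

The point at which the loss starts to increase marks the minimal loss for the validation set. Beyond it, the model is overfitting: it keeps fitting the training set better while doing worse on data it has not seen.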
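The two curves can be reproduced with a small experiment. The following is a minimal sketch, not the setup behind the diagram: it assumes polynomial degree as the measure of model complexity, mean squared error as the loss, and a synthetic noisy sine curve as the data. The helper name `loss_by_degree` is hypothetical.

```python
import numpy as np

def loss_by_degree(x_train, y_train, x_val, y_val, degrees):
    """Fit one polynomial per degree; return (train_losses, val_losses) as MSE lists."""
    train_losses, val_losses = [], []
    for degree in degrees:
        # Least-squares fit that only ever sees the training data.
        coeffs = np.polyfit(x_train, y_train, degree)
        train_pred = np.polyval(coeffs, x_train)
        val_pred = np.polyval(coeffs, x_val)
        train_losses.append(float(np.mean((train_pred - y_train) ** 2)))
        val_losses.append(float(np.mean((val_pred - y_val) ** 2)))
    return train_losses, val_losses

# Synthetic data (an assumption for illustration): a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 20)
x_val = np.linspace(-0.95, 0.95, 20)
y_train = np.sin(3 * x_train) + rng.normal(0.0, 0.2, x_train.shape)
y_val = np.sin(3 * x_val) + rng.normal(0.0, 0.2, x_val.shape)

degrees = list(range(1, 11))  # model complexity, left to right on the x-axis
train_losses, val_losses = loss_by_degree(x_train, y_train, x_val, y_val, degrees)

# The degree with the lowest validation loss plays the role of the minimum of the orange line.
best_degree = degrees[int(np.argmin(val_losses))]
for d, tr, va in zip(degrees, train_losses, val_losses):
    print(f"degree={d:2d}  train_loss={tr:.4f}  val_loss={va:.4f}")
print("lowest validation loss at degree", best_degree)
```

Because each higher degree contains the lower ones as special cases, the training loss can only stay the same or fall as the degree grows. The validation loss has no such guarantee, which is why it is the one used to pick the model.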

Explanation

In the diagrams above, the red data points are the ones we use to train the model. This means they were already available.

The green data point is newly introduced, and we test it against our model.

When the green data point is tested against a balanced machine learning model, it lies close to the plotted line, indicating good accuracy.

Underfit model

In Fig. A, our ML model has plotted a straight line against the data points. The line crosses through a few data points, and the other data points are farther from the plotted line.

When a new data point is introduced, it is far from the line. This indicates that the model is not accurate: it is too simple to capture the pattern in the data.

Overfit model

As seen in Fig. B, the model fits the training data exceptionally well: the plotted line passes through all the training data points.

This model is entirely accurate for the training data. However, introducing a new data point shows that the model may not work well for new data, as the new point is far away from the plotted line.

Balanced model

In Fig. C, which is our balanced model, the plotted curve passes through some of the data points. If we calculate the loss on new data, this model would have the lowest loss of the three models.

A newly introduced point lies near the plotted curve. Hence, when the model is used with actual/production data, our predictions will be reasonably accurate.
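The three cases can be compared on a toy dataset. The sketch below is an illustration under stated assumptions, not the data behind Figs. A to C: the training points, the new point, and the helper `fit_and_score` are all hypothetical, and polynomial degree stands in for model complexity.

```python
import numpy as np

def fit_and_score(x_train, y_train, x_new, y_new, degree):
    """Fit a polynomial of the given degree; return (training MSE, squared error on the new point)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    new_error = float((np.polyval(coeffs, x_new) - y_new) ** 2)
    return train_mse, new_error

# Hypothetical training points (the "red" points): a parabola with small fixed perturbations.
x_train = np.linspace(-1.0, 1.0, 8)
noise = np.array([0.05, -0.04, 0.03, -0.05, 0.04, -0.03, 0.05, -0.04])
y_train = x_train ** 2 + noise

# Hypothetical new point (the "green" point), taken from the same underlying parabola.
x_new, y_new = 0.9, 0.9 ** 2

models = {
    "underfit (degree 1)": 1,                # straight line, as in Fig. A
    "overfit (degree 7)": len(x_train) - 1,  # enough parameters to hit every training point
    "balanced (degree 2)": 2,                # matches the shape of the underlying data
}
results = {
    name: fit_and_score(x_train, y_train, x_new, y_new, degree)
    for name, degree in models.items()
}
for name, (train_mse, new_error) in results.items():
    print(f"{name:20s} train_mse={train_mse:.6f}  new_point_error={new_error:.6f}")
```

The overfit model drives its training loss to essentially zero because it passes through every training point, yet that says nothing about how it behaves between or beyond those points. A single new point is a noisy check, so in practice the comparison is usually made on a whole validation set rather than on one sample.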

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)
