Why use both validation and test sets in machine learning?

Overview

Machine learning is a branch of Artificial Intelligence (AI) that enables computers to learn patterns in data without being explicitly programmed to do so.

In machine learning, data is commonly split into three sets, namely:

Training set
Validation set
Testing set

The training set is the data from which the model learns patterns.

The validation set is held out from training. It is used to evaluate the model during development and is especially useful when tuning the model’s hyperparameters.

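The testing set evaluates how well the tuned model makes predictions on data it has never seen. Because the validation set is used repeatedly to choose hyperparameters, its score tends to be optimistic; the testing set is used only once, at the end, to give a less biased estimate of performance.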

Why use validation sets?

Data scientists and machine learning engineers do not always use validation sets in their modeling, but here is why you should consider using one:

Validation data is useful for fine-tuning the model, usually by tuning hyperparameters and keeping the configuration that scores best on the validation set.
Validation data is also useful for feature selection: you can compare candidate feature subsets by their validation scores and keep the features that matter most for your model.
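
To make the first point concrete, here is a minimal sketch of hyperparameter tuning with a validation set. It is an illustration, not part of the original example: the synthetic dataset from make_regression, the Ridge model, and the list of candidate alpha values are all assumptions chosen to keep the code short and self-contained.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption for illustration only)
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# 80% training, then the remaining 20% split equally into validation and test
X_train, X_rem, y_train, y_rem = train_test_split(X, y, train_size=0.8, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_rem, y_rem, test_size=0.5, random_state=0)

# Hypothetical candidate values for the regularization strength
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# Fit one model per candidate on the training set, score each on the validation set
valid_errors = {}
for alpha in candidate_alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    valid_errors[alpha] = mean_squared_error(y_valid, model.predict(X_valid))

# Keep the candidate with the lowest validation error
best_alpha = min(valid_errors, key=valid_errors.get)

# The test set is touched only once, after tuning is finished
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_mse = mean_squared_error(y_test, final_model.predict(X_test))

print('best alpha:', best_alpha)
print('validation MSE:', valid_errors[best_alpha])
print('test MSE:', test_mse)
```

The validation error of the winning model was used to pick it, so it is likely to look slightly better than the model really is. The test error, computed once at the end, is the number to report.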

Example

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# load_boston was removed in scikit-learn 1.2, so this example uses the
# bundled diabetes regression dataset as a stand-in
X, y = load_diabetes(return_X_y=True)
print('shape of data:', X.shape)

# split off 80% of the data for training; the remaining 20% is held out
# (random_state is fixed only to make the split reproducible)
X_train, X_rem, y_train, y_rem = train_test_split(X, y, train_size=0.8, random_state=42)

# split the held-out data equally into validation and test sets
X_valid, X_test, y_valid, y_test = train_test_split(X_rem, y_rem, test_size=0.5, random_state=42)

# confirm the size of each split
print('X_train', X_train.shape, 'y_train', y_train.shape)
print('X_valid', X_valid.shape, 'y_valid', y_valid.shape)
print('X_test', X_test.shape, 'y_test', y_test.shape)
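
The same kind of three-way split can also support feature selection, the second use of validation data mentioned earlier. The sketch below is one possible approach under stated assumptions, not the article's own method: it uses the bundled diabetes dataset, scikit-learn's SelectKBest with the f_regression score, and a plain LinearRegression model, and it picks the number of features k by validation error.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Bundled regression dataset (an assumption for illustration only)
X, y = load_diabetes(return_X_y=True)

# 80% training, then the remaining 20% split equally into validation and test
X_train, X_rem, y_train, y_rem = train_test_split(X, y, train_size=0.8, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_rem, y_rem, test_size=0.5, random_state=0)

# Try every possible number of features and record the validation error
valid_mse_by_k = {}
for k in range(1, X.shape[1] + 1):
    # The selector is fitted on the training set only
    selector = SelectKBest(score_func=f_regression, k=k).fit(X_train, y_train)
    model = LinearRegression().fit(selector.transform(X_train), y_train)
    predictions = model.predict(selector.transform(X_valid))
    valid_mse_by_k[k] = mean_squared_error(y_valid, predictions)

# Keep the number of features with the lowest validation error
best_k = min(valid_mse_by_k, key=valid_mse_by_k.get)

# Refit with the chosen k and evaluate once on the untouched test set
selector = SelectKBest(score_func=f_regression, k=best_k).fit(X_train, y_train)
model = LinearRegression().fit(selector.transform(X_train), y_train)
test_mse = mean_squared_error(y_test, model.predict(selector.transform(X_test)))

print('best number of features:', best_k)
print('validation MSE:', valid_mse_by_k[best_k])
print('test MSE:', test_mse)
```

Note that both the feature selector and the model are fitted on the training set only; the validation set is used to choose k, and the test set is used a single time for the final estimate.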

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)
