
Quick Overview of Split Validation

Explore split validation techniques for machine learning, including how to partition data into training, test, and validation sets. Learn to optimize and assess models using the train_test_split function from Python's scikit-learn library. Understand why test data must be kept separate and how validation data is used to tune model parameters.

Definition: Split validation

A crucial part of machine learning is partitioning the data into two separate sets using a technique called split validation.

  • The first set is called the training data and is used to build the prediction model.

  • The second set is called the test data and is held in reserve to assess the accuracy of the model developed from the training data.

  • The training and test data are typically split 70/30 or 80/20, with the training data representing the larger portion. Once the model has been optimized and validated for accuracy against the test data, it's ready to generate predictions from new input data.
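The split described above can be sketched with scikit-learn's train_test_split. This is a minimal illustration, not the lesson's own code: the synthetic dataset from make_classification, the LogisticRegression model, the all-zeros "new" row, and random_state=10 are all assumptions chosen only to keep the example self-contained. Setting test_size=0.2 gives an 80/20 split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed stand-in data: 100 rows, 4 numeric features, binary labels.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Hold out 20% of the rows as test data (an 80/20 split).
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

# Build the prediction model from the training data only.
model = LogisticRegression()
model.fit(X_train, y_train)

# Assess accuracy on test rows the model never saw during training.
test_accuracy = model.score(X_test, y_test)
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")  # train rows: 80, test rows: 20
print(f"test accuracy: {test_accuracy:.2f}")

# Once satisfied, generate predictions for new input data.
new_row = np.zeros((1, 4))  # hypothetical new observation
print(model.predict(new_row))
```

The key design point is that `model.fit` only ever sees `X_train` and `y_train`; the test rows are touched solely by `model.score`, so the reported accuracy reflects performance on data the model did not learn from.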

Although the model is used on both the training and test sets, it’s ...