Quick Overview of Split Validation
An introduction to splitting data into training, validation, and test sets using sklearn.
Definition: Split validation
A crucial part of machine learning is partitioning the data into two separate sets using a technique called split validation.
- The first set is called the training data and is used to build the prediction model.
- The second set is called the test data and is kept in reserve to assess the accuracy of the model developed from the training data.
The training and test data are typically split 70/30 or 80/20, with the training data representing the larger portion. Once the model has been optimized and validated against the test data for accuracy, it’s ready to generate predictions using new input data.
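As a minimal sketch of the 70/30 split described above, the example below uses sklearn's `train_test_split`. The toy arrays `X` and `y` and the `random_state` value are assumptions made for illustration only; they do not come from the lesson's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data (an assumption for illustration):
# 10 samples with 2 features each, plus one target value per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 30% of the rows as test data; the remaining 70% is training data.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10, shuffle=True
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```

Changing `test_size` to `0.2` would give the 80/20 split instead. The rows are shuffled before splitting, so which rows land in each set depends on `random_state`.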
Although the model is used on both the training and test sets, it’s ...