Cross Validation
Cross validation is a technique for estimating how well a model performs on unseen data, which helps us build robust models. You'll discover how it works in this lesson.
Training, Test, and Validation Datasets
- We divide the dataset at hand into a training dataset and a test dataset.
- We train the model on the training dataset and evaluate its performance.
- We evaluate the model’s performance on the test dataset (on which the model was not trained) and report that performance.
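The three steps above can be sketched as follows. This is a minimal illustration rather than a prescribed recipe: the linear SVM classifier, the Iris dataset, and the 40% test size are assumptions chosen for the example, and the variable names train_accuracy and test_accuracy are our own. It uses Scikit Learn's train_test_split helper for the split.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Load a small example dataset (assumed here: Iris, 150 rows, 4 features).
X, y = datasets.load_iris(return_X_y=True)

# Step 1: divide the data into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Step 2: train the model on the training set and check how it does there.
clf = svm.SVC(kernel="linear", C=1).fit(X_train, y_train)
train_accuracy = clf.score(X_train, y_train)

# Step 3: evaluate on the test set, which the model has never seen.
test_accuracy = clf.score(X_test, y_test)

print(f"Training accuracy: {train_accuracy:.2f}")
print(f"Test accuracy: {test_accuracy:.2f}")
```

The test accuracy is the number we report, because the training accuracy is measured on data the model has already seen and tends to be optimistic.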
Scikit Learn provides train_test_split, which splits a dataset into training and test datasets. The code snippets below are taken from the Scikit Learn documentation.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm

X, y = datasets.load_iris(return_X_y=True)
print("Original Shape of input and output columns")
print(X.shape)
print(y.shape)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
print("Shape of the training dataset's input and output columns")
print(X_train.shape)
print(y_train.shape)
print("Shape of the test dataset's input and output columns")
print(X_test.shape)
print(y_test.shape)
- Line 6 loads the Iris dataset and stores the input columns in X and the output column in y. Lines 8 and 9 print the shape of the dataset.
- Line 11 splits the dataset into the training and test datasets. test_size specifies the fraction of instances to be kept in the test dataset. In the current case, 40% of the rows are kept in the test ...