Cross-validation

Learn and perform hands-on cross-validation.

Overview

We have already learned about overfitting, underfitting, and the bias-variance trade-off, and we are always looking for the optimal point between over- and underfitting. So far, we have used a train-test split, dividing our data into training (X_train, y_train) and test (X_test, y_test) sets by some percentage. We trained our regression model on the training portion and tested/validated it on the test portion. Both the train-test split and cross-validation help guard against overfitting more than underfitting.
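The split described above can be sketched with scikit-learn's `train_test_split`. The dataset here is synthetic, generated just for illustration; in the lesson you would use your own features and target:

```python
# A minimal sketch of a train-test split, assuming a simple
# regression task on synthetic data (100 samples, 3 features).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out 20% of the data as the test set; the rest is for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train on the training portion, validate on the held-out portion.
model = LinearRegression().fit(X_train, y_train)
print(f"Test R^2: {model.score(X_test, y_test):.3f}")
```

Setting `random_state` makes the split reproducible, but note that a single random split still gives only one estimate of test performance.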

However, the train-test split has its dangers:

  • What if the split we make is not random?

  • What if one subset (train or test) contains only one type of data point and is not truly representative of the complete dataset? As a simple example, if our data were ordered by the number of rooms, the test set could end up containing only the houses with the most rooms.

This would result in overfitting, and we don't want that. This is where cross-validation plays its role. Let's move on and learn how it works.
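A first taste of what cross-validation looks like in code: instead of one fixed split, the data is divided into k folds, and each fold serves once as the validation set while the others train the model. This sketch uses scikit-learn's `cross_val_score` on the same kind of synthetic regression data as above:

```python
# A minimal sketch of 5-fold cross-validation, assuming a simple
# synthetic regression dataset for illustration.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# cv=5 splits the data into 5 folds; the model is trained 5 times,
# each time validated on a different fold, yielding 5 R^2 scores.
scores = cross_val_score(LinearRegression(), X, y, cv=5)
print("Fold R^2 scores:", np.round(scores, 3))
print("Mean R^2:", round(scores.mean(), 3))
```

Averaging the fold scores gives a more stable estimate of generalization performance than a single train-test split, because every data point is used for validation exactly once.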
