Overview

We have already learned about overfitting, underfitting, and the bias-variance trade-off, and we are always looking for an optimal point between over- and underfitting. So far, we have used a train-test split, where we divided our data into training (X_train, y_train) and test (X_test, y_test) sets in some ratio. We trained our regression model on the training part and tested/validated it on the test part. Both the train-test split and cross-validation help avoid overfitting more than underfitting.
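The sketch below illustrates this workflow with scikit-learn. The synthetic data from make_regression and the 80/20 ratio are assumptions for illustration only, not part of the original lesson's dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for a real dataset (illustrative only).
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=42)

# Hold out 20% of the rows as the test set; rows are shuffled before splitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)           # train on the training part
print(model.score(X_test, y_test))    # R^2 score on the held-out test part
```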

However, the train-test split does have its dangers:

  • What if the split we make is not random?

  • What if one subset (train or test) of our data contains only one type of data point and is not a true representative of our complete dataset? In the simplest example, suppose our data is ordered by the number of rooms; the test set could then end up containing only the houses with the most rooms.

This will result in overfitting, and we don't want that. This is where cross-validation plays its role. Let's move on and learn about cross-validation now. It's a straightforward concept, somewhat similar to a train-test split. The most commonly used form is k-fold cross-validation.

K-fold cross-validation

In this approach, we split our data into k subsets (also called folds). We use k-1 subsets to train the model and hold out the remaining fold as the validation data. We repeat this process k times, so that each fold serves once as the validation set, and then average the evaluation scores across the folds to finalize our model. After that, we test it against the test set.
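A minimal sketch of k-fold cross-validation with scikit-learn follows. The synthetic data, the choice of k=5, and the use of a plain linear regression model are assumptions for illustration; in practice, you would run this on the training portion of your own split.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data standing in for a real dataset (illustrative only).
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=42)

model = LinearRegression()

# 5 folds: each fold serves once as the validation set while the
# remaining 4 folds are used for training.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kfold)  # one R^2 score per fold

print(scores)         # score for each of the k folds
print(scores.mean())  # averaged score used to judge the model
```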
