Advanced Cross-Validation
Learn more advanced methods of cross-validation.
Advanced cross-validation techniques, such as k-fold and leave-one-out, provide more robust and accurate assessments of model performance in ML. These methods go beyond the basic train-test split and allow for a more comprehensive evaluation of model generalization.
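For instance, leave-one-out is the extreme case of k-fold cross-validation in which each fold contains exactly one sample. As a minimal sketch (the small synthetic dataset and Ridge model here are illustrative, not part of the example later in this lesson), scikit-learn's LeaveOneOut splitter can be passed to cross_val_score; mean squared error is used because the R2 score is undefined on a single-sample test set:

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import Ridge

# Small illustrative dataset: 50 samples, 3 features
np.random.seed(0)
X = np.random.rand(50, 3)
y = X.sum(axis=1) + 0.1 * np.random.randn(50)

# Each sample serves as the test set exactly once (50 fits in total)
loo = LeaveOneOut()
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=loo,
                         scoring="neg_mean_squared_error")
print("Average MSE across folds:", -scores.mean())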
The k-fold cross-validation technique
The k-fold cross-validation technique involves dividing the original dataset into k equally sized subsets or folds. The model is trained and evaluated k times, each time using a different fold as the test set and the remaining folds as the training set. The performance metrics obtained from each fold are then averaged to obtain an overall assessment of the model’s performance.
For example, let’s consider 5-fold cross-validation with scikit-learn:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Generate synthetic data
np.random.seed(42)
X = np.random.rand(1000, 10)  # Independent variables
important_features = [0, 1, 2, 3]  # Indices of important features
y = np.sum(X[:, important_features], axis=1) + 0.5 * np.random.randn(1000)  # Dependent variable

# Initialize k-fold cross-validation
k = 5
kf = KFold(n_splits=k)

# Initialize a list to store the R2 scores for each fold
r2_scores = []

# Perform k-fold cross-validation
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train the Ridge regression model
    model = Ridge(alpha=0)  # Alpha controls regularization strength
    model.fit(X_train, y_train)

    # Calculate R2 score for the current fold
    y_test_pred = model.predict(X_test)
    r2_scores.append(r2_score(y_test, y_test_pred))

# Print the R2 scores for each fold and their average
for i, score in enumerate(r2_scores):
    print(f"R2 Score - Fold {i+1}: {score}")
print("Average R2 Score:", np.mean(r2_scores))
Lines 13–14: We initialize the 5-fold cross-validation splitter.
Lines 20–30: We iterate over the splits so that each time we fit our model to a different training set and evaluate it on a different test set. We then store the evaluation metrics in r2_scores.
In this example, the dataset is split into five folds. The model is trained and evaluated five times, with each fold serving as the test set once. The R2 scores from the five folds are then averaged to produce a single, more reliable estimate of how well the model generalizes.
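As a side note, the manual loop above can be condensed with scikit-learn's cross_val_score helper, which handles the splitting, fitting, and scoring internally. A minimal equivalent sketch, assuming the same X, y, and Ridge model from the example above:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge

# cv=5 uses an unshuffled KFold(n_splits=5) for regressors,
# matching the manual loop above
scores = cross_val_score(Ridge(alpha=0), X, y, cv=5, scoring="r2")
for i, score in enumerate(scores):
    print(f"R2 Score - Fold {i+1}: {score}")
print("Average R2 Score:", scores.mean())

Note that KFold does not shuffle by default; when the rows of a dataset follow some order (for example, sorted by date or by label), passing KFold(n_splits=k, shuffle=True, random_state=42) usually yields more representative folds.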