XGBoost Hyperparameters: Early Stopping

Learn how early stopping can reduce overfitting in an ensemble of decision trees trained with XGBoost.

We'll cover the following...

- Early stopping as a method for reducing overfitting
- Improve XGBoost model performance
- Try it yourself

Early stopping as a method for reducing overfitting

When training ensembles of decision trees with XGBoost, there are many options for reducing overfitting and managing the bias-variance trade-off. Early stopping is one of the simplest, and it helps automate the answer to the question “How many boosting rounds are needed?” Early stopping relies on a validation set of data that is separate from the training set. However, because this validation set is used during the model training process, it does not qualify as “unseen” data that was held out from model training. This is similar to how we used validation sets in cross-validation to select model hyperparameters in the chapter “The Bias-Variance Trade-Off.”
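To make the idea concrete, here is a minimal sketch of a patience-based stopping rule in plain Python. It is a simplified illustration, not XGBoost's actual implementation: the function name `early_stopping_round`, the `patience` parameter, and the validation errors are all hypothetical and chosen only for this example.

```python
def early_stopping_round(val_errors, patience):
    """Return the 0-based index of the best round under a patience rule.

    val_errors: validation error after each boosting round (lower is better).
    patience: stop once this many rounds pass without a new best error.
    """
    best_round = 0
    best_error = float("inf")
    for round_idx, error in enumerate(val_errors):
        if error < best_error:
            # New best validation error: remember where it happened.
            best_error = error
            best_round = round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop adding trees.
            break
    return best_round


# Hypothetical validation errors: they improve at first, then worsen as the
# ensemble starts to overfit the training data.
val_errors = [0.50, 0.42, 0.38, 0.36, 0.37, 0.39, 0.40, 0.35]
best = early_stopping_round(val_errors, patience=3)
print(best)
```

With a patience of 3, the loop stops before it ever reaches the final value in the list. This shows the trade-off in choosing the patience: a small value saves training time but can miss a later improvement.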

As XGBoost trains successive decision trees to reduce error on the training set, adding more and more trees to the ensemble may fit the training data increasingly well while beginning to lower performance on held-out data. To avoid this, we can use a validation set, which XGBoost also calls an evaluation set or eval_set. The evaluation set is supplied as a list of tuples, each containing features and their corresponding response variable. Whichever tuple comes last in this list is the one used for early stopping. We want this to be the validation set because the training data will be used to fit the model and can’t ...
