...

Gradient Boosting and XGBoost

Learn about gradient boosting and see it working on synthetic data.

What is boosting?

Boosting is a procedure for creating ensembles of many machine learning models, or estimators, similar to the bagging concept that underlies the random forest model. Like bagging, boosting can be used with any kind of machine learning model, but it is most commonly used to build ensembles of decision trees. A key difference from bagging is that in boosting, each new estimator added to the ensemble depends on all the estimators added before it. Because the boosting procedure proceeds in sequential stages, and the predictions of the ensemble members are added up to calculate the overall ensemble prediction, it is also called stagewise additive modeling. The difference between bagging and boosting can be visualized as in the figure below:

[Figure: Bagging vs. boosting]
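
To make stagewise additive modeling concrete, here is a minimal sketch of gradient boosting for regression with a squared-error loss, where each new tree is fit to the residuals of the current ensemble. The synthetic sine-wave data, tree depth, stage count, and learning rate are illustrative assumptions, not part of the lesson:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative synthetic regression data: a noisy sine wave.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

n_stages, learning_rate = 50, 0.1  # assumed values for illustration
prediction = np.zeros_like(y)      # the ensemble starts from a zero prediction
trees = []

for _ in range(n_stages):
    residuals = y - prediction               # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                   # each new tree depends on all prior ones
    prediction += learning_rate * tree.predict(X)  # stagewise additive update
    trees.append(tree)

print(f"Training MSE after {n_stages} stages: {np.mean((y - prediction) ** 2):.4f}")
```

Note how the loop makes the sequential dependence explicit: every tree is trained on residuals that already reflect the contributions of all earlier trees, and the final prediction is the sum of the scaled stage outputs.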

While bagging trains many estimators using different random samples of the training data, boosting trains new estimators using information about which samples were incorrectly classified by the previous estimators in the ensemble. By focusing new estimators on these samples, the goal is for the overall ensemble to achieve better performance across the whole training dataset. AdaBoost, a precursor to XGBoost, accomplishes this by giving more weight to incorrectly classified samples as new estimators are trained.
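
As a quick illustration of this reweighting idea, the sketch below trains scikit-learn's AdaBoostClassifier on a small synthetic classification problem. The dataset and hyperparameter values are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic binary classification data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each boosting stage upweights the samples that earlier estimators
# misclassified, so later estimators focus on the hard cases.
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X_train, y_train)
print(f"Test accuracy: {ada.score(X_test, y_test):.3f}")
```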

Understanding XGBoost

XGBoost is a modeling procedure and Python package that is one of the most popular machine learning methods in use today, due to its superior performance in many domains, from business to the natural sciences. XGBoost has also proven to be one of the most successful tools in machine learning competitions. We will not discuss all the details of how XGBoost is implemented, but rather get a high-level idea of how it works and look at some of the most important hyperparameters.

Note: For further details, the interested learner should refer to the publication XGBoost: A Scalable Tree Boosting System, by Tianqi Chen and Carlos Guestrin.
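
Before looking under the hood, here is a minimal sketch of training an XGBoost classifier on synthetic data through its scikit-learn-compatible interface. It assumes the xgboost package is installed, and the hyperparameter values shown are illustrative rather than recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Illustrative synthetic binary classification data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=100,   # number of boosting stages (trees)
    max_depth=3,        # depth of each tree
    learning_rate=0.1,  # shrinks each tree's contribution to the ensemble
)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```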

The XGBoost implementation of the gradient ...