Gradient Boosting with XGBoost
Learn about the XGBoost gradient boosting algorithm.
A gradient boosting framework
XGBoost stands for Extreme Gradient Boosting. The XGBoost algorithm is a scalable framework for training gradient boosted ensembles that use decision trees as the weak learners. XGBoost is accessible in R via the xgboost package and has become a go-to algorithm for production scenarios because of its predictive performance and scalability.
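The sketch below is a minimal example of fitting an XGBoost model in R. It assumes the xgboost package is installed and uses the agaricus mushroom data that ships with the package; the parameter values are illustrative rather than recommendations.

```r
# Minimal sketch: train a small gradient boosted tree ensemble in R.
# Assumes the xgboost package is installed; the agaricus data ships with it.
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

# xgb.DMatrix is XGBoost's optimized internal data structure.
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)
dtest  <- xgb.DMatrix(data = agaricus.test$data,  label = agaricus.test$label)

# Fit an ensemble of 10 boosted trees for binary classification.
model <- xgb.train(
  params  = list(objective = "binary:logistic", max_depth = 3, eta = 0.3),
  data    = dtrain,
  nrounds = 10
)

# Predicted probabilities for the held-out observations.
head(predict(model, dtest))
```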
XGBoost is a gradient boosting framework because its underlying mathematics supports training ensembles with many different loss functions. In machine learning, a loss function is a mathematical function that measures the quality of a model's predictions.
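As a concrete example (a standard choice, not the only loss XGBoost supports), the squared-error loss used for regression scores a prediction by the squared distance between the predicted and observed values:

$$l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$$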
XGBoost supports loss functions for scenarios like the ones below (a short code sketch after this list shows how each is selected in the R package):
Regression
Binary classification
Multiclass classification
Cox survival models
Ranking
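In the xgboost R package, each of these scenarios corresponds to an objective string passed through the model's parameters. The mapping below is a sketch; values such as num_class are illustrative.

```r
# Objective strings that select a loss function in the xgboost R package.
objectives <- c(
  regression                = "reg:squarederror",  # squared-error regression
  binary_classification     = "binary:logistic",   # logistic loss, probability output
  multiclass_classification = "multi:softprob",    # softmax over num_class classes
  cox_survival              = "survival:cox",      # Cox proportional hazards
  ranking                   = "rank:pairwise"      # pairwise ranking loss
)

# The chosen string is supplied via the objective parameter, e.g. multiclass:
params <- list(objective = "multi:softprob", num_class = 3, max_depth = 4)
```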
Loss functions are also often referred to as objective functions because the objective of machine learning algorithms is to minimize the value of the loss function.
The following sections cover essential aspects of the XGBoost algorithm’s mathematics. More information is available on the XGBoost official website.
Loss functions
XGBoost supports many loss functions and represents these functions generically with the following mathematical notation:
$$l(y_i, \hat{y}_i)$$

where $y_i$ is the observed value of the target for the $i$-th training observation and $\hat{y}_i$ is the model's corresponding prediction.
The XGBoost algorithm relies on gradients to minimize these loss functions. Gradients are simply derivatives of the loss function with respect to the model's predictions. Using gradients constrains the XGBoost framework to differentiable loss functions.
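To make the role of derivatives concrete, the sketch below fits a model with a hand-written logistic loss. This is an illustrative custom objective, not the lesson's required approach; it assumes a recent version of the xgboost package, which asks for both the first derivative (gradient) and second derivative (hessian) of the loss.

```r
# Sketch: supply a custom differentiable loss to XGBoost in R.
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)

# Hand-written logistic loss: XGBoost only needs its derivatives.
logistic_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  probs  <- 1 / (1 + exp(-preds))   # sigmoid of the raw prediction
  grad   <- probs - labels          # first derivative (gradient)
  hess   <- probs * (1 - probs)     # second derivative (hessian)
  list(grad = grad, hess = hess)
}

model <- xgb.train(
  params  = list(max_depth = 3, eta = 0.3),
  data    = dtrain,
  nrounds = 10,
  obj     = logistic_obj
)
```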
The objective function
Gradient boosting algorithms are prone to overfitting, and the XGBoost algorithm is no exception. To combat overfitting, the XGBoost algorithm starts with the following objective function:
$$\text{obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

The objective function is the sum of the loss function over all the training dataset observations (denoted by $n$) plus a regularization term summed over the $K$ trees in the ensemble.
A regularization term adds a penalty to an objective function. This penalty encourages machine learning algorithms to produce less complex models (i.e., models less likely to overfit).
In the equation above, $\Omega(f_k)$ is the regularization penalty applied to the $k$-th tree in the ensemble. In XGBoost, this penalty grows with the number of leaves in a tree and with the magnitude of its leaf weights, so trees with simpler structures are preferred.
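In the R package, the strength of this regularization penalty is exposed through tuning parameters. The sketch below names a few of them; the specific values are illustrative, not recommendations.

```r
# Sketch: regularization-related parameters in the xgboost R package.
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)

params <- list(
  objective = "binary:logistic",
  gamma     = 1,    # minimum loss reduction required to make a split
  lambda    = 2,    # L2 penalty on leaf weights
  alpha     = 0.5,  # L1 penalty on leaf weights
  max_depth = 4     # caps tree complexity directly
)

regularized_model <- xgb.train(params = params, data = dtrain, nrounds = 25)
```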