Gradient Boosting: Implementation from Scratch

Get introduced to gradient boosting, its algorithm, and how we can train a gradient boosting model from scratch.

Gradient boosting is a machine learning technique that builds a strong predictive model by sequentially adding weak models to the ensemble. It uses gradient descent optimization to minimize the loss function of the model. The term “gradient” refers to the negative gradient of the loss function, which guides the learning process toward reducing errors. Gradient boosting is a versatile technique that can be used for both regression and classification tasks. Here, we’ll focus on regression and demonstrate how to solve a regression problem using two approaches: implementing a gradient boosting regressor from scratch and using the regressor provided by scikit-learn. In this part, we implement a gradient boosting regressor from scratch.

Algorithm

  1. Initialize the predictor with an initial estimate (usually the mean of the target variable or a constant value c).

  2. For each iteration:

    i. Compute the negative gradient (residuals) of the loss function with respect to the current predictor.

    ii. Fit a weak learner (e.g., decision tree) to the negative gradient (residuals).

    iii. Update the predictor by adding a scaled version of the weak learner’s predictions to the current predictor.

    iv. Repeat steps i to iii until a stopping criterion is met (e.g., reaching a maximum number of iterations or achieving a desired level of performance).

  3. Return the final predictor, which is the ensemble of weak learners (see the code sketch after this list).
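To make these steps concrete, here is a minimal sketch of the whole loop, assuming a one-half squared error loss, a fixed learning rate, and scikit-learn's DecisionTreeRegressor as the weak learner. The function name gradient_boosting_sketch and the parameters n_estimators and learning_rate are illustrative names, not part of any library:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boosting_sketch(X, y, n_estimators=100, learning_rate=0.1):
    # Step 1: initialize the predictor with the mean of the target variable.
    base_value = np.mean(y)
    prediction = np.full(len(y), base_value)
    trees = []
    for _ in range(n_estimators):
        # Step 2i: the negative gradient of the one-half squared error loss is the residual.
        residuals = y - prediction
        # Step 2ii: fit a weak learner (a shallow decision tree) to the residuals.
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)
        # Step 2iii: add a scaled version of the weak learner's predictions to the predictor.
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    # Step 3: the final predictor is the base value plus the ensemble of trees.
    return base_value, trees

The rest of this lesson builds this loop up step by step, starting with the initialization and the residuals.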

Implementation from scratch

We can generate a synthetic regression dataset using the make_regression function of sklearn as a starting point. This lets us implement gradient boosting on the dataset and compare the results with the gradient boosting regressor provided by the scikit-learn library.

from sklearn.datasets import make_regression
# generate a synthetic regression dataset: 100 samples, 10 features
X, y = make_regression(n_samples=100, n_features=10, noise=1, random_state=42)
print(X.shape, y.shape)

Then, we split the dataset using the train_test_split function of sklearn.model_selection so that we can train a gradient boosting regressor on the training set and evaluate it on the test set.

from sklearn.model_selection import train_test_split
# hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Initialization: Setting up the predictor

First, we initialize our predictor with a constant c, which in most cases is the mean of the target variable. So, we have to create a list or array of the same size as our target array, where each element is this constant c. Here’s the code that initializes the predictor by setting all predicted values to the mean of the input list:

import numpy as np

# initialize the predictor with the mean of the target variable
def initial_prediction(y):
    mean = []
    a = np.mean(y)
    for i in range(0, len(y)):
        mean.append(a)
    return mean

y_i = initial_prediction(y_train)
print(len(y_i))
print(y_i[:5])

Following is the explanation for the code above:

  • Lines 1–9: We define a function initial_prediction(y) that takes a list y as input and returns a new list in which every element is set to the mean of the input list y.

  • Lines 11–13: We call the initial_prediction() function on the list of target variables and then print its length and first five elements.
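As an aside, the same initialization can be written more compactly with NumPy’s np.full, which builds an array of a given length filled with a single value; this is just a vectorized equivalent of the loop above:

import numpy as np
# an array with len(y_train) entries, each equal to the mean of y_train
y_i = np.full(len(y_train), np.mean(y_train))
print(y_i[:5])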

Calculating negative gradients (residuals)

The second step of the algorithm is to calculate the negative gradients (also known as residuals) of the loss function with respect to the current predictor. For simplicity, consider the loss function to be the one-half squared error (though other loss functions can be used as well):

\begin{align*}
Loss &= \frac{1}{2} (y_i - \hat{y}_i)^2
\end{align*}
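Differentiating this loss with respect to the current prediction and negating the result gives the residual, i.e., the difference between the target and the current prediction. As a small illustration, here is how we might compute these residuals for the initial predictor; the helper name negative_gradient is ours, not part of any library:

import numpy as np
# negative gradient of the one-half squared error loss, i.e., the residuals
def negative_gradient(y, y_pred):
    return np.asarray(y) - np.asarray(y_pred)

residuals = negative_gradient(y_train, y_i)
print(residuals[:5])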