...
Gradient Boosting: Implementation from Scratch
Get introduced to gradient boosting, its algorithm, and how we can train a model from scratch.
Gradient boosting is a machine learning technique that builds a strong predictive model by sequentially adding weak models to an ensemble. It uses gradient descent optimization to minimize the model’s loss function. The term “gradient” refers to the negative gradient of the loss function, which guides the learning process toward reducing errors. Gradient boosting is a versatile technique that can be used for both regression and classification tasks. Here, we’ll focus on regression and demonstrate how to solve a regression problem using two approaches. In this lesson, we implement a gradient boosting regressor from scratch.
Algorithm
- Initialize the predictor with an initial estimate (usually the mean of the target variable or a constant value).
- For each iteration:
  i. Compute the negative gradient (residuals) of the loss function with respect to the current predictor.
  ii. Fit a weak learner (e.g., a decision tree) to the negative gradient (residuals).
  iii. Update the predictor by adding a scaled version of the weak learner’s predictions to the current predictor.
  iv. Repeat steps i to iii until a stopping criterion is met (e.g., reaching a maximum number of iterations or achieving a desired level of performance).
- Return the final predictor, which is the ensemble of weak learners (a code sketch of this loop follows the list).
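To make the steps concrete, here is a minimal sketch of this training loop in Python. It assumes a squared-error loss, shallow decision trees from scikit-learn as the weak learners, and a fixed learning rate; the names gradient_boost_sketch and ensemble_predict are illustrative helpers, not part of the implementation we build step by step below.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_sketch(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # Step 1: initialize the predictor with the mean of the target variable
    base = np.mean(y)
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_estimators):
        # Step 2(i): negative gradient of the squared-error loss = residuals
        residuals = y - prediction
        # Step 2(ii): fit a weak learner (a shallow decision tree) to the residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Step 2(iii): add a scaled version of the weak learner's predictions
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    # Step 3: the final predictor is the initial estimate plus the ensemble of weak learners
    return base, trees

def ensemble_predict(base, trees, X, learning_rate=0.1):
    # combine the initial estimate with the scaled prediction of every weak learner
    pred = np.full(len(X), base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred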
Implementation from scratch
We can generate a synthetic regression dataset using the make_regression function of sklearn as a starting point. By doing this, we can implement gradient boosting on the dataset and compare the results with the gradient boosting regressor provided by the scikit-learn library.
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, noise=1, random_state=42)
print(X.shape, y.shape)
Then, we split the dataset using the train_test_split function of sklearn.model_selection so that we can train a gradient boosting regressor on the training portion of the data.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
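As a reference point for the comparison mentioned earlier, we could also fit scikit-learn's own GradientBoostingRegressor on the same split. This is only a sketch of such a baseline; the hyperparameter values shown are common defaults, not values prescribed by this lesson.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# baseline model from the scikit-learn library for later comparison
gbr = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbr.fit(X_train, y_train)
baseline_mse = mean_squared_error(y_test, gbr.predict(X_test))
print(baseline_mse)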
Initialization: Setting up the predictor
First, we initialize our predictor with a constant, which in most cases is the mean of the target variable. To do this, we create a list or array of the same size as the target array, where each element is set to that constant. Here’s the code that initializes the predictor by setting all predicted values to the mean of the input list:
import numpy as np

# initialize the predictor with a constant value (the mean of the target)
def initial_prediction(y):
    mean = []
    a = np.mean(y)
    for i in range(0, len(y)):
        mean.append(a)
    return mean

y_i = initial_prediction(y_train)
print(len(y_i))
print(y_i[:5])
Following is the explanation for the code above:
- Lines 1–9: We define a function initial_prediction(y) that takes a list y as input and returns a new list where each element is set to the mean value of the input list, i.e., y.
- Lines 11–13: We call the initial_prediction() function on the list of target variables and then print its length and first five elements.
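As a side note, the same initialization can be expressed more concisely with NumPy. The following is a sketch equivalent to the loop above (the name initial_prediction_vectorized is only illustrative), not the code used in the rest of the lesson:

import numpy as np

def initial_prediction_vectorized(y):
    # fill an array of the same length as y with the mean of y
    return np.full(len(y), np.mean(y))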
Calculating negative gradients (residuals)
The second step of the algorithm is to calculate the negative gradients (also known as residuals) of the loss function with respect to the current predictor. For simplicity, consider the squared error as the loss function (though other loss functions can be chosen):
...