...


Gradient Boosting: Implementation Using Scikit-learn

Learn how to test a trained gradient boosting model, compare it with scikit-learn's implementation, and assess its performance.

In this lesson, we’ll look into the testing phase of gradient boosting, building upon the trained model that we previously developed. Our main objective is to use this trained model to make predictions on a test dataset. To validate the performance of our implementation, we will compare our results with those obtained from the GradientBoostingRegressor provided by scikit-learn.
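For reference, below is a minimal sketch of the scikit-learn estimator we will benchmark against; the mapping of our custom hyperparameters onto its arguments is our assumption based on their matching roles, not part of the original implementation:

from sklearn.ensemble import GradientBoostingRegressor

# Assumed correspondence between our parameters and scikit-learn's:
#   iter  -> n_estimators  (number of boosting rounds)
#   alpha -> learning_rate (shrinkage applied to each tree's contribution)
#   max_depth=3            (depth of each weak learner, as in our trees)
sk_model = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1, max_depth=3)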

Training of gradient boosting regressor

Before proceeding to the testing phase, we’ll consolidate all the code widgets of the previous lesson to review and understand the progress we’ve made so far. Then, we’ll evaluate the effectiveness of our trained model on unseen data.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, noise=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)

# Initialization of the predictor with the mean of the target variable
def initial_prediction(y):
    mean = []
    a = np.mean(y)
    for i in range(0, len(y)):
        mean.append(a)
    return mean

# Function for residual calculation
def Residual(actual, pred):
    residual = []
    N = len(actual)
    for i in range(0, N):
        res = actual[i] - pred[i]
        residual.append(res)
    return residual

def GradientBoosting_fit(X, y, iter, alpha):
    # Step I: Initialization of the predictor, f_k, with the mean of y
    y_hat = initial_prediction(y)
    mu_y = y_hat[0]
    hypothesis = []
    # Calculation of the initial residual
    residual = Residual(y, y_hat)
    y_h = []
    # Step II: Iteratively fit weak learners to the residuals
    for i in range(0, iter):
        # Creating an instance of h_k
        regressor = DecisionTreeRegressor(random_state=0, max_depth=3)
        regressor.fit(X, residual)
        hypothesis.append(regressor)
        # Predictions of the new model h_k
        h_new = regressor.predict(X)
        # Updating the predictor
        y_hat = y_hat + (alpha * h_new)
        # Updating the residual
        residual = Residual(y, y_hat)
        y_h.append(y_hat)
    # Prediction of the ensemble model on the training dataset
    print("Prediction on training data: ", y_hat[:5])
    # Step III: Return the ensemble and its parameters
    return hypothesis, alpha, mu_y, y_h

hypothesis, alpha, mu_y, y_h = GradientBoosting_fit(X_train, y_train, iter=50, alpha=0.1)
print("alpha: ", alpha)
print("mu_y: ", mu_y)
print("No of trained models: ", len(hypothesis))

Testing of gradient boosting regressor

We’re going to write a function named GB_predict that takes several parameters: test_data (the data on which we want to evaluate our trained model), list_of_models (the list of weak learners with trained parameters), alpha (the learning rate), and c (the initial predictor, which in our case is the mean of the target variable). The function makes predictions on the test dataset using the trained gradient boosting model: it iterates over the ensemble of decision trees, updates the running predictions with each model’s output, records the test MSE after each update, and returns the final predictions along with the per-iteration errors.

from sklearn.metrics import mean_squared_error as mse

def GB_predict(test_data, list_of_models, alpha, c):
    mu = []
    errors = []
    # Initialize every prediction with c, the mean of the training target
    for i in range(0, len(test_data)):
        mu.append(c)
    # Convert to an array so the updates below are elementwise additions
    mu = np.array(mu)
    for model in list_of_models:
        # Add each weak learner's scaled prediction to the running estimate
        mu = mu + alpha * model.predict(test_data)
        error = mse(y_test, mu)
        errors.append(error)
    return mu, errors

prediction, errors = GB_predict(X_test, hypothesis, alpha, mu_y)
print("Prediction on test data", prediction)
print("MSE after 50 iterations is: ", errors[-1])

Note: The mean squared error of scikit-learn's gradient boosting regressor after the 50th ...