Gradient Boosting: Implementation Using Scikit-learn
Learn how to test a model trained with gradient boosting, see the equivalent implementation in scikit-learn, and assess the model's performance.
In this lesson, we’ll look into the testing phase of gradient boosting, building upon the trained model that we previously developed. Our main objective is to utilize this trained model to make predictions on a test dataset. To validate the performance of our implementation, we will compare our results with those obtained from the GradientBoostingRegressor provided by the scikit-learn library.
Training of gradient boosting regressor
Before proceeding to the testing phase, we’ll consolidate all the code widgets of the previous lesson to review and understand the progress we’ve made so far. Then, we’ll evaluate the effectiveness of our trained model on unseen data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, noise=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
#print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Initialization of the base predictor: a constant equal to the mean of the target
def initial_prediction(y):
    mean = []
    a = np.mean(y)
    for i in range(0, len(y)):
        mean.append(a)
    return mean

# Function for residual calculation
def Residual(actual, pred):
    residual = []
    N = len(actual)
    for i in range(0, N):
        res = actual[i] - pred[i]
        residual.append(res)
    return residual

def GradientBoosting_fit(X, y, iter, alpha):
    # Step I: Initialization of the predictor f_0 (mean of the target)
    y_hat = initial_prediction(y)
    mu_y = y_hat[0]
    hypothesis = []
    # Calculation of residuals
    residual = Residual(y, y_hat)
    #print("res1", residual)
    y_h = []
    # Step II
    for i in range(0, iter):
        # Creating an instance of the weak learner h_k
        regressor = DecisionTreeRegressor(random_state=0, max_depth=3)
        regressor.fit(X, residual)
        hypothesis.append(regressor)
        # Predictions of the current weak learner h_k
        h_new = regressor.predict(X)
        # Updating the predictor
        y_hat = y_hat + (alpha * h_new)
        #print("prediction", y_hat)
        # Updating the residuals
        residual = Residual(y, y_hat)
        y_h.append(y_hat)
    # Prediction of the ensemble model on the training dataset
    print("Prediction on training data: ", y_hat[:5])
    # Step III
    return hypothesis, alpha, mu_y, y_h

hypothesis, alpha, mu_y, y_h = GradientBoosting_fit(X_train, y_train, iter=50, alpha=0.1)
print("alpha: ", alpha)
print("mu_y: ", mu_y)
print("No of trained models: ", len(hypothesis))
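As a quick sanity check, the per-iteration training predictions collected in y_h let us watch the training error shrink as trees are added. Here is a minimal sketch, assuming the code above has already run; it uses scikit-learn's mean_squared_error:

from sklearn.metrics import mean_squared_error

# Training MSE after the 1st, 10th, and 50th boosting iterations
for k in [0, 9, 49]:
    print("Training MSE after", k + 1, "iterations:", mean_squared_error(y_train, y_h[k]))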
Testing of gradient boosting regressor
We’re going to write a function named GB_predict that takes several parameters: test_data (the data on which we want to check the performance of our trained model), list_of_models (the list of weak learners with trained parameters), alpha (the learning rate), and c (the initial predictor, which in our case is the mean of the target variable). It makes predictions on a test dataset using the trained gradient boosting model: the function iterates over the ensemble of decision tree models, updates the predictions based on each model’s output, and returns the final predictions.
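Concretely, since the update rule above is applied once per tree, the final prediction for an input x is the initial constant plus the learning-rate-weighted sum of all K trees' outputs:

y_hat(x) = c + alpha * (h_1(x) + h_2(x) + ... + h_K(x))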
# mean_squared_error (aliased to mse) is needed to evaluate the test error
from sklearn.metrics import mean_squared_error as mse

hypothesis, alpha, mu_y, y_h = GradientBoosting_fit(X_train, y_train, iter=50, alpha=0.1)

def GB_predict(test_data, list_of_models, alpha, c):
    mu = []
    errors = []
    for i in range(0, len(test_data)):
        mu.append(c)
    # Convert to a NumPy array so that += performs element-wise addition
    mu = np.array(mu)
    #print("mu_", mu)
    for model in list_of_models:
        mu += alpha * model.predict(test_data)
        error = mse(y_test, mu)
        errors.append(error)
    return mu, errors

prediction, errors = GB_predict(X_test, hypothesis, alpha, mu_y)
print("Prediction on test data", prediction)
print("MSE after 50 iterations is: ", errors[-1])
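Because GB_predict records the test MSE after each tree is added, we can also visualize how the error evolves with the ensemble size. The following is a minimal sketch, assuming the variables above are in scope; it adds matplotlib, which the lesson code doesn't otherwise use:

import matplotlib.pyplot as plt

# Plot test MSE as a function of the number of trees in the ensemble
plt.plot(range(1, len(errors) + 1), errors)
plt.xlabel("Number of trees")
plt.ylabel("Test MSE")
plt.title("Test error vs. ensemble size")
plt.show()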
Note: The mean squared error of scikit-learn's gradient boosting regressor after the ...
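For reference, here is a minimal sketch of how such a scikit-learn baseline might be obtained. The values mirror our own model (50 trees of depth 3, learning rate 0.1), though the two implementations may still differ in small internal details, so the resulting MSEs need not match exactly:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# scikit-learn's gradient boosting with the same settings as our model
sk_model = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1, max_depth=3, random_state=0)
sk_model.fit(X_train, y_train)

# Evaluate on the same held-out test set
sk_predictions = sk_model.predict(X_test)
print("scikit-learn test MSE: ", mean_squared_error(y_test, sk_predictions))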