...


Gradient Boosting: Implementation Using Scikit-learn

Learn how to test a trained gradient boosting model, compare it with scikit-learn's implementation, and assess its performance.

In this lesson, we’ll look into the testing phase of gradient boosting, building upon the trained model that we previously developed. Our main objective is to use this trained model to make predictions on a test dataset. To validate the performance of our implementation, we will compare our results with those obtained from the GradientBoostingRegressor provided by scikit-learn.
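For reference, below is a minimal sketch of the scikit-learn estimator we will benchmark against; the mapping of our custom hyperparameters onto its arguments is our assumption based on their matching roles, not part of the original implementation:

from sklearn.ensemble import GradientBoostingRegressor

# Assumed correspondence between our parameters and scikit-learn's:
#   iter  -> n_estimators  (number of boosting rounds)
#   alpha -> learning_rate (shrinkage applied to each tree's contribution)
#   max_depth=3            (depth of each weak learner, as in our trees)
sk_model = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1, max_depth=3)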

Training of gradient boosting regressor

Before proceeding to the testing phase, we’ll consolidate all the code widgets of the previous lesson to review and understand the progress we’ve made so far. Then, we’ll evaluate the effectiveness of our trained model on unseen data.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=10, noise=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)

# Initialization of the predictor with the mean of the target variable
def initial_prediction(y):
    mean = []
    a = np.mean(y)
    for i in range(0, len(y)):
        mean.append(a)
    return mean

# Function for residual calculation
def Residual(actual, pred):
    residual = []
    N = len(actual)
    for i in range(0, N):
        res = actual[i] - pred[i]
        residual.append(res)
    return residual

def GradientBoosting_fit(X, y, iter, alpha):
    # Step I: Initialization of the predictor, f_k, with the mean of y
    y_hat = initial_prediction(y)
    mu_y = y_hat[0]
    hypothesis = []
    # Calculation of the initial residual
    residual = Residual(y, y_hat)
    y_h = []
    # Step II: Iteratively fit weak learners to the residuals
    for i in range(0, iter):
        # Creating an instance of h_k
        regressor = DecisionTreeRegressor(random_state=0, max_depth=3)
        regressor.fit(X, residual)
        hypothesis.append(regressor)
        # Predictions of the new model h_k
        h_new = regressor.predict(X)
        # Updating the predictor
        y_hat = y_hat + (alpha * h_new)
        # Updating the residual
        residual = Residual(y, y_hat)
        y_h.append(y_hat)
    # Prediction of the ensemble model on the training dataset
    print("Prediction on training data: ", y_hat[:5])
    # Step III: Return the ensemble and its parameters
    return hypothesis, alpha, mu_y, y_h

hypothesis, alpha, mu_y, y_h = GradientBoosting_fit(X_train, y_train, iter=50, alpha=0.1)
print("alpha: ", alpha)
print("mu_y: ", mu_y)
print("No of trained models: ", len(hypothesis))

Testing of gradient boosting regressor

We’re going to write a function named GB_predict that takes several parameters: test_data (the data on which we want to evaluate our trained model), list_of_models (the list of weak learners with trained parameters), alpha (the learning rate), and c (the initial predictor, which in our case is the mean of the target variable). The function makes predictions on the test dataset using the trained gradient boosting model: it iterates over the ensemble of decision trees, updates the running predictions with each model’s output, records the test MSE after each update, and returns the final predictions along with the per-iteration errors.

from sklearn.metrics import mean_squared_error as mse

def GB_predict(test_data, list_of_models, alpha, c):
    mu = []
    errors = []
    # Initialize every prediction with c, the mean of the training target
    for i in range(0, len(test_data)):
        mu.append(c)
    # Convert to an array so the updates below are elementwise additions
    mu = np.array(mu)
    for model in list_of_models:
        # Add each weak learner's scaled prediction to the running estimate
        mu = mu + alpha * model.predict(test_data)
        error = mse(y_test, mu)
        errors.append(error)
    return mu, errors

prediction, errors = GB_predict(X_test, hypothesis, alpha, mu_y)
print("Prediction on test data", prediction)
print("MSE after 50 iterations is: ", errors[-1])

Note: The mean squared error of scikit-learn's gradient boosting regressor after the 50th ...