Gradient Boosting Tree

In this lesson, we show you another type of ensemble method, the gradient boosting tree.

What is an ensemble method?

Ensemble methods are a group of machine learning methods that combine multiple learning algorithms to achieve better predictive performance than any single one of them. Generally speaking, there are three types of ensemble methods: boosting, bagging, and stacking. This course covers bagging and boosting. In this lesson, we will learn one boosting method, the gradient boosted decision tree (GBDT).

The core principle of bagging is to build many weak estimators, each of which is trained and makes its predictions independently. The final result combines their predictions: for regression, it is the average of the results; for classification, it is a majority vote. Random Forest is one such method, with an ordinary decision tree as the weak estimator.
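The combination step described above can be sketched in a few lines. This is only an illustration, not a full bagging implementation: the lists of predictions are made-up outputs of five hypothetical weak estimators for a single sample, and the function names are chosen here for clarity.

```python
from collections import Counter
from statistics import mean


def combine_regression(predictions):
    """Average the numeric predictions of independently trained estimators."""
    return mean(predictions)


def combine_classification(predictions):
    """Return the label predicted by the largest number of estimators."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of five weak estimators for one sample.
regression_outputs = [2.0, 3.0, 2.5, 3.5, 4.0]
classification_outputs = ["cat", "dog", "cat", "cat", "dog"]

averaged = combine_regression(regression_outputs)
voted = combine_classification(classification_outputs)
print(averaged, voted)
```

In a real bagging model such as Random Forest, each estimator would first be trained on its own bootstrap sample of the data; only the final averaging or voting step is shown here.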

By contrast, in boosting, the base estimators are built sequentially, and each new one tries to reduce the bias of the combined estimator. The weak estimators therefore depend on one another. The motivation is to combine several weak models into a powerful ensemble.
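To make the sequential idea concrete, here is a minimal sketch of boosting for regression with squared error. It assumes one-dimensional inputs and uses one-split "stumps" as the weak estimators; the helper names (`fit_stump`, `boost`) and the toy data are illustrative choices, not part of any library.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump on 1-D inputs by least squares."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((t - left_value) ** 2 for t in left)
                 + sum((t - right_value) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Build stumps one after another, each fitted to the current residuals."""
    initial = sum(ys) / len(ys)  # start from the mean of the targets
    stumps = []
    predictions = [initial] * len(xs)
    for _ in range(n_rounds):
        # Each new stump depends on the errors left by all previous ones.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]

    def predict(x):
        return initial + learning_rate * sum(s(x) for s in stumps)

    return predict


def mean_squared_error(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: y = x^2 on ten points.
xs = list(range(10))
ys = [x * x for x in xs]

error_0 = mean_squared_error(boost(xs, ys, n_rounds=0), xs, ys)
error_1 = mean_squared_error(boost(xs, ys, n_rounds=1), xs, ys)
error_20 = mean_squared_error(boost(xs, ys, n_rounds=20), xs, ys)
print(error_0, error_1, error_20)
```

The training error shrinks as more stumps are added, because every stump is fitted to what the previous ones got wrong. This is the interdependence that separates boosting from bagging.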

From the perspective of learning theory, bagging reduces the variance of the model, while boosting reduces the bias of the model.

Before we get hands-on, let's look at how GBDT works. This helps us understand the algorithm a little better.

What is GBDT?

Gradient Tree Boosting, or Gradient Boosted Decision Trees (GBDT), is a generalization of boosting to arbitrary differentiable loss functions. It is widely used in many fields because it is accurate and effective for both regression and classification tasks. Typically, gradient boosting uses a decision tree as the weak estimator.
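One way to see what "arbitrary differentiable loss functions" means in practice: at each step, the targets for the next weak estimator are the negative gradients of the loss with respect to the current prediction. The sketch below is an illustration under stated assumptions. It uses two common losses (half squared error, and log loss with labels in {0, 1} and a raw log-odds score), and the function names are chosen here, not taken from a library. A finite-difference check confirms the analytic gradients.

```python
import math


def squared_error(y, f):
    """Half squared error between target y and prediction f."""
    return 0.5 * (y - f) ** 2


def squared_error_negative_gradient(y, f):
    """-dL/df for half squared error: the ordinary residual."""
    return y - f


def log_loss(y, f):
    """Binary log loss; y is 0 or 1, f is a raw score (log-odds)."""
    p = 1.0 / (1.0 + math.exp(-f))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def log_loss_negative_gradient(y, f):
    """-dL/df for log loss: label minus predicted probability."""
    p = 1.0 / (1.0 + math.exp(-f))
    return y - p


def numeric_negative_gradient(loss, y, f, eps=1e-6):
    """Central finite-difference estimate of -dL/df for any smooth loss."""
    return -(loss(y, f + eps) - loss(y, f - eps)) / (2 * eps)


y_reg, f_reg = 3.0, 2.2   # illustrative regression target and prediction
y_clf, f_clf = 1, 0.3     # illustrative class label and raw score

print(squared_error_negative_gradient(y_reg, f_reg),
      numeric_negative_gradient(squared_error, y_reg, f_reg))
print(log_loss_negative_gradient(y_clf, f_clf),
      numeric_negative_gradient(log_loss, y_clf, f_clf))
```

For squared error the negative gradient is exactly the residual, which is why plain residual fitting is a special case of gradient boosting. Swapping in another differentiable loss changes only the targets that the next tree is fitted to.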

Generic gradient boosting at the $m$-th step would fit a decision tree $h_m(x)$ to residuals. Let $J_m$ be the number of its leaves. The tree partitions the input space into $J_m$ disjoint regions $R_{1m}, \ldots, R_{J_m m}$ and predicts a constant value in each region. The output of $h_m(x)$ ...