Gradient Boosting Tree
In this lesson, we introduce another type of ensemble method: the gradient boosting tree.
What is an ensemble method?
Ensemble methods are a group of machine learning methods that use multiple learning algorithms to gain better predictive performance than any single method. Generally speaking, there are three types of ensembles: boosting, bagging, and stacking. In this course, we cover bagging and boosting. In this lesson, we will learn one method of boosting: the gradient boosting tree (GBDT).
The core principle of bagging is to build many weak estimators, each trained and making predictions independently. The final result combines their predictions: for regression, it is the average of the individual predictions; for classification, it is a majority vote. Random Forest is one such method, with the ordinary decision tree as the weak estimator.
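To make this concrete, here is a minimal bagging sketch in Python. It assumes scikit-learn and NumPy are installed; the dataset, tree depth, and number of trees are illustrative choices, not prescriptions.

```python
# A minimal bagging sketch: each tree is trained independently on a
# bootstrap sample, and the regression prediction is their average.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(50):
    idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
    trees.append(DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx]))

# For regression, the ensemble averages the independent predictions;
# for classification, it would take a majority vote instead.
bagged_prediction = np.mean([t.predict(X) for t in trees], axis=0)
```

Because every tree trains on its own bootstrap sample and never sees the others, the loop could run fully in parallel.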
By contrast, in the boosting method, base estimators are built sequentially, and each one tries to reduce the bias of the combined estimator. There is an interdependence between the weak estimators. The motivation is to combine several weak models to produce a powerful ensemble.
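The contrast with bagging shows up directly in code. Below is a conceptual boosting sketch, assuming a squared-error loss so that each stage simply fits the residuals of the ensemble built so far; the learning rate of 0.1 and the tree depth are illustrative.

```python
# A conceptual boosting sketch with squared-error loss: unlike bagging,
# each tree depends on every tree before it through the residuals.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

prediction = np.zeros(len(y))  # the combined estimator built so far
stages = []
for _ in range(50):
    residual = y - prediction  # errors the current ensemble still makes
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    stages.append(tree)
    prediction += 0.1 * tree.predict(X)  # shrink each stage's contribution
```

Each pass fits the errors left by all previous trees, which is how the combined model drives down bias; the stages cannot be trained in parallel.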
From the perspective of learning theory, bagging reduces the variance of the model, while boosting reduces the bias of the model.
Before we get hands-on, let's look at GBDT first. It will help us understand the algorithm a little better.
What is GBDT?
Gradient Tree Boosting or Gradient Boosted Decision Trees (GBDT) is a generalization of boosting to arbitrary differentiable loss functions. It is widely used in many fields because it is an accurate and effective method for both regression and classification tasks. Typically, gradient tree boosting uses decision trees as the weak estimators.
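In practice, you rarely write the boosting loop yourself. Here is a minimal sketch with scikit-learn's implementation; the synthetic dataset and hyperparameters are illustrative assumptions.

```python
# A minimal GBDT sketch with scikit-learn, which provides both
# GradientBoostingRegressor and GradientBoostingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the number of sequential trees, learning_rate shrinks
# each tree's contribution, and max_depth keeps the weak learners shallow.
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

Swapping in `GradientBoostingRegressor` with a suitable loss gives the regression counterpart, reflecting that GBDT handles both task types.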
Generic gradient boosting at the m-th step fits a decision tree to the pseudo-residuals of the model built so far. Let ...
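A sketch of the standard formulation: with a differentiable loss $L$, the model after $m$ steps is

$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x),$$

where the decision tree $h_m$ is fit to the pseudo-residuals

$$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{m-1}}, \qquad i = 1, \dots, n,$$

and $\gamma_m$ is a step size, often chosen by line search. For the squared-error loss, the pseudo-residuals reduce to the ordinary residuals $y_i - F_{m-1}(x_i)$, which is exactly what the boosting sketch above fits at each stage.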