How Gradient Boosting Works
Develop an intuitive understanding of how gradient boosting works.
Gradient boosting vs. random forests
Gradient boosting is a machine learning technique that employs an ensemble of weak learners, usually shallow decision trees, combined so that the ensemble produces a strong predictive model.
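To make the idea concrete, here is a minimal from-scratch sketch of gradient boosting for regression with squared-error loss. The toy data, tree depth, and learning rate are illustrative assumptions, not defaults of any particular library. The key intuition: each new tree is fit to the residuals (the errors) of the ensemble built so far.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (hypothetical): y = x^2 plus noise.
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)

n_trees, learning_rate = 50, 0.1

# Start from a constant prediction (the mean), then iteratively fit a
# shallow tree to the current residuals and add a damped version of its
# predictions to the ensemble.
prediction = np.full_like(y, y.mean())
trees = []
for _ in range(n_trees):
    residuals = y - prediction           # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)               # weak learner targets the errors
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

def ensemble_predict(X_new):
    """Sum the base prediction and every tree's damped contribution."""
    pred = np.full(len(X_new), y.mean())
    for tree in trees:
        pred += learning_rate * tree.predict(X_new)
    return pred

print("Training MSE:", np.mean((y - prediction) ** 2))
```

Because each tree corrects the errors left by its predecessors, the trees must be built one after another; this is the sequential structure that distinguishes boosting from bagging-style ensembles such as random forests.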
While both gradient boosting and random forests are ensembles of decision trees, the algorithms are quite different. The following table summarizes the significant similarities and differences between the two algorithms; a short code comparison follows the table:
Gradient Boosting vs. Random Forests
| Gradient Boosting | Random Forests |
|---|---|
| Can be used for classification and regression. | Can be used for classification and regression. |
| Models in the ensemble are added iteratively; each model added depends on the previous model. | Models in the ensemble can be built in parallel because each model is independent of all the others. |
| Models added to the ensemble are trained to address the errors of the previously added model. | Models added to the ensemble are trained on a random subset of the training data. |
| Individual model predictions are weighted and then aggregated to produce the ensemble's predictions. | Individual model predictions are aggregated without weighting to produce the ensemble's predictions. |
| The algorithm easily overfits the training data, so tuning is required to optimize the bias-variance tradeoff. | The algorithm is designed to directly address the bias-variance tradeoff, so little tuning is typically required. |
| The algorithm has many hyperparameters that are usually tuned via cross-validation. | The algorithm has few hyperparameters, which can be tuned via cross-validation. |
| Models are added to the ensemble until the specified limit is reached or ensemble improvements become very small. | Models are added to the ensemble until the specified limit is reached. |
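The following sketch illustrates the table's practical points using scikit-learn; the synthetic dataset and hyperparameter values are illustrative assumptions, not tuning recommendations. Note how the gradient boosting model exposes several interacting hyperparameters, while the random forest works reasonably well close to its defaults.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression problem (illustrative only).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Gradient boosting: trees are built sequentially; n_estimators,
# learning_rate, and max_depth interact and are usually tuned together.
gb = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)

# Random forest: independent trees trained on bootstrap samples, so
# training can be parallelized (n_jobs=-1 uses all cores).
rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)

for name, model in [("Gradient boosting", gb), ("Random forest", rf)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```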
Gradient boosting has become a go-to algorithm for production systems because, with proper tuning, it often delivers better predictive performance than random forests. However, we should keep the following in mind:
Gradient boosting’s predictive performance gains compared to random forests are typically small. ...