Tuning Random Forests
Learn why tuning the random forest algorithm is relatively easy.
Random forest and the bias-variance tradeoff
The random forest algorithm was designed to address aspects of the bias-variance tradeoff without requiring direct hyperparameter tuning. This sets random forests apart from algorithms like CART decision trees and boosted decision trees (e.g., XGBoost). The following illustration maps the random forest algorithm’s design to the bias-variance tradeoff.
Here are a few things to consider:
First, the random forest’s bagging and feature randomization provide each ensemble tree with only a limited view of the training data. So, there’s little concern about individual ensemble trees overfitting (i.e., the lower right in the illustration).
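To make this first point concrete, here’s a minimal sketch of the data a single ensemble tree actually sees, using base R and the built-in iris data (both illustrative choices, not part of the lesson). In practice, the randomForest package performs both sampling steps internally.

```r
# Sketch of the row and feature sampling one ensemble tree receives.
set.seed(1234)

n_rows     <- nrow(iris)
predictors <- setdiff(names(iris), "Species")

# Bagging: sample rows with replacement, so each tree sees only
# about 63% of the unique training observations.
bag_indices <- sample(n_rows, size = n_rows, replace = TRUE)
tree_data   <- iris[bag_indices, ]

# Feature randomization: at each split, consider only a random subset
# of predictors (a common classification default is sqrt of the count).
mtry       <- floor(sqrt(length(predictors)))
split_vars <- sample(predictors, size = mtry)

length(unique(bag_indices)) / n_rows  # fraction of unique rows per tree
split_vars                            # candidate features for one split
```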
Second, because there’s no concern for overfitting, the random forest algorithm sets the CART minbucket hyperparameter to 1. This setting allows each ensemble tree to grow as deep and complex as its bootstrap training data allows. Deep, complex trees address underfitting (i.e., the upper left in the illustration). ...