Random Forests: Predictions and Ensembles of Decision Trees

Learn about random forests, their predictions, and interpretability.

As we saw in the previous exercise, decision trees are prone to overfitting. This is one of the principal criticisms of their usage, despite the fact that they are highly interpretable. However, we were able to reduce this overfitting, to an extent, by limiting the maximum depth to which the tree could be grown.

Concept behind random forests

Building on the concepts of decision trees, machine learning researchers have leveraged multiple trees as the basis for more complex procedures, resulting in some of the most powerful and widely used predictive models. In this section, we will focus on random forests of decision trees. Random forests are examples of what are called ensemble models, because they are formed by combining other, simpler models. By combining the predictions of many models, it is possible to improve upon the deficiencies of any given one of them. This is sometimes called combining many weak learners to make a strong learner.
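
To make the idea of combining weak learners concrete, here is a minimal sketch that trains several shallow decision trees on bootstrap samples and combines their predictions by majority vote. The synthetic dataset, the number of trees, and the depth are illustrative assumptions, not values from this lesson, and this is only the ensembling idea, not the full random forest algorithm.

# Minimal sketch: combine several weak (shallow) trees by majority vote.
# Dataset and hyperparameter values are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
trees = []
for _ in range(25):
    # Bootstrap sample: draw training rows with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(max_depth=3).fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Majority vote across the individual trees
votes = np.array([tree.predict(X_test) for tree in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)

print(f"Single tree accuracy: {trees[0].score(X_test, y_test):.3f}")
print(f"Ensemble accuracy:    {(ensemble_pred == y_test).mean():.3f}")

Typically, the voted prediction is at least as accurate as any single shallow tree, because the individual trees' errors partially cancel out.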


Once you understand decision trees, the concept behind random forests is fairly simple. That is because random forests are just ensembles of many decision trees; all the models in this kind of ensemble have the same mathematical form. So, how many decision tree models will be included in a random forest? This is one of the hyperparameters, n_estimators, that needs to be specified when building a random forest model. Generally speaking, the more trees, the better. As the number of trees increases, the variance of the overall ensemble will decrease. This should result in the random forest model having better generalization to new data, which will be reflected in increased testing scores. However, there will be a point of diminishing returns after which increasing the number of trees does not result in a substantial improvement in model performance.
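As a rough illustration of this point, the sketch below fits a scikit-learn RandomForestClassifier with increasing values of n_estimators and prints the testing score for each. The synthetic dataset and the grid of values are assumptions for illustration; typically the score improves quickly at first and then levels off.

# Sketch: explore the effect of n_estimators on the testing score.
# Dataset and the grid of tree counts are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for n_trees in [1, 10, 50, 100, 200]:
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=1)
    rf.fit(X_train, y_train)
    print(f"n_estimators={n_trees:>3}: test accuracy = {rf.score(X_test, y_test):.3f}")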

So, how do random forests reduce the high variance (overfitting) issue that affects decision trees? The answer lies in what makes the individual trees in the forest different from one another. There are two main sources of this difference, one of which we are already familiar with (a brief sketch of both as hyperparameters follows the list):

  • The number of features considered at each split
  • The training samples used to grow different trees
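
As an assumption-laden sketch of where these two sources of randomness surface in scikit-learn, the snippet below instantiates a RandomForestClassifier with illustrative values for the relevant hyperparameters; the subsections that follow discuss them in more detail.

# Sketch: the two sources of randomness as RandomForestClassifier
# hyperparameters (values shown are illustrative, not recommendations).
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_features='sqrt',  # features considered at each split
    bootstrap=True,       # each tree is grown on a bootstrap sample of the rows
    random_state=42,
)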

The number of features considered at each split

We are already familiar with this option from the DecisionTreeClassifier class: max_features. In ...