Random Forest

In this lesson, we show how to use Random Forest, an ensemble of models.

What is Random Forest?

In the last lesson, we talked about one of the main ensemble methods, boosting. In this lesson, we will talk about another main ensemble method: bagging.

Bagging is not a particular model; it is a method for improving predictive performance with your existing models. It builds many independent models, most of them weak models (estimators), which can be of the same type or of different types, and which may or may not share the same feature set. The core idea of bagging is to combine the outputs of the different models: for classification, the models vote on the result, and for regression, their predictions are averaged.
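The sketch below illustrates this idea, assuming scikit-learn is available; the synthetic dataset and parameter values are illustrative, not prescribed by this lesson.

```python
# A minimal sketch of bagging for classification, assuming scikit-learn
# is available; the dataset and parameter values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Create a toy classification dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Bagging: train many independent trees on bootstrap samples of the data
# and combine their predictions by majority vote.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # base (weak) estimator
    n_estimators=25,           # number of independent models
    random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
```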

Random forest is a typical example of bagging that uses many decision trees as base (weak) estimators. Each tree is trained on a randomly chosen subset of features and a random sample of instances from the training data. According to learning theory, bagging reduces the variance of the model. Individual decision trees typically exhibit high variance and tend to overfit. The randomness over features and instances in a forest yields decision trees with somewhat decoupled prediction errors, so averaging their predictions lets some of those errors cancel out. Random forests thus achieve reduced variance by combining diverse trees, sometimes at the cost of a slight increase in bias.
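A minimal sketch of this comparison, assuming scikit-learn; the dataset and parameter values are illustrative. It contrasts a single decision tree with a random forest whose trees use bootstrap samples and random feature subsets at each split.

```python
# A minimal sketch comparing a single decision tree with a random forest,
# assuming scikit-learn; the dataset and parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# A single deep tree tends to overfit (high variance).
tree = DecisionTreeClassifier(random_state=42)

# Each forest tree is trained on a bootstrap sample and considers only a
# random subset of features at each split; the trees' votes are combined.
forest = RandomForestClassifier(
    n_estimators=100,     # number of trees
    max_features="sqrt",  # random feature subset considered at each split
    random_state=42)

print("Single tree CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
print("Random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```

On most datasets of this kind, the forest's cross-validated accuracy is noticeably higher and more stable than the single tree's, reflecting the variance reduction described above.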
