Random Forest
In this lesson, we show how to use the random forest, an ensemble model built from many decision trees.
What is Random Forest?
In the last lesson, we talked about one of the main ensemble methods, boosting. In this lesson, we will talk about another main ensemble method: bagging.
Bagging is not a particular model; it is a method for improving predictive performance using your existing models. It builds many independent models, most of which are weak estimators. These models can be of the same type or of different types, and they can use the same feature set or different ones. The core idea of bagging is to combine the outputs of the different models: for classification, the models vote on the result, and for regression, their predictions are averaged.
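As a minimal sketch of this idea, the snippet below uses scikit-learn's `BaggingClassifier` to train several decision trees on bootstrap samples of the data and combine their class predictions by voting. The synthetic dataset and the parameter values are only assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; any classification data would work here.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: train 50 decision trees, each on a bootstrap sample of the
# training set, and combine their outputs by voting.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),  # the base (weak) estimator
    n_estimators=50,
    random_state=0,
)
bagging.fit(X_train, y_train)
print("Bagging test accuracy:", bagging.score(X_test, y_test))
```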
Random forest is a typical example of bagging: it uses many decision trees as base (weak) estimators, and each tree is trained on a randomly chosen feature set and on instances sampled from the training data. According to learning theory, bagging can reduce the variance of the model. Individual decision trees typically exhibit high variance and tend to overfit. The randomness in the features and instances used by a forest yields decision trees whose prediction errors are somewhat decoupled, so averaging their predictions lets some of those errors cancel out. Random forests thus achieve reduced variance by combining diverse trees, sometimes at the cost of a slight increase in bias.
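To make this concrete, here is a small sketch (again assuming scikit-learn and a synthetic dataset) that compares a single decision tree with a random forest. The exact scores will vary with the data and random seed, but the forest typically generalizes better because its trees are grown on bootstrapped instances and random feature subsets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data used purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A single deep decision tree: low bias but high variance, tends to overfit.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# A random forest: each tree sees a bootstrap sample of the rows and a random
# subset of features at each split; the trees' predictions are combined by voting.
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

print("Single tree test accuracy:", tree.score(X_test, y_test))
print("Random forest test accuracy:", forest.score(X_test, y_test))
```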