Exercise: Fitting a Random Forest
Learn how to fit a random forest model with cross-validation on the training data from the case study.
Extending decision trees with random forests
In this exercise, we will build on our work with decision trees by fitting a random forest model with cross-validation on the training data from the case study. We will observe the effect of increasing the number of trees in the forest and examine the feature importances that a random forest model can calculate. Perform the following steps to complete the exercise:
- Import the random forest classifier model class as follows:

from sklearn.ensemble import RandomForestClassifier
- Instantiate the class using these options:

rf = RandomForestClassifier(
    n_estimators=10, criterion='gini', max_depth=3,
    min_samples_split=2, min_samples_leaf=1,
    min_weight_fraction_leaf=0.0, max_features='auto',
    max_leaf_nodes=None, min_impurity_decrease=0.0,
    bootstrap=True, oob_score=False, n_jobs=None,
    random_state=4, verbose=0, warm_start=False,
    class_weight=None)
# Note: max_features='auto' was removed in scikit-learn 1.3;
# use the equivalent max_features='sqrt' on newer versions.
For this exercise, we’ll mainly use the default options. However, note that we set max_depth=3: we are only going to explore the effect of using different numbers of trees, which we will illustrate with relatively shallow trees for the sake of shorter runtimes. To find the best model performance, we’d typically try more trees and deeper trees. We also set random_state for consistent results across runs.
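With the model instantiated, a natural next step is to cross-validate it across different numbers of trees. The following is a minimal sketch, assuming the case study’s training data is available as X_train and y_train (hypothetical names here); the grid of tree counts, the scoring metric, and the number of folds are all illustrative choices, not prescribed by the case study:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rf = RandomForestClassifier(max_depth=3, random_state=4)
# Hypothetical grid of tree counts to explore
params = {'n_estimators': [10, 50, 100, 200]}
cv_rf = GridSearchCV(rf, param_grid=params, scoring='roc_auc', cv=4)
cv_rf.fit(X_train, y_train)  # X_train, y_train: case study training data (assumed)
print(cv_rf.cv_results_['mean_test_score'])  # one mean CV score per n_estimators value

Typically, validation performance improves as trees are added and then levels off, which is the pattern this kind of search lets you observe.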
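Once a forest has been fit, scikit-learn exposes impurity-based feature importances through the feature_importances_ attribute. Here is a minimal sketch, assuming X_train is a pandas DataFrame so its column names can label the importances:

import pandas as pd

rf.fit(X_train, y_train)  # fit the forest instantiated above
# feature_importances_ holds one value per feature; the values sum to 1
feat_imp = pd.Series(rf.feature_importances_, index=X_train.columns)
print(feat_imp.sort_values(ascending=False))  # most important features first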