Exercise: Fitting a Random Forest
Learn how to fit a random forest model with cross-validation on the training data from the case study.
Extending decision trees with random forests
In this exercise, we will extend our efforts with decision trees by using the random forest model with cross-validation on the training data from the case study. We will observe the effect of increasing the number of trees in the forest and examine the feature importance that can be calculated using a random forest model. Perform the following steps to complete the exercise:
- Import the random forest classifier model class as follows:
from sklearn.ensemble import RandomForestClassifier
- Instantiate the class using these options:

# max_features='sqrt' replaces the deprecated 'auto', which was equivalent
# for classifiers and has been removed in newer scikit-learn versions
rf = RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=3,
                            min_samples_split=2, min_samples_leaf=1,
                            min_weight_fraction_leaf=0.0, max_features='sqrt',
                            max_leaf_nodes=None, min_impurity_decrease=0.0,
                            bootstrap=True, oob_score=False, n_jobs=None,
                            random_state=4, verbose=0, warm_start=False,
                            class_weight=None)
For this exercise, we'll use mainly the default options. However, note that we will set max_depth=3. Here, we are only going to explore the effect of using different numbers of trees, which we will illustrate with relatively shallow trees for the sake of shorter runtimes. To find the best model performance, we'd typically try more trees and deeper trees. We also set random_state for consistent results across runs.
- Create a parameter grid for this exercise in order to search over the number of trees, ranging from 10 to 100 in steps of 10:
rf_params_ex = {'n_estimators':list(range(10,110,10))}
We use Python's range() function to generate the integer values we want, and then convert them to a list using list(), as shown in the quick check below.
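As a quick check, the expression evaluates to the ten tree counts we want to search:

print(list(range(10, 110, 10)))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]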
- Instantiate a grid search cross-validation object for the random forest model using the parameter grid from the previous step. Otherwise, you can use the same options that were used for the cross-validation of the decision tree, along the lines of the sketch below.
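Here is a minimal sketch of this step. It assumes that ROC AUC scoring and 4-fold cross-validation were used for the decision tree, and that the case study's training data is held in variables named X_train and y_train; adjust these assumptions to match your earlier setup.

from sklearn.model_selection import GridSearchCV

# Assumed options: scoring='roc_auc' and cv=4 mirror the earlier decision
# tree cross-validation in this sketch; change them if yours differed
cv_rf_ex = GridSearchCV(rf, param_grid=rf_params_ex, scoring='roc_auc',
                        cv=4, refit=True, verbose=1,
                        return_train_score=True)

# X_train and y_train are assumed names for the case study's training
# features and labels
cv_rf_ex.fit(X_train, y_train)

# The refit best model exposes the impurity-based feature importances
# mentioned at the start of the exercise
feat_imps = cv_rf_ex.best_estimator_.feature_importances_

Because refit=True, best_estimator_ is the winning random forest retrained on all the training data, which is what we will use when we examine feature importance.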