Feature Randomization

Learn how feature randomization, the second technique used by the random forest algorithm, helps produce effective ensembles.

Constraining available features

The random forest algorithm doesn’t stop with bagging. It also randomizes the features used to train each decision tree in the forest: at every split point in every tree, only a random subset of the predictive features is considered.
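A minimal sketch of that mechanism, using a hypothetical list of feature names for illustration (the feature names and helper function are assumptions, not part of any library):

```python
import math
import random

# Hypothetical predictive features for illustration only.
features = ["age", "workclass", "education", "hours_per_week",
            "occupation", "marital_status", "capital_gain", "capital_loss"]

def candidate_features(all_features, task="classification"):
    """Pick the random subset of features considered at one split point."""
    if task == "classification":
        k = int(math.sqrt(len(all_features)))   # default: square root of the feature count
    else:
        k = max(1, len(all_features) // 3)      # default: one third of the feature count
    return random.sample(all_features, k)

# A different random subset is drawn at every split point of every tree.
print(candidate_features(features))                      # e.g. ['age', 'education']
print(candidate_features(features, task="regression"))   # e.g. ['occupation', 'capital_gain']
```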

For classification scenarios, the random forest algorithm by default considers a random subset of features whose size is the square root of the number of available predictive features (rounded down). For example, if a classification random forest is trained using 16 predictive features, only four features are randomly selected as candidates at each split point in the decision trees.

For regression scenarios, the number of randomly selected features defaults to the total number of predictive features divided by three (rounded down). For example, if a regression random forest is trained using 21 predictive features, only seven features are randomly selected as candidates at each split point.
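If you work with scikit-learn, these subset sizes correspond to the max_features parameter. The snippet below is an illustrative sketch, not the lesson's own code: the classifier already defaults to max_features="sqrt", while scikit-learn's regressor defaults to using all features, so the one-third rule is passed explicitly as a fraction.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: consider sqrt(n_features) candidate features at each split
# (this is already scikit-learn's default for the classifier).
clf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=42)

# Regression: consider one third of the features at each split.
# scikit-learn's regressor defaults to all features, so the fraction is set explicitly.
reg = RandomForestRegressor(n_estimators=500, max_features=1/3, random_state=42)
```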

The combination of bagging and randomized feature subsets allows the random forest algorithm to produce decision trees that are as diverse, and as weakly correlated, as possible.

Feature randomization in action

The following example of feature randomization uses the Adult Census Income dataset, which has a total of 14 predictive features. Using the random forest default for classification, the number of randomly selected features available at any split point is the square root of 14, which is approximately 3.74 and rounds down to 3.
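A short sketch confirming that number, assuming the Adult Census Income data is loaded from OpenML under the name "adult" (the dataset name and version are assumptions; the lesson's own loading step may differ, and categorical encoding is omitted here):

```python
import math

from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier

# Load the Adult Census Income data (14 predictive features plus the income target).
adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target

print(X.shape[1])                    # 14 predictive features
print(int(math.sqrt(X.shape[1])))    # 3 features considered at each split point

# max_features="sqrt" is the classifier default, shown explicitly for clarity.
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=42)
```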
