Pruning Classification Trees

Learn how the CART algorithm reduces the complexity of trees after training via decision tree pruning.

Pruning intuition

Trees in the real world can grow too large. For example, a tree in someone’s yard might grow over the fence into a neighbor’s property, and an expert might be called in to prune the tree back (i.e., remove branches). Pruning is the process of removing what is not needed.

In the case of CART, decision trees that have grown too large are prone to overfitting. CART automatically applies a pruning process after a decision tree has been trained to reduce its complexity.

Take, for example, the following hypothetical decision tree that has grown to full size based on the training data and the minsplit hyperparameter:

A decision tree fully grown after training
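
As an illustrative sketch, a fully grown tree like the one above could be produced in R with the rpart package. The iris data set and the minsplit value below are stand-ins chosen purely for demonstration; setting cp = 0 disables pre-pruning so the tree grows to full size.

library(rpart)

# Grow a classification tree, letting it expand until nodes have fewer
# than `minsplit` observations; cp = 0 means no split is rejected for
# providing too little benefit, so the tree grows to full size.
full_tree <- rpart(Species ~ .,
                   data = iris,
                   method = "class",
                   control = rpart.control(minsplit = 5, cp = 0))

# Inspect the fully grown tree.
print(full_tree)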

Next, the CART algorithm prunes the tree, starting at the bottom with the leaf nodes. Conceptually, decision tree pruning is the process of “collapsing” leaf nodes to higher levels in the tree by removing splits.

The following image depicts the results of the pruning process for the full-grown tree shown above:

A pruned decision tree
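
Continuing the hypothetical full_tree fit from the earlier sketch, post-pruning in rpart can be performed by inspecting the complexity table and calling prune() with a chosen cp value. The selection rule shown (minimizing cross-validated error) is one common choice, not the only one.

# The cp table records how much each round of splitting improves the fit,
# along with cross-validated error estimates.
printcp(full_tree)

# Pick the cp value with the lowest cross-validated error (xerror) and
# collapse every split that does not meet that threshold.
best_cp <- full_tree$cptable[which.min(full_tree$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(full_tree, cp = best_cp)

print(pruned_tree)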

CART pre-pruning

The CART algorithm supports pruning as described above. Pruning a decision tree after it has been fully grown can be considered post-pruning.

CART also supports pruning the tree while it is being grown. This can be considered pre-pruning (also called early stopping) and is controlled in the rpart package using the cp hyperparameter.

The cp hyperparameter controls how much benefit a potential split must provide before it is added to a growing tree. In the case of CART classification trees, the cp value specifies the minimum improvement in Gini impurity that a candidate split must achieve for the split to be performed.
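
For example, here is a minimal pre-pruning sketch in R, again using iris purely for illustration. A larger cp value stops the tree from adding splits whose improvement falls below that threshold, so the tree is kept small while it is being grown.

library(rpart)

# With cp = 0.05, a candidate split must improve the fit by at least 5%
# (relative to the root) or it is never added -- the tree stops growing early.
pre_pruned_tree <- rpart(Species ~ .,
                         data = iris,
                         method = "class",
                         control = rpart.control(cp = 0.05))

# Compare against a tree grown with rpart's default cp = 0.01.
default_tree <- rpart(Species ~ ., data = iris, method = "class")

nrow(pre_pruned_tree$frame)  # node count; raising cp can only keep the tree the same size or shrink it
nrow(default_tree$frame)     # node count with the default cp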

Conceptually, ...