Regression Tree Basics

Build on your knowledge of CART classification trees to understand CART regression trees.

Regression trees vs. classification trees

The CART algorithm can learn decision tree models that predict numeric values based on a training dataset. Decision tree models that predict numeric values are known as regression trees.

In general, the CART algorithm works the same way whether the tree is trained for classification or regression. The difference lies in the calculation used to evaluate candidate splits: classification trees use Gini impurity, while regression trees use the sum of squared errors (SSE).
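For reference, the SSE of a node is commonly written as shown below. The symbols are notation assumed here rather than taken from the lesson: S is the set of training observations in a node, y_i is the target value of observation i, and \bar{y}_S is the mean target value in the node.

```latex
\mathrm{SSE}(S) = \sum_{i \in S} \left( y_i - \bar{y}_S \right)^2,
\qquad
\bar{y}_S = \frac{1}{|S|} \sum_{i \in S} y_i
```

Under this notation, a candidate split that sends observations to a left node S_L and a right node S_R is scored by the combined value SSE(S_L) + SSE(S_R).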

Regression trees learn by splitting the training data so that the SSE is minimized, just as classification trees learn by splitting the training data to minimize Gini impurity.
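The idea of picking the split with the lowest SSE can be sketched in a few lines of Python. This is a simplified illustration, not the full CART procedure: the helper names `sse` and `best_threshold` are hypothetical, the candidate thresholds are just the observed feature values, and the data is made up.

```python
def sse(values):
    """Sum of squared errors of the values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(feature, target):
    """Return (threshold, combined_sse) for the split `feature <= threshold`
    whose left and right groups have the lowest combined SSE."""
    best = None
    # Skip the largest value so that both sides of the split are non-empty.
    for threshold in sorted(set(feature))[:-1]:
        left = [t for f, t in zip(feature, target) if f <= threshold]
        right = [t for f, t in zip(feature, target) if f > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Made-up toy values for illustration only (not real Titanic data).
pclass = [1, 1, 2, 2, 3, 3]
age = [40.0, 44.0, 38.0, 42.0, 20.0, 24.0]

print(best_threshold(pclass, age))  # (2, 28.0)
```

In this toy data, splitting at `Pclass <= 2` groups similar ages together, so its combined SSE (28.0) is far lower than that of splitting at `Pclass <= 1` (348.0).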

When making predictions, a regression tree returns the average of the target values of the training observations in a leaf node. This is analogous to the majority-rules predictions made by classification trees.
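The two prediction rules can be compared side by side in a minimal sketch. The function names and the leaf contents below are hypothetical and made up for illustration.

```python
from collections import Counter
from statistics import mean


def regression_leaf_prediction(targets):
    """Regression tree: predict the mean of the target values in the leaf."""
    return mean(targets)


def classification_leaf_prediction(labels):
    """Classification tree: predict the most common label in the leaf."""
    return Counter(labels).most_common(1)[0][0]


# Made-up leaf contents for illustration only.
leaf_ages = [22.0, 26.0, 30.0]
leaf_labels = ["died", "died", "survived"]

print(regression_leaf_prediction(leaf_ages))        # 26.0
print(classification_leaf_prediction(leaf_labels))  # died
```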

To make these concepts concrete, consider a hypothetical imputation model for the Age feature of the Titanic dataset. Because Age is numeric, the CART algorithm can build a regression tree that predicts Age from the Pclass and Sex features.
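One way such a model might be fit is sketched below using scikit-learn's `DecisionTreeRegressor` and pandas. The lesson does not name a library, so this choice is an assumption, as are the `Sex_male` column, the `min_samples_leaf=2` setting, and the handful of rows, which are made up and are not real Titanic records.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up rows for illustration only; not real Titanic records.
train = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3],
    "Sex": ["female", "male", "female", "male", "female", "male"],
    "Age": [36.0, 42.0, 28.0, 32.0, 20.0, 26.0],
})

# scikit-learn trees expect numeric inputs, so encode Sex as a 0/1 column.
X = train[["Pclass"]].assign(Sex_male=(train["Sex"] == "male").astype(int))
y = train["Age"]

# The default split criterion for DecisionTreeRegressor minimizes squared error.
# min_samples_leaf=2 keeps at least two training rows in every leaf.
model = DecisionTreeRegressor(min_samples_leaf=2, random_state=0)
model.fit(X, y)

# Impute Age for a hypothetical third-class male passenger with a missing Age.
missing = pd.DataFrame({"Pclass": [3], "Sex_male": [1]})
imputed_age = model.predict(missing)[0]
print(imputed_age)
```

The imputed value is the mean Age of the training rows that share a leaf node with the passenger.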

The following table represents a leaf node for this hypothetical imputation model:
