Regression Tree Basics
Build on your knowledge of CART classification trees to understand CART regression trees.
We'll cover the following
Regression trees vs. classification trees
The CART algorithm can also learn decision tree models that predict numeric values from a training dataset; such models are known as regression trees.
In general, the CART algorithm works the same whether the tree to be trained will be used for classification or regression. However, the split-quality calculations differ: classification trees use Gini impurity, while regression trees use the sum of squared errors (SSE).
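To make the SSE calculation concrete, here is a minimal sketch. The `sse` helper and the sample ages are assumptions for illustration, not part of the original lesson: SSE measures how far a group of numeric values spreads around its own mean.

```python
# Hypothetical helper: sum of squared errors (SSE) for a set of
# numeric values, measured against the mean of those values.
def sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Invented sample of ages; mean is 30.25.
ages = [22.0, 38.0, 26.0, 35.0]
print(sse(ages))  # 168.75
```

A node whose values are tightly clustered has a small SSE, so minimizing SSE pushes the tree toward nodes with similar target values.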
Regression trees learn by splitting the training data so that the SSE is minimized. This is similar to how classification trees learn by splitting training data to minimize Gini.
When making predictions, regression trees return the average of all training values in a leaf node. Again, this is similar to the majority-rules prediction made by classification trees, which return the most common class in a leaf.
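The two ideas above, splitting to minimize SSE and predicting with the leaf mean, can be sketched together. This is a simplified one-feature search over candidate thresholds, with invented data; the `best_split` function and its signature are assumptions for illustration:

```python
def sse(values):
    """Sum of squared errors of values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, feature, target):
    """Try each threshold for a numeric feature and return the
    (threshold, total_sse) pair whose partition minimizes SSE."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [row[target] for row in rows if row[feature] <= threshold]
        right = [row[target] for row in rows if row[feature] > threshold]
        if not left or not right:
            continue  # a split must put rows on both sides
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Invented rows: first-class passengers tend to be older here.
rows = [
    {"Pclass": 1, "Age": 38.0},
    {"Pclass": 1, "Age": 35.0},
    {"Pclass": 3, "Age": 22.0},
    {"Pclass": 3, "Age": 26.0},
]
threshold, total = best_split(rows, "Pclass", "Age")
print(threshold, total)  # 1 12.5

# Prediction for the left leaf (Pclass <= 1) is the leaf's mean Age.
left_ages = [r["Age"] for r in rows if r["Pclass"] <= threshold]
print(sum(left_ages) / len(left_ages))  # 36.5
```

A real CART implementation repeats this search over every feature at every node and recurses until a stopping criterion is met; the sketch shows only a single split.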
To understand these concepts better, take a hypothetical example of building an imputation model for the `Age` feature of the Titanic dataset. As the `Age` feature is numeric, the CART algorithm can be used to build a regression tree to predict `Age` from the `Pclass` and `Sex` features.
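A regression tree that splits fully on `Pclass` and `Sex` ends up with one leaf per `(Pclass, Sex)` combination, so its imputed values can be mimicked by grouping and averaging. The following sketch uses an invented mini-sample of passengers, not the actual Titanic data:

```python
from collections import defaultdict

# Hypothetical mini-sample of Titanic passengers (invented values).
passengers = [
    {"Pclass": 1, "Sex": "female", "Age": 38.0},
    {"Pclass": 1, "Sex": "female", "Age": 35.0},
    {"Pclass": 3, "Sex": "male", "Age": 22.0},
    {"Pclass": 3, "Sex": "male", "Age": 28.0},
]

# Group ages by (Pclass, Sex); each group plays the role of a leaf node.
leaves = defaultdict(list)
for p in passengers:
    leaves[(p["Pclass"], p["Sex"])].append(p["Age"])

# The imputed Age for each leaf is the average of the ages in that leaf.
imputed = {leaf: sum(ages) / len(ages) for leaf, ages in leaves.items()}
print(imputed[(3, "male")])  # 25.0
```

A passenger with a missing `Age` is routed to the leaf matching their `Pclass` and `Sex`, and the leaf's average is used as the imputed value.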
The following table represents a leaf node for this hypothetical imputation model: