Numeric Feature Impurity

Learn how CART decision trees handle numeric features with many distinct values.

Handling numeric features

The CART classification tree algorithm efficiently handles categorical features with many levels. Consider a numeric predictive feature. The age feature of the Adult Census Income dataset is a perfect example.

The age feature contains many values. Observations might have ages 31, 37, 42, 58, or 63 measured in years. Note that fractional components are also possible. For example, he age feature can have the value of 31.3 years.

Not surprisingly, numeric features present the CART classification tree algorithm with many possible split points. Just like for categorical features, the algorithm designers have implemented an optimization that efficiently handles numeric features.

A numeric feature example

The following sample of Adult Census Income data can be used to train a tree using R:

Get hands-on with 1300+ tech skills courses.