Using Decision Trees: Advantages and Predicted Probabilities

Learn about the uses and advantages of decision trees along with their predicted probabilities.

While decision trees are simple in concept, they have several practical advantages.

No need to scale features

Consider why we needed to scale features for logistic regression. One reason is that some of the solution algorithms based on gradient descent converge to a minimum of the cost function much faster when the features are on the same scale. Another is that, when using L1 or L2 regularization to penalize coefficients, all the features must be on the same scale so that they are penalized equally. With decision trees, the node-splitting algorithm considers each feature individually, so it doesn't matter whether the features are on the same scale.
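
To see this concretely, here is a minimal sketch (assuming scikit-learn; the dataset and parameters are illustrative, not from the lesson) that fits the same tree on raw and standardized copies of a dataset and compares the predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data (not the lesson's dataset)
X, y = make_classification(n_samples=500, n_features=4, random_state=42)

# Fit one tree on the raw features and one on standardized features
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
X_scaled = StandardScaler().fit_transform(X)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_scaled, y)

# Standardization is a monotonic per-feature transform, so the splits
# partition the samples identically; the predictions should agree exactly
print(np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled)))
```

Because every split is a threshold on a single feature, any monotonic per-feature transformation such as standardization leaves the tree's partition of the data unchanged.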

Non-linear relationships and interactions

Because each successive split in a decision tree is performed on a subset of the training samples resulting from the previous split(s), decision trees can describe complex non-linear relationships of a single feature, as well as interactions between features. Recall our earlier discussion in the Connections to Univariate Feature Selection and Interactions lesson. As a hypothetical example with synthetic data, consider a dataset like the following for classification:
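
As one possible sketch (the XOR-style construction and all parameters here are illustrative assumptions, using NumPy and scikit-learn), the class below depends on an interaction between two features that a decision tree can capture but a purely linear model cannot:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature dataset with an XOR-style interaction: the class
# depends on the combination of the features, not on either one alone
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] > 0) == (X[:, 1] > 0)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
logreg = LogisticRegression().fit(X_train, y_train)

# The tree can carve the plane into the four quadrants, so it should score
# well, while the linear model cannot express the interaction and typically
# hovers near chance (0.5)
print(f"Tree:                {tree.score(X_test, y_test):.2f}")
print(f"Logistic regression: {logreg.score(X_test, y_test):.2f}")
```

To match the tree here, logistic regression would need an engineered interaction term, such as the product of the two features, since no linear combination of the raw features separates the quadrants.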
