Training Decision Trees: Node Impurity

Learn how node impurity guides decision-tree training.

At this point, you should have an understanding of how a decision tree makes predictions using features and the class fractions of training samples in the leaf nodes. Now, we will learn how decision trees are trained. The training process involves selecting the features to split nodes on and the thresholds at which to make the splits, for example PAY_1 <= 1.5 for the first split in the tree of the previous exercise. Computationally, this means that for each feature under consideration, the samples in a node are sorted by that feature's values, and a potential split is evaluated between each successive pair of sorted values. All features may be considered, or only a subset, as we will learn about shortly.
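To make this concrete, here is a minimal sketch of how the candidate thresholds for a single feature can be enumerated, assuming NumPy; the candidate_thresholds helper and the sample PAY_1 values are made up for illustration and are not part of scikit-learn's internals:

```python
import numpy as np

def candidate_thresholds(feature_values):
    """Candidate split thresholds for one feature: the midpoints
    between each successive pair of sorted, unique values."""
    unique_sorted = np.unique(feature_values)  # sorts and removes duplicates
    return (unique_sorted[:-1] + unique_sorted[1:]) / 2

pay_1 = np.array([-1, 0, 1, 2, 2, 3])  # hypothetical PAY_1 values in a node
print(candidate_thresholds(pay_1))      # [-0.5  0.5  1.5  2.5]
```

Note that the threshold 1.5, halfway between the sorted values 1 and 2, corresponds to the PAY_1 <= 1.5 split seen in the previous exercise.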

How are the splits decided during the training process?

Given that the method of prediction is to take the majority class of a leaf node, it makes sense that we'd like to find leaf nodes whose samples come primarily from one class or the other; the closer a node is to containing just one class, the more accurate a majority-class prediction will be. In the perfect case, the training data can be split so that every leaf node contains entirely positive or entirely negative samples. Then we can be highly confident that a new sample, once sorted into one of these nodes, will be either positive or negative. In practice, this rarely, if ever, happens. However, it illustrates the goal of training decision trees: to make splits so that the two nodes resulting from a split have higher purity, or, in other words, are closer to containing either only positive or only negative samples.
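As a quick numeric illustration of this point (the class fractions below are made up), the accuracy of a majority-class prediction within a leaf is simply the larger of the two class fractions, so purer nodes yield more accurate predictions:

```python
# Hypothetical positive-class fractions in three leaf nodes: the closer
# a node is to containing one class, the better the majority-class prediction.
leaf_fractions = {"pure": 1.00, "almost pure": 0.95, "mixed": 0.60}
for name, pos_frac in leaf_fractions.items():
    majority_accuracy = max(pos_frac, 1 - pos_frac)
    print(f"{name}: majority-class accuracy = {majority_accuracy:.2f}")
```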

In practice, decision trees are actually trained using the inverse of purity, or node impurity. This is a measure of how far a node is from having 100% of its training samples belong to one class, and it is analogous to the concept of a cost function, which signifies how far a given solution is from a theoretical perfect solution. The most intuitive measure of node impurity is the misclassification rate. Adopting a widely used notation (see the scikit-learn documentation) for the proportion of samples in each node belonging to a certain class, we can define $p_{mk}$ as the proportion of samples belonging to the $k^{th}$ class in the $m^{th}$ node. The misclassification rate of node $m$ is then the proportion of samples that do not belong to its majority class: $1 - \max_k(p_{mk})$.
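Here is a minimal sketch of the misclassification-rate impurity under this notation; the misclassification_impurity helper and the class counts below are hypothetical, chosen so that both child nodes of a split end up purer than their parent:

```python
import numpy as np

def misclassification_impurity(class_counts):
    """Misclassification rate: 1 - max_k(p_mk),
    where p_mk are the class proportions in a node."""
    p_mk = np.asarray(class_counts) / np.sum(class_counts)
    return 1 - np.max(p_mk)

# Hypothetical parent node with 60 negative and 40 positive samples,
# split into two children.
parent = [60, 40]
left, right = [50, 10], [10, 30]
print(misclassification_impurity(parent))  # 0.4
print(misclassification_impurity(left))    # 1 - 50/60 ~= 0.167
print(misclassification_impurity(right))   # 1 - 30/40 = 0.25
```

Both children have lower impurity than the parent, which is exactly the kind of split the training process seeks.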
