Classification Tree Training Example

Learn how Gini impurity and Gini change are used to train a decision tree from a dataset.

The dataset

We’ll use the following hypothetical dataset for this lesson. The dataset is modeled on the Adult Census Income dataset, but the data is much simpler. The goal is to train a classification decision tree that accurately predicts income levels.

The dataset consists of three binary categorical features (college, union, and manager) and a binary label (income):

Hypothetical Data Sample

| College | Union | Manager | Income |
|---------|-------|---------|--------|
| no      | yes   | no      | >50K   |
| no      | yes   | no      | >50K   |
| no      | no    | no      | <=50K  |
| no      | no    | no      | <=50K  |
| no      | no    | no      | <=50K  |
| yes     | no    | yes     | >50K   |
| yes     | no    | yes     | >50K   |
| yes     | no    | yes     | >50K   |
| yes     | yes   | no      | <=50K  |
| yes     | yes   | no      | <=50K  |

The algorithm

The classification decision tree algorithm has the following steps:

  1. Calculate the Gini impurity of the parent node.

  2. For each available feature, calculate the Gini change.

  3. Split the tree on the feature with the highest Gini change from step 2.

  4. While features remain, repeat steps 1–3 for each split.

This algorithm is relatively simple because the data is designed to eliminate complexities. In the following two lessons, you’ll learn how the CART classification tree algorithm handles real-world data complexities.
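The four steps above can be sketched in a few lines of Python. The row encoding and the helper names (`gini`, `gini_change`, `best_split`) are illustrative assumptions, not part of the lesson:

```python
from collections import Counter

def gini(labels):
    # Step 1: Gini impurity = 1 minus the sum of squared class proportions
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_change(rows, feature, label="income"):
    # Step 2: parent impurity minus the weighted impurity of each child split
    parent = [r[label] for r in rows]
    change = gini(parent)
    for value in {r[feature] for r in rows}:
        child = [r[label] for r in rows if r[feature] == value]
        change -= len(child) / len(rows) * gini(child)
    return change

def best_split(rows, features):
    # Step 3: choose the feature with the highest Gini change
    return max(features, key=lambda f: gini_change(rows, f))

# The ten rows from the hypothetical data sample
rows = [
    {"college": "no",  "union": "yes", "manager": "no",  "income": ">50K"},
    {"college": "no",  "union": "yes", "manager": "no",  "income": ">50K"},
    {"college": "no",  "union": "no",  "manager": "no",  "income": "<=50K"},
    {"college": "no",  "union": "no",  "manager": "no",  "income": "<=50K"},
    {"college": "no",  "union": "no",  "manager": "no",  "income": "<=50K"},
    {"college": "yes", "union": "no",  "manager": "yes", "income": ">50K"},
    {"college": "yes", "union": "no",  "manager": "yes", "income": ">50K"},
    {"college": "yes", "union": "no",  "manager": "yes", "income": ">50K"},
    {"college": "yes", "union": "yes", "manager": "no",  "income": "<=50K"},
    {"college": "yes", "union": "yes", "manager": "no",  "income": "<=50K"},
]

print(best_split(rows, ["college", "union", "manager"]))  # manager
```

On this data the sketch selects manager, because splitting on it yields one perfectly pure child node (every manager earns >50K).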

Gini change of the college feature

The root node of the tree represents the first data split. According to the preceding algorithm, the first step is to calculate the Gini impurity of the parent node. In this case, the root node has all the observations. For the hypothetical data sample, there are five of each label.

Here’s the Gini impurity of all the data:

Gini(t) = 1 - \left(\frac{5}{10}\right)^2 - \left(\frac{5}{10}\right)^2 = 0.5
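This impurity can be computed directly in Python. A minimal sketch, where the `gini` helper name is an assumption:

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared label proportions
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

# Root node: five >50K labels and five <=50K labels
root = [">50K"] * 5 + ["<=50K"] * 5
print(gini(root))  # 0.5
```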

Next, the algorithm requires the Gini change to be calculated for each feature. The algorithm is simple and proceeds through the features from left to right, starting with the college feature.

Here’s the subset of the original data applicable to the Gini change calculation for the college feature:

Data Subset

| College | Income |
|---------|--------|
| no      | >50K   |
| no      | >50K   |
| no      | <=50K  |
| no      | <=50K  |
| no      | <=50K  |
| yes     | >50K   |
| yes     | >50K   |
| yes     | >50K   |
| yes     | <=50K  |
| yes     | <=50K  |

Here’s the Gini change calculation:

Because college is a binary categorical feature, its Gini change is calculated from the yes and no values.

First, the proportions of college values are calculated as follows:

Proportion_{yes} = \frac{N(t_{yes})}{N} = \frac{5}{10} = 0.5

Proportion_{no} = \frac{N(t_{no})}{N} = \frac{5}{10} = 0.5
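These proportions weight the child impurities when the Gini change is computed. A minimal Python sketch of the full calculation for college, with the label lists transcribed from the subset table (variable names are illustrative):

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared label proportions
    n = len(labels)
    return 1 - sum((labels.count(v) / n) ** 2 for v in set(labels))

# Income labels from the subset, split by the college value
no_college  = [">50K", ">50K", "<=50K", "<=50K", "<=50K"]  # college = no
yes_college = [">50K", ">50K", ">50K", "<=50K", "<=50K"]   # college = yes
parent = no_college + yes_college

# Parent impurity minus the proportion-weighted child impurities
change = gini(parent) - 0.5 * gini(no_college) - 0.5 * gini(yes_college)
print(round(change, 2))  # 0.02
```

Both child nodes have an impurity of 0.48, so splitting on college reduces the impurity only slightly, from 0.5 to 0.48.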