A decision tree is a supervised machine learning model used to predict class labels in a given dataset. It uses a flowchart-like structure in which each internal node represents a feature or attribute, and each branch represents a value of that feature. A leaf node represents the final decision, that is, the class the model predicts.
Entropy is a measure of impurity or disorder within a dataset. In the context of decision trees, entropy helps determine which feature to use as a decision node when performing a split in the tree.
The entropy of a dataset is determined by the distribution of its classes. If all the samples belong to a single class, the entropy is 0. If the samples are split evenly between two classes, the entropy is 1, its maximum value for a binary classification problem.
The entropy of a given dataset can be measured using the equation below:

$\text{Entropy}(S) = -\sum_{i} p_i \log_2(p_i)$

Where,
S = the dataset.
i = a unique class in S.
p_i = the proportion of examples in S belonging to class i.
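As an illustration, here is a minimal sketch of this formula in Python. The `entropy` helper and the example label lists are ours, not part of the original example:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

# A pure set has entropy 0; an even two-class split has entropy 1.
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0
print(entropy(["yes", "yes", "no", "no"]))    # 1.0
```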
Information gain measures the reduction in disorder (entropy) achieved when the dataset is split on a specific feature.
When performing a split, we calculate the information gain for each candidate feature and select the feature with the highest value, as it provides the greatest reduction in entropy.
Below is the formula used to calculate the information gain for a specific feature:

$\text{Gain}(S, F) = \text{Entropy}(S) - \sum_{v \in \text{Values}(F)} \frac{|S_v|}{|S|} \, \text{Entropy}(S_v)$

Where,
S = the dataset.
F = the feature to be used for the split.
v = a unique value of feature F, and S_v is the subset of S for which F has the value v.
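As a hedged sketch, this formula could be implemented in Python as follows. The `rows`/`labels` representation (a list of dicts with a parallel list of class labels) is an assumption made for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(rows, labels, feature):
    """Entropy of the full label set minus the weighted entropy of the
    label subsets produced by splitting on `feature`.

    `rows` is a list of dicts (feature name -> value) and `labels` holds
    the class label of each row, in the same order.
    """
    total = len(labels)
    gain = entropy(labels)
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain
```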
Below is a sample dataset on which we will build a decision tree.
In the dataset, we see that we are to determine whether or not there is going to be a storm, using the following features:
Weather
Temperature
Humidity
Wind Speed
Our first step would be to calculate the entropy of the given dataset. In the dataset, we can see that we have 5 rows that show a negative output and 9 rows that show a positive output.
So our entropy would be:

$\text{Entropy}(S) = -\frac{9}{14}\log_2\!\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\!\left(\frac{5}{14}\right) \approx 0.940$
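Using the class counts above (9 positive and 5 negative rows), the same value can be reproduced with a few lines of Python:

```python
import math

positive, negative = 9, 5            # class counts from the dataset above
total = positive + negative          # 14 rows in total

dataset_entropy = (
    -(positive / total) * math.log2(positive / total)
    - (negative / total) * math.log2(negative / total)
)
print(round(dataset_entropy, 4))     # 0.9403
```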
Now that we have calculated the entropy of the dataset, the next task is to find the information gain of specific features against the overall dataset.
First, we would have to calculate the entropy for "Sunny," "Cloudy," and "Rain" in the given dataset. The final values can be seen below:
Now, we would need to find the information gain. Below, we can see the calculations required for finding the information gain for the weather column.
First, we would have to calculate the entropy for "Hot," "Warm," and "Cool" in the given dataset. The final values can be seen below:
Below, we can see the calculations required for finding the information gain for the temperature column:
First, we must calculate the entropy for High and Normal in the given dataset. The final values can be seen below:
Below, we can see the calculations required for finding the information gain for the humidity column:
First, we must calculate the entropy for "Strong" and "Weak" in the given dataset. The final values can be seen below:
Below, we can see the calculations required for finding the information gain for the wind speed column:
Now that we have calculated the information gain values for all the features in our dataset, it is time to determine the feature on which to perform the initial split.
Below is a table summarizing all the information gain values for different features.
| Feature | Information gain |
|---|---|
| Weather | 0.2464 |
| Temperature | 0.0289 |
| Humidity | 0.1516 |
| Wind Speed | 0.0478 |
As we can see, the weather column has the highest information gain of all the features; hence, it will be used to perform the split.
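As a quick sketch, picking the split feature amounts to taking the maximum over the table above (the `gains` dict below simply mirrors that table):

```python
# Information gain values from the summary table above
gains = {
    "Weather": 0.2464,
    "Temperature": 0.0289,
    "Humidity": 0.1516,
    "Wind Speed": 0.0478,
}

best_feature = max(gains, key=gains.get)
print(best_feature)  # Weather
```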
The decision tree would look like the diagram below.
Note: When we follow the cloudy route, we end up with 4 remaining rows {3, 7, 12, 13}. None of these rows has a negative output, so we create a leaf node there with a positive output.
Now that we have learned how to perform a split on a dataset using entropy values, it's time to complete the above diagram and find the remaining nodes in the decision tree.
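For reference, the split-and-recurse procedure we keep repeating can be sketched as an ID3-style function in Python. This is only an illustrative sketch, not the article's implementation; it assumes the same list-of-dicts representation used in the earlier snippets, and the dataset rows themselves are not reproduced here:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting `rows` on `feature`."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

def build_tree(rows, labels, features):
    """Recursively build an ID3-style decision tree.

    Returns either a class label (a leaf) or a dict of the form
    {feature: {value: subtree, ...}}.
    """
    # Pure subset: every row has the same label, so make a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No features left to split on: predict the majority label.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Split on the feature with the highest information gain.
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [f for f in features if f != best],
        )
    return tree
```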
Below, we can see the reference tables for the new sub-datasets along the "Sunny" and "Rain" paths, on which we will perform our calculations.
When we are done with the exercise, the final decision tree should look like the one in the diagram below: