How is entropy used to build a decision tree?

A decision tree is a supervised machine learning model used to predict class labels in a given dataset. It uses a flowchart-like structure in which each internal node represents a feature or attribute, and the branches represent the values of that feature. The leaf nodes represent the final decision, i.e., the class that the model predicts.

Decision tree structure

Entropy

Entropy is a measure of impurity or disorder within a dataset. In the context of decision trees, entropy is used to decide which feature to split on at each decision node in the tree.

The entropy of a dataset is determined by the distribution of its classes. If all the samples in the dataset belong to a single class, the entropy is 0. If the samples are split evenly between two classes (a perfectly balanced binary dataset), the entropy is 1, its maximum value.

The entropy of a given dataset can be measured using the equation below:

Entropy(S) = -Σ_i p_i * log2(p_i)

Where,

  • S = the dataset.

  • i = a unique class in S.

  • p_i = the proportion of examples belonging to class i in S.
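
To make the formula concrete, here is a minimal Python sketch of the entropy calculation (the function name and the toy label lists are illustrative, not part of the original example):

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Entropy(S) = -sum over each class i of p_i * log2(p_i)."""
        total = len(labels)
        counts = Counter(labels)
        if len(counts) <= 1:
            return 0.0  # a pure (single-class) set has zero entropy
        return -sum((c / total) * log2(c / total) for c in counts.values())

    print(entropy(["storm", "storm", "storm", "storm"]))        # 0.0: all samples in one class
    print(entropy(["storm", "no storm", "storm", "no storm"]))  # 1.0: even split between two classes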

Information gain

Information gain measures the reduction in disorder (entropy) achieved when the dataset is split on a specific feature.

When performing a split, we calculate the information gain for each candidate feature and select the feature with the highest information gain, as it provides the greatest reduction in entropy.

Below is the formula used to calculate the information gain for a specific feature:

Gain(S, F) = Entropy(S) - Σ_{v ∈ values(F)} (|S_v| / |S|) * Entropy(S_v)

Where,

  • S = the dataset.

  • F = the feature to be used for the split.

  • v = a unique value in column F.

  • S_v = the subset of S in which feature F takes the value v.
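
As a rough sketch of how this formula translates into code (reusing the entropy function above; the toy feature column in the demo is made up to illustrate the call, not drawn from the example dataset):

    def information_gain(feature_values, labels):
        """Gain(S, F) = Entropy(S) - sum over values v of (|S_v| / |S|) * Entropy(S_v)."""
        total = len(labels)
        gain = entropy(labels)
        for v in set(feature_values):
            subset = [label for fv, label in zip(feature_values, labels) if fv == v]
            gain -= (len(subset) / total) * entropy(subset)
        return gain

    # A feature that perfectly separates the two classes removes all entropy, so its gain is 1.0.
    print(information_gain(["a", "a", "b", "b"], ["storm", "storm", "no storm", "no storm"]))  # 1.0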

Example

Below is a sample dataset on which we will build a decision tree.

Table for decision tree example

In this dataset, the goal is to determine whether or not there is going to be a storm, using the following features:

  • Weather

  • Temperature

  • Humidity

  • Wind Speed

The entropy of the dataset

Our first step is to calculate the entropy of the given dataset. Of the 14 rows in the dataset, 5 show a negative output and 9 show a positive output.

So our entropy would be:

Entropy(S) = -(9/14) * log2(9/14) - (5/14) * log2(5/14) ≈ 0.940
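
Using the entropy function sketched earlier, we can verify this value from the class counts alone (9 positive rows and 5 negative rows):

    # 9 positive and 5 negative samples, as in the table above.
    print(round(entropy(["positive"] * 9 + ["negative"] * 5), 3))  # 0.94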

Information gain for each feature

Now that we have calculated the entropy of the dataset, the next task is to find the information gain of each feature with respect to the overall dataset.

For weather

First, we would have to calculate the entropy for "Sunny," "Cloudy," and "Rain" in the given dataset. The final values can be seen below:

Now, we would need to find the information gain. Below, we can see the calculations required for finding the information gain for the weather column.
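
As a hedged sketch of that calculation: the note under the first split tells us the "Cloudy" branch holds 4 rows, all positive (entropy 0). Assuming the remaining rows split into 5 "Sunny" and 5 "Rain" rows, each branch with an entropy of roughly 0.971, the weighted-average calculation looks like this:

    # Assumed branch sizes and entropies: 5 Sunny (0.971), 4 Cloudy (0.0), 5 Rain (0.971).
    weather_gain = 0.940 - (5/14) * 0.971 - (4/14) * 0.0 - (5/14) * 0.971
    print(round(weather_gain, 4))  # 0.2464, matching the summary table below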

For temperature

First, we would have to calculate the entropy for "Hot," "Warm," and "Cool" in the given dataset. The final values can be seen below:

Below, we can see the calculations required for finding the information gain for the temperature column:

For humidity

First, we must calculate the entropy for "High" and "Normal" in the given dataset. The final values can be seen below:

Below, we can see the calculations required for finding the information gain for the humidity column:

For wind speed

First, we must calculate the entropy for "Strong" and "Weak" in the given dataset. The final values can be seen below:

Below, we can see the calculations required for finding the information gain for the wind speed column:
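
The procedure is identical for every column, so it is convenient to loop over the features. Below is a small helper that reuses the information_gain function from earlier; the feature_gains name, the "Storm" target column, and the four-row demo dataset are all made up for illustration and are not the table from this example:

    def feature_gains(rows, feature_names, target_name):
        """Return {feature: information gain} for each feature column of a list-of-dicts dataset."""
        labels = [row[target_name] for row in rows]
        return {f: information_gain([row[f] for row in rows], labels) for f in feature_names}

    # Tiny made-up dataset, only to demonstrate the call signature.
    demo_rows = [
        {"Weather": "Sunny", "Humidity": "High",   "Storm": "No"},
        {"Weather": "Rain",  "Humidity": "High",   "Storm": "Yes"},
        {"Weather": "Rain",  "Humidity": "Normal", "Storm": "Yes"},
        {"Weather": "Sunny", "Humidity": "Normal", "Storm": "No"},
    ]
    print(feature_gains(demo_rows, ["Weather", "Humidity"], "Storm"))  # {'Weather': 1.0, 'Humidity': 0.0}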

Selecting the feature to split

Now that we have calculated the information gain values for all the features in our dataset, it is time to determine the feature on which to perform the initial split.

Below is a table summarizing all the information gain values for different features.

Information gain summary

Feature        Information gain
Weather        0.2464
Temperature    0.0289
Humidity       0.1516
Wind Speed     0.0478

As we can see, the weather column has the highest information gain of all the features; hence, it will be used to perform the split.
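
Programmatically, choosing the split feature is just an argmax over the computed gains. A minimal sketch using the values from the summary table:

    gains = {
        "Weather": 0.2464,
        "Temperature": 0.0289,
        "Humidity": 0.1516,
        "Wind Speed": 0.0478,
    }
    best_feature = max(gains, key=gains.get)
    print(best_feature)  # Weather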

The decision tree would look like the diagram below.

Initial decision tree split

Note: When we follow the "Cloudy" branch, we end up with 4 remaining rows {3, 7, 12, 13}. None of these rows has a negative output, so we make a leaf node with a positive output there.
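
The same logic repeats recursively down the tree: a branch becomes a leaf as soon as its remaining rows are pure (as with the "Cloudy" branch above), and otherwise it is split again on the feature with the highest information gain. Below is a minimal ID3-style sketch that reuses the entropy and feature_gains helpers from earlier; the build_tree name and the majority-vote fallback are illustrative assumptions, not taken from the original text:

    def build_tree(rows, feature_names, target_name):
        labels = [row[target_name] for row in rows]
        # Pure node: every remaining row has the same label, so this branch becomes a leaf.
        if len(set(labels)) == 1:
            return labels[0]
        # No features left to split on: fall back to the majority label (assumption).
        if not feature_names:
            return Counter(labels).most_common(1)[0][0]
        # Split on the feature with the highest information gain.
        gains = feature_gains(rows, feature_names, target_name)
        best = max(gains, key=gains.get)
        remaining = [f for f in feature_names if f != best]
        tree = {best: {}}
        for value in set(row[best] for row in rows):
            subset = [row for row in rows if row[best] == value]
            tree[best][value] = build_tree(subset, remaining, target_name)
        return tree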

Exercise

Now that we have learned how to perform a split on a dataset using entropy values, it's time to complete the diagram above and find the remaining nodes in the decision tree.

Below are the reference tables for the reduced datasets along the "Sunny" and "Rain" paths, on which we will perform our calculations.

Table for node at sunny path
Table for node at rain path

The final decision tree

When we are done with the exercise, the final decision tree should look like the one in the diagram below:

Final decision tree diagram
