How to make a decision tree

Anyone researching the different methods used in machine learning will quickly stumble on decision trees. A decision tree is a widely used machine learning algorithm that builds a model by repeatedly splitting the data on the attribute values that best separate the target outcomes. In this Answer, we’ll look at precisely what a decision tree is, how it works, and its applications.

The relevant domain

Decision trees fall under the umbrella of artificial intelligence and machine learning. In this field, the focus is on developing algorithms and models that allow computers to learn from data and make predictions without being explicitly programmed. It involves training a computer system to recognize patterns, extract features, and perform tasks based on past experience, also known as the training data. For a more precise understanding, some of the most common applications of these fields are listed below:

  • Virtual personal assistants: Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana

  • Image recognition: Facebook's automatic photo tagging, Google Photos

  • Autonomous vehicles: Self-driving cars

  • Online translations: Google Translate, Microsoft Translator

  • Medical diagnosis: AI-assisted medical diagnosis using medical images

Now that we have a basic understanding of the parent domain of decision trees, we can see how they work.

Decision tree

Decision trees are a popular machine learning algorithm that can be used for both classification and regression tasks. In classification tasks, we predict the class or category of an instance based on its attributes or features, while in regression tasks, we predict a continuous numerical value as an output. A minimal example of both tasks is sketched right after this paragraph.
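
For a quick, concrete illustration, here is a minimal sketch (assuming scikit-learn is available) that fits a decision tree classifier and a decision tree regressor on tiny, made-up data; the feature values are assumptions chosen only for the example:

```python
# Minimal sketch (assumes scikit-learn is installed); the toy data below is made up.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: predict a class label (0 or 1) from two numeric features.
X_cls = [[25, 0], [40, 1], [35, 1], [50, 0]]
y_cls = [0, 1, 0, 1]
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
clf.fit(X_cls, y_cls)
print(clf.predict([[30, 1]]))   # -> [0] on this toy data

# Regression: predict a continuous target value from one numeric feature.
X_reg = [[1], [2], [3], [4]]
y_reg = [1.5, 3.1, 4.4, 6.2]
reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X_reg, y_reg)
print(reg.predict([[2.5]]))     # -> [3.1] on this toy data
```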

Decision trees build a tree-like model in which each component plays a specific role in how predictions are made. These components are listed below and illustrated in a small sketch after the list:

  • An internal node represents a test on an attribute 

  • A branch represents the outcome of the test 

  • A leaf node represents a class label or a prediction
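
To connect these components to code, here is a minimal, hypothetical representation of a node and a small hand-built tree; the attribute names and values (outlook, wind, and so on) are made up purely for illustration:

```python
# Illustrative sketch: internal nodes test an attribute, branches carry the test
# outcomes, and leaf nodes hold a prediction. All names and values are made up.
from dataclasses import dataclass, field

@dataclass
class Node:
    attribute: str = None                          # attribute tested at an internal node
    children: dict = field(default_factory=dict)   # branch value -> child node
    label: str = None                              # prediction stored at a leaf node

# A tiny hand-built tree: test "outlook" at the root, then "wind" on one branch.
tree = Node(attribute="outlook", children={
    "sunny":    Node(label="no"),
    "overcast": Node(label="yes"),
    "rain":     Node(attribute="wind", children={
        "weak":   Node(label="yes"),
        "strong": Node(label="no"),
    }),
})

def predict(node, instance):
    """Follow the branches matching the instance's attribute values until a leaf is reached."""
    while node.label is None:
        node = node.children[instance[node.attribute]]
    return node.label

print(predict(tree, {"outlook": "rain", "wind": "weak"}))   # -> "yes"
```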

The process starts with the dataset on which we want to train the algorithm. As with any tree, we have to choose a root node, and the decision tree is no exception: from all the available attributes, we pick the one to split on first through a specific calculation, which we will cover later. The quantity behind that calculation is the information gain, which measures how much uncertainty in the dataset is reduced after splitting on a given attribute. Our goal is to find the attribute that provides the most significant information gain when used as the split criterion, since that leaves the least uncertainty. After selecting it as the root node, we divide the dataset into subsets, one for each branch, so that instances with the same value of the selected attribute are grouped together. These steps are then repeated on each subset with the remaining attributes, creating more and more branches until a stopping criterion is reached.

With the decision tree created, we can now assign a prediction to each leaf node: typically the majority class of the instances reaching that leaf for classification, or the mean or median value of the target variable in that subset for regression. A tiny sketch of this assignment follows. These are the basic steps for creating a decision tree from scratch. Now, we delve into the calculations.
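
As a small illustrative sketch of this assignment (the leaf contents below are made-up examples, not from any particular dataset):

```python
# Minimal sketch of how a leaf's prediction could be assigned (illustrative only).
from collections import Counter
from statistics import mean, median

leaf_classes = ["spam", "spam", "ham", "spam"]   # instances reaching a classification leaf
leaf_values  = [3.2, 2.8, 3.5]                   # target values reaching a regression leaf

class_label = Counter(leaf_classes).most_common(1)[0][0]   # majority class -> "spam"
reg_mean    = mean(leaf_values)                            # mean target value -> ~3.17
reg_median  = median(leaf_values)                          # median target value -> 3.2

print(class_label, reg_mean, reg_median)
```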

Calculations for the decision tree


In a decision tree, the target attribute is the variable that we are trying to predict or classify. It is the outcome or response variable that we want to understand based on the values of the input features. So to start the calculations, we have to choose the target attribute according to the task at hand. Let's look at the steps sequentially:

  • Step 1: After choosing the target attribute, we compute the information gain of every other attribute. Because information gain measures how much uncertainty in the dataset is reduced after splitting on an attribute, our goal is to find the attribute that provides the most significant information gain. The attribute with the highest information gain is chosen as the root node.

  • Step 2: Calculate the entropy of the dataset (and, later, of each subset). Entropy measures the impurity or uncertainty in the data. The formula for entropy is often defined as

  $$\text{Entropy}(S) = -\sum_{i} P(c_i)\,\log_2 P(c_i)$$

  Where $P(c_i)$ is the proportion of data points with class label $c_i$ in the dataset $S$.

  • Step 3: For each attribute (except the target attribute), we calculate the information gain with respect to the target attribute. Information gain measures how much the attribute contributes to reducing the uncertainty in the target attribute. A code sketch of this step and the previous one appears after the list of steps.

  $$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

  Where $S_v$ represents the subset of data for each distinct value $v$ of the attribute $A$.

  • Step 4: Choose the attribute with the highest information gain as the best attribute to split the data.

  • Step 5: Create a decision node in the tree based on the selected attribute. Each branch of this node corresponds to a distinct value of the selected attribute, and the data is partitioned into the corresponding subsets.

  • Step 6: For each subset created in Step 5, repeat the process from Step 2 (calculating entropy) for the target attribute in that subset, and continue until a stopping criterion is met. This stopping criterion can be a predefined tree depth, reaching a certain number of examples in a leaf node, or other criteria to prevent overfitting (when a machine learning model performs well on the training data but fails to generalize to new, unseen data because it captures noise and random fluctuations).

  • Step 7: We can continue growing the tree until all examples are classified correctly (which may lead to overfitting) or use a stopping criterion to halt the tree's growth.

  • Step 8: The resulting decision tree is ready to predict new, unseen examples.
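
To ground Steps 1–4, here is a minimal sketch in plain Python that computes entropy and information gain on a small, made-up dataset and picks the attribute with the highest gain as the root; the dataset and function names are assumptions for illustration, not part of any library:

```python
# Minimal sketch of entropy and information gain on a made-up toy dataset.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_i P(c_i) * log2 P(c_i) over the class labels in S."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, attribute, target="label"):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    labels = [row[target] for row in rows]
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return entropy(labels) - remainder

# Made-up toy dataset: predict whether to play based on outlook and wind.
rows = [
    {"outlook": "sunny",    "wind": "weak",   "label": "no"},
    {"outlook": "sunny",    "wind": "strong", "label": "no"},
    {"outlook": "rain",     "wind": "weak",   "label": "yes"},
    {"outlook": "rain",     "wind": "strong", "label": "no"},
    {"outlook": "overcast", "wind": "weak",   "label": "yes"},
    {"outlook": "overcast", "wind": "strong", "label": "yes"},
]

gains = {a: information_gain(rows, a) for a in ("outlook", "wind")}
root = max(gains, key=gains.get)   # attribute with the highest information gain
print(gains, "-> root:", root)
```

On this toy data, outlook reduces the uncertainty more than wind, so it would be selected as the root node, matching Step 4.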


Conclusion

Decision trees provide a clear visual representation of decision-making processes. They can effectively capture complex relationships in the data, making them valuable tools for data analysis and predictive modeling in various domains in this tech era.
