Decision Trees

Learn how to apply the decision trees algorithm to classification and regression tasks.

Decision trees are a type of ML model used for making decisions or predictions by recursively splitting data into smaller subsets based on the values of one or more features.

Understanding decision trees

A decision tree is a tree-like model where each node represents a feature or attribute, and each branch represents a possible value or outcome for that feature. The decision tree begins with a root node, which represents the entire dataset. At each subsequent node, the dataset is split into two or more smaller subsets based on a selected feature, and each subset is evaluated based on a chosen metric to determine the best split. This process is repeated recursively until reaching a stopping criterion (e.g., achieving a predetermined maximum depth or reaching a point where the number of samples in a subset are too small to split further).

In a classification problem, we can think of a decision tree as a tool that helps us sort items into different groups. Imagine we have a bunch of fruits, and we want to classify them into categories like apples, bananas, and oranges based on their color, size, and shape. The decision tree helps us make these classifications by following a set of rules, ultimately putting each fruit in the correct category.

In a regression problem, we aim to predict a specific number, such as the price of a house, rather than sorting items into categories. Imagine we have data about various houses with features like the number of bedrooms, square footage, and location. We use a decision tree to make predictions by considering these features and looking at houses that are similar in these aspects. The decision tree helps us find an average or median value of the target variable, like house prices, for the houses in each group. This way, we can predict the price of a new house based on its features by following the path in the decision tree.

Get hands-on with 1200+ tech skills courses.