Prediction with Regression

Learn how to understand relationships between variables and make predictions using the concept of regression.

ML beyond classification

So far, we have solved pattern identification problems that involved classifying objects into categories using the ML process. We have classified movies into two categories, and we have also seen an extension of this concept where galaxies had to be sorted into more than two categories. Your ML toolbox now has the classification tool to identify interesting patterns in data and categorize them into classes. Let’s try solving another problem using our ML toolbox.

It’s soccer season, you are a Champions League fan, and your favorite UEFA team is playing! The competition is tough; all teams have top players, and the title will go to the team that performs best across the league and knockout phases. It all depends on the number of goals each team scores in the matches it plays and whether these are enough to win each match.

As a sports enthusiast, you really want to predict how many matches your favorite team will win in this year’s league. But wait! Whether a team wins or loses a match depends on the goals scored during the match.

That was straightforward! You are lucky because you got hold of data on the number of wins each team had in the previous year’s league matches. However, this information alone is not sufficient to know which teams can really win the most matches in this year’s league. What if we also had data on the number of goals each team scored in the season? This might help figure out the pattern of wins that the teams displayed. Let’s look at both these stats below:

A snapshot of match wins and goals

Let's try visualizing this data using a scatter plot.

Scatter plot of the match data

Let’s look at the data more closely. There’s clearly a pattern: the higher the number of goals scored by a team, the more matches it wins. This shows a relationship between the two numbers. The scatter plot supports our earlier idea that the number of matches won by a team depends on the number of goals it scored during the tournament.
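One way to put a number on the pattern seen in a scatter plot like this is the Pearson correlation coefficient, which gets close to 1 when two quantities rise together along a line. The minimal sketch below computes it in plain Python. The goals and wins values are made up for illustration and are not the lesson's data; the function name `pearson_r` is also just a label chosen here.

```python
from math import sqrt

# Hypothetical example values, NOT the data from the lesson's table.
goals = [8, 12, 15, 20, 24, 30]
wins = [2, 3, 4, 5, 6, 8]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable spreads around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(goals, wins)
print(f"correlation between goals and wins: {r:.2f}")
```

A value near 1 for these made-up numbers would match what the eye sees in the plot: more goals tend to go with more wins.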

Can we now predict the number of match wins for a team if we have information on the number of goals it scored? We certainly can! But how?

We can’t (and shouldn’t) memorize the data above to figure this out. Let’s use the ML process of identifying the pattern in data to solve this problem. In our ML toolbox, we already have some classification tools that categorize objects by learning the pattern in data. But does this ML approach work here? Clearly, we don’t want to know the class of an object like before. Instead, we want to predict a number: the match wins, given the number of goals scored by a team. It is still a machine learning prediction problem, but it needs a different model this time!

Modeling linear relationships in data

If we look at the scatter plot of the data again, we can observe a roughly linear relationship between the two values. This sounds familiar. In earlier lessons, we used a linear decision boundary to separate the movie data set, as shown below.

Linear decision boundary separating the two movie classes

This means we can try fitting a line to our soccer match data to capture the linear relationship between the two variables. Only this time, the line does not represent a decision boundary and is not used to separate different classes of objects in the data.
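To make the idea of fitting a line concrete, here is a minimal sketch in plain Python. It assumes the line is chosen by least squares, which is one common choice; the lesson has not said at this point how the line is picked. The goals and wins values are made up for illustration and are not the lesson's data, and the names `fit_line` and `predict_wins` are hypothetical helpers.

```python
# Hypothetical example values, NOT the data from the lesson's table.
goals = [8, 12, 15, 20, 24, 30]
wins = [2, 3, 4, 5, 6, 8]


def fit_line(xs, ys):
    """Least-squares fit of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes, on average, per unit change in x.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # Intercept: chosen so the line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(goals, wins)


def predict_wins(goals_scored):
    """Read the predicted number of wins off the fitted line."""
    return slope * goals_scored + intercept


print(f"wins = {slope:.3f} * goals + {intercept:.3f}")
print(f"predicted wins for 18 goals: {predict_wins(18):.1f}")
```

Unlike a decision boundary, this line is read as a function: plug in a number of goals and it returns a predicted number of wins, which need not be a whole number.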

Using the widget below, let’s try ...