Linear Regression Limitation

Discover the limitations of linear regression, especially in the case of outliers.

Towards the goal

We started our journey in machine learning by way of linear regression. Now we’ll use that knowledge (and that code) to branch off toward our goal, which is a program that recognizes images.

This chapter covers the first step toward image recognition. We’ll build a classifier, which is a program that assigns data to one of a limited number of classes. Instead of numerical labels, classifiers work with categorical labels. For example, consider the difference between our pizza predictor and a system that recognizes plants. The pizza predictor outputs a number. By contrast, the plant classifier would output the name of a plant taken from a predefined list of possible species.

This lesson will start off small: our classifier will be a binary classifier that only recognizes two classes. Many useful real-life systems are based on binary classification. For example, the pneumonia detector that we described in the first chapter assigns X-ray scans to either the class “pneumonia” or the class “no pneumonia.”

We’ll replace the linear regression in our program with binary classification, starting with a classification problem that has nothing to do with computer vision. In the next chapter, however, we’ll turn around and apply that binary classifier to an image recognition problem.

Before we dive in, be aware of a slight change in vocabulary. So far, we have called the two phases of a learning system training and prediction. From now on, we’ll call them training and classification to emphasize that the result of the prediction is a categorical label. Classification is just a specific type of prediction, so we’ll use the more specific term.

Where linear regression fails

Roberto, our friend at the pizzeria, asked us to help him one last time. The pizzeria is located in a metropolitan neighborhood. On busy nights, it’s common for noisy customers to hang out in front of the shop until the neighbors eventually call the police.

The owner suspects that the same input variables that impact the number of pizzas sold, such as temperature and tourists, also affect the number of loud customers by the entrance, and hence the likelihood of a police call. He wants to know in advance whether a police call is likely so that he can take counter-measures, such as walking outside and begging people to lower their voices.

As usual, the owner sent us a file of labeled data. Here are the first few lines:

Reservations    Temperature    Tourists    Police
          13             26           9         1
           2             14           6         0
          14             20           3         1
          23             25           9         1
          13             24           8         1
           1             13           2         0

These are the same data we used to train our linear regression program, except for one difference: the labels are 0 (meaning a quiet, uneventful night) or 1 (meaning that the police arrived on the scene). Roberto would like a system that forecasts those binary values in the last column.
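As a quick sketch of how this data might be read into our program, here is one way to load it with NumPy. The file name police.txt and the whitespace-separated layout with a single header row are assumptions based on the excerpt above, not something specified by Roberto.

```python
import numpy as np

# Load the labeled dataset. The file name "police.txt" and the
# whitespace-separated layout with one header row are assumptions.
x1, x2, x3, y = np.loadtxt("police.txt", skiprows=1, unpack=True)

X = np.column_stack((x1, x2, x3))  # reservations, temperature, tourists
Y = y                              # 1 = police called, 0 = quiet night

print(X.shape, Y.shape)
```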

This is a classification problem because we want to classify data as either 0 or 1. We might be tempted to solve this problem with the same code we have used so far. Unfortunately, that approach would fail, and here’s why.

Linear regression is all about approximating the data with a straight line through the data points.
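To see why a straight line is a poor fit for 0/1 labels, here is a small, hypothetical sketch. It fits a least-squares line to just the reservations column from the table above and prints what that line predicts for a few reservation counts; the use of NumPy’s polyfit here is my shortcut, not the chapter’s own regression code.

```python
import numpy as np

# Reservations and police labels taken from the table above.
reservations = np.array([13, 2, 14, 23, 13, 1])
police       = np.array([ 1, 0,  1,  1,  1, 0])

# Fit a least-squares line: police ~ slope * reservations + intercept.
slope, intercept = np.polyfit(reservations, police, 1)

for r in [0, 10, 50]:
    print(f"reservations={r:2d} -> line predicts {slope * r + intercept:.2f}")

# The line outputs arbitrary numbers (here roughly 0.06, 0.61, and 2.83),
# not a clean 0-or-1 label, and a single extreme night (an outlier) would
# tilt the whole line, dragging every other prediction with it.
```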
