Add More Dimensions

Explore what happens when we deal with data that has more and more dimensions.

What we have covered so far

In the previous two chapters, we predicted an output from an input: a restaurant’s pizza sales from its reservations. Most interesting real-world problems, however, have more than one input. Even something as simple as pizza sales is unlikely to depend on reservations alone. For example, if there are many tourists in town, the restaurant will probably sell more pizzas, even if it has just as many reservations as yesterday.

If pizza sales depend on many variables, imagine how many variables we’ll have to consider once we get into complex domains, like recognizing pictures. A learning program that only supports one variable will never solve those hairy problems. If we ever want to tackle them, we’d better upgrade our program to support multiple input variables.

We can learn from multiple input variables with an advanced version of linear regression called multiple linear regression. In this chapter, we’ll extend our program to support multiple linear regression. We’ll also add a few tricks to our bag, including a couple of useful matrix operations and several NumPy functions. Let’s dive right in!

More variables, more dimensions

In the previous chapter, we coded a gradient descent-based version of our learning program. The advanced program can potentially scale to complex models with more than one variable.

In a moment of weakness, we mentioned that opportunity to our friend Roberto. That was a mistake. Now Roberto is all pumped up about forecasting pizza sales from a bunch of different input variables besides reservations, such as the weather, or the number of tourists in town.

This is going to be more work for us, and we cannot blame the pizza restaurant owner for wanting to add variables to the model. After all, the more variables we consider, the more likely we are to get accurate predictions of pizza sales.

Let’s start with a detailed version of the old pizza.txt file. Here are the first few lines of this new dataset:

Reservations  Temperature  Pizzas
13            26           44
2             14           23
14            20           28

The owner suspects that more people drop into their pizzeria on warmer days, so they keep track of the temperature in degrees Celsius. (For reference, 2 °C is almost freezing, and 26 °C is a pleasantly warm day.) Now the third column contains the labels (the pizzas), and the first two contain the input variables.
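As a quick sketch of how such a dataset can be read into NumPy, here is one way to load the three columns into separate arrays. (The data is embedded inline here for illustration; in practice you would pass the dataset's filename to `np.loadtxt` instead.)

```python
import numpy as np
from io import StringIO

# The first few lines of the new dataset: whitespace-separated, with a header.
data = StringIO("""Reservations Temperature Pizzas
13 26 44
2 14 23
14 20 28
""")

# skiprows=1 skips the header row; unpack=True yields one array per column.
x1, x2, y = np.loadtxt(data, skiprows=1, unpack=True)

print(x1)  # reservations: [13.  2. 14.]
print(x2)  # temperatures: [26. 14. 20.]
print(y)   # pizzas:       [44. 23. 28.]
```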

First, let’s see what happens to linear regression when we move from one to two input variables. We know that linear regression is about approximating the examples with a line.

As a reminder, here is the formula of that line:

\hat{y} = x * w + b
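As a minimal sketch, that formula translates directly into vectorized NumPy code. The weight and bias values here are made up for illustration, not trained values:

```python
import numpy as np

# Predict pizzas from reservations along the line y_hat = x * w + b.
# NumPy applies the formula element-wise to the whole array of inputs.
def predict(x, w, b):
    return x * w + b

x = np.array([13, 2, 14])          # reservations
y_hat = predict(x, w=2.0, b=10.0)  # w and b are arbitrary example values
print(y_hat)                       # [36. 14. 38.]
```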

If we add a second input variable (the temperature), then the examples no longer lie on a plane. They are points in three-dimensional space. To approximate them, we can use the equivalent ...
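In the standard formulation of multiple linear regression, each input variable gets its own weight, and the prediction becomes a weighted sum of the inputs plus the bias. A sketch under that assumption (the weights and bias below are made-up illustration values, not the book's code):

```python
import numpy as np

# With two inputs, the line generalizes to a weighted sum plus a bias:
#   y_hat = x1 * w1 + x2 * w2 + b
def predict(x1, x2, w1, w2, b):
    return x1 * w1 + x2 * w2 + b

reservations = np.array([13, 2, 14])
temperature = np.array([26, 14, 20])

# w1, w2, and b are arbitrary example values, not trained parameters.
y_hat = predict(reservations, temperature, w1=1.5, w2=0.5, b=5.0)
print(y_hat)  # [37.5 15.  36. ]
```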
