Training

Learn and practice the linear regression model.

Implement training

Now we want to write code that implements the first part of linear regression: given a bunch of examples (X and Y), it finds a line with weight w that approximates them. Can you think of a way to do that? Feel free to stop reading for a minute and think about it. It's a fun problem to solve.

We might think that there is a simple way to find w using math. After all, there must be some formula that takes a list of points and comes up with a line that approximates them. We could Google that formula and maybe even find a library that implements it.

As it turns out, such a formula does exist, but we won't use it, because it would be a dead end. If we use a formula to approximate these points with a straight line, we'll get stuck later, when we tackle datasets that require twisty model functions. We'd do better to look for a more generic solution that works for any model.

So much for the mathematician's approach. Let's look at a programmer's approach instead.

How wrong are we?

Let's discuss one strategy to find the best line that approximates the examples. Imagine that we have a function that takes the examples (X and Y) and a line's weight (w), and measures the line's error. The better the line approximates the examples, the lower the error. With such a function, we could evaluate multiple lines until we find one with a low enough error.

Except that instead of "error," ML programmers have another name for this function: they call it the loss.
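The strategy above can be sketched in a few lines of Python. This is a brute-force illustration rather than the final method: it assumes a predict() that multiplies reservations by the weight, a mean-squared-error loss (one common choice of loss, an assumption here), and an arbitrary list of candidate weights:

```python
def predict(X, w):
    # A line through the origin: y_hat = X * w (assumed model).
    return X * w

def loss(X, Y, w):
    # Mean squared error over all examples (a common choice of loss).
    return sum((predict(x, w) - y) ** 2 for x, y in zip(X, Y)) / len(X)

X = [13, 2, 14, 23]   # reservations, from the examples in this section
Y = [33, 16, 32, 51]  # pizzas actually sold

# Evaluate several candidate lines and keep the one with the lowest loss.
candidates = [0.5, 1.0, 1.5, 2.0, 2.5]
best_w = min(candidates, key=lambda w: loss(X, Y, w))
print(best_w)  # prints 2.5
```

Brute-force search like this doesn't scale, but it captures the core idea: a loss function lets us compare lines and pick the best one we've tried.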

Here is how we can write a loss function. Assume that we have come up with a random value of w, say, 1.5.

Let's use this w to predict how many pizzas we'll sell if we have 14 reservations. Call predict(14, 1.5), and we get ŷ = 21 pizzas.
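For concreteness, a minimal predict() consistent with that number could look like this (assuming the line passes through the origin, so ŷ = X · w):

```python
def predict(X, w):
    # Prediction for a line through the origin: y_hat = X * w.
    return X * w

print(predict(14, 1.5))  # prints 21.0
```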

But here is the crucial point: this prediction does not match the ground truth, that is, the real-world examples from the pizza file. Look back at the first few examples:

Reservations  Pizzas
13            33
2             16
14            32
23            51

On the night with 14 reservations, Roberto sold 32 pizzas, not 21. So we can calculate an error that is the difference between the predicted value ŷ and the ground truth, the thick orange segment shown in the figure below.
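In code, the error on that single example is just the difference between prediction and ground truth. A sketch, assuming the predict() described earlier:

```python
def predict(X, w):
    # Line through the origin: y_hat = X * w (assumed model).
    return X * w

# Error on the single example with 14 reservations and 32 pizzas sold:
error = predict(14, 1.5) - 32
print(error)  # prints -11.0
```

A negative error means the line predicts fewer pizzas than Roberto actually sold.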
