Training
Learn and practice the linear regression model.
Implement training
Now we want to write code that implements the first part of linear regression: given a bunch of examples (X and Y), it finds a line with weight w that approximates them. Can you think of a way to do that? Feel free to stop reading for a minute and think about it. It’s a fun problem to solve.
We might think that there is a simple way to find w using math. After all, there must be some formula that takes a list of points and comes up with a line that approximates them. We could Google for that formula and maybe even find a library that implements it.
As it turns out, such a formula does indeed exist, but we won’t use it, because it would be a dead end. If we use a formula to approximate these points with a straight line, then we’ll get stuck later, when we tackle datasets that require twisty model functions. We’re better off looking for a more generic solution that works for any model.
Enough with the mathematician’s approach. Let’s look at a programmer’s approach instead.
How wrong are we?
Let’s discuss one strategy to find the best line that approximates the examples. Imagine that we have a function that takes the examples (X and Y) and a line’s weight w, and measures the line’s error. The better the line approximates the examples, the lower the error. If we have such a function, we can use it to evaluate multiple lines until we find a line with a low enough error.
Except that ML programmers have another name for this function: instead of error, they call it the loss.
Here is how we can write a loss function. Assume that we have come up with a random value of w; say, w = 1.5.
Let’s use this w to predict how many pizzas we’ll sell if we have 14 reservations. Call predict(14, 1.5), and we get ŷ = 21 pizzas.
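For concreteness, here is a minimal sketch of what a predict() like the one called above could look like, assuming the simplest possible model, a line through the origin (the actual implementation may differ in detail):

```python
def predict(X, w):
    # A line through the origin: y-hat = X * w.
    # X is the number of reservations, w is the line's weight.
    return X * w

print(predict(14, 1.5))  # => 21.0
```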
But the crucial point here is that this prediction does not match the ground truth: the real-world examples from the pizza file. Look back at the first few examples:
| Reservations | Pizzas |
|---|---|
| 13 | 33 |
| 2 | 16 |
| 14 | 32 |
| 23 | 51 |
On the night with 14 reservations, Roberto sold 32 pizzas, not 21. So we can calculate an error: the difference between the predicted value ŷ and the ground truth, shown as the thick orange segment in the figure below.
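To make the idea concrete, here is a hedged sketch of one way to turn that per-example error into a single loss number: square the errors (so positive and negative errors don’t cancel out) and average them over all examples. The array names X and Y below are assumptions for illustration, and mean squared error is just one common choice of loss:

```python
import numpy as np

def predict(X, w):
    # Same linear model as before: y-hat = X * w.
    return X * w

def loss(X, Y, w):
    # Mean squared error: the average of the squared
    # differences between predictions and ground truth.
    return np.average((predict(X, w) - Y) ** 2)

# The first few examples from the table above:
X = np.array([13, 2, 14, 23])
Y = np.array([33, 16, 32, 51])

# Evaluate a few candidate lines: the lower the loss,
# the better that line approximates the examples.
for w in [1.0, 1.5, 2.0, 2.5]:
    print(w, loss(X, Y, w))
```

Looping over many candidate values of w like this is exactly the strategy described earlier: evaluate multiple lines until we find one with a low enough error.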