Gradient Descent
Discover the math behind gradient descent and deepen your understanding by exploring graphical representations.
Background
Let's look for a better `train()` algorithm. The job of `train()` is to find the parameters that minimize the loss, so let's start by focusing on `loss()` itself:
```python
def loss(X, Y, w, b):
    return np.average((predict(X, w, b) - Y) ** 2)
```
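To see this function in action, here is a minimal, self-contained sketch. It assumes `predict()` is the linear model `X * w + b` introduced earlier in the book; the sample data is made up for illustration:

```python
import numpy as np

def predict(X, w, b):
    # assumed linear model from earlier in the chapter
    return X * w + b

def loss(X, Y, w, b):
    # mean squared error between predictions and labels
    return np.average((predict(X, w, b) - Y) ** 2)

# hypothetical data, chosen so that w=2, b=0 fits perfectly
X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])

print(loss(X, Y, 2.0, 0.0))  # → 0.0, a perfect fit
print(loss(X, Y, 1.0, 0.0) > 0)  # → True, any other w has higher loss
```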
Look at this function's arguments. The `X` and `Y` contain the input variables and the labels, so they never change from one call of `loss()` to the next. To make the discussion simple, let's also temporarily fix `b` at a constant value. So now the only variable is `w`.
How does the loss change as `w` changes? We put together a program that plots `loss()` over a range of `w` values and draws a green cross on its minimum value. Let's look at the following graph:
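A minimal sketch of such a sweep is shown below. The data values, the `w` range, and the fixed `b = 0` are assumptions for illustration; the book's actual program plots this curve and marks the minimum with a green cross:

```python
import numpy as np

def predict(X, w, b):
    # assumed linear model from earlier in the chapter
    return X * w + b

def loss(X, Y, w, b):
    # mean squared error between predictions and labels
    return np.average((predict(X, w, b) - Y) ** 2)

# hypothetical sample data, standing in for the book's dataset
X = np.array([13.0, 2.0, 14.0, 23.0, 13.0])
Y = np.array([33.0, 16.0, 32.0, 51.0, 27.0])

weights = np.linspace(-1.0, 4.0, 200)          # assumed sweep range
losses = [loss(X, Y, w, 0) for w in weights]   # b temporarily fixed at 0

# plotting weights vs. losses (e.g., with matplotlib) reproduces the curve;
# the minimum is where the green cross would go
best = weights[np.argmin(losses)]
print(f"loss is smallest near w = {best:.2f}")
```

The loss is a smooth curve with a single dip, which is exactly the shape that makes the minimum easy to spot on a graph.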