Gradient Descent
Let's discover the math behind gradient descent and deepen our understanding by exploring graphical representations.
Background
Let’s look for a better train() algorithm. The job of train() is to find the parameters that minimize the loss, so let’s start by focusing on loss() itself:
import numpy as np

def loss(X, Y, w, b):
    # Mean squared error: the average squared distance between predictions and labels
    return np.average((predict(X, w, b) - Y) ** 2)
Look at this function’s arguments. The X and Y contain the input variables and the labels, so they never change from one call of loss() to the next. To make the discussion simple, let’s also temporarily fix b at 0. So now the only variable is w.
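To make that concrete, here is a minimal sketch that calls loss() on a few made-up data points, assuming predict() is the simple linear model (X * w + b) introduced earlier; both the data and the value of w are hypothetical:

# Assumption: predict() is the linear model from the earlier chapters
def predict(X, w, b):
    return X * w + b

# Hypothetical data, just to exercise loss()
X = np.array([13.0, 2.0, 14.0])
Y = np.array([33.0, 16.0, 32.0])
print(loss(X, Y, w=1.0, b=0))   # the loss at w=1, with b fixed at 0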
How does the loss change as w changes? We put together a program that plots loss() for a range of w values and draws a green cross on its minimum value.
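A minimal sketch of what such a program might look like, assuming matplotlib is available and reusing the loss(), predict(), X, and Y defined above; the range of w swept below is our own choice for illustration:

import matplotlib.pyplot as plt

weights = np.linspace(-1.0, 4.0, 200)            # sweep w over a sample range
losses = [loss(X, Y, w, b=0) for w in weights]   # b stays fixed at 0

best = np.argmin(losses)                         # index of the smallest loss
plt.plot(weights, losses)
plt.plot(weights[best], losses[best], "gx", markersize=12)  # green cross at the minimum
plt.xlabel("w")
plt.ylabel("loss")
plt.show()

Let’s look at the resulting graph: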
Let’s call it the loss curve. The entire idea of train() is to find that marked spot at the bottom of the curve: the value of w that gives the minimum loss. At ...