...

Gradient Descent

Discover the math behind gradient descent and deepen your understanding by exploring graphical representations.

Background

Let’s look for a better train() algorithm. The job of train() is to find the parameters that minimize the loss, so let’s start by focusing on loss() itself:

def loss(X, Y, w, b):
  return np.average((predict(X, w, b) - Y) ** 2)
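To see loss() in action, here is a minimal, self-contained sketch. It assumes the linear predict() from earlier in the book and uses toy data invented for this example (not the book's dataset):

```python
import numpy as np

# The linear predict() from the previous chapter, repeated here
# so the example is self-contained.
def predict(X, w, b):
    return X * w + b

def loss(X, Y, w, b):
    return np.average((predict(X, w, b) - Y) ** 2)

# Toy data, purely for illustration: Y is exactly 2 * X.
X = np.array([1, 2, 3, 4])
Y = np.array([2, 4, 6, 8])

print(loss(X, Y, w=2, b=0))  # perfect fit: loss is 0.0
print(loss(X, Y, w=1, b=0))  # worse fit: loss is 7.5
```

Because X and Y never change, each call differs only in w and b, and the loss is simply a number that measures how badly a given (w, b) pair fits the data.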

Look at this function’s arguments. X and Y contain the input variables and the labels, so they never change from one call of loss() to the next. To make the discussion simple, let’s also temporarily fix b at 0. So now the only variable is w.

How does the loss change as w changes? We put together a program that plots loss() for w ranging from −1 to 4, and draws a green cross on its minimum value. Let’s look at the following graph:
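A program along those lines can be sketched as follows. The data here is a hypothetical stand-in for the chapter's dataset, and the range of 200 sample points is an arbitrary choice:

```python
import numpy as np
import matplotlib.pyplot as plt

# Same linear model and loss as in the text, repeated so the
# sketch runs on its own.
def predict(X, w, b):
    return X * w + b

def loss(X, Y, w, b):
    return np.average((predict(X, w, b) - Y) ** 2)

# Hypothetical toy data standing in for the chapter's dataset.
X = np.array([1, 2, 3, 4])
Y = np.array([2, 4, 6, 8])

# Evaluate the loss at many values of w, with b fixed at 0...
weights = np.linspace(-1, 4, 200)
losses = [loss(X, Y, w, 0) for w in weights]

# ...then plot the curve and mark its minimum with a green cross.
best = np.argmin(losses)
plt.plot(weights, losses)
plt.plot(weights[best], losses[best], "gX")
plt.xlabel("w")
plt.ylabel("loss")
plt.show()
```

Sampling the loss at many values of w and connecting the dots is what produces the curve in the graph; the green cross sits at the sampled w with the smallest loss.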

Let’s call it the loss curve. The entire idea of train() is to find that marked spot at the bottom of the curve. It is the value of w that gives the minimum loss. At w ...