Gradient Descent

Let's explore the math behind gradient descent and deepen our understanding with graphical representations.

Background

Let’s look for a better train() algorithm. The job of train() is to find the parameters that minimize the loss, so let’s start by focusing on loss() itself:

def loss(X, Y, w, b):
  # Mean squared error between the model's predictions and the labels
  return np.average((predict(X, w, b) - Y) ** 2)
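Here, predict() is the linear model from the earlier sections. A minimal sketch, assuming that form:

def predict(X, w, b):
  # Linear model: weight times input plus bias (assumed form)
  return X * w + b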

Look at this function’s arguments. X and Y contain the input variables and the labels, so they never change from one call of loss() to the next. To make the discussion simple, let’s also temporarily fix b at 0. So now the only variable is w.

How does the loss change as w changes? We put together a program that plots loss() for w ranging from -1 to 4, and draws a green cross on its minimum value. Let’s look at the following graph:
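The plotting program itself isn’t listed here, but the following is a rough sketch of how such a plot could be produced with matplotlib. The data X and Y below are hypothetical stand-ins for the real dataset, and predict() is assumed to be the linear model sketched above:

import numpy as np
import matplotlib.pyplot as plt

def predict(X, w, b):
    return X * w + b

def loss(X, Y, w, b):
    return np.average((predict(X, w, b) - Y) ** 2)

# Hypothetical stand-in data; the original program uses the dataset loaded earlier.
rng = np.random.default_rng(0)
X = np.linspace(0, 30, 30)
Y = 2 * X + 10 + rng.normal(0, 3, size=X.shape)

# Sweep w over [-1, 4] with b fixed at 0, recording the loss at each value.
weights = np.linspace(-1, 4, 200)
losses = [loss(X, Y, w, 0) for w in weights]

# Plot the loss curve and mark its minimum with a green cross.
plt.plot(weights, losses)
best = np.argmin(losses)
plt.scatter(weights[best], losses[best], marker="x", color="green", s=100, zorder=3)
plt.xlabel("w")
plt.ylabel("loss")
plt.show()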