Use Gradient Descent to Update Weights

Learn how to design the function that we will pass to the gradient descent algorithm to update the weights in our network.

The calculus behind error minimization

To do gradient descent, we need to work out the slope of the error function with respect to the weights. This requires calculus. Calculus is simply a mathematically precise way of working out how something changes when something else does. For example, we could calculate how the length of a spring changes as the force used to stretch it changes. Here, we’re interested in how the error function depends on the link weights inside a neural network. Another way of asking this is, “How sensitive is the error to changes in the link weights?”

Let’s start with a picture, because that helps us visualize what we are trying to achieve.

Figure: Finding the correct minimum

The graph is just like the one we saw before. We’re not doing anything different. This time, the function we’re trying to minimize is the neural network’s error. The parameter we’re trying to refine is a network link weight. In this simple example, we’ve only shown one weight, but we know neural networks will have many more.
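To make the single-weight case concrete, here is a minimal sketch of gradient descent nudging one weight downhill. The error function, starting weight, and learning rate are illustrative assumptions, not values from the lesson:

```python
# A minimal gradient-descent sketch for a single weight.
# The error function E(w) = (w - 3)^2 is an illustrative stand-in for a
# network's error; its slope dE/dw = 2 * (w - 3) is known exactly here.

def error(w):
    return (w - 3.0) ** 2

def slope(w):
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting weight (an assumed value)
learning_rate = 0.1  # step size (an assumed value)

for step in range(50):
    w -= learning_rate * slope(w)   # step against the slope, i.e. downhill

print(round(w, 4))   # close to 3.0, the weight that minimizes this toy error
```

Each step moves the weight a small amount in the direction that reduces the error, which is exactly what the graph above shows geometrically.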

The next diagram shows two link weights, and this time the error function is a three-dimensional surface that varies as the two link weights vary. We can see we’re trying to minimize the error, which is now more like a mountainous landscape with a valley.

Figure: Updating the weights based on error minimization
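The same idea extends to two weights. The sketch below descends a simple bowl-shaped toy surface; the surface, starting point, and learning rate are assumptions made for illustration:

```python
import numpy as np

# A toy two-weight error surface: a bowl whose lowest point is at (1, -2).
def error(w):
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def gradient(w):
    # Partial derivatives of the toy error with respect to each weight.
    return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

w = np.array([4.0, 3.0])  # arbitrary starting point on the surface
learning_rate = 0.1

for step in range(100):
    w = w - learning_rate * gradient(w)  # step downhill along both weights at once

print(np.round(w, 4))  # close to [1, -2], the bottom of the valley
```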

It’s harder to visualize the error surface as a function of many more parameters, but the idea of using gradient descent to find the minimum is still the same.

Let’s mathematically write out what we want:

$$\frac{\partial E}{\partial w_{jk}}$$

That is, how does the error $E$ change as the weight $w_{jk}$ changes?
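One way to build intuition for this quantity is to estimate it numerically: nudge a single weight by a tiny amount and see how much the error moves. The sketch below does this for a small made-up network with one sigmoid output node; the inputs, weights, and target are illustrative assumptions, not values from the lesson:

```python
import numpy as np

# A tiny made-up network: one output node with a sigmoid activation.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def network_error(weights, inputs, target):
    output = sigmoid(np.dot(weights, inputs))
    return (target - output) ** 2

inputs  = np.array([0.5, 0.8])   # assumed example inputs
weights = np.array([0.2, -0.4])  # assumed example link weights
target  = 0.9                    # assumed desired output

# Central-difference estimate of dE/dw for the first weight:
# nudge the weight slightly up and down and compare the errors.
eps = 1e-6
w_plus, w_minus = weights.copy(), weights.copy()
w_plus[0]  += eps
w_minus[0] -= eps
dE_dw0 = (network_error(w_plus, inputs, target)
          - network_error(w_minus, inputs, target)) / (2 * eps)

print(dE_dw0)  # negative here: increasing this weight would reduce the error
```

The sign and size of this estimate tell us which way, and how far, gradient descent should move that weight, which is precisely what the slope $\frac{\partial E}{\partial w_{jk}}$ captures.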
