Use Gradient Descent to Update Weights
Learn how to design the function that we'll pass to the gradient descent algorithm to update the weights in our network.
The calculus behind error minimization
To do gradient descent, we need to work out the slope of the error function with respect to the weights. This requires calculus. Calculus is simply a mathematically precise way of working out how something changes when something else does. For example, we could calculate how the length of a spring changes as the force used to stretch it changes. Here, we’re interested in how the error function depends on the link weights inside a neural network. Another way of asking this is, “How sensitive is the error to changes in the link weights?”
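That sensitivity question can be probed numerically before doing any calculus. The sketch below assumes a made-up one-weight "network" (output = sigmoid(w · x), squared error against a target); the input, target, and starting weight are all invented for illustration. A finite-difference quotient approximates the slope of the error with respect to the weight.

```python
# Numerically probing "how sensitive is the error to a change in a weight?"
# This tiny one-weight "network" is invented for illustration:
#   output = sigmoid(w * x),  error = (target - output) ** 2
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def network_error(w, x=1.5, target=0.25):
    output = sigmoid(w * x)
    return (target - output) ** 2

w = 0.8     # arbitrary current weight
h = 1e-6    # small nudge to the weight

# slope ~ (change in error) / (change in weight)
slope = (network_error(w + h) - network_error(w - h)) / (2 * h)
print(slope)  # a positive slope here: nudging w upward increases the error
```

A positive slope tells us to decrease the weight; a negative slope tells us to increase it. That is the entire signal gradient descent needs.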
Let’s start with a picture, because that helps us visualize what we’re trying to achieve.
The graph is just like the one we saw before. We’re not doing anything different. This time, the function we’re trying to minimize is the neural network’s error. The parameter we’re trying to refine is a network link weight. In this simple example, we’ve only shown one weight, but we know neural networks will have many more.
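To make the one-weight picture concrete, here is a minimal sketch of gradient descent on a made-up quadratic error curve. E(w) = (w − 2)² simply stands in for a network's error, and its slope dE/dw = 2(w − 2) is known exactly, so we can watch the update rule walk downhill.

```python
# Gradient descent on a single weight, using an invented error curve
# E(w) = (w - 2)**2 as a stand-in for a network's error.

def error(w):
    return (w - 2.0) ** 2

def slope(w):
    return 2.0 * (w - 2.0)   # exact derivative of the toy error curve

w = 0.0              # arbitrary starting weight
learning_rate = 0.1

for _ in range(100):
    w -= learning_rate * slope(w)   # step downhill, against the slope

print(round(w, 4))   # converges toward the minimum at w = 2
```

The update `w -= learning_rate * slope(w)` is the whole algorithm: the sign of the slope picks the direction, and the learning rate controls the step size.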
The next diagram shows two link weights, and this time the error function is a three-dimensional surface that varies as the two link weights vary. We can see we’re trying to minimize the error, which is now more like a mountainous landscape with a valley.
It’s harder to visualize the error surface as a function of many more parameters, but the idea of using gradient descent to find the minimum is still the same.
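The same update rule carries over unchanged to any number of weights. The sketch below assumes a toy quadratic error surface over three weights, E(w) = Σ(wᵢ − targetᵢ)², with an invented target vector; a real network's surface is far bumpier, but every weight still steps downhill along its own component of the gradient.

```python
import numpy as np

# A made-up quadratic error surface over several weights at once:
#   E(w) = sum((w - target)**2), gradient = 2 * (w - target)
target = np.array([0.5, -1.0, 2.0])   # invented minimum location

def error(w):
    return np.sum((w - target) ** 2)

def gradient(w):
    return 2.0 * (w - target)

w = np.zeros(3)      # arbitrary starting weights
learning_rate = 0.1

for _ in range(200):
    w -= learning_rate * gradient(w)  # all weights step downhill together

print(np.round(w, 4))  # approaches the target vector
```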
Let’s mathematically write out what we want: the slope of the error E with respect to a link weight w_jk,

∂E / ∂w_jk
That is, how does the error change as the weight ...