Choose the Right Weights Iteratively
Derive a simplified expression for error differentiation using the sigmoid function to find the right weights.
Differentiate the error
Choosing the right weights directly is too difficult. An alternative approach is to improve the weights iteratively by descending the error function in small steps. Each step is taken in the direction of the steepest downward slope from our current position.
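The descent idea above can be sketched in a few lines of Python. This is a minimal illustration, not the network code itself: the error function `E(w) = (w - 3)**2`, its slope, the starting weight, and the learning rate are all made-up values chosen so the minimum is easy to see.

```python
# Minimal sketch of iterative descent: repeatedly step a weight
# in the direction of the steepest downward slope of the error.

def error(w):
    # illustrative error function with its minimum at w = 3
    return (w - 3.0) ** 2

def error_slope(w):
    # derivative dE/dw = 2 * (w - 3)
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting weight
learning_rate = 0.1  # size of each small step

for _ in range(100):
    w -= learning_rate * error_slope(w)  # step downhill

print(round(w, 3))  # w approaches 3, the minimum of the error
```

Each update moves the weight a fraction of the local slope, so the steps shrink automatically as the minimum is approached.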
This means that the error function doesn't need to sum over all the output nodes in the first place. The reason is that the output of a node depends only on its connected links and hence on their weights. This fact is sometimes glossed over, and the simplified error function is often stated without explanation.
Here is our simpler expression:

$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \left( t_k - o_k \right)^2$$
Now, we will do a bit of calculus.
The $t_k$ part is a constant, so it doesn't vary as $w_{jk}$ varies; that is, $t_k$ isn't a function of $w_{jk}$. If we think about it, it would be really strange if the truth examples providing the target values changed depending on the weights. That leaves the $o_k$ part, which we know depends on $w_{jk}$ because the weights are used to feed the signal forward to become the outputs $o_k$.
We’ll use the chain rule to break this differentiation task into more manageable pieces:
$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial o_k} \times \frac{\partial o_k}{\partial w_{jk}}$$