Apply Backpropagation
Learn how to calculate the gradients of the weights when applying backpropagation.
Three-layered network
The three-layered network in the form of a computational graph is as follows:
All the variables in the preceding graph are matrices, with the exception of the loss L, which is a single number. The symbol σ represents the sigmoid. We combine the softmax and the cross-entropy loss into one operation. We need a name for that operation, so we temporarily call it softmax and loss (SML). Finally, we give the names a and b to the outputs of the two matrix multiplications.
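Spelled out as formulas, one per node of the graph (a sketch using the names just introduced; the letters a and b for the intermediate products and Y for the label matrix are notational assumptions):

```latex
a = X w_1 \qquad h = \sigma(a) \qquad b = h\,w_2 \qquad L = \operatorname{SML}(b, Y)
```

Here, SML(b, Y) first applies the softmax to b and then takes the cross-entropy of the result against the labels Y.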
The diagram shown earlier represents the same neural network that we designed and built in the previous two chapters. Let’s follow its operations from left to right:
- X and w1 get multiplied, and the result a is passed through the sigmoid, producing the hidden layer h.
- Then, h is multiplied by w2, and the result b is passed through the softmax and the loss function, producing the loss L (see the sketch right after this list).
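Here is a minimal NumPy sketch of that forward pass. It is not the book's exact code: the function names, the separate `cross_entropy_loss` helper, and the omission of the bias handling from the previous chapters are all simplifying assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    exponentials = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exponentials / exponentials.sum(axis=1, keepdims=True)

def forward(X, w1, w2):
    a = np.matmul(X, w1)   # first matrix multiplication
    h = sigmoid(a)         # hidden layer
    b = np.matmul(h, w2)   # second matrix multiplication
    y_hat = softmax(b)     # the softmax half of the SML node
    return y_hat, h

def cross_entropy_loss(Y, y_hat):
    # The loss half of the SML node, averaged over the examples.
    return -np.sum(Y * np.log(y_hat)) / Y.shape[0]
```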
We learned backpropagation so that we can calculate the gradients of L with respect to w1 and w2. To apply the chain rule, we need the local gradients on the paths back from L to w1 and w2.
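Those local gradients turn out to be short formulas: the combined SML node's gradient with respect to b simplifies to ŷ − Y (averaged over the examples), and the sigmoid's local gradient is h·(1 − h). The following NumPy sketch chains them together under the same assumptions as the forward pass above:

```python
import numpy as np

def backward(X, Y, y_hat, h, w2):
    # Local gradient of the combined softmax-and-loss (SML) node:
    # with cross-entropy averaged over the examples, dL/db = (y_hat - Y) / m.
    grad_b = (y_hat - Y) / X.shape[0]

    # Chain rule through the second matrix multiplication: dL/dw2 = h^T . dL/db
    w2_gradient = np.matmul(h.T, grad_b)

    # Gradient flowing back into the hidden layer: dL/dh = dL/db . w2^T
    grad_h = np.matmul(grad_b, w2.T)

    # Local gradient of the sigmoid: dh/da = h * (1 - h), applied elementwise.
    grad_a = grad_h * h * (1 - h)

    # Chain rule through the first matrix multiplication: dL/dw1 = X^T . dL/da
    w1_gradient = np.matmul(X.T, grad_a)

    return w1_gradient, w2_gradient
```

Multiplying by the transposed matrices is how the chain rule plays out across a matrix multiplication: each gradient must come out with the same shape as the variable it belongs to, so w1_gradient matches w1 and w2_gradient matches w2.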
...