Apply Backpropagation

Learn how to calculate the gradients of the weights by applying backpropagation.

Three-layered network

The three-layered network in the form of a computational graph is as follows:

All the variables in the preceding graph are matrices, with the exception of the loss $L$. The $\sigma$ symbol represents the sigmoid. We combine the softmax and the cross-entropy loss into one operation. We need a name for that operation, so we temporarily call it "softmax and loss" (SML). Finally, we give the names $a$ and $b$ to the outputs of the two matrix multiplications.
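To make the SML operation concrete, here is a minimal NumPy sketch of softmax fused with cross-entropy. The function name `sml`, the row-wise max subtraction for numerical stability, and the averaging of the loss over the rows of $b$ are our own illustrative choices, not details from the original chapters.

```python
import numpy as np

def sml(b, y):
    """Softmax and loss (SML): softmax over the logits b, followed by
    cross-entropy against the one-hot labels y. Returns the scalar loss."""
    # Subtract the row-wise max before exponentiating for numerical stability
    exponentials = np.exp(b - b.max(axis=1, keepdims=True))
    softmax = exponentials / exponentials.sum(axis=1, keepdims=True)
    # Average the cross-entropy over the rows (one row per example)
    return -np.sum(y * np.log(softmax)) / b.shape[0]
```

Fusing the two steps pays off during backpropagation: the gradient of the combined operation with respect to $b$ has a much simpler form than the two separate local gradients.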

The diagram shown earlier represents the same neural network that we designed and built in the previous two chapters. Let’s follow its operations from left to right:

  • $x$ and $w_1$ get multiplied, producing $a$, which is passed through the sigmoid to produce the hidden layer $h$.
  • Then, $h$ is multiplied by $w_2$, producing $b$, which is passed through the softmax and the loss function, producing the loss $L$ (see the sketch after this list).
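Putting the two bullet points together, here is a hedged NumPy sketch of the whole forward pass. The function names and the absence of bias terms are assumptions made for illustration; the network built in the previous chapters may differ in those details.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sml(b, y):
    # Softmax and loss (SML), as sketched above
    exponentials = np.exp(b - b.max(axis=1, keepdims=True))
    softmax = exponentials / exponentials.sum(axis=1, keepdims=True)
    return -np.sum(y * np.log(softmax)) / b.shape[0]

def forward(x, w1, w2, y):
    a = np.matmul(x, w1)   # first matrix multiplication
    h = sigmoid(a)         # hidden layer
    b = np.matmul(h, w2)   # second matrix multiplication
    return sml(b, y)       # the loss L
```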

We learned backpropagation so we can calculate the gradients of $L$ with respect to $w_1$ and $w_2$. To apply the chain rule, we need the local gradients on the paths back from $L$ to $w_1$ and $w_2$.

$\sigma'$, the local gradient of the sigmoid, is $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ ...
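Here is a sketch of the corresponding backward pass, under the same assumptions as the forward-pass sketch above. It relies on two standard local gradients: the gradient of the fused softmax-and-cross-entropy with respect to its input $b$ is $(\mathrm{softmax}(b) - y)/n$ when the loss is averaged over $n$ rows, and the sigmoid's derivative can be written in terms of its output as $h\,(1 - h)$, where $h = \sigma(a)$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradients(x, w1, w2, y):
    # Forward pass, keeping the intermediate values a, h, and b
    a = np.matmul(x, w1)
    h = sigmoid(a)
    b = np.matmul(h, w2)
    exponentials = np.exp(b - b.max(axis=1, keepdims=True))
    softmax = exponentials / exponentials.sum(axis=1, keepdims=True)

    # Backward pass: chain the local gradients from L back to w1 and w2.
    # Local gradient of SML with respect to b (loss averaged over n rows):
    dL_db = (softmax - y) / x.shape[0]
    # b = h @ w2, so the gradient of L with respect to w2 is:
    dL_dw2 = np.matmul(h.T, dL_db)
    # Continue back through h and the sigmoid (sigma' = h * (1 - h)):
    dL_dh = np.matmul(dL_db, w2.T)
    dL_da = dL_dh * h * (1 - h)
    # a = x @ w1, so the gradient of L with respect to w1 is:
    dL_dw1 = np.matmul(x.T, dL_da)
    return dL_dw1, dL_dw2
```

A quick way to sanity-check a sketch like this is a numerical gradient check: perturb one entry of $w_1$ or $w_2$ by a small amount, recompute the loss, and compare the finite difference with the corresponding entry of the returned gradient.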