For training RNNs, a special form of backpropagation known as backpropagation through time (BPTT) is used. To understand BPTT, we first need to understand how backpropagation (BP) works. Then, we'll discuss why BP can't be applied directly to RNNs and how it can be adapted for them, resulting in BPTT. Finally, we'll discuss two major problems present in BPTT.

How backpropagation works

Backpropagation is the technique that’s used to train a feed-forward neural network. In backpropagation, we do the following:

  1. Calculate a prediction for a given input.

  2. Calculate an error, $E$, of the prediction by comparing it to the actual label of the input (using, for example, mean squared error or cross-entropy loss).

  3. Update the weights of the feed-forward network to minimize the error calculated in step 2 by taking a small step in the opposite direction of the gradient $\frac{\partial E}{\partial w_{ij}}$ for all $w_{ij}$, where $w_{ij}$ is the $j^{th}$ weight of the $i^{th}$ layer.
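The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production training loop: the data, layer sizes, learning rate, and the use of purely linear layers with a mean-squared-error loss are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))         # 4 samples, 3 input features (made-up data)
t = rng.normal(size=(4, 1))         # target values (made-up data)
W1 = rng.normal(size=(3, 5)) * 0.1  # weights of layer 1
W2 = rng.normal(size=(5, 1)) * 0.1  # weights of layer 2
lr = 0.01                           # learning rate (the "small step")

losses = []
for _ in range(200):
    # Step 1: forward pass -- calculate a prediction for the input.
    h = x @ W1                      # hidden activations (linear, for simplicity)
    y = h @ W2                      # prediction

    # Step 2: calculate the error E (mean squared error here).
    E = np.mean((y - t) ** 2)
    losses.append(E)

    # Step 3: backward pass -- gradients dE/dW via the chain rule,
    # then a small step in the opposite direction of each gradient.
    dy = 2 * (y - t) / len(x)       # dE/dy
    dW2 = h.T @ dy                  # dE/dW2
    dW1 = x.T @ (dy @ W2.T)         # dE/dW1
    W2 -= lr * dW2
    W1 -= lr * dW1
```

Running the loop, the recorded error shrinks over the iterations, which is exactly what step 3 is designed to achieve.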

To understand the computations above more clearly, consider the feed-forward network depicted in the figure below. It has two weights, $w_1$ and $w_2$, and calculates two outputs, $h$ and $y$. We assume no nonlinearities in the model for simplicity:
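The figure's network can be written out numerically. Assuming the wiring implied by the description, $h = w_1 x$ and $y = w_2 h$ with a squared-error loss $E = (y - t)^2$; the input $x$, target $t$, and weight values below are made-up numbers for illustration.

```python
# Made-up values for the two-weight linear network: h = w1*x, y = w2*h.
x, t = 1.0, 2.0
w1, w2 = 0.5, 0.8

# Forward pass (step 1) and error (step 2).
h = w1 * x           # h = 0.5
y = w2 * h           # y = 0.4
E = (y - t) ** 2     # squared error

# Backward pass (step 3) via the chain rule:
#   dE/dy  = 2*(y - t)
#   dE/dw2 = dE/dy * dy/dw2 = dE/dy * h
#   dE/dw1 = dE/dy * dy/dh * dh/dw1 = dE/dy * w2 * x
dE_dy = 2 * (y - t)
dE_dw2 = dE_dy * h
dE_dw1 = dE_dy * w2 * x
```

Both gradients come out negative here, so the weight update in step 3 would increase $w_1$ and $w_2$, pushing the prediction $y$ toward the target.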
