To train RNNs, a special form of backpropagation known as backpropagation through time (BPTT) is used. To understand BPTT, we first need to understand how standard backpropagation (BP) works. Then, we'll discuss why BP can't be applied directly to RNNs and how it can be adapted for them, resulting in BPTT. Finally, we'll discuss two major problems present in BPTT.

How backpropagation works

Backpropagation is the technique that’s used to train a feed-forward neural network. In backpropagation, we do the following:

  1. Calculate a prediction for a given input.

  2. Calculate an error, $E$, of the prediction by comparing it to the actual label of the input (for example, using mean squared error or cross-entropy loss).

  3. Update the weights of the feed-forward network to minimize the error calculated in step 2 by taking a small step in the opposite direction of the gradient $\frac{\partial E}{\partial w_{ij}}$ for all $w_{ij}$, where $w_{ij}$ is the $j^{th}$ weight of the $i^{th}$ layer (the update rule is written out below).

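Concretely, step 3 amounts to the standard gradient-descent update. Here $\alpha$ denotes the learning rate, a symbol we introduce just for this illustration:

$$w_{ij} \leftarrow w_{ij} - \alpha \, \frac{\partial E}{\partial w_{ij}}$$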
To understand the computations above more clearly, consider the feed-forward network depicted in the figure below. It has two weights, $w_1$ and $w_2$, and computes two outputs, $h$ and $y$. For simplicity, we assume the model has no nonlinearities:

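To make the three steps concrete in code, here is a minimal sketch of a single training step for this network. It assumes the network computes $h = w_1 x$ and $y = w_2 h$ (a reasonable reading of a two-weight linear network, though the exact wiring is defined by the figure) and uses a squared-error loss; the specific numbers are placeholders:

```python
# Minimal sketch of one backpropagation step for the two-weight network.
# Assumption (not stated in the text): h = w1 * x and y = w2 * h,
# with a squared-error loss E = (y - t)^2.

# Hypothetical starting values
w1, w2 = 0.5, -0.3   # the two weights
x, t = 1.0, 2.0      # one input and its target label
alpha = 0.01         # learning rate

# Step 1: forward pass -- calculate the prediction
h = w1 * x
y = w2 * h

# Step 2: calculate the error of the prediction
E = (y - t) ** 2

# Step 3: backpropagate the error using the chain rule
dE_dy = 2 * (y - t)      # dE/dy
dE_dw2 = dE_dy * h       # dE/dw2 = dE/dy * dy/dw2
dE_dh = dE_dy * w2       # dE/dh  = dE/dy * dy/dh
dE_dw1 = dE_dh * x       # dE/dw1 = dE/dh * dh/dw1

# Gradient-descent update: a small step against the gradient
w1 -= alpha * dE_dw1
w2 -= alpha * dE_dw2
```

Repeating these three steps over many inputs gradually reduces the error, which is all that training a feed-forward network amounts to.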