Backpropagation Through Time
For training RNNs, a special form of backpropagation known as Backpropagation Through Time (BPTT) is used. To understand BPTT, we first need to understand how standard backpropagation (BP) works. We will then discuss why BP cannot be applied directly to RNNs and how it can be adapted for them, resulting in BPTT. Finally, we will discuss two major problems present in BPTT.
How backpropagation works
Backpropagation is the technique used to train a feed-forward neural network. In backpropagation, you do the following:
1. Calculate a prediction for a given input
2. Calculate an error, E, of the prediction by comparing it to the actual label of the input (for example, using mean squared error or cross-entropy loss)
3. Update the weights of the feed-forward network to minimize the error calculated in step 2, by taking a small step in the opposite direction of the gradient for every weight w_ij, where w_ij is the jth weight of the ith layer (see the sketch after this list)
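The following is a minimal sketch of these three steps for a tiny one-hidden-layer feed-forward network, written in plain NumPy. All names (the layer sizes, the tanh activation, the learning rate) are illustrative assumptions, not the chapter's own code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples, 3 input features, 1 target value each
x = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# Weights of the two layers (biases omitted for brevity)
w1 = rng.normal(size=(3, 5)) * 0.1
w2 = rng.normal(size=(5, 1)) * 0.1
learning_rate = 0.01

for step in range(100):
    # Step 1: forward pass - calculate a prediction for the input
    h = np.tanh(x @ w1)          # hidden activations
    y_hat = h @ w2               # prediction

    # Step 2: calculate the error E (here: mean squared error)
    error = np.mean((y_hat - y) ** 2)

    # Step 3: backpropagate dE/dw for every weight and take a small
    # step in the opposite direction of the gradient
    d_y_hat = 2 * (y_hat - y) / len(x)   # dE/dy_hat
    d_w2 = h.T @ d_y_hat                 # dE/dw2
    d_h = d_y_hat @ w2.T                 # dE/dh
    d_w1 = x.T @ (d_h * (1 - h ** 2))    # dE/dw1 (tanh derivative)

    w1 -= learning_rate * d_w1
    w2 -= learning_rate * d_w2
```

Because the network is feed-forward, each weight appears exactly once on the path from input to output, so the chain rule yields a single, well-defined gradient per weight; this is the property that breaks down once recurrent connections are introduced.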
To understand this more clearly, consider the feed-forward...