Back Propagation Through Time (BPTT)
There are many types of sequential models. You've already used simple RNNs, deep RNNs, and LSTMs. Let's take a look at a couple of additional models used for NLP.
Remember that you trained feed-forward models by first making a forward pass through the network, from input to output. This is the standard feed-forward architecture, where the layers are densely connected. To train this kind of model, you backpropagate the gradients through the network, taking the derivative of the loss with respect to each weight parameter. Then, you adjust the parameters to minimize the loss.
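To make this concrete, here is a minimal sketch of that training loop for a single dense layer with a squared-error loss. All names here (X, W, lr, and the toy data) are illustrative, not taken from the lesson:

```python
import numpy as np

# Hypothetical toy data: targets come from a known linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))               # 8 examples, 3 features
true_W = np.array([[1.0], [-2.0], [0.5]])
y = X @ true_W                            # targets

W = np.zeros((3, 1))                      # weight parameters to learn
lr = 0.1
for _ in range(500):
    y_hat = X @ W                         # forward pass: input -> output
    loss = np.mean((y_hat - y) ** 2)      # scalar loss
    grad_W = 2 * X.T @ (y_hat - y) / len(X)  # dLoss/dW via backprop
    W -= lr * grad_W                      # adjust parameters to minimize loss

print(float(loss))
```

With no noise in the targets, gradient descent drives the loss toward zero and W toward the generating weights.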
But in RNNs, as discussed earlier, the forward pass through the network also goes forward in time: at each time step, the cell state is updated based on the input and the previous state, and an output, Y, is generated. You compute a loss at each time step, and summing these individual losses gives your total loss.
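The forward pass described above can be sketched as follows. The weight names (Wxh, Whh, Why) and the tanh cell are illustrative assumptions, a simple vanilla RNN rather than any specific model from the lesson:

```python
import numpy as np

# Illustrative vanilla RNN forward pass through time.
rng = np.random.default_rng(1)
T, input_dim, hidden_dim = 4, 3, 5
xs = rng.normal(size=(T, input_dim))        # one input vector per time step
targets = rng.normal(size=(T, 1))           # one target output per time step

Wxh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
Whh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
Why = rng.normal(size=(1, hidden_dim)) * 0.1

h = np.zeros((hidden_dim, 1))               # initial state
total_loss = 0.0
for t in range(T):
    x = xs[t].reshape(-1, 1)
    h = np.tanh(Wxh @ x + Whh @ h)          # update state from input and previous state
    y = Why @ h                             # output Y at this time step
    total_loss += float((y - targets[t]) ** 2)  # per-step loss, summed into the total

print(total_loss)
```

Note that each hidden state depends on the previous one, which is exactly why the backward pass must also traverse the time steps.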
This means that backpropagating the gradients also requires moving backward through time, from the final time step back to the first, which is why this procedure is called backpropagation through time (BPTT).
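A minimal BPTT sketch, under the same illustrative vanilla-RNN assumptions as above (tanh cell, squared-error loss, weight names Wxh, Whh, Why chosen for this example): run forward through time caching the hidden states, then walk backward from the last step to the first, accumulating each weight's gradient across every time step.

```python
import numpy as np

rng = np.random.default_rng(2)
T, D, H = 4, 3, 5
xs = rng.normal(size=(T, D, 1))           # inputs, one column vector per step
ts = rng.normal(size=(T, 1, 1))           # targets, one per step
Wxh = rng.normal(size=(H, D)) * 0.1
Whh = rng.normal(size=(H, H)) * 0.1
Why = rng.normal(size=(1, H)) * 0.1

# Forward pass through time, caching hidden states for the backward pass.
hs = [np.zeros((H, 1))]
ys, loss = [], 0.0
for t in range(T):
    h = np.tanh(Wxh @ xs[t] + Whh @ hs[-1])
    hs.append(h)
    ys.append(Why @ h)
    loss += float((ys[t] - ts[t]) ** 2)

# Backward pass through time: gradients flow from step T-1 back to step 0.
dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
dh_next = np.zeros((H, 1))                # gradient arriving from future steps
for t in reversed(range(T)):
    dy = 2 * (ys[t] - ts[t])              # dLoss/dy at this step
    dWhy += dy @ hs[t + 1].T
    dh = Why.T @ dy + dh_next             # local gradient + gradient from later steps
    dpre = (1 - hs[t + 1] ** 2) * dh      # backprop through tanh
    dWxh += dpre @ xs[t].T
    dWhh += dpre @ hs[t].T                # hs[t] is the previous hidden state
    dh_next = Whh.T @ dpre                # pass gradient back one more step in time
```

Because dWxh, dWhh, and dWhy each sum contributions from every time step, a long sequence makes these products of Jacobians long as well, which is the root of the vanishing- and exploding-gradient issues in RNN training.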