Initially proposed by Hochreiter and Schmidhuber, Long Short-Term Memory models (LSTMs) gained traction as an improved version of recurrent models [Hochreiter, S., et al. (1997)]. LSTMs promised to alleviate the following problems associated with traditional RNNs:
- Vanishing gradients
- Exploding gradients
- The inability to remember or forget certain aspects of the input sequences
The following diagram shows a very simplified version of an LSTM. In (b), we can see the additional self-loop that is attached to some memory, and in (c), we can observe what the network looks like when unfolded or expanded:
There is much more to the model, but its most essential elements are shown in Figure 13.6. Observe how an LSTM layer receives not only the previous output from the previous time step, but also something called the state, which acts as a type of memory. In the diagram, you can see that while the current output and state are available...
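To make the output-versus-state distinction concrete, here is a minimal sketch assuming a TensorFlow/Keras setup (this is not the book's exact code): by setting return_state=True, a Keras LSTM layer returns its per-time-step outputs along with the final hidden state and the final cell state, the latter being the "memory" carried from one time step to the next. The input shapes are hypothetical toy values.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy batch: 4 sequences, 10 time steps, 8 features each
x = np.random.rand(4, 10, 8).astype("float32")

# return_sequences=True returns the output at every time step;
# return_state=True also returns the final hidden state (h) and cell state (c)
lstm = tf.keras.layers.LSTM(units=16, return_sequences=True, return_state=True)

outputs, h_state, c_state = lstm(x)

print(outputs.shape)  # (4, 10, 16): one output vector per time step
print(h_state.shape)  # (4, 16): last hidden state (equal to the last output)
print(c_state.shape)  # (4, 16): last cell state, the layer's internal memory
```

During training you would normally let Keras manage the state internally; exposing it as above is mainly useful for inspection or for encoder-decoder architectures where the state is passed to another layer.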