In the last chapter, we saw how recurrent loops, information gates, and memory cells can be used to model complex time-dependent signals with neural networks. More specifically, we saw how the Long Short-Term Memory (LSTM) architecture leverages these mechanisms to preserve prediction errors and backpropagate them across long spans of time steps. This allowed our system to inform its predictions using both short-term representations (that is, information relating to the immediate environment) and long-term representations (that is, information about the environment that was observed long ago).
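To make those gating mechanics concrete, here is a minimal NumPy sketch of a single LSTM time step. It is an illustrative implementation, not the chapter's own code: the function name lstm_step, the stacked weight layout, and the toy dimensions are assumptions chosen for clarity. The input, forget, and output gates decide what to write into, keep in, and read out of the memory cell:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (illustrative sketch).

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    with the input, forget, output, and candidate blocks stacked together.
    """
    z = W @ x + U @ h_prev + b                    # all four pre-activations at once
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g                        # memory cell: additive update
    h = o * np.tanh(c)                            # hidden state exposed downstream
    return h, c

# Hypothetical toy setup: 50 time steps of an 8-feature signal, 16 hidden units
D, H = 8, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((50, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

Note how the cell state is updated additively (f * c_prev + i * g) rather than being repeatedly squashed through a nonlinearity; this additive path is what lets error signals flow back over many time steps without vanishing.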
The beauty of the LSTM lies in its ability to learn and preserve useful representations over very long periods of time (up to a thousand time steps). By maintaining a constant error flow through the architecture, we can implement a...