As we saw previously, regular RNNs suffer from vanishing and exploding gradients. As a result, it can be hard to teach them long-term relations in sequences of data. Moreover, they store all their information in a single hidden state. For instance, if a gunshot occurs at the very beginning of a very long video, the RNN's hidden state will likely be overwritten by noise by the time it reaches the end of the video, and the video might not be classified as violent.
To circumvent these two problems, Sepp Hochreiter and Jürgen Schmidhuber proposed a variant of the basic RNN, the Long Short-Term Memory (LSTM) cell, in their paper Long Short-Term Memory (Neural Computation, 1997). The cell has been improved markedly over the years, with many variants introduced. In this section, we will give an overview of its inner workings, and we will show why vanishing gradients are less of an issue for LSTMs.
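Before looking inside the cell, the following minimal sketch shows how an LSTM can be swapped in for a basic RNN when classifying long sequences such as per-frame video features. The framework choice (Keras) and the layer sizes are assumptions for illustration, not part of the original text.

```python
import tensorflow as tf

seq_len, feat_dim = 500, 128  # assumed: a long sequence of per-frame features

# Basic RNN baseline: a single hidden state carries all past information.
simple_rnn_model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len, feat_dim)),
    tf.keras.layers.SimpleRNN(64),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # e.g. violent / non-violent
])

# LSTM variant: gated memory cell helps preserve early events (like the gunshot)
# across long sequences.
lstm_model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len, feat_dim)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
```

The two models differ only in the recurrent layer; the sections that follow explain what the LSTM's gates do internally to make this substitution effective.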