Before we detail the mathematics behind the LSTM cell, let's try to get a general understanding of how it works. To do so, we will use the example of a live classification system that is applied to the Olympic Games. The system has to detect, for every frame, which sport is being played during a long video from the Olympics.
If the network sees people standing in line, can it infer what sport it is? Is it soccer players singing the anthem, or is it athletes preparing to run a 100-meter race? Without information about what happened in the frames just prior to this, the prediction will not be accurate. The basic RNN architecture we presented earlier would be able to store this information in the hidden state. However, if the sports are alternating one after the other, it would be much harder. Indeed, the state is used to generate the current predictions. The basic RNN is unable to store information that it will not use immediately.
The LSTM architecture solves...