If the word in the eighth position of a sentence depends on the word in the first position, the network must remember that first word until it reaches the eighth position. RNNs, however, are poor at capturing such long-term dependencies because of the vanishing gradient problem. Along with remembering, we also need to decide what from the past should be remembered and what should be forgotten. An LSTM cell helps us with exactly this: it uses structures called gates to keep the necessary information in memory for as long as it is required.
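To make the gating idea concrete, here is a minimal NumPy sketch of a single LSTM time step. The stacked weight matrix `W`, the bias `b`, and the toy dimensions are illustrative assumptions, not part of any particular library's API; the gate equations themselves are the standard LSTM formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input vector; h_prev, c_prev: previous hidden and cell states.
    W has shape (4*hidden, input_dim + hidden); b has shape (4*hidden,).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate: what to discard from memory
    i = sigmoid(z[1 * hidden:2 * hidden])  # input gate: what new information to store
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate values for the cell state
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate: what to expose as the hidden state
    c = f * c_prev + i * g                 # cell state: the long-term "memory"
    h = o * np.tanh(c)                     # new hidden state
    return h, c

# Usage: run eight steps over a toy sequence with assumed dimensions.
rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.standard_normal((4 * hidden, input_dim + hidden)) * 0.1
b = np.zeros(4 * hidden)
h = np.zeros(hidden)
c = np.zeros(hidden)
for t in range(8):
    x = rng.standard_normal(input_dim)
    h, c = lstm_step(x, h, c, W, b)
```

Note how the forget gate `f` scales the previous cell state `c_prev`: values near 0 erase old memory, while values near 1 carry it forward unchanged, which is how information from the first position can survive until the eighth.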
LSTM cells use the concept of state, or memory, to retain long-term dependencies. At every step, the cell decides what to keep in memory and what to discard, and all of this is done using gates. Let's look at the working of an LSTM cell in detail (this is shown in...