The drawback of a recurrent neural network (RNN) is that it cannot retain information in memory for long. Although an RNN stores sequential information in its hidden state, when the input sequence is too long it cannot retain all of that information due to the vanishing gradient problem, which we discussed in the previous chapter.
To combat this, we introduce a variant of the RNN called the long short-term memory (LSTM) cell, which resolves the vanishing gradient problem by using special structures called gates. Gates keep information in memory for as long as it is required; they learn what information to keep and what information to discard from the memory.
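To make the idea of gates concrete, here is a minimal NumPy sketch of a single LSTM step. The gate names (forget, input, output), the candidate term, and the weight and bias variables used here are assumptions for illustration only, not the chapter's implementation, which we derive in detail later.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM cell (illustrative sketch).

    x_t: input at time t, shape (input_dim,)
    h_prev, c_prev: previous hidden and cell states, shape (hidden_dim,)
    params: weight matrices W_* of shape (hidden_dim, input_dim + hidden_dim)
            and bias vectors b_* of shape (hidden_dim,) -- hypothetical names
    """
    z = np.concatenate([x_t, h_prev])                  # combined input

    f = sigmoid(params["W_f"] @ z + params["b_f"])     # forget gate: what to discard
    i = sigmoid(params["W_i"] @ z + params["b_i"])     # input gate: what to write
    g = np.tanh(params["W_g"] @ z + params["b_g"])     # candidate cell content
    o = sigmoid(params["W_o"] @ z + params["b_o"])     # output gate: what to expose

    c_t = f * c_prev + i * g                           # update long-term (cell) memory
    h_t = o * np.tanh(c_t)                             # new short-term (hidden) state
    return h_t, c_t

# Tiny usage example with random parameters
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
params = {}
for name in ("f", "i", "g", "o"):
    params[f"W_{name}"] = rng.normal(size=(hidden_dim, input_dim + hidden_dim))
    params[f"b_{name}"] = np.zeros(hidden_dim)

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_cell_step(rng.normal(size=input_dim), h, c, params)
print(h, c)
```

The key point of the sketch is the cell-state update: the forget gate scales the old memory and the input gate scales the new candidate, so the cell can carry information across many time steps without it being squashed at every step.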
We will start the chapter by exploring the LSTM cell and exactly how it overcomes the shortcomings of the RNN. Later, we will learn how to perform forward and backward propagation with...