Long short-term memory
In this section, we will discuss a special unit called long short-term memory (LSTM), which can be integrated into an RNN. The main purpose of LSTM is to mitigate a significant problem with RNNs: the vanishing gradient problem.
Problems with deep backpropagation through time
Unlike a traditional feed-forward network, an RNN unrolled over many time steps yields a feed-forward network that can be extremely deep. This can make it very difficult to train via the backpropagation through time (BPTT) procedure.
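The unrolling described above can be sketched as follows. This is a minimal illustration (the names and dimensions are chosen for the example, not taken from any library): applying the same recurrent cell once per time step turns a sequence of length T into a T-layer network with tied weights.

```python
import numpy as np

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 50, 8, 16

# Shared weights, reused at every time step
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden

def unroll(inputs):
    """Apply the same recurrent cell once per time step."""
    h = np.zeros(hidden_dim)
    hidden_states = []
    for x_t in inputs:                      # one "layer" per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h)  # same weights at every step
        hidden_states.append(h)
    return hidden_states

states = unroll(rng.normal(size=(T, input_dim)))
print(len(states))  # 50: effectively a 50-layer network with tied weights
```

Backpropagating a loss at the final step through this computation must traverse all 50 applications of the cell, which is what makes the unrolled network behave like a very deep one.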
In the first chapter, we discussed the vanishing gradient problem. An unfolded RNN suffers from both vanishing and exploding gradients while performing backpropagation through time.
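A rough sketch of why both failure modes occur: during BPTT, the gradient is multiplied by the recurrent Jacobian once per time step. In this illustrative example (a symmetric matrix is used so its spectral norm controls the long-run growth rate; the numbers are assumptions for demonstration), a norm below 1 makes the gradient shrink toward zero, while a norm above 1 makes it blow up.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, steps = 16, 50

# Symmetric matrix, normalized so its largest singular value is 1
A = rng.normal(size=(hidden_dim, hidden_dim))
W = (A + A.T) / 2
W /= np.linalg.norm(W, 2)

grad_vanish = np.ones(hidden_dim)
grad_explode = np.ones(hidden_dim)
for _ in range(steps):
    grad_vanish = (0.5 * W) @ grad_vanish   # spectral norm 0.5 -> shrinks
    grad_explode = (1.5 * W) @ grad_explode  # spectral norm 1.5 -> grows

print(np.linalg.norm(grad_vanish))   # vanishes: bounded by 0.5**50 * 4
print(np.linalg.norm(grad_explode))  # explodes: grows roughly like 1.5**50
```

After 50 steps the first gradient is numerically indistinguishable from zero, so early time steps receive essentially no learning signal; the second overflows toward infinity, destabilizing training. LSTM's gating is designed to keep this repeated multiplication close to the identity.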
Every state of an RNN depends on its current input and on its previous hidden state multiplied by the recurrent weight matrix. The same repeated multiplication is applied to the gradient in the reverse direction during backpropagation through time. The layers and numerous time steps of the...