Training an LSTM neural network
RNNs suffer from a fundamental problem of “vanishing gradients” where, due to the nature of backpropagation in neural networks, the influence of earlier inputs on the overall error diminishes drastically as the sequence gets longer. This is especially problematic in sequence processing tasks where long-term dependencies exist (i.e., future outputs depend on much earlier inputs).
Getting ready
LSTM networks were introduced to overcome this problem. They use a more complex internal structure for each of their cells compared to RNNs. Specifically, an LSTM has the ability to decide which information to discard or to store based on an internal structure called a cell. This cell uses gates (input, forget, and output gates) to control the flow of information into and out of the cell. This helps maintain and manipulate the “long-term” information, thereby mitigating the vanishing gradient problem.