Long Short-Term Memory Networks
The LSTM network was designed to overcome the vanishing gradient problem. LSTMs have feedback connections, and the key to an LSTM is the cell state: a horizontal line running through the entire chain, with only minor linear interactions, that preserves context information over long sequences. An LSTM adds information to or removes information from the cell state through gates, each composed of an activation function, such as a sigmoid or tanh, and a pointwise multiplication operation.
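To make the gating mechanism concrete, the following is a minimal NumPy sketch of a single LSTM time step. The parameter layout (stacked matrices `W`, `U`, and bias `b` holding the forget, input, candidate, and output blocks) and all sizes are illustrative choices, not a reference implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the forget/input/candidate/output blocks."""
    z = W @ x_t + U @ h_prev + b   # stacked pre-activations for all four blocks
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])            # forget gate: what to erase from the cell state
    i = sigmoid(z[H:2*H])          # input gate: what new information to store
    g = np.tanh(z[2*H:3*H])        # candidate values for the cell state
    o = sigmoid(z[3*H:4*H])        # output gate: what to expose as the hidden state
    c_t = f * c_prev + i * g       # pointwise update of the cell state
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t

# Illustrative sizes: 3 input features, hidden size 5, a 10-step sequence
rng = np.random.default_rng(0)
H, D = 5, 3
W = rng.standard_normal((4 * H, D))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((10, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Note how the cell state `c_t` is updated only through pointwise multiplication and addition; this mostly linear path is what lets gradients flow back through many time steps.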
Figure 5.9 – An LSTM model (source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
Figure 5.9 shows an LSTM with the gates that protect and control the cell state. By carrying information through the cell state, an LSTM mitigates the vanishing gradient problem and is therefore particularly good at processing sequential data, such as text and speech.
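In practice, an LSTM layer from a deep learning framework handles the recurrence over the whole sequence. The following sketch assumes PyTorch is available; the batch size, sequence length, and feature sizes are arbitrary examples:

```python
import torch
import torch.nn as nn

# Toy batch: 4 sequences, 10 time steps, 8 features each (sizes illustrative)
x = torch.randn(4, 10, 8)

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 10, 16]) - hidden state at every time step
print(h_n.shape)     # torch.Size([1, 4, 16]) - final hidden state per sequence
print(c_n.shape)     # torch.Size([1, 4, 16]) - final cell state per sequence
```

The per-step output is typically fed to a downstream classifier or decoder, while the final hidden and cell states can seed another LSTM, as in sequence-to-sequence models.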