When RNNs are trained over very long sequences, the gradients tend to either explode (grow very large) or vanish (shrink to almost zero). Long Short-Term Memory (LSTM) networks address this vanishing/exploding gradient problem by adding gates that control access to past information. The LSTM was first introduced by Hochreiter and Schmidhuber in 1997.
Read the following research paper to learn more about the origins of LSTM:
S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997. http://www.bioinf.jku.at/publications/older/2604.pdf
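To make the gating idea concrete, the following is a minimal sketch of a single LSTM cell step in NumPy. The function name lstm_cell_step, the dictionary-style parameters W, U, and b, and the illustrative sizes are assumptions made for this example rather than a reference implementation; the equations follow the standard LSTM gate formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: the gates decide what to forget, what to write,
    and what to expose, so information can flow through the cell state
    across many time steps. W, U, b hold parameters for the f, i, g, o parts."""
    # Forget gate: how much of the previous cell state to keep
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    # Input gate and candidate values: what new information to store
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])
    # Output gate: how much of the cell state to expose as the hidden state
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    c_t = f * c_prev + i * g      # updated cell state (additive path)
    h_t = o * np.tanh(c_t)        # new hidden state
    return h_t, c_t

# Example usage with illustrative sizes (input dim 3, hidden dim 4)
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((4, 3)) * 0.1 for k in "figo"}
U = {k: rng.standard_normal((4, 4)) * 0.1 for k in "figo"}
b = {k: np.zeros(4) for k in "figo"}
h, c = np.zeros(4), np.zeros(4)
for x_t in rng.standard_normal((5, 3)):   # a short sequence of 5 steps
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
```

Because the cell state is updated additively (scaled by the forget gate) rather than repeatedly squashed through a nonlinearity, gradients can be preserved over long spans, which is what mitigates the vanishing gradient problem described above.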
In an RNN, a single neural network layer (the repeatedly applied learning function φ) is used, whereas an LSTM uses a repeating module consisting of four main functions. The module that builds the LSTM network is called the cell. The LSTM...