Long short-term memory (LSTM) networks are a specialized form of recurrent neural network. They have the ability to retain long-term memory of things they have encountered in the past. In an LSTM, each ordinary neuron in the hidden layer is replaced by what is known as a memory cell. At the core of the memory cell sits a recurrent self-connection that carries the cell's state forward from one time step to the next, and a set of gates opens and closes access to that state at the appropriate times.
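To make that structure concrete, here is a minimal sketch of a single LSTM memory-cell step in Python with NumPy. The names (lstm_step, W_f, W_i, W_o, W_c) and the decision to act on the concatenated previous output and current input are illustrative assumptions, not the API of any particular library; bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_c):
    """One forward step of a single LSTM memory cell (illustrative sketch).

    x_t    : current input vector
    h_prev : previous output of the cell
    c_prev : previous cell state (carried by the recurrent self-connection)
    W_*    : weight matrices acting on the concatenated [h_prev, x_t]
    """
    z = np.concatenate([h_prev, x_t])

    f = sigmoid(W_f @ z)        # forget gate: how much old state to keep
    i = sigmoid(W_i @ z)        # input gate: how much new information to write
    o = sigmoid(W_o @ z)        # output gate: how much of the state to expose
    c_tilde = np.tanh(W_c @ z)  # candidate values for the cell state

    c_t = f * c_prev + i * c_tilde  # gated update of the self-connected state
    h_t = o * np.tanh(c_t)          # the cell's output at this time step
    return h_t, c_t
```

The gates are what let the cell retain information over long spans: when the forget gate stays near 1 and the input gate near 0, the state is passed along essentially unchanged.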
If we step back for a moment and look at the back-propagation phase of a regular recurrent network, the gradient signal ends up being multiplied many times, once per time step, by the weight matrix of the recurrent connections between the neurons in the hidden layer. What does this mean exactly? It means that the magnitude of those recurrent weights has a strong influence on the learning process, and that influence can cut both ways.
If the weights are small, the repeated multiplication shrinks the gradient signal exponentially, leading to what is known as the vanishing gradient problem, where the signal becomes too weak for the network to learn long-range dependencies.
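A quick numerical sketch makes this visible. The hidden size, the number of time steps, and the two weight scales below are hypothetical choices, and the loop simply multiplies a stand-in gradient vector by the recurrent weight matrix over and over, as back-propagation through time does; depending on the scale of the weights, the signal either collapses toward zero or blows up.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, steps = 50, 100  # illustrative sizes, not tied to any real model

for scale, label in [(0.05, "small weights"), (0.5, "large weights")]:
    W = rng.normal(0.0, scale, size=(hidden_size, hidden_size))
    grad = np.ones(hidden_size)   # stand-in for the gradient signal
    for _ in range(steps):
        grad = W.T @ grad         # repeated multiplication during back-propagation
    print(f"{label}: gradient norm after {steps} steps = {np.linalg.norm(grad):.3e}")
```

With small weights the norm collapses to a vanishingly small number, while larger weights send it toward astronomically large values, which is exactly the two-sided problem the LSTM's gated, self-connected cell is designed to avoid.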