LSTM stands for long short-term memory. It is a model, or architecture, that extends the memory of recurrent neural networks. Typically, recurrent neural networks have 'short-term memory': they carry recent information forward in a hidden state and use it in the current step. In other words, only a compressed summary of the recent past is available to the network; it does not keep a full record of all previous inputs.
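To make that concrete, here is a minimal sketch of a single vanilla RNN step (in NumPy, with illustrative sizes and randomly initialised placeholder weights). The only 'memory' passed forward is the fixed-size hidden state, so the network retains a summary of the past rather than a list of everything it has seen.

```python
import numpy as np

# A single step of a vanilla RNN: the entire "memory" of the past is the
# fixed-size hidden state h, so older information is gradually overwritten.
# Sizes and weights here are illustrative placeholders.
input_size, hidden_size = 4, 8
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """Combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)
for t in range(10):          # process a sequence of 10 dummy inputs
    x_t = rng.normal(size=input_size)
    h = rnn_step(x_t, h)     # h is the only summary of everything seen so far
```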
LSTM introduces long-term memory into recurrent neural networks. This mitigates the vanishing gradient problem, where a network stops learning because the updates to its weights become smaller and smaller during training. LSTM does this by using a series of 'gates', contained in memory blocks that are connected through layers.
There are three types of gates within a unit: an input gate, a forget gate and an output gate.
Each gate acts like a switch that controls reading from and writing to the memory cell, which is how the long-term memory function is built into the model.
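As a rough illustration (a hand-rolled NumPy sketch with placeholder weights and sizes, not a production implementation), a single LSTM step combines the three gates with a cell state like this:

```python
import numpy as np

# Illustrative single LSTM step: the forget, input and output gates are
# sigmoid "switches" (values between 0 and 1) that control what is erased
# from, written to, and read from the cell state c. Weights are placeholders.
input_size, hidden_size = 4, 8
rng = np.random.default_rng(1)

def make_weights():
    W = rng.normal(scale=0.1, size=(hidden_size, input_size + hidden_size))
    b = np.zeros(hidden_size)
    return W, b

(W_f, b_f), (W_i, b_i), (W_o, b_o), (W_c, b_c) = (make_weights() for _ in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(W_f @ z + b_f)        # forget gate: what to erase from c
    i = sigmoid(W_i @ z + b_i)        # input gate: what new info to write
    o = sigmoid(W_o @ z + b_o)        # output gate: what to expose as h
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate cell contents
    c = f * c_prev + i * c_tilde      # update the long-term cell state
    h = o * np.tanh(c)                # short-term output / hidden state
    return h, c

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
h, c = lstm_step(rng.normal(size=input_size), h, c)
```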
There is a huge range of ways that LSTM can be used, including sequence prediction, speech and handwriting recognition, machine translation and time-series forecasting.
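For instance, here is a minimal, generic usage sketch (assuming PyTorch is available; the shapes and sizes are arbitrary) that runs an LSTM over a batch of dummy sequences and feeds the final hidden state to a small classifier head:

```python
import torch
import torch.nn as nn

# Run an LSTM over a batch of dummy sequences, e.g. as the backbone of a
# sequence classifier. All sizes below are illustrative only.
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

batch = torch.randn(8, 20, 16)      # 8 sequences, 20 time steps, 16 features
outputs, (h_n, c_n) = lstm(batch)   # outputs: (8, 20, 32); h_n, c_n: final states

# Feed the final hidden state into a classifier head
head = nn.Linear(32, 2)
logits = head(h_n[-1])              # shape (8, 2)
```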
There are many similarities between LSTM and GRUs (gated recurrent units). However, there are some important differences that are worth remembering.
GRU, as a concept, is a little newer than LSTM. It is generally more efficient: it trains models more quickly than LSTM, and it is easier to work with, since modifications to a model can be made fairly easily. However, LSTM should perform better than GRU where longer-term memory is required. Ultimately, relative performance depends on the data set you are using.
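One rough way to see why GRU often trains faster is to compare parameter counts. The sketch below (assuming PyTorch; the sizes are arbitrary) shows that, for the same input and hidden dimensions, a GRU layer has fewer parameters than an LSTM layer, because it uses three gate-like units instead of four.

```python
import torch.nn as nn

# Compare parameter counts for same-sized LSTM and GRU layers.
input_size, hidden_size = 128, 256

lstm = nn.LSTM(input_size, hidden_size)   # four gates' worth of weights
gru = nn.GRU(input_size, hidden_size)     # three gates' worth of weights

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"LSTM parameters: {count(lstm):,}")
print(f"GRU parameters:  {count(gru):,}")   # roughly three quarters of the LSTM's
```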