
What is LSTM?

  • 3 min read
  • 11 Apr 2018


What does LSTM stand for?


LSTM stands for long short-term memory. It is a model, or architecture, that extends the memory of recurrent neural networks. Typically, recurrent neural networks have 'short-term memory': they carry information forward from the immediately preceding steps for use in the current computation. Previous information is used in the present task, but the network does not keep a complete record of all the earlier information it has seen.
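For contrast, the entire 'memory' of a vanilla recurrent network is its hidden state, which is overwritten at every step. Here is a minimal sketch in NumPy (the function and variable names are illustrative, not from this article):

```python
import numpy as np

# One step of a plain (vanilla) RNN. The only information carried forward
# is the hidden state h: it is recomputed at every step from the previous
# hidden state and the current input, so older inputs fade quickly.
def rnn_step(x, h_prev, W, U, b):
    return np.tanh(W @ x + U @ h_prev + b)
```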


How does LSTM work?


LSTM introduces long-term memory into recurrent neural networks. It mitigates the vanishing gradient problem, in which a neural network stops learning because the updates to its weights become smaller and smaller as they are propagated back through the network. LSTM does this by using a series of 'gates', contained in memory blocks that are connected through layers, like this:

[Figure: LSTM memory blocks, each containing gates, connected through layers]

There are three types of gates within a unit:

  • Input Gate: Scales input to cell (write)
  • Output Gate: Scales output from cell (read)
  • Forget Gate: Scales old cell value (reset)


Each gate acts like a switch that controls these reads, writes, and resets, and it is this gating that incorporates the long-term memory function into the model.
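To make the gate mechanics concrete, here is a minimal sketch of a single LSTM step in NumPy. It follows the standard LSTM equations rather than any code from this article, and the parameter layout (the four gate blocks stacked in W, U, and b) is an assumption of the sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One LSTM step. W (4n x d), U (4n x n), and b (4n,) stack the parameters
# for the input gate (i), forget gate (f), output gate (o), and the
# candidate cell values (g); n is the hidden size, d the input size.
def lstm_step(x, h_prev, c_prev, W, U, b):
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])        # input gate: scales the write into the cell
    f = sigmoid(z[n:2*n])      # forget gate: scales (resets) the old cell value
    o = sigmoid(z[2*n:3*n])    # output gate: scales the read out of the cell
    g = np.tanh(z[3*n:4*n])    # candidate values to be written
    c = f * c_prev + i * g     # long-term cell state: gated mix of old and new
    h = o * np.tanh(c)         # exposed hidden state: gated read of the cell
    return h, c
```

Because the cell state c is updated additively, gradients can flow back through many steps without shrinking the way they do in the vanilla step shown earlier.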

Applications of LSTM


There is a huge range of ways that LSTMs can be used, including:

  • Handwriting recognition
  • Time series anomaly detection
  • Speech recognition
  • Learning grammar
  • Composing music
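To pick one application from the list, a simple time series model takes only a few lines in Keras. This is a sketch, not a complete solution: the layer sizes and the next-value-prediction framing (whose prediction error could be used to flag anomalies) are assumptions of the example.

```python
import tensorflow as tf

# Predict the next value of a univariate series from the values so far.
# Large prediction errors at inference time can be treated as anomalies.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),   # variable-length, single-feature sequences
    tf.keras.layers.LSTM(32),          # 32 LSTM memory units
    tf.keras.layers.Dense(1),          # next-value prediction
])
model.compile(optimizer="adam", loss="mse")
```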

The difference between LSTM and GRU


There are many similarities between LSTMs and GRUs (gated recurrent units). However, there are some important differences that are worth remembering:

  • A GRU has two gates, whereas an LSTM has three gates.
  • GRUs don't possess an internal cell state separate from the exposed hidden state; they lack the output gate that is present in LSTMs.
  • GRUs apply no second nonlinearity when computing their output.


GRU, as a concept, is a little newer than LSTM. It is generally more efficient, training models at a quicker rate than LSTM, and it is easier to use: any modifications you need to make to a model can be made fairly easily. However, LSTM should perform better than GRU where longer-term memory is required. Ultimately, comparing performance will come down to the dataset you are using.
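One quick way to see the efficiency difference is to count parameters: with the same number of units, a GRU layer is roughly three-quarters the size of an LSTM layer, since it has three parameter blocks to the LSTM's four. A small Keras sketch (the layer sizes are arbitrary choices for illustration):

```python
import tensorflow as tf

# Compare trainable parameter counts for same-sized LSTM and GRU layers.
for layer_cls in (tf.keras.layers.LSTM, tf.keras.layers.GRU):
    m = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 8)),  # sequences of 8-feature vectors
        layer_cls(32),                    # 32 recurrent units
    ])
    print(layer_cls.__name__, m.count_params())
```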