
What is LSTM?

  • 3 min read
  • 11 Apr 2018


What does LSTM stand for?


LSTM stands for long short-term memory. It is a model, or architecture, that extends the memory of recurrent neural networks. Typically, recurrent neural networks have 'short-term memory': they carry information forward from the immediately preceding steps for use in the current computation. Previous information is used in the present task, but the network does not keep a complete record of all the earlier information it has seen.
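For contrast, the entire 'memory' of a vanilla recurrent network is its hidden state, which is overwritten at every step. Here is a minimal sketch in NumPy (the function and variable names are illustrative, not from this article):

```python
import numpy as np

# One step of a plain (vanilla) RNN. The only information carried forward
# is the hidden state h: it is recomputed at every step from the previous
# hidden state and the current input, so older inputs fade quickly.
def rnn_step(x, h_prev, W, U, b):
    return np.tanh(W @ x + U @ h_prev + b)
```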


How does LSTM work?


LSTM introduces long-term memory into recurrent neural networks. It mitigates the vanishing gradient problem, in which a neural network stops learning because the updates to its weights become smaller and smaller as they are propagated back through the network. LSTM does this by using a series of 'gates', contained in memory blocks that are connected through layers, like this:

[Figure: LSTM memory blocks, each containing gates, connected through layers]

There are three types of gates within a unit:

  • Input Gate: Scales input to cell (write)
  • Output Gate: Scales output from cell (read)
  • Forget Gate: Scales old cell value (reset)


Each gate acts like a switch that controls these reads, writes, and resets, and it is this gating that incorporates the long-term memory function into the model.
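To make the gate mechanics concrete, here is a minimal sketch of a single LSTM step in NumPy. It follows the standard LSTM equations rather than any code from this article, and the parameter layout (the four gate blocks stacked in W, U, and b) is an assumption of the sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One LSTM step. W (4n x d), U (4n x n), and b (4n,) stack the parameters
# for the input gate (i), forget gate (f), output gate (o), and the
# candidate cell values (g); n is the hidden size, d the input size.
def lstm_step(x, h_prev, c_prev, W, U, b):
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])        # input gate: scales the write into the cell
    f = sigmoid(z[n:2*n])      # forget gate: scales (resets) the old cell value
    o = sigmoid(z[2*n:3*n])    # output gate: scales the read out of the cell
    g = np.tanh(z[3*n:4*n])    # candidate values to be written
    c = f * c_prev + i * g     # long-term cell state: gated mix of old and new
    h = o * np.tanh(c)         # exposed hidden state: gated read of the cell
    return h, c
```

Because the cell state c is updated additively, gradients can flow back through many steps without shrinking the way they do in the vanilla step shown earlier.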

Applications of LSTM


There is a huge range of ways that LSTMs can be used, including:

  • Handwriting recognition
  • Time series anomaly detection
  • Speech recognition
  • Learning grammar
  • Composing music
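To pick one application from the list, a simple time series model takes only a few lines in Keras. This is a sketch, not a complete solution: the layer sizes and the next-value-prediction framing (whose prediction error could be used to flag anomalies) are assumptions of the example.

```python
import tensorflow as tf

# Predict the next value of a univariate series from the values so far.
# Large prediction errors at inference time can be treated as anomalies.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),   # variable-length, single-feature sequences
    tf.keras.layers.LSTM(32),          # 32 LSTM memory units
    tf.keras.layers.Dense(1),          # next-value prediction
])
model.compile(optimizer="adam", loss="mse")
```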

The difference between LSTM and GRU


There are many similarities between LSTMs and GRUs (gated recurrent units). However, there are some important differences that are worth remembering:

  • A GRU has two gates, whereas an LSTM has three gates.
  • GRUs don't possess an internal cell state separate from the exposed hidden state; they lack the output gate that is present in LSTMs.
  • GRUs apply no second nonlinearity when computing their output.


GRU, as a concept, is a little newer than LSTM. It is generally more efficient, training models at a quicker rate than LSTM, and it is easier to use: any modifications you need to make to a model can be made fairly easily. However, LSTM should perform better than GRU where longer-term memory is required. Ultimately, comparing performance will come down to the dataset you are using.
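One quick way to see the efficiency difference is to count parameters: with the same number of units, a GRU layer is roughly three-quarters the size of an LSTM layer, since it has three parameter blocks to the LSTM's four. A small Keras sketch (the layer sizes are arbitrary choices for illustration):

```python
import tensorflow as tf

# Compare trainable parameter counts for same-sized LSTM and GRU layers.
for layer_cls in (tf.keras.layers.LSTM, tf.keras.layers.GRU):
    m = tf.keras.Sequential([
        tf.keras.Input(shape=(None, 8)),  # sequences of 8-feature vectors
        layer_cls(32),                    # 32 recurrent units
    ])
    print(layer_cls.__name__, m.count_params())
```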