LSTM versus Plain RNNs
We saw that LSTMs build on plain RNNs, with the primary goal of addressing the vanishing gradient problem and thereby enabling the modeling of long-range dependencies. The following figure shows that a plain RNN passes only the hidden state (the short-term memory) between time steps, whereas an LSTM passes the hidden state as well as an explicit cell state (the long-term memory), giving it more power. So, when the term "good" is being processed in the LSTM, the recurrent layer also passes along the cell state holding the long-term memory:
In practice, does this mean that you always need an LSTM? The answer, as with most questions in data science and especially in deep learning, is "it depends." To make an informed choice, we need to understand the benefits and drawbacks of LSTMs compared to plain RNNs.
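To make the structural difference concrete, here is a minimal sketch, assuming PyTorch (the layer sizes and tensor shapes are illustrative and not from the original text). It shows that `nn.RNN` returns only a hidden state, while `nn.LSTM` additionally returns a cell state, and that the LSTM's four gates give it roughly four times the parameters of a plain RNN with the same hidden size:

```python
import torch
import torch.nn as nn

# Toy batch: 4 sequences, 10 time steps, 32 features each (illustrative sizes)
x = torch.randn(4, 10, 32)

# A plain RNN carries only the hidden state between time steps
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
out_rnn, h_n = rnn(x)            # h_n: hidden state, shape (1, 4, 64)

# An LSTM carries the hidden state AND an explicit cell state
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
out_lstm, (h_n, c_n) = lstm(x)   # h_n: short-term memory, c_n: long-term memory

print(h_n.shape, c_n.shape)      # both torch.Size([1, 4, 64])

# The LSTM's four gates mean roughly 4x the weights of a plain RNN
n_rnn = sum(p.numel() for p in rnn.parameters())
n_lstm = sum(p.numel() for p in lstm.parameters())
print(n_lstm / n_rnn)            # ~4.0
```

Running this sketch shows the parameter ratio coming out at about 4, which is one concrete sense in which an LSTM is "more powerful", and also more expensive, than a plain RNN with the same hidden size.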
Benefits of LSTMs:
- More powerful, as it uses more parameters and maintains an explicit cell state (the long-term memory) in addition to the hidden state