So, what if you wanted to build a machine that writes like a dead author? Or one that understands that a pop in a stock's price two weeks ago might mean the stock will pop again today? For sequence prediction tasks where key information appears early in the sequence, say at timestep t+1, but is needed to make an accurate prediction at t+250, vanilla RNNs struggle: gradients flowing backward across hundreds of timesteps tend to vanish, so the network never learns the long-range dependency. This is where LSTM (and, for some tasks, GRU) networks come into the picture. Instead of a single simple cell, you have several small gating networks, each learning whether to carry information forward, overwrite it, or expose it at a given timestep. We will now discuss each of these variations in detail.
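Before diving in, here is a minimal sketch of a single LSTM timestep in plain NumPy, just to make the gating idea concrete. The gate names (f, i, o, g) and the weight layout (W, U, b) are illustrative conventions for this sketch, not any particular library's API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep.

    x      : input vector at time t, shape (input_dim,)
    h_prev : previous hidden state, shape (hidden_dim,)
    c_prev : previous cell state, shape (hidden_dim,)
    W, U   : dicts of input-to-hidden / hidden-to-hidden weight
             matrices, keyed by gate: 'f', 'i', 'o', 'g'
    b      : dict of bias vectors with the same keys
    """
    # Forget gate: how much of the old cell state to keep
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])
    # Input gate: how much of the new candidate to write
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])
    # Output gate: how much of the cell state to expose
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])
    # Candidate cell state: the "new information" at this timestep
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])
    # New cell state: gated mix of old memory and new candidate
    c = f * c_prev + i * g
    # New hidden state: gated view of the cell state
    h = o * np.tanh(c)
    return h, c

# Toy usage with random weights and a 5-step sequence
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
W = {k: rng.normal(size=(hidden_dim, input_dim)) for k in 'fiog'}
U = {k: rng.normal(size=(hidden_dim, hidden_dim)) for k in 'fiog'}
b = {k: np.zeros(hidden_dim) for k in 'fiog'}
h = c = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The design point to notice is the cell state c: because it is updated additively (f * c_prev + i * g) rather than squashed through a nonlinearity at every step, gradients can flow across many timesteps without vanishing, which is exactly what the vanilla RNN lacked.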
Augmenting your RNN with GRU/LSTM units
Long Short-Term Memory units
Special thanks to...