Forecasting with LSTM using Keras
There are a few shortcomings in using RNNs – for example, an RNN's memory is short term and does not do well when persisting a longer-term memory.
In the previous recipe, you trained a small RNN architecture with one hidden layer. In a deep RNN, with multiple hidden layers, the network will suffer from the vanishing gradient problem – that is, during backpropagation, as the weights get adjusted, it will be unable to change the weights of much earlier layers, reducing its ability to learn. Because of this, the output becomes influenced by the closer layers (nodes).
In other words, any memory of earlier layers decays through time, hence the term vanishing. This is an issue if you have a very long sequence – for example, a long paragraph or long sentence – and you want to predict the next word. In time series data, how problematic the lack of long-term memory is will vary, depending on your goal and the data you...