Summary
In this chapter, we introduced the concept of RNNs, emphasizing the issues that normally arise when classic models are trained with the BPTT algorithm. In particular, we explained how vanishing and exploding gradients prevent these networks from easily learning long-term dependencies.
For this reason, new models were proposed whose performance proved immediately superior. We discussed the most famous recurrent cell, the Long Short-Term Memory (LSTM), whose layers can learn the most important dependencies in a sequence, allowing us to reduce the prediction error even in contexts with very high variance (such as stock market quotations). The last topic was a simplified version of the same idea, which led to a model called the Gated Recurrent Unit (GRU). This cell is structurally simpler and more computationally efficient, and many benchmarks have confirmed that its performance is approximately the same as that of an LSTM.
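As a brief, hedged illustration of this interchangeability (the framework choice of Keras/TensorFlow, the build_model helper, and all hyperparameters below are assumptions, not taken from the chapter), the following sketch shows how an LSTM layer can be swapped for a GRU layer without changing anything else in the model:

```python
# Minimal sketch: LSTM and GRU layers share the same interface, so switching
# between them is a one-line change. Assumes TensorFlow/Keras is available;
# build_model, the input shape, and the unit count are illustrative only.
import tensorflow as tf

def build_model(cell="lstm", timesteps=20, features=1, units=64):
    # Both recurrent cells accept the same (timesteps, features) input shape.
    RecurrentLayer = tf.keras.layers.LSTM if cell == "lstm" else tf.keras.layers.GRU
    return tf.keras.Sequential([
        tf.keras.Input(shape=(timesteps, features)),
        RecurrentLayer(units),
        tf.keras.layers.Dense(1),  # single-step regression output
    ])

lstm_model = build_model("lstm")
gru_model = build_model("gru")

# A GRU uses three weight blocks per cell (update gate, reset gate, candidate)
# versus four for an LSTM, so the summaries show fewer trainable parameters
# for the GRU model, which is where its lower computational cost comes from.
lstm_model.summary()
gru_model.summary()
```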
In the next chapter, we are going to discuss some models called...