Introduction
In the previous chapters, we studied Recurrent Neural Networks (RNNs) and a specialized architecture called the Gated Recurrent Unit (GRU), which helps combat the vanishing gradient problem. Long Short-Term Memory (LSTM) networks offer yet another way to tackle this problem. In this chapter, we will take a look at the architecture of LSTMs and see how they enable a neural network to propagate gradients faithfully across long sequences.
Additionally, we will look at an interesting application of LSTMs in the form of neural machine translation, which will enable us to build a model that translates text from one language into another.
LSTM
The vanishing gradient problem makes it difficult for gradients to propagate from the later layers of the network back to the earlier layers, so the weights in those early layers barely change from their initial values. As a result, the model fails to learn effectively and performs poorly. LSTMs address this issue by introducing a "memory cell": a state that is carried across time steps and updated largely additively, giving gradients a more direct path through the sequence.
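To make the role of the memory cell concrete, the sketch below implements a single LSTM time step with NumPy. This is a minimal illustration under assumed conventions: the weight names (W_f, U_f, and so on), the params dictionary, and the function name lstm_step are not from this chapter; in practice you would rely on a framework's built-in LSTM layer rather than writing the step yourself.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step (illustrative sketch; parameter names are assumptions).

    x_t    : input vector at time t
    h_prev : hidden state from the previous step
    c_prev : memory cell state from the previous step
    params : dict of input weights W_*, recurrent weights U_*, and biases b_*
    """
    # Forget gate: how much of the previous cell state to keep.
    f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev + params["b_f"])
    # Input gate: how much new candidate information to write into the cell.
    i_t = sigmoid(params["W_i"] @ x_t + params["U_i"] @ h_prev + params["b_i"])
    # Candidate values that could be added to the cell state.
    c_tilde = np.tanh(params["W_c"] @ x_t + params["U_c"] @ h_prev + params["b_c"])
    # Output gate: how much of the cell state to expose as the hidden state.
    o_t = sigmoid(params["W_o"] @ x_t + params["U_o"] @ h_prev + params["b_o"])

    # The cell state update is mostly additive, which is what lets gradients
    # flow across many time steps without vanishing.
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```

Notice that the cell state c_t is updated by element-wise gating and addition rather than by repeated matrix multiplications through a squashing nonlinearity, which is why the gradient along the cell state does not shrink as quickly as the hidden-state gradient in a vanilla RNN.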