In the previous section, we learned how a traditional RNN suffers from the vanishing or exploding gradient problem, which prevents it from retaining long-term memory. In this section, we will learn how to leverage LSTM to get around this problem.
To understand the scenario better with an example, let's consider the following sentence:
I am from England. I speak __.
In the preceding sentence, intuitively, we know that the majority of people from England speak English. The blank to be filled (English) is obtained from the fact that the person is from England. While in this scenario the signaling word (England) is close to the blank, in a realistic scenario, the signal word may be far away from the blank (the word we are trying to predict). When the distance between the signal word and the blank is large, predictions made by traditional RNNs are likely to be wrong because of the vanishing gradient problem.
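To see this effect numerically, here is a minimal NumPy sketch of how the gradient that connects a distant signal word to the prediction shrinks as it is propagated back through a tanh RNN. The hidden size, sequence length, and weight scale are illustrative assumptions, not values taken from any specific model:

```python
import numpy as np

np.random.seed(0)

hidden_size = 16
T = 50  # assumed distance (in time steps) between the signal word and the blank

# Hypothetical recurrent weight matrix; the 0.1 scale is an illustrative choice
W = np.random.randn(hidden_size, hidden_size) * 0.1

h = np.zeros(hidden_size)
grad = np.eye(hidden_size)  # accumulated Jacobian d h_t / d h_0

norms = []
for t in range(T):
    # One tanh RNN step with random input (stand-in for word embeddings)
    h = np.tanh(W @ h + np.random.randn(hidden_size) * 0.1)
    # Jacobian of one step: diag(1 - h^2) @ W
    J = np.diag(1 - h ** 2) @ W
    grad = J @ grad
    norms.append(np.linalg.norm(grad))

print(f"gradient norm after 1 step:   {norms[0]:.3e}")
print(f"gradient norm after {T} steps: {norms[-1]:.3e}")
```

Running this shows the gradient norm collapsing toward zero after a few dozen steps: by the time the error signal reaches the distant signal word, almost nothing is left of it, which is precisely why a traditional RNN struggles to learn such long-range dependencies.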