Recurrent neural networks have proven to be remarkably effective at learning from and predicting sequential data. With natural language, however, the problem of long-term dependencies arises: the network must remember the context of a conversation, paragraph, or sentence, often established many words earlier, in order to make good predictions later on. For example, consider the following passage:
A few years ago, I happened to visit China. Not only was the Chinese food different from the Chinese food available everywhere else in the world, but the people were extremely warm and hospitable too. During my three-year stay in this beautiful country, I managed to pick up and speak very good....
If the preceding passage were fed into a recurrent neural network to predict the next word (in this case, Chinese), the network would find it difficult since...
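The difficulty described above can be made concrete with a short numerical sketch (an illustration, not taken from the text): when a plain RNN is trained with backpropagation through time, the gradient flowing back through the sequence is multiplied by the recurrent Jacobian at every step. If that matrix's largest singular value is below 1, the gradient shrinks exponentially with distance, so the signal from a word like China near the start of the passage barely reaches the prediction at the end. The 0.9 scaling and the 16-unit hidden size below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 16

# A random recurrent weight matrix, rescaled so its spectral norm
# (largest singular value) is 0.9 -- i.e., strictly less than 1.
W = rng.standard_normal((hidden_size, hidden_size))
W *= 0.9 / np.linalg.svd(W, compute_uv=False)[0]

# Gradient arriving at the final time step; backpropagate it through
# 50 earlier steps, recording its norm after each multiplication.
grad = np.ones(hidden_size)
norms = []
for step in range(50):
    grad = W.T @ grad          # one step of backprop through time
    norms.append(np.linalg.norm(grad))

print(f"gradient norm after 10 steps: {norms[9]:.4f}")
print(f"gradient norm after 50 steps: {norms[49]:.6f}")
```

Running this shows the gradient norm collapsing toward zero as it travels back through the sequence, which is exactly why a vanilla RNN struggles to connect the missing word at the end of the passage to the mention of China dozens of words earlier.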