In the previous chapters, we saw how we could leverage Artificial Neural Networks (ANNs) and Convolutional Neural Networks (CNNs) to mine patterns in text and apply them to various tasks such as classifying questions and sarcasm detection in news headlines. With ANNs, we primarily saw that inputs are independent of one another. With CNNs, we went one step further and tried to capture spatial relationships in the inputs by trying to extract patterns across a set of tokens together. However, our scope was limited to only a few tokens in the vicinity.
Sentences are essentially sequences of words, and the contextual meaning of a particular word in a sentence may not be derived solely from the immediately surrounding words. It might actually be a result of some words far away in the sentence as well. Also, the sense behind the usage of the word...