Training an RNN
In NLP, the input data is typically text. Since a text is essentially a sequence of words, RNNs can be a good solution: unlike fully connected networks, they take the sequential nature of the data into account.
In this recipe, we will train an RNN on tweets to predict whether they are positive, negative, or neutral.
Getting started
In NLP, we usually manipulate textual data, which is unstructured. Handling it properly is usually a multi-step process: first convert the text into numbers, and only then train a model on those numbers.
There are several ways to convert text into numbers. In this recipe, we will use a simple approach called tokenization: converting a sentence into tokens. A token can be as simple as a word, so that a sentence like "The dog is out" would be tokenized as ['the', 'dog', 'is', 'out'].
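To make this concrete, here is a minimal sketch of word-level tokenization in Python, assuming we simply lowercase the sentence and split on whitespace (real tokenizers, such as those in NLTK or spaCy, handle punctuation and edge cases more robustly). The vocabulary mapping at the end is purely illustrative of the token-to-number step:

def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into word tokens."""
    return sentence.lower().split()

tokens = tokenize("The dog is out")
print(tokens)  # ['the', 'dog', 'is', 'out']

# To feed tokens to a model, each token must then be mapped to an
# integer ID via a vocabulary (illustrative only; a real vocabulary
# would be built from the whole training corpus).
vocab = {token: idx for idx, token in enumerate(sorted(set(tokens)))}
ids = [vocab[token] for token in tokens]
print(ids)  # [3, 0, 1, 2]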
There is usually one more step...