Training RNNs for sentiment analysis
In this section, we will train an RNN model using PyTorch for a text classification task – sentiment analysis. In this task, the model takes in a piece of text – a sequence of words – as input and outputs either 1 (meaning positive sentiment) or 0 (negative sentiment). In order to go from text to 1s and 0s, we will need the help of tokenization and embeddings.
Tokenization is the process of converting words into integer tokens, as we will see in this exercise. A sentence then becomes an ordered array of integers, each one representing a word. While tokenization gives us an integer index for each word, we still want to represent each word as a vector of numbers, that is, as a point in a feature space of words. Why? Because the information contained in a word cannot be captured by a single number. This process of representing words as vectors is called embedding...
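To make the two steps concrete, the following is a minimal sketch rather than the code used later in this exercise. It assumes a toy two-sentence corpus, a whitespace tokenizer, and a hand-built vocabulary with reserved `<pad>` and `<unk>` entries; the names `vocab`, `tokenize`, and `embedding_dim` are illustrative choices. The only PyTorch component it relies on is `nn.Embedding`, which acts as a trainable lookup table from integer indices to vectors.

```python
import torch
import torch.nn as nn

# Toy corpus, for illustration only.
sentences = ["the movie was great", "the movie was terrible"]

# Build a word-to-index vocabulary. Index 0 is reserved for padding and
# index 1 for unknown words (an assumed convention, not a requirement).
vocab = {"<pad>": 0, "<unk>": 1}
for sentence in sentences:
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab)


def tokenize(text):
    """Map each whitespace-separated word to its integer index."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


# Tokenization: the sentence becomes an ordered list of integers.
token_ids = tokenize("the movie was great")

# Embedding: one trainable vector per vocabulary entry.
embedding_dim = 8  # arbitrary size chosen for this sketch
embedding = nn.Embedding(
    num_embeddings=len(vocab),
    embedding_dim=embedding_dim,
    padding_idx=0,
)

# Shape (batch, sequence_length) -> (batch, sequence_length, embedding_dim)
batch = torch.tensor([token_ids], dtype=torch.long)
vectors = embedding(batch)

print(token_ids)
print(vectors.shape)
```

Each word in the sentence is now an 8-dimensional vector. The embedding weights start out random and would be learned jointly with the rest of the model during training.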