Improving LSTMs – generating text with words instead of n-grams
Here we will discuss ways to improve LSTMs. First, we will see how the number of model parameters grows if we use one-hot-encoded word features. This motivates us to replace the one-hot-encoded vectors with low-dimensional word vectors. Finally, we will employ word vectors in the code to generate better-quality text than the bigram-based model produces. The code for this section is available in lstm_word2vec.ipynb in the ch8 folder.
The curse of dimensionality
One major obstacle to using words instead of n-grams as the input to our LSTM is that doing so drastically increases the number of parameters in the model. Let's understand this through an example. Suppose we have an input of size 500 and a cell state of size 100. This results in a total of approximately 240K parameters (excluding the softmax layer), as shown here:
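The calculation behind this count follows from the standard LSTM parameterization: four gates (input, forget, output, and candidate), each with a weight matrix over the concatenated input and state vectors plus a bias term:

$$
\underbrace{4}_{\text{gates}} \times \big( (\underbrace{500}_{\text{input}} + \underbrace{100}_{\text{state}}) \times 100 + \underbrace{100}_{\text{bias}} \big) = 240{,}400 \approx 240\text{K}
$$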
Let's now increase the size of the input to 1,000 while keeping the cell state at 100. The total number of parameters roughly doubles to approximately 440K. Since the recurrent weights and biases are unchanged, all of the extra 200K parameters come from the larger input-to-gate weight matrices, so the parameter count grows linearly with the input dimensionality.
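As a quick sanity check, we can reproduce both counts in code and compare them against Keras's own bookkeeping. This is a minimal sketch, assuming TensorFlow 2.x is installed; it is separate from the lstm_word2vec.ipynb notebook:

```python
import tensorflow as tf

def lstm_param_count(input_dim: int, units: int) -> int:
    """Parameters of a standard LSTM: 4 gates, each with a weight
    matrix over the concatenated [input; state] vector plus a bias."""
    return 4 * ((input_dim + units) * units + units)

for input_dim in (500, 1000):
    layer = tf.keras.layers.LSTM(units=100)
    layer.build((None, None, input_dim))  # (batch, time, features)
    print(input_dim, lstm_param_count(input_dim, 100), layer.count_params())

# Output:
# 500  240400 240400   -> ~240K
# 1000 440400 440400   -> ~440K
```

With a vocabulary of realistic size (tens of thousands of words), one-hot inputs push this count into the tens of millions, which is exactly the blow-up that low-dimensional word vectors are meant to avoid.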