Using TensorFlow RNN API with pretrained GloVe word vectors
So far, we have implemented everything from scratch in order to understand the exact underlying mechanisms of such a system. Here we will discuss how to use the TensorFlow RNN API along with pretrained GloVe word vectors to reduce both the amount of code we have to write and the amount the algorithm has to learn. This is available as an exercise in the lstm_image_caption_pretrained_wordvecs_rnn_api.ipynb notebook found in the ch9 folder.
We will first discuss how to download the word vectors and then how to load only the relevant ones from the downloaded file, because the pretrained GloVe vocabulary contains around 400,000 words, whereas ours has just 18,000. Next, we will perform some elementary spelling correction on the captions, as they seem to contain many spelling mistakes. Then we will discuss how to process the cleaned data using the tf.nn.rnn_cell.LSTMCell module found in the RNN API.
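To make the idea of loading only the relevant word vectors concrete, here is a minimal sketch. It is not the notebook's code. It assumes the usual GloVe text layout (one word per line, followed by its space-separated vector components). The function name load_relevant_glove, the toy three-dimensional vectors, and the choice to give words missing from GloVe a small random vector are all illustrative assumptions.

```python
import io

import numpy as np


def load_relevant_glove(lines, vocabulary, dim):
    """Build an embedding matrix with GloVe vectors for `vocabulary` only.

    lines:      iterable of text lines, each "word v1 v2 ... v_dim"
    vocabulary: dict mapping each of our words to its row index
    dim:        dimensionality of the GloVe vectors in the file
    """
    rng = np.random.default_rng(0)
    # Assumption: words not found in GloVe keep a small random vector.
    embeddings = rng.uniform(
        -0.1, 0.1, size=(len(vocabulary), dim)
    ).astype(np.float32)
    found = set()

    for line in lines:
        parts = line.rstrip().split(" ")
        word = parts[0]
        # Skip words outside our vocabulary and malformed lines.
        if word not in vocabulary or len(parts) != dim + 1:
            continue
        embeddings[vocabulary[word]] = np.asarray(parts[1:], dtype=np.float32)
        found.add(word)

    return embeddings, found


# Toy stand-in for the downloaded GloVe file (hypothetical values).
toy_glove = io.StringIO(
    "a 0.1 0.2 0.3\n"
    "dog 0.4 0.5 0.6\n"
    "zebra 0.7 0.8 0.9\n"
)
vocabulary = {"a": 0, "dog": 1, "frisbee": 2}

embeddings, found = load_relevant_glove(toy_glove, vocabulary, dim=3)
print(embeddings.shape, sorted(found))
```

With a real download, an open file handle can be passed in place of the StringIO object. Because the file is read line by line and only matching rows are kept, memory use is set by our vocabulary of about 18,000 words, not by the roughly 400,000 words in the full GloVe file.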