In this section, we create the word vocabulary for the video captions. In addition to the words that appear in the captions, we add four special tokens:
eos => End of Sentence
bos => Beginning of Sentence
pad => Used when there is no word to feed; required by LSTM 2 in the initial N time steps
unk => A substitute for a word that is not included in the vocabulary
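As a minimal sketch of how such a vocabulary might be assembled, the snippet below reserves the first four ids for the special tokens and then assigns ids to the caption words. The names build_vocab and encode, the min_count threshold, the angle-bracket spelling of the tokens, and the two sample captions are assumptions made for illustration, not the book's actual code.

```python
from collections import Counter

# Special tokens from the list above; the angle-bracket spelling is an assumption.
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>", "<unk>"]


def build_vocab(captions, min_count=1):
    """Map each word to an integer id, with the special tokens first."""
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    # Sort by descending count, then alphabetically, so ids are deterministic.
    words = sorted(
        (w for w, c in counts.items() if c >= min_count),
        key=lambda w: (-counts[w], w),
    )
    word_to_id = {token: i for i, token in enumerate(SPECIAL_TOKENS)}
    for word in words:
        if word not in word_to_id:
            word_to_id[word] = len(word_to_id)
    id_to_word = {i: w for w, i in word_to_id.items()}
    return word_to_id, id_to_word


def encode(caption, word_to_id):
    """Wrap a caption in <bos>/<eos> and map out-of-vocabulary words to <unk>."""
    tokens = ["<bos>"] + caption.lower().split() + ["<eos>"]
    unk_id = word_to_id["<unk>"]
    return [word_to_id.get(token, unk_id) for token in tokens]


# Hypothetical captions, used only to exercise the functions.
captions = ["a man is cooking", "a woman is dancing"]
word_to_id, id_to_word = build_vocab(captions)
print(encode("a man is singing", word_to_id))
```

Here the word "singing" does not occur in the sample captions, so it is mapped to the id of unk. Raising min_count would push rare words into unk as well, which is one common way to keep the vocabulary small.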
LSTM 2, which takes a word as input, requires these four additional symbols. From time step N+1 onward, when we start generating the caption, we feed in the word from the previous time step, w_{t-1}. For the first word to be generated, there is no valid previous word, so we feed the dummy word <bos>, which signifies the beginning of the sentence. Similarly, when we reach the last time step, w_{t-1} is the last word of the caption. We train the model to output the final word as <...