In the previous section on caption generation, we decoded greedily: at each time step, we picked the word with the highest probability. In this section, we'll improve the predicted captions by using beam search.
Generating captions using beam search
Getting ready
Beam search works as follows:
- Extract the probabilities of the candidate words in the first time step (where the VGG16 features of the picture and the start token are the input)
- Instead of taking only the most probable word as the output, we'll keep the three most probable words
- We'll proceed to the next time step, where we extract the top three words in this time step
- We'll loop through the top three predictions from the first time step, as an input to...
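The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the recipe's actual implementation: `toy_next_word_probs` is a hypothetical stand-in for the trained captioning model (which would take the VGG16 features and the words generated so far), and the vocabulary and probabilities are made up. The sketch keeps the three highest-scoring partial captions at each time step and scores them by summed log probability, which is one common way to rank beams.

```python
import math

# Hypothetical stand-in for the captioning model: given the words so far,
# return a dict of next-word probabilities. The numbers are invented.
def toy_next_word_probs(sequence):
    table = {
        "<start>": {"a": 0.5, "the": 0.4, "dog": 0.1},
        "a": {"dog": 0.4, "cat": 0.35, "<end>": 0.25},
        "the": {"dog": 0.9, "cat": 0.05, "<end>": 0.05},
        "dog": {"runs": 0.6, "<end>": 0.4},
        "cat": {"runs": 0.3, "<end>": 0.7},
        "runs": {"<end>": 1.0},
    }
    return table[sequence[-1]]

def greedy_search(next_word_probs, max_len=10):
    """Pick the single most probable word at every time step."""
    seq = ["<start>"]
    while seq[-1] != "<end>" and len(seq) < max_len:
        probs = next_word_probs(seq)
        seq.append(max(probs, key=probs.get))
    return seq

def beam_search(next_word_probs, beam_width=3, max_len=10):
    """Keep the beam_width best partial captions at every time step."""
    beams = [(0.0, ["<start>"])]  # (log probability, words so far)
    for _ in range(max_len):
        candidates = []
        for log_prob, seq in beams:
            if seq[-1] == "<end>":
                # Finished captions are carried forward unchanged.
                candidates.append((log_prob, seq))
                continue
            probs = next_word_probs(seq)
            # Take the top beam_width next words for this partial caption.
            top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
            for word, p in top[:beam_width]:
                candidates.append((log_prob + math.log(p), seq + [word]))
        # Prune to the beam_width best candidates overall.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "<end>" for _, seq in beams):
            break
    return beams

if __name__ == "__main__":
    print("greedy:", " ".join(greedy_search(toy_next_word_probs)))
    for log_prob, seq in beam_search(toy_next_word_probs):
        print(f"beam:   {' '.join(seq)}  (p={math.exp(log_prob):.4f})")
```

On this toy table, greedy decoding commits to "a" in the first step and ends with "a dog runs" (probability 0.12), while the beam keeps "the" alive and finds "the dog runs" (probability 0.216). That is the kind of improvement beam search is meant to provide: a word that is not the most probable at its own time step can still lead to a more probable caption overall.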