Captions generated for test images
Let's see what sort of captions are generated for the test images.
After 100 steps, the only thing our model has learned is that a caption starts with an SOS token, followed by a few words and then a run of EOS tokens (see Figure 9.11).
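When reading raw model outputs like these, a decoded sequence is usually trimmed before display: the leading SOS token is dropped and everything from the first EOS onward is discarded. The sketch below assumes each caption is a list of string tokens and that the markers are literally `SOS` and `EOS`. `clean_caption` is a hypothetical helper for illustration, not code from this chapter.

```python
# Hypothetical marker strings; the actual vocabulary may spell them differently.
SOS, EOS = "SOS", "EOS"

def clean_caption(tokens):
    """Drop a leading SOS token and cut the sequence at the first EOS token."""
    if tokens and tokens[0] == SOS:
        tokens = tokens[1:]
    if EOS in tokens:
        tokens = tokens[:tokens.index(EOS)]
    return " ".join(tokens)

# An output shaped like the early-training captions described above.
raw = ["SOS", "a", "man", "EOS", "EOS", "EOS"]
print(clean_caption(raw))  # a man
```

Trimming at the first EOS, rather than removing every EOS, also discards any words the model emits after it has already signalled the end of the caption.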
After 1,000 steps, our model generates somewhat meaningful phrases and correctly recognizes objects in some images (for example, a man holding a tennis racket, shown in Figure 9.12). However, the captions tend to be short and vague, and several images are described incorrectly.
After 2,000 steps, our model has become quite good at generating expressive, grammatically correct phrases (see Figure 9.13). Images are no longer described with the short, vague phrases we saw at step 1,000.
After 5,000 steps, our model now recognizes most of the images correctly...