Generating captions
First, congratulations are in order: you made it through a whirlwind implementation of the Transformer! You probably noticed a number of common building blocks that were used in previous chapters. Since the Transformer model is complex, we saved it for this chapter and covered techniques like Bahdanau attention, custom layers, custom learning rate schedules, custom training with teacher forcing, and checkpointing earlier, so that we could cover a lot of ground quickly here. You should consider all of these building blocks an important part of your toolkit when you try to solve an NLP problem.
Without further ado, let's try and caption some images. Again, we will use a Jupyter notebook for inference so that we can quickly try out different images. All the code for inference is in the image-captioning-inference.ipynb file.
The inference code needs to load the Subword Encoder, set up masking, instantiate a ResNet50 model to extract features...
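To make those steps concrete, here is a minimal sketch of what that setup could look like. It assumes the vocabulary was saved with SubwordTextEncoder's save_to_file() under the prefix 'captions' (a placeholder; use whatever prefix your training run produced), and that the mask helpers mirror the ones defined alongside the Transformer training code. Note that in recent versions of tensorflow_datasets the encoder lives under tfds.deprecated.text, while older versions exposed it as tfds.features.text:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the subword tokenizer saved during training.
# 'captions' is a placeholder prefix -- use the one passed to save_to_file().
tokenizer = tfds.deprecated.text.SubwordTextEncoder.load_from_file('captions')

# Standard Transformer masks, matching the training-time definitions.
def create_padding_mask(seq):
    # 1.0 wherever the token is padding (id 0), shaped to broadcast
    # over the attention logits: (batch, 1, 1, seq_len)
    mask = tf.cast(tf.math.equal(seq, 0), tf.float32)
    return mask[:, tf.newaxis, tf.newaxis, :]

def create_look_ahead_mask(size):
    # Upper-triangular mask that hides future positions from the decoder
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)

# ResNet50 without its classification head; the output of the last
# convolutional block serves as the image features fed to the encoder.
base = tf.keras.applications.ResNet50(include_top=False, weights='imagenet')
feature_extractor = tf.keras.Model(base.input, base.output)

def extract_features(image_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (224, 224))
    img = tf.keras.applications.resnet50.preprocess_input(img)
    feats = feature_extractor(tf.expand_dims(img, 0))   # (1, 7, 7, 2048)
    return tf.reshape(feats, (1, -1, feats.shape[-1]))  # (1, 49, 2048)
```

Reshaping the ResNet50 output from (1, 7, 7, 2048) to (1, 49, 2048) turns the spatial grid into a sequence of 49 feature vectors, which is the form of input the Transformer encoder expects.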