In this chapter, we have learned about leveraging word vectors to come up with a way to address a scenario where the classes we want to predict are not present during training. Further, we learned about Siamese networks, which learn a distance function between two images to identify images of a similar person. Finally, we learned about prototypical networks and relation networks and how they learn to perform few-shot image classification.
In the next chapter, we will learn about combining computer vision and natural language processing-based techniques to come up with ways to solve annotating an image, detecting objects in an image, and handwriting transcription.