In the previous chapters, we looked at several case studies that applied transfer learning to problems in computer vision and in natural language processing (NLP). Each of those problems, however, sat within a single domain. In this chapter, we will build an intelligent system that combines these two popular domains: computer vision and NLP. More specifically, we will couple an object-recognition system with machine translation to build an automated image-caption generator.
The idea of image captioning is not new. Images in diverse media sources, such as books, papers, or social media, usually need to be captioned with a proper text description to provide better meaning and context. What makes this task tough is that an image caption is typically...