Chapter 9. Applications of LSTM – Image Caption Generation
In the previous chapter, we saw how we can use LSTMs to generate text. In this chapter, we will use an LSTM to solve a more complex task: generating suitable captions for given images. The task is more complex because solving it involves several subtasks: training or using a CNN to produce encoded vectors of images, learning word embeddings, and training an LSTM to generate captions. It is therefore not as straightforward as the text generation task, where we simply input text and output text in a sequential manner.
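To make the flow between these subtasks concrete, here is a minimal NumPy sketch of how an image vector, word embeddings, and an LSTM might fit together at caption-generation time. It is an illustration under stated assumptions, not the implementation developed in this chapter: the "CNN" is replaced by a fixed random projection, all weights are random and untrained, and the vocabulary, sizes, and function names (encode_image, lstm_step, generate_caption) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and sizes, chosen only for illustration.
VOCAB = ["<start>", "a", "cat", "on", "mat", "<end>"]
IMG_DIM, EMB_DIM, HID_DIM = 8, 4, 5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Subtask 1: encode the image as a vector.
# Assumption: a random projection stands in for a real (pretrained) CNN.
W_cnn = rng.normal(size=(16, IMG_DIM))

def encode_image(image):
    return np.tanh(image.reshape(-1) @ W_cnn)

# Subtask 2: word embeddings (random here; learned in practice).
embeddings = rng.normal(size=(len(VOCAB), EMB_DIM))

# Subtask 3: an LSTM that emits one word per step (untrained weights).
W = rng.normal(scale=0.1, size=(EMB_DIM + HID_DIM, 4 * HID_DIM))
b = np.zeros(4 * HID_DIM)
W_init = rng.normal(scale=0.1, size=(IMG_DIM, HID_DIM))    # image vector -> initial state
W_out = rng.normal(scale=0.1, size=(HID_DIM, len(VOCAB)))  # hidden state -> word scores

def lstm_step(x, h, c):
    # Compute all four gate pre-activations at once, then split them.
    z = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def generate_caption(image, max_len=10):
    # The image vector initializes the hidden state; words are then
    # generated greedily until "<end>" or max_len is reached.
    h = np.tanh(encode_image(image) @ W_init)
    c = np.zeros(HID_DIM)
    word = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        h, c = lstm_step(embeddings[word], h, c)
        word = int(np.argmax(h @ W_out))
        if VOCAB[word] == "<end>":
            break
        caption.append(VOCAB[word])
    return caption

image = rng.random((4, 4))  # stand-in for a real image
caption = generate_caption(image)
print(caption)
```

Because nothing here is trained, the printed caption is meaningless; the point is only the data flow: image to vector, vector to initial LSTM state, then one embedded word in and one predicted word out per step.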
Automated image captioning, or image annotation, has a wide variety of applications. One of the most prominent applications is image retrieval in search engines. Automated image captioning can be used to retrieve all the images belonging to a certain concept (for example, a cat) as per the user's request. Another application can be in social media, where, when an image is uploaded...