If image classification and object detection are intelligent tasks, describing an image in natural language is definitely a more challenging task that requires more intelligence – just think for a moment about how everyone grows from a newborn (who learns to recognize objects and detect their locations) to a three-year old (who learns to tell a story about a picture). The official term for the task of describing an image in natural language is image captioning. Unlike speech recognition, which has a long history of research and development, image captioning (with full natural language, not just keyword output) has only had a short but exciting history of research due to its complexityand the deep learning breakthrough in 2012.
In this chapter, we'll first review how a deep learning-based image captioning model that won the 2015...