In the previous chapter, we learned about building an object detection and classification model, which was really exciting. But in this chapter, we are going to do something even more impressive by combining current state-of-the-art techniques in both computer vision and natural language processing to form a complete image description approach (https://www.cs.cmu.edu/~afarhadi/papers/sentence.pdf). This will be responsible for constructing computer-generated natural descriptions of any provided images.
Our team has been asked to build this model to generate natural language descriptions of images to be used as the core intelligence of a company that wants to help the visually impaired take advantage of the explosion of photo sharing that's done on the web. It's exciting to think that this deep learning technology could have the power to effectively...