There are several approaches to captioning images. Early methods constructed a sentence from the objects and attributes detected in the image. Later, recurrent neural networks (RNNs) were used to generate sentences. The most accurate methods use an attention mechanism. Let's explore these techniques and their results in detail in this section.
Image captioning approaches
Conditional random field
An early method used a conditional random field (CRF) to construct a sentence from the objects and attributes detected in the image. The steps involved in this process are shown as follows:
System flow for an example image (Source: http://www.tamaraberg.com/papers/generation_cvpr11.pdf)
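To make the idea concrete, the following is a minimal sketch of the general pattern: score candidate labels for detected regions, pick the jointly best labeling, and fill a sentence template. It is not the model from the cited paper. The scores, the label sets, the helper names `best_labeling` and `to_sentence`, and the fixed preposition are all assumptions made for illustration, and exhaustive search stands in for proper CRF inference.

```python
from itertools import product

# Hypothetical detector confidences (unary scores) for two image regions.
unary = [
    {"dog": 0.7, "cat": 0.3},      # region 0: candidate objects
    {"sofa": 0.6, "grass": 0.4},   # region 1: candidate scene elements
]

# Hypothetical pairwise compatibility scores between the two labels,
# for example how often the two words co-occur in text.
pairwise = {
    ("dog", "sofa"): 0.2,
    ("dog", "grass"): 0.9,
    ("cat", "sofa"): 0.8,
    ("cat", "grass"): 0.3,
}


def best_labeling(unary, pairwise):
    """Exhaustively search for the labeling with the highest total score.

    The total is the sum of the unary scores plus the pairwise score.
    Brute force is only feasible for a toy problem like this one.
    """
    best, best_score = None, float("-inf")
    for labels in product(*(u.keys() for u in unary)):
        score = sum(u[label] for u, label in zip(unary, labels))
        score += pairwise.get(labels, 0.0)
        if score > best_score:
            best, best_score = labels, score
    return best, best_score


def to_sentence(labels, preposition="on"):
    """Fill a fixed template with the chosen labels (the template is an assumption)."""
    subject, place = labels
    return f"There is a {subject} {preposition} the {place}."


labels, score = best_labeling(unary, pairwise)
print(to_sentence(labels))  # There is a dog on the grass.
```

Note that the pairwise term changes the outcome: taken alone, the detector prefers "sofa" for the second region, but the strong compatibility between "dog" and "grass" makes that pair the best joint choice.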
CRF has limited...