Understanding natural language processing for image captioning
As natural language has to be generated from the image, getting familiar with natural language processing (NLP) becomes important. NLP is a vast subject, and hence we will limit our scope to topics that are relevant to image captioning. One form of natural language is text. Text is a sequence of words or characters. The basic unit of text used in processing is called a token, which is a sequence of characters, typically a word. The character itself is the smallest element of text.
In order to process natural language in the form of text, the text first has to be preprocessed by removing punctuation, brackets, and so on. Then, the text has to be tokenized into words by splitting it on spaces. Finally, the words have to be converted to vectors. Next, we will see how this vector conversion can help.
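The three steps described above (preprocessing, tokenizing, and converting to vectors) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The function names, the sample caption, the lowercasing step, and the use of one-hot vectors are all assumptions made for the example, chosen because one-hot encoding is the simplest possible vector form.

```python
import string


def preprocess(text):
    """Lowercase the text and remove punctuation, including brackets.

    Lowercasing is an assumption of this sketch, not a required step.
    """
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table)


def tokenize(text):
    """Split the text into word tokens on whitespace."""
    return text.split()


def build_vocabulary(tokens):
    """Map each distinct token to an integer index, in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def one_hot(token, vocab):
    """Return a one-hot vector for the token (an illustrative choice of vector form)."""
    vector = [0] * len(vocab)
    vector[vocab[token]] = 1
    return vector


# Hypothetical caption used only for illustration.
caption = "A dog (brown) runs on the grass, chasing a ball."

tokens = tokenize(preprocess(caption))
vocab = build_vocabulary(tokens)
vectors = [one_hot(token, vocab) for token in tokens]

print(tokens)
print(len(vocab), "distinct tokens")
```

Note that a one-hot vector has as many dimensions as there are distinct words in the vocabulary, so it grows with the size of the text being processed.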
Expressing words in vector form
Expressing words in vector form makes it possible to perform arithmetic operations on them. The vector has to be compact, with fewer dimensions...