Learning word embeddings
Next, we will discuss how to learn word embeddings for the words found in the captions. First, we will preprocess the captions in order to reduce the size of the vocabulary:
def preprocess_caption(capt):
    # Strip punctuation, convert hyphens and newlines to spaces, and lowercase the caption
    capt = capt.replace('-', ' ')
    capt = capt.replace(',', '')
    capt = capt.replace('.', '')
    capt = capt.replace('"', '')
    capt = capt.replace('!', '')
    capt = capt.replace(':', '')
    capt = capt.replace('/', '')
    capt = capt.replace('?', '')
    capt = capt.replace(';', '')
    capt = capt.replace('\' ', ' ')
    capt = capt.replace('\n', ' ')
    return capt.lower()
For example, consider the following sentence:
A living room and dining room have two tables, couches, and multiple chairs.
This will be transformed to the following:
a living room and dining room have two tables couches and multiple chairs
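As a rough sketch of how this preprocessing might be used end to end (assuming the raw captions are collected in a Python list named captions, a name used here purely for illustration), we can preprocess each caption and count the words in the resulting vocabulary:

from collections import Counter

# Hypothetical list holding the raw captions of the dataset (one caption shown for brevity)
captions = [
    'A living room and dining room have two tables, couches, and multiple chairs.'
]

# Preprocess each caption and split it into word tokens
preprocessed = [preprocess_caption(capt) for capt in captions]
vocabulary = Counter(word for capt in preprocessed for word in capt.split())

print(preprocessed[0])
# a living room and dining room have two tables couches and multiple chairs
print(len(vocabulary))  # number of unique words across all captions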
Then we will use the Continuous Bag-of-Words (CBOW) model to learn the word embeddings as we did in Chapter 3, Word2vec – Learning Word Embeddings. A crucial...