Exploring CBOW
The continuous bag-of-words (CBOW) model forms part of Word2Vec, a model created by Google to obtain vector representations of words. By training these models on a very large corpus, we obtain word vectors that capture the semantic and contextual similarity of words to one another. The Word2Vec model consists of two main components:
- CBOW: This model attempts to predict the target word in a document, given the surrounding words.
- Skip-gram: This is the opposite of CBOW; this model attempts to predict the surrounding words, given the target word.
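To make the difference between the two components concrete, the following is a minimal sketch of how training examples for each could be built from a tokenized sentence. The function names `cbow_pairs` and `skipgram_pairs`, the `window` parameter, and the example sentence are illustrative assumptions rather than part of any Word2Vec library; the sketch only shows the direction of prediction, not the training itself.

```python
def cbow_pairs(tokens, window=2):
    """Build (context_words, target_word) examples: context predicts target.

    `window` (an assumed parameter) is how many words on each side of the
    target are treated as context.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # words before the target
        right = tokens[i + 1:i + 1 + window]     # words after the target
        context = left + right
        if context:                              # skip targets with no context
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Build (target_word, context_word) examples: target predicts context."""
    pairs = []
    for context, target in cbow_pairs(tokens, window):
        for context_word in context:
            pairs.append((target, context_word))
    return pairs


# Hypothetical example sentence, split on whitespace for simplicity.
tokens = "the cat sat on the mat".split()
cbow = cbow_pairs(tokens, window=1)
skipgram = skipgram_pairs(tokens, window=1)
```

With a window of one word on each side, the second CBOW example pairs the context `["the", "sat"]` with the target `"cat"`, while skip-gram turns that same position into two examples, `("cat", "the")` and `("cat", "sat")`.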
Since these models perform similar tasks, we will focus on just one for now, specifically CBOW. This model aims to predict a word (the target word), given the other words around it (known as the context words). One way of accounting for context words could be as simple as using the word directly before the target word in the sentence to predict the target word, whereas...