Comparing Word2Vec with Doc2Vec, GloVe, and fastText
There are a few techniques similar to Word2Vec, including Doc2Vec, GloVe, and fastText. As we approach the end of the chapter, let's spend some time discussing them and how they differ from Word2Vec.
Word2Vec versus Doc2Vec
Both Word2Vec and Doc2Vec are based on the distributional hypothesis. While Word2Vec focuses on learning vector representations for individual words, Doc2Vec extends Word2Vec to learn vector representations for entire documents or paragraphs. In terms of the modeling approach (https://arxiv.org/abs/1310.4546), Word2Vec takes a sequence of words as input and learns word embeddings. Doc2Vec takes a sequence of words along with an additional document ID (or label) as input. It learns document embeddings by predicting words in the context of the document ID. The document ID acts as an additional input signal, helping the model distinguish one document from another. We will learn about Doc2Vec...
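To make the document ID idea concrete, here is a minimal sketch using gensim's Doc2Vec implementation; the toy corpus, tags, and hyperparameters are illustrative choices, not values from this chapter:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# A toy corpus: each document is tagged with an ID, which plays the role
# of the extra input signal described above
docs = [
    "machine learning is fun",
    "deep learning extends machine learning",
    "doc2vec learns document embeddings",
]
tagged_docs = [
    TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate(docs)
]

# Train a small Doc2Vec model; parameters here are arbitrary for illustration
model = Doc2Vec(tagged_docs, vector_size=50, window=2, min_count=1, epochs=40)

# Look up the learned vector for a training document by its tag (document ID)
print(model.dv[0])

# Infer a vector for an unseen document
print(model.infer_vector("learning document vectors".split()))
```

The key difference from a plain Word2Vec workflow is the `TaggedDocument` wrapper: every word sequence is paired with a tag, and the model learns a vector for each tag alongside the word vectors.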