So far, we have seen how to generate embeddings for words. But how can we generate an embedding for an entire document? A naive method is to compute a word vector for each word in the document and take the average of those vectors. Le and Mikolov introduced a better way of generating document embeddings than simply averaging word embeddings: two new models called PV-DM and PV-DBOW. Both of these methods add just one new vector, called the paragraph id, to the word2vec architecture. Let's see how exactly these two methods work.
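The naive averaging approach can be sketched in a few lines. Note that the word vectors and vocabulary below are made-up toy values; in practice, they would come from a trained word2vec model:

```python
import numpy as np

# Hypothetical pretrained word vectors (toy values for illustration;
# a real model would have hundreds of dimensions and a large vocabulary).
word_vectors = {
    "the": np.array([0.1, 0.3]),
    "cat": np.array([0.5, 0.2]),
    "sat": np.array([0.4, 0.6]),
}

def average_embedding(document, vectors):
    """Naive document embedding: the mean of the document's word vectors."""
    vecs = [vectors[word] for word in document if word in vectors]
    return np.mean(vecs, axis=0)

doc = ["the", "cat", "sat"]
doc_embedding = average_embedding(doc, word_vectors)
```

One obvious weakness of this method is that it ignores word order entirely: "the cat sat" and "sat the cat" produce the same embedding. This is one of the problems the paragraph vector models address.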
Doc2vec
Paragraph Vector – Distributed Memory model
PV-DM is similar to the CBOW model, where we try to predict a target word given its context words. In PV-DM, along with the word vectors, we introduce one more vector, called the paragraph vector (the paragraph id), which is unique to each document and is trained jointly with the word vectors.