We know how important vector representations of documents are: in all kinds of clustering or classification tasks, for example, we have to represent our document as a vector. In fact, throughout most of this book we have either looked at techniques that use vector representations or worked on building these representations – topic modeling, TF-IDF, and bag of words are some of the representations we looked at previously.
Building on Word2Vec, the kind researchers have also implemented a vector representation of documents or paragraphs, popularly called Doc2Vec. This means that we can now use the semantic understanding captured by Word2Vec to describe documents as well, in however many dimensions we would like to train them!
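To make this concrete, here is a minimal sketch of training Doc2Vec with Gensim; the toy corpus, the tags, and the parameter values are illustrative assumptions rather than recommendations.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# A hypothetical toy corpus; any list of tokenized documents will do.
documents = [
    ["topic", "modeling", "finds", "themes", "in", "documents"],
    ["tf", "idf", "weights", "words", "by", "importance"],
    ["word", "vectors", "capture", "semantic", "similarity"],
]

# Doc2Vec expects each document wrapped in a TaggedDocument with a unique tag.
tagged_docs = [TaggedDocument(words=doc, tags=[i]) for i, doc in enumerate(documents)]

# vector_size controls the dimensionality of the document vectors (illustrative values).
model = Doc2Vec(tagged_docs, vector_size=50, window=2, min_count=1, epochs=40)

# The learned vector for the first training document
# (in older Gensim versions this is model.docvecs[0]).
print(model.dv[0])

# Unseen documents can be embedded with infer_vector.
print(model.infer_vector(["clustering", "documents", "by", "topic"]))
```

Once trained, infer_vector lets us embed documents the model has never seen, so they can be fed into the same clustering or classification pipelines mentioned above.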
Previous methods of using Word2Vec information for documents involved simply averaging the word vectors of that document, but averaging ignores word order and does not learn a vector for the document itself.
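For comparison, the sketch below shows that averaging baseline, assuming a Gensim Word2Vec model trained on a toy corpus; the average_vector helper is a hypothetical name used only for illustration.

```python
import numpy as np
from gensim.models import Word2Vec

sentences = [
    ["topic", "modeling", "finds", "themes", "in", "documents"],
    ["word", "vectors", "capture", "semantic", "similarity"],
]

# Train a small Word2Vec model on the toy corpus (illustrative parameters).
w2v = Word2Vec(sentences, vector_size=50, min_count=1)

def average_vector(tokens, model):
    """Represent a document as the mean of its in-vocabulary word vectors."""
    vectors = [model.wv[token] for token in tokens if token in model.wv]
    if not vectors:
        return np.zeros(model.vector_size)
    return np.mean(vectors, axis=0)

# Every document collapses to a single averaged vector, regardless of word order.
doc_vector = average_vector(["clustering", "documents", "by", "topic"], w2v)
print(doc_vector.shape)  # (50,)
```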