Summary
In this chapter, we traced the development from Word2Vec to Doc2Vec. Doc2Vec is based on the idea that any document, whether a sentence, a paragraph, or an entire article, can be represented by a single vector, known as a paragraph vector. We covered the two variants of the Doc2Vec model architecture: the PV-DBOW (Distributed Bag of Words) model and the PV-DM (Distributed Memory) model. We also built Doc2Vec models and examined their outcomes.
In the next few chapters, we will explore another milestone in NLP: Latent Dirichlet Allocation (LDA) for topic modeling. We will first examine the Dirichlet distribution and then study the model itself.