In the previous section, we discussed how measuring document similarity is one of the major use cases of Word2vec. Think of a problem statement, such as one where we are building an engine that can rank resumes based on their relevance to a job description. Here, we ideally need to figure out the distance between the job description and the set of resumes. The smaller the distance between the resume and the job description, the higher the relevance of the resume to the job description.
One measure we discussed in Chapter 4, Transforming Text into Data Structures, was to use cosine similarity to find how close or far text documents are to one another or how far removed they are from one another. In this section, we will discuss another measure, Word Mover's Distance (WMD), which is more relevant than cosine similarity, especially when we base the distance measure for documents on word embeddings.
Kusner et al. devised the WMD algorithm. They define the...