Representing words with context-independent vectors
So far, we have looked at several ways of representing similarities among documents. Knowing that two or more documents are similar to each other can be useful for some applications, such as intent or document classification, but it is a coarse result: it tells us little about the individual words in those documents. In this section, we will talk about representing the meanings of words with word vectors.
Word2Vec
Word2Vec is a popular technique for representing words as vectors, published by researchers at Google in 2013 (Mikolov, Tomas; et al. (2013). Efficient Estimation of Word Representations in Vector Space. https://arxiv.org/abs/1301.3781). The basic idea behind Word2Vec is that every word in a corpus is represented by a single vector that is computed from all the contexts (nearby words) in which the word occurs. The intuition behind this approach is that words with similar meanings will occur in similar contexts. This intuition is summarized in a famous quote from the linguist...