Introduction to Word2Vec
An important insight in NLP is the distributional hypothesis, which states that words occurring in similar contexts tend to have similar meanings. For example, the word pairs cat and dog, temple and monk, or king and queen often appear in the same contexts, whereas pairs such as iron and monk, or car and sky, rarely do. In other words, semantically similar words tend to show up in similar contexts and with similar distributions. The linguist J. R. Firth summarized this idea memorably in the 1950s: “You shall know a word by the company it keeps” [1]. The distributional hypothesis became the theoretical foundation for many computational models of word meaning and word representation.
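The intuition can be made concrete with a toy co-occurrence count: build, for each word, the set of words that appear near it, and compare those sets. This is a minimal sketch using a hypothetical five-sentence corpus invented for illustration; real distributional models use far larger corpora and weighted counts.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (illustrative sentences, not from any real dataset):
# "cat" and "dog" are used in the same kinds of sentences; "sky" is not.
corpus = [
    "the cat chased the mouse",
    "the dog chased the mouse",
    "the cat ate the food",
    "the dog ate the food",
    "the plane flew in the sky",
]

# Count, for each word, the words appearing within a +/-2 token window.
window = 2
cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooc[word][tokens[j]] += 1

def context_overlap(a, b):
    """Number of distinct context words shared by words a and b."""
    return len(set(cooc[a]) & set(cooc[b]))

# Semantically similar words share more of their contexts.
print(context_overlap("cat", "dog"))  # 3 (shares "the", "chased", "ate")
print(context_overlap("cat", "sky"))  # 1 (shares only "the")
```

Even this crude overlap count ranks cat closer to dog than to sky, which is exactly what the distributional hypothesis predicts; dense vector models like Word2Vec learn a compressed, continuous version of the same signal.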
The distributional hypothesis paves the way for quantifying word similarity. In 2013, a Google team led by Tomas Mikolov published two milestone papers introducing Word2Vec [2] [3]. A word can be represented...