Global Vectors (GloVe) was developed by the Stanford NLP group in 2014 as a count-based follow-up to Word2vec. GloVe was designed to preserve the analogies framework used by Word2vec, but instead applies dimensionality reduction to global corpus statistics, preserving key statistical information about the words themselves. Unlike Word2vec, which learns by streaming over sentences, GloVe learns embeddings from a co-occurrence matrix built over the entire corpus. The co-occurrence matrix is a global store of semantic information, and is key to the GloVe algorithm. The creators of GloVe developed it on the principle that ratios of co-occurrence probabilities between words are closely related to meaning.
So how does it work, and how is it different from Word2vec? GloVe creates word embeddings by means of the following:
- Iterating over each sentence in the corpus, word by word
- For each word, looking at the context words that fall within a fixed-size window around it, and incrementing the corresponding counts in the co-occurrence matrix
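The matrix-building steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the reference implementation: the function name, the toy corpus, and the window size are assumptions, and the 1/distance weighting (nearer context words count more) follows the scheme described in the original GloVe paper.

```python
from collections import defaultdict

def build_cooccurrence(sentences, window_size=2):
    """Count how often each pair of words co-occurs within `window_size`
    positions, weighting closer neighbors more heavily (1/distance)."""
    counts = defaultdict(float)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            # Look only to the right of the current word; record the
            # count symmetrically so the matrix covers both orderings.
            for j in range(i + 1, min(i + 1 + window_size, len(sentence))):
                weight = 1.0 / (j - i)  # decay with distance
                counts[(word, sentence[j])] += weight
                counts[(sentence[j], word)] += weight
    return counts

# Toy corpus (illustrative only)
corpus = [["the", "cat", "sat", "on", "the", "mat"]]
cooc = build_cooccurrence(corpus, window_size=2)
```

In a real pipeline this dictionary would be stored as a sparse matrix, since most word pairs never co-occur; GloVe then fits word vectors to the logarithms of these counts rather than streaming over sentences again.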