GloVe – Global Vectors for Word Representation
One of the main limitations of the skip-gram and CBOW algorithms is that they capture only local contextual information, because they look at a fixed-length window around each word. An important part of the puzzle is missing, as these algorithms do not use global statistics (by global statistics, we mean a way to see all the occurrences of a word in the context of every other word across the entire text corpus).
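To make this locality concrete, here is a minimal Python sketch (the function name window_pairs and the toy sentence are ours for illustration, not from the original text) of the only thing a window-based model ever sees: (target, context) pairs drawn from a fixed-length window. Any co-occurrence outside that window never produces a training pair.

```python
# A minimal sketch of window-based context extraction, as used (conceptually)
# by skip-gram/CBOW: only word pairs within a fixed-size window are generated,
# so co-occurrences outside the window are invisible to the model.
def window_pairs(tokens, window_size=2):
    """Yield (target, context) pairs from a fixed-length window."""
    for i, target in enumerate(tokens):
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                yield target, tokens[j]

corpus = "the dog barked at the mailman".split()
print(list(window_pairs(corpus, window_size=1)))
# [('the', 'dog'), ('dog', 'the'), ('dog', 'barked'), ...]
```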
However, in Chapter 3, Word2vec – Learning Word Embeddings, we already studied a structure that captures this information: the co-occurrence matrix. Let's refresh our memory of it, as GloVe uses the statistics captured in the co-occurrence matrix to compute word vectors.
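As a refresher, the following is a minimal Python sketch (the toy corpus and names such as word2id are ours for illustration; GloVe's actual construction additionally weights counts by the distance between words, which this sketch omits) of how such a matrix can be accumulated from windowed counts over a whole corpus, so that every cell holds a global statistic rather than a single window's view.

```python
# A minimal sketch of building a V x V co-occurrence matrix: entry (i, j)
# counts how often word j appears within a fixed window of word i,
# accumulated over the entire (toy) corpus.
import numpy as np

corpus = ["the dog barked at the mailman",
          "the mailman ran from the dog"]
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
word2id = {w: i for i, w in enumerate(vocab)}

V = len(vocab)
cooc = np.zeros((V, V), dtype=np.float64)

window_size = 2
for sent in tokens:
    for i, target in enumerate(sent):
        lo, hi = max(0, i - window_size), min(len(sent), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[word2id[target], word2id[sent[j]]] += 1.0

print(vocab)
print(cooc)  # symmetric V x V matrix of global co-occurrence counts
```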
Co-occurrence matrices encode the context information of words, but they require maintaining a V × V matrix, where V is the size of the vocabulary. To understand the co-occurrence matrix, let’s take...