Extensions to the word embedding algorithms
The original paper by Mikolov and others, published in 2013, discusses several extensions that can improve the performance of the word embedding learning algorithms even further. Though they were initially introduced for skip-gram, they can be extended to CBOW as well. Also, since we have already seen that CBOW outperforms the skip-gram algorithm in our example, we will use CBOW to understand all the extensions.
Using the unigram distribution for negative sampling
It has been found that negative sampling performs better when the negative samples are drawn from certain distributions rather than from the uniform distribution. One such distribution is the unigram distribution. The unigram probability of a word $w_i$ is given by the following equation:

$$U(w_i) = \frac{\text{count}(w_i)}{\sum_{j} \text{count}(w_j)}$$
Here, $\text{count}(w_i)$ is the number of times $w_i$ appears in the document. When the unigram distribution is distorted as $U(w_i)^{3/4}/Z$ for some normalization constant $Z$, it has been shown to provide better performance than both the unigram and the uniform distributions.
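To make the distortion concrete, the following is a minimal sketch of how the distorted unigram distribution can be computed and used to draw negative samples. The toy corpus, the number of negative samples `k`, and the use of NumPy here are assumptions for illustration, not part of the original text:

```python
import collections
import numpy as np

# Hypothetical tokenized corpus; in practice this is the full training text.
corpus = ["the", "cat", "sat", "on", "the", "mat", "the", "dog", "sat"]

# Unigram counts: count(w_i) for every word in the corpus.
counts = collections.Counter(corpus)
vocab = list(counts.keys())

# Raw unigram probabilities: U(w_i) = count(w_i) / sum_j count(w_j).
unigram_probs = np.array([counts[w] for w in vocab], dtype=np.float64)
unigram_probs /= unigram_probs.sum()

# Distorted unigram distribution: U(w_i)^(3/4) / Z, where Z re-normalizes
# the distorted values so they again sum to 1.
distorted_probs = unigram_probs ** 0.75
distorted_probs /= distorted_probs.sum()

# Draw k negative samples for a given (target, context) pair.
k = 5
negative_samples = np.random.choice(vocab, size=k, p=distorted_probs)
print(negative_samples)
```

Raising the unigram probabilities to the 3/4 power flattens the distribution slightly: very frequent words are sampled a little less often than their raw counts suggest, and rare words a little more often, which is why this distortion tends to yield better embeddings than sampling from the raw unigram or uniform distributions.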