The skip-gram algorithm
The first algorithm we will discuss is the skip-gram algorithm. Introduced by Mikolov and others in 2013, it exploits the context of words in written text to learn good word embeddings. Let's go through it step by step.
First, we will discuss the data preparation process, followed by an introduction to the notation required to understand the algorithm. Finally, we will discuss the algorithm itself.
As we have discussed in several places, the meaning of a word can be inferred from the contextual words surrounding it. However, it is not entirely straightforward to develop a model that exploits this property to learn word meanings.
From raw text to structured data
First, we need to design a mechanism to extract a dataset that can be fed to our learning model. Such a dataset should be a set of tuples of the format (input, output). Moreover, this needs to...