The skip-gram algorithm
The first algorithm we will talk about is known as the skip-gram algorithm: a type of Word2vec algorithm. As we have discussed in numerous places, the meaning of a word can be inferred from the contextual words surrounding it. However, it is not entirely straightforward to develop a model that exploits this way of learning word meanings. The skip-gram algorithm, introduced by Mikolov et al. in 2013, exploits the context surrounding each word in a written text to learn good word embeddings.
Let’s go through the skip-gram algorithm step by step. First, we will discuss the data preparation process. Understanding the format of the data puts us in a great position to understand the algorithm. We will then discuss the algorithm itself. Finally, we will implement the algorithm using TensorFlow.
From raw text to semi-structured text
First, we need to design a mechanism to extract a dataset that can be fed to our learning model. Such...
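As a preview of this data preparation step, here is a minimal sketch of how target-context pairs might be extracted from tokenized text. The function name, the whitespace tokenization, and the window size of 2 are illustrative assumptions, not the book's final implementation.

```python
# Sketch: extract (target, context) pairs for skip-gram training.
# Tokenization and window size here are simplifying assumptions.
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for each token and its neighbors."""
    pairs = []
    for i, target in enumerate(tokens):
        # Context words are the neighbors within `window` positions
        # to the left and right of the target word.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

text = "the dog barked at the mailman"
print(skipgram_pairs(text.split(), window=1))
```

Each word in the text becomes a target, paired with every word inside its context window; these pairs form the semi-structured dataset the model learns from.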