Introduction to Skip-Gram (SG)
In the SG architecture, the objective is to predict the context words given a target word. Training slides a fixed-size window over the text corpus: each word in turn is taken as the target, and the words that fall within the window around it are treated as positive (target, context) training examples. The SG model aims to maximize the probability of predicting the context words given the target word.
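In the notation of the original word2vec paper (Mikolov et al., 2013), this objective is the average log probability

\[
\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\]

where \(T\) is the number of words in the corpus and \(c\) is the window size.

To make the window-sliding step concrete, here is a minimal Python sketch of training-pair generation; the function name `skipgram_pairs` and the toy sentence are ours for illustration, not from a particular library:

```python
def skipgram_pairs(tokens, window=2):
    """Slide a fixed-size window over the corpus and emit (target, context) pairs."""
    pairs = []
    for i, target in enumerate(tokens):
        # Context words: up to `window` words on each side of the target.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the quick brown fox jumps".split()
print(skipgram_pairs(tokens, window=2))
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...]
```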
The SG model is a simple neural network with an input layer, a hidden layer, and an output layer. After training, we are not interested in the output layer itself; what we keep are the weights of the hidden layer, which become the word embeddings, or word vectors. Figure 7.4 shows an SG neural network. w(t) in the input layer is the word to be converted to a vector. w(t-2) and w(t-1) in the output layer are the two words before w(t), and w(t+1) and w(t+2) are the two words after w(t).
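To see why the hidden-layer weights are the embeddings, consider the following NumPy sketch (the sizes are illustrative, not taken from the chapter). The input word is encoded as a one-hot vector, and multiplying a one-hot vector by the input-to-hidden weight matrix simply selects one of its rows, so that row is the target word's vector:

```python
import numpy as np

vocab_size, embed_dim = 10_000, 300      # illustrative sizes
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(vocab_size, embed_dim))  # input -> hidden weights
W_output = rng.normal(size=(embed_dim, vocab_size))  # hidden -> output weights

word_index = 42  # index of w(t) in the vocabulary

# A one-hot input times W_hidden just selects one row, so the hidden-layer
# activation *is* row `word_index` of W_hidden: the word's embedding.
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0
hidden = one_hot @ W_hidden
assert np.allclose(hidden, W_hidden[word_index])

# The output layer scores every vocabulary word as a context candidate;
# a softmax over the scores gives p(context word | target word).
scores = hidden @ W_output
probs = np.exp(scores - scores.max())
probs /= probs.sum()
```

In practice the one-hot multiplication is never performed; implementations look up the row directly, which is why the hidden layer has no activation function and the embedding lookup is cheap.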