Using word embeddings
In this recipe, we will switch gears and learn how to represent words using word embeddings, which are powerful because they result from training a neural network that predicts a word from the words surrounding it. Embeddings are also vectors, but much smaller ones, usually with 200 or 300 dimensions. Words that occur in similar contexts end up with similar embedding vectors. Similarity is usually measured as the cosine of the angle between two vectors in this 200- or 300-dimensional space. We will use the embeddings to show these similarities.
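To make the similarity measure concrete, here is a minimal sketch of cosine similarity computed with NumPy; the vectors, their values, and the function name are made up for illustration and are not taken from a real embedding model:

import numpy as np

def cosine_similarity(vec_a, vec_b):
    # Cosine of the angle between two vectors:
    # dot product divided by the product of their lengths
    return float(np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

# Two made-up 300-dimensional "embeddings", purely for illustration
rng = np.random.default_rng(0)
word_vector_1 = rng.random(300)
word_vector_2 = rng.random(300)
print(cosine_similarity(word_vector_1, word_vector_2))  # a value between -1 and 1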
Getting ready
In this recipe, we will use a pretrained word2vec model, which can be found at https://github.com/mmihaltz/word2vec-GoogleNews-vectors. Download the model and unzip it in the data directory. You should now have the file at the following path: …/data/GoogleNews-vectors-negative300.bin.gz.
We will also use the gensim package to load and use the model. It should be installed; if it is not, you can install it with pip install gensim.
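As a preview, loading the pretrained vectors with gensim might look like the following minimal sketch; it assumes the model file sits in the data directory as described above, and the model_path variable name and the example word are our own choices for illustration:

import gensim

# Path to the pretrained Google News vectors downloaded above
# (adjust it to match your own data directory)
model_path = "data/GoogleNews-vectors-negative300.bin.gz"

# gensim can read the gzipped binary file directly
model = gensim.models.KeyedVectors.load_word2vec_format(model_path, binary=True)

# Words that appear in similar contexts get similar vectors
print(model.most_similar("computer", topn=5))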