Chapter 1: Introduction to Natural Language Processing
Activity 1: Generating word embeddings from a corpus using Word2Vec.
Solution:
- Upload the text corpus from the link mentioned earlier.
- Import word2vec from gensim.models.
from gensim.models import word2vec
- Store the corpus in a variable.
sentences = word2vec.Text8Corpus('text8')
- Fit the word2vec model on the corpus.
model = word2vec.Word2Vec(sentences, size=200)  # size is the embedding dimension; gensim 4.0+ renames it to vector_size
- Find the words most similar to 'man'.
model.wv.most_similar(['man'])
The output is as follows:
Figure 1.29: Output for similar word embeddings
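To make the ranking behind most_similar more concrete, here is a minimal sketch of the idea: words are ordered by the cosine similarity of their vectors to the query word's vector. The three-dimensional vectors, the toy_vectors dictionary, and the toy_most_similar helper are all hypothetical and invented for illustration. They are not taken from the trained model and are not part of gensim.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real model trained as above would have 200 dimensions per word.
toy_vectors = {
    "man":   [0.9, 0.1, 0.0],
    "woman": [0.8, 0.3, 0.1],
    "boy":   [0.7, 0.0, 0.4],
    "car":   [0.0, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word` (hypothetical helper)."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # the query word itself is excluded from the results
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


for neighbor, score in toy_most_similar("man", toy_vectors):
    print(neighbor, round(score, 3))
```

With these made-up vectors, 'woman' ranks first and 'car' last, because the 'woman' vector points in nearly the same direction as the 'man' vector while the 'car' vector does not.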
- 'boy' is to 'father' as 'girl' is to 'x'. Find the top 3 words for x.
model.wv.most_similar(positive=['girl', 'father'], negative=['boy'], topn=3)
The output is as follows: