Exploring the embedding space with Gensim
Let us reload the Word2Vec model we just built and explore it using the Gensim API. The actual word vectors can be accessed as a custom Gensim class from the model’s wv attribute:
from gensim.models import KeyedVectors
# Reload the model saved earlier
model = KeyedVectors.load("data/text8-word2vec.bin")
# The trained word vectors are exposed through the wv attribute
word_vectors = model.wv
We can look at the first few words in the vocabulary and check whether specific words are present:
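# .vocab is the Gensim 3.x attribute (key_to_index in Gensim 4.0+)
words = word_vectors.vocab.keys()
# Print the first 10 words in the vocabulary
print([x for i, x in enumerate(words) if i < 10])
# Confirm that a specific word is in the vocabulary
assert "king" in words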
The preceding snippet of code produces the following output:
['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against']
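If the vocabulary lookup has to run under both Gensim 3.x and Gensim 4.x (where the vocab attribute is replaced by key_to_index), it can be wrapped in a small helper. The following is a minimal sketch under that assumption, not part of the Gensim API: vocabulary_words is a hypothetical name, and the SimpleNamespace objects are stand-ins for real KeyedVectors instances so that the snippet runs without a trained model.

```python
from types import SimpleNamespace


def vocabulary_words(word_vectors):
    """Return the vocabulary of a KeyedVectors-like object as a list.

    Hypothetical helper for illustration; not a Gensim function.
    """
    # Gensim 4.x: key_to_index maps each word to its row index.
    mapping = getattr(word_vectors, "key_to_index", None)
    if mapping is None:
        # Gensim 3.x: vocab maps each word to a Vocab object.
        mapping = word_vectors.vocab
    return list(mapping)


# Stand-ins for real KeyedVectors objects (assumed shapes only).
gensim4_like = SimpleNamespace(key_to_index={"king": 0, "queen": 1})
gensim3_like = SimpleNamespace(vocab={"king": object(), "queen": object()})

# Same checks as in the text: first few words, then membership.
print(vocabulary_words(gensim4_like)[:10])
print("king" in vocabulary_words(gensim3_like))
```

With a real model, the same call would be vocabulary_words(word_vectors), and the membership check becomes "king" in vocabulary_words(word_vectors).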
We can look for words similar to a given word (“king”) as follows:
def print_most_similar(word_conf_pairs...