Using spaCy's pretrained vectors
We installed a medium-sized English spaCy language model in Chapter 1, Getting Started with spaCy, so we can use word vectors directly. Word vectors come bundled with many spaCy language models. For instance, the en_core_web_md model ships with 300-dimensional vectors for 20,000 words, while the en_core_web_lg model ships with 300-dimensional vectors for a vocabulary of 685,000 words.
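If you want to check how many vectors a particular model bundles, the vector table exposes its size programmatically. The following is a minimal sketch, assuming en_core_web_md is installed as in Chapter 1:

import spacy

nlp = spacy.load("en_core_web_md")

# The vector table reports its size as (number of vectors, dimensions)
n_vectors, n_dims = nlp.vocab.vectors.shape
print(n_vectors, n_dims)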
Typically, small models (those whose names end in sm) do not include any word vectors; instead, they include context-sensitive tensors. You can still make the following semantic similarity calculations, but the results won't be as accurate as computations with real word vectors.
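To see the difference for yourself, you can check whether each token is backed by a static word vector via the has_vector attribute. The following sketch assumes both en_core_web_sm and en_core_web_md are installed; the exact fallback behavior can vary with the model version:

import spacy

for model_name in ("en_core_web_sm", "en_core_web_md"):
    nlp = spacy.load(model_name)
    doc = nlp("I ate a banana.")
    # has_vector is True only when a static word vector backs the token
    print(model_name, [(token.text, token.has_vector) for token in doc])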
You can access a word's vector via the token.vector attribute. Let's look at this attribute in an example. The following code queries the word vector for banana:
import spacy

nlp = spacy.load("en_core_web_md")
doc = nlp("I ate a banana.")
doc[3].vector
The following screenshot...