Using third-party word vectors
We can also use third-party word vectors within spaCy. In this section, we'll learn how to import a third-party word vector package into spaCy. We'll use fastText's subword-based pretrained vectors from Facebook AI. You can view the list of all the available English pretrained vectors at https://fasttext.cc/docs/en/english-vectors.html.
The name of each package identifies the vector dimension, the vocabulary size, and the corpus genre that the vectors were trained on. For instance, wiki-news-300d-1M-subword.vec.zip indicates that the package contains 1 million 300-dimensional word vectors, built with subword information, that were trained on Wikipedia and news corpora.
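To make the naming convention concrete, here is a minimal sketch that splits a package name into its parts. The helper name parse_vector_package_name and the regular expression are our own illustration, not part of fastText or spaCy, and the pattern assumes names shaped like the example above (corpus, dimension, vocabulary size, optional variant, then .vec.zip):

```python
import re

# Hypothetical helper: assumes names of the form
# <corpus>-<dim>d-<vocab>[-<variant>].vec.zip
NAME_PATTERN = re.compile(
    r"^(?P<corpus>.+?)-(?P<dim>\d+)d-(?P<vocab>\d+[KM]?)"
    r"(?:-(?P<variant>\w+))?\.vec\.zip$"
)

def parse_vector_package_name(filename):
    """Return the corpus, dimension, vocabulary size, and variant encoded in the name."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized package name: {filename}")
    parts = match.groupdict()
    parts["dim"] = int(parts["dim"])  # for example, 300
    return parts

info = parse_vector_package_name("wiki-news-300d-1M-subword.vec.zip")
print(info)
```

For the package we are about to download, the dictionary holds the corpus label wiki-news, the dimension 300, the vocabulary size 1M, and the variant subword. Names that do not follow this pattern raise a ValueError rather than being guessed at.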
Let's start by downloading the vectors:
- In your terminal, type the following command. Alternatively, you can copy and paste the URL into your browser and the download should start:
$ wget https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M-subword.vec.zip
The preceding...