Google provides a large, pretrained Word2Vec model with around 3 million 300-dimension English word vectors. It is large enough, and pretrained to display promising results. We will use Google vectors as our input word vectors for the evaluation. You will need at least 8 GB of RAM to run this example. In this recipe, we will import the Google News vectors and then perform an evaluation.
Importing Google News vectors
How to do it...
- Import the Google News vectors:
File file = new File("GoogleNews-vectors-negative300.bin.gz");
Word2Vec model = WordVectorSerializer.readWord2VecModel(file);
- Run an evaluation on the Google News vectors:
model.wordsNearest("season",10))