Word2vec
While GloVe is a count-based model where a matrix is created for counting words, word2vec is a predictive model that uses prediction and loss adjustment to find the similarity. It works like a feed-forward neural network and is optimized using various techniques, including stochastic gradient descent (SGD), which are core concepts of machine learning. It is more useful in predicting the words from the given context words in vector representation. We will be using the implementation of word2vec from https://github.com/IsaacChanghau/Word2VecfJava. We will also need the GoogleNews-vectors-negative300.bin
file from https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing, as it contains pre-trained vectors for the GoogleNews
dataset with 300 dimensional vectors for 3 million words and phrases. The example program will find the similar word to kill. The following is the sample output:
loading embeddings and creating word2vec... [main] INFO org.nd4j.linalg.factory...