Word2Vec is a neural-network-based natural language processing technique that uses a model called skip-gram to learn an embedded vector representation for each word. In Spark ML, a sentence of words is then converted into a single vector by averaging the vectors of its words. Let's see how this can be used by looking at a collection of sentences about animals:
- A dog was barking
- Some cows were grazing the grass
- Dogs usually bark randomly
- The cow likes grass
Using a neural network with a hidden layer (a machine learning algorithm used in many unsupervised learning applications), we can learn, given enough examples, that dog and barking are related and that cow and grass are related, in the sense that they frequently appear close to each other. This closeness is measured by probabilities. The output of Word2Vec is a vector of Double features.
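To make "appear close to each other" concrete, here is a minimal sketch in plain Scala (no Spark) that lists the (center word, context word) pairs a skip-gram model would see within a fixed window. The object name SkipGramPairs, the helpers tokenize and pairs, and the window size of 2 are all assumptions chosen for illustration; they are not part of Spark's API, and real Word2Vec goes on to train a neural network on such pairs rather than just counting them.

```scala
// Illustrative only: SkipGramPairs, tokenize and pairs are hypothetical names,
// not part of Spark. This lists co-occurring word pairs; it does not train a model.
object SkipGramPairs {

  // Lowercase the sentence and split it on whitespace.
  def tokenize(sentence: String): Seq[String] =
    sentence.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // For every word, pair it with each neighbour at most `window` positions away.
  def pairs(tokens: Seq[String], window: Int): Seq[(String, String)] =
    for {
      i <- tokens.indices
      j <- math.max(0, i - window) to math.min(tokens.length - 1, i + window)
      if j != i
    } yield (tokens(i), tokens(j))

  def main(args: Array[String]): Unit = {
    val sentences = Seq(
      "A dog was barking",
      "Some cows were grazing the grass",
      "Dogs usually bark randomly",
      "The cow likes grass"
    )

    // Count how often each (center, context) pair occurs across all sentences.
    val counts: Map[(String, String), Int] =
      sentences
        .flatMap(s => pairs(tokenize(s), window = 2))
        .groupBy(identity)
        .map { case (pair, occurrences) => pair -> occurrences.size }

    // Print the pairs involving the words discussed in the text.
    counts
      .filter { case ((center, _), _) => center == "dog" || center == "cow" }
      .foreach { case ((center, context), n) => println(s"$center -> $context: $n") }
  }
}
```

With this window, "dog" is paired with "barking" and "cow" with "grass". Note that the sketch does no stemming, so "dog" and "dogs" (or "bark" and "barking") are treated as different words; with only four sentences the counts are far too small to learn anything meaningful, which is why the text says "with enough examples".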
To use Word2Vec, you first need to import the class:
import org.apache.spark.ml.feature.Word2Vec
First, you...