In this recipe, we will explore Word2Vec, which is Spark's tool for assessing word similarity. The Word2Vec algorithm is inspired by the distributional hypothesis in general linguistics. At the core, what it tries to say is that the tokens which occur in the same context (that is, distance from the target) tend to support the same primitive concept/meaning.
The Word2Vec algorithm was invented by a team of researchers at Google. Please refer to a white paper mentioned in the There's more... section of this recipe which describes Word2Vec in more detail.