As discussed in Chapter 9, Developing Applications with Spark SQL, the standard approach to statistical language modeling is based on counting the frequency of n-gram occurrences, which typically requires very large training corpora in real-world use cases. Additionally, n-grams treat each word as an independent unit, so they cannot generalize across semantically related sequences of words. In contrast, neural language models associate each word with a vector of real-valued features, so semantically related words end up close to each other in that vector space. Learning word vectors also works very well when the word sequences come from a large corpus of real text. These word vectors are composed of learned features that the neural network discovers automatically.
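The idea of "closeness" in the vector space is usually measured with cosine similarity. The following sketch uses toy, hand-picked 3-dimensional vectors (purely illustrative values, not learned by any real model) to show how related words score higher than unrelated ones:

```python
import math

# Toy 3-dimensional word vectors (illustrative, hand-picked values;
# a real model learns vectors with hundreds of dimensions from a corpus).
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Semantically related words sit close together in the space...
print(cosine_similarity(vectors["king"], vectors["queen"]))
# ...while unrelated words are farther apart.
print(cosine_similarity(vectors["king"], vectors["apple"]))
```

Because the features are real-valued rather than discrete counts, similarity is graded: the model can generalize from "king" to "queen" even if a particular phrase containing "queen" never appeared in the training data.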
Vector representations...