In this chapter, we took a deeper look at more complex text processing and explored MLlib's text feature extraction capabilities, in particular the tf-idf term weighting scheme. We covered examples of using the resulting tf-idf feature vectors to compute document similarity and to train a newsgroup topic classification model. Finally, you learned how to use MLlib's cutting-edge Word2Vec model to compute vector representations of the words in a corpus of text, and how to use the trained model to find words whose contextual meaning is similar to that of a given word. We also looked at using Word2Vec with Spark ML.
In the next chapter, we will take a look at online learning, and you will learn how Spark Streaming relates to online learning models.