In this chapter, we started by installing Spark, then introduced Spark in depth and looked at how it works in combination with RDDs. We walked through various ways of creating RDDs and explored the different operations available on them. We then introduced MLlib and stepped through detailed examples of decision trees and K-Means clustering in Spark. After that, we pulled off our masterstroke: creating a search engine in just a few lines of code using TF-IDF. Finally, we looked at the new features of Spark 2.0.
In the next chapter, we'll take a look at A/B testing and experimental design.