In this chapter, we discussed machine learning as a field of study and its various subcategories. We then moved on to the terminology associated with machine learning and how machine learning has been implemented in Apache Spark. We also covered the data types used in the spark.mllib package and then discussed job chaining using pipelines. The terminology around pipelines was also discussed in detail, along with use cases. Finally, operations on features were discussed.
In the following chapter, we will look into another module of Spark, GraphX. We will explore the types of GraphX RDDs and the various operations associated with them, and we will also discuss use cases for GraphX.