Introduction
MLlib is the machine learning (ML) library that is provided with Apache Spark, the in-memory, cluster-based, open source data processing system. In this chapter, I will examine the functionality of algorithms provided within the MLlib library in terms of areas of machine learning tasks such as classification, recommendation, and neural processing. For each algorithm, we'll provide working examples that tackle real problems. We will take a step-by-step approach in describing how the following algorithms can be used, and what they are capable of doing.
Big data and machine learning takes place in three steps-collect, analyze and predict. For this purpose, the Spark ecosystem supports a wide range of workloads, including batch applications, iterative algorithms, interactive queries, and stream processing. The Spark MLlib component offers a variety of ML algorithms which are scalable.