Introduction
Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. Many successful applications of machine learning already exist, including systems that analyze past sales data to predict customer behavior, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data. With the advent of big data, maintaining large collections of data is one challenge, but extracting useful information from these collections is an even greater one. An ML system should therefore be able to scale to high volumes of data, and the models it builds are also expected to be highly accurate, since training takes place on large data sets.
Machine learning on big data takes place in three steps: collecting, analyzing, and predicting. For this purpose, the Spark ecosystem supports a wide range of workloads including batch applications, iterative algorithms, interactive queries, and stream processing...