Introduction
With the prevalence of machine-generated real-time data from sources such as IoT sensors, devices, and beacons, there is a growing need to gain insight into this fire hose of data as quickly as it is created. Whether you are detecting fraudulent transactions, spotting sensor anomalies in real time, or running sentiment analysis on the next cat video, streaming analytics is an increasingly important differentiator and business advantage.
As we progress through these recipes, we will combine the constructs of batch and real-time processing to create continuous applications. With Apache Spark, data scientists and data engineers can analyze their data using Spark SQL in batch and in real time, train machine learning models with MLlib, and score these models via Spark Streaming.
An important reason for the rapid adoption of Apache Spark is that it unifies all of these disparate data processing paradigms (machine learning via ML and MLlib, Spark SQL, and...