Streaming engine
The Apache Spark streaming component is an integral part of the framework and does not require any specific installation or configuration. Apache Spark's in-memory capabilities make it well suited to problems that involve large-scale, real-time processing.
There are numerous articles and books on the Apache Spark streaming library. This section introduces some basic concepts in the context of machine learning algorithms.
Why streaming?
Many applications require real-time or pseudo real-time processing of data, from weather reporting, automated manufacturing, and ATMs to advertising targeting and financial market analysis. Implementing such systems is challenging because of their stringent requirements:
- Low latency: Response time is sometimes measured in milliseconds
- Continuous traffic: A never-ending stream of data
- No downtime: Fault-tolerant design to avoid loss of information
It is not uncommon for these requirements to be formalized as contractual obligations...