Traditional batch applications typically ran for hours, processing all or most of the data stored in relational databases. More recently, Hadoop-based systems have supported MapReduce batch jobs that process very large volumes of distributed data. In contrast, stream processing operates on streaming data that is continuously generated. Such processing is used in a wide variety of analytics applications that compute correlations between events, aggregate values, sample incoming data, and so on.
Stream processing typically ingests a sequence of data items and incrementally computes statistics and other functions on the fly, either record by record (event by event) or over sliding time windows.
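As a minimal sketch of this idea, the following Python generator computes a moving average incrementally: each arriving record updates a running sum in constant time, and records older than the window fall away. The window size and the choice of statistic are illustrative, not drawn from any particular streaming framework.

```python
from collections import deque

def sliding_mean(stream, window=3):
    """Incrementally compute a moving average over a sliding window.

    Each incoming record updates the running sum in O(1); records
    older than the window are evicted as new ones arrive, and a
    result is emitted per event rather than after a full batch.
    """
    buf = deque()
    total = 0.0
    for value in stream:
        buf.append(value)
        total += value
        if len(buf) > window:
            total -= buf.popleft()
        yield total / len(buf)

# One output per incoming event, computed on the fly:
print(list(sliding_mean([1, 2, 3, 4, 5], window=3)))
# → [1.0, 1.5, 2.0, 3.0, 4.0]
```

The same per-event pattern generalizes to counts, sums, or correlations; the key property is that state is bounded by the window, not by the total volume of data seen.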
Increasingly, streaming data applications are applying machine learning algorithms and Complex Event Processing (CEP) algorithms to provide...