Big data is composed of massive databases and millions, or even billions, of document files. One way to generate insights from these datasets is batch processing; a classical batch-processing approach is Hadoop's MapReduce paradigm. Processing time can range from minutes to hours, or even longer, depending on the size of the job. But if we want insights in real time, we are more concerned with streaming data.
Streaming data can be defined as a sequence of digitally encoded, coherent signals used to send or receive information while it is still being transmitted.
Formally, it can be defined as any ordered pair (S, Δ), where S is a sequence of tuples and Δ is a sequence of positive real-time intervals.
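To make the (S, Δ) definition concrete, here is a minimal Python sketch that models a stream as a sequence of tuples S paired with positive inter-arrival intervals Δ. The `Stream` class, the `replay` method, and the sample sensor readings are hypothetical names introduced purely for illustration, not part of any particular streaming framework.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Sequence, Tuple


@dataclass
class Stream:
    """A data stream as the ordered pair (S, Δ):
    S     - a sequence of tuples (the data items),
    delta - a sequence of positive real-time intervals (seconds between items)."""
    S: Sequence[Tuple[Any, ...]]
    delta: Sequence[float]

    def __post_init__(self) -> None:
        # Every interval in Δ must be a positive real number.
        if any(d <= 0 for d in self.delta):
            raise ValueError("all intervals in Δ must be positive")

    def replay(self) -> Iterator[Tuple[float, Tuple[Any, ...]]]:
        # Yield (arrival_time, item) pairs by accumulating the intervals.
        t = 0.0
        for item, gap in zip(self.S, self.delta):
            t += gap
            yield t, item


# Hypothetical example: three sensor readings arriving 0.5, 1.2, and 0.3 seconds apart.
readings = Stream(
    S=[("sensor-1", 21.4), ("sensor-1", 21.6), ("sensor-2", 19.9)],
    delta=[0.5, 1.2, 0.3],
)
for arrival, record in readings.replay():
    print(f"t={arrival:.1f}s  {record}")
```

The key property this sketch captures is that a stream is not just the data items themselves but also the timing between them: the same tuples with different Δ values describe different streams.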
In order to humanize the...