Introduction
Life is not discrete; it is a continuous flow. The first four chapters focused on building data pipelines that manipulate each message individually. But what happens when we need to find a pattern or make a calculation over a subset of messages?
In the data world, the stream is the most important abstraction. A stream depicts a continuously updating and unbounded data set, where unbounded means of unlimited size. By definition, a stream is a fault-tolerant, replayable, and ordered sequence of immutable data records, where a data record is defined as a key-value pair.
Before we proceed, some concepts need to be defined:
- Stream processing application: Any program that makes use of the Kafka Streams library is known as a stream processing application.
- Processor topology: The topology that defines the computational logic of the data processing that a stream processing application needs to perform. A topology is a graph of stream processors (nodes) connected by streams...
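The two concepts above can be sketched together: a stream processing application builds a processor topology with `StreamsBuilder`, wiring a source node to a processor node to a sink node. The topic names below (`input-topic`, `output-topic`) and the uppercasing step are illustrative assumptions, not something prescribed by the library:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

public class TopologySketch {

    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Source node: consume key-value records from a topic
        // ("input-topic" is a hypothetical name for this sketch).
        KStream<String, String> source = builder.stream("input-topic");

        // Processor node: transform each record's value individually.
        KStream<String, String> upperCased =
                source.mapValues(value -> value.toUpperCase());

        // Sink node: write the transformed records to another topic.
        upperCased.to("output-topic");

        // The builder assembles these nodes into a processor topology.
        return builder.build();
    }

    public static void main(String[] args) {
        // Printing the description shows the graph of source,
        // processor, and sink nodes connected by streams.
        System.out.println(buildTopology().describe());
    }
}
```

Calling `Topology#describe()` is a convenient way to inspect the node graph before handing the topology to a `KafkaStreams` instance for execution.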