Understanding the Kappa Architecture
The Kappa Architecture is simpler than Lambda pattern as it comprises the Speed and Serving Layers only. All the computations occur as stream processing and there are no batch re-computations done on the full Dataset. Recomputations are only done to support changes and new requirements.
Typically, the incoming real-time data stream is processed in memory is persisted in a database or HDFS to support queries, as illustrated in the following figure:
The Kappa Architecture can be realized by using Apache Spark combined with a queuing solution, such as Apache Kafka. If the data retention times are bound to several days to weeks, then Kafka could also be used to retain the data for the limited period of time.
In the next few sections, we will introduce a few hands-on exercises using Apache Spark, Scala, and Apache Kafka that are very useful in the real-world applications development context. We will start by using Spark SQL and Structured Streaming to implement...