Introducing Apache Kafka
In the previous section, Introducing Lambda Architecture, we mentioned that Kafka is used for stream processing. Apache Kafka is a high throughput distributed messaging system. It allows decoupling the data coming in with the data going out.
It means that multiple systems (producers) can send messages to Kafka. Kafka will then deliver these messages out to the consumers registered.
Kafka is distributed, resilient, and fault-tolerant, and has a very low latency. Kafka can scale horizontally by adding more machines to the system. It is written in Scala and Java.
Kafka is broadly used; Airbnb, Netflix, Uber, and LinkedIn use this technology.
The purpose of this chapter is not to become an expert in Kafka, but rather to familiarize you with the fundamentals of this technology. By the end of the chapter, you will be able to understand the use case developed in this chapter—streaming bitcoin transactions in a Lambda architecture.
Topics, partitions, and offsets
In order to process...