Apache Kafka is a high-throughput, distributed, fault-tolerant, and replicated messaging system that was first developed at LinkedIn. Its use cases range from log aggregation and stream processing to replacing other messaging systems.
Kafka has emerged as an important component of real-time processing pipelines when combined with Storm. Kafka can act as a buffer or feeder for messages that need to be processed by Storm. Kafka can also be used as the output sink for results emitted from Storm topologies.
In this chapter, we will cover the following topics:
- Kafka architecture: broker, producer, and consumer
- Installation of the Kafka cluster
- Sharing data between the Kafka producer and consumer
- Development of a Storm topology using a Kafka consumer as the Storm spout
- Deployment of a Kafka and Storm integration topology