In the big data world, Kafka can be used in several ways. One common pattern is to use it as a streaming data platform: it stores streaming data from varied sources, and that data can later be processed in real time or in batch.
The following diagram shows a typical pattern for using Kafka as a streaming data platform:
Kafka as a streaming data platform
The previous diagram depicts how Kafka can store events from a variety of data sources. The data ingestion mechanism differs depending on the type of data source. However, once data is stored in Kafka topics, it can feed data search engines, real-time processing, alerting, and even batch processing.
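The idea that stored events can serve several independent uses can be illustrated with a small in-memory model. This is only a sketch and does not use the Kafka client API: the `InMemoryTopic` class, the group names `alerting` and `batch-export`, and the event fields are all hypothetical, chosen to show that each consumer group tracks its own read position over the same retained log.

```python
from collections import defaultdict


class InMemoryTopic:
    """Toy stand-in for a Kafka topic: an append-only log plus per-group offsets.

    Hypothetical illustration only; real Kafka topics are partitioned,
    replicated, and accessed through producer and consumer clients.
    """

    def __init__(self, name):
        self.name = name
        self._log = []                     # retained events, in arrival order
        self._offsets = defaultdict(int)   # consumer group -> next offset to read

    def append(self, event):
        """Store an event and return the offset it was written at."""
        self._log.append(event)
        return len(self._log) - 1

    def poll(self, group, max_records=100):
        """Return up to max_records unread events for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


topic = InMemoryTopic("events")

# Two hypothetical sources write to the same topic.
topic.append({"source": "web", "action": "click"})
topic.append({"source": "db", "action": "update"})
topic.append({"source": "web", "action": "error"})

# A real-time alerting consumer reads the events and filters for errors.
alerts = [e for e in topic.poll("alerting", max_records=10) if e["action"] == "error"]

# A batch consumer later reads the same retained events from its own offset.
batch = topic.poll("batch-export")
```

Because reading does not remove events from the log, the alerting consumer and the batch consumer each see all three events, and neither affects the other's position.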
Batch processing engines, such as Gobblin, read data from Kafka and use Hadoop MapReduce to store data in Hadoop...