Streaming Data with Kafka
There are several streaming platforms on the market, but Apache Kafka is the front-runner. Like Spark, Kafka is an open source project, but it focuses on being a distributed messaging system. Kafka is used in a range of applications, including microservices and data engineering. Confluent is the largest contributor to Apache Kafka and provides several products in the ecosystem, such as hosted Kafka, Schema Registry, Kafka Connect, and the Kafka REST API. We will go through several areas of Confluent's Kafka offering, focusing on data processing and movement.
In this chapter, we will cover the following main topics:
- Kafka architecture
- Setting up Confluent Kafka
- Kafka streams
- Schema Registry
- Spark and Kafka
- Kafka Connect