Getting started with Kafka
Kafka is a popular open source platform for building real-time data pipelines and streaming applications. In this section, we will learn how to get a basic Kafka environment running locally using docker-compose so that you can start building Kafka producers and consumers.
docker-compose is a tool that helps define and run multi-container Docker applications. With Compose, you use a YAML file to configure your application's services and then spin everything up with a single command, which saves you from running and connecting containers manually. To run our Kafka cluster, we will define a set of nodes using docker-compose. First, create a folder called multinode (just to keep our code organized) and, inside it, create a new file called docker-compose.yaml. This is the file that docker-compose expects in order to set up the containers (just as Docker expects a Dockerfile). To improve readability, we will not show the entire code here (it is available at https://github.com/PacktPublishing/Bigdata-on-Kubernetes/tree/main/Chapter07/multinode), but only a portion of it. Let's take a look:
docker-compose.yaml
---
version: '2'
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 22181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: localhost:22888:23888;localhost:32888:33888;localhost:42888:43888
    network_mode: host
    extra_hosts:
      - "mynet:127.0.0.1"
  kafka-1:
    image: confluentinc/cp-kafka:7.6.0
    network_mode: host
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: localhost:22181,localhost:32181,localhost:42181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:19092
    extra_hosts:
      - "mynet:127.0.0.1"
The full Docker Compose file sets up a Kafka cluster with three Kafka brokers and three Zookeeper nodes (more details on Kafka architecture in the next section). We kept only the definitions of the first Zookeeper node and the first Kafka broker, as the others follow the same pattern. Here, we're using the Confluent Kafka (an enterprise-ready distribution of Kafka maintained by Confluent Inc.) and Zookeeper images to create the containers. For the Zookeeper nodes, the key parameters are as follows:
ZOOKEEPER_SERVER_ID: The unique ID for each Zookeeper server in the ensemble.
ZOOKEEPER_CLIENT_PORT: The port clients use to connect to this Zookeeper node. We use a different port for each node (see the sketch after this list).
ZOOKEEPER_TICK_TIME: The basic time unit, in milliseconds, used by Zookeeper for heartbeats.
ZOOKEEPER_INIT_LIMIT: The time, in ticks, that the Zookeeper servers have to connect to a leader.
ZOOKEEPER_SYNC_LIMIT: How far, in ticks, a server can be out of date with respect to the leader.
ZOOKEEPER_SERVERS: Lists all Zookeeper servers in the ensemble in address:followerPort:leaderElectionPort format.
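The remaining Zookeeper and Kafka services are omitted from the excerpt above, but they follow the same pattern, changing only the per-node values. As a rough sketch (the exact definitions are in the repository file), the second Zookeeper service would look like this, with its server ID set to 2 and its client port set to 32181, the second port listed in KAFKA_ZOOKEEPER_CONNECT:

  zookeeper-2:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_SERVER_ID: 2            # unique ID for the second node
      ZOOKEEPER_CLIENT_PORT: 32181      # second client port from KAFKA_ZOOKEEPER_CONNECT
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: localhost:22888:23888;localhost:32888:33888;localhost:42888:43888
    network_mode: host
    extra_hosts:
      - "mynet:127.0.0.1"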
For the Kafka brokers, the key parameters are as follows:
KAFKA_BROKER_ID: The unique ID for each Kafka broker.
KAFKA_ZOOKEEPER_CONNECT: The Zookeeper ensemble that Kafka should connect to, listing the client ports of the three Zookeeper nodes.
KAFKA_ADVERTISED_LISTENERS: The listener address advertised for external connections to this broker. We use a different port for each broker (see the sketch after this list).
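The second and third brokers are defined the same way, each with its own broker ID and advertised listener. The sketch below assumes port 29092 for the second broker's listener; that port does not appear in the excerpt above, so treat it as a placeholder and check the full file in the repository for the actual value:

  kafka-2:
    image: confluentinc/cp-kafka:7.6.0
    network_mode: host
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_BROKER_ID: 2                                        # unique ID for the second broker
      KAFKA_ZOOKEEPER_CONNECT: localhost:22181,localhost:32181,localhost:42181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:29092   # assumed port; see the repository file
    extra_hosts:
      - "mynet:127.0.0.1"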
The containers are configured to use host networking mode to simplify connectivity between them. The depends_on entries ensure that each Kafka broker starts only after the Zookeeper containers have been started.
This code creates a fully functional Kafka cluster that can handle replication and failures of individual brokers or Zookeeper nodes. Now, we will get those containers up and running. In a terminal, move to the multinode folder and type the following:
docker-compose up -d
This will tell docker-compose to get the containers up. If the necessary images are not found locally, they will be downloaded automatically. The -d parameter makes docker-compose run in detached mode. If we don't use this parameter, the terminal will keep printing the containers' logs. We don't want that, so we must use -d.
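At this point, it can be useful to confirm that all six containers are running and that the brokers accept requests. The commands below are a suggested sanity check rather than part of the original walkthrough; smoke-test is an arbitrary topic name, kafka-1 is the service name from the Compose file, and 19092 is the advertised listener port of the first broker:

# List the services defined in the Compose file and their current state
docker-compose ps

# Create a topic replicated across all three brokers
docker-compose exec kafka-1 kafka-topics --bootstrap-server localhost:19092 \
  --create --topic smoke-test --partitions 3 --replication-factor 3

# Check how partitions and replicas were assigned to the brokers
docker-compose exec kafka-1 kafka-topics --bootstrap-server localhost:19092 \
  --describe --topic smoke-test

If the describe output lists three replicas for each partition, replication across the brokers is working.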
To check the logs for one of the Kafka brokers, run the following command:
docker logs multinode-kafka-1-1
Here, multinode-kafka-1-1 is the name of the container that Compose created for the first Kafka broker we defined in the YAML file (the multinode prefix comes from the folder name). With this command, you should be able to see Kafka's logs and validate that everything is running correctly. Now, let's take a closer look at Kafka's architecture and understand how it works.