Big Data on Kubernetes

A practical guide to building efficient and scalable data solutions

Product type Paperback
Published in Jul 2024
Publisher Packt
ISBN-13 9781835462140
Length 296 pages
Edition 1st Edition
Author: Neylson Crepalde

Getting started with Kafka

Kafka is a popular open-source platform for building real-time data pipelines and streaming applications. In this section, we will get a basic Kafka environment running locally with docker-compose so that we can start building Kafka producers and consumers.

docker-compose is a tool for defining and running multi-container Docker applications. With Compose, you describe your application’s services in a YAML file and then start everything with a single command, which avoids having to run and connect containers manually.

To run our Kafka cluster, we will define a set of nodes using docker-compose. First, create a folder called multinode (just to keep our code organized) and, inside it, create a new file called docker-compose.yaml. This is the default file name that docker-compose looks for, much as Docker looks for a file named Dockerfile. To improve readability, we show only a portion of the code here; the entire file is available at https://github.com/PacktPublishing/Bigdata-on-Kubernetes/tree/main/Chapter07/multinode. Let’s take a look:

docker-compose.yaml

---
version: '2'
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 22181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: localhost:22888:23888;localhost:32888:33888;localhost:42888:43888
    network_mode: host
    extra_hosts:
      - "mynet:127.0.0.1"
  # zookeeper-2 and zookeeper-3 (omitted here) follow the same pattern
  # with their own IDs and ports.
  kafka-1:
    image: confluentinc/cp-kafka:7.6.0
    network_mode: host
    depends_on:
      - zookeeper-1
      - zookeeper-2
      - zookeeper-3
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: localhost:22181,localhost:32181,localhost:42181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:19092
    extra_hosts:
      - "mynet:127.0.0.1"
  # The remaining Kafka brokers (omitted here) follow the same pattern.

The full Docker Compose file sets up a Kafka cluster with three Kafka brokers and three Zookeeper nodes (more details on Kafka architecture in the next section). We show only the definitions of the first Zookeeper node and the first Kafka broker because the others follow the same pattern. Here, we’re using the Kafka and Zookeeper images published by Confluent Inc., which maintains an enterprise-ready distribution of Kafka, to create the containers. For the Zookeeper nodes, the key parameters are as follows:

  • ZOOKEEPER_SERVER_ID: The unique ID for each Zookeeper server in the ensemble.
  • ZOOKEEPER_CLIENT_PORT: The port for clients to connect to this Zookeeper node. We use different ports for each node.
  • ZOOKEEPER_TICK_TIME: The basic time unit used by Zookeeper, in milliseconds. It sets the heartbeat interval and is the unit in which the two limits below are expressed.
  • ZOOKEEPER_INIT_LIMIT: The number of ticks a follower has to connect to and sync with the leader.
  • ZOOKEEPER_SYNC_LIMIT: The number of ticks a follower can fall out of date with the leader before it is dropped.
  • ZOOKEEPER_SERVERS: Lists all Zookeeper servers in the ensemble, separated by semicolons, in address:followerPort:leaderElectionPort format.
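
To make these settings concrete, the following minimal Python sketch converts the two limits from ticks to milliseconds and splits the ZOOKEEPER_SERVERS string into one entry per server. The values are copied from the excerpt above; the function name parse_servers and the dictionary keys are hypothetical, chosen only for illustration. The sketch follows ZooKeeper's server entry convention, in which the first port is used by followers to connect to the leader and the second is used for leader election, and it assumes that servers are numbered in the order they are listed.

```python
from __future__ import annotations

# Values copied from the zookeeper-1 service in docker-compose.yaml.
TICK_TIME_MS = 2000
INIT_LIMIT_TICKS = 5
SYNC_LIMIT_TICKS = 2
ZOOKEEPER_SERVERS = "localhost:22888:23888;localhost:32888:33888;localhost:42888:43888"


def parse_servers(servers: str) -> list[dict]:
    """Split the semicolon-separated ensemble list into one dict per server."""
    parsed = []
    for server_id, entry in enumerate(servers.split(";"), start=1):
        host, peer_port, election_port = entry.split(":")
        parsed.append({
            # Assumption: the position in the list corresponds to the server ID.
            "server_id": server_id,
            "host": host,
            "peer_port": int(peer_port),          # followers connect to the leader here
            "election_port": int(election_port),  # used for leader election
        })
    return parsed


# Both limits are expressed in ticks, so multiply by the tick time.
init_timeout_ms = TICK_TIME_MS * INIT_LIMIT_TICKS
sync_timeout_ms = TICK_TIME_MS * SYNC_LIMIT_TICKS

ensemble = parse_servers(ZOOKEEPER_SERVERS)
print(f"init timeout: {init_timeout_ms} ms, sync timeout: {sync_timeout_ms} ms")
for server in ensemble:
    print(server)
```

With a tick time of 2000 ms, an init limit of 5 ticks gives followers 10 seconds to connect to the leader, and a sync limit of 2 ticks allows them to lag by up to 4 seconds.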

For the Kafka brokers, the key parameters are as follows:

  • KAFKA_BROKER_ID: Unique ID for each Kafka broker.
  • KAFKA_ZOOKEEPER_CONNECT: Lists the Zookeeper ensemble that Kafka should connect to.
  • KAFKA_ADVERTISED_LISTENERS: The protocol, host, and port that this broker publishes for clients to connect to. We use a different port for each broker.
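
The two connection settings are plain strings with a fixed shape, which the following minimal Python sketch takes apart. It rebuilds the KAFKA_ZOOKEEPER_CONNECT value from the three Zookeeper client ports that appear in the excerpt and splits the advertised listener into protocol, host, and port. The helper names build_zookeeper_connect and parse_listener are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlsplit

# Client ports of the three Zookeeper nodes, as they appear in
# KAFKA_ZOOKEEPER_CONNECT in the excerpt.
ZOOKEEPER_CLIENT_PORTS = [22181, 32181, 42181]
ADVERTISED_LISTENER = "PLAINTEXT://localhost:19092"


def build_zookeeper_connect(ports, host="localhost"):
    """Join host:port pairs with commas to form the connect string."""
    return ",".join(f"{host}:{port}" for port in ports)


def parse_listener(listener):
    """Split a listener such as PLAINTEXT://localhost:19092 into its parts."""
    parts = urlsplit(listener)
    # urlsplit lowercases the scheme, so convert it back to the
    # upper-case protocol name used in the compose file.
    return parts.scheme.upper(), parts.hostname, parts.port


zookeeper_connect = build_zookeeper_connect(ZOOKEEPER_CLIENT_PORTS)
protocol, host, port = parse_listener(ADVERTISED_LISTENER)
print(zookeeper_connect)
print(protocol, host, port)
```

Each port in the connect string must match the ZOOKEEPER_CLIENT_PORT of one Zookeeper node, which is why both settings have to change together when a node's port changes.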

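The containers are configured to use host networking mode to simplify networking, so each service binds its ports directly on the host. The depends_on entries make docker-compose start the Zookeeper containers before the Kafka broker. Note that this controls start order only; it does not wait for Zookeeper to be ready to accept connections.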
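
Because a started container is not necessarily one that already accepts connections, a script that talks to the cluster right after startup may want to wait until a port actually responds. The following is a minimal Python sketch of that idea, not part of the book's code: the function name wait_for_port and its parameters are hypothetical, and it only checks that a TCP connection can be opened, not that the service behind the port is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, calling wait_for_port("localhost", 22181) after starting the containers would block until the client port of zookeeper-1 from the excerpt is reachable, or return False after 60 seconds.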

The full file creates a functional Kafka cluster that supports replication and can tolerate the failure of an individual broker or Zookeeper node. Now, we will get those containers up and running. In a terminal, move to the multinode folder and type the following:

docker-compose up -d

This tells docker-compose to create and start the containers. If the necessary images are not found locally, they are downloaded automatically. The -d parameter makes docker-compose run in detached mode. Without it, docker-compose stays attached to the terminal and keeps printing the containers’ logs, so we use -d to get the prompt back.

To check the logs of one of the Kafka brokers, run the following command:

docker logs multinode-kafka-1-1

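Here, multinode-kafka-1-1 is the name of the container created for the first Kafka broker we defined in the YAML file. The name is built from the project folder (multinode), the service name (kafka-1), and an instance number. Older versions of docker-compose join these parts with underscores instead of hyphens, so run docker ps to see the actual name if the command fails. With this command, you should be able to view Kafka’s logs and validate that everything is running correctly. Now, let’s take a closer look at Kafka’s architecture and understand how it works.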
