Apache Kafka Quick Start Guide

Configuring Kafka

This chapter describes what Kafka is and the concepts related to this technology: brokers, topics, producers, and consumers. It also talks about how to build a simple producer and consumer from the command line, as well as how to install Confluent Platform. The information in this chapter is fundamental to the following chapters.

In this chapter, we will cover the following topics:

  • Kafka in a nutshell
  • Installing Kafka (Linux and macOS)
  • Installing the Confluent Platform
  • Running Kafka
  • Running Confluent Platform
  • Running Kafka brokers
  • Running Kafka topics
  • A command-line message producer
  • A command-line message consumer
  • Using kafkacat

Kafka in a nutshell

Apache Kafka is an open source streaming platform. If you are reading this book, you probably already know that Kafka scales horizontally very well without compromising speed and efficiency.

The Kafka core is written in Scala, and Kafka Streams and KSQL are written in Java. A Kafka server can run on several operating systems: Unix, Linux, macOS, and even Windows. As it usually runs in production on Linux servers, the examples in this book are designed to run on Linux environments. The examples in this book also assume a bash environment.

This chapter explains how to install, configure, and run Kafka. As this is a Quick Start Guide, it does not cover Kafka's theoretical details. At the moment, it is appropriate to mention these three points:

  • Kafka is a service bus: To connect heterogeneous applications, we need to implement a message publication mechanism to send and receive messages among them. A message router is known as a message broker. Kafka is a message broker, a solution for routing messages among clients quickly.
  • Kafka architecture has two directives: The first is to not block the producers (in order to deal with back pressure). The second is to isolate producers and consumers. The producers should not know who their consumers are; hence, Kafka follows the dumb broker and smart clients model.
  • Kafka is a real-time messaging system: Moreover, Kafka is a software solution with a publish-subscribe model: open source, distributed, partitioned, replicated, and commit-log-based.

These are some of the key concepts and nomenclature in Apache Kafka:

  • Cluster: This is a set of Kafka brokers.
  • Zookeeper: This is a cluster coordinator—a tool with different services that are part of the Apache ecosystem.
  • Broker: This is a Kafka server, also the Kafka server process itself.
  • Topic: This is a queue (that has log partitions); a broker can run several topics.
  • Offset: This is an identifier for each message.
  • Partition: This is an immutable and ordered sequence of records continually appended to a structured commit log.
  • Producer: This is the program that publishes data to topics.
  • Consumer: This is the program that processes data from the topics.
  • Retention period: This is the time to keep messages available for consumption.

In Kafka, there are three types of clusters:

  • Single node–single broker
  • Single node–multiple broker
  • Multiple node–multiple broker

In Kafka, there are three (and just three) ways to deliver messages:

  • Never redelivered: Messages may be lost because, once delivered, they are not sent again.
  • May be redelivered: Messages are never lost because, if a message is not acknowledged as received, it can be sent again.
  • Delivered once: The message is delivered exactly once. This is the most difficult form of delivery; the message is neither lost nor redelivered. (A producer configuration sketch follows this list.)
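
These guarantees are not magic; they follow from how the clients are configured. As a rough, hedged illustration only (the property names below are standard Kafka 2.0 Java producer settings, but a complete exactly-once setup also involves transactions and careful consumer offset handling), the producer side of each style might look like this:

# At-most-once style: fire and forget; fast, but messages may be lost
acks=0
retries=0

# At-least-once style: wait for acknowledgment and retry; messages may be redelivered
acks=all
retries=3

# Exactly-once building block (Kafka 0.11 and later): idempotent producer
enable.idempotence=true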

The message log can be compacted in two ways:

  • Coarse-grained: Log compacted by time
  • Fine-grained: Log compacted by message

Kafka installation

There are three ways to install a Kafka environment:

  • Downloading the executable files
  • Using brew (in macOS) or yum (in Linux)
  • Installing Confluent Platform

For all three ways, the first step is to install Java; we need Java 8. Download and install the latest JDK 8 from Oracle's website:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

At the time of writing, the latest Java 8 JDK version is 8u191.

For Linux users, follow these steps:

  1. Change the file mode to executable as follows:
      > chmod +x jdk-8u191-linux-x64.rpm
  2. Go to the directory in which you want to install Java:
      > cd <directory path>
  3. Run the rpm installer with the following command:
      > rpm -ivh jdk-8u191-linux-x64.rpm
  4. Add the JAVA_HOME variable to your environment. The following command writes the JAVA_HOME environment variable to the /etc/profile file:
      > echo "export JAVA_HOME=/usr/java/jdk1.8.0_191" >> /etc/profile
  5. Validate the Java installation as follows:
      > java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

At the time of writing, the latest Scala version is 2.12.6. To install Scala in Linux, perform the following steps:

  1. Download the latest Scala binary from http://www.scala-lang.org/download
  2. Extract the downloaded file, scala-2.12.6.tgz, as follows:
      > tar xzf scala-2.12.6.tgz
  3. Move the extracted directory to /opt/scala (so that SCALA_HOME points to a real path), and add the SCALA_HOME variable to your environment as follows:
      > export SCALA_HOME=/opt/scala
  4. Add the Scala bin directory to your PATH environment variable as follows:
      > export PATH=$PATH:$SCALA_HOME/bin
  5. To validate the Scala installation, do the following:
      > scala -version
Scala code runner version 2.12.6 -- Copyright 2002-2018,
LAMP/EPFL and Lightbend, Inc.

To install Kafka, ensure that your machine has at least 4 GB of RAM. The installation directory will be /usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users; create the directory that corresponds to your operating system.
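
For example, on Linux, assuming you have sudo rights (adjust the owner to whichever user will run Kafka), the directory could be created like this:

> sudo mkdir -p /opt/kafka
> sudo chown $USER /opt/kafka

macOS users would do the same with /usr/local/kafka.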

Kafka installation on Linux

Open the Apache Kafka download page, http://kafka.apache.org/downloads, as in Figure 1.1:

Figure 1.1: Apache Kafka download page

At the time of writing, the current stable Apache Kafka release is 2.0.0. Remember that, since version 0.8.x, Kafka is not backward-compatible, so we cannot replace this version with one prior to 0.8. Once you've downloaded the latest available release, let's proceed with the installation.

Remember, macOS users should replace the /opt/ directory with /usr/local/.

Follow these steps to install Kafka in Linux:

  1. Extract the downloaded file, kafka_2.11-2.0.0.tgz, in the /opt/ directory as follows:
      > tar xzf kafka_2.11-2.0.0.tgz
  2. Create the KAFKA_HOME environment variable as follows:
      > export KAFKA_HOME=/opt/kafka_2.11-2.0.0
  3. Add the Kafka bin directory to the PATH variable as follows (a sketch for making these variables persistent follows this list):
      > export PATH=$PATH:$KAFKA_HOME/bin
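
The export commands above only affect the current shell session. To keep the variables across sessions, one option (a sketch, assuming the bash environment this book uses) is to append them to your shell profile:

> echo "export KAFKA_HOME=/opt/kafka_2.11-2.0.0" >> ~/.bashrc
> echo "export PATH=\$PATH:\$KAFKA_HOME/bin" >> ~/.bashrc
> source ~/.bashrc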

Now Java, Scala, and Kafka are installed.

To do all of the previous steps from the command line, there is a powerful tool for macOS users called brew (the equivalent in Linux would be yum).

Kafka installation on macOS

To install from the command line in macOS (brew must be installed), perform the following steps:

  1. To install sbt (the Scala build tool) with brew, execute the following:
      > brew install sbt

If you already have it in your environment (downloaded previously), run the following to upgrade it:

      > brew upgrade sbt

The output is similar to that shown in Figure 1.2:

Figure 1.2: The Scala build tool installation output
  2. To install Scala with brew, execute the following:
      > brew install scala

If you already have it in your environment (downloaded previously), to upgrade it, run the following command:

      > brew upgrade scala

The output is similar to that shown in Figure 1.3:

Figure 1.3: The Scala installation output
  3. To install Kafka with brew (it also installs Zookeeper), do the following:
      > brew install kafka

If you already have it (downloaded in the past), upgrade it as follows:

      > brew upgrade kafka

The output is similar to that shown in Figure 1.4:

Figure 1.4: Kafka installation output

Visit https://brew.sh/ for more about brew.

Confluent Platform installation

The third way to install Kafka is through Confluent Platform. In the rest of this book, we will be using the Confluent Platform open source version.

Confluent Platform is an integrated platform that includes the following components:

  • Apache Kafka
  • REST proxy
  • Kafka Connect API
  • Schema Registry
  • Kafka Streams API
  • Pre-built connectors
  • Non-Java clients
  • KSQL

As you may notice, almost every one of these components has its own chapter in this book.

The commercially licensed Confluent Platform includes, in addition to all of the components of the open source version, the following:

  • Confluent Control Center (CCC)
  • Kafka operator (for Kubernetes)
  • JMS client
  • Replicator
  • MQTT proxy
  • Auto data balancer
  • Security features

It is important to mention that coverage of the components exclusive to the commercial version is beyond the scope of this book.

Confluent Platform is also available as Docker images, but here we are going to install it locally.

Open the Confluent Platform download page: https://www.confluent.io/download/.

At the time of this writing, the current stable version of Confluent Platform is 5.0.0. Remember that, since the Kafka core is written in Scala, there are two downloads: one built with Scala 2.11 and one with Scala 2.12.

We could run Confluent Platform from our desktop directory, but following this book's conventions, let's use /opt/ for Linux users and /usr/local for macOS users.

To install Confluent Platform, extract the downloaded file, confluent-5.0.0-2.11.tar.gz, in the chosen directory (/opt/ for Linux or /usr/local for macOS), as follows:

> tar xzf confluent-5.0.0-2.11.tar.gz
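
In the rest of this chapter, <confluent-path> refers to this extraction directory. Optionally (this is just a convenience, not something the tools require; the variable name CONFLUENT_HOME is our own choice), you can export it and add its bin directory to your PATH so the commands in the following sections are shorter:

> export CONFLUENT_HOME=/opt/confluent-5.0.0
> export PATH=$PATH:$CONFLUENT_HOME/bin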

Running Kafka

There are two ways to run Kafka, depending on whether we install it directly or through Confluent Platform.

If we install it directly, the steps to run Kafka are as follows.

For macOS users, your paths might be different if you've installed using brew. Check the output of the brew install kafka command for the exact commands you can use to start Zookeeper and Kafka.

Go to the Kafka installation directory (/usr/local/kafka for macOS users and /opt/kafka/ for Linux users), as in the example:

> cd /usr/local/kafka

First of all, we need to start Zookeeper (Kafka's dependency on Zookeeper is, and will remain, strong). Type the following:

> ./bin/zookeeper-server-start.sh ./config/zookeeper.properties

ZooKeeper JMX enabled by default

Using config: /usr/local/etc/zookeeper/zoo.cfg

Starting zookeeper ... STARTED

To check whether Zookeeper is running, use the lsof command on port 2181 (the Zookeeper default port) as follows:

> lsof -i :2181

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME

java 12529 admin 406u IPv6 0xc41a24baa4fedb11 0t0 TCP *:2181 (LISTEN)

Now run the Kafka server that comes with the installation by going to /usr/local/kafka/ for macOS users and /opt/kafka/ for Linux users:

> ./bin/kafka-server-start.sh ./config/server.properties

Now there is an Apache Kafka broker running on your machine.

Remember that Zookeeper must be running on the machine before starting Kafka. If you don't want to start Zookeeper manually every time you need to run Kafka, install it as an operating system auto-start service.
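
For example, on a Linux distribution that uses systemd, a minimal unit file could look like the following. This is only a hedged sketch: the paths assume the /opt/kafka_2.11-2.0.0 installation from earlier, and the file location, user, and hardening options should be adapted to your distribution:

# /etc/systemd/system/zookeeper.service (assumed location)
[Unit]
Description=Zookeeper (bundled with Apache Kafka)
After=network.target

[Service]
ExecStart=/opt/kafka_2.11-2.0.0/bin/zookeeper-server-start.sh /opt/kafka_2.11-2.0.0/config/zookeeper.properties
ExecStop=/opt/kafka_2.11-2.0.0/bin/zookeeper-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target

After creating the file, you would enable and start the service with sudo systemctl enable zookeeper and sudo systemctl start zookeeper.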

Running Confluent Platform

Go to the Confluent Platform installation directory (/usr/local/ for macOS users and /opt/ for Linux users) and type the following:

> cd /usr/local/confluent-5.0.0

To start Confluent Platform, run the following:

> bin/confluent start

This command-line interface is intended for development only, not for production:

https://docs.confluent.io/current/cli/index.html

The output is similar to what is shown in the following code snippet:

Using CONFLUENT_CURRENT: /var/folders/nc/4jrpd1w5563crr_np997zp980000gn/T/confluent.q3uxpyAt

Starting zookeeper
zookeeper is [UP]
Starting kafka
kafka is [UP]
Starting schema-registry
schema-registry is [UP]
Starting kafka-rest
kafka-rest is [UP]
Starting connect
connect is [UP]
Starting ksql-server
ksql-server is [UP]
Starting control-center
control-center is [UP]

As indicated by the command output, Confluent Platform automatically starts in this order: Zookeeper, Kafka, Schema Registry, REST proxy, Kafka Connect, KSQL, and the Confluent Control Center.

To access the Confluent Control Center running on your local machine, go to http://localhost:9021, as shown in Figure 1.5:

Figure 1.5: Confluent Control Center main page

There are other commands for Confluent Platform.

To get the status of all services or the status of a specific service along with its dependencies, enter the following:

> bin/confluent status

To stop all services or a specific service along with the services depending on it, enter the following:

> bin/confluent stop

To delete the data and logs of the current Confluent Platform, type the following:

> bin/confluent destroy
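
These commands also accept a service name. For example, to check or stop only the Kafka service (the service names are the ones printed by confluent start above):

> bin/confluent status kafka
> bin/confluent stop kafka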

Running Kafka brokers

The real art behind a server is in its configuration. In this section, we will examine how to deal with the basic configuration of a Kafka broker in standalone mode. Since we are learning, at the moment, we will not review the cluster configuration.

As you might expect, there are two types of configuration: standalone and cluster. The real power of Kafka is unlocked when running with replication in cluster mode and all topics are correctly partitioned.

The cluster mode has two main advantages: parallelism and redundancy. Parallelism is the capacity to run tasks simultaneously among the cluster members. Redundancy guarantees that, when a Kafka node goes down, the cluster remains safe and accessible from the other running nodes.

This section shows how to configure a cluster with several nodes on our local machine, although in practice it is always better to have several machines, each with multiple nodes, sharing the cluster.

Go to the Confluent Platform installation directory, referenced from now on as <confluent-path>.

As mentioned in the beginning of this chapter, a broker is a server instance. A server (or broker) is actually a process running in the operating system and starts based on its configuration file.

The people of Confluent have kindly provided us with a template of a standard broker configuration. This file, which is called server.properties, is located in the Kafka installation directory in the config subdirectory:

  1. Inside <confluent-path>, make a directory with the name mark.
  2. For each Kafka broker (server) that we want to run, we need to make a copy of the configuration file template and rename it accordingly. In this example, our cluster is going to be called mark:
      > cp config/server.properties <confluent-path>/mark/mark-1.properties
      > cp config/server.properties <confluent-path>/mark/mark-2.properties
  3. Modify each properties file accordingly. If the file is called mark-1, the broker.id should be 1. Then, specify the port on which the server will run; the recommendation is 9093 for mark-1 and 9094 for mark-2. Note that the port property is not set in the template, so add the line. Finally, specify the location of the Kafka logs (a Kafka log is a specific archive that stores all of the Kafka broker operations); in this case, we use the /tmp directory. Here, it is common to have problems with write permissions. Do not forget to give write and execute permissions over the log directory to the user who runs these processes, as in the examples:
  • In mark-1.properties, set the following:
      broker.id=1
      port=9093
      log.dirs=/tmp/mark-1-logs
  • In mark-2.properties, set the following:
      broker.id=2
      port=9094
      log.dirs=/tmp/mark-2-logs
  4. Start the Kafka brokers using the kafka-server-start command with the corresponding configuration file passed as the parameter. Don't forget that Confluent Platform must already be running and that the ports must not be in use by another process. Start the Kafka brokers as follows:
      > <confluent-path>/bin/kafka-server-start <confluent-path>/mark/mark-1.properties &

And, in another command-line window, run the following command:

      > <confluent-path>/bin/kafka-server-start <confluent-path>/mark/mark-2.properties &

Don't forget that the trailing & is to specify that you want your command line back. If you want to see the broker output, it is recommended to run each command separately in its own command-line window.

Remember that the properties file contains the server configuration and that the server.properties file located in the config directory is just a template.

Now there are two brokers, mark-1 and mark-2, running on the same machine in the same cluster.

Remember, there are no dumb questions, as in the following examples:

Q: How does each broker know which cluster it belongs to?

A: The brokers know that they belong to the same cluster because, in the configuration, both point to the same Zookeeper cluster.
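
You can see this registration yourself. As a quick, hedged check (assuming Zookeeper is listening on its default 2181 port and using the zookeeper-shell tool bundled with Kafka and Confluent Platform), list the broker IDs registered in Zookeeper:

> <confluent-path>/bin/zookeeper-shell localhost:2181 ls /brokers/ids

With mark-1 and mark-2 running, the last line of the output should be something like [1, 2].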

Q: How does each broker differ from the others within the same cluster?

A: Every broker is identified inside the cluster by the name specified in the broker.id property.

Q: What happens if the port number is not specified?

A: If the port property is not specified, all the brokers on the machine will try to use the default port (9092), so the second broker will fail to start because the port is already in use.

Q: What happens if the log directory is not specified?

A: If log.dirs is not specified, all the brokers will write to the same default log directory. If the brokers are planned to run on different machines, then the port and log.dirs properties need not be specified (they can use the same port and log path because they run on different machines).

Q: How can I check that there is not a process already running in the port where I want to start my broker?

A: As shown in the previous section, there is a useful command to see which process is running on a specific port; in this case, the 9093 port:

> lsof -i :9093

The output of the previous command is something like this:

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME

java 12529 admin 406u IPv6 0xc41a24baa4fedb11 0t0 TCP *:9093 (LISTEN)

Your turn: try to run this command before starting the Kafka brokers, and run it after starting them to see the change. Also, try to start a broker on a port in use to see how it fails.

OK, what if I want my cluster to run on several machines?

To run Kafka nodes on different machines but in the same cluster, adjust the Zookeeper connection string in the configuration file; its default value is as follows:

zookeeper.connect=localhost:2181

Remember that the machines must be able to find each other through DNS and that there must be no network security restrictions between them.

The default value for zookeeper.connect is correct only if you are running the Kafka broker on the same machine as Zookeeper. Depending on the architecture, it will be necessary to decide whether there will be a broker running on the same machine as Zookeeper.

To specify that Zookeeper also runs on other machines, do the following:

zookeeper.connect=localhost:2181,192.168.0.2:2183,192.168.0.3:2182

The previous line specifies that Zookeeper is running on the local host machine on port 2181, on the machine with IP address 192.168.0.2 on port 2183, and on the machine with IP address 192.168.0.3 on port 2182. The Zookeeper default port is 2181, so normally it runs there.
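
Putting this together, a hedged sketch of the properties for a third broker running on a separate machine (the broker ID and IP addresses here are only examples) could be as follows:

# mark-3.properties, on a different machine from mark-1 and mark-2
broker.id=3
zookeeper.connect=192.168.0.1:2181,192.168.0.2:2183,192.168.0.3:2182
# port and log.dirs can keep their defaults here, because this broker
# does not share a machine with the other brokers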

Your turn: as an exercise, try to start a broker with incorrect information about the Zookeeper cluster. Also, using the lsof command to check first, try to start Zookeeper on a port that is already in use.

If you have doubts about the configuration, or it is not clear what values to change, the server.properties template (like the rest of the Kafka project) is open source and available at the following link:

https://github.com/apache/kafka/blob/trunk/config/server.properties

Running Kafka topics

The power of a broker lies in its topics, namely the queues inside it. Now that we have two brokers running, let's create a Kafka topic on them.

Kafka, like almost all modern infrastructure projects, has three ways of building things: through the command line, through programming, and through a web console (in this case the Confluent Control Center). The management (creation, modification, and destruction) of Kafka brokers can be done through programs written in most modern programming languages. If the language is not supported, it could be managed through the Kafka REST API. The previous section showed how to build a broker using the command line. In later chapters, we will see how to do this process through programming.

Can we manage (create, modify, or destroy) only brokers through programming? No; we can also manage topics. Topics can likewise be created through the command line. Kafka has pre-built utilities to manage brokers, as we have already seen, and to manage topics, as we will see next.

To create a topic called amazingTopic in our running cluster, use the following command:

> <confluent-path>/bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic amazingTopic

The output should be as follows:

Created topic amazingTopic

Here, the kafka-topics command is used. With the --create parameter it is specified that we want to create a new topic. The --topic parameter sets the name of the topic, in this case, amazingTopic.

Do you remember the terms parallelism and redundancy? Well, the --partitions parameter controls the parallelism and the --replication-factor parameter controls the redundancy.

The --replication-factor parameter is fundamental, as it specifies on how many servers of the cluster the topic is going to be replicated (that is, on how many running brokers). On the other hand, one broker can hold just one replica of a given partition.

Obviously, if a number greater than the number of running servers in the cluster is specified, it will result in an error (don't believe me? Try it in your environment). The error will look like this:

Error while executing topic command: replication factor: 3 larger than available brokers: 2

[2018-09-01 07:13:31,350] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: replication factor: 3 larger than available brokers: 2

(kafka.admin.TopicCommand$)

Note that only running brokers are counted (don't be shy; test all of this theory in your environment).

The --partitions parameter, as its name implies, says how many partitions the topic will have. The number determines the parallelism that can be achieved on the consumer's side. This parameter is very important when doing cluster fine-tuning.

Finally, as expected, the --zookeeper parameter indicates where the Zookeeper cluster is running.

When a topic is created, the output in the broker log is something like this:

[2018-09-01 07:05:53,910] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions amazingTopic-0 (kafka.server.ReplicaFetcherManager)

[2018-09-01 07:05:53,950] INFO Completed load of log amazingTopic-0 with 1 log segments and log end offset 0 in 21 ms (kafka.log.Log)

In short, this message reads like a new topic has been born in our cluster.

How can I check my new and shiny topic? By using the same command: kafka-topics.

There are more parameters than --create. To list the existing topics, run the kafka-topics command with the --list parameter, as follows:

> <confluent-path>/bin/kafka-topics --list --zookeeper localhost:2181

The output is the list of topics, which, as we know, is as follows:

amazingTopic

This command returns the list with the names of all of the running topics in the cluster.

How can I get details of a topic? Using the same command: kafka-topics.

For a particular topic, run the kafka-topics command with the --describe parameter, as follows:

> <confluent-path>/bin/kafka-topics --describe --zookeeper localhost:2181 --topic amazingTopic

The command output is as follows:

Topic:amazingTopic PartitionCount:1 ReplicationFactor:1 Configs:

Topic: amazingTopic Partition: 0 Leader: 1 Replicas: 1 Isr: 1

Here is a brief explanation of the output:

  • PartitionCount: Number of partitions on the topic (parallelism)
  • ReplicationFactor: Number of replicas on the topic (redundancy)
  • Leader: Node responsible for reading and writing operations of a given partition
  • Replicas: List of brokers replicating this topic data; some of these might even be dead
  • Isr: List of nodes that are currently in-sync replicas

Let's create a topic with multiple replicas (taking advantage of the fact that more than one broker is running in the cluster); we type the following:

> <confluent-path>/bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic redundantTopic

The output is as follows:

Created topic redundantTopic

Now, call the kafka-topics command with the --describe parameter to check the topic details, as follows:

> <confluent-path>/bin/kafka-topics --describe --zookeeper localhost:2181 --topic redundantTopic


Topic:redundantTopic PartitionCount:1 ReplicationFactor:2 Configs:

Topic: redundantTopic Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2

As you can see, Replicas and Isr are the same lists, so we infer that all of the nodes are in sync.

Your turn: play with the kafka-topics command, and try to create replicated topics on dead brokers and see the output. Also, create topics on running servers and then kill them to see the results. Was the output what you expected?

As mentioned before, all of these commands executed through the command line can be executed programmatically or performed through the Confluent Control Center web console.
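
For completeness, modification and destruction are also just parameters of the same tool. As a hedged sketch (increasing partitions redistributes keyed messages, and topic deletion requires delete.topic.enable=true on the brokers, which is the default in recent Kafka versions), the commands look like this:

> <confluent-path>/bin/kafka-topics --alter --zookeeper localhost:2181 --topic amazingTopic --partitions 2
> <confluent-path>/bin/kafka-topics --delete --zookeeper localhost:2181 --topic redundantTopic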

A command-line message producer

Kafka also has a command to send messages through the command line; the input can be a text file or the console standard input. Each line typed in the input is sent as a single message to the cluster.

For this section, the execution of the previous steps is needed. The Kafka brokers must be up and running and a topic created inside them.

In a new command-line window, run the following command, followed by the lines to be sent as messages to the server:

> <confluent-path>/bin/kafka-console-producer --broker-list localhost:9093 --topic amazingTopic

Fool me once shame on you
Fool me twice shame on me

These lines push two messages into the amazingTopic topic running on the localhost cluster on port 9093.

This command is also the simplest way to check whether a broker with a specific topic is up and running as expected.

As we can see, the kafka-console-producer command receives the following parameters:

  • --broker-list: This specifies the brokers to connect to, as a comma-separated list in the form hostname:port.
  • --topic: This parameter is followed by the name of the target topic.
  • --sync: This specifies whether the messages should be sent synchronously.
  • --compression-codec: This specifies the compression codec used to produce the messages. The possible options are: none, gzip, snappy, or lz4. If the parameter is supplied without a value, it defaults to gzip.
  • --batch-size: If the messages are not being sent synchronously, this specifies the number of messages to send in a single batch.
  • --message-send-max-retries: As the brokers can fail to receive messages, this parameter specifies the number of retries before a producer gives up and drops the message. This number must be a positive integer.
  • --retry-backoff-ms: In case of failure, the node leader election might take some time. This parameter is the time to wait before the producer retries after such an election. The number is the time in milliseconds.
  • --timeout: If the producer is running in asynchronous mode and this parameter is set, it indicates the maximum amount of time a message will queue awaiting a sufficient batch size. This value is expressed in milliseconds.
  • --queue-size: If the producer is running in asynchronous mode and this parameter is set, it gives the maximum number of messages that will queue awaiting a sufficient batch size.

When fine-tuning the producer, batch-size, message-send-max-retries, and retry-backoff-ms are very important; take these parameters into consideration to achieve the desired behavior.
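
As an illustration (the flags are the ones listed above; the values are arbitrary and only meant to show the syntax), a more explicit producer invocation could look like this:

> <confluent-path>/bin/kafka-console-producer --broker-list localhost:9093 --topic amazingTopic --message-send-max-retries 5 --retry-backoff-ms 500 --batch-size 100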

If you don't want to type the messages, the command could receive a file where each line is considered a message, as shown in the following example:

<confluent-path>/bin/kafka-console-producer --broker-list localhost:9093 --topic amazingTopic < aLotOfWordsToTell.txt

A command-line message consumer

The last step is how to read the generated messages. Kafka also has a powerful command that enables messages to be consumed from the command line. Remember that all of these command-line tasks can also be done programmatically. As with the producer, each line of the output corresponds to a single message.

For this section, the execution of the previous steps is needed. The Kafka brokers must be up and running and a topic created inside them. Also, some messages need to have been produced with the console producer before they can be consumed from the console.

Run the following command:

> <confluent-path>/bin/kafka-console-consumer --topic amazingTopic --bootstrap-server localhost:9093 --from-beginning

The output should be as follows:

Fool me once shame on you
Fool me twice shame on me

The parameters are the topic's name and the bootstrap broker's address. Also, the --from-beginning parameter indicates that messages should be consumed from the beginning instead of only the latest messages in the log (now test it: generate many more messages, and don't specify this parameter).

There are more useful parameters for this command; some important ones are as follows:

  • --fetch-size: This is the amount of data to be fetched in a single request. The size in bytes follows as argument. The default value is 1,024 x 1,024.
  • --socket-buffer-size: This is the size of the TCP receive buffer. The size in bytes follows this parameter. The default value is 2 x 1024 x 1024.
  • --formatter: This is the name of the class to use for formatting messages for display. The default value is NewlineMessageFormatter.
  • --autocommit.interval.ms: This is the time interval at which to save the current offset. The time in milliseconds follows as the argument. The default value is 10,000.
  • --max-messages: This is the maximum number of messages to consume before exiting. If not set, the consumption is continuous. The number of messages follows as the argument.
  • --skip-message-on-error: If there is an error while processing a message, the system should skip it instead of halting.

The most commonly requested forms of this command are as follows:

  • To consume just one message, use the following:
      > <confluent-path>/bin/kafka-console-consumer --topic amazingTopic --bootstrap-server localhost:9093 --max-messages 1

  • To consume one message from an offset, use the following:
      > <confluent-path>/bin/kafka-console-consumer --topic amazingTopic --bootstrap-server localhost:9093 --max-messages 1 --formatter 'kafka.coordinator.GroupMetadataManager$OffsetsMessageFormatter'

  • To consume messages from a specific consumer group, use the following (a sketch for checking the group's progress follows this list):
      > <confluent-path>/bin/kafka-console-consumer --topic amazingTopic --bootstrap-server localhost:9093 --new-consumer --consumer-property group.id=my-group
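
Once a consumer group has been used, you can inspect its progress with the kafka-consumer-groups tool that ships alongside the console consumer (a hedged example; my-group is the group name used above):

> <confluent-path>/bin/kafka-consumer-groups --bootstrap-server localhost:9093 --describe --group my-group

The output lists, per partition, the group's current offset, the log end offset, and the lag between them.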

Using kafkacat

kafkacat is a generic, non-JVM command-line utility used to test and debug Apache Kafka deployments. It can be used to produce messages, consume messages, and list topic and partition information. Think of kafkacat as netcat for Kafka: a tool for inspecting and creating data in Kafka.

kafkacat is similar to the Kafka console producer and Kafka console consumer, but more powerful.

kafkacat is an open source utility and it is not included in Confluent Platform. It is available at https://github.com/edenhill/kafkacat.

To install kafkacat on Debian-based Linux distributions (such as Ubuntu), type the following:

apt-get install kafkacat

To install kafkacat on macOS with brew, type the following:

brew install kafkacat

To subscribe to amazingTopic and redundantTopic and print to stdout, type the following:

kafkacat -b localhost:9093 -t amazingTopic redundantTopic
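
kafkacat chooses its mode with single-letter flags: -P to produce, -C to consume, and -L to list metadata. For example (a hedged sketch; check the kafkacat help output for the exact options of your version), to produce one message from stdin and then list the cluster metadata:

> echo "Hello from kafkacat" | kafkacat -P -b localhost:9093 -t amazingTopic
> kafkacat -L -b localhost:9093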

Summary

In this chapter, we've learned what Kafka is, how to install and run Kafka on Linux and macOS, and how to install and run Confluent Platform.

Also, we've reviewed how to run Kafka brokers and topics, how to run a command-line message producer and consumer, and how to use kafkacat.

In Chapter 2, Message Validation, we will analyze how to build a producer and a consumer in Java.
