Configuring Kafka brokers
This recipe covers the basic configuration of Kafka brokers. For learning and development purposes, Kafka can run in standalone mode, but its real power is unlocked when it runs as a replicated cluster with suitably partitioned topics.
There are two main advantages of the cluster mode: parallelism and redundancy. Parallelism is the capacity to run tasks simultaneously among the cluster members. Redundancy ensures that when a Kafka node goes down, the cluster remains safe and accessible through the other nodes.
Single-node clusters are not recommended for production environments, so this recipe shows how to configure a cluster with several nodes.
Getting ready
Go to the Kafka installation directory (/usr/local/kafka/ for Mac users and /opt/kafka/ for Linux users):
> cd /usr/local/kafka
How to do it...
As mentioned earlier, a broker is a server instance. This recipe shows how to start two different servers on one machine. There is a server configuration template called server.properties located in the config sub-directory of the Kafka installation directory:
- For each Kafka broker (server) that we want to run, we make a copy of the configuration file template and rename it accordingly. In this example, the cluster is called synergy:
> cp config/server.properties synergy-1.properties
> cp config/server.properties synergy-2.properties
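- Modify each file according to the plan. If the file is called synergy-1, the broker.id should be 1. Specify the port on which the server should listen; the recommendation is 9093 for synergy-1 and 9094 for synergy-2. The port property is not set in the template, so add the line. (Recent Kafka releases replace port with the listeners property, for example listeners=PLAINTEXT://:9093.) Finally, specify where the broker stores its logs, that is, the on-disk files that hold the topic data rather than application log output; in this case, we use directories under /tmp. If the copied template already contains a log.dirs line, remove or edit it, because log.dirs takes precedence over log.dir.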
In synergy-1.properties, set:
broker.id=1
port=9093
log.dir=/tmp/synergy-1-logs
In synergy-2.properties, set:
broker.id=2
port=9094
log.dir=/tmp/synergy-2-logs
- Start the Kafka brokers using the kafka-server-start.sh command with the corresponding configuration file. Don't forget that ZooKeeper must already be running and that the ports must not be in use by another process:
> bin/kafka-server-start.sh synergy-1.properties &
...
> bin/kafka-server-start.sh synergy-2.properties &
Recall that the trailing & runs the command in the background, so that you get your command line back. If you want to see the broker output, run each command in its own command-line window instead.
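The copy-and-edit steps above can also be scripted. The following is a minimal sketch, not part of Kafka: make_broker_config is a hypothetical helper, and the script writes a small stand-in template into a temporary directory so that it runs without a Kafka installation. In a real setup, TEMPLATE would point at config/server.properties. The helper also strips any log.dirs line, since Kafka prefers log.dirs over log.dir when both are present.

```shell
#!/bin/sh
# Sketch: derive one properties file per broker from a template.
# make_broker_config is a hypothetical helper, not a Kafka tool.

WORK_DIR=$(mktemp -d)
TEMPLATE="$WORK_DIR/server.properties"

# Stand-in for config/server.properties (assumption: the real template
# contains broker.id and log.dirs lines, but no port line).
cat > "$TEMPLATE" <<'EOF'
broker.id=0
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
EOF

# make_broker_config TEMPLATE OUTPUT BROKER_ID PORT
make_broker_config() {
    # Copy the template without the keys we are about to set.
    grep -v -E '^(broker\.id|port|log\.dirs?)=' "$1" > "$2"
    # Append the per-broker values used in this recipe.
    {
        echo "broker.id=$3"
        echo "port=$4"
        echo "log.dir=/tmp/synergy-$3-logs"
    } >> "$2"
}

make_broker_config "$TEMPLATE" "$WORK_DIR/synergy-1.properties" 1 9093
make_broker_config "$TEMPLATE" "$WORK_DIR/synergy-2.properties" 2 9094

# Show one of the generated files.
cat "$WORK_DIR/synergy-2.properties"
```

Generating the files from one helper keeps the broker ID, port, and log directory in step with each other, which is easy to get wrong when editing several copies by hand.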
How it works...
The properties file contains the server configuration. The server.properties file located in the config directory is just a template.
All of the members of the cluster should point to the same ZooKeeper cluster. Every broker is identified inside the cluster by the integer specified in its broker.id property, which must be unique. If the port property is not specified, every broker tries to listen on the same default port (9092), so on a single machine only the first broker can start. If log.dir is not specified, all the brokers try to use the same default log directory, which two brokers cannot share. If the brokers run on different machines, port and log.dir can be left at their defaults.
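Since brokers on one machine must not share a broker.id, port, or log.dir, it can help to check the configuration files for collisions before starting anything. The following is a rough sketch under stated assumptions: get_prop and find_collisions are hypothetical helpers, the two files are written to a temporary directory for illustration, and the second file reuses port 9093 on purpose. A key that is missing from a file is not reported, even though two brokers relying on the same default would also collide.

```shell
#!/bin/sh
# Sketch: detect settings that collide between broker config files.
# get_prop and find_collisions are hypothetical helpers.

WORK_DIR=$(mktemp -d)
printf 'broker.id=1\nport=9093\nlog.dir=/tmp/synergy-1-logs\n' \
    > "$WORK_DIR/synergy-1.properties"
# The port below clashes with synergy-1 on purpose.
printf 'broker.id=2\nport=9093\nlog.dir=/tmp/synergy-2-logs\n' \
    > "$WORK_DIR/synergy-2.properties"

# get_prop KEY FILE: print the value of KEY (the last assignment wins).
get_prop() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

# find_collisions KEY FILE...: print each value of KEY used by more than one file.
find_collisions() {
    wanted=$1
    shift
    for f in "$@"; do
        get_prop "$wanted" "$f"
    done | sort | uniq -d
}

for key in broker.id port log.dir; do
    dup=$(find_collisions "$key" "$WORK_DIR"/synergy-*.properties)
    if [ -n "$dup" ]; then
        echo "collision on $key: $dup"
    fi
done
```

With the two sample files, only the port line is reported, because the broker IDs and log directories differ.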
There's more...
Before assigning a port to a server, there is a useful command to see what process is running on a specific port (in this case, the port 9093):
> lsof -i :9093
The output of the previous command is something like this:
COMMAND   PID  USER   FD  TYPE             DEVICE SIZE/OFF NODE NAME
java    17479 admin  97u  IPv6 0xcfbcde96aa59c3bf      0t0  TCP *:9093 (LISTEN)
Run this command before and after starting the Kafka servers to see the change. Also, try to start a broker on a port that is already in use to see how it fails.
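The same lsof check can guard the start command. This is a minimal sketch, assuming lsof is installed and that the current user is allowed to see the listening process; port_in_use is a hypothetical helper, and the actual start command is left commented out so that the script runs without Kafka.

```shell
#!/bin/sh
# Sketch: refuse to start a broker when its port is already taken.
# port_in_use is a hypothetical helper built on lsof.

# port_in_use PORT: succeeds if some process is listening on TCP PORT.
port_in_use() {
    lsof -i ":$1" -sTCP:LISTEN >/dev/null 2>&1
}

PORT=9093
if ! command -v lsof >/dev/null 2>&1; then
    echo "lsof is not installed; cannot check port $PORT"
elif port_in_use "$PORT"; then
    echo "port $PORT is already in use; pick another port"
else
    echo "port $PORT is free"
    # bin/kafka-server-start.sh synergy-1.properties &
fi
```

lsof exits with a non-zero status when nothing matches, which is what lets the function double as a condition.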
To run Kafka nodes on different machines, change the ZooKeeper connection string in the configuration file; its default value is:
zookeeper.connect=localhost:2181
This value is correct only if the Kafka broker runs on the same machine as ZooKeeper, which is rarely the case in production. To specify that ZooKeeper is running on several machines (that is, in a ZooKeeper cluster), set:
zookeeper.connect=localhost:2181,192.168.0.2:2183,192.168.0.3:2182
The previous line says that ZooKeeper is running on the localhost machine on port 2181, on the machine with IP address 192.168.0.2 on port 2183, and on the machine with IP address 192.168.0.3 on port 2182. The ZooKeeper default port is 2181, so try to run it there.
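The connection string is a comma-separated list of host:port pairs, and a quick sanity check can catch a malformed entry before a broker is started. The following is only a sketch: list_zk_servers is a hypothetical helper, and it assumes plain host names or IPv4 addresses with no chroot suffix (such as /kafka) at the end of the string.

```shell
#!/bin/sh
# Sketch: split a zookeeper.connect value into host/port pairs.
# list_zk_servers is a hypothetical helper; it assumes no chroot suffix
# and no IPv6 addresses.

list_zk_servers() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r host port; do
        valid=yes
        # The host must be non-empty and contain no spaces.
        case "$host" in
            ''|*' '*) valid=no ;;
        esac
        # The port must be non-empty and numeric.
        case "$port" in
            ''|*[!0-9]*) valid=no ;;
        esac
        if [ "$valid" = yes ]; then
            echo "host=$host port=$port"
        else
            echo "invalid entry: '$host:$port'"
        fi
    done
}

list_zk_servers "localhost:2181,192.168.0.2:2183,192.168.0.3:2182"
```

Each valid pair is printed on its own line, and an entry with a stray space or a missing port is reported as invalid instead.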
As an exercise, try to start a broker with incorrect ZooKeeper connection information. Also, with the help of the lsof command, try to start ZooKeeper on a port that is already in use.
See also
- The server.properties template (like the rest of the Kafka project) is published online at: https://github.com/apache/kafka/blob/trunk/config/server.properties