Getting hands-on with Apache ZooKeeper
In this section, we will show you how to download and install Apache ZooKeeper so that you can start using it straightaway. This section is aimed at developers who want to get their hands dirty using ZooKeeper for their distributed applications' needs, and it gives detailed installation and usage instructions. We will start with a single-node ZooKeeper installation and its basic configuration, followed by the ZooKeeper shell. Finally, you will be taught how to set up a multinode ZooKeeper cluster.
Download and installation
ZooKeeper is supported by a wide variety of platforms. GNU/Linux and Oracle Solaris are supported as development and production platforms for both server and client. Windows and Mac OS X are recommended only as development platforms for both server and client.
Note
In this book, we will assume a GNU/Linux-based installation of Apache ZooKeeper for installation and other instructions.
ZooKeeper is implemented in Java and requires Java 6 or later versions to run. While Oracle's version of Java is recommended, OpenJDK should also work fine for the correct functioning of ZooKeeper and many of the code samples in this book.
Oracle's version of Java can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html.
ZooKeeper runs as a server ensemble known as a ZooKeeper ensemble. In a production cluster, three ZooKeeper servers is the minimum recommended size for an ensemble, and it is recommended that you run them on separate machines. However, you can learn and evaluate ZooKeeper by installing it on a single machine in standalone mode.
Note
A recent stable ZooKeeper distribution can be downloaded from one of the Apache Download Mirrors (http://bit.ly/1xEl8hA). At the time of writing this book, release 3.4.6 was the latest stable version available.
Downloading
Let's download the stable version from one of the mirrors, say, Georgia Tech's Apache download mirror (http://b.gatech.edu/1xElxRb), as shown in the following example:
$ wget http://www.gtlib.gatech.edu/pub/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz
$ ls -alh zookeeper-3.4.6.tar.gz
-rw-rw-r-- 1 saurav saurav 17M Feb 20 2014 zookeeper-3.4.6.tar.gz
Installing
Once we have downloaded the ZooKeeper tarball, installing and setting up a standalone ZooKeeper node is straightforward. Let's extract the compressed tar archive into /usr/share:
$ tar -C /usr/share -zxf zookeeper-3.4.6.tar.gz
$ cd /usr/share/zookeeper-3.4.6/
$ ls
bin  CHANGES.txt  contrib  docs  ivy.xml  LICENSE.txt  README_packaging.txt  recipes  zookeeper-3.4.6.jar  zookeeper-3.4.6.jar.md5
build.xml  conf  dist-maven  ivysettings.xml  lib  NOTICE.txt  README.txt  src  zookeeper-3.4.6.jar.asc  zookeeper-3.4.6.jar.sha1
The location where the ZooKeeper archive is extracted, in our case /usr/share/zookeeper-3.4.6, can be exported as ZK_HOME as follows:
$ export ZK_HOME=/usr/share/zookeeper-3.4.6
Configuration
Once we have extracted the tarball, the next thing is to configure ZooKeeper. The conf folder inside the extracted ZooKeeper folder holds the configuration files. ZooKeeper needs a configuration file called zoo.cfg in this conf folder. There is a sample configuration file, conf/zoo_sample.cfg, that contains some of the configuration parameters for reference.
Let's create our configuration file with the following minimal parameters and save it in the conf directory:
$ cat conf/zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
The meanings of these configuration parameters are explained here:
- tickTime: This is measured in milliseconds; it is used by clients for session registration and for regular heartbeats with the ZooKeeper service. The minimum session timeout will be twice the tickTime parameter.
- dataDir: This is the location to store the in-memory state of ZooKeeper; it includes database snapshots and the transaction log of updates to the database. Extracting the ZooKeeper archive won't create this directory, so if it doesn't exist on your system, you will need to create it and make it writable, as shown after this list.
- clientPort: This is the port that listens for client connections, so it is where ZooKeeper clients will initiate a connection. The client port can be set to any number, and different servers can be configured to listen on different ports. The default is 2181.
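The following commands are a minimal sketch of how the data directory can be created; the path comes from the sample zoo.cfg shown previously, and the ownership change is just one way of making the directory writable for the user who will run ZooKeeper, so adapt it to your setup:
$ sudo mkdir -p /var/lib/zookeeper
$ sudo chown -R $(whoami) /var/lib/zookeeper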
We will learn about the various storage, network, and cluster configuration parameters of ZooKeeper in more detail in Chapter 5, Administering Apache ZooKeeper.
As mentioned previously, ZooKeeper needs a Java Runtime Environment for it to work.
Note
It is assumed that readers already have a working version of Java running in their system where ZooKeeper is being installed and configured.
To see if Java is installed on your system, run the following command:
$ java -version
If Java is installed and its path is configured properly, then depending on the version and release of Java (Oracle or OpenJDK), the preceding command will show the version of Java and the Java Runtime installed on your system. For example, on my system, I have Java 1.7.0_67 installed, so the preceding command returns the following output:
$ java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
ZooKeeper needs the JAVA_HOME environment variable to be set correctly. To see whether it is set on your system, run the following command:
$ echo $JAVA_HOME
On my system, JAVA_HOME is set to /usr/java/latest, and hence, I got the following output:
$ echo $JAVA_HOME
/usr/java/latest
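If JAVA_HOME is not set, you can export it yourself. The following lines are only an illustrative sketch; /usr/java/latest is the path from the preceding example, so replace it with the actual location of your Java installation:
$ export JAVA_HOME=/usr/java/latest
$ export PATH=$PATH:$JAVA_HOME/bin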
Starting the ZooKeeper server
Now, considering that Java is installed and working properly, let's go ahead and start the ZooKeeper server. All ZooKeeper administration scripts to start/stop the server and invoke the ZooKeeper command shell are shipped along with the archive in the bin folder:
$ pwd
/usr/share/zookeeper-3.4.6/bin
$ ls
README.txt  zkCleanup.sh  zkCli.cmd  zkCli.sh  zkEnv.cmd  zkEnv.sh  zkServer.cmd  zkServer.sh
The scripts with the .sh extension are for Unix platforms (GNU/Linux, Mac OS X, and so on), and the scripts with the .cmd extension are for Microsoft Windows operating systems.
To start the ZooKeeper server on a GNU/Linux system, you need to execute the zkServer.sh script as follows. This script gives options to start, stop, and restart the ZooKeeper server and to see its status:
$ ./zkServer.sh
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Usage: ./zkServer.sh {start|start-foreground|stop|restart|status|upgrade|print-cmd}
To avoid having to change into the ZooKeeper installation directory to run these scripts, you can add its bin folder to your PATH variable as follows:
export PATH=$PATH:/usr/share/zookeeper-3.4.6/bin
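To make these settings survive new shell sessions, you can append them to your shell's startup file. This is just a sketch that assumes a bash shell; adapt the file name if you use a different shell:
$ echo 'export ZK_HOME=/usr/share/zookeeper-3.4.6' >> ~/.bashrc
$ echo 'export PATH=$PATH:$ZK_HOME/bin' >> ~/.bashrc
$ source ~/.bashrc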
Executing zkServer.sh with the start argument will start the ZooKeeper server. A successful start of the server will show the following output:
$ zkServer.sh start
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
To verify that the ZooKeeper server has started, you can use the following ps command:
$ ps -ef | grep zookeeper | grep -v grep | awk '{print $2}'
5511
If the jps command is installed on your system, you can verify the ZooKeeper server's status as follows:
$ which jps
jps is /usr/bin/jps
$ jps
5511 QuorumPeerMain
5565 Jps
The ZooKeeper process is listed as QuorumPeerMain. In this case, as reported by jps, the ZooKeeper server is running with process ID 5511, which matches the one reported by the ps command.
The ZooKeeper server's status can be checked with the zkServer.sh script as follows:
$ zkServer.sh status
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: standalone
To stop the server process, you can use the same script with the stop argument:
$ zkServer.sh stop
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
Checking the status of ZooKeeper when it has stopped or is not running will show the following result:
$ zkServer.sh status
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
Once our ZooKeeper instance is running, the next thing to do is to connect to it. ZooKeeper ships with a default Java-based command-line shell to connect to a ZooKeeper instance. There is a C client as well, which we will discuss in a later section.
Connecting to ZooKeeper with a Java-based shell
To start the Java-based ZooKeeper command-line shell, we simply need to run zkCli.sh in the ZK_HOME/bin folder with the server IP and port as follows:
${ZK_HOME}/bin/zkCli.sh -server zk_server:port
In our case, we are running the ZooKeeper server on the same machine, so the server address will be localhost, or the loopback address 127.0.0.1. The port we configured was the default, 2181:
$ zkCli.sh -server localhost:2181
As we connect to the running ZooKeeper instance, we will see output similar to the following in the terminal (some output is omitted):
Connecting to localhost:2181
...............
...............
Welcome to ZooKeeper!
JLine support is enabled
...............
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
To see a listing of the commands supported by the ZooKeeper Java shell, you can run the help command at the shell prompt:
[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
        connect host:port
        get path [watch]
        ls path [watch]
        set path data [version]
        rmr path
        delquota [-n|-b] path
        quit
        printwatches on|off
        create [-s] [-e] path data acl
        stat path [watch]
        close
        ls2 path [watch]
        history
        listquota path
        setAcl path acl
        getAcl path
        sync path
        redo cmdno
        addauth scheme auth
        delete path [version]
        setquota -n|-b val path
We can execute a few simple commands to get a feel for the command-line interface. Let's start by running the ls command, which, as in Unix, is used for listing:
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]
The ls command returned an entry called zookeeper, which is a znode in ZooKeeper terminology. We will be introduced to the ZooKeeper data model in Chapter 2, Understanding the Inner Workings of Apache ZooKeeper. We can create a znode through the ZooKeeper shell as follows.
To begin with, let's create a HelloWorld znode with empty data:
[zk: localhost:2181(CONNECTED) 2] create /HelloWorld ""
Created /HelloWorld
[zk: localhost:2181(CONNECTED) 3] ls /
[zookeeper, HelloWorld]
We can delete the znode we created by issuing the delete command as follows:
[zk: localhost:2181(CONNECTED) 4] delete /HelloWorld
[zk: localhost:2181(CONNECTED) 5] ls /
[zookeeper]
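As a small extra exercise beyond the original walkthrough, znodes can also carry data, and the get command listed in the help output reads it back. The znode name and data below are arbitrary examples, and the metadata that get prints after the data has been trimmed:
[zk: localhost:2181(CONNECTED) 6] create /hello helloworld
Created /hello
[zk: localhost:2181(CONNECTED) 7] get /hello
helloworld
...............
[zk: localhost:2181(CONNECTED) 8] delete /hello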
The operations shown here will become clearer as we learn more about the ZooKeeper architecture, its data model and namespace, and its internals in the subsequent chapters. Next, we will look at setting up the C language-based command-line shell shipped with the ZooKeeper distribution.
Connecting to ZooKeeper with a C-based shell
ZooKeeper is shipped with a C language-based command-line shell. However, to use this shell, we need to build the C sources in ${ZK_HOME}/src/c. The GNU GCC compiler is required to build the sources. To build them, just run the following three commands in that directory:
$ ./configure
$ make
$ make install
By default, this installs the C client libraries under /usr/local/lib. Both single-threaded and multithreaded variants of the library are built; the single-threaded library is suffixed with _st, while the multithreaded library is suffixed with _mt.
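A quick way to verify the installation is to list the installed libraries; the exact file names and version suffixes vary, but you should see both the _st and _mt variants of libzookeeper. Depending on your distribution, you may also need to run ldconfig so that the newly installed shared libraries in /usr/local/lib are picked up:
$ ls /usr/local/lib | grep zookeeper
$ sudo ldconfig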
The C-based ZooKeeper shell uses these libraries. As such, after the preceding build procedure, two executables called cli_st and cli_mt are also generated in the current folder. These two binaries are the single-threaded and multithreaded command-line shells, respectively. When cli_mt is run without arguments, we get the following output:
$ cli_mt
USAGE cli_mt zookeeper_host_list [clientid_file|cmd:(ls|ls2|create|od|...)]
Version: ZooKeeper cli (c client) version 3.4.6
To connect to our ZooKeeper server instance with this C-based shell, execute the following command in your terminal:
$ cli_mt localhost:2181
Watcher SESSION_EVENT state = CONNECTED_STATE
Got a new session id: 0x148b540cc4d0004
The C-based ZooKeeper shell supports multiple commands, just like the Java version. Let's see the available commands in this shell by executing the help command:
help
        create [+[e|s]] <path>
        delete <path>
        set <path> <data>
        get <path>
        ls <path>
        ls2 <path>
        sync <path>
        exists <path>
        wexists <path>
        myid
        verbose
        addauth <id> <scheme>
        quit
        prefix the command with the character 'a' to run the command asynchronously.
        run the 'verbose' command to toggle verbose logging.
        i.e. 'aget /foo' to get /foo asynchronously
We can issue the same set of commands to list the znodes, create a znode, and finally delete it:
ls /
time = 3 msec
/: rc = 0
        zookeeper
time = 5 msec
create /HelloWorld
Creating [/HelloWorld] node
Watcher CHILD_EVENT state = CONNECTED_STATE for path /
[/HelloWorld]: rc = 0
        name = /HelloWorld
ls /
time = 3 msec
/: rc = 0
        zookeeper
        HelloWorld
time = 3 msec
delete /HelloWorld
Watcher CHILD_EVENT state = CONNECTED_STATE for path /
ls /
time = 3 msec
/: rc = 0
        zookeeper
time = 3 msec
The output of the C-based ZooKeeper shell displays the amount of time spent executing each command as well as the return code (rc). A return code equal to zero denotes successful execution of the command.
The C static and shared libraries that we built earlier and installed in /usr/local/lib are required for programming ZooKeeper-based distributed applications in the C programming language. The Perl and Python client bindings shipped with the ZooKeeper distribution are also built on top of this C interface.
Setting up a multinode ZooKeeper cluster
So far, we have set up a ZooKeeper server instance in standalone mode. A standalone instance is a potential single point of failure. If the ZooKeeper server fails, the whole application that was using the instance for its distributed coordination will fail and stop functioning. Hence, running ZooKeeper in standalone mode is not recommended for production, although for development and evaluation purposes, it serves the need.
In a production environment, ZooKeeper should be run on multiple servers in a replicated mode, also called a ZooKeeper ensemble. The minimum recommended number of servers is three, and five is the most common in a production environment. The replicated group of servers in the same application domain is called a quorum. In this mode, ZooKeeper server instances run on multiple machines, and all servers in the quorum have copies of the same configuration file. In a quorum, ZooKeeper instances run in a leader/follower format. One of the instances is elected the leader, and the others become followers. If the leader fails, a new leader election takes place, and another running instance is made the leader. However, these intricacies are fully hidden from applications using ZooKeeper and from developers.
The ZooKeeper configuration file for a multinode mode is similar to the one we used for a single instance mode, except for a few entries. An example configuration file is shown here:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
The two new configuration parameters are explained here:
- initLimit: This parameter is the timeout, specified in number of ticks, within which a follower must initially connect to the leader
- syncLimit: This is the timeout, specified in number of ticks, within which a follower must sync with the leader
Both of these timeouts are specified in the unit of time defined by tickTime. Thus, in our example, the timeout for initLimit is 5 ticks at 2000 milliseconds a tick, or 10 seconds; similarly, the timeout for syncLimit is 2 ticks, or 4 seconds.
The other three entries in the preceding example, in the server.id=host:port:port format, list the servers that constitute the quorum. The id part is a number that identifies each server in the quorum. In our example configuration, the host zoo1 is the quorum member with identifier 1.
The identifier needs to be specified in a file called myid in the data directory of that server. The myid file must consist of a single line containing only that server's ID as ASCII text. The ID must be unique within the ensemble and should have a value between 1 and 255.
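For example, with the sample configuration shown earlier, the myid files could be created as follows. This is only a sketch that assumes dataDir is /var/lib/zookeeper on every host:
$ echo 1 > /var/lib/zookeeper/myid    # on host zoo1
$ echo 2 > /var/lib/zookeeper/myid    # on host zoo2
$ echo 3 > /var/lib/zookeeper/myid    # on host zoo3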
Again, we have two port numbers after each server hostname: 2888 and 3888. They are explained here:
- The first port, 2888, is mostly used for peer-to-peer communication in the quorum, such as connecting followers to the leader. A follower opens a TCP connection to the leader using this port.
- The second port, 3888, is used for leader election, in case a new leader needs to be elected in the quorum. As all communication happens over TCP, a second port is required for the leader election exchanges inside the quorum.
Starting the server instances
After setting up the configuration file for each of the servers in the quorum, we need to start the ZooKeeper server instances. The procedure is the same as for standalone mode. We have to connect to each of the machines and execute the following command:
${ZK_HOME}/bin/zkServer.sh start
Once the instances are started successfully, we will execute the following command on each of the machines to check the instance states:
${ZK_HOME}/bin/zkServer.sh status
For example, take a look at the next quorum:
[zoo1] # ${ZK_HOME}/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower

[zoo2] # ${ZK_HOME}/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader

[zoo3] # ${ZK_HOME}/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/share/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
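On each machine, you can also check which of the configured ports the ZooKeeper process is actually listening on. This is just a quick sanity check using ss (netstat works as well); the client port and the election port should always show up, whereas the quorum port (2888 in our example) is typically bound only on the current leader:
$ ss -ltn | grep -E '2181|2888|3888'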
As seen in the preceding example, zoo2 is the leader of the quorum, while zoo1 and zoo3 are the followers. Connecting to the ZooKeeper quorum through the command-line shell is also the same as in standalone mode, except that we should now pass a connection string in the host1:port1,host2:port2,… format to the -server argument of ${ZK_HOME}/bin/zkCli.sh:
$ zkCli.sh -server zoo1:2181,zoo2:2181,zoo3:2181
Connecting to zoo1:2181,zoo2:2181,zoo3:2181
… … … …
Welcome to ZooKeeper!
… … … …
[zk: zoo1:2181,zoo2:2181,zoo3:2181(CONNECTED) 0]
Once the ZooKeeper cluster is up and running, there are ways to monitor it using Java Management Extensions (JMX) and by sending some commands over the client port, also known as the Four Letter Words. We will discuss ZooKeeper monitoring in more detail in Chapter 5, Administering Apache ZooKeeper.
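As a quick taste of the latter, the ruok ("are you OK?") four-letter command can be sent to the client port with a tool such as nc; a healthy server replies with imok. The host and port below assume the standalone setup used earlier in this section:
$ echo ruok | nc localhost 2181
imok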
Running multiple node modes for ZooKeeper
It is also possible to run a multinode ZooKeeper cluster on a single machine, which is useful for testing purposes. To run multiple nodes on the same machine, we need to tweak the configuration a bit; for example, we can set the server name to localhost and specify unique quorum and leader election ports for each instance.
Let's use the following configuration file to set up a multinode ZooKeeper cluster using a single machine:
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=localhost:2666:3666
server.2=localhost:2667:3667
server.3=localhost:2668:3668
As already explained in the previous section, each server.X entry specifies the address and port numbers used by ZooKeeper server X. The first field is the hostname or IP address of server X. The second and third fields are the TCP port numbers used for quorum communication and leader election, respectively. As we are starting three ZooKeeper server instances on the same machine, we need to use different port numbers for each server entry.
Next, as we are running more than one ZooKeeper server process on the same machine, we also need a different client port for each of the instances.
Last but not least, we have to customize the dataDir parameter for each of the instances as well.
Putting all this together, for a three-instance ZooKeeper cluster, we will create three different configuration files. We will call these zoo1.cfg, zoo2.cfg, and zoo3.cfg and store them in the conf folder of ${ZK_HOME}. We will also create three different data folders for the instances, say zoo1, zoo2, and zoo3, in /var/lib/zookeeper. Thus, the three configuration files are shown next.
Here, you will see the configuration file for the first instance:
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper/zoo1
clientPort=2181
server.1=localhost:2666:3666
server.2=localhost:2667:3667
server.3=localhost:2668:3668
The second instance is shown here:
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper/zoo2
clientPort=2182
server.1=localhost:2666:3666
server.2=localhost:2667:3667
server.3=localhost:2668:3668
The third and final instance is then shown here:
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper/zoo3
clientPort=2183
server.1=localhost:2666:3666
server.2=localhost:2667:3667
server.3=localhost:2668:3668
We also need to set the server ID correctly in the myid file of each instance. This can be done using the following three commands:
$ echo 1 > /var/lib/zookeeper/zoo1/myid
$ echo 2 > /var/lib/zookeeper/zoo2/myid
$ echo 3 > /var/lib/zookeeper/zoo3/myid
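Note that the preceding commands assume that the three data directories already exist. If they don't, a minimal way to create them (adjust ownership and permissions to your environment) is as follows:
$ mkdir -p /var/lib/zookeeper/zoo1 /var/lib/zookeeper/zoo2 /var/lib/zookeeper/zoo3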
Now, we are all set to start the ZooKeeper instances, as follows:
$ ${ZK_HOME}/bin/zkServer.sh start ${ZK_HOME}/conf/zoo1.cfg
$ ${ZK_HOME}/bin/zkServer.sh start ${ZK_HOME}/conf/zoo2.cfg
$ ${ZK_HOME}/bin/zkServer.sh start ${ZK_HOME}/conf/zoo3.cfg
Once all the instances have started, we can use the zkCli.sh script to connect to the multinode ZooKeeper cluster, as we did earlier:
$ ${ZK_HOME}/bin/zkCli.sh -server \
localhost:2181,localhost:2182,localhost:2183
Voila! We have a three-node ZooKeeper cluster running on the same machine!
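To confirm which instance took which role, you can query each one's status by following the same pattern as the start commands above, passing the per-instance configuration file to the script:
$ ${ZK_HOME}/bin/zkServer.sh status ${ZK_HOME}/conf/zoo1.cfg
$ ${ZK_HOME}/bin/zkServer.sh status ${ZK_HOME}/conf/zoo2.cfg
$ ${ZK_HOME}/bin/zkServer.sh status ${ZK_HOME}/conf/zoo3.cfg
One of the instances will report Mode: leader, and the other two will report Mode: follower.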