Installing ZooKeeper for SolrCloud
You might know that in order to run SolrCloud, the distributed Solr deployment, you need to have Apache ZooKeeper installed. ZooKeeper is a centralized service for maintaining configuration information, naming, and providing distributed synchronization. SolrCloud uses ZooKeeper to store and synchronize configurations and cluster state, to help with leader election, and so on. This is why it is crucial to have a highly available and fault-tolerant ZooKeeper installation. If you have a single ZooKeeper instance, and it fails, then your SolrCloud cluster will crash too. So, this recipe will show you how to install ZooKeeper so that it's not a single point of failure in your cluster configuration.
Getting ready
The installation instructions in this recipe describe ZooKeeper Version 3.4.6, but they should also be usable for other minor releases of Apache ZooKeeper. To download ZooKeeper, visit http://zookeeper.apache.org/releases.html. This recipe will show you how to install ZooKeeper in a Linux-based environment. For ZooKeeper to work, Java needs to be installed.
How to do it...
Let's assume that we have decided to install ZooKeeper in the /usr/share/zookeeper directory of our server, and we want to have three servers (with IPs 192.168.1.1, 192.168.1.2, and 192.168.1.3) hosting a distributed ZooKeeper installation. This can be done by performing the following steps:
- After downloading the ZooKeeper installation, we create the necessary directory:
sudo mkdir /usr/share/zookeeper
- Then, we unpack the downloaded archive to the newly created directory. We do this on all three servers.
- Next, we need to change our ZooKeeper configuration file and specify the servers that will form a ZooKeeper quorum. So, we edit the /usr/share/zookeeper/conf/zoo.cfg file and add the following entries:
clientPort=2181
dataDir=/usr/share/zookeeper/data
tickTime=2000
initLimit=10
syncLimit=5
server.1=192.168.1.1:2888:3888
server.2=192.168.1.2:2888:3888
server.3=192.168.1.3:2888:3888
- Now, the next thing we need to do is create a file called myid in the /usr/share/zookeeper/data directory. The file should contain a single number that corresponds to the server number. For example, if ZooKeeper is located on 192.168.1.1, it will be 1, and if ZooKeeper is located on 192.168.1.3, it will be 3, and so on.
- Now, we can start the ZooKeeper servers with the following command:
/usr/share/zookeeper/bin/zkServer.sh start
- If everything goes well, you should see something like:
JMX enabled by default
Using config: /usr/share/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
That's all. Of course, you can also add the ZooKeeper service to start automatically as your operating system starts up, but this is beyond the scope of the recipe and book.
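The myid step from the list above can be scripted so that it is repeated the same way on every server. The following is only a minimal sketch: the write_myid helper is a hypothetical name introduced here for illustration, not something shipped with ZooKeeper, and it assumes the user running it may write to the data directory.

```shell
# write_myid DATA_DIR ID
# Hypothetical helper: creates DATA_DIR if needed and writes ID to DATA_DIR/myid.
# The body runs in a subshell, so its variables do not leak to the caller.
write_myid() (
  data_dir=$1
  id=$2
  # Reject anything that is not a plain non-negative number.
  case $id in
    ''|*[!0-9]*)
      echo "myid must be a number, got: '$id'" >&2
      exit 1
      ;;
  esac
  mkdir -p "$data_dir" && printf '%s\n' "$id" > "$data_dir/myid"
)

# Example for the server at 192.168.1.1 (run with sufficient permissions):
#   write_myid /usr/share/zookeeper/data 1
```

On 192.168.1.2 and 192.168.1.3, the same call would be made with 2 and 3, respectively.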
How it works...
I talked about the ZooKeeper quorum and started it using three ZooKeeper nodes. ZooKeeper operates in a quorum, which means that a majority of the configured servers (more than half of them) needs to be available and connected. We can start with a single ZooKeeper server, but such a deployment won't be highly available or resistant to failures. With three nodes, the majority is two, so one node can fail and the remaining two still form a quorum. So, to be able to handle at least a single ZooKeeper node failure, we need at least three ZooKeeper nodes running.
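The majority rule can be checked with a little arithmetic. The following is a minimal sketch; the quorum_size and tolerated_failures function names are made up for illustration and are not part of ZooKeeper.

```shell
# quorum_size N: smallest majority of an N-server ensemble, that is floor(N/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: how many servers may fail while a majority is still running.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For three servers, this reports a quorum of two and one tolerated failure. Four servers also tolerate only one failure, which is why ensembles are usually sized with an odd number of nodes.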
Let's skip the first part because creating the directory and unpacking the ZooKeeper server is quite simple. What I would like to concentrate on are the configuration values of the ZooKeeper server. The clientPort property specifies the port on which our SolrCloud servers should connect to ZooKeeper. The dataDir property specifies the directory where ZooKeeper will hold its data. Note that ZooKeeper needs read and write permissions to the directory. So far so good, right? Now for the more advanced properties. The tickTime property, specified in milliseconds, is the basic time unit for ZooKeeper. The initLimit property specifies how many ticks the initial synchronization phase can take. Finally, syncLimit specifies how many ticks can pass between sending the request and receiving an acknowledgement.
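Because initLimit and syncLimit are counted in ticks, their real duration depends on tickTime. The following sketch converts the values used in this recipe to milliseconds; the ticks_to_ms helper is a hypothetical name used only for this illustration.

```shell
# Values copied from the zoo.cfg entries in this recipe.
tick_time=2000   # tickTime, in milliseconds
init_limit=10    # initLimit, in ticks
sync_limit=5     # syncLimit, in ticks

# ticks_to_ms TICKS TICK_MS: convert a limit expressed in ticks to milliseconds.
ticks_to_ms() {
  echo $(( $1 * $2 ))
}

echo "initLimit: $(ticks_to_ms "$init_limit" "$tick_time") ms"
echo "syncLimit: $(ticks_to_ms "$sync_limit" "$tick_time") ms"
```

With these settings, the initial synchronization phase may take up to 20000 ms and a request may wait up to 10000 ms for its acknowledgement.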
There are also three additional properties present: server.1, server.2, and server.3. These three properties define the addresses of the ZooKeeper instances that will form the quorum. The parts of each value are separated by a colon character. The first part is the IP address of the ZooKeeper server, and the second and third parts are the ports used by ZooKeeper instances to communicate with each other: the first port is used by followers to connect to the leader, and the second port is used for leader election.
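To make the structure of a server entry concrete, here is a small sketch that splits one entry into its parts using only shell parameter expansion. The parse_server_entry function and the output labels are assumptions made for this illustration; ZooKeeper itself does not provide such a tool.

```shell
# parse_server_entry "server.N=HOST:PORT1:PORT2"
# Hypothetical helper: prints the server number, host, and both ports.
# The body runs in a subshell, so its variables do not leak to the caller.
parse_server_entry() (
  entry=$1
  key=${entry%%=*}          # text before "=", for example server.1
  value=${entry#*=}         # text after "=", for example 192.168.1.1:2888:3888
  id=${key#server.}         # server number
  host=${value%%:*}         # first colon-separated part
  rest=${value#*:}          # remaining "PORT1:PORT2"
  peer_port=${rest%%:*}     # port followers use to connect to the leader
  election_port=${rest#*:}  # port used for leader election
  echo "id=$id host=$host peer_port=$peer_port election_port=$election_port"
)

parse_server_entry "server.1=192.168.1.1:2888:3888"
```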
The last thing is the myid file located in the /usr/share/zookeeper/data directory. The contents of the file are used by ZooKeeper to identify itself. This is why we need to configure it properly, so that ZooKeeper is not confused. So, for the ZooKeeper server specified as server.1, we need to write 1 to the myid file.
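Since the number in myid has to match the server.N entry for the same machine, it can be looked up from zoo.cfg instead of being typed by hand. The following is a sketch under that assumption; myid_for_host is a hypothetical helper name, and the script expects the IP address to be passed in exactly as it is written in the configuration file.

```shell
# myid_for_host CFG_FILE IP
# Hypothetical helper: prints N for the "server.N=IP:..." line in CFG_FILE.
# Prints nothing if no server entry uses that IP.
myid_for_host() (
  cfg=$1
  ip=$2
  # Split each line on "=" and ":", so $1 is "server.N" and $2 is the address.
  awk -F'[=:]' -v ip="$ip" '
    $1 ~ /^server\.[0-9]+$/ && $2 == ip {
      sub(/^server\./, "", $1)
      print $1
    }
  ' "$cfg"
)

# Example, using the paths from this recipe:
#   myid_for_host /usr/share/zookeeper/conf/zoo.cfg 192.168.1.3
```

The printed number is the value that belongs in the myid file on that server.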