Managing clusters
In the HBase ecosystem, it is essential to monitor the cluster in order to control and improve its performance and health as it grows. Because HBase sits on top of the Hadoop stack and serves real-time user traffic, we need visibility into the cluster's performance at any given point in time; this lets us detect problems early and take corrective action before they escalate.
Getting ready
It is important to know some of the details of Ganglia and its distributed components before we get into the details of managing clusters.
gmond
This is the name of a low-footprint service, the Ganglia Monitoring Daemon. It needs to be installed on each node from which we want to pull metrics. This daemon is the actual workhorse: it collects data about each host using a listen/announce protocol, gathering core metrics such as disk, active processes, network, memory, and CPU/vCPUs.
gmetad
This is the Ganglia Meta Daemon. It is a service that polls data from other gmetad and gmond instances and merges it into a single meta-cluster image. The data is stored in RRD and XML formats, which client applications can then browse.
gweb
It's a PHP-based web interface onto the data collected by the two daemons above. It requires the following:
- Apache web server
- PHP 5.2 or later
- The PHP json extension
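Before installing, it can help to check these prerequisites from the shell. The helper below only compares a "major.minor" PHP version string against the 5.2 minimum; feeding it the live version (for example, via `php -r 'echo PHP_VERSION;'`) and checking `php -m` for the json extension is left as a usage note, since those commands assume PHP is already on the PATH.

```shell
# Sketch: verify a PHP version string meets gweb's 5.2 minimum.
php_version_ok() {
  # Succeeds (exit 0) if "major.minor" is at least 5.2.
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 2 ]; }
}

php_version_ok 5.3 && echo "PHP version OK" || echo "PHP too old"
```

In practice you would run something like `php_version_ok "$(php -r 'echo PHP_VERSION;' | cut -d. -f1,2)"` and `php -m | grep -qi '^json$'` on the target host.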
How to do it…
We will divide this recipe into two sections. In the first, we will install Ganglia on all the nodes; once that is done, we will integrate it with HBase so that the relevant metrics become available.
Ganglia setup
To install Ganglia, it is best to use the prebuilt binary packages available from the vendor distributions, as these take care of the prerequisite libraries. Alternatively, it can be downloaded from the Ganglia website at http://sourceforge.net/projects/ganglia/files/latest/download?source=files.
If you are working from a command prompt rather than a browser, you can download it with the following command (typed as a single line on your shell):
wget http://downloads.sourceforge.net/project/ganglia/ganglia%20monitoring%20core/3.0.7%20%28Fossett%29/ganglia-3.0.7.tar.gz
Use sudo in case you don't have privileges for the current directory, or download it to /tmp and copy it to the respective location later.
tar -xzvf ganglia-3.0.7.tar.gz -C /opt/HBaseB
rm -rf ganglia-3.0.7.tar.gz
This deletes the tar file, which is no longer needed.
- Now let's install the dependencies:
sudo apt-get -y install build-essential libapr1-dev libconfuse-dev libexpat1-dev python-dev
The -y option means that apt-get won't wait for the user's confirmation; it assumes yes whenever a confirmation prompt would appear.
- Build and install the downloaded and extracted sources:
cd /opt/HBaseB/ganglia-3.0.7
./configure
make
sudo make install
(./configure is the standard configuration command in a Linux environment.)
- Once the preceding step is completed, you can generate a default configuration file with:
gmond --default_config > /etc/gmond.conf
Use "sudo su -" in case there is a privilege issue; it makes you the root user and allows /etc/gmond.conf to be written.
vi /etc/gmond.conf
and change the following:
globals {
  user = ganglia
}
Note
In case you are using a specific user to perform the Ganglia tasks, change the user above accordingly.
- The recommendation is to create this user with the following command:
sudo adduser --disabled-login --no-create-home ganglia
Then fill in the cluster section of /etc/gmond.conf:
cluster {
  name = "HBaseB"                                      # the name of your cluster
  owner = "HBaseB Company"
  url = "http://yourHBaseMaster.ganglia-monitor.com/"  # the URL of the main monitor, or the CNAME
}
- The multicast setup, which is the default, is good for fewer than 120 nodes. For more than 120 nodes, we have to switch to unicast.
The setup is as follows:
Change /etc/gmond.conf as follows:
udp_send_channel {
  # mcast_join = <the multicast address to join>
  host = yourHBaseMaster.ganglia-monitor.com
  port = 8649
  # ttl = 1
}
udp_recv_channel {
  # mcast_join = <the multicast address to join>
  port = 8649
  # bind = <the IP address to bind to>
}
- Start the monitoring daemon with:
sudo gmond
We can test it with:
nc <hostname> 8649
or:
telnet <hostname> 8649
Note
You have to kill the daemon process to stop it. Find the process ID with:
ps -ef | grep gmond
Then execute:
sudo kill -9 <PID>
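The nc/telnet test above works because gmond answers any TCP connection on port 8649 with an XML dump of the cluster state. The sketch below pulls metric names out of such a dump; the XML sample is a hand-made stand-in for a live socket, trimmed to the attributes Ganglia actually emits.

```shell
# Extract metric names from a gmond XML dump. Against a live node you
# would pipe the socket in, e.g.:  nc yourHBaseMaster 8649 | extract_metric_names
extract_metric_names() {
  grep -o 'METRIC NAME="[^"]*"' | sed 's/^METRIC NAME="//; s/"$//'
}

# A trimmed, hand-made sample of what gmond returns:
printf '%s\n' \
  '<HOST NAME="node1" REPORTED="1700000000">' \
  '  <METRIC NAME="cpu_idle" VAL="97.2" UNITS="%"/>' \
  '  <METRIC NAME="mem_free" VAL="102400" UNITS="KB"/>' \
  '</HOST>' | extract_metric_names
```

A non-empty metric list confirms gmond is collecting; an empty one usually points at a udp_recv_channel misconfiguration.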
- Now we have to install the Ganglia meta daemon (gmetad). One instance is enough for a cluster of fewer than 100 nodes. This is the workhorse responsible for creating the graphs, so it requires a powerful machine with decent compute capacity.
- Let's move ahead:
cd /u/HBaseB/ganglia-3.0.7
./configure --with-gmetad
make
sudo make install
sudo cp /u/HBaseB/ganglia-3.0.7/gmetad/gmetad.conf /etc/gmetad.conf
- Open the file using:
sudo vi /etc/gmetad.conf
and change the following:
setuid_username "ganglia"
data_source "HBaseB" yourHBaseMaster.ganglia-monitor.com
gridname "<your grid name, say HBaseB Grid>"
- Now we need to create directories, which will store data in a round-robin database (rrds):
mkdir -p /var/lib/ganglia/rrds
Now let's change the ownership to the ganglia user, so that it can read and write as needed:
chown -R ganglia:ganglia /var/lib/ganglia/
- Let's start the daemon:
gmetad
Note
You have to kill the daemon process to stop it. Find the process ID with:
ps -ef | grep gmetad
Then execute:
sudo kill -9 <PID>
- Now, let's focus on Ganglia web.
sudo apt-get -y install rrdtool apache2 php5-mysql libapache2-mod-php5 php5-gd
Tip
Note that this will install rrdtool (the round-robin database tool), Apache/httpd, the php5 connector (Apache to MySQL), the php5-mysql drivers, and so on. - Copy the PHP-based web files to the following location:
cp -r /u/HBaseB/ganglia-3.0.7/web /var/www/ganglia
sudo /etc/init.d/apache2 restart
(Other arguments that can be used are status and stop.)
- Point your browser at http://yourHBaseMaster.ganglia-monitor.com/ganglia; you should start seeing the basic graphs (the HBase-specific ones will not appear until the HBase setup is done). - Integrate HBase and Ganglia:
vi /u/HBaseB/hbase-0.98.5-hadoop2/conf/hadoop-metrics2-hbase.properties
- Change the following parameters to get the different statuses into Ganglia:
hbase.extendedperiod = 3600
hbase.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
hbase.period=5
hbase.servers=master2:8649
# The jvm context provides memory used, thread count in the JVM, and so on.
jvm.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
jvm.period=5
jvm.servers=master2:8649
# Enable the rpc context to see metrics on each HBase RPC method invocation.
rpc.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
rpc.period=5
rpc.servers=master2:8649
- Copy /u/HBaseB/hbase-0.98.5-hadoop2/conf/hadoop-metrics2-hbase.properties to all the nodes, then restart the HMaster and all the region servers.
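Copying the file to every node by hand gets tedious on a large cluster. The sketch below only prints the scp/ssh commands so they can be reviewed before running; the node names slave1/slave2 are hypothetical, and the hbase-daemon.sh path assumes the install location used in this recipe.

```shell
# Print (do not run) the commands that push the metrics config to each
# node and restart its region server. Pipe the output to `sh` once reviewed.
gen_push_cmds() {
  conf="$1"; shift
  for node in "$@"; do
    echo "scp $conf $node:$conf"
    echo "ssh $node /u/HBaseB/hbase-0.98.5-hadoop2/bin/hbase-daemon.sh restart regionserver"
  done
}

gen_push_cmds /u/HBaseB/hbase-0.98.5-hadoop2/conf/hadoop-metrics2-hbase.properties \
  slave1 slave2   # hypothetical hostnames
```

Emitting the commands instead of executing them keeps the sketch safe to run anywhere; `gen_push_cmds … | sh` performs the actual rollout.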
How it works…
As the system grows from a few nodes to tens or hundreds, or becomes a very large cluster of more than a few hundred nodes, it's pivotal to have a holistic view, a drill-down view, and a historical view of the metrics at any given point of time, in graphical form. In large and very large installations, administrators are also concerned about redundancy, which avoids a single point of failure. HBase and the underlying HDFS are designed to handle node failures gracefully, but it's equally important to monitor these failures, as they can pull the whole cluster down if corrective action is not taken in time. HBase exposes various metrics to JMX and Ganglia, such as HMaster and region server statistics, JVM (Java Virtual Machine) details, RPC (Remote Procedure Call) counts, and Hadoop/HDFS and MapReduce details. Taking all these points and various other salient and powerful features into consideration, we chose Ganglia.
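The JMX metrics mentioned above are also exposed as JSON over HTTP by the HBase master web UI (the /jmx endpoint, served on port 60010 in the 0.98 line), e.g. curl -s http://yourHBaseMaster:60010/jmx, where the hostname is this recipe's placeholder. The sketch below pulls MBean names out of such a dump; the JSON sample is a hand-made stand-in for a live cluster.

```shell
# Extract MBean names from an HBase /jmx JSON dump. Against a live master:
#   curl -s http://yourHBaseMaster:60010/jmx | extract_bean_names
extract_bean_names() {
  grep -o '"name" *: *"[^"]*"' | sed 's/^"name" *: *"//; s/"$//'
}

# A trimmed, hand-made sample of the /jmx response shape:
sample='{"beans":[{"name":"Hadoop:service=HBase,name=Master,sub=Server"},{"name":"java.lang:type=Memory"}]}'
echo "$sample" | extract_bean_names
```

This is handy for spot checks when Ganglia is down, or for scripting alerts on individual beans.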
Ganglia provides the following advantages:
- It provides near-real-time monitoring for all the vital information of a very large cluster.
- It runs on commodity hardware and suits most of the popular operating systems.
- It's open source and relatively easy to install.
- It integrates easily with traditional monitoring systems.
- It provides an overall view of all nodes in a grid and all nodes in the cluster.
- The monitored data is available in both textual and graphic format.
- Works on a multicast listen/announce protocol.
- Works with open standards, including:
- JSON
- XML
- XDR
- RRDTool
- APR (Apache Portable Runtime)
- Apache HTTPD server
- PHP-based web interface
HBase works only with Ganglia versions 3.0.X and higher; hence we used version 3.0.7.
In step 4, we installed the library dependencies required for Ganglia to compile.
In step 5, we compiled and installed Ganglia by running the configure command, followed by make and then make install.
In step 6, we created the gmond.conf file, and later, in step 7, we changed the settings to point to the HBase master node. We also configured port 8649 with a user, ganglia, who can read from the cluster. By commenting out the multicast address and the TTL (time to live), we changed the default UDP-based multicasting to unicasting, which enables us to expand the cluster beyond 120 nodes. We also added a master gmond node in this config file.
In step 8, we started gmond and got core monitoring data such as CPU, disk, network, memory, and load average for the nodes.
In step 9, we went back to /u/HBaseB/ganglia-3.0.7/ and reran the configuration, but this time we added ./configure --with-gmetad so that it compiles with gmetad support.
In step 11, we copied gmetad.conf from /u/HBaseB/ganglia-3.0.7/gmetad/gmetad.conf to /etc/gmetad.conf.
In step 12, we added the ganglia user and the master details in the data_source line: data_source "HBaseB" yourHBaseMaster.ganglia-monitor.com.
In steps 13/14, we created the rrds directory that holds the data in round-robin databases; later on, we started the gmetad daemon on the master node.
In step 15, we installed all the dependencies required to run the web interface.
In step 16, we copied the PHP web files from the extracted location (/u/HBaseB/ganglia-3.0.7/web) to /var/www/ganglia.
In step 17, we restarted the Apache instance and saw all the basic graphs, which provide the details of the nodes and the hosts, but not yet the HBase details. We also copied the configuration to all the nodes so that they are identical and the Ganglia master receives data from the child nodes.
In step 18, we changed the settings in hadoop-metrics2-hbase.properties so that HBase starts collecting metrics and sending them to the Ganglia servers on port 8649. The class responsible for providing these details is org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31, together with its properties.
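For reference, the metrics2 framework also accepts a prefix-style syntax for sink configuration; the sketch below is an equivalent form, not a drop-in file. The Ganglia sink class ships with Hadoop, and the master2:8649 host:port pair repeats this recipe's value.

```properties
# hadoop-metrics2-hbase.properties, metrics2 prefix syntax (sketch)
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
hbase.sink.ganglia.servers=master2:8649
jvm.sink.ganglia.servers=master2:8649
```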
Now we point the browser at the master's URL, and once the page renders, it starts showing the graphs described by the image HBase-Ganglia-MasterAndRegion01-01.png, including the following:
- Memory and CPU usage
- JVM details (GC cycle, memory consumed by JVM, threads used, heap consumed, and so on)
- HBase Master details
- HBase Region compaction queue details
- Region server flush queue utilizations
- Region servers IO
There is more…
Ganglia is used for monitoring very large clusters, and in the world of Hadoop/HBase it can be very useful, as it provides views of the following:
- JVM
- HDFS
- MapReduce
- Region compaction time
- Region store files
- Region block cache hit ratio
- Master split size
- Master split number of operations
- Region block free
- NameNode activities
- Secondary NameNode details
- Disk status
See also
You can refer to the following sites for more information: