Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Hbase Essentials

You're reading from   Hbase Essentials A practical guide to realizing the seamless potential of storing and managing high-volume, high-velocity data quickly and painlessly with HBase

Arrow left icon
Product type Paperback
Published in Nov 2014
Publisher
ISBN-13 9781783987245
Length 164 pages
Edition 1st Edition
Languages
Tools
Concepts
Arrow right icon
Author (1):
Arrow left icon
Nishant Garg Nishant Garg
Author Profile Icon Nishant Garg
Nishant Garg
Arrow right icon
View More author details
Toc

Understanding HBase cluster components

In fully distributed and pseudo-distributed modes, a HBase cluster has many components such as HBase Master, ZooKeeper, RegionServers, HDFS DataNodes, and so on, discussed as follows:

  • HBase Master: HBase Master coordinates the HBase cluster and is responsible for administrative operations. It is a lightweight process that does not require too many hardware resources. A large cluster might have multiple HBase Master components to avoid cases that have a single point of failure. In this highly available cluster with multiple HBase Master components, only once HBase Master is active and the rest of HBase Master servers get in sync with the active server asynchronously. Selection of the next HBase Master in case of failover is done with the help of the ZooKeeper ensemble.
  • ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Similar to HBase Master, ZooKeeper is again a lightweight process. By default, a ZooKeeper process is started and stopped by HBase but it can be managed separately as well. The HBASE_MANAGES_ZK variable in conf/hbase-env.sh with the default value true signifies that HBase is going to manage ZooKeeper. We can specify the ZooKeeper configuration in the native zoo.cfg file or its values such as client, port, and so on directly in conf/hbase-site.xml. It is advisable that you have an odd number of ZooKeeper ensembles such as one/three/five for more host failure tolerance. The following is an example of hbase-site.xml with ZooKeeper settings:
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2222</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>ZooKeeperhost1, ZooKeeperhost2, ZooKeeperhost3<value>
    </property>
    <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/opt/zookeeper<value>
    </property>
    
  • RegionServers: In HBase, horizontal scalability is defined with a term called region. Regions are nothing but a sorted range of rows stored continuously. In HBase architecture, a set of regions is stored on the region server. By default, the region server runs on port 60030. In an HBase cluster based on HDFS, Hadoop DataNodes and RegionServers are typically called slave nodes as they are both responsible for server data and are usually collocated in the cluster. A list of the region servers is specified in the conf/regionservers file with each region server on a separate line, and the start/stop of these region servers is controlled by the script files responsible for an HBase cluster's start/stop.
  • HBase data storage system: HBase is developed using pluggable architecture; hence, for the data storage layer, HBase is not tied with HDFS. Rather, it can also be plugged in with other file storage systems such as the local filesystem (primarily used in standalone mode), S3 (Amazon's Simple Storage Service), CloudStore (also known as Kosmos filesystem) or a self-developed filesystem.

Apart from the mentioned components, there are other considerations as well, such as hardware and software considerations, that are not within the scope of this book.

Note

The backup HBase Master server and the additional region servers can be started in the pseudo-distributed mode using the utility provided bin directory as follows:

[root@localhost hbase-0.96.2-hadoop2]# bin/local-master-backup.sh 2 3 

The preceding command will start the two additional HBase Master backup servers on the same box. Each HMaster server uses three ports (16010, 16020, and 16030 by default) and the new backup servers will be using ports 16012/16022/16032 and 16013/16023/16033.

[root@localhost hbase-0.96.2-hadoop2]# bin/local-regionservers.sh start 2 3

The preceding command will start the two additional HBase region servers on the same box using ports 16202/16302.

Start playing

Now that we have everything installed and running, let's start playing with it and try out a few commands to get a feel of HBase. HBase comes with a command-line interface that works for both local and distributed modes. The HBase shell is developed in JRuby and can run in both interactive (recommended for simple commands) and batch modes (recommended for running shell script programs). Let's start the HBase shell in the interactive mode as follows:

[root@localhost hbase-0.98.7-hadoop2]# bin/hbase shell

The preceding command gives the following output:

Start playing

Type help and click on return to see a listing of the available shell commands and their options. Remember that all the commands are case-sensitive.

The following is a list of some simple commands to get your hands dirty with HBase:

  • status: This verifies whether HBase is up and running, as shown in the following screenshot:
    Start playing
  • create '<table_name>', '<column_family_name>': This creates a table with one column family. We can use multiple column family names as well, as shown in the following screenshot:
    Start playing
  • list: This provides the list of tables, as shown in the following screenshot:
    Start playing
  • put '<table_name>', '<row_num>', 'column_family:key', 'value': This command is used to put data in the table in a column family manner, as shown in the following screenshot. HBase is a schema-less database and provides the flexibility to store any type of data without defining it:
    Start playing
  • get '<table_name>', '<row_num>': This command is used to read a particular row from the table, as shown in the following screenshot:
    Start playing
  • scan '<table_name >': This scans the complete table and outputs the results, as shown in the following screenshot:
    Start playing
  • delete '<table_name>', '<row_num>', 'column_family:key': This deletes the specified value, as shown in the following screenshot:
    Start playing
  • describe '<table_name>': This describes the metadata information about the table, as shown in the following screenshot:
    Start playing
  • drop '<table_name>': This command will drop the table. However, before executing this command, we should first execute disable '<tablename>', as shown in the following screenshot:
    Start playing
  • Finally, exit the shell and stop using HBase, as shown in the following screenshot:
    Start playing

Note

Refer to the following link for more commands: http://wiki.apache.org/hadoop/Hbase/Shell

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime