We will discuss this section in a question-and-answer manner, and we will come to understand HBase with the help of scenarios and conditions.
The following figure shows the flow of HBase: birth, growth, and current status:
It all started in 2006 when a company called Powerset (later acquired by Microsoft) was working to build a natural language search engine. The person responsible for the development was Jim Kellerman, and there were many other contributors as well. It was modeled on the Google BigTable white paper that came out in 2006, which described a system running on the Google File System (GFS).
It started with a TAR file containing a rough collection of Java files holding the initial HBase code. It was first added to the contrib directory of Hadoop as a small subcomponent and, with dedicated effort to fill the gaps, it slowly and steadily grew into a full-fledged project. It was first added with Hadoop 0.1.0 and, as it became more feature rich and stable, it was promoted to a Hadoop subproject; then, with more and more development and contributions from the HBase user and developer groups, it became one of the top-level projects at Apache.
The following figure shows HBase versions from the beginning till now:
Let's have a look at the year-by-year evolution of HBase's important features:
- 2006: The idea of HBase started with the white paper of Google BigTable
- 2007: Data compression on a per-column-family basis was made available, online addition and deletion of column families was added, a script to start/stop an HBase cluster was added, a MapReduce connector was added, HBase shell support was added, support for row and column filters was added, an algorithm to distribute regions evenly was added, the first REST interface was added, and the first HBase meetup was hosted
- 2008: HBase 0.1.0 to 0.18.1; HBase moved to a new SVN repository, HBase was added as a contribution to Hadoop, HBase became a Hadoop subproject, the first separate release became available, and a Ruby shell was added
- 2009: HBase 0.19.0 to 0.20.*; improvements in writing and scanning, batching of writes and scans, block compression, an HTable interface to REST, addition of a binary comparator, addition of regular expression filters, and many more
- 2010 till date: HBase 0.89.* to 0.94.*; support for HDFS durability, improvements in the import flow, support for MurmurHash3, addition of daemon threads for NameNode and DataNode to indicate VM- or kernel-caused pauses in the application log, tag support for key-values, running MapReduce over snapshot files, introduction of transparent encryption of HBase on-disk data, addition of per key-value security, offline rebuilding of .META. from file system data, snapshot support, and many more
- 0.96 to 1.0 and future: HBase version 1 and higher; a utility for adorning HTTP context, full high availability with Hadoop HA, rolling upgrades, improved failure detection and recovery, cell-level access security, inline cell tagging, quotas and grouping, reverse scan, and it will be more useful for analytics purposes and helpful for data scientists
Let's now discuss HBase and Hadoop compatibility and the features they provide together.
Prior to Hadoop v1.0, when a DataNode crashed, the HBase Write-Ahead Log (the logfiles that record read/write operations before the data is finally persisted to disk from the MemStore) would be lost, and hence the data too. This version of Hadoop integrated the append branch into the main core, which increased durability for HBase. Hadoop v1.0 also implemented handling of disk failures, making RegionServer more robust.
Hadoop v2.0 integrated high availability for the NameNode, which also enables HBase to be more reliable and robust when running multiple HMaster instances. With this version of HBase, upgrading has also become easier because HBase upgrades are independent of HDFS upgrades. Let's see in the following table how recent versions of Hadoop have enhanced HBase in terms of performance, availability, and features:
Note
Miscellaneous features in newer HBase releases include HBase isolation and allocation, online automated repair of table integrity and region consistency problems, dynamic configuration changes, reverse scanning (scanning rows in descending key order, from stop row to start row), and many others; users can visit https://issues.apache.org/jira/browse/HBASE for the features and advancements of each HBase release.
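As a quick taste of one of these newer features, the following is a minimal reverse-scan sketch with the Java client API. It assumes a release that supports Scan.setReversed and a hypothetical table named mytable with row keys in the range row-10 to row-99; nothing here is specific to any particular cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ReverseScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // hypothetical table

        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("row-99")); // in a reversed scan, start from the larger key
        scan.setStopRow(Bytes.toBytes("row-10"));  // and walk down toward the smaller key
        scan.setReversed(true);                    // rows are returned in descending key order

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```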
ZooKeeper is a high-performance, centralized, multicoordination service system for distributed applications, which provides distributed synchronization and group services to HBase.
It enables users and developers to focus on the application logic rather than on coordination within the cluster, for which it provides APIs that developers can use to implement coordination tasks such as electing a master server and managing application and cluster communication.
In HBase, ZooKeeper is used to elect the cluster master, to keep track of available and online servers, and to keep the metadata of the cluster. ZooKeeper APIs provide:
- Consistency, ordering, and durability
- Synchronization
- Concurrency for a distributed clustered system
The following figure shows ZooKeeper:
It was developed at Yahoo! Research. The reason behind the name ZooKeeper is that projects in the Hadoop ecosystem are named after animals, and during the discussion on naming this technology, this name emerged because it manages the availability of, and coordination between, the different components of a distributed system.
ZooKeeper not only simplifies development but also sits on top of the distributed system as an abstraction layer to facilitate better reachability of the system's components. The following figure shows the request and response flow:
Let's consider a scenario wherein a few people want to fill 10 rooms with some items. In one instance, they have to find their own way to a room in which to keep the items; some of the rooms will be locked, forcing them to move on to other rooms. In the other instance, we appoint some representatives who hold information about the rooms, their condition, and their state (open, closed, fit for storing, not fit, and so on), and we send the people with their items to these representatives. A representative guides each person to the right room that is available for storing items, so the person can move directly to the specified room and store the item. This not only eases the communication and storage process but also reduces the overhead of the process. The same technique is applied in the case of ZooKeeper.
ZooKeeper internally maintains a tree of data nodes called znodes. A znode can be of two types:
- Ephemeral: This znode lives only as long as the session that created it, which makes it good for applications that need to know whether a specific distributed resource is available or not.
- Persistent: This znode is stored until a client deletes it explicitly, and it can also hold some application data.
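As a small illustration of the two znode types, the following is a hedged sketch using the plain ZooKeeper Java client; the connection string, paths, and data are made up for this example.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeExample {
    public static void main(String[] args) throws Exception {
        // Connect to a local ZooKeeper ensemble (address and session timeout are illustrative).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                System.out.println("ZooKeeper event: " + event.getType());
            }
        });

        // A persistent znode survives until a client deletes it explicitly,
        // so it can carry small pieces of application data.
        zk.create("/demo-config", "some-config".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // An ephemeral znode disappears when the creating session ends, which makes
        // it a convenient liveness marker for a distributed resource.
        zk.create("/demo-live-server", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        zk.close(); // closing the session removes the ephemeral znode automatically
    }
}
```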
Why an odd number of ZooKeepers?
ZooKeeper is based on a majority principle; it requires a quorum of servers to be up, where the quorum is a strict majority, floor(n/2) + 1 (which equals ceil(n/2) when n is odd). For a three-node ensemble, this means two nodes must be up and running at any point of time, and for a five-node ensemble, a minimum of three nodes must be up. The majority is also important for the election of the ZooKeeper leader. We will discuss more options for configuring and coding ZooKeeper in later chapters.
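The arithmetic behind the odd-number recommendation can be seen in the following tiny sketch; it is not HBase-specific, it simply prints the majority quorum and the number of failures tolerated for a few ensemble sizes, showing that four servers tolerate no more failures than three.

```java
public class QuorumMath {
    // Majority quorum for an ensemble of n servers: floor(n/2) + 1,
    // which equals ceil(n/2) when n is odd.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    public static void main(String[] args) {
        for (int n : new int[]{3, 4, 5}) {
            System.out.println(n + " servers -> quorum of " + quorum(n)
                    + ", tolerates " + (n - quorum(n)) + " failure(s)");
        }
    }
}
```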
HMaster is the component of an HBase cluster that can be thought of as the NameNode of a Hadoop cluster; likewise, it acts as the master for the RegionServers running on different machines. It is responsible for monitoring all RegionServers in an HBase cluster and also provides an interface to all HBase metadata for client operations. It also handles RegionServer failover and region splits.
There may be more than one instance of HMaster in an HBase cluster, which provides High Availability (HA). If we have more than one master, only one master is active at a time; at start-up time, all the masters compete to become the active master of the cluster, and whichever wins becomes the active master. Meanwhile, all the other master instances remain passive until the active master crashes, shuts down, or loses its lease from ZooKeeper.
In short, HMaster is the coordination component of an HBase cluster; it also manages and enables us to perform administrative tasks on the cluster.
Let's now discuss the flow of starting up the HMaster process:
- Block (do not serve requests) until it becomes the active HMaster.
- Finish initialization.
- Enter the serving loop until stopped.
- Perform cleanup when it is stopped.
HMaster exports some of the following interfaces that are metadata-based methods to enable us to interact with HBase:
In HBase, there is a table called .META. (the table name as it appears on the file system), which keeps all the information about regions and is referred to by HMaster for information about the data. By default, HMaster runs on port 60000 and its HTTP web UI is available on port 60010; both can be changed according to our needs.
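Metadata operations of this kind are typically driven through the client admin API, which talks to HMaster. The following is a minimal sketch using the classic HBaseAdmin class; the table name demo_table and the column family cf1 are made up for illustration, and the exact constructors and method signatures vary across HBase versions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AdminExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Create a table with one column family; both names are hypothetical.
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("demo_table"));
        desc.addFamily(new HColumnDescriptor("cf1"));
        admin.createTable(desc);

        // Other metadata operations, such as disabling and deleting a table,
        // also go through the master.
        admin.disableTable("demo_table");
        admin.deleteTable("demo_table");

        admin.close();
    }
}
```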
HMaster functionalities can be summarized as follows:
- Monitors RegionServers
- Handles RegionServers failover
- Handles metadata changes
- Assignment/unassignment of regions
- Interfaces all metadata changes
- Performs region load balancing during idle time
- Publishes its location to clients using ZooKeeper
- Provides a web UI with information about the HBase cluster (tables, regions, RegionServers, and so on)
If a master node goes down
If the master goes down, the cluster may continue working normally because clients talk directly to the RegionServers, so the cluster can still function steadily. The HBase catalog tables (.META. and -ROOT-) exist as HBase tables and are not stored in the master's resident memory. However, since the master performs critical functions such as RegionServer failover and region splits, these functions will be hampered and, if not taken care of, this will create a huge setback to the overall functioning of the cluster, so the master must be brought back up as soon as possible.
Hadoop is now HA enabled, and thus HBase can always be made highly available by using multiple HMasters for better availability and robustness, so we can now consider running multiple HMasters.
RegionServers are responsible for holding the actual raw HBase data. Recall that in a Hadoop cluster, the NameNode manages the metadata and the DataNodes hold the raw data; likewise, in an HBase cluster, the HBase master holds the metadata and the RegionServers store the data. As you might guess, a RegionServer runs (is hosted) on top of a DataNode and thus utilizes the underlying file system, that is, HDFS.
The following figure shows the architecture of RegionServer:
RegionServer performs the following tasks:
- Serving the regions (parts of tables) assigned to it
- Handling client read/write requests
- Flushing cache to HDFS
- Maintaining HLogs
- Performing compactions
- Responsible for handling region splits
Components of a RegionServer
The following are the components of a RegionServer:
- Write-Ahead logs: The entries in these logs are also called edits. When data is written to or modified in HBase, it is not written directly to disk; rather, it is kept in memory for some time (governed by thresholds of size and time that we can configure). Keeping this data only in memory could result in data loss if the machine goes down abruptly. To prevent this, the data is first written to an intermediate file, called the Write-Ahead logfile, and then to memory; so, in the case of a system failure, data can be reconstructed from this logfile. A small write sketch is shown a little further below.
- HFile: These are the actual files where the raw data is stored physically on the disk. This is the actual store file.
- Store: This is where the HFiles are kept; there is one store per column family of a table in HBase.
- MemStore: This component is the in-memory data store; it resides in main memory and records the current data operations. So, after data is written to the WAL, the RegionServer stores the key-value in the MemStore.
- Region: Regions are the splits of an HBase table; the table is divided into regions based on row-key ranges, and the regions are hosted by RegionServers. There may be many regions in a RegionServer.
We will discuss more about these components in the next chapter.
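To make the write path concrete, here is a minimal client-side sketch. It assumes a hypothetical table demo_table with a column family cf1 and the classic HTable/Put API; method names differ slightly in newer versions. Each put is appended to the RegionServer's Write-Ahead Log and then placed in the MemStore, and skipping the WAL trades durability for speed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "demo_table"); // hypothetical table with family 'cf1'

        // Normal write: the RegionServer appends this mutation to its WAL and then
        // stores it in the MemStore; the MemStore is flushed to HFiles later.
        Put put = new Put(Bytes.toBytes("row-1"));
        put.add(Bytes.toBytes("cf1"), Bytes.toBytes("col1"), Bytes.toBytes("value-1"));
        table.put(put);

        // Skipping the WAL is faster but risky: if the RegionServer dies before the
        // MemStore is flushed, this row is lost. (Newer APIs use setDurability instead.)
        Put riskyPut = new Put(Bytes.toBytes("row-2"));
        riskyPut.add(Bytes.toBytes("cf1"), Bytes.toBytes("col1"), Bytes.toBytes("value-2"));
        riskyPut.setWriteToWAL(false);
        table.put(riskyPut);

        table.close();
    }
}
```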
The client is responsible for finding the RegionServer that hosts the particular row (data). It does this by querying the catalog tables. Once the region is found, the client contacts the RegionServer directly and performs the data operation; the location information is cached by the client for faster retrieval later. The client can be written in Java or any other language using the external APIs.
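The following is a minimal read sketch under the same assumptions (the hypothetical demo_table table, the cf1 column family, and the classic HTable API); the catalog lookup and the location caching described above happen inside the client library, not in our code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // On the first request, the client library locates the hosting RegionServer
        // through the catalog tables and caches that location for later calls.
        HTable table = new HTable(conf, "demo_table");

        Get get = new Get(Bytes.toBytes("row-1"));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("col1"));
        System.out.println("Fetched: " + Bytes.toString(value));

        table.close();
    }
}
```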
There are two tables that maintain the information about all RegionServers and regions. This is a kind of metadata for the HBase cluster. The following are the two catalog tables that exist in HBase:
- -ROOT-: This includes information about the location of the .META. table
- .META.: This table holds all regions and their locations
At the beginning of the start-up process, the location of .META. is obtained from -ROOT-, from where the actual metadata of tables is read, and reads/writes continue. So, whenever a client wants to connect to HBase and read from or write to a table, these two tables are referred to and the information is returned to the client, which can then read and write directly to the RegionServers and the regions of the specific table.
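Since the catalog tables are themselves HBase tables, they can be scanned like any other table. The following hedged sketch lists the region entries; it assumes the classic client API and the pre-0.96 table name .META. (newer versions use hbase:meta instead).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable meta = new HTable(conf, ".META."); // named "hbase:meta" from 0.96 onwards
        ResultScanner scanner = meta.getScanner(new Scan());
        try {
            for (Result row : scanner) {
                // Each row key encodes the table name, the region start key, and a region id.
                System.out.println(Bytes.toString(row.getRow()));
            }
        } finally {
            scanner.close();
            meta.close();
        }
    }
}
```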
When should we think of using HBase?
Using HBase is not the solution to every problem; however, it can solve a lot of problems efficiently. The first thing to consider is the amount of data: if we have only a few million rows and a few reads and writes, then we can avoid using it. However, if we think of billions of rows or columns and thousands of read/write operations in a short interval, we can surely think of using HBase.
Let's consider an example: Facebook uses HBase for its real-time messaging infrastructure, and we can imagine how many messages, or rows of data, Facebook receives per second. Considering that amount of data and I/O, we can certainly think of using HBase. The following list details a few scenarios in which we can consider using HBase:
- If data needs to have a dynamic or variable schema
- If a large number of columns contain null values (that is, the data is sparse)
- When we have a huge number of dynamic rows
- If our data contains a variable number of columns
- If we need to maintain versions of data
- If high scalability is needed
- If we need in-built compression on records
- If a high volume of I/O is needed
There are many other cases where we can use HBase and where it can be beneficial; these are discussed in later chapters.