Before we start HBase in fully distributed mode, we will first set up Hadoop-2.2.0 in distributed mode. HBase is then set up on top of the Hadoop cluster, because HBase stores its data in HDFS.
The first step is to create a directory at /u/HBase B and download the tar file from the location given here. The directory can be on local storage, on a mount point, or, in cloud environments, on block storage:
wget -b http://apache.mirrors.pair.com/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
Tip
The -b option downloads the tar file as a background process. The output is written to wget-log. You can follow this log file using tail -200f wget-log.
Untar it using the following commands:
This extracts the archive into a folder named hadoop-2.2.0 in your current directory.
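A minimal sketch of the extraction step, assuming the archive from the wget step has finished downloading into the current directory. The guard around tar is an addition for illustration:

```shell
# Assumes hadoop-2.2.0.tar.gz was downloaded into the current directory.
ARCHIVE="hadoop-2.2.0.tar.gz"

if [ -f "$ARCHIVE" ]; then
  # x = extract, z = decompress gzip, f = read from the named archive file
  tar -xzf "$ARCHIVE"
  ls -d hadoop-2.2.0
else
  echo "Archive $ARCHIVE not found; wait for the background download to finish." >&2
fi
```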
Once the untar process is done, it is recommended, for clarity, to use two different folders: one for the NameNode and the other for the DataNode.
Tip
I am assuming that app is a user and app is a group on the Linux platform, with read/write/execute access to these locations. If not, please create a user app and a group app, provided you have sudo su - or root/admin access. If you don't, please ask your administrator to create this user and group for you on all the nodes and directories you will be accessing.
To keep NameNodeData and DataNodeData separate, let's create two folders inside /u/HBase B by using the following command. NameNodeData will hold the data used by the name nodes, and DataNodeData will hold the data used by the data nodes:
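The folder creation can be sketched as follows. BASE_DIR is an assumption standing in for the base directory used in the text; it defaults to a scratch directory so that the sketch can be tried without touching a real mount point:

```shell
# BASE_DIR is an assumption: set it to your real base directory before use.
# It defaults to a scratch directory so the sketch is safe to run anywhere.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"

# -p creates missing parent folders and does not fail if the folder exists.
mkdir -p "$BASE_DIR/NameNodeData" "$BASE_DIR/DataNodeData"

# Show owner, group, and permissions of the two new folders.
ls -ld "$BASE_DIR/NameNodeData" "$BASE_DIR/DataNodeData"
```

On a real node, the folders should end up owned by the user and group that run the Hadoop daemons (app and app in the text).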
The points to consider when choosing a Hadoop cluster are:
- Hardware details required for it
- Software required to do the setup
- OS required to do the setup
- Configuration steps
The HDFS core architecture is based on a master/slave model, in which an HDFS cluster comprises a single NameNode. The NameNode acts as the master node and is responsible for orchestrating and handling the file system namespace and for controlling access to files by clients. It performs this task by recording every modification to the underlying file system as a log entry appended to a native file system file called edits. The SecondaryNameNode is designed to merge the fsimage and the edits log files regularly, which keeps the size of the edits log within an acceptable limit.
In a true cluster/distributed environment, the SecondaryNameNode runs on a different machine. It works as a checkpoint in HDFS.
We will require the following for the NameNode:
RAID stands for redundant array of inexpensive (or independent) disks. There are many RAID levels, but for a master or a NameNode, RAID 1 will be enough.
JBOD stands for just a bunch of disks. The design is to have multiple hard drives stacked together with no redundancy. The calling software needs to take care of failure and redundancy. In essence, it works as a single logical volume:
Before we start the cluster setup, a quick recap of the Hadoop setup is essential, with brief descriptions.
Let's create a directory where all the software components will be downloaded:
- For simplicity, let's take it as /u/HBase B.
- Create different users for different purposes.
- The format is user/group; this is required to differentiate roles for specific purposes:
  - Hdfs/hadoop is for handling the Hadoop-related setup
  - Yarn/hadoop is for the YARN-related setup
  - HBase/hadoop
  - Pig/hadoop
  - Hive/hadoop
  - Zookeeper/hadoop
  - Hcat/hadoop
- Set up directories for the Hadoop cluster. Let's assume /u is a shared mount point. We can create specific directories that will be used for specific purposes.
Tip
Please make sure that you have adequate privileges on the folder to add, edit, and execute commands. Also, you must set up passwordless communication between the different machines, for example from the name node to the data nodes and from the HBase master to all the region server nodes.
Once the earlier-mentioned structure is created, we can download the tar files from the following locations:
Let's assume that there is a /u directory and that you have downloaded the entire software stack into it. Go to /u/HBase B/hadoop-2.2.0/etc/hadoop/ and look for the file core-site.xml.
Place the following lines in this configuration file:
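As an illustration of what such an entry can look like, the sketch below writes a core-site.xml that sets fs.defaultFS, the Hadoop 2 property naming the NameNode URI. The host name, port 8020, and the output directory are assumptions, not values taken from this recipe:

```shell
# All values below are assumptions for illustration.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"     # on a real node: <hadoop>/etc/hadoop
NAMENODE_HOST="${NAMENODE_HOST:-namenode.full.hostname}"
NAMENODE_PORT="${NAMENODE_PORT:-8020}"

# Write the file; the shell expands the two variables inside the here-document.
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- URI that clients and DataNodes use to reach the NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${NAMENODE_HOST}:${NAMENODE_PORT}</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/core-site.xml"
```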
Tip
You can specify a port that you want to use, and it should not clash with the ports that are already in use by the system for various purposes.
Save the file. This helps us create a master/NameNode.
Now, let's move on to setting up the SecondaryNameNode. Go to /u/HBase B/hadoop-2.2.0/etc/hadoop/ and look for the file core-site.xml:
Note
The directory structure is separated to keep the HDFS block data cleanly apart and to keep the configurations as simple as possible. This also allows us to do proper maintenance.
Now, let's move on to changing the setup for HDFS; the file location is /u/HBase B/hadoop-2.2.0/etc/hadoop/hdfs-site.xml.
Add these properties in hdfs-site.xml:
For NameNode:
For DataNode:
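A hedged sketch of the two storage properties: dfs.namenode.name.dir and dfs.datanode.data.dir are the Hadoop 2 names for the NameNode and DataNode storage locations. The directories used here are assumptions that mirror the NameNodeData and DataNodeData folders created earlier:

```shell
# Assumed locations for illustration; point BASE_DIR at your real base directory.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"
CONF_DIR="${CONF_DIR:-$BASE_DIR}"        # on a real node: <hadoop>/etc/hadoop

cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- Where the NameNode keeps the fsimage and edits files -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${BASE_DIR}/NameNodeData</value>
  </property>
  <!-- Where each DataNode keeps its blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${BASE_DIR}/DataNodeData</value>
  </property>
</configuration>
EOF

cat "$CONF_DIR/hdfs-site.xml"
```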
Now, let's set the HTTP address of the NameNode, so that it can be accessed using the HTTP protocol:
We can go for the HTTPS setup for the NameNode too, but let's keep it optional for now:
Let's set up the YARN resource manager:
- Let's look at the YARN setup:
- For the resource tracker, a part of the YARN resource manager:
- For the resource scheduler part of the YARN resource manager:
- For the scheduler address:
- For the scheduler admin address:
- To set up a local dir:
- To set up a log location:
This completes the configuration changes required for YARN.
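The addresses and directories listed above map onto properties in yarn-site.xml. The sketch below generates such a file; the property names and ports are the Hadoop 2 defaults, while the host name, directories, and output location are assumptions for illustration:

```shell
# Assumptions for illustration: config directory, ResourceManager host name,
# and the local/log directories. The ports are the Hadoop 2 defaults.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"     # on a real node: <hadoop>/etc/hadoop
RM_HOST="${RM_HOST:-yourresourcemanager.full.hostname}"
YARN_DIR="${YARN_DIR:-/tmp/yarn}"

# Print one <property> element for a name/value pair.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  prop yarn.resourcemanager.address                  "$RM_HOST:8032"
  prop yarn.resourcemanager.resource-tracker.address "$RM_HOST:8031"
  prop yarn.resourcemanager.scheduler.address        "$RM_HOST:8030"
  prop yarn.resourcemanager.admin.address            "$RM_HOST:8033"
  prop yarn.nodemanager.local-dirs                   "$YARN_DIR/local"
  prop yarn.nodemanager.log-dirs                     "$YARN_DIR/logs"
  echo '</configuration>'
} > "$CONF_DIR/yarn-site.xml"

cat "$CONF_DIR/yarn-site.xml"
```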
Now, let's make the changes for MapReduce:
- Let's open mapred-site.xml:
- Now, let's place this property configuration in mapred-site.xml, between the following:
- Once we have configured the MapReduce job history details, we can move on to configure HBase.
- Let's go to the path /u/HBase B/hbase-0.98.3-hadoop2/conf and open hbase-site.xml. You will see a template having the following:
- We need to add the following lines between the starting and ending tags:
- This completes the HBase changes.
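For a fully distributed HBase, the entries between the configuration tags typically cover the HDFS root directory, the distributed flag, and the ZooKeeper quorum. The sketch below writes such an hbase-site.xml; the property names are standard HBase settings, while the host names, port, and output directory are assumptions:

```shell
# Assumptions for illustration: config directory, NameNode URI, ZooKeeper hosts.
HBASE_CONF_DIR="${HBASE_CONF_DIR:-$(mktemp -d)}"   # on a real node: <hbase>/conf
NAMENODE_URI="${NAMENODE_URI:-hdfs://namenode.full.hostname:8020}"
ZK_QUORUM="${ZK_QUORUM:-zoo1,zoo2}"

# Print one <property> element for a name/value pair.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  prop hbase.rootdir             "$NAMENODE_URI/hbase"   # HBase data lives in HDFS
  prop hbase.cluster.distributed "true"                  # fully distributed mode
  prop hbase.zookeeper.quorum    "$ZK_QUORUM"            # comma-separated ZooKeeper hosts
  echo '</configuration>'
} > "$HBASE_CONF_DIR/hbase-site.xml"

cat "$HBASE_CONF_DIR/hbase-site.xml"
```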
ZooKeeper: Now, let's focus on the setup of ZooKeeper. In a distributed environment, let's go to this location and rename zoo_sample.cfg to zoo.cfg:
Open zoo.cfg with vi zoo.cfg and place the details as follows; this will create two instances of ZooKeeper on different ports:
If you want to test this setup locally, please use different port combinations. In a production-like setup, as mentioned earlier, your ZooKeeper entry server.1=zoo1:2888:3888 follows the form server.id=host:port:port:
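A minimal zoo.cfg along these lines can be sketched as follows. The timing values and client port match the defaults shipped in zoo_sample.cfg; the second host name (zoo2) and the directories are assumptions. In each server.id line, the first port is used by followers to connect to the leader and the second is used for leader election:

```shell
# Assumptions for illustration: config and data directories, and the host zoo2.
ZK_CONF_DIR="${ZK_CONF_DIR:-$(mktemp -d)}"   # on a real node: <zookeeper>/conf
ZK_DATA_DIR="${ZK_DATA_DIR:-$(mktemp -d)}"

cat > "$ZK_CONF_DIR/zoo.cfg" <<EOF
# Basic time unit in milliseconds, used for heartbeats and timeouts
tickTime=2000
# Ticks a follower may take to connect to and sync with the leader
initLimit=10
# Ticks a follower may lag behind the leader before it is dropped
syncLimit=5
# Where snapshots and the myid file live
dataDir=${ZK_DATA_DIR}
# Port that clients (including HBase) connect to
clientPort=2181
# server.id=host:port:port
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
EOF

# Each server also needs a myid file in dataDir holding its own id.
echo 1 > "$ZK_DATA_DIR/myid"

cat "$ZK_CONF_DIR/zoo.cfg"
```

The myid value must match the id in the server.id line for that host, so the second server would write 2 instead of 1.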
Atomic broadcast is an atomic messaging system that keeps all the servers in sync and provides reliable delivery, total order, causal order, and so on.
Region servers: Before concluding, let's go through the region server setup process.
Go to the folder /u/HBase B/hbase-0.98.3-hadoop2/conf and edit the regionservers file.
Specify the region servers accordingly:
Note
RegionServer1 is the IP address or fully qualified CNAME of the first region server.
You can have as many region servers as you need (1 to N; N=4 in our case), but each one needs its own distinct CNAME and entry in the regionservers file.
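The regionservers file is a plain list with one host per line. The sketch below writes four entries and checks that no host is listed twice; the host names and the output directory are hypothetical:

```shell
# Hypothetical host names and directory, for illustration only.
HBASE_CONF_DIR="${HBASE_CONF_DIR:-$(mktemp -d)}"   # on a real node: <hbase>/conf

cat > "$HBASE_CONF_DIR/regionservers" <<EOF
regionserver1.example.com
regionserver2.example.com
regionserver3.example.com
regionserver4.example.com
EOF

# uniq -d prints only repeated lines, so empty output means every entry is distinct.
DUPLICATES="$(sort "$HBASE_CONF_DIR/regionservers" | uniq -d)"
if [ -n "$DUPLICATES" ]; then
  echo "Duplicate region server entries: $DUPLICATES" >&2
fi
```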
Copy all the configuration files of HBase and ZooKeeper to the respective hosts dedicated to HBase and ZooKeeper. As the setup is in fully distributed cluster mode, we will be using a different host for HBase and its components and a dedicated host for ZooKeeper.
Next, we validate the setup we've worked on by adding the following to the bashrc; this makes sure that we can later configure the NameNode as expected:
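The environment entries can be sketched as follows. The HADOOP_HOME default is a placeholder and must be replaced with the folder where hadoop-2.2.0 was extracted; the other two values are derived from it:

```shell
# Placeholder path: replace with the folder where hadoop-2.2.0 was extracted.
export HADOOP_HOME="${HADOOP_HOME:-/path/to/hadoop-2.2.0}"

# Directory holding core-site.xml, hdfs-site.xml, yarn-site.xml, and so on.
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"

# bin holds the hadoop/hdfs/yarn commands; sbin holds the daemon scripts.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

# Quick check that the shell can now find the hadoop command.
if command -v hadoop >/dev/null 2>&1; then
  hadoop version
else
  echo "hadoop is not on PATH yet; check HADOOP_HOME" >&2
fi
```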
Tip
It is preferable to set this in your profile, essentially /etc/profile; this makes sure that only the shell in use is affected.
Now let's format the NameNode:
Before formatting, we need to take care of the following.
Check whether a Hadoop cluster is already running and using the same HDFS; if a format is run against it accidentally, all the data will be lost.
Now let's go to the SecondaryNameNode:
Repeat the same procedure on the DataNode:
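The format and start sequence can be sketched with the scripts that ship in the Hadoop 2.2.0 bin and sbin folders. The run helper and the DRY_RUN switch are additions for illustration: by default the commands are only printed, so nothing is formatted by accident:

```shell
# DRY_RUN=1 (the default here) only prints each command; use DRY_RUN=0 on a real node.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Safety check from the text: do not format while a NameNode is running on this host.
if command -v jps >/dev/null 2>&1 && jps | grep -qw NameNode; then
  echo "A NameNode is already running on this host; not formatting." >&2
else
  run hdfs namenode -format                 # on the NameNode host, once only
  run hadoop-daemon.sh start namenode       # on the NameNode host
fi

run hadoop-daemon.sh start secondarynamenode   # on the SecondaryNameNode host
run hadoop-daemon.sh start datanode            # on each DataNode host
```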
See if you can reach http://namenode.full.hostname:50070 from your browser:
Now, the hello.txt file will be created in the tmp location:
Tip
apphduser is a directory created in HDFS for a specific user, so that the data is separated by user; in a true production environment, many users will be using it.
Tip
You can also use the hdfs dfs -ls / command if the hadoop command is reported as deprecated.
You should see hello.txt once the command executes:
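A sketch of this smoke test using the hdfs dfs form of the commands. The local scratch folder, the run helper, and the DRY_RUN switch are assumptions for illustration; with DRY_RUN=1 the HDFS commands are only printed:

```shell
# DRY_RUN=1 (the default here) only prints each command; use DRY_RUN=0 on a real node.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Create a small local file to upload (the scratch folder is an assumption).
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
echo "hello" > "$WORK_DIR/hello.txt"

run hdfs dfs -mkdir -p /tmp                      # make sure the target folder exists in HDFS
run hdfs dfs -put "$WORK_DIR/hello.txt" /tmp/    # copy the local file into HDFS
run hdfs dfs -ls /tmp                            # hello.txt should appear in this listing
```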
Tip
It is important to change the data host name and other parameters accordingly.
You should see the details of the DataNode. Once you hit the preceding URL, you will get the following screenshot:
On the command line, it will be as follows:
Validate the YARN/MapReduce setup by executing this command from the resource manager:
Execute the following command from the NodeManager:
Executing the following commands will create the directories in HDFS and apply the respective access rights:
Start the jobhistory servers:
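The daemon start commands for these steps can be sketched with the scripts from the Hadoop 2.2.0 sbin folder. The HDFS directory step is left out because the directory names are not given here. The run helper and DRY_RUN switch are additions for illustration:

```shell
# DRY_RUN=1 (the default here) only prints each command; use DRY_RUN=0 on a real node.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run yarn-daemon.sh start resourcemanager          # on the resource manager host
run yarn-daemon.sh start nodemanager              # on each NodeManager host
run mr-jobhistory-daemon.sh start historyserver   # on the job history host
```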
Let's run a few tests to be sure we have configured everything properly:
Test 01: From the browser or from curl, use this link to browse: http://yourresourcemanager.full.hostname:8088/.
Test 02:
Validate the HBase setup:
Now log in as $HBASE_USER:
This command will start the master node. Now let's move to the HBase region server nodes:
This command will start the region servers:
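The two start commands can be sketched with hbase-daemon.sh from the HBase bin folder, assuming that folder is on the PATH. The run helper and DRY_RUN switch are additions for illustration:

```shell
# DRY_RUN=1 (the default here) only prints each command; use DRY_RUN=0 on a real node.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run hbase-daemon.sh start master         # on the HBase master host
run hbase-daemon.sh start regionserver   # on each region server host
```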
Note
For a single machine, a direct sudo ./hbase master start can also be used.
In case of any errors, please check the logs at this location: /opt/HBase B/hbase-0.98.5-hadoop2/logs.
You can check the log files and check for any errors:
Now let's log in using:
We will connect HBase to the master.
Validate the ZooKeeper setup. If you want to use an external or existing ZooKeeper that is not managed by HBase, make sure that no internal HBase-based ZooKeeper is running while you work with it:
For this, you have to edit /opt/HBase B/hbase-0.98.5-hadoop2/conf/hbase-env.sh.
Change the following statement (HBASE_MANAGES_ZK=false):
# Tell HBase whether it should manage its own instance of Zookeeper or not.
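The corresponding line in hbase-env.sh looks like this:

```shell
# false: HBase does not start or stop ZooKeeper itself and expects an
# externally managed ensemble to be running already.
export HBASE_MANAGES_ZK=false
```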
Once this is done, we can add zoo.cfg to HBase's CLASSPATH.
HBase looks into zoo.cfg as a default lookup for configurations:
# this is the place where the zooData will be present
# IP and port for server 01
# IP and port for server 02
You can edit the log4j.properties file, which is located at /opt/HBase B/zookeeper-3.4.6/conf, and point it to the location where you want to keep the logs.
# Define some default values that can be overridden by system properties:
Once this is done, you can start ZooKeeper with the following command:
You can also pipe the log to the ZooKeeper logs:
2: refers to file descriptor 2 of the process, that is, stderr.
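The redirection can be tried safely with a stand-in command before using it on the server. In this sketch, the emit function and the log file location are assumptions for illustration; the commented last line shows the same pattern with zkServer.sh:

```shell
# Scratch log file (an assumption for illustration).
LOG_FILE="${LOG_FILE:-$(mktemp)}"

# Stand-in for the server process: one line to stdout (fd 1), one to stderr (fd 2).
emit() {
  echo "message on stdout"
  echo "message on stderr" >&2
}

# "> file" redirects fd 1; "2>&1" then points fd 2 at the same destination.
# Order matters: "2>&1 > file" would leave stderr on the terminal.
emit > "$LOG_FILE" 2>&1

cat "$LOG_FILE"

# The same pattern on a real ZooKeeper host (the output file name is an assumption):
#   zkServer.sh start > zookeeper.out 2>&1
```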
Apache YARN is a robust, distributed application management framework that supersedes the traditional Apache Hadoop MapReduce framework for processing data in large Hadoop clusters.
This change was needed because of how the original framework scaled. During the map phase of the MapReduce process, the data is chunked into small discrete packets that can be processed; this is followed by a second phase, reduce, in which the split data is aggregated to produce the desired results. This works well with small, mid-sized, and to some extent large clusters, but for very large clusters (more than 4,000 nodes), unpredictable behavior starts to surface. The core issue was the replication of data during cascading failures.
YARN thus helps with reliability, scalability, and sharing. It essentially splits the responsibilities of the JobTracker, namely resource management and job scheduling/monitoring, into more granular, distributed components: the ResourceManager and the ApplicationMaster.
The ResourceManager works in synchronicity with the per-node NodeManager and the per-application ApplicationMaster.
The NodeManager takes remote invocations from the ResourceManager and manages the resources available on a single node.
The ApplicationMaster is responsible for negotiating resources with the ResourceManager and works with the NodeManager to start the containers.
HBase provides low-latency random reads and writes on top of HDFS. As a large-scale key-value store, its main differentiating factor is that it can scan petabytes of data at very high speed. It also comes with a built-in capability of autosharding, splitting tables dynamically when they become too large.
This enables HBase to scale horizontally. The unit of this scaling is the region: a portion of table data that is stored together for efficiency. The slave servers in HBase are the region servers. A region server does a fair bit of work and provides true distribution across different regions. It can serve one or more regions based on need; each region is assigned to a region server on start-up.
HBase 0.96 removed the concept of ROOT containing the META table location and instead moved that location to ZooKeeper, as the META table cannot split and can reside in only a single region:
- HMaster: This performs administrative operations and coordinates the cluster.
- HTable: It allows clients to perform get, put, delete, and other data manipulation operations. It interacts directly with the region server. Essentially, it finds the region server that is responsible for serving a particular row range.
- HFile: This is the physical representation of data in HBase; data is always read through the region servers. It is generated by flushes or compactions. There are two versions of HFile: V2 and V3.
- HFile V2: The main issue with HFile V1 was that it had to load all the monolithic indexes and large bloom filters into memory. V2 was introduced to be more efficient than V1 when sorting large amounts of data, by using multilevel indexes and a block-level bloom filter. It also improves caching and memory utilization. The index is also moved to the block level, which essentially means that each block has its own leaf index, allowing a multilevel index. The multilevel index is like a B+ tree and uses the last key of each block to build the intermediate levels. The detailed explanation is beyond the scope of this book.
- MemStore: It collects data edits as they are received and buffers them in memory. This lets the system push the data to disk in one go, and it also keeps the data in memory for subsequent access, avoiding expensive disk seeks. It also helps in keeping the data block size aligned with the specified HDFS block size. It also sorts the data before flushing it to an HFile.
- Block cache: For efficient I/O usage, HBase is programmed to read an entire block in one go and keep it in memory (JVM memory) per region server. It is initialized during region server startup and stays the same for the lifetime of the server.
- LruBlockCache: The data blocks are cached in memory (on the JVM heap). The cache is divided into three areas of different sizes: 25% (single access), 50% (multi access), and 25% (in-memory) of the total block cache size, respectively.
- SlabCache: It is a form of off-heap memory, outside the JVM heap, that uses DirectByteBuffer. SlabCache minimizes fragmentation, but the other parts of HBase that depend on the JVM heap can still cause fragmentation. The main advantage is that it reduces the frequency of stop-the-world GC pauses, which can cause region servers to miss heartbeats and be marked as dead; this can be catastrophic in an actual production system. Data is read from the SlabCache using a "copy on read" approach: the data is read from the JVM heap if it is present there; if it is not, it is copied onto the heap from the slab: http://en.wikipedia.org/wiki/XOR_swap_algorithm.
  SlabCache works as an L2 cache and replaces the FS cache. The on-heap JVM cache works as the L1 cache.
  This approach allows us to use large amounts of memory without losing system performance, and it reduces the chances of missed heartbeats caused by the stop-the-world GC process.
  This is mainly achieved through the direct ByteBuffer available in the java.nio package, which allows us to allocate memory outside the normal Java heap, very similar to malloc() in C programming. Memory allocated through a direct ByteBuffer is not managed by the garbage collection process in the way heap objects are.
- Bucket cache: It is an implementation of the block cache similar to LruBlockCache. It can also be used as a secondary cache to expand the cache space. The blocks of data can be stored in memory or on the file system. It significantly reduces the CMS and heap fragmentation caused by the Java garbage collection (GC) process.
- Multilevel caching: It is a design strategy for the effective management of a large cache. The first-level cache is the L1 cache, which is LruBlockCache. The second level is L2. Both cache levels operate independently of each other, and both are checked when evicting and retrieving a block of data.
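To make the LruBlockCache split concrete, the 25/50/25 division can be worked through with simple arithmetic. The 4096 MB block cache size is an assumed figure for illustration, not a recommendation:

```shell
# Assumed figure for illustration: 4096 MB of block cache on one region server.
BLOCK_CACHE_MB="${BLOCK_CACHE_MB:-4096}"

SINGLE_MB=$(( BLOCK_CACHE_MB * 25 / 100 ))      # blocks read once
MULTI_MB=$(( BLOCK_CACHE_MB * 50 / 100 ))       # blocks read more than once
IN_MEMORY_MB=$(( BLOCK_CACHE_MB * 25 / 100 ))   # blocks of in-memory column families

echo "single access: ${SINGLE_MB} MB"
echo "multi access:  ${MULTI_MB} MB"
echo "in-memory:     ${IN_MEMORY_MB} MB"
```

With the assumed 4096 MB, this gives 1024 MB, 2048 MB, and 1024 MB respectively.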