Introduction
As you know, HBase is a database that takes full advantage of its foundation on HDFS and MapReduce, which allows it to scale out on a distributed architecture. Its largely schemaless design gives us a unique flexibility: only column families are declared up front, so design decisions can follow the way the data needs to be stored at very large scale. HBase supports fast scans and growth into the petabyte range, while still giving the application fine-grained, cell-level access with atomic updates at the row level. Like any efficient database, HBase is designed to sustain high concurrency and high I/O throughput. Its internal design achieves this with an ordered write-ahead log and a log-structured merge-tree (LSM tree). It is important to understand these fundamental building blocks, because they let us make the right design choices for scalability.
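To make the log plus log-structured merge-tree idea concrete, here is a toy Python model. It is a sketch under simplifying assumptions, not HBase's implementation: the class and method names are hypothetical, everything lives in memory, "files" are plain sorted lists, and the log is simply cleared after a flush (HBase instead rolls and archives its log files). The memstore and store files loosely correspond to HBase's MemStore and HFiles.

```python
import bisect


class ToyLSMStore:
    """Hypothetical, in-memory model of an LSM-style write path (not HBase code)."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only log, written before the memstore
        self.memstore = {}     # mutable in-memory buffer of recent writes
        self.store_files = []  # immutable sorted (key, value) lists, newest last
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # 1. Durability first: append the edit to the log.
        self.wal.append((key, value))
        # 2. Then apply it to the in-memory buffer.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as one sorted, immutable "file".
        if not self.memstore:
            return
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        # Simplification: the flushed edits are persisted, so drop the log entries.
        self.wal.clear()

    def crash_and_recover(self):
        # Simulate losing in-memory state, then replay the log to rebuild it.
        self.memstore = {}
        for key, value in self.wal:
            self.memstore[key] = value

    def get(self, key):
        # Newest data wins: check the memstore, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            keys = [k for k, _ in store_file]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return store_file[i][1]
        return None

    def compact(self):
        # Merge all files into one sorted file, keeping the newest value per key.
        merged = {}
        for store_file in self.store_files:  # oldest to newest, later writes win
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []


store = ToyLSMStore(flush_threshold=2)
store.put("row1", "a")
store.put("row2", "b")   # reaches the threshold and flushes to a store file
store.put("row1", "c")   # newer value, held only in the memstore and the log
store.crash_and_recover()
print(store.get("row1"), store.get("row2"))  # c b
```

Writes stay cheap because they only append to the log and update memory, and flushes write whole sorted files sequentially. The cost moves to reads, which may have to consult several files until a compaction merges them back into one.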
We will discuss these core concepts in the following recipes:
Seek versus transfer
Storage
Write Path
Read Path
Replication
WAL (Write-Ahead Log)