Reads and writes
Let's look at the internal mechanics of how reads and writes are executed within a RegionServer instance.
The HBase write path
HDFS is an append-only file system, so how could a database that supports random record updates be built on top of it?
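HBase is what's called a log-structured merge-tree (LSM) database. In an LSM database, data is organized in a multilevel storage hierarchy, and data moves between levels in batches rather than one record at a time. This is what makes HDFS a workable foundation: updates accumulate in memory and reach disk as new files, so existing files never need to be modified in place. Cassandra is another example of an LSM database.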
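To make the LSM idea concrete, here is a toy two-level store. It is a minimal sketch under simplifying assumptions, not HBase's implementation: the class name `TinyLSM`, the `flush_threshold` parameter, and the in-process lists standing in for on-disk files are all hypothetical. Writes land in a mutable in-memory buffer; once it fills, the whole buffer moves down a level in one batch as an immutable sorted run, and a separate compaction step later merges runs. Nothing already written to a run is ever modified.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy two-level LSM store: a mutable in-memory buffer plus immutable sorted runs."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold  # hypothetical: entries buffered before a batch flush
        self.memtable: Dict[str, str] = {}  # level 0: in memory, absorbs random writes and updates
        self.runs: List[List[Tuple[str, str]]] = []  # level 1: sorted, never-modified runs, oldest first

    def put(self, key: str, value: str) -> None:
        # An update simply overwrites the buffered value; no run is touched.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Move the whole buffer down one level in a single batch, as one new sorted run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Check the buffer first, then runs from newest to oldest: the newest value wins.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))  # (key,) sorts just before any (key, value)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one; later (newer) runs overwrite older values for the same key.
        merged: Dict[str, str] = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Reads must consult the buffer and then every run from newest to oldest, which is the price LSM designs pay for cheap writes; compaction keeps that price bounded by reducing the number of runs a read has to check.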
When the HBase client issues a write for a key, it first queries ZooKeeper for the location of the RegionServer that hosts the META region. It then queries the META region to find the table's regions, their key ranges, and the RegionServers that host them.
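Once the client has the META rows, finding the right RegionServer is a range search over sorted start keys. The sketch below assumes the client already holds one table's region entries as a list sorted by start key; the `RegionLocation` type, the `locate_region` function, and the server names are hypothetical and are not the HBase client API. Each region is assumed to cover the half-open range [start_key, end_key), with an empty end key meaning the region runs to the end of the table.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegionLocation:
    start_key: bytes  # inclusive; the first region of a table starts at b""
    end_key: bytes    # exclusive; b"" means "to the end of the table"
    server: str       # hypothetical name of the RegionServer hosting this region


def locate_region(regions: List[RegionLocation], row: bytes) -> RegionLocation:
    """Return the region whose [start_key, end_key) range contains `row`.

    Assumes `regions` is sorted by start_key and the ranges are contiguous.
    """
    starts = [r.start_key for r in regions]
    i = bisect.bisect_right(starts, row) - 1  # last region whose start_key <= row
    if i < 0:
        raise KeyError(f"no region covers {row!r}")
    region = regions[i]
    if region.end_key and row >= region.end_key:
        raise KeyError(f"no region covers {row!r}")
    return region
```

The detail that is easy to get wrong is the boundary: start keys are inclusive and end keys exclusive, so a row equal to a region boundary belongs to the region that starts there, not the one that ends there.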
The client then makes an RPC call to the RegionServer hosting the region whose key range contains the key in the write request. The RegionServer receives the data for the key and immediately stores it in an in-memory structure called the MemStore...
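The MemStore keeps recent writes sorted so they can later be written out in key order. The sketch below shows one plausible way to maintain that ordering with a sorted list and `bisect`; the class `MemStoreSketch` and its methods are hypothetical, and a production store would use a concurrent sorted structure rather than a plain list. Each cell is keyed by row, column, and negated timestamp, so the newest version of a given row and column sorts first, matching HBase's newest-first ordering of versions.

```python
import bisect
from typing import List, Optional, Tuple

# Sort key for a cell: (row, column, -timestamp). Negating the timestamp makes
# the newest version of a given row/column sort before older versions.
SortKey = Tuple[bytes, bytes, int]


class MemStoreSketch:
    def __init__(self) -> None:
        self._keys: List[SortKey] = []  # kept sorted at all times
        self._values: List[bytes] = []  # parallel to _keys

    def put(self, row: bytes, column: bytes, timestamp: int, value: bytes) -> None:
        key = (row, column, -timestamp)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # same row, column, and timestamp: overwrite
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get_latest(self, row: bytes, column: bytes) -> Optional[bytes]:
        # -(2**63) sorts before any negated non-negative 64-bit timestamp,
        # so this lands on the newest version of (row, column), if present.
        i = bisect.bisect_left(self._keys, (row, column, -(2**63)))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[i]
        return None

    def snapshot(self) -> List[Tuple[SortKey, bytes]]:
        # Cells are already in sorted order, so a flush could write them sequentially.
        return list(zip(self._keys, self._values))
```

Keeping the buffer sorted on every insert is what lets a later flush produce a sorted file in a single sequential pass, which suits an append-only file system.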