Introduction
As HBase is increasingly used for real-time workloads, it's important to make sure that it performs reads and writes at a consistent speed, regardless of the scale at which it is running. The following table summarizes typical types of work and the resources they use:
Type of work | Usage patterns | Remarks |
---|---|---|
Data intensive | IO, Network | Copying data from HDFS |
Text searching | CPU, IO, Network | Locating different sets of data |
Machine learning | CPU, IO, Network | Grouping, categorizing, applying models |
MapReduce, indexing, grouping | IO, CPU, Networking | Filtering, realigning, combining |
Master, slave, region server, Zookeeper communication | Network, CPU, IO | Communicating between different systems, load balancing |
To achieve this, we need to make sure that every distributed component is measured against a benchmark that can be run on a Linux-based OS.
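As a rough illustration of what such a benchmark can look like at the lowest level (read/write on the OS), the following is a minimal sketch in Python. The function name `measure_disk_throughput` and its parameters are hypothetical and are not part of HBase or any HBase tool. It only times a sequential write and read of a temporary file, so the numbers are indicative at best: the read in particular is likely to be served from the OS page cache.

```python
import os
import tempfile
import time


def measure_disk_throughput(size_mb=8, block_kb=64):
    """Return (write_mb_per_s, read_mb_per_s) for a temporary file.

    Hypothetical helper for illustration only; not an HBase API.
    """
    block_bytes = block_kb * 1024
    block = os.urandom(block_bytes)
    blocks = (size_mb * 1024) // block_kb
    written_mb = blocks * block_kb / 1024

    fd, path = tempfile.mkstemp()
    try:
        # Sequential write, including the time needed to flush to disk.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        write_s = time.perf_counter() - start

        # Sequential read; this may be served from the page cache.
        start = time.perf_counter()
        total = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(block_bytes)
                if not chunk:
                    break
                total += len(chunk)
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)

    if total != blocks * block_bytes:
        raise RuntimeError("read back fewer bytes than were written")
    return written_mb / write_s, written_mb / read_s


if __name__ == "__main__":
    write_rate, read_rate = measure_disk_throughput()
    print(f"write: {write_rate:.1f} MB/s, read: {read_rate:.1f} MB/s")
```

Running the same measurement several times, and at several file sizes, gives a first impression of whether throughput stays consistent as the amount of data grows.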
The list of distributed components is as follows:
Hardware
OS (Linux, RHEL 6.5)
JVM/Java 1.7.0_10 and above
HDFS
Read/write on the OS
Region servers
Master servers/HMaster
Client servers/HClient
Zookeeper
Table design
Large...