Schema design
HBase schema design differs drastically from RDBMS schema design because the requirements and constraints are different. An HBase schema should be designed around what the application needs, and denormalizing the schema is recommended. Data distribution depends on the rowkey, which should be chosen so that rows are spread uniformly across the cluster. The rowkey also has a significant impact on the scan performance of a request.
Things to take care of in HBase schema design are as follows:
Hotspotting: Hotspotting occurs when one or a few Regions carry a disproportionate share of the load because a range of rowkeys is written or accessed very frequently, which degrades performance. To prevent hotspotting, we can hash the rowkey, or a particular column value that forms part of it, so that rows are likely to be distributed uniformly and reads and writes are spread across Regions.
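The hashing idea above can be sketched in a few lines. This is only an illustration, not HBase client code: the function names, the bucket count of 8, the `NN|` prefix layout, and the choice of SHA-256 are all assumptions made for the example. It derives a small bucket prefix from a hash of the original key, so that keys that sort next to each other end up in different parts of the key space.

```python
import hashlib
from collections import Counter

# Hypothetical bucket count; in practice it would be chosen with the
# number of Regions or RegionServers in mind.
NUM_BUCKETS = 8


def salted_rowkey(key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Return the original key prefixed with a bucket derived from its hash.

    The prefix is deterministic, so a reader that knows the original key
    can recompute the full rowkey for a point lookup.
    """
    raw = key.encode("utf-8")
    digest = hashlib.sha256(raw).digest()
    # Use the first 4 bytes of the digest as an integer, then fold it
    # into the bucket range.
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}|".encode("ascii") + raw


def bucket_counts(keys, num_buckets: int = NUM_BUCKETS) -> Counter:
    """Count how many of the given keys land in each bucket prefix."""
    return Counter(
        salted_rowkey(k, num_buckets)[:2].decode("ascii") for k in keys
    )


if __name__ == "__main__":
    # Sequential keys would all sort together without the prefix.
    keys = [f"event-{i:06d}" for i in range(1000)]
    for bucket, count in sorted(bucket_counts(keys).items()):
        print(bucket, count)
```

Because the prefix is computed from the key itself, a single-row read can rebuild the full rowkey. The trade-off is that rows no longer sort in their original order, so a scan over a range of original keys has to be issued once per bucket.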
Monotonically increasing Rowkeys/Timeseries data: A problem that arises with multiple Regions is that a range of rowkeys can reach the split threshold, which can lead to a period of timeout...