Bulk utilities
Loading data with the bulk utilities follows a similar process, made up of three steps:
- Extracting data from the source.
- Transforming the data into HFiles.
- Loading the HFiles into HBase by telling the region servers where to find them.
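The following Java sketch makes the extract and transform steps concrete using only the standard library. It assumes a hypothetical CSV input of the form rowkey,family,qualifier,value. It stops once the cells are in the order an HFile requires: row, then column family, then qualifier, compared as unsigned bytes. It does not write real HFiles. In practice, HBase's MapReduce support handles that step (for example, ImportTsv with the importtsv.bulk.output option), and the completebulkload tool performs the load step. The class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Stdlib-only model of the extract and transform stages of a bulk load.
 * Hypothetical input format: "rowkey,family,qualifier,value".
 * A real job would hand the sorted cells to an HFile writer instead of returning them.
 */
public class BulkTransformSketch {

    /** One cell as it would be written to an HFile (timestamps omitted for brevity). */
    public record Cell(String row, String family, String qualifier, String value) {}

    // HBase orders keys by unsigned byte comparison, so compare the UTF-8 bytes.
    private static int compareBytes(String a, String b) {
        return Arrays.compareUnsigned(a.getBytes(StandardCharsets.UTF_8),
                                      b.getBytes(StandardCharsets.UTF_8));
    }

    /** Order required inside an HFile: row, then family, then qualifier. */
    static final Comparator<Cell> HFILE_ORDER =
        Comparator.<Cell, String>comparing(Cell::row, BulkTransformSketch::compareBytes)
                  .thenComparing(Cell::family, BulkTransformSketch::compareBytes)
                  .thenComparing(Cell::qualifier, BulkTransformSketch::compareBytes);

    /** Extract: parse input lines into cells, skipping malformed lines. */
    public static List<Cell> extract(List<String> lines) {
        List<Cell> cells = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",", -1);
            if (f.length != 4 || f[0].isEmpty()) {
                continue; // wrong field count or missing row key
            }
            cells.add(new Cell(f[0], f[1], f[2], f[3]));
        }
        return cells;
    }

    /** Transform: sort the cells into the order an HFile writer expects. */
    public static List<Cell> transform(List<Cell> cells) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(HFILE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = List.of("row2,cf,b,20", "row1,cf,b,10", "row1,cf,a,5", "bad line");
        for (Cell c : transform(extract(input))) {
            System.out.println(c);
        }
    }
}
```

Running main prints the three valid cells, starting with row1/cf:a, and drops the malformed line. The sort is the important part: an HFile writer refuses cells that arrive out of key order.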
Getting ready…
Keep the following points in mind when using the bulk utilities:
- An HBase/Hadoop cluster with MapReduce/YARN must be running. You can run jps to check that the expected daemons are up.
- The user running the program needs the appropriate access rights (user/group).
- The table schema must be designed to match the structure of the input.
- Split points need to be planned in advance, because the generated HFiles are partitioned along the table's region boundaries.
- The entire stack (compactions, splits, block size, maximum file size, flush size, versions, compression, MemStore size, block cache, garbage collection, nproc, and so on) needs to be tuned to get the best results.
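Split points can be computed up front when row keys are spread evenly over a known range. The Java sketch below assumes that row keys begin with a fixed-width, lowercase hexadecimal prefix, such as a hash of the natural key. It divides that key space into equal ranges, similar in spirit to the HexStringSplit algorithm in HBase's RegionSplitter utility. The class and method names are hypothetical, and the sketch only computes the keys without creating a table.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: equal-width split keys for fixed-width hex row-key prefixes. */
public class HexSplitPoints {

    /**
     * Returns numRegions - 1 split keys dividing the range "00..0" to "ff..f"
     * (hexDigits characters wide) into numRegions equal parts.
     * Row keys must use the same lowercase hex digits for the byte order to match.
     */
    public static List<String> splitKeys(int numRegions, int hexDigits) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        if (hexDigits < 1) {
            throw new IllegalArgumentException("hexDigits must be positive");
        }
        BigInteger space = BigInteger.ONE.shiftLeft(4 * hexDigits); // 16^hexDigits possible prefixes
        if (space.compareTo(BigInteger.valueOf(numRegions)) < 0) {
            throw new IllegalArgumentException("key space is smaller than the region count");
        }
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // i-th boundary = floor(i * space / numRegions)
            BigInteger boundary = space.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(numRegions));
            String hex = boundary.toString(16); // lowercase hex digits
            keys.add("0".repeat(hexDigits - hex.length()) + hex); // left-pad to fixed width
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(splitKeys(4, 8));
    }
}
```

For example, splitKeys(4, 8) yields 40000000, 80000000 and c0000000, which describe four regions of equal width. Once converted to bytes, keys like these could be supplied as the split keys when the table is created. The bulk-load job can then write HFiles for several regions instead of funnelling everything into one.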
Because the write-ahead log (WAL) is not written, data lost during a failure may not be recoverable. In addition, replication, which works by reading the WAL, is not performed for the bulk-loaded data.
How to do it…
There are multiple ways to do this work, such as writing your own...