Using Hive with Apache HBase
Hive is a data warehouse and ETL tool built on top of Hadoop, and it can also query data stored in HBase. It provides an SQL-like query language, HiveQL, for reading data with SELECT and writing it with INSERT. Its main objective is ad hoc analysis of petabyte-scale data. Hive's HBase integration was originally introduced in HIVE-705.
Getting ready
- The HBase and Hadoop cluster should be up and running.
- Download Hive 0.12.0 from https://archive.apache.org/dist/hive/hive-0.12.0/hive-0.12.0.tar.gz or, if you are using the Linux command line, fetch it with this command:
wget https://archive.apache.org/dist/hive/hive-0.12.0/hive-0.12.0.tar.gz
- Untar it into a location of your choice, for example /u/HbaseB.
- Hive talks to HBase through an integration interface, HBaseStorageHandler. The handler ships as an optional module (hive-hbase-handler), so Hive must be able to load its JAR file, together with the HBase JAR files, in order to interact with HBase. A detailed discussion is beyond the scope of this book; for more information, see the See also section.
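HBaseStorageHandler relates each Hive column to HBase through the hbase.columns.mapping table property: a comma-separated string with exactly one entry per Hive column, where :key marks the row key, family:qualifier names a single cell, and family: (no qualifier) exposes a whole column family as a Hive MAP. The Python below is a minimal sketch of those rules, written only to illustrate them; it is not Hive's own parser, and the names parse_columns_mapping and ColumnMapping, as well as the sample table, are hypothetical. It deliberately models only the three basic entry forms and ignores type suffixes such as #binary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnMapping:
    """One Hive column and the HBase location it is mapped to (hypothetical type)."""
    hive_column: str
    family: Optional[str]      # None when the column is the row key
    qualifier: Optional[str]   # None for the row key or a whole-family MAP
    is_row_key: bool = False


def parse_columns_mapping(hive_columns: List[str], mapping: str) -> List[ColumnMapping]:
    """Pair Hive columns with the entries of an hbase.columns.mapping string.

    Simplified model: only ':key', 'family:qualifier' and 'family:' are handled.
    """
    entries = mapping.split(",")  # whitespace is NOT stripped, as in Hive
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries")

    result = []
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            result.append(ColumnMapping(column, None, None, is_row_key=True))
            continue
        family, sep, qualifier = entry.partition(":")
        if not sep or not family:
            raise ValueError(f"unsupported mapping entry {entry!r}")
        # An empty qualifier means "every cell in this family".
        result.append(ColumnMapping(column, family, qualifier or None))

    # The mapping must designate exactly one row-key column.
    if sum(m.is_row_key for m in result) != 1:
        raise ValueError("exactly one :key entry is required")
    return result


if __name__ == "__main__":
    # Hypothetical table: a row key, one named cell, and a whole column family.
    for m in parse_columns_mapping(["rowkey", "name", "props"],
                                   ":key,info:name,attrs:"):
        if m.is_row_key:
            print(f"{m.hive_column} -> HBase row key")
        elif m.qualifier is None:
            print(f"{m.hive_column} -> every cell in family '{m.family}' (Hive MAP)")
        else:
            print(f"{m.hive_column} -> {m.family}:{m.qualifier}")
```

Keeping the count check and the single-:key check separate mirrors the two mistakes that most often break a Hive-on-HBase table definition: a mapping string that does not line up with the column list, and a missing or duplicated row key.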
How to do it…
The first step is to use HBaseStorageHandler to register...