The Hadoop ecosystem client
So far, we have discussed HBase clients that work in interactive mode; these clients are synchronous in nature. For batch processing that runs in the background, such as building search indexes or generating statistical data for reporting, a Hadoop ecosystem client such as Hive is used.
Note
The Hadoop MapReduce framework is used to process large-scale data. In MapReduce jobs, HBase can be used in a variety of ways: as a data source, as a data target, or as both. This section does not cover MapReduce usage, as it was already covered in the previous chapter.
Hive
Hive is a data warehouse infrastructure built on top of Hadoop. Hive provides a SQL-like query language called HiveQL for querying the semi-structured data stored in Hadoop. A HiveQL query is converted into a MapReduce job, which is then executed on a MapReduce cluster. Like any other MapReduce (MR) job, these jobs can also read and process data other than the Hive tables stored on HDFS. In Hive, tables can be defined...