Overview of Hive
Hive is a data warehouse system that uses MapReduce to analyze data stored on HDFS. In particular, it provides a query language called HiveQL that closely resembles standard Structured Query Language (SQL).
Why use Hive?
In Chapter 4, Developing MapReduce Programs, we introduced Hadoop Streaming and explained that one of its major benefits is faster turnaround when developing MapReduce jobs. Hive takes this a step further. Instead of providing a quicker way to develop map and reduce tasks, it offers a query language based on industry-standard SQL. Hive takes HiveQL statements and automatically translates them into one or more MapReduce jobs. It then executes the overall MapReduce program and returns the results to the user. Whereas Hadoop Streaming shortens the code/compile/submit cycle, Hive removes it entirely; the user only needs to compose HiveQL statements.
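To make the idea of translating a query into MapReduce more concrete, the following is a minimal sketch in plain Python of the map, shuffle, and reduce steps that a simple aggregate query conceptually corresponds to. It is not Hive's actual query planner, and it does not use Hadoop at all. The table page_views, its columns, and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical rows of a table page_views(user, url); the table and its
# columns are invented for this illustration.
PAGE_VIEWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/about"),
    ("carol", "/home"),
]

# The HiveQL statement being mimicked (hypothetical table name):
#   SELECT url, COUNT(*) FROM page_views GROUP BY url;


def map_phase(rows):
    """Emit a (url, 1) pair for each input row, as a mapper would."""
    for _user, url in rows:
        yield url, 1


def shuffle(pairs):
    """Group values by key, standing in for Hadoop's shuffle and sort."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would for COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    """Chain the three phases to produce one count per url."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for url, count in sorted(run_query(PAGE_VIEWS).items()):
        print(url, count)
```

With Hive, the user would write only the single SELECT statement shown in the comment; the map and reduce logic sketched here is the kind of work Hive generates and runs on the user's behalf.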
This interface to Hadoop...