Hive
If you explored the source code of the WordCount MapReduce job example from Chapter 2, Installing and Configuring Hadoop, or tried to write some code yourself, you have probably realized by now that writing MapReduce jobs directly is a very low-level way of processing data in Hadoop. Indeed, if writing MapReduce jobs were the only way to access data in Hadoop, its usability would be pretty limited.
Hive was designed to solve this particular problem. It turns out that much of the MapReduce code that deals with data filtering, aggregation, and grouping can be generated automatically. So, it is possible to design a high-level data processing language, which can then be compiled into native Java MapReduce code. Actually, there is no need to design a new language for this: SQL has long been the de facto standard for processing data in relational databases. For the Hive developers, the solution was obvious: take a SQL dialect and build Hive as a compiler from SQL to MapReduce. The language that Hive provides is called HiveQL.
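To make this concrete, here is a minimal sketch of a HiveQL query. The table name web_logs and its columns status and url are hypothetical and assumed to have been defined already; the query simply illustrates the kind of filtering, grouping, and aggregation that Hive translates into MapReduce jobs behind the scenes:

    -- Count error responses per URL; Hive compiles this single
    -- statement into one or more MapReduce jobs automatically.
    SELECT url, COUNT(*) AS error_count
    FROM web_logs
    WHERE status >= 500
    GROUP BY url
    ORDER BY error_count DESC;

Writing the equivalent logic as raw Java MapReduce would take dozens of lines of mapper, reducer, and driver code; with Hive, a few lines of familiar SQL are enough.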