Chapter 8. A Relational View on Data with Hive
MapReduce is a powerful paradigm that enables complex data processing capable of revealing valuable insights. However, it requires a different mindset, along with some training and experience in the model of breaking data analysis into a series of map and reduce steps. Several products are built atop Hadoop to provide higher-level or more familiar views of the data held within HDFS. This chapter introduces one of the most popular of these tools, Hive.
In this chapter, we will cover:
What Hive is and why you may want to use it
How to install and configure Hive
Using Hive to perform SQL-like analysis of the UFO data set
How Hive can approximate common features of a relational database such as joins and views
How to efficiently use Hive across very large data sets
How Hive allows the incorporation of user-defined functions into its queries
How Hive complements another common tool, Pig