Chapter 7. Hadoop and SQL
MapReduce is a powerful paradigm that enables complex data processing and can reveal valuable insights. As discussed in earlier chapters, however, it requires a different mindset, along with some training and experience in breaking analytics processing into a series of map and reduce steps. Several products are built atop Hadoop to provide higher-level or more familiar views of the data held within HDFS, and Pig is a very popular one. This chapter explores the other most common abstraction implemented atop Hadoop: SQL.
In this chapter, we will cover the following topics:
- What the use cases for SQL on Hadoop are and why it is so popular
- HiveQL, the SQL dialect introduced by Apache Hive
- Using HiveQL to perform SQL-like analysis of the Twitter dataset
- How HiveQL can approximate common features of relational databases such as joins and views
- How HiveQL allows the incorporation of user-defined functions into its queries
- How SQL on Hadoop...