Summary
In its early days, Hadoop was sometimes mistakenly cast as the latest relational database killer. Over time, it has become clear that the more sensible approach is to view Hadoop as a complement to RDBMS technologies and that the RDBMS community has, in fact, developed tools such as SQL that are also valuable in the Hadoop world.
HiveQL is an implementation of SQL on Hadoop and was the primary focus of this chapter. In regard to HiveQL and its implementations, we covered the following topics:
How HiveQL provides a logical model atop data stored in HDFS, with the table structure applied when the data is read, in contrast to relational databases, where the table structure is enforced in advance
How HiveQL supports many standard SQL data types and commands including joins and views
The ETL-like features offered by HiveQL, including the ability to import data into tables and optimize the table structure through partitioning and similar mechanisms
How HiveQL offers the ability to extend its core set of operators with user-defined code...
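As a compact reminder of how these points fit together, the following HiveQL sketch touches each of them in turn. It is illustrative only: the table names, column names, HDFS paths, JAR path, and UDF class are all hypothetical, and it assumes a Hive installation with tab-delimited files already present at the locations shown.

```sql
-- All table, column, path, and class names below are hypothetical.

-- Logical model atop HDFS: the definition describes files that already
-- exist; nothing is validated until the data is read.
CREATE EXTERNAL TABLE IF NOT EXISTS page_views (
    user_id   STRING,
    url       STRING,
    view_time STRING
)
PARTITIONED BY (view_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/page_views';

CREATE EXTERNAL TABLE IF NOT EXISTS users (
    user_id STRING,
    country STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/users';

-- ETL-like features: register an existing directory as a partition, or
-- move staged files into a new partition.
ALTER TABLE page_views ADD IF NOT EXISTS
    PARTITION (view_date = '2014-01-01')
    LOCATION '/data/page_views/2014-01-01';

LOAD DATA INPATH '/staging/page_views/2014-01-02'
    INTO TABLE page_views PARTITION (view_date = '2014-01-02');

-- Standard SQL constructs: a view defined over a join.
CREATE VIEW IF NOT EXISTS views_by_country AS
SELECT u.country, v.view_date, COUNT(*) AS views
FROM page_views v
JOIN users u ON (v.user_id = u.user_id)
GROUP BY u.country, v.view_date;

-- User-defined code: register a (hypothetical) UDF class from a JAR
-- and call it like a built-in function.
ADD JAR /path/to/custom-udfs.jar;
CREATE TEMPORARY FUNCTION normalize_url AS 'com.example.hive.NormalizeUrl';

SELECT normalize_url(url) AS clean_url
FROM page_views
WHERE view_date = '2014-01-01'   -- partition filter limits the data scanned
LIMIT 10;
```

Filtering on the partition column in the final query lets Hive read only the matching partition directory rather than the whole table, which is the main payoff of the partitioning mentioned above.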