Preface
Hive is an open source big data framework in the Hadoop ecosystem. It provides an SQL-like interface to query data stored in HDFS. Underlying it runs MapReduce programs corresponding to the SQL query. Hive was initially developed by Facebook and later added to the Hadoop ecosystem.
Hive is currently the most preferred framework to query data in Hadoop. Because most of the historical data is stored in RDBMS data stores, including Oracle and Teradata. It is convenient for the developers to run similar SQL statements in Hive to query data.
Along with simple SQL statements, Hive supports wide variety of windowing and analytical functions, including rank, row num, dense rank, lead, and lag.
Hive is considered as de facto big data warehouse solution. It provides a number of techniques to optimize storage and processing of terabytes or petabytes of data in a cost-effective way.
Hive could be easily integrated with a majority of other frameworks, including Spark and HBase. Hive allows developers or analysts to execute SQL on it. Hive also supports querying data stored in different formats such as JSON.