Test driving Hive and Sqoop
In the previous section, we verified that MySQL, Hive, and Sqoop were available on our Hadoop Sandbox. We will now test drive Hive and Sqoop.
Querying data using Hive
We run Hive queries to select data from tables. Hive has two types of tables:
- Managed tables
- External tables
Hive creates managed tables by default. To create an external table, we specify the EXTERNAL keyword during table creation.
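The only syntactic difference between the two table types is the EXTERNAL keyword, usually together with a LOCATION clause that names the HDFS directory holding the data. The following minimal sketch builds both statements as strings so they can be compared side by side. The helper name `create_table_ddl`, the comma-delimited text format, and the example table, columns, and path are hypothetical and are not taken from this chapter.

```python
def create_table_ddl(name, columns, external=False, location=None):
    """Build a HiveQL CREATE TABLE statement (hypothetical helper).

    columns  -- list of (column_name, hive_type) pairs
    external -- if True, emit CREATE EXTERNAL TABLE instead of CREATE TABLE
    location -- optional HDFS directory that holds the table's data files
    """
    keyword = "CREATE EXTERNAL TABLE" if external else "CREATE TABLE"
    cols = ",\n  ".join(f"{col} {typ}" for col, typ in columns)
    lines = [
        f"{keyword} {name} (",
        f"  {cols}",
        ")",
        # Assumes comma-delimited plain-text files.
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','",
        "STORED AS TEXTFILE",
    ]
    if location is not None:
        # Points the table at existing HDFS data instead of Hive's warehouse dir.
        lines.append(f"LOCATION '{location}'")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    cols = [("trade_date", "STRING"), ("close_price", "DOUBLE")]
    # Managed table: Hive stores the data under its own warehouse directory.
    print(create_table_ddl("stocks_managed", cols))
    # External table: Hive reads files at a path it does not own (hypothetical path).
    print(create_table_ddl("stocks_ext", cols, external=True,
                           location="/user/example/stocks"))
```

The generated statements could be pasted into the Hive shell. Keeping the column list identical for both calls makes it clear that EXTERNAL and LOCATION are the only differences.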
For managed tables, Hive controls the entire table lifecycle: if you drop a managed table, Hive deletes both its metadata and its data. An external table, by contrast, reads data files stored at an HDFS location that Hive does not own. When an external table is dropped, Hive removes only the table's metadata and leaves the HDFS files in place. As a result, other tools can continue to access those files while we run Hive queries against the same data through an external table definition.
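The drop semantics described above can be illustrated with a small in-memory model. This is a toy sketch, not Hive code. The `ToyMetastore` class is hypothetical, and a plain dictionary stands in for HDFS. It models only Hive's default behavior and ignores table properties that can change what DROP TABLE does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for the Hive metastore plus HDFS (illustration only)."""
    tables: dict = field(default_factory=dict)  # table name -> (is_external, hdfs_path)
    hdfs: dict = field(default_factory=dict)    # hdfs_path -> file contents

    def create_table(self, name, path, external=False):
        # Creating a table only records metadata; the data lives at `path`.
        self.tables[name] = (external, path)

    def drop_table(self, name):
        # Metadata is removed for both managed and external tables.
        external, path = self.tables.pop(name)
        if not external:
            # Managed table: Hive owns the data, so it is deleted as well.
            self.hdfs.pop(path, None)
        # External table: the HDFS files are left untouched.


if __name__ == "__main__":
    ms = ToyMetastore(hdfs={"/warehouse/managed_t": "rows", "/data/ext_t": "rows"})
    ms.create_table("managed_t", "/warehouse/managed_t")
    ms.create_table("ext_t", "/data/ext_t", external=True)
    ms.drop_table("managed_t")
    ms.drop_table("ext_t")
    print(ms.tables)  # both table definitions are gone
    print(ms.hdfs)    # only the external table's data remains
```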
In Chapter 1, Hadoop and Big Data, we used a dataset containing the historical stock prices of IBM to run a MapReduce job that calculated...