Time for action – importing data from a raw query
Let's see an example of an import where a raw SQL statement is used to select the data to be imported.
Delete any existing output directory:
$ hadoop fs -rmr employees
Drop any existing Hive employee table:
$ hive -e 'drop table employees'
Import data using an explicit query:
$ sqoop import --connect jdbc:mysql://10.0.0.100/hadooptest --username hadoopuser -P --target-dir employees --query 'select first_name, dept, salary, timestamp(start_date) as start_date from employees where $CONDITIONS' --hive-import --hive-table employees --map-column-hive start_date=timestamp -m 1
Examine the created table:
$ hive -e "describe employees"
You will receive the following response:
OK
first_name string
dept string
salary int
start_date timestamp
Time taken: 2.591 seconds
Examine the data:
$ hive -e "select * from employees"
You will receive the following response:
OK
Alice Engineering 50000 2009-03-12 00:00:00
Bob Sales 35000 2011-10...
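The --query value in the import command is wrapped in single quotes so that the shell hands the $CONDITIONS token to Sqoop unchanged; Sqoop then substitutes its own predicate for it. The following is a minimal sketch of that quoting behaviour only. It uses a shortened, hypothetical query and plain shell variables (single_quoted, double_quoted, and unescaped are illustrative names), and it does not call Sqoop.

```shell
# Shortened, hypothetical query used only to illustrate quoting.

# Single quotes: the shell leaves $CONDITIONS untouched.
single_quoted='select first_name from employees where $CONDITIONS'

# Double quotes: the dollar sign must be escaped, otherwise the shell
# expands $CONDITIONS itself before Sqoop ever sees the query.
double_quoted="select first_name from employees where \$CONDITIONS"

# Unescaped double quotes, shown for contrast. The variable is unset here,
# so it expands to an empty string and the token is lost.
unset CONDITIONS
unescaped="select first_name from employees where $CONDITIONS"

printf '%s\n' "$single_quoted" "$double_quoted" "$unescaped"
```

The first two printed lines still contain the literal $CONDITIONS token, while the third ends right after "where", which is the form Sqoop would reject. If the query itself needs single quotes (for example, around a string literal), the escaped double-quoted form is likely the more convenient choice.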