Time for action – a more selective import
Let's see how this works by performing an import that is limited by a conditional expression.
Delete any existing employee import directory:
$ hadoop fs -rmr employees
You will receive the following response:
Deleted hdfs://head:9000/user/hadoop/employees
Import selected columns with a predicate:
$ sqoop import --connect jdbc:mysql://10.0.0.100/hadooptest --username hadoopuser -P --table employees --columns first_name,salary --where "salary > 45000" --hive-import --hive-table salary
You will receive the following response:
12/05/23 15:02:03 INFO hive.HiveImport: Hive import complete.
Examine the created table:
$ hive -e "describe salary"
You will receive the following response:
OK
first_name string
salary int
Time taken: 2.57 seconds
Examine the imported data:
$ hive -e "select * from salary"
You will see the following output:
OK
Alice 50000
David 75000
Time taken: 2.754 seconds
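To make the effect of the --columns and --where options in the import step more concrete, the following is a minimal sketch of the kind of SELECT statement they roughly correspond to. It is not Sqoop's actual query generation, which also adds conditions to split the work across mappers. The build_select helper, the in-memory SQLite table standing in for MySQL, and every column and row other than the first_name and salary values for Alice and David shown above are assumptions made for illustration.

```python
import sqlite3


def build_select(table, columns, where=None):
    """Compose a SELECT from import-style options (hypothetical helper)."""
    sql = "SELECT {} FROM {}".format(", ".join(columns), table)
    if where:
        sql += " WHERE {}".format(where)
    return sql


# An in-memory SQLite database stands in for the MySQL source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, dept TEXT, salary INTEGER)"
)
# Illustrative rows; only Alice and David's salaries come from the output above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Alice", "Engineering", 50000),
        ("Bob", "Sales", 35000),
        ("Camille", "Marketing", 40000),
        ("David", "Executive", 75000),
    ],
)

# Mirror --table employees --columns first_name,salary --where "salary > 45000"
query = build_select("employees", ["first_name", "salary"], "salary > 45000")
rows = conn.execute(query + " ORDER BY first_name").fetchall()

print(query)
for first_name, salary in rows:
    print(first_name, salary)
```

Only the two named columns appear in the result, and only the rows that satisfy the predicate are returned, which matches the two-column, two-row Hive table examined above.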
What just happened?
This time, our Sqoop command first...