Time for action – validating the table
The easiest way to do some initial validation of the import is to run a few summary queries against the table. This is similar to the types of activities for which we used Hadoop Streaming in Chapter 4, Developing MapReduce Programs.
Instead of using the Hive shell, pass the following HiveQL to the hive command-line tool to count the number of entries in the table:
$ hive -e "select count(*) from ufodata;"
You will receive the following response:
Total MapReduce jobs = 1
Launching Job 1 out of 1
…
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-03-03 16:15:15,510 Stage-1 map = 0%, reduce = 0%
2012-03-03 16:15:21,552 Stage-1 map = 100%, reduce = 0%
2012-03-03 16:15:30,622 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201202281524_0006
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 75416209 HDFS Write: 6 SUCESS
Total MapReduce CPU Time Spent: 0 msec
OK
61393
Time taken: 28.218 seconds
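If you want to repeat this check after each import, the count can be pulled out of the response and compared with the number of records you expect. The following is only a minimal sketch: the function names extract_count and check_count are hypothetical helpers, not part of Hive, and the sketch assumes the count is the last line of the response made up solely of digits, as in the output shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hive) for checking a row count.

# Read Hive output on stdin and print the row count.
# Assumption: the count is the last line that contains only digits.
extract_count() {
    grep -E '^[0-9]+$' | tail -n 1
}

# Compare the count read from stdin with the expected value in $1.
# Returns 0 on a match and 1 otherwise.
check_count() {
    expected="$1"
    actual="$(extract_count)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $actual rows"
    else
        echo "MISMATCH: expected $expected, got ${actual:-nothing}"
        return 1
    fi
}

# Example usage (requires a working Hive installation):
#   hive -e "select count(*) from ufodata;" | check_count 61393
```

Filtering for a digit-only line means the helper does not need to know which parts of the response are progress messages and which are the query result.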
Display a sample...