Time for action – redefining the table with the correct column separator
Let's fix our table specification as follows:
Create the following file as commands.hql:
DROP TABLE ufodata ;
CREATE TABLE ufodata(sighted string, reported string, sighting_location string,
shape string, duration string, description string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' ;
LOAD DATA INPATH '/tmp/ufo.tsv' OVERWRITE INTO TABLE ufodata ;
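The redefined table declares six string columns separated by tab characters, so every line of ufo.tsv should split into exactly six tab-separated fields. As an optional check that is not part of the original steps, the following shell sketch uses awk to count how many lines have each number of fields. A well-formed file should report only 6. The helper name field_counts is hypothetical.

```shell
#!/bin/sh
# field_counts: hypothetical helper, not part of Hadoop or Hive.
# Splits each line of the given file on tab characters and prints
# "<number of fields> <number of lines>" pairs, sorted by field count.
# For the six-column ufodata table, a clean file yields one pair: "6 <lines>".
field_counts() {
  awk -F'\t' '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}
```

After you define the function in your shell, and assuming ufo.tsv is in the current directory, run it before copying the file to HDFS:
$ field_counts ufo.tsv
Any line that reports a count other than 6 has a missing or extra tab. Hive sets missing trailing columns to NULL and ignores extra fields, so the values on such a line may not end up in the intended columns.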
Copy the data file onto HDFS:
$ hadoop fs -put ufo.tsv /tmp/ufo.tsv
Execute the HiveQL script:
$ hive -f commands.hql
You will receive the following response:
OK
Time taken: 5.821 seconds
OK
Time taken: 0.248 seconds
Loading data to table default.ufodata
Deleted hdfs://head:9000/user/hive/warehouse/ufodata
OK
Time taken: 0.285 seconds
Validate the number of rows in the table:
$ hive -e "select count(*) from ufodata;"
You will receive the following response:
OK
61393
Time taken: 28.077 seconds
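LOAD DATA moves the file into the table's warehouse directory without parsing it, and Hive reads each line of a text file as one row. The count Hive reports should therefore equal the number of lines in the local ufo.tsv. The following sketch compares the two. The helper name row_check is hypothetical. It counts lines with awk's NR rather than wc -l because wc -l counts newline characters and would miss a final line that has no trailing newline. Hadoop's line-oriented text input still reads such a line as a record.

```shell
#!/bin/sh
# row_check: hypothetical helper, not part of Hadoop or Hive.
# $1 = local data file, $2 = row count reported by Hive.
# awk's NR counts every record, including a last line with no trailing
# newline, which is how Hive's text input would also treat it.
row_check() {
  lines=$(awk 'END { print NR }' "$1")
  if [ "$lines" -eq "$2" ]; then
    echo "match: $lines rows"
  else
    echo "mismatch: file has $lines lines, Hive reported $2"
  fi
}
```

The hadoop fs -put step copies the file rather than moving it, so the local ufo.tsv should still be available to check against the count above:
$ row_check ufo.tsv 61393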
Validate the contents of the reported column:
$ hive -e "select reported from ufodata...