Time for action – performing a join
Joins are a very frequently used tool in SQL, though sometimes appear a little intimidating to those new to the language. Essentially a join allows rows in multiple tables to be logically combined together based on a conditional statement. Hive has rich support for joins which we will now examine.
Create the following as
join.hql
:SELECT t1.sighted, t2.full_name FROM ufodata t1 JOIN states t2 ON (LOWER(t2.abbreviation) = LOWER(SUBSTR( t1.sighting_location, (LENGTH(t1.sighting_location)-1)))) LIMIT 5 ;
Execute the query:
$ hive -f join.hql
You will receive the following response:
OK 20060930 Alaska 20051018 Alaska 20050707 Alaska 20100112 Alaska 20100625 Alaska Time taken: 33.255 seconds
What just happened?
The actual join
query is relatively straightforward; we want to extract the sighted date and location for a series of records but instead of the raw location field, we wish to map this into the full state name. The HiveQL file we created performs...