Performing table joins in Hive
In the previous chapter, we talked about how to perform joins in Pig. In this recipe, we are going to take a look at how to perform joins in Hive. Hive supports various types of joins such as inner, outer, and so on.
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Hive installed on it. Here, I am using Hive 1.2.1.
How to do it...
To perform joins, we will need two types of datasets, which have something in common to join. Consider a situation where we have two employee tables and departments, and every employee table has a structure (ID, name, salary, and department ID) and every department table has an ID and a name. We will quickly create tables and load data into them:
CREATE TABLE emp( id INT, name STRING, salary DOUBLE, deptId INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE; LOAD DATA LOCAL INPATH 'emp.txt' INTO TABLE emp;
hive> select * from emp; OK ...