Executing map side joins in Hive
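A map side join is an optimization in which the smaller table is loaded into memory on every mapper, so the join completes in the map phase without shuffling the larger table to reducers. Hive can apply it automatically when one of the joined tables is small enough. In this recipe, we are going to explore map side joins in further detail.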
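Whether Hive converts a join automatically is governed by configuration properties. The following is a minimal sketch assuming Hive 1.x property names. It also assumes a small department table named `dept` with columns (id, name); the name `dept` is hypothetical. Defaults and the exact threshold logic vary between Hive versions, so check your own hive-site.xml before relying on them.

```sql
-- Allow Hive to turn a common (shuffle) join into a map join when the
-- smaller input is below the size threshold (enabled by default since Hive 0.11).
SET hive.auto.convert.join=true;

-- Size threshold, in bytes, for the small table (25000000 is the documented default).
SET hive.mapjoin.smalltable.filesize=25000000;

-- Hypothetical small table `dept` (id, name) joined with the larger `emp` table.
-- EXPLAIN shows the plan without running the query.
EXPLAIN
SELECT e.name, e.salary, d.name
FROM emp e
JOIN dept d ON (e.deptId = d.id);

-- If the conversion applies, the plan should contain a "Map Join Operator"
-- and no reduce stage for the join itself.
```

Inspecting the plan with EXPLAIN is a cheap way to confirm which strategy Hive picked before running the join on real data.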
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Hive installed on it. Here, I am using Hive 1.2.1.
How to do it...
To perform a map join, we need two datasets that share a common key to join on. One dataset should be large, and the other small enough to fit in the memory of each mapper. Consider a situation where we have two tables, one for employees and one for departments: the employee table has the columns (ID, name, salary, and department ID), and the department table has an ID and a name.
We will quickly create tables and load data into them:
CREATE TABLE emp( id INT, name STRING, salary DOUBLE, deptId INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'emp.txt' INTO TABLE emp; hive>...