Special JOIN – MAPJOIN
The MAPJOIN statement performs the JOIN operation entirely in the map phase, without a reduce job. It reads all the data from the small table into memory and broadcasts it to every map task. During the map phase, each row of the big table is compared with the in-memory small table against the join conditions. Because no reduce phase is needed, JOIN performance improves. When the hive.auto.convert.join setting is set to true, Hive automatically converts the JOIN to a MAPJOIN at runtime where possible, instead of relying on the map join hint. In addition, MAPJOIN can be used for unequal joins (joins on a non-equality condition) to improve performance, since both the MAPJOIN and the WHERE filter are performed in the map phase. The following is an example of MAPJOIN that is enabled by a query hint:
jdbc:hive2://> SELECT /*+ MAPJOIN(employee) */ emp.name, emph.sin_number
. . . . . . .> FROM employee emp
. . . . . . .> CROSS JOIN employee_hr emph WHERE emp.name <> emph.name;
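For comparison with the hint-based query above, automatic conversion might be set up roughly as follows. This is a minimal sketch, not a tuning recommendation: the threshold value shown is an assumption (it matches the commonly documented default, which can differ by Hive version), and the equality join on name is a hypothetical condition chosen only to illustrate the settings.

```sql
-- Let Hive decide at runtime whether a JOIN can run as a map join.
SET hive.auto.convert.join=true;

-- Assumed threshold: tables below this size (in bytes) count as "small"
-- and are candidates for loading into memory. Check your version's default.
SET hive.mapjoin.smalltable.filesize=25000000;

-- No hint is needed; EXPLAIN shows whether the plan contains a
-- Map Join Operator instead of a reduce-side join.
EXPLAIN
SELECT emp.name, emph.sin_number
FROM employee emp
JOIN employee_hr emph ON emp.name = emph.name;
```

If the plan still shows a reduce-side join, the smaller table likely exceeds the size threshold. Note also that some Hive versions ignore the MAPJOIN hint unless hive.ignore.mapjoin.hint is set to false, so it is worth confirming the behavior on your installation.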
The MAPJOIN...