Using a map-side join
In this recipe, you will learn how to use map-side joins in Hive.
When joining tables in Hive, it is common for one of the tables to be small while the other is large. To produce the result efficiently in this case, Hive can use a map-side join. In a map-side join, the smaller table is cached in memory while the larger table is streamed through the mappers. The join is therefore completed entirely on the mapper side, and no reduce phase is needed for it. Avoiding the shuffle and reduce work can improve performance significantly.
How to do it…
There are two ways of using map-side joins in Hive.
One is to use the /*+ MAPJOIN(<table_name>)*/ hint just after the select keyword. table_name has to be the table that is smaller in size. This is the old way of using map-side joins.
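As a minimal sketch of the hint-based approach, the following query joins a large table with a small one. The table names (orders, customers), their columns, and the aliases are hypothetical and used only for illustration. Note that in Hive 0.11 and later the hint is ignored by default, so the example assumes that hive.ignore.mapjoin.hint has to be switched off first; check this against your Hive version.

```sql
-- Hypothetical tables: orders (large) and customers (small).
-- Assumption: on Hive 0.11+ the MAPJOIN hint is ignored unless this is false.
set hive.ignore.mapjoin.hint=false;

-- The hint names the alias of the smaller table (c), which is cached in memory;
-- the larger table (o) is streamed through the mappers.
SELECT /*+ MAPJOIN(c) */ o.order_id, o.amount, c.customer_name
FROM orders o
JOIN customers c ON (o.customer_id = c.customer_id);
```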
The other way of using a map-side join is to set the following property to true and then run a join query:
set hive.auto.convert.join=true;
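A short sketch of this second approach might look as follows. The tables orders and customers and their columns are again hypothetical. The size threshold property shown here, hive.mapjoin.smalltable.filesize, decides when a table counts as small enough to be cached; the value used is its usual default of roughly 25 MB and is an assumption you may want to tune.

```sql
-- Let Hive convert eligible joins to map-side joins automatically.
set hive.auto.convert.join=true;

-- Assumed threshold in bytes below which a table is treated as "small".
set hive.mapjoin.smalltable.filesize=25000000;

-- EXPLAIN prints the query plan without running the query; a converted join
-- should show a Map Join Operator instead of a reduce-side Join Operator.
EXPLAIN
SELECT o.order_id, o.amount, c.customer_name
FROM orders o
JOIN customers c ON (o.customer_id = c.customer_id);
```

With this setting no hint is required in the query, which is why it is generally preferred over the older hint-based form.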
Follow these steps to use a map-side join in...