Performing JOINS in Pig
In this recipe, we will learn how to join datasets in Pig using inner and outer joins.
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Pig installed on it.
How to do it...
JOIN operations are widely used in SQL. Pig Latin also supports joining datasets on an attribute they have in common, and it supports both inner and outer joins. Let's look at the syntax for each, one by one.
To learn about joins in Pig, we'll need two datasets. The first is the employee dataset, which we have been using in earlier recipes. The second is the ID location dataset, which maps each employee's ID to their location.
The employee dataset will look like this:
1 Tanmay ENGINEERING 5000
2 Sneha PRODUCTION 8000
3 Sakalya ENGINEERING 7000
4 Avinash SALES 6000
5 Manisha SALES 5700
6 Vinit FINANCE 6200
The ID location dataset will look like this:
1 Pune
2 London
3 Mumbai
4 Pune
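Before turning to the Pig syntax, it may help to see what an inner join and a left outer join are expected to produce on these two datasets. The following is a minimal plain-Python sketch of the join semantics only, not Pig Latin and not how Pig executes a join. The function names inner_join and left_outer_join are hypothetical helpers introduced for illustration, and the rows are copied from the sample data above.

```python
# Sample rows copied from the two datasets shown above.
employees = [
    (1, "Tanmay", "ENGINEERING", 5000),
    (2, "Sneha", "PRODUCTION", 8000),
    (3, "Sakalya", "ENGINEERING", 7000),
    (4, "Avinash", "SALES", 6000),
    (5, "Manisha", "SALES", 5700),
    (6, "Vinit", "FINANCE", 6200),
]
locations = [(1, "Pune"), (2, "London"), (3, "Mumbai"), (4, "Pune")]


def inner_join(left, right):
    """Keep only rows whose ID (first field) appears in both datasets."""
    return [l + r for l in left for r in right if l[0] == r[0]]


def left_outer_join(left, right):
    """Keep every left row; pad with None where no right row matches."""
    width = len(right[0])
    result = []
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        if matches:
            result.extend(l + r for r in matches)
        else:
            # No location for this employee: fill the right-hand fields with None.
            result.append(l + (None,) * width)
    return result


inner = inner_join(employees, locations)
outer = left_outer_join(employees, locations)

for row in inner:
    print(row)
for row in outer:
    print(row)
```

In this sketch, the inner join keeps only employees 1 to 4, because employees 5 and 6 have no entry in the ID location dataset. The left outer join keeps all six employees and leaves the location fields empty for the last two.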
Like the emps
data...