Performing Order By queries in Pig
In this recipe, we will use the Order By operator in Pig scripts to sort records by the value of a field.
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Pig installed on it.
How to do it...
Order By is a very useful operator for data analysis: it sequences data records based on their values for one or more attributes, in either ascending or descending order.
To learn its usage, we will use the employee dataset from the previous recipe. If you don't already have it in HDFS, perform the following steps.
First, create a directory in HDFS and copy the data file into it:
hadoop fs -mkdir /pig/emps_data
hadoop fs -put emps.txt /pig/emps_data
Next, we load the data into a bag called emps, and then perform the Order By operation on this data on the basis of salary:
emps = LOAD '/pig/emps_data/emps.txt' AS (id, name, dept, salary);
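Note that the LOAD statement above declares no field types, so Pig treats every field as a bytearray, and sorting on salary would then compare the raw bytes rather than the numeric values. A minimal sketch of a typed alternative follows. It assumes that emps.txt is tab-delimited (the PigStorage default) and that id and salary hold whole numbers; neither is confirmed by this recipe, so adjust the delimiter and types to match your file:

```pig
-- Assumption: emps.txt is tab-delimited, which is what the default PigStorage loader expects.
-- Assumption: id and salary are whole numbers; name and dept are text.
emps = LOAD '/pig/emps_data/emps.txt'
       AS (id:int, name:chararray, dept:chararray, salary:int);

-- Print the schema to confirm that the declared types were applied.
DESCRIBE emps;
```

With salary declared as int, any later ordering on that field compares numbers, so 9000 sorts before 10000.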
Next, we will sequence the data by salary. We can...