Time for action – reduce-side join using MultipleInputs
We can produce the report described in the previous section with a reduce-side join by performing the following steps:
Create the following tab-separated file and name it sales.txt:
001	35.99	2012-03-15
002	12.49	2004-07-02
004	13.42	2005-12-20
003	499.99	2010-12-20
001	78.95	2012-04-02
002	21.99	2006-11-30
002	93.45	2008-09-10
001	9.99	2012-05-17
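Each sales line carries three fields: a customer ID, a sale value and a date. The minimal sketch below parses one such line and fails loudly if the tabs are missing or a field is malformed. It assumes the field order shown above. SalesRecord is a hypothetical helper for checking the data and is not part of ReduceJoin.java.

```java
import java.time.LocalDate;

// Hypothetical helper (not part of ReduceJoin.java): parses one line of
// sales.txt, assuming the layout customerId<TAB>saleValue<TAB>yyyy-MM-dd.
public class SalesRecord {
    final String customerId;
    final double amount;
    final LocalDate date;

    SalesRecord(String customerId, double amount, LocalDate date) {
        this.customerId = customerId;
        this.amount = amount;
        this.date = date;
    }

    static SalesRecord parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 3) {
            // Usually means the tabs were turned into spaces when the file was created
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        // Double.parseDouble and LocalDate.parse throw on malformed values
        return new SalesRecord(fields[0], Double.parseDouble(fields[1]), LocalDate.parse(fields[2]));
    }

    public static void main(String[] args) {
        SalesRecord r = parse("003\t499.99\t2010-12-20");
        System.out.println(r.customerId + " " + r.amount + " " + r.date);
    }
}
```

Running it prints the three parsed fields of the fourth sales record.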
Create the following tab-separated file and name it accounts.txt:
001	John Allen	Standard	2012-03-15
002	Abigail Smith	Premium	2004-07-13
003	April Stevens	Standard	2010-12-20
004	Nasser Hafez	Premium	2001-04-23
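Text editors and copy-and-paste often turn tabs into spaces, which breaks any code that splits on tabs. One way to avoid this is to generate both data files with explicit "\t" separators. The sketch below does that with java.nio.file.Files.write and writes to the current directory. The class name WriteSampleData is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes sales.txt and accounts.txt with real tab
// separators so they can then be copied onto HDFS.
public class WriteSampleData {
    static final List<String> SALES = Arrays.asList(
        "001\t35.99\t2012-03-15", "002\t12.49\t2004-07-02",
        "004\t13.42\t2005-12-20", "003\t499.99\t2010-12-20",
        "001\t78.95\t2012-04-02", "002\t21.99\t2006-11-30",
        "002\t93.45\t2008-09-10", "001\t9.99\t2012-05-17");

    static final List<String> ACCOUNTS = Arrays.asList(
        "001\tJohn Allen\tStandard\t2012-03-15",
        "002\tAbigail Smith\tPremium\t2004-07-13",
        "003\tApril Stevens\tStandard\t2010-12-20",
        "004\tNasser Hafez\tPremium\t2001-04-23");

    public static void main(String[] args) throws IOException {
        // Files.write appends a line separator after every element
        Path sales = Files.write(Paths.get("sales.txt"), SALES, StandardCharsets.UTF_8);
        Path accounts = Files.write(Paths.get("accounts.txt"), ACCOUNTS, StandardCharsets.UTF_8);
        System.out.println("Wrote " + sales + " and " + accounts);
    }
}
```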
Copy the data files onto HDFS:
$ hadoop fs -mkdir sales
$ hadoop fs -put sales.txt sales/sales.txt
$ hadoop fs -mkdir accounts
$ hadoop fs -put accounts.txt accounts/accounts.txt
Create the following file and name it ReduceJoin.java:
import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce...
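The data flow of a reduce-side join can be simulated in plain Java, without Hadoop. In Hadoop, MultipleInputs.addInputPath (in org.apache.hadoop.mapreduce.lib.input) binds a separate Mapper class to each input path. The rough sketch below mimics that setup under these assumptions:

* each source has its own "mapper" that keys records by customer ID and tags the value with its origin;
* a sorted map stands in for the shuffle;
* the "reducer" emits each customer's name, number of sales and total sale value.

That report layout is an assumption about the previous section's report. All names are hypothetical, and this is not the content of ReduceJoin.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java simulation of a reduce-side join; no Hadoop classes are used.
public class ReduceJoinSimulation {
    static final List<String> SALES = Arrays.asList(
        "001\t35.99\t2012-03-15", "002\t12.49\t2004-07-02",
        "004\t13.42\t2005-12-20", "003\t499.99\t2010-12-20",
        "001\t78.95\t2012-04-02", "002\t21.99\t2006-11-30",
        "002\t93.45\t2008-09-10", "001\t9.99\t2012-05-17");
    static final List<String> ACCOUNTS = Arrays.asList(
        "001\tJohn Allen\tStandard\t2012-03-15",
        "002\tAbigail Smith\tPremium\t2004-07-13",
        "003\tApril Stevens\tStandard\t2010-12-20",
        "004\tNasser Hafez\tPremium\t2001-04-23");

    // "Mapper" for sales.txt: key = customer ID, value = sale value tagged "sales"
    static String[] mapSales(String line) {
        String[] f = line.split("\t");
        return new String[] {f[0], "sales\t" + f[1]};
    }

    // "Mapper" for accounts.txt: key = customer ID, value = name tagged "accounts"
    static String[] mapAccounts(String line) {
        String[] f = line.split("\t");
        return new String[] {f[0], "accounts\t" + f[1]};
    }

    // Runs one input through its own mapper and groups emitted values by key;
    // the TreeMap keeps keys sorted, roughly as the shuffle does
    static void mapInput(List<String> lines, Function<String, String[]> mapper,
                         Map<String, List<String>> grouped) {
        for (String line : lines) {
            String[] kv = mapper.apply(line);
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
    }

    // "Reducer": receives every tagged value for one customer ID
    static String reduce(List<String> values) {
        String name = "";
        int count = 0;
        double total = 0.0;
        for (String v : values) {
            String[] parts = v.split("\t");
            if (parts[0].equals("sales")) {
                count++;
                total += Double.parseDouble(parts[1]);
            } else {
                name = parts[1];
            }
        }
        return String.format(Locale.ROOT, "%s\t%d\t%.2f", name, count, total);
    }

    static Map<String, String> join(List<String> sales, List<String> accounts) {
        Map<String, List<String>> grouped = new TreeMap<>();
        mapInput(sales, ReduceJoinSimulation::mapSales, grouped);
        mapInput(accounts, ReduceJoinSimulation::mapAccounts, grouped);
        Map<String, String> report = new TreeMap<>();
        grouped.forEach((key, values) -> report.put(key, reduce(values)));
        return report;
    }

    public static void main(String[] args) {
        join(SALES, ACCOUNTS).forEach((id, row) -> System.out.println(id + "\t" + row));
    }
}
```

Hadoop does not guarantee the order in which the tagged values for a key reach the reducer. For that reason, the reducer checks each value's tag rather than assuming the account record arrives first.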