Creating the target list
Our MapReduce program is now ready to run on the Hadoop cluster. Next, we prepare the input data from the customer master database of Furnitica. The customer master data contains many details that are not relevant to our MapReduce job.
A subset of fields available in the master data is as follows:
Customer ID
Date of birth
Income
Gender
Let us assume that we select the customers living in the city where we are going to send the campaign folders. This city is the target of the campaign. A single row from our selection is shown in Table 3:
Customer ID | Age (derived from date of birth) | Income | Gender (derived from M/F, where 0 is male and 1 is female)
10023 | 55 | 75000 | 0
Table 3 A selection from the customer master data
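The two derived columns in Table 3 can be computed from the master data along the following lines. This is only a minimal sketch in Python: the function names `derive_age` and `encode_gender` are hypothetical, the reference date used for the age is an assumption (the text does not say on which date the age is computed), and the 0/1 encoding is taken from the table header.

```python
from datetime import date

# Encoding taken from the Table 3 header: 0 is male, 1 is female.
GENDER_CODES = {"M": 0, "F": 1}


def derive_age(date_of_birth: date, reference_date: date) -> int:
    """Return the age in whole years on reference_date (hypothetical helper)."""
    years = reference_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def encode_gender(gender: str) -> int:
    """Map the M/F value from the master data to the 0/1 code (hypothetical helper)."""
    return GENDER_CODES[gender.strip().upper()]
```

Passing the reference date explicitly, rather than reading the system clock inside the function, keeps the derived age reproducible when the selection is rebuilt later.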
We want to send folder number 1 to our target customers, so we add this information to inputdata.csv as well. The resulting input data file inputdata.csv is as follows:
10023,25,75000,1,1
10024,55,25000...
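A file in this shape could be produced by appending the folder number to each selected row and writing the result as comma-separated values. The sketch below is illustrative only: the helper names `to_input_rows` and `rows_to_csv` are hypothetical, and the column order (customer ID, age, income, gender, folder number) is an assumption inferred from the first row of the listing above.

```python
import csv
import io


def to_input_rows(selection, folder_number):
    """Append the folder number to every selected customer row (hypothetical helper)."""
    return [list(row) + [folder_number] for row in selection]


def rows_to_csv(rows):
    """Serialize the rows as CSV text without a header line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# First row of the listing above, without its trailing folder number.
selection = [(10023, 25, 75000, 1)]

csv_text = rows_to_csv(to_input_rows(selection, folder_number=1))
print(csv_text, end="")  # prints 10023,25,75000,1,1
```

To create the actual file, the same text can be written to inputdata.csv with an ordinary `open("inputdata.csv", "w")` call before copying it to the cluster.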