Time for action – generating shape summaries in MapReduce
In this section we will write a mapper that takes as input the UFO sighting record we defined earlier. The mapper will output the shape with a count of 1. The reducer will take these (shape, count) records and produce an Avro datafile of a new structured record type containing the final count for each UFO shape. Perform the following steps:
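The data flow described above can be sketched in plain Java, without Hadoop or Avro, to show what the mapper emits and what the reducer produces. This is a minimal sketch under stated assumptions. Each sighting is reduced to its shape string, and an in-memory TreeMap stands in for Hadoop's sort-and-group step between the phases. The `ShapeCount` class and the `map`, `shuffle`, `reduce` and `summarize` methods are hypothetical illustrations, not the AvroMR code that follows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the shape-summary flow; no Hadoop or Avro classes are used.
public class ShapeSummarySketch {

    // Hypothetical stand-in for the structured output record: one shape and its total.
    public static final class ShapeCount {
        public final String shape;
        public final long count;

        ShapeCount(String shape, long count) {
            this.shape = shape;
            this.count = count;
        }

        @Override
        public String toString() {
            return shape + "=" + count;
        }
    }

    // Map phase: every sighting's shape becomes a (shape, 1) pair.
    static List<Map.Entry<String, Long>> map(List<String> sightingShapes) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String shape : sightingShapes) {
            pairs.add(Map.entry(shape, 1L));
        }
        return pairs;
    }

    // Shuffle phase (simulated): group the emitted counts by shape, with keys sorted.
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts for one shape into a single output record.
    static ShapeCount reduce(String shape, List<Long> counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return new ShapeCount(shape, total);
    }

    // Runs map, shuffle and reduce in sequence and collects one record per shape.
    public static List<ShapeCount> summarize(List<String> sightingShapes) {
        List<ShapeCount> results = new ArrayList<>();
        for (Map.Entry<String, List<Long>> group : shuffle(map(sightingShapes)).entrySet()) {
            results.add(reduce(group.getKey(), group.getValue()));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> shapes = List.of("light", "disk", "light", "triangle", "light");
        // Prints [disk=1, light=3, triangle=1]
        System.out.println(summarize(shapes));
    }
}
```

In the real job, the mapper extracts the shape from each sighting record and Hadoop performs the grouping between the phases. The sketch only makes the (shape, 1) to (shape, total) transformation concrete.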
Copy the sightings.avro file to HDFS:
$ hadoop fs -mkdir avroin
$ hadoop fs -put sightings.avro avroin/sightings.avro
Create the following as AvroMR.java:
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.*;
import org.apache.avro.Schema.Type;
import org.apache.avro.mapred.*;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.io.*;
import org.apache.hadoop.util.*;

// Output record definition...