Time for action – using the Distributed Cache to improve location output
Let's now use the Distributed Cache to share a list of U.S. state names and abbreviations across the cluster:
Create a datafile called states.txt on the local filesystem. It should contain the state abbreviation and full name, separated by a tab, one state per line. Alternatively, retrieve the file from this book's homepage. The file should start like the following:
AL	Alabama
AK	Alaska
AZ	Arizona
AR	Arkansas
CA	California
…
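To make the file layout concrete, here is a minimal sketch of how tab-separated lines in this format could be turned into an abbreviation-to-name lookup table. It uses only the Java standard library and does not touch Hadoop. The class name StateLookupSketch and the methods parseStates and fullName are hypothetical names chosen for illustration; they are not part of the UFOLocation code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the book's job code.
class StateLookupSketch {

    // Builds an abbreviation -> full name map from tab-separated lines.
    // Lines without exactly two tab-separated fields are skipped.
    static Map<String, String> parseStates(List<String> lines) {
        Map<String, String> states = new HashMap<String, String>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length == 2) {
                states.put(fields[0].trim(), fields[1].trim());
            }
        }
        return states;
    }

    // Returns the full state name if the abbreviation is known,
    // otherwise returns the input unchanged.
    static String fullName(Map<String, String> states, String abbrev) {
        String name = states.get(abbrev);
        return name != null ? name : abbrev;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "AL\tAlabama", "AK\tAlaska", "AZ\tArizona",
            "AR\tArkansas", "CA\tCalifornia");
        Map<String, String> states = parseStates(sample);
        System.out.println(fullName(states, "AK")); // prints "Alaska"
        System.out.println(fullName(states, "ZZ")); // prints "ZZ"
    }
}
```

Falling back to the original abbreviation for unknown keys is one possible design choice; it keeps records with unrecognized locations instead of dropping them.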
Place the file on HDFS:
$ hadoop fs -put states.txt states.txt
Copy the previous UFOLocation.java file to UFOLocation2.java and add the following import statements:
import java.io.*;
import java.net.*;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.filecache.DistributedCache;
Add the following line to the driver main method after the job name is set:
DistributedCache.addCacheFile(new URI("/user/hadoop/states.txt"), conf);
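Two details of this line are worth noting. The java.net.URI constructor declares the checked URISyntaxException, so the driver's main method has to declare or handle it. The URI also carries no scheme, so Hadoop resolves it against the default filesystem. The path /user/hadoop/states.txt matches the earlier put command only when the command is run as the user hadoop, because a relative HDFS destination resolves to the user's home directory under /user. The sketch below uses only the standard library to show both points. The class CacheUriSketch and the helper cacheUri are hypothetical names, and wrapping the exception is just one way to handle it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration class; not part of the book's job code.
class CacheUriSketch {

    // Builds a URI from a path string, converting the checked
    // URISyntaxException into an unchecked exception.
    static URI cacheUri(String path) {
        try {
            return new URI(path);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid cache file URI: " + path, e);
        }
    }

    public static void main(String[] args) {
        URI uri = cacheUri("/user/hadoop/states.txt");
        System.out.println(uri.getScheme()); // prints "null" (no scheme given)
        System.out.println(uri.getPath());   // prints "/user/hadoop/states.txt"
    }
}
```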
Replace the
map
...