Downloading the code and data
This chapter makes use of data on individual income by zip code, provided by the U.S. Internal Revenue Service (IRS). The data contains selected income and tax items classified by state, zip code, and income class.
It's 100 MB in size and can be downloaded from http://www.irs.gov/pub/irs-soi/12zpallagi.csv to the example code's data directory. Since the file contains the IRS Statistics of Income (SoI), we've renamed the file to soi.csv for the examples.
Note
The example code for this chapter is available from Packt Publishing's website or from https://github.com/clojuredatascience/ch5-big-data.
As usual, a script has been provided to download and rename the data for you. It can be run on the command line from within the project directory with:
script/download-data.sh
If you run this, the file will be downloaded and renamed automatically.
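If you would rather fetch the file by hand, the following is a minimal sketch of the same two steps (download, then rename). It is not the contents of script/download-data.sh, which may differ. The directory name data and the function name download_soi are assumptions made for illustration, and the sketch assumes curl is installed.

```shell
# Hypothetical manual equivalent of the download-and-rename step.
# The provided script/download-data.sh may work differently.

SOI_URL="http://www.irs.gov/pub/irs-soi/12zpallagi.csv"
DATA_DIR="data"               # assumed name of the example code's data directory
DEST="$DATA_DIR/soi.csv"      # the file name the chapter's examples expect

download_soi() {
  # Keep the original file name while downloading, then rename on success.
  original="$DATA_DIR/${SOI_URL##*/}"
  mkdir -p "$DATA_DIR" &&
    curl -fL -o "$original" "$SOI_URL" &&   # -f: fail on HTTP errors, -L: follow redirects
    mv "$original" "$DEST"
}
```

Calling download_soi from the project directory should leave the roughly 100 MB file at data/soi.csv, provided the IRS still serves it at that URL. Renaming only after curl succeeds avoids leaving a partial soi.csv behind if the transfer fails.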
Inspecting the data
Once you've downloaded the data, take a look at the column headings in the first...