Parsing CSV files with Cascalog
In the previous recipe, the file we read was a CSV file, but we read it line by line. That's not optimal, since it treats each row as a single string rather than as separate fields. Cascading provides a number of taps (sources of data or sinks to send data to), including one for CSV and other delimited data formats. Cascalog has good wrappers for several of these taps, but not for the CSV one.
In truth, a wrapper that exposes all the functionality of the delimited text format tap would be complex. There are options for the delimiter character, the quote character, whether the file includes a header row, the types of the columns, and more. That's a lot of options, and dispatching to the right method can be tricky.
We won't worry about how to handle all the options here. For this recipe, we will create a simple wrapper around the delimited text file tap that covers some of the more common options for reading CSV files.
Getting ready
First, we'll need to use some of the same dependencies as the ones we've been using as well as some new...