Reading CSV data into Incanter datasets
One of the simplest data formats is comma-separated values (CSV), and you'll find that it's everywhere. Excel reads and writes CSV directly, as do most databases. Also, because it's really just plain text, it's easy to generate CSV files or to access them from any programming language.
Getting ready
First, let's make sure that we have the correct libraries loaded. Here's how the Leiningen (https://github.com/technomancy/leiningen) project.clj file should look (although you might be able to use more up-to-date versions of the dependencies):
(defproject getting-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])
Also, in your REPL or your file, include this line:
(use 'incanter.core 'incanter.io)
Finally, download a list of rest area locations from POI Factory at http://www.poi-factory.com/node/6643. The data is in a file named data/RestAreasCombined(Ver.BN).csv. The version designation might be different, though, as the file is updated. You'll also need to register on the site in order to download the data. The file contains this data, which is the location and description of the rest stops along the highway:
-67.834062,46.141129,"REST AREA-FOLLOW SIGNS SB I-95 MM305","RR, PT, Pets, HF"
-67.845906,46.138084,"REST AREA-FOLLOW SIGNS NB I-95 MM305","RR, PT, Pets, HF"
-68.498471,45.659781,"TURNOUT NB I-95 MM249","Scenic Vista-NO FACILITIES"
-68.534061,45.598464,"REST AREA SB I-95 MM240","RR, PT, Pets, HF"
In the project directory, we have to create a subdirectory named data and place the file in this subdirectory.
I also created a copy of this file with a row listing the names of the columns and named it RestAreasCombined(Ver.BN)-headers.csv.
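One way to create the copy with a header row is sketched below. The prepend-header function is a hypothetical helper, not part of Incanter, and the header line assumes the column names longitude, latitude, name, and codes that appear in the header example of this recipe.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper (not part of Incanter): copies the CSV file at src
;; to dest, writing header-line as the first line of the copy.
(defn prepend-header
  [header-line src dest]
  (with-open [r (io/reader src)
              w (io/writer dest)]
    (.write w (str header-line "\n"))
    (io/copy r w)))

(comment
  ;; Assumed column names; adjust the version designation to match your file.
  (prepend-header "longitude,latitude,name,codes"
                  "data/RestAreasCombined(Ver.BN).csv"
                  "data/RestAreasCombined(Ver.BN)-headers.csv"))
```

The original file is left untouched, so both the headerless and the header versions remain available.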
How to do it…
- Now, use the incanter.io/read-dataset function in your REPL:
  user=> (read-dataset "data/RestAreasCombined(Ver.BN).csv")

  |      :col0 |     :col1 |                                :col2 |                      :col3 |
  |------------+-----------+--------------------------------------+----------------------------|
  | -67.834062 | 46.141129 | REST AREA-FOLLOW SIGNS SB I-95 MM305 |           RR, PT, Pets, HF |
  | -67.845906 | 46.138084 | REST AREA-FOLLOW SIGNS NB I-95 MM305 |           RR, PT, Pets, HF |
  | -68.498471 | 45.659781 |                TURNOUT NB I-95 MM249 | Scenic Vista-NO FACILITIES |
  | -68.534061 | 45.598464 |              REST AREA SB I-95 MM240 |           RR, PT, Pets, HF |
  | -68.539034 | 45.594001 |              REST AREA NB I-95 MM240 |           RR, PT, Pets, HF |
  …
- If we have a header row in the CSV file, then we include :header true in the call to read-dataset:
  user=> (read-dataset "data/RestAreasCombined(Ver.BN)-headers.csv" :header true)

  | :longitude | :latitude |                                :name |                     :codes |
  |------------+-----------+--------------------------------------+----------------------------|
  | -67.834062 | 46.141129 | REST AREA-FOLLOW SIGNS SB I-95 MM305 |           RR, PT, Pets, HF |
  | -67.845906 | 46.138084 | REST AREA-FOLLOW SIGNS NB I-95 MM305 |           RR, PT, Pets, HF |
  | -68.498471 | 45.659781 |                TURNOUT NB I-95 MM249 | Scenic Vista-NO FACILITIES |
  | -68.534061 | 45.598464 |              REST AREA SB I-95 MM240 |           RR, PT, Pets, HF |
  | -68.539034 | 45.594001 |              REST AREA NB I-95 MM240 |           RR, PT, Pets, HF |
  …
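To compare the two calls without downloading anything, the following sketch writes two of the sample rows to temporary files and reads them back with and without a header row. The temp-csv helper and the var names are made up for this illustration, and the header line assumes the column names shown in the output above.

```clojure
(use 'incanter.core 'incanter.io)

;; Two sample rows copied from the rest area data shown in this recipe.
(def sample-rows
  (str "-67.834062,46.141129,\"REST AREA-FOLLOW SIGNS SB I-95 MM305\",\"RR, PT, Pets, HF\"\n"
       "-68.498471,45.659781,\"TURNOUT NB I-95 MM249\",\"Scenic Vista-NO FACILITIES\"\n"))

;; Hypothetical helper: writes text to a temporary file and returns its path.
(defn temp-csv [text]
  (let [f (java.io.File/createTempFile "rest-areas" ".csv")]
    (.deleteOnExit f)
    (spit f text)
    (.getPath f)))

;; Without :header, Incanter generates the column names itself.
(def no-header
  (read-dataset (temp-csv sample-rows)))

;; With :header true, the first line supplies the column names.
(def with-header
  (read-dataset (temp-csv (str "longitude,latitude,name,codes\n" sample-rows))
                :header true))

(col-names no-header)    ; generated names such as :col0, :col1
(col-names with-header)  ; names taken from the header row
(dim with-header)        ; [row-count column-count]
```

Binding the dataset to a var, rather than only printing it at the REPL, keeps it available for later processing.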
How it works…
Together, Clojure and Incanter make a lot of common tasks easy, which is shown in the How to do it section of this recipe.
We've taken some external data, in this case from a CSV file, and loaded it into an Incanter dataset. In Incanter, a dataset is a table, similar to a sheet in a spreadsheet or a database table. Each column has one field of data, and each row has an observation of data. Some columns will contain string data (the description and facility columns in this example do), some will contain dates, and some will contain numeric data (the longitude and latitude columns here). Incanter tries to automatically detect when a column contains numeric data and converts it to a Java int or double. Incanter takes away a lot of the effort involved with importing data.
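The automatic type detection can be checked on a small sample, as in the sketch below. It assumes a temporary file holding two of the sample rows plus a header line with the column names used earlier in this recipe; the var names are made up for the illustration.

```clojure
(use 'incanter.core 'incanter.io)

;; A small temporary CSV file with a header row and two sample rows.
(def sample-file
  (let [f (java.io.File/createTempFile "rest-areas-types" ".csv")]
    (.deleteOnExit f)
    (spit f (str "longitude,latitude,name,codes\n"
                 "-67.834062,46.141129,\"REST AREA-FOLLOW SIGNS SB I-95 MM305\",\"RR, PT, Pets, HF\"\n"
                 "-68.534061,45.598464,\"REST AREA SB I-95 MM240\",\"RR, PT, Pets, HF\"\n"))
    (.getPath f)))

(def rest-areas (read-dataset sample-file :header true))

;; $ selects a column by name; with several rows it returns the column values.
(every? number? ($ :longitude rest-areas))  ; coordinates were parsed as numbers
(every? string? ($ :name rest-areas))       ; descriptions stay strings

;; Inspect the concrete Java classes chosen for one column.
(map class ($ :latitude rest-areas))
```

Checking the column types right after loading is a cheap way to catch columns that stayed strings because of stray characters in the data.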
There's more…
For more information about Incanter datasets, see Chapter 6, Working with Incanter Datasets.