Clojure Data Analysis Cookbook - Second Edition

Dive into data analysis with Clojure through over 100 practical recipes for every stage of the analysis and collection process

Product type: Paperback
Published in: Jan 2015
ISBN-13: 9781784390297
Length: 372 pages
Edition: 2nd Edition
Author: Eric Richard Rochester
Table of Contents (14 chapters)

Preface
1. Importing Data for Analysis
2. Cleaning and Validating Data
3. Managing Complexity with Concurrent Programming
4. Improving Performance with Parallel Programming
5. Distributed Data Processing with Cascalog
6. Working with Incanter Datasets
7. Statistical Data Analysis with Incanter
8. Working with Mathematica and R
9. Clustering, Classifying, and Working with Weka
10. Working with Unstructured and Textual Data
11. Graphing in Incanter
12. Creating Charts for the Web
Index

Reading CSV data into Incanter datasets

One of the simplest data formats is comma-separated values (CSV), and you'll find that it's everywhere. Excel reads and writes CSV directly, as do most databases. Also, because it's really just plain text, it's easy to generate CSV files or to access them from any programming language.

Getting ready

First, let's make sure that we have the correct libraries loaded. Here's how the Leiningen (https://github.com/technomancy/leiningen) project.clj file should look (although you might be able to use more up-to-date versions of the dependencies):

(defproject getting-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Also, in your REPL or your file, include these lines:

(use 'incanter.core
     'incanter.io)

Finally, download a list of rest area locations from POI Factory at http://www.poi-factory.com/node/6643. You'll need to register on the site in order to download the data. The data is in a file named data/RestAreasCombined(Ver.BN).csv, although the version designation might be different, as the file is updated. The file contains the location and description of the rest stops along the highway:

-67.834062,46.141129,"REST AREA-FOLLOW SIGNS SB I-95 MM305","RR, PT, Pets, HF"
-67.845906,46.138084,"REST AREA-FOLLOW SIGNS NB I-95 MM305","RR, PT, Pets, HF"
-68.498471,45.659781,"TURNOUT NB I-95 MM249","Scenic Vista-NO FACILITIES"
-68.534061,45.598464,"REST AREA SB I-95 MM240","RR, PT, Pets, HF"

In the project directory, we have to create a subdirectory named data and place the file in this subdirectory.

I also created a copy of this file with a row listing the names of the columns and named it RestAreasCombined(Ver.BN)-headers.csv.
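
The header copy can be made by hand in any text editor. As a rough sketch, it could also be generated from the REPL. The add-header-row function below is a hypothetical helper, not part of Incanter, and the column names are assumed to be longitude, latitude, name, and codes, matching the headers used in this recipe's output.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper (an assumption, not part of Incanter):
;; writes a copy of a headerless CSV file with one header line prepended.
(defn add-header-row
  [in-path out-path header]
  (with-open [r (io/reader in-path)
              w (io/writer out-path)]
    (.write w (str header "\n"))   ; the new first line
    (io/copy r w)))                ; then the original rows, unchanged

(comment
  ;; Example call. Adjust the version designation to match your download.
  (add-header-row "data/RestAreasCombined(Ver.BN).csv"
                  "data/RestAreasCombined(Ver.BN)-headers.csv"
                  "longitude,latitude,name,codes"))
```

The example call sits inside a comment form so that loading the file does nothing until you evaluate the call yourself.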

How to do it…

  1. Now, use the incanter.io/read-dataset function in your REPL:
    user=> (read-dataset "data/RestAreasCombined(Ver.BN).csv")
    
    |      :col0 |     :col1 |                                :col2 |                      :col3 |
    |------------+-----------+--------------------------------------+----------------------------|
    | -67.834062 | 46.141129 | REST AREA-FOLLOW SIGNS SB I-95 MM305 |           RR, PT, Pets, HF |
    | -67.845906 | 46.138084 | REST AREA-FOLLOW SIGNS NB I-95 MM305 |           RR, PT, Pets, HF |
    | -68.498471 | 45.659781 |                TURNOUT NB I-95 MM249 | Scenic Vista-NO FACILITIES |
    | -68.534061 | 45.598464 |              REST AREA SB I-95 MM240 |           RR, PT, Pets, HF |
    | -68.539034 | 45.594001 |              REST AREA NB I-95 MM240 |           RR, PT, Pets, HF |
    …
  2. If we have a header row in the CSV file, then we include :header true in the call to read-dataset:
    user=> (read-dataset "data/RestAreasCombined(Ver.BN)-headers.csv" :header true)
    
    | :longitude | :latitude |                                :name |                     :codes |
    |------------+-----------+--------------------------------------+----------------------------|
    | -67.834062 | 46.141129 | REST AREA-FOLLOW SIGNS SB I-95 MM305 |           RR, PT, Pets, HF |
    | -67.845906 | 46.138084 | REST AREA-FOLLOW SIGNS NB I-95 MM305 |           RR, PT, Pets, HF |
    | -68.498471 | 45.659781 |                TURNOUT NB I-95 MM249 | Scenic Vista-NO FACILITIES |
    | -68.534061 | 45.598464 |              REST AREA SB I-95 MM240 |           RR, PT, Pets, HF |
    | -68.539034 | 45.594001 |              REST AREA NB I-95 MM240 |           RR, PT, Pets, HF |
    …
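
Once the data is loaded, it may help to bind the dataset to a name and look at it from the REPL. The following is a minimal sketch that assumes the header version of the file and Incanter 1.5.5. The names load-rest-areas and rest-areas are made up for this illustration, while col-names, nrow, $, and sel come from incanter.core.

```clojure
(use 'incanter.core
     'incanter.io)

;; Hypothetical wrapper: reads a CSV file whose first row names the columns.
(defn load-rest-areas
  [path]
  (read-dataset path :header true))

(comment
  ;; Assumes the file from this recipe exists at this path.
  (def rest-areas
    (load-rest-areas "data/RestAreasCombined(Ver.BN)-headers.csv"))

  (col-names rest-areas)   ; column keys taken from the header row
  (nrow rest-areas)        ; number of rows (observations)
  ($ :name rest-areas)     ; every value in the :name column
  ;; the first three rows, restricted to the coordinate columns
  (sel rest-areas :rows (range 3) :cols [:longitude :latitude]))
```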

How it works…

Together, Clojure and Incanter make a lot of common tasks easy, as the How to do it… section of this recipe shows.

We've taken some external data, in this case from a CSV file, and loaded it into an Incanter dataset. In Incanter, a dataset is a table, similar to a sheet in a spreadsheet or a database table. Each column holds one field of data, and each row holds one observation. Some columns will contain string data (the last two columns in this example do), some will contain dates, and some will contain numeric data (the first two columns here, which hold the coordinates). Incanter tries to automatically detect when a column contains numeric data and converts it to a Java int or double. Incanter takes away a lot of the effort involved with importing data.
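
To make the idea of numeric detection concrete, here is a simplified sketch in plain Clojure. It is not Incanter's actual implementation. The parse-field function is a hypothetical helper that tries an integer first, then a double, and falls back to the original string.

```clojure
;; A simplified illustration of numeric detection (an assumption for
;; teaching purposes, not the code Incanter uses internally).
(defn parse-field
  "Returns a long if s is an integer, a double if it is any other
  number, and s itself otherwise."
  [^String s]
  (try
    (Long/parseLong s)
    (catch NumberFormatException _
      (try
        (Double/parseDouble s)
        (catch NumberFormatException _
          s)))))

(map parse-field ["-67.834062" "46.141129" "REST AREA SB I-95 MM240" "305"])
;; => (-67.834062 46.141129 "REST AREA SB I-95 MM240" 305)
```

Applied to every field in a column, a check like this is enough to tell the coordinate columns apart from the descriptive ones.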

There's more…

For more information about Incanter datasets, see Chapter 6, Working with Incanter Datasets.
