You're reading from Mastering Clojure Data Analysis. If you'd like to apply your Clojure skills to performing data analysis, this is the book for you. The example-based approach aids fast learning and covers basic to advanced topics. Get deeper into your data.

Product type Paperback

Published in May 2014

ISBN-13 9781783284139

Length 340 pages

Languages

Clojure

Concepts

Data Analysis

Author (1):

Eric Richard Rochester

Getting the data

A couple of small datasets of Facebook network data are available on the Internet. None of them is particularly large or complete, but they do give us a reasonable snapshot of part of Facebook's network. As the Facebook graph is a private data source, this partial view is probably the best that we can hope for.

We'll get the data from the Stanford Large Network Dataset Collection (http://snap.stanford.edu/data/). This contains a number of network datasets, ranging from Facebook and Twitter to road networks and citation networks. For this chapter, we'll download the facebook.tar.gz file from http://snap.stanford.edu/data/egonets-Facebook.html. Once it's on your computer, you can extract it. When I extracted it in the folder with my source code, it created a directory named facebook.
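If you'd rather script the download and extraction than do them by hand, the following is a minimal sketch. It makes two assumptions that are not stated above: that the archive can be fetched directly from http://snap.stanford.edu/data/facebook.tar.gz (the text only names the dataset page), and that a tar command is available on your system. The names archive-url, download!, and extract! are hypothetical helpers made up for this illustration.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.java.shell :as shell])

;; Assumption: the archive sits next to the dataset page under this name.
(def archive-url "http://snap.stanford.edu/data/facebook.tar.gz")

(defn download!
  "Copies the bytes found at source (a URL or a local path) into the
  file dest and returns dest."
  [source dest]
  (with-open [in  (io/input-stream source)
              out (io/output-stream dest)]
    (io/copy in out))
  dest)

(defn extract!
  "Unpacks a .tar.gz archive into dir by calling the system tar command.
  Returns the map produced by clojure.java.shell/sh."
  [archive dir]
  (shell/sh "tar" "-xzf" (str archive) "-C" (str dir)))

(comment
  ;; Run these from the REPL, in the folder that holds your source code.
  (download! archive-url "facebook.tar.gz")
  (extract! "facebook.tar.gz" "."))
```

Shelling out to tar keeps the sketch free of extra dependencies. A pure-JVM alternative would need an archive library, which is outside the scope of this section.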

The directory contains 10 sets of files. Each group is based on one primary vertex (user), and each contains five files. For vertex 0, these files would be as follows:

0.edges: This contains the vertices that the primary one links to.
0.circles: This contains the groupings that the user has created for his or her friends.
0.feat: This contains the features of the vertices that the user is adjacent to, that is, the ones listed in 0.edges.
0.egofeat: This contains the primary user's features.
0.featnames: This contains the names of the features described in 0.feat and 0.egofeat. For Facebook, these values have been anonymized.
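The naming scheme above is regular enough to capture in a few lines of code. The following sketch builds the five expected file paths for one primary vertex. The names file-kinds and vertex-files are hypothetical, and the sketch assumes that the files sit directly inside the extracted directory.

```clojure
(require '[clojure.java.io :as io])

;; The five file extensions described above, as keywords.
(def file-kinds [:edges :circles :feat :egofeat :featnames])

(defn vertex-files
  "Maps each kind of file to the java.io.File expected for one primary
  vertex, for example {:edges #<File facebook/0.edges>, ...}.
  The files are not checked for existence."
  [dir vertex-id]
  (into {}
        (for [kind file-kinds]
          [kind (io/file dir (str vertex-id "." (name kind)))])))

(comment
  ;; The paths for vertex 0 in the facebook directory.
  (vertex-files "facebook" 0))
```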

For our purposes, we'll just use the *.edges files.
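Picking out just those files can be sketched as follows. This is one possible approach rather than the book's own code: primary-vertex and edge-files are hypothetical names, and the sketch assumes that every edges file is named after a numeric vertex ID, as in 0.edges.

```clojure
(require '[clojure.java.io :as io])

(defn primary-vertex
  "Returns the vertex ID encoded in a name such as \"0.edges\", or nil
  if the name does not match that pattern."
  [file-name]
  (when-let [[_ id] (re-matches #"(\d+)\.edges" file-name)]
    (Long/parseLong id)))

(defn edge-files
  "Returns a sorted map from primary vertex ID to its .edges file,
  searching dir and its subdirectories."
  [dir]
  (into (sorted-map)
        (for [^java.io.File f (file-seq (io/file dir))
              :let [id (primary-vertex (.getName f))]
              :when (and id (.isFile f))]
          [id f])))

(comment
  ;; After extracting the archive, list the vertex IDs that have edges files.
  (keys (edge-files "facebook")))
```

Keying the map by vertex ID keeps each file tied to the user it belongs to, which helps if you later want to process the groups one at a time.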

Now let's turn our attention to the data in the files and what they represent.
