Mastering Clojure Data Analysis

If you'd like to apply your Clojure skills to data analysis, this is the book for you. The example-based approach aids fast learning and covers basic to advanced topics. Get deeper into your data.

Product type: Paperback
Published: May 2014
ISBN-13: 9781783284139
Length: 340 pages
Author: Eric Richard Rochester

Table of Contents

Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
1. Network Analysis – The Six Degrees of Kevin Bacon
2. GIS Analysis – Mapping Climate Change
3. Topic Modeling – Changing Concerns in the State of the Union Addresses
4. Classifying UFO Sightings
5. Benford's Law – Detecting Natural Progressions of Numbers
6. Sentiment Analysis – Categorizing Hotel Reviews
7. Null Hypothesis Tests – Analyzing Crime Data
8. A/B Testing – Statistical Experiments for the Web
9. Analyzing Social Data Participation
10. Modeling Stock Data
Index

Getting the data


To get a copy of the SOTU addresses, we'll visit the website for the American Presidency Project at the University of California, Santa Barbara (http://www.presidency.ucsb.edu/). This site has the text for the SOTU addresses as well as an archive of many messages, letters, public papers, and other documents for various presidents. It's a great resource for looking at political rhetoric.

In this case, we'll write some code to visit the index page for the SOTU addresses. From there, we'll visit each of the pages that contain an address; remove the menus, headers, and footers; and strip out the HTML. We'll save this in a file in the data directory.

We won't see all of the code for this. To see the rest, look at the download.clj file in the src/tm_sotu/ directory in the downloaded code.

To handle downloading and parsing the files, we'll use the Enlive library (https://github.com/cgrand/enlive/wiki). This library provides a DSL to navigate and pull data from HTML pages. The syntax...
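
The steps described above (read the index page, follow each link, keep only the address text, and save it under the data directory) might be sketched with Enlive roughly as follows. This is a minimal sketch, not the book's download.clj: the namespace, the function names, the file-naming scheme, and the selectors passed in are all hypothetical, and the real site's markup would have to be inspected to choose working selectors.

```clojure
(ns tm-sotu.download-sketch   ; hypothetical namespace, not the book's download.clj
  (:require [net.cgrand.enlive-html :as html]
            [clojure.java.io :as io]
            [clojure.string :as str])
  (:import [java.net URL]))

(defn address-links
  "Returns the href of every link matched by link-selector in a parsed page."
  [index-nodes link-selector]
  (->> (html/select index-nodes link-selector)
       (keep #(get-in % [:attrs :href]))))

(defn address-text
  "Keeps only the nodes matched by content-selector and returns their text.
  Selecting the content container is what leaves menus, headers, and footers
  behind; html/text drops the markup."
  [page-nodes content-selector]
  (->> (html/select page-nodes content-selector)
       (map html/text)
       (str/join "\n")
       str/trim))

(defn save-address!
  "Writes text to <dir>/<basename>.txt, creating the directory if needed."
  [dir basename text]
  (let [f (io/file dir (str basename ".txt"))]
    (io/make-parents f)
    (spit f text)
    f))

(defn download-addresses!
  "Fetches the index page, follows each matched link (resolved against the
  index URL), and saves the extracted text of each page under dir."
  [index-url link-selector content-selector dir]
  (let [base  (URL. index-url)
        index (html/html-resource base)]
    (doseq [[i href] (map-indexed vector (address-links index link-selector))]
      (let [page (html/html-resource (URL. base href))]
        (save-address! dir
                       (format "sotu-%03d" i)   ; hypothetical naming scheme
                       (address-text page content-selector))))))

;; Offline check on an in-memory snippet; the class names are made up.
(def sample-page
  (html/html-snippet
   (str "<div class=\"menu\">Home</div>"
        "<div class=\"address\"><p>Fellow citizens,</p></div>")))

(address-text sample-page [:div.address])
```

Passing the selectors in as arguments keeps the site-specific knowledge in one place, so the same functions can be tried against a saved copy of a page before any network requests are made.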
