
You're reading from Mastering Clojure Data Analysis If you'd like to apply your Clojure skills to performing data analysis, this is the book for you. The example based approach aids fast learning and covers basic to advanced topics. Get deeper into your data.

Product type Paperback

Published in May 2014


ISBN-13 9781783284139

Length 340 pages


Languages

Clojure

Concepts

Data Analysis

Author (1):

Eric Richard Rochester


Table of Contents (17) Chapters

Mastering Clojure Data Analysis

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

1. Network Analysis – The Six Degrees of Kevin Bacon

2. GIS Analysis – Mapping Climate Change

3. Topic Modeling – Changing Concerns in the State of the Union Addresses

4. Classifying UFO Sightings

5. Benford's Law – Detecting Natural Progressions of Numbers

6. Sentiment Analysis – Categorizing Hotel Reviews

7. Null Hypothesis Tests – Analyzing Crime Data

8. A/B Testing – Statistical Experiments for the Web

9. Analyzing Social Data Participation

10. Modeling Stock Data

Index

Preparing the data

For this experiment, I've randomly selected 500 hotel reviews and classified them manually. A better option might be to use Amazon's Mechanical Turk (https://www.mturk.com/mturk/) to get more reviews classified than any one person could easily manage. A few hundred reviews is about the minimum we'd want to use, since both the training and test sets have to come from this sample. I made sure that the sample contained an equal number of positive and negative reviews. (You can find the sample in the data directory of the code download.)
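
As a rough illustration of the two constraints mentioned here (equal class sizes, and training and test sets drawn from the same sample), the following is a minimal sketch. It assumes each review is a map with a :rating key; the function names balanced-sample and train-test-split and the test-ratio parameter are hypothetical and are not taken from the book's code download.

```clojure
;; Hypothetical helpers; assumes each review is a map with a :rating key.

(defn balanced-sample
  "Returns a shuffled sample with at most n-per-class reviews for
  each distinct :rating value."
  [reviews n-per-class]
  (->> reviews
       shuffle                          ; randomize before taking
       (group-by :rating)               ; {:positive [...], :negative [...]}
       vals
       (mapcat #(take n-per-class %))   ; same cap for every class
       shuffle))                        ; mix the classes back together

(defn train-test-split
  "Shuffles coll and holds out a fraction (test-ratio) as the test
  set. The ratio is an assumption made for illustration."
  [coll test-ratio]
  (let [n-test (long (* test-ratio (count coll)))
        [test-set training-set] (split-at n-test (shuffle coll))]
    {:test test-set
     :training training-set}))
```

With a small labeled sample, every review held out for testing is one fewer available for training, which is why a few hundred reviews is close to the practical minimum.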

The data files are tab-separated values (TSV). After being manually classified, each line has four fields: the classification as a + or - sign, the date of the review, the title of the review, and the review itself. Some of the reviews are quite long.
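
The four-field layout described above could be read along the following lines. This is only a sketch: the function names and the map keys (:rating, :date, :title, :review) are illustrative assumptions, and the date is kept as a plain string because its format is not specified here.

```clojure
(require '[clojure.string :as str]
         '[clojure.java.io :as io])

(defn parse-review-line
  "Splits one TSV line into a map. The limit of 4 keeps any stray
  tabs inside the review text in the last field."
  [line]
  (let [[rating date title review] (str/split line #"\t" 4)]
    {:rating (case rating
               "+" :positive
               "-" :negative
               :unknown)     ; anything else is flagged, not guessed
     :date   date            ; left as a string; format is not assumed
     :title  title
     :review review}))

(defn read-reviews
  "Reads every line of a TSV file into a vector of review maps.
  mapv forces the lazy line-seq before the reader is closed."
  [filename]
  (with-open [rdr (io/reader filename)]
    (mapv parse-review-line (line-seq rdr))))
```

Passing a limit to str/split matters for long reviews: without it, a tab inside the review body would produce extra fields and silently truncate the text.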

After tagging the files, we'll take them and create a feature vector for each review from the vocabulary of its title and text. For this chapter, we'll see what works best: unigrams...
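
To make the idea of a vocabulary-based feature vector concrete, here is a minimal sketch of the unigram case only. The tokenizer (lower-casing and a simple regular expression) and the names tokenize and unigram-features are assumptions for illustration; the chapter's own tokenization is not shown in this excerpt.

```clojure
(require '[clojure.string :as str])

(defn tokenize
  "Lower-cases the text and pulls out runs of letters, digits, and
  apostrophes. A deliberately simple, assumed tokenizer."
  [text]
  (re-seq #"[a-z0-9']+" (str/lower-case text)))

(defn unigram-features
  "Builds a sparse feature vector (a map of token to count) from a
  review's title and text together."
  [{:keys [title review]}]
  (frequencies (tokenize (str title " " review))))
```

For example, (unigram-features {:title "Great hotel" :review "Great stay!"}) produces a map in which "great" has a count of 2 and "hotel" and "stay" each have a count of 1. Representing the vector as a map keeps it sparse, since any one review uses only a small part of the full vocabulary.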
