You're reading from Mastering Clojure Data Analysis. If you'd like to apply your Clojure skills to performing data analysis, this is the book for you. The example-based approach aids fast learning and covers basic to advanced topics. Get deeper into your data.

Product type Paperback

Published in May 2014

Publisher

ISBN-13 9781783284139

Length 340 pages

Edition Edition

Languages

Clojure

Concepts

Data Analysis

Author (1):

Eric Richard Rochester

Table of Contents (17) Chapters

Mastering Clojure Data Analysis

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

1. Network Analysis – The Six Degrees of Kevin Bacon

2. GIS Analysis – Mapping Climate Change

3. Topic Modeling – Changing Concerns in the State of the Union Addresses

4. Classifying UFO Sightings

5. Benford's Law – Detecting Natural Progressions of Numbers

6. Sentiment Analysis – Categorizing Hotel Reviews

7. Null Hypothesis Tests – Analyzing Crime Data

8. A/B Testing – Statistical Experiments for the Web

9. Analyzing Social Data Participation

10. Modeling Stock Data

Index

Analyzing the text

Our goal in analyzing the news articles is to generate a vector space model of the collection of documents. This model attempts to pull the salient features of each document into a vector of floating-point numbers. Features can be words or information from the documents' metadata, encoded for the vector. A feature value can be 0 or 1 to mark presence, an integer for raw frequency, or the frequency scaled in some form.
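To make the three kinds of feature values concrete, here is a minimal sketch that builds each one from tokenized documents. The sample tokens and the function names (`presence-vector`, `frequency-vector`, `relative-frequency-vector`) are assumptions made for illustration, not code from the chapter, and dividing by document length is only one possible way to scale a frequency.

```clojure
;; Two tiny, made-up documents that have already been tokenized.
(def docs
  [["budget" "vote" "budget" "senate"]
   ["storm" "coast" "storm" "storm"]])

;; A sorted vocabulary fixes the position of each feature in the vector.
(def vocabulary
  (vec (sort (distinct (apply concat docs)))))

(defn presence-vector
  "1.0 if the token occurs in the document, 0.0 otherwise."
  [vocab tokens]
  (let [seen (set tokens)]
    (mapv #(if (contains? seen %) 1.0 0.0) vocab)))

(defn frequency-vector
  "Raw count of each vocabulary token in the document, as doubles."
  [vocab tokens]
  (let [counts (frequencies tokens)]
    (mapv #(double (get counts % 0)) vocab)))

(defn relative-frequency-vector
  "Raw counts scaled by the document length (one simple scaling)."
  [vocab tokens]
  (let [n (count tokens)]
    (mapv #(/ % n) (frequency-vector vocab tokens))))

(presence-vector vocabulary (first docs))
;; => [1.0 0.0 1.0 0.0 1.0]
(frequency-vector vocabulary (first docs))
;; => [2.0 0.0 1.0 0.0 1.0]
(relative-frequency-vector vocabulary (first docs))
;; => [0.5 0.0 0.25 0.0 0.25]
```

Every document is projected onto the same vocabulary, so position i means the same token in every vector, which is what lets the vectors be compared with one another.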

In our case, we'll use the feature vector to represent a selection of the tokens in a document. Often, we can use all the tokens, or all the tokens that occur more than once or twice. In this case, however, we don't have a lot of data, so we'll need to be more selective about the features that we include. We'll consider how to select them in a few sections.
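The simple cutoff mentioned above, keeping only tokens that occur more than once or twice, might be sketched as follows. This illustrates the common baseline only, not the more selective approach the chapter goes on to use; `frequent-tokens` and the sample documents are hypothetical.

```clojure
(defn frequent-tokens
  "Returns the set of tokens that occur more than min-count times
  across the whole collection of tokenized documents."
  [min-count docs]
  (->> docs
       (apply concat)                        ; one flat sequence of tokens
       frequencies                           ; token -> collection-wide count
       (filter (fn [[_ n]] (> n min-count)))
       (map first)
       set))

(def sample-docs
  [["budget" "vote" "budget" "senate"]
   ["storm" "coast" "storm" "vote"]])

(frequent-tokens 1 sample-docs)
;; => #{"budget" "storm" "vote"}
```

On a collection this small, raising the threshold to 2 already discards every token, which hints at why a plain frequency cutoff is a poor fit when there is little data.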

For the feature values, we'll use a scaled version of the token frequency called term frequency-inverse document frequency (tf-idf). There are good libraries for this, but this is a basic metric in working with...
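As a rough sketch of the metric, the code below computes one common textbook variant of tf-idf: term frequency as a token's share of its document, multiplied by the natural log of the number of documents divided by the number of documents containing the token. The chapter's own formulation may scale either factor differently, and the function names and sample data here are assumptions made for illustration.

```clojure
(defn tf
  "Maps each token in a document to its share of the document's tokens."
  [tokens]
  (let [n (count tokens)]
    (into {}
          (map (fn [[t c]] [t (/ (double c) n)]))
          (frequencies tokens))))

(defn idf
  "Maps each token to log(N / df), where N is the number of documents
  and df is the number of documents that contain the token."
  [docs]
  (let [n (count docs)
        doc-freqs (frequencies (mapcat distinct docs))]
    (into {}
          (map (fn [[t df]] [t (Math/log (/ (double n) df))]))
          doc-freqs)))

(defn tf-idf
  "Returns one token -> score map per document."
  [docs]
  (let [idf-scores (idf docs)]
    (mapv (fn [tokens]
            (into {}
                  (map (fn [[t w]] [t (* w (idf-scores t))]))
                  (tf tokens)))
          docs)))

(def sample-docs
  [["budget" "vote" "budget"]
   ["storm" "vote"]])

(def scores (tf-idf sample-docs))

;; "vote" appears in every document, so its idf, and therefore its
;; tf-idf score, is 0.0 in both documents.
(get (first scores) "vote")
;; => 0.0
```

Under this variant, a token that is frequent within one document but rare across the collection, such as "budget" in the first sample document, scores highest.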
