You're reading from Practical Data Analysis

For small businesses, analyzing the information contained in their data using open source technology could be game-changing. All you need is some basic programming and mathematical skills to do just that.

Product type Paperback

Published in Oct 2013

Publisher Packt

ISBN-13 9781783280995

Length 360 pages

Edition 1st Edition

Languages

Python

Tools

NLTK

Concepts

Data Analysis

Author (1):

Hector Cuesta

Data preparation

In Chapter 11, Sentiment Analysis of Twitter Data, we explored how to create a bag of words from the Sentiment140 tweets dataset. In this chapter, we will complement that example by using MongoDB. First, we will prepare the dataset and transform it from CSV to JSON so that we can add it to a MongoDB collection.

Tip

We can download the Sentiment140 training and test data from http://help.sentiment140.com/for-students.

After downloading and opening the test data, we find six columns: sentiment, id, date, via, user, and text. The first five records look like this:

4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,@mikefish  Fair enough. But i have the Kindle2 and I think it's perfect  :)
4,2,Mon May 11 03:26:10 UTC 2009, jquery,dcostalis,Jquery is my new best friend.
4,3,Mon May 11 03:27:15 UTC 2009,twitter,PJ_King,Loves twitter
4,4,Mon May 11 03:29:20 UTC 2009,obama,mandanicole,how can you not love Obama? he makes jokes about himself.
4,5,Mon May 11 05:22:12 UTC 2009,lebron...
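
The CSV-to-JSON step described above can be sketched with Python's standard csv and json modules. This is a minimal sketch, not necessarily the approach the chapter goes on to use. The field names come from the column list above. The helper name csv_to_documents and the in-memory sample (the four complete records shown above) are assumptions made for illustration. Casting sentiment and id to integers is also a choice made here, not something the dataset requires.

```python
import csv
import io
import json

# Column names as listed in the text above.
FIELDS = ["sentiment", "id", "date", "via", "user", "text"]

# The four complete records shown above, used as stand-in input.
# In practice this would be the downloaded test file opened with open().
SAMPLE_CSV = """\
4,1,Mon May 11 03:21:41 UTC 2009,kindle2,yamarama,@mikefish  Fair enough. But i have the Kindle2 and I think it's perfect  :)
4,2,Mon May 11 03:26:10 UTC 2009, jquery,dcostalis,Jquery is my new best friend.
4,3,Mon May 11 03:27:15 UTC 2009,twitter,PJ_King,Loves twitter
4,4,Mon May 11 03:29:20 UTC 2009,obama,mandanicole,how can you not love Obama? he makes jokes about himself.
"""


def csv_to_documents(handle):
    """Yield one dict per CSV row, keyed by FIELDS (hypothetical helper)."""
    # skipinitialspace drops the blank that follows a comma, as in " jquery".
    reader = csv.DictReader(handle, fieldnames=FIELDS, skipinitialspace=True)
    for row in reader:
        # Store the numeric columns as integers rather than strings.
        row["sentiment"] = int(row["sentiment"])
        row["id"] = int(row["id"])
        yield row


documents = list(csv_to_documents(io.StringIO(SAMPLE_CSV)))

# Serialize each record as one JSON object per line.
json_lines = [json.dumps(doc) for doc in documents]
print("\n".join(json_lines))
```

Writing one JSON object per line keeps each tweet as a separate document, which is the layout that MongoDB's mongoimport tool reads by default.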