You're reading from Practical Data Analysis

For small businesses, analyzing the information contained in their data using open source technology could be game-changing. All you need is some basic programming and mathematical skills to do just that.

Product type Paperback

Published in Oct 2013

Publisher Packt

ISBN-13 9781783280995

Length 360 pages

Edition 1st Edition

Languages

Python

Tools

NLTK

Concepts

Data Analysis

Author (1):

Hector Cuesta

Table of Contents (24) Chapters

Practical Data Analysis

Credits

Foreword

About the Author

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

1. Getting Started

2. Working with Data

3. Data Visualization

4. Text Classification

5. Similarity-based Image Retrieval

6. Simulation of Stock Prices

7. Predicting Gold Prices

8. Working with Support Vector Machines

9. Modeling Infectious Disease with Cellular Automata

10. Working with Social Graphs

11. Sentiment Analysis of Twitter Data

12. Data Processing and Aggregation with MongoDB

13. Working with MapReduce

14. Online Data Analysis with IPython and Wakari

Setting Up the Infrastructure

Index

Processing the image dataset

The image set used in this chapter is Caltech-256, obtained from the Computational Vision Lab at Caltech. We can download the full collection of 30,607 images in 256 categories from http://www.vision.caltech.edu/Image_Datasets/Caltech256/.

To implement DTW, we first need to extract a time series (a pixel sequence) from each image. The time series has a length of 768 values, obtained by concatenating 256 values for each color of the RGB (Red, Green, and Blue) color model. The following code opens the image with Image.open("Image.jpg"), casts it into an array, and then simply appends the three color vectors to a list:

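from PIL import Image
from numpy import array

img = Image.open("Image.jpg")
arr = array(img)  # shape (height, width, 3) for an RGB image
series = []  # named series rather than list, to avoid shadowing the built-in
# Each n is one image row, so n[0] is the first pixel of that row
for n in arr: series.append(n[0][0])  # R
for n in arr: series.append(n[0][1])  # G
for n in arr: series.append(n[0][2])  # B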
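
Note that the loops above read only the first pixel of each image row, so they yield 256 values per color only when the image is 256 pixels high. The following is a minimal sketch of the same extraction written with NumPy slicing. The helper name image_to_series and the synthetic test image are assumptions made for illustration and do not come from the book; a real run would pass an image opened with Image.open instead.

```python
import numpy as np
from PIL import Image


def image_to_series(img):
    """Return the first-column R, G and B values of an image as one flat list.

    Hypothetical helper (not from the book): it mirrors the three loops above.
    """
    arr = np.asarray(img.convert("RGB"))  # shape: (height, width, 3)
    first_column = arr[:, 0, :]           # first pixel of every row: (height, 3)
    # Transpose to (3, height) so flattening gives all R, then all G, then all B.
    return first_column.T.ravel().tolist()


# Synthetic image, 4 pixels wide and 256 pixels high, standing in for a real file.
demo = Image.new("RGB", (4, 256), color=(10, 20, 30))
series = image_to_series(demo)
print(len(series))
```

Calling convert("RGB") is a design choice of this sketch: it keeps the indexing valid for grayscale or palette images, which would otherwise produce an array without a color axis.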

Tip

Pillow is a PIL fork by Alex Clark, compatible with Python 2.x and 3.x. PIL is the Python Imaging Library by Fredrik Lundh. In this chapter...
