Python Data Visualization Cookbook

Python Data Visualization Cookbook: As a developer with knowledge of Python you are already in a great position to start using data visualization. This superb cookbook shows you how in plain language and practical recipes, culminating with 3D animations.

eBook
€17.99 €25.99
Paperback
€32.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want

Python Data Visualization Cookbook

Chapter 2. Knowing Your Data

In this chapter we will cover the following recipes:

  • Importing data from CSV

  • Importing data from Microsoft Excel files

  • Importing data from fixed-width datafiles

  • Importing data from tab-delimited files

  • Importing data from a JSON resource

  • Exporting data to JSON, CSV, and Excel

  • Importing data from a database

  • Cleaning up data from outliers

  • Reading files in chunks

  • Reading streaming data sources

  • Importing image data into NumPy arrays

  • Generating controlled random datasets

  • Smoothing the noise in real-world data

Introduction


This chapter covers the basics of importing and exporting data in various formats. It also covers ways of cleaning data, such as normalizing values, adding missing data, inspecting live data, and similar tricks for getting data correctly prepared for visualization.

Importing data from CSV


In this recipe we will work with the most common file format that one will encounter in the wild world of data, CSV. It stands for Comma Separated Values, which almost explains all the formatting there is. (There is also a header part of the file, but those values are also comma separated.)

Python has a module called csv that supports reading and writing CSV files in various dialects. Dialects are important because there is no standard CSV and different applications implement CSV in slightly different ways. A file's dialect is almost always recognizable by the first look into the file.

Getting ready

What we need for this recipe is the CSV file itself. We will use sample CSV data that you can download from ch02-data.csv.

We assume that the sample datafiles are in the same folder as the code reading them.

How to do it...

The following code example demonstrates how to import data from a CSV file. We will:

  1. Open the ch02-data.csv file for reading.

  2. Read the header first.

  3. Read the rest...
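
The steps above can be sketched roughly as follows. This is a minimal sketch written for Python 3, not the book's own listing (which is cut off in this preview). The function name `read_csv` is hypothetical, and the file is assumed to be comma separated with a single header row.

```python
import csv


def read_csv(path):
    """Return (header, rows) from a comma-separated file with one header row."""
    # newline="" lets the csv module handle line endings itself
    with open(path, newline="") as f:
        reader = csv.reader(f)
        # Read the header first; an empty file yields an empty header
        header = next(reader, [])
        # Read the rest of the rows while the file is still open
        rows = [row for row in reader]
    return header, rows
```

Every value comes back as a string, so numeric columns still need an explicit conversion before plotting.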

Importing data from Microsoft Excel files


Although Microsoft Excel supports some charting, sometimes you need more flexible and powerful visualization and need to export data from existing spreadsheets into Python for further use.

A common approach to importing data from Excel files is to export data from Excel into CSV-formatted files and use the tools described in the previous recipe to import data using Python from the CSV file. This is a fairly easy process if we have one or two files (and have Microsoft Excel or OpenOffice.org installed), but if we are automating a data pipe for many files (as part of an ongoing data processing effort), we are not in a position to manually convert every Excel file into CSV. So, we need a way to read any Excel file.

Python has decent support for reading and writing Excel files through the project www.python-excel.org. This support is available in the form of different modules for reading and writing, and is platform independent; in other words, we don...

Importing data from fixed-width datafiles


Logfiles from events and time series datafiles are common sources for data visualizations. Sometimes, we can read them using CSV dialect for tab-separated data, but sometimes they are not separated by any specific character. Instead, fields are of fixed widths and we can infer the format to match and extract data.

One way to approach this is to read a file line by line and then use string manipulation functions to split a string into separate parts. This approach seems straightforward, and if performance is not an issue, should be tried first.

If performance is more important or the file to parse is large (hundreds of megabytes), using the Python module struct (http://docs.python.org/library/struct.html) can speed us up as the module is implemented in C rather than in Python.
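
To make the struct approach concrete, here is a minimal sketch for Python 3. The field layout (9, 14, and 5 characters) and the name `parse_fixed_width` are assumptions chosen for illustration; a real file needs a format string that matches its own column widths.

```python
import struct

# Hypothetical layout: three text fields of 9, 14 and 5 bytes
RECORD = struct.Struct("9s14s5s")


def parse_fixed_width(line):
    """Split one fixed-width line into a tuple of stripped strings."""
    raw = line.rstrip("\r\n").encode("ascii")
    # Pad short lines with spaces and cut long ones so the size matches
    raw = raw.ljust(RECORD.size)[:RECORD.size]
    return tuple(field.decode("ascii").strip() for field in RECORD.unpack(raw))
```

Compiling the format once with `struct.Struct` avoids re-parsing the format string for every line, which matters when the file is large.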

Getting ready

As the module struct is part of the Python Standard Library, we don't need to install any additional software to implement this recipe.

How to do it...

We will use...

Importing data from tab-delimited files


Another very common format of flat datafile is the tab-delimited file. This can also come from an Excel export but can be the output of some custom software we must get our input from.

The good thing is that usually this format can be read in almost the same way as CSV files, as the Python module csv supports so-called dialects that enable us to use the same principles to read variations of similar file formats—one of them being the tab delimited format.

Getting ready

We are already able to read CSV files. If not, please refer to the Importing data from CSV recipe first.

How to do it...

We will re-use the code from the Importing data from CSV recipe, where all we need to change is the dialect we are using.

import csv

filename = 'ch02-data.tab'

data = []
try:
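    with open(filename) as f:
        reader = csv.reader(f, dialect=csv.excel_tab)
        # Read the header first; next() works on Python 2.6+ and 3
        header = next(reader)
        # Then read the remaining rows while the file is still open
        data = [row for row in reader]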
except csv.Error as e:
    print "Error reading CSV...

Importing data from a JSON resource


This recipe will show us how we can read the JSON data format. Moreover, we will be using a remote resource in this recipe. It will add a tiny level of complexity to the recipe, but it will make it much more useful because, in real life, we will encounter more remote resources than local.

JavaScript Object Notation (JSON) is widely used as a platform-independent format to exchange data between systems or applications.

A resource, in this context, is anything we can read, be it a file or a URL endpoint (which can be the output of a remote process/program or just a remote static file). In short, we don't care who produced a resource and how; we just need it to be in a known format, such as JSON.
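
As a rough illustration of reading a JSON resource, the sketch below uses only the Python 3 standard library (`urllib.request` and `json`) instead of the requests module that the recipe relies on, so that it stays self-contained. The names `load_json_resource` and `pluck` are hypothetical, and the resource is assumed to be UTF-8 encoded JSON.

```python
import json
from urllib.request import urlopen


def load_json_resource(url, timeout=10):
    """Fetch a resource (http://, https:// or file://) and decode it as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def pluck(items, key):
    """Collect one field from a list of JSON objects, skipping objects without it."""
    return [item[key] for item in items if key in item]
```

Because a file URL is handled the same way as a remote one, the same function can read a local file or a remote endpoint, which matches the idea of a resource described above.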

Getting ready

In order to get started with this recipe, we need the requests module installed and importable (in PYTHONPATH) in our virtual environment. We have installed this module in Chapter 1, Preparing Your Working Environment.

We also need Internet connectivity...

Exporting data to JSON, CSV, and Excel


As producers of data visualizations, we mostly use other people's data, so importing and reading data are major activities. Still, we do need to write or export data that we produced or processed, whether for our own or others' current or future use.

We will demonstrate how to use the previously mentioned Python modules to import, export, and write data to various formats such as JSON, CSV, and XLSX.

For demonstration purposes, we are using the pregenerated dataset from the Importing data from fixed-width datafiles recipe.

Getting ready

For the Excel writing part, we will need to install the xlwt module (inside our virtual environment) by executing the following command:

$ pip install xlwt

How to do it...

We will present one code sample that contains all the formats that we want to demonstrate: CSV, JSON, and XLSX. The main part of the program accepts the input and calls appropriate functions to transform data. We will walk through separate sections...
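
The CSV and JSON parts of such a program might look like the sketch below. It is a simplified Python 3 illustration, not the book's code sample: the function names are hypothetical, the Excel part is left out because it needs the third-party xlwt module, and the data is assumed to be a header list plus a list of rows.

```python
import csv
import io
import json


def to_csv_text(header, rows):
    """Serialize a header and rows to CSV text using the default excel dialect."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def to_json_text(header, rows):
    """Serialize rows to a JSON array of objects keyed by the header names."""
    return json.dumps([dict(zip(header, row)) for row in rows])
```

Writing into an in-memory buffer keeps the formatting logic separate from the decision of where the output goes (a file, a socket, or standard output).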

Importing data from a database


Very often, our work on data analysis and visualization is at the consumer end of the data pipeline. We most often use the already produced data, rather than producing the data ourselves. A modern application, for example, holds different datasets inside relational databases (or other databases), and we use these databases and produce beautiful graphs.

This recipe will show you how to use SQL drivers from Python to access data.

We will demonstrate this recipe using a SQLite database because it requires the least effort to set up, but the interface is similar to most other SQL-based database engines (MySQL and PostgreSQL). There are, however, differences in the SQL dialect that those database engines support. This example uses simple SQL and should be reproducible on most common SQL database engines.
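
A minimal sketch of that workflow with Python's built-in `sqlite3` module is shown below. The table name, columns, and city figures are made up for illustration, and an in-memory database is assumed so that nothing is written to disk.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, made-up table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
    conn.executemany(
        "INSERT INTO cities VALUES (?, ?)",
        [("Springfield", 30000), ("Shelbyville", 12000), ("Ogdenville", 4500)],
    )
    conn.commit()
    return conn


def load_rows(conn, min_population):
    """Return (name, population) tuples, largest first."""
    cursor = conn.execute(
        "SELECT name, population FROM cities "
        "WHERE population >= ? ORDER BY population DESC",
        (min_population,),  # a bound parameter, never string formatting
    )
    return cursor.fetchall()
```

The query uses a `?` placeholder rather than building the SQL string by hand. Other drivers follow the same DB-API pattern, although some use a different placeholder style.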

Getting ready

To be able to execute this recipe, we need to install the SQLite library.

$ sudo apt-get install sqlite3

Python support for SQLite is there by...

Cleaning up data from outliers


This recipe describes how to deal with datasets coming from the real world and how to clean them before doing any visualization.

We will present a few techniques, different in essence but with the same goal, which is to get the data cleaned.

However, cleaning should not be fully automatic. We need to understand the data as given and be able to understand what the outliers are and what the data points represent before we apply any of the robust modern algorithms made to clean the data. This is not something that can be defined in a recipe because it relies on vast areas such as statistics, knowledge of the domain, and a good eye (and then some luck).

Getting ready

We will use the standard Python modules we already know about, so no additional installation is required.

In this recipe, I will introduce a new term, MAD. Median absolute deviation (MAD) in statistics represents a measure of the variability of a univariate (possessing one variable) sample of quantitative...
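
To show how MAD can be used in practice, here is a small sketch that flags outliers with the modified z-score. It uses only the standard library `statistics` module under Python 3. The constant 0.6745 and the cutoff of 3.5 are conventional choices for this score, assumed here rather than taken from the recipe, and the function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    center = median(values)
    return median(abs(v - center) for v in values)


def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    center = median(values)
    spread = mad(values)
    if spread == 0:
        # More than half of the values are identical; the score is undefined
        return []
    return [v for v in values if abs(0.6745 * (v - center) / spread) > threshold]
```

Because both the center and the spread are medians, a single extreme value barely moves them, which is why this test is more robust than one based on the mean and standard deviation.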

Reading files in chunks


Python is very good at handling reading and writing files or file-like objects. For example, if you try to load big files, say a few hundred MB, assuming you have a modern machine with at least 2 GB of RAM, Python will be able to handle it without any issue. It will not try to load everything at once, but play smart and load it as needed.

So even with decent file sizes, doing something as simple as the following code will work straight out of the box:

with open('/tmp/my_big_file', 'r') as bigfile:
    for line in bigfile:
        # line-based operation goes here, like 'print line'
        pass

But if we want to jump to a particular place in the file or do other nonsequential reading, we will need to use the handcrafted approach and use IO functions such as seek(), tell(), read(), and next() that allow enough flexibility for most users. Most of these functions are just bindings to C implementations (and are OS-specific), so they are fast, but their behavior can vary based on the OS we are running...
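
As an illustration of the handcrafted approach, the sketch below reads a file in fixed-size chunks, optionally starting from a given byte offset. It is written for Python 3 and opens the file in binary mode so that offsets are plain byte positions. The function name and the default chunk size are assumptions.

```python
def read_in_chunks(path, chunk_size=1024, start=0):
    """Yield successive chunks of at most chunk_size bytes, beginning at start."""
    with open(path, "rb") as f:
        # Jump straight to the requested position instead of reading up to it
        f.seek(start)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # read() returns an empty bytes object at end of file
                break
            yield chunk
```

Since this is a generator, only one chunk is held in memory at a time, regardless of the size of the file.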

Reading streaming data sources


What if the data that is coming from the source is continuous? What if we need to read continuous data? This recipe will demonstrate a simple solution that will work for many common real-life scenarios, though it is not universal and you will need to modify it if you hit a special case in your application.

How to do it...

In this recipe, we will show you how to read an always-changing file and print the output. We will use common Python modules to accomplish this.

import time
import os
import sys

if len(sys.argv) != 2:
    # sys.exit() with a string prints it to stderr and exits with status 1
    sys.exit("Please specify filename to read")

filename = sys.argv[1]

if not os.path.isfile(filename):
    sys.exit("Given file: \"%s\" is not a file" % filename)

with open(filename,'r') as f:
    # Move to the end of file
    filesize = os.stat(filename).st_size
    f.seek(filesize)

    # endlessly loop
    while True:
        where = f.tell()
        # try reading a line
        line = f.readline()
      ...
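
The listing is cut off in this preview. As a separate, self-contained illustration of the same polling idea, here is a sketch of a generator that yields lines as they are appended to an already opened file. It is written for Python 3 and is not the book's code: the name `follow` and the `max_idle_polls` parameter (added so that the loop can stop) are assumptions.

```python
import time


def follow(f, poll_interval=1.0, max_idle_polls=None):
    """Yield complete lines appended to the open text file f.

    The caller positions the file first (for example at its end).
    With max_idle_polls=None the loop never stops on its own.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        where = f.tell()
        line = f.readline()
        if line.endswith("\n"):
            idle = 0
            yield line
        else:
            # Nothing new, or only part of a line: rewind and wait
            f.seek(where)
            time.sleep(poll_interval)
            idle += 1
```

A typical use would be to open the file, seek to its end with `f.seek(0, os.SEEK_END)`, and then loop over `follow(f)`, printing each line as it arrives.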

Importing image data into NumPy arrays


We are going to demonstrate how to do image processing using Python's libraries such as NumPy and SciPy.

In scientific computing, images are usually seen as n-dimensional arrays. They are usually two-dimensional arrays; in our examples, they are represented as a NumPy array data structure. Therefore, functions and operations performed on those structures are seen as matrix operations.

Images in this sense are not always two-dimensional. For medical or bio-sciences, images are data structures of higher dimensions, such as 3D (having the z axis as depth or as the time axis) or 4D (having three spatial dimensions and a temporal one as the fourth dimension). We will not be using those in this recipe.

We can import images using various techniques; the right one depends on what you want to do with the image. It also depends on the larger ecosystem of tools you are using and the platform you are running your project on.

In this recipe we will demonstrate several ways to...

Generating controlled random datasets


In this recipe, we will show different ways of generating random number sequences and word sequences. Some of the examples use standard Python modules, and some use NumPy/SciPy functions.

We will go into some statistics terminology, but we will explain every term so you don't have to have a statistical reference book with you while reading this recipe.

We generate artificial datasets using common Python modules. By doing so, we are able to understand distributions, variance, sampling, and similar statistical terminology. More importantly, we can use this fake data as a way to understand if our statistical method is capable of discovering models we want to discover. We can do that because we know the model in advance and verify our statistical method by applying it over our known data. In real life, we don't have that ability and there is always a percentage of uncertainty that we must assume, giving way to errors.
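
The idea of checking a method against a known model can be sketched in a few lines. The example below, for Python 3, draws a sample from a normal distribution with a chosen mean and standard deviation and then estimates both from the data. It uses only the standard library; the function names and the fixed seed are assumptions made so that the run is reproducible.

```python
import random
import statistics


def make_normal_sample(mu, sigma, n, seed=0):
    """Draw n values from a normal distribution using a private, seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def estimate(sample):
    """Return the sample mean and sample standard deviation."""
    return statistics.mean(sample), statistics.stdev(sample)
```

With a large enough sample, the estimates should land close to the `mu` and `sigma` we chose, which is exactly the kind of verification that real-world data does not allow.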

Getting ready

We don't need anything new...

Smoothing the noise in real-world data


In this recipe, we introduce a few advanced algorithms to help with cleaning the data coming from real-world sources. These algorithms are well known in the signal processing world, and we will not go deep into mathematics but will just exemplify how and why they work and for what purposes they can be used.

Getting ready

Data that comes from real-life sensors is usually not smooth and clean, and it contains noise that we usually don't want to show on diagrams and plots. We want graphs and plots to be clear, to display information, and to cost viewers minimal effort to interpret.

We don't need any new software installed because we are going to use some already familiar Python packages: NumPy, SciPy, and matplotlib.

How to do it...

The basic algorithm is based on using the rolling window (for example, convolution). This window rolls over the data and is used to compute the average over that window.

For our discrete data, we use NumPy's convolve...
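
A minimal sketch of the rolling-window average with `numpy.convolve` follows. It assumes NumPy is installed and uses a flat (uniform) window; the function name `moving_average` and the choice of `mode="valid"` are assumptions for illustration, not necessarily what the recipe goes on to use.

```python
import numpy as np


def moving_average(values, window):
    """Smooth a 1-D sequence by averaging over a sliding window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # A flat kernel whose weights sum to 1 turns convolution into a mean
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely
    return np.convolve(values, kernel, mode="valid")
```

With `mode="valid"` the result is shorter than the input by `window - 1` points, which avoids the edge distortion that padded modes introduce.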


Key benefits

  • Learn how to set up an optimal Python environment for data visualization
  • Understand topics such as importing and formatting data for visualization
  • Understand the underlying data and how to use the right visualizations

Description

Today, data visualization is a hot topic as a direct result of the vast amount of data created every second. Transforming that data into information is a complex task for data visualization professionals, who, at the same time, try to understand the data and objectively transfer that understanding to others. This book is a set of practical recipes that strive to help the reader get a firm grasp of the area of data visualization using Python and its popular visualization and data libraries. Python Data Visualization Cookbook will progress the reader from the point of installing and setting up a Python environment for data manipulation and visualization all the way to 3D animations using Python libraries. Readers will benefit from over 60 precise and reproducible recipes that guide the reader towards a better understanding of data concepts and the building blocks for subsequent and sometimes more advanced concepts. Python Data Visualization Cookbook starts by showing you how to set up matplotlib and the related libraries that are required for most parts of the book, before moving on to discuss some of the lesser-used diagrams and charts such as Gantt Charts or Sankey diagrams. During the book, we go from simple plots and charts to more advanced ones, thoroughly explaining why we used them and how not to use them. As we go through the book, we will also discuss 3D diagrams. We will peep into animations just to show you what it takes to go into that area. Maps are irreplaceable for displaying geo-spatial data, so we also show you how to build them. In the last chapter, we show you how to incorporate matplotlib into different environments, such as a writing system, LaTeX, or how to create Gantt charts using Python. This book will help those who already know how to program in Python to explore a new field – one of data visualization. 
As this book is all about recipes that explain how to do something, code samples are abundant, and they are followed by visual diagrams and charts to help you understand the logic and compare your own results with what is explained in the book.

Who is this book for?

Python Data Visualization Cookbook is for developers who already know Python programming in general. If you have heard about data visualization but you don't know where to start, then this book will guide you from the start and help you understand data, data formats, data visualization, and how to use Python to visualize data. You will need to know some general programming concepts, and any kind of programming experience will be helpful, but the code in this book is explained almost line by line. You don't need maths for this book: every concept that is introduced is thoroughly explained in plain English, and references are available for further interest in the topic.

What you will learn

  • Install and use IPython
  • Use Python's virtual environments
  • Install and customize NumPy and matplotlib
  • Draw common and advanced plots
  • Visualize data using maps
  • Create 3D animated data visualizations
  • Import data from various formats
  • Export data to various formats

Product Details

Publication date: Nov 25, 2013
Length: 280 pages
Edition: 1st
Language: English
ISBN-13: 9781782163374




Table of Contents

8 Chapters
Preparing Your Working Environment
Knowing Your Data
Drawing Your First Plots and Customizing Them
More Plots and Customizations
Making 3D Visualizations
Plotting Charts with Images and Maps
Using Right Plots to Understand Data
More on matplotlib Gems

Customer reviews

Rating: 4 out of 5 (9 ratings)
5 star 66.7%
4 star 11.1%
3 star 0%
2 star 0%
1 star 22.2%

Top Reviews

Amazon Customer, Jan 29, 2014
Rating: 5 out of 5
I use this book to better understand the mathplotlib which in my case helped me to replace math lab. It was well documented and very nice written. A must have for python mid and advance programmers
Amazon Verified review

carl, Mar 19, 2014
Rating: 5 out of 5
I am an intermedium python developer. My past python experience is on system admin, DevOps, deployment and web management. Data visualization is a fairly new area to me. So this book is a perfect fit for me. Author uses lots of examples to demonstrate different visualization terminology, which really helps people to understand the abstract image processing technology. This book also shows you how to setup the virtual env to isolate development environment. Although the main purpose of this book is to teach how to visualize data, many of the example programs also show the best python development practice. Majority of the code is runnable without touch-up. Some typos are pretty easy to be spotted. I would recommend it to people who already have python experience and would like to extend their experience to data visualization area.
Amazon Verified review

David Jensen, Jan 29, 2014
Rating: 5 out of 5
The majority of software documentation is similar to a remark made by the developer of a well-known, difficult language; "Maybe you are not smart enough?". In contrast, this book has made sure that nothing is implied without being oversimplified. The book covers; installing and customizing libraries, reading in data, extensive information on 2D and 3D plots, using images and maps, determining the right plots for specified data types, and additional information for matplotlib. References are provided to other sources throughout the book.
Amazon Verified review

Jack Golding, Feb 25, 2014
Rating: 5 out of 5
Python Data Visualization Cookbook introduces the process of doing data visualisation with the Python programming language. The book uses the Scipy stack for data visualisation (however was published before the new Bokeh package was released) and introduces how to install the libraries in multiple operating systems which can be a task in itself for those unfamiliar with Python. The book covers the basics of data visualization and touches on exploratory data analysis, mostly in a scientific context. Given the size of the field of data visualization, it is unrealistic to expect that a book can introduce the semantics of a programming language as well as all of its applications. In conclusion this book is recommended to professionals who are interested in scientific data visualisation with a novice level understanding of both mathematics and programming.
Amazon Verified review

Bernie Ongewe, Nov 29, 2014
Rating: 5 out of 5
This is a nice tour of modules and techniques for importing and scrubbing data from various sources (CSV, databases, Excel, etc), manipulating said data and presenting it in an intuitive manner. The author is generous with examples, which allows you to start right away. While this is not a rigorous tutorial, the author goes into exactly the right depth to allow you to make a decision on methodology and begin implementing right away. If, rather than becoming a NumPy scholar, you expect to have to deliver results from varied species of data, having this in your back pocket will help you accomplish that.
Amazon Verified review
