Python Data Visualization Cookbook (Second Edition): Visualize data using Python's most popular libraries

Igor Milovanovic, Foures, Giuseppe Vettigli

Chapter 2. Knowing Your Data

In this chapter, we'll cover the following topics:

  • Importing data from CSV
  • Importing data from Microsoft Excel files
  • Importing data from fixed-width data files
  • Importing data from tab-delimited files
  • Importing data from a JSON resource
  • Exporting data to JSON, CSV, and Excel
  • Importing and manipulating data with Pandas
  • Importing data from a database
  • Cleaning up data from outliers
  • Reading files in chunks
  • Reading streaming data sources
  • Importing image data into NumPy arrays
  • Generating controlled random datasets
  • Smoothing the noise in real-world data

Introduction


This chapter covers the basics of importing and exporting data in various formats. We first show how to import data using only the capabilities of the Python standard library; then we introduce the powerful Pandas library, which is becoming the de facto standard for data manipulation in Python. We also cover ways of cleaning data, such as normalizing values, adding missing data, live data inspection, and similar tricks for getting data correctly prepared for visualization.

Importing data from CSV


In this recipe, we'll work with the most common file format that you will encounter in the wild world of data—CSV. It stands for Comma Separated Values, which almost explains all the formatting there is. (There is also a header part of the file, but those values are also comma separated.)

Python has a module called csv that supports reading and writing CSV files in various dialects. Dialects are important because there is no standard CSV, and different applications implement CSV in slightly different ways. A file's dialect is almost always recognizable by the first look into the file.
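
When a first look into the file is not practical, the csv module can also try to guess the dialect for us. The following is a minimal sketch, assuming Python 3 and a made-up, semicolon-separated sample; it is not part of the recipe itself, and csv.Sniffer relies on heuristics that may guess wrong on unusual files:

```python
import csv
import io

# Made-up sample that uses semicolons instead of commas
sample = "day;amount\n2013-01-24;323\n2013-01-25;233\n"

# Restrict the candidate delimiters to keep the guess predictable
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")

# Reuse the detected dialect to parse the same text
rows = list(csv.reader(io.StringIO(sample), dialect))

print(dialect.delimiter)
print(rows)
```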

Getting ready

What we need for this recipe is the CSV file itself. We'll use sample CSV data that you can download from ch02-data.csv.

We assume that sample data files are in the same folder as the code reading them.

How to do it...

The following code example demonstrates how to import data from a CSV file. We will perform the following steps for this:

  1. Open the ch02-data.csv file for reading...
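
The remaining steps are cut off in this preview. A minimal sketch of reading a CSV file with a header row, assuming Python 3, might look as follows. The read_csv helper and the temporary stand-in file are illustrative assumptions, used so that the example runs without ch02-data.csv:

```python
import csv
import os
import tempfile

def read_csv(path):
    """Return (header, rows) from a CSV file; rows is empty on a parse error."""
    header, rows = None, []
    try:
        with open(path, newline='') as f:
            reader = csv.reader(f)
            header = next(reader)           # first line holds the column names
            rows = [row for row in reader]  # remaining lines are data rows
    except csv.Error as e:
        print("Error reading CSV file: %s" % e)
    return header, rows

# Made-up stand-in data, written to a temporary file
sample = "day,amount\n2013-01-24,323\n2013-01-25,233\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline='') as f:
        f.write(sample)
    header, rows = read_csv(path)

print(header)
print(rows)
```

Every value comes back as a string, so numeric columns need an explicit conversion before plotting.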

Importing data from Microsoft Excel files


Although Microsoft Excel supports some charting, sometimes you need more flexible and powerful visualization and need to export data from existing spreadsheets into Python for further use.

A common approach to importing data from Excel files is to export data from Excel into CSV-formatted files and use the tools described in the previous recipe to import data using Python from the CSV file. This is a fairly easy process if we have one or two files (and have Microsoft Excel or OpenOffice.org installed), but if we are automating a data pipeline for many files (as part of an ongoing data processing effort), we cannot manually convert every Excel file into CSV. So, we need a way to read any Excel file.

Python has decent support for reading and writing Excel files through the project www.python-excel.org. This support is available in the form of different modules for reading and writing and is platform-independent; in other words, we don't...

Importing data from fixed-width data files


Log files from events and time series data files are common sources for data visualizations. Sometimes, we can read them using CSV dialect for tab-separated data, but sometimes they are not separated by any specific character. Instead, fields are of fixed widths and we can infer the format to match and extract data.

One way to approach this is to read a file line by line and then use string manipulation functions to split a string into separate parts. This approach seems straightforward, and if performance is not an issue, it should be tried first.

If performance is more important or the file to parse is large (hundreds of megabytes), using the Python module struct (http://docs.python.org/library/struct.html) can speed things up, as the module is implemented in C rather than in Python.

Getting ready

As the module struct is part of the Python Standard Library, we don't need to install any additional software to implement this recipe.

How to do it...

We will...
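
The recipe's own code is cut off in this preview. A minimal sketch of the struct-based approach, assuming Python 3 and a hypothetical layout of three fields that are 9, 14, and 5 bytes wide, might look like this (the field widths, the parse_line helper, and the sample lines are all made up for illustration):

```python
import struct

# Hypothetical layout: three fixed-width fields of 9, 14 and 5 bytes
FIELD_FORMAT = "9s14s5s"
record = struct.Struct(FIELD_FORMAT)

def parse_line(line):
    """Split one fixed-width text line into stripped string fields."""
    raw = line.encode("ascii")
    # unpack_from reads the first record.size bytes of the buffer
    fields = record.unpack_from(raw)
    return tuple(field.decode("ascii").strip() for field in fields)

# Made-up lines, padded so that each field fills its full width
lines = [
    "{:<9}{:<14}{:<5}".format("20150101", "sensor-a", "42.5"),
    "{:<9}{:<14}{:<5}".format("20150102", "sensor-b", "7"),
]
parsed = [parse_line(line) for line in lines]
print(parsed)
```

Note that unpack_from raises struct.error when a line is shorter than the declared record size, so truncated lines need to be handled separately.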

Importing data from tab-delimited files


Another very common flat data file format is the tab-delimited file. It can come from an Excel export, but it can also be the output of some custom software we must get our input from.

The good thing is that this format can usually be read in almost the same way as CSV files: the Python csv module supports so-called dialects, which let us use the same principles to read variations of similar file formats, the tab-delimited format being one of them.

Getting ready

By now, you should already be able to read CSV files. If not, please refer to the Importing data from CSV recipe first.

How to do it...

We will reuse the code from the Importing data from CSV recipe, where all we need to change is the dialect we are using as shown in the following code:

import csv

filename = 'ch02-data.tab'

data = []
try:
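    with open(filename) as f:
        # csv.excel_tab is the built-in dialect for tab-delimited files
        reader = csv.reader(f, dialect=csv.excel_tab)
        # next(reader) works on both Python 2.6+ and Python 3
        header = next(reader)
        data = [row for row in reader]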
except...

Importing data from a JSON resource


This recipe will show us how we can read the JSON data format. Moreover, we'll be using a remote resource in this recipe. It will add a tiny level of complexity to the recipe, but it will also make it much more useful because in real life we will encounter more remote resources than local ones.

JavaScript Object Notation (JSON) is widely used as a platform-independent format to exchange data between systems or applications.

A resource, in this context, is anything we can read, be it a file or a URL endpoint (which can be the output of a remote process/program or just a remote static file). In short, we don't care who produced a resource and how they did it; we just need it to be in a known format like JSON.

Getting ready

In order to get started with this recipe, we need the requests module installed and importable (in PYTHONPATH) in our virtual environment. We have installed this module in Chapter 1, Preparing Your Working Environment.

We also need Internet...
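
The recipe's code is not included in this preview. As a rough sketch of the parsing step only, the following example decodes a JSON document with the standard library json module. The payload is a made-up stand-in for the body of an HTTP response, and extract_logins is a hypothetical helper. With the requests module installed, calling .json() on the response returned by requests.get() for a URL of our choice would give us the same kind of Python objects:

```python
import json

# Made-up payload standing in for the body of a remote JSON resource
payload = '[{"login": "alice", "repos": 3}, {"login": "bob", "repos": 5}]'

def extract_logins(text):
    """Decode a JSON array of objects and return the 'login' values."""
    records = json.loads(text)  # JSON arrays become lists, objects become dicts
    return [record["login"] for record in records]

logins = extract_logins(payload)
print(logins)
```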

Exporting data to JSON, CSV, and Excel


As producers of data visualizations, we mostly use other people's data, so importing and reading data are our major activities. Still, we do need to write or export data that we produced or processed, whether for our own or others' current or future use.

We will demonstrate how to use the previously mentioned Python modules to import, export, and write data to various formats such as JSON, CSV, and XLSX.

For demonstration purposes, we are using the pregenerated dataset from the Importing data from fixed-width data files recipe.

Getting ready

For the Excel writing part, we will need to install the xlwt module (inside our virtual environment) by executing the following command:

$ pip install xlwt

How to do it...

We will present one code sample that contains all the formats that we want to demonstrate: CSV, JSON, and XLSX. The main part of the program accepts the input and calls appropriate functions to transform data. We will walk through separate sections...
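
The full code sample is not part of this preview. A minimal sketch of the CSV and JSON parts, assuming Python 3 and a small made-up dataset whose first row is the header, could look like this. The function names write_csv and write_json are assumptions for illustration, and the Excel part is left out because it depends on the xlwt module:

```python
import csv
import io
import json

# Made-up dataset: the first row is the header, the rest are records
data = [["day", "amount"], ["2013-01-24", "323"], ["2013-01-25", "233"]]

def write_csv(rows):
    """Serialize rows as CSV text using an in-memory buffer."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def write_json(rows):
    """Serialize rows as a JSON array of objects keyed by the header."""
    header, body = rows[0], rows[1:]
    return json.dumps([dict(zip(header, row)) for row in body])

csv_text = write_csv(data)
json_text = write_json(data)
print(csv_text)
print(json_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; to produce a real file, pass a file object opened with newline='' to csv.writer instead.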

Importing and manipulating data with Pandas


Until now, we have seen how to import and export data using mostly the tools provided in the Python standard library. Now, we'll see how to do some of the operations shown above in just a few lines using the Pandas library. Pandas is an open source, BSD-licensed library that simplifies data import and manipulation by providing data structures and parsing functions.

We will demonstrate how to import, manipulate and export data using Pandas.

Getting ready

To be able to use the code in this section, we need to install Pandas. This can again be done using pip, as shown here:

pip install pandas

How to do it...

Here, we will again import the data from ch02-data.csv, add a new column to the original data, and export the result as CSV, as shown in the following code snippet:

import pandas as pd

data = pd.read_csv('ch02-data.csv')
data['amount_x_2'] = data['amount']*2
data.to_csv('ch02-data_more.csv')

How it works...

First, we import Pandas in our environment and then we use...

Importing data from a database


Very often, our work on data analysis and visualization is at the consumer end of the data pipeline. We most often use the already produced data rather than producing the data ourselves. A modern application, for example, holds different datasets inside relational databases (or other databases like MongoDB), and we use these databases to produce beautiful graphs.

This recipe will show you how to use SQL drivers from Python to access data.

We will demonstrate this recipe using a SQLite database because it requires the least effort to set up, but the interface is similar to most other SQL-based database engines (MySQL and PostgreSQL). There are, however, differences in the SQL dialect that those database engines support. This example uses simple SQL language and should be reproducible on most common SQL database engines.

Getting ready

To be able to execute this recipe, we need to install the SQLite library as shown here:

$ sudo apt-get install sqlite3

Python support...
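
The rest of the recipe is cut off in this preview. As a minimal sketch of the general pattern (connect, execute SQL, fetch rows), the following example uses the sqlite3 module from the Python standard library with an in-memory database. The table name, the columns, and the values are all made up for illustration:

```python
import sqlite3

# An in-memory database keeps the sketch self-contained
conn = sqlite3.connect(":memory:")
try:
    cur = conn.cursor()
    # Hypothetical table and made-up values
    cur.execute("CREATE TABLE readings (label TEXT, year INTEGER, value INTEGER)")
    cur.executemany(
        "INSERT INTO readings VALUES (?, ?, ?)",
        [("A", 2015, 10), ("B", 2015, 30), ("A", 2014, 9)],
    )
    conn.commit()

    # Parameterized query: the driver substitutes the ? placeholder safely
    cur.execute(
        "SELECT label, value FROM readings WHERE year = ? ORDER BY value DESC",
        (2015,),
    )
    rows = cur.fetchall()
finally:
    conn.close()

print(rows)
```

The rows come back as a list of tuples, which is easy to unpack into separate x and y sequences for plotting.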


Key benefits

  • Learn how to set up an optimal Python environment for data visualization
  • Understand how to import, clean and organize your data
  • Determine different approaches to data visualization and how to choose the most appropriate for your needs

Description

Python Data Visualization Cookbook takes the reader from installing and setting up a Python environment for data manipulation and visualization all the way to 3D animations using Python libraries. Readers will benefit from over 60 precise and reproducible recipes that guide them towards a better understanding of data concepts and the building blocks for subsequent, and sometimes more advanced, concepts. The book starts by showing how to set up matplotlib and the related libraries that are required for most parts of the book, before moving on to some of the lesser-used diagrams and charts such as Gantt charts or Sankey diagrams. It progresses from simple plots and charts to more advanced ones, which makes the material easy to follow. As readers go through the book, they will get to know 3D diagrams and animations. Maps are irreplaceable for displaying geospatial data, so this book also shows how to build them. The last chapter explains how to incorporate matplotlib into different environments, such as a writing system, LaTeX, or how to create Gantt charts using Python.

Who is this book for?

If you already know about Python programming and want to understand data, data formats, data visualization, and how to use Python to visualize data then this book is for you.

What you will learn

  • Introduce yourself to the essential tooling to set up your working environment.
  • Explore your data using the capabilities of the Python standard library and the Pandas library
  • Draw your first chart and customize it
  • Use the most popular data visualization Python libraries
  • Make 3D visualizations mainly using mplot3d
  • Create charts with images and maps
  • Understand the most appropriate charts to describe your data
  • Know the matplotlib hidden gems
  • Use plot.ly to share your visualization online

Product Details

Publication date: Nov 30, 2015
Length: 302 pages
Edition: 1st
Language: English
ISBN-13: 9781784396695

Table of Contents

10 Chapters
1. Preparing Your Working Environment
2. Knowing Your Data
3. Drawing Your First Plots and Customizing Them
4. More Plots and Customizations
5. Making 3D Visualizations
6. Plotting Charts with Images and Maps
7. Using the Right Plots to Understand Data
8. More on matplotlib Gems
9. Visualizations on the Clouds with Plot.ly
Index

Customer reviews

Rating distribution
4 out of 5 (6 Ratings)
5 star 66.7%
4 star 0%
3 star 0%
2 star 33.3%
1 star 0%

Reader, May 29, 2016 (5 out of 5)
Very clear recipes and explanations, everything I hoped it would be.
Amazon Verified review

Oleg Okun, Jan 16, 2016 (5 out of 5)
The title of this book includes the word "cookbook" and as a cookbook the book contains a plenty of practical recipes of data visualization in Python. It presents not a mere description of Python packages and commands related to visualization, but embeds these tools into real-world scenarios. Not only visualization itself but also data manipulation enabling insightful visualization are discussed in detail. Needless to say, the discussion of every topic is accompanied by ready-to-use Python code.
Amazon Verified review

Amazon Customer, Dec 07, 2015 (5 out of 5)
The book helped me to understand how to visualize data with python anf find a good solution to implement own little datamart at home for home automation project.
Amazon Verified review

Amazon Customer, Dec 31, 2015 (5 out of 5)
I received a free copy of this book in exchange for my review. I think this is a great book. The examples for the different plotting methods and customizations all worked. The first chapter describe set-up and code samples for using data in different formats. I remember when I was first given the task to add a chart to a report to represent data and how it took me a minute to ensure I was doing things correctly. This book would helped me a great deal at that time. Many questions I had previously about plotting and correctly coding solutions for charts I haven't been asked to make yet, were answered. I have been creating reports and charts for a University Research team and this book has been a godsend. I think this book would have helped me when I was working using java for reports and charts. I just this is a great book.
Amazon Verified review

Jonathan, Jul 14, 2017 (2 out of 5)
Please don't get me wrong, the book is quite useful and it's quite frankly more handy for me to look things up in a book than on the internet. But in essence, all the information is freely available on the internet and therefore the book is very, very expensive for a black-and-white handbook!
Amazon Verified review
