Pandas Cookbook: Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

eBook: $29.99 $43.99
Paperback: $54.99
Subscription: Free Trial, renews at $19.99 p/m

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want

Essential DataFrame Operations

In this chapter, we will cover the following topics:

  • Selecting multiple DataFrame columns
  • Selecting columns with methods
  • Ordering column names sensibly
  • Operating on the entire DataFrame
  • Chaining DataFrame methods together
  • Working with operators on a DataFrame
  • Comparing missing values
  • Transposing the direction of a DataFrame operation
  • Determining college campus diversity

Introduction

This chapter covers many fundamental operations of the DataFrame. Many of the recipes will be similar to those in Chapter 1, Pandas Foundations, which primarily covered operations on a Series.

Selecting multiple DataFrame columns

Selecting a single column is accomplished by passing the desired column name as a string to the indexing operator of a DataFrame. This was covered in the Selecting a Series recipe in Chapter 1, Pandas Foundations. It is often necessary to focus on a subset of the current working dataset, which is accomplished by selecting multiple columns.

Getting ready

In this recipe, all the actor and director columns will be selected from the movie dataset.

How to do it...

  1. Read in the movie dataset, and pass in a list of the desired columns to the...
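
The step above can be sketched as follows. This is a minimal illustration that assumes a small stand-in DataFrame instead of the book's movie file; the column names and values are made up for the example. A list passed to the indexing operator returns a DataFrame, while a single string returns a Series.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset; names and values are made up.
movie = pd.DataFrame({
    "movie_title": ["Film A", "Film B"],
    "director_name": ["Director X", "Director Y"],
    "actor_1_name": ["Actor P", "Actor Q"],
    "actor_2_name": ["Actor R", "Actor S"],
    "budget": [1.0e6, 2.5e6],
})

# A list of column names inside the indexing operator selects several columns.
cols = ["actor_1_name", "actor_2_name", "director_name"]
movie_actor_director = movie[cols]

# A single string returns a Series; a one-item list returns a DataFrame.
single_series = movie["director_name"]
single_frame = movie[["director_name"]]
```

Storing the column names in a separate list, as done with `cols` here, keeps long selections readable.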

Selecting columns with methods

Although column selection is usually done directly with the indexing operator, there are some DataFrame methods that facilitate their selection in an alternative manner. select_dtypes and filter are two useful methods to do this.

Getting ready

You need to be familiar with all pandas data types and how to access them. The Understanding data types recipe in Chapter 1, Pandas Foundations, has a table with all pandas data types.

How to do it...

  1. Read in the movie dataset, and use the title of the movie to label each row. Use the get_dtype_counts...
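
A minimal sketch of both methods, assuming a small made-up DataFrame in place of the movie dataset (all column names here are illustrative). Note that `get_dtype_counts` was removed in pandas 1.0; on newer versions, `dtypes.value_counts()` gives the same summary.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset, labeled by movie title.
movie = pd.DataFrame({
    "movie_title": ["Film A", "Film B"],
    "director_name": ["Director X", "Director Y"],
    "director_facebook_likes": [120, 45],
    "actor_1_facebook_likes": [1000, 250],
    "imdb_score": [7.1, 6.4],
}).set_index("movie_title")

# Count how many columns have each data type.
dtype_counts = movie.dtypes.value_counts()

# select_dtypes keeps or drops columns based on their data type.
numeric_cols = movie.select_dtypes(include="number")
non_numeric_cols = movie.select_dtypes(exclude="number")

# filter selects columns by inspecting their names, never their values.
facebook_cols = movie.filter(like="facebook")    # substring match
actor_cols = movie.filter(regex=r"^actor_")      # regular expression match
# items takes exact names; names that do not exist are silently skipped.
named_cols = movie.filter(items=["imdb_score", "not_a_column"])
```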

Ordering column names sensibly

One of the first tasks to consider after initially importing a dataset as a DataFrame is to analyze the order of the columns. This basic task is often overlooked but can make a big difference in how an analysis proceeds. Computers have no preference for column order and computations are not affected either. As human beings, we naturally view and read columns left to right, which directly impacts our interpretations of the data. Haphazard column arrangement is similar to haphazard clothes arrangement in a closet. It does no good to place suits next to shirts and pants on top of shorts. It's far easier to find and interpret information when column order is given consideration.

There is no standardized set of rules that dictates how columns should be organized within a dataset. However, it is good practice to develop a set of guidelines that you...
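
As an illustration only, one possible guideline is to place descriptive text columns first and numeric columns after them. This ordering rule is an assumption made for the example, not necessarily the one the book recommends, and the DataFrame is made up. Reordering itself is done by passing the column names, in the desired order, to the indexing operator.

```python
import pandas as pd

# Made-up data with columns in a haphazard order.
df = pd.DataFrame({
    "budget": [1.0e6, 2.5e6],
    "movie_title": ["Film A", "Film B"],
    "imdb_score": [7.1, 6.4],
    "director_name": ["Director X", "Director Y"],
})

# Assumed guideline: text columns first, numeric columns last.
text_cols = ["movie_title", "director_name"]
numeric_cols = ["budget", "imdb_score"]
new_order = text_cols + numeric_cols

# Guard against accidentally dropping or misspelling a column.
assert set(new_order) == set(df.columns)

df_ordered = df[new_order]
```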

Operating on the entire DataFrame

In the Calling Series methods recipe in Chapter 1, Pandas Foundations, a variety of methods operated on a single column or Series of data. When these same methods are called from a DataFrame, they perform that operation for each column at once.

Getting ready

In this recipe, we explore a variety of the most common DataFrame attributes and methods with the movie dataset.

How to do it...

  1. Read in the movie dataset, and grab the basic descriptive attributes, shape, size, and ndim, along with running the len function:
>>> movie =...
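
A self-contained sketch of the same attributes, assuming a tiny made-up DataFrame rather than the movie file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movie dataset; the values are made up.
movie = pd.DataFrame({
    "director_name": ["Director X", "Director Y", None],
    "budget": [1.0e6, 2.5e6, np.nan],
    "imdb_score": [7.1, 6.4, 5.9],
})

shape = movie.shape      # (rows, columns) tuple
size = movie.size        # rows * columns
ndim = movie.ndim        # always 2 for a DataFrame
n_rows = len(movie)      # number of rows

# Called on a DataFrame, a method works on every column at once.
non_missing = movie.count()              # Series: non-missing values per column
col_max = movie.max(numeric_only=True)   # Series: maximum of each numeric column
```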

Chaining DataFrame methods together

Whether you believe method chaining is a good practice or not, it is quite common to encounter it during data analysis with pandas. The Chaining Series methods together recipe in Chapter 1, Pandas Foundations, showcased several examples of chaining Series methods together. All the method chains in this chapter will begin from a DataFrame. One of the keys to method chaining is to know the exact object being returned during each step of the chain. In pandas, this will nearly always be a DataFrame, Series, or scalar value.

Getting ready

In this recipe, we count all the missing values in each column of the movie dataset.
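
A minimal sketch of such a chain, using a made-up DataFrame in place of the movie dataset. The comments track the type returned at each step, which is the key to reading a chain.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movie dataset; the values are made up.
movie = pd.DataFrame({
    "director_name": ["Director X", None, "Director Y"],
    "budget": [1.0e6, np.nan, np.nan],
    "imdb_score": [7.1, 6.4, 5.9],
})

is_missing = movie.isnull()                # DataFrame of booleans
missing_per_column = is_missing.sum()      # Series: True counts as 1
total_missing = missing_per_column.sum()   # scalar

# The same steps written as a single chain, plus a yes/no variant.
total_missing_chained = movie.isnull().sum().sum()
any_missing = movie.isnull().any().any()
```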

...

Working with operators on a DataFrame

A primer on operators was given in the Working with operators on a Series recipe from Chapter 1, Pandas Foundations, which will be helpful here. The Python arithmetic and comparison operators work directly on DataFrames, as they do on Series.

Getting ready

When a DataFrame operates directly with one of the arithmetic or comparison operators, each value of each column gets the operation applied to it. Typically, when an operator is used with a DataFrame, the columns are either all numeric or all object (usually strings). If the DataFrame does not contain homogeneous data, then the operation is likely to fail. Let's see an example of this failure with the college dataset, which contains...
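
The behavior described above can be sketched with a small made-up DataFrame (the column names are hypothetical and do not come from the college dataset). The text column is stored with the object dtype here so that the failure shows up as a TypeError.

```python
import pandas as pd

# Made-up data: one text column and two numeric columns.
college = pd.DataFrame(
    {
        "city": ["Town A", "Town B"],
        "pct_a": [0.25, 0.75],
        "pct_b": [0.5, 0.125],
    },
    index=["School A", "School B"],
).astype({"city": object})

# Adding a number to a DataFrame with mixed data fails on the text column.
try:
    college + 5
    mixed_failed = False
except TypeError:
    mixed_failed = True

# Restricting the DataFrame to homogeneous numeric columns makes it work.
numeric = college.select_dtypes(include="number")
doubled = numeric * 2        # arithmetic is applied to every value
over_half = numeric > 0.5    # comparison returns a DataFrame of booleans
```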

Comparing missing values

Pandas uses the NumPy NaN (np.nan) object to represent a missing value. This is an unusual object, as it is not equal to itself. Even Python's None object evaluates as True when compared to itself:

>>> import numpy as np
>>> np.nan == np.nan
False
>>> None == None
True

All other comparisons against np.nan also return False, except not equal to:

>>> np.nan > 5
False
>>> 5 > np.nan
False
>>> np.nan != 5
True

Getting ready

Series and DataFrames use the equals operator, ==, to make element-by-element comparisons that return an object of the same size. This recipe shows you how to use the equals operator, which is very different from the equals method.
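
A short sketch of the difference, using a made-up DataFrame. The `==` operator compares element by element, so a missing value never matches anything, while the `equals` method returns a single boolean and treats missing values in the same location as equal.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
df_copy = df.copy()

# Element-by-element comparison: the NaN cell compares as False.
elementwise = df == df_copy
all_equal_elementwise = bool(elementwise.all().all())

# The equals method compares whole objects and matches NaN with NaN.
same = df.equals(df_copy)

# Comparing against np.nan never finds anything; use isnull instead.
found_with_operator = bool((df == np.nan).any().any())
found_with_isnull = bool(df.isnull().any().any())
```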

As in the previous recipe...

Transposing the direction of a DataFrame operation

Many DataFrame methods have an axis parameter. This important parameter controls the direction in which the operation takes place. The axis parameter accepts only one of two values, 0 or 1, which are aliased by the strings index and columns, respectively.

Getting ready

Nearly all DataFrame methods default the axis parameter to 0/index. This recipe shows you how to invoke the same method, but with the direction of its operation transposed. To simplify the exercise, only the columns that reference the percentage race of each school from the college dataset will be used.
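
A minimal sketch of the two directions, assuming a made-up DataFrame of percentage columns (the column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up percentages for three schools; one value is missing.
college = pd.DataFrame(
    {"pct_a": [0.25, 0.75, np.nan], "pct_b": [0.5, 0.125, 0.25]},
    index=["School A", "School B", "School C"],
)

# Default axis=0 / 'index': one result per column.
per_column = college.count()

# axis=1 / 'columns': the same method, now one result per row.
per_row = college.count(axis="columns")
row_sums = college.sum(axis=1)   # missing values are skipped by default
```

With `axis='columns'` the result is indexed by the row labels, which is a quick way to confirm the direction of an operation.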

How to do...

Determining college campus diversity

Many articles are written every year on the different aspects and impacts of diversity on college campuses. Various organizations have developed metrics attempting to measure diversity. US News is a leader in providing rankings for many different categories of colleges, with diversity being one of them.

Their top 10 diverse colleges with Diversity Index are given as follows:

>>> pd.read_csv('data/college_diversity.csv', index_col='School')

Getting ready

Our college dataset classifies race into nine different categories. When trying to quantify something without an obvious definition, such as diversity, it helps to start with something very simple. In this recipe...


Key benefits

  • Use the power of pandas 0.20 to solve the most complex scientific computing problems with ease
  • Leverage fast, robust data structures in pandas 0.20 to gain useful insights from your data
  • Practical, easy-to-implement recipes for quick solutions to common data problems using pandas 0.20

Description

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas 0.20. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way. The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands like one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter. Many advanced recipes combine several different features across the pandas 0.20 library to generate results.

Who is this book for?

This book is for data scientists, analysts and Python developers who wish to explore data analysis and scientific computing in a practical, hands-on manner. The recipes included in this book are suitable for both novice and advanced users, and contain helpful tips, tricks and caveats wherever necessary. Some understanding of pandas will be helpful, but not mandatory.

What you will learn

  • Master the fundamentals of pandas 0.20 to quickly begin exploring any dataset
  • Isolate any subset of data by properly selecting and querying the data
  • Split data into independent groups before applying aggregations and transformations to each group
  • Restructure data into tidy form to make data analysis and visualization easier
  • Prepare real-world messy datasets for machine learning
  • Combine and merge data from different sources through pandas' SQL-like operations
  • Utilize pandas' unparalleled time series functionality
  • Create beautiful and insightful visualizations through pandas 0.20's direct hooks to Matplotlib and Seaborn

Product Details

Publication date: Oct 23, 2017
Length: 532 pages
Edition: 1st
Language: English
ISBN-13: 9781784393342

Table of Contents

11 Chapters

  1. Pandas Foundations
  2. Essential DataFrame Operations
  3. Beginning Data Analysis
  4. Selecting Subsets of Data
  5. Boolean Indexing
  6. Index Alignment
  7. Grouping for Aggregation, Filtration, and Transformation
  8. Restructuring Data into a Tidy Form
  9. Combining Pandas Objects
  10. Time Series Analysis
  11. Visualization with Matplotlib, Pandas, and Seaborn

Customer reviews

Rating: 4.3 out of 5 (32 Ratings)

  • 5 star: 75%
  • 4 star: 3.1%
  • 3 star: 6.3%
  • 2 star: 9.4%
  • 1 star: 6.3%

Top Reviews

Tomas Dec 12, 2018
Rating: 5 out of 5
This is an excellent book if you want to learn pandas and if you want to understand pandas. It covers all cases, clearly explains what and why pandas do, and the chapters are organized really well and it depends on you if you just want to stay on surface or go deeper.
Amazon Verified review

Scott B. Oct 26, 2017
Rating: 5 out of 5
Ted Petrou in Pandas Cookbook gives real world problems easy to follow and implement recipes for data wrangling success. This book does an excellent job in highlighting the foundations of Pandas, critical operations in Pandas, how to use Pandas in data analysis, and data visualization with Matplotlib, Pandas and Seaborn libraries. Ted uses actual datasets in his examples. If you go to download supporting files with this book, you will get a copy of his Jupyter Notebooks for each chapter and over 30 real datasets for analysis. This is a must have book when I want to find actual solutions to problems with unruly data.
Amazon Verified review

Dimitri Shvorob Nov 12, 2017
Rating: 5 out of 5
A self-regarding preface, if I may. This is my second attempt at reviewing "Pandas Cookbook". The first one was written in a dyspeptic mood - thank you, Kindle Unlimited free trial, for exposing me to the horrors of self-published rip-offs - did not put its emphases right, and was in one regard simply misleading. This prompted criticism from the book's author, supported by detailed objections, and led me to reconsider. I know that there will be a Version 3, already in December, as I will want to compare "Pandas Cookbook" and Daniel Chen's "Pandas for Everyone", which I expect to be both similar and good. I will leave speculation at that, and focus on the present.

Before "Pandas Cookbook", I had seen five books about Pandas:

  • "Python for Data Analysis" by Wes McKinney, 2nd ed., 544 pages, 2017
  • "Learning the Pandas Library" by Matt Harrison, 212 pages, 2016
  • "Learning pandas" by Michael Heydt, 504 pages, Packt, 2015
  • "Mastering pandas" by Femi Anthony, 364 pages, Packt, 2015
  • "Python Data Analytics" by Fabio Nelli, 364 pages, Apress, 2015

I can confidently say that (a) you don't need to consider books other than McKinney's and Petrou's, and (b) you want to see both, and possibly leave both, depending on your budget and personal preference.

The one wrong suggestion in my original review was that PC was "far behind" PDA in terms of coverage. Having checked PDA, however, I realized that PDA did not have many things which I thought I learned from it, but in fact picked up from other sources - Pandas online doc, Stack Overflow, and, early on, Chris Albon's site. Surprisingly, the bread-and-butter "nunique" function, for example, is not in PDA, and neither is "filter" or "query". (I actually learned about "query" from "Pandas Cookbook"; my office Python environment predates Pandas 0.18.0). "Behind" is debatable - or moot: you have bits in one book, and not the other, either way you look - and "far" is false.

The upshot is that you can get a good handle on Pandas with either reference.

The word "reference" fits PDA better - it has a methodical, clearly structured, but somewhat terse style, reminding me of O'Reilly's "Nutshell" series. PC, on the other hand, is pretty relaxed, and goes at a slower pace, with illustrations that are much more likely to stay with you than McKinney's, because (a) they use real datasets, as opposed to quick artificial ones, (b) often are part of a sequence of steps, providing context and identifying the use case. Packt's no-frills typesetting puts PC at a disadvantage, but it is not too bad.

Comparing "Pandas Cookbook" to what was available before, I see and appreciate the qualitative change from (a) reductive digests of McKinney's book, to (b) something that builds on, and complements, McKinney's book. If Chen's book does the same, the Pandas newbie will get even more options. For now, kudos to Ted Petrou for an original and useful book.
Amazon Verified review

Risiko Lektor May 27, 2019
Rating: 5 out of 5
This is certainly a great book. Like something we've been waiting for or at least I've been. It has clear structure and very knowledgeable content. Everything is clearly presented and above all it contains lots of stuff that is not presented elsewhere (at least to my knowledge). It's a little thick but that's alright. It is now my faithful companion on my desk to Pandas.
Amazon Verified review

Petter Dischington Dec 24, 2017
Rating: 5 out of 5
This is a comprehensive volume covering many aspects of the Pandas data science library for the Python programming language. It's not merely a reference work, but can also be used as a way to learn effective data analysis methodology as you are following the instructions. The book finds that holy middle ground between being a complete beginner's guide, a reference guide, and the specific guidance offered by a Stack Overflow answer.

In my experience Stack Overflow can be too specific if you are unsure about what exactly you need to do to your data in order to prepare it for further analysis, and that is really where this book shines. It explains the whys as well as the hows, and the author will give explanations of what are the most efficient ways to do so from a human workload/intuition perspective, as well as what is most computationally efficient, highlighting the preceding and following steps.

Finally, the author himself is very accessible in case of problems, and there are a number of supporting materials to the book available.
Amazon Verified review

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made using Credit Card, Debit Card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats do Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are priced lower than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.