Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Pandas Cookbook
Pandas Cookbook

Pandas Cookbook: Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python

eBook
$43.99
Paperback
$54.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Table of content icon View table of contents Preview book icon Preview Book

Pandas Cookbook

Essential DataFrame Operations

In this chapter, we will cover the following topics:

  • Selecting multiple DataFrame columns
  • Selecting columns with methods
  • Ordering column names sensibly
  • Operating on the entire DataFrame
  • Chaining DataFrame methods together
  • Working with operators on a DataFrame
  • Comparing missing values
  • Transposing the direction of a DataFrame operation
  • Determining college campus diversity

Introduction

This chapter covers many fundamental operations of the DataFrame. Many of the recipes will be similar to those in Chapter 1, Pandas Foundations which primarily covered operations on a Series.

Selecting multiple DataFrame columns

Selecting a single column is accomplished by passing the desired column name as a string to the indexing operator of a DataFrame. This was covered in the Selecting a Series recipe in Chapter 1, Pandas Foundations. It is often necessary to focus on a subset of the current working dataset, which is accomplished by selecting multiple columns.

Getting ready

In this recipe, all the actor and director columns will be selected from the movie dataset.

How to do it...

  1. Read in the movie dataset, and pass in a list of the desired columns to the...

Selecting columns with methods

Although column selection is usually done directly with the indexing operator, there are some DataFrame methods that facilitate their selection in an alternative manner. select_dtypes and filter are two useful methods to do this.

Getting ready

You need to be familiar with all pandas data types and how to access them. The Understanding data types recipe in Chapter 1, Pandas Foundations, has a table with all pandas data types.

How it works...

  1. Read in the movie dataset, and use the title of the movie to label each row. Use the get_dtype_counts...

Ordering column names sensibly

One of the first tasks to consider after initially importing a dataset as a DataFrame is to analyze the order of the columns. This basic task is often overlooked but can make a big difference in how an analysis proceeds. Computers have no preference for column order and computations are not affected either. As human beings, we naturally view and read columns left to right, which directly impacts our interpretations of the data. Haphazard column arrangement is similar to haphazard clothes arrangement in a closet. It does no good to place suits next to shirts and pants on top of shorts. It's far easier to find and interpret information when column order is given consideration.

There are no standardized set of rules that dictate how columns should be organized within a dataset. However, it is good practice to develop a set of guidelines that you...

Operating on the entire DataFrame

In the Calling Series methods recipe in Chapter 1, Pandas Foundations, a variety of methods operated on a single column or Series of data. When these same methods are called from a DataFrame, they perform that operation for each column at once.

Getting ready

In this recipe, we explore a variety of the most common DataFrame attributes and methods with the movie dataset.

How to do it...

  1. Read in the movie dataset, and grab the basic descriptive attributes, shape, size, and ndim, along with running the len function:
>>> movie =...

Chaining DataFrame methods together

Whether you believe method chaining is a good practice or not, it is quite common to encounter it during data analysis with pandas. The Chaining Series methods together recipe in Chapter 1, Pandas Foundations, showcased several examples of chaining Series methods together. All the method chains in this chapter will begin from a DataFrame. One of the keys to method chaining is to know the exact object being returned during each step of the chain. In pandas, this will nearly always be a DataFrame, Series, or scalar value.

Getting ready

In this recipe, we count all the missing values in each column of the move dataset.

...

Working with operators on a DataFrame

A primer on operators was given in the Working with operators on a Series recipe from Chapter 1, Pandas Foundations, which will be helpful here. The Python arithmetic and comparison operators work directly on DataFrames, as they do on Series.

Getting ready

When a DataFrame operates directly with one of the arithmetic or comparison operators, each value of each column gets the operation applied to it. Typically, when an operator is used with a DataFrame, the columns are either all numeric or all object (usually strings). If the DataFrame does not contain homogeneous data, then the operation is likely to fail. Let's see an example of this failure with the college dataset, which contains...

Comparing missing values

Pandas uses the NumPy NaN (np.nan) object to represent a missing value. This is an unusual object, as it is not equal to itself. Even Python's None object evaluates as True when compared to itself:

>>> np.nan == np.nan
False
>>> None == None
True

All other comparisons against np.nan also return False, except not equal to:

>>> np.nan > 5
False
>>> 5 > np.nan
False
>>> np.nan != 5
True

Getting ready

Series and DataFrames use the equals operator, ==, to make element-by-element comparisons that return an object of the same size. This recipe shows you how to use the equals operator, which is very different from the equals method.

As in the previous recipe...

Transposing the direction of a DataFrame operation

Many DataFrame methods have an axis parameter. This important parameter controls the direction in which the operation takes place. Axis parameters can only be one of two values, either 0 or 1, and are aliased respectively as the strings index and columns.

Getting ready

Nearly all DataFrame methods default the axis parameter to 0/index. This recipe shows you how to invoke the same method, but with the direction of its operation transposed. To simplify the exercise, only the columns that reference the percentage race of each school from the college dataset will be used.

How to do...

Determining college campus diversity

Many articles are written every year on the different aspects and impacts of diversity on college campuses. Various organizations have developed metrics attempting to measure diversity. US News is a leader in providing rankings for many different categories of colleges, with diversity being one of them.

Their top 10 diverse colleges with Diversity Index are given as follows:

>> pd.read_csv('data/college_diversity.csv', index_col='School')

Getting ready

Our college dataset classifies race into nine different categories. When trying to quantify something without an obvious definition, such as diversity, it helps to start with something very simple. In this recipe...

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Use the power of pandas 0.20 to solve most complex scientific computing problems with ease
  • Leverage fast, robust data structures in pandas 0.20 to gain useful insights from your data
  • Practical, easy to implement recipes for quick solutions to common problems in data using pandas 0.20

Description

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas 0.20. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way. The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands like one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter. Many advanced recipes combine several different features across the pandas 0.20 library to generate results.

Who is this book for?

This book is for data scientists, analysts and Python developers who wish to explore data analysis and scientific computing in a practical, hands-on manner. The recipes included in this book are suitable for both novice and advanced users, and contain helpful tips, tricks and caveats wherever necessary. Some understanding of pandas will be helpful, but not mandatory.

What you will learn

  • Master the fundamentals of pandas 0.20 to quickly begin exploring any dataset
  • Isolate any subset of data by properly selecting and querying the data
  • Split data into independent groups before applying aggregations and transformations to each group
  • Restructure data into tidy form to make data analysis and visualization easier
  • Prepare real-world messy datasets for machine learning
  • Combine and merge data from different sources through pandas SQL-like operations
  • Utilize pandas unparalleled time series functionality
  • Create beautiful and insightful visualizations through pandas 0.20 direct hooks to Matplotlib and Seaborn
Estimated delivery fee Deliver to South Korea

Standard delivery 10 - 13 business days

$12.95

Premium delivery 5 - 8 business days

$45.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Oct 23, 2017
Length: 532 pages
Edition : 1st
Language : English
ISBN-13 : 9781784393878
Category :
Languages :
Concepts :
Tools :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Estimated delivery fee Deliver to South Korea

Standard delivery 10 - 13 business days

$12.95

Premium delivery 5 - 8 business days

$45.95
(Includes tracking information)

Product Details

Publication date : Oct 23, 2017
Length: 532 pages
Edition : 1st
Language : English
ISBN-13 : 9781784393878
Category :
Languages :
Concepts :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 153.97
Learning pandas
$54.99
Pandas Cookbook
$54.99
Python Machine Learning, Second Edition
$43.99
Total $ 153.97 Stars icon

Table of Contents

11 Chapters
Pandas Foundations Chevron down icon Chevron up icon
Essential DataFrame Operations Chevron down icon Chevron up icon
Beginning Data Analysis Chevron down icon Chevron up icon
Selecting Subsets of Data Chevron down icon Chevron up icon
Boolean Indexing Chevron down icon Chevron up icon
Index Alignment Chevron down icon Chevron up icon
Grouping for Aggregation, Filtration, and Transformation Chevron down icon Chevron up icon
Restructuring Data into a Tidy Form Chevron down icon Chevron up icon
Combining Pandas Objects Chevron down icon Chevron up icon
Time Series Analysis Chevron down icon Chevron up icon
Visualization with Matplotlib, Pandas, and Seaborn Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.3
(32 Ratings)
5 star 75%
4 star 3.1%
3 star 6.3%
2 star 9.4%
1 star 6.3%
Filter icon Filter
Top Reviews

Filter reviews by




Tomas Dec 12, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This is an excellent book if you want to learn pandas and if you want to understand pandas. It covers all cases, clearly explains what and why pandas do, and the chapters are organized really well and it depends on you if you just want to stay on surface or go deeper.
Amazon Verified review Amazon
Scott B. Oct 26, 2017
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Ted Petrou in Pandas Cookbook gives real world problems easy to follow and implement recipes for data wrangling success. This books does an excellent job in highlighting the foundations of Pandas, critical operations in Pandas, how to use Pandas in data analysis, and data visualization with Matplotlib, Pandas and Seaborn libraries.Ted uses actual datasets in his examples. If you go to download supporting files with this book, you will get a copy of his Jupyter Notebooks for each chapter and over 30 real datasets for analysis.This is a must have book when I want to find actual solutions to problems with unruly data.
Amazon Verified review Amazon
Dimitri Shvorob Nov 12, 2017
Full star icon Full star icon Full star icon Full star icon Full star icon 5
A self-regarding preface, if I may. This is my second attempt at reviewing "Pandas Cookbook". The first one was written in a dyspeptic mood - thank you, Kindle Unlimited free trial, for exposing me to the horrors of self-published rip-offs - did not put its emphases right, and was in one regard simply misleading. This prompted criticism from the book's author, supported by detailed objections, and led me to reconsider. I know that there will be a Version 3, already in December, as I will want to compare "Pandas Cookbook" and Daniel Chen's "Pandas for Everyone", which I expect to be both similar and good. I will leave speculation at that, and focus on the present.Before "Pandas Cookbook", I had seen five books about Pandas:"Python for Data Analysis" by Wes McKinney, 2nd ed., 544 pages, 2017"Learning the Pandas Library" by Matt Harrison, 212 pages, 2016"Learning pandas" by Michael Heydt, 504 pages, Packt, 2015"Mastering pandas" by Femi Anthony, 364 pages, Packt, 2015"Python Data Analytics" by Fabio Nelli, 364 pages, Apress, 2015I can confidently say that (a) you don't need to consider books other than McKinney's and Petrou's, and (b) you want to see both, and possibly leave both, depending on your budget and personal preference.The one wrong suggestion in my original review was that PC was "far behind" PDA in terms of coverage. Having checked PDA, however, I realized that PDA did not have many things which I thought I learned from it, but in fact picked up from other sources - Pandas online doc, Stack Overflow, and, early on, Chris Albon's site. Surprisingly, the bread-and-butter "nunique" function, for example, is not in PDA, and neither is "filter" or "query". (I actually learned about "query" from "Pandas Cookbook"; my office Python environment predates Pandas 0.18.0). "Behind" is debatable - or moot: you have bits in one book, and not the other, either way you look - and "far" is false. The upshot is that you can get a good handle on Pandas with either reference.The word "reference" fits PDA better - it has a methodical, clearly structured, but somewhat terse style, reminding me of O'Reilly's "Nutshell" series. PC, on the other hand, is pretty relaxed, and goes at a slower pace, with illustrations that are much more likely to stay with you than McKinney's, because (a) they use real datasets, as opposed to quick artificial ones, (b) often are part of a sequence of steps, providing context and identifying the use case. Packt's no-frills typesetting puts PC at a disadvantage, but it is not too bad.Comparing "Pandas Cookbook" to what was available before, I see and appreciate the qualitative change from (a) reductive digests of McKinney's book, to (b) something that builds on, and complements, McKinney's book. If Chen's book does the same, the Pandas newbie will get even more options. For now, kudos to Ted Petrou for an original and useful book.
Amazon Verified review Amazon
Risiko Lektor May 27, 2019
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This is certainly a great book. Like something we've been waiting for or at least I've been. It has clear structure and very knowledgeable content. Everything is clearly presented and above all it contains lots of stuff that is not presented elsewhere (at least to my knowledge). It's a little thick but that's alright. It is now my faithful companion on my desk to Pandas.
Amazon Verified review Amazon
Petter Dischington Dec 24, 2017
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This is a comprehensive volume covering many aspects of the Pandas data science library for the Python programming language. It's not merely a reference work, but can also be used as a way to learn effective data analysis methodology as you are following the the instructions. The book finds that holy middle ground between being a complete beginners guide, a reference guide, and the specific guidance offered by a stack overflow answer.In my experience stack overflow can be too specific if you are unsure about what exactly you need to do to your data in order prepare it for further analysis, and that is really where this book shines. It explains the whys as well as the hows, and the author will give explanations of what are the most efficient ways to do so from a human workload/intuition perspective, as well as what is most computationally efficient, highlighting the preceding and following steps.Finally, the author himself is very accessible in case of problems, and there are a number of supporting materials to the book available.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela