Pandas Cookbook: Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python


Essential DataFrame Operations

In this chapter, we will cover the following topics:

  • Selecting multiple DataFrame columns
  • Selecting columns with methods
  • Ordering column names sensibly
  • Operating on the entire DataFrame
  • Chaining DataFrame methods together
  • Working with operators on a DataFrame
  • Comparing missing values
  • Transposing the direction of a DataFrame operation
  • Determining college campus diversity

Introduction

This chapter covers many fundamental operations of the DataFrame. Many of the recipes will be similar to those in Chapter 1, Pandas Foundations, which primarily covered operations on a Series.

Selecting multiple DataFrame columns

Selecting a single column is accomplished by passing the desired column name as a string to the indexing operator of a DataFrame. This was covered in the Selecting a Series recipe in Chapter 1, Pandas Foundations. It is often necessary to focus on a subset of the current working dataset, which is accomplished by selecting multiple columns.

Getting ready

In this recipe, all the actor and director columns will be selected from the movie dataset.

How to do it...

  1. Read in the movie dataset, and pass in a list of the desired columns to the...
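As a minimal sketch of the idea, the snippet below selects several columns by passing a list to the indexing operator. The small DataFrame is a hypothetical stand-in for the movie dataset, and its column names and values are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset; column names and values are assumed.
movie = pd.DataFrame({
    "title": ["Film A", "Film B"],
    "actor_1_name": ["Actor One", "Actor Two"],
    "actor_2_name": ["Actor Three", "Actor Four"],
    "director_name": ["Director X", "Director Y"],
    "duration": [120, 95],
})

# A single string returns a Series; a list of strings returns a DataFrame.
single = movie["director_name"]
people = movie[["actor_1_name", "actor_2_name", "director_name"]]

print(type(single).__name__)    # Series
print(type(people).__name__)    # DataFrame
print(people.columns.tolist())
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is an ordinary Python list.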

Selecting columns with methods

Although column selection is usually done directly with the indexing operator, there are some DataFrame methods that facilitate their selection in an alternative manner. select_dtypes and filter are two useful methods to do this.

Getting ready

You need to be familiar with all pandas data types and how to access them. The Understanding data types recipe in Chapter 1, Pandas Foundations, has a table with all pandas data types.

How to do it...

  1. Read in the movie dataset, and use the title of the movie to label each row. Use the get_dtype_counts...
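The following is a rough sketch of select_dtypes and filter on a tiny hypothetical DataFrame; the column names are assumptions, not the real movie dataset. Note that get_dtype_counts belongs to the pandas 0.20 era and is not available in recent pandas releases, so the sketch counts data types with dtypes.value_counts() instead.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset; column names and values are assumed.
movie = pd.DataFrame({
    "movie_title": ["Film A", "Film B"],
    "actor_1_name": ["Actor One", "Actor Two"],
    "actor_1_facebook_likes": [1000, 250],
    "duration": [120.0, 95.0],
}).set_index("movie_title")  # label each row with the movie title

# Count how many columns have each data type.
dtype_counts = movie.dtypes.value_counts()

# Select columns by data type.
numeric = movie.select_dtypes(include="number")

# Select columns by name: substring match, then regular expression.
by_substring = movie.filter(like="actor")
by_regex = movie.filter(regex=r"likes$")

print(numeric.columns.tolist())
print(by_substring.columns.tolist())
print(by_regex.columns.tolist())
```

Both methods return a DataFrame, even when only one column matches.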

Ordering column names sensibly

One of the first tasks to consider after initially importing a dataset as a DataFrame is to analyze the order of the columns. This basic task is often overlooked but can make a big difference in how an analysis proceeds. Computers have no preference for column order and computations are not affected either. As human beings, we naturally view and read columns left to right, which directly impacts our interpretations of the data. Haphazard column arrangement is similar to haphazard clothes arrangement in a closet. It does no good to place suits next to shirts and pants on top of shorts. It's far easier to find and interpret information when column order is given consideration.

There is no standardized set of rules that dictates how columns should be organized within a dataset. However, it is good practice to develop a set of guidelines that you...
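One way to apply such a guideline in code is sketched below. The rule used here (text columns first, numeric columns after) is only an assumed example of a guideline, and the DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical data with a haphazard column order.
df = pd.DataFrame({
    "budget": [1.5, 2.0],
    "title": ["Film A", "Film B"],
    "year": [2001, 2005],
    "director_name": ["Director X", "Director Y"],
})

# Assumed guideline: descriptive (non-numeric) columns first, numeric columns last.
text_cols = df.select_dtypes(exclude="number").columns.tolist()
numeric_cols = df.select_dtypes(include="number").columns.tolist()
new_order = text_cols + numeric_cols

# Guard against accidentally dropping or adding a column while reordering.
if set(new_order) != set(df.columns):
    raise ValueError("new column order does not match the original columns")

reordered = df[new_order]
print(reordered.columns.tolist())
```

Passing the reordered list to the indexing operator returns a new DataFrame and leaves the original untouched.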

Operating on the entire DataFrame

In the Calling Series methods recipe in Chapter 1, Pandas Foundations, a variety of methods operated on a single column or Series of data. When these same methods are called from a DataFrame, they perform that operation for each column at once.

Getting ready

In this recipe, we explore a variety of the most common DataFrame attributes and methods with the movie dataset.
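To get a feel for the shape, size, and ndim attributes and the len function before loading the real data, here is a minimal sketch on a tiny hypothetical DataFrame. The columns and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movie dataset; values are made up.
movie = pd.DataFrame({
    "duration": [120.0, np.nan, 95.0],
    "year": [2001, 2005, 2010],
})

print(movie.shape)  # (3, 2): rows, columns
print(movie.size)   # 6: total number of cells
print(movie.ndim)   # 2: number of dimensions
print(len(movie))   # 3: number of rows

# Methods familiar from Series run once per column and return a Series.
non_missing = movie.count()
column_max = movie.max()
print(non_missing)
```

The count method skips missing values, which is why the two columns can report different totals.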

How to do it...

  1. Read in the movie dataset, and grab the basic descriptive attributes, shape, size, and ndim, along with running the len function:
>>> movie =...

Chaining DataFrame methods together

Whether you believe method chaining is a good practice or not, it is quite common to encounter it during data analysis with pandas. The Chaining Series methods together recipe in Chapter 1, Pandas Foundations, showcased several examples of chaining Series methods together. All the method chains in this chapter will begin from a DataFrame. One of the keys to method chaining is to know the exact object being returned during each step of the chain. In pandas, this will nearly always be a DataFrame, Series, or scalar value.

Getting ready

In this recipe, we count all the missing values in each column of the movie dataset.
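A minimal sketch of such a chain is shown below on hypothetical data; the comments track the object returned at each step, which is the key point made above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movie dataset; values are made up.
movie = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "duration": [120.0, np.nan, 95.0],
    "gross": [np.nan, np.nan, 1.0e6],
})

# DataFrame -> DataFrame of booleans -> Series of counts per column
missing_per_column = movie.isnull().sum()

# Chaining one more sum collapses the Series to a scalar.
total_missing = movie.isnull().sum().sum()

# DataFrame -> Series of booleans -> single boolean
any_missing = movie.isnull().any().any()

print(missing_per_column)
print(total_missing)  # 3
```

Summing booleans works because True counts as 1 and False as 0.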

...

Working with operators on a DataFrame

A primer on operators was given in the Working with operators on a Series recipe from Chapter 1, Pandas Foundations, which will be helpful here. The Python arithmetic and comparison operators work directly on DataFrames, as they do on Series.

Getting ready

When a DataFrame operates directly with one of the arithmetic or comparison operators, each value of each column gets the operation applied to it. Typically, when an operator is used with a DataFrame, the columns are either all numeric or all object (usually strings). If the DataFrame does not contain homogeneous data, then the operation is likely to fail. Let's see an example of this failure with the college dataset, which contains...
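The behavior can be sketched on a small hypothetical DataFrame rather than the college dataset; the column names below are assumptions. The mixed-type case is expected to raise a TypeError, because a string cannot be added to a number.

```python
import pandas as pd

# Hypothetical all-numeric DataFrame; column names are assumed.
numeric = pd.DataFrame({"pct_a": [0.25, 0.5], "pct_b": [0.75, 0.5]})

# Arithmetic and comparison operators apply to every value of every column.
shifted = numeric + 1
is_majority = numeric > 0.5

# Adding a string column makes the data heterogeneous.
mixed = numeric.assign(name=["School A", "School B"])
try:
    mixed + 1
    failed = False
except TypeError:
    # Adding a number to a string is not defined.
    failed = True

print(shifted)
print(is_majority)
print(failed)
```

A common workaround is to select only the homogeneous columns, for example with select_dtypes, before applying the operator.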

Comparing missing values

Pandas uses the NumPy NaN (np.nan) object to represent a missing value. This is an unusual object, as it is not equal to itself. Even Python's None object evaluates as True when compared to itself:

>>> np.nan == np.nan
False
>>> None == None
True

All other comparisons against np.nan also return False, except not equal to:

>>> np.nan > 5
False
>>> 5 > np.nan
False
>>> np.nan != 5
True

Getting ready

Series and DataFrames use the equals operator, ==, to make element-by-element comparisons that return an object of the same size. This recipe shows you how to use the equals operator, which is very different from the equals method.

As in the previous recipe...
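The difference between the equals operator and the equals method can be sketched as follows. The DataFrame is hypothetical and exists only to place one missing value in a known position.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a single missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# The == operator compares element by element; NaN never equals NaN.
elementwise = df == df
all_equal = elementwise.all().all()

# Comparing against np.nan directly is False everywhere, so use isna instead.
nan_matches = (df == np.nan).sum().sum()
missing = df.isna()

# The equals method treats missing values in the same location as equal.
same = df.equals(df)

print(elementwise)
print(all_equal)  # False
print(same)       # True
```

In short, use isna (or isnull) to find missing values and the equals method to test whether two DataFrames are identical.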

Transposing the direction of a DataFrame operation

Many DataFrame methods have an axis parameter. This important parameter controls the direction in which the operation takes place. It accepts only one of two values, 0 or 1, which are aliased as the strings index and columns, respectively.

Getting ready

Nearly all DataFrame methods default the axis parameter to 0/index. This recipe shows you how to invoke the same method, but with the direction of its operation transposed. To simplify the exercise, only the columns that reference the percentage race of each school from the college dataset will be used.
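As a minimal sketch, the snippet below calls the same methods with the default axis and then with the direction transposed. The data is hypothetical and the column names are assumptions, not the real college dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical percentage columns for three schools; names and values are assumed.
df = pd.DataFrame(
    {"pct_a": [0.2, 0.5, np.nan], "pct_b": [0.8, 0.5, 0.3]},
    index=["School A", "School B", "School C"],
)

# Default axis=0 / 'index': one result per column.
per_column = df.count()

# axis=1 / 'columns': one result per row.
per_row = df.count(axis="columns")
row_sums = df.sum(axis=1)

print(per_column)
print(per_row)
print(row_sums)
```

The result is always labeled by the axis that survives: column names for the default direction, row labels when axis is 1 or columns.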

How to do...

Determining college campus diversity

Many articles are written every year on the different aspects and impacts of diversity on college campuses. Various organizations have developed metrics attempting to measure diversity. US News is a leader in providing rankings for many different categories of colleges, with diversity being one of them.

Their top 10 most diverse colleges, along with each college's Diversity Index, are given as follows:

>>> pd.read_csv('data/college_diversity.csv', index_col='School')

Getting ready

Our college dataset classifies race into nine different categories. When trying to quantify something without an obvious definition, such as diversity, it helps to start with something very simple. In this recipe...


Key benefits

  • Use the power of pandas 0.20 to solve the most complex scientific computing problems with ease
  • Leverage fast, robust data structures in pandas 0.20 to gain useful insights from your data
  • Practical, easy-to-implement recipes for quick solutions to common data problems using pandas 0.20

Description

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas 0.20. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way. The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands like one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter. Many advanced recipes combine several different features across the pandas 0.20 library to generate results.

Who is this book for?

This book is for data scientists, analysts and Python developers who wish to explore data analysis and scientific computing in a practical, hands-on manner. The recipes included in this book are suitable for both novice and advanced users, and contain helpful tips, tricks and caveats wherever necessary. Some understanding of pandas will be helpful, but not mandatory.

What you will learn

  • Master the fundamentals of pandas 0.20 to quickly begin exploring any dataset
  • Isolate any subset of data by properly selecting and querying the data
  • Split data into independent groups before applying aggregations and transformations to each group
  • Restructure data into tidy form to make data analysis and visualization easier
  • Prepare real-world messy datasets for machine learning
  • Combine and merge data from different sources through pandas' SQL-like operations
  • Utilize pandas' unparalleled time series functionality
  • Create beautiful and insightful visualizations through pandas 0.20's direct hooks to Matplotlib and Seaborn

Product Details

Publication date: Oct 23, 2017
Length: 532 pages
Edition: 1st
Language: English
ISBN-13: 9781784393878

Table of Contents

11 Chapters
Pandas Foundations
Essential DataFrame Operations
Beginning Data Analysis
Selecting Subsets of Data
Boolean Indexing
Index Alignment
Grouping for Aggregation, Filtration, and Transformation
Restructuring Data into a Tidy Form
Combining Pandas Objects
Time Series Analysis
Visualization with Matplotlib, Pandas, and Seaborn

Customer reviews

Rating distribution
4.3 out of 5 (32 Ratings)
5 star 75%
4 star 3.1%
3 star 6.3%
2 star 9.4%
1 star 6.3%

Top Reviews

Tomas Dec 12, 2018
Rating: 5 out of 5
This is an excellent book if you want to learn pandas and if you want to understand pandas. It covers all cases, clearly explains what and why pandas do, and the chapters are organized really well and it depends on you if you just want to stay on surface or go deeper.
Amazon Verified review
Scott B. Oct 26, 2017
Rating: 5 out of 5
Ted Petrou in Pandas Cookbook gives real world problems easy to follow and implement recipes for data wrangling success. This book does an excellent job in highlighting the foundations of Pandas, critical operations in Pandas, how to use Pandas in data analysis, and data visualization with Matplotlib, Pandas and Seaborn libraries.

Ted uses actual datasets in his examples. If you go to download supporting files with this book, you will get a copy of his Jupyter Notebooks for each chapter and over 30 real datasets for analysis.

This is a must have book when I want to find actual solutions to problems with unruly data.
Amazon Verified review
Dimitri Shvorob Nov 12, 2017
Rating: 5 out of 5
A self-regarding preface, if I may. This is my second attempt at reviewing "Pandas Cookbook". The first one was written in a dyspeptic mood - thank you, Kindle Unlimited free trial, for exposing me to the horrors of self-published rip-offs - did not put its emphases right, and was in one regard simply misleading. This prompted criticism from the book's author, supported by detailed objections, and led me to reconsider. I know that there will be a Version 3, already in December, as I will want to compare "Pandas Cookbook" and Daniel Chen's "Pandas for Everyone", which I expect to be both similar and good. I will leave speculation at that, and focus on the present.

Before "Pandas Cookbook", I had seen five books about Pandas:

  • "Python for Data Analysis" by Wes McKinney, 2nd ed., 544 pages, 2017
  • "Learning the Pandas Library" by Matt Harrison, 212 pages, 2016
  • "Learning pandas" by Michael Heydt, 504 pages, Packt, 2015
  • "Mastering pandas" by Femi Anthony, 364 pages, Packt, 2015
  • "Python Data Analytics" by Fabio Nelli, 364 pages, Apress, 2015

I can confidently say that (a) you don't need to consider books other than McKinney's and Petrou's, and (b) you want to see both, and possibly leave both, depending on your budget and personal preference.

The one wrong suggestion in my original review was that PC was "far behind" PDA in terms of coverage. Having checked PDA, however, I realized that PDA did not have many things which I thought I learned from it, but in fact picked up from other sources - Pandas online doc, Stack Overflow, and, early on, Chris Albon's site. Surprisingly, the bread-and-butter "nunique" function, for example, is not in PDA, and neither is "filter" or "query". (I actually learned about "query" from "Pandas Cookbook"; my office Python environment predates Pandas 0.18.0). "Behind" is debatable - or moot: you have bits in one book, and not the other, either way you look - and "far" is false.

The upshot is that you can get a good handle on Pandas with either reference.

The word "reference" fits PDA better - it has a methodical, clearly structured, but somewhat terse style, reminding me of O'Reilly's "Nutshell" series. PC, on the other hand, is pretty relaxed, and goes at a slower pace, with illustrations that are much more likely to stay with you than McKinney's, because (a) they use real datasets, as opposed to quick artificial ones, (b) often are part of a sequence of steps, providing context and identifying the use case. Packt's no-frills typesetting puts PC at a disadvantage, but it is not too bad.

Comparing "Pandas Cookbook" to what was available before, I see and appreciate the qualitative change from (a) reductive digests of McKinney's book, to (b) something that builds on, and complements, McKinney's book. If Chen's book does the same, the Pandas newbie will get even more options. For now, kudos to Ted Petrou for an original and useful book.
Amazon Verified review
Risiko Lektor May 27, 2019
Rating: 5 out of 5
This is certainly a great book. Like something we've been waiting for or at least I've been. It has clear structure and very knowledgeable content. Everything is clearly presented and above all it contains lots of stuff that is not presented elsewhere (at least to my knowledge). It's a little thick but that's alright. It is now my faithful companion on my desk to Pandas.
Amazon Verified review
Petter Dischington Dec 24, 2017
Rating: 5 out of 5
This is a comprehensive volume covering many aspects of the Pandas data science library for the Python programming language. It's not merely a reference work, but can also be used as a way to learn effective data analysis methodology as you are following the instructions. The book finds that holy middle ground between being a complete beginner's guide, a reference guide, and the specific guidance offered by a stack overflow answer.

In my experience stack overflow can be too specific if you are unsure about what exactly you need to do to your data in order to prepare it for further analysis, and that is really where this book shines. It explains the whys as well as the hows, and the author will give explanations of what are the most efficient ways to do so from a human workload/intuition perspective, as well as what is most computationally efficient, highlighting the preceding and following steps.

Finally, the author himself is very accessible in case of problems, and there are a number of supporting materials to the book available.
Amazon Verified review