Pandas Cookbook: Practical recipes for scientific computing, time series, and exploratory data analysis using Python, Third Edition

William Ayd, Matthew Harrison

Paperback, Oct 2024, 404 pages, 3rd Edition

pandas Foundations

The pandas library is useful for dealing with structured data. What is structured data? Data that is stored in tables, such as CSV files, Excel spreadsheets, or database tables, is all structured. Unstructured data consists of free-form text, images, sound, or video. If you find yourself dealing with structured data, pandas will be of great utility to you.

pd.Series is a one-dimensional collection of data. If you are coming from Excel, you can think of this as a column. The main difference is that, like a column in a database, all of the values within a pd.Series must share a single, homogeneous type.
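A minimal sketch of that single-type rule, assuming pandas 2.x default inference (the variable names are illustrative): when values of different kinds are mixed, pandas settles on one dtype for the whole pd.Series rather than tracking each value's type. Integers mixed with floats are upcast to float64, while integers mixed with strings fall back to the catch-all object dtype, which is still one dtype for the whole pd.Series.

```python
import pandas as pd

# Integers and floats share a common numeric type, so pandas upcasts to float64
mixed_numbers = pd.Series([1, 2.5])
print(mixed_numbers.dtype)

# Integers and strings have no common numeric type, so pandas falls back to object
mixed_types = pd.Series([1, "a"])
print(mixed_types.dtype)
```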

pd.DataFrame is a two-dimensional object. Much like an Excel sheet or database table can be thought of as a collection of columns, pd.DataFrame can be thought of as a collection of pd.Series objects. Each pd.Series has a homogeneous data type, but the pd.DataFrame is allowed to be heterogeneous and store a variety of pd.Series objects with different data types.
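The collection-of-columns view can be checked directly. In this sketch (the column names are hypothetical), selecting one column with square brackets returns a pd.Series with a single dtype, while the pd.DataFrame as a whole holds three different dtypes. The dtype reported for the string column differs between pandas versions, so the sketch only inspects the numeric columns by name.

```python
import pandas as pd

# Each dictionary value becomes one column (one pd.Series) of the frame
df = pd.DataFrame({
    "quantity": [1, 2, 3],
    "weight": [0.5, 1.5, 2.5],
    "label": ["a", "b", "c"],
})

# Selecting a single column hands back a pd.Series with one dtype
quantity = df["quantity"]
print(type(quantity).__name__)
print(quantity.dtype)

# The frame as a whole mixes dtypes across its columns
print(df.dtypes)
```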

pd.Index has no direct analog in other tools. Excel may offer the closest equivalent with the auto-numbered rows on the left-hand side of a worksheet, but those numbers are typically for display purposes only. pd.Index, as you will find over the course of this book, can be used for selecting values, joining tables, and much more.
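As a small taste of how pd.Index participates in combining data, here is a sketch of index alignment (the labels and variable names are hypothetical; later chapters treat this properly). When two pd.Series are added, pandas matches values by their index labels rather than by position, and a label that appears in only one operand yields a missing value (NaN).

```python
import pandas as pd

# Two Series whose labels only partially overlap
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([10, 20], index=["b", "c"])

# Values are matched by label: only "b" exists in both,
# so "a" and "c" come back as missing values
result = s1 + s2
print(result)
```

Because NaN is a floating-point value, the integer inputs come back as float64 in this result.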

The recipes in this chapter will show you how to manually construct pd.Series and pd.DataFrame objects, customize the pd.Index object(s) associated with each, and showcase common attributes of the pd.Series and pd.DataFrame that you may need to inspect during your analyses.

We are going to cover the following recipes in this chapter:

  • Importing pandas
  • Series
  • DataFrame
  • Index
  • Series attributes
  • DataFrame attributes

Importing pandas

Most users of the pandas library will use an import alias so they can refer to it as pd. In general, in this book, we will not show the pandas and NumPy imports, but they look like this:

import pandas as pd
import numpy as np

PyArrow is an optional dependency in the 2.x series of pandas, but many examples in this book also leverage it, and we assume it is imported as:

import pyarrow as pa

Series

The basic building block in pandas is a pd.Series, which is a one-dimensional array of data paired with a pd.Index. The index labels can be used as a simplistic way to look up values in the pd.Series, much like the Python dictionary built into the language uses key/value pairs (we will expand on this and much more pd.Index functionality in Chapter 2, Selection and Assignment).
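To make the dictionary comparison concrete, the sketch below (the variable name legs is hypothetical) builds a pd.Series straight from a Python dict. The keys become the pd.Index labels in insertion order, and a label can then be used to fetch its value much like a dictionary key.

```python
import pandas as pd

# Dictionary keys become the index labels; dictionary values become the data
legs = pd.Series({"dog": 4, "cat": 4, "human": 2})

# Look up a value by its label, much like indexing a dict by key
print(legs["human"])
print(list(legs.index))
```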

The following section demonstrates a few ways of creating a pd.Series directly.

How to do it

The easiest way to construct a pd.Series is to provide a sequence of values, like a list of integers:

pd.Series([0, 1, 2])
0    0
1    1
2    2
dtype: int64

A tuple is another type of sequence, making it valid as an argument to the pd.Series constructor:

pd.Series((12.34, 56.78, 91.01))
0    12.34
1    56.78
2    91.01
dtype: float64

When generating sample data, you may often reach for the Python range function:

pd.Series(range(0, 7, 2))
0    0
1    2
2    4
3    6
dtype: int64

In all of the examples so far, pandas tries to infer a proper data type from its arguments for you. However, there are times when you know more about the type and size of your data than can be inferred. Providing that information explicitly to pandas via the dtype= argument can save memory or ensure proper integration with other typed systems, like SQL databases.

To illustrate this, let’s use a simple range argument to fill a pd.Series with a sequence of integers. When we did this before, the inferred data type was a 64-bit integer, but we, as developers, may know that we never expect to store larger values in this pd.Series and would be fine with only 8 bits of storage (if you do not know the difference between an 8-bit and 64-bit integer, that topic will be covered in Chapter 3, Data Types). Passing dtype="int8" to the pd.Series constructor will let pandas know we want to use the smaller data type:

pd.Series(range(3), dtype="int8")
0    0
1    1
2    2
dtype: int8
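One way to see the storage savings is the pd.Series.nbytes attribute, which reports the size of the underlying values (the index is not counted). This sketch assumes 100 values; note that int8 can only hold values from -128 to 127, so a larger range would not fit in this dtype.

```python
import pandas as pd

# The same 100 values, stored with the inferred dtype and with an explicit 8-bit dtype
wide = pd.Series(range(100))
narrow = pd.Series(range(100), dtype="int8")

# 100 values * 8 bytes versus 100 values * 1 byte
print(wide.dtype, wide.nbytes)
print(narrow.dtype, narrow.nbytes)
```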

A pd.Series can also have a name attached to it, which can be specified via the name= argument (if not specified, the name defaults to None):

pd.Series(["apple", "banana", "orange"], name="fruit")
0     apple
1    banana
2    orange
Name: fruit, dtype: object
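When no name= argument is given, the name is None. The sketch below also uses pd.Series.rename, which this section does not cover: passing it a scalar returns a new pd.Series with that name and leaves the original untouched.

```python
import pandas as pd

# No name= argument, so the name defaults to None
unnamed = pd.Series(["apple", "banana", "orange"])
print(unnamed.name)

# rename() with a scalar returns a new Series carrying that name
named = unnamed.rename("fruit")
print(named.name)

# The original Series is unchanged
print(unnamed.name)
```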

DataFrame

While pd.Series is the building block, pd.DataFrame is the primary and most commonly used object in pandas; when people think of pandas, they typically envision working with a pd.DataFrame.

In most analysis workflows, you will be importing your data from another source, but for now, we will show you how to construct a pd.DataFrame directly (input/output will be covered in Chapter 4, The pandas I/O System).

How to do it

The most basic construction of a pd.DataFrame happens with a two-dimensional sequence, like a list of lists:

pd.DataFrame([
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
])
    0   1   2
0   0   1   2
1   3   4   5
2   6   7   8

With a list of lists, pandas will automatically number the row and column labels for you. Typically, users of pandas will at least provide labels for columns, as it makes indexing and selecting from a pd.DataFrame much more intuitive (see Chapter 2, Selection and Assignment, for an introduction to indexing and selecting). To label your columns when constructing a pd.DataFrame from a list of lists, you can provide a columns= argument to the constructor:

pd.DataFrame([
    [1, 2],
    [4, 8],
], columns=["col_a", "col_b"])
   col_a  col_b
0      1      2
1      4      8

Instead of using a list of lists, you could also provide a dictionary. The keys of the dictionary will be used as column labels, and the values of the dictionary will represent the values placed in that column of the pd.DataFrame:

pd.DataFrame({
    "first_name": ["Jane", "John"],
    "last_name": ["Doe", "Smith"],
})
  first_name last_name
0       Jane       Doe
1       John     Smith
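Because every column of a pd.DataFrame shares the same row index, the lists in the dictionary must all be the same length. This sketch (assuming pandas 2.x; the exact wording of the error may differ between versions) shows the ValueError raised when one list is shorter than the other.

```python
import pandas as pd

error_message = None
try:
    # "last_name" has one fewer value than "first_name"
    pd.DataFrame({
        "first_name": ["Jane", "John"],
        "last_name": ["Doe"],
    })
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```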

In the above example, our dictionary values were lists of strings, but the pd.DataFrame does not strictly require lists. Any sequence will work, including a pd.Series:

ser1 = pd.Series(range(3), dtype="int8", name="int8_col")
ser2 = pd.Series(range(3), dtype="int16", name="int16_col")
pd.DataFrame({ser1.name: ser1, ser2.name: ser2})
   int8_col  int16_col
0         0          0
1         1          1
2         2          2
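A quick way to confirm that the smaller integer types survive construction is to inspect the dtype of each resulting column. This sketch rebuilds the same pd.DataFrame and checks each column.

```python
import pandas as pd

ser1 = pd.Series(range(3), dtype="int8", name="int8_col")
ser2 = pd.Series(range(3), dtype="int16", name="int16_col")
df = pd.DataFrame({ser1.name: ser1, ser2.name: ser2})

# Each column keeps the dtype of the pd.Series it came from
print(df.dtypes)
```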

Index

When constructing both the pd.Series and pd.DataFrame objects in the previous sections, you likely noticed the values to the left of these objects starting at 0 and incrementing by 1 for each new row of data. The object responsible for those values is the pd.Index, highlighted in the following image:

Figure 1.1: Default pd.Index, highlighted in red

In the case of a pd.DataFrame, you have a pd.Index not only to the left of the object (often referred to as the row index or even just index) but also above (often referred to as the column index or columns):

Figure 1.2: A pd.DataFrame with a row and column index

Unless explicitly provided, pandas will create an auto-numbered pd.Index for you (technically, this is a pd.RangeIndex, a subclass of the pd.Index class). However, it is very rare to use pd.RangeIndex for your columns, as referring to a column named City or Date is more expressive than referring to a column in the nth position. The pd.RangeIndex appears more commonly in the row index, although you may still want custom labels to appear there as well. More advanced selection operations with the default pd.RangeIndex and custom pd.Index values will be covered in Chapter 2, Selection and Assignment, to help you understand different use cases, but for now, let’s just look at how you would override the construction of the row and column pd.Index objects during pd.Series and pd.DataFrame construction.
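The default can be confirmed by inspecting the type of each axis. This sketch assumes pandas 2.x; it shows that both the row and column labels of an unlabeled object are pd.RangeIndex instances (which are also pd.Index instances), while supplying labels yields a plain pd.Index.

```python
import pandas as pd

ser = pd.Series([4, 4, 2])
df = pd.DataFrame([[0, 1], [2, 3]])

# Without explicit labels, both axes get a pd.RangeIndex
print(type(ser.index).__name__)
print(type(df.columns).__name__)

# pd.RangeIndex is a subclass of pd.Index
print(isinstance(ser.index, pd.Index))

# Custom labels produce a regular pd.Index instead
labeled = pd.Series([4, 4, 2], index=["dog", "cat", "human"])
print(type(labeled.index).__name__)
```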

How to do it

When constructing a pd.Series, the easiest way to change the row index is by providing a sequence of labels to the index= argument. In this example, the labels dog, cat, and human will be used instead of the default pd.RangeIndex numbered from 0 to 2:

pd.Series([4, 4, 2], index=["dog", "cat", "human"])
dog          4
cat          4
human        2
dtype: int64

If you want finer control, you may want to construct the pd.Index yourself before passing it as an argument to index=. In the following example, the pd.Index is given the name animal, and the pd.Series itself is named num_legs, providing more context to the data:

index = pd.Index(["dog", "cat", "human"], name="animal")
pd.Series([4, 4, 2], name="num_legs", index=index)
animal
dog          4
cat          4
human        2
Name: num_legs, dtype: int64

A pd.DataFrame uses a pd.Index for both dimensions. Much like with the pd.Series constructor, the index= argument can be used to specify the row labels, but you now also have the columns= argument to control the column labels:

pd.DataFrame([
    [24, 180],
    [42, 166],
], columns=["age", "height_cm"], index=["Jack", "Jill"])
      age  height_cm
Jack   24        180
Jill   42        166
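The same options apply when a pd.DataFrame is built from a dictionary, except that the keys already provide the column labels, so only index= is needed. The sketch below is a variation on the example above with a pre-built, named pd.Index (the name person is illustrative); it also uses the label-based .loc accessor to look up one value (selection is covered in Chapter 2, Selection and Assignment).

```python
import pandas as pd

# Named row labels, built ahead of time
people = pd.Index(["Jack", "Jill"], name="person")

# The dictionary keys supply the column labels, so only index= is passed
df = pd.DataFrame(
    {"age": [24, 42], "height_cm": [180, 166]},
    index=people,
)

# Look up a single value by row label and column label
print(df.loc["Jill", "age"])
```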

Series attributes

Once you have a pd.Series, there are quite a few attributes you may want to inspect. The most basic attributes can tell you the type and size of your data, which is often the first thing you will inspect when reading in data from a data source.

How to do it

Let’s start by creating a pd.Series that has a name, alongside a custom pd.Index, which itself has a name. Although not all of these elements are required, having them will help us more clearly understand what the attributes we access through this recipe are actually showing us:

index = pd.Index(["dog", "cat", "human"], name="animal")
ser = pd.Series([4, 4, 2], name="num_legs", index=index)
ser
animal
dog      4
cat      4
human    2
Name: num_legs, dtype: int64

The first thing users typically want to know about their data is its data type. This can be inspected via the pd.Series.dtype attribute:

ser.dtype
dtype('int64')

The name may be inspected via the pd.Series.name attribute. The data we constructed in this recipe was created with the name="num_legs" argument, which is what you will see when accessing this attribute (if not provided, this will return None):

ser.name
'num_legs'

The associated pd.Index can be accessed via pd.Series.index:

ser.index
Index(['dog', 'cat', 'human'], dtype='object', name='animal')

The name of the associated pd.Index can be accessed via pd.Series.index.name:

ser.index.name
'animal'

The shape can be accessed via pd.Series.shape. For a one-dimensional pd.Series, the shape is returned as a one-tuple where the first element represents the number of rows:

ser.shape
(3,)

The size (number of elements) can be accessed via pd.Series.size:

ser.size
3

The Python built-in function len can show you the length (number of rows):

len(ser)
3
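For a pd.Series, these measures of size always agree with one another. The sketch below rebuilds the num_legs example and also prints pd.Series.ndim (not covered above), which reports the number of dimensions.

```python
import pandas as pd

index = pd.Index(["dog", "cat", "human"], name="animal")
ser = pd.Series([4, 4, 2], name="num_legs", index=index)

# For a one-dimensional pd.Series, shape[0], size, and len all agree
print(ser.shape)
print(ser.size)
print(len(ser))
print(ser.ndim)
```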

DataFrame attributes

The pd.DataFrame shares many of the attributes of the pd.Series, with some slight differences. Generally, pandas tries to share as many attributes as possible between the pd.Series and pd.DataFrame, but the two-dimensional nature of the pd.DataFrame makes it more natural to express some things in plural form (for example, the .dtype attribute becomes .dtypes) and gives us a few more attributes to inspect (for example, .columns exists for a pd.DataFrame but not for a pd.Series).

How to do it

Much like we did in the previous section, we are going to construct a pd.DataFrame with a custom pd.Index in the rows, while also using custom labels in the columns. This will be more helpful when inspecting the various attributes:

index = pd.Index(["Jack", "Jill"], name="person")
df = pd.DataFrame([
    [24, 180, "red"],
    [42, 166, "blue"],
], columns=["age", "height_cm", "favorite_color"], index=index)
df
        age  height_cm favorite_color
person
Jack     24        180            red
Jill     42        166           blue

The data type of each column can be inspected via the pd.DataFrame.dtypes attribute. This attribute returns a pd.Series where each row shows the data type of the corresponding column in our pd.DataFrame:

df.dtypes
age                int64
height_cm          int64
favorite_color     object
dtype: object
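Since pd.DataFrame.dtypes is itself a pd.Series indexed by column label, the dtype of any one column can be looked up by name. A minimal sketch using the same example data:

```python
import pandas as pd

index = pd.Index(["Jack", "Jill"], name="person")
df = pd.DataFrame([
    [24, 180, "red"],
    [42, 166, "blue"],
], columns=["age", "height_cm", "favorite_color"], index=index)

# df.dtypes is a pd.Series whose index holds the column labels
dtypes = df.dtypes
print(type(dtypes).__name__)
print(dtypes["age"])
```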

The row index can be accessed via pd.DataFrame.index:

df.index
Index(['Jack', 'Jill'], dtype='object', name='person')

The column index can be accessed via pd.DataFrame.columns:

df.columns
Index(['age', 'height_cm', 'favorite_color'], dtype='object')

The shape can be accessed via pd.DataFrame.shape. For a two-dimensional pd.DataFrame, the shape is returned as a two-tuple where the first element represents the number of rows and the second element represents the number of columns:

df.shape
(2, 3)

The size (number of elements) can be accessed via pd.DataFrame.size:

df.size
6

The Python built-in function len can show you the length (number of rows):

len(df)
2
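The three measures relate in a simple way for a pd.DataFrame: size is the product of the two elements of shape, while len only counts rows. A sketch using the same example data (the names n_rows and n_cols are illustrative):

```python
import pandas as pd

index = pd.Index(["Jack", "Jill"], name="person")
df = pd.DataFrame([
    [24, 180, "red"],
    [42, 166, "blue"],
], columns=["age", "height_cm", "favorite_color"], index=index)

# shape is a (rows, columns) two-tuple, so it can be unpacked directly
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# size counts every cell; len counts rows only
print(df.size == n_rows * n_cols)
print(len(df) == n_rows)
```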

Join our community on Discord

Join our community’s Discord space for discussions with the authors and other readers:

https://packt.link/pandas

Key benefits

  • This book targets features in pandas 2.x and beyond
  • Practical, easy-to-implement recipes for quick solutions to common problems in data using pandas
  • Master the fundamentals of pandas to quickly begin exploring any dataset

Description

The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands as one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through situations that you are highly likely to encounter. With this latest edition, unlock the full potential of pandas 2.x onwards. Whether you're a beginner or an experienced data analyst, this book offers a wealth of practical recipes to help you excel in your data analysis projects. This cookbook covers everything from fundamental data manipulation tasks to advanced techniques for handling big data, visualization, and more. Each recipe is designed to address common real-world challenges, providing clear explanations and step-by-step instructions to guide you through the process. Explore cutting-edge topics such as idiomatic pandas coding, efficient handling of large datasets, and advanced data visualization techniques. Whether you're looking to sharpen or expand your skills, the "Pandas Cookbook" is your essential companion for mastering data analysis and manipulation with pandas 2.x and beyond.

Who is this book for?

This book is for Python developers, data scientists, engineers, and analysts. pandas is the ideal tool for manipulating structured data with Python, and this book provides ample instruction and examples. Not only does it cover the basics required to be proficient, but it also goes into the details of idiomatic pandas.

What you will learn

  • The pandas type system and how to best navigate it
  • Import/export DataFrames to/from common data formats
  • Data exploration in pandas through dozens of practice problems
  • Grouping, aggregation, transformation, reshaping, and filtering data
  • Merge data from different sources through pandas SQL-like operations
  • Leverage the robust pandas time series functionality in advanced analyses
  • Scale pandas operations to get the most out of your system
  • The large ecosystem that pandas can coordinate with and supplement

Product Details

Publication date: Oct 31, 2024
Length: 404 pages
Edition: 3rd
Language: English
ISBN-13: 9781836205876

Table of Contents

12 Chapters

  • pandas Foundations
  • Selection and Assignment
  • Data Types
  • The pandas I/O System
  • Algorithms and How to Apply Them
  • Visualization
  • Reshaping DataFrames
  • Group By
  • Temporal Data Types and Algorithms
  • General Usage and Performance Tips
  • The pandas Ecosystem
  • Index
