To get the most out of this book
There are a couple of things you can do to get the most out of this book. First, and most importantly, you should download all the code, which is stored in Jupyter notebooks. While reading through each recipe, run each step of code in the notebook, and explore on your own as you go. Second, keep the official pandas documentation (http://pandas.pydata.org/pandas-docs/stable/) open in one of your browser tabs. The pandas documentation is an excellent resource containing over 1,000 pages of material. It has examples for most pandas operations, and these will often be directly linked from the See also section. While the documentation covers the basics of most operations, it does so with trivial examples and fake data that don’t reflect situations you are likely to encounter when analyzing real-world datasets.
What you need for this book
pandas is a third-party package for the Python programming language and, as of the printing of this book, is transitioning from the 2.x to the 3.x series. The examples in this book should work with a minimum pandas version of 2.0 along with Python versions 3.9 and above.
The code in this book will make use of the pandas, NumPy, and PyArrow libraries. Jupyter Notebook files are also a popular way to visualize and inspect code. All of these libraries should be installable via pip or the package manager of your choice. For pip users, you can run:
python -m pip install pandas numpy pyarrow notebook
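After installing, it can help to confirm that the interpreter and packages meet the minimums stated above (Python 3.9 and pandas 2.0). The following is a minimal sketch rather than part of the book's code bundle; the helper names parse_major_minor, installed_version, and environment_report are hypothetical, and the check relies only on the standard library's importlib.metadata:

```python
import re
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions taken from the text above.
MIN_PYTHON = (3, 9)
MIN_PANDAS = (2, 0)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a string such as '2.1.4' or '3.0.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the Python version and the versions of the packages installed above."""
    report: Dict[str, Optional[str]] = {
        "python": ".".join(str(part) for part in sys.version_info[:3])
    }
    for package in ("pandas", "numpy", "pyarrow", "notebook"):
        report[package] = installed_version(package)
    return report


report = environment_report()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")

# Compare against the minimums; a missing pandas counts as a failed check.
python_ok = sys.version_info[:2] >= MIN_PYTHON
pandas_ok = (
    report["pandas"] is not None
    and parse_major_minor(report["pandas"]) >= MIN_PANDAS
)
print("Python meets minimum:", python_ok)
print("pandas meets minimum:", pandas_ok)
```

Only the major and minor components are compared, so pre-release strings such as 3.0.0rc1 are still handled.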
Download the example code files
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support/errata and register to have the files emailed directly to you.
You can download the code files by following these steps:
- Log in or register at www.packt.com.
- Select the Support tab.
- Click on Code Downloads.
- Enter the name of the book in the Search box and follow the on-screen instructions.
The code bundle for the book is also hosted on GitHub at https://github.com/WillAyd/Pandas-Cookbook-Third-Edition. In case there is an update to the code, it will be updated in the existing GitHub repository.
Running a Jupyter notebook
The suggested method to work through the content of this book is to have a Jupyter notebook up and running so that you can run the code while reading through the recipes. Following along on your computer allows you to explore on your own and gain a deeper understanding than you would by reading the book alone.
After installing Jupyter Notebook, open a Command Prompt (type cmd at the search bar on Windows, or open Terminal on Mac or Linux) and type:
jupyter notebook
It is not necessary to run this command from your home directory. You can run it from any location, and the contents shown in the browser will reflect that location. Although this starts the Jupyter Notebook program, it does not yet open an individual notebook in which to develop Python code. To create one, click the New button on the right-hand side of the page, which drops down a list of the kernels available to you. If you are working from a fresh installation, you will have only a single kernel available (Python 3). After you select the Python 3 kernel, a new tab will open in the browser, where you can start writing Python code.
You can, of course, open previously created notebooks instead of beginning a new one. To do so, navigate through the filesystem provided in the Jupyter Notebook browser home page and select the notebook you want to open. All Jupyter Notebook files end in .ipynb.
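As a rough illustration of locating existing notebooks outside the browser, the sketch below uses the standard library's pathlib to list every .ipynb file beneath a directory. The function name find_notebooks is hypothetical, and skipping .ipynb_checkpoints folders is an assumption based on where Jupyter keeps its autosaved copies by default:

```python
from pathlib import Path
from typing import List


def find_notebooks(root: str = ".") -> List[Path]:
    """Return all .ipynb files under root, sorted, ignoring checkpoint copies."""
    return sorted(
        path
        for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


if __name__ == "__main__":
    # List notebooks below the directory the script is run from.
    for notebook in find_notebooks("."):
        print(notebook)
```

Running this from the folder where the book's code was downloaded should print the same notebooks that appear on the Jupyter home page for that location.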
Alternatively, you may use cloud providers for a notebook environment. Both Google and Microsoft provide free notebook environments that come preloaded with pandas.
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781836205876.
Conventions
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter/X handles. Here is an example: “You may need to install xlwt or openpyxl to write XLS or XLSX files, respectively.”
A block of code is set as follows:
import pandas as pd
import numpy as np
movies = pd.read_csv("data/movie.csv")
movies
Bold: Indicates an important word, or words that you see on the screen. Here is an example: “Select System info from the Administration panel.”
Italics: Indicates terminology that has extra importance within the context of the writing.
Important notes
Appear like this.
Tips
Appear like this.
Assumptions for every recipe
It should be assumed that at the beginning of each recipe, pandas, NumPy, and PyArrow are imported into the namespace:
import numpy as np
import pyarrow as pa
import pandas as pd
Dataset descriptions
There are about two dozen datasets that are used throughout this book. It can be very helpful to have background information on each dataset as you complete the steps in the recipes. A detailed description of each dataset is available in the dataset_descriptions Jupyter notebook at https://github.com/WillAyd/Pandas-Cookbook-Third-Edition. For each dataset, there is a list of the columns, information about each column, and notes on how the data was procured.