You're reading from Pandas 1.x Cookbook Practical recipes for scientific computing, time series analysis, and exploratory data analysis using Python

Product type Paperback

Published in Feb 2020

Publisher Packt

ISBN-13 9781839213106

Length 626 pages

Edition 2nd Edition

Languages

Python

Tools

Pandas

Concepts

Data Analysis

Authors (2):

Theodore Petrou

Matthew Harrison

View More author details

Table of Contents (17) Chapters

Preface

1. Pandas Foundations

2. Essential DataFrame Operations FREE CHAPTER

3. Creating and Persisting DataFrames

4. Beginning Data Analysis

5. Exploratory Data Analysis

6. Selecting Subsets of Data

7. Filtering Rows

8. Index Alignment

9. Grouping for Aggregation, Filtration, and Transformation

10. Restructuring Data into a Tidy Form

11. Combining Pandas Objects

12. Time Series Analysis

13. Visualization with Matplotlib, Pandas, and Seaborn

14. Debugging and Testing Pandas

15. Other Books You May Enjoy

16. Index

To get the most out of this book

There are a couple of things you can do to get the most out of this book. First, and most importantly, you should download all the code, which is stored in Jupyter Notebooks. While reading through each recipe, run each step of code in the notebook. Make sure you explore on your own as you run through the code. Second, have the pandas official documentation open (http://pandas.pydata.org/pandas-docs/stable/) in one of your browser tabs. The pandas documentation is an excellent resource containing over 1,000 pages of material. There are examples for most of the pandas operations in the documentation, and they will often be directly linked from the See also section. While it covers the basics of most operations, it does so with trivial examples and fake data that don't reflect situations that you are likely to encounter when analyzing datasets from the real world.

What you need for this book

pandas is a third-party package for the Python programming language and, as of the printing of this book, is on version 1.0.1. Currently, Python is at version 3.8. The examples in this book should work fine in versions 3.6 and above.

There are a wide variety of ways in which you can install pandas and the rest of the libraries mentioned on your computer, but an easy method is to install the Anaconda distribution. Created by Anaconda, it packages together all the popular libraries for scientific computing in a single downloadable file available on Windows, macOS, and Linux. Visit the download page to get the Anaconda distribution (https://www.anaconda.com/distribution).

In addition to all the scientific computing libraries, the Anaconda distribution comes with Jupyter Notebook, which is a browser-based program for developing in Python, among many other languages. All of the recipes for this book were developed inside of a Jupyter Notebook and all of the individual notebooks for each chapter will be available for you to use.

It is possible to install all the necessary libraries for this book without the use of the Anaconda distribution. For those that are interested, visit the pandas installation page (http://pandas.pydata.org/pandas-docs/stable/install.html).

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support/errata and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packt.com.
Select the Support tab.
Click on Code Downloads.
Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Pandas-Cookbook-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Running a Jupyter Notebook

The suggested method to work through the content of this book is to have a Jupyter Notebook up and running so that you can run the code while reading through the recipes. Following along on your computer allows you to go off exploring on your own and gain a deeper understanding than by just reading the book alone.

Assuming that you have installed the Anaconda distribution on your machine, you have two options available to start the Jupyter Notebook, from the Anaconda GUI or the command line. I highly encourage you to use the command line. If you are going to be doing much with Python, you will need to feel comfortable from there.

After installing Anaconda, open a command prompt (type cmd at the search bar on Windows, or open a Terminal on Mac or Linux) and type:

$ jupyter-notebook

It is not necessary to run this command from your home directory. You can run it from any location, and the contents in the browser will reflect that location.

Although we have now started the Jupyter Notebook program, we haven't actually launched a single individual notebook where we can start developing in Python. To do so, you can click on the New button on the right-hand side of the page, which will drop down a list of all the possible kernels available for you to use. If you just downloaded Anaconda, then you will only have a single kernel available to you (Python 3). After selecting the Python 3 kernel, a new tab will open in the browser, where you can start writing Python code.

You can, of course, open previously created notebooks instead of beginning a new one. To do so, navigate through the filesystem provided in the Jupyter Notebook browser home page and select the notebook you want to open. All Jupyter Notebook files end in .ipynb.

Alternatively, you may use cloud providers for a notebook environment. Both Google and Microsoft provide free notebook environments that come preloaded with pandas.

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781839213106_ColorImages.pdf.

Conventions

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "You may need to install xlwt or openpyxl to write XLS or XLSX files respectively."

A block of code is set as follows:

import pandas as pd
import numpy as np
movies = pd.read_csv("data/movie.csv")
movies

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

import pandas as pd
import numpy as np
movies = pd.read_csv("data/movie.csv")
movies

Any command-line input or output is written as follows:

>>> employee = pd.read_csv('data/employee.csv')
>>> max_dept_salary = employee.groupby('DEPARTMENT')['BASE_SALARY'].max()

Bold: Indicates a new term, an important word, or words that you see on the screen, for example, in menus or dialog boxes, also appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Assumptions for every recipe

It should be assumed that at the beginning of each recipe pandas, NumPy, and matplotlib are imported into the namespace. For plots to be embedded directly within the notebook, you must also run the magic command %matplotlib inline. Also, all data is stored in the data directory and is most commonly stored as a CSV file, which can be read directly with the read_csv function:

>>> %matplotlib inline
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> my_dataframe = pd.read_csv('data/dataset_name.csv')

Dataset descriptions

There are about two dozen datasets that are used throughout this book. It can be very helpful to have background information on each dataset as you complete the steps in the recipes. A detailed description of each dataset may be found in the dataset_descriptions Jupyter Notebook found at https://github.com/PacktPublishing/Pandas-Cookbook-Second-Edition. For each dataset, there will be a list of the columns, information about each column and notes on how the data was procured.