Using groupby to organize data by groups
At some point in most data analysis projects, we have to generate summary statistics by group. While this can be done using the approaches in the previous recipe, in most cases the pandas DataFrame groupby method is a better choice. If groupby can handle an aggregation task, and it usually can, it is likely the most efficient way to accomplish that task. We make good use of groupby in the remaining recipes in this chapter. We go over the basics in this recipe.
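As a minimal sketch of the idea, the following example groups a small made-up DataFrame and computes summary statistics for each group. The data and the column names (country, new_cases) are hypothetical and are not taken from the dataset used in this recipe.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
cases = pd.DataFrame({
    "country": ["A", "A", "B", "B", "B"],
    "new_cases": [10, 20, 5, 15, 40],
})

# groupby returns a DataFrameGroupBy object; no statistics are computed yet.
by_country = cases.groupby("country")

# Aggregate one column per group: row count, mean, and sum of new_cases.
summary = by_country["new_cases"].agg(["count", "mean", "sum"])
print(summary)
```

The result has one row per group and one column per statistic, which is the shape we usually want for summaries by group.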
Getting ready
We will work with the COVID-19 daily data in this recipe.
How to do it…
We will create a pandas groupby DataFrame and use it to generate summary statistics by group:
- Import pandas and numpy, and load the COVID-19 daily case data:
  >>> import pandas as pd
  >>> import numpy as np
  >>> coviddaily = pd.read_csv("data/coviddaily720.csv", parse_dates=["casedate"])
- Create a pandas groupby DataFrame...