Using groupby to change the unit of analysis of a DataFrame
The DataFrame that we created in the last step of the previous recipe was something of a fortunate by-product of our efforts to generate multiple summary statistics by groups. There are times when we really do need to aggregate data to change the unit of analysis—say, from monthly utility expenses per family to annual utility expenses per family, or from students’ grades per course to students’ overall Grade Point Average (GPA).
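As a rough illustration of the first example, here is a minimal sketch that collapses monthly utility expenses into annual expenses per family. The DataFrame, its column names (`familyid`, `year`, `month`, `expense`), and its values are hypothetical and exist only for this sketch.

```python
import pandas as pd

# Hypothetical data: one row per family per month (the original unit of analysis)
utilities = pd.DataFrame({
    "familyid": [1, 1, 1, 2, 2, 2],
    "year": [2023, 2023, 2023, 2023, 2023, 2024],
    "month": [1, 2, 3, 1, 2, 1],
    "expense": [120.0, 95.5, 110.0, 80.0, 85.0, 90.0],
})

# Collapse to one row per family per year by summing the monthly expenses
annual = (
    utilities.groupby(["familyid", "year"], as_index=False)["expense"]
    .sum()
    .rename(columns={"expense": "annual_expense"})
)
print(annual)
```

The new unit of analysis is the family-year: each row of `annual` now summarizes several monthly rows.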
groupby is a good tool for collapsing the unit of analysis, particularly when summary operations are required. When we only need to select unduplicated rows (perhaps the first or last row for each individual over a given interval), the combination of sort_values and drop_duplicates will do the trick. But we often need to do some calculation across the rows for each group before collapsing. That is when groupby comes in very handy.
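The contrast can be sketched roughly as follows, using hypothetical course-level grades (the columns `studentid`, `term`, `credits`, and `gradepoints`, and a 4-point scale, are assumptions for illustration). Keeping each student's latest row needs no calculation, so sort_values and drop_duplicates suffice; a credit-weighted GPA, by contrast, has to be computed across each student's rows before collapsing, which is where groupby fits.

```python
import pandas as pd

# Hypothetical data: one row per student per course
grades = pd.DataFrame({
    "studentid": [101, 101, 101, 102, 102],
    "term": [1, 1, 2, 1, 2],
    "credits": [3, 4, 3, 3, 3],
    "gradepoints": [4.0, 3.0, 2.0, 3.0, 4.0],
})

# Selecting unduplicated rows: keep one row per student from the latest term.
# No calculation across rows is needed, so sorting and dropping duplicates is enough.
latest = (
    grades.sort_values(["studentid", "term"])
    .drop_duplicates(subset="studentid", keep="last")
)

# Calculating across rows before collapsing: a credit-weighted GPA per student.
gpa = (
    grades.assign(weighted=grades["gradepoints"] * grades["credits"])
    .groupby("studentid")
    .agg(totalweighted=("weighted", "sum"), totalcredits=("credits", "sum"))
)
gpa["gpa"] = gpa["totalweighted"] / gpa["totalcredits"]
print(gpa["gpa"])
```

In this sketch, student 101 ends up with a GPA of 3.0 (30 weighted points over 10 credits) and student 102 with 3.5 (21 over 6).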
Getting ready
We will...