Introduction
One of the most fundamental tasks in data analysis is splitting data into independent groups and then performing a calculation on each group. This methodology has been around for quite some time, but has more recently been referred to as split-apply-combine. This chapter covers the powerful .groupby method, which allows you to group your data in any way imaginable and apply any type of function independently to each group before returning a single dataset.
Before we get started with the recipes, we will need to know just a little terminology. All basic groupby operations have grouping columns, and each unique combination of values in these columns represents an independent grouping of the data. The syntax looks as follows:
df.groupby(['list', 'of', 'grouping', 'columns'])
df.groupby('single_column') # when grouping by a single column
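As a minimal sketch of this terminology, the example below groups a small made-up DataFrame first by one column and then by two. The data and the column names (airline, origin, delay) are assumptions chosen for illustration and do not come from the recipes in this chapter.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "airline": ["AA", "AA", "UA", "UA", "UA"],
    "origin": ["DFW", "ORD", "ORD", "ORD", "SFO"],
    "delay": [10, 20, 5, 15, 30],
})

# Grouping by a single column: one group per unique airline.
single = df.groupby("airline")

# Grouping by a list of columns: one group per unique
# (airline, origin) combination that appears in the data.
multi = df.groupby(["airline", "origin"])

n_single = single.ngroups  # number of distinct airlines
n_multi = multi.ngroups    # number of distinct (airline, origin) pairs

# Apply a calculation to each group and combine the results
# into a single Series indexed by the grouping columns.
mean_delay = multi["delay"].mean()
print(mean_delay)
```

Here the two UA rows that share the origin ORD fall into the same group, so their delays are averaged together, while every other combination forms a group of its own.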
The result of calling the .groupby method is a groupby object...