Group By
One of the most fundamental tasks in data analysis is splitting data into independent groups and then performing a calculation on each group. The approach has been around for a long time, but more recently it has come to be known as split-apply-combine.
Within the apply step of the split-apply-combine paradigm, it also helps to know whether we are performing a reduction (also referred to as an aggregation) or a transformation. The former reduces the values in a group down to one value, whereas the latter attempts to maintain the shape of the group.
To illustrate, here is what split-apply-combine looks like for a reduction:
Figure 8.1: Split-apply-combine paradigm for a reduction
Here is the same paradigm for a transformation:
Figure 8.2: Split-apply-combine paradigm for a transformation
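As a rough illustration of the difference between the two figures, here is a minimal sketch in pandas. The DataFrame, its column names (group, value), and the choice of sum as the applied function are all made up for this example; they are not taken from the text.

```python
import pandas as pd

# Hypothetical data: two groups, "a" and "b"
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 2, 3, 4, 5],
})

# Reduction: each group collapses to a single value,
# so the result has one row per group.
reduced = df.groupby("group")["value"].sum()

# Transformation: the group sum is broadcast back to every row,
# so the result has the same length and index as the input.
transformed = df.groupby("group")["value"].transform("sum")

print(reduced)
print(transformed)
```

The reduction yields one entry per group label, while the transformation yields one entry per original row, which is what makes it convenient for assigning back as a new column.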
In pandas, the pd.DataFrame.groupby method is responsible for splitting, applying a function of your choice, and combining...