Many data analysis problems use a data-processing pattern known as split-apply-combine. In this pattern, data is analyzed in three steps:
- A dataset is split into smaller pieces based on some criterion, such as the value of a key
- Each of these pieces is operated on independently
- The results are then combined and presented as a single unit
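The three steps do not depend on any particular library. This minimal plain-Python sketch shows them. The function name `split_apply_combine` and the sample records are hypothetical and used only for illustration.

```python
from collections import defaultdict

# Hypothetical (key, value) records; the values are illustrative only
records = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

def split_apply_combine(records, func):
    # Split: collect the values that share the same key
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    # Apply func to each group independently, then
    # combine the per-group results into one dict keyed by group label
    return {key: func(values) for key, values in groups.items()}

result = split_apply_combine(records, lambda vs: sum(vs) / len(vs))
print(result)  # {'a': 2.0, 'b': 3.0}
```

Passing the aggregation as `func` keeps the split and combine steps generic, so a different calculation (a sum, a count, a maximum) can be used without changing them.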
The following diagram demonstrates a simple split-apply-combine process to calculate the mean of values grouped by a character-based key (a or b):
The data is first split by index label into two groups, one for a and one for b. The mean of the values in each group is then calculated. Finally, the per-group results are combined into a single pandas object, indexed by the label that identifies each group.
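To make each step explicit, the same process can be carried out by hand in pandas without a dedicated grouping method. This is a sketch only. The four values and the variable names are hypothetical and do not come from the diagram.

```python
import pandas as pd

# Hypothetical data: four values indexed by the character keys 'a' and 'b'
s = pd.Series([1, 2, 3, 4], index=["a", "b", "a", "b"])

# Split: one sub-Series per distinct index label (boolean mask on the index)
pieces = {label: s[s.index == label] for label in s.index.unique()}

# Apply: compute the mean of each piece independently
means = {label: piece.mean() for label, piece in pieces.items()}

# Combine: a single Series indexed by the group labels
combined = pd.Series(means)
print(combined)
```

A boolean mask is used instead of `s.loc[label]` because `.loc` returns a scalar rather than a Series when a label appears only once.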
Splitting in pandas is performed using the .groupby() method of a Series or DataFrame...
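In pandas, `.groupby()` followed by an aggregation such as `.mean()` performs all three steps in one chain. The sketch below reuses the same hypothetical data. It groups a Series by its index labels with `level=0`, and it groups a DataFrame by a column whose name, `key`, is also hypothetical.

```python
import pandas as pd

# Hypothetical data: values indexed by the character keys 'a' and 'b'
s = pd.Series([1, 2, 3, 4], index=["a", "b", "a", "b"])

# Split on the index labels, apply mean to each group, combine into one Series
result = s.groupby(level=0).mean()
print(result)

# The same calculation on a DataFrame, grouping by a column instead of the index
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
by_column = df.groupby("key")["value"].mean()
print(by_column)
```

By default `groupby` sorts the group keys, so the result is ordered a, b regardless of the order in which the labels first appear.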