Unstacking after a groupby aggregation
Grouping data by a single column and aggregating a single column returns a simple result that is easy to consume. When grouping by more than one column, the resulting aggregation might not be structured in a way that is easy to consume. Since groupby operations by default put the unique grouping columns in the index, the unstack method can be extremely useful for rearranging the data into a form that is easier to interpret.
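As a minimal sketch of this behavior, the following example groups a small, invented DataFrame by two columns and then unstacks the result. The data values and the GENDER column are assumptions made purely for illustration; they are not taken from the employee dataset.

```python
import pandas as pd

# Hypothetical toy data: the values are invented for illustration only.
df = pd.DataFrame({
    "RACE": ["A", "A", "B", "B", "A", "B"],
    "GENDER": ["F", "M", "F", "M", "F", "M"],
    "BASE_SALARY": [50000, 60000, 55000, 65000, 70000, 75000],
})

# Grouping by two columns yields a Series whose index is a two-level
# MultiIndex (RACE, GENDER), because groupby places the grouping
# columns in the index by default.
stacked = df.groupby(["RACE", "GENDER"])["BASE_SALARY"].mean()

# unstack pivots the innermost index level (GENDER) into the columns,
# leaving one row per RACE and one column per GENDER.
wide = stacked.unstack()
print(wide)
```

In the unstacked result, each row holds one outer group and each column one inner group, so values for the inner groups sit side by side and can be compared directly.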
Getting ready
In this recipe, we use the employee dataset to perform an aggregation, grouping by multiple columns. We then use the unstack method to reshape the result into a format that makes for easier comparisons of different groups.
How to do it...
- Read in the employee dataset and find the mean salary by race:
>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv')
>>> employee.groupby('RACE')['BASE_SALARY'].mean().astype(int)
RACE
American Indian...