Unstacking after a groupby aggregation
Grouping data by a single column and aggregating a single column returns a result that is easy to consume. When grouping by more than one column, the resulting aggregation might not be structured in a way that makes it easy to consume. Since .groupby operations, by default, put the unique grouping columns in the index, the .unstack method can be used to rearrange the data into a layout that is easier to interpret.
In this recipe, we use the employee dataset to perform an aggregation, grouping by multiple columns. We then use the .unstack method to reshape the result into a format that makes for easier comparisons of different groups.
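Before turning to the employee dataset, the idea can be sketched on a small made-up DataFrame. Everything in this example is an assumption for illustration: the toy values, the RACE labels "A" and "B", and the use of a GENDER column as the second grouping key are hypothetical and are not taken from the recipe's data. The sketch shows that grouping by two columns places both in the index, and that .unstack moves one of those index levels into the columns.

```python
import pandas as pd

# Hypothetical toy data: these values are invented for illustration
# and are not the employee dataset used in the recipe.
toy = pd.DataFrame({
    "RACE": ["A", "A", "B", "B", "A", "B"],
    "GENDER": ["F", "M", "F", "M", "F", "M"],
    "BASE_SALARY": [50000, 60000, 55000, 65000, 70000, 75000],
})

# Grouping by two columns puts both grouping columns in the index,
# so the result is a Series with a two-level MultiIndex.
stacked = toy.groupby(["RACE", "GENDER"])["BASE_SALARY"].mean()

# Unstacking the GENDER level pivots it into the columns, leaving
# one row per RACE and one column per GENDER.
wide = stacked.unstack("GENDER")

print(stacked)
print(wide)
```

In the wide form, each row holds the values for one group side by side, which is usually easier to compare by eye than the long, two-level index.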
How to do it…
- Read in the employee dataset and find the mean salary by race:
>>> employee = pd.read_csv('data/employee.csv')
>>> (employee
...     .groupby('RACE')
...     ['BASE_SALARY...