Grouping with a custom aggregation function
pandas provides a number of aggregation functions to use with the groupby object. At some point, you may need to write your own aggregation function because the one you need does not exist in pandas or NumPy.
In this recipe, we use the college dataset to calculate the mean and standard deviation of the undergraduate student population per state. We then use this information to find, for each state, the largest number of standard deviations that any single population value lies from the state mean.
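As a rough sketch of the idea, the following applies a custom aggregation function to a small made-up DataFrame rather than to the college dataset. The `toy` data and the function name `max_deviation` are assumptions chosen for illustration; the column names `STABBR` and `UGDS` mirror the ones used in this recipe. Note that the pandas `std` method computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical toy data standing in for the college dataset;
# the values are invented purely for illustration.
toy = pd.DataFrame({
    'STABBR': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB'],
    'UGDS': [100.0, 200.0, 600.0, 10.0, 20.0, 30.0],
})

def max_deviation(s):
    """Return the largest absolute standardized score in the Series s."""
    # Distance of each value from the group mean, in standard deviations
    std_score = (s - s.mean()) / s.std()
    return std_score.abs().max()

# A user-defined function is passed to .agg in place of a built-in name;
# pandas calls it once per group with that group's values as a Series.
result = toy.groupby('STABBR')['UGDS'].agg(max_deviation)
print(result)
```

The function receives one group at a time and must return a single scalar, which is what makes it an aggregation.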
How to do it…
- Read in the college dataset, and find the mean and standard deviation of the undergraduate population by state:
>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv')
>>> (college
...    .groupby('STABBR')
...    ['UGDS']
...    .agg(['mean', 'std'])
...    .round(0)
... )
          mean     std
STABBR
AK      2493.0  4052.0
AL ...