Customizing an aggregation function
Pandas provides the most common aggregation functions for use with the groupby object. At some point, you will need an aggregation that exists in neither pandas nor NumPy, and you will have to write your own user-defined function.
Getting ready
In this recipe, we use the college dataset to calculate the mean and standard deviation of the undergraduate student population per state. We then use this information to find, for each state, the largest number of standard deviations that any single population value lies from the state mean.
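The calculation described above can be sketched on a small, made-up DataFrame before applying it to the real data. The values below are invented for illustration, and the function name `max_deviation` is a hypothetical choice, not something provided by pandas. The sketch assumes the sample standard deviation (the pandas default for `std`) is the one intended.

```python
import pandas as pd

# Toy stand-in for the college data; the values are invented for illustration.
toy = pd.DataFrame({
    'STABBR': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB'],
    'UGDS': [100.0, 200.0, 600.0, 10.0, 20.0, 30.0],
})

def max_deviation(s):
    """Return the largest absolute distance from the group mean,
    measured in units of the group's standard deviation."""
    std_score = (s - s.mean()) / s.std()
    return std_score.abs().max()

# agg passes each group's UGDS values to the function as a Series
# and collects one scalar per state.
result = toy.groupby('STABBR')['UGDS'].agg(max_deviation)
print(result.round(2))
```

Because the function receives each group as a Series and returns a single scalar, it can be handed to `agg` in the same way as a built-in name such as `'mean'`.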
How to do it...
- Read in the college dataset, and find the mean and standard deviation of the undergraduate population by state:
>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv')
>>> college.groupby('STABBR')['UGDS'].agg(['mean', 'std']) \
...     .round(0).head()
- This output isn't quite what we desire. We are not looking for the mean and standard deviations of the entire group but the maximum number of standard...