Aggregating data
We already got a sneak peek at aggregation when we discussed window calculations and pipes in the previous section. Here, we will focus on summarizing the dataframe through aggregation, which will change the shape of our dataframe (often through row reduction). We also saw how easy it is to take advantage of vectorized NumPy functions on pandas data structures, especially to perform aggregations. This is what NumPy does best: it performs computationally efficient mathematical operations on numeric arrays.
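A minimal sketch of both ideas, assuming a small made-up dataframe (the `temp` and `rain` columns and their values are hypothetical, for illustration only). NumPy reductions such as `np.sum()`, `np.max()`, and `np.mean()` accept a pandas Series directly and collapse it to a single value. Calling `DataFrame.agg()` with a list of statistic names reduces every column at once, so the four input rows shrink to one row per statistic:

```python
import numpy as np
import pandas as pd

# Hypothetical data: four days of temperature and rainfall readings
df = pd.DataFrame({
    'temp': [20.0, 22.5, 19.0, 18.5],
    'rain': [0.0, 1.5, 3.5, 1.0],
})

# Vectorized NumPy reductions work on a Series and return one value each
total_rain = np.sum(df['rain'])
max_temp = np.max(df['temp'])
mean_temp = np.mean(df['temp'])

# Column-wise aggregation: the 4-row dataframe becomes a 3-row summary,
# one row per statistic (the row reduction that aggregation causes)
summary = df.agg(['min', 'max', 'mean'])

print(total_rain, max_temp, mean_temp)
print(df.shape, '->', summary.shape)
print(summary)
```

Here the string names tell pandas to use its own `min`, `max`, and `mean` methods; the NumPy calls above compute the same kinds of reductions on individual columns.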
NumPy pairs well with aggregating dataframes because it gives us an easy way to summarize data with a variety of prewritten functions; when aggregating, we often need nothing more than the NumPy function, since most of what we would want to write ourselves has already been built. We have already seen some NumPy functions commonly used for aggregations, such as np.sum(), np.mean(), np.min(), and np.max(); however, we aren't limited to numeric operations: we can use...