6.3 Group-by reduction from many items to fewer
The idea of a reduction can apply in many ways. We’ve looked at the essential recursive definition of a reduction that produces a single value from a collection of values. This led us to optimize the recursion so that we can compute summaries without the overhead of a naive Python implementation.
Creating subgroups in Python isn’t difficult, but it helps to understand the formalisms that support it. That understanding helps us avoid implementations that perform extremely poorly.
A very common operation is a reduction that groups values by some key or indicator. The raw data is grouped by some column’s value, and reductions (sometimes called aggregate functions) are applied to other columns.
In SQL, this is often called the GROUP BY clause of the SELECT statement. The SQL aggregate functions include SUM, COUNT, MAX, and MIN, and often many more.
Python offers us several ways to group data...
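As one illustration of the group-then-reduce idea, here is a minimal sketch that partitions rows by a key and then applies COUNT, SUM, MAX, and MIN style reductions to each subgroup. The names group_by, summarize, and SALES, along with the sample (region, amount) rows, are hypothetical and chosen only for this example; this is not necessarily the approach developed in the rest of the section.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable

# Hypothetical sample data: (region, amount) pairs.
SALES: list[tuple[str, float]] = [
    ("east", 10.0),
    ("west", 4.5),
    ("east", 2.5),
    ("north", 7.0),
    ("west", 1.5),
]


def group_by(
    key_fn: Callable[[tuple[str, float]], Hashable],
    rows: Iterable[tuple[str, float]],
) -> dict[Hashable, list[tuple[str, float]]]:
    """Partition rows into lists keyed by key_fn(row), in a single pass."""
    groups: defaultdict[Hashable, list[tuple[str, float]]] = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return dict(groups)


def summarize(
    groups: dict[Hashable, list[tuple[str, float]]],
) -> dict[Hashable, dict[str, float]]:
    """Reduce each subgroup to a few aggregate values, one summary per key."""
    return {
        key: {
            "count": len(rows),
            "sum": sum(amount for _, amount in rows),
            "max": max(amount for _, amount in rows),
            "min": min(amount for _, amount in rows),
        }
        for key, rows in groups.items()
    }


# Roughly: SELECT region, COUNT(*), SUM(amount), MAX(amount), MIN(amount)
#          FROM sales GROUP BY region
by_region = group_by(lambda row: row[0], SALES)
summary = summarize(by_region)
```

Each row is visited once while grouping, and each subgroup is then reduced independently, so five input rows become three summary entries: many items reduced to fewer.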