Group-by reductions – from many to fewer
A very common operation is a reduction that groups values by some key or indicator. In SQL, this is often called the SELECT GROUP BY
operation. The raw data is grouped by some columns value and reductions (sometimes aggregate functions) are applied to other columns. The SQL aggregate functions include SUM
, COUNT
, MAX
, and MIN
.
The statistical summary called the mode is a count that's grouped by some independent variable. Python offers us several ways to group data before computing a reduction of the grouped values. We'll start by looking at two ways to get simple counts of grouped data. Then we'll look at ways to compute different summaries of grouped data.
We'll use the trip data that we computed in Chapter 4, Working with Collections. This data started as a sequence of latitude-longitude waypoints. We restructured it to create legs represented by three tuples of start, end, and distance for the leg
. The data looks as follows...