Aggregating data with Cascalog
So far, the Cascalog queries you've seen have all returned tables of results. However, sometimes you'll want to aggregate those tables, either to boil them down to a single value or to produce a table in which groups from the original data are aggregated.
Cascalog makes this easy to do, and it includes a number of aggregate functions. For this recipe, we'll only use two, cascalog.logic.ops/distinct-count and cascalog.logic.ops/sum, but you can easily find more in the API documentation on the Cascalog website (http://nathanmarz.github.io/cascalog/cascalog.logic.ops.html).
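To make the two kinds of aggregation concrete, here is a minimal sketch, assuming Cascalog 2.x is on the classpath. The purchases data, the query names, and the c alias are hypothetical and chosen only for illustration; they are not part of this recipe's dataset or imports. The first query reduces the whole table to a single value, and the second aggregates within groups.

```clojure
(require '[cascalog.api :refer [<- ??-]]
         '[cascalog.logic.ops :as c])   ; the alias c is an assumption

;; Hypothetical in-memory data: [customer amount] tuples.
;; A plain vector of tuples can act as a Cascalog generator.
(def purchases
  [["alice" 10]
   ["bob" 5]
   ["alice" 7]])

;; Whole-table aggregate: no non-aggregated output variables,
;; so every row falls into one group and we get a single value.
(def grand-total
  (<- [?total]
      (purchases _ ?amount)
      (c/sum ?amount :> ?total)))

;; Grouped aggregate: ?name appears in the output, so Cascalog
;; groups by it and sums ?amount within each group.
(def total-per-customer
  (<- [?name ?total]
      (purchases ?name ?amount)
      (c/sum ?amount :> ?total)))

;; Run both queries and return their results as sequences of tuples.
(??- grand-total total-per-customer)
```

Note that the only difference between the two queries is which variables appear in the output vector: any output variable that is not produced by an aggregator becomes a grouping key.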
Getting ready
We'll use the same dependencies and imports as we did in Parsing CSV Files with Cascalog. We'll also use the same data that we defined in that recipe.
How to do it…
We'll take a look at a couple of examples of how to aggregate data with the count function:
First, we'll query how many:
user=> (?<- (stdout) [?count] ((hfs-text-delim "data/16285/flights_with_colnames.csv" ...