Operations on groups
The groupAll and groupBy operations are essential building blocks of Scalding applications, and both generate groups. groupAll generates a single group containing all the available tuples. groupBy generates m groups, where m is the number of unique keys in the data. For example, if groupBy('color) is executed and three unique colors exist in the data, then three groups will be generated. Once the data is grouped, a number of group operations can be applied to each group.
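As a minimal sketch of the difference between the two, the following job counts tuples once over the whole input and once per color. The job name, the 'color and 'price field names, and the input, totalOut, and perColorOut arguments are assumptions made for illustration, and the example assumes the Scalding fields API is on the classpath.

```scala
import com.twitter.scalding._

// Hypothetical job: the field names and the argument names
// (input, totalOut, perColorOut) are illustrative assumptions.
class GroupingExampleJob(args: Args) extends Job(args) {

  // Read tab-separated tuples with two fields
  val items = Tsv(args("input"), ('color, 'price)).read

  // groupAll: one group holding every tuple, so the result is a single row
  items
    .groupAll { _.size('total) }
    .write(Tsv(args("totalOut")))

  // groupBy: one group per distinct value of 'color,
  // so the result has one row per color
  items
    .groupBy('color) { _.size('perColor) }
    .write(Tsv(args("perColorOut")))
}
```

With three distinct colors in the input, the second output would contain three rows, each holding a color and the number of tuples in its group.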
The first seven group operations, average, count, min, max, sum, size, and sizeAveStdev, are useful to extract statistics from data, and their syntax is as follows:
    group.average(field -> newField)
    group.count(field -> newField) { function }
    group.min(field -> newField)
    group.max(field -> newField)
    group.sum(field -> newField)
    group.size(newField)
    group.sizeAveStdev(field -> (sizeField, averageField, stdField))
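The following sketch shows how a few of these operations might look in a job, each applied in its own groupBy. The job name, the 'color and 'price fields, the 10.0 threshold, and the argument names are assumptions for illustration only. Note that count takes a predicate and counts the tuples in each group for which it returns true.

```scala
import com.twitter.scalding._

// Hypothetical job: field names, the threshold, and argument names
// (input, avgOut, countOut, statsOut) are illustrative assumptions.
class GroupStatsJob(args: Args) extends Job(args) {

  val items = Tsv(args("input"), ('color, 'price)).read

  // average: mean of 'price within each color group
  items
    .groupBy('color) { group => group.average('price -> 'avgPrice) }
    .write(Tsv(args("avgOut")))

  // count: number of tuples per color whose price satisfies the predicate
  items
    .groupBy('color) { group =>
      group.count('price -> 'expensive) { price: Double => price > 10.0 }
    }
    .write(Tsv(args("countOut")))

  // sizeAveStdev: group size, mean, and standard deviation in one operation
  items
    .groupBy('color) { group =>
      group.sizeAveStdev('price -> ('n, 'mean, 'stdev))
    }
    .write(Tsv(args("statsOut")))
}
```

Each output keeps the grouping field 'color and adds only the new fields named on the right-hand side of the arrow.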
We can also apply multiple group operations on the same group. To calculate...