Basic aggregation – GROUP BY
Data aggregation is any process to gather and express data in a summary form to get more information about particular groups based on specific conditions. Hive offers several built-in aggregate functions, such as MAX
, MIN
, AVG
, and so on. Hive also supports advanced aggregation by using GROUPING SETS
, ROLLUP
, CUBE
, analytic functions, and windowing.
The Hive basic built-in aggregate functions are usually used with the GROUP BY
clause. If there is no GROUP BY
clause specified, it aggregates over the whole table by default. Besides aggregate functions, all other columns that are selected must also be included in the GROUP BY
clause. The following are a few examples using the built-in aggregate functions:
Aggregation without
GROUP BY
columns:jdbc:hive2://> SELECT count(*) AS row_cnt FROM employee; +----------+ | row_cnt | +----------+ | 5 | +----------+ 1 row selected (60.709 seconds)
Aggregation with
GROUP BY
columns:jdbc:hive2://> SELECT sex_age.sex...