ORDER and SORT
Another aspect of manipulating data in Hive is ordering or sorting the data or result sets properly, so that important facts such as the top N values, the maximum, and the minimum are clearly identified.
Hive uses the following keywords to order and sort data:
ORDER BY (ASC|DESC): This is similar to the RDBMS ORDER BY statement. A sorted order is maintained across all of the output from every reducer. Because it performs the global sort using only one reducer, it takes longer to return the result. Using LIMIT with ORDER BY is strongly recommended. When hive.mapred.mode = strict is set (the default is hive.mapred.mode = nonstrict) and LIMIT is not specified, Hive throws an exception. This can be used as follows:

jdbc:hive2://> SELECT name FROM employee ORDER BY NAME DESC;
+----------+
| name     |
+----------+
| Will     |
| Shelley  |
| Michael  |
| Lucy     |
+----------+
4 rows selected (57.057 seconds)
SORT BY (ASC|DESC): This indicates which columns to sort when ordering...