Table and partition statistics in Hive
The first stage of statistics support in Hive is table-level and partition-level statistics. Like other metadata, table and partition statistics are stored in the configured metastore. Statistics are supported for both existing and newly created tables. The following statistics are currently supported for tables and partitions:
The number of rows
The number of files
Size in bytes
Max, min, and average row sizes
Max, min, and average file sizes
The number of partitions (in the case of tables)
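One way to check which statistics Hive has stored is DESCRIBE FORMATTED. In its output, stored statistics appear as parameters such as numFiles, numRows, rawDataSize, and totalSize. The following HiveQL is a minimal sketch. The table name sales and its partition column dt are hypothetical, so substitute your own names:

```sql
-- Hypothetical table: sales, partitioned by the column dt.
-- Table-level statistics are listed under "Table Parameters".
DESCRIBE FORMATTED sales;

-- Statistics for a single partition are listed under "Partition Parameters".
DESCRIBE FORMATTED sales PARTITION (dt='2015-01-01');
```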
Getting ready
This recipe requires Hive installed as described in the Installing Hive recipe of Chapter 1, Developing Hive. You will also need the Hive CLI or the Beeline client to run the commands.
How to do it…
For tables or partitions newly populated with the INSERT OVERWRITE command, statistics are computed automatically. To disable this automatic statistics calculation, set hive.stats.autogather to false either for the session...
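The following HiveQL is a minimal sketch of turning off automatic statistics gathering for the current session only. It then runs an INSERT OVERWRITE that therefore does not update statistics, and finally restores the default value, which is true. The tables sales and staging_sales and the columns id and amount are hypothetical placeholders:

```sql
-- Session-level setting: stop automatic statistics gathering on inserts.
SET hive.stats.autogather=false;

-- Print the current value to confirm the change.
SET hive.stats.autogather;

-- Hypothetical tables and columns. This insert runs without gathering statistics.
INSERT OVERWRITE TABLE sales PARTITION (dt='2015-01-01')
SELECT id, amount FROM staging_sales WHERE dt='2015-01-01';

-- Restore the default behavior for the rest of the session.
SET hive.stats.autogather=true;
```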