Hive partitioning
Partitioning in Hive is best explained with an example. Suppose a telecom organization generates 1 TB of data every day, and regional managers query this data for their own states. Without partitioning, each query by a regional manager makes Hive scan the complete dataset in HDFS and then filter the results for a particular state.
Suppose a manager runs the same query every day to analyze their own state, and the query takes four hours on a 1 TB dataset. For analytics, the same query might also be run against one month or six months of data; on a month's data, it could take ten hours.
If the data is partitioned by state, then when a regional manager runs the same query for a single state, Hive scans only that state's data, and the execution time can be reduced significantly.
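As a rough sketch of this idea, the scenario could be modeled with a table partitioned by state. The table name call_records, its columns, and the state value are hypothetical, chosen only for illustration:

```sql
-- Hypothetical table and column names, for illustration only.
-- The partition column (state) is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE call_records (
  caller_id     STRING,
  call_duration INT,
  call_date     STRING
)
PARTITIONED BY (state STRING);

-- Filtering on the partition column lets Hive read only the files
-- stored under the matching partition directory (state=Texas)
-- instead of scanning the whole table.
SELECT caller_id, SUM(call_duration) AS total_duration
FROM call_records
WHERE state = 'Texas'
GROUP BY caller_id;
```

Hive stores each partition as a separate subdirectory under the table's location, named after the partition column and its value, which is what allows a query with a filter on state to skip the data of every other state.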
How to do it…
Partitioning can be done in one of the following two ways:
Static partitioning
Dynamic partitioning
Static partitioning
In static partitioning, you need to manually insert data in...