By default, a simple HQL query scans the whole table, which degrades performance when the table is large. Partitions address this problem, and they are very similar to partitions in an RDBMS. In Hive, each partition corresponds to a value (or combination of values) of the predefined partition column(s) and maps to a subdirectory in the table's directory in HDFS. When a query filters on the partition columns, only the required partitions (directories) of the table are read, so the I/O and run time of the query are greatly reduced. Partitioning is an easy and effective way to improve performance in Hive.
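As a rough sketch of the directory mapping and partition pruning described above, consider a hypothetical table. The table name `sales_demo`, its columns, and the partition columns `region` and `dt` are illustrative assumptions and do not come from the text; the directory path in the comment only shows the general `column=value` layout.

```sql
-- Hypothetical table for illustration only (names are assumptions).
CREATE TABLE sales_demo (
  item   STRING,
  amount DOUBLE
)
PARTITIONED BY (region STRING, dt STRING);

-- Each partition becomes a subdirectory under the table's directory, e.g.:
--   .../sales_demo/region=emea/dt=2018-12-01/
ALTER TABLE sales_demo ADD PARTITION (region = 'emea', dt = '2018-12-01');

-- List the partitions registered for the table.
SHOW PARTITIONS sales_demo;

-- Filtering on the partition columns lets Hive read only the matching
-- subdirectory instead of scanning the whole table.
SELECT item, amount
FROM sales_demo
WHERE region = 'emea' AND dt = '2018-12-01';
```

Note that partition columns are declared only in the `PARTITIONED BY` clause, not in the regular column list, yet they can be used in `WHERE` clauses like ordinary columns.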
The following is an example of partition creation in HQL:
> CREATE TABLE employee_partitioned (
> name STRING,
> work_place ARRAY<STRING>,
> gender_age STRUCT<gender:STRING,age:INT>,
> skills_score MAP<STRING,INT>,
> depart_title MAP<STRING...