Exploring indexes
Indexes are useful for increasing the performance of frequent queries that filter on certain columns. Hive, however, has only a limited capability to index data, because indexing large datasets requires additional storage space and processing overhead. Hive can index columns to speed up some operations, and it stores the indexed data in a separate table.
How to do it…
Indexes can be created on tables in Hive. Let us create a sales table in Hive on which we are going to create indexes:
CREATE TABLE sales(id INT, fname STRING, state STRING, zip STRING, ip STRING, pid STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Let us create an index on the ip column of this table:
CREATE INDEX index_ip ON TABLE sales(ip) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
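Because the index is created WITH DEFERRED REBUILD, it holds no data until it is explicitly rebuilt. The following is a minimal sketch of the follow-up statements, assuming a Hive release earlier than 3.0 (index support was removed in Hive 3.0) and that the sales table has already been loaded with data:

```sql
-- Populate the index; WITH DEFERRED REBUILD leaves it empty until this runs.
ALTER INDEX index_ip ON sales REBUILD;

-- List the indexes defined on the sales table, including the name of the
-- table that holds the indexed data.
SHOW FORMATTED INDEX ON sales;
```

The rebuild runs as a regular Hive job, so on a large table it can take a while. It has to be repeated whenever the underlying data changes, because Hive does not refresh a deferred index automatically.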
In the metastore, the index definition is stored in the IDXS table, as shown in the following screenshot: