Hive is a widely used data warehouse tool built on top of Hadoop. It plays an important role in running daily batch jobs and business reporting queries on execution engines such as MapReduce, Apache Tez, and Apache Spark, so it is important to benchmark its performance.
Hive
TPC-DS
One of the most important benchmarks for Hive is TPC-DS. This benchmarking standard was created especially for big data systems that serve multiple business needs and run different kinds of queries, such as data mining, ad hoc, transaction-oriented, and reporting queries. You can use the open source Hortonworks hive-testbench package to run the TPC-DS benchmark. The following are the steps to run it with Hive 13:
- Clone the latest GitHub repository...