Running automated benchmarks
The spark-sql-perf library allows you to run automated benchmarks against the queries of the TPC-DS specification. If you want to study the queries themselves, you can read the query templates that are bundled with the specification. If you would rather study the Databricks SQL versions of these queries, navigate to spark-sql-perf/src/main/resources/tpcds_2_4. The following screenshot shows where to find them in the IDE:
Figure 13.11 – TPC-DS benchmark queries
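If you prefer to browse the queries outside the IDE, the following is a minimal sketch that lists the query files in a local clone of the repository. It assumes that the clone lives in a folder you pass in as repo_root and that the queries are stored as .sql files under the path mentioned previously. The helper names list_query_files and preview are hypothetical and are not part of spark-sql-perf:

```python
from pathlib import Path

# Path taken from the text above, relative to the root of the clone.
QUERY_SUBDIR = "src/main/resources/tpcds_2_4"


def list_query_files(repo_root, subdir=QUERY_SUBDIR):
    """Return the .sql files in the query folder, sorted by file name.

    Assumption: the queries are stored as individual .sql files.
    Returns an empty list if the folder does not exist.
    """
    query_dir = Path(repo_root) / subdir
    if not query_dir.is_dir():
        return []
    return sorted(p for p in query_dir.iterdir() if p.suffix == ".sql")


def preview(path, max_lines=5):
    """Return up to max_lines lines from the start of a query file."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for _, line in zip(range(max_lines), f)]
```

For example, calling list_query_files("spark-sql-perf") from the folder that contains your clone returns the query files, and passing any one of them to preview shows the first few lines of that query.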
As we noted in the Understanding the TPC-DS dataset section, we are not interested in recreating benchmarks. However, if you want to recreate them, you can follow the README file of spark-sql-perf. Let me quickly show you how to run a benchmark in a Databricks workspace.
Note
The spark-sql-perf library can only run benchmarks against a Spark cluster. It does not have provisions to execute the automated benchmark on SQL Warehouses.
We will...