Experimenting with TPC-DS in Databricks SQL
Now that we have the TPC-DS data generated and ready to query, you are free to experiment and validate everything that we’ve learned in the previous chapters – especially Chapter 8, The Delta Lake.
If you intend to use the TPC-DS benchmarking queries themselves, note that you will have to import the Databricks versions of the queries into Databricks SQL manually. See Figure 13.11 to learn how to obtain the queries. Otherwise, you can refer to the ER diagram and row counts in the TPC-DS specification to craft your own queries of varying complexity that exercise the features you want to test.
Keep the metrics you want to measure in mind. Measuring speed requires that you keep the cluster configuration constant and account for the fact that Databricks SQL caches table data and query results, so repeated runs of the same query can be faster than the first. Depending on the test, data skipping effectiveness might be a better metric.
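To make the caching caveat concrete, here is a minimal Python sketch of a timing harness that reports the first run of a query separately from the runs that follow it, since later runs may be served from caches. The names time_query and run_query are hypothetical and not part of any Databricks API: run_query stands in for whatever function you use to submit a statement to your SQL warehouse, and the sample query is only an illustration against the TPC-DS store_sales and date_dim tables.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(
    run_query: Callable[[str], object],  # hypothetical: submits SQL and waits for the result
    sql: str,
    repeats: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Run `sql` `repeats` times; report the first run apart from the later runs."""
    if repeats < 2:
        raise ValueError("need at least one first run and one repeated run")

    durations: List[float] = []
    for _ in range(repeats):
        start = clock()
        run_query(sql)
        durations.append(clock() - start)

    return {
        # The first run is the least likely to benefit from cached data or results.
        "cold_s": durations[0],
        # Later runs may hit caches, so summarize them separately.
        "warm_median_s": statistics.median(durations[1:]),
    }


if __name__ == "__main__":
    # Illustrative query only; adjust the filter to match your generated data.
    sample_sql = (
        "SELECT d_year, SUM(ss_ext_sales_price) AS total_sales "
        "FROM store_sales JOIN date_dim ON ss_sold_date_sk = d_date_sk "
        "WHERE d_year = 2001 GROUP BY d_year"
    )

    def fake_run_query(sql: str) -> None:
        """Stand-in for a real warehouse call so the sketch runs anywhere."""
        time.sleep(0.01)

    print(time_query(fake_run_query, sample_sql, repeats=3))
```

Comparing cold_s with warm_median_s for the same query on an unchanged cluster gives a rough sense of how much caching contributes to the measured speed. Wall-clock timings taken on the client also include network and queueing overhead, so treat them as indicative only.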
As we saw in the Generating...