Optimizing query performance in Synapse Spark pools
There are several methods you can use to optimize the performance of queries in a lake database, such as caching, indexing, partitioning, Z-ordering, data skipping, and using query hints. This recipe will showcase the following two methods to optimize the performance of a query:
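- Z-ordering: Z-ordering colocates rows with similar values in the chosen columns in the same data files, which helps the Spark engine skip files that cannot contain the values a query filters on
- Partitioning: Partitioning splits the Delta Lake table into smaller chunks by creating a subfolder in the data lake storage account for each distinct value of the partition column, so queries that filter on that column read only the matching subfolders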
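To make the two methods concrete, here is a minimal sketch of the Spark SQL statements they typically correspond to on a Delta table. The table name `trips`, its columns, and the path `/lake/trips` are hypothetical and used only for illustration. The helpers just build strings, so they can be inspected without a Spark session.

```python
# Minimal sketch: build the Delta Lake SQL statements for partitioning and Z-ordering.
# The table name, column names, and storage path used below are hypothetical.

def create_partitioned_table_sql(table, columns, partition_col):
    """Build a CREATE TABLE statement for a Delta table partitioned by one column."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"USING DELTA PARTITIONED BY ({partition_col})"
    )


def zorder_sql(table, zorder_cols):
    """Build an OPTIMIZE statement that Z-orders the table's files by the given columns.

    Z-order columns must not be partition columns.
    """
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


def partition_folder(table_path, partition_col, value):
    """Return the column=value subfolder that holds the rows for one partition value."""
    return f"{table_path}/{partition_col}={value}"


if __name__ == "__main__":
    columns = [("trip_id", "BIGINT"), ("pickup_zone", "STRING"), ("trip_date", "DATE")]
    print(create_partitioned_table_sql("trips", columns, "trip_date"))
    print(zorder_sql("trips", ["pickup_zone"]))
    print(partition_folder("/lake/trips", "trip_date", "2023-01-01"))
```

In a Synapse notebook, where a `spark` session is already available, each generated statement could be passed to `spark.sql(...)`, assuming the Delta Lake version on the Spark pool supports `OPTIMIZE ... ZORDER BY`.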
Getting ready
To get started, log in to https://portal.azure.com using your Azure credentials.
Create a Synapse Analytics workspace, as explained in the Provisioning an Azure Synapse Analytics workspace recipe of Chapter 8, Processing Data Using Azure Synapse Analytics.
Create a Spark pool cluster, as explained in the Provisioning and configuring Spark pools recipe...