Finding shuffling in a pipeline
As we learned in the previous section, shuffling data is a very expensive operation and we should try to reduce it as much as possible. In this section, we will learn how to identify shuffles in the query execution path for both Synapse SQL and Spark.
Identifying shuffles in a SQL query plan
To identify shuffles, print the query plan using the EXPLAIN
statement. Here is an example.
Consider a Synapse SQL table, DimDriver
, as shown in the following screenshot:
Here is a sample EXPLAIN
statement:
EXPLAIN WITH_RECOMMENDATIONS SELECT [gender], SUM([salary]) as Totalsalary FROM dbo.DimDriver GROUP BY [gender]
This will generate a plan similar to the one shown in the following screenshot....