Logical statement processing flow
When writing T-SQL, it is important to be familiar with the order in which the SQL Database Engine interprets queries, to later create an execution plan. This helps anticipate possible performance issues arising from poorly written queries, as well as helping you understand cases of unintended results. The following steps outline a summarized view of the method that the Database Engine follows to process a T-SQL statement:
- Process all the source and target objects stated in the
FROM
clause (tables, views, and TVFs), together with the intended logical operation (JOIN
andAPPLY
) to perform on those objects. - Apply whatever pre-filters are defined in the
WHERE
clause to reduce the number of incoming rows from those objects. - Apply any aggregation defined in the
GROUP BY
or aggregate functions (for example, aMIN
orMAX
function). - Apply filters that can only be applied on the aggregations as defined in the
HAVING
clause. - Compute the logic for windowing functions such as
ROW_NUMBER
,RANK
,NTILE
,LAG
, andLEAD
. - Keep only the required columns for the output as specified in the
SELECT
clause, and if aUNION
clause is present, combine the row sets. - Remove duplicates from the row set if a
DISTINCT
clause exists. - Order the resulting row set as specified by the
ORDER
BY
clause. - Account for any limits stated in the
TOP
clause.
It becomes clearer now that properly defining how tables are joined (the logical join type) is important to any scalable T-SQL query, namely by carefully planning on which columns the tables are joined. For example, in an inner join, these join arguments are the first level of data filtering that can be enforced, because only the rows that represent the intersection of two tables are eligible for subsequent operations.
Then it also makes sense to filter out rows from the result set using a WHERE
clause, rather than applying any post-filtering conditions that apply to sub-groupings using a HAVING
clause. Consider these two example queries:
SELECT p.ProductNumber, AVG(sod.UnitPrice) FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID GROUP BY p.ProductNumber HAVING p.ProductNumber LIKE 'L%'; SELECT p.ProductNumber, AVG(sod.UnitPrice) FROM Production.Product AS p INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID WHERE p.ProductNumber LIKE 'L%' GROUP BY p.ProductNumber;
While these two queries are logically equivalent, the second one is more efficient because the rows that do not have a ProductNumber
starting with L will be filtered out of the results before the aggregation is calculated. This is because the SQL Database Engine evaluates a WHERE
clause before a HAVING
clause and can limit the row count earlier in the execution phase, translating into reduced I/O and memory requirements, and also reduced CPU usage when applying the post-filter to the group.
The following diagram summarizes the logical statement-processing flow for the building blocks discussed previously in this chapter:
Figure 1.1: Flow chart summarizing the logical statement-processing flow of a query
Now that we understand the order in which the SQL Database Engine processes queries, let’s explore the essentials of query compilation.