Search icon CANCEL
Subscription
0
Cart icon
Close icon
You have no products in your basket yet
Save more on your purchases!
Savings automatically calculated. No voucher code required
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
€8.99 | ALL EBOOKS & VIDEOS
Save more on purchases! Buy 2 and save 10%, Buy 3 and save 15%, Buy 5 and save 20%
Learn T-SQL Querying - Second Edition
Learn T-SQL Querying - Second Edition

Learn T-SQL Querying: A guide to developing efficient and elegant T-SQL code, Second Edition

By Pedro Lopes , Pam Lahoud
€29.99 €20.98
Book Feb 2024 456 pages 2nd Edition
eBook
€23.99 €8.99
Print
€29.99 €20.98
Subscription
€14.99 Monthly
eBook
€23.99 €8.99
Print
€29.99 €20.98
Subscription
€14.99 Monthly

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Black & white paperback book shipped to your address
Product feature icon AI Assistant (beta) to help accelerate your learning
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now
Table of content icon View table of contents Preview book icon Preview Book

Learn T-SQL Querying - Second Edition

Understanding Query Processing

Transact-SQL, or T-SQL as it has become commonly known, is the language used to communicate with Microsoft SQL Server and Azure SQL Database. Any actions a user wishes to perform in a server, such as retrieving or modifying data in a database, creating objects, or changing server configurations, are all done via T-SQL commands.

The first step in learning to write efficient T-SQL queries is understanding how the SQL Database Engine processes and executes the query. The Query Processor is a component, therefore a noun, should not be all lowercased includes query compilation, query optimization, and query execution essentials: how does the SQL Database Engine compile an incoming T-SQL statement? How does the SQL Database Engine optimize and execute a T-SQL statement? How does the SQL Database Engine use parameters? Are parameters an advantage? When and why does the SQL Database Engine cache execution plans for certain T-SQL statements but not for others? When is that an advantage and when is it a problem? This is information that any T-SQL practitioner needs to keep as a reference for proactive T-SQL query writing, as well as reactive troubleshooting and optimization purposes. This chapter will be referenced throughout all following chapters, as we bridge the gap between architectural topics and real-world usage.

In this chapter, we’re going to cover the following main topics:

  • Logical statement processing flow
  • Query compilation essentials
  • Query optimization essentials
  • Query execution essentials
  • Plan caching and reuse
  • The importance of parameters

Technical requirements

The examples used in this chapter are designed for use on SQL Server 2022 and Azure SQL Database, but they should work on SQL Server version 2012 or later. The Developer Edition of SQL Server is free for development environments and can be used to run all the code samples. There is also a free tier of Azure SQL Database you can use for testing at https://aka.ms/freedb.

You will need the sample database AdventureWorks2016_EXT (referred to as AdventureWorks), which can be found on GitHub at https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks. The code samples for this chapter can also be found on GitHub at https://github.com/PacktPublishing/Learn-T-SQL-Querying-Second-Edition/tree/main/ch1.

Logical statement processing flow

When writing T-SQL, it is important to be familiar with the order in which the SQL Database Engine interprets queries, to later create an execution plan. This helps anticipate possible performance issues arising from poorly written queries, as well as helping you understand cases of unintended results. The following steps outline a summarized view of the method that the Database Engine follows to process a T-SQL statement:

  1. Process all the source and target objects stated in the FROM clause (tables, views, and TVFs), together with the intended logical operation (JOIN and APPLY) to perform on those objects.
  2. Apply whatever pre-filters are defined in the WHERE clause to reduce the number of incoming rows from those objects.
  3. Apply any aggregation defined in the GROUP BY or aggregate functions (for example, a MIN or MAX function).
  4. Apply filters that can only be applied on the aggregations as defined in the HAVING clause.
  5. Compute the logic for windowing functions such as ROW_NUMBER, RANK, NTILE, LAG, and LEAD.
  6. Keep only the required columns for the output as specified in the SELECT clause, and if a UNION clause is present, combine the row sets.
  7. Remove duplicates from the row set if a DISTINCT clause exists.
  8. Order the resulting row set as specified by the ORDER BY clause.
  9. Account for any limits stated in the TOP clause.

It becomes clearer now that properly defining how tables are joined (the logical join type) is important to any scalable T-SQL query, namely by carefully planning on which columns the tables are joined. For example, in an inner join, these join arguments are the first level of data filtering that can be enforced, because only the rows that represent the intersection of two tables are eligible for subsequent operations.

Then it also makes sense to filter out rows from the result set using a WHERE clause, rather than applying any post-filtering conditions that apply to sub-groupings using a HAVING clause. Consider these two example queries:

SELECT p.ProductNumber, AVG(sod.UnitPrice)
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
GROUP BY p.ProductNumber
HAVING p.ProductNumber LIKE 'L%';
SELECT p.ProductNumber, AVG(sod.UnitPrice)
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod ON p.ProductID = sod.ProductID
WHERE p.ProductNumber LIKE 'L%'
GROUP BY p.ProductNumber;

While these two queries are logically equivalent, the second one is more efficient because the rows that do not have a ProductNumber starting with L will be filtered out of the results before the aggregation is calculated. This is because the SQL Database Engine evaluates a WHERE clause before a HAVING clause and can limit the row count earlier in the execution phase, translating into reduced I/O and memory requirements, and also reduced CPU usage when applying the post-filter to the group.

The following diagram summarizes the logical statement-processing flow for the building blocks discussed previously in this chapter:

Figure 1.1: Flow chart summarizing the logical statement-processing flow of a query

Figure 1.1: Flow chart summarizing the logical statement-processing flow of a query

Now that we understand the order in which the SQL Database Engine processes queries, let’s explore the essentials of query compilation.

Query compilation essentials

The main stages of query processing can be seen in the following overview diagram, which we will expand on throughout this chapter:

Figure 1.2: Flow chart representing the states of query processing

Figure 1.2: Flow chart representing the states of query processing

The Query Processor is the component inside the SQL Database Engine that is responsible for compiling a query. In this section, we will focus on the highlighted steps of the following diagram that handle query compilation:

Figure 1.3: States of query processing related to query compilation

Figure 1.3: States of query processing related to query compilation

The first stage of query processing is generally known as query compilation and includes a series of tasks that will eventually lead to the creation of a query plan. When an incoming T-SQL statement is parsed to perform syntax validations and ensure that it is correct T-SQL, a query hash value is generated that represents the statement text exactly as it was written. If that query hash is already mapped to a cached query plan, then it can just attempt to reuse that plan. However, if a query plan for the incoming query is not already found in the cache, query compilation proceeds with the following tasks:

  1. Perform binding, which is the process of verifying that the referenced tables and columns exist in the database schema.
  2. References to a view are replaced with the definition of that view (this is called expanding the view).
  3. Load metadata for the referenced tables and columns. This metadata is as follows:
    1. The definition of tables, indexes, views, constraints, and so on, that apply to the query.
    2. Data distribution statistics on the applicable schema object.
  4. Verify whether data conversions are required for the query.

Note

When the query compilation process is complete, a structure that can be used by the Query Optimizer is produced, known as the algebrizer tree or query tree.

The following diagram further details these compilation tasks:

Figure 1.4: Flow of compilation tasks for T-SQL statements

Figure 1.4: Flow of compilation tasks for T-SQL statements

If the T-SQL statement is a Data Definition Language (DDL) statement, there’s no possible optimization, and so a plan is produced immediately. However, if the T-SQL statement is a Data Manipulation Language (DML) statement, the SQL Database Engine will move to an exploratory process known as query optimization, which we will explore in the next section.

Query optimization essentials

The Query Processor is also the component inside the SQL Database Engine that is responsible for query optimization. This is the second stage of query processing and its goal is to produce a query plan that can then be cached for all subsequent uses of the same query. In this section, we will focus on the highlighted sections of the following diagram that handle query optimization:

Figure 1.5: States of query processing related to query optimization

Figure 1.5: States of query processing related to query optimization

The SQL Database Engine uses cost-based optimization, which means that the Query Optimizer is driven mostly by estimations of the required cost to access and transform data (such as joins and aggregations) that will produce the intended result set. The purpose of the optimization process is to reasonably minimize the I/O, memory, and compute resources needed to execute a query in the fastest way possible. But it is also a time-bound process and can time out. This means that the Query Optimizer may not iterate through all the possible optimization permutations of a given T-SQL statement, but rather stops itself after finding an estimated “good enough” compromise between low resource usage and faster execution times.

For this, the Query Optimizer takes several inputs to later produce what is called a query execution plan. These inputs are the following:

  • The incoming T-SQL statement, including any input parameters
  • The loaded metadata, such as statistics histograms, available indexes and indexed views, partitioning, and the number of available schedulers

Note

We will further discuss the role of statistics in Chapter 2, Mechanics of the Query Optimizer, and dive deeper into execution plans in Chapter 3, Exploring Query Execution Plans, later in this book.

As part of the optimization process, the SQL Database Engine also uses internal transformation rules and some heuristics to narrow the optimization space – in other words, to narrow the number of transformation rules that can be applied to the incoming T-SQL statement. The SQL Database Engine has over 400 such transformation rules that are applicable depending on the incoming T-SQL statement. For reference, these rules are exposed in the undocumented dynamic management view sys.dm_exec_query_transformation_stats. The name column in this DMV contains the internal name for the transformation rule. An example is LOJNtoNL: an implementation rule to transform a logical LEFT OUTER JOIN to a physical nested loops join operator.

And so, the Query Optimizer may transform the T-SQL statement as written by a developer before it is allowed to execute. This is because T-SQL is a declarative language: the developer declares what is intended, but the SQL Database Engine determines how to carry out the declared intent. When evaluating transformations, the Query Optimizer must adhere to the rules of logical operator precedence. When a complex expression has multiple operators, operator precedence determines the sequence in which the operations are performed. For example, in a query that uses comparison and arithmetic operators, the arithmetic operators are handled before the comparison operators. This determines whether a Compute Scalar operator can be placed before or after a Filter operator.

The Query Optimizer will consider numerous strategies to search for an efficient execution plan, including the following:

  • Index selection

    Are there indexes to cover the whole or parts of the query? This is done based on which search and join predicates (conditions) are used, and which columns are required for the query output.

  • Logical join reordering

    The order in which tables are actually joined may not be the same order as they are written in the T-SQL statement itself. The SQL Database Engine uses heuristics as well as statistics to narrow the number of possible join permutations to test, and then estimate which join order results in early filtering of rows and less resource usage. For example, depending on how a query that joins 6 tables is written, possible join reordering permutations range from roughly 700 to over 30,000.

  • Partitioning

    Is data partitioned? If so, and depending on the predicate, can the SQL Database Engine avoid accessing some partitions that are not relevant for the query?

  • Parallelism

    Is it estimated that execution will be more efficient if multiple CPUs are used?

  • Whether to expand views

    Is it better to use an indexed view, or conversely expand and inline the view definition to account for the base tables?

  • Join elimination

    Are two tables being joined in a way that the number of rows resulting from that join is zero? If so, the join may not even be executed.

  • Sub-query elimination

    This relies on the same principle as join elimination. Was it estimated that the correlated or non-correlated sub-query will produce zero rows? If so, the sub-query may not even be executed.

  • Constraint simplification

    Is there an active constraint that prevents any rows from being generated? For example, does a column have a non-nullable constraint, but the query predicate searches for null values in that column? If so, then that part of the query may not even be executed.

  • Eligibility for parameter sensitivity optimization

    Is the database where the query is executing subject to Database Compatibility Level 160? If so, are there parameterized predicates considered at risk of being impacted by parameter sniffing?

  • Halloween protection

    Is this an update plan? If so, is there a need to add a blocking operator?

Note

An update plan has two parts: a read part that identifies the rows to be updated and a write part that performs the updates, which must be executed in two separate steps. In other words, the actual update of rows must not affect the selection of which rows to update. This problem of ensuring that the write cursor of an update plan does not affect the read cursor is known as “Halloween protection” as it was discovered by IBM researchers more than 40 years ago, precisely on Halloween.

For the Query Optimizer to do its job efficiently in the shortest amount of time possible, data professionals need to do their part, which can be distilled into three main principles:

  • Design for performance

    Ensure that our tables are designed with purposeful use of the appropriate data types and lengths, that our most used predicates are covered by indexes, and that the engine is allowed to identify and create the required statistical information.

  • Write simple T-SQL queries

    Be purposeful with the number of joined tables, how the joins are expressed, the number of columns needed for the result set, how parameters and variables are declared, and which data transformations are used. Complexity comes at a cost and it may be a wise strategy to break down long T-SQL statements into smaller parts that create intermediate result sets.

  • Maintain our database health

    From a performance standpoint alone, ensure that index maintenance and statistics updates are done regularly.

At this point, it starts to become clear that how we write a query is fundamental to achieving good performance. But it is equally important to make sure the Query Optimizer is given a chance to do its job to produce an efficient query plan. That job is dependent on having metadata available that accurately portrays the data distribution in base tables and indexes. Later in this book, in Chapter 5, Writing Elegant T-SQL Queries, we will further distill what data professionals need to know to write efficient T-SQL that performs well.

Also, in the Mechanics of the Query Optimizer chapter, we will cover the Query Optimizer and the estimation process in greater detail. Understanding how the SQL Database Engine optimizes a query and what the process looks like is a fundamental step toward troubleshooting query performance – a task that every data professional will do at some point in their career.

Now that we have reviewed query compilation and optimization, the next step is query execution, which we will explore in the following section.

Query execution essentials

Query execution is driven by the Relational Engine in the SQL Database Engine. This means executing the plan that resulted from the optimization process. In this section, we will focus on the highlighted parts of the following diagram that handle query execution:

Figure 1.6: States of query processing related to query execution

Figure 1.6: States of query processing related to query execution

Before execution starts, the Relational Engine needs to initialize the estimated amount of memory needed to run the query, known as a memory grant. Along with the actual execution, the Relational Engine schedules the worker threads (also known as threads or workers) for the processes to run on and provides inter-thread communication. The number of worker threads spawned depends on two key aspects:

  • Whether the plan is eligible for parallelism as determined by the Query Optimizer.
  • What the actual available degree of parallelism (DOP) is in the system based on the current load. This may differ from the estimated DOP, which is based on the server configuration max degree of parallelism (MaxDOP). For example, the MaxDOP may be 8 but the available DOP at runtime can be only 2, which impacts query performance.

During execution, as parts of the plan that require data from the base tables are processed, the Relational Engine requests that the Storage Engine provide data from the relevant rowsets. The data returned from the Storage Engine is processed into the format defined by the T-SQL statement, and returns the result set to the client.

This doesn’t change even on highly concurrent systems. However, as the SQL Database Engine needs to handle many requests with limited resources, waiting and queuing are how this is achieved.

To understand waits and queues in the SQL Database Engine, it is important to introduce other query execution-related concepts. From an execution standpoint, this is what happens when a client application needs to execute a query:

Figure 1.7: Timeline of events when a client application executes a query

Figure 1.7: Timeline of events when a client application executes a query

Tasks and workers can naturally accumulate waits until a request completes – we will see how to monitor these in Building diagnostic queries using DMVs and DMFs. These waits are surfaced in each request, which can be in one of three different statuses during its execution:

Figure 1.8: States of task execution in the Database Engine

Figure 1.8: States of task execution in the Database Engine

  • Running: When a task is actively running within a scheduler.
  • Suspended: When a task that is running in a scheduler finds out that a required resource is not available at the moment, such as a data page, it voluntarily yields its allotted processor time so that another request can proceed instead of allowing for idle processor time. But a task can be in this state before it even gets on a scheduler. For example, if there isn’t enough memory to grant to a new incoming query, that query must wait for memory to become available before starting actual execution.
  • Runnable: When a task is waiting on a first-in first-out queue for scheduler time, but otherwise has access to the required resources such as data pages.

All these concepts and terms play a fundamental role in understanding query execution and are also important to keep in mind when troubleshooting query performance. We will further explore how to detect some of these execution conditions in Chapter 3, Exploring Query Execution Plans.

Plan caching and reuse

As we have now established, the process of optimizing a query can consume a large amount of resources and take a significant amount of time, so it makes sense to avoid that effort if possible whenever a query is executed. The SQL Database Engine caches nearly every plan that is created so that it can be reused when the same query is executed again. But not all execution plans are eligible for caching; for example, no DDL statements are cached, such as CREATE TABLE. As for DML statements, most simple forms that only have one possible execution plan are also not cached, such as INSERT INTO … VALUES.

There are several different methods for plan caching. The method that is used is typically based on how the query is called from the client. The different methods of plan caching that will be covered in this section are the following:

  • Stored procedures
  • Ad hoc plan caching
  • Parameterization (simple and forced)
  • The sp_executesql procedure
  • Prepared statements

Stored procedures

A stored procedure is a group of one or more T-SQL statements that is stored as an object in a SQL database. Stored procedures are like procedures in other programming languages in that they can accept input parameters and return output parameters, they can contain control flow logic such as conditional statements (IF … ELSE), loops (WHILE), and error handling (TRY … CATCH), and they can return a status value to the caller indicating success or failure. They can even contain calls to other stored procedures. There are many benefits to using stored procedures, but in this section, we will focus mainly on their benefit of reducing the overhead of the compilation process through caching.

The first time a stored procedure is executed, the SQL Database Engine compiles and optimizes the T-SQL within the procedure, and the resulting execution plan is cached for future use. Every subsequent call to the procedure reuses the cached plan, until such a time as the plan is removed from the cache due to reasons such as the following:

  • Memory pressure
  • Server restart
  • Plan invalidation – when the underlying objects are changed in some way or a significant amount of data is changed

Stored procedures are the preferred method for plan caching as they provide the most effective mechanism of caching and reusing query plans in the SQL Database Engine.

Ad hoc plan caching

An ad hoc query is a T-SQL query that is sent to the server as a block of text with no parameter markers or other constructs. They are typically built on the fly as needed, such as a query that is typed into a query window in SQL Server Management Studio (SSMS) and executed, or one that is sent to the server using the EXECUTE command as in the following code example, which can be executed in the AdventureWorks sample database:

EXECUTE (N'SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = N''EM'';')

Note

The letter N preceding a string in a T-SQL script indicates that the string should be interpreted as Unicode with UTF-16 encoding. In order to avoid implicit data-type conversions, be sure to specify N for all Unicode string literals when writing T-SQL scripts that involve the NCHAR and NVARCHAR data types. We discuss implicit conversions and their impact on performance in Chapter 6, Discovering T-SQL Anti-Patterns in Depth.

The process of parsing and optimizing an ad hoc query is like that of a stored procedure, and will be just as costly, so it is worth the SQL Database Engine storing the resulting plan in the cache in case the exact same query is ever executed again. The problem with ad hoc caching is that it is extremely difficult to ensure that the resulting plan is reused.

For the SQL Database Engine to reuse an ad hoc plan, the incoming query must match the cached query exactly. Every character must be the same, including spaces, line breaks, and capitalization. The reason for this is that the SQL Database Engine uses a hash function across the entire string to match the T-SQL statement. If even one character is off, the hash values will not match, and the SQL Database Engine will again compile, optimize, and cache the incoming ad hoc statement. For this reason, ad hoc caching cannot be relied upon as an effective caching mechanism.

Note

Even if the database is configured to use case-insensitive collation, this does not apply to query parsing. The ad hoc plan matching is still case sensitive because of the algorithm used to generate the hash value for the query string.

If there are many ad hoc queries being sent to an instance of the SQL Database Engine, the plan cache can become bloated with single-use plans. This can cause performance issues on the system as the plan cache will be unnecessarily large, taking up memory that could be better used elsewhere in the system. In this case, turning on the optimize for ad hoc workloads server configuration option is recommended. When this option is turned on, the SQL Database Engine will cache a small plan stub object the first time an ad hoc query is executed. This object takes up much less space than a full plan object and will minimize the size of the ad hoc cache. If the query is ever executed a second time, the full plan will be cached.

Tip

See the chapter Building Diagnostic Queries using DMVs and DMFs later in this book for a query that will help identify single-use plans in the cache.

Parameterization

Parameterization is the practice of replacing a literal value in a T-SQL statement with a parameter marker. Building on the example from the Ad hoc plan caching section, the following code block shows an example of a parameterized query executed in the AdventureWorks sample database:

DECLARE @PersonType AS nchar(2) = N'EM';
SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = @PersonType;

In this case, the literal value 'EM' is moved from the T-SQL statement itself into a DECLARE statement, and the variable is used in the query instead. This allows the query plan to be reused for different @PersonType values, whereas sending different values directly in the query string would result in a separate cached ad hoc plan.

Simple parameterization

In order to minimize the impact of ad hoc queries, the SQL Database Engine will automatically parameterize some simple queries by default. This is called simple parameterization and is the default setting of the Parameterization database option. With parameterization set to Simple, the SQL Database Engine will automatically replace literal values in an ad hoc query with parameter markers in order to make the resulting query plan reusable. This works for some queries, but there is a very small class of queries that can be parameterized this way.

As an example, the query we introduced previously in the Parameterization section would not be automatically parameterized in simple mode because it is considered unsafe. This is because different PersonType values may yield a different number of rows, and thus require a different execution plan. However, the following query executed in the AdventureWorks sample database would qualify for simple automatic parameterization:

SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE BusinessEntityID = 5;

This query would not be cached as-is. The SQL Database Engine would convert the literal value of 5 to a parameter marker, and it would look something like this in the cache:

(@1 tinyint) SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE BusinessEntityID = @1;

Forced parameterization

If an application tends to generate many ad hoc queries, and there is no way to modify the application to parameterize the queries, the Parameterization database option can be changed to Forced. When forced parameterization is turned on, the SQL Database Engine will replace all literal values in all ad hoc queries with parameter markers for the majority of use cases. However, note that there are documented exceptions that are either of the following:

  • Edge cases that most developers will not face, such as statements that contain more than 2,097 literals
  • Non-starters because statements will not be parameterized irrespective of whether forced parameterization is enabled or not, such as when statements contain the RECOMPILE query hint, statements inside the bodies of stored procedures, triggers, user-defined functions, or prepared statements that have already been parameterized on the client-side application

Take the example of the following query executed in the AdventureWorks sample database:

SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = N'EM' AND BusinessEntityID IN (5, 7, 13, 17, 19);

This query would be automatically parameterized under forced parameterization as follows:

(@1 nchar(2), @2 int, @3 int, @4 int, @5 int, @6 int) SELECT LastName, FirstName, MiddleName
FROM Person.Person
WHERE PersonType = @1 AND BusinessEntityID IN (@2, @3, @4, @5, @6);

This has the benefit of increasing the reusability of all ad hoc queries, but there are some risks to parameterizing all literal values in all queries, which will be discussed later in the The importance of parameters section.

The sp_executesql procedure

The sp_executesql procedure is the recommended method for sending an ad hoc T-SQL statement to the SQL Database Engine. If stored procedures cannot be leveraged for some reason, such as when T-SQL statements must be constructed dynamically by the application, sp_executesql allows the user to send an ad hoc T-SQL statement as a parameterized query, which uses a similar caching mechanism to stored procedures. This ensures that the plan can be reused whenever the same query is executed again. Building on our example from the Ad hoc plan caching section, we can re-write the query using sp_executesql as in the following example, which can be executed in the AdventureWorks sample database:

EXECUTE sp_executesql @stmt = N'SELECT LastName,
      FirstName, MiddleName
      FROM Person.Person
      WHERE PersonType = @PersonType;',
@params = N'@PersonType nchar(2)',
@PersonType = N'EM';

This ensures that any time the same query is sent with the same parameter markers, the plan will be reused, even if the statement is dynamically generated by the application.

Prepared statements

Another method for sending parameterized T-SQL statements to the SQL Database Engine is by using prepared statements. Leveraging prepared statements involves three different system procedures:

  1. sp_prepare: Defines the statement and parameters that are to be executed, creates an execution plan for the query, and sends a statement handle back to the caller that can be used for subsequent execution.
  2. sp_execute: Executes the statement defined by sp_prepare by sending the statement handle along with any parameters to the SQL Database Engin.
  3. sp_unprepare: Discards the execution plan created by sp_prepare for the query specified by the statement handle

Steps 1 and 2 can optionally be combined into a single sp_prepexec statement to save a round-trip to the server.

This method is not generally recommended for plan reuse as it is a legacy construct and may not take advantage of some of the benefits of parameterized statements that sp_executesql and stored procedures can leverage. It is worth mentioning, however, because it is used by some cross-platform database connectivity libraries such as Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) as the default mechanism for sending queries to the SQL Database Engine.

Now that we’ve learned the different ways that plans may be cached, let’s explore how plans may be reused during query processing.

How query processing impacts plan reuse

It’s important to contextualize what happens in terms of query processing that can result in plan caching and reuse. In this section, we will focus on the highlighted section of the following diagram that determines whether a query plan can be reused from the cache or needs to be recompiled:

Figure 1.9: States of query processing related to query compilation/recompilation

Figure 1.9: States of query processing related to query compilation/recompilation

As mentioned before, when an incoming T-SQL statement is parsed, a query hash value representing that statement is generated, and if that query hash is already mapped to a cached query plan, then it can just attempt to reuse that plan – unless special circumstances exist that don’t even allow plan caching, such as when the RECOMPILE hint is present in the T-SQL statement.

Assuming no such pre-existing conditions exist, after matching the query hash with a plan hash, the currently cached plan is tested for correctness, meaning that the SQL Database Engine will check whether anything has changed in the underlying referenced objects that would require the plan to be recompiled. For example, if a new index was created or an existing index referenced in the plan was dropped, the plan must be recompiled.

If the cached plan is found to be correct, then the SQL Database Engine also checks whether enough data has changed to warrant a new plan. This refers to the statistics objects associated with tables and indexes used in the T-SQL statement, and if any are deemed outdated – meaning its modification counter is high enough as it relates to the overall cardinality of the table to consider it stale.

Note

In SQL Server 2022 and Azure SQL Database, if the new Parameter Sensitive Plan (PSP) Optimization feature is used, one query hash can map to multiple query plan hashes. Each different plan hash is a standalone query plan called a variant, and maps to a single query hash that was deemed eligible for PSP Optimization. Each plan variant can be recompiled independently. PSP Optimization will be discussed later in the The importance of parameters section.

We will further discuss the role of statistics in the chapter Mechanics of the Query Optimizer, and query hashes and query plan hashes in the chapter Exploring Query Execution Plans, in the Operator-level properties section.

If nothing has significantly changed, then the query plan can be executed, as we discussed in this chapter in the Query execution essentials section.

The following picture depicts the high-level process for an already cached plan that can be executed as-is:

Figure 1.10: Process for executing a cached plan as-is

Figure 1.10: Process for executing a cached plan as-is

However, if any of the preceding checks fail, then the SQL Database Engine invalidates the cached plan and a new query plan needs to be compiled, as the available optimization space may be different from the last time the plan was compiled and cached. In this case, the T-SQL statement needs to undergo recompilation and go through the optimization process driven by the Query Optimizer so that a new query execution plan is generated (we will describe this process in greater detail in the chapter Mechanics of the Query Optimizer). If eligible, this newly generated query plan is cached.

Note

The same process is followed for new incoming queries where no query plan is yet cached.

Now that we understand how the SQL Database Engine caches and reuses query plans, let’s explore one of the most important factors that determines whether a plan may be reused – parameters.

The importance of parameters

As we discussed in the previous section on caching methods, the primary reason to parameterize queries is to ensure that query execution plans get reused – but why is this important and what other reasons might there be to use parameters?

Security

One reason for using parameterized queries is for security. Using a properly formatted parameterized query can protect against SQL injection attacks. A SQL injection attack is one where a malicious user can execute database code (in this case, T-SQL) on a server by appending it to a data entry field in the application. As an example, assume we have an application that contains a form that asks the user to enter their name into a text box. If the application were to use an ad hoc statement to insert this data into the database, it would generally concatenate a T-SQL string with the user input, as in the following code:

DECLARE @sql nvarchar(MAX);
SET @sql = N'INSERT Users (Name) VALUES (''' + <user input> + ''');';
EXECUTE (@sql);

A malicious user might enter the following value into the text box:

Bob'); DROP TABLE Users; --

If this is the case, the actual code that gets sent to the SQL Database Engine would look like the following:

INSERT Users (Name) VALUES ('Bob'); DROP TABLE Users; --');

This is a valid T-SQL syntax that would successfully execute. It would first insert a row into the Users table with the Name column set to 'Bob', then it would drop the Users table. This would of course break the application, and unless there was some sort of auditing in place, we would never know what happened.

Let’s look at this example again using a parameterized query. The code might look like the following:

EXECUTE sp_executesql @stmt = N'INSERT Users (Name) VALUES (@name)', @params = N'@name nvarchar(100)', @name = <user input>

This time, if the user were to send the same input, rather than executing the query that the user embedded in the string, the Database Engine would insert a row into the Users table with the Name column set to 'Bob'); DROP TABLE Users; --'. This would obviously look a bit strange, but it wouldn’t break the application nor breach security.

Performance

Another reason to leverage parameters is performance. In a busy SQL system, particularly one that has a primarily Online Transaction Processing (OLTP) workload, we may have hundreds or even thousands of queries executing per second.

Assume that each one of these queries takes about 100 ms to compile and consumes about the same amount of CPU. This would mean that each second on the system, the server could be consuming hundreds of seconds of CPU time just compiling queries. That’s a lot of resources to consume just for preparing the queries for execution, and it doesn’t leave a lot of overhead for actually executing them.

Also recall that when plans are not reused, the procedure cache can become very large and consume memory that in turn won’t be available for storing data and executing queries. In short, a system that spends too much time compiling queries may become CPU and/or memory bound and may perform poorly.

Parameter sniffing

Given that query plan reuse is so important, why wouldn’t the SQL Database Engine parameterize every query that comes in by default? One of the reasons for this is to avoid query performance issues that may result from parameter sniffing. Parameter sniffing is something the SQL Database Engine does in order to optimize a parameterized query. The first time a stored procedure or other parameterized query executes, the input parameter values are used to drive the optimization process and produce the execution plan, as discussed in the Query optimization essentials section.

That execution plan will then be cached and reused by subsequent executions of the procedure or query. For most queries, this is a good thing because using a specific value will result in a more accurate cost estimation. In some situations, however, particularly where the data distribution is skewed in some way, the parameters that are sent the first time the query is executed may not represent the typical use case of the query, and the plan that is generated may perform poorly when other parameter values are sent. This is a case where reusing a plan might not be a good thing, because the plan is highly sensitive to user-defined runtime parameters that have widely different data distributions for the same column.

Parameter sniffing, or parameter sensitivity, is a very common cause of plan variability and performance issues in the SQL Database Engine.

Parameter Sensitive Plan Optimization

SQL Server 2022 introduces the Parameter Sensitive Plan Optimization feature (commonly referred to as PSP Optimization), which allows the Database Engine to simultaneously cache multiple plans for a single parameterized query that uses equality predicates.

With PSP Optimization, during the initial compilation of a parameterized query, the Query Optimizer will evaluate up to three parameters that are likely sensitive to non-uniform (skewed) data distributions. The feature uses the statistics histograms to search for where the cardinality difference between the least-occurring value and the most-occurring value for a given column is orders of magnitude off. The result is the creation of what is called a dispatcher plan, which contains the logic (dispatcher expression) that bucketizes the predicates’ values, upon which different plan variants can be compiled independently.

For each cardinality bucket, a query plan variant will only be compiled if needed, based on actual runtime parameters. If the parameter values that would result in a given plan variant are never used at runtime, then that variant of the plan defined in the dispatcher plan will never actually get compiled. This behavior prevents plan-cache bloating by compiling a plan only if and when its predicate value demands it.

The following diagram shows the possible plan variants found for a parameterized query with a WHERE person.ID = @param search predicate:

Figure 1.11: Example of a dispatcher plan defining three query plan variants

Figure 1.11: Example of a dispatcher plan defining three query plan variants

We will discuss parameter sensitivity behavior in more detail later in this book, in Chapter 5, Writing Elegant T-SQL Queries, and Chapter 6, Discovering T-SQL Anti-Patterns in Depth.

To cache or not to cache

In general, caching and reusing query plans is a good thing, and writing T-SQL code that encourages plan reuse is recommended.

In some cases, such as with a reporting or OLAP workload, caching queries might make less sense. These types of systems tend to have a heavy ad hoc workload. The queries that run are typically long-running and, while they may consume a large amount of resources in a single execution, they typically run with less frequency than OLTP systems. Since these queries tend to be long-running, saving a few hundred milliseconds by reusing a cached plan doesn’t make as much sense as creating a new plan that is designed specifically for that execution of the query. Spending that time compiling a new plan may even result in saving more time in the long run, since a fresh plan will likely perform better than a plan that was generated based on a different set of parameter values.

In summary, for most workloads in the SQL Database Engine, leveraging stored procedures and/or parameterized queries is recommended to encourage plan reuse. For workloads that have heavy ad hoc queries and/or long-running reporting-style queries, consider enabling the optimize for ad hoc workloads server setting and leveraging the RECOMPILE hint to guarantee a new plan for each execution (provided that the queries are run with a low frequency), or use forced parameterization to improve plan reuse opportunities. Also, be sure to review Chapter 8, Building Diagnostic Queries Using DMVs and DMFs, for techniques to identify single-use plans, monitor for excessive recompilation, and identify plan variability and potential parameter sniffing issues.

Summary

As this chapter has shown, the way a T-SQL query is written and submitted to the server influences how it is interpreted and executed by the SQL Database Engine. Even before a single T-SQL query is written, the choice of development style (for example, using stored procedures versus ad hoc statements) can have a direct impact on the performance of the application. As we continue our exploration of the internals of SQL Database Engine query processing and optimization, we will find more and more opportunities to write T-SQL queries in a way that encourages optimal query performance, starting with the next chapter.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • A definitive guide to mastering the techniques of writing efficient T-SQL code
  • Learn query optimization fundamentals, query analysis, and how query structure impacts performance
  • Discover insightful solutions to detect, analyze, and tune query performance issues
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

Data professionals seeking to excel in Transact-SQL (T-SQL) for Microsoft SQL Server and Azure SQL Database often lack comprehensive resources. This updated second edition of Learn T-SQL Querying focuses on indexing queries and crafting elegant T-SQL code, catering to all data professionals seeking mastery in modern SQL Server versions and Azure SQL Database. Starting with query processing fundamentals, this book lays a solid foundation for writing performant T-SQL queries. You’ll explore the mechanics of the Query Optimizer and Query Execution Plans, learning how to analyze execution plans for insights into current performance and scalability. Through dynamic management views (DMVs) and dynamic management functions (DMFs), you’ll build diagnostic queries. This book thoroughly covers indexing for T-SQL performance and provides insights into SQL Server’s built-in tools for expedited resolution of query performance and scalability issues. Further, hands-on examples will guide you through implementing features such as avoiding UDF pitfalls, understanding predicate SARGability, Query Store, and Query Tuning Assistant. By the end of this book, you‘ll have developed the ability to identify query performance bottlenecks, recognize anti-patterns, and skillfully avoid such pitfalls.

What you will learn

Identify opportunities to write well-formed T-SQL statements Familiarize yourself with the Cardinality Estimator for query optimization Create efficient indexes for your existing workloads Implement best practices for T-SQL querying Explore Query Execution Dynamic Management Views Utilize the latest performance optimization features in SQL Server 2017, 2019, and 2022 Safeguard query performance during upgrades to newer versions of SQL Server
Estimated delivery fee Deliver to Finland

Premium delivery 7 - 10 business days

€26.95
(Includes tracking information)

Product Details

Country selected

Publication date : Feb 29, 2024
Length 456 pages
Edition : 2nd Edition
Language : English
ISBN-13 : 9781837638994
Vendor :
Microsoft
Category :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Black & white paperback book shipped to your address
Product feature icon AI Assistant (beta) to help accelerate your learning
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Buy Now
Estimated delivery fee Deliver to Finland

Premium delivery 7 - 10 business days

€26.95
(Includes tracking information)

Product Details


Publication date : Feb 29, 2024
Length 456 pages
Edition : 2nd Edition
Language : English
ISBN-13 : 9781837638994
Vendor :
Microsoft
Category :

Table of Contents

18 Chapters
Preface Chevron down icon Chevron up icon
1. Part 1: Query Processing Fundamentals Chevron down icon Chevron up icon
2. Chapter 1: Understanding Query Processing Chevron down icon Chevron up icon
3. Chapter 2: Mechanics of the Query Optimizer Chevron down icon Chevron up icon
4. Part 2: Dos and Don’ts of T-SQL Chevron down icon Chevron up icon
5. Chapter 3: Exploring Query Execution Plans Chevron down icon Chevron up icon
6. Chapter 4: Indexing for T-SQL Performance Chevron down icon Chevron up icon
7. Chapter 5: Writing Elegant T-SQL Queries Chevron down icon Chevron up icon
8. Chapter 6: Discovering T-SQL Anti- Patterns in Depth Chevron down icon Chevron up icon
9. Part 3: Assembling Our Query Troubleshooting Toolbox Chevron down icon Chevron up icon
10. Chapter 7: Building Diagnostic Queries Using DMVs and DMFs Chevron down icon Chevron up icon
11. Chapter 8: Building XEvent Profiler Traces Chevron down icon Chevron up icon
12. Chapter 9: Comparative Analysis of Query Plans Chevron down icon Chevron up icon
13. Chapter 10: Tracking Performance History with Query Store Chevron down icon Chevron up icon
14. Chapter 11: Troubleshooting Live Queries Chevron down icon Chevron up icon
15. Chapter 12: Managing Optimizer Changes Chevron down icon Chevron up icon
16. Index Chevron down icon Chevron up icon
17. Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Empty star icon Empty star icon Empty star icon Empty star icon Empty star icon 0
(0 Ratings)
5 star 0%
4 star 0%
3 star 0%
2 star 0%
1 star 0%
Top Reviews
No reviews found
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela