Query compilation essentials
The main stages of query processing can be seen in the following overview diagram, which we will expand on throughout this chapter:
Figure 1.2: Flow chart representing the states of query processing
The Query Processor is the component inside the SQL Database Engine that is responsible for compiling a query. In this section, we will focus on the highlighted steps of the following diagram that handle query compilation:
Figure 1.3: States of query processing related to query compilation
The first stage of query processing is generally known as query compilation and includes a series of tasks that will eventually lead to the creation of a query plan. When an incoming T-SQL statement is parsed
to perform syntax validations and ensure that it is correct T-SQL, a query hash value is generated that represents the statement text exactly as it was written. If that query hash is already mapped to a cached query plan, then it can just attempt to reuse that plan. However, if a query plan for the incoming query is not already found in the cache, query compilation proceeds with the following tasks:
- Perform binding, which is the process of verifying that the referenced tables and columns exist in the database schema.
- References to a view are replaced with the definition of that view (this is called expanding the view).
- Load metadata for the referenced tables and columns. This metadata is as follows:
- The definition of tables, indexes, views, constraints, and so on, that apply to the query.
- Data distribution statistics on the applicable schema object.
- Verify whether data conversions are required for the query.
Note
When the query compilation process is complete, a structure that can be used by the Query Optimizer is produced, known as the algebrizer tree or query tree.
The following diagram further details these compilation tasks:
Figure 1.4: Flow of compilation tasks for T-SQL statements
If the T-SQL statement is a Data Definition Language (DDL) statement, there’s no possible optimization, and so a plan is produced immediately. However, if the T-SQL statement is a Data Manipulation Language (DML) statement, the SQL Database Engine will move to an exploratory process known as query optimization, which we will explore in the next section.