[box type="note" align="" class="" width=""]This article is an excerpt from a book SQL Server 2017 Administrator's Guide written by Marek Chmel and Vladimír Mužný. This book will help you learn to implement and administer successful database solution with SQL Server 2017.[/box]
Today, we will perform SQL Server analysis, and also learn ways for efficient performance monitoring and tuning.
Performance monitoring and tuning is a crucial part of your database administration skill set so as to keep the performance and stability of your server great, and to be able to find and fix the possible issues. The overall system performance can decrease over time; your system may work with more data or even become totally unresponsive. In such cases, you need the skills and tools to find the issue to bring the server back to normal as fast as possible. We can use several tools on the operating system layer and, then, inside the SQL Server to verify the performance and the possible root cause of the issue.
The first tool that we can use is the performance monitor, which is available on your Windows Server:
Performance monitor can be used to track important SQL Server counters, which can be very helpful in evaluating the SQL Server Performance. To add a counter, simply right-click on the monitoring screen in the Performance monitoring and tuning section and use the Add Counters item. If the SQL Server instance that you're monitoring is a default instance, you will find all the performance objects listed as SQL Server. If your instance is named, then the performance objects will be listed as MSSQL$InstanceName in the list of performance objects. We can split the important counters to watch between the system counters for the whole server and specific SQL Server counters.
The list of system counters to watch include the following:
Once you have added all these system counters, you can see the values real time or you can configure a data collection, which will run for a specified selected time and periodically collect the information:
With SQL Server-specific counters, we can dig deeper into the CPU, memory, and storage utilization to see what the SQL Server is doing and how the SQL Server is utilizing the subsystems.
Important counters to watch for SQL Server memory utilization include counters from the
SQL Server: Buffer Manager performance object and from SQL Server:Memory Manager:
Important counters to watch for SQL Server memory utilization include counters from the SQL Server: Buffer Manager performance object and from SQL Server:Memory Manager:
To check the performance counters, you can also use a T-SQL query, where you can query the sys.dm_os_performance_counters DMV:
SELECT [counter_name] as [Counter Name], [cntr_value]/1024 as [Server Memory (MB)]
FROM sys.dm_os_performance_counters
WHERE
[object_name] LIKE '%Memory Manager%'
AND [counter_name] IN ('Total Server Memory (KB)', 'Target Server Memory (KB)')
This query will return two values—one for target memory and one for total memory. These two should be close to each other on a warmed up system. Another query you can use is to get the information from a DMV named sys.dm_0s_sys_memory:
SELECT total_physical_memory_kb/1024/1024 AS [Physical Memory (GB)],
available_physical_memory_kb/1024/1024 AS [Available Memory (GB)],
system_memory_state_desc AS [System Memory State]
FROM sys.dm_os_sys_memory WITH (NOLOCK) OPTION (RECOMPILE)
This query will display the available physical memory and the total physical memory of your server with several possible memory states:
The memory grants can be verified with a T-SQL query:
SELECT [object_name] as [Object name] , cntr_value AS [Memory Grants Pending]
FROM sys.dm_os_performance_counters WITH (NOLOCK)
WHERE
[object_name] LIKE N'%Memory Manager%'
AND counter_name = N'Memory Grants Pending' OPTION (RECOMPILE);
If you face memory issues, there are several steps you can take for improvements:
The important counters to watch for SQL Server storage utilization would include counters from the SQL Server:Access Methods performance object:
Monitoring the disk system is crucial, since your disk is used as a storage for the following:
To verify the disk latency and IOPS metric of your drives, you can use the Performance monitor, or the T-SQL commands, which will query the sys.dm_os_volume_stats and sys.dm_io_virtual_file_stats DMF. Simple code to start with would be a T-SQL script utilizing the DMF to check the space available within the database files:
SELECT f.database_id, f.file_id, volume_mount_point, total_bytes, available_bytes
FROM sys.master_files AS f
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id);
To check the I/O file stats with the second provided DMF, you can use a T-SQL code for checking the information about tempDB data files:
SELECT * FROM sys.dm_io_virtual_file_stats (NULL, NULL) vfs
join sys.master_files mf on mf.database_id = vfs.database_id and mf.file_id
= vfs.file_id
WHERE mf.database_id = 2 and mf.type = 0
To measure the disk performance, we can use a tool named DiskSpeed, which is a replacement for older SQLIO tool, which was used for a long time.
DiskSpeed is an external utility, which is not available on the operating system by default. This tool can be downloaded from GitHub or the Technet Gallery at https://github.com/microsoft/diskspd. |
The following example runs a test for 15 seconds using a single thread to drive 100 percent random 64 KB reads at a depth of 15 overlapped (outstanding) I/Os to a regular file:
DiskSpd –d300 -F1 -w0 -r –b64k -o15 d:datafile.dat
We can use the whole Wait Statistics approach for a thorough understanding of the SQL Server workload and undertake performance troubleshooting based on the collected data. Wait Statistics are based on the fact that, any time a request has to wait for a resource, the SQL Server tracks this information, and we can use this information for further analysis.
When we consider any user process, it can include several threads. A thread is a single unit of execution on SQL Server, where SQLOS controls the thread scheduling instead of relying on the operating system layer. Each processor core has it's own scheduler component responsible for executing such threads. To see the available schedulers in your SQL Server, you can use the following query:
SELECT * FROM sys.dm_os_schedulers
Such code will return all the schedulers in your SQL Server; some will be displayed as visible online and some as hidden online. The hidden ones are for internal system tasks while the visible ones are used by user tasks running on the SQL Server. There is one more scheduler, which is displayed as Visible Online (DAC). This one is used for dedicated administration connection, which comes in handy when the SQL Server stops responding. To use a dedicated admin connection, you can modify your SSMS connection to use the DAC, or you can use a switch with the sqlcmd.exe utility, to connect to the DAC. To connect to the default instance with DAC on your server, you can use the following command:
sqlcmd.exe -E -A
Each thread can be in three possible states:
Each running thread runs until it has to wait for a resource to become available or until it has exhausted the CPU time for a running thread, which is set to 4 ms. This 4 ms time can be visible in the output of the previous query to sys.dm_os_schedulers and is called a quantum. When a thread requires any resource, it is moved away from the processor to a waiter list, where the thread waits for the resource to become available. Once the resource is available, the thread is notified about the resource availability and moves to the bottom of the runnable queue.
Any waiting thread can be found via the following code, which will display the waiting threads and the resource they are waiting for:
SELECT * FROM sys.dm_os_waiting_tasks
The threads then transition between the execution at the CPU, waiter list, and runnable queue. There is a special case when a thread does not need to wait for any resource and has already run for 4 ms on the CPU, then the thread will be moved directly to the runnable queue instead of the waiter list.
In the following image, we can see the thread states and the objects where the thread resides:
When the thread is waiting on the waiter list, we can talk about a resource wait time. When the thread is waiting on the runnable queue to get on the CPU for execution, we can talk about the signal time. The total wait time is, then, the sum of the signal and resource wait times. You can find the ratio of the signal to resource wait times with the following script:
Select signalWaitTimeMs=sum(signal_wait_time_ms)
,'%signal waits' = cast(100.0 * sum(signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))
,resourceWaitTimeMs=sum(wait_time_ms - signal_wait_time_ms)
,'%resource waits'= cast(100.0 * sum(wait_time_ms - signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2)) from sys.dm_os_wait_stats
When the ratio goes over 30 percent for the signal waits, then there will be a serious CPU pressure and your processor(s) will have a hard time handling all the incoming requests from the threads.
The following query can then grab the wait statistics and display the most frequent wait types, which were recorded through the thread executions, or actually during the time the threads were waiting on the waiter list for any particular resource:
WITH [Waits] AS
(SELECT
[wait_type],
[wait_time_ms] / 1000.0 AS [WaitS],
([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS], [signal_wait_time_ms] / 1000.0 AS [SignalS],
[waiting_tasks_count] AS [WaitCount],
100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum] FROM sys.dm_os_wait_stats
WHERE [wait_type] NOT IN ( N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH', N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE', N'CHKPT', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE', N'DIRTY_PAGE_POLL',
N'DISPATCHER_QUEUE_SEMAPHORE', N'EXECSYNC',
N'FSAGENT', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX', N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE', N'HADR_TIMER_TASK',
N'HADR_WORK_QUEUE', N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP', N'LOGMGR_QUEUE', N'MEMORY_ALLOCATION_EXT', N'ONDEMAND_TASK_QUEUE', N'PREEMPTIVE_XE_GETTARGETSTATE',
N'PWAIT_ALL_COMPONENTS_INITIALIZED', N'PWAIT_DIRECTLOGCONSUMER_GETNEXT', N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', N'QDS_ASYNC_QUEUE', N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP', N'QDS_SHUTDOWN_QUEUE',
N'REDO_THREAD_PENDING_WORK', N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE', N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP',
N'SLEEP_DCOMSTARTUP', N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY', N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP', N'SLEEP_SYSTEMTASK', N'SLEEP_TASK', N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT', N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
N'WAITFOR', N'WAITFOR_TASKSHUTDOWN', N'WAIT_XTP_RECOVERY', N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG', N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN', N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT'
)
AND [waiting_tasks_count] > 0
) SELECT
MAX ([W1].[wait_type]) AS [WaitType],
CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S], CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
MAX ([W1].[WaitCount]) AS [WaitCount],
CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4))
AS [AvgWait_S],
CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum] GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX( [W1].[Percentage] ) < 95 GO
This code is available on the whitepaper, published by SQLSkills, named SQL Server Performance Tuning Using Wait Statistics by Erin Stellato and Jonathan Kehayias, which then refers the URL on SQL Skills and uses the full query by Paul Randal available at https://www.sqlskills.com/ blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/. |
Some of the typical wait stats you can see are:
The PAGEIOLATCH wait type is used when the thread is waiting for a page to be read into the buffer pool from the disk. This wait type comes with two main forms:
You may quickly assume that the storage has to be the problem, but that may not be the case. Like any other wait, they need to be considered in correlation with other wait types and other counters available to correctly find the root cause of the slow SQL Server operations. The page may be read into the buffer pool, because it was previously removed due to memory pressure and is needed again. So, you may also investigate the following:
Also, you need to consider the following as a possible factor to the PAGEIOLATCH wait types:
Large scans versus seeks on the indexes
This wait type is quite frequently misplaced with PAGEIOLATCH but PAGELATCH is used for pages already present in the memory. The thread waits for the access to such a page again with possible PAGELATCH_SH and PAGELATCH_EX wait types.
A pretty common situation with this wait type is a tempDB contention, where you need to analyze what page is being waited for and what type of query is actually waiting for such a resource. As a solution to the tempDB, contention you can do the following:
This wait type is encountered when any thread is running in parallel. The CXPACKET wait type itself does not mean that there is really any problem on the SQL Server. But if such a wait type is accumulated very quickly, it may be a signal of skewed statistics, which require an update or a parallel scan on the table where proper indexes are missing.
The option for parallelism is controlled via MAX DOP setting, which can be configured on the following:
We learned about SQL Server analysis with the Wait Statistics troubleshooting methodology and possible DMVs to get more insight on the problems occurring in the SQL Server.
To know more about how to successfully create, design, and deploy databases using SQL Server 2017, do checkout the book SQL Server 2017 Administrator's Guide.