Documentation forDatabase Performance Analyzer

SQL Server metrics collected by DPA

The following sections list the metrics that DPA collects for SQL Server database instances. Some metrics are not collected for every instance.

  • Learn how to view these metrics and change the thresholds.
  • For detailed information about resolving issues, click the next to the metric on the Resources tab. The Information link is not available for all metrics.
  • If the database instance runs on a virtual machine (VM), metrics collected for the VM are described in VM metrics collected by DPA

For DPA to collect metrics from a monitored SQL Server instance, Azure SQL instance, or ASMI, the SQL option NUMERIC_ROUNDABOUT must be set to OFF.

Backups

When database backups fail or are not performed regularly, organizations run the risk of losing valuable data. Use these metrics to make you aware of any issues and ensure that backups are performed on schedule.

Metric Description
Active Backup Jobs The number of currently running backup jobs for the instance. If this number is higher than expected, it can have performance implications or indicate issues with the scheduled backups.
Longest Time for a DB without a Successful DB Backup (Diff or Full)

The longest time that any database in a SQL Server has gone without a successful differential or full backup.

Use the "Longest Time" metric values to determine if Service Level Objectives for backup frequency are being met, and use the historical values of these metrics to identify whether recent delays are a one-time problem or a recurring problem (for example, nightly backups aren't happening every Tuesday). If a metric value is higher than expected:

  • Review the backup schedules for full backups to verify that they are correct and not disabled.
  • Review the backup results to determine if any errors are causing backup failures to occur.

To limit the databases that are included in the metric results, you can exclude SQL Server databases from backup metrics.

Longest Time for a DB without a Successful Full Backup

The longest time that any database in a SQL Server has gone without a successful full backup. See Longest Time for a DB without a Successful DB Backup (Diff or Full) for recommendations.

Longest Time for a DB without a Successful Transaction Log Backup The longest time that any database in a SQL Server has gone without a successful transaction log backup. See Longest Time for a DB without a Successful DB Backup (Diff or Full) for recommendations.
Size of Transaction Logs Not Yet Archived The size of all transaction logs in MB that have not yet been archived to free up space for logging future transactions.
Sum of All Backup Assets Required for Recovery of All DBs The cumulative size, in GB, of all the backup assets for all databases in the SQL Server instance that are required to recover to the current point in time. For each DB, this is the size of the last full backup plus the last differential backup plus all transaction logs created after the most recent full or differential backup. Use this metric to track changes to the minimum required storage space needed to do a complete recovery of the SQL Server. It is also important to understand how much temporary free space could be required to restore all the backup assets for a complete recovery.

Connections

Metric Description
Connected Devices The number of distinct client machines connected to this instance (even if the connection is idle).
Connected Users The number of distinct users (that is, login names) connected to this instance (even if the connection is idle).
Sessions The number of sessions connected to this instance (even if the connection is idle).

CPU

Metric Description
Core Count The number of cores used by the instance.
Instance CPU Utilization The CPU Utilization for this specific SQL Server instance. This is a subset of the O/S CPU Utilization metric.
O/S CPU Queue Length via WMI The number of O/S threads waiting to access the CPU for the entire system (includes all instances on this machine).
O/S CPU Utilization The percentage of CPU being used for the entire system (includes all instances on this machine). Potential solutions to a CPU bottleneck are to reduce the server load by tuning the queries waiting on CPU, get faster CPUs, or get more CPUs.
Signal Waits

The percentage of total waits that are runnable and waiting for an available CPU. Anything over 20% indicates that there is a possible CPU resource bottleneck.

Examine the overall wait events for the server as a whole. A high signal wait percentage could be due to an increased number of sessions, so examine the overall workload for the server as well. Take steps to either reduce the overall runtime for queries or reduce the total number of sessions.

Disk

Metric Description
O/S Disk Queue Length The number of I/O operations waiting for disk drives to become available for the entire system (includes all instances on this machine). Spikes of high disk queue length may be normal, but if this is high for an extended period, you could have an I/O bottleneck. Drill in to DPA Trends charts to examine queries with I/O wait types during the timeframe.
O/S Disk Queue Length via WMI The number of I/O operations waiting for disk drives to become available for the entire system (includes all instances on this machine).
Page Reads The number of SQL Server physical reads from disk to memory. OLTP workloads are typically about 80-90 per second with higher values (or spikes) being an indication of insufficient storage performance, insufficient indexing, or not enough memory.
Page Writes The number of SQL Server physical writes from memory to disk. OLTP workloads are typically about 80-90 per second. If this is high (or spikes) it needs to be cross checked with lazy-writes/sec and checkpoints in order to determine if the issue might be due to low memory.
Physical Read Rate via WMI The number of kilobytes being read from disk every second for the entire system (includes all instances on this machine). If this is high, drill in to the DPA Trends charts and review queries waiting on physical read wait types, such as PAGEIOLATCH_SH or PAGEIOLATCH_EX.
Physical Write Rate via WMI The number of kilobytes being written to disk every second for the entire system (includes all instances on this machine). If this is high, drill in to the DPA Trends charts and review queries waiting on write wait types, such as IO_COMPLETION or PAGEIOLATCH.
Physical I/O Rate via WMI The number of kilobytes being read and written to disk every second for the entire system (includes all instances on this machine). If this is high, drill in to the DPA Trends charts and review physical read and write wait types.
SQL Disk Read Latency Disk read latency from dm_io_virtual_file_stats DMO.
SQL Disk Write Latency Disk write latency from dm_io_virtual_file_stats DMO.
Total I/O Wait Time

The sum of all I/O activity for all database files. If this is high:

  1. Examine the current physical structure of databases on the server to see if it is possible to reduce I/O load by redistributing the database files to distinct disks.
  2. Examine queries and database design to determine if they can be tuned to reduce I/O.
Total Read I/O Wait Time The sum of all read I/O activity for all database files.
Total Write I/O Wait Time The sum of all write I/O activity for all database files.

Memory

Metric Description
Buffer Cache Hit Ratio

The rate at which SQL Server finds the data blocks it needs in memory rather than having to read from disk for this instance. By itself, the buffer cache hit ratio is not very meaningful except for servers with undersized memory settings. Tuning queries and performing index optimization is the best way to increase buffer cache hit ratios. To see the current metrics for the buffer cache, run the following query:

select * from master..sysperfinfo where object_name like 'Buffer Manager'

Buffer Cache Size The current size of the SQL Server Buffer Cache.
Log Bytes Flushed The number of bytes of information being flushed per second.
Log Flushes The number of log flushes that occur per second.
Memory Paging Rate via WMI The number of pages read from or to the disk to resolve memory references to pages that were not in memory at the time of the reference. This metric is for the entire system (includes all instances on this machine). High rates may indicate excessive memory contention (thrashing).
O/S Memory Utilization The percentage of memory being used for the entire system (includes all instances on this machine). If this is high and the Memory Paging Rate metric is high, you might need to increase the amount of physical RAM in the server, reduce the load on the server, or change the server memory configuration. Run sp_configure and review settings for "max server memory" and "min server memory" to determine amount of memory allocated to SQL Server.
Page Life Expectancy The number of seconds a page will stay in the buffer pool without references. A lower value (for example, under 300) indicates the buffer pool is under memory pressure and you should add more memory to the system (enable AWE on 32-bit systems) or find the process in Task Manager that is consuming outside of SQL Server.
Plan Cache Size The current size of the SQL Server Plan Cache.
Procedure Cache Hit Ratio

The percentage of time when SQL Server looks for an execution plan in the procedure cache and finds it for this instance. If this is low, try to write more reusable code or consider increasing the size of the procedure cache. To see current metrics for the procedure cache, run the following query:

select * from master..sysperfinfo where object_name like '%Plan Cache%';

SQL Compilations The number of compilations performed by SQL Server per second. Compilations are a natural part of SQL Server operations but do utilize CPU and other resources. Compare this to the Batch Requests/sec metric to understand if this metric is too high. Minimizing compilations will help overall performance. For more information, see the following Microsoft Knowledgebase article: http://support.microsoft.com/kb/243588.
SQL Re-Compilations The number of re-compilations performed by SQL Server per second. Re-compilations occur for many reasons but this number should typically be low.

Network

Metric Description
Round-trip Time The round-trip time when running "select 1" against this instance (includes network time but not connect time). If this is high, contact your network administrator to understand network latency.

Sessions

Metric Description
Active Sessions The number of sessions in this instance actively performing work or waiting for a resource (excludes idle sessions).
Batch Requests The number of batches being executed by SQL Server every second.
Blocked Sessions The number of sessions that are blocked in this instance because another session is using a needed resource.
Transaction Rate The number of transactions being executed every second in this instance (the Transactions/sec statistic from sysperfinfo for the instance).

TempDB

Space required by the TempDB database fluctuates based on the number of queries running and the nature of those queries. If TempDB fills up and cannot autogrow, the performance of all queries is affected. Use TempDB metrics to monitor the amount of space required and determine what types of objects require the most space.

Metric Description
TempDB Free Space

The amount of free space in TempDB. Spaced used in TempDB fluctuates based on the nature and volume of the SQL statements that are currently running.

If TempDB fills up and there is not enough disk space for it to autogrow (or it is not set to autogrow), the performance of all SQL statements will be affected as they wait for access to TempDB.

TempDB Internal Objects

The amount of space in TempDB used by internal objects. Internal objects are created by SQL Server to process queries. For example, internal objects can be used for spooling operations, for sort space, or for hash tables. Queries that process large amounts of data can increase the space required for internal objects in TempDB.

TempDB Log File % Free Space

The percentage of space allocated to the TempDB log file that is not currently being used.

TempDB Log File Free Space

The amount of space allocated to the TempDB log file that is currently free.

TempDB Log File Utilized Space The amount of space allocated to the TempDB log file that is currently being used.
TempDB Mixed Extents The amount of space in TempDB used by mixed extents. Mixed extents are shared by up to eight objects.
TempDB User Objects

The amount of space in TempDB used by user objects. User objects are temporary objects explicitly created by users. They include temporary tables and indexes, temporary stored procedures, table variables, and cursors.

TempDB Version Store

The amount of space in TempDB used by the version store. While a table row is being updated or deleted, the version store contains the committed version of that row. SELECT operations that need to access the row being updated or deleted are not blocked because they can read the row in the version store. When the transaction is committed, the row is removed from the version store.

Long-running or orphaned transactions can increase the size of the version store. A large version store can affect database performance because of the overhead of reading the large version store.

Total TempDB Log File Size

The amount of disk space allocated for the log file in the TempDB database over time. Each time SQL Server is restarted, TempDB is re-created, and the log file is created using the default size or the size specified by the DBA. If the TempDB log file requires more space, by default it autogrows as needed. However, autogrowth can affect performance because the TempDB log file cannot be used during autogrowth, and because autogrowth can lead to file fragmentation.

Use the Total TempDB Log File Size metric to:

  • Determine the size that the TempDB log file typically grows to over time and specify an initial size that prevents excessive autogrowth.

  • Identify sudden growth spikes and investigate what queries could have caused the spikes.

Waits

Metric Description
Total Instance Wait Time The total wait time for the instance.