Amazon RDS for SQL Server metrics collected by DPA

The following sections list the metrics that DPA collects for Amazon RDS for SQL Server database instances. Some metrics are not collected for every instance.

Learn how to view these metrics and change the thresholds.
For detailed information about resolving issues, click the Information link next to the metric on the Resources tab. The Information link is not available for all metrics.

Backups

When database backups fail or are not performed regularly, organizations run the risk of losing valuable data. Use these metrics to detect backup issues and verify that backups are performed on schedule.

Metric	Description
Active Backup Jobs	The number of currently running backup jobs for the instance. If this number is higher than expected, it can have performance implications or indicate issues with the scheduled backups.
Longest Time for a DB without a Successful DB Backup (Diff or Full)	The longest time that any database in a SQL Server has gone without a successful differential or full backup. Use the "Longest Time" metric values to determine if Service Level Objectives for backup frequency are being met, and use the historical values of these metrics to identify whether recent delays are a one-time problem or a recurring problem (for example, nightly backups fail every Tuesday). If a metric value is higher than expected: Review the backup schedules to verify that they are correct and not disabled. Review the backup results to determine if any errors are causing backup failures. To limit the databases that are included in the metric results, you can exclude SQL Server databases from backup metrics.
Longest Time for a DB without a Successful Full Backup	The longest time that any database in a SQL Server has gone without a successful full backup. See Longest Time for a DB without a Successful DB Backup (Diff or Full) for recommendations.
Longest Time for a DB without a Successful Transaction Log Backup	The longest time that any database in a SQL Server has gone without a successful transaction log backup. See Longest Time for a DB without a Successful DB Backup (Diff or Full) for recommendations.
Size of Transaction Logs Not Yet Archived	The size of all transaction logs in MB that have not yet been archived to free up space for logging future transactions.
Sum of All Backup Assets Required for Recovery of All DBs	The cumulative size, in GB, of all the backup assets for all databases in the SQL Server instance that are required to recover to the current point in time. For each DB, this is the size of the last full backup plus the last differential backup plus all transaction logs created after the most recent full or differential backup. Use this metric to track changes to the minimum required storage space needed to do a complete recovery of the SQL Server. It is also important to understand how much temporary free space could be required to restore all the backup assets for a complete recovery.
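The recovery-size rule described for Sum of All Backup Assets Required for Recovery of All DBs can be sketched for a single database as follows. This is a minimal illustration, not DPA's implementation: the `Backup` record, its field names, and the sample values are hypothetical, and the sketch assumes a differential backup counts only if it finished after the last full backup.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Backup:
    kind: str           # "full", "diff", or "log" (hypothetical labels)
    finished: datetime  # when the backup completed
    size_gb: float      # size of the backup file in GB


def recovery_assets_gb(backups):
    """Last full + last differential after it + logs after the most recent of the two."""
    fulls = [b for b in backups if b.kind == "full"]
    if not fulls:
        return 0.0  # nothing to recover from
    last_full = max(fulls, key=lambda b: b.finished)

    # Only differentials taken after the last full backup are usable.
    diffs = [b for b in backups
             if b.kind == "diff" and b.finished > last_full.finished]
    last_diff = max(diffs, key=lambda b: b.finished) if diffs else None

    # Logs are needed from the most recent full or differential onward.
    base_time = last_diff.finished if last_diff else last_full.finished
    logs = [b for b in backups if b.kind == "log" and b.finished > base_time]

    diff_size = last_diff.size_gb if last_diff else 0.0
    return last_full.size_gb + diff_size + sum(b.size_gb for b in logs)


# Illustrative data only.
example = [
    Backup("full", datetime(2024, 1, 1, 2, 0), 10.0),
    Backup("log", datetime(2024, 1, 1, 12, 0), 0.25),  # before the diff: not needed
    Backup("diff", datetime(2024, 1, 2, 2, 0), 2.0),
    Backup("log", datetime(2024, 1, 2, 10, 0), 0.5),   # after the diff: needed
]
print(recovery_assets_gb(example))
```

Summing this value over every database gives an instance-wide figure comparable to the metric, under the assumptions stated above.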

Connections

Metric	Description
Connected Devices	The number of distinct client machines connected to this instance (even if the connection is idle).
Connected Users	The number of distinct users (that is, login names) connected to this instance (even if the connection is idle).
Sessions	The number of sessions connected to this instance (even if the connection is idle).

CPU

Metric	Description
Core Count	The number of cores used by the instance.
Instance CPU Utilization	The CPU being used for this specific SQL Server instance.
Signal Waits	The percentage of total waits that are runnable and waiting for an available CPU. Anything over 20% indicates that there is a possible CPU resource bottleneck. Examine the overall wait events for the server as a whole. A high signal wait percentage could be due to an increased number of sessions, so examine the overall workload for the server as well. Take steps to either reduce the overall runtime for queries or reduce the total number of sessions.
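As a rough illustration of the Signal Waits percentage and the 20% guideline, the following sketch computes the ratio from two totals. It assumes the inputs are summed `signal_wait_time_ms` and `wait_time_ms` values such as those exposed by `sys.dm_os_wait_stats`; whether DPA derives the metric in exactly this way is an assumption, and the function names are hypothetical.

```python
def signal_wait_percent(signal_wait_ms, total_wait_ms):
    """Share of total wait time spent runnable and waiting for a CPU, in percent."""
    if total_wait_ms <= 0:
        return 0.0  # no waits recorded; avoid dividing by zero
    return 100.0 * signal_wait_ms / total_wait_ms


def cpu_pressure_suspected(percent, threshold=20.0):
    """Apply the 'anything over 20%' guideline from the table above."""
    return percent > threshold


# Illustrative values only.
pct = signal_wait_percent(2500, 10000)
print(pct, cpu_pressure_suspected(pct))
```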

Disk

Metric	Description
SQL Disk Read Latency	Disk read latency from `dm_io_virtual_file_stats` DMO.
SQL Disk Write Latency	Disk write latency from `dm_io_virtual_file_stats` DMO.
Total I/O Wait Time	The sum of the wait time for all I/O activity across all database files. If this is high: Examine the current physical structure of databases on the server to see if it is possible to reduce I/O load by redistributing the database files to distinct disks. Examine queries and database design to determine if they can be tuned to reduce I/O.
Total Read I/O Wait Time	The sum of the wait time for all read I/O activity across all database files.
Total Write I/O Wait Time	The sum of the wait time for all write I/O activity across all database files.
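The counters in `sys.dm_io_virtual_file_stats` are cumulative since the instance started, so an average latency for an interval is usually derived from the difference between two samples. The sketch below shows that arithmetic for reads using the `num_of_reads` and `io_stall_read_ms` columns; the dictionary layout and function name are hypothetical, and it is an assumption that DPA computes its latency metrics the same way.

```python
def avg_latency_ms(earlier, later, count_key, stall_key):
    """Average latency per I/O between two cumulative counter samples."""
    io_count = later[count_key] - earlier[count_key]
    stall_ms = later[stall_key] - earlier[stall_key]
    if io_count <= 0:
        return 0.0  # no I/O in the interval (or counters were reset)
    return stall_ms / io_count


# Two illustrative samples of the same database file.
sample_1 = {"num_of_reads": 1000, "io_stall_read_ms": 5000}
sample_2 = {"num_of_reads": 1500, "io_stall_read_ms": 9000}

print(avg_latency_ms(sample_1, sample_2, "num_of_reads", "io_stall_read_ms"))
```

The same function applies to writes by passing `num_of_writes` and `io_stall_write_ms` as the keys.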

Memory

Metric	Description
Buffer Cache Hit Ratio	The rate at which SQL Server finds the data blocks it needs in memory rather than having to read from disk for this instance. By itself, the buffer cache hit ratio is not very meaningful except for servers with undersized memory settings. Tuning queries and performing index optimization is the best way to increase buffer cache hit ratios. To see the current metrics for the buffer cache, run the following query: `select * from master..sysperfinfo where object_name like '%Buffer Manager%';`
Buffer Cache Size	The current size of the SQL Server Buffer Cache.
Log Bytes Flushed	The number of bytes of information being flushed per second.
Log Flushes	The number of log flushes that occur per second.
Page Life Expectancy	The number of seconds a page will stay in the buffer pool without references. A lower value (for example, under 300) indicates the buffer pool is under memory pressure and you should add more memory to the system (enable AWE on 32-bit systems) or find the process in Task Manager that is consuming memory outside of SQL Server. The default threshold for Page Life Expectancy (PLE) is 300. For modern database systems, DBAs recommend using a formula such as the following to calculate an appropriate PLE threshold: `(DataCacheSizeInGB / 4) * 300`
Plan Cache Size	The current size of the SQL Server Plan Cache.
Procedure Cache Hit Ratio	The percentage of time when SQL Server looks for an execution plan in the procedure cache and finds it for this instance. If this is low, try to write more reusable code or consider increasing the size of the procedure cache. To see current metrics for the procedure cache, run the following query: `select * from master..sysperfinfo where object_name like '%Plan Cache%';`
SQL Compilations	The number of compilations performed by SQL Server per second. Compilations are a natural part of SQL Server operations but do use CPU and other resources. Compare this to the Batch Requests/sec metric to understand if this metric is too high. Minimizing compilations will help overall performance. For more information, see the following Microsoft Knowledgebase article: http://support.microsoft.com/kb/243588.
SQL Re-Compilations	The number of re-compilations performed by SQL Server per second. Re-compilations occur for many reasons but this number should typically be low.
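To make the Page Life Expectancy threshold formula concrete, the following sketch evaluates it for a given data cache size. The function name is hypothetical; the arithmetic is the formula quoted in the table (data cache size in GB, divided by 4, multiplied by 300).

```python
def ple_threshold_seconds(data_cache_size_gb):
    """PLE threshold scaled to the data cache: (size in GB / 4) * 300 seconds."""
    return data_cache_size_gb / 4 * 300


# A 4 GB data cache reproduces the default threshold of 300 seconds.
print(ple_threshold_seconds(4))
# A larger cache gives a proportionally higher threshold.
print(ple_threshold_seconds(64))
```

With this scaling, a 64 GB data cache yields a threshold of 4800 seconds, so a PLE of 300 on such a server would already indicate significant memory pressure.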

Network

Metric	Description
Round-trip Time	The round-trip time when running "select 1" against this instance (includes network time but not connect time). If this is high, contact your network administrator to understand network latency.

Sessions

Metric	Description
Active Sessions	The number of sessions in this instance actively performing work or waiting for a resource (excludes idle sessions).
Batch Requests	The number of batches being executed by SQL Server every second.
Blocked Sessions	The number of sessions that are blocked in this instance because another session is using a needed resource.
Transaction Rate	The number of transactions being executed every second in this instance (the Transactions/sec statistic from sysperfinfo for the instance).

TempDB

Space required by the tempDB database fluctuates based on the number of queries running and the nature of those queries. If tempDB fills up and cannot autogrow, the performance of all queries is affected. Use tempDB metrics to monitor the amount of space required and determine what types of objects require the most space.

Metric	Description
TempDB % Free Space	The percentage of unused space in tempDB. If tempDB fills up and cannot autogrow, the performance of all queries will be affected as they wait for access to tempDB.
TempDB Free Space	The amount of unused space in tempDB.
TempDB Internal Objects	The amount of space in tempDB used by internal objects. Internal objects are created by SQL Server to process queries. For example, internal objects can be used for spooling operations, for sort space, or for hash tables. Queries that process large amounts of data can increase the space required for internal objects in tempDB.
TempDB Log File % Free Space	The percentage of space allocated to the tempDB log file that is not currently being used.
TempDB Log File Free Space	The amount of space allocated to the tempDB log file that is currently free.
TempDB Log File Utilized Space	The amount of space allocated to the tempDB log file that is currently being used.
TempDB Mixed Extents	The amount of space in tempDB used by mixed extents. Mixed extents are shared by up to eight objects.
TempDB Size	The amount of space currently allocated for tempDB. Use this value to track the amount of space tempDB typically uses and plan for storage requirements. Each time SQL Server is restarted, tempDB is recreated using the default size. By default, tempDB grows automatically as needed. However, the files cannot be used during that process, and excessive autogrowth can lead to fragmentation. If tempDB typically needs to grow by a large amount, consider increasing the initial size.
TempDB User Objects	The amount of space in tempDB used by user objects. User objects are temporary objects explicitly created by users. They include temporary tables and indexes, temporary stored procedures, table variables, and cursors.
TempDB Version Store	The amount of space in tempDB used by the version store. While a table row is being updated or deleted, the version store contains the committed version of that row. `SELECT` operations that need to access the row being updated or deleted are not blocked because they can read the row in the version store. When the transaction is committed, the row is removed from the version store. Long-running or orphaned transactions can increase the size of the version store. A large version store can affect database performance because of the overhead of reading the large version store.
Total TempDB Log File Size	The amount of disk space allocated for the log file in the tempDB database. Each time SQL Server is restarted, tempDB is re-created, and the log file is created using the default size or the size specified by the DBA. If the tempDB log file requires more space, by default it autogrows as needed. However, autogrowth can affect performance because the tempDB log file cannot be used during autogrowth, and because autogrowth can lead to file fragmentation. Use the Total TempDB Log File Size metric to: Determine the size that the tempDB log file typically grows to over time and specify an initial size that prevents excessive autogrowth. Identify sudden growth spikes and investigate what queries could have caused the spikes.
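The tempDB space categories above can be related to one another with simple arithmetic. The sketch below assumes page counts like those reported by `sys.dm_db_file_space_usage` (8 KB pages) and treats the total as the sum of the listed categories; the dictionary values are illustrative, the function names are hypothetical, and DPA's own calculation may differ.

```python
PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB


def pages_to_mb(page_count):
    """Convert a count of 8 KB pages to megabytes."""
    return page_count * PAGE_SIZE_KB / 1024


def percent_free(usage):
    """Unallocated pages as a percentage of all pages in the sample."""
    total_pages = sum(usage.values())
    if total_pages == 0:
        return 0.0
    return 100.0 * usage["unallocated_extent_page_count"] / total_pages


# Illustrative page counts only.
usage = {
    "unallocated_extent_page_count": 96000,        # free space
    "user_object_reserved_page_count": 16000,      # temp tables, table variables
    "internal_object_reserved_page_count": 12800,  # sorts, spools, hash tables
    "version_store_reserved_page_count": 2560,     # row versions
    "mixed_extent_page_count": 640,                # shared extents
}

print(pages_to_mb(usage["unallocated_extent_page_count"]), percent_free(usage))
```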

Waits

Metric	Description
Total Instance Wait Time	The total wait time for the instance.
