Amazon RDS for SQL Server metrics collected by DPA
The following sections list the metrics that DPA collects for Amazon RDS for SQL Server database instances. Some metrics are not collected for every instance.
- Learn how to view these metrics and change the thresholds.
- For detailed information about resolving issues, click the next to the metric on the Resources tab. The Information link is not available for all metrics.
Backups
When database backups fail or are not performed regularly, organizations run the risk of losing valuable data. Use these metrics to make you aware of any issues and ensure that backups are performed on schedule.
Metric | Description |
---|---|
Active Backup Jobs | The number of currently running backup jobs for the instance. If this number is higher than expected, it can have performance implications or indicate issues with the scheduled backups. |
Longest Time for a DB without a Successful DB Backup (Diff or Full) |
The longest time that any database in a SQL Server has gone without a successful differential or full backup. Use the "Longest Time" metric values to determine if Service Level Objectives for backup frequency are being met, and use the historical values of these metrics to identify whether recent delays are a one-time problem or a recurring problem (for example, nightly backups aren't happening every Tuesday). If a metric value is higher than expected:
To limit the databases that are included in the metric results, you can exclude SQL Server databases from backup metrics. |
Longest Time for a DB without a Successful Full Backup |
The longest time that any database in a SQL Server has gone without a successful full backup. See Longest Time for a DB without a Successful DB Backup (Diff or Full) for recommendations. |
Longest Time for a DB without a Successful Transaction Log Backup | The longest time that any database in a SQL Server has gone without a successful transaction log backup. See Longest Time for a DB without a Successful DB Backup (Diff or Full) for recommendations. |
Size of Transaction Logs Not Yet Archived | The size of all transaction logs in MB that have not yet been archived to free up space for logging future transactions. |
Sum of All Backup Assets Required for Recovery of All DBs | The cumulative size, in GB, of all the backup assets for all databases in the SQL Server instance that are required to recover to the current point in time. For each DB, this is the size of the last full backup plus the last differential backup plus all transaction logs created after the most recent full or differential backup. Use this metric to track changes to the minimum required storage space needed to do a complete recovery of the SQL Server. It is also important to understand how much temporary free space could be required to restore all the backup assets for a complete recovery. |
Connections
Metric | Description |
---|---|
Connected Devices | The number of distinct client machines connected to this instance (even if the connection is idle). |
Connected Users | The number of distinct users (that is, login names) connected to this instance (even if the connection is idle). |
Sessions | The number of sessions connected to this instance (even if the connection is idle). |
CPU
Metric | Description |
---|---|
Core Count | The number of cores used by the instance. |
Instance CPU Utilization | The CPU Utilization for this specific SQL Server instance. |
Signal Waits |
The percentage of total waits that are runnable and waiting for an available CPU. Anything over 20% indicates that there is a possible CPU resource bottleneck. Examine the overall wait events for the server as a whole. A high signal wait percentage could be due to an increased number of sessions, so examine the overall workload for the server as well. Take steps to either reduce the overall runtime for queries or reduce the total number of sessions. |
Disk
Metric | Description |
---|---|
SQL Disk Read Latency | Disk read latency from dm_io_virtual_file_stats DMO. |
SQL Disk Write Latency | Disk write latency from dm_io_virtual_file_stats DMO. |
Total I/O Wait Time |
The sum of all I/O activity for all database files. If this is high:
|
Total Read I/O Wait Time | The sum of all read I/O activity for all database files. |
Total Write I/O Wait Time | The sum of all write I/O activity for all database files. |
Memory
Metric | Description |
---|---|
Buffer Cache Hit Ratio |
The rate at which SQL Server finds the data blocks it needs in memory rather than having to read from disk for this instance. By itself, the buffer cache hit ratio is not very meaningful except for servers with undersized memory settings. Tuning queries and performing index optimization is the best way to increase buffer cache hit ratios. To see the current metrics for the buffer cache, run the following query:
|
Buffer Cache Size | The current size of the SQL Server Buffer Cache. |
Log Bytes Flushed | The number of bytes of information being flushed per second. |
Log Flushes | The number of log flushes that occur per second. |
Page Life Expectancy |
The number of seconds a page will stay in the buffer pool without references. A lower value (for example, under 300) indicates the buffer pool is under memory pressure and you should add more memory to the system (enable AWE on 32-bit systems) or find the process in Task Manager that is consuming outside of SQL Server. The default threshold for Page Life Expectancy (PLE) is 300. For modern database systems, DBAs recommend using a formula such as the following to calculate an appropriate PLE threshold:
|
Plan Cache Size | The current size of the SQL Server Plan Cache. |
Procedure Cache Hit Ratio |
The percentage of time when SQL Server looks for an execution plan in the procedure cache and finds it for this instance. If this is low, try to write more reusable code or consider increasing the size of the procedure cache. To see current metrics for the procedure cache, run the following query:
|
SQL Compilations | The number of compilations performed by SQL Server per second. Compilations are a natural part of SQL Server operations but do utilize CPU and other resources. Compare this to the Batch Requests/sec metric to understand if this metric is too high. Minimizing compilations will help overall performance. For more information, see the following Microsoft Knowledgebase article: http://support.microsoft.com/kb/243588. |
SQL Re-Compilations | The number of re-compilations performed by SQL Server per second. Re-compilations occur for many reasons but this number should typically be low. |
Network
Metric | Description |
---|---|
Round-trip Time | The round-trip time when running "select 1" against this instance (includes network time but not connect time). If this is high, contact your network administrator to understand network latency. |
Sessions
Metric | Description |
---|---|
Active Sessions | The number of sessions in this instance actively performing work or waiting for a resource (excludes idle sessions). |
Batch Requests | The number of batches being executed by SQL Server every second. |
Blocked Sessions | The number of sessions that are blocked in this instance because another session is using a needed resource. |
Transaction Rate | The number of transactions being executed every second in this instance (the Transactions/sec statistic from sysperfinfo for the instance). |
TempDB
Space required by the TempDB database fluctuates based on the number of queries running and the nature of those queries. If TempDB fills up and cannot autogrow, the performance of all queries is affected. Use TempDB metrics to monitor the amount of space required and determine what types of objects require the most space.
Metric | Description |
---|---|
TempDB Free Space |
The amount of free space in TempDB. Spaced used in TempDB fluctuates based on the nature and volume of the SQL statements that are currently running. If TempDB fills up and there is not enough disk space for it to autogrow (or it is not set to autogrow), the performance of all SQL statements will be affected as they wait for access to TempDB. |
TempDB Internal Objects |
The amount of space in TempDB used by internal objects. Internal objects are created by SQL Server to process queries. For example, internal objects can be used for spooling operations, for sort space, or for hash tables. Queries that process large amounts of data can increase the space required for internal objects in TempDB. |
TempDB Log File % Free Space |
The percentage of space allocated to the TempDB log file that is not currently being used. |
TempDB Log File Free Space |
The amount of space allocated to the TempDB log file that is currently free. |
TempDB Log File Utilized Space | The amount of space allocated to the TempDB log file that is currently being used. |
TempDB Mixed Extents | The amount of space in TempDB used by mixed extents. Mixed extents are shared by up to eight objects. |
TempDB User Objects |
The amount of space in TempDB used by user objects. User objects are temporary objects explicitly created by users. They include temporary tables and indexes, temporary stored procedures, table variables, and cursors. |
TempDB Version Store |
The amount of space in TempDB used by the version store. While a table row is being updated or deleted, the version store contains the committed version of that row. Long-running or orphaned transactions can increase the size of the version store. A large version store can affect database performance because of the overhead of reading the large version store. |
Total TempDB Log File Size |
The amount of disk space allocated for the log file in the TempDB database over time. Each time SQL Server is restarted, TempDB is re-created, and the log file is created using the default size or the size specified by the DBA. If the TempDB log file requires more space, by default it autogrows as needed. However, autogrowth can affect performance because the TempDB log file cannot be used during autogrowth, and because autogrowth can lead to file fragmentation. Use the Total TempDB Log File Size metric to:
|
Waits
Metric | Description |
---|---|
Total Instance Wait Time | The total wait time for the instance. |