AppInsight for SQL
Assign this SAM application monitor template to nodes to monitor the health of SQL Server instances, as described in Monitor with AppInsight for SQL.
To configure target servers, see AppInsight for SQL Directory requirements and permissions.
Note the following details about AppInsight templates, in general:
- Due to the complexity of AppInsight templates:
  - You cannot add component monitors.
  - Some component monitors have default settings that cannot be modified.
  - You cannot import or export AppInsight templates.
  - AppInsight templates are updated automatically during upgrades.
- WinRM is the default transport method for WMI-based component monitors.
- For component-based SAM licenses, AppInsight applications consume licenses at flat rates.
- All AppInsight templates support the SolarWinds Platform agent for Windows.
- To learn about AppInsight widgets, see SAM online help.
You can disable dynamic AppInsight for SQL components if known issues in your environment (for example, a SQL server outage) will increase false-positive alerts and logs. You can also turn off components that aren't important for your organization, to reduce polling loads. See Edit AppInsight for SQL template for details.
Component monitors
All Databases
Collects data for all monitored databases on the SQL instance.
Top Tables for Database
Performance counter that collects data to provide metrics for the top table resources in each database's details view.
Top Indexes for Database
Performance counter that collects data for the clustered and non-clustered index resources in each database's details view.
Database Files
Performance counter that collects data to provide metrics for the file size resource in each database's details view.
Top Active Sessions with Statistics
Collects data for the top active sessions, ranked by how long they have been active. Each session includes statistics such as duration, idle time, and bytes transferred.
Top Expensive Queries
Collects data for the most expensive queries by CPU usage.
SQL Agent Job Info
Collects and provides statistics and detail data for administrative jobs run by the SQL Server Agent service.
SQL Server Events
Collects and provides data on SQL Server events.
SQL Active Directory Helper Service
Collects and provides data, status, and metrics for the SQL Active Directory Helper service.
SQL Server
Collects and provides status for SQL Server statistics and counter values. Each statistic is consumed and displayed by other process monitors.
SQL Server Agent
Collects and provides data, status, and metrics for the SQL Server Agent process.
SQL Server Browser
Collects and provides data, status, and metrics for the SQL Server Browser process.
SQL Server VSS Writer
Collects and provides data, status, and metrics for the SQL VSS Writer process.
SQL Event Log Monitor
Collects and provides event log data for a database instance. These events are consumed by other process monitors.
Buffer Cache Hit Ratio
Indicates the percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses.
After a long period of time, the ratio moves very little. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high. Generally, you can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server. The higher this ratio, the less often SQL Server has to go to the hard disk to fetch data, and performance is boosted overall. This counter is not a real-time measurement; it averages the Buffer Cache Hit Ratio over the entire period since the SQL Server instance was last restarted. In Online Analytical Processing (OLAP) applications, the ratio could be much lower because of the nature of how OLAP works.
Possible issues: If the returned value is low, this could indicate that your SQL server does not have enough memory to function at peak performance.
Resolution: Check your SQL server and verify its memory is being used efficiently. Applications other than SQL may be using a great deal of memory. To recover memory, close unnecessary applications. Installing additional memory may also help.
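You can also read this ratio directly from the instance. The following is a minimal sketch that uses the sys.dm_os_performance_counters DMV; note that the raw counter must be divided by its base counter to produce a percentage:

```sql
-- Compute the buffer cache hit ratio from the raw counter and its base counter.
SELECT CAST(100.0 * a.cntr_value / b.cntr_value AS DECIMAL(5, 2)) AS buffer_cache_hit_ratio
FROM sys.dm_os_performance_counters AS a
JOIN sys.dm_os_performance_counters AS b
    ON a.object_name = b.object_name
WHERE a.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base'
  AND a.object_name LIKE '%Buffer Manager%';
```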
Cache Used/Min
The number of times the cache object was retrieved.
This is not incremented when parameterized queries find a plan in the cache. However, this can be incremented multiple times when using Showplan. Good plan reuse is one sign of a healthy system. Compiling a query plan can be a CPU-intensive operation, especially with complex queries, so reuse is encouraged.
Possible issues: If the returned value is low, this can indicate ineffective reusing of plan cache which could cause a CPU spike. Conversely, if the returned value is high for a large number of plans, this could indicate that your cache space is being taken up with plans that were compiled and used once to execute an ad hoc query, then never used again.
Resolution: A database setting that helps plan reuse is forced parameterization. (You can set this using the following command: ALTER DATABASE <databaseName> SET PARAMETERIZATION FORCED.) Normally, the query optimizer is very conservative when deciding which queries can be parameterized, but this setting makes the optimizer more liberal in choosing parameters.
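A minimal sketch of that command, assuming a hypothetical database named SalesDB:

```sql
-- SalesDB is a hypothetical name; substitute your own database.
ALTER DATABASE SalesDB SET PARAMETERIZATION FORCED;

-- Verify the setting through the sys.databases catalog view.
SELECT name, is_parameterization_forced
FROM sys.databases
WHERE name = 'SalesDB';
```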
Page Splits/Batch Request
This performance counter displays the number of page splits per second that occur as the result of overflowing index pages. The returned value should be as low as possible.
Possible issues: High values could mean poor table or index design.
Resolution: If the number of page splits is high, consider increasing the fill factor of your indexes. An increased fill factor helps reduce page splits because there is more room in data pages before they fill up and a page split has to occur. Note that this counter also includes new page allocations, which do not necessarily pose a problem. You can also confirm page splits that move data or index rows by checking for fragmented indexes, as in the sketch below.
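A minimal sketch of such a fragmentation check, using the sys.dm_db_index_physical_stats DMV against the current database:

```sql
-- List indexes in the current database with more than 30% fragmentation.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON ips.object_id = i.object_id
   AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 30;
```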
Lazy Writers/sec
The lazy writer is a system process that flushes dirty buffers (buffers containing changes that must be written back to disk before the buffer can be reused for a different page) and makes them available to user processes.
This performance counter tracks how many times per second that the Lazy Writer process is moving dirty pages from the buffer to disk to free up buffer space. The Lazy Writer eliminates the need to perform frequent checkpoints to create available buffers. This should not be a high value, for example more than 20 per second. Ideally, it should be close to zero. If it is zero, this indicates that your SQL Server's buffer cache is large and your SQL Server does not need to free up dirty pages.
Possible issues: If the returned value is high, this can indicate that your SQL Server's buffer cache is small and that your SQL Server needs to free up dirty pages.
Resolution: Check your SQL server and verify its memory is being used efficiently. Applications other than SQL may be using a great deal of memory. Try to recover memory by closing unnecessary applications. Installing additional memory may also help.
Free list stalls/sec
Indicates the number of requests per second spent waiting for a free page. This displays the frequency with which requests for available database pages are suspended because no buffers are available.
Possible issues: If the returned value is high, this indicates that not enough memory is available for the SQL Server.
Resolution: Check your SQL server and verify its memory is being used efficiently. Applications other than SQL may be using a great deal of memory. Try to recover memory by closing unnecessary applications. Installing additional memory may also help.
User Connections
This performance counter identifies the number of different users that are connected to your SQL Server at the time the sample was taken.
Watch this counter over time to understand your baseline user connection numbers. Since each user connection consumes some memory space, a high number of user connections can impact throughput and cause a performance slow-down. After you can determine high and low thresholds during normal usage of your system, look for times when this counter exceeds these high and low marks.
Possible issues: If the returned value of this counter goes down and the load on the system remains stable, you might have a bottleneck that blocks your server from handling the normal load. Keep in mind that this counter value might go down simply because fewer people are using your SQL Server instance. If you see this number jump by 500% from your baseline, you may be seeing a slowdown of your server activity.
Resolution: You may want to boost the SQL Server configuration setting, Maximum Worker Threads to a value that exceeds the default setting, 255. The setting for Maximum Worker Threads should be higher than the maximum number of user connections your server might have.
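To check the current figure manually, a minimal sketch using the sys.dm_exec_sessions DMV:

```sql
-- Count current user connections; sample this over time to establish a baseline.
SELECT COUNT(*) AS user_connections
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;
```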
Active Sessions
A current connection that is actively processing. To find information about an active session, use sp_who2 'active' or sys.dm_exec_sessions (available in SQL Server 2005 and later).
Possible issues: Blocking session.
Resolution: Wait until the blocking clears, or kill the session (see the sketch below). If killed, the process rolls back its changes; if there are many changes, the rollback could take a while.
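A minimal sketch for locating blocked and blocking sessions before deciding whether to kill one (the session ID in the KILL example is hypothetical):

```sql
-- Find requests that are blocked and the session blocking them.
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;

-- KILL 53;  -- 53 is a hypothetical session_id; killing rolls back its open transaction
```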
Inactive Sessions
A current connection that is not currently processing. To find information about an inactive session, use sp_who2.
Possible issues: Blocking session.
Resolution: Kill the session if it is blocking a process that needs to finish. If killed, the process rolls back its changes; if there are many changes, the rollback could take a while.
System Sessions
A system-initiated connection. Normally, these numbers are less than 50.
Possible issues: Blocking session.
Resolution: Do not kill system sessions.
User Sessions
A user-initiated connection. Normally these numbers are greater than 50.
Possible issues: Blocking session.
Resolution: Kill the session if it is blocking a process that needs to finish. If killed, the process rolls back its changes; if there are many changes, the rollback could take a while.
Logins/sec
This performance counter returns the total number of logins started, per second, and does not include pooled connections. Opening and closing connections is an expensive process. A pooled connection is one which is kept open by an application for other requests to re-use.
Possible issues: If the returned value is high, this can indicate that the application is not correctly using connection pooling.
Resolution: Review the connection pooling configuration.
Logout/sec
This performance counter returns the total number of logout operations started, per second. Opening and closing connections is an expensive process. When applications do not use a connection pool, each request needs to establish its own connection before the query can be executed. It then has to close it. A pooled connection is one which is kept open by an application for other requests to re-use.
Possible issues: If the returned value is high, this can indicate that the application is not correctly using connection pooling.
Resolution: Review the connection pooling configuration.
Lock Requests/sec
This performance counter returns the number of requests for a type of lock, per second. Blocking is an unavoidable characteristic of any relational database management system (RDBMS) with lock-based concurrency.
On SQL Server, blocking occurs when one Server Process ID (SPID) holds a lock on a specific resource and a second SPID attempts to acquire a conflicting lock type on the same resource. Typically, the time frame for which the first SPID locks the resource is very small. When it releases the lock, the second connection is free to acquire its own lock on the resource and continue processing. This normal behavior may happen many times throughout the course of a day with no noticeable effect on system performance. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: A high returned value may indicate that queries are accessing large numbers of rows. If you notice a high Average Wait time, it may indicate a SQL Server blocking issue.
Resolution: Review high-read queries and kill the connection that is causing the lock. Fixing locking, blocking, and deadlocking issues is often a matter of redesign: examine the schema of your database and/or the way the application is designed. An efficient way to deal with locking is to ensure that the design of your system does not have to deal with excessive locks. The best way to do this is to normalize your tables, using more atomic structures that allow the query to get in and get out faster. Another method is to code with locks in mind. If the design requires less normalization, evaluate the programs to ensure that they use the right isolation levels.
Lock Waits/sec
This performance counter reports the number of times users waited to acquire a lock over the past second. This counter is incremented only when you "wake up" after waiting on the lock. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: Non-zero values indicate that there is at least some level of SQL Server blocking occurring. If you combine this with the Lock Wait Time counter, you can calculate how long the blocking lasted. A zero value for this counter can definitively rule out blocking as a potential cause.
Resolution: High Read queries should be reviewed.
Lock Timeouts/sec
This performance counter returns the number of lock requests per second that have timed out, including internal requests for NoWait locks.
The timeout_period is the number of milliseconds that will pass before Microsoft SQL Server returns a locking error. A value of -1 (default) indicates no time-out period (that is, wait forever). When a wait for a lock exceeds the time-out value, an error is returned. A value of 0 means to not wait at all and return a message as soon as a lock is encountered. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: If you see a value above 0 for this counter, your users will experience problems as their queries are not completing.
Resolution: You should review your queries to determine which queries are causing this situation.
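The timeout behavior described above can be adjusted per session with SET LOCK_TIMEOUT; a minimal sketch:

```sql
-- Wait at most 5 seconds for a lock; -1 (the default) waits forever, 0 fails immediately.
SET LOCK_TIMEOUT 5000;

-- Check the current setting for this session.
SELECT @@LOCK_TIMEOUT AS lock_timeout_ms;
```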
Average Latch Wait Time
This performance counter reports the average latch wait time, in milliseconds, for latch requests that had to wait.
SQL Server has two lightweight mutual exclusion mechanisms, latches and spinlocks, which are less robust than locks but less resource-intensive. The value of this counter should generally correlate to Latch Waits/sec and move up or down with it accordingly.
Possible issues: If you see this number jump above 300, you may have contention for your server's resources. High values for this counter could potentially block other users.
Resolution: Examine the tasks currently waiting by querying the sys.dm_os_waiting_tasks DMV, as in the sketch below.
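A minimal sketch of that query:

```sql
-- Inspect currently waiting tasks, their wait types, and any blocking session.
SELECT session_id, wait_type, wait_duration_ms, blocking_session_id
FROM sys.dm_os_waiting_tasks
WHERE session_id IS NOT NULL
ORDER BY wait_duration_ms DESC;
```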
Average Lock Wait Time
This performance counter reports the average amount of Wait time, in milliseconds, for each lock request that resulted in a wait. The lower the value, the better. This value should correlate to the Lock Waits/sec counter and move up or down with it accordingly.
Possible issues: An average wait time longer than 500ms may indicate excessive SQL Server blocking.
Resolution: You should determine which queries are generating locks to identify where the blocking is originating.
Deadlocks/sec
The number of lock requests that resulted in a deadlock. Since only a COMMIT, ROLLBACK, or deadlock can terminate a transaction (excluding failures or errors), this is an important value to track.
Possible issues: When the returned value is above zero, users and applications will experience problems. Their queries will abort and the applications may fail.
Resolution: Trace deadlocks and examine the trace output in the SQL Server log (see the sketch after this list). Deadlocks can be prevented by one or more of the following methods:
- Adding missing indexes to support faster queries
- Dropping unnecessary indexes which may slow down INSERTs for example
- Redesigning indexes to be "thinner", for example, removing columns from composite indexes or making table columns "thinner"
- Adding index hints to queries
- Redesigning tables with "thinner" columns like smalldatetime vs. datetime or smallint vs. int
- Modifying the stored procedures to access tables in a similar pattern
- Keeping transactions as short and quick as possible: "mean & lean"
- Removing unnecessary extra activity from the transactions like triggers
- Removing JOINs to Linked Server (remote) tables
- Implementing regular index maintenance; usually a weekend schedule suffices; use FILLFACTOR = 80 for dynamic tables
- Setting MAXDOP=1 solves deadlocking in some cases
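A minimal sketch of the tracing step mentioned above; trace flag 1222 writes deadlock details to the SQL Server error log:

```sql
-- Enable deadlock reporting in the error log; -1 applies the flag globally.
DBCC TRACEON (1222, -1);
```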
Lock Wait Time
This performance counter displays the total time, in milliseconds, spent waiting across all transactions to acquire a lock in the last second. In other words, it counts how many milliseconds SQL Server spent waiting on locks during the last second. This counter starts recording at the end of a locking event, so peaks most likely represent one large locking event.
Possible issues: If the returned value is greater than 60 seconds (60,000ms) then there may be extended blocking which could be an issue.
Resolution: Thoroughly analyze the blocking script output. Some applications are written for timing out after 60 seconds. Because SQL Server records a lock at the end of a locking event, remember that an application with large transactions may have inflated lock wait times while still performing as expected. For example, an application that issues multi-million record updates might have very long lock wait times while performing exactly as it was designed.
Total Server Memory
This performance counter measures the current amount of memory that SQL Server is using.
If the value of this counter continues to grow larger, the server has not yet reached its steady state and is still trying to populate the cache and get pages loaded into memory. Performance will likely be somewhat slower if this value continually grows larger since more disk I/O is required. This behavior is normal. Eventually Total Server Memory should approximate Target Server Memory.
Possible issues: If the Total Server Memory counter is greater than or equal to the Target Server Memory counter, this can indicate that your SQL Server may be under memory pressure.
Resolution: Installing additional memory into your SQL server should resolve the problem.
Target Server Memory
This performance counter measures the total amount of dynamic memory the server can consume. This performance counter tells you how much memory SQL Server would like to use to operate efficiently. Compare with Total Server Memory.
Possible issues: If the Total Server Memory counter is greater than or equal to the Target Server Memory counter, this could indicate that your SQL Server may be under memory pressure.
Resolution: Installing additional memory into your SQL server should resolve the problem.
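To compare the two counters directly, a minimal sketch using sys.dm_os_performance_counters:

```sql
-- Total vs. Target Server Memory, converted from KB to MB.
SELECT counter_name, cntr_value / 1024 AS memory_mb
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');
```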
SQL Cache Memory
This performance counter measures the total amount of dynamic memory the server is using for the Dynamic SQL cache.
Most memory used by SQL Server is allocated to the Buffer Pool, which is used to store data pages. SQL Server steals a proportion of this memory for use in caching query plans. The overall amount of memory available to SQL Server depends upon the amount of memory installed on the server, the architecture of the server, the version and edition of SQL Server and the amount of memory pressure being experienced by SQL Server. This pressure can be internal (SQL Server resources need memory) or external (operating system needs memory). SQL Server is designed to respond to memory pressure when necessary.
Possible issues: Memory contention with the buffer pool.
Resolution: Increase memory available to SQL server.
Lock Memory
This performance counter returns the total amount of dynamic memory the server is using for locks.
The Lock Pages in Memory privilege prevents the Windows operating system from paging the SQL Server buffer pool out to disk, which older versions of Windows and SQL Server would otherwise allow.
Possible issues: Lock pages in memory determines which accounts can use a process to keep data in physical memory, which prevents the system from paging the data to virtual memory on disk. Exercising this privilege could significantly affect system performance by decreasing the amount of available RAM.
Resolution: Upgrade to Windows Server 2008 R2 or later and SQL Server 2008 or later.
Optimizer Memory
This performance counter returns the total amount of dynamic memory the server is using for query optimization. There are no thresholds associated with this performance counter.
Connection Memory
This performance counter returns the total amount of dynamic memory the server is using for maintaining connections.
SQL Server sets aside three packet buffers for every connection made from a client. Each buffer is sized according to the default network packet size specified by the sp_configure stored procedure. If the default network packet size is less than 8KB, the memory for these packets comes from SQL Server's buffer pool. If it is 8KB or larger, the memory is allocated from SQL Server's MemToLeave region. It is worth noting that the default network packet size for the .NET Framework Data Provider for SQL Server is 8KB, so the buffers associated with managed code client connections typically come from SQL Server's MemToLeave region. This contrasts with classic ADO applications, where the default packet size is 4KB, and the buffers are allocated from the SQL Server buffer pool.
Memory Grants Pending
This performance counter returns the total number of processes waiting for a workspace memory grant.
Memory resources are required for each user request. If sufficient memory is not available, the user waits until there is adequate memory for the query to run.
Possible issues: Returned values greater than zero for a sustained period of time are a very strong indicator of memory pressure.
Resolution: You should first examine the database design, queries, and indexes to ensure the system is properly tuned before installing additional RAM. There may be query inefficiencies in the instance that are causing excessive memory grant requirements, such as large sorts or hashes that can be resolved by tuning the indexing or queries being executed.
Compare with Memory Grants Outstanding. If the number of pending grants increases (the sketch after this list shows how to list them), try the following:
- Add more memory to SQL Server.
- Add more physical memory to the server.
- Check for memory pressure, and review and correct indexing if you experience "Out of memory" conditions.
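A minimal sketch for listing the queries whose grants are still pending, using the sys.dm_exec_query_memory_grants DMV:

```sql
-- granted_memory_kb is NULL while a workspace memory grant is still pending.
SELECT session_id, requested_memory_kb, wait_time_ms
FROM sys.dm_exec_query_memory_grants
WHERE granted_memory_kb IS NULL;
```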
Workfiles Created/sec
This is the number of work files created per second. For example, work files could be used to store temporary results for hash joins and hash aggregates. The returned value should be less than 20. Tempdb work files are used in processing hash operations when the amount of data being processed is too large to fit into the available memory.
Possible issues: High values can indicate thrash in the tempdb file as well as poorly coded queries.
Resolution: You can reduce the value this monitor returns by making queries more efficient, for example by adding or changing indexes. Adding additional memory will also help.
Worktables Created/sec
This performance counter displays the number of work tables created per second. For example, work tables could be used to store temporary results for query spool, lob variables, XML variables, and cursors. The returned value should be less than 20. Worktables are used for queries that use various spools (table spool, index spool, and so on).
Possible issues: High values could cause general slowdown.
Resolution: Remediation requires rewriting your procedures.
Cache Hit Ratio
This metric is the ratio between Cache Hits and Lookups. Cache Hit Ratio measures how much the plan cache is being used.
A high percentage here means that your SQL Server is not building a new plan for every query it is executing and is working effectively and efficiently. A low percentage here means that, for some reason, the SQL Server is doing more work than it needs to. This metric needs to be considered alongside the plan cache reuse metric which looks at the spread of plan reuse through your cache.
Plan cache is memory used to store objects such as stored procedures, ad hoc and prepared Transact-SQL statements, and triggers. The plan cache is the component of SQL memory that stores query plans for re-use. When a query is issued against SQL, the optimizer will attempt to re-use a cached plan if the traits of the query permit - but can only do so if that plan resides in cache, otherwise it needs to compile a new plan. This is not to say that every plan that is cached is re-used. Changes in schema, a query running parallel that may have run serially before, or changes in parameters may require a new plan to be compiled even if a similar plan exists in cache. Plan compilations are expensive though. Ideally this counter should be near 100%.
Possible issues: The value of this counter should never fall below 90%. Low cache hit ratio (<20%) along with a sustained query execution rate (Batch Requests/sec) indicates that compiled plans are not being re-used. It should be noted that the hit ratio counter may be skewed by internal lookups performed.
Resolution: Examine the workload to see whether queries can be parameterized or rewritten as stored procedures, which reduces the amount of single-use plan caching.
Page Reads/sec
This performance counter returns the number of physical database page reads issued. 80 – 90 physical database page reads per second is normal.
Possible issues: Returned values that are high could indicate indexing or memory constraint.
Resolution: Attempt to tune the application so that fewer I/O operations are required. For example, perhaps I/O operations would be reduced if there were the appropriate indexes, or if the database design were de-normalized. If the applications cannot be tuned, you will need to acquire disk devices with more capacity. Because physical I/O operations are expensive, you may be able to minimize the cost either by using a larger data cache, intelligent indexes, more efficient queries, or by changing the database design.
Page Writes/sec
This performance counter returns the number of physical database page writes issued. 80 – 90 physical database page writes per second is normal.
Possible issues: If the returned values are high, you should check the Lazy Writer/sec monitor. This can indicate that your SQL Server's buffer cache is small and that your SQL Server needs to free up dirty pages.
Resolution: Check your SQL server and verify its memory is being used efficiently. Applications other than SQL may be using a great deal of memory. Try to recover memory by closing unnecessary applications. Installing additional memory may also help.
Page Splits/sec
This performance counter returns the number of page splits per second. The returned value for this monitor should be as low as possible.
Possible issues: Returned values that are high can indicate the overflowing of index pages. A high value for this counter is not bad in situations where many new pages are being created, since it includes new page allocations.
Resolution: To avoid page splits, review the table and index design to reduce non-sequential inserts. You can also implement FILLFACTOR and PAD_INDEX to leave more empty space per page, as in the sketch below.
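A minimal sketch of such a rebuild, with hypothetical index and table names:

```sql
-- IX_Orders_OrderDate and dbo.Orders are hypothetical; leave 20% free space per page.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);
```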
Page Life Expectancy
This performance counter returns the number of seconds a page will stay in the buffer pool without references. This performance monitor reports, on average, how long data pages are staying in the buffer.
Possible issues: If this value gets below 300 seconds, this is a potential indication that your SQL Server could use more memory to boost performance. Consistently having pages that stay in memory for less than that amount of time may indicate the need for more memory.
Resolution: Add additional memory to your SQL server.
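A minimal sketch for reading the current value from the instance:

```sql
-- Page life expectancy, in seconds, from the Buffer Manager object.
SELECT cntr_value AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND object_name LIKE '%Buffer Manager%';
```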
Free pages
This performance counter displays the total number of pages on all free lists. This is not a main indicator of memory problems; use it only as a signal of possible memory pressure. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: If this counter is critical and other memory counters are good, it is possible that there are no problems with memory.
Resolution: Check other memory counters. If they have critical values, you may try to install additional memory into SQL server.
Free Memory
The amount of memory available for SQL Server to use.
When SQL Server is using memory dynamically, it queries the system periodically to determine the amount of free memory. Maintaining this free memory prevents the operating system (OS) from paging. If less memory is free, SQL Server releases memory to the OS. If more memory is free, SQL Server may allocate more memory. SQL Server adds memory only when its workload requires more memory; a server at rest does not increase the size of its virtual address space. The returned value should be as high as possible.
Possible issues: If Resource Semaphore does not find enough free memory, then it puts the current query into the waiting queue.
Resolution:
- Increase RAM
- Increase SQL Server's Max Server Memory (see the sketch after this list)
- Consider OS requirements and other instances
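A minimal sketch of raising Max Server Memory with sp_configure (the 8192 MB figure is hypothetical; size it to your host and other workloads):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- 8192 MB is a placeholder value.
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;
```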
Total Pages
This performance counter returns the number of pages in the buffer pool. The returned value includes database, free, and stolen pages.
This counter is not available in SQL 2012.
SQL Compilation/sec
This performance counter returns the number of times per second that SQL Server compilations have occurred. This value should be as low as possible.
Possible issues: If you see a high value, say above 100, this can be an indication that a great deal of ad hoc queries are running, which may cause increased CPU usage.
Resolution: Re-write the running ad hoc queries as stored procedures, or use sp_executesql, as in the sketch below.
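A minimal sketch of sp_executesql with a hypothetical table and parameter:

```sql
-- dbo.Orders and @cust are hypothetical; the parameterized plan is cached and reused.
EXEC sp_executesql
    N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @cust',
    N'@cust INT',
    @cust = 42;
```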
Full Scans/sec
This performance counter returns the number of Full Scans on base tables or indexes. This is the number of unrestricted full scans per second. These can be either base-table or full-index scans. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: Values greater than 1 or 2 indicate table/Index page scans are occurring. If the CPU is running high, you should investigate the cause as related to this counter measurement. You can rule this out as a problem if the full scans are on small tables.
Resolution: Following are the main causes of high Full Scans/sec values:
- Missing indexes
- Too many rows requested; Queries with missing indexes or too many rows requested will have a large number of logical reads and an increased CPU time.
- Scans are I/O-intensive and should be minimized within your databases. Identify the tables that have a large number of scans against them. Review the fill factor you have set up on your indexes and minimize it where appropriate.
Batch Requests/sec
This performance counter returns the number of Batch Requests that SQL Server receives per second. The values this monitor returns generally follow how busy your server's CPUs are. From a network bottleneck perspective, a typical 100 Mbps network card can only handle about 3,000 batch requests per second.
Possible issues: Generally speaking, over 1,000 batch requests per second indicates a very busy SQL Server. If this is the case, you may soon experience a CPU bottleneck, if you are not already. Of course, this is a relative number, and the more powerful your hardware, the more batch requests per second your SQL Server can handle.
Sometimes low Batch Requests/Sec can be misleading. If there were a SQL Statements/sec counter, this would be a more accurate measure of the amount of SQL Server activity. For example, an application may call only three stored procedures, yet each stored procedure does a great deal of work. In this case, we will see a low number for Batch Requests/sec, but each stored procedure (one batch) will execute many SQL statements that drive up CPU and other resources. As a result, many counter thresholds based on the number of Batch Requests/sec will seem to identify issues because the batch requests on such a server are unusually low for the level of activity on the server.
Resolution: Check your SQL server and verify system resources are being used efficiently. Applications other than SQL may be using unnecessary system resources. Try to recover memory by closing unnecessary applications. Installing additional memory and upgrading your hardware should solve this problem.
Data File(s) Size
This performance counter shows the cumulative size (in kilobytes) of all the data files in the database including any automatic growth. Monitoring this counter is useful for determining the correct size of tempdb. The value returned from this monitor reports the size of your database (not including the Transaction log). Information about distinct primary data files and secondary data files is shown under the Files section of the Database Overview page in SQL Server Management Studio.
Having a file size increase on demand is expensive and will degrade performance significantly. Autogrowth should only be used to let a database increase in size if you are running out of disk space. Autogrowth should not be used to manage your MDF size increases. Data files should be sized properly when they are initially created to meet the expected growth. This also helps avoid fragmentation, which leads to better database performance.
Possible issues: Be wary of large and unusual increases to the size of your data files. Generally speaking, your database should be sized to minimize Autogrowth (http://msdn.microsoft.com/en-us/library/ee308850.aspx). An increase in size is expensive concerning I/O. Additionally, this will fragment your data and log files. If the returned value occasionally drops to zero, this can indicate a problem with sys.master_files, the catalog view the SQL monitor uses to collect data.
Resolution:
- Use Full Recovery Mode for your databases and regularly backup everything, including the transaction log.
- Manually add space as data increases in size. It is recommended that you have approximately one year of space available for data growth. If this is not possible, move the database to a higher capacity drive or simply upgrade the drive.
Compressing files causes fragmentation and is therefore not recommended.
Log File(s) Size
This performance counter shows the cumulative size (in kilobytes) of all the transaction log files in the database. Every transaction in the database is logged to the Transaction log. This is useful during recovery in the event of data loss. Information about the log file is shown under the Files section of the Database Overview page in SQL Server Management Studio.
The size of this file is based on the logging level set for the database. By default, a new database uses Full Recovery Mode or the recovery mode that the model database is set up as. Transactions in the log file are only removed when a complete backup or transaction log backup is initiated. This enables pinpoint recovery for critical applications. Be aware that if the transaction log is not backed up on a regular basis, the log will grow until your disk is completely full. The transaction log should be adjusted to a reasonable size. This will depend on the number of transactions you expect, and how often you perform backups.
Set the correct Autogrowth properties: The default of 10% Autogrowth for data and log files should be enough for low use databases. Conversely, a 500 MB Autogrowth rate may be better suited for a busy database, preventing a large I/O impact caused by normal Autogrowth operations.
Possible issues: If the returned value occasionally drops to zero, this can indicate a problem with sys.master_files, the catalog view the SQL monitor uses to collect data.
With the Simple Recovery Model, fast writing to the transaction log triggers Autogrowth.
Resolution:
- If performing an insert operation, consider switching the recovery model to Bulk Logged for the insert.
- If you do not need pinpoint recovery, it is recommended you switch to Simple Recovery model.
- If your database is configured with the Full or Bulk Logged recovery model, back up the log on a regular basis so it can be truncated to free up log space (see the sketch after this list).
This action removes inactive virtual log files, but does not reduce the file size.
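A minimal sketch of that log backup, with a hypothetical database name and path:

```sql
-- SalesDB and the target path are hypothetical.
BACKUP LOG SalesDB
TO DISK = N'D:\Backups\SalesDB_log.trn';
```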
Total Size
This performance counter returns the total size of the database including white space, in MB.
Possible issues: Running out of storage space.
Resolution: Shrink the database if free space is running low.
Total Size: Percent Change 24 Hrs
This value is similar to the Total Size performance counter, but calculated as a percentage of change over the last 24 hours. You can use this metric to create an alert that includes configured thresholds, which is triggered if a SQL database experiences sudden growth.
Possible issues: A database that quickly changes in size by a high percentage may indicate a problem that should be investigated.
Active Transactions
This performance counter returns the number of active transactions for the database. During an active transaction, no other transaction can modify the data held by the active transaction. While you are modifying data, an exclusive lock is held on that data. Conversely, you cannot read data held by another transaction because another transaction holds an exclusive lock on the data that prevents it from being read. The returned value for this monitor should be less than 200.
Possible issues: A high number of active transactions can indicate that certain transactions are never completing.
Resolution: Keeping a low level of active transactions assists in keeping requested data accessible.
Transactions/sec
This performance counter returns the number of transactions started for the database, per second. Transaction rate is affected by general system performance and resource constraints, such as I/O, number of users, cache size, and the complexity of requests.
Possible issues: A high rate of transactions can indicate that some transactions are not completing.
Resolution: Using stored procedures in transactions speeds transaction processing because SQL code in stored procedures runs locally on the server. The transactions can finish much faster without the network traffic required by the transactions.
Log Cache Hit Ratio
This performance counter returns the percentage of Log cache reads satisfied from the Log cache. The log cache is used to hold information that will be written to the transaction log. There are no thresholds associated with this performance counter.
Log Cache Reads/sec
This performance counter returns the number of reads performed per second through the Log Manager cache.
Log Bytes Flushed/sec
This performance counter shows the average log bytes flushed per second for the current database since the last data point. The value returned helps identify trends of the transaction log. A log flush happens when data is written from the log cache to the transaction log on disk, each time a transaction happens.
This is not available in SQL 2012.
Possible issues: This counter reflects the rate at which log records are written to the disk, which is the log generation rate of the application; it plays a very important role in determining database mirroring performance. (This counter is in the Databases performance object.) High values of Log Bytes Flushed/sec can indicate many small transactions leading to high mechanical disk activity.
Resolution: Look at the statements associated with the log flush and determine if the number of log flushes can be reduced. When a single transaction is used, the log records for the transaction can be bundled and a single, larger write can be used to flush the gathered log records. The mechanical intervention is significantly reduced. It is not recommended that you increase your transaction scope. Long-running transactions can lead to excessive and unwanted blocking as well as increased overhead.
Log Flush Wait Time
This performance counter returns the total wait time (in milliseconds) to flush the log to the transaction log file. On an AlwaysOn secondary database, this value indicates the wait time for log records to be hardened to disk.
Log Flush Waits/sec
This performance counter returns the number of commits, per second, waiting for the log flush. This value should be as low as possible.
Possible issues: A high number of Log Flush Waits can indicate that it is taking longer than normal to flush the transaction log cache. This will slow the performance of your SQL Server.
Resolution: Check the value of the Disk avg. Write time monitor. If the returned value is greater than 5ms, then this can indicate that there is an I/O bottleneck. Move your transaction log file to a disk drive separate from your data files. This should increase performance since there is no access competition on the same disk. Consider upgrading to RAID 10 for transaction log storage. Also, adding spindles to your RAID array should increase performance.
Log Flushes/sec
This performance counter returns the number of log flushes per second, averaged since the last collection time. A log flush happens when a transaction is committed and data is written from the log cache to the transaction log file. The log cache is a location in memory where SQL Server stores data to be written to the log file. This is used to roll back a transaction before it is committed. Once complete, the log cache is flushed to the physical log file on the disk. Generally speaking, log flushes per second should parallel the number of transactions per second.
Possible issues: If the returned value is higher than expected, check your use of explicit transactions in your queries.
Resolution: Explicitly define the start and end of your transactions. This should reduce the number of log flushes, and reduce the impact on I/O. Also check the Log Bytes Flushed/Sec monitor.
Percent Log Used
This performance counter returns the percentage of space in the log that is in use. This is the size of the transaction log that actively contains data in relation to the total physical size on disk of the log file.
Possible issues: If the log file is 100% used, it will attempt to increase its size. If there is not sufficient disk space to accommodate the growth, the database will stop functioning.
Resolution: Perform a transaction log backup to truncate the log.
Average Disk sec/Write
This performance counter shows the average time, in seconds, of a write of data to the disk. The returned value indicates the average time of write data from the disk. 4-8ms is ideal. The returned value is considered acceptable up to 20ms. Any higher value needs further investigation. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: Values greater than 15-20ms may indicate disk bottlenecks.
Resolution: Increase the number of hard disks.
Average Disk sec/Read
This performance counter shows the average time, in seconds, of a read of data from the disk. The returned value indicates the average time of read data from the disk. 4-8ms is ideal. The returned value is considered acceptable up to 20ms. Any higher value needs further investigation. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: If a value greater than 15-20ms is reported, this may indicate disk bottlenecks.
Resolution: Increase the number of hard disks.
Physical Disk Time
This performance counter returns the ratio of elapsed time when the disk drive was busy with read or write requests. This counter is deceptive because it makes no accommodation for multiple spindles. Thus, the more spindles (that is, physical hard disks) you have, the higher the percentile values can go. Conversely, if these spindles are shared across LUNs or other services, you may have high numbers on this counter without any correlation to SQL Server activity. The value for this counter should be below 50%. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: If this counter sustains an average above 70%, you may have contention with your drive or RAM.
Resolution: Increase the number of hard drives used by the SQL server.
Page Faults/sec
This performance counter returns the average number of pages faulted per second. This counter gives an idea of how many times the information being requested is not where the application expects it to be; it must be retrieved either from another location in memory or from the pagefile. While a sustained value may indicate trouble, you should be more concerned with hard page faults that represent actual reads or writes to the disk. Disk access is much slower than RAM.
Possible issues: Any measurement higher than zero delays response time and probably indicates that more RAM is needed.
Resolution: Add additional memory to your SQL server.
Pages/sec
This is the rate at which pages are read from or written to disk to resolve hard page faults. This is a primary indicator of the kinds of faults that cause system-wide delays. This should be close to zero on a dedicated SQL Server. You will see spikes during backups and restores, but this is normal. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: High values causes hard page faults, which can cause SQL Server to use the page, as opposed to RAM.
Resolution: You may want to add additional RAM to stop the paging.
Available MBytes
This is the amount of available physical memory on the server. An acceptable output for this may vary widely based on how much physical memory is in the machine. If you have 2GB of RAM installed on the machine, it is common to see SQL Server use 1.7GB of RAM. If no other processes are running on your SQL Server, ensure you have at least 80MB available for Windows at any given time. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: Low values show that SQL server has lack of memory.
Resolution: Install additional memory.
Page Usage
This shows the percentage of the page file that is being utilized. A Page File is simply a file on the hard drive that handles situations where the system wants to move or “page out” sections of memory. There are several situations that cause this, but the one you should be most concerned about is when the system is out of physical memory. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: Values greater than 70 percent indicate paging and a lack of memory. If the system runs out of memory, it can "borrow" some storage from the hard drive to release some memory until it needs that data again. This is bad because hard drives are dramatically slower than solid-state memory access; using the page file slows SQL Server a great deal.
Resolution: Install additional memory.
Database Pages
This metric tells you how many database pages are currently occupying the data cache. The higher the buffer manager Database Pages value, the less room there is for SQL Server to cache more data pages. This means that SQL Server may have to free up data cache pages in order to make room for pages being moved in from disk to the data cache, which can increase disk I/O and hurt performance. There are no specific thresholds for this counter as each server is different. Instead, watch baseline values and look for sudden changes in the baseline value.
Possible issues: If the value for this counter increases above its typical baseline value, this may indicate memory pressure for the SQL Server instance.
Resolution: Investigate buffer management and disk I/O.
Longest Transaction Running Time
The length of time, in seconds, that the longest-running transaction has been active. Transactions that run longer than others use more resources; this metric can be used to track down procedures and calls that are taking longer than expected by identifying the specific transactions.
Possible issues: Long-running transactions can prevent truncation of transaction logs. This can cause the transaction log files to grow until they consume all available physical disk space, shutting down access to the database.
Resolution: Check the functionality of the query and/or redesign the long running transaction.
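A minimal sketch for identifying the longest-running active transactions and their sessions:

```sql
-- Oldest active transactions first, with how long each has been running.
SELECT st.session_id,
       at.transaction_begin_time,
       DATEDIFF(SECOND, at.transaction_begin_time, GETDATE()) AS running_seconds
FROM sys.dm_tran_active_transactions AS at
JOIN sys.dm_tran_session_transactions AS st
    ON at.transaction_id = st.transaction_id
ORDER BY at.transaction_begin_time;
```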
Granted Workspace Memory
This performance counter returns the total amount of memory currently granted to executing processes, such as Hash, Sort, Bulk Copy, and Index creation operations.
This performance counter tells you how much memory has currently been granted to running queries. If there is memory pressure because of workspace memory, this value should be at least 25% of the virtual memory available to SQL Server.
Possible issues: If the memory pressure is severe, the server might return errors such as 701 or 8645.
Resolution: If this is the case, this might be a good reason to consider using SQL Server 64-bit.
Probe Scans/sec
This performance counter returns the number of probe scans per second, which are used to directly find at most one qualified row in an index or base table.
Range Scans/sec
This performance counter returns the number of qualified range scans through indexes per second.
Auto-Param Attempts/sec
This performance counter returns the number of auto-parameterization attempts per second.
The total for this monitor should be the sum of the failed, safe, and unsafe auto-parameterizations. Auto-parameterization occurs when an instance of SQL Server tries to parameterize a Transact-SQL request by replacing some literals with parameters so that reuse of the resulting cached execution plan across multiple similar-looking requests is possible. Note that auto-parameterizations are also known as simple parameterizations in newer versions of SQL Server. This counter does not include forced parameterizations.
Possible issues: Bad T-SQL coding practices can increase recompilation frequency and adversely affect SQL Server's performance. Such situations can be debugged and corrected in many cases.
Failed Auto-Params/sec
This performance counter returns the number of failed auto-parameterization attempts per second.
The value returned by this monitor should be low. Auto-parameterizations are also known as simple parameterizations in later versions of SQL Server. Preferred values should be near zero. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: Bad T-SQL coding practices can increase recompilation frequency and adversely affect SQL Server's performance. Such situations can be debugged and corrected in many cases.
Cache Object Counts
This performance counter returns the number of cache objects in the cache.
The Plan Cache object provides counters to monitor how SQL Server uses memory to store objects such as stored procedures, ad hoc and prepared Transact-SQL statements, and triggers. Multiple instances of the Plan Cache object can be monitored at the same time, with each instance representing a different type of plan to monitor.
Possible issues: High numbers of total cached objects use portions of the physical memory available to a SQL instance on a per database basis. This can result in one database cache impacting the performance of other local databases due to memory contention.
Resolution: Increase the memory available to SQL services, reduce the number of databases on this instance of SQL, or examine the volume of ad hoc queries running against the server.
Cache Objects in Use
This performance counter returns the number of cache objects in use. The Plan Cache object provides counters to monitor how SQL Server uses memory to store objects such as stored procedures, ad hoc and prepared Transact-SQL statements, and triggers. Multiple instances of the Plan Cache object can be monitored at the same time, with each instance representing a different type of plan to monitor.
Possible issues: High numbers of cached objects in use consume the memory available to a SQL server at a higher rate than non-active objects on a per database basis. This can result in one database cache impacting the performance of other local databases due to memory contention.
Resolution: Increase the memory available to SQL server, reduce the number of active objects, consolidate stored procedures, consolidate and convert ad hoc queries to stored procedures where possible, or reduce the number of databases on the server.
Memory Grants Outstanding
This performance counter returns the total number of processes that have successfully acquired a workspace memory grant.
Look at Memory Grants Outstanding and Memory Grants Pending. If you see a long queue of pending grants as compared to outstanding grants, there is likely memory pressure because of query workspace memory. You can confirm this by checking the Granted Workspace Memory (KB) performance counter that tells you how much memory has currently been granted to running queries.
Possible issues: A returned value that is high can indicate peak user activity. If there is memory pressure because of workspace memory, this value should be at least 25% of the virtual memory available to SQL Server. If the memory pressure is severe, the server might even return errors such as 701 or 8645.
Resolution: If severe, and using 32-bit, consider using SQL Server 64-bit. See Memory Grants Pending.
Repl. Trans. Rate
This performance counter returns the number of transactions read out of the transaction log and sent to the distribution database.
SQL Server Overall Replication Performance Analysis provides a graph of several SQL Server overall replication performance counters for the specified server during the specified time period. Performance counter data is averaged by the hour. The graph displays the following replication performance counters:
- SQLServer:Databases / Replication Pending Xacts (SQL Server; Scale=1\100)
- SQLServer:Databases / Replication Transaction Rate (SQL Server; Scale=1\100)
- SQLServer Replication Published DB / Replicated Transactions Per Second
Bulk Copy Rows/sec
This performance counter returns the number of rows bulk copied per second. Measure the number of rows coming into the database. Optimization yields a higher number.
Possible issues: When using BCP to bulk load data over the network, having the correct throughput configured is crucial.
Bulk Copy Throughput/sec
This performance counter returns the amount of data bulk copied (in kilobytes) per second.
Possible issues: Overall throughput will mostly be limited by your I/O subsystem.
Backup/Restore Throughput/sec
This performance counter shows the Read/Write throughput for backup and restore operations of a database per second.
You can measure how the performance of the database backup operation changes when more backup devices are used in parallel, or when faster devices are used. Throughput of a database backup or restore operation allows you to determine the progress and performance of your backup and restore operations. There are no thresholds associated with this performance counter.
Shrink Space
This is essentially the white space in the database that can be reclaimed by shrinking the database. Shrinking data files recovers space by moving pages of data from the end of the file to unoccupied space closer to the front of the file. When enough free space is created at the end of the file, data pages at the end of the file can be deallocated and returned to the file system.
Possible issues: Data that is moved to shrink a file can be scattered to any available location in the file. This causes index fragmentation and can slow the performance of queries that search a range of the index.
Resolution: To eliminate the fragmentation, consider rebuilding the indexes on the file after shrinking.
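A minimal sketch of the shrink-then-rebuild sequence; the database name, logical file name, target size, and table name are placeholders for your environment:

    USE SalesDb;
    -- Reclaim white space; the second argument is the target file size in MB
    DBCC SHRINKFILE (N'SalesDb_Data', 10240);
    -- Rebuild indexes on affected tables to remove the fragmentation the shrink introduced
    ALTER INDEX ALL ON dbo.Orders REBUILD;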
Average Read Latency
This performance counter indicates the response time of the disk – how long it takes for a read request to get acknowledged by the disk. The average read latency should be less than 20ms for database files.
Average Write Latency
Avg. Disk sec/Write is the average time, in seconds, to write data to the disk. This analysis determines if any of the physical disks are responding slowly.
Possible issues: If the response times are greater than 0.015 seconds (15 milliseconds), the disk subsystem is keeping up with demand but does not have much headroom left. If the response times are greater than 0.025 seconds (25 milliseconds), users may be experiencing noticeable slowdowns and performance issues.
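You can compute per-file average latencies from the cumulative I/O statistics in sys.dm_io_virtual_file_stats; the values accumulate since the last SQL Server restart, so sample over an interval to see current behavior. A minimal sketch:

    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY avg_write_latency_ms DESC;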
Average Bytes/Read
Perfmon captures the total number of bytes retrieved from the disk (read) and sent to the disk (write) over a period of one second. The Disk Read Bytes/sec performance counter breaks down the results to display only read bytes.
Average Bytes/Write
Perfmon captures the total number of bytes sent to the disk (write) and retrieved from the disk (read) over a period of one second. The Disk Write Bytes/sec performance counter breaks down the results to display only write bytes.
Number of Reads/sec
This performance counter returns the number of physical database page reads issued per second. This statistic displays the total number of physical page reads across all databases. A rate of 80 to 90 per second is normal; anything above 90 indicates an indexing or memory constraint.
Possible issues: Because physical I/O is expensive, you may be able to minimize the cost by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design.
Resolution: Attempt to tune the application so that fewer I/O operations are required. For example, perhaps I/O would be reduced if there were appropriate indexes or if the database design were de-normalized. If the applications cannot be tuned, you will need to acquire disk devices with more capacity.
Compare to the Memory: Pages/sec performance counter to see if there is paging while the SQL Server:Buffer Manager\Page reads/sec is high.
Before adjusting the fill factor, at a database level, compare the SQL Server:Buffer Manager\Page reads/sec performance counter to the SQL Server:Buffer Manager\Page writes/sec counter, and use the fill factor option only if writes are a substantial fraction of reads (greater than 30 percent).
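Both Buffer Manager counters are exposed through sys.dm_os_performance_counters. Note that the per-second counters there hold cumulative values, so you must take two samples and divide the difference by the elapsed seconds to get a rate. A minimal sketch of one sample:

    SELECT RTRIM(counter_name) AS counter_name,
           cntr_value          AS cumulative_value  -- diff two samples to get the /sec rate
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Buffer Manager%'
      AND counter_name IN ('Page reads/sec', 'Page writes/sec');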
Number of Writes/sec
This performance counter returns the number of data pages written per second. Page writes per second are the total number of physical writes, including checkpoint writes, lazy writer writes, and large block writes done during index creation or bulk copy operations.
Possible issues: If Page Writes/sec is high, this might indicate a SQL Server performance issue.
Resolution: If these values are high, you can try to reduce them by tuning your application or database to require fewer I/O operations (for example, through index coverage, better indexes, or normalization), by increasing the I/O capacity of the hardware, or by adding memory.
Page Lookups/Batch Request
This performance counter displays the ratio of buffer pool page lookups to batch requests, computed as (Page lookups/sec) / (Batch Requests/sec); a ratio greater than 100 warrants investigation. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: When the ratio of page lookups to batch requests is much greater than 100, this is an indication that while query plans are looking up data in the buffer pool, these plans are inefficient.
Resolution: Identify the queries with the highest logical I/O and tune them.
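A common way to find those queries is to rank cached plans by logical reads in sys.dm_exec_query_stats. A minimal sketch; the statistics are per cached plan and reset when a plan leaves the cache:

    SELECT TOP (10)
           qs.total_logical_reads,
           qs.execution_count,
           qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
           st.text AS query_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;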
Compilations/Recompilations/Sec
Monitoring the number of query compilations and recompilations and the number of batches received by an instance of SQL Server gives you an indication of how quickly SQL Server is processing user queries and how effectively the query optimizer is processing the queries. Compilation is a significant part of a query's turnaround time. To save the compilation cost, the Database Engine saves the compiled query plan in a query cache. The objective of the cache is to reduce compilation by storing compiled queries for later reuse, eliminating the need to recompile queries when they are executed later. However, each unique query must be compiled at least once.
The monitored value is (Compilations/sec / 10) - Recompilations/sec; recompilations should stay below one-tenth of compilations. To tune thresholds to your environment, set thresholds for this performance counter using the Baseline Calculator.
Possible issues: Query recompilations can be caused by the following factors:
- Schema changes, including base schema changes such as adding columns or indexes to a table, or statistics changes such as inserting or deleting a significant number of rows from a table.
- Environment (SET statement) changes. Changes in session settings such as Ansi_Padding or Ansi_Nulls can cause a query to be recompiled.
Plan Re-Use
A query plan is used to execute a query. Plan re-use is desirable for OLTP workloads because re-creating the same plan (for similar or identical transactions) is a waste of CPU resources. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: Zero-cost plans are not cached (and therefore not re-used) in SQL Server 2005 SP2. Applications that use zero-cost plans will show lower plan re-use, but this is not a performance issue.
Resolution: Review your plan re-use design.
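One way to gauge re-use is to aggregate usecounts in sys.dm_exec_cached_plans; a large share of single-use plans suggests ad hoc queries whose plans are never re-used. A minimal sketch:

    SELECT cp.objtype,
           COUNT(*)          AS plan_count,
           SUM(cp.usecounts) AS total_uses,
           SUM(CASE WHEN cp.usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans
    FROM sys.dm_exec_cached_plans AS cp
    GROUP BY cp.objtype
    ORDER BY plan_count DESC;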
Forwarded Records/Batch Requests
This performance counter identifies the use of a forwarding pointer, which is created when variable-length columns cause a row to move to a new page in a heap.
Possible issues: Rows with Varchar columns can expand when Varchar values are updated with a longer string. If the row no longer fits in the existing page, the row migrates and access to it must traverse a pointer. This only happens on heaps (tables without clustered indexes).
Resolution: Evaluate clustered indexes for heap tables. In cases where clustered indexes cannot be used, drop non-clustered indexes, build a clustered index to reorganize pages and rows, drop the clustered index, then recreate the non-clustered indexes.
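To locate heaps that are paying this cost, you can check forwarded_fetch_count in sys.dm_db_index_operational_stats. On SQL Server 2008 and later, rebuilding the heap directly is a simpler alternative to the index create/drop sequence; the table name below is a placeholder. A minimal sketch:

    -- index_id = 0 identifies heaps
    SELECT OBJECT_NAME(ios.object_id) AS table_name,
           ios.forwarded_fetch_count
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
    WHERE ios.index_id = 0
      AND ios.forwarded_fetch_count > 0
    ORDER BY ios.forwarded_fetch_count DESC;

    -- SQL Server 2008+: rebuilding the heap removes forwarding pointers
    ALTER TABLE dbo.MyHeapTable REBUILD;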
Total Latch Wait Time/Latch Waits
This monitor returns the ratio of total latch wait time, in milliseconds, for latch requests in the last second to the number of latch requests that had to wait during that second. Latches are a lightweight means of holding a very transient server resource, such as an address in memory.
This ratio should be less than 10. Monitoring the latches to determine user activity and resource usage can help you to identify performance bottlenecks. To tune thresholds to your environment, use the Baseline Calculator for this performance counter.
Possible issues: If the returned value is high, it is likely there is an I/O or memory bottleneck.
Resolution: Check your SQL Server and verify that its memory is being used efficiently. Applications other than SQL Server may be using a great deal of memory; try to recover memory by closing unnecessary applications. Installing additional memory may also help.
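To see which latch classes account for the waits, you can query the cumulative totals in sys.dm_os_latch_stats; the values accumulate since the last restart. A minimal sketch:

    SELECT TOP (10)
           latch_class,
           waiting_requests_count,
           wait_time_ms
    FROM sys.dm_os_latch_stats
    ORDER BY wait_time_ms DESC;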
Recompilations/Compilation
This performance counter shows the ratio of SQL Recompilations to SQL Compilations. SQL Recompilations should be less than 10% of SQL Compilations. To tune thresholds for your environment, consider using the Baseline Calculator. See Manage thresholds in SAM.
Possible issues: Returned values that are high can indicate heavy use of temporary tables.
Resolution: Change stored procedures so they do not change schemas. Use table variables instead of temporary tables.
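For illustration, the two patterns look like this; the table and column names are hypothetical. Note that table variables carry no statistics, so they suit small row counts best:

    -- Temp table: DDL inside a procedure can trigger recompiles
    CREATE TABLE #recent_orders (order_id INT, order_date DATE);

    -- Table variable: avoids schema-change recompiles
    DECLARE @recent_orders TABLE (order_id INT, order_date DATE);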
Target - Total Server Memory
This performance counter shows the difference between the total amount of dynamic memory the server can consume and the current amount of memory that SQL Server is using.
Possible issues: If you have evidence of memory pressure and this counter is not close to zero, it may indicate that the buffer pool cannot expand. Examples of memory pressure include the Memory Grants Pending metric staying above zero for any significant time, and queries spending a significant portion of their wait time in the RESOURCE_SEMAPHORE wait type.
Resolution: Installing additional memory in the SQL Server host may resolve the problem, or you may need to identify the queries with high memory demands.
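The underlying Memory Manager counters can be read directly; a minimal sketch that computes the difference (both values are reported in KB):

    SELECT MAX(CASE WHEN counter_name LIKE 'Target Server Memory%' THEN cntr_value END)
         - MAX(CASE WHEN counter_name LIKE 'Total Server Memory%'  THEN cntr_value END)
           AS target_minus_total_kb
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Memory Manager%';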