Documentation forSolarWinds Platform Self-Hosted

SQL performance - measuring and monitoring a production server

In SQL 2008 servers the tool is called Reliability and Performance Monitor. The tool can be launched from either version using the perfmon (perfmon.exe) command. This tool makes it easy to locate and observe the counters you will use to measure SQL server performance.

In the Orion environment, the single most important SQL server performance metric is disk queue length. Queue length is a measurement of the SQL writes that are waiting to be written to disk. When disk write queues start lengthening and there is a steady load on the SQL writes, the queues may snowball to the extent that write requests get dropped. This leads to gaps in Orion data and will affect the overall performance of the SQL server and Orion.

A good rule of thumb is that disk queue length should not exceed two times the number of effective spindles in the SQL storage array. The effective spindle count is the number of striped spindles. For a RAID 10 direct attached storage unit with eight total disks the effective spindle count is four. Four of the spindles in this array are the primary striped array and the other four are secondary striped mirrors of the four primary spindles. Because no performance gain is achieved by mirroring disks, only the primary striped set is used to measure performance.

Keeping in mind that this is a rule of thumb, and that performance is a continuum rather than good/bad assessment, the recommendation does not mean that if you have five effective spindles and you peak the queue at fifteen, you will have performance issues. Also this number should not be used to determine the number of spindles required to lower the queue length. This is because the effect of queue length on performance is not linear. Adding one effective spindle to a four effective spindle array can lower the queue length by fifty percent or more.

The following table provides some other important SQL performance counters and their target ranges.

Metric Description Target Range
SQL Server: Access Methods – Page splits/sec Index or data page is split and partially allocated to a new page. May cause IO issues such as excessive disk queue length. Can be addressed by increasing SQL fill factor. The fewer the better. A static page splits number with low disk queue length indicates this is not an issue.
SQL Server: Access Methods – Full Scans/sec Table index failures. When table indexes are not used the result is a disk operation that could have been avoided. High full scan rates will impact transaction rates and overall SQL performance. The lower the better. Near 0 rates are to be expected for a well performing SQL server.
SQLServer: SQL Statistics – Batch Requests/Sec A measurement of transaction rate. High batch requests may cause a heavy CPU load. Static rate. Usual rates are in the thousands and are not a problem unless CPU utilization is too high. Monitor for a slow creep up or any sudden jump that does not recover to normal levels.
SQL Statistics – SQL Re-compilations/sec (re-comps) A query execution measurement. High re-comp rates will drive CPU utilizations The fewer the better. Should be 0 most of the time.
SQL Server: Locks: Dead locks/sec Dead lock rate. None - Dead locks rates higher than 0 should be investigated using MS SQL tools.
SQL Server: Buffer Manager – Buffer Cache hit ratio The percent of requests that can be served from cache rather than disk. Although writes are more important than reads in an Orion environment, this metric should be tracked as an indicator of memory issues. The more, the better.
Physical disk – Reads/sec, writes/sec The read and write rates. This should be measured for logs and data. Because these measure these actual read and write rates they are not an indicator of the maximum I/O rates unless they reach a level where disk queue lengths also increase. Static levels.

While this is not a complete list of the metrics you may wish to monitor, these do a good job of detecting bottlenecks for disk I/O, memory and CPU. The THWACK.com user community has posted several items.

Along with monitoring the SQL counter metrics, some basic metrics for the SQL server should also monitored. These include the following:

Metric Description Recommendation
Database size Total size of the SolarWinds Platform database. Normally less than 20GB for NPM with moderate syslog data and no NetFlow data. NetFlow can increase the database size sever hundreds of GB. Larger databases require more disks, memory and CPU resources. Large numbers of syslog messages also increase the database size. Balance data retention settings with database size and performance.
SQL Server Memory The amount of RAM used by SLQ. Increase the minimum SQL memory setting to allocate eighty to ninety percent of the total memory to SQL.
SQL Server CPU CPU utilization Keep the average CPU utilization as low as possible. CPU peaks of one hundred percent are acceptable as long as the average stays low and the peaks are brief.
AWE Memory AWE memory allow for more efficient use of memory on 32-bit SQL. Enable AWE for all 32-bit SQL. See the Microsoft SMDN Library for details.