Investigate a wait time anomaly with DPA
DPA uses an anomaly detection algorithm to identify unexpected increases in wait time. The following example shows how DPA can be used to find the root cause of an unexpected increase in wait time.
The problem
When you click a database instance from the DPA home page, DPA displays the Anomaly Detection chart below the Top SQL Statements chart. In this example, the chart shows that DPA detected critical anomalies for January 24.
The investigation
-
On the Anomaly Detection chart, click the bar that represents January 24.
The one-day Anomaly Detection chart shows that the anomaly occurred during the 2 AM hour.
-
Click the bar that represents the 2 AM hour.
The chart shows the top SQL statements that were executed during that hour. In this example, the COMMIT TRAN SQL statement had significantly higher wait times than any other statement. It is likely to be the cause of the anomaly.
-
Click the name of the COMMIT TRAN SQL statement to open the Query Details page, which shows detailed information about that SQL statement's performance during the selected time period (2:00 AM to 3:00 AM in this case).
Notice that the predominant wait type during that hour is HADR_SYNC_COMMIT. This wait indicates that the primary AG replica is waiting for secondary replicas to commit their data. The most likely issue is that there are slowdowns within the AG environment.
-
To get an idea of what is normal for COMMIT TRAN, click the date control in the top center of the screen and choose Last 90 Days.
The chart displays wait times for the COMMIT TRAN SQL statement during the last 90 days.
You can see that:
- The SQL statement had long HADR_SYNC_COMMIT waits from November 15 through December 10, but the waits were not as severe as January 24.
- From December 10 through January 24, there were almost no issues.
- Unusually long waits occurred on January 24, and DPA reported an anomaly.
- Even though the waits are not as severe as on January 24, the problem continues to occur.
-
Click the AG status tab, and then click the availability group name to view information about the AG status. In this case, you find that there are noticeable delays.
Based on this data, your team uses SolarWinds SAM to investigate further, and you identify network issues between the primary and replica databases as the underlying cause of the anomalies.
You also configure a Database Instance Wait Time anomaly alert to notify you when anomalies are detected.
Learn more
Learn more about the DPA approach to investigating performance issues. For other examples of using DPA to investigate a performance issue, see: