About anomaly detection in DPA

DPA uses an anomaly detection algorithm to determine if the wait times for a database instance are significantly higher than usual. In some cases, high wait times are normal and expected. With anomaly detection, DPA can alert you to unexpected increases in wait times, and help you investigate these anomalies.

How does DPA's anomaly detection work?

A machine learning algorithm uses wait time data that DPA collects to predict future wait times. DPA uses these predictions to detect wait times that are significantly higher than expected.

Step 1: Data collection

DPA gathers the data that the algorithm will use to learn what normal is and to predict future wait times. Up to 90 days of historical hourly data is used for learning.

Anomaly detection requires a minimum of three days of learning data. DPA does not show any information about anomalies until it has collected at least three days of data. Predictions improve as more data is collected.
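
The two limits above (a 90-day learning window and a three-day minimum) can be sketched as follows. This is an illustrative sketch only, not DPA's implementation: the function names, the `(timestamp, wait)` sample layout, and the choice to measure "three days" as 72 hourly samples are all assumptions made for the example.

```python
from datetime import datetime, timedelta

MAX_LEARNING_DAYS = 90  # from the text: up to 90 days of hourly history
MIN_LEARNING_DAYS = 3   # from the text: minimum before anomalies are shown


def learning_window(samples, now):
    """Keep only hourly samples from the 90 days before `now`.

    `samples` is a list of (timestamp, wait_time) tuples (hypothetical layout).
    """
    cutoff = now - timedelta(days=MAX_LEARNING_DAYS)
    return [(ts, wait) for ts, wait in samples if cutoff <= ts < now]


def has_enough_data(window):
    """Assumption: 'three days' is counted as at least 72 hourly samples."""
    return len(window) >= MIN_LEARNING_DAYS * 24
```

With 100 days of hourly samples, `learning_window` would keep the most recent 2,160 (90 x 24), and `has_enough_data` would return False for a window holding only two days of samples.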

Step 2: Data analysis and predictions

Based on the learning data, the algorithm calculates:

  • The amount of wait time that the database instance is likely to experience during each 1-hour period for the next 30 days.
  • The standard deviation for the entire data set (which is used to calculate thresholds).

When enough data is available, predictions include daily and weekly seasonality (patterns of predictable fluctuations): 

  • Daily seasonality accounts for differences during each hour. For example, normal wait times at 2 AM are probably different than normal wait times at 2 PM.
  • Weekly seasonality accounts for differences during each day of the week. For example, normal wait times at 2 PM on Saturday are probably different than normal wait times at 2 PM on Wednesday. (Weekly seasonality requires at least 30 days of learning data.)
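
To make the idea of seasonality concrete, the sketch below builds a very simple seasonal baseline by averaging past wait times per hour of day, and per weekday and hour once 30 days of data are available. DPA's machine learning algorithm is not documented here and is almost certainly more sophisticated; the averaging technique, the function name `build_baseline`, and the use of a population standard deviation over the raw wait times are assumptions chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev

WEEKLY_MIN_DAYS = 30  # from the text: weekly seasonality needs 30 days of data


def build_baseline(samples):
    """Return (predict, sigma) from hourly (timestamp, wait_time) samples.

    predict(ts) gives the average historical wait for the matching hour
    (and weekday, when enough data exists). sigma is the standard
    deviation of the whole data set.
    """
    times = [ts for ts, _ in samples]
    span_days = (max(times) - min(times)).days + 1
    use_weekly = span_days >= WEEKLY_MIN_DAYS

    def key(ts):
        return (ts.weekday(), ts.hour) if use_weekly else ts.hour

    buckets = defaultdict(list)
    for ts, wait in samples:
        buckets[key(ts)].append(wait)
    means = {k: mean(v) for k, v in buckets.items()}
    sigma = pstdev([wait for _, wait in samples])

    def predict(ts):
        return means[key(ts)]

    return predict, sigma
```

With only a week of data, the 2 PM prediction in this sketch is the same for every day. With 35 days of data, a Saturday 2 PM prediction can differ from a Wednesday 2 PM prediction.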

Step 3: Anomaly detection

For each hour, DPA compares the actual amount of wait time during that hour to the predicted value. If the actual amount of wait time is above the warning or critical threshold, DPA: 

  • Changes the color of the wait time meter on the DPA homepage.
  • Displays yellow or red segments on the bars in Anomaly Detection charts.
  • Triggers the Database Instance Wait Time Anomaly alert, if it has been configured.

How DPA determines the status of an incomplete hour

To determine if the wait time meter and hourly Anomaly Detection chart should show a warning or critical status for an incomplete hour, DPA uses the last 6 completed 10-minute intervals (a rolling one-hour interval). The status is updated every 10 minutes. For example, to determine the status of the 2:00 hour:

  • From 2:00 to 2:09, DPA uses data from 1:00 to 1:59.
  • From 2:10 to 2:19, DPA uses data from 1:10 to 2:09.
  • From 2:20 to 2:29, DPA uses data from 1:20 to 2:19 (and so on).
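
The rolling window in the example above can be computed by rounding the current time down to the last 10-minute boundary and stepping back one hour. The helper name `rolling_window` is hypothetical, and the sketch only reproduces the arithmetic described in the text, not DPA's internal code.

```python
from datetime import datetime, timedelta


def rolling_window(now):
    """Return (start, end) covering the last six completed 10-minute intervals.

    `end` is exclusive, so a window ending at 2:10 covers data through 2:09.
    """
    end = now.replace(minute=now.minute - now.minute % 10, second=0, microsecond=0)
    start = end - timedelta(hours=1)
    return start, end
```

At 2:15, for example, this returns a window from 1:10 up to (but not including) 2:10, which matches the second bullet above.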

SQL statements excluded from the trend charts

The anomaly detection algorithm uses the total wait time for the database instance, including wait time from any SQL statements that you have excluded from the trend charts. In most cases, a statement is excluded from the trend charts because it always has high wait times and the large bar dominates the charts. If the statement runs on a regular schedule with the expected amount of wait time, no anomaly would be detected during that time period, because high wait times are normal during that period. An anomaly would be detected only if wait times during that period were significantly higher than normal, in which case you might want to investigate the change.

Does anomaly detection work well for all database instances?

DPA's anomaly detection algorithm, like most algorithms associated with workloads, works best when:

  • The monitored database instances have a consistent workload executing against them.

  • Daily and weekly seasonality is consistent. For example, database wait times are similar each Monday at 10 AM.

  • DPA monitoring is always on (not shut down for hours or days at a time).

The algorithm might not work well when:

  • The workload for a database instance is sporadic (for example, QA or reporting instances with inconsistent wait times).

  • Daily and weekly seasonality is not consistent. For example, the workload on Monday at 10 AM varies from one week to the next, with no predictable pattern.

  • DPA is not monitoring the instance consistently, and so it cannot get a good understanding of what normal is.

If anomaly detection does not work well for any of your monitored instances, SolarWinds recommends disabling anomaly detection for those instances.

Large gaps in the learning data

If monitoring stops for more than 30 days, the anomaly detection algorithm does not make predictions based on the stale learning data collected before the 30-day gap. DPA collects new learning data and, after three days, begins to make predictions based on the current data.
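
One way to picture this rule is to scan the collected timestamps for the most recent gap longer than 30 days and keep only what follows it. The function below is a hypothetical sketch of that idea; how DPA actually detects and handles gaps is not described in this topic.

```python
from datetime import datetime, timedelta


def drop_stale_data(timestamps, max_gap=timedelta(days=30)):
    """Return the timestamps collected after the most recent gap > max_gap."""
    ordered = sorted(timestamps)
    start = 0
    for i in range(1, len(ordered)):
        if ordered[i] - ordered[i - 1] > max_gap:
            start = i  # everything before index i is stale
    return ordered[start:]
```

If monitoring ran for two days in January and resumed in March, only the March timestamps would remain, and the three-day minimum would apply to that new data.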

Anomaly thresholds

Anomalies are classified as either warning or critical. The threshold for each classification is based on the standard deviation of the wait times for the associated time period.

Standard deviation is a measure of how dispersed the values in a data set typically are.

The default values for the thresholds are listed below. You can edit the associated configuration option to change the default values.

Classification   Default Threshold                                              Configuration Option
Warning          The predicted wait time for the hour + 2 standard deviations   ANOMALY_DETECTION_THRESHOLD_WARNING
Critical         The predicted wait time for the hour + 3 standard deviations   ANOMALY_DETECTION_THRESHOLD_CRITICAL
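
The default thresholds translate into a simple comparison. In the sketch below, `classify` is a hypothetical helper, and treating the two configuration options as multipliers of the standard deviation (`warning_k`, `critical_k`) is an assumption based on the defaults listed above, not a documented format.

```python
def classify(actual, predicted, sigma, warning_k=2.0, critical_k=3.0):
    """Classify an hour's wait time against the predicted value.

    warning_k and critical_k stand in for the number of standard
    deviations added to the prediction (defaults 2 and 3, per the table).
    """
    if actual > predicted + critical_k * sigma:
        return "critical"
    if actual > predicted + warning_k * sigma:
        return "warning"
    return "normal"
```

For a predicted wait time of 100 with a standard deviation of 10, the warning threshold is 120 and the critical threshold is 130, so an actual value of 125 would be classified as a warning.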

Specify the learning date after the load on a database instance changes

If the load on a database instance changes significantly (for example, because of changes in the network environment), the previously collected learning data is no longer accurate. To prevent this data from being used for anomaly detection, set the advanced configuration option ANOMALY_DETECTION_FORCE_LEARNING_DATE to the date when the load change occurred. Wait time data collected before this date will not be used to predict future wait times.
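
The effect of this option on the learning data can be illustrated with a simple filter. The function name and the use of a Python `date` object are assumptions for the example; the value format that DPA expects for ANOMALY_DETECTION_FORCE_LEARNING_DATE is not specified in this topic.

```python
from datetime import date, datetime


def apply_force_learning_date(samples, force_date):
    """Drop (timestamp, wait_time) samples collected before force_date.

    If no date is set (None), all samples are kept.
    """
    if force_date is None:
        return list(samples)
    return [(ts, wait) for ts, wait in samples if ts.date() >= force_date]
```

After the filter is applied, the remaining data is subject to the same three-day minimum as any other learning data.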

Disable anomaly detection for a database instance

By default, anomaly detection is enabled for all database instances. To disable anomaly detection for a database instance that has an inconsistent workload or sporadic monitoring, set the advanced configuration option ANOMALY_DETECTION_ENABLED to False for that instance.