Documentation for SolarWinds Observability SaaS

Root Cause Assist

The Root Cause Assist feature for SolarWinds Observability provides rich contextual information about health degradation events for supported entity types or entity groups. Use Root Cause Assist to significantly reduce the mean time to identify (MTTI) and mean time to resolution (MTTR) of health degradation in your monitored environments.

Root Cause Assist helps you stay ahead in monitoring and managing system-critical events, anomalies, and alerts in your environment with SolarWinds Observability.

How does Root Cause Assist work?

Root Cause Assist activates when the health score of any monitored Kubernetes, database, or other supported entity type, or of an entity group, has degraded within the last 24 hours. When a health score degradation is detected, Root Cause Assist collects all the correlated entity events, including metrics, alerts, anomalies, and change events, for all the source entities and related entities.

To see data that can help you investigate a specific degradation event, you can trigger an RCA. When you trigger an RCA, Root Cause Assist uses its correlation engine powered by SolarWinds AI to identify causal relationships between the collected events across all entities (source entities and related entities) and give you a detailed analysis with context and starting points for further investigation or resolution.

Root Cause Assist processes this data within minutes of a triggering event, helping you troubleshoot issues in real time.

Currently, Root Cause Assist is available at the entity group level and at the entity level for supported entity types. Root Cause Assist works only for groups with up to 50 entities.
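
The two limits described above (a degradation within the last 24 hours, and groups of up to 50 entities) can be restated as a simple eligibility check. The following is a minimal Python sketch, not SolarWinds code: the function name, parameters, and constants are hypothetical and only mirror the limits given in this topic.

```python
from datetime import datetime, timedelta, timezone

# Limits taken from this topic; the names are illustrative only.
LOOKBACK_WINDOW = timedelta(hours=24)
MAX_GROUP_SIZE = 50


def is_rca_eligible(degraded_at, now, group_size=1):
    """Return True if a degradation event falls inside the documented limits.

    degraded_at and now are timezone-aware datetimes; group_size is the
    number of entities in the group (1 for a single entity).
    """
    age = now - degraded_at
    within_window = timedelta(0) <= age <= LOOKBACK_WINDOW
    return within_window and 1 <= group_size <= MAX_GROUP_SIZE


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_rca_eligible(now - timedelta(hours=3), now, group_size=12))   # True
print(is_rca_eligible(now - timedelta(hours=30), now, group_size=12))  # False
print(is_rca_eligible(now - timedelta(hours=3), now, group_size=75))   # False
```

A check like this can be useful when you script around exported event data and want to know in advance which degradation events are still inside the window shown on the Root Cause Assist tab.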

Trigger Root Cause Assist

After a health degradation event is detected for a monitored entity type, or group of entities, you can trigger a Root Cause Assist (RCA).

  1. Open the Entity Group or Entity Explorer for the supported entity type that recorded a health degradation event.

    See Entity Explorer for more details.
  2. Click the Root Cause Assist tab.

  3. Click Trigger RCA.

  4. Click the health degradation event from the last 24-hour period that you want to analyze, and then click Trigger RCA.

    An RCA Triggered Successfully prompt is displayed. Note that it can take up to one minute to see the details of the Root Cause Assist.

  5. Refresh the Root Cause Assist tab to display your report.

View the details of your Root Cause Assist report

The Root Cause Assist tab displays any available reports for your supported entity group or entity type. Click the desired report to view more details.

Root Cause Assist reports use the "five whys" approach, which asks relevant questions based on the recorded correlating events and specific context from the entities involved in the health score degradation. This approach helps you recognize the causality of the event, and it can provide you with long-term remediation strategies to increase the health of your monitored environment.

Event metrics

The top of the report displays event-related and entity-related metrics. Click the event arrow to display the event details.

Metric              Description
Event Timestamp     The date and time the event occurred.
Entity type         The type of entity.
Entity name         The name of the entity.
Health Degradation  The range of health score degradation caused by the event.
Event Details       Correlating event details.
Event Type          The event type such as alert, anomaly, DB event, Kubernetes event.
Event               A clickable link to the recorded event in the Alerts Explorer.
Event Details       Details of the event.

SolarWinds AI Five Why Summary

Below the metrics, SolarWinds AI poses questions using the "five whys" approach and offers a suggestion for improving the health of your monitored entity or entity group. In one example report, SolarWinds AI identifies the health score degradation as caused by underlying infrastructure issues. It suggests reviewing additional telemetry, such as EC2 monitoring metrics, EBS Volume metrics, and network traffic, as well as reviewing infrastructure and sizing limits, to trace the transaction end-to-end and identify bottlenecks.

Probable correlated events

Root Cause Assist identifies events such as alerts, anomalies, database events, and Kubernetes events that occurred near the time of the health degradation event being analyzed. It takes every event that occurred within two hours of the health degradation event and lists them in tabular format. This helps you identify the series of events that led to the health score degradation.
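
To make the two-hour window concrete, the following Python sketch filters a list of events by their distance from a degradation time and prints them as a plain table. It is an illustration only, not the Root Cause Assist implementation. The event dictionaries, field names, and sample data are hypothetical, and the sketch assumes that "within two hours" means two hours before or after the degradation event.

```python
from datetime import datetime, timedelta

CORRELATION_WINDOW = timedelta(hours=2)  # window stated in this topic


def correlated_events(events, degraded_at, window=CORRELATION_WINDOW):
    """Return the events within `window` of degraded_at, oldest first.

    Assumption: events on either side of the degradation time count.
    """
    nearby = [e for e in events if abs(e["time"] - degraded_at) <= window]
    return sorted(nearby, key=lambda e: e["time"])


def as_table(events):
    """Format events as fixed-width text rows under a header line."""
    lines = [f"{'Time':<18}{'Type':<18}Event"]
    for e in events:
        lines.append(f"{e['time']:%Y-%m-%d %H:%M}  {e['type']:<18}{e['name']}")
    return "\n".join(lines)


# Hypothetical sample data for illustration.
degraded_at = datetime(2024, 1, 2, 12, 0)
events = [
    {"time": datetime(2024, 1, 2, 13, 15), "type": "DB event", "name": "Slow query spike"},
    {"time": datetime(2024, 1, 2, 8, 0), "type": "Kubernetes event", "name": "Pod rescheduled"},
    {"time": datetime(2024, 1, 2, 10, 30), "type": "Alert", "name": "High CPU"},
    {"time": datetime(2024, 1, 2, 11, 45), "type": "Anomaly", "name": "Latency deviation"},
]

rows = correlated_events(events, degraded_at)
print(as_table(rows))
```

In this sample, the Kubernetes event at 08:00 is four hours before the degradation and is left out, while the other three events are listed in time order.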

Metric Correlated View

The Metric Correlated View overlays correlated events (alerts, anomalies, database events, Kubernetes events) on a chart of the time-series metric data, showing where each event occurred.
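
The idea of overlaying events on a time series can be sketched in a few lines of Python: each event is attached to the metric sample closest to it in time. This is a hedged illustration of the concept, not how the view is built. The function name, metric values, and event names are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def overlay_events(sample_times, events):
    """Map sample index -> names of the events nearest to that sample.

    sample_times must be sorted ascending and non-empty; events is a list
    of (time, name) pairs.
    """
    markers = {}
    for event_time, name in events:
        i = bisect_left(sample_times, event_time)
        if i == len(sample_times):
            i -= 1  # event is after the last sample
        elif i > 0 and event_time - sample_times[i - 1] < sample_times[i] - event_time:
            i -= 1  # the earlier neighbor is closer
        markers.setdefault(i, []).append(name)
    return markers


# Hypothetical metric samples every five minutes, plus two events.
start = datetime(2024, 1, 2, 12, 0)
sample_times = [start + timedelta(minutes=5 * k) for k in range(6)]
values = [41.0, 43.5, 78.2, 91.0, 88.4, 52.3]
events = [
    (start + timedelta(minutes=9), "alert: high CPU"),
    (start + timedelta(minutes=16), "anomaly: latency"),
]

markers = overlay_events(sample_times, events)
for i, (t, v) in enumerate(zip(sample_times, values)):
    note = ", ".join(markers.get(i, []))
    print(f"{t:%H:%M}  {v:6.1f}  {note}")
```

Reading the printed rows top to bottom shows the same thing the chart conveys visually: which correlated events line up with the rise in the metric.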