Documentation for SolarWinds Observability

Entity health score

Metrics, performance data, entity availability, and other telemetry data are collected for each entity type added to SolarWinds Observability. Telemetry data is displayed in real time or presented historically in entity widgets. Collected telemetry data helps create a baseline for the typical operating performance of your monitored entity. Anomalies indicate when the operating performance of your entity deviates from the baseline, and alerts notify users when key metrics, logs, or events match predefined conditions. An entity's overall health can be determined by analyzing the anomalies and alerts for key metrics.

A health score provides real-time insight into the overall health and performance of your monitored entities. The health score represents the deviation of an entity's performance from its typical performance and is calculated based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health score is displayed as a single numerical value from 0 to 100 and is categorized as Good (70-100), Moderate (40-69), or Bad (0-39). Each entity type's health score is calculated based on a combination of telemetry data:

  • Entity state: The current state of the entity compared to typical recorded telemetry. The entity state can be categorized into normal, warning, and critical states. The state of the entity impacts the overall health score.

  • Anomalous entity performance: Entity performance is based on a combination of key metrics that help to determine anomalous patterns in your entity. These key metrics and anomalies vary by entity type.

  • Triggered entity alerts: You can set alerts for each entity type that directly affect the overall health score. Some out-of-the-box alerts are created when an entity is added for monitoring.

    Alerts with a higher severity may have a greater impact on health score. The impact that alert severity has on the health score varies by entity type.

If telemetry data was not received for the entity at some point in time, the health score is not available for that time. To view the events that resulted in a change to an entity's health score, click the Health tab in the entity's details view.

For some entities, telemetry data cannot be received or a health score would not be relevant. In these cases, their health score is NULL (displayed as '—').
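The score ranges described above can be expressed as a small lookup. The following Python sketch is illustrative only: the function names `health_category` and `display_value` are hypothetical and are not part of SolarWinds Observability. It assumes a NULL health score is represented as `None`.

```python
from typing import Optional


def health_category(score: Optional[int]) -> str:
    """Map a health score (0-100, or None for NULL) to its category label."""
    if score is None:
        return "NULL"
    if not 0 <= score <= 100:
        raise ValueError(f"health score out of range: {score}")
    if score >= 70:
        return "Good"       # 70-100
    if score >= 40:
        return "Moderate"   # 40-69
    return "Bad"            # 0-39


def display_value(score: Optional[int]) -> str:
    """Render the score as text; a NULL score is shown as a dash character."""
    if score is None:
        return "\u2014"  # the dash symbol used for NULL health scores
    return f"{health_category(score)} {score}"
```

For example, `display_value(85)` returns `"Good 85"`, and `health_category(39)` returns `"Bad"`.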

For entity-specific health score information and how to view the health score in SolarWinds Observability, see the following sections.

View entity health score

Because the health score is an important indicator of whether an entity needs immediate attention, it is displayed throughout SolarWinds Observability. The following are some of the places that include the health score of an entity.

Entity lists in the Entity Explorer and area overviews

In the Entity Explorer and some area overviews, entities are listed in either a grid or list view.

In Grid View, a hexagon represents each entity. The color of the hexagon indicates whether the entity's health score is Good, Moderate, Bad, or NULL.

In List View, a table lists each entity. The first column of the table displays the health score, with a colored icon and text indicating whether the entity's health score is Good, Moderate, or Bad, followed by a numeric health score. If the health score is NULL, '—' is displayed.

Entity Explorer details view for an individual entity

Click an individual entity in the Entity Explorer. The Health Score widget for the current entity is included in the Overview tab.

An individual entity's health score widget includes two components: the entity's current health score and a timeline of the health history over the selected time period. The current health is displayed as a single number out of 100, with the background of the current health colored based on whether the health is Good, Moderate, Bad, or NULL. Next to the current health score is a timeline charting the health scores of the entity over the selected time period. Each health score plotted on the timeline represents the minimum health score calculated during a single slice of time, not the exact health score at a specific moment in time. If telemetry data was not received for the entity at some point in time, causing the health score to be unavailable, there will be a gap in the timeline.
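To illustrate how a timeline point can stand for the minimum score within a slice of time, here is a minimal Python sketch. It is not the product's implementation. The function name `timeline_points`, the `(timestamp, score)` sample format, and the fixed slice width are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def timeline_points(
    samples: List[Tuple[int, Optional[int]]],
    start: int,
    end: int,
    slice_seconds: int,
) -> List[Tuple[int, Optional[int]]]:
    """Reduce (timestamp, score) samples to one point per time slice.

    Each point holds the minimum score seen in its slice. A slice with
    no usable samples yields None, which corresponds to a timeline gap.
    """
    minimums: Dict[int, int] = {}
    for ts, score in samples:
        if score is None or not (start <= ts < end):
            continue  # unavailable score or sample outside the period
        slice_start = start + ((ts - start) // slice_seconds) * slice_seconds
        minimums[slice_start] = min(score, minimums.get(slice_start, score))
    return [(s, minimums.get(s)) for s in range(start, end, slice_seconds)]
```

With samples `[(0, 90), (30, 60), (70, 80)]`, a period from 0 to 180, and 60-second slices, the result is `[(0, 60), (60, 80), (120, None)]`: the first slice reports 60 even though the score was 90 at its start, and the last slice is a gap.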

The Health tab shows detailed information about the entity's health. The Health tab includes the entity's current health score and the timeline of the health history in a Health Score widget and a Health Events table. The Health Events table lists the events, such as anomalies and alerts, that affected the entity's health score during the selected time period. Click an event to open the Event Data panel.

Entity Explorer details view for a group of entities

Click an entity group or an individual entity in the Entity Explorer. In the details view, Health Score widgets for the entity group, or for the entities related to the current entity, summarize the entities' health scores using a donut chart, grouping the entities based on Good, Moderate, Bad, or NULL health scores.

Area overview

Click an area overview for a specific monitoring area to view the health score for all entities in that monitoring area. Health Score widgets in area overviews summarize the entities' health scores using a donut chart, grouping the entities based on Good, Moderate, Bad, or NULL health scores.

Health score calculation

The critical metrics and telemetry data that are analyzed for anomalous behavior vary by entity type, as does the impact of entity status, alerts, and anomalies. See each section below for specifics about the entity type's health score calculation and how much negative impact an event has on the health score.

AWS entities' health score

AWS EC2 instance

Event Impact on health score
Entity status is not 'running' 100 pts
Anomalies for AWS.EC2.CPUUtilization 50 pts
Anomalies for AWS.EC2.StatusCheckFailed 100 pts
Warning alerts 30 pts
Critical alerts 70 pts
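As a rough illustration of how the impact values in the table above could combine, the following Python sketch starts from 100 and subtracts the impact of each active event, never going below 0. This combining rule is an assumption made for the example: this page lists the impact per event but does not state how impacts are combined or whether several alerts of the same severity stack. The event keys and the function name `estimate_health_score` are hypothetical.

```python
from typing import Dict, Iterable

# Impact values copied from the AWS EC2 instance table; the keys are
# illustrative labels, not identifiers used by SolarWinds Observability.
EC2_IMPACTS: Dict[str, int] = {
    "status_not_running": 100,
    "anomaly:AWS.EC2.CPUUtilization": 50,
    "anomaly:AWS.EC2.StatusCheckFailed": 100,
    "alert:warning": 30,
    "alert:critical": 70,
}


def estimate_health_score(
    active_events: Iterable[str],
    impacts: Dict[str, int] = EC2_IMPACTS,
) -> int:
    """Estimate a score under the ASSUMED rule: 100 minus summed impacts.

    Each distinct event type is counted once, and the result is clamped
    so it never drops below 0.
    """
    penalty = sum(impacts[event] for event in set(active_events))
    return max(0, 100 - penalty)
```

Under these assumptions, a single warning alert gives 70, a CPU utilization anomaly together with a warning alert gives 20, and an instance that is not running gives 0.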

Azure entities' health score

Azure VM instance

Event Impact on health score
Entity status is not 'running' 100 pts
Anomalies for azure.vm.cpu.percentage 50 pts
Anomalies for azure.vm.available 100 pts
Warning alerts 30 pts
Critical alerts 70 pts

Database instance health score

Event Impact on health score
Anomalies for dbo.host.queries.latency_us 25 pts
Anomalies for dbo.host.queries.errors.tput 50 pts
Anomalies for dbo.host.queries.warnings.tput 25 pts

Kubernetes cluster health score

Event Impact on health score
Anomalies for k8s.cluster.nodes.ready.avg 20 pts
Warning alerts 30 pts
Critical alerts 70 pts

Kubernetes node health score

Event Impact on health score
Entity status is not 'ready' 100 pts
Warning alerts 30 pts
Critical alerts 70 pts

Network device health score

Event Impact on health score
Entity status is warning, inactive, or partly available 50 pts
Entity status is critical, unreachable, or shutdown 75 pts
Entity status is down 100 pts
Anomalies for Orion.CPULoad.AvgLoad, Orion.CPULoad.AvgPercentMemoryUsed, Orion.ResponseTime.AvgResponseTime 25 pts
Warning alerts 50 pts
Critical alerts 100 pts

OTel integration entities

NGINX health score

Event Impact on health score
Anomalies for nginx.connections_dropped.rate 100 pts
Warning alerts 30 pts
Critical alerts 70 pts

Apache Web Server health score

Event Impact on health score
Anomalies for apache.time.perrequest 100 pts
Warning alerts 30 pts
Critical alerts 70 pts

Self-managed host health score

Event Impact on health score
Entity status is not 'running' 100 pts
Anomalies for system.cpu.utilization.aggregated 50 pts
Warning alerts 30 pts
Critical alerts 70 pts

Service health score

Event Impact on health score
Anomalies for trace.service.response_time, trace.service.errors 50 pts
Warning alerts 50 pts
Critical alerts 100 pts

Website and URI health score

Event Impact on health score
Anomalies for rum.pageview.load_time (if applicable), synthetics.http.response.time, synthetics.https.response.time 20 pts
Warning alerts 50 pts
Critical alerts 100 pts

Website out-of-the-box alerts

Some alerts are created immediately when a website is configured. These out-of-the-box alerts provide an ideal starting point for your health score calculation. You can modify these alerts to add notifications, but SolarWinds recommends leaving the alert conditions unchanged. If an alert does not exist or you have modified a pre-configured alert, use the following information to restore it.

Critical website availability alert

This standard alert warns you when a website's availability drops below 95%. You can modify this alert to add notifications, but SolarWinds recommends leaving the alert conditions unchanged.

Alert definition:

  • Severity: Critical
  • Name: Website availability: below 95%
  • Alert type: Metric condition
  • Alert on: Entity
  • Condition:
    • Metric: synthetics.availability
    • Trigger when metric is: lower than 0.95
    • During last: 1 minutes

Warning website availability alert

This standard alert warns you when a website's availability drops below 98% but is above 95%. You can modify this alert to add notifications, but SolarWinds recommends leaving the alert conditions unchanged.

Alert definition:

  • Severity: Warning
  • Name: Website availability: below 98%
  • Alert type: Metric condition
  • Alert on: Entity
  • Condition one:
    • Metric: synthetics.availability
    • Trigger when metric is: lower than 0.98
    • During last: 1 minutes
  • Condition two:
    • Metric: synthetics.availability
    • Trigger when metric is: higher than 0.95
    • During last: 1 minutes
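The two alert definitions above can be read as a simple threshold check on synthetics.availability. The Python sketch below is only an illustration of that logic, not how SolarWinds Observability evaluates alerts. The function name `website_availability_severity` is hypothetical, and the input is assumed to be the availability value (0.0 to 1.0) already evaluated over the last 1 minute.

```python
from typing import Optional


def website_availability_severity(availability: float) -> Optional[str]:
    """Return the alert severity that the definitions above would trigger.

    availability is assumed to be the synthetics.availability value for
    the last 1 minute, expressed as a fraction between 0.0 and 1.0.
    """
    if availability < 0.95:
        return "Critical"   # lower than 0.95
    if 0.95 < availability < 0.98:
        return "Warning"    # lower than 0.98 and higher than 0.95
    return None             # no out-of-the-box availability alert triggers
```

Because both conditions use strict comparisons, an availability of exactly 0.95 triggers neither alert in this sketch.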