Entity health score
Metrics, performance data, entity availability, and other telemetry data are collected for each entity type added to SolarWinds Observability. Telemetry data is displayed in real-time or presented historically in entity widgets. Collected telemetry data helps create a baseline for the typical operating performance of your monitored entity. Anomalies indicate when the operating performance for your entity deviates from the baseline, and alerts notify users when key metrics, logs, or events match pre-defined conditions. An entity's overall health can be determined by analyzing the anomalies and alerts for key metrics.
A health score provides real-time insight into the overall health and performance of your monitored entities. The health score represents how far the entity's performance deviates from its typical performance and is calculated based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health score is displayed as a single numerical value from 0 to 100 and falls into one of three categories: Good (70-100), Moderate (40-69), or Bad (0-39). Each entity type's health score is calculated based on a combination of telemetry data:
- Entity state: The current state of the entity compared to typical recorded telemetry. The entity state can be categorized into normal, warning, and critical states. The state of the entity impacts the overall health score.
- Anomalous entity performance: Entity performance is based on a combination of key metrics that help to determine anomalous patterns in your entity. These key metrics and anomalies vary by entity type.
- Triggered entity alerts: You can set alerts for each entity type that directly affect the overall health score. Some out-of-the-box alerts are created when an entity is added for monitoring.
Alerts with a higher severity may have a greater impact on health score. The impact that alert severity has on the health score varies by entity type.
If telemetry data was not received for the entity during a period of time, the health score is not available for that period. To view the events that resulted in a change to an entity's health score, click the Health tab in the entity's details view.
For some entities, telemetry data cannot be received or a health score would not be relevant. In these cases, their health score is NULL (displayed as '—').
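The score ranges described above can be expressed as a small lookup. The following is a minimal sketch, not part of SolarWinds Observability: the function name `health_category` is hypothetical, and a NULL health score is assumed to be represented as `None`.

```python
from typing import Optional


def health_category(score: Optional[int]) -> str:
    """Map a health score to the category names used in this article.

    Hypothetical helper: None stands in for a NULL health score.
    """
    if score is None:
        return "NULL"
    if not 0 <= score <= 100:
        raise ValueError(f"health score must be between 0 and 100, got {score}")
    if score >= 70:
        return "Good"      # 70-100
    if score >= 40:
        return "Moderate"  # 40-69
    return "Bad"           # 0-39
```

For example, `health_category(69)` returns `"Moderate"` and `health_category(None)` returns `"NULL"`.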
For entity-specific health score information and how to view the health score in SolarWinds Observability:
View entity health score
Since the health score is an important indicator for whether an entity needs immediate attention, it can be found throughout SolarWinds Observability. The following is a list of some of the places that include the health score of an entity.
Entity lists in the Entity Explorer and area overviews
In the Entity Explorer and some area overviews, entities are listed in either a grid or list view.
In Grid View, a hexagon represents each entity. The color of the hexagon indicates whether the entity's health score is Good, Moderate, Bad, or NULL.
In List View, a table lists each entity. The first column of the table displays the health score, with a colored icon and text indicating whether the entity's health score is Good, Moderate, or Bad, followed by a numeric health score. If the health score is NULL, '—' is displayed.
Entity Explorer details view for an individual entity
Click an individual entity in the Entity Explorer. The Health Score widget for the current entity is included in the Overview tab.
An individual entity's health score widget includes two components: the entity's current health score and a timeline of the health history over the selected time period. The current health is displayed as a single number out of 100, with the background of the current health colored based on whether the health is Good, Moderate, Bad, or NULL. Next to the current health score is a timeline charting the health scores of the entity over the selected time period. The health scores plotted on the timeline represent the minimum health scores calculated during a single slice of time. They do not represent the exact health score at a specific moment in time. If telemetry data was not received for the entity at any point in time, causing the health score to be unavailable, there will be a gap in the timeline.
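To illustrate why a plotted point is a per-slice minimum rather than an exact reading, the sketch below groups samples into fixed-width time slices and keeps the lowest score in each. It is an illustration only: the function name `timeline_minimums`, the use of integer timestamps in seconds, and the fixed slice width are assumptions, not details of how SolarWinds Observability builds the timeline.

```python
from typing import Iterable, List, Optional, Tuple


def timeline_minimums(
    samples: Iterable[Tuple[int, int]],
    start: int,
    end: int,
    slice_seconds: int,
) -> List[Optional[int]]:
    """Return the minimum score per time slice; None marks a gap.

    samples holds (timestamp_seconds, score) pairs (hypothetical format).
    """
    if slice_seconds <= 0 or end <= start:
        raise ValueError("need slice_seconds > 0 and end > start")
    n_slices = -(-(end - start) // slice_seconds)  # ceiling division
    slices: List[Optional[int]] = [None] * n_slices
    for ts, score in samples:
        if not start <= ts < end:
            continue  # outside the selected time period
        i = (ts - start) // slice_seconds
        current = slices[i]
        if current is None or score < current:
            slices[i] = score
    return slices
```

A slice with no samples stays `None`, which corresponds to a gap in the timeline.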
The Health tab shows detailed information about the entity's health. The Health tab includes the entity's current health score and the timeline of the health history in a Health Score widget and a Health Events table. The Health Events table lists the events, such as anomalies and alerts, that affected the entity's health score during the selected time period. Click an event to open the Event Data panel.
Entity Explorer details view for a group of entities
Click an entity group or an individual entity in the Entity Explorer. In the details view, Health Score widgets for an entity group, or for entities related to the current entity, summarize the entities' health scores in a donut chart that groups the entities by Good, Moderate, Bad, or NULL health score.
Area overview
Click an area overview for a specific monitoring area to view the health score for all entities in that monitoring area. Health Score widgets in area overviews summarize the entities' health scores using a donut chart, grouping the entities based on Good, Moderate, Bad, or NULL health scores.
Health score calculation
The critical metrics and telemetry data that are analyzed for anomalous behavior vary by entity type, as does the impact of entity status, alerts, and anomalies. See the sections below for each entity type's health score calculation and how much negative impact an event has on the health score.
AWS entities' health score
AWS EC2 instance
Event | Impact on health score |
---|---|
Entity status is Pending | 50 pts |
Entity status is Stopping, Stopped, Shutdown, or Terminated | 100 pts |
Anomalies for AWS.EC2.CPUUtilization | 50 pts |
Anomalies for AWS.EC2.StatusCheckFailed | 100 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
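
As a rough illustration of how the impact values in the AWS EC2 instance table could reduce a score, the sketch below subtracts each active event's points from 100 and floors the result at 0. This combination rule is an assumption made for the example; this article lists the impact per event but does not state how multiple events are combined. The event keys and the function name `estimate_health_score` are hypothetical.

```python
from typing import Dict, Iterable

# Impact values copied from the AWS EC2 instance table; keys are hypothetical labels.
EC2_IMPACTS: Dict[str, int] = {
    "status:Pending": 50,
    "status:Stopped": 100,
    "anomaly:AWS.EC2.CPUUtilization": 50,
    "anomaly:AWS.EC2.StatusCheckFailed": 100,
    "alert:Warning": 30,
    "alert:Critical": 70,
}


def estimate_health_score(events: Iterable[str], impacts: Dict[str, int]) -> int:
    """Assumed rule: start at 100, subtract each event's impact, never go below 0."""
    total_impact = sum(impacts[event] for event in events)
    return max(0, 100 - total_impact)
```

Under this assumption, a single Warning alert yields 70, and a CPU utilization anomaly together with a Warning alert yields 20.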
Azure entities' health score
Azure VM instance
Event | Impact on health score |
---|---|
Entity status is not Running | 100 pts |
Anomalies for azure.vm.cpu.percentage | 50 pts |
Anomalies for azure.vm.available | 100 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
Database instance health score
Event | Impact on health score |
---|---|
Anomalies for | 25 pts |
Anomalies for dbo.host.queries.errors.tput | 50 pts |
Anomalies for dbo.host.queries.warnings.tput | 25 pts |
Kubernetes cluster health score
Event | Impact on health score |
---|---|
Anomalies for k8s.cluster.nodes.ready.avg | 20 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
Kubernetes node health score
Event | Impact on health score |
---|---|
Entity status is not Ready | 100 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
Network device health score
Event | Impact on health score |
---|---|
Entity status is Down, Unreachable, or Shutdown | 100 pts |
Anomalies for: | 10 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
OTel integration entities
NGINX health score
Event | Impact on health score |
---|---|
Anomalies for nginx.connections_dropped.rate | 100 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
Apache Web Server health score
Event | Impact on health score |
---|---|
Anomalies for apache.time.perrequest | 100 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
Self-managed host health score
Event | Impact on health score |
---|---|
Entity status is not Running | 100 pts |
Anomalies for system.cpu.utilization.aggregated | 50 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
Self-managed host health score for hosts monitored via a Network Collector
Event | Impact on health score |
---|---|
Entity status is Down or Unreachable | 100 pts |
Anomalies for: | 10 pts |
Warning alerts | 30 pts |
Critical alerts | 70 pts |
Service health score
Event | Impact on health score |
---|---|
Anomalies for: | 50 pts |
Warning alerts | 50 pts |
Critical alerts | 100 pts |
Virtualization entities health score
Virtual cluster health score
Event | Impact on health score |
---|---|
Entity status is: | 100 pts |
Entity status is: | 75 pts |
Entity status is Warning | 50 pts |
Entity status is Unmanaged | 25 pts |
Anomalies for: | 20 pts |
Virtual datastore
Event | Impact on health score |
---|---|
Entity status is: | 100 pts |
Entity status is: | 75 pts |
Entity status is Warning | 50 pts |
Entity status is Unmanaged | 25 pts |
Virtual host
Event | Impact on health score |
---|---|
Entity status is: | 100 pts |
Entity status is: | 75 pts |
Entity status is Warning | 50 pts |
Entity status is Unmanaged | 25 pts |
Anomalies for: | 20 pts |
Virtual machine
Event | Impact on health score |
---|---|
Entity status is: | 100 pts |
Entity status is: | 75 pts |
Entity status is Warning | 50 pts |
Entity status is Unmanaged | 25 pts |
Anomalies for: | 20 pts |
Website and URI health score
Event | Impact on health score |
---|---|
Anomalies for: | 20 pts |
Warning alerts | 50 pts |
Critical alerts | 100 pts |
Website out-of-the-box alerts
Some alerts are created immediately when a website is configured. These out-of-the-box alerts provide an ideal starting point for your health score calculation. You can modify these alerts to add notifications, but SolarWinds recommends leaving the alert conditions unchanged. If an alert does not exist or you have modified a pre-configured alert, use the following information to restore it.
Critical website availability alert
This standard alert warns you when a website's availability drops below 95%. You can modify this alert to add notifications, but SolarWinds recommends leaving the alert conditions unchanged.
Alert definition:
- Severity: Critical
- Name: Website availability: below 95%
- Alert type: Metric condition
- Alert on: Entity
- Condition:
  - Metric: synthetics.availability
  - Trigger when metric is: lower than 0.95
  - During last: 1 minutes
Warning website availability alert
This standard alert warns you when a website's availability drops below 98% but is above 95%. You can modify this alert to add notifications, but SolarWinds recommends leaving the alert conditions unchanged.
Alert definition:
- Severity: Warning
- Name: Website availability: below 95%
- Alert type: Metric condition
- Alert on: Entity
- Condition one:
  - Metric: synthetics.availability
  - Trigger when metric is: lower than 0.98
  - During last: 1 minutes
- Condition two:
  - Metric: synthetics.availability
  - Trigger when metric is: higher than 0.95
  - During last: 1 minutes
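
The conditions in the two alert definitions above can be read as a pair of threshold checks on `synthetics.availability`. The sketch below is an illustration only: the function name `website_availability_severity` is hypothetical, and it assumes the value passed in is already the availability for the last 1-minute window, expressed as a fraction between 0 and 1.

```python
from typing import Optional


def website_availability_severity(availability: float) -> Optional[str]:
    """Return the severity whose conditions match, or None if neither alert matches.

    Assumes availability is the 1-minute value of synthetics.availability (0.0-1.0).
    """
    if availability < 0.95:
        return "Critical"  # lower than 0.95
    if 0.95 < availability < 0.98:
        return "Warning"   # lower than 0.98 and higher than 0.95
    return None
```

Read literally, a value of exactly 0.95 matches neither definition, because the Critical condition requires a value lower than 0.95 and the Warning condition requires a value higher than 0.95.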