Examples of common alert definitions
The following sections provide instructions for creating commonly used alert definitions. These are provided as examples, and you can modify them to meet your needs.
Alerts that are created automatically when you configure a website are described under Website out-of-the-box alerts.
Alert me when a SolarWinds Observability SaaS entity stops reporting
To be alerted when an entity stops reporting (for example, it is disconnected or down), select any metric value that the entity returns each time it is polled. Then create an alert that is triggered when the count for that metric is 0.
-
Open the Active Alerts page (Alerts > Active Alerts) or the Alert Settings page (Alerts > Alert Settings).
-
In the upper-right corner, click Create Alert.
The Create Alert wizard opens.
-
On the Details page, specify the name and severity. Optionally, enter a description and a runbook URL. Then click Next.
-
Under Condition type, click Metric condition.
-
Under Alert on, select Entity.
-
Under Select a scope, select the type of entity. Then specify which entities you want to alert on. For more information, see Create an alert definition.
-
Under Condition 1, define a condition that triggers the alert when the entity has stopped reporting:
-
Under Metric, select a metric whose value is returned regularly when the metric is being monitored. For example, for a Host entity, select system.cpu.utilization.aggregated. Enter part of the metric name to filter the list. For metric descriptions, see Metrics for SolarWinds Observability SaaS entities.
-
Under Trigger when metric is, select lower than. Then enter 1.
-
Under During last, enter 1 and select hours. As the aggregation method, select count.
-
Click Next to open the Notifications tab, and define one or more notifications to be sent when this alert is triggered. For more information, see Create an alert definition.
-
On the Summary page, review the alert definition, and then click Create.
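The trigger logic in this procedure can be sketched in Python as follows. This is an illustrative model only, not SolarWinds code: the function name `entity_stopped_reporting` and the sample format (a list of timestamps) are assumptions made for the example. It counts the samples received during the last hour and triggers when that count is lower than 1.

```python
from datetime import datetime, timedelta


def entity_stopped_reporting(sample_times, now, window=timedelta(hours=1), threshold=1):
    """Return True when fewer than `threshold` samples fall inside the window.

    Hypothetical model of "count is lower than 1 during the last 1 hour".
    """
    window_start = now - window
    # Count only the samples that arrived inside the evaluation window.
    count = sum(1 for t in sample_times if window_start < t <= now)
    return count < threshold


now = datetime(2024, 1, 1, 12, 0)
recent = [now - timedelta(minutes=5), now - timedelta(minutes=10)]
stale = [now - timedelta(hours=3)]

print(entity_stopped_reporting(recent, now))  # False: the entity is still reporting
print(entity_stopped_reporting(stale, now))   # True: nothing arrived in the last hour
```

Because the condition counts samples instead of inspecting their values, any metric that the entity returns on every poll works equally well.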
Alert me when disk utilization on a host is 90% or higher
This alert is triggered when the average file system utilization is 90% or higher.
-
Open the Active Alerts page (Alerts > Active Alerts) or the Alert Settings page (Alerts > Alert Settings).
-
In the upper-right corner, click Create Alert.
The Create Alert wizard opens.
-
On the Details page, specify the name and severity. Optionally, enter a description and a runbook URL. Then click Next.
-
Under Condition type, click Metric condition.
-
Under Alert on, select Entity.
-
Under Select a scope, select Host as the type of entity. Then specify which hosts you want to alert on. For more information, see Create an alert definition.
-
Under Condition 1, define a condition that triggers the alert when disk utilization is equal to or above 90%:
-
Under Metric, select system.filesystem.utilization. Enter part of the metric name to filter the list. For metric descriptions, see Metrics for SolarWinds Observability SaaS entities.
-
Under Trigger when metric is, select higher than or equal to. Then enter 0.9. Note that system.filesystem.utilization uses decimal values instead of a percentage.
-
Under During last, enter 1 and select hours. As the aggregation method, select average.
-
Click Next to open the Notifications tab, and define one or more notifications to be sent when this alert is triggered. For more information, see Create an alert definition.
-
On the Summary page, review the alert definition, and then click Create.
Alert me when either memory utilization or CPU utilization on a host is 90% or higher
This alert is triggered when the average memory utilization or the average CPU utilization is 90% or higher.
-
Open the Active Alerts page (Alerts > Active Alerts) or the Alert Settings page (Alerts > Alert Settings).
-
In the upper-right corner, click Create Alert.
The Create Alert wizard opens.
-
On the Details page, specify the name and severity. Optionally, enter a description and a runbook URL. Then click Next.
-
Under Condition type, click Metric condition.
-
Under Alert on, select Entity.
-
Under Select a scope, select Host as the type of entity. Then specify which hosts you want to alert on. For more information, see Create an alert definition.
-
Under Condition 1, define a condition that triggers the alert when memory utilization is equal to or above 90%:
-
Under Metric, select system.memory.utilization. Enter part of the metric name to filter the list. For metric descriptions, see Metrics for SolarWinds Observability SaaS entities.
-
Under Trigger when metric is, select higher than or equal to. Then enter 0.9. Note that system.memory.utilization uses decimal values instead of a percentage.
-
Under During last, enter 1 and select hours. As the aggregation method, select average.
-
Click Add New Condition, and then click At least one condition is true (OR).
-
Under Condition 2, define a condition that triggers the alert when CPU utilization is equal to or above 90%:
-
Under Metric, select system.cpu.utilization. Enter part of the metric name to filter the list. For metric descriptions, see Metrics for SolarWinds Observability SaaS entities.
-
Under Trigger when metric is, select higher than or equal to. Then enter 0.9. Note that system.cpu.utilization uses decimal values instead of a percentage.
-
Under During last, enter 1 and select hours. As the aggregation method, select average.
-
Click Next to open the Notifications tab, and define one or more notifications to be sent when this alert is triggered. For more information, see Create an alert definition.
-
On the Summary page, review the alert definition, and then click Create.
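As a rough illustration of how two conditions joined with At least one condition is true (OR) behave, the following Python sketch evaluates both averages against a decimal threshold. The function names and the rule that an empty window never triggers are assumptions for this example, not documented product behavior.

```python
from statistics import mean

THRESHOLD = 0.9  # 90% expressed as a decimal value, as these metrics report it


def average_at_or_above(samples, threshold=THRESHOLD):
    """True when the window average is >= threshold.

    Assumption: a window with no samples does not trigger.
    """
    return bool(samples) and mean(samples) >= threshold


def host_alert_triggered(memory_samples, cpu_samples):
    """OR of Condition 1 (memory) and Condition 2 (CPU)."""
    return average_at_or_above(memory_samples) or average_at_or_above(cpu_samples)


# Memory is high, CPU is fine: the alert still triggers.
print(host_alert_triggered([0.95, 0.97], [0.20, 0.30]))  # True
# Neither average reaches the threshold.
print(host_alert_triggered([0.40, 0.50], [0.20, 0.30]))  # False
```

Entering 90 instead of 0.9 as the threshold in this model would mean the condition never triggers, because the metric values do not exceed 1.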
Alert when a host process is down, has high CPU utilization, or has high memory utilization
You can create quick alerts from the Processes tab of a host's details view. You can get alerts when the process is down or when its CPU or memory utilization is high. The alert automatically specifies the host and metric.
-
In the left pane, click Explore.
-
Locate the host you want to alert on, and click the host name to open the details view.
-
Click the Processes tab.
-
Locate the process you want to alert on.
-
Hover over the table row, and click the vertical ellipsis in the far-right column. Then click one of the following:
- Create alert on Process Down
- Create alert on CPU Utilization
- Create alert on Memory Usage
The Create Quick Alert dialog opens with default values for the Name and Severity.
-
(Optional) Add a description and runbook URL, and change the default values on the Details page. Then click Next.
-
Specify the trigger condition:
-
For a process down alert, under Trigger when metric is, select equal to. Then enter 0.
-
For a CPU or memory utilization alert, enter the threshold (for example, 0.9).
-
Click Next to open the Notifications tab, and define one or more notifications to be sent when this alert is triggered. For more information, see Create an alert definition.
-
On the Summary page, review the alert definition, and then click Create.
Alert me when a network interface's utilization peaks at 100%
This alert is triggered when a network interface's utilization reaches 100% during a 5-minute period.
-
Open the Active Alerts page (Alerts > Active Alerts) or the Alert Settings page (Alerts > Alert Settings).
-
In the upper-right corner, click Create Alert.
The Create Alert wizard opens.
-
On the Details page, specify the name and severity. Optionally, enter a description and a runbook URL. Then click Next.
-
Under Condition type, click Metric condition.
-
Under Alert on, select Entity.
-
Under Select a scope, select Network Interface as the type of entity. Then specify which network interfaces you want to alert on. For more information, see Create an alert definition.
-
Under Condition 1, define a condition that triggers the alert when inbound (receive) utilization peaks at 100%:
-
Under Metric, select Orion.NPM.InterfaceTraffic.InPercentUtil. Enter part of the metric name to filter the list. For metric descriptions, see Metrics for SolarWinds Observability SaaS entities.
-
Under Trigger when metric is, select equal to. Then enter 100.
-
Under During last, enter 5 and select minutes. As the aggregation method, select maximum.
-
Click Next to open the Notifications tab, and define one or more notifications to be sent when this alert is triggered. For more information, see Create an alert definition.
-
On the Summary page, review the alert definition, and then click Create.
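The choice of maximum as the aggregation method matters here. The following Python sketch, with hypothetical function names and made-up sample values, contrasts maximum with average over the same 5-minute window of percentage values.

```python
from statistics import mean


def peaked_at(samples, peak=100):
    """True when the highest sample in the window equals `peak` (maximum aggregation)."""
    return bool(samples) and max(samples) == peak


def average_equals(samples, value=100):
    """True when the window average equals `value` (average aggregation)."""
    return bool(samples) and mean(samples) == value


# One sample per minute; utilization briefly hits 100%.
window = [20, 100, 30, 25, 40]

print(peaked_at(window))       # True: the spike is caught
print(average_equals(window))  # False: the average is 43, so the spike is hidden
```

With average aggregation, a short saturation spike is diluted by the quieter samples around it, which is why this example uses maximum.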
Alert me when a network device reports an anomalous metric value
This alert is triggered when a network device reports a value for CPU utilization, memory utilization, response time, or packet loss that is significantly higher than usual.
-
Open the Active Alerts page (Alerts > Active Alerts) or the Alert Settings page (Alerts > Alert Settings).
-
In the upper-right corner, click Create Alert.
The Create Alert wizard opens.
-
On the Details page, specify the name and severity. Optionally, enter a description and a runbook URL. Then click Next.
-
Under Condition type, click Event condition.
-
Under Select a scope, select Network Device as the type of entity. Then specify which network devices you want to alert on. For more information, see Create an alert definition.
-
Under Event type, select Anomaly.
-
Under Metric, select one of the following:
- For CPU utilization:
Orion.CPULoad.AvgLoad
- For memory utilization:
Orion.CPULoad.AvgPercentMemoryUsed
- For response time:
Orion.ResponseTime.AvgResponseTime
- For packet loss:
Orion.ResponseTime.PercentLoss
-
Click Next to open the Notifications tab, and define one or more notifications to be sent when this alert is triggered. For more information, see Create an alert definition.
-
On the Summary page, review the alert definition, and then click Create.
Alert me when a network device is not responsive
This alert is triggered when a network device's average packet loss is higher than 80% during a 5-minute period.
-
Open the Active Alerts page (Alerts > Active Alerts) or the Alert Settings page (Alerts > Alert Settings).
-
In the upper-right corner, click Create Alert.
The Create Alert wizard opens.
-
On the Details page, specify the name and severity. Optionally, enter a description and a runbook URL. Then click Next.
-
Under Condition type, click Metric condition.
-
Under Alert on, select Entity.
-
Under Select a scope, select Network Device as the type of entity. Then specify which network devices you want to alert on. For more information, see Create an alert definition.
-
Under Condition 1, define a condition that triggers the alert when a device is not responsive:
-
Under Metric, select Orion.ResponseTime.PercentLoss. Enter part of the metric name to filter the list. For metric descriptions, see Metrics for SolarWinds Observability SaaS entities.
-
Under Trigger when metric is, select higher than. Then enter 80.
-
Under During last, enter 5 and select minutes. As the aggregation method, select average.
-
Click Next to open the Notifications tab, and define one or more notifications to be sent when this alert is triggered. For more information, see Create an alert definition.
-
On the Summary page, review the alert definition, and then click Create.
Alert me when a network device's health state is bad
This alert is triggered when a network device's health state is Bad for 10 minutes.
-
Open the Active Alerts page (Alerts > Active Alerts) or the Alert Settings page (Alerts > Alert Settings).
-
In the upper-right corner, click Create Alert.
The Create Alert wizard opens.
-
On the Details page, specify the name and severity. Optionally, enter a description and a runbook URL. Then click Next.
-
Under Condition type, click Metric condition.
-
Under Alert on, select Entity.
-
Under Select a scope, select Network Device as the type of entity. Then specify which network devices you want to alert on. For more information, see Create an alert definition.
-
Under Condition 1, define a condition that triggers the alert when the device's health state is bad:
-
Under Metric, select sw.metrics.healthscore. Enter part of the metric name to filter the list. For metric descriptions, see Metrics for SolarWinds Observability SaaS entities.
-
Under Trigger when metric is, select lower than. Then enter 90.
-
Under During last, enter 10 and select minutes. As the aggregation method, select average.
-
Click Next to open the Notifications tab, and define one or more notifications to be sent when this alert is triggered. For more information, see Create an alert definition.
-
On the Summary page, review the alert definition, and then click Create.
-
Changes to the state can affect the entity's overall health state.
-
Anomalies: Entity performance is based on a combination of key metrics that help to determine anomalous patterns in your entity. These key metrics and anomalies vary by entity type.
-
Alerts: You can set alerts for each entity type. Some out-of-the-box alerts are created when an entity is added for monitoring.
When an alert is triggered, it can affect how the alert data for an entity is assessed. The alert severity determines the effect. For most entity types that use the system defaults, any triggered Critical alerts cause alert data to be classified as bad. Warning alerts cause alert data to be classified as moderate. Info alerts have no impact.
Use the Health settings page to determine how alert severity affects the health state for an entity type and to customize it if needed.
Each of these telemetry data types (entity state, anomalies, and alerts) has a value of either good, moderate, or bad. The data type with the worst value determines the entity's overall health state. This means that:
-
If all types have a value of good, the entity's health state is Good.
-
If at least one type is moderate and all others are good, the entity's health state is Moderate.
For example, if alerts are moderate but the entity state and anomalies are good, the entity's health state is Moderate.
-
If any type is bad, the entity's health state is Bad.
For example, if anomalies are bad but the entity state and alerts are good, the entity's health state is Bad.
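The "worst value wins" rule described above can be sketched in a few lines of Python. The function name `overall_health` and the lowercase value strings are assumptions for this illustration, and the Unknown state is not modeled.

```python
# Rank the three possible values so that the worst one has the highest rank.
SEVERITY_RANK = {"good": 0, "moderate": 1, "bad": 2}


def overall_health(entity_state, anomalies, alerts):
    """Return the overall health state given the value of each telemetry data type."""
    worst = max((entity_state, anomalies, alerts), key=SEVERITY_RANK.__getitem__)
    return worst.capitalize()


print(overall_health("good", "good", "good"))      # Good
print(overall_health("good", "good", "moderate"))  # Moderate
print(overall_health("good", "bad", "good"))       # Bad
```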
View entity health state
Since health is an important indicator for whether an entity needs immediate attention, it can be found throughout SolarWinds Observability SaaS. The following is a list of some of the places that include the health state of an entity.
Entity lists in the Entity Explorer and area overviews
In the Entity Explorer and some area overviews, entities are listed in either a grid or list view.
In Grid View, the color of the hexagon indicates whether the entity's health state is Good, Moderate, Bad, or Unknown. The number inside the hexagon indicates how many entities are in that state.
In List View, a table lists each entity. The first column of the table displays the health state, with a colored icon and text indicating whether the entity's health is Good, Moderate, Bad, or Unknown.
Entity Explorer details view for an individual entity
Click an individual entity in the Entity Explorer. The Health widget for the current entity is included on the Overview tab or the Health tab.
The Health tab shows detailed information about the entity's health. The Health tab includes the entity's current health and the timeline of the health history in a Health widget and a Health Events table. The Health Events table lists the events, such as anomalies and alerts, that affected the entity's health during the specified time period. Click an event to open the Event Data panel.
Entity Explorer details view for a group of entities
Click an entity group or an individual entity in the Entity Explorer. Health widgets for an entity group or entities related to the current entity in the Entity Explorer details view summarize the entities' health states using a donut chart, grouping the entities based on Good, Moderate, Bad, or Unknown health states.
Area overview
Click an area overview for a specific monitoring area to view the health states for all entities in that monitoring area. Health widgets in area overviews summarize the entities' health states using a donut chart, grouping the entities based on Good, Moderate, Bad, or Unknown health states.
Customize how conditions affect the entity health state
You can customize how events, alerts, statuses, and anomalies affect the way the health state is calculated for each entity type. Only system administrators or owners can edit the health configuration.
-
In the left pane, click Settings. Then under My Settings, click Health.
The Health settings page lists all entity types. It indicates whether each entity type uses a custom configuration or the default configuration, when the configuration was updated, and who updated it.
-
Click the entity type whose health configuration you want to change.
A sidebar opens, showing the current health configuration for that entity.
-
To make changes, click Edit Configuration.
The Events and Health Impact panel lists the events, alerts, statuses, and anomalies that could affect the health state of entities of the selected type.
-
To change the effect that an item has on entities of the selected type, select an option from the drop-down menu: Bad, Moderate, or No Impact.
When No Impact is selected, the drop-down menu is disabled. To change a selection from No Impact to Bad or Moderate, first select the checkbox on the right and then select a menu option.
Example: By default, alerts with a severity of Info do not affect the alert health value. If you want the alert health value to change to Moderate when an Info alert is triggered, change Info to Moderate on the Events and Health Impact panel.
-
Click Save.
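To make the effect of a custom configuration concrete, the following Python sketch models the mapping from alert severity to health impact. The dictionary layout and the function name `alert_health_value` are hypothetical; the default mapping shown (Critical is bad, Warning is moderate, Info has no impact) is the system default described for most entity types.

```python
DEFAULT_IMPACT = {"Critical": "bad", "Warning": "moderate", "Info": "no impact"}
RANK = {"good": 0, "moderate": 1, "bad": 2}


def alert_health_value(triggered_severities, impact=DEFAULT_IMPACT):
    """Return the alert health value implied by the currently triggered alerts."""
    value = "good"
    for severity in triggered_severities:
        effect = impact.get(severity, "no impact")
        # Keep the worst effect seen so far; "no impact" never changes the value.
        if effect != "no impact" and RANK[effect] > RANK[value]:
            value = effect
    return value


# Custom configuration from the example above: Info alerts now count as Moderate.
custom_impact = {**DEFAULT_IMPACT, "Info": "moderate"}

print(alert_health_value(["Info"]))                 # good
print(alert_health_value(["Info"], custom_impact))  # moderate
print(alert_health_value(["Info", "Critical"]))     # bad
```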
Website out-of-the-box alerts
Some alerts are created immediately when a website is configured. These out-of-the-box alerts provide an ideal starting point to monitor for conditions that could affect the entity health state. You can modify these alerts to add notifications, but SolarWinds recommends leaving the alert conditions unchanged. If an alert does not exist or you have modified the pre-configured alert, use the following information to restore your alerts.
Critical website availability alert
This standard alert warns you when a website's availability drops below 95%. You can modify this alert to add notifications, but SolarWinds recommends leaving the alert conditions unchanged.
Alert definition:
- Severity: Critical
- Name: Website availability: below 95%
- Alert type: Metric condition
- Alert on: Entity
- Condition:
- Metric:
synthetics.availability
- Trigger when metric is: lower than 0.95
- During last: 1 minute
Warning website availability alert
This standard alert warns you when a website's availability drops below 98% but is above 95%. You can modify this alert to add notifications, but SolarWinds recommends leaving the alert conditions unchanged.
Alert definition:
- Severity: Warning
- Name: Website availability: below 98%
- Alert type: Metric condition
- Alert on: Entity
- Condition one:
- Metric:
synthetics.availability
- Trigger when metric is: lower than 0.98
- During last: 1 minute
- Condition two:
- Metric:
synthetics.availability
- Trigger when metric is: higher than 0.95
- During last: 1 minute
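The two out-of-the-box availability alerts divide the value range into bands. The Python sketch below models them using only the thresholds listed in the alert definitions; the function name `availability_alert` is an assumption, and synthetics.availability is treated as a decimal value between 0 and 1.

```python
def availability_alert(availability):
    """Return the severity of the alert that the availability value would trigger, if any."""
    if availability < 0.95:
        return "Critical"  # lower than 0.95
    if 0.95 < availability < 0.98:
        return "Warning"   # lower than 0.98 AND higher than 0.95
    return None            # no alert


print(availability_alert(0.90))  # Critical
print(availability_alert(0.96))  # Warning
print(availability_alert(0.99))  # None
```

With the conditions exactly as listed, a value of precisely 0.95 matches neither alert in this model, because both comparisons are strict.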