AlertStack
AlertStack correlates alerts, events, and other problems on monitored entities and visualizes them, helping you identify the root cause of issues and resolve them.
Correlate
SolarWinds Platform (self-hosted) continually monitors your alerts and correlates them with other problems that occur at the same time on related devices, pulling them together into a single alert cluster. Metrics, network configuration changes, server configuration changes, and device statuses that are considered unusual are included in AlertStack.
Visualize
AlertStack clusters bring related alerts and events together into a single view, providing a chronological list of events and impacted entities, live tracking of all related events and alerts, and maps of entity relationships.
Identify
You can view historical data and the timeline of the alert clusters to identify a possible root cause of the alert. AlertStack also helps you optimize your alert setup for future incidents.
Resolve
AlertStack allows you to drill down on related entities and critical issues to deal with them efficiently. Every polling interval, AlertStack checks related entities for new alerts and events and dynamically updates active alert clusters.
Enable AlertStack
AlertStack is disabled by default. To use it, you must first enable it.
The AlertStack feature can be resource-intensive, and its resource usage increases with the number of active alerts. If you experience performance degradation across the SolarWinds Platform, consider increasing the CPU and memory available to the SolarWinds Platform server.
- In the SolarWinds Platform Web Console, click All Settings > Product Specific Settings > AlertStack Settings.
- Select Enable and click Save.
View alert clusters
To access the AlertStack summary page, log in to the SolarWinds Platform Web Console as an administrator, and click Alerts & Activity > AlertStack.
- Details about the alert cluster.
- Cluster intervals: Severity of the cluster for the selected time frame. The duration of each interval depends on the time frame selected in the Time Frame Selector.
- Sort: Allows you to sort the clusters by Start Date, End Date, Cluster Id, Severity, State, Name, Alert count, and Entity count. The default is Start Date, descending.
- Search: Search clusters by name. To search by cluster ID, precede the number with "cluster-".
- Time Frame Selector: Allows you to select a standard time frame (Last hour, Last 12 hours, Last 24 hours, Last 5 days, Last 7 days) or a user-defined interval. The default is Last 24 hours.
- Next/Previous Time Frame: Moves back in time by the currently selected time frame. When you are viewing a past time frame, a corresponding control on the right lets you move forward in time.
- Page Size: Changes the number of clusters per page. The default is 10.
- Next/Previous Cluster Page: Paging controls for the list of clusters.
- What is AlertStack: Introduction to AlertStack.
AlertStack Summary widget
You can also review the AlertStack Summary widget on the Summary Home page (My Dashboards > Home > Summary).
- Severity and name of the cluster.
- Unique ID of the cluster.
- Cluster start and end time, or "now" if the cluster is ongoing.
- Cluster state:
  - Closed: The cluster is no longer active because a user closed it.
  - Auto-closed: The system closed the cluster because the cluster life cycle ended.
  - Open: The cluster is currently active.
  - Suspended: No changes have occurred in the cluster for the time specified in the settings, so maps are recorded only once an hour.
  - Resolved: The cluster is no longer active because the end conditions have been met.
- Duration of the cluster.
- Total number of entities and alerts in the cluster.
Alert cluster details
To display the details of an alert cluster, click the description of the cluster in the AlertStack Summary widget or on the AlertStack summary page.
- Cluster Elements: List of entity status changes that make up the alert cluster.
- Cluster Severity History: Severity of the cluster over time. Each interval represents the default interval duration of 10 minutes.
- Cluster Element Info: Information about the cluster element, including its name, associated entity, status, and severity. An element can be one of the following:
  - Alert
  - Metric Threshold
  - Entity Status
  - Event
- Cluster Start: The start of the cluster. The gray region to the left represents the history of the elements for the 30 minutes before the start. Any elements in a warning or critical state in this region contributed to the creation of the cluster.
- Cluster End: The end of the cluster. A gray region to the right, if present, represents the history of the elements for the 30 minutes after the cluster ended, providing additional context for the end of the cluster.
- Selected Interval: The currently selected interval within the cluster.
- Cluster Map: A map of the elements in the currently selected interval for the cluster.
- Entity Occurrences: By default, the map displays only the entities and their relationships. To show other elements (occurrences) related to an entity, click its purple dot. To collapse them, click the expanded purple dot.
- Map Controls: Controls to zoom in, zoom out, and center the map. To pan, hold down the SPACE BAR and drag with the mouse.
- Panel Splitter Controls: Let you resize the elements panel and the map panel. The view map only button shows only the cluster map, the view list only button shows only the cluster elements list, and the view list and map button shows the map and the elements list at equal size. To adjust the size manually, hover over the line between the panels, then click and drag.
- Previous/Next Time Range: Moves back in time by the total number of intervals currently displayed. The corresponding control on the right, if present, moves forward in time.
- Previous/Next Time Interval: Moves back one interval within the cluster and updates the map to display the entities present during that interval. The corresponding button on the right moves forward one interval.
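The following Python sketch models how the severity history timeline described above could be divided: 10-minute intervals spanning the cluster, plus 30-minute gray context regions before the start and, for a cluster that has ended, after the end. The `timeline` function and its labels are hypothetical and not part of any SolarWinds API; the sketch also assumes that the cluster start and end times fall on interval boundaries.

```python
from datetime import datetime, timedelta

# Values from the cluster details description: 10-minute intervals and
# 30-minute gray context regions. The model itself is hypothetical.
INTERVAL = timedelta(minutes=10)
CONTEXT = timedelta(minutes=30)


def timeline(start: datetime, end: datetime, ended: bool = True):
    """Return (label, interval_start) pairs for a cluster's severity history.

    For an ongoing cluster, pass the current time as `end` and ended=False;
    no gray region is added to the right in that case.
    """
    stop = end + CONTEXT if ended else end
    slots = []
    t = start - CONTEXT
    while t < stop:
        if t < start:
            label = "before"   # gray region left of Cluster Start
        elif t < end:
            label = "cluster"  # intervals inside the cluster
        else:
            label = "after"    # gray region right of Cluster End
        slots.append((label, t))
        t += INTERVAL
    return slots


if __name__ == "__main__":
    for label, t in timeline(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 40)):
        print(t.strftime("%H:%M"), label)
```

For a cluster running from 12:00 to 12:40, this yields three "before" intervals, four "cluster" intervals, and three "after" intervals.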
Limit the number of entities shown on AlertStack
You can apply filters both on the AlertStack summary page and on the AlertStack cluster details page.
Click the arrow in the top-left corner of AlertStack and select the Alert Count, Entity Count, Incident Number, Severity, or State values you want to see.
The maximum supported entity count is 100.
Create SolarWinds Service Desk incidents from AlertStack
Starting with 2023.4, you can create SolarWinds Service Desk incidents for open or suspended clusters directly from AlertStack.
This requires that you have SolarWinds Service Desk set up. See Alert Integrations in the SolarWinds Platform.
- In AlertStack (Alerts & Activity > AlertStack), select one or more open or suspended clusters and click Create Incident(s).
For each selected cluster, an incident is created in Service Desk. While the incident is being created, the bottom-right corner of the incident field shows the Pending Incident status. When creation is complete, the status is replaced with the incident number.
Configure AlertStack
Log in to the SolarWinds Platform Web Console as an administrator, and access the Advanced Configuration settings page. For more information, see Access the Advanced Configuration settings in the Orion Web Console.
Scroll to the AIIM.Settings section and update the relevant parameters. Changes on the Advanced Configuration page are global.
The following parameters are relevant to AlertStack:
Setting | Description | Default |
---|---|---|
Closed Issue Lifetime Days | Period of time (in days) before a closed alert cluster is removed from the database. | 7 days |
Issue Look Back Period Minutes | The number of minutes to look back when determining whether an alert should generate an alert cluster. | 30 minutes |
Issue Minimum Entities | The minimum number of entities required to create an alert cluster. | 2 |
Issue Minimum Occurrences | The minimum number of occurrences required to create an alert cluster. | 2 |
Issue Normal Status List | Semicolon-separated list of ChildStatusMap statuses (Unknown, Up, Down, Warning) to treat as normal and ignore when creating or updating an alert cluster. | Unknown;Up |
Issue Resolution Criteria | Semicolon-separated list of occurrence types (EntityStatus, Alert, Event, Metric, Anomaly) that must be resolved for the alert cluster to be resolved. | Alert |
Max Issue Duration Days | Period of time (in days) before an active alert cluster is auto-closed. | 7 days |
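To illustrate how the creation settings in the table might interact, the following Python sketch parses the semicolon-separated Issue Normal Status List and applies the Issue Look Back Period Minutes, Issue Minimum Entities, and Issue Minimum Occurrences thresholds. The exact creation logic is not documented here, so the sketch rests on an assumption: a cluster is treated as eligible when the non-normal occurrences inside the look-back window meet both minimums. The `Occurrence` record and the `should_create_cluster` and `split_list` functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Defaults copied from the AIIM.Settings table above.
SETTINGS = {
    "Issue Look Back Period Minutes": "30",
    "Issue Minimum Entities": "2",
    "Issue Minimum Occurrences": "2",
    "Issue Normal Status List": "Unknown;Up",
}


@dataclass
class Occurrence:
    """Hypothetical occurrence record, not a SolarWinds type."""
    entity: str
    status: str
    at: datetime


def split_list(value: str) -> set[str]:
    """Parse a semicolon-separated setting such as 'Unknown;Up'."""
    return {item.strip() for item in value.split(";") if item.strip()}


def should_create_cluster(occurrences, now: datetime, settings=SETTINGS) -> bool:
    """Assumed rule: enough non-normal occurrences on enough entities, recently."""
    look_back = timedelta(minutes=int(settings["Issue Look Back Period Minutes"]))
    normal = split_list(settings["Issue Normal Status List"])
    # Keep only non-normal occurrences inside the look-back window.
    recent = [o for o in occurrences
              if now - o.at <= look_back and o.status not in normal]
    entities = {o.entity for o in recent}
    return (len(entities) >= int(settings["Issue Minimum Entities"])
            and len(recent) >= int(settings["Issue Minimum Occurrences"]))
```

Under this assumption, two Down or Warning occurrences on two different entities within the last 30 minutes are enough, while an occurrence with the Up status is ignored.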
When is an alert cluster resolved?
An alert cluster is considered resolved when the Issue Resolution Criteria defined in the settings have been met for all entities. A cluster also stops being active when a user closes it manually (Closed) or when the alert cluster lifetime is over (Auto-closed).
Alert cluster lifetime
By default, alert clusters are automatically closed after 7 days (Max Issue Duration Days).
Closed and Resolved clusters are automatically deleted after 7 days by default (Closed Issue Lifetime Days).
Alert clusters can consume a large amount of space in the database, so these lifetime settings help keep database maintenance smooth.
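With the default lifetime settings, the relevant dates can be worked out directly. The following Python sketch assumes that the 7-day auto-close period is counted from the cluster start and that the 7-day deletion period is counted from the time the cluster was closed or resolved; the `auto_close_time` and `deletion_time` functions are hypothetical.

```python
from datetime import datetime, timedelta

# Defaults from the AIIM.Settings table (Max Issue Duration Days and
# Closed Issue Lifetime Days), assumed to be whole days.
MAX_ISSUE_DURATION_DAYS = 7
CLOSED_ISSUE_LIFETIME_DAYS = 7


def auto_close_time(cluster_start: datetime) -> datetime:
    """When a still-active cluster would be auto-closed (assumed from start)."""
    return cluster_start + timedelta(days=MAX_ISSUE_DURATION_DAYS)


def deletion_time(closed_at: datetime) -> datetime:
    """When a closed or resolved cluster would be removed from the database."""
    return closed_at + timedelta(days=CLOSED_ISSUE_LIFETIME_DAYS)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    closed = auto_close_time(start)
    print("auto-closed:", closed)              # 2024-01-08 09:00:00
    print("deleted:", deletion_time(closed))   # 2024-01-15 09:00:00
```

Under these assumptions, a cluster that is never resolved can remain in the database for up to 14 days.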
Issue resolution criteria for alert clusters
The criteria define which occurrence types must no longer be present to consider the alert cluster resolved. The table below defines the logic for each of the criteria.
Issue Resolution Criteria | Met when |
---|---|
Entity Status | The entity status matches one of the statuses defined in the Issue Normal Status List setting. |
Alert | There are no active alerts. |
Event | There are no active events. |
Metric | There are no metrics that are currently over the defined thresholds. |
Anomaly | There are no metrics containing active anomalies. |
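The criteria in the table can be combined as in the following Python sketch, which checks every selected criterion against every entity in the cluster, in line with the "met for all entities" rule stated earlier. The `EntityState` fields and the `is_resolved` and `split_setting` functions are hypothetical; they only mirror the conditions listed in the table, using the occurrence type names accepted by the Issue Resolution Criteria setting.

```python
from dataclasses import dataclass


@dataclass
class EntityState:
    """Hypothetical snapshot of one entity in an alert cluster."""
    status: str
    active_alerts: int = 0
    active_events: int = 0
    metrics_over_threshold: int = 0
    metrics_with_anomalies: int = 0


# One check per occurrence type accepted by Issue Resolution Criteria.
CHECKS = {
    "EntityStatus": lambda e, normal: e.status in normal,
    "Alert": lambda e, normal: e.active_alerts == 0,
    "Event": lambda e, normal: e.active_events == 0,
    "Metric": lambda e, normal: e.metrics_over_threshold == 0,
    "Anomaly": lambda e, normal: e.metrics_with_anomalies == 0,
}


def split_setting(value: str) -> list[str]:
    """Split a semicolon-separated setting value, ignoring blanks."""
    return [part.strip() for part in value.split(";") if part.strip()]


def is_resolved(entities, criteria="Alert", normal_statuses="Unknown;Up") -> bool:
    """True when every selected criterion holds for every entity."""
    normal = set(split_setting(normal_statuses))
    return all(
        CHECKS[criterion](entity, normal)
        for criterion in split_setting(criteria)
        for entity in entities
    )
```

With the default criteria (Alert), an entity that is still Down but has no active alerts does not block resolution; adding EntityStatus to the setting would.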