An important part of any log management system is alerting. Rather than constantly searching your log data for specific events, why not let Loggly monitor for you? Loggly’s paid plans (Standard & Pro) include the alerting functionality. We’ll run your saved searches on a predetermined schedule and send emails or messages to your chosen endpoint when search results cross your configured threshold.
There are three main components to an alert:
- Alert threshold: How often should the condition be monitored and what number of events (or lack of events) should trigger an alert?
- Search: The saved search whose terms define the condition to monitor.
- Endpoint: We can email a user or trigger a third-party endpoint. We support generic POST/GET endpoints & offer an integration with Alert Birds.
Read on for details on both adding alerts and configuring alert endpoints.
Alerting Scenario Examples
- Alert if I have less than 10 sign-ups per hour.
- Alert if my response time is greater than 3 seconds more than 5 times per minute. My saved search might look like:
json.response_time:[3 TO *]
- Alert if there are more than 10 errors in a 30 minute period. My saved search might look like:
apache.status:500 OR json.exception:error
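The threshold logic behind scenarios like these can be sketched as a simple rolling-window count. This is an illustrative sketch only, not Loggly’s implementation; the function name and parameters are hypothetical:

```python
from datetime import datetime, timedelta

def should_alert(event_timestamps, threshold, window_minutes, now=None):
    """Return True when the number of matching events inside the rolling
    window exceeds the threshold (e.g. more than 10 errors in 30 minutes).

    For "lack of events" alerts (such as fewer than 10 sign-ups per hour),
    you would instead alert when the count falls below the threshold.
    """
    now = now or datetime.utcnow()
    window_start = now - timedelta(minutes=window_minutes)
    count = sum(1 for ts in event_timestamps if ts >= window_start)
    return count > threshold
```

Loggly evaluates the same idea against your saved search results on the schedule you configure.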
There are three ways to set up alerts in Loggly:
- By clicking the bell icon at the right-hand side of the search interface.
- From the Saved Search creation dialog box.
- By selecting Add New on the Alerts page.
Regardless of how you start setting up your alert, you’ll be prompted to fill in the following information:
- Name: Choose a name for your alert. The name will be returned with any alert that’s triggered.
- Description: Add a short description so that you remember why you wanted it set up.
- Search: You can choose a Saved Search to use. If you initiated your alert setup from the Saved Search creation dialog, that Saved Search will display. If you initiated your alert setup using the bell icon, you’ll see "custom search context" and the details of the current search you were performing. Any time range that was part of your saved search will be ignored; only the terms of a saved search are used for alerting.
- Alert if: Here is where you’ll create the criteria to trigger an alert. Set the threshold number of search results that trigger an alert within a given timeframe. For example, you can set an alert to trigger when the search results show more than 10 results over any 5 minute span (based on timestamp).
- Then: In this section you establish how you’d like to receive notification. Choose to send an email or hit a 3rd party endpoint. Please see Alert Endpoints for a discussion on setting up your own endpoints. Only registered users can receive email notifications.
- Check for this condition every: Set how often we run your saved search and scan for the number of results that match your alert criteria. If you choose to check for the condition every minute and the condition exists for 30 minutes, 30 notifications will be sent.
- Add Multiple Endpoints: You can set multiple endpoints to receive the notification from an alert. For example, you can set an alert to notify your Microsoft Teams channel and Slack channel without needing to create another alert.
- Include up to 10 recent events: To receive up to 10 recent events as part of the alert, enable this option.
- Delay alerts during indexing delays. This may reduce false positives: If your account is experiencing an indexing delay, an alert may trigger spuriously because expected events have not yet been indexed. When this setting is enabled, no alert is sent until the indexing delay has recovered and indexing is back to normal.
- Enable this alert: If you don’t want the alert to be enabled at this time, uncheck the box. The alert will be saved but remain inactive until you enable it from the Alerts page.
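When an alert fires against a third-party endpoint, Loggly POSTs a notification to the URL you configured. The exact body Loggly sends is defined by your endpoint configuration; the payload fields below are hypothetical, and this is only a sketch of the receiving/sending pattern using the standard library:

```python
import json
import urllib.request

def build_alert_payload(alert_name, description, hit_count, recent_events):
    # Hypothetical payload shape -- the fields actually delivered to a
    # generic POST endpoint depend on your Loggly endpoint configuration.
    return {
        "alert_name": alert_name,
        "description": description,
        "hits": hit_count,
        "recent_events": recent_events[:10],  # "include up to 10 recent events"
    }

def post_alert(endpoint_url, payload):
    """POST the JSON payload to a generic third-party endpoint."""
    req = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A receiving service would parse this JSON body and route it to Teams, Slack, or a paging system.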
With this alert feature, you can specify the threshold in relative terms using standard deviations. Standard deviation quantifies the amount of variation in a set of data values: a low standard deviation indicates that most of the values are close to the average, while a high standard deviation indicates that they are spread over a wide range. Here’s a good example:
If you’re tracking 404 errors, this setup would alert when the count of 404 errors in the last 15 minutes is above two standard deviations from the average for the last six hours. 404 errors are a way of life, so setting an absolute threshold often doesn’t make sense. But you would certainly want to investigate a sudden 404 spike. In this case, you can specify whether you would like to be alerted on one, two, or three standard deviations from the mean. Mathematically speaking, roughly 68%, 95%, and 99.7% of values in a normal distribution fall within one, two, and three standard deviations of the mean, respectively, so each step up alerts only on progressively rarer deviations.
Let’s say that a customer complained about your app’s performance, and log data shows that this customer indeed experienced a page load time of almost 15 seconds. The question now is: Did this customer encounter a severe problem that many other customers (who just haven’t complained yet) ran into as well, or was it just an outlier?
For cases like this, percentiles allow you to quickly answer that exact question. You can simply look at all the load times that have been recorded in your log data.
As an example, if a load time of 15 seconds is in the 95th percentile, this basically means that 95% of users received page load times better than 15 seconds. The long load time can easily be identified as an outlier and is not part of a larger group. Percentiles make this analysis easy even when you’re dealing with large amounts of log events.
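A percentile check like the one above can be sketched with a simple nearest-rank computation. This is an illustrative sketch (Loggly computes percentiles for you); the helper names are hypothetical:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the observations are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def is_outlier(load_time, recorded_times, pct=95):
    """Treat a load time above the pct-th percentile as an outlier
    rather than part of a larger group of slow requests."""
    return load_time > percentile(recorded_times, pct)
```

If 95% of recorded load times are at or below 2 seconds, a 15-second load time stands out immediately as an outlier.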
The Percent Difference alert allows you to get notified if a value changes by a given percentage compared to a specific time period in the past. For example, if you’re interested in monitoring an abnormal amount of errors, you can set up a Percent Difference alert similar to the following:
Alert when the count of 403 errors in a rolling window of 30 minutes jumps by 20% as compared to the count of errors during the last day.
Note: Loggly compares the difference of percentages within the scope of all events in a given search, NOT absolute numbers.
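Because the comparison is between percentages of all events in the search scope rather than absolute counts, the computation looks like the following sketch. This is illustrative only; the function and parameter names are hypothetical:

```python
def percent_difference(current_count, current_total,
                       baseline_count, baseline_total):
    """Percent change in the *rate* of matching events (share of all
    events in the search scope), not in absolute counts."""
    current_rate = current_count / current_total
    baseline_rate = baseline_count / baseline_total
    return (current_rate - baseline_rate) / baseline_rate * 100
```

For example, going from 10 matching events per 1,000 total events to 12 per 1,000 is a 20% jump in rate, even though the absolute increase is only 2 events.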
Anomaly Significance Score
Anomaly Detection highlights anomalies in your logs that come from major changes in the frequency of certain log events. For example, you can immediately see if you have a big increase in errors after a new code deployment, if you have a jump in Amazon EC2 configurations, or if you have an unusual number of user login failures that could signal an attacker looking for vulnerabilities.
However, it’s not uncommon for things that you never thought about to be the ones that cause the really tricky problems. Anomaly detection alerts are a way to find out about things that you haven’t anticipated. You can tell Loggly to notify you of anything that deviates from normal levels in the log fields you want to monitor.
Example: the ‘syslog.severity’ or ‘json.level’ fields in your log messages contain a lot of information. You might have a baseline level of ERROR and CRITICAL messages that don’t really signal trouble, but what if these values creep up? Let’s say that your normal ratio is 90% INFO and 10% ERROR and CRITICAL messages. Loggly anomaly detection alerts will notify you if any of these values deviate from "normal" beyond a defined threshold. You can then go directly to the relevant events in Loggly to investigate.
Loggly analyzes countless field values in parallel as it ingests your logs, determines the normal value ranges in your logs, and brings the ones with the biggest changes to your attention in near real-time. You will see any significant deviations, even the ones you had never thought about. And you can take action before they turn into problems.
A higher anomaly significance score indicates a larger anomaly. The score takes into account both the change from the background time range and the size of the value relative to other values. It has no meaning in an absolute sense; it is only meaningful relative to the scores of the other field values.
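One way to picture a score like this is a z-score-style ranking across field values. To be clear, this is not Loggly’s actual scoring method, which is internal; it is only a sketch of how a relative significance ordering could be produced:

```python
import statistics

def significance_scores(background, current):
    """Rank field values by how far their current frequency deviates
    from their background history. The absolute numbers mean nothing;
    only the relative ordering across field values matters.

    background: {field_value: [counts per past interval]}
    current:    {field_value: count in the current interval}
    """
    scores = {}
    for value, count in current.items():
        history = background.get(value, [0])
        mean = statistics.mean(history)
        std = statistics.pstdev(history) or 1.0  # avoid divide-by-zero
        scores[value] = abs(count - mean) / std
    # Highest-scoring (most anomalous) field values first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In the syslog.severity example above, a sudden jump in ERROR counts against a flat INFO baseline would push ERROR to the top of the ranking.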
You may want to suppress alerts during a planned outage such as a maintenance window or system upgrade. This prevents alerts from being sent and disrupting your support team. You may also want to suppress alerts after you have acknowledged a problem, to avoid duplicates. Alert suppression in these cases helps you avoid being inundated with information you are already aware of, so you can focus on the information that will help resolve the issue at hand.
Once alerts are configured and you are receiving them, you can start to set alert suppression parameters.
- On the Alerts page, identify the alert you want to suppress. If the alert is not currently suppressed, its suppression value is displayed as ‘None’. To set suppression parameters, select the ‘None’ link. If an alert is not active, it displays N/A to indicate that it cannot be suppressed.
- A window will pop up with the alert name (Syslog in this example). In the ‘Suppress Alert for’ field, specify the number of minutes or hours you want to suppress the alert. If you change the suppression time for an alert that is already suppressed, the new value replaces the previous one.
When the APM Integrated Experience is enabled, Loggly shares a common navigation and settings with the other integrated experiences' products. How you navigate Loggly and access its features may vary from these instructions. For more information, go to the APM Integrated Experience documentation.