About metrics
What is a metric?
A metric is a numeric value that quantifies a characteristic or an event related to an observed entity. Examples include CPU usage, HTTP response time, temperature, the number of tasks in a queue, or the number of page views. Aggregations of metric values show an entity's behavior over time, which you can use to evaluate performance, spot patterns or trends, and identify potential problems.
Each metric must have a name and a unit. Each metric value (measurement) must have four attributes:
- Reference to the metric name
- Value
- Measurement date and time
- Measurement context
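The four attributes above can be pictured as a small record. The following is a minimal sketch only; the class name, field names, and the sample metric `cpu.usage` are illustrative assumptions, not part of any SolarWinds Observability SaaS API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Measurement:
    """Hypothetical record for one metric value (measurement)."""
    metric_name: str                 # reference to the metric name
    value: float                     # the measured value
    timestamp: datetime              # measurement date and time
    context: dict = field(default_factory=dict)  # measurement context (tags)


# One CPU usage value reported for a production host (sample data).
sample = Measurement(
    metric_name="cpu.usage",
    value=73.5,
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    context={"environment": "prod", "ip": "192.168.0.1"},
)
```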
Tags and context
Tags are key-value pairs of data associated with measurements to provide context (for example, environment=prod or ip=192.168.0.1). Tags can be used to filter, group, or compare metrics.
Based on certain tags, SolarWinds Observability SaaS can identify that a particular metric value was reported in the context of an entity (for example, a service, a host, or an AWS cloud account).
A measurement is not exclusively related to one entity. One metric value can be related to more than one entity. For example, a Request Count metric can be reported by a service that is installed on a host that is processing requests from a website.
The same metric can be reported in multiple contexts. For example, the Total CPU Load can be reported for development, stage, and production environments. Reporting a value for the same metric in multiple contexts is standard practice in monitoring solutions. Reporting a metric in a single, global context is a nonstandard design pattern and should be used only in exceptional cases.
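To illustrate how tags allow one metric to be filtered and grouped across contexts, here is a minimal sketch in plain Python. The sample values, tag keys, and helper names (`filter_by_tag`, `group_mean`) are assumptions made for illustration; they do not represent the product's query language.

```python
from collections import defaultdict
from statistics import mean

# Values of one metric (Total CPU Load) reported in several contexts.
samples = [
    ({"environment": "development", "host": "host_1"}, 12.0),
    ({"environment": "stage", "host": "host_2"}, 35.0),
    ({"environment": "production", "host": "host_3"}, 71.0),
    ({"environment": "production", "host": "host_4"}, 65.0),
]


def filter_by_tag(samples, key, value):
    """Keep only the values whose tags contain key=value."""
    return [(tags, v) for tags, v in samples if tags.get(key) == value]


def group_mean(samples, key):
    """Average the values separately for each distinct value of one tag."""
    groups = defaultdict(list)
    for tags, v in samples:
        groups[tags.get(key)].append(v)
    return {tag_value: mean(values) for tag_value, values in groups.items()}


production_only = filter_by_tag(samples, "environment", "production")
per_environment = group_mean(samples, "environment")
```

Because every value carries its own tags, the same metric name serves all three environments, and comparisons between them need no extra metric definitions.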
Rollups and data pre-aggregation
To minimize storage and query costs, raw metric values are aggregated and stored as rollups. A rollup is a single value that summarizes all values collected during a certain time period. The retention period is different for each aggregation interval. The query engine dynamically selects the optimal data source (the raw data table or one of the rollup tables) depending on the metric query parameters (the length of the time period and the expected data granularity).
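The idea of a rollup can be sketched as bucketing raw values by time interval and keeping one summary value per bucket. This sketch assumes the average as the summary function; the text does not specify which aggregation SolarWinds Observability SaaS actually applies, and the function name `rollup` is hypothetical.

```python
from collections import defaultdict


def rollup(points, interval_seconds):
    """Summarize (unix_seconds, value) pairs into one average per interval.

    Returns a dict mapping each interval's start time to the average of
    the raw values that fall inside that interval.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % interval_seconds)  # align to interval start
        buckets[bucket_start].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}


# Five raw values spread over two minutes (sample data).
raw = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 40.0), (90, 60.0)]
one_minute = rollup(raw, 60)
```

Here the five raw values collapse into two 1-minute rollup values, which is where the storage and query savings come from.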
Metrics data retention period
The following table shows the default metrics data retention for SolarWinds Observability SaaS:
Data granularity | Calculated after | Retention period |
---|---|---|
Raw data | N/A | 8 days |
1 minute rollup | 2-3 days | 30 days |
10 minutes rollup | 3-4 days | 30 days |
1 hour rollup | 4-7 days | 540 days |
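As a rough illustration of how retention limits which data source can answer a query, the sketch below picks the finest-grained source that still retains data of a given age. It uses only the retention periods from the table; it ignores the "Calculated after" column and the requested granularity, and it is not the actual selection logic of the query engine. The short source names and the function name are assumptions.

```python
# (source name, retention in days), ordered from finest to coarsest.
# Retention values are taken from the table above.
SOURCES = [
    ("raw", 8),
    ("1m", 30),
    ("10m", 30),
    ("1h", 540),
]


def finest_source_covering(age_days):
    """Return the finest source that still retains data this old, or None."""
    for name, retention_days in SOURCES:
        if age_days <= retention_days:
            return name
    return None  # the data is older than every retention period
```

For example, a query reaching back 3 days could be served from raw data, while a query reaching back 100 days could only be served from the 1 hour rollup.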
Metric categories
Metrics displayed in SolarWinds Observability SaaS belong to one of two categories:
- Stored metric values are collected from internal or external sources, and their values are stored in SolarWinds Observability SaaS internal storage.
- Composite metric values are not stored in internal storage. They are calculated when needed (on read) based on the formula defined for each composite metric. Stored metrics are the elements used to calculate composite metric values.
In addition to stored and composite metrics, built-in is a special category of metrics that are predefined and used internally by the system. Built-in metrics can be either stored or composite.
SolarWinds Observability SaaS supports internal definitions of composite metrics as well as user-defined composite metrics.
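The difference between the two categories can be sketched as follows: a stored metric is looked up, while a composite metric is computed from stored metrics at read time. The metric names and the error-rate formula are hypothetical examples, not definitions shipped with the product.

```python
# Stored metric values, as if read from internal storage (sample data).
stored = {
    "requests.count": 200.0,
    "requests.errors": 5.0,
}

# Composite metrics hold a formula over stored metrics, not a value.
composite_formulas = {
    "requests.error_rate": lambda s: s["requests.errors"] / s["requests.count"],
}


def read_metric(name):
    """Return a stored value directly, or evaluate a composite on read."""
    if name in stored:
        return stored[name]
    return composite_formulas[name](stored)


error_rate = read_metric("requests.error_rate")
```

Because the composite value is never written to storage, changing a stored input changes the composite result on the next read.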
Custom metrics
SolarWinds Observability SaaS can accept and ingest custom metrics sent using various methods:
- Integration with Prometheus, StatsD, and OTLP can be used to send custom metrics.
- A Kubernetes (K8s) integration can collect metrics exposed by services using Prometheus endpoints.
- The APM instrumentation SDK allows for custom metrics reporting.
- Users can send custom metrics to a dedicated endpoint.
Prometheus integration
The UAMS Prometheus integration plugin can connect to the Prometheus server and transfer all metrics available on this server to SolarWinds Observability SaaS. See Configure Prometheus integration.
OTLP integration
The UAMS OTLP integration plugin can connect to the OTel receiver using the gRPC or HTTP protocol and transfer all metrics available in this receiver to SolarWinds Observability SaaS. See Configure OpenTelemetry Protocol (OTLP) integration.
StatsD integration
The StatsD integration plugin can connect to the StatsD server and transfer all metrics available in this server to SolarWinds Observability SaaS. See Configure StatsD integration.
K8s integration
The K8s collector primarily collects metrics produced by the Kubernetes services managing a cluster, such as the CPU and memory usage of pods, the creation time of namespaces, and the statuses of jobs. SolarWinds Observability SaaS collects more of these metrics than it displays in the UI, but all collected metrics are tightly related to the SolarWinds Observability SaaS K8s entities.
Additionally, the K8s collector supports autodiscovery of Prometheus metrics. When services (applications) running in a K8s cluster expose their metrics in a Prometheus-compatible format (for example, an HTTP endpoint), the K8s collector detects them, collects them, annotates them with additional information to link them to related SolarWinds Observability SaaS entities, and sends them to SolarWinds Observability SaaS. This behavior is enabled by default, but it is configurable.
The K8s collector can also work as a proxy for OTLP and Telegraf metrics, but it must be configured to do so. The main benefit of using the proxy instead of sending the OTLP metrics directly to SolarWinds Observability SaaS is that the metrics are automatically annotated with information about the K8s cluster, which links them to the cluster in the SolarWinds Observability SaaS UI.
OTLP endpoint
The main method for sending data to SolarWinds Observability SaaS is the use of an OTLP endpoint. For detailed information about the endpoint, see OTel direct ingestion.
For an example of sending custom metrics via the OTel SDK/exporter to this endpoint, see Use Micrometer to send custom metrics to SolarWinds Observability SaaS. The SolarWinds Observability SaaS APM libraries will also introduce support for OTLP export of metrics in upcoming releases.
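To show what a custom metric looks like on the wire, the sketch below builds a single gauge data point in the OTLP JSON encoding using only the Python standard library. It only constructs the request body; the endpoint URL, authentication headers, and whether the endpoint accepts the JSON encoding are not covered here and should be taken from OTel direct ingestion. The metric name, scope name, and helper names are assumptions for illustration. In practice an OTel SDK exporter would normally produce this payload for you.

```python
import json
import time


def attributes(tags):
    """Convert a dict of tags into the OTLP key/value attribute list."""
    return [{"key": k, "value": {"stringValue": str(v)}} for k, v in tags.items()]


def gauge_payload(name, unit, value, tags, resource_tags, time_unix_nano=None):
    """Build an OTLP JSON metrics payload holding one gauge data point."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()
    return {
        "resourceMetrics": [{
            "resource": {"attributes": attributes(resource_tags)},
            "scopeMetrics": [{
                "scope": {"name": "example.custom.metrics"},  # hypothetical
                "metrics": [{
                    "name": name,
                    "unit": unit,
                    "gauge": {"dataPoints": [{
                        "attributes": attributes(tags),
                        # OTLP JSON encodes 64-bit integers as strings.
                        "timeUnixNano": str(time_unix_nano),
                        "asDouble": float(value),
                    }]},
                }],
            }],
        }]
    }


payload = gauge_payload(
    "queue.tasks", "1", 42,
    tags={"environment": "prod"},
    resource_tags={"service.name": "service_A"},
    time_unix_nano=1700000000000000000,
)
body = json.dumps(payload)  # request body for an OTLP/HTTP metrics export
```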
APM instrumentation SDK
Custom metrics can also be created using an APM instrumentation SDK, which continues to support the legacy AppOptics instrumentation API calls "Increment Metric" and "Summary Metric". Most APM libraries will introduce support for OTLP export of metrics created via the standard OTel API in upcoming releases. The legacy "Increment Metric" and "Summary Metric" API for custom metrics is supported by:
- Ruby: Ruby Library instrumentation SDK
- .NET: .NET Library instrumentation SDK
- PHP: PHP Library instrumentation SDK
Metrics created by the APM Instrumentation SDK in AppOptics will automatically be migrated to SolarWinds Observability SaaS after you redirect monitoring traffic from AppOptics to SolarWinds Observability SaaS.
Important guidelines
Avoid global context
When custom metrics are produced by end-user code, you must avoid generating metrics that are reported only in a single global context.
Example 1
Instead of the following metrics:
Number_of_transactions_executed_by_service_A_on_host_1
Number_of_transactions_executed_by_service_A_on_host_2
Number_of_transactions_executed_by_service_B_on_host_1
Number_of_transactions_executed_by_service_B_on_host_2
Use a common metric:
- Number_of_transactions
Each reported value of this metric should be marked with different tag sets:
service = service_A, host = host_1
service = service_A, host = host_2
service = service_B, host = host_1
service = service_B, host = host_2
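The conversion in Example 1 can be sketched as moving the variable parts of the metric name into tags. The regular expression and the function name `to_tagged` are hypothetical helpers written for this illustration; the resulting dictionary layout is also an assumption, not a required payload format.

```python
import re

# Matches the per-service, per-host names from Example 1.
LEGACY_NAME = re.compile(
    r"^Number_of_transactions_executed_by_(service_\w+?)_on_(host_\d+)$"
)


def to_tagged(legacy_name, value):
    """Rewrite a name-encoded metric as one common metric plus tags."""
    match = LEGACY_NAME.match(legacy_name)
    if match is None:
        raise ValueError(f"unrecognized metric name: {legacy_name}")
    return {
        "name": "Number_of_transactions",
        "value": value,
        "tags": {"service": match.group(1), "host": match.group(2)},
    }


tagged = to_tagged("Number_of_transactions_executed_by_service_B_on_host_2", 17)
```

With this shape, adding a new service or host adds a new tag value rather than a new metric name.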
Example 2
Instead of the following metrics:
Duration_of_ABCD_Build_for_Service_A
Duration_of_EFGH_Build_for_Service_A
Duration_of_IJKL_Build_for_Service_B
Duration_of_MNOP_Build_for_Service_B
Use a common metric:
- Build_Duration
Each reported value of this metric should be marked with different tag sets:
build = ABCD, service = Service_A
build = EFGH, service = Service_A
build = IJKL, service = Service_B
build = MNOP, service = Service_B
Duplicates of metric values
When the following characteristics are exactly the same for two or more custom metric values, the values are treated as duplicates, that is, as successive and increasingly precise representations of the same measurement:
- Reference to the metric name
- Measurement date and time
- Measurement context (all key-value pairs)
As a consequence, SolarWinds Observability SaaS will overwrite those values. In most cases, the last reported value is the one that remains in the system. However, because of implementation details, exceptions to this rule can occur.
It is possible to avoid overwriting metric values reported for the same context at the same time. To do this, you must add an additional tag (key-value pair) to the reported metric values. This tag is called nonce, and its value should be unique for each reported value.
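The overwrite behavior and the effect of a nonce tag can be simulated with a dictionary keyed by metric name, timestamp, and the full tag set. This is a simplified model under the assumption of strict last-write-wins; as noted above, the real system can deviate from that. The `store`, `write`, and `key_of` names are hypothetical, and a UUID is just one way to generate a unique nonce value.

```python
import uuid

store = {}  # simplified stand-in for metric storage


def key_of(name, timestamp, tags):
    """Identity of a measurement: name, time, and all key-value pairs."""
    return (name, timestamp, frozenset(tags.items()))


def write(name, timestamp, tags, value):
    """Store a value; an identical key overwrites the earlier value."""
    store[key_of(name, timestamp, tags)] = value


# Same name, time, and context: the second value replaces the first.
write("Build_Duration", 1700000000, {"build": "ABCD"}, 10.0)
write("Build_Duration", 1700000000, {"build": "ABCD"}, 12.0)
count_without_nonce = len(store)

# A unique nonce tag makes each context distinct, so both values are kept.
store.clear()
for value in (10.0, 12.0):
    tags = {"build": "ABCD", "nonce": str(uuid.uuid4())}
    write("Build_Duration", 1700000000, tags, value)
count_with_nonce = len(store)
```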