Documentation for SolarWinds Observability SaaS

About metrics

What is a metric?

A metric is a numeric value that quantifies a characteristic or an event related to an observed entity. Examples include CPU usage, HTTP response time, temperature, the number of tasks in a queue, or the number of page views. Aggregations of metric values show an entity's behavior over time, which you can use to evaluate performance, spot patterns or trends, and identify potential problems.

Each metric must have a name and a unit. Each metric value (measurement) must have four attributes:

  • Reference to the metric name
  • Value
  • Measurement date and time
  • Measurement context
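As an illustration only, the metric definition and the four measurement attributes listed above can be modeled as two small records. The class and field names below (MetricDefinition, Measurement, metric_name, and so on) are hypothetical and are not part of any SolarWinds Observability API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MetricDefinition:
    """A metric has a name and a unit (hypothetical model)."""
    name: str
    unit: str


@dataclass
class Measurement:
    """One metric value with the four attributes described above."""
    metric_name: str      # reference to the metric name
    value: float          # the measured value
    timestamp: datetime   # measurement date and time
    context: dict = field(default_factory=dict)  # measurement context (tags)


cpu = MetricDefinition(name="cpu.usage", unit="percent")
sample = Measurement(
    metric_name=cpu.name,
    value=42.5,
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    context={"environment": "prod", "host": "host_1"},
)
```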

Tags and context

Tags are key-value pairs of data associated with measurements to provide context (for example, environment=prod or ip=192.168.0.1). Tags can be used to filter, group, or compare metrics.
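The following minimal sketch shows what filtering and grouping by tag can look like on a handful of in-memory measurements. The metric name, values, and helper functions are invented for illustration and do not represent the product's query engine.

```python
from collections import defaultdict

# Hypothetical measurements; each carries a dictionary of tags.
measurements = [
    {"metric": "request_count", "value": 120,
     "tags": {"environment": "prod", "ip": "192.168.0.1"}},
    {"metric": "request_count", "value": 80,
     "tags": {"environment": "prod", "ip": "192.168.0.2"}},
    {"metric": "request_count", "value": 15,
     "tags": {"environment": "stage", "ip": "192.168.0.3"}},
]


def filter_by_tag(items, key, value):
    """Keep only the measurements whose tag `key` equals `value`."""
    return [m for m in items if m["tags"].get(key) == value]


def sum_by_tag(items, key):
    """Group measurements by the value of tag `key` and sum their values."""
    totals = defaultdict(int)
    for m in items:
        totals[m["tags"].get(key)] += m["value"]
    return dict(totals)


prod_only = filter_by_tag(measurements, "environment", "prod")
totals = sum_by_tag(measurements, "environment")
```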

Based on certain tags, SolarWinds Observability can identify that a particular metric value was reported in the context of an entity (for example, a service, a host, or an AWS cloud account).

A measurement is not exclusively related to one entity. One metric value can be related to more than one entity. For example, a Request Count metric can be reported by a service that is installed on a host that is processing requests from a website.

The same metric can be reported in multiple contexts. For example, the Total CPU Load can be reported for development, stage, and production environments. Reporting a value for the same metric in multiple contexts is standard practice in monitoring solutions. Reporting a metric in a single, global context is a nonstandard design pattern and should be used only in exceptional cases.

Rollups and data pre-aggregation

To minimize storage and query costs, raw metric values are aggregated and stored as rollups. A rollup is a single value that summarizes all values collected during a certain time period. The retention period depends on the aggregation interval. The query engine dynamically selects the optimal data source (the raw data table or one of the rollup tables) depending on the metric query parameters (the length of the time period and the expected data granularity).
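To make the idea of a rollup concrete, the sketch below buckets raw (timestamp, value) points into fixed intervals and reduces each bucket to a single value. Which aggregation function SolarWinds Observability actually applies is not stated here, so the use of the average is an assumption, and the function name rollup is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rollup(points, interval_seconds, aggregate=mean):
    """Summarize (unix_timestamp, value) points into one value per interval.

    The default aggregate (mean) is an assumption made for this sketch.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % interval_seconds)  # align to interval start
        buckets[bucket_start].append(value)
    return {start: aggregate(values) for start, values in sorted(buckets.items())}


# Five raw samples spread over two minutes.
raw = [(0, 10.0), (20, 20.0), (40, 30.0), (60, 50.0), (90, 70.0)]
one_minute = rollup(raw, 60)
```

Passing a different function (for example, max or sum) as aggregate yields a different kind of summary over the same buckets.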

Metrics data retention period

The following table shows the default metrics data retention for SolarWinds Observability:

Data granularity    Calculated after    Retention period
Raw data            N/A                 8 days
1-minute rollup     2-3 days            30 days
10-minute rollup    3-4 days            30 days
1-hour rollup       4-7 days            540 days
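One consequence of these retention periods is that older data is available only at coarser granularity. The sketch below picks the finest data source that still retains data of a given age. It is a simplification based solely on the retention column of the table: the real selection logic also considers the requested granularity and is not documented here, and the source identifiers (raw, rollup_1m, and so on) are hypothetical.

```python
# (hypothetical identifier, retention in days), ordered from finest to coarsest.
RETENTION_DAYS = [
    ("raw", 8),
    ("rollup_1m", 30),
    ("rollup_10m", 30),
    ("rollup_1h", 540),
]


def finest_available_source(oldest_point_age_days):
    """Return the finest source that still holds data of the given age.

    Returns None when the data is older than every retention period.
    """
    for name, retention in RETENTION_DAYS:
        if oldest_point_age_days <= retention:
            return name
    return None
```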

Metric categories

Metrics displayed in SolarWinds Observability belong to one of two categories:

  • Stored metric values are collected from internal or external sources, and their values are stored in SolarWinds Observability internal storage.

  • Composite metric values are not stored in internal storage. They are calculated when needed (on read) based on the formula defined for each composite metric. Stored metrics are the elements used to calculate composite metric values.

In addition to stored and composite metrics, there is a special built-in category: metrics that are predefined and used internally by the system. Built-in metrics can be either stored or composite.

SolarWinds Observability supports internal definitions of composite metrics as well as user-defined composite metrics.
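The difference between stored and composite metrics can be sketched as follows: stored values are looked up directly, while a composite value is computed from stored values at read time. The metric names, the formula, and the read_metric function are invented for this example and do not reflect the product's formula syntax.

```python
# Hypothetical stored metric values.
stored = {
    "http.requests.total": 2000,
    "http.requests.failed": 50,
}

# Hypothetical composite metric: a formula over stored metrics.
COMPOSITE_FORMULAS = {
    "http.error_rate_percent": lambda m: (
        100.0 * m["http.requests.failed"] / m["http.requests.total"]
    ),
}


def read_metric(name, stored_values):
    """Return a stored value directly, or compute a composite value on read."""
    if name in stored_values:
        return stored_values[name]
    return COMPOSITE_FORMULAS[name](stored_values)


error_rate = read_metric("http.error_rate_percent", stored)
```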

Custom metrics

SolarWinds Observability can accept and ingest custom metrics sent using various methods:

  • Integration with Prometheus, StatsD, and OTLP can be used to send custom metrics.

  • A Kubernetes (K8s) integration can collect metrics exposed by services using Prometheus endpoints.

  • The APM instrumentation SDK allows for custom metrics reporting.

  • Users can send custom metrics to a dedicated endpoint.

Prometheus integration

The UAMS Prometheus integration plugin can connect to the Prometheus server and transfer all metrics available on this server to SolarWinds Observability. See Configure Prometheus integration.

OTLP integration

The UAMS OTLP integration plugin can connect to the OTel receiver using the gRPC or HTTP protocol and transfer all metrics available in this receiver to SolarWinds Observability. See Configure OpenTelemetry Protocol (OTLP) integration.

StatsD integration

The StatsD integration plugin can connect to the StatsD server and transfer all metrics available in this server to SolarWinds Observability. See Configure StatsD integration.

K8s integration

The K8s collector primarily collects metrics produced by the Kubernetes services managing a cluster, such as the CPU and memory usage of pods, the creation time of namespaces, and the statuses of jobs. SolarWinds Observability collects more of these metrics than it displays in the UI, but all collected metrics are tightly related to the SolarWinds Observability K8s entities.

Additionally, the K8s collector supports autodiscovery of Prometheus metrics. When services (applications) running in a K8s cluster expose their metrics in a Prometheus-compatible format (for example, an HTTP endpoint), the K8s collector detects them, collects them, annotates them with additional information to link them to related SolarWinds Observability entities, and sends them to SolarWinds Observability. This behavior is enabled by default, but it is configurable.
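For reference, the sketch below renders a metric in the Prometheus text exposition format, which is the kind of output a service would serve from its HTTP metrics endpoint. The function name, metric name, and label are hypothetical, and the sketch omits the escaping of special characters in label values that a real exporter needs.

```python
def render_prometheus(name, metric_type, help_text, samples):
    """Render samples as Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    Simplified: label values are not escaped.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


text = render_prometheus(
    "orders_processed_total",
    "counter",
    "Orders processed.",
    [({"service": "checkout"}, 42)],
)
```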

The K8s collector can also work as a proxy for OTLP and Telegraf metrics, but this must be configured. The main benefit of doing this instead of sending the OTLP metrics directly to SolarWinds Observability is their automatic annotation with information about the K8s cluster, which links them to the cluster in the SolarWinds Observability UI.

OTLP endpoint

The main method for sending data to SolarWinds Observability is the use of an OTLP endpoint. For detailed information about the endpoint, see OTel direct ingestion.
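As a rough illustration of what is sent to an OTLP endpoint, the sketch below builds an OTLP metrics payload in its JSON encoding for a single gauge data point. The helper name, metric name, scope name, and attribute values are hypothetical. The endpoint URL and authentication are intentionally not shown; take them from the OTel direct ingestion documentation. In practice, an OTel SDK exporter usually builds and sends this payload for you.

```python
import json
import time


def build_otlp_gauge_payload(metric_name, unit, value, attributes, service_name):
    """Build an OTLP/JSON metrics payload with one gauge data point."""

    def kv(key, val):
        # OTLP attributes are lists of key/value objects with typed values.
        return {"key": key, "value": {"stringValue": str(val)}}

    return {
        "resourceMetrics": [{
            "resource": {"attributes": [kv("service.name", service_name)]},
            "scopeMetrics": [{
                "scope": {"name": "example.custom.metrics"},  # hypothetical
                "metrics": [{
                    "name": metric_name,
                    "unit": unit,
                    "gauge": {"dataPoints": [{
                        "attributes": [kv(k, v) for k, v in attributes.items()],
                        "timeUnixNano": str(time.time_ns()),
                        "asDouble": float(value),
                    }]},
                }],
            }],
        }]
    }


payload = build_otlp_gauge_payload(
    "queue.depth", "1", 17, {"environment": "prod"}, "billing"
)
body = json.dumps(payload)  # request body for an OTLP/HTTP JSON export
```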

For an example of sending custom metrics via the OTel SDK/exporter to this endpoint, see Use Micrometer to send custom metrics to SolarWinds Observability. The SolarWinds Observability APM libraries will also introduce support for OTLP export of metrics in upcoming releases.

APM instrumentation SDK

Custom metrics can also be created using an APM Instrumentation SDK that continues to support the legacy AppOptics instrumentation APIs "Increment Metric" and "Summary Metric". Most APM libraries will introduce support for OTLP export of metrics created via the standard OTel API in upcoming releases. The legacy "Increment Metric" and "Summary Metric" APIs for custom metrics are supported by:

Metrics created by the APM Instrumentation SDK in AppOptics will automatically be migrated to SolarWinds Observability after you redirect monitoring traffic from AppOptics to SolarWinds Observability.

Important guidelines

Avoid global context

When custom metrics are produced by end-user code, you must avoid generating metrics that are reported only in a single global context.

Example 1

Instead of the following metrics:

  • Number_of_transactions_executed_by_service_A_on_host_1
  • Number_of_transactions_executed_by_service_A_on_host_2
  • Number_of_transactions_executed_by_service_B_on_host_1
  • Number_of_transactions_executed_by_service_B_on_host_2

Use a common metric:

  • Number_of_transactions

Each reported value of this metric should be marked with different tag sets:

  • service = service_A, host = host_1
  • service = service_A, host = host_2
  • service = service_B, host = host_1
  • service = service_B, host = host_2

Example 2

Instead of the following metrics:

  • Duration_of_ABCD_Build_for_Service_A
  • Duration_of_EFGH_Build_for_Service_A
  • Duration_of_IJKL_Build_for_Service_B
  • Duration_of_MNOP_Build_for_Service_B

Use a common metric:

  • Build_Duration

Each reported value of this metric should be marked with different tag sets:

  • build = ABCD, service = Service_A
  • build = EFGH, service = Service_A
  • build = IJKL, service = Service_B
  • build = MNOP, service = Service_B
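The benefit of a common metric with tags is that values can be selected or combined along any tag afterwards. The sketch below reports the Build_Duration values from Example 2 through a hypothetical report function and then selects them by tag. The function names and the duration values are invented for illustration.

```python
reported = []


def report(metric, value, **tags):
    """Record one value of a metric together with its tag set."""
    reported.append({"metric": metric, "value": value, "tags": tags})


def values_for(metric, **match):
    """Return all values of `metric` whose tags include every pair in `match`."""
    return [
        m["value"]
        for m in reported
        if m["metric"] == metric
        and all(m["tags"].get(k) == v for k, v in match.items())
    ]


# One common metric, four tag sets (durations are made-up values).
report("Build_Duration", 310, build="ABCD", service="Service_A")
report("Build_Duration", 290, build="EFGH", service="Service_A")
report("Build_Duration", 120, build="IJKL", service="Service_B")
report("Build_Duration", 480, build="MNOP", service="Service_B")

service_a = values_for("Build_Duration", service="Service_A")
all_builds = values_for("Build_Duration")
```

With one metric name per build and service, the same questions would require knowing and combining every individual metric name in advance.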

Duplicates of metric values

When the following characteristics are exactly the same for two or more custom metric values, the values are treated as duplicates: successive, increasingly precise representations of the same measurement.

  • Reference to the metric name
  • Measurement date and time
  • Measurement context (all key-value pairs)

As a consequence, SolarWinds Observability will overwrite those values. In most cases, the last reported value is the one that remains in the system. However, because of implementation details, exceptions to this rule can occur.

It is possible to avoid overwriting metric values reported for the same context at the same time. To do this, you must add an additional tag (key-value pair) to the reported metric values. This tag is called nonce, and its value should be unique.
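The sketch below models this behavior with a dictionary keyed by metric name, timestamp, and the full tag set. It assumes a strict last-write-wins rule, which, as noted above, is not guaranteed in the real system. The store, identity, and ingest names are hypothetical; only the nonce tag name comes from this documentation.

```python
import uuid

store = {}


def identity(name, timestamp, tags):
    """Values with the same name, time, and tag set are considered duplicates."""
    return (name, timestamp, frozenset(tags.items()))


def ingest(name, timestamp, value, tags):
    """Simplified model: a duplicate overwrites the earlier value."""
    store[identity(name, timestamp, tags)] = value


# Same name, time, and context: the second value overwrites the first.
ingest("Build_Duration", 1700000000, 310, {"build": "ABCD"})
ingest("Build_Duration", 1700000000, 295, {"build": "ABCD"})
count_without_nonce = len(store)

# A unique nonce tag makes each tag set distinct, so both values are kept.
for value in (310, 295):
    ingest(
        "Build_Duration",
        1700000000,
        value,
        {"build": "ABCD", "nonce": uuid.uuid4().hex},
    )
count_total = len(store)
```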