Documentation for SolarWinds Observability SaaS

Tail-based sampling

Tail-based (or tail) sampling is a sampling technique in which the system decides which traces to keep after a request completes, allowing it to retain high-value traces. Unlike head-based sampling, in which the sampling decision is made at the beginning of a request, tail-based sampling occurs after the request finishes, so decisions are based on the complete picture of the trace.

Tail-based sampling is supported by APM libraries that export via OTLP and achieved through standard OpenTelemetry Collector components. The core concept is that requests sent to the service (or services, for a distributed workflow) are head-sampled at a high rate by the APM library and the resulting traces are exported to OpenTelemetry Collector instances deployed in your environment. The traces are then filtered by the collector’s tail sampling processor based on your custom policy and only the retained ones are forwarded to SolarWinds Observability.
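As a sketch of this concept, a minimal second-tier collector pipeline might look like the following. The ingestion endpoint, token environment variable, and policy are placeholders; the deployment examples below provide complete configurations.

```yaml
# Minimal sketch of a tail-sampling collector pipeline (placeholder values).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  tail_sampling:
    decision_wait: 10s   # wait for late spans before deciding on a trace
    policies:
      [
        # illustrative policy: keep only traces that contain an error
        { name: errors-only, type: status_code, status_code: { status_codes: [ERROR] } }
      ]

exporters:
  otlp:
    endpoint: <solarwinds-otlp-ingestion-endpoint>
    headers:
      authorization: Bearer ${env:SOLARWINDS_TOKEN}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]
```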

A flow diagram showing how an instrumented application sends trace spans to a first‑tier OpenTelemetry collector that load‑balances and routes data by trace ID. The first tier forwards traces to two second‑tier collectors that apply tail‑sampling before sending sampled traces to the SolarWinds Observability OTLP ingestion endpoint. The diagram also shows the customer environment retrieving head‑sampling configuration from the SolarWinds APM settings endpoint.

Tail sampling makes the rich context of a trace available to the sampling decision so that it is possible, for example, to capture only errored or high-latency requests, or to sample a particular transaction at a higher rate than others. With this power come trade-offs, such as the increased complexity of deploying, configuring, and monitoring collectors in your environment. There is also the risk of an unintended high volume of traces being exported and the associated increase in costs; to prevent this, APM libraries still impose an upper-bound rate limit even when this feature is enabled.

Tail-based sampling is recommended for core workloads only and for selective purposes, such as capturing more errored or high-latency requests. SolarWinds may limit the number of traces if it exceeds fair use across the customer base, reverting your service instance(s) to lower sample rates.

This documentation provides best-practice recommendations and example configurations that should be customized to suit your needs. It also assumes familiarity with setting up a standard head-sampled service where the APM library exports directly to SolarWinds Observability.

Prerequisites

An OpenTelemetry Collector deployed in your environment. We recommend using the SolarWinds OpenTelemetry Collector, configured and deployed in the gateway pattern recommended for tail sampling where the first tier instances use the load-balancing exporter to export to the second tier that runs the tail sampling processor. This is described in the Collector deployment example section.

The service(s) in a workflow must be instrumented by a supported APM library version to tail-sample a complete distributed trace.

The list of services to enable for tail sampling must be provided to customer support so that a high head-sampling rate can be enabled in the APM library. These services should already be instrumented and visible in SolarWinds Observability, so that you can confirm the related services in a distributed workflow and access the trace details required to configure tail sampling policies.

The service(s) must be configured to export traces to the OpenTelemetry Collector deployed in your environment.

Certain features such as code profiling and exemplar collection should be disabled to prevent unnecessary overhead; see Limitations below for details.

Collector deployment examples

These examples provide a quick start to try out tail sampling using a SolarWinds OpenTelemetry Collector distribution. They highlight core concepts including:

  • The first-tier collector uses the load-balancing exporter with the traceID routing key to send spans to second-tier collectors for tail sampling.

  • A set of second-tier collectors has the tail sampling processor configured with the desired policies.

  • The collectors include the solarwinds extension for enhanced entity inference and enable collector internal telemetry and tail sampling processor policy tracking for diagnostics.

Docker Compose

See the Tail Sampling Example in our GitHub repository.

Kubernetes

Coming soon.

Configuration

Load-balancing by trace ID

The example in this file demonstrates the simplest backend scaling and resiliency configuration. Refer to the official documentation for details on configuring backends to suit your deployment. Also note this upstream section on scaling stateful collectors, which is particularly relevant for dynamic environments like Kubernetes.
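As an illustrative sketch of the first-tier configuration (the second-tier hostnames are placeholders), the load-balancing exporter routes spans by trace ID so that all spans of a trace land on the same backend:

```yaml
exporters:
  loadbalancing:
    routing_key: traceID          # keep all spans of a trace on one backend
    protocol:
      otlp:
        tls:
          insecure: true          # assumes plaintext traffic within your environment
    resolver:
      static:
        hostnames:
          - collector-ts1:4317    # placeholder second-tier endpoints
          - collector-ts2:4317
```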

Tail sampling processor policy

Refer to the official documentation for details on creating policies that suit your needs. The examples in this file demonstrate common scenarios, and SolarWinds-specific cases are highlighted below.
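For example, a policy set covering the common error and latency scenarios, with a probabilistic fallback, might look like the following (the threshold and percentage values are illustrative, not recommendations):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      [
        # keep traces containing an error
        { name: errors, type: status_code, status_code: { status_codes: [ERROR] } },
        # keep traces slower than 2 seconds
        { name: slow-requests, type: latency, latency: { threshold_ms: 2000 } },
        # fall back to keeping a small share of everything else
        { name: baseline, type: probabilistic, probabilistic: { sampling_percentage: 10 } }
      ]
```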

Custom transaction name

When the transaction name for a trace is customized via SDK or configuration, the custom name is set as the sw.transaction attribute on the service entry (local root) span:

{
  name: filter-custom-transaction-name,
  type: string_attribute,
  string_attribute: { key: sw.transaction, values: [value] },
}

Trigger trace

SolarWinds Digital Experience Monitoring probes use trigger trace when making requests to the monitored target. The resulting trace allows deep insight from the website to the underlying service(s). Thus, it is recommended to preserve triggered traces by checking the TriggeredTrace span attribute, which is set on any triggered trace:

{
  # Always sample trigger-traced requests
  name: trigger-trace-policy,
  type: boolean_attribute,
  boolean_attribute: { key: TriggeredTrace, value: true },
}

APM library

The following APM libraries support tail sampling. We recommend upgrading to the latest available version; the examples assume the latest version is used.

As mentioned in the prerequisites, the service should first be added to SolarWinds Observability using a supported version of an APM library listed above.

Once tail sampling settings are enabled for the service by customer support, and the OpenTelemetry Collector deployment is available in your environment, the only APM library configuration change needed is to export traces to the first-tier collector. An example setting the endpoint for all signals:

OTEL_EXPORTER_OTLP_ENDPOINT=<first-tier-collector-endpoint>

See OTLP Exporter Configuration for details, and check the zero-code config pages (.NET, Java, Node.js, Python and PHP) or the SDK pages (Go, Ruby) for language-specific information.

Monitoring and diagnostics

Telemetry association to collector

The collector deployment examples configure the SolarWinds Extension and Processor so that each collector instance is assigned a unique name by the extension:

extensions:
  solarwinds:
    collector_name: ${env:SOLARWINDS_OTELCOL_NAME}

The processor enriches data flowing through the applicable pipeline with this collector name, so that a SolarWinds OTel Collector entity is created in SolarWinds Observability and correlated to the telemetry. You can find it in Entity Explorer by searching on the configured name (for example, collector-lb, collector-ts1) or get a list of all Infrastructure entities of the type SolarWinds OTel Collector:
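A sketch of that wiring, following the deployment examples (exact field names may differ by collector version; the token environment variable and pipeline components are placeholders):

```yaml
extensions:
  solarwinds:
    token: ${env:SOLARWINDS_TOKEN}
    collector_name: ${env:SOLARWINDS_OTELCOL_NAME}

processors:
  solarwinds: {}                  # enriches telemetry with the collector name

service:
  extensions: [solarwinds]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [solarwinds, tail_sampling]
      exporters: [otlp]
```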

SolarWinds Observability Entity explorer, filtered by Infrastructure entities, then by SolarWinds OTel Collectors. Three collectors—collector-lb, collector-ts1, and collector-ts2—are listed with “Good” health status, about one hour of uptime, and version 0.145.2.

In the entity details view for a particular collector, you can use the LOGS, METRICS, and TRACES tabs to see the associated telemetry. Our example deployment configures the logs and metrics pipelines on the first-tier collector-lb, and the traces pipeline on the second-tier collector-ts1 and collector-ts2 collectors, which can be seen in the entity detail view tabs:

collector-lb

SolarWinds Observability Entity Details view showing the collector‑lb details with the Logs tab selected. A log stream displays recent HTTP requests from a Flask APM test service, including successful GET requests and one error entry highlighted in red.

SolarWinds Observability Entity Details showing the collector‑lb details with the Metrics tab selected. A search filter for “response” is applied, displaying the metric trace.service.response_time with units in milliseconds and a description of request processing duration.

collector-ts2

SolarWinds Observability Entity Details view showing the collector‑ts2 details with the Traces tab selected. A duration heatmap appears above a table of traced requests, listing recent GET transactions from a Flask APM test service with HTTP 500 error responses and request durations in milliseconds.

Conversely, the Metrics Explorer and Traces Explorer views can be used to analyze collector association across telemetry, for example grouping by the enriched collector name attribute sw.otelcol.collector.name:

SolarWinds Observability Metrics Explorer view displaying the trace.service.response_time metric. A line chart shows average response time over the selected time range, grouped by OTEL collector name, with visible variation in millisecond values for collector‑lb.

SolarWinds Observability Traces Explorer view showing traced requests filtered by OTEL collector name. A duration heatmap appears above a table of recent HTTP GET requests, listing services, transactions, HTTP status codes, and request durations across multiple collectors.

Internal telemetry

The deployment examples also enable collector internal telemetry export, so that metrics useful for monitoring and diagnostics of the collector and its components are available in SolarWinds Observability. Most of these metric names are prefixed with otelcol_. You can use the METRICS tab in the entity details view to see them for a particular collector, or use the Metrics Explorer view to analyze them across collectors.
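Internal telemetry export is configured under the collector's service::telemetry section; a minimal sketch, with the ingestion endpoint as a placeholder, might look like this:

```yaml
service:
  telemetry:
    metrics:
      level: detailed             # emit detailed component metrics
      readers:
        - periodic:
            exporter:
              otlp:
                protocol: http/protobuf
                endpoint: <solarwinds-otlp-ingestion-endpoint>
```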

Below are references to the internal telemetry relevant to tail sampling:

To gain insight on which tail sampling policy was used for a trace, the deployment examples also enable policy tracking. You can see these tailsampling.* attributes on individual traces in the details view or analyze across traces in the Traces Explorer view:

SolarWinds Observability Trace Details view showing a selected GET request to /error from a Flask APM test service. The page displays span duration, an execution breakdown with local code time, a spans timeline, and attributes indicating an error-based tail‑sampling policy.

SolarWinds Observability Traces Explorer view showing traced requests filtered by tailsampling.policy. A duration heatmap is displayed above a table of recent HTTP requests, with results grouped by different tail‑sampling policies and showing request status codes and durations.

Limitations

Ensure that code profiling is disabled in APM Java when tail sampling is enabled.

The code profiling feature for APM .NET and APM Java is not supported with tail sampling. This is because code profiling still uses the legacy proprietary data export protocol instead of OTLP. Importantly, since profiling is performed for every in-process traced (head-sampled) request, the instrumented application could incur a significant performance penalty in a tail-sampling-enabled service. Ensure that code profiling is disabled (the default) in APM Java before setting up tail sampling.

SolarWinds recommends disabling exemplars when tail sampling is enabled.

When tail sampling, traces linked to by metric exemplars may be dropped, reducing the usefulness of exemplars. Additionally, the default criterion for collecting exemplars is the TraceBased exemplar filter, which includes all in-process traced (head-sampled) requests and could incur more overhead in a tail-sampling-enabled service. The AlwaysOff exemplar filter can be used to disable exemplar collection, for example by setting this environment variable:

OTEL_METRICS_EXEMPLAR_FILTER=always_off