Services metrics

Once installed and enabled, the SolarWinds Observability APM libraries automatically start reporting key performance metrics about your application, available in the Metrics Explorer. These standard metrics are collected for all requests and include response time and error rate for the service as a whole, or for a particular transaction. Also available are trace-derived metrics for various types of outbound calls made by your application, such as response time for database calls that can be filtered by database host or query operation. These trace-derived metrics are based on a sampled subset of requests to a service. Some of the libraries report language-specific runtime metrics about the execution environment of your application.

Many of the metrics collected from service entities are displayed as widgets in SolarWinds Observability explorers; additional metrics may be collected and are available in the Metrics Explorer. You can also create an alert that triggers when an entity's metric value moves outside a specific range. For information about entity types, see Entities in SolarWinds Observability SaaS.

The following tables list some of the metrics collected for these entities. To see the service metrics in the Metrics Explorer, type `trace.` in the search box.

Standard metrics

The tables below list the default set of metrics collected by the APM library for all requests. Counts are reported every minute.

Primary service metrics

Metric	Units	Description
`sw.metrics.healthscore`	Percent (%)	Health score. A health score provides real-time insight into the overall health and performance of your monitored entities. It is calculated from anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health score is displayed as a single value from 0 to 100 that falls into one of three ranges: Good (70-100), Moderate (40-69), or Bad (0-39). To view the health score for service entities in the Metrics Explorer, filter the `sw.metrics.healthscore` metric by `entity_types` and select `service`.
`trace.service.errors`	Count	Count of requests that ended with an error status. Aggregate by `Sum` to see the total error count for the service.
`trace.service.error_ratio`	%	Ratio of errors to requests, calculated by dividing the number of requests with errors by the total number of requests.
`trace.service.requests`	Count	Count of requests for each HTTP status code (200, 404, etc.). Aggregate by `Sum` to see the total request count for the service.
`trace.service.request_rate`	Requests per second	Rate of requests per second, calculated by dividing the number of requests (`trace.service.requests`) by the length of the aggregation period in seconds.
`trace.service.response_time`	Milliseconds (ms)	Duration of each entry span for the service, typically meaning the time taken to process an inbound request. This metric is based on the following attributes: `sw.transaction`, `service.name`, `http.response.status_code`, `http.request.method`, and `sw.is_error`. This is the primary metric to track service response time.
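
The error ratio, request rate, and health score ranges described in this table reduce to simple arithmetic. The following Python sketch restates them for illustration. The function names are hypothetical and not part of any SolarWinds API, and returning 0 when there are no requests is an assumption the table does not cover.

```python
def error_ratio(errors: int, requests: int) -> float:
    """trace.service.error_ratio: requests with errors / total requests, as a percent."""
    if requests == 0:
        return 0.0  # assumption: no traffic is reported as a 0% error ratio
    return 100.0 * errors / requests


def request_rate(requests: int, period_seconds: float) -> float:
    """trace.service.request_rate: request count divided by the aggregation period in seconds."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return requests / period_seconds


def health_band(score: float) -> str:
    """Map a 0-100 health score to the documented Good/Moderate/Bad ranges."""
    if not 0 <= score <= 100:
        raise ValueError("health score must be between 0 and 100")
    if score >= 70:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Bad"


# Example: 1,200 requests and 30 errors in a 60-second aggregation period.
print(error_ratio(30, 1200))   # 2.5
print(request_rate(1200, 60))  # 20.0
print(health_band(55))         # Moderate
```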

Service metrics with percentiles

Service metrics stored with percentiles

Metric	Units	Description
`trace.service.service_response_time`	ms	Duration of each entry span for the service, typically meaning the time taken to process an inbound request. This metric is based on the `service.name` attribute.
`trace.service.service_response_time.p50`, `trace.service.service_response_time.p95`, `trace.service.service_response_time.p99`, `trace.service.service_response_time.p999`	ms	Percentile values for the `trace.service.service_response_time` metric.
`trace.service.transaction_response_time`	ms	Duration of each entry span for the service, typically meaning the time taken to process an inbound request.
`trace.service.transaction_response_time.p50`, `trace.service.transaction_response_time.p95`, `trace.service.transaction_response_time.p99`, `trace.service.transaction_response_time.p999`	ms	Percentile values for the `trace.service.transaction_response_time` metric.
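
This page does not say how the percentile values are computed. As an illustration only, the sketch below uses the nearest-rank method, which is one common definition. SolarWinds Observability may use a different estimator, so its values can differ from this calculation. The function name is hypothetical.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank of the percentile value
    return ordered[rank - 1]


# Response times (ms) for ten requests in one aggregation period.
durations_ms = [12, 15, 11, 240, 14, 13, 18, 16, 900, 17]
for label, p in [("p50", 50), ("p95", 95), ("p99", 99), ("p999", 99.9)]:
    print(label, nearest_rank_percentile(durations_ms, p))
```

With only ten samples, p95 and above all land on the slowest request, which is one reason the high percentiles are most informative over periods with many requests.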

Service counters

Metrics representing counts of service-related entities

Metric	Units	Description
`trace.service.count`	Count	Number of services that were reporting data during the selected time period.
`trace.service.faas.count`	Count	Number of AWS Lambda functions for which APM Services were reporting data during the selected time period.
`trace.service.faas.instance.count`	Count	Number of AWS Lambda instances for which APM Services were reporting data during the selected time period.
`trace.service.hosts.count`	Count	Number of APM Hosts for which APM Services were reporting data during the selected time period. Unique APM Hosts are captured only for Azure VMs, AWS EC2 instances, and hosts monitored with UAMS.
`trace.service.instance.count`	Count	Number of service instances that were reporting data during the selected time period.
`trace.service.pod.count`	Count	Number of Kubernetes Pods for which APM Services were reporting data during the selected time period.
`trace.service.samplecount`	Count	Count of requests that went through a sampling decision. This excludes requests with a valid upstream sampling decision and trigger trace requests.
`trace.service.tracecount`	Count	Count of traces generated from requests.
`trace.service.transaction.count`	Count	Number of transactions that were reporting data during the selected time period. This metric is based on the `service.name` and `sw.transaction` attributes.
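
The following sketch shows one way to read two of these counters: `trace.service.transaction.count` as the number of distinct (`service.name`, `sw.transaction`) pairs, and `trace.service.samplecount` as requests that remain after excluding those with a valid upstream decision or a trigger trace. This is an interpretation for illustration. The `Request` type and its boolean flags are hypothetical stand-ins for information the APM library tracks internally.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Request:
    service_name: str        # value of the service.name attribute
    transaction: str         # value of the sw.transaction attribute
    upstream_decision: bool  # hypothetical: a valid upstream sampling decision was received
    trigger_trace: bool      # hypothetical: the request was a trigger trace request


def transaction_count(requests: Iterable[Request]) -> int:
    """Distinct (service.name, sw.transaction) pairs seen in the period."""
    return len({(r.service_name, r.transaction) for r in requests})


def sample_count(requests: Iterable[Request]) -> int:
    """Requests that went through a sampling decision in this service."""
    return sum(1 for r in requests if not r.upstream_decision and not r.trigger_trace)


period = [
    Request("checkout", "POST /cart", upstream_decision=False, trigger_trace=False),
    Request("checkout", "POST /cart", upstream_decision=True, trigger_trace=False),
    Request("checkout", "GET /items", upstream_decision=False, trigger_trace=True),
    Request("search", "GET /query", upstream_decision=False, trigger_trace=False),
]
print(transaction_count(period))  # 3
print(sample_count(period))       # 2
```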

Sampled trace-derived metrics

Trace-derived metrics are additional metrics calculated from the traces gathered by the APM library. Unlike the standard metrics above, they are calculated from sampled traces, so they are not guaranteed to reflect all activity in your applications.

Sample rate

SolarWinds Observability APM instrumentation gathers not only metrics on application performance but also high-fidelity distributed trace data. Each traced request, called a trace, contains a rich data set about the request's handling across all tiers of the application stack, including queries issued, exceptions raised, and backtraces at relevant code execution points.

Because instrumentation can add application overhead under load, SolarWinds Observability libraries use adaptive sampling to balance the quality of monitoring data against application performance. In a low-traffic environment such as development or staging, typically every request is traced. In high-traffic production environments, however, sample rates can fall below one percent.

The default sample rate is 100 traces per minute per service.
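
To see why busy services can show sample rates below one percent, consider a simplified steady-state model that assumes the default of 100 traces per minute per service acts as a cap on traced requests. This is an assumption for illustration, not the libraries' actual adaptive sampling algorithm. The sketch also shows why a count observed only in sampled traces is an estimate: scaling it back up by the sample rate gives a rough total, not an exact one. The function names are hypothetical.

```python
DEFAULT_TRACES_PER_MINUTE = 100  # documented default per service; modeled here as a cap


def estimated_sample_rate(requests_per_minute: int,
                          traces_per_minute: int = DEFAULT_TRACES_PER_MINUTE) -> float:
    """Fraction of requests traced under the simplified cap model (an assumption)."""
    if requests_per_minute <= 0:
        return 1.0  # no traffic: nothing needs to be dropped
    return min(1.0, traces_per_minute / requests_per_minute)


def extrapolate_count(sampled_count: int, sample_rate: float) -> float:
    """Rough estimate of total occurrences from a count seen only in sampled traces."""
    if not 0.0 < sample_rate <= 1.0:
        raise ValueError("sample_rate must be in (0, 1]")
    return sampled_count / sample_rate


# Low-traffic staging service: 40 requests per minute, so every request is traced.
print(estimated_sample_rate(40))      # 1.0
# Busy production service: 20,000 requests per minute, so about 0.5% are traced.
print(estimated_sample_rate(20_000))  # 0.005
# 3 slow database calls seen in sampled traces suggest roughly 600 in total.
print(extrapolate_count(3, 0.005))
```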

Database metrics

Metric	Units	Description
`trace.service.outbound_calls.database.query.response_time`	ms	Duration of traced queries executed by the service to the database.

Cache metrics

Metric	Units	Description
`trace.service.outbound_calls.cache.op.hits`	Count	The count of successful retrievals from cache `get` or `multiget` operations. This is collected only by the PHP Library.
`trace.service.outbound_calls.cache.op.requests`	Count	Number of cache keys returned by the cache call. If the number of keys is not returned, every cache call is counted once.
`trace.service.outbound_calls.cache.op.response_time`	ms	Duration of traced cache calls executed by the service to the cache engine.
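
The counting rule for `trace.service.outbound_calls.cache.op.requests` can be sketched as follows: each traced cache call contributes the number of keys it returned, or 1 when no key count is available. The function name and input shape are hypothetical.

```python
from typing import Iterable, Optional


def cache_op_requests(keys_returned_per_call: Iterable[Optional[int]]) -> int:
    """Total of trace.service.outbound_calls.cache.op.requests for one period.

    Each element is the number of keys a traced cache call returned, or None
    when no key count was reported (that call then counts once).
    """
    return sum(1 if keys is None else keys for keys in keys_returned_per_call)


# A multiget returning 5 keys, a call with no key count, and a multiget returning 2 keys.
print(cache_op_requests([5, None, 2]))  # 8
```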

Remote service metrics

Metric	Units	Description
`trace.service.outbound_calls.remote_service.call.response_time`	ms	Duration of spans representing remote calls executed by the service to a remote endpoint or remote instrumented service.

Exception metrics

Metric	Units	Description
`trace.service.exceptions.count`	Count	Number of service exceptions captured in traces, that is, the total number of error events for traced requests. An event is classified as an error if any of the following is true: `exception.message` is set; an HTTP call returns a `5XX` status code; or `sw.event.type` is equal to `error`, `error_log`, or `php_error_cb`.
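
The error classification rules above can be expressed as a predicate over an event's attributes. This is a sketch: it assumes events are plain dictionaries and that an HTTP call's status code is available under `http.response.status_code` (an attribute named in the primary service metrics table). The actual event layout used by the APM libraries may differ, and the function names are hypothetical.

```python
from typing import Any, Iterable, Mapping

# sw.event.type values that mark an event as an error, per the table above.
ERROR_EVENT_TYPES = frozenset({"error", "error_log", "php_error_cb"})


def is_error_event(attributes: Mapping[str, Any]) -> bool:
    """Return True if an event meets any of the documented error conditions."""
    # Condition 1: exception.message is set.
    if attributes.get("exception.message") is not None:
        return True
    # Condition 2: an HTTP call returned a 5XX status code (attribute key is an assumption).
    status = attributes.get("http.response.status_code")
    if isinstance(status, int) and 500 <= status <= 599:
        return True
    # Condition 3: sw.event.type is one of the error event types.
    return attributes.get("sw.event.type") in ERROR_EVENT_TYPES


def exceptions_count(events: Iterable[Mapping[str, Any]]) -> int:
    """Count error events across traced requests."""
    return sum(1 for event in events if is_error_event(event))


events = [
    {"exception.message": "division by zero"},
    {"http.response.status_code": 503},
    {"http.response.status_code": 404},
    {"sw.event.type": "error_log"},
    {"sw.event.type": "info"},
]
print(exceptions_count(events))  # 3
```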

Other sampled metrics

Metric	Units	Description
`trace.service.breakdown.response_time`	Microseconds (μs)	Trace Response Time Breakdown, Average Trace Response Time Breakdown. The amount of time it takes to complete a service transaction, broken down by operation type (for example, application, database calls, or remote calls). The average trace response time breakdown is calculated from collected traces, so it covers only sampled traces. Use it to analyze how much each type of operation performed by the service contributes to the average response time.
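
One way to compute an average response time breakdown from sampled traces is to sum the time each trace spends in each operation type and divide by the number of traces. Dividing by all traces, rather than only the traces that contain a given operation type, makes the per-type averages add up to the average trace duration, so each type's share of the average is directly comparable. This is an illustrative assumption: the exact aggregation is not described here, and the operation type labels and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def average_breakdown(traces: List[Mapping[str, float]]) -> Dict[str, float]:
    """Average microseconds per operation type across sampled traces.

    Each trace maps an operation type to the time (μs) spent on it in that trace;
    a type missing from a trace counts as 0 for that trace.
    """
    if not traces:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for trace in traces:
        for op_type, micros in trace.items():
            totals[op_type] += micros
    return {op_type: total / len(traces) for op_type, total in totals.items()}


sampled = [
    {"application": 1200, "database": 800},  # 2,000 μs trace
    {"application": 1000, "remote": 2000},   # 3,000 μs trace
]
breakdown = average_breakdown(sampled)
print(breakdown)                # {'application': 1100.0, 'database': 400.0, 'remote': 1000.0}
print(sum(breakdown.values()))  # 2500.0, the average trace duration
```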

Runtime metrics

Many of the APM libraries also report language-specific runtime metrics that provide insight into memory, CPU, and other statistics about your application's execution environment. For example, you can use the JMX metrics reported by the Java Library to correlate application performance with JVM metrics such as garbage collection, heap size, and thread count.

By default, the libraries report runtime metrics, which are automatically detected and available in the Metrics Explorer.

See the following links for the metrics reported for each language runtime and for library-specific configuration:
