Documentation for SolarWinds Observability

Services metrics

Once installed and enabled, the SolarWinds Observability APM libraries automatically start reporting key performance metrics about your application, which are available in the Metrics Explorer. These standard metrics are collected for all requests and include response time and error rate for the service as a whole or for a particular transaction. The libraries also provide a set of trace-derived metrics for various types of outbound calls made by your application, such as the response time of database calls, which can be filtered by database host or query operation. These trace-derived metrics are based on a sampled subset of requests to a service. Some of the libraries also report language-specific runtime metrics about your application's execution environment.

Standard Metrics

The tables below list the default set of metrics that the APM library collects for all requests, along with the tags available for these standard metrics. Counts are reported every minute.

Metric Description Available Tags
trace.service.errors Service error count service
trace.service.error_rate Service error rate (%) service
trace.service.host.errors Host error count service,status
trace.service.host.requests Host request count service,status
trace.service.host.response_time Host response time service,status
trace.service.hosts.count Service host count service
trace.service.http_method.requests Service request count by HTTP method service,method
trace.service.http_method.response_time Service response time by HTTP method service,method
trace.service.http_status.requests Service request count by HTTP status service,status
trace.service.http_status.response_time Service response time by HTTP status service,status
trace.service.requests Service request count service
trace.service.response_time Service average response time service
trace.service.response_time.p50 Service response time, 50th percentile service
trace.service.response_time.p95 Service response time, 95th percentile service
trace.service.response_time.p99 Service response time, 99th percentile service
trace.service.response_time.p999 Service response time, 99.9th percentile service
trace.service.samplecount Count of sampled requests service,host_type
trace.service.tracecount Count of traces generated from requests service,host_type
trace.service.transaction.errors Transaction error count service,transaction
trace.service.transaction.http_method.requests Transaction request count by HTTP method service,transaction,method
trace.service.transaction.http_status.requests Transaction request count by HTTP status service,transaction,status
trace.service.transaction.requests Transaction request count service,transaction
trace.service.transaction.response_time Transaction average response time service,transaction
trace.service.transaction.response_time.p50 Transaction response time, 50th percentile service,transaction
trace.service.transaction.response_time.p95 Transaction response time, 95th percentile service,transaction
trace.service.transaction.response_time.p99 Transaction response time, 99th percentile service,transaction
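The standard metrics are plain aggregates over each one-minute reporting interval. As an illustration only, the following Python sketch reduces one minute of requests to a request count, an error count, an error rate, an average response time, and percentiles. The `Request` type and `summarize_minute` function are hypothetical, and the nearest-rank percentile method is an assumption; the APM libraries may compute percentiles differently, for example from histograms.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record of one finished request.
    service: str
    response_time_ms: float
    is_error: bool


def percentile(values, pct):
    """Nearest-rank percentile; returns None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize_minute(requests):
    """Reduce one minute of requests to service-level values."""
    times = [r.response_time_ms for r in requests]
    errors = sum(1 for r in requests if r.is_error)
    total = len(requests)
    return {
        "requests": total,
        "errors": errors,
        # Error rate is expressed as a percentage, as in trace.service.error_rate.
        "error_rate": 100.0 * errors / total if total else 0.0,
        "response_time": sum(times) / total if total else None,
        "response_time.p50": percentile(times, 50),
        "response_time.p95": percentile(times, 95),
        "response_time.p99": percentile(times, 99),
        "response_time.p999": percentile(times, 99.9),
    }


# Ten requests from 100 ms to 1000 ms, two of which failed.
batch = [Request("shop", float(ms), ms in (300, 700)) for ms in range(100, 1001, 100)]
print(summarize_minute(batch)["error_rate"])  # 20.0
```

With only ten requests, the 95th, 99th, and 99.9th percentiles all land on the slowest request, which is why high percentiles are most informative for busy services.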

Tags

Tag Name Description
host_type The type of the host, which is either server-based (PERSISTENT) or AWS Lambda (AWS_LAMBDA).
method HTTP request method
service Name of the service
status HTTP status code
transaction Name of the transaction

Sampled Trace-Derived Metrics

Trace-derived metrics are additional metrics extracted from the traces gathered by the APM library. These traces are gathered using adaptive sampling: in low-traffic environments every request is traced, but in high-traffic environments only a sampled subset of requests is traced. Unlike the standard metrics above, these metrics are therefore not guaranteed to reflect all activity in your applications.

SolarWinds Observability APM instrumentation gathers not only metrics on application performance but also high-fidelity distributed trace data. Each traced request, called a trace, contains a rich data set about the request's handling across all tiers of the application stack, including queries issued, exceptions raised, backtraces at relevant code execution points, and so on.

Under load, this instrumentation can add overhead to your application, so the SolarWinds Observability libraries use adaptive sampling to balance the quality of monitoring data against application performance.

In a low-traffic environment such as development or staging, typically every request is traced. In high-traffic production environments, however, sample rates below one percent are possible.
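The sampling algorithm itself is not described here. The following Python sketch shows the general idea of adaptive sampling under a simple assumption: the sample rate for the next interval is the ratio of a hypothetical target trace count to the number of requests seen in the previous interval, capped at 100%. The `AdaptiveSampler` class and its `target` parameter are illustrative and are not part of any SolarWinds library.

```python
import random


class AdaptiveSampler:
    """Hypothetical sampler aiming for roughly `target` traces per interval."""

    def __init__(self, target, rng=None):
        self.target = target
        self.rate = 1.0      # start by tracing every request
        self.seen = 0        # requests observed in the current interval
        self.rng = rng if rng is not None else random.Random()

    def should_trace(self):
        """Decide whether to trace the current request."""
        self.seen += 1
        return self.rng.random() < self.rate

    def end_interval(self):
        """Recompute the rate from the traffic seen in the interval just ended."""
        if self.seen:
            # Light traffic keeps the rate at 1.0; heavy traffic lowers it.
            self.rate = min(1.0, self.target / self.seen)
        self.seen = 0


sampler = AdaptiveSampler(target=100, rng=random.Random(42))
for _ in range(50_000):
    sampler.should_trace()
sampler.end_interval()
print(sampler.rate)  # 0.002
```

With a target of 100 traces per interval, an interval with 50,000 requests drops the rate to 0.2%, in line with the sub-one-percent rates mentioned above, while an environment with a few dozen requests per interval keeps tracing everything.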

Database Metrics

All of the database metric tags listed below are available for these metrics.

Metric Description
trace.service.outbound_calls.database.query.response_time_per_trace Service database query average response time per trace
trace.service.outbound_calls.database.query.response_time Service database query average response time
trace.service.outbound_calls.database.query.count_per_trace Service database query average count per trace

Database Metric Tags

Tag Name Description
service Name of the service
transaction Name of the transaction
database Name of the database
database_host Host of the database
query_op Query operation
query_table Database table
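The three database metrics differ in what they average over. Reading the metric names, a reasonable interpretation is that `response_time` averages over individual queries, while `response_time_per_trace` and `count_per_trace` average the total query time and the query count over sampled traces. The Python sketch below computes all three under that assumption; the `Query` type and `db_metrics` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Query:
    # Hypothetical record of one database call captured in a sampled trace.
    query_op: str
    duration_ms: float


def db_metrics(traces):
    """traces: one list of Query records per sampled trace."""
    queries = [q for trace in traces for q in trace]
    total_ms = sum(q.duration_ms for q in queries)
    n_traces = len(traces)
    return {
        # Average over individual queries.
        "response_time": total_ms / len(queries) if queries else 0.0,
        # Average total database time spent by one trace.
        "response_time_per_trace": total_ms / n_traces if n_traces else 0.0,
        # Average number of queries issued by one trace.
        "count_per_trace": len(queries) / n_traces if n_traces else 0.0,
    }


traces = [[Query("SELECT", 10.0), Query("SELECT", 30.0)], [Query("INSERT", 20.0)]]
print(db_metrics(traces))
# {'response_time': 20.0, 'response_time_per_trace': 30.0, 'count_per_trace': 1.5}
```

A transaction that issues many fast queries can therefore show a low `response_time` but a high `response_time_per_trace`, which is a typical sign of an N+1 query pattern.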

Cache Metrics

All of the cache metric tags listed below are available for these metrics.

Metric Description
trace.service.outbound_calls.cache.op.response_time_per_trace Service cache op average response time per trace
trace.service.outbound_calls.cache.op.response_time Service cache op average response time
trace.service.outbound_calls.cache.op.count_per_trace Service cache op average count per trace
trace.service.outbound_calls.cache.op.hit_rate Service cache op average hit rate

Cache Metric Tags

Tag Name Description
service Name of the service
cache Name of the cache
cache_host Host of the cache
cache_op Cache operation
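The hit rate formula is not specified here. A common definition is the percentage of lookups that found the requested key. The Python sketch below assumes that definition, ignores operations that have no hit or miss outcome (such as writes), and groups the result by the `cache_op` tag. `CacheCall` and `hit_rate_by_op` are hypothetical names, and whether the metric is reported as a percentage or a fraction is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheCall:
    # Hypothetical record of one cache operation captured in a sampled trace.
    cache_op: str
    hit: Optional[bool]  # None when hit/miss does not apply, e.g. a write


def hit_rate_by_op(calls):
    """Percentage of hits among calls that report a hit or miss, per cache_op."""
    hits = defaultdict(int)
    lookups = defaultdict(int)
    for call in calls:
        if call.hit is None:
            continue  # writes and similar operations do not affect the hit rate
        lookups[call.cache_op] += 1
        hits[call.cache_op] += int(call.hit)
    return {op: 100.0 * hits[op] / lookups[op] for op in lookups}


calls = [CacheCall("get", True), CacheCall("get", False), CacheCall("get", True),
         CacheCall("get", True), CacheCall("set", None)]
print(hit_rate_by_op(calls))  # {'get': 75.0}
```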

Remote Service Metrics

All of the remote service metric tags listed below are available for these metrics.

Metric Description
trace.service.outbound_calls.remote_service.call.response_time_per_trace Service remote call average response time per trace
trace.service.outbound_calls.remote_service.call.response_time Service remote call average response time
trace.service.outbound_calls.remote_service.call.count_per_trace Service remote call average count per trace

Remote Service Metric Tags

Tag Name Description
service Name of the service
remote_service Name of the remote service
remote_service_type Remote service type
remote_service_op Remote service operation
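Tags let you slice the remote service metrics along any combination of dimensions. The following Python sketch groups hypothetical remote call records by whichever tag names you pass and averages their response time per group. It only illustrates what grouping by tags means and does not reproduce the Metrics Explorer; the `RemoteCall` type, the `avg_response_time_by` function, and the sample tag values are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RemoteCall:
    # Hypothetical record of one outbound call captured in a sampled trace.
    remote_service: str
    remote_service_type: str
    remote_service_op: str
    duration_ms: float


def avg_response_time_by(calls, *tags):
    """Average call duration grouped by the given tag names."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for call in calls:
        key = tuple(getattr(call, tag) for tag in tags)
        totals[key] += call.duration_ms
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


calls = [
    RemoteCall("payments", "http", "POST /charge", 120.0),
    RemoteCall("payments", "http", "POST /charge", 80.0),
    RemoteCall("payments", "http", "GET /status", 10.0),
    RemoteCall("search", "grpc", "Query", 40.0),
]
print(avg_response_time_by(calls, "remote_service"))
# {('payments',): 70.0, ('search',): 40.0}
```

Grouping by `remote_service` alone hides the fact that the slow `POST /charge` calls average 100 ms while `GET /status` takes 10 ms; adding `remote_service_op` to the grouping separates them.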

Exception Metrics

Metric Description
trace.service.exceptions.count Service exceptions count

Exception Metric Tags

Tag Name Description
service Name of the service
transaction Name of the transaction
exception_class Name of the exception class
exception_class_message_hash A hash of the exception message
exception_class_message_backtrace_hash A hash of the exception backtrace
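Hashing the message and backtrace lets exceptions with the same text, or raised from the same call stack, share one tag value without storing the full text as a tag. The hash algorithm the libraries use is not documented here; the Python sketch below assumes a truncated SHA-256 digest, and the `exception_tags`, `fail`, and `capture` helpers are hypothetical.

```python
import hashlib
import traceback


def _digest(text):
    # Assumption: truncated SHA-256; the real hash function is not documented.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def exception_tags(exc):
    """Build hypothetical tag values for one caught exception."""
    backtrace = "".join(traceback.format_tb(exc.__traceback__))
    return {
        "exception_class": type(exc).__name__,
        "exception_class_message_hash": _digest(str(exc)),
        "exception_class_message_backtrace_hash": _digest(backtrace),
    }


def fail(message):
    raise ValueError(message)


def capture(message):
    # Raise from the same call site each time so the backtrace is identical.
    try:
        fail(message)
    except ValueError as exc:
        return exception_tags(exc)


a = capture("order 1 not found")
b = capture("order 2 not found")
print(a["exception_class_message_hash"] == b["exception_class_message_hash"])  # False
print(a["exception_class_message_backtrace_hash"] == b["exception_class_message_backtrace_hash"])  # True
```

Messages that embed IDs or timestamps produce many distinct message hashes, so the backtrace hash is often the more useful grouping key for the same underlying bug.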

Other Sampled Metrics

Metric Description
trace.service.critical_path.response_time_per_trace Service critical path average response time per trace

Other Sampled Metric Tags

Tag Name Description
service Name of the service
host Name of the host
transaction Name of the transaction
layer_name Name of the layer
layer_type Type of the layer
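How the critical path is computed is not documented here. One widely used definition walks backward from the end of the root span, repeatedly stepping into the child span that finished last, and charges any time not covered by a blocking child to the parent's layer. The Python sketch below implements that definition for a single trace and groups the time by `layer_name`. The `Span` type and `critical_path_by_layer` function are hypothetical, and the libraries may define the critical path differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical span record; times are in milliseconds.
    span_id: str
    parent_id: Optional[str]
    layer_name: str
    start: float
    end: float


def critical_path_by_layer(spans):
    """Split the root span's duration among layers on the critical path."""
    roots = [s for s in spans if s.parent_id is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root span")
    children = defaultdict(list)
    for s in spans:
        if s.parent_id is not None:
            children[s.parent_id].append(s)
    totals = defaultdict(float)

    def walk(span):
        cursor = span.end
        # Step backward through children, latest-finishing first.
        for child in sorted(children[span.span_id], key=lambda c: c.end, reverse=True):
            if child.end > cursor:
                continue  # overlaps a child already on the path, so it is not blocking
            totals[span.layer_name] += cursor - child.end  # parent time after the child returned
            walk(child)
            cursor = child.start
        totals[span.layer_name] += cursor - span.start  # parent time before its first blocking child

    walk(roots[0])
    return dict(totals)


trace = [
    Span("1", None, "app", 0.0, 100.0),
    Span("2", "1", "db", 10.0, 40.0),
    Span("3", "1", "http", 30.0, 90.0),
]
print(critical_path_by_layer(trace))  # {'app': 40.0, 'http': 60.0}
```

The `db` layer gets no critical-path time in this example: the `http` call started before the database call finished and kept the request waiting until 90 ms, so shortening the database call would not have made this request faster. Averaging such per-trace values over the sampled traces would give a per-trace average like the one this metric reports.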

Runtime Metrics

Many of the APM libraries also report language-specific runtime metrics, giving you insight into memory, CPU, and other statistics about your application's execution environment. For example, you can use the JMX metrics reported by the Java library to correlate application performance with JVM metrics such as garbage collection, heap size, and thread count.

The libraries report runtime metrics by default, and these metrics are automatically detected and made available in the Metrics Explorer.

For the runtime metrics reported for each language and any library-specific configuration, see the documentation for the corresponding APM library.

Runtime Metric Tags

Runtime metrics, regardless of library, have the following tags:

Tag Name Description
hostname Name of the host
service Name of the service
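Each library reports its own runtime metric names, so the following is only an illustration of the shape of the data: a Python sketch that snapshots a few interpreter statistics with the standard library and attaches the `hostname` and `service` tags listed above. The metric names and the `collect_runtime_metrics` function are hypothetical and do not correspond to any SolarWinds library's actual runtime metrics.

```python
import gc
import socket
import threading
import time


def collect_runtime_metrics(service):
    """Snapshot a few Python runtime statistics, tagged with hostname and service."""
    tags = {"hostname": socket.gethostname(), "service": service}
    values = {
        # Hypothetical metric names chosen for this example only.
        "runtime.python.gc.collections": sum(s["collections"] for s in gc.get_stats()),
        "runtime.python.gc.tracked_objects": len(gc.get_objects()),
        "runtime.python.threads": threading.active_count(),
        "runtime.python.cpu_time_s": time.process_time(),
    }
    return [{"name": name, "value": value, "tags": dict(tags)} for name, value in values.items()]


for point in collect_runtime_metrics("shop"):
    print(point["name"], point["value"], point["tags"])
```

Because every point carries the same `hostname` and `service` tags as the trace metrics, runtime statistics can be lined up against response time for the same service and host.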