Services metrics
Once installed and enabled, the SolarWinds Observability APM libraries automatically start reporting key performance metrics about your application, available in the Metrics Explorer. These standard metrics are collected for all requests and include response time and error rate for the service as a whole or for a particular transaction. The libraries also provide a set of trace-derived metrics for various types of outbound calls made by your application, such as response time for database calls, which can be filtered by database host or query operation. These trace-derived metrics are based on a sampled subset of requests to a service. Some of the libraries also report language-specific runtime metrics about the execution environment of your application.
Many of the collected metrics from service entities are displayed as widgets in SolarWinds Observability explorers; additional metrics may be collected and available in the Metrics Explorer. You can also create an alert for when an entity's metric value moves out of a specific range. See Entities in SolarWinds Observability for information about entity types in SolarWinds Observability.
The following tables list the metrics collected for service entities. To find these metrics in the Metrics Explorer, type trace. in the search box.
Standard metrics
The table below lists the default set of metrics collected by the APM library for all requests. Counts are reported every minute.
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore | Percent (%) | Health score. A health score provides real-time insight into the overall health and performance of your monitored entities. The health score is calculated based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health score is displayed as a single numerical value that falls into one of three bands: Good (70-100), Moderate (40-69), or Bad (0-39). To view the health score for service entities in the Metrics Explorer, filter the |
trace.service.breakdown.response_time | | Trace Response Time Breakdown, Average Trace Response Time Breakdown. The amount of time it takes to complete a service transaction, broken down by operation type (for example, application, database calls, or remote calls). The average trace response time breakdown is calculated from collected traces, so it covers only sampled traces. It can be used to analyze how different types of operations performed by the service affect the average response time. |
trace.service.errors | Count | Service error count. |
trace.service.error_rate | Percent (%) | Service error rate (%). |
trace.service.error_ratio | | Ratio of errors to requests, calculated by dividing the number of requests with errors by the total number of requests. |
trace.service.host.response_time | seconds (s) | Host response time. |
trace.service.hosts.count | Count | Service host count. |
trace.service.instance.count | Count | Number of service instances that were reporting data in the selected time period. |
trace.service.requests | Count | Service request count, broken down by HTTP status code (200, 404, etc.). |
trace.service.request_rate | | Rate of requests per second, calculated by dividing the number of requests by the length of the aggregation period in seconds. |
trace.service.response_time | seconds (s) | The time between a request being made and its result being returned. |
trace.service.samplecount | Count | Count of sampled requests. |
trace.service.tracecount | Count | Count of traces generated from requests. |
trace.service.transaction.count | Count | Number of transactions that were reporting data in the selected time period. |
trace.service.transaction.http_method.requests | Count | Transaction HTTP method count. |
trace.service.transaction.http_status.requests | Count | Transaction HTTP status count. |
trace.service.transaction.requests | Count | Transaction request count. |
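The arithmetic behind a few of the descriptions above can be sketched in Python. This is only an illustration of the stated formulas, not the library's implementation: the function names are hypothetical, and returning 0.0 when there are no requests is an assumption made here to avoid dividing by zero.

```python
def error_ratio(error_requests: int, total_requests: int) -> float:
    """Requests with errors divided by total requests."""
    if total_requests == 0:
        return 0.0  # assumption: report 0.0 when there is no traffic
    return error_requests / total_requests


def request_rate(total_requests: int, period_seconds: float) -> float:
    """Requests per second over one aggregation period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return total_requests / period_seconds


def health_category(score: float) -> str:
    """Map a 0-100 health score to the Good/Moderate/Bad bands."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 70:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Bad"


if __name__ == "__main__":
    # Hypothetical one-minute window: 1,200 requests, 30 of them with errors.
    print(error_ratio(30, 1200))   # 0.025
    print(request_rate(1200, 60))  # 20.0
    print(health_category(85))     # Good
```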
Sampled trace-derived metrics
Trace-derived metrics are additional metrics pulled from the traces gathered by the APM library. These metrics are gathered using adaptive sampling: in low-traffic environments every request is traced, but in high-traffic environments only a sampled subset of requests is used. Unlike the standard metrics above, these are not guaranteed to reflect all activity in your applications.
SolarWinds Observability APM instrumentation gathers not only metrics on application performance but also high-fidelity distributed trace data. Each traced request, called a trace, contains a rich data set about the request's handling across all tiers of the application stack, including queries issued, exceptions raised, backtraces at relevant code execution points, and so on.
This instrumentation can incur application overhead under load, so SolarWinds Observability libraries use adaptive sampling to balance the quality of monitoring data against application performance.
In a low-traffic environment like development or staging, typically every request will be traced. However, in high-traffic production environments, it is possible to see sample rates of less than one percent.
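To see why the sample rate falls as traffic grows, consider a minimal sketch of a sampler with a fixed per-second trace budget. This is not the algorithm the SolarWinds Observability libraries use, which is not described here; the class, the function, and the budget of 10 traces per second are all hypothetical and chosen only to illustrate the effect.

```python
class RateLimitedSampler:
    """Trace at most `max_traces_per_second` requests in each one-second window.

    Illustrative only: a fixed budget, not an actual adaptive algorithm.
    """

    def __init__(self, max_traces_per_second: int) -> None:
        self.max_traces_per_second = max_traces_per_second
        self._window = None  # the one-second window currently being counted
        self._traced = 0     # traces started in that window

    def should_trace(self, now: float) -> bool:
        window = int(now)
        if window != self._window:
            # A new second has started, so the budget resets.
            self._window = window
            self._traced = 0
        if self._traced < self.max_traces_per_second:
            self._traced += 1
            return True
        return False


def sample_rate(requests_per_second: int, limit: int = 10) -> float:
    """Fraction of requests traced during one simulated second of even traffic."""
    sampler = RateLimitedSampler(limit)
    traced = sum(
        sampler.should_trace(i / requests_per_second)
        for i in range(requests_per_second)
    )
    return traced / requests_per_second


if __name__ == "__main__":
    print(sample_rate(5))     # 1.0   (low traffic: every request traced)
    print(sample_rate(2000))  # 0.005 (high traffic: under one percent)
```

With the same budget, a service handling 5 requests per second has every request traced, while one handling 2,000 requests per second has 0.5% of its requests traced.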
Database metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.database.query.response_time_per_trace | seconds (s) | Service database query average response time per trace. |
trace.service.outbound_calls.database.query.response_time | seconds (s) | Service database query average response time. |
trace.service.outbound_calls.database.query.count_per_trace | Count | Service database query average count per trace. |
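The difference between the per-query and per-trace averages is easiest to see with numbers. The sketch below shows one plausible reading of the three descriptions: it assumes that "per trace" means the total across all queries in a trace, averaged over the sampled traces. The data and function names are hypothetical, and the actual aggregation performed by SolarWinds Observability may differ.

```python
from statistics import mean

# Hypothetical sampled traces: each inner list holds the durations (seconds)
# of the database queries issued while handling one traced request.
traces = [
    [0.010, 0.030],
    [0.020],
    [0.005, 0.005, 0.050],
]


def query_response_time(traces):
    """Average duration of a single query across all sampled traces."""
    return mean(duration for trace in traces for duration in trace)


def query_response_time_per_trace(traces):
    """Average total time spent in database queries per trace (assumed meaning)."""
    return mean(sum(trace) for trace in traces)


def query_count_per_trace(traces):
    """Average number of database queries per trace."""
    return mean(len(trace) for trace in traces)


if __name__ == "__main__":
    print(query_response_time(traces))
    print(query_response_time_per_trace(traces))
    print(query_count_per_trace(traces))
```

Under this reading, a service can have fast individual queries but a high per-trace response time if each request issues many queries.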
Cache metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.cache.op.response_time_per_trace | seconds (s) | Service cache op average response time per trace. |
trace.service.outbound_calls.cache.op.response_time | seconds (s) | Service cache op average response time. |
trace.service.outbound_calls.cache.op.count_per_trace | Count | Service cache op average count per trace. |
trace.service.outbound_calls.cache.op.hit_rate | | Service cache op average hit rate. |
Remote service metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.remote_service.call.response_time_per_trace | seconds (s) | Service remote call average response time per trace. |
trace.service.outbound_calls.remote_service.call.response_time | seconds (s) | Service remote call average response time. |
trace.service.outbound_calls.remote_service.call.count_per_trace | Count | Service remote call average count per trace. |
Exception metrics
Metric | Units | Description |
---|---|---|
trace.service.exceptions.count | Count | Service exceptions count. |
Other sampled metrics
Metric | Units | Description |
---|---|---|
trace.service.critical_path.response_time_per_trace | seconds (s) | Service critical path average response time. |
Runtime metrics
Many of the APM libraries also report language-specific runtime metrics, giving you insight into memory, CPU, and other statistics about your application's execution environment. For example, you can use the JMX metrics reported by the Java Library to correlate application performance with JVM metrics such as garbage collection, heap size, and thread count.
By default, the libraries report runtime metrics. These runtime metrics are automatically detected and available in the Metrics Explorer.
See the links below on the metrics for each language runtime and library-specific configuration: