Documentation for SolarWinds Observability SaaS

Metrics for SolarWinds Observability SaaS entities

Many of the collected metrics from SolarWinds Observability entities are displayed as widgets in SolarWinds Observability explorers; additional metrics may be collected and available in the Metrics Explorer. You can also create an alert for when an entity's metric value moves out of a specific range. See Entities in SolarWinds Observability SaaS for information about entity types in SolarWinds Observability SaaS.

Common metrics

The following metric(s) are available for all entities in SolarWinds Observability SaaS.

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. A health score provides real-time insight into the overall health and performance of your monitored entities. The health score is calculated based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health score is displayed as a single numerical value from 0 to 100 that is categorized as Good (70-100), Moderate (40-69), or Bad (0-39).

To view the health score separately for each specific entity type in the Metrics Explorer, group the sw.metrics.healthscore metric by entity_types.
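The score ranges above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `health_category` is hypothetical, and treating fractional scores between the listed integer ranges (for example, 69.5) as belonging to the lower category is an assumption, not documented behavior.

```python
def health_category(score: float) -> str:
    """Return the category for a health score between 0 and 100.

    Ranges follow the description of sw.metrics.healthscore:
    Good (70-100), Moderate (40-69), Bad (0-39).
    """
    if not 0 <= score <= 100:
        raise ValueError("health score must be between 0 and 100")
    if score >= 70:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Bad"


if __name__ == "__main__":
    for value in (85, 55, 12):
        print(value, health_category(value))
```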

APM/service metrics

Metrics for service entities are sent by APM libraries installed and configured to monitor your service. See Application performance monitoring (APM) for more information.

Standard metrics

Metric Units Description
trace.service.breakdown.response_time  

Trace Response Time Breakdown, Average Trace Response Time Breakdown. The amount of time it takes to complete a service transaction, broken down by operation type (for example, application, database calls, or remote calls). The average trace response time breakdown is calculated based on collected traces, so it covers only sampled traces. It can be used to analyze the impact that different types of operations performed by the service have on the average response time.

trace.service.count Count Number of services that were reporting data during the selected time period.
trace.service.errors Count Count of requests that ended with an error status. Aggregate by Sum to see the total error count for the service.
trace.service.error_ratio % Ratio of errors to requests, calculated by dividing the number of requests with errors by the total number of requests.
trace.service.exceptions.count Count

Total number of error events for traced requests. An event is classified as an error if:

  • exception.message is set
  • An HTTP call returns a 5XX status code
  • sw.event.type is equal to error, error_log, or php_error_cb
trace.service.faas.count Count Number of AWS Lambda functions for which APM Services were reporting data during the selected time period.
trace.service.faas.instance.count Count Number of AWS Lambda instances for which APM Services were reporting data during the selected time period.
trace.service.hosts.count Count

Number of APM Hosts for which APM Services were reporting data during the selected time period.

Unique APM Host is captured only for Azure VMs, AWS EC2 Instances, and hosts monitored with UAMS.

trace.service.instance.count Count Number of service instances that were reporting data during the selected time period.
trace.service.pod.count Count Number of Kubernetes Pods for which APM Services were reporting data during the selected time period.
trace.service.requests Count Count of requests for each HTTP status code (200, 404, etc.). Aggregate by Sum to see the total request count for the service.
trace.service.request_rate Count Rate of requests per second, calculated by dividing the number of requests (trace.service.requests) by the length of the aggregation period in seconds.
trace.service.response_time ms Duration of each entry span for the service, typically meaning the time taken to process an inbound request.
trace.service.samplecount Count Count of requests that went through a sampling decision. This excludes requests with a valid upstream decision and trigger trace requests.
trace.service.service_response_time ms Duration of each entry span for the service, typically meaning the time taken to process an inbound request.
trace.service.service_response_time.p50
trace.service.service_response_time.p95
trace.service.service_response_time.p99
trace.service.service_response_time.p999
ms Percentile values for the trace.service.service_response_time metric.
trace.service.tracecount Count Count of traces generated from requests.
trace.service.transaction.count Count Number of transactions that were reporting data during the selected time period.
trace.service.transaction_response_time ms Duration of each entry span for the service, typically meaning the time taken to process an inbound request.
trace.service.service_response_time.p50
trace.service.service_response_time.p95
trace.service.service_response_time.p99
trace.service.service_response_time.p999
ms Percentile values for the trace.service.service_response_time metric.
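The descriptions of trace.service.error_ratio and trace.service.request_rate state how each value is derived from the request and error counts. The Python sketch below restates that arithmetic. The function names are hypothetical, the ratio is returned as a fraction (multiply by 100 for a percentage), and returning 0.0 when there are no requests is an assumption made here to avoid division by zero.

```python
def error_ratio(error_count: int, request_count: int) -> float:
    """Requests with errors divided by the total number of requests."""
    if request_count == 0:
        return 0.0  # assumption: no requests means no errors to report
    return error_count / request_count


def request_rate(request_count: int, period_seconds: float) -> float:
    """Requests per second over an aggregation period of the given length."""
    if period_seconds <= 0:
        raise ValueError("aggregation period must be longer than zero seconds")
    return request_count / period_seconds


if __name__ == "__main__":
    # 5 failed requests out of 200 during a 60-second aggregation period.
    print(error_ratio(5, 200))
    print(request_rate(200, 60))
```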

Sampled trace-derived database metrics

Metric Units Description
trace.service.outbound_calls.database.query.response_time ms Duration of traced queries executed by the service to the database.

Sampled trace-derived cache metrics

Metric Units Description
trace.service.outbound_calls.cache.op.hits Count

The count of successful retrievals from cache get or multiget operations.

trace.service.outbound_calls.cache.op.requests Count Number of cache keys returned by the cache call. If the number of keys is not returned, every cache call is counted once.
trace.service.outbound_calls.cache.op.response_time ms Duration of traced cache calls executed by the service to the cache engine.

Sampled trace-derived remote service metrics

Metric Units Description
trace.service.outbound_calls.remote_service.call.response_time ms Duration of spans representing remote calls executed by the service to a remote endpoint or remote instrumented service.

Sampled trace-derived exception metrics

Metric Units Description
trace.service.exceptions.count Count Service exceptions count.

Other sampled trace-derived metrics

Metric Units Description
trace.service.critical_path.response_time_per_trace seconds (s) Service critical path average response time.

Runtime metrics

See the links below on the metrics for each language runtime and library-specific configuration:

Database metrics

Metrics for database instance entities are sent by the SolarWinds Observability Agent monitoring your databases. See Database monitoring for more information.

Metric Units Description
dbo.host.queries.errors.tput EPS

Errors, Error Rate. The number of recorded errors for your database instances per second; the total number of errors returned per second across your monitored databases. Incorrect database responses may indicate that requests are failing, even while throughput and response time appear healthy.

dbo.host.queries.latency_us milliseconds (ms)

Response Time. The amount of query latency in milliseconds per query execution across your monitored databases. May be displayed as:

  • Average Response Time. An average of the query latency per query execution for all monitored databases during the selected time period
dbo.host.queries.p99_latency_us milliseconds (ms)

Response Time 99th percentile. The 99th percentile response time for each of the top selected queries.

dbo.host.queries.time_us Count

Load. The load on your monitored databases, as a number of requests executing simultaneously. Concurrency reveals load (or demand) in a way that is orthogonal to variations in request speed or frequency.

dbo.host.queries.tput QPS

Throughput. The number of queries or statements completed per second. This is a metric of traffic intensity and frequency, showing how many requests your servers are processing.

Digital Experience/website metrics

Metrics for website entities are either collected by probes that synthetically test your website's availability, or sent by the RUM script added to your website. See Digital experience monitoring.

Synthetic availability metrics

Metric Units Description
synthetics.attempts Count Value representing the sum of all (successful and unsuccessful) page loads for a selected time period. Used to calculate the success and error rate.
synthetics.availability Percent (%)

Overall Availability, Availability History. Indicates whether a website is available or unavailable.

Found in: Metrics Explorer, Entity Explorer (Availability tab)

synthetics.error_rate Percent (%)

Error Rate. Percentage of the tests that ran during the specified time period and failed. A test fails if an error prevents the website(s) from loading.

Found in: Metrics Explorer, Entity Explorer, DEM area overview

synthetics.errors Count Value representing the sum of unsuccessful page loads for a selected time period. Used to calculate the error rate.
synthetics.http.response.time milliseconds (ms)

HTTP Response Time, Average HTTP Response Time History. Time required to perform an HTTP GET command to retrieve the webpage(s) during the specified time period.

HTTP communications between SolarWinds Observability SaaS and configured entities are not encrypted.

May be displayed as:

  • Average HTTP Response Time. Average time required to perform an HTTP GET command to retrieve the webpage(s) during the specified time period.

  • Average Response Time History. The average HTTP response times during the specified time period. Use this chart, for example, to identify time periods when response time is typically higher than usual.

Found in: Metrics Explorer, Entity Explorer, Inspector Panel.

synthetics.https.certificates.days_to_certificate_expiration Days The number of days between today and the date the website's SSL/TLS certificate expires.
synthetics.https.response.time milliseconds (ms) HTTPS Response Time, Average HTTPS Response Time History. Time required to perform an HTTPS GET command to retrieve the webpage(s).

HTTPS communications are encrypted using Transport Layer Security (TLS).

May be displayed as:

  • Average HTTPS Response Time. Average time required to perform an HTTPS GET command to retrieve the webpage(s) during the specified time period.

  • Average Response Time History. The average HTTPS response times during the specified time period. Use this chart, for example, to identify time periods when response time is typically higher than usual.

Found in: Metrics Explorer, Entity Explorer, Inspector Panel.

synthetics.status Boolean Status of a test result, where 0 indicates the website is unavailable and 1 indicates the website is available.
synthetics.success_rate Percent (%)

Success Rate. Percentage of the tests that ran during the specified time period and were successful. A test is successful if SolarWinds Observability SaaS is able to load the website(s), and it fails if an error prevents the website(s) from loading. An average of this metric is used to include availability in the health score.

Found in: Metrics Explorer, Entity Explorer, DEM area overview

synthetics.successes Count Value representing the sum of successful page loads for a selected time period. Used to calculate the success rate.
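The table above notes that synthetics.attempts, synthetics.successes, and synthetics.errors are used to calculate the success and error rates. A minimal Python sketch of that relationship follows. The exact formula is an assumption (each rate is taken as the share of attempts, expressed as a percentage), and the function names are hypothetical.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Percentage of page-load attempts that succeeded."""
    if attempts == 0:
        raise ValueError("no attempts were recorded in the selected time period")
    return 100.0 * successes / attempts


def error_rate(errors: int, attempts: int) -> float:
    """Percentage of page-load attempts that failed."""
    if attempts == 0:
        raise ValueError("no attempts were recorded in the selected time period")
    return 100.0 * errors / attempts


if __name__ == "__main__":
    # 50 attempts in the selected time period: 45 successful, 5 failed.
    print(success_rate(45, 50), error_rate(5, 50))
```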

Synthetic transaction metrics

Metric Units Description
synthetics.transaction.attempts Count The number of attempted executions of your synthetic transaction for the selected time period.
synthetics.transaction.duration Seconds (s)

Historical Overview. The amount of time in seconds that it took your synthetic transaction to complete its execution.

May be displayed as:

  • Average Test Duration. Average time it took for your synthetic transaction to complete its execution.

synthetics.transaction.error_rate Percentage (%) Test Success Rate. Value representing the percentage of failed transaction attempts for the selected time period.
synthetics.transaction.errors Count Test Success Rate. Value representing the sum of failed transaction attempts for the selected time period. Used to calculate the Synthetic transaction error rate.
synthetics.transaction.success_rate Percentage (%) Test Success Rate. Value representing the percentage of successful transaction attempts for the selected time period.
synthetics.transaction.successes Count Test Success Rate. Value representing the sum of successful transaction attempts for the selected time period. Used to calculate the Synthetic transaction success rate.

RUM metrics

Metric Units Description
rum.pageview.apdex_score  

Apdex score. A measurement of user satisfaction, using the Application Performance Index standard to specify the degree to which measured performance meets user expectations. The satisfied, tolerating, and frustrated load times are defined when creating the website entity. For more information about the Apdex standard, see Defining the Application Performance Index.

If the response time for a request is less than the satisfied load time threshold set for your website, the request counts as a satisfied load time. It is a tolerating load time if the response time takes up to four times the satisfied load time threshold, and a frustrated load time if it takes longer than four times the satisfied load time threshold.

rum.pageview.client_processing seconds (s) Client Processing Time. Measurement of the time from when the browser sends the initial HTTP request until all synchronous load events have been processed, including layout and running scripts.
rum.pageview.count Count PageViews. Count of the views of your webpage(s).
rum.pageview.load_time seconds (s) Load Time. The amount of time for the website to fully load.
rum.pageview.ttfb seconds (s) Time to First Byte. The amount of time between when the browser requested a page and when it received the first byte of information from the server.
rum.web_vitals.largest_contentful_paint seconds (s)

Largest Contentful Paint. A measurement of how quickly the largest image or text content of a web page is loaded.

Largest contentful paint time is considered good if loading the largest image or text block takes less than 2.5 seconds, needs improvement if it takes up to 4.0 seconds, and poor if it takes longer than 4.0 seconds.

rum.web_vitals.cumulative_layout_shift  

Cumulative Layout Shift. Measures how much a webpage shifts unexpectedly while a user is viewing the webpage. A shift may occur if content loads at different speeds or if elements are added to the website dynamically.

A cumulative layout shift value of less than 0.1 is considered good, a value up to 0.25 needs improvement, and a value greater than 0.25 is poor.

rum.web_vitals.first_input_delay seconds (s)

First Input Delay. Time from when a user first interacts with your site to the time when the browser is able to respond to the interaction. First input delay (FID) helps measure the first impression a user has of your site's responsiveness.

The FID is considered good if responding to a customer’s first interaction with the site takes less than 100 ms, needs improvement if it takes up to 300 ms, and poor if it takes longer than 300 ms.

rum.session.count   Sessions, Top 10 countries by session. The total number of sessions, or visits, to the website during the selected time period and by country. A single session includes every action that the user takes during the entirety of their visit to the website.
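The Apdex and web vitals thresholds described in the table above can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: all function and constant names are hypothetical, the score formula is the one defined by the Apdex standard (satisfied samples plus half of the tolerating samples, divided by all samples), and the first input delay thresholds are converted from milliseconds to seconds to match the unit listed for the metric.

```python
from typing import Iterable


def apdex_category(load_time: float, satisfied_threshold: float) -> str:
    """Classify one load time against the satisfied load time threshold."""
    if load_time < satisfied_threshold:
        return "satisfied"
    if load_time <= 4 * satisfied_threshold:
        return "tolerating"
    return "frustrated"


def apdex_score(load_times: Iterable[float], satisfied_threshold: float) -> float:
    """Standard Apdex formula: (satisfied + tolerating / 2) / total samples."""
    categories = [apdex_category(t, satisfied_threshold) for t in load_times]
    if not categories:
        raise ValueError("at least one load time is required")
    satisfied = categories.count("satisfied")
    tolerating = categories.count("tolerating")
    return (satisfied + tolerating / 2) / len(categories)


# (good below, poor above) for each web vitals metric; times are in seconds.
WEB_VITAL_THRESHOLDS = {
    "largest_contentful_paint": (2.5, 4.0),
    "cumulative_layout_shift": (0.1, 0.25),
    "first_input_delay": (0.1, 0.3),  # 100 ms and 300 ms
}


def rate_web_vital(name: str, value: float) -> str:
    """Return 'good', 'needs improvement', or 'poor' for a web vitals value."""
    good_below, poor_above = WEB_VITAL_THRESHOLDS[name]
    if value < good_below:
        return "good"
    if value <= poor_above:
        return "needs improvement"
    return "poor"


if __name__ == "__main__":
    # Four page views measured against a satisfied threshold of 2 seconds.
    print(apdex_score([1.0, 3.0, 9.0, 1.5], satisfied_threshold=2.0))
    print(rate_web_vital("largest_contentful_paint", 3.2))
```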

Infrastructure/self-managed host metrics

Metrics for self-managed host entities are sent by the SolarWinds Observability Agent monitoring your host. See Host monitoring for more information.

Metric Units Description
system.cpu.utilization Percent (%)

CPU time broken down by state, as a percentage.

system.cpu.utilization.aggregated Percent (%)

CPU Utilization. The average amount of CPU capacity in use, as a percentage.

system.memory.utilization.aggregated Percent (%) Memory Utilization. The average amount of memory in use, as a percentage.
system.filesystem.usage GB (Gigabytes)

The average amount of used space on each drive over time, in Gigabytes.

system.filesystem.utilization Percent (%)

Disk Utilization. The average amount of used space on each drive over time, as a percentage.

system.disk.operations.read.aggregated.rate   Disk Read Operations. The average number of read operations performed on a disk per second.
system.disk.operations.write.aggregated.rate   Disk Write Operations. The average number of write operations performed on a disk per second.
system.network.io.receive.aggregated.rate Binary Bytes Network In. The average amount of data received over the network, in bytes. System metrics report bytes per second.
system.network.io.transmit.aggregated.rate Binary Bytes Network Out. The average amount of data sent over the network, in bytes. System metrics report bytes per second.

Infrastructure/AWS metrics

Metrics for AWS entities are collected by integrating SolarWinds Observability SaaS with your AWS cloud account. See AWS cloud platform monitoring.

API Gateway

Metric Units Description
AWS.ApiGateway.4XXError Count

4XXError. The total number of client-side errors for REST APIs captured in a given period.

AWS.ApiGateway.4xx Count

4xx. The total number of client-side errors for HTTP APIs captured in a given period.

AWS.ApiGateway.5XXError Count

5XXError. The total number of server-side errors for REST APIs captured in a given period.

AWS.ApiGateway.5xx Count

5xx. The total number of server-side errors for HTTP APIs captured in a given period.

AWS.ApiGateway.CacheHitCount Count

CacheHitCount. The total number of requests served from the API cache in a given period.

AWS.ApiGateway.CacheMissCount Count

CacheMissCount. The total number of requests served from the backend in a given period, when API caching is enabled.

AWS.ApiGateway.ClientError Count

ClientError. The total number of requests that have a 4XX response returned by API Gateway before the integration is invoked.

AWS.ApiGateway.ConnectCount Count

ConnectCount. The total number of messages sent to the connect route integration.

AWS.ApiGateway.Count Count

Count. The total number of API requests in a given period.

AWS.ApiGateway.DataProcessed bytes

DataProcessed. The total amount of data processed in bytes.

AWS.ApiGateway.ExecutionError Count

ExecutionError. The total number of errors that occurred when calling the integration.

AWS.ApiGateway.IntegrationError Count

IntegrationError. The total number of requests that return a 4XX or 5XX response from the integration.

AWS.ApiGateway.IntegrationLatency milliseconds (ms)

IntegrationLatency. The average time between when API Gateway relays a request to the backend and when it receives a response from the backend.

AWS.ApiGateway.Latency milliseconds (ms)

Latency. The average time between when API Gateway receives a request from a client and when it returns a response to the client.

AWS.ApiGateway.MessageCount Count

MessageCount. The total number of messages sent to the WebSocket API, either from or to the client.

Application ELB

Metric Units Description
AWS.ApplicationELB.ActiveConnectionCount Count

ActiveConnectionCount. The total number of concurrent TCP connections active from clients to the load balancer and from the load balancer to targets.

AWS.ApplicationELB.ConsumedLCUs Count

ConsumedLCUs. The total number of load balancer capacity units (LCU) used by the load balancer.

AWS.ApplicationELB.HTTPCode_ELB_4XX_Count Count

HTTPCode_ELB_4XX_Count. The total number of HTTP 4XX client error codes that originate from the load balancer.

AWS.ApplicationELB.HTTPCode_ELB_5XX_Count Count

HTTPCode_ELB_5XX_Count. The total number of HTTP 5XX server error codes that originate from the load balancer.

AWS.ApplicationELB.HTTPCode_Target_4XX_Count Count

HTTPCode_Target_4XX_Count. The total number of HTTP responses with 4xx status codes generated by the targets. This does not include any response codes generated by the load balancer.

AWS.ApplicationELB.HTTPCode_Target_5XX_Count Count

HTTPCode_Target_5XX_Count. The total number of HTTP responses with 5xx status codes generated by the targets. This does not include any response codes generated by the load balancer.

AWS.ApplicationELB.HealthyHostCount Count

HealthyHostCount. The average number of targets that are considered healthy.

AWS.ApplicationELB.NewConnectionCount Count

NewConnectionCount. The total number of new TCP connections established from clients to the load balancer and from the load balancer to targets.

AWS.ApplicationELB.ProcessedBytes bytes

ProcessedBytes. The total number of bytes processed by the load balancer over IPv4 and IPv6 (HTTP header and HTTP payload).

AWS.ApplicationELB.RejectedConnectionCount Count

RejectedConnectionCount. The total number of connections that were rejected because the load balancer had reached its maximum number of connections.

AWS.ApplicationELB.RequestCount Count

RequestCount. The total number of requests processed over IPv4 and IPv6. This metric is only incremented for requests where the load balancer node was able to choose a target.

AWS.ApplicationELB.RequestCountPerTarget Count

RequestCountPerTarget. The total number of requests received by each target in a target group.

AWS.ApplicationELB.TargetConnectionErrorCount Count

TargetConnectionErrorCount. The total number of connections that were not successfully established between the load balancer and target. This metric does not apply if the target is a Lambda function.

AWS.ApplicationELB.TargetResponseTime s

TargetResponseTime. The average time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received.

AWS.ApplicationELB.UnHealthyHostCount Count

UnHealthyHostCount. The average number of targets that are considered unhealthy.

Aurora Cluster

Metric Units Description
AWS.RDS.AuroraGlobalDBReplicationLag

AuroraGlobalDBReplicationLag. The total amount of lag when replicating updates from the primary AWS region.

AWS.RDS.AuroraVolumeBytesLeftTotal

AuroraVolumeBytesLeftTotal. The total available space for the cluster volume.

AWS.RDS.BacktrackChangeRecordsCreationRate

BacktrackChangeRecordsCreationRate. The total number of backtrack change records created over five minutes for the DB cluster.

AWS.RDS.BacktrackChangeRecordsStored

BacktrackChangeRecordsStored. The total number of backtrack change records used by the DB cluster.

AWS.RDS.ServerlessDatabaseCapacity

ServerlessDatabaseCapacity. The total current capacity of an Aurora Serverless DB cluster.

AWS.RDS.SnapshotStorageUsed

SnapshotStorageUsed. The total amount of backup storage consumed by all Aurora snapshots for an Aurora DB cluster outside its backup retention window.

AWS.RDS.VolumeBytesUsed

VolumeBytesUsed. The total amount of storage used by the Aurora DB instance.

AWS.RDS.VolumeReadIOPs

VolumeReadIOPs. The total number of billed read I/O operations from a cluster volume within a five-minute interval.

AWS.RDS.VolumeWriteIOPs

VolumeWriteIOPs. The total number of write disk I/O operations to the cluster volume, reported at five-minute intervals.

Aurora Instance

Metric Units Description
AWS.RDS.ActiveTransactions

ActiveTransactions. The total number of current transactions executing on an Aurora database instance per second.

AWS.RDS.AuroraReplicaLag

AuroraReplicaLag. The total amount of lag when replicating updates from the primary instance.

AWS.RDS.CPUCreditBalance Count

CPUCreditBalance. The total number of CPU credits that an instance has accumulated, reported at five-minute intervals. You can use this metric to determine how long a DB instance can burst beyond its baseline performance level at a given rate.

AWS.RDS.CPUCreditUsage Count

CPUCreditUsage. The total number of CPU credits consumed during the specified period, reported at five-minute intervals. This metric measures the amount of time during which physical CPUs have been used for processing instructions by virtual CPUs allocated to the DB instance.

AWS.RDS.CPUUtilization Percent (%)

CPUUtilization. The total percentage of CPU used by an Aurora DB instance.

AWS.RDS.ConnectionAttempts

ConnectionAttempts. The total number of attempts to connect to an instance, whether successful or not.

AWS.RDS.DDLLatency

DDLLatency. The total duration of data definition language (DDL) requests, for example, create, alter, and drop requests.

AWS.RDS.DDLThroughput

DDLThroughput. The total number of DDL requests per second.

AWS.RDS.DMLLatency

DMLLatency. The total duration of inserts, updates, and deletes.

AWS.RDS.DMLThroughput

DMLThroughput. The total number of inserts, updates, and deletes per second.

AWS.RDS.DatabaseConnections Count

DatabaseConnections. The total number of client network connections to the database instance.

AWS.RDS.FreeableMemory Binary Bytes

FreeableMemory. The total amount of available random access memory.

AWS.RDS.LoginFailures

LoginFailures. The total number of failed login attempts per second.

AWS.RDS.MaximumUsedTransactionIDs Count

MaximumUsedTransactionIDs. The total age of the oldest unvacuumed transaction ID, in transactions. If this value reaches 2,146,483,648 (2^31 - 1,000,000), the database is forced into read-only mode to avoid transaction ID wraparound.

AWS.RDS.ReadIOPS

ReadIOPS. The total number of disk I/O operations per second.

AWS.RDS.ReadLatency seconds (s)

ReadLatency. The total amount of time taken per disk I/O operation.

AWS.RDS.ReadThroughput

ReadThroughput. The total number of bytes read from disk per second.

AWS.RDS.TransactionLogsDiskUsage Megabytes

TransactionLogsDiskUsage. The average amount of disk space consumed by transaction logs on the Aurora PostgreSQL DB instance.

AWS.RDS.WriteIOPS

WriteIOPS. The total number of Aurora storage write records generated per second.

AWS.RDS.WriteLatency seconds (s)

WriteLatency. The total amount of time taken per disk I/O operation.

AWS.RDS.WriteThroughput

WriteThroughput. The total number of bytes written to persistent storage every second.
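The description of AWS.RDS.MaximumUsedTransactionIDs gives the value at which the database is forced into read-only mode. The Python sketch below restates that arithmetic and shows one way to estimate the remaining headroom from a metric reading. The constant and function names are hypothetical.

```python
# Limit quoted in the metric description: 2^31 - 1,000,000 = 2,146,483,648.
WRAPAROUND_LIMIT = 2**31 - 1_000_000


def transaction_id_headroom(maximum_used_transaction_ids: int) -> int:
    """Transactions remaining before the read-only limit is reached."""
    return max(WRAPAROUND_LIMIT - maximum_used_transaction_ids, 0)


if __name__ == "__main__":
    print(WRAPAROUND_LIMIT)
    print(transaction_id_headroom(2_000_000_000))
```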

Auto Scaling Group

Metric Units Description
AWS.AutoScaling.GroupDesiredCapacity

GroupDesiredCapacity. The average number of instances that the Auto Scaling group attempts to maintain.

AWS.AutoScaling.GroupInServiceInstances

GroupInServiceInstances. The average number of instances that are running as part of the Auto Scaling group.

AWS.AutoScaling.GroupMaxSize

GroupMaxSize. The average maximum size of the Auto Scaling group.

AWS.AutoScaling.GroupMinSize

GroupMinSize. The average minimum size of the Auto Scaling group.

AWS.AutoScaling.GroupPendingInstances

GroupPendingInstances. The average number of instances that are pending.

AWS.AutoScaling.GroupStandbyInstances

GroupStandbyInstances. The average number of instances that are in standby state.

AWS.AutoScaling.GroupTerminatingInstances

GroupTerminatingInstances. The average number of instances that are in the process of terminating.

AWS.AutoScaling.GroupTotalInstances

GroupTotalInstances. The average number of total instances.

CloudFront

Metric Units Description
AWS.CloudFront.4xxErrorRate Percent (%)

4xx error rate. The average percentage of all viewer requests for which the response's HTTP status code is 4xx.

AWS.CloudFront.5xxErrorRate Percent (%)

5xx error rate. The average percentage of all viewer requests for which the response's HTTP status code is 5xx.

AWS.CloudFront.BytesDownloaded

Bytes downloaded. The average number of bytes downloaded by viewers for GET, HEAD, and OPTIONS requests.

AWS.CloudFront.BytesUploaded

Bytes uploaded. The average number of bytes that viewers uploaded to your origin with CloudFront using POST and PUT requests.

AWS.CloudFront.Requests

Requests. The total number of viewer requests received by CloudFront for all HTTP methods and for both HTTP and HTTPS requests.

AWS.CloudFront.TotalErrorRate Percent (%)

Total error rate. The average percentage of all viewer requests for which the response's HTTP status code is 4xx or 5xx.

EBS

Metric Units Description
AWS.EBS.AverageReadLatency

AverageReadLatency. The average time required to complete a read request during the specified time period.

AWS.EBS.AverageWriteLatency

AverageWriteLatency. The average time required to complete a write request during the specified time period.

AWS.EBS.BurstBalance Percent (%)

Used with General Purpose SSD (gp2), Throughput Optimized HDD (st1) and Cold HDD (sc1) volumes only. Provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket.

AWS.EBS.VolumeConsumedReadWriteOps Count

VolumeConsumedReadWriteOps. The total amount of read and write operations (normalized to 256K capacity units) consumed during the specified time period.

AWS.EBS.VolumeIdleTime seconds (s)

The total number of seconds in a specified period of time when no read or write operations were submitted.

AWS.EBS.VolumeQueueLength Count

VolumeQueueLength. The number of read and write operation requests waiting to be completed during the specified time period.

AWS.EBS.VolumeReadBytes Binary Bytes

VolumeReadBytes. The total number of bytes transferred by read operations during the specified time period.

AWS.EBS.VolumeReadOps Count

VolumeReadOps. The total number of read operations during the specified time period. Read operations are counted on completion.

AWS.EBS.VolumeThroughputPercentage Percent (%)

VolumeThroughputPercentage. The percentage of I/O operations per second (IOPS) delivered of the total IOPS provisioned for an Amazon EBS volume.

AWS.EBS.VolumeTotalReadTime seconds (s)

The total number of seconds spent by input operations that completed in a specified period of time.

AWS.EBS.VolumeTotalWriteTime seconds (s)

The total number of seconds spent by output operations that completed in a specified period of time.

AWS.EBS.VolumeWriteBytes Binary Bytes

VolumeWriteBytes. The total number of bytes transferred by write operations during the specified time period.

AWS.EBS.VolumeWriteOps Count

VolumeWriteOps. The total number of write operations during the specified time period. Write operations are counted on completion.

EC2

Metric Units Description
AWS.EC2.CPUCreditBalance Count

For T2 Instances. The number of CPU credits available for the instance to burst beyond its base CPU utilization. Credits are stored in the credit balance after they are earned and removed from the credit balance after they expire. Credits expire 24 hours after they are earned.

AWS.EC2.CPUCreditUsage Count

For T2 Instances. The number of CPU credits consumed by the instance. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes).

AWS.EC2.CPUUtilization Percent (%)

The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application upon a selected instance.

AWS.EC2.DiskReadBytes Binary Bytes

Bytes read from all instance store volumes available to the instance. This metric is used to determine the volume of the data the application reads from the hard disk of the instance. This can be used to determine the speed of the application.

AWS.EC2.DiskReadOps Count

Completed read operations from all instance store volumes available to the instance in a specified period of time.

AWS.EC2.DiskWriteBytes Binary Bytes

Bytes written to all instance store volumes available to the instance. This metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application.

AWS.EC2.DiskWriteOps Count

Completed write operations to all instance store volumes available to the instance in a specified period of time.

AWS.EC2.NetworkIn Binary Bytes

The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance.

AWS.EC2.NetworkOut Binary Bytes

The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance.

AWS.EC2.NetworkPacketsIn Count

The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only.

AWS.EC2.NetworkPacketsOut Count

The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only.

AWS.EC2.StatusCheckFailed Count

Reports whether the instance has passed both the instance status check and the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed).

AWS.EC2.StatusCheckFailed_Instance Count

Reports whether the instance has passed the instance status check in the last minute. This metric can be either 0 (passed) or 1 (failed).
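The CPU credit definition given for AWS.EC2.CPUCreditUsage can be checked with a little arithmetic. The following is a minimal sketch of that definition only; the function name and parameters are hypothetical and are not part of any AWS or SolarWinds API.

```python
def cpu_credits_used(vcpus, utilization_percent, minutes):
    """Credits consumed under the stated definition:
    one credit = one vCPU at 100% utilization for one minute."""
    return vcpus * (utilization_percent / 100) * minutes


# The three combinations named in the description each consume one credit.
print(cpu_credits_used(1, 100, 1))  # one vCPU at 100% for one minute
print(cpu_credits_used(1, 50, 2))   # one vCPU at 50% for two minutes
print(cpu_credits_used(2, 25, 2))   # two vCPUs at 25% for two minutes
```

Each call above returns 1.0, matching the equivalent combinations listed in the metric description.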

EFS

Metric Units Description
AWS.EFS.BurstCreditBalance Binary Bytes

BurstCreditBalance. The average number of burst credits that a file system has. Burst credits allow a file system to burst to throughput levels above a file system’s baseline level for periods of time.

AWS.EFS.ClientConnections Count

ClientConnections. The total number of client connections to a file system. When using a standard client, there is one connection per mounted Amazon EC2 instance.

AWS.EFS.DataReadIOBytes Binary Bytes

DataReadIOBytes. The average number of bytes for each file system read operation.

AWS.EFS.DataWriteIOBytes Binary Bytes

DataWriteIOBytes. The average number of bytes for each file system write operation.

AWS.EFS.MetadataIOBytes Binary Bytes

MetadataIOBytes. The average number of bytes for each metadata operation.

AWS.EFS.MeteredIOBytes

MeteredIOBytes. The average number of metered bytes for each file system operation, including data read, data write, and metadata operations, with read operations metered at one-third the rate of other operations.

AWS.EFS.PercentIOLimit Percent (%)

PercentIOLimit. How close a file system is to reaching the I/O limit of the General Purpose performance mode. Data is available only for file systems running with General Purpose performance mode.

AWS.EFS.PermittedThroughput

PermittedThroughput. The maximum amount of throughput that a file system can drive.

AWS.EFS.StorageBytes

StorageBytes. The average size of the file system in bytes, including the amount of data stored in the EFS Standard and EFS Standard–Infrequent Access (EFS Standard-IA) storage classes.

AWS.EFS.TimeSinceLastSync

TimeSinceLastSync. The average amount of time that has passed since the last successful sync to the destination file system in a replication configuration.

AWS.EFS.TotalIOBytes Binary Bytes

TotalIOBytes. The total number of bytes for each file system operation, including data read, data write, and metadata operations. This is the actual amount that your application is driving, and not the throughput the file system is being metered at.
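The difference between AWS.EFS.TotalIOBytes and AWS.EFS.MeteredIOBytes comes from the one-third weighting applied to reads. The sketch below illustrates only that weighting as stated in the descriptions above; the function names and inputs are hypothetical, and the values Amazon EFS actually reports may be computed differently.

```python
def total_io_bytes(read_bytes, write_bytes, metadata_bytes):
    """Actual bytes driven by the application (no weighting)."""
    return read_bytes + write_bytes + metadata_bytes


def metered_io_bytes(read_bytes, write_bytes, metadata_bytes):
    """Metered bytes: reads count at one-third the rate of other operations."""
    return read_bytes / 3 + write_bytes + metadata_bytes


# Example with 3000 bytes read, 1000 bytes written, and 500 bytes of metadata I/O.
print(total_io_bytes(3000, 1000, 500))    # 4500
print(metered_io_bytes(3000, 1000, 500))  # 2500.0
```

For a read-heavy workload, the metered value is noticeably lower than the total, which is why the two metrics can diverge.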

Elastic Beanstalk

Metric Units Description
AWS.ElasticBeanstalk.ApplicationLatencyP99.9 Count

P99.9. The average latency for the slowest x percent of requests over the last 10 seconds, where x is the difference between the number and 100. For example, p99 1.403 indicates the slowest 1% of requests over the last 10 seconds had an average latency of 1.403 seconds.

AWS.ElasticBeanstalk.ApplicationRequests2xx Count

Status 2xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 200 but less than 300.

AWS.ElasticBeanstalk.ApplicationRequests3xx Count

Status 3xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 300 but less than 400.

AWS.ElasticBeanstalk.ApplicationRequests4xx Count

Status 4xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 400 but less than 500.

AWS.ElasticBeanstalk.ApplicationRequests5xx Count

Status 5xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 500 but less than 600.

AWS.ElasticBeanstalk.ApplicationRequestsTotal Count

Request Count. The average number of requests handled by the web server per second over the last 10 seconds.

AWS.ElasticBeanstalk.EnvironmentHealth Count

The health status of the environment. The possible values are 0 (OK), 1 (Info), 5 (Unknown), 10 (No data), 15 (Warning), 20 (Degraded) and 25 (Severe).
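Two of the Elastic Beanstalk descriptions above lend themselves to a short illustration: the numeric values of AWS.ElasticBeanstalk.EnvironmentHealth, and the "average latency for the slowest x percent of requests" wording used for the percentile latency metric. The following is a minimal sketch under stated assumptions; the function names, the "Unrecognized" fallback label, and the tail calculation are hypothetical and do not describe how Elastic Beanstalk computes its metrics internally.

```python
import math

# Values and labels exactly as listed for AWS.ElasticBeanstalk.EnvironmentHealth.
ENVIRONMENT_HEALTH = {
    0: "OK",
    1: "Info",
    5: "Unknown",
    10: "No data",
    15: "Warning",
    20: "Degraded",
    25: "Severe",
}


def health_label(value):
    """Map a reported value to its label; 'Unrecognized' is a hypothetical fallback."""
    return ENVIRONMENT_HEALTH.get(int(value), "Unrecognized")


def slowest_tail_average(latencies, percentile):
    """Average latency of the slowest (100 - percentile) percent of requests."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    tail_fraction = (100 - percentile) / 100
    # Round before ceil to avoid floating-point noise; always keep at least one request.
    count = max(1, math.ceil(round(len(latencies) * tail_fraction, 9)))
    slowest = sorted(latencies, reverse=True)[:count]
    return sum(slowest) / len(slowest)


print(health_label(20))
print(slowest_tail_average([0.1] * 99 + [1.403], 99))
```

With 100 sampled requests and a percentile of 99, the tail holds the single slowest request, so the result matches the p99 example of 1.403 seconds given in the description.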

ELB

Metric Units Description
AWS.ELB.BackendConnectionErrors

BackendConnectionErrors. The total number of connections that were not successfully established between the load balancer and the registered instances.

AWS.ELB.HTTPCode_ELB_4XX

HTTPCode_ELB_4XX. The total number of HTTP 4XX client error codes generated by the load balancer.

AWS.ELB.HTTPCode_ELB_5XX

HTTPCode_ELB_5XX. The total number of HTTP 5XX server error codes generated by the load balancer.

AWS.ELB.HealthyHostCount

HealthyHostCount. The average number of healthy instances registered with your load balancer.

AWS.ELB.RequestCount

RequestCount. The total number of requests completed or connections made during the specified interval.

AWS.ELB.SpilloverCount

SpilloverCount. The total number of requests that were rejected because the surge queue is full.

AWS.ELB.SurgeQueueLength

SurgeQueueLength. The total number of requests (HTTP listener) or connections (TCP listener) that are pending routing to a healthy instance.

AWS.ELB.UnHealthyHostCount

UnHealthyHostCount. The average number of unhealthy instances registered with your load balancer. An instance is considered unhealthy after it exceeds the unhealthy threshold configured for health checks.

Lambda

Metric Units Description
AWS.Lambda.ConcurrentExecutions Count

ConcurrentExecutions. The maximum number of function instances that are processing events.

AWS.Lambda.DeadLetterErrors Count

DeadLetterErrors. The total number of times that Lambda attempts to send an event to a dead-letter queue but fails. Dead-letter errors can occur due to permissions errors, misconfigured resources, or size limits.

AWS.Lambda.Duration milliseconds (ms)

Duration. The average amount of time that your function code spends processing an event.

AWS.Lambda.Errors Count

Errors. The total number of invocations that result in a function error.

AWS.Lambda.Invocations Count

Invocations. The total number of times that a function code is invoked, including successful invocations and invocations that result in a function error.

AWS.Lambda.IteratorAge milliseconds (ms)

IteratorAge. The maximum amount of time between when a stream receives the record and when the event source mapping sends the event to the function.

AWS.Lambda.Throttles Count

Throttles. The total number of invocation requests that are throttled. When all function instances are processing requests and no concurrency is available to scale up, Lambda rejects additional requests with a TooManyRequestsException error.

NAT Gateway

Metric Units Description
AWS.NATGateway.ActiveConnectionCount

ActiveConnectionCount. The maximum number of concurrent active TCP connections through the NAT gateway.

AWS.NATGateway.BytesInFromDestination

BytesInFromDestination. The total number of bytes received by the NAT gateway from the destination.

AWS.NATGateway.BytesInFromSource

BytesInFromSource. The total number of bytes received by the NAT gateway from clients in your VPC.

AWS.NATGateway.BytesOutToDestination

BytesOutToDestination. The total number of bytes sent out through the NAT gateway to the destination.

AWS.NATGateway.BytesOutToSource

BytesOutToSource. The total number of bytes sent through the NAT gateway to the clients in your VPC.

AWS.NATGateway.ConnectionAttemptCount

ConnectionAttemptCount. The total number of connection attempts made through the NAT gateway.

AWS.NATGateway.ConnectionEstablishedCount

ConnectionEstablishedCount. The total number of connections established through the NAT gateway.

AWS.NATGateway.ErrorPortAllocation

ErrorPortAllocation. The total number of times the NAT gateway could not allocate a source port.

AWS.NATGateway.IdleTimeoutCount

IdleTimeoutCount. The total number of connections that transitioned from the active state to the idle state.

AWS.NATGateway.PacketsDropCount

PacketsDropCount. The total number of packets dropped by the NAT gateway.

AWS.NATGateway.PacketsInFromDestination

PacketsInFromDestination. The total number of packets received by the NAT gateway from the destination.

AWS.NATGateway.PacketsInFromSource

PacketsInFromSource. The total number of packets received by the NAT gateway from clients in your VPC.

AWS.NATGateway.PacketsOutToDestination

PacketsOutToDestination. The total number of packets sent out through the NAT gateway to the destination.

AWS.NATGateway.PacketsOutToSource

PacketsOutToSource. The total number of packets sent through the NAT gateway to the clients in your VPC.

RDS

Metric Units Description
AWS.RDS.BinLogDiskUsage Binary Bytes

BinLogDiskUsage. The average amount of disk space occupied by binary logs.

AWS.RDS.BurstBalance Percent (%)

BurstBalance. The average percent of General Purpose SSD (gp2) burst-bucket I/O credits available.

AWS.RDS.CPUCreditBalance Count

CPUCreditBalance. The average number of earned CPU credits that an instance has accrued since it was launched or started.

AWS.RDS.CPUCreditUsage Count

CPUCreditUsage. The average number of CPU credits spent by the instance for CPU utilization.

AWS.RDS.CPUUtilization Percent (%)

CPUUtilization. The average percentage of CPU utilization.

AWS.RDS.DatabaseConnections Count

DatabaseConnections. The total number of client network connections to the database instance.

AWS.RDS.DiskQueueDepth Count

DiskQueueDepth. The average number of outstanding I/Os (read/write requests) waiting to access the disk.

AWS.RDS.FreeStorageSpace Binary Bytes

FreeStorageSpace. The average amount of available storage space.

AWS.RDS.FreeableMemory Binary Bytes

FreeableMemory. The average amount of available random access memory.

AWS.RDS.MaximumUsedTransactionIDs Count

MaximumUsedTransactionIDs. The maximum transaction IDs that have been used. This metric applies to PostgreSQL.

AWS.RDS.NetworkReceiveThroughput

NetworkReceiveThroughput. The average incoming (receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.

AWS.RDS.NetworkTransmitThroughput

NetworkTransmitThroughput. The average outgoing (transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.

AWS.RDS.OldestReplicationSlotLag Megabytes

OldestReplicationSlotLag. The average lagging size of the replica lagging the most in terms of write-ahead log (WAL) data received. This metric applies to PostgreSQL.

AWS.RDS.ReadIOPS

ReadIOPS. The average number of disk read I/O operations per second.

AWS.RDS.ReadLatency seconds (s)

ReadLatency. The average amount of time taken per disk I/O operation.

AWS.RDS.ReadThroughput

ReadThroughput. The average number of bytes read from disk per second.

AWS.RDS.ReplicaLag seconds (s)

ReplicaLag. For read replica configurations, the average amount of time a read replica DB instance lags behind the source DB instance.

AWS.RDS.ReplicationSlotDiskUsage Megabytes

ReplicationSlotDiskUsage. The average disk space used by replication slot files. This metric applies to PostgreSQL.

AWS.RDS.SwapUsage Binary Bytes

SwapUsage. The average amount of swap space used on the DB instance. This metric is not available for SQL Server.

AWS.RDS.TransactionLogsDiskUsage Megabytes

TransactionLogsDiskUsage. The average disk space used by transaction logs. This metric applies to PostgreSQL.

AWS.RDS.TransactionLogsGeneration

TransactionLogsGeneration. The average size of transaction logs generated per second. This metric applies to PostgreSQL.

AWS.RDS.WriteIOPS

WriteIOPS. The average number of disk write I/O operations per second.

AWS.RDS.WriteLatency seconds (s)

WriteLatency. The average amount of time taken per disk I/O operation.

AWS.RDS.WriteThroughput

WriteThroughput. The average number of bytes written to disk per second.

S3

Metric Units Description
AWS.S3.4xxErrors Count

4xxErrors. The number of HTTP 4xx client error status code requests made to an Amazon S3 bucket with a value of either 0 or 1.

AWS.S3.5xxErrors

5xxErrors. The number of HTTP 5xx server error status code requests made to an Amazon S3 bucket with a value of either 0 or 1.

AWS.S3.AllRequests Count

AllRequests. The total number of HTTP requests made to an Amazon S3 bucket, regardless of type.

AWS.S3.BucketSizeBytes Binary Bytes

BucketSizeBytes. The amount of data that is stored in a bucket, in bytes.

AWS.S3.BytesDownloaded Binary Bytes

BytesDownloaded. The number of bytes downloaded for requests made to an Amazon S3 bucket where the response includes a body.

AWS.S3.BytesUploaded Binary Bytes

BytesUploaded. The number of bytes uploaded for requests made to an Amazon S3 bucket where the request includes a body.

AWS.S3.DeleteRequests Count

The number of HTTP DELETE requests made for objects in a bucket.

AWS.S3.FirstByteLatency

FirstByteLatency. The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned.

AWS.S3.GetRequests Count

GetRequests. The number of HTTP GET requests made for objects in an Amazon S3 bucket. This doesn't include list operations.

AWS.S3.HeadRequests Count

The number of HTTP HEAD requests made to a bucket.

AWS.S3.ListRequests Count

The number of HTTP requests that list the contents of a bucket.

AWS.S3.NumberOfObjects Count

NumberOfObjects. The total number of objects stored in a bucket for all storage classes. This value is calculated by counting all objects in the bucket (both current and noncurrent objects) and the total number of parts for all incomplete multipart uploads to the bucket.

AWS.S3.PostRequests Count

PostRequests. The number of HTTP POST requests made to an Amazon S3 bucket.

AWS.S3.PutRequests Count

PutRequests. The number of HTTP PUT requests made for objects in an Amazon S3 bucket.

AWS.S3.TotalRequestLatency

TotalRequestLatency. The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This metric includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency.

SNS

Metric Units Description
AWS.SNS.NumberOfMessagesPublished Count

NumberOfMessagesPublished. The average number of messages published to Amazon SNS topics.

AWS.SNS.NumberOfNotificationsDelivered Count

NumberOfNotificationsDelivered. The average number of messages successfully delivered from Amazon SNS topics to subscribing endpoints.

AWS.SNS.NumberOfNotificationsFailed Count

NumberOfNotificationsFailed. The average number of messages that Amazon SNS failed to deliver.

AWS.SNS.NumberOfNotificationsFailedToRedriveToDlq

NumberOfNotificationsFailedToRedriveToDlq. The average number of messages that couldn't be moved to a dead-letter queue.

AWS.SNS.NumberOfNotificationsFilteredOut

NumberOfNotificationsFilteredOut. The average number of messages that were rejected by subscription filter policies. A filter policy rejects a message when the message attributes don't match the policy attributes.

AWS.SNS.NumberOfNotificationsFilteredOut-InvalidAttributes

NumberOfNotificationsFilteredOut-InvalidAttributes. The average number of messages that were rejected by subscription filter policies because the messages' attributes are invalid.

AWS.SNS.NumberOfNotificationsFilteredOut-NoMessageAttributes

NumberOfNotificationsFilteredOut-NoMessageAttributes. The average number of messages that were rejected by subscription filter policies because the messages have no attributes.

AWS.SNS.NumberOfNotificationsRedrivenToDlq

NumberOfNotificationsRedrivenToDlq. The average number of messages that have been moved to a dead-letter queue.

AWS.SNS.PublishSize Binary Bytes

PublishSize. The average size of messages published.

Transit Gateway

Metric Units Description
AWS.TransitGateway.BytesDropCountBlackhole

BytesDropCountBlackhole. The total number of bytes dropped because they matched a blackhole route.

AWS.TransitGateway.BytesDropCountNoRoute

BytesDropCountNoRoute. The total number of bytes dropped because they did not match a route.

AWS.TransitGateway.BytesIn

BytesIn. The total number of bytes received by the transit gateway.

AWS.TransitGateway.BytesOut

BytesOut. The total number of bytes sent from the transit gateway.

AWS.TransitGateway.PacketDropCountBlackhole

PacketDropCountBlackhole. The total number of packets dropped because they matched a blackhole route.

AWS.TransitGateway.PacketDropCountNoRoute

PacketDropCountNoRoute. The total number of packets dropped because they did not match a route.

AWS.TransitGateway.PacketsIn

PacketsIn. The total number of packets received by the transit gateway.

AWS.TransitGateway.PacketsOut

PacketsOut. The total number of packets sent by the transit gateway.

VPN

Metric Units Description
AWS.VPN.TunnelDataIn Binary Bytes

TunnelDataIn. The total bytes received on the AWS side of the connection through the VPN tunnel from a customer gateway.

AWS.VPN.TunnelDataOut Binary Bytes

TunnelDataOut. The total bytes sent from the AWS side of the connection through the VPN tunnel to the customer gateway. Each metric data point represents the number of bytes sent after the previous data point.

AWS.VPN.TunnelState Count

TunnelState. The average state of the tunnels. For static VPNs, 0 indicates DOWN and 1 indicates UP.
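Because AWS.VPN.TunnelState is reported as an average of 0 (DOWN) and 1 (UP) samples for static VPNs, a value between 0 and 1 suggests the tunnel was up for only part of the period. The sketch below illustrates that reading; the function names and the "PARTIAL" label are hypothetical and are not values reported by AWS or SolarWinds Observability SaaS.

```python
def tunnel_up_fraction(samples):
    """Average a list of 0 (DOWN) / 1 (UP) samples into a single value."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(samples) / len(samples)


def describe_tunnel_state(average):
    """Interpret an averaged state; 'PARTIAL' is a hypothetical label."""
    if average >= 1.0:
        return "UP"
    if average <= 0.0:
        return "DOWN"
    return "PARTIAL"


average = tunnel_up_fraction([1, 1, 0, 1])
print(average, describe_tunnel_state(average))
```

An alert that should fire on any downtime in the period could therefore compare the averaged value against 1 rather than against 0.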

Infrastructure/Azure metrics

Metrics for Azure entities are collected by integrating SolarWinds Observability SaaS with your Azure cloud account. See Azure cloud platform monitoring.

App Service

Metric Description
azure.sites.app_connections

Average Connections

azure.sites.app_domains

Total App Domains. The average number of AppDomains loaded in this application.

azure.sites.app_domains.unloaded

Total App Domains Unloaded

azure.sites.collections.gen1

Gen 1 Garbage Collections

azure.sites.collections.gen2

Gen 2 Garbage Collections

azure.sites.cpu_time

CPU Time. The total amount of CPU consumed by the app, in seconds.

azure.sites.current_assemblies

Current Assemblies

azure.sites.function_executions

Function execution count

azure.sites.handles

Average Handle Count

azure.sites.http.101

Total Http 101 Requests

azure.sites.http.2xx

Http2xx. The total number of requests resulting in an HTTP status code greater than or equal to 200 but less than 300.

azure.sites.http.3xx

Total 3xx Requests

azure.sites.http.401

Total 401 Requests

azure.sites.http.403

Total 403 Requests

azure.sites.http.404

Total 404 Requests

azure.sites.http.406

Total 406 Requests

azure.sites.http.4xx

Http4xx. The total number of requests resulting in an HTTP status code greater than or equal to 400 but less than 500.

azure.sites.http.5xx

Http5xx. The total number of requests resulting in an HTTP status code greater than or equal to 500 but less than 600.

azure.sites.io.bytes_received

Bytes Received. The total amount of incoming bandwidth consumed by the app.

azure.sites.io.bytes_sent

Bytes Sent. The total amount of outgoing bandwidth consumed by the app.

azure.sites.io.other_bytes

IO Other Bytes Per Second

azure.sites.io.other_ops

IO Other Operations Per Second

azure.sites.io.read_bytes

IO Read Bytes Per Second. The number of bytes per second the app is reading from I/O operations.

azure.sites.io.read_ops

IO Read Operations Per Second

azure.sites.io.write_bytes

IO Write Bytes Per Second. The number of bytes per second the app is writing to I/O operations.

azure.sites.io.write_ops

IO Write Operations Per Second

azure.sites.memory.working_set

Memory Working Set. The current amount of memory used by the app.

azure.sites.memory.working_set.avg

Average Memory Working Set. The average amount of memory used by the app, in megabytes.

azure.sites.private_bytes

Private Bytes

azure.sites.queued_requests

Requests In Application Queue. The average number of requests in the application request queue.

azure.sites.requests

Requests. The total number of requests regardless of their resulting HTTP status code.

azure.sites.response_time

Average Response Time. The average time taken for the app to serve requests, in seconds.

azure.sites.threads

Threads. The average number of threads currently active in the app process.

Blob Storage

Metric Description
azure.storage.blob.availability

Availability. The average percentage of availability for the storage service or the specified API operation. Availability is calculated by taking the total billable requests value and dividing it by the number of applicable requests.

azure.storage.blob.blobs

BlobCount. The average number of blob objects stored in the storage account.

azure.storage.blob.capacity

BlobCapacity. The average amount of blob storage used in the storage account.

azure.storage.blob.containers

ContainerCount. The average number of containers in the storage account.

azure.storage.blob.egress

Egress. The total amount of egress data. This number includes egress from Azure Storage to an external client as well as egress within Azure. As a result, this number does not reflect billable egress.

azure.storage.blob.index_capacity

IndexCapacity. The average amount of storage used by ADLS Gen2 Hierarchical Index.

azure.storage.blob.ingress

Ingress. The total amount of ingress data. This number includes ingress from an external client into Azure Storage as well as ingress within Azure.

azure.storage.blob.success.e2e_latency

SuccessE2ELatency. The average end-to-end latency of successful requests made to a storage service or the specified API operation. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.

azure.storage.blob.success.server_latency

SuccessServerLatency. The average time used to process a successful request by Azure Storage.

azure.storage.blob.transactions

Transactions. The total number of requests made to a storage service or the specified API operation. This number includes successful and failed requests, as well as requests that produced errors.
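The calculation described for azure.storage.blob.availability (total billable requests divided by applicable requests) can be sketched as follows. This is an illustration of the stated ratio only; the function name, its parameters, and the choice to return None when there are no applicable requests are assumptions, not Azure or SolarWinds behavior.

```python
def availability_percent(billable_requests, applicable_requests):
    """Availability as a percentage: billable requests / applicable requests."""
    if applicable_requests == 0:
        # Assumption: no applicable requests means no data point.
        return None
    return 100 * billable_requests / applicable_requests


# Example: 995 billable requests out of 1000 applicable requests.
print(availability_percent(995, 1000))
```

The example evaluates to 99.5, meaning 0.5% of applicable requests did not count toward availability.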

CDN

Metric Description
azure.cdn.byte_hit_ratio

ByteHitRatio. Of the total number of response bytes, the percentage that were served from the CDN cache.

azure.cdn.origin_health_percentage

OriginHealthPercentage. The percentage of successful health probes sent to backends.

azure.cdn.origin_latency

OriginLatency. The average time from when the request was sent to the backend to when the last response byte was received.

azure.cdn.origin_request_count

OriginRequestCount. The total number of requests sent to origin.

azure.cdn.percentage_4XX

Percentage4XX. The average percentage of requests with a status code greater than or equal to 400 but less than 500.

azure.cdn.percentage_5XX

Percentage5XX. The average percentage of requests with a status code greater than or equal to 500 but less than 600.

azure.cdn.request_count

RequestCount. The total number of client requests served by CDN.

azure.cdn.request_size

RequestSize. The total number of bytes sent as requests from clients.

azure.cdn.response_size

ResponseSize. The total number of bytes sent as responses from CDN edge to clients.

azure.cdn.total_latency

TotalLatency. The average time from the client request being received by CDN until the last response byte is sent from CDN to the client.

azure.cdn.web_application_firewall_request_count

WebApplicationFirewallRequestCount. The total number of matched WAF requests.
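The azure.cdn.byte_hit_ratio description above amounts to a simple percentage of response bytes. A minimal sketch, assuming the byte counts are available as plain numbers; the function and parameter names are hypothetical and are not part of any Azure API.

```python
def byte_hit_ratio(bytes_from_cache, total_response_bytes):
    """Percentage of response bytes that were served from the CDN cache."""
    if total_response_bytes == 0:
        # Assumption: with no response bytes there is nothing to report.
        return None
    return 100 * bytes_from_cache / total_response_bytes


# Example: 750 of 1000 response bytes came from the cache.
print(byte_hit_ratio(750, 1000))
```

The example evaluates to 75.0. A low ratio indicates that most response bytes are being fetched from the origin rather than served from the cache.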

Cosmos DB

Metric Description
azure.cosmos.autoscale_max_throughput

AutoscaleMaxThroughput. The maximum throughput the autoscale will scale to.

azure.cosmos.available_storage

AvailableStorage. The total amount of available storage reported at 5-minute granularity per region.

azure.cosmos.cassandra.connection.avg_replication_latency

CassandraConnectorAvgReplicationLatency. The average replication latency of the Cassandra Connector.

azure.cosmos.cassandra.connection.replication_health_status

CassandraConnectorReplicationHealthStatus. The replication health status of the Cassandra Connector.

azure.cosmos.cassandra.connection_closures

CassandraConnectionClosures. The total number of Cassandra Connections closed.

azure.cosmos.cassandra.request_charges

CassandraRequestCharges. The total number of request units consumed by the API for Cassandra.

azure.cosmos.cassandra.requests

CassandraRequests. The total number of Cassandra API requests made.

azure.cosmos.data.usage

DataUsage. The total data usage reported at 5-minute granularity per region.

azure.cosmos.document.count

DocumentCount. The total document count reported at 5-minute granularity per region.

azure.cosmos.document.quota

DocumentQuota. The total storage quota reported at 5-minute granularity per region.

azure.cosmos.gremlin.request_charge

GremlinRequestCharges. The total number of request units consumed by Gremlin queries.

azure.cosmos.gremlin.requests

GremlinRequests. The total number of requests made by Gremlin queries.

azure.cosmos.index_usage

IndexUsage. The total Index usage reported at 5-minute granularity per region.

azure.cosmos.mongo.request_charge

MongoRequestCharge. The total number of Mongo request units consumed.

azure.cosmos.mongo.requests

MongoRequests. The total number of Mongo requests made.

azure.cosmos.normalized_ru_consumption

NormalizedRUConsumption. The maximum request unit consumption percentage per minute.

azure.cosmos.provisioned_throughput

ProvisionedThroughput. The maximum provisioned throughput at container granularity.

azure.cosmos.replication_latency.p99

ReplicationLatency. The average replication latency across the source and target regions for a geo-enabled account.

azure.cosmos.requests.metadata

MetadataRequests. The total number of metadata requests.

azure.cosmos.requests.total

TotalRequests. The total number of requests made.

azure.cosmos.requests.total_units

TotalRequestUnits. The total number of request units consumed.

azure.cosmos.server_side_latency

ServerSideLatency. The average amount of time taken by the server to process a request.

azure.cosmos.service_availability

ServiceAvailability. The average account request availability at one-hour granularity.

Event Hubs

Metric Description
azure.eventhubs.namespaces.active_connections

ActiveConnections. The maximum number of active connections on a namespace and on an entity (event hub) in the namespace.

azure.eventhubs.namespaces.captured_bytes

CapturedBytes. The total number of captured bytes for an event hub.

azure.eventhubs.namespaces.captured_messages

CapturedMessages. The total number of captured messages for an event hub.

azure.eventhubs.namespaces.connections_closed

ConnectionsClosed. The total number of closed connections.

azure.eventhubs.namespaces.connections_opened

ConnectionsOpened. The total number of opened connections.

azure.eventhubs.namespaces.incoming_bytes

IncomingBytes. The number of incoming bytes for an event hub during the specified period.

azure.eventhubs.namespaces.incoming_messages

IncomingMessages. The total number of events or messages sent to Event Hubs over a specified period.

azure.eventhubs.namespaces.incoming_requests

IncomingRequests. The total number of requests made to the Event Hubs service over a specified period. This metric includes all the data and management plane operations.

azure.eventhubs.namespaces.namespace_cpu_usage

NamespaceCpuUsage. The maximum namespace CPU usage.

azure.eventhubs.namespaces.namespace_memory_usage

NamespaceMemoryUsage. The maximum namespace memory usage.

azure.eventhubs.namespaces.outgoing_bytes

OutgoingBytes. The number of outgoing bytes for an event hub during the specified period.

azure.eventhubs.namespaces.outgoing_messages

OutgoingMessages. The total number of events or messages received from Event Hubs over a specified period.

azure.eventhubs.namespaces.quota_exceeded_errors

QuotaExceededErrors. The total number of errors caused by exceeding quotas over a specified period.

azure.eventhubs.namespaces.server_errors

ServerErrors. The total number of requests not processed because of an error in the Event Hubs service over a specified period.

azure.eventhubs.namespaces.size

Size. The average size of an event hub.

azure.eventhubs.namespaces.successful_requests

SuccessfulRequests. The total number of successful requests made to the Event Hubs service over a specified period.

azure.eventhubs.namespaces.throttled_requests

ThrottledRequests. The total number of requests that were throttled because the usage was exceeded.

azure.eventhubs.namespaces.user_errors

UserErrors. The total number of requests not processed because of user errors over a specified period.

Files

Metric Description
azure.storage.files.availability

Availability. The average percentage of availability for the storage service or the specified API operation. Availability is calculated by taking the total billable requests value and dividing it by the number of applicable requests, including those requests that produced unexpected errors.

azure.storage.files.egress

Egress. The total amount of egress data. This number includes egress from Azure Storage to an external client as well as egress within Azure.

azure.storage.files.file_capacity

FileCapacity. The average amount of file storage used by the storage account.

azure.storage.files.file_count

FileCount. The average number of files in the storage account.

azure.storage.files.fileshare_count

FileShareCount. The average number of file shares in the storage account.

azure.storage.files.fileshare_quota

FileShareQuota. The average upper limit on the amount of storage that can be used by Azure Files service in bytes.

azure.storage.files.fileshare_snapshotcount

FileShareSnapshotCount. The average number of snapshots present on the share in the storage account's Azure Files service.

azure.storage.files.fileshare_snapshotsize

FileShareSnapshotSize. The average amount of storage used by the snapshots in the storage account's Azure Files service.

azure.storage.files.ingress

Ingress. The total amount of ingress data. This number includes ingress from an external client into Azure Storage as well as ingress within Azure.

azure.storage.files.success.e2e_latency

SuccessE2ELatency. The average end-to-end latency of successful requests made to a storage service or the specified API operation. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.

azure.storage.files.success.server_latency

SuccessServerLatency. The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in SuccessE2ELatency.

azure.storage.files.transactions

Transactions. The total number of requests made to a storage service or the specified API operation. This number includes successful and failed requests, as well as requests that produced errors.

Front Door

Metric Description
azure.frontdoor.backend_health_percentage

BackendHealthPercentage. The average percentage of successful health probes from AFD to origin.

azure.frontdoor.backend_request_count

BackendRequestCount. The total number of requests sent from AFD to origin.

azure.frontdoor.backend_request_latency

BackendRequestLatency. The average time calculated from when the request was sent by AFD edge to the backend until AFD received the last response byte from the backend.

azure.frontdoor.billable_response_size

BillableResponseSize. The total number of billable bytes (minimum 2KB per request) sent as responses from HTTP/S proxy to clients.

azure.frontdoor.request_count

RequestCount. The total number of client requests served by CDN.

azure.frontdoor.request_size

RequestSize. The total number of bytes sent as requests from clients to AFD.

azure.frontdoor.response_size

ResponseSize. The total number of bytes sent as responses from Front Door to clients.

azure.frontdoor.total_latency

TotalLatency. The average time from the client request being received by CDN until the last response byte is sent from CDN to the client.

azure.frontdoor.web_application_firewall_request_count

WebApplicationFirewallRequestCount. The total number of matched WAF requests.

Functions

Metric Description
azure.sites.app_domains

Total App Domains. The average number of app domains loaded in the application.

azure.sites.app_domains.unloaded

Total App Domains Unloaded. The average number of application domains unloaded.

azure.sites.collections.gen1

Gen 1 Garbage Collections

azure.sites.collections.gen2

Gen 2 Garbage Collections

azure.sites.current_assemblies

Current Assemblies

azure.sites.function_executions

Function Execution Count. The total number of times a function app has executed. This value correlates to the number of times a function runs in an app.

azure.sites.function_executions.unit

Function Execution Units. The number of function execution units.

azure.sites.http.5xx

HTTP 5xx. The total number of requests with a status code greater than or equal to 500 but less than 600.

azure.sites.io.bytes_received

Bytes Received. The number of incoming data bytes.

azure.sites.io.bytes_sent

Bytes Sent. The number of outgoing data bytes.

azure.sites.io.other_bytes

IO Other Bytes Per Second

azure.sites.io.other_ops

IO Other Operations Per Second

azure.sites.io.read_bytes

IO Read Bytes Per Second. The number of bytes per second the app is reading from I/O operations.

azure.sites.io.read_ops

IO Read Operations Per Second. The number of read I/O operations per second the app is issuing.

azure.sites.io.write_bytes

IO Write Bytes Per Second. The number of bytes per second the app is writing to I/O operations.

azure.sites.io.write_ops

IO Write Operations Per Second. The number of write I/O operations per second the app is issuing.

azure.sites.memory.working_set

Memory Working Set. The average amount of memory used by the app.

azure.sites.memory.working_set.avg

Average Memory Working Set. The average amount of memory used by the app.

azure.sites.private_bytes

Private Bytes. The average number of private bytes allocated to the app.

azure.sites.queued_requests

Requests In Application Queue. The average number of requests in the application queue.

azure.sites.requests

Requests. The total number of requests.

azure.sites.response_time

Average Response Time. The average time taken for the app to serve requests.

Key Vault

Metric Description
azure.key_vault.service_api.hit

Service API Hit. The total number of service API hits.

azure.key_vault.service_api.latency

Service API Latency. The average latency of service API requests.

azure.key_vault.service_api.result

Service API Result. The total number of service API results.

Service Bus

Metric Description
azure.servicebus.namespaces.abandon_message

AbandonMessage. The total number of messages abandoned over a specified period.

azure.servicebus.namespaces.active_connections

ActiveConnections. The total number of active connections on a namespace and on an entity in the namespace. The value for this metric is a point-in-time value. Connections that were active immediately after that point in time may not be reflected in the metric.

azure.servicebus.namespaces.active_messages

ActiveMessages. The average number of active messages in a queue/topic.

azure.servicebus.namespaces.complete_message

CompleteMessage. The total number of messages completed over a specified period.

azure.servicebus.namespaces.connections_closed

ConnectionsClosed. The average number of connections closed. The value for this metric is an aggregation and includes all connections that were closed in the aggregation time window.

azure.servicebus.namespaces.connections_opened

ConnectionsOpened. The average number of connections opened. The value for this metric is an aggregation and includes all connections that were opened in the aggregation time window.

azure.servicebus.namespaces.deadlettered_messages

DeadletteredMessages. The average number of dead-lettered messages in a queue/topic.

azure.servicebus.namespaces.incoming_messages

IncomingMessages. The total number of events or messages sent to Service Bus over a specified period. For basic and standard tiers, incoming auto-forwarded messages are included in this metric. For the premium tier, they aren't included.

azure.servicebus.namespaces.incoming_requests

IncomingRequests. The total number of requests made to the Service Bus service over a specified period.

azure.servicebus.namespaces.messages

Messages. The average number of messages in a queue/topic.

azure.servicebus.namespaces.outgoing_messages

OutgoingMessages. The total number of events or messages received from Service Bus over a specified period. The outgoing auto-forwarded messages aren't included in this metric.

azure.servicebus.namespaces.pending_checkpoint_operation_count

PendingCheckpointOperationCount. The average number of pending checkpoint operations on the namespace. The service starts to throttle when the pending checkpoint count exceeds the limit of (500,000 + (500,000 * messaging units)) operations. This metric applies only to namespaces using the premium tier.

azure.servicebus.namespaces.scheduled_messages

ScheduledMessages. The average number of scheduled messages in a queue/topic.

azure.servicebus.namespaces.server_errors

ServerErrors. The total number of requests not processed because of an error in the Service Bus service over a specified period.

azure.servicebus.namespaces.server_send_latency

ServerSendLatency. The average time taken by the Service Bus service to complete the request.

azure.servicebus.namespaces.size

Size. The average size of an entity (queue or topic) in bytes.

azure.servicebus.namespaces.successful_requests

SuccessfulRequests. The total number of successful requests made to the Service Bus service over a specified period.

azure.servicebus.namespaces.throttled_requests

ThrottledRequests. The total number of requests that were throttled because the usage was exceeded.

azure.servicebus.namespaces.user_errors

UserErrors. The total number of requests not processed because of user errors over a specified period.
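
The throttling limit given for PendingCheckpointOperationCount, (500,000 + (500,000 * messaging units)), can be worked out for a specific namespace size. The Python sketch below is only an illustration of that formula; the function names are hypothetical and the strict greater-than comparison is an assumption based on the word "exceeds".

```python
def pending_checkpoint_throttle_limit(messaging_units: int) -> int:
    """Return the pending checkpoint limit: 500,000 + (500,000 * messaging units)."""
    if messaging_units < 0:
        raise ValueError("messaging_units must be non-negative")
    return 500_000 + 500_000 * messaging_units


def exceeds_throttle_limit(pending_operations: float, messaging_units: int) -> bool:
    """True when the pending checkpoint count is above the limit (assumed strict)."""
    return pending_operations > pending_checkpoint_throttle_limit(messaging_units)


if __name__ == "__main__":
    # A premium namespace with 4 messaging units.
    print(pending_checkpoint_throttle_limit(4))
```

For example, a namespace with 4 messaging units has a limit of 2,500,000 pending checkpoint operations. A value like this could serve as the basis for an alert threshold on the metric.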

SQL Database

Metric Description
azure.sql.servers.databases.connection_failed

Failed Connections. The total number of connections that failed.

azure.sql.servers.databases.connection_successful

Successful Connections. The total number of successful connections.

azure.sql.servers.databases.cpu_percent

CPU Utilization. The average percentage of CPU used.

azure.sql.servers.databases.deadlock

Deadlocks. The total number of deadlocks.

azure.sql.servers.databases.log_write_percent

Log Write Percentage. The average log I/O percentage based on the limit of the service tier.

azure.sql.servers.databases.physical_data_read_percent

Data IO Percentage. The average data I/O percentage based on the limit of the service tier.

azure.sql.servers.databases.sessions_percent

Sessions Percentage. The average percentage of concurrent sessions based on the limit of the service tier.

azure.sql.servers.databases.storage

Data Space Used. The total amount of space used to store data.

azure.sql.servers.databases.storage_percent

Storage Utilization. The average percentage of space used to store data based on the limit of the service tier.

Virtual Machines

Metric Description
azure.vm.cpu.credits_consumed

Total number of credits consumed by the Virtual Machine

azure.vm.cpu.credits_remaining

Total number of credits available to burst

azure.vm.cpu.percentage

The percentage of allocated compute units that are currently in use by the Virtual Machine(s)

azure.vm.disk.read_bytes

Bytes read from disk during monitoring period

azure.vm.disk.read_ops

Disk Read IOPS

azure.vm.disk.write_bytes

Bytes written to disk during monitoring period

azure.vm.disk.write_ops

Disk Write IOPS

azure.vm.network.in

The number of billable bytes received on all network interfaces by the Virtual Machine(s) (Incoming Traffic)

Virtual Machine Scale Sets

Metric Description
azure.vmss.cpu.percentage

Percentage CPU. The percentage of allocated compute units that are currently in use by the VM(s).

azure.vmss.disk.data.read_bytes

Data Disk Read. The average number of bytes per second read from a single disk during the monitoring period.

azure.vmss.disk.data.write_bytes

Data Disk Write. The average number of bytes per second written to a single disk during the monitoring period.

azure.vmss.disk.read_bytes

Disk Read. The total number of bytes read from disk during the monitoring period.

azure.vmss.disk.read_ops

Disk Read Operations. The average number of input operations read in a second from all disks attached to the VM(s).

azure.vmss.disk.write_bytes

Disk Write. The total number of bytes written to disk during the monitoring period.

azure.vmss.disk.write_ops

Disk Write Operations. The average number of output operations written in a second to all disks attached to the VM(s).

azure.vmss.memory.available_bytes

Available Memory Bytes. The amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the VM(s).

azure.vmss.network.total_in

Network In Total. The number of bytes received on all network interfaces by the VM(s) (incoming traffic).

azure.vmss.network.total_out

Network Out Total. The number of bytes out on all network interfaces by the VM(s) (outgoing traffic).

Infrastructure/Kubernetes metrics

Metrics for Kubernetes entities are collected by installing the SWO K8s Collector on a Kubernetes cluster that has Prometheus installed. See Kubernetes monitoring.

Cluster metrics

Metric Unit Description
k8s.cluster.cpu.allocatable core

The amount of CPU on the cluster that is allocatable (available for scheduling).

Metric type: Gauge.

k8s.cluster.cpu.capacity core

The cluster CPU capacity.

Metric type: Gauge.

k8s.cluster.cpu.utilization Percent (%)

The cluster CPU usage.

Metric type: Gauge.

k8s.cluster.memory.allocatable Binary Bytes

The amount of memory on the cluster that is allocatable (available for scheduling).

Metric type: Gauge.

k8s.cluster.memory.capacity Binary Bytes

The cluster memory capacity.

Metric type: Gauge.

k8s.cluster.memory.utilization Percent (%)

The cluster memory usage.

Metric type: Gauge.

k8s.cluster.nodes Count

The number of nodes in the cluster.

Metric type: Gauge.

k8s.cluster.nodes.ready Count

The number of nodes with status condition ready.

Metric type: Gauge.

k8s.cluster.nodes.ready.avg Percent (%)

The percentage of nodes with status condition ready.

Metric type: Gauge.

k8s.cluster.pods Count

The number of pods on a cluster.

Metric type: Gauge.

k8s.cluster.pods.running Count

The number of pods in running phase.

Metric type: Gauge.

k8s.cluster.spec.cpu.requests cores

The total amount of CPU requested by all containers in a cluster.

Metric type: Gauge.

k8s.cluster.spec.memory.requests Binary Bytes

The total amount of memory requested by all containers in a cluster.

Metric type: Gauge.
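
The relationship between k8s.cluster.nodes, k8s.cluster.nodes.ready, and k8s.cluster.nodes.ready.avg can be illustrated with a short Python sketch. How the collector actually computes the percentage is not stated here; the sketch assumes it is the ready node count divided by the total node count, and the function name is hypothetical.

```python
def nodes_ready_percent(ready_nodes: int, total_nodes: int) -> float:
    """Percentage of nodes whose Ready condition is true (assumed ratio)."""
    if ready_nodes < 0 or total_nodes < 0:
        raise ValueError("node counts must be non-negative")
    if total_nodes == 0:
        # No nodes reported; return 0 rather than divide by zero.
        return 0.0
    return 100.0 * ready_nodes / total_nodes


if __name__ == "__main__":
    print(nodes_ready_percent(3, 4))
```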

Node metrics

Metric Unit Description
k8s.kube_node_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_node_info  

Information about a cluster node.

Metric type: Gauge.

k8s.kube_node_spec_unschedulable  

Whether a node can schedule new pods.

Metric type: Gauge.

k8s.kube_node_status_allocatable

cpu=<core>

ephemeral_storage=<byte>

pods=<integer>

attachable_volumes_*=<byte>

hugepages_*=<byte>

memory=<byte>

The amount of resources allocatable for pods (after reserving some for system daemons).

Metric type: Gauge.

k8s.kube_node_status_capacity

cpu=<core>

ephemeral_storage=<byte>

pods=<integer>

attachable_volumes_*=<byte>

hugepages_*=<byte>

memory=<byte>

The total amount of resources available for a node.

Metric type: Gauge.

k8s.kube_node_status_condition  

The condition of a cluster node.

Metric type: Gauge.

k8s.kube_node_status_ready  

Node status (as tag sw.k8s.node.status).

Metric type: Gauge.

k8s.node.cpu.allocatable core

CPU Utilization. The amount of CPU on the node that is allocatable (available for scheduling).

Metric type: Gauge.

k8s.node.cpu.capacity core

CPU Utilization. The node CPU capacity.

Metric type: Gauge.

k8s.node.cpu.usage.seconds.rate core

CPU Utilization. The rate of node cumulative CPU time consumed.

Metric type: Gauge.

k8s.node.fs.iops  

Disk IOPS. Rate of reads and writes of all pods on node.

Metric type: Gauge.

k8s.node.fs.throughput  

Disk Throughput. Rate of bytes read and written of all pods on node.

Metric type: Gauge.

k8s.node.fs.usage Binary Bytes

Disk Usage. Number of bytes that are consumed by containers on this node’s filesystem.

Metric type: Gauge.

k8s.node.memory.allocatable Binary Bytes

Memory Utilization. The amount of memory on the node that is allocatable (available for scheduling).

Metric type: Gauge.

k8s.node.memory.capacity Binary Bytes

Memory Utilization. The node memory capacity.

Metric type: Gauge.

k8s.node.memory.working_set Binary Bytes

Memory Utilization. Current working set on node.

Metric type: Gauge.

k8s.node.network.bytes_received  

Network In. Rate of bytes received of all pods on node.

Metric type: Gauge.

k8s.node.network.bytes_transmitted  

Network Out. Rate of bytes transmitted of all pods on node.

Metric type: Gauge.

k8s.node.network.packets_received  

Rate of packets received of all pods on node.

Metric type: Gauge.

k8s.node.network.packets_transmitted  

Rate of packets transmitted of all pods on node.

Metric type: Gauge.

k8s.node.network.receive_packets_dropped  

Rate of packets dropped while receiving of all pods on node.

Metric type: Gauge.

k8s.node.network.transmit_packets_dropped  

Rate of packets dropped while transmitting of all pods on node.

Metric type: Gauge.

k8s.node.pods Count

Number of pods. The number of pods on a node.

Metric type: Gauge.

k8s.node.status.condition.diskpressure  

The condition diskpressure of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

k8s.node.status.condition.memorypressure  

The condition memorypressure of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

k8s.node.status.condition.networkunavailable  

The condition networkunavailable of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

k8s.node.status.condition.pidpressure  

The condition pidpressure of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

k8s.node.status.condition.ready  

The condition ready of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

Pod metrics

Metric Unit Description
k8s.kube.pod.owner.daemonset  

Information about the DaemonSet owning the pod.

Metric type: Gauge.

k8s.kube.pod.owner.replicaset  

Information about the ReplicaSet owning the pod.

Metric type: Gauge.

k8s.kube.pod.owner.statefulset  

Information about the StatefulSet owning the pod.

Metric type: Gauge.

k8s.kube_pod_completion_time seconds (s)

Completion time in unix timestamp for a pod.

Metric type: Gauge.

k8s.kube_pod_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_pod_info  

Information about the pod.

Metric type: Gauge.

k8s.kube_pod_owner  

Information about the pod owner.

Metric type: Gauge.

k8s.kube_pod_start_time seconds (s)

Start time in unix timestamp for a pod.

Metric type: Gauge.

k8s.kube_pod_status_phase  

The pod's current phase.

Metric type: Gauge.

k8s.kube_pod_status_ready  

Describes whether the pod is ready to serve requests.

Metric type: Gauge.

k8s.kube_pod_status_reason  

The pod status reasons.

Metric type: Gauge.

k8s.pod.containers Count

The number of containers on pod.

Metric type: Gauge.

k8s.pod.containers.running  

Current number of running containers on pod.

Metric type: Gauge.

k8s.pod.cpu.usage.seconds.rate seconds (s)

CPU Utilization. The rate of pod's cumulative CPU time consumed.

Metric type: Gauge.

k8s.pod.fs.iops  

Disk IOPS. Rate of reads and writes of all containers on pod.

Metric type: Gauge.

k8s.pod.fs.reads.bytes.rate  

Rate of bytes read of all containers on pod.

Metric type: Gauge.

k8s.pod.fs.reads.rate  

Rate of reads of all containers on pod.

Metric type: Gauge.

k8s.pod.fs.throughput  

Disk Throughput. Rate of bytes read and written of all containers on pod.

Metric type: Gauge.

k8s.pod.fs.usage.bytes Binary Bytes

Disk Usage. Number of bytes that are consumed by containers on this pod's filesystem.

Metric type: Gauge.

k8s.pod.fs.writes.bytes.rate  

Rate of bytes written of all containers on pod.

Metric type: Gauge.

k8s.pod.fs.writes.rate  

Rate of writes of all containers on pod.

Metric type: Gauge.

k8s.pod.memory.working_set Binary Bytes

Memory Utilization. Current working set on pod.

Metric type: Gauge.

k8s.pod.network.bytes_received  

Network In. Rate of bytes received of all containers on pod.

Metric type: Gauge.

k8s.pod.network.bytes_transmitted  

Network Out. Rate of bytes transmitted of all containers on pod.

Metric type: Gauge.

k8s.pod.network.packets_received  

Rate of packets received of all containers on pod.

Metric type: Gauge.

k8s.pod.network.packets_transmitted  

Rate of packets transmitted of all containers on pod.

Metric type: Gauge.

k8s.pod.network.receive_packets_dropped  

Rate of packets dropped while receiving of all containers on pod.

Metric type: Gauge.

k8s.pod.network.transmit_packets_dropped  

Rate of packets dropped while transmitting of all containers on pod.

Metric type: Gauge.

k8s.pod.spec.cpu.limit cores

CPU quota of all containers on pod in given CPU period.

Metric type: Gauge.

k8s.pod.spec.cpu.requests cores

The amount of CPU requested by all containers on the pod.

Metric type: Gauge.

k8s.pod.spec.memory.limit Binary Bytes

Memory Utilization. Memory limit for all containers on pod.

Metric type: Gauge.

k8s.pod.spec.memory.requests Binary Bytes

The amount of memory requested by all containers on the pod.

Metric type: Gauge.

k8s.pod.status.reason  

The current pod status reason.

Metric type: Gauge.
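
Several metrics in these tables are described as the rate of a cumulative value, for example k8s.pod.cpu.usage.seconds.rate. The Python sketch below shows one common way to turn two samples of a cumulative counter into a per-second rate. It is an illustration only and not necessarily how the SWO K8s Collector derives its rate metrics; the function name and the handling of counter resets are assumptions.

```python
def per_second_rate(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> float:
    """Per-second rate of increase between two samples of a cumulative counter.

    Timestamps are in seconds. If the counter went backwards (for example after
    a container restart), the current value is treated as the whole increase.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumed reset: the counter restarted from zero between the samples.
        delta = curr_value
    return delta / elapsed


if __name__ == "__main__":
    # 30 CPU-seconds consumed over 60 seconds of wall time.
    print(per_second_rate(100.0, 0.0, 130.0, 60.0))
```

Applied to cumulative CPU seconds, a result of 0.5 means that the workload used the equivalent of half a core over the interval.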

Container metrics

Metric Unit Description
k8s.container.spec.cpu.requests core

The amount of CPU requested by a container.

Metric type: Gauge.

k8s.container.spec.memory.requests Binary Bytes

The amount of memory requested by a container.

Metric type: Gauge.

k8s.container.status  

Describes the status of the container (waiting, running, or terminated).

Metric type: Gauge.

k8s.container_cpu_cfs_periods_total  

Number of elapsed enforcement period intervals.

Metric type: Counter.

k8s.container_cpu_cfs_throttled_periods_total  

Number of throttled period intervals.

Metric type: Counter.

k8s.container_cpu_usage_seconds_total seconds (s)

Cumulative CPU time consumed.

Metric type: Counter.

k8s.container_fs_reads_bytes_total Binary Bytes

Cumulative count of bytes read.

Metric type: Counter.

k8s.container_fs_reads_total Count

Cumulative count of reads completed.

Metric type: Counter.

k8s.container_fs_usage_bytes Binary Bytes

Number of bytes that are consumed by the container on this filesystem.

Metric type: Gauge.

k8s.container_fs_writes_bytes_total Binary Bytes

Cumulative count of bytes written.

Metric type: Counter.

k8s.container_fs_writes_total Count

Cumulative count of writes completed.

Metric type: Counter.

k8s.container_memory_working_set_bytes Binary Bytes

Current working set.

Metric type: Gauge.

k8s.container_network_receive_bytes_total Binary Bytes

Cumulative count of bytes received.

Metric type: Counter.

k8s.container_network_receive_packets_dropped_total Count

Cumulative count of packets dropped while receiving.

Metric type: Counter.

k8s.container_network_receive_packets_total Count

Cumulative count of packets received.

Metric type: Counter.

k8s.container_network_transmit_bytes_total Binary Bytes

Cumulative count of bytes transmitted.

Metric type: Counter.

k8s.container_network_transmit_packets_dropped_total Count

Cumulative count of packets dropped while transmitting.

Metric type: Counter.

k8s.container_network_transmit_packets_total Count

Cumulative count of packets transmitted.

Metric type: Counter.

k8s.container_spec_cpu_period  

CPU period of the container.

Metric type: Gauge.

k8s.container_spec_cpu_quota  

CPU quota of the container.

Metric type: Gauge.

k8s.container_spec_memory_limit_bytes Binary Bytes

Memory limit for the container.

Metric type: Gauge.

k8s.kube_pod_container_info  

Information about a container in a pod.

Metric type: Gauge.

k8s.kube_pod_container_resource_limits

cpu=<core>

memory=<bytes>

The resource limit (CPU or memory) set for a container.

Metric type: Gauge.

k8s.kube_pod_container_resource_requests

cpu=<core>

memory=<bytes>

The resource request (CPU or memory) made by a container.

Metric type: Gauge.

k8s.kube_pod_container_state_started seconds (s)

Start time in unix timestamp for a pod container.

Metric type: Gauge.

k8s.kube_pod_container_status_last_terminated_exitcode  

Describes the exit code for the last container in terminated state.

Metric type: Gauge.

k8s.kube_pod_container_status_last_terminated_reason  

Describes the last reason the container was in terminated state.

Metric type: Gauge.

k8s.kube_pod_container_status_ready  

Describes whether the container's readiness check succeeded.

Metric type: Gauge.

k8s.kube_pod_container_status_restarts_total  

The number of container restarts per container.

Metric type: Counter.

k8s.kube_pod_container_status_running  

Describes whether the container is currently in running state.

Metric type: Gauge.

k8s.kube_pod_container_status_terminated  

Describes whether the container is currently in terminated state.

Metric type: Gauge.

k8s.kube_pod_container_status_terminated_reason  

Describes the reason the container is currently in terminated state.

Metric type: Gauge.

k8s.kube_pod_container_status_waiting  

Describes whether the container is currently in waiting state.

Metric type: Gauge.

k8s.kube_pod_container_status_waiting_reason  

Describes the reason the container is currently in waiting state.

Metric type: Gauge.
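
The CPU quota, CPU period, and CFS period counters listed above are often combined to reason about CPU limits and throttling. The Python sketch below illustrates two such derived values. It assumes that the quota and period are expressed in the same time unit, that a non-positive quota means no limit is set, and that the counter inputs are increases over the same interval. The function names are hypothetical.

```python
from typing import Optional


def cpu_limit_cores(cfs_quota: float, cfs_period: float) -> Optional[float]:
    """CPU limit in cores, from k8s.container_spec_cpu_quota and
    k8s.container_spec_cpu_period. Returns None when no quota is set."""
    if cfs_period <= 0:
        raise ValueError("cfs_period must be positive")
    if cfs_quota <= 0:
        # Assumption: a non-positive quota means the container is unlimited.
        return None
    return cfs_quota / cfs_period


def throttled_fraction(throttled_periods_increase: float,
                       periods_increase: float) -> float:
    """Fraction of enforcement periods in which the container was throttled.

    Inputs are increases of k8s.container_cpu_cfs_throttled_periods_total and
    k8s.container_cpu_cfs_periods_total over the same time interval.
    """
    if periods_increase <= 0:
        # No enforcement periods elapsed, so nothing could be throttled.
        return 0.0
    return throttled_periods_increase / periods_increase


if __name__ == "__main__":
    print(cpu_limit_cores(50_000, 100_000))
    print(throttled_fraction(25, 100))
```

A throttled fraction that stays high can indicate that the container's CPU limit is too low for its workload.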

Deployment metrics

Metric Unit Description
k8s.deployment.condition.available  

Describes whether the deployment has an Available status condition.

Metric type: Gauge.

k8s.deployment.condition.progressing  

Describes whether the deployment has a Progressing status condition.

Metric type: Gauge.

k8s.deployment.condition.replicafailure  

Describes whether the deployment has a ReplicaFailure status condition.

Metric type: Gauge.

k8s.kube_deployment_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_deployment_labels  

Kubernetes labels converted to Prometheus labels.

Metric type: Gauge.

k8s.kube_deployment_spec_paused  

Whether the deployment is paused and will not be processed by the deployment controller.

Metric type: Gauge.

k8s.kube_deployment_spec_replicas  

Number of desired pods for a deployment.

Metric type: Gauge.

k8s.kube_deployment_status_condition  

The current status conditions of a deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas  

The number of replicas per deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas_available  

The number of available replicas per deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas_ready  

The number of ready replicas per deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas_unavailable  

The number of unavailable replicas per deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas_updated  

The number of updated replicas per deployment.

Metric type: Gauge.

StatefulSet metrics

Metric Unit Description
k8s.kube_statefulset_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_statefulset_labels  

Kubernetes labels converted to Prometheus labels.

Metric type: Gauge.

k8s.kube_statefulset_replicas  

Number of desired pods for a StatefulSet.

Metric type: Gauge.

k8s.kube_statefulset_status_replicas_current  

The number of current replicas per StatefulSet.

Metric type: Gauge.

k8s.kube_statefulset_status_replicas_ready  

The number of ready replicas per StatefulSet.

Metric type: Gauge.

k8s.kube_statefulset_status_replicas_updated  

The number of updated replicas per StatefulSet.

Metric type: Gauge.

DaemonSet metrics

Metric Unit Description
k8s.kube_daemonset_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_daemonset_labels  

Kubernetes labels converted to Prometheus labels.

Metric type: Gauge.

k8s.kube_daemonset_status_current_number_scheduled  

The number of nodes that should be running a daemon pod and have at least one daemon pod running.

Metric type: Gauge.

k8s.kube_daemonset_status_desired_number_scheduled  

The number of nodes that should be running the daemon pod.

Metric type: Gauge.

k8s.kube_daemonset_status_number_available  

The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available.

Metric type: Gauge.

k8s.kube_daemonset_status_number_misscheduled  

The number of nodes that should not be running a daemon pod and have one or more running anyway.

Metric type: Gauge.

k8s.kube_daemonset_status_number_ready  

The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready.

Metric type: Gauge.

k8s.kube_daemonset_status_number_unavailable  

The number of nodes that should be running the daemon pod and have none of the daemon pod running and available.

Metric type: Gauge.

k8s.kube_daemonset_status_updated_number_scheduled  

The total number of nodes that are running an updated daemon pod.

Metric type: Gauge.

ReplicaSet metrics

Metric Unit Description
k8s.kube.replicaset.owner.deployment  

Information about the Deployment owning the ReplicaSet.

Metric type: Gauge.

k8s.kube_replicaset_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_replicaset_owner  

Information about the ReplicaSet's owner.

Metric type: Gauge.

Namespace metrics

Metric Unit Description
k8s.kube_namespace_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_namespace_status_phase  

Kubernetes namespace status phase.

Metric type: Gauge.

k8s.kube_resourcequota

ResourceQuota metric.

Metric type: Gauge.

Other metrics

Metric Unit Description
k8s.apiserver.request.successrate Percent (%)

Success rate of Kubernetes API server calls.

Metric type: Gauge.

Network metrics

Metrics for network device entities are sent by an installed Network Collector. See Network monitoring.

Standard metrics

Network device metrics

Metric Units Description
sw.collector.CPULoad.AvgLoad Percent (%)

Average CPU Utilization. Average CPU utilization of a network device instance or instances. Displayed as a percentage.

sw.collector.CPULoad.AvgPercentMemoryUsed Percent (%)

Average Memory Utilization. Average memory usage of the network device as a percentage.

sw.collector.Nodes.DisplayName [name]

Display name polled from the device to be used in custom widgets for filtering, sorting, or grouping data.

sw.collector.ResponseTime.Availability Percent (%)

Availability. Availability of the network device instance or instances. Displayed as a percentage.

May be displayed as:

  • Average Availability. An average availability of network devices.
sw.collector.ResponseTime.AvgResponseTime milliseconds (ms)

Average Response Time. The average time in milliseconds it takes the network device to respond.

sw.collector.ResponseTime.PercentLoss Percent (%)

Packet Loss. The packet loss of the network device as a percentage.

May be displayed as:

  • Average Packet Loss

Interface metrics

Metric Units Description
sw.collector.InterfaceAvailability.Availability Percent (%)

Availability. Availability of the interface instance or instances. Displayed as a percentage.

May be displayed as:

  • Average Availability
sw.collector.InterfaceTraffic.InPercentUtil Percent (%)

In Percent Utilization. Average utilization of an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • In Percent Utilization Average
sw.collector.InterfaceTraffic.OutPercentUtil Percent (%)

Out Percent Utilization. Average utilization of an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • Out Percent Utilization Average
sw.collector.InterfaceTraffic.InAveragebps Percent (%)

In Bits Per Second Average. Average utilization of an interface instance or instances. Displayed as a percentage.

sw.collector.InterfaceTraffic.OutAveragebps Percent (%)

Out Bits Per Second Average. Average utilization of an interface instance or instances. Displayed as a percentage.

sw.collector.InterfaceErrors.InDiscards Percent (%)

In Discards. Average utilization of an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • In Discards Average
sw.collector.InterfaceErrors.OutDiscards Percent (%)

Out Discards. Average utilization of an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • Out Discards Average
sw.collector.InterfaceErrors.InErrors Percent (%)

In Errors. Average utilization of an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • In Errors Average
sw.collector.InterfaceErrors.OutErrors Percent (%)

Out Errors. Average utilization of an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • Out Errors Average

Volume metrics

Metric Units Description
sw.collector.VolumeUsageHistory.PercentDiskUsed Percent (%)

Percent Disk Used. Indicates the overall disk usage as a percentage.

sw.collector.VolumeUsageHistory.AvgDiskUsed Gigabytes

Average Disk Used. Indicates the average disk usage in Gigabytes.

sw.collector.VolumeUsageHistory.DiskSize Gigabytes

Volume Size. Indicates the disk size in Gigabytes.

sw.collector.VolumePerformanceHistory.AvgDiskReads Percent (%)

Disk Read Average. Indicates the average read speed of the volume.

Only for volumes monitored via WMI.

sw.collector.VolumePerformanceHistory.AvgDiskWrites Percent (%)

Disk Write Average. Indicates the average write speed.

Only for volumes monitored via WMI.

Sensor metrics

Metric Units Description
sw.collector.HardwareHealth.HardwareItemStatistics.AvgValue V/°C/

Average Sensor Value. Indicates the sensor value in appropriate units, as provided by the sensor. Sensors include power supplies, temperature, or fan sensors.

Flow metrics

Metric Units Description
sw.collector.Netflow.Flows.Bytes GB

Top Protocols, Top Countries, Top Endpoints, Top Conversations, Top Applications, Top Advanced Applications.

Endpoints producing the most traffic on your network, most bandwidth-consuming conversations, protocols used for most traffic, countries hosting endpoints that transmit the most data, or applications responsible for most monitored traffic.

Wireless Controller and Thin Access Point metrics

Metric Units Description
sw.collector.Wireless.Interfaces N/A MAC, SSIDs, Channels and Radio Type details are gathered from wireless interfaces of that AP.
sw.collector.Wireless.Clients Number The sum of clients connected to all interfaces of AP.
sw.collector.Wireless.HistoricalClients.SignalStrength  

RSSI - signal strength

The following thresholds are used to convert the dBm value to a strength indicator: -82, -72, -68, -63, -56 (-82 is the worst).

sw.collector.Wireless.HistoricalClients.OutDataRate   Data rate on clients
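
As a rough illustration of the threshold conversion described for sw.collector.Wireless.HistoricalClients.SignalStrength, the following Python sketch places a dBm reading against the five listed thresholds. The number of indicator levels, the boundary handling (a reading equal to a threshold is treated as having reached it), and the function name are assumptions made for this example, not documented product behavior.

```python
from bisect import bisect_right

# Thresholds from the metric description, ordered from worst to best (dBm).
THRESHOLDS_DBM = [-82, -72, -68, -63, -56]


def signal_level(dbm: float) -> int:
    """Return a hypothetical strength level from 0 (worst) to 5 (best).

    Assumption: a reading equal to a threshold counts as having reached it.
    """
    # bisect_right counts how many thresholds are less than or equal to dbm.
    return bisect_right(THRESHOLDS_DBM, dbm)
```

Under these assumptions, a reading below -82 dBm maps to level 0 and a reading of -56 dBm or stronger maps to level 5.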

Special metrics

Metric Units Description

sw.collector.InterfaceTraffic.Averagebps

Percent (%) Total average bps (transmitted + received).

OTel metrics

When an OTel receiver is configured to send telemetry data directly to SolarWinds Observability SaaS, the metrics collected depend on what OTel data is sent. See OTel direct ingestion.

When you integrate with Apache, Elasticsearch, NGINX, Redis, or ZooKeeper, the SolarWinds Observability Agent is used to send metrics and log data to SolarWinds Observability SaaS. See Monitor with OTel.

Apache metrics

Metric Units Description
apache.cpu.load Percent (%)

The current load of the CPU.

apache.cpu.time Jiff The jiffs used by processes of a given category.
apache.current_connections Connections The number of active connections currently attached to the HTTP server.
apache.load.1 Percent (%) The average server load during the last minute.
apache.load.15 Percent (%) The average server load during the last 15 minutes.
apache.load.5 Percent (%) The average server load during the last 5 minutes.
apache.request.time milliseconds (ms) Total time spent on handling requests.
apache.request.time.rate milliseconds (ms) Total time spent on handling requests.
apache.requests Requests The number of requests serviced by the HTTP server per second.
apache.requests.rate milliseconds (ms) Total time spent on handling requests.
apache.scoreboard Workers The number of workers in each state.
apache.throughput Byte per request The average number of bytes served per request.
apache.time.perrequest milliseconds per request The average processing time per request.
apache.traffic Byte Total HTTP server traffic in bytes.
apache.traffic.rate Byte per request HTTP server traffic in bytes per second.
apache.uptime seconds (s) The amount of time that the server has been running in seconds.
apache.workers Workers The number of workers currently attached to the HTTP server.
apache.workers.idle Workers The number of idle workers.

Confluent Cloud metrics

Metric Units Description
confluent_kafka_server_active_connection_count {connections} The count of active authenticated connections.

confluent_kafka_server_partition_count

{partitions} The number of partitions.
confluent_kafka_server_received_bytes By (bytes)/60s The delta count of bytes of the customer's data received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds.
confluent_kafka_server_received_records {records}/60s The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds.
confluent_kafka_server_request_bytes Bytes/60s The delta count of total request bytes from the specified request types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_request_count {requests}/60s The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_response_bytes Bytes/60s The delta count of total response bytes from the specified response types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_retained_bytes Bytes/60s The current count of bytes retained by the cluster. The count is sampled every 60 seconds.
confluent_kafka_server_sent_bytes Bytes/60s The delta count of bytes sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_sent_records {records}/60s The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_successful_authentication_count {successful authentications}/60s The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds.
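
Because most Confluent Cloud metrics above are delta counts sampled every 60 seconds, it can be convenient to convert a sample to an average per-second rate, or to sum several samples into a total for a longer period. The Python sketch below shows that arithmetic; the function names are hypothetical and the sample values are supplied by the caller.

```python
SAMPLE_INTERVAL_SECONDS = 60  # sampling interval stated in the metric descriptions


def per_second_rate(delta_count: float,
                    interval_seconds: float = SAMPLE_INTERVAL_SECONDS) -> float:
    """Convert one delta sample into an average per-second rate."""
    return delta_count / interval_seconds


def total_over_window(delta_samples: list[float]) -> float:
    """Sum consecutive delta samples to get the total for the covered period."""
    return sum(delta_samples)
```

For example, a confluent_kafka_server_received_bytes sample of 6000 corresponds to an average of 100 bytes per second over that 60-second window.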

Docker metrics

Metric Units Description
container.blockio.io_service_bytes_recursive bytes (By) The number of bytes transferred to/from the disk by the group and descendant groups.
container.cpu.throttling_data.periods {periods} The number of periods with throttling active.
container.cpu.usage.kernelmode nanosecond (ns) Time spent by tasks of the cgroup in kernel mode (Linux). Time spent by all container processes in kernel mode (Windows).
container.cpu.usage.total nanosecond (ns) Total CPU time consumed.
container.cpu.usage.usermode nanosecond (ns) Time spent by tasks of the cgroup in user mode (Linux). Time spent by all container processes in user mode (Windows).
container.cpu.utilization percentage (%)

Container CPU Utilization. Percentage of CPU used per container.

container.memory.file bytes (By) Amount of memory used to cache filesystem data, including tmpfs and shared memory (Only available with cgroups v2).
container.memory.percent percentage (%)

Container Memory Utilization. Percentage of memory used per container.

container.memory.total_cache bytes (By) Total amount of memory used by the processes of this cgroup (and descendants) that can be associated with a block on a block device. Also accounts for memory used by tmpfs (Only available with cgroups v1).
container.memory.usage.limit bytes (By) Memory limit of the container.
container.memory.usage.total bytes (By) Memory usage of the container. This excludes the cache.
container.network.io.usage.rx_bytes bytes (By)

Total Received Bytes per Container. Total bytes received by the container.

container.network.io.usage.rx_dropped {packets}

Total Incoming Dropped Packets by Container. Total incoming packets dropped by the container.

container.network.io.usage.tx_bytes bytes (By)

Total Sent Bytes per Container. Total bytes sent by the container.

container.network.io.usage.tx_dropped {packets}

Total Outgoing Dropped Packets by Container. Total outgoing packets dropped by the container.

container.uptime seconds (s)

Total Container Uptime. The time elapsed since the start time of the container.

Elasticsearch metrics

Metric Units Description
elasticsearch.breaker.memory.estimated bytes (By)

The estimated memory used for the operation.

elasticsearch.breaker.memory.limit bytes (By) The memory limit for the circuit breaker.
elasticsearch.breaker.tripped 1 The total number of times the circuit breaker has been triggered and prevented an out of memory error.
elasticsearch.cluster.data_nodes {nodes} Data Nodes. The number of data nodes in the cluster.
elasticsearch.cluster.health status Cluster by Status. The health status of the cluster. Health status is based on the state of its primary and replica shards. Green indicates all shards are assigned. Yellow indicates that one or more replica shards are unassigned. Red indicates that one or more primary shards are unassigned, making some data unavailable.
elasticsearch.cluster.in_flight_fetch {fetches} The number of unfinished fetches.
elasticsearch.cluster.nodes {nodes} Nodes, Top 5 Clusters by Node Count. The total number of nodes in the cluster.
elasticsearch.cluster.pending_tasks {tasks} Pending Tasks in Cluster. The number of cluster-level changes that have not yet been executed.
elasticsearch.cluster.published_states.differences 1 The number of differences between published cluster states.
elasticsearch.cluster.published_states.full 1 The number of published cluster states.
elasticsearch.cluster.shards {shards} Active Shards, Shards by State. The number of shards in the cluster.
elasticsearch.cluster.state_queue 1 The number of cluster states in queue.
elasticsearch.cluster.state_update.count 1 The number of cluster state update attempts that changed the cluster state since the node started.
elasticsearch.cluster.state_update.time milliseconds (ms) The cumulative amount of time updating the cluster state since the node started.
elasticsearch.index.operations.completed {operations} The number of operations completed for an index.
elasticsearch.index.operations.time milliseconds (ms) Time spent on operations for an index.
elasticsearch.index.shards.size bytes (By) The size of the shards assigned to this index.
elasticsearch.indexing_pressure.memory.limit bytes (By) The configured memory limit, in bytes, for the indexing requests.
elasticsearch.indexing_pressure.memory.total.primary_rejections 1 The cumulative number of indexing requests rejected in the primary stage.
elasticsearch.indexing_pressure.memory.total.replica_rejections 1 The number of indexing requests rejected in the replica stage.
elasticsearch.memory.indexing_pressure bytes (By) Indexing Pressure. The memory consumed, in bytes, by indexing requests in the specified stage.
elasticsearch.node.cache.count {count} The total count of query cache misses across all shards assigned to selected nodes.
elasticsearch.node.cache.evictions {evictions} The number of evictions from the cache on a node.
elasticsearch.node.cache.memory.usage bytes (By) The size in bytes of the cache on a node.
elasticsearch.node.cluster.connections {connections} Cluster Connections. The number of open TCP connections for internal cluster communication.
elasticsearch.node.cluster.io bytes (By) The number of bytes sent and received on the network for internal cluster communication.
elasticsearch.node.cluster.io.rate bytes per second (By/s) Network Traffic. The number of bytes sent and received for internal cluster communication per second.
elasticsearch.node.disk.io.read kilobytes (KiBy) Disk Read and Write. The total number of kilobytes read across all file stores for this node.
elasticsearch.node.disk.io.write kilobytes (KiBy) Disk Read and Write. The total number of kilobytes written across all file stores for this node.
elasticsearch.node.documents {documents} The number of documents on the node.
elasticsearch.node.fs.disk.available bytes (By) The amount of disk space available to the JVM across all file stores for this node. Depending on OS or process level restrictions, this might appear less than free. This is the actual amount of free disk space the Elasticsearch node can use.
elasticsearch.node.fs.disk.free bytes (By) The amount of unallocated disk space across all file stores for this node.
elasticsearch.node.fs.disk.total bytes (By) The amount of disk space across all file stores for this node.
elasticsearch.node.http.connections {connections} The number of HTTP connections to the node.
elasticsearch.node.ingest.documents {documents} The total number of documents ingested during the lifetime of this node.
elasticsearch.node.ingest.documents.current {documents} The total number of documents currently being ingested.
elasticsearch.node.ingest.operations.failed {operation} The total number of failed ingest operations during the lifetime of this node.
elasticsearch.node.open_files {files} Open File Descriptors. The number of open file descriptors held by the node.
elasticsearch.node.operations.completed {operations} The number of operations completed by a node.
elasticsearch.node.operations.completed.rate {operations} per second Node Operations Completed per Second. The number of operations completed for an index per second.
elasticsearch.node.operations.time milliseconds (ms) Total Time Spent on Operations. The time spent on operations by a node.
elasticsearch.node.pipeline.ingest.documents.current {documents} The total number of documents currently being ingested by a pipeline.
elasticsearch.node.pipeline.ingest.documents.preprocessed {documents} The number of documents preprocessed by the ingest pipeline.
elasticsearch.node.pipeline.ingest.operations.failed {operation} The total number of failed operations for the ingest pipeline.
elasticsearch.node.script.cache_evictions 1 The total number of times the script cache has evicted old data.
elasticsearch.node.script.compilation_limit_triggered 1 The total number of times the script compilation circuit breaker has limited inline script compilations.
elasticsearch.node.script.compilations {compilations} The total number of inline script compilations performed by the node.
elasticsearch.node.shards.data_set.size bytes (By) The total data set size of all shards assigned to the node. This includes the size of shards not stored fully on the node, such as the cache for partially mounted indices.
elasticsearch.node.shards.reserved.size bytes (By) A prediction of how much larger the shard stores on this node will eventually grow due to ongoing peer recoveries, restoring snapshots, and similar activities. A value of -1 indicates that this is not available.
elasticsearch.node.shards.size bytes (By) The size of the shards assigned to this node.
elasticsearch.node.thread_pool.tasks.finished {tasks} The number of tasks finished by the thread pool.
elasticsearch.node.thread_pool.tasks.queued {tasks} Queued Tasks in Thread Pool. The number of queued tasks in the thread pool.
elasticsearch.node.thread_pool.threads {threads} The number of threads in the thread pool.
elasticsearch.node.translog.operations {operations} The number of transaction log operations.
elasticsearch.node.translog.size bytes (By) The size of the transaction log.
elasticsearch.node.translog.uncommitted.size bytes (By) The size of uncommitted transaction log operations.
elasticsearch.os.cpu.load_avg.15m 1 CPU Utilization. The fifteen-minute load average on the system. The field is not present if fifteen-minute load average is not available.
elasticsearch.os.cpu.load_avg.1m 1 CPU Utilization. The one-minute load average on the system. The field is not present if one-minute load average is not available.
elasticsearch.os.cpu.load_avg.5m 1 CPU Utilization. The five-minute load average on the system. The field is not present if five-minute load average is not available.
elasticsearch.os.cpu.usage Percent (%) The recent CPU usage for the whole system, or -1 if not supported.
elasticsearch.os.memory bytes (By) The amount of physical memory.
jvm.classes.loaded 1 The number of loaded classes.
jvm.gc.collections.count 1 The total number of garbage collections that have occurred.
jvm.gc.collections.count.rate collections per second JVM GC Collection Count per Second. The number of Java Virtual Machine garbage collections that have occurred per second.
jvm.gc.collections.elapsed milliseconds (ms) Total JVM GC Collection Time. The approximate accumulated collection elapsed time.
jvm.memory.heap.committed bytes (By) JVM Memory Heap Committed vs Used. The amount of memory that is guaranteed to be available for the heap.
jvm.memory.heap.max bytes (By) The maximum amount of memory that can be used for the heap.
jvm.memory.heap.used bytes (By) JVM Memory Heap Committed vs Used. The current heap memory usage.
jvm.memory.nonheap.committed bytes (By) The amount of memory that is guaranteed to be available for non-heap purposes.
jvm.memory.nonheap.used bytes (By) The current non-heap memory usage.
jvm.memory.pool.max bytes (By) The maximum amount of memory that can be used for the memory pool.
jvm.memory.pool.used bytes (By) The current memory pool memory usage.
jvm.threads.count 1 The current number of threads.
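
The elasticsearch.cluster.health description above defines the status colors in terms of unassigned shards. A minimal Python sketch of that rule follows; the function and parameter names are hypothetical, and the counts of unassigned primary and replica shards are assumed to be available from your own source.

```python
def cluster_health(unassigned_primary_shards: int,
                   unassigned_replica_shards: int) -> str:
    """Classify cluster health using the rules in the metric description."""
    if unassigned_primary_shards > 0:
        # One or more primary shards unassigned: some data is unavailable.
        return "red"
    if unassigned_replica_shards > 0:
        # All primaries assigned, but one or more replicas are not.
        return "yellow"
    # All shards are assigned.
    return "green"
```

The primary-shard check comes first because an unassigned primary makes the cluster red regardless of the replica state.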

IIS metrics

Metric Units Description
iis.connection.active {active connections} The number of active connections.
iis.connection.anonymous {anonymous connections} The number of connections established anonymously.
iis.connection.anonymous/rate {anonymous connections}/s The number of connections established anonymously per second.
iis.connection.attempt.count {connection attempts} The total number of attempts to connect to the server.
iis.connection.attempt.count/rate {connection attempts}/second (s) The total number of attempts to connect to the server per second.
iis.network.blocked bytes (By) The total number of bytes blocked due to bandwidth throttling.
iis.network.file.count bytes (By) The number of transmitted files.
iis.network.io bytes (By) The total amount of bytes sent and received.
iis.network.io/rate bytes (By)/second (s) The total amount of bytes sent and received per second.
iis.request.count {requests} The total number of requests of a given type.
iis.request.queue.count {requests} The current number of requests in the queue.
iis.request.rejected {requests} The total number of requests rejected.
iis.thread.active {requests} The total number of active threads.
iis.uptime M/k The amount of time the server has been up.

Kafka metrics

Metric Units Description
kafka_controller_kafkacontroller_activecontrollercount {active controllers in cluster} Active Cluster Controllers. The average number of active controllers in the cluster.

kafka_log_logflushstats_logflushrateandtimems.95th
kafka_log_logflushstats_logflushrateandtimems.999th
kafka_log_logflushstats_logflushrateandtimems.median

  Log Flush Rate and Time. The maximum values of log flush rate and time.

kafka_network_requestmetrics_localtimems
kafka_network_requestmetrics_localtimems.95th
kafka_network_requestmetrics_localtimems.999th
kafka_network_requestmetrics_localtimems.median

ms (millisecond) Leader Request Time. The average time taken to process a request at the leader.

kafka_network_requestmetrics_totaltimems
kafka_network_requestmetrics_totaltimems.95th
kafka_network_requestmetrics_totaltimems.999th
kafka_network_requestmetrics_totaltimems.median

ms (millisecond) Producer Request Time. The average total time to serve a single 'Produce' request.
kafka_network_socketserver_networkprocessoravgidlepercent % (percentage) Broker Process Idle Time. The average fraction of time the network processors are idle.
kafka_server_brokertopicmetrics_bytesin_1minuterate Bytes/second Broker Incoming Bytes. The one-minute sum of incoming bytes per second.
kafka_server_brokertopicmetrics_bytesin_1minuterate Bytes/second/{topic} Broker Incoming Bytes per Topic. The one-minute average rate of incoming bytes per second distributed by Topic.
kafka_server_brokertopicmetrics_messagesin_1minuterate {messages}/second Broker Incoming Messages. The one-minute sum of incoming messages per second.
kafka_server_brokertopicmetrics_messagesin_1minuterate {messages}/second/{topic} Broker Incoming Messages per Topic. The one-minute average rate of incoming messages per second distributed per topic.
kafka_server_replicafetchermanager_maxlag {messages} Max Replica Lag. The average of the maximum number of messages by which the consumer lags behind the producer.
kafka_server_replicamanager_isrshrinks_1minuterate {shrink events}/minute ISR Shrink Rate. The one-minute rate of ISR shrink events. If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again and the replicas are fully caught up, ISR will expand.
kafka_server_replicamanager_leadercount {replica leaders} Leader Replicas. The average number of replica leaders.
kafka_server_replicamanager_partitioncount {partitions} Partitions. The average number of partitions on all brokers.
kafka_server_replicamanager_underreplicatedpartitions {under-replicated partitions} Under-Replicated Partitions. The average number of under-replicated partitions.

Memcached metrics

Metric Units Description
memcached.bytes bytes (By) Current Bytes Stored, Bytes Stored. The current number of bytes used by this server to store items.
memcached.commands {commands} The commands executed.
memcached.commands.rate {commands}/second Commands. The commands executed per second.
memcached.connections.current {connections} The current number of open connections.
memcached.connections.total {connections} The total number of connections opened since the server started running.
memcached.cpu.usage seconds (s) CPU User Time, CPU System Time. The accumulated user and system time.
memcached.current_items {items} Current Items in Cache, Active Connections. The number of items currently stored in the cache.
memcached.evictions {evictions} Total Evictions. The average total number of cache item evictions.
memcached.network bytes (By) Bytes transferred over the network.
memcached.network.rate bytes/second (By/s) Network Traffic. The average number of bytes transferred over the network, per second.
memcached.operation_hit_ratio percentage (%) Operation Hit Ratio. The hit ratio for operations, expressed as a percentage value between 0.0 and 100.0.
memcached.operations {operations} Hits and Misses Total. The average total counts of hits and misses.
memcached.operations.rate {operations}/second The average counts of hits and misses per second.
memcached.threads {threads} The number of threads used by the Memcached instance.
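
A hit ratio such as memcached.operation_hit_ratio is conventionally the share of lookups that were hits, scaled to a percentage. The Python sketch below shows that arithmetic under the assumption that separate hit and miss counts are available; the function name and the choice to return 0.0 when there have been no operations are illustrative, not taken from the product.

```python
def hit_ratio_percent(hits: int, misses: int) -> float:
    """Return hits as a percentage of all lookups, between 0.0 and 100.0."""
    total = hits + misses
    if total == 0:
        # Assumption: report 0.0 when no lookups have been recorded yet.
        return 0.0
    return 100.0 * hits / total
```

For example, 75 hits and 25 misses give a hit ratio of 75.0 percent.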

NGINX metrics

Metric Units Description
nginx.conections Connections

The current number of nginx connections by state.

nginx.connections_accepted Connections The total number of accepted client connections.
nginx.connections_accepted.gauge Connections The accepted client connections (gauge).
nginx.connections_accepted.rate Connections The number of accepted client connections per second.
nginx.connections_current Connections The current number of nginx connections by state.
nginx.connections_dropped Connections The total number of dropped client connections.
nginx.connections_dropped.rate Connections The number of dropped client connections per second.
nginx.connections_handled Connections The total number of handled connections. Generally, the parameter value is the same as nginx.connections_accepted unless some resource limits have been reached (for example, the worker_connections limit).
nginx.connections_handled.gauge Connections The handled client connections (gauge).
nginx.connections_handled.rate Connections The number of handled client connections per second.
nginx.requests Requests The total number of requests made to the server since it started.
nginx.requests.rate

Requests per second

The number of requests per second.
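
The nginx.connections_handled description notes that the handled count normally equals the accepted count unless a resource limit was reached. One way to surface that gap is to subtract the two totals, as in this Python sketch; the function name is hypothetical, and treating the difference as connections that were accepted but not handled is an assumption based on that description.

```python
def unhandled_connections(accepted_total: int, handled_total: int) -> int:
    """Return how many accepted connections were not handled.

    Clamped at zero in case the two counters were read at slightly
    different moments.
    """
    return max(accepted_total - handled_total, 0)
```

A result above zero suggests that a limit such as worker_connections may have been reached.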

Oracle DB metrics

Metric Units Description
oracledb.cpu_time Seconds (s)

The cumulative CPU time, in seconds.

oracledb.dml_locks.limit {locks} The maximum limit of active Data Manipulation Language (DML) locks, -1 if unlimited.
oracledb.dml_locks.usage {locks} The current count of active Data Manipulation Language (DML) locks.
oracledb.enqueue_deadlocks {deadlocks} The total number of deadlocks between table or row locks in different sessions.
oracledb.enqueue_locks.limit {locks} The maximum limit of active enqueue locks, -1 if unlimited.
oracledb.enqueue_locks.usage {locks} The current count of active enqueue locks.
oracledb.enqueue_resources.limit {resources} The maximum limit of active enqueue resources, -1 if unlimited.
oracledb.enqueue_resources.usage {resources} The current count of active enqueue resources.
oracledb.exchange_deadlocks {deadlocks} The number of times that a process detected a potential deadlock when exchanging two buffers and raised an internal, restartable error. Index scans are the only operations that perform exchanges.
oracledb.executions {executions} The total number of calls (user and recursive) that executed SQL statements.
oracledb.hard_parses {parses} The number of hard parses.
oracledb.logical_reads {reads} The number of logical reads.
oracledb.parse_calls {parses} The total number of parse calls.
oracledb.pga_memory bytes (By) The Session Program Global Area (PGA) memory.
oracledb.physical_reads {reads} The number of physical reads.
oracledb.processes.limit {processes} The maximum limit of active processes, -1 if unlimited.
oracledb.processes.usage {processes} The current count of active processes.
oracledb.sessions.limit {processes} The maximum limit of active sessions, -1 if unlimited.
oracledb.sessions.usage {processes} The count of active sessions.
oracledb.tablespace_size.limit bytes (By) The maximum size of tablespace in bytes, -1 if unlimited.
oracledb.tablespace_size.usage bytes (By) The used tablespace in bytes.
oracledb.transactions.limit {transactions} The maximum limit of active transactions, -1 if unlimited.
oracledb.transactions.usage {transactions} The current count of active transactions.
oracledb.user_commits {commits} The number of user commits. When a user commits a transaction, the redo generated that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate.
oracledb.user_rollbacks 1 The number of times users manually issue the ROLLBACK statement or an error occurs during a user's transactions.
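
Several Oracle DB metrics come in usage and limit pairs, where a limit of -1 means unlimited. When deriving a utilization percentage from such a pair (for example, for an alert condition), the unlimited case needs separate handling. The Python sketch below is one way to do that; the function name and the decision to return None for an unlimited or non-positive limit are assumptions for illustration.

```python
from typing import Optional

UNLIMITED = -1  # value the limit metrics report when no limit applies


def utilization_percent(usage: float, limit: float) -> Optional[float]:
    """Return usage as a percentage of limit, or None if no ratio applies."""
    if limit == UNLIMITED or limit <= 0:
        # No meaningful ratio exists for an unlimited (or invalid) limit.
        return None
    return 100.0 * usage / limit
```

For example, an oracledb.sessions.usage of 50 against an oracledb.sessions.limit of 200 gives 25.0 percent, while a limit of -1 gives None.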

RabbitMQ metrics

Metric Units Description
rabbitmq.message.current.sum {messages} Current Messages in Queues, Top 10 Queues by Depth. The total number of messages currently in the queues on RabbitMQ by queue name.
rabbitmq_channels {channels} Open Channels. The number of channels currently open on RabbitMQ.
rabbitmq_channel_messages_unacked {messages} Messages Unacknowledged. The average number of delivered but not yet acknowledged messages on RabbitMQ.
rabbitmq_consumers {consumers} Queue Consumers. The number of currently connected consumers on RabbitMQ.
rabbitmq_disk_space_available_bytes Bytes Free Disk Space. The average free disk space available on RabbitMQ.
rabbitmq_erlang_processes_used {processes} Used Processes. The total number of Erlang processes used by RabbitMQ.
rabbitmq.message.acknowledged.rate {messages}/s Messages Acknowledged per Second. The average number of messages acknowledged per second on RabbitMQ.
rabbitmq.message.delivered.rate {messages}/s Messages Delivered per Second. The average number of messages delivered per second on RabbitMQ.
rabbitmq.message.dropped.rate {messages}/s Messages Dropped per Second. The average number of messages dropped per second on RabbitMQ.
rabbitmq.message.published.rate {messages}/s Messages Published per Second. The average number of messages published per second on RabbitMQ.
rabbitmq_process_open_fds {file descriptors} Open File Descriptors. The average number of open file descriptors on RabbitMQ.
rabbitmq_process_open_tcp_sockets {sockets} Open Sockets. The total number of open TCP sockets on RabbitMQ.
rabbitmq_process_resident_memory_bytes Bytes Memory Consumed by Node. The memory used by the node on RabbitMQ.
rabbitmq_queue_consumer_utilisation   Consumer Utilization. The average proportion of time that the queues can deliver messages to consumers on RabbitMQ.
rabbitmq_queue_process_memory_bytes Bytes Memory Consumed by Queues. The average memory used by the Erlang queue process on RabbitMQ.

Redis metrics

Metric Units Description
redis.clients.blocked   Blocked Clients, Clients. The number of clients pending on a blocking call.
redis.clients.connected   Redis Version, Clients. The number of client connections (excluding connections from replicas).
redis.clients.max_input_buffer   The biggest input buffer among current client connections.
redis.clients.max_output_buffer   The longest output list among current client connections.
redis.commands operations/s Processed Commands per Second. The number of commands processed per second.
redis.commands.processed   Total Processed Commands. The total number of commands processed by the server.
redis.connections.received   Total Connections. The total number of connections accepted by the server.
redis.connections.rejected   Total Connections. The number of connections rejected because of the maxclients limit.
redis.cpu.time seconds (s) Total CPU Time by State. The system CPU consumed by the Redis server in seconds since the server started.
redis.db.avg_ttl milliseconds (ms) The average keyspace keys TTL.
redis.db.expires   The number of keyspace keys with an expiration.
redis.db.keys   The number of keyspace keys.
redis.keys.evicted   Total Expired and Evicted Keys. The number of keys evicted due to the maxmemory limit.
redis.keys.expired   Total Expired and Evicted Keys. The total number of key expiration events.
redis.keyspace.hits   The number of successful lookups of keys in the main dictionary.
redis.keyspace.misses   The number of failed lookups of keys in the main dictionary.
redis.latest_fork microseconds (μs) The duration of the latest fork operation in microseconds.
redis.memory.fragmentation_ratio   Fragmentation Ratio. The ratio between used_memory_rss and used_memory.
redis.memory.lua bytes (By) Used Memory. The number of bytes used by the Lua engine.
redis.memory.peak bytes (By) Peak memory consumed by Redis (in bytes).
redis.memory.rss bytes (By) Used Memory. The number of bytes that Redis allocated as seen by the operating system.
redis.memory.used bytes (By) Used Memory. The total number of bytes allocated by Redis using its allocator.
redis.net.input bytes (By) The total number of bytes read from the network.
redis.net.output bytes (By) Total Network Traffic. The total number of bytes written to the network.
redis.rdb.changes_since_last_save   Changes Since Last Save. The number of changes since the last dump.
redis.replication.backlog_first_byte_offset   The master offset of the replication backlog buffer.
redis.replication.offset   The server's current replication offset.
redis.role   Role. The Redis node's role.
redis.slaves.connected   Clients. The number of connected replicas.
redis.uptime seconds (s) Uptime. The number of seconds since Redis server started.
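
The redis.memory.fragmentation_ratio description defines the ratio as used_memory_rss divided by used_memory. The Python sketch below reproduces that division from two byte counts; pairing it with the redis.memory.rss and redis.memory.used values is an assumption based on their descriptions, and the function name is hypothetical.

```python
def fragmentation_ratio(memory_rss_bytes: float, memory_used_bytes: float) -> float:
    """Return the ratio of memory seen by the OS to memory allocated by Redis."""
    if memory_used_bytes <= 0:
        raise ValueError("memory_used_bytes must be positive")
    return memory_rss_bytes / memory_used_bytes
```

For example, 150 bytes of resident memory against 100 bytes allocated gives a ratio of 1.5.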

ZooKeeper metrics

Metric Units Description
zookeeper.connection.active Connections

The number of active clients connected to a ZooKeeper server.

zookeeper.data_tree.ephemeral_node.count Nodes The number of ephemeral nodes that a ZooKeeper server has in its data tree.
zookeeper.data_tree.size Byte The size of data in bytes that a ZooKeeper server has in its data tree.
zookeeper.file_descriptor.available File_descriptors The number of file descriptors that a ZooKeeper server still has available.
zookeeper.file_descriptor.limit File_descriptors The maximum number of file descriptors that a ZooKeeper server can open.
zookeeper.file_descriptor.open File_descriptors The number of file descriptors that a ZooKeeper server has open.
zookeeper.latency.max milliseconds (ms) The maximum time in milliseconds for requests to be processed.
zookeeper.latency.min milliseconds (ms) The minimum time in milliseconds for requests to be processed.
zookeeper.packet.count Packets The number of ZooKeeper packets received or sent by a server.
zookeeper.packet.count.rate Packets per second The number of ZooKeeper packets received and sent by a server per second.
zookeeper.request.active Requests The number of currently executing requests.
zookeeper.watch.count Watches The number of watches placed on Z-Nodes on a ZooKeeper server.
zookeeper.znode.count Znodes The number of Z-Nodes that a ZooKeeper server has in its data tree.

Telegraf metrics

When you integrate with FluentD, HAProxy, NGINX Plus API, NTPq, StatsD, or Varnish, the SolarWinds Observability Agent is used to send metrics to SolarWinds Observability SaaS. See Monitor with Telegraf.

FluentD metrics

For a comprehensive list of metrics, see Fluentd Input Plugin at GitHub.

Metric Units Description
fluentd_buffer_available_buffer_space_ratios Percent (%) Available Buffer Space. The percentage of remaining available buffer space.
fluentd_buffer_queue_byte_size Bytes (B) Buffer Queue Bytes. The current size of queued buffer chunks (in bytes).
fluentd_buffer_queue_length   Buffer Queue Length. The length of the buffer queue.
fluentd_buffer_stage_byte_size Bytes (B) Buffer Stage Bytes. The current size of staged buffer chunks (in bytes).
fluentd_buffer_stage_length   Buffer Stage Length. The length of staged buffer chunks.
fluentd_buffer_total_queued_size Bytes (B) Buffer Queue Size. The size of the buffer queue.
fluentd_emit_count {emits} Total Record Emit Count. The total number of emit calls.
fluentd_emit_records {records} Total Emit Records. The total number of emitted records.
fluentd_emit_size Bytes (B) Total Emit Size. The total size of emit events.
fluentd_retry_count {retries} Retry Count. The number of retry attempts.
fluentd_rollback_count {count} Total Rollback Count. The total number of rollbacks. Rollbacks happen when write/try_write fails.
fluentd_slow_flush_count {count} Total Slow Flush Count. The total number of slow flushes. This count will be incremented when buffer flush is longer than slow_flush_log_threshold.
fluentd_write_count {count} The total number of writes.

HAProxy metrics

For a comprehensive list of metrics, see HAProxy Input Plugin at GitHub and HAProxy documentation at docs.haproxy.org.

SolarWinds Observability SaaS expects that metrics return a number. Some HAProxy metrics, such as status, return strings, and thus are not supported.

Metric Units Description
haproxy_active_servers {servers} Active Servers. The number of currently active servers.
haproxy_backup_servers {servers} Backup Servers. The number of available backup servers.
haproxy_bin bytes Total In and Out Traffic. The cumulative total of incoming traffic.
haproxy_bout bytes Total In and Out Traffic. The cumulative total of outgoing traffic.
haproxy_dreq {requests} Total Denied Requests. The cumulative number of requests denied because of security concerns.
haproxy_dcon {requests} Total Denied Requests. The cumulative number of requests denied by the 'tcp-request connection' rules.
haproxy_dses {requests} Total Denied Requests. The cumulative number of requests denied by the 'tcp-request session' rules.
haproxy_dresp {responses} Total Denied Responses. The cumulative number of responses denied because of security concerns. For HTTP, the responses are denied because of a matched http-request rule, or 'option checkcache'.
haproxy_eresp {responses} Total Denied Responses. The cumulative number of response errors, such as srv_abrt, write errors on the client socket, or failure applying filters to the response.
haproxy_ereq {errors} Total Request Errors. The cumulative number of request errors, such as early termination from the client, read error, client timeout, or client closed connection.
haproxy_econ {errors} Total Request Errors. The cumulative number of request errors encountered when trying to connect to a backend server. The backend stat is the sum of the stat for all servers of that backend, plus any connection errors not associated with a particular server (such as the backend having no active servers).
haproxy_scur {sessions} Current Sessions. The number of current sessions per proxy.
haproxy_slim {sessions} Session Limit. The currently configured session limit.
haproxy_stot {sessions} Total Sessions. The cumulative number of sessions.
haproxy_req_rate requests per second Request Rate. HTTP requests per second over the last elapsed second.
haproxy_rtime Milliseconds (ms) Response Time. The average response time over the last 1024 requests (0 for TCP).
haproxy_req_tot {requests} Total Requests. The total number of received HTTP requests.
haproxy_ctime Milliseconds (ms) Connection Time. The average connect time over the last 1024 responses.
haproxy_qtime Milliseconds (ms) Queue Time. The average queue time over the last 1024 responses.
haproxy_ttime Milliseconds (ms) Session Time. The average session time over the last 1024 responses.
haproxy_http_response.2xx {responses} Total Responses 2xx. The total number of HTTP responses with the 2xx code.
haproxy_http_response.3xx {responses} Total Responses 3xx. The total number of HTTP responses with the 3xx code.
haproxy_http_response.4xx {responses} Total Responses 4xx. The total number of HTTP responses with the 4xx code.
haproxy_http_response.5xx {responses} Total Responses 5xx. The total number of HTTP responses with the 5xx code.
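
As a rough illustration of how the metrics in this table can be combined, the following Python sketch derives session utilization from haproxy_scur and haproxy_slim, and the share of 5xx responses from the haproxy_http_response.* counters. The function names and sample values are hypothetical and are not part of SolarWinds Observability SaaS or HAProxy; the formulas are one reasonable interpretation, not an official definition.

```python
from typing import Optional


def session_utilization(scur: float, slim: float) -> Optional[float]:
    """Return current sessions (haproxy_scur) as a percentage of the limit (haproxy_slim)."""
    if slim <= 0:
        # No usable limit reported, so utilization is undefined.
        return None
    return 100.0 * scur / slim


def server_error_share(r2xx: int, r3xx: int, r4xx: int, r5xx: int) -> Optional[float]:
    """Return 5xx responses as a percentage of the 2xx-5xx responses.

    Assumption: the inputs are increases of the cumulative
    haproxy_http_response.* counters over the same time window.
    Other response classes are ignored in this sketch.
    """
    total = r2xx + r3xx + r4xx + r5xx
    if total == 0:
        return None
    return 100.0 * r5xx / total


# Hypothetical sample values for one proxy.
print(session_utilization(scur=150, slim=2000))                      # 7.5
print(server_error_share(r2xx=900, r3xx=50, r4xx=40, r5xx=10))       # 1.0
```

Both helpers return None rather than raising when the denominator is zero, so a proxy with no configured limit or no traffic in the window does not produce a misleading value.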

NGINX Plus API metrics

For a more comprehensive list of metrics, see Nginx Virtual Host Traffic (VTS) Input Plugin and Nginx Plus API Input Plugin at GitHub.

Metric Units Description
nginx_vts_connections {connections} The number of connections of individual types: active, reading, writing, waiting, accepted, handled, requests.
nginx_vts_server, nginx_vts_filter
nginx_vts_upstream
nginx_vts_cache

NTPq metrics

For a comprehensive list of metrics, see NTPQ Input Plugin at GitHub.

Metric Units Description
ntpq_delay Milliseconds (ms) Round Trip Delay. Round trip communication delay to the remote peer or server.
ntpq_jitter Milliseconds (ms) Jitter. Mean deviation (jitter) in the time reported for the remote peer or server (RMS or difference of multiple time samples).
ntpq_offset Milliseconds (ms) Time Offsets. Mean offset (phase) in the times reported between this local host and the remote peer or server (RMS).
ntpq_poll Minutes (min) Polling Frequency. RFC 5905 suggests that this ranges in NTPv4 from 4 (16 s) to 17 (36 h) (log2 seconds); however, observation suggests the actual displayed value is in seconds, for a much smaller range of 64 (2^6) to 1024 (2^10) seconds.
ntpq_reach Octal numbers Reach. An 8-bit left-shift register value recording polls (bit set = successful, bit reset = fail), displayed in octal by default. The type can be changed to decimal/count/ratio by configuring it in the ntpq input section inside telegraf.conf.
ntpq_when Minutes (min) Last Poll. The time since the last poll.
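
The octal ntpq_reach value can be hard to read at a glance. The following Python sketch shows one way to decode it into the eight poll results it records. It assumes the value is reported as the octal digits that ntpq displays (for example, 377), and that the most recent poll is the rightmost bit, as is usual for a left-shift register. The function name decode_reach is hypothetical.

```python
from typing import Dict, Union


def decode_reach(octal_reach: Union[int, str]) -> Dict[str, object]:
    """Decode an ntpq reach value written in octal digits, e.g. 377 or "376"."""
    # Interpret the digits as base 8 and keep only the 8 bits of the register.
    register = int(str(octal_reach), 8) & 0xFF
    bits = format(register, "08b")  # leftmost bit = oldest poll, rightmost = newest
    successes = bits.count("1")
    return {
        "bits": bits,
        "successful_polls": successes,
        "ratio": successes / 8,
        "last_poll_ok": bits[-1] == "1",
    }


print(decode_reach(377))    # all eight recent polls succeeded
print(decode_reach("376"))  # the most recent poll failed
```

A value of 377 corresponds to the bit pattern 11111111 (eight successful polls), while 376 corresponds to 11111110, meaning the latest poll was missed.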

StatsD metrics

The StatsD integration does not include any default metrics. It supports all native StatsD metric types for custom metric submission. See StatsD Input Plugin at GitHub.
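
As a minimal sketch of custom metric submission, the following Python example formats lines in the plain-text StatsD protocol (name:value|type) and defines a helper that sends a line over UDP. It assumes a StatsD listener on 127.0.0.1 port 8125, which is the commonly used StatsD port; check the statsd input settings in your own configuration. The metric names and helper functions are hypothetical.

```python
import socket
from typing import Optional, Union


def format_statsd(name: str, value: Union[int, float, str], metric_type: str,
                  sample_rate: Optional[float] = None) -> str:
    """Build one StatsD line: c = counter, g = gauge, ms = timing, s = set."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one StatsD line as a UDP datagram (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Hypothetical custom metrics, one per native StatsD type.
EXAMPLE_LINES = [
    format_statsd("checkout.orders", 1, "c"),
    format_statsd("checkout.queue_depth", 42, "g"),
    format_statsd("checkout.latency", 320, "ms"),
    format_statsd("checkout.unique_users", "user-17", "s"),
    format_statsd("checkout.orders", 1, "c", sample_rate=0.1),
]

for example in EXAMPLE_LINES:
    print(example)
```

To submit a metric, pass a formatted line to send_statsd, for example send_statsd(EXAMPLE_LINES[0]). UDP is fire-and-forget, so the call does not confirm that the listener received the datagram.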

Varnish metrics

For a comprehensive list of metrics, see Varnish Input Plugin at GitHub.

Metric Units Description
varnish_client_req {requests} Total Client Requests. The number of good client requests.
varnish_s_req_bodybytes bytes Total Bytes. The total request body bytes received.
varnish_s_req_hdrbytes bytes Total Bytes. The total request header bytes received.
varnish_s_resp_bodybytes bytes Total Bytes. The total response body bytes transmitted.
varnish_s_resp_hdrbytes bytes Total Bytes. The total response header bytes transmitted.
varnish_sess_dropped {sessions} Total Failed and Dropped Sessions. Sessions dropped for thread: the number of times an HTTP/1 session was dropped because the queue was already too long. See thread_queue_limit.
varnish_sess_fail {sessions} Total Failed and Dropped Sessions. Session accept failures: the number of failures to accept a TCP connection. This counter is the sum of the sess_fail_* counters, which give more detailed information.
varnish_sess_closed {operations} Total Session Operations. The number of closed sessions.
varnish_sess_herd {operations} Total Session Operations. The number of times the timeout_linger triggered.
varnish_sess_readahead {operations} Total Session Operations. The number of read ahead sessions.
varnish_sess_closed_err {operations} Total Session Operations. The number of sessions closed with errors.
varnish_s_sess {sessions} Total Sessions. The total number of sessions that occurred.
varnish_n_expired {objects} Total Number of Objects. The number of objects expired because of old age.
varnish_n_lru_moved {objects} Total Number of Objects. The number of moved LRU objects (move operations done on the LRU list).
varnish_n_lru_nuked {objects} Total Number of Objects. The number of objects that have been forcefully evicted from the storage to make room for a new object (LRU nuked objects).
varnish_cache_miss {count} Total Cache Hits and Misses. The number of cache misses. A cache miss indicates that the object was fetched from the backend before delivering it to the client.
varnish_cache_hit {count} Total Cache Hits and Misses. The number of cache hits. A cache hit indicates that the object was delivered to a client without fetching it from a backend server.
varnish_backend_busy {connections} Total Backend Connections. The number of times Varnish encountered a situation where it considered the backend to be too busy to handle additional connections.
varnish_backend_conn {connections} Total Backend Connections. The number of successful backend connections.
varnish_backend_fail {connections} Total Backend Connections. The number of failed backend connections.
varnish_backend_recycle {connections} Total Backend Connections. The number of recycled backend connections.
varnish_backend_retry {connections} Total Backend Connections. The number of retried backend connections.
varnish_backend_reuse {connections} Total Backend Connections. The number of reused backend connections.
varnish_backend_unhealthy {connections} Total Backend Connections. The number of unhealthy backend connections.
varnish_fetch_length, varnish_fetch_bad, varnish_fetch_eof, varnish_fetch_failed, varnish_fetch_head, varnish_fetch_chunked, varnish_fetch_1xx, varnish_fetch_204, varnish_fetch_304, varnish_fetch_none, varnish_fetch_no_thread {fetches} Total HTTP Request Fetches. The number of all request fetches by type.
varnish_shm_cont {operations} Total Shared Memory Operations. The number of contention operations (when multiple threads compete for access to SHM resources).
varnish_shm_cycles {operations} Total Shared Memory Operations. The number of times data cycles through the shared memory.
varnish_shm_flushes {operations} Total Shared Memory Operations. The number of flush operations.
varnish_shm_records {operations} Total Shared Memory Operations. The number of record operations.
varnish_shm_writes {operations} Total Shared Memory Operations. The number of write operations.
varnish_thread_queue_len {count} Total Session Queue Length. The length of session queue waiting for threads.
varnish_threads {workers} Total Workers. The number of threads in all pools.
varnish_sess_queued {sessions} Total Queued Sessions. Sessions queued for thread. The number of times a session was queued waiting for a thread.
varnish_threads_created {threads} Total Worker Threads. The total number of threads created in all pools.
varnish_threads_destroyed {threads} Total Worker Threads. The total number of threads destroyed in all pools.
varnish_threads_failed {threads} Total Worker Threads. The number of times creating a thread failed.
varnish_threads_limited {threads} Total Worker Threads. The number of times more threads were needed but the limit was reached in a thread pool.
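
The cache hit and miss metrics are often more useful as a ratio than as raw totals. The following Python sketch shows one way to compute a cache hit ratio from two samples of varnish_cache_hit and varnish_cache_miss. It assumes both metrics are cumulative counters, so the increase between samples is used rather than the raw values. The function names and sample values are hypothetical.

```python
from typing import Optional


def counter_delta(previous: int, current: int) -> int:
    """Increase of a cumulative counter between two samples.

    Assumption: if the counter went backwards (for example after a
    restart), it restarted from zero, so the current value is the increase.
    """
    return current - previous if current >= previous else current


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Fraction of cache lookups that were hits, or None if there were none."""
    total = hits + misses
    return None if total == 0 else hits / total


# Hypothetical samples taken at the start and end of a time window.
earlier = {"varnish_cache_hit": 12_000, "varnish_cache_miss": 3_000}
later = {"varnish_cache_hit": 12_900, "varnish_cache_miss": 3_100}

hits = counter_delta(earlier["varnish_cache_hit"], later["varnish_cache_hit"])
misses = counter_delta(earlier["varnish_cache_miss"], later["varnish_cache_miss"])
print(cache_hit_ratio(hits, misses))  # 0.9
```

Working from deltas keeps the ratio representative of the selected time window instead of the whole lifetime of the Varnish process.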