Metrics for SolarWinds Observability SaaS entities
Many of the collected metrics from SolarWinds Observability entities are displayed as widgets in SolarWinds Observability explorers; additional metrics may be collected and available in the Metrics Explorer. You can also create an alert for when an entity's metric value moves out of a specific range. See Entities in SolarWinds Observability SaaS for information about entity types in SolarWinds Observability SaaS.
Common metrics
The following metric(s) are available for all entities in SolarWinds Observability SaaS.
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore | Percent (%) | Health score. A health score provides real-time insight into the overall health and performance of your monitored entities. The health score is calculated based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health score is displayed as a single numerical value that falls into one of three bands: Good (70-100), Moderate (40-69), or Bad (0-39). To view the health score separately for each specific entity type in the Metrics Explorer, group the |
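The Good, Moderate, and Bad bands described for sw.metrics.healthscore can be expressed as a small lookup. This is a minimal sketch, not SolarWinds code: the function name `health_band` is hypothetical, and placing fractional scores (for example 69.5) in the lower band is an assumption, since the documented bands are given as whole numbers.

```python
def health_band(score: float) -> str:
    """Map a health score (0-100) to the band named in the table above.

    Assumption: fractional scores fall into the lower band, because the
    documented ranges (70-100, 40-69, 0-39) only list whole numbers.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"health score must be within 0-100, got {score}")
    if score >= 70:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Bad"


for sample in (95, 69, 12):
    print(sample, health_band(sample))
```

A helper like this may be useful when choosing alert thresholds that line up with the bands shown in the explorers.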
APM/service metrics
Metrics for service entities are sent by APM libraries installed and configured to monitor your service. See Application performance monitoring (APM) for more information.
Standard metrics
Metric | Units | Description |
---|---|---|
trace.service.breakdown.response_time | Microseconds (μs) | Trace Response Time Breakdown, Average Trace Response Time Breakdown. The amount of time it takes to complete a service transaction, broken down by operation type (for example, application, database calls, or remote calls). The average trace response time breakdown is calculated based on collected traces, so it covers only sampled traces. It can be used to analyze the impact that different types of operations performed by the service have on the average response time. |
trace.service.count | Count | Number of services that were reporting data during the selected time period. |
trace.service.errors | Count | Count of requests that ended with an error status. Aggregate by Sum to see the total error count for the service. |
trace.service.error_ratio | % | Ratio of errors to requests, calculated by dividing the number of requests with errors by the total number of requests. |
trace.service.exceptions.count | Count | Total number of error events for traced requests. An event is classified as an error if: |
trace.service.faas.count | Count | Number of AWS Lambda functions for which APM Services were reporting data during the selected time period. |
trace.service.faas.instance.count | Count | Number of AWS Lambda instances for which APM Services were reporting data during the selected time period. |
trace.service.hosts.count | Count | Number of APM Hosts for which APM Services were reporting data during the selected time period. A unique APM Host is captured only for Azure VMs, AWS EC2 Instances, and hosts monitored with UAMS. |
trace.service.instance.count | Count | Number of service instances that were reporting data during the selected time period. |
trace.service.pod.count | Count | Number of Kubernetes Pods for which APM Services were reporting data during the selected time period. |
trace.service.requests | Count | Count of requests for each HTTP status code (200, 404, etc.). Aggregate by Sum to see the total request count for the service. |
trace.service.request_rate | Count | Rate of requests per second, calculated by dividing the number of requests (trace.service.requests) by the length of the aggregation period in seconds. |
trace.service.response_time | ms | Duration of each entry span for the service, typically meaning the time taken to process an inbound request. |
trace.service.samplecount | Count | Count of requests that went through a sampling decision, which excludes requests with a valid upstream decision and trigger trace requests. |
trace.service.service_response_time | ms | Duration of each entry span for the service, typically meaning the time taken to process an inbound request. |
trace.service.service_response_time.p50, trace.service.service_response_time.p95, trace.service.service_response_time.p99, trace.service.service_response_time.p999 | ms | Percentile values for the trace.service.service_response_time metric. |
trace.service.tracecount | Count | Count of traces generated from requests. |
trace.service.transaction.count | Count | Number of transactions that were reporting data during the selected time period. |
trace.service.transaction_response_time | ms | Duration of each entry span for the service, typically meaning the time taken to process an inbound request. |
trace.service.service_response_time.p50, trace.service.service_response_time.p95, trace.service.service_response_time.p99, trace.service.service_response_time.p999 | ms | Percentile values for the trace.service.service_response_time metric. |
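The two derived metrics in the table above, trace.service.error_ratio and trace.service.request_rate, are plain divisions. The sketch below reproduces that arithmetic under stated assumptions: the function names are hypothetical, the ratio is multiplied by 100 because the table lists its unit as %, and returning 0.0 when there are no requests is an illustrative choice rather than documented behavior.

```python
def error_ratio_percent(error_count: int, request_count: int) -> float:
    """Requests with errors divided by total requests, as a percentage.

    Assumption: a period with zero requests yields 0.0 instead of an error.
    """
    if request_count == 0:
        return 0.0
    return 100.0 * error_count / request_count


def request_rate(request_count: int, period_seconds: float) -> float:
    """Requests per second over an aggregation period of the given length."""
    if period_seconds <= 0:
        raise ValueError("aggregation period must be positive")
    return request_count / period_seconds


# 5 failed requests out of 200 during a 60-second aggregation period.
print(error_ratio_percent(5, 200))  # percent of requests that failed
print(request_rate(200, 60))        # requests per second
```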
Sampled trace-derived database metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.database.query.response_time | ms | Duration of traced queries executed by the service to the database. |
Sampled trace-derived cache metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.cache.op.hits | Count | The count of successful retrievals from the cache. |
trace.service.outbound_calls.cache.op.requests | Count | Number of cache keys returned by the cache call. If the number of keys is not returned, every cache call is counted once. |
trace.service.outbound_calls.cache.op.response_time | ms | Duration of traced cache calls executed by the service to the cache engine. |
Sampled trace-derived remote service metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.remote_service.call.response_time | ms | Duration of spans representing remote calls executed by the service to a remote endpoint or remote instrumented service. |
Sampled trace-derived exception metrics
Metric | Units | Description |
---|---|---|
trace.service.exceptions.count | Count | Service exceptions count captured in traces. Total number of error events for traced requests. An event is classified as an error if: |
Other sampled trace-derived metrics
Metric | Units | Description |
---|---|---|
trace.service.breakdown.response_time | Microseconds (μs) | Trace Response Time Breakdown, Average Trace Response Time Breakdown. The amount of time it takes to complete a service transaction, broken down by operation type (for example, application, database calls, or remote calls). The average trace response time breakdown is calculated based on collected traces, so it covers only sampled traces. It can be used to analyze the impact that different types of operations performed by the service have on the average response time. |
Runtime metrics
See the links below for the metrics of each language runtime and for library-specific configuration:
Database metrics
Metrics for database instance entities are sent by the SolarWinds Observability Agent monitoring your databases. See Database monitoring for more information.
Metric | Units | Description |
---|---|---|
dbo.host.queries.errors.tput | EPS | Errors, Error Rate. The number of recorded errors for your database instances per second; the total number of errors returned per second across your monitored databases. Incorrect database responses may indicate requests are failing, even while throughput and response time appear healthy. |
dbo.host.queries.latency_us | milliseconds (ms) | Response Time. The amount of query latency in milliseconds per query execution across your monitored databases. May be displayed as: |
dbo.host.queries.p99_latency_us | milliseconds (ms) | Response Time 99th percentile. The 99th percentile response time for each of the top selected queries. |
dbo.host.queries.time_us | Count | Load. The load on your monitored databases, as a number of requests executing simultaneously. Concurrency reveals load (or demand) in a way that is orthogonal to variations in request speed or frequency. |
dbo.host.queries.tput | QPS | Throughput. The number of queries or statements completed per second. This is a metric of traffic intensity and frequency, showing how many requests your servers are processing. |
Digital Experience/website metrics
Metrics for website entities are either collected by probes that synthetically test your website's availability, or sent by the RUM script added to your website. See Digital experience monitoring.
Synthetic availability metrics
Synthetic transaction metrics
Metric | Units | Description |
---|---|---|
synthetics.transaction.attempts | Count | The number of attempted executions of your synthetic transaction for the selected time period. |
synthetics.transaction.duration | Seconds (s) | Historical Overview. The amount of time in seconds that it took your synthetic transaction to complete its execution. May be displayed as: |
synthetics.transaction.error_rate | Percentage (%) | Test Success Rate. Value representing the percentage of failed transaction attempts for the selected time period. |
synthetics.transaction.errors | Count | Test Success Rate. Value representing the sum of failed transaction attempts for the selected time period. Used to calculate the Synthetic transaction error rate. |
synthetics.transaction.success_rate | Percentage (%) | Test Success Rate. Value representing the percentage of successful transaction attempts for the selected time period. |
synthetics.transaction.successes | Count | Test Success Rate. Value representing the sum of successful transaction attempts for the selected time period. Used to calculate the Synthetic transaction success rate. |
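The success and error rates in the table above are described as percentages of transaction attempts. The sketch below shows that relationship with made-up numbers. The function name `rate_percent` is hypothetical, and dividing by synthetics.transaction.attempts is an assumption drawn from the descriptions, not a confirmed formula.

```python
def rate_percent(count: int, attempts: int) -> float:
    """Return count as a percentage of attempts for the selected period."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= count <= attempts:
        raise ValueError("count must be between 0 and attempts")
    return 100.0 * count / attempts


# Illustrative values only: 50 attempts, 45 succeeded, 5 failed.
attempts, successes, errors = 50, 45, 5
success_rate = rate_percent(successes, attempts)
error_rate = rate_percent(errors, attempts)
print(f"success_rate={success_rate}%, error_rate={error_rate}%")
```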
RUM metrics
Metric | Units | Description |
---|---|---|
rum.pageview.apdex_score | | Apdex score. A measurement of user satisfaction, using the Application Performance Index standard to specify the degree to which measured performance meets user expectations. The satisfied, tolerating, and frustrated load times are defined when creating the website entity. For more information about the Apdex standard, see Defining the Application Performance Index. A request counts as a satisfied load time if its response time is less than the satisfied load time threshold set for your website, as a tolerating load time if the response time is up to four times that threshold, and as a frustrated load time if it is longer than four times that threshold. |
rum.pageview.client_processing | seconds (s) | Client Processing Time. Measurement of the time from when the browser sends the initial HTTP request until all synchronous load events have been processed, including layout and running scripts. |
rum.pageview.count | Count | PageViews. Count of the views of your webpage(s). |
rum.pageview.load_time | seconds (s) | Load Time. The amount of time for the website to fully load. |
rum.pageview.ttfb | seconds (s) | Time to First Byte. The amount of time between when the browser requested a page and when it received the first byte of information from the server. |
rum.web_vitals.largest_contentful_paint | seconds (s) | Largest Contentful Paint. A measurement of how quickly the largest image or text content of a web page is loaded. Largest contentful paint time is considered good if loading the largest image or text block takes less than 2.5 seconds, needs improvement if it takes up to 4.0 seconds, and poor if it takes longer than 4.0 seconds. |
rum.web_vitals.cumulative_layout_shift | | Cumulative Layout Shift. Measures how much a webpage shifts unexpectedly while a user is viewing the webpage. A shift may occur if content loads at different speeds or if elements are added to the website dynamically. A cumulative layout shift value of less than 0.1 is considered good, a value up to 0.25 needs improvement, and a value greater than 0.25 is poor. |
rum.web_vitals.first_input_delay | seconds (s) | First Input Delay. Time from when a user first interacts with your site to the time when the browser is able to respond to the interaction. First input delay (FID) helps measure the first impression a user has of your site's responsiveness. The FID is considered good if responding to a customer’s first interaction with the site takes less than 100 ms, needs improvement if it takes up to 300 ms, and poor if it takes longer than 300 ms. |
rum.session.count | | Sessions, Top 10 countries by session. The total number of sessions, or visits, to the website during the selected time period and by country. A single session includes every action that the user takes during the entirety of their visit to the website. |
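The Apdex buckets and the web vitals ratings in the table above are threshold comparisons. The sketch below encodes them. All names are hypothetical. The score formula, (satisfied + tolerating / 2) / total, comes from the public Apdex standard and is assumed here rather than confirmed as the exact calculation behind rum.pageview.apdex_score. The first input delay thresholds are converted to seconds to match the listed unit.

```python
from typing import Iterable


def apdex_bucket(load_time_s: float, satisfied_threshold_s: float) -> str:
    """Classify one page load using the thresholds described above."""
    if load_time_s < satisfied_threshold_s:
        return "satisfied"
    if load_time_s <= 4 * satisfied_threshold_s:
        return "tolerating"
    return "frustrated"


def apdex_score(load_times_s: Iterable[float], satisfied_threshold_s: float) -> float:
    """Apdex standard formula: (satisfied + tolerating / 2) / total samples."""
    buckets = [apdex_bucket(t, satisfied_threshold_s) for t in load_times_s]
    if not buckets:
        raise ValueError("at least one sample is required")
    satisfied = buckets.count("satisfied")
    tolerating = buckets.count("tolerating")
    return (satisfied + tolerating / 2) / len(buckets)


# (good below, poor above) thresholds taken from the table above.
WEB_VITAL_THRESHOLDS = {
    "largest_contentful_paint": (2.5, 4.0),  # seconds
    "cumulative_layout_shift": (0.1, 0.25),  # unitless
    "first_input_delay": (0.1, 0.3),         # seconds (100 ms, 300 ms)
}


def web_vital_rating(name: str, value: float) -> str:
    """Rate a web vital value as good, needs improvement, or poor."""
    good_below, poor_above = WEB_VITAL_THRESHOLDS[name]
    if value < good_below:
        return "good"
    if value <= poor_above:
        return "needs improvement"
    return "poor"


# Four page loads against a 2-second satisfied threshold.
print(apdex_score([1.0, 1.5, 3.0, 9.0], satisfied_threshold_s=2.0))
print(web_vital_rating("largest_contentful_paint", 3.0))
```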
Infrastructure/self-managed host metrics
Metrics for self-managed host entities are sent by the SolarWinds Observability Agent monitoring your host. See Host monitoring for more information.
SolarWinds Observability Agent metrics
Metrics for agent entities are reported by the SolarWinds Observability Agent. See SolarWinds Observability Agents for more information.
Metric | Units | Description |
---|---|---|
swo.uams.agent.status | Possible values: ok, updating, update_failed, restarting, disconnected, stopping, jwt_expired | The reported operating status of the Agent. |
swo.uams.agent.heartbeat | | Reported by the SolarWinds Observability Agent every minute. A missing heartbeat may indicate problems with the network or the Agent. |
swo.uams.agent.cpu | Percent (%) | The average amount of CPU capacity in use, as a percentage. |
swo.uams.agent.memory | Percent (%) | The average amount of memory in use, as a percentage. |
swo.uams.agent.diskUsage | Percent (%) | The amount of storage being used by files and data. |
swo.uams.agent.networkIn | | The average amount of data received over the network, in bits. This metric is not collected for Windows due to operating system limitations. |
swo.uams.agent.networkOut | | The average amount of data sent over the network, in bits. This metric is not collected for Windows due to operating system limitations. |
swo.uams.agent.errors.count | | The number of errors in the Agent logs, counted since the most recent Agent restart. |
swo.uams.agent.uptime | | The amount of time since the most recent SWO Agent restart. |
swo.uams.plugin.cpu | | The average amount of CPU used by the plugin, as a percentage. |
swo.uams.plugin.memory | | The average amount of memory used by the plugin, as a percentage. |
swo.uams.plugin.uptime | | The amount of time since the most recent plugin or SWO Agent restart. |
swo.uams.plugin.status | | The reported operating status of the plugin. See Possible values for plugin status. |
swo.uams.plugin.healthy | 0, 1 | Calculated from the reported operating status of the plugin; a value of 0 indicates a problem with the plugin. |
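Because swo.uams.agent.heartbeat is reported every minute, a gap in the series is the signal to watch for. The sketch below shows one way to flag a stale heartbeat from the timestamp of the last received data point. The function name and the tolerance of three missed intervals are hypothetical choices for illustration, not product defaults.

```python
from datetime import datetime, timedelta, timezone

# The Agent reports a heartbeat once per minute (see the table above).
HEARTBEAT_INTERVAL = timedelta(minutes=1)


def heartbeat_is_stale(last_heartbeat: datetime, now: datetime,
                       missed_intervals: int = 3) -> bool:
    """Return True when more than `missed_intervals` heartbeats are overdue.

    Assumption: tolerating a few missed intervals avoids flagging short
    network blips; the default of 3 is illustrative only.
    """
    return now - last_heartbeat > HEARTBEAT_INTERVAL * missed_intervals


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(heartbeat_is_stale(now - timedelta(minutes=2), now))
print(heartbeat_is_stale(now - timedelta(minutes=5), now))
```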
Possible values for plugin status
Plugin status | Healthy metric value | Description |
---|---|---|
STATUS_CODE_OK | 1 | The plugin is responding via health checks. |
STATUS_CODE_STOPPED | 0 | The plugin process was stopped by the user, not because of an error. |
STATUS_CODE_BROKEN | 0 | The plugin was not deployed correctly. |
STATUS_CODE_START_FAILED | 0 | The plugin process cannot be started, and the Agent keeps retrying in a loop. |
STATUS_CODE_NOT_RESPONDING | 0 | The health check from the plugin process was not received for a defined amount of time, but the plugin process is running. |
STATUS_CODE_HEALTHCHECK_FAILED | 0 | Failed to send a health check request to the plugin process. |
STATUS_CODE_CONFIGURATION_ISSUE | 0 | Reported by the plugin; indicates an invalid or missing configuration. |
STATUS_CODE_FAILED | 0 | The plugin process was stopped unexpectedly. |
STATUS_CODE_STARTING | 0 | Start was called for the plugin process. |
STATUS_CODE_RESTARTING | 1 | Restart was called. |
STATUS_CODE_STOPPING | 0 | Stop was called for the plugin process. |
STATUS_CODE_UPDATING | 0 | Update was called for the plugin. |
STATUS_CODE_CRITICAL | 0 | Reported by the plugin. |
STATUS_CODE_WARNING | 0 | Reported by the plugin. |
STATUS_CODE_JWT_EXPIRED | 0 | It is not possible to refresh the JWT. |
STATUS_CODE_UPDATE_FAILED | 0 | Problems with the plugin update. |
STATUS_CODE_INVALID | 0 | Unknown reason. |
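The relationship between swo.uams.plugin.status and swo.uams.plugin.healthy in the table above is a fixed lookup. The sketch below transcribes it so that a status string can be turned into the 0 or 1 value. The constant and function names are hypothetical. Only the status codes and their healthy values come from the table.

```python
# Statuses whose healthy metric value is 1 in the table above.
HEALTHY_STATUSES = frozenset({"STATUS_CODE_OK", "STATUS_CODE_RESTARTING"})

# Every status listed in the table above.
KNOWN_STATUSES = frozenset({
    "STATUS_CODE_OK", "STATUS_CODE_STOPPED", "STATUS_CODE_BROKEN",
    "STATUS_CODE_START_FAILED", "STATUS_CODE_NOT_RESPONDING",
    "STATUS_CODE_HEALTHCHECK_FAILED", "STATUS_CODE_CONFIGURATION_ISSUE",
    "STATUS_CODE_FAILED", "STATUS_CODE_STARTING", "STATUS_CODE_RESTARTING",
    "STATUS_CODE_STOPPING", "STATUS_CODE_UPDATING", "STATUS_CODE_CRITICAL",
    "STATUS_CODE_WARNING", "STATUS_CODE_JWT_EXPIRED",
    "STATUS_CODE_UPDATE_FAILED", "STATUS_CODE_INVALID",
})


def plugin_healthy(status: str) -> int:
    """Return the healthy metric value (0 or 1) for a plugin status code."""
    if status not in KNOWN_STATUSES:
        raise ValueError(f"unknown plugin status: {status}")
    return 1 if status in HEALTHY_STATUSES else 0


print(plugin_healthy("STATUS_CODE_RESTARTING"))
print(plugin_healthy("STATUS_CODE_NOT_RESPONDING"))
```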
Infrastructure/AWS metrics
Metrics for AWS entities are collected by integrating SolarWinds Observability SaaS with your AWS cloud account. See AWS cloud platform monitoring.
API Gateway
Metric | Units | Description |
---|---|---|
AWS.ApiGateway.4XXError | Count | 4XXError. The total number of client-side errors for REST APIs captured in a given period. |
AWS.ApiGateway.4xx | Count | 4xx. The total number of client-side errors for HTTP APIs captured in a given period. |
AWS.ApiGateway.5XXError | Count | 5XXError. The total number of server-side errors for REST APIs captured in a given period. |
AWS.ApiGateway.5xx | Count | 5xx. The total number of server-side errors for HTTP APIs captured in a given period. |
AWS.ApiGateway.CacheHitCount | Count | CacheHitCount. The total number of requests served from the API cache in a given period. |
AWS.ApiGateway.CacheMissCount | Count | CacheMissCount. The total number of requests served from the backend in a given period, when API caching is enabled. |
AWS.ApiGateway.ClientError | Count | ClientError. The total number of requests that have a 4XX response returned by API Gateway before the integration is invoked. |
AWS.ApiGateway.ConnectCount | Count | ConnectCount. The total number of messages sent to the connect route integration. |
AWS.ApiGateway.Count | Count | Count. The total number of API requests in a given period. |
AWS.ApiGateway.DataProcessed | bytes | DataProcessed. The total amount of data processed in bytes. |
AWS.ApiGateway.ExecutionError | Count | ExecutionError. The total number of errors that occurred when calling the integration. |
AWS.ApiGateway.IntegrationError | Count | IntegrationError. The total number of requests that return a 4XX or 5XX response from the integration. |
AWS.ApiGateway.IntegrationLatency | milliseconds (ms) | IntegrationLatency. The average time between when API Gateway relays a request to the backend and when it receives a response from the backend. |
AWS.ApiGateway.Latency | milliseconds (ms) | Latency. The average time between when API Gateway receives a request from a client and when it returns a response to the client. |
AWS.ApiGateway.MessageCount | Count | MessageCount. The total number of messages sent to the WebSocket API, either from or to the client. |
Application ELB
Metric | Units | Description |
---|---|---|
AWS.ApplicationELB.ActiveConnectionCount | Count | ActiveConnectionCount. The total number of concurrent TCP connections active from clients to the load balancer and from the load balancer to targets. |
AWS.ApplicationELB.ConsumedLCUs | Count | ConsumedLCUs. The total number of load balancer capacity units (LCU) used by the load balancer. |
AWS.ApplicationELB.HTTPCode_ELB_4XX_Count | Count | HTTPCode_ELB_4XX_Count. The total number of HTTP 4XX client error codes that originate from the load balancer. |
AWS.ApplicationELB.HTTPCode_ELB_5XX_Count | Count | HTTPCode_ELB_5XX_Count. The total number of HTTP 5XX server error codes that originate from the load balancer. |
AWS.ApplicationELB.HTTPCode_Target_4XX_Count | Count | HTTPCode_Target_4XX_Count. The total number of HTTP responses with 4xx status codes generated by the targets. This does not include any response codes generated by the load balancer. |
AWS.ApplicationELB.HTTPCode_Target_5XX_Count | Count | HTTPCode_Target_5XX_Count. The total number of HTTP responses with 5xx status codes generated by the targets. This does not include any response codes generated by the load balancer. |
AWS.ApplicationELB.HealthyHostCount | Count | HealthyHostCount. The average number of targets that are considered healthy. |
AWS.ApplicationELB.NewConnectionCount | Count | NewConnectionCount. The total number of new TCP connections established from clients to the load balancer and from the load balancer to targets. |
AWS.ApplicationELB.ProcessedBytes | bytes | ProcessedBytes. The total number of bytes processed by the load balancer over IPv4 and IPv6 (HTTP header and HTTP payload). |
AWS.ApplicationELB.RejectedConnectionCount | Count | RejectedConnectionCount. The total number of connections that were rejected because the load balancer had reached its maximum number of connections. |
AWS.ApplicationELB.RequestCount | Count | RequestCount. The total number of requests processed over IPv4 and IPv6. This metric is only incremented for requests where the load balancer node was able to choose a target. |
AWS.ApplicationELB.RequestCountPerTarget | Count | RequestCountPerTarget. The total number of requests received by each target in a target group. |
AWS.ApplicationELB.TargetConnectionErrorCount | Count | TargetConnectionErrorCount. The total number of connections that were not successfully established between the load balancer and target. This metric does not apply if the target is a Lambda function. |
AWS.ApplicationELB.TargetResponseTime | s | TargetResponseTime. The average time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received. |
AWS.ApplicationELB.UnHealthyHostCount | Count | UnHealthyHostCount. The average number of targets that are considered unhealthy. |
Aurora Cluster
Metric | Units | Description |
---|---|---|
AWS.RDS.AuroraGlobalDBReplicationLag | | AuroraGlobalDBReplicationLag. The total amount of lag when replicating updates from the primary AWS region. |
AWS.RDS.AuroraVolumeBytesLeftTotal | | AuroraVolumeBytesLeftTotal. The total available space for the cluster volume. |
AWS.RDS.BacktrackChangeRecordsCreationRate | | BacktrackChangeRecordsCreationRate. The total number of backtrack change records created over five minutes for the DB cluster. |
AWS.RDS.BacktrackChangeRecordsStored | | BacktrackChangeRecordsStored. The total number of backtrack change records used by the DB cluster. |
AWS.RDS.ServerlessDatabaseCapacity | | ServerlessDatabaseCapacity. The total current capacity of an Aurora Serverless DB cluster. |
AWS.RDS.SnapshotStorageUsed | | SnapshotStorageUsed. The total amount of backup storage consumed by all Aurora snapshots for an Aurora DB cluster outside its backup retention window. |
AWS.RDS.VolumeBytesUsed | | VolumeBytesUsed. The total amount of storage used by the Aurora DB instance. |
AWS.RDS.VolumeReadIOPs | | VolumeReadIOPs. The total number of billed read I/O operations from a cluster volume within a five-minute interval. |
AWS.RDS.VolumeWriteIOPs | | VolumeWriteIOPs. The total number of write disk I/O operations to the cluster volume, reported at five-minute intervals. |
Aurora Instance
Metric | Units | Description |
---|---|---|
AWS.RDS.ActiveTransactions | | ActiveTransactions. The total number of current transactions executing on an Aurora database instance per second. |
AWS.RDS.AuroraReplicaLag | | AuroraReplicaLag. The total amount of lag when replicating updates from the primary instance. |
AWS.RDS.CPUCreditBalance | Count | CPUCreditBalance. The total number of CPU credits that an instance has accumulated, reported at five-minute intervals. You can use this metric to determine how long a DB instance can burst beyond its baseline performance level at a given rate. |
AWS.RDS.CPUCreditUsage | Count | CPUCreditUsage. The total number of CPU credits consumed during the specified period, reported at five-minute intervals. This metric measures the amount of time during which physical CPUs have been used for processing instructions by virtual CPUs allocated to the DB instance. |
AWS.RDS.CPUUtilization | Percent (%) | CPUUtilization. The total percentage of CPU used by an Aurora DB instance. |
AWS.RDS.ConnectionAttempts | | ConnectionAttempts. The total number of attempts to connect to an instance, whether successful or not. |
AWS.RDS.DDLLatency | | DDLLatency. The total duration of DDL requests, for example create, alter, and drop requests. |
AWS.RDS.DDLThroughput | | DDLThroughput. The total number of DDL requests per second. |
AWS.RDS.DMLLatency | | DMLLatency. The total duration of inserts, updates, and deletes. |
AWS.RDS.DMLThroughput | | DMLThroughput. The total number of inserts, updates, and deletes per second. |
AWS.RDS.DatabaseConnections | Count | DatabaseConnections. The total number of client network connections to the database instance. |
AWS.RDS.FreeableMemory | Binary Bytes | FreeableMemory. The total amount of available random access memory. |
AWS.RDS.LoginFailures | | LoginFailures. The total number of failed login attempts per second. |
AWS.RDS.MaximumUsedTransactionIDs | Count | MaximumUsedTransactionIDs. The total age of the oldest unvacuumed transaction ID, in transactions. If this value reaches 2,146,483,648 (2^31 - 1,000,000), the database is forced into read-only mode to avoid transaction ID wraparound. |
AWS.RDS.ReadIOPS | | ReadIOPS. The total number of disk I/O operations per second. |
AWS.RDS.ReadLatency | seconds (s) | ReadLatency. The total amount of time taken per disk I/O operation. |
AWS.RDS.ReadThroughput | | ReadThroughput. The total number of bytes read from disk per second. |
AWS.RDS.TransactionLogsDiskUsage | Megabytes | TransactionLogsDiskUsage. The average amount of disk space consumed by transaction logs on the Aurora PostgreSQL DB instance. |
AWS.RDS.WriteIOPS | | WriteIOPS. The total number of Aurora storage write records generated per second. |
AWS.RDS.WriteLatency | seconds (s) | WriteLatency. The total amount of time taken per disk I/O operation. |
AWS.RDS.WriteThroughput | | WriteThroughput. The total number of bytes written to persistent storage every second. |
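The read-only threshold quoted for AWS.RDS.MaximumUsedTransactionIDs is 2^31 minus 1,000,000. The sketch below checks that arithmetic and shows how far a sampled value is from the limit. The function names are hypothetical, and expressing usage as a percentage of the threshold is an illustrative choice, for example as the basis for an alert condition.

```python
# Threshold from the table above: 2^31 - 1,000,000 = 2,146,483,648.
READ_ONLY_THRESHOLD = 2**31 - 1_000_000


def xid_headroom(maximum_used_transaction_ids: int) -> int:
    """Transactions remaining before the read-only threshold is reached."""
    return max(READ_ONLY_THRESHOLD - maximum_used_transaction_ids, 0)


def xid_usage_percent(maximum_used_transaction_ids: int) -> float:
    """Sampled metric value as a percentage of the read-only threshold."""
    return 100.0 * maximum_used_transaction_ids / READ_ONLY_THRESHOLD


sample = 1_000_000_000  # illustrative metric value
print(READ_ONLY_THRESHOLD)
print(xid_headroom(sample))
print(round(xid_usage_percent(sample), 1))
```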
Auto Scaling Group
Metric | Units | Description |
---|---|---|
AWS.AutoScaling.GroupDesiredCapacity | | GroupDesiredCapacity. The average number of instances that the Auto Scaling group attempts to maintain. |
AWS.AutoScaling.GroupInServiceInstances | | GroupInServiceInstances. The average number of instances that are running as part of the Auto Scaling group. |
AWS.AutoScaling.GroupMaxSize | | GroupMaxSize. The average maximum size of the Auto Scaling group. |
AWS.AutoScaling.GroupMinSize | | GroupMinSize. The average minimum size of the Auto Scaling group. |
AWS.AutoScaling.GroupPendingInstances | | GroupPendingInstances. The average number of instances that are pending. |
AWS.AutoScaling.GroupStandbyInstances | | GroupStandbyInstances. The average number of instances that are in standby state. |
AWS.AutoScaling.GroupTerminatingInstances | | GroupTerminatingInstances. The average number of instances that are in the process of terminating. |
AWS.AutoScaling.GroupTotalInstances | | GroupTotalInstances. The average number of total instances. |
CloudFront
Metric | Units | Description |
---|---|---|
AWS.CloudFront.4xxErrorRate | Percent (%) | 4xx error rate. The average percentage of all viewer requests for which the response's HTTP status code is 4xx. |
AWS.CloudFront.5xxErrorRate | Percent (%) | 5xx error rate. The average percentage of all viewer requests for which the response's HTTP status code is 5xx. |
AWS.CloudFront.BytesDownloaded | | Bytes downloaded. The average number of bytes downloaded by viewers for GET, HEAD, and OPTIONS requests. |
AWS.CloudFront.BytesUploaded | | Bytes uploaded. The average number of bytes that viewers uploaded to your origin with CloudFront using POST and PUT requests. |
AWS.CloudFront.Requests | | Requests. The total number of viewer requests received by CloudFront for all HTTP methods and for both HTTP and HTTPS requests. |
AWS.CloudFront.TotalErrorRate | Percent (%) | Total error rate. The average percentage of all viewer requests for which the response's HTTP status code is 4xx or 5xx. |
EBS
Metric | Units | Description |
---|---|---|
AWS.EBS.AverageReadLatency | | AverageReadLatency. The average time required to complete a read request during the specified time period. |
AWS.EBS.AverageWriteLatency | | AverageWriteLatency. The average time required to complete a write request during the specified time period. |
AWS.EBS.BurstBalance | Percent (%) | Used with General Purpose SSD (gp2), Throughput Optimized HDD (st1) and Cold HDD (sc1) volumes only. Provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket. |
AWS.EBS.VolumeConsumedReadWriteOps | Count | VolumeConsumedReadWriteOps. The total amount of read and write operations (normalized to 256K capacity units) consumed during the specified time period. |
AWS.EBS.VolumeIdleTime | seconds (s) | The total number of seconds in a specified period of time when no read or write operations were submitted. |
AWS.EBS.VolumeQueueLength | Count | VolumeQueueLength. The number of read and write operation requests waiting to be completed during the specified time period. |
AWS.EBS.VolumeReadBytes | Binary Bytes | VolumeReadBytes. The total number of bytes transferred by read operations during the specified time period. |
AWS.EBS.VolumeReadOps | Count | VolumeReadOps. The total number of read operations during the specified time period. Read operations are counted on completion. |
AWS.EBS.VolumeThroughputPercentage | Percent (%) | VolumeThroughputPercentage. The percentage of I/O operations per second (IOPS) delivered of the total IOPS provisioned for an Amazon EBS volume. |
AWS.EBS.VolumeTotalReadTime | seconds (s) | The total number of seconds spent by input operations that completed in a specified period of time. |
AWS.EBS.VolumeTotalWriteTime | seconds (s) | The total number of seconds spent by output operations that completed in a specified period of time. |
AWS.EBS.VolumeWriteBytes | Binary Bytes | VolumeWriteBytes. The total number of bytes transferred by write operations during the specified time period. |
AWS.EBS.VolumeWriteOps | Count | VolumeWriteOps. The total number of write operations during the specified time period. Write operations are counted on completion. |
EC2
Metric | Units | Description |
---|---|---|
AWS.EC2.CPUCreditBalance
|
Count |
For T2 Instances. The number of CPU credits available for the instance to burst beyond its base CPU utilization. Credits are stored in the credit balance after they are earned and removed from the credit balance after they expire. Credits expire 24 hours after they are earned. |
AWS.EC2.CPUCreditUsage
|
Count |
For T2 Instances. The number of CPU credits consumed by the instance. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes). |
AWS.EC2.CPUUtilization
|
Percent (%) |
The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application upon a selected instance. |
AWS.EC2.DiskReadBytes
|
Binary Bytes |
Bytes read from all instance store volumes available to the instance. This metric is used to determine the volume of the data the application reads from the hard disk of the instance. This can be used to determine the speed of the application. |
AWS.EC2.DiskReadOps
|
Count |
Completed read operations from all instance store volumes available to the instance in a specified period of time. |
AWS.EC2.DiskWriteBytes
|
Binary Bytes |
Bytes written to all instance store volumes available to the instance. This metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application. |
AWS.EC2.DiskWriteOps
|
Count |
Completed write operations to all instance store volumes available to the instance in a specified period of time. |
AWS.EC2.NetworkIn
|
Binary Bytes |
The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance. |
AWS.EC2.NetworkOut
|
Binary Bytes |
The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance. |
AWS.EC2.NetworkPacketsIn
|
Count |
The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only. |
AWS.EC2.NetworkPacketsOut
|
Count |
The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only. |
AWS.EC2.StatusCheckFailed
|
Count |
Reports whether the instance has passed both the instance status check and the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed). |
AWS.EC2.StatusCheckFailed_Instance
|
Count |
Reports whether the instance has passed the instance status check in the last minute. This metric can be either 0 (passed) or 1 (failed). |
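The credit arithmetic given for AWS.EC2.CPUCreditUsage can be sketched in a few lines of Python. The function name `cpu_credits_used` is a hypothetical helper used only for illustration; it is not part of any AWS or SolarWinds API and simply restates the definition that one credit equals one vCPU running at 100% utilization for one minute.

```python
def cpu_credits_used(vcpus: int, utilization_pct: float, minutes: float) -> float:
    """Return the CPU credits consumed (hypothetical helper).

    Based on the definition in the table: one credit equals one vCPU
    running at 100% utilization for one minute.
    """
    return vcpus * (utilization_pct / 100.0) * minutes


# The three combinations named in the metric description.
print(cpu_credits_used(1, 100, 1))  # one vCPU at 100% for one minute
print(cpu_credits_used(1, 50, 2))   # one vCPU at 50% for two minutes
print(cpu_credits_used(2, 25, 2))   # two vCPUs at 25% for two minutes
```

Each of the three calls evaluates to the same single credit, which is the equivalence the description points out.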
EFS
Metric | Units | Description |
---|---|---|
AWS.EFS.BurstCreditBalance
|
Binary Bytes |
BurstCreditBalance. The average number of burst credits that a file system has. Burst credits allow a file system to burst to throughput levels above a file system’s baseline level for periods of time. |
AWS.EFS.ClientConnections
|
Count |
ClientConnections. The total number of client connections to a file system. When using a standard client, there is one connection per mounted Amazon EC2 instance. |
AWS.EFS.DataReadIOBytes
|
Binary Bytes |
DataReadIOBytes. The average number of bytes for each file system read operation. |
AWS.EFS.DataWriteIOBytes
|
Binary Bytes |
DataWriteIOBytes. The average number of bytes for each file system write operation. |
AWS.EFS.MetadataIOBytes
|
Binary Bytes |
MetadataIOBytes. The average number of bytes for each metadata operation. |
AWS.EFS.MeteredIOBytes
|
MeteredIOBytes. The average number of metered bytes for each file system operation, including data read, data write, and metadata operations, with read operations metered at one-third the rate of other operations. |
|
AWS.EFS.PercentIOLimit
|
Percent (%) |
PercentIOLimit. How close a file system is to reaching the I/O limit of the General Purpose performance mode. Data is available only for file systems running with General Purpose performance mode. |
AWS.EFS.PermittedThroughput
|
PermittedThroughput. The maximum amount of throughput that a file system can drive. |
|
AWS.EFS.StorageBytes
|
StorageBytes. The average size of the file system in bytes, including the amount of data stored in the EFS Standard and EFS Standard–Infrequent Access (EFS Standard-IA) storage classes. |
|
AWS.EFS.TimeSinceLastSync
|
TimeSinceLastSync. The average amount of time that has passed since the last successful sync to the destination file system in a replication configuration. |
|
AWS.EFS.TotalIOBytes
|
Binary Bytes |
TotalIOBytes. The total number of bytes for each file system operation, including data read, data write, and metadata operations. This is the actual amount that your application is driving, and not the throughput the file system is being metered at. |
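The relationship between AWS.EFS.TotalIOBytes and AWS.EFS.MeteredIOBytes can be illustrated with a rough sketch. It assumes only what the table states: read operations are metered at one-third the rate of other operations. The function names are hypothetical, and the actual computation performed by AWS may differ in detail.

```python
def total_io_bytes(read_bytes: float, write_bytes: float, metadata_bytes: float) -> float:
    """Bytes the application actually drives (hypothetical helper)."""
    return read_bytes + write_bytes + metadata_bytes


def metered_io_bytes(read_bytes: float, write_bytes: float, metadata_bytes: float) -> float:
    """Bytes counted for metering (hypothetical helper).

    Assumption taken from the table: reads are metered at one-third
    the rate of write and metadata operations.
    """
    return read_bytes / 3 + write_bytes + metadata_bytes


# Example: 300 bytes read, 100 bytes written, 50 bytes of metadata I/O.
print(total_io_bytes(300, 100, 50))
print(metered_io_bytes(300, 100, 50))
```

For a read-heavy workload the metered value is noticeably lower than the total, which is why the two metrics are reported separately.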
Elastic Beanstalk
Metric | Units | Description |
---|---|---|
AWS.ElasticBeanstalk.ApplicationLatencyP99.9
|
Count |
P99.9. The average latency for the slowest x percent of requests over the last 10 seconds, where x is the difference between the number and 100. For example, p99 1.403 indicates the slowest 1% of requests over the last 10 seconds had an average latency of 1.403 seconds. |
AWS.ElasticBeanstalk.ApplicationRequests2xx
|
Count |
Status 2xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 200 but less than 300. |
AWS.ElasticBeanstalk.ApplicationRequests3xx
|
Count |
Status 3xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 300 but less than 400. |
AWS.ElasticBeanstalk.ApplicationRequests4xx
|
Count |
Status 4xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 400 but less than 500. |
AWS.ElasticBeanstalk.ApplicationRequests5xx
|
Count |
Status 5xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 500 but less than 600. |
AWS.ElasticBeanstalk.ApplicationRequestsTotal
|
Count |
Request Count. The average number of requests handled by the web server per second over the last 10 seconds. |
AWS.ElasticBeanstalk.EnvironmentHealth
|
Count |
The health status of the environment. The possible values are 0 (OK), 1 (Info), 5 (Unknown), 10 (No data), 15 (Warning), 20 (Degraded) and 25 (Severe). |
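Because AWS.ElasticBeanstalk.EnvironmentHealth is reported as a number, a small lookup can make alert messages or exported data easier to read. The following is a minimal sketch that uses only the values listed in the table; the names `ENVIRONMENT_HEALTH` and `environment_health_label` are hypothetical and not part of any SolarWinds or AWS API.

```python
# Values and labels as listed in the EnvironmentHealth description.
ENVIRONMENT_HEALTH = {
    0: "OK",
    1: "Info",
    5: "Unknown",
    10: "No data",
    15: "Warning",
    20: "Degraded",
    25: "Severe",
}


def environment_health_label(value: float) -> str:
    """Translate a raw metric value into its status label (hypothetical helper)."""
    # Non-integer values (for example, from averaging) have no defined label.
    if value != int(value):
        return f"Unrecognized ({value})"
    return ENVIRONMENT_HEALTH.get(int(value), f"Unrecognized ({value})")


print(environment_health_label(20))
print(environment_health_label(7))
```

Values that fall between the listed codes are reported as unrecognized rather than rounded, since the table does not define how intermediate values should be interpreted.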
ELB
Metric | Units | Description |
---|---|---|
AWS.ELB.BackendConnectionErrors
|
BackendConnectionErrors. The total number of connections that were not successfully established between the load balancer and the registered instances. |
|
AWS.ELB.HTTPCode_ELB_4XX
|
HTTPCode_ELB_4XX. The total number of HTTP 4XX client error codes generated by the load balancer. |
|
AWS.ELB.HTTPCode_ELB_5XX
|
HTTPCode_ELB_5XX. The total number of HTTP 5XX server error codes generated by the load balancer. |
|
AWS.ELB.HealthyHostCount
|
HealthyHostCount. The average number of healthy instances registered with your load balancer. |
|
AWS.ELB.RequestCount
|
RequestCount. The total number of requests completed or connections made during the specified interval. |
|
AWS.ELB.SpilloverCount
|
SpilloverCount. The total number of requests that were rejected because the surge queue is full. |
|
AWS.ELB.SurgeQueueLength
|
SurgeQueueLength. The total number of requests (HTTP listener) or connections (TCP listener) that are pending routing to a healthy instance. |
|
AWS.ELB.UnHealthyHostCount
|
UnHealthyHostCount. The average number of unhealthy instances registered with your load balancer. An instance is considered unhealthy after it exceeds the unhealthy threshold configured for health checks. |
Lambda
Metric | Units | Description |
---|---|---|
AWS.Lambda.ConcurrentExecutions
|
Count |
ConcurrentExecutions. The maximum number of function instances that are processing events. |
AWS.Lambda.DeadLetterErrors
|
Count |
DeadLetterErrors. The total number of times that Lambda attempts to send an event to a dead-letter queue but fails. Dead-letter errors can occur due to permissions errors, misconfigured resources, or size limits. |
AWS.Lambda.Duration
|
milliseconds (ms) |
Duration. The average amount of time that your function code spends processing an event. |
AWS.Lambda.Errors
|
Count |
Errors. The total number of invocations that result in a function error. |
AWS.Lambda.Invocations
|
Count |
Invocations. The total number of times that a function code is invoked, including successful invocations and invocations that result in a function error. |
AWS.Lambda.IteratorAge
|
milliseconds (ms) |
IteratorAge. The maximum amount of time between when a stream receives the record and when the event source mapping sends the event to the function. |
AWS.Lambda.Throttles
|
Count |
Throttles. The total number of invocation requests that are throttled. When all function instances are processing requests and no concurrency is available to scale up, Lambda rejects additional requests with a TooManyRequestsException error. |
NAT Gateway
Metric | Units | Description |
---|---|---|
AWS.NATGateway.ActiveConnectionCount
|
ActiveConnectionCount. The maximum number of concurrent active TCP connections through the NAT gateway. |
|
AWS.NATGateway.BytesInFromDestination
|
BytesInFromDestination. The total number of bytes received by the NAT gateway from the destination. |
|
AWS.NATGateway.BytesInFromSource
|
BytesInFromSource. The total number of bytes received by the NAT gateway from clients in the VPC. |
|
AWS.NATGateway.BytesOutToDestination
|
BytesOutToDestination. The total number of bytes sent out through the NAT gateway to the destination. |
|
AWS.NATGateway.BytesOutToSource
|
BytesOutToSource. The total number of bytes sent through the NAT gateway to the clients in the VPC. |
|
AWS.NATGateway.ConnectionAttemptCount
|
ConnectionAttemptCount. The total number of connection attempts made through the NAT gateway. |
|
AWS.NATGateway.ConnectionEstablishedCount
|
ConnectionEstablishedCount. The total number of connections established through the NAT gateway. |
|
AWS.NATGateway.ErrorPortAllocation
|
ErrorPortAllocation. The total number of times the NAT gateway could not allocate a source port. |
|
AWS.NATGateway.IdleTimeoutCount
|
IdleTimeoutCount. The total number of connections that transitioned from the active state to the idle state. |
|
AWS.NATGateway.PacketsDropCount
|
PacketsDropCount. The total number of packets dropped by the NAT gateway. |
|
AWS.NATGateway.PacketsInFromDestination
|
PacketsInFromDestination. The total number of packets received by the NAT gateway from the destination. |
|
AWS.NATGateway.PacketsInFromSource
|
PacketsInFromSource. The total number of packets received by the NAT gateway from clients in the VPC. |
|
AWS.NATGateway.PacketsOutToDestination
|
PacketsOutToDestination. The total number of packets sent out through the NAT gateway to the destination. |
|
AWS.NATGateway.PacketsOutToSource
|
PacketsOutToSource. The total number of packets sent through the NAT gateway to the clients in the VPC. |
RDS
Metric | Units | Description |
---|---|---|
AWS.RDS.BinLogDiskUsage
|
Binary Bytes |
BinLogDiskUsage. The average amount of disk space occupied by binary logs. |
AWS.RDS.BurstBalance
|
Percent (%) |
BurstBalance. The average percent of General Purpose SSD (gp2) burst-bucket I/O credits available. |
AWS.RDS.CPUCreditBalance
|
Count |
CPUCreditBalance. The average number of earned CPU credits that an instance has accrued since it was launched or started. |
AWS.RDS.CPUCreditUsage
|
Count |
CPUCreditUsage. The average number of CPU credits spent by the instance for CPU utilization. |
AWS.RDS.CPUUtilization
|
Percent (%) |
CPUUtilization. The average percentage of CPU utilization. |
AWS.RDS.DatabaseConnections
|
Count |
DatabaseConnections. The total number of client network connections to the database instance. |
AWS.RDS.DiskQueueDepth
|
Count |
DiskQueueDepth. The average number of outstanding I/Os (read/write requests) waiting to access the disk. |
AWS.RDS.FreeStorageSpace
|
Binary Bytes |
FreeStorageSpace. The average amount of available storage space. |
AWS.RDS.FreeableMemory
|
Binary Bytes |
FreeableMemory. The average amount of available random access memory. |
AWS.RDS.MaximumUsedTransactionIDs
|
Count |
MaximumUsedTransactionIDs. The maximum transaction IDs that have been used. This metric applies to PostgreSQL. |
AWS.RDS.NetworkReceiveThroughput
|
NetworkReceiveThroughput. The average incoming (receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. |
|
AWS.RDS.NetworkTransmitThroughput
|
NetworkTransmitThroughput. The average outgoing (transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. |
|
AWS.RDS.OldestReplicationSlotLag
|
Megabytes |
OldestReplicationSlotLag. The average lag, measured in size, of the replica that is furthest behind in terms of write-ahead log (WAL) data received. This metric applies to PostgreSQL. |
AWS.RDS.ReadIOPS
|
ReadIOPS. The average number of disk read I/O operations per second. |
|
AWS.RDS.ReadLatency
|
seconds (s) |
ReadLatency. The average amount of time taken per disk I/O operation. |
AWS.RDS.ReadThroughput
|
ReadThroughput. The average number of bytes read from disk per second. |
|
AWS.RDS.ReplicaLag
|
seconds (s) |
ReplicaLag. For read replica configurations, the average amount of time a read replica DB instance lags behind the source DB instance. |
AWS.RDS.ReplicationSlotDiskUsage
|
Megabytes |
ReplicationSlotDiskUsage. The average disk space used by replication slot files. This metric applies to PostgreSQL. |
AWS.RDS.SwapUsage
|
Binary Bytes |
SwapUsage. The average amount of swap space used on the DB instance. This metric is not available for SQL Server. |
AWS.RDS.TransactionLogsDiskUsage
|
Megabytes |
TransactionLogsDiskUsage. The average disk space used by transaction logs. This metric applies to PostgreSQL. |
AWS.RDS.TransactionLogsGeneration
|
TransactionLogsGeneration. The average size of transaction logs generated per second. This metric applies to PostgreSQL. |
|
AWS.RDS.WriteIOPS
|
WriteIOPS. The average number of disk write I/O operations per second. |
|
AWS.RDS.WriteLatency
|
seconds (s) |
WriteLatency. The average amount of time taken per disk I/O operation. |
AWS.RDS.WriteThroughput
|
WriteThroughput. The average number of bytes written to disk per second. |
S3
Metric | Units | Description |
---|---|---|
AWS.S3.4xxErrors
|
Count |
4xxErrors. The number of HTTP 4xx client error status code requests made to an Amazon S3 bucket with a value of either 0 or 1. |
AWS.S3.5xxErrors
|
5xxErrors. The number of HTTP 5xx server error status code requests made to an Amazon S3 bucket with a value of either 0 or 1. |
|
AWS.S3.AllRequests
|
Count |
AllRequests. The total number of HTTP requests made to an Amazon S3 bucket, regardless of type. |
AWS.S3.BucketSizeBytes
|
Binary Bytes |
BucketSizeBytes. The amount of data that is stored in a bucket, in bytes. |
AWS.S3.BytesDownloaded
|
Binary Bytes |
BytesDownloaded. The number of bytes downloaded for requests made to an Amazon S3 bucket where the response includes a body. |
AWS.S3.BytesUploaded
|
Binary Bytes |
BytesUploaded. The number of bytes uploaded for requests made to an Amazon S3 bucket where the request includes a body. |
AWS.S3.DeleteRequests
|
Count |
The number of HTTP DELETE requests made for objects in a bucket. |
AWS.S3.FirstByteLatency
|
FirstByteLatency. The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned. |
|
AWS.S3.GetRequests
|
Count |
GetRequests. The number of HTTP GET requests made for objects in an Amazon S3 bucket. This doesn't include list operations. |
AWS.S3.HeadRequests
|
Count |
The number of HTTP HEAD requests made to a bucket. |
AWS.S3.ListRequests
|
Count |
The number of HTTP requests that list the contents of a bucket. |
AWS.S3.NumberOfObjects
|
Count |
NumberOfObjects. The total number of objects stored in a bucket for all storage classes. This value is calculated by counting all objects in the bucket (both current and noncurrent objects) and the total number of parts for all incomplete multipart uploads to the bucket. |
AWS.S3.PostRequests
|
Count |
PostRequests. The number of HTTP POST requests made to an Amazon S3 bucket. |
AWS.S3.PutRequests
|
Count |
PutRequests. The number of HTTP PUT requests made for objects in an Amazon S3 bucket. |
AWS.S3.TotalRequestLatency
|
TotalRequestLatency. The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This metric includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency. |
SNS
Metric | Units | Description |
---|---|---|
AWS.SNS.NumberOfMessagesPublished
|
Count |
NumberOfMessagesPublished. The average number of messages published to Amazon SNS topics. |
AWS.SNS.NumberOfNotificationsDelivered
|
Count |
NumberOfNotificationsDelivered. The average number of messages successfully delivered from Amazon SNS topics to subscribing endpoints. |
AWS.SNS.NumberOfNotificationsFailed
|
Count |
NumberOfNotificationsFailed. The average number of messages that Amazon SNS failed to deliver. |
AWS.SNS.NumberOfNotificationsFailedToRedriveToDlq
|
NumberOfNotificationsFailedToRedriveToDlq. The average number of messages that couldn't be moved to a dead-letter queue. |
|
AWS.SNS.NumberOfNotificationsFilteredOut
|
NumberOfNotificationsFilteredOut. The average number of messages that were rejected by subscription filter policies. A filter policy rejects a message when the message attributes don't match the policy attributes. |
|
AWS.SNS.NumberOfNotificationsFilteredOut-InvalidAttributes
|
NumberOfNotificationsFilteredOut-InvalidAttributes. The average number of messages that were rejected by subscription filter policies because the messages' attributes are invalid. |
|
AWS.SNS.NumberOfNotificationsFilteredOut-NoMessageAttributes
|
NumberOfNotificationsFilteredOut-NoMessageAttributes. The average number of messages that were rejected by subscription filter policies because the messages have no attributes. |
|
AWS.SNS.NumberOfNotificationsRedrivenToDlq
|
NumberOfNotificationsRedrivenToDlq. The average number of messages that have been moved to a dead-letter queue. |
|
AWS.SNS.PublishSize
|
Binary Bytes |
PublishSize. The average size of messages published. |
Transit Gateway
Metric | Units | Description |
---|---|---|
AWS.TransitGateway.BytesDropCountBlackhole
|
BytesDropCountBlackhole. The total number of bytes dropped because they matched a blackhole route. |
|
AWS.TransitGateway.BytesDropCountNoRoute
|
BytesDropCountNoRoute. The total number of bytes dropped because they did not match a route. |
|
AWS.TransitGateway.BytesIn
|
BytesIn. The total number of bytes received by the transit gateway. |
|
AWS.TransitGateway.BytesOut
|
BytesOut. The total number of bytes sent from the transit gateway. |
|
AWS.TransitGateway.PacketDropCountBlackhole
|
PacketDropCountBlackhole. The total number of packets dropped because they matched a blackhole route. |
|
AWS.TransitGateway.PacketDropCountNoRoute
|
PacketDropCountNoRoute. The total number of packets dropped because they did not match a route. |
|
AWS.TransitGateway.PacketsIn
|
PacketsIn. The total number of packets received by the transit gateway. |
|
AWS.TransitGateway.PacketsOut
|
PacketsOut. The total number of packets sent by the transit gateway. |
VPN
Metric | Units | Description |
---|---|---|
AWS.VPN.TunnelDataIn
|
Binary Bytes |
TunnelDataIn. The total bytes received on the AWS side of the connection through the VPN tunnel from a customer gateway. |
AWS.VPN.TunnelDataOut
|
Binary Bytes |
TunnelDataOut. The total bytes sent from the AWS side of the connection through the VPN tunnel to the customer gateway. Each metric data point represents the number of bytes sent after the previous data point. |
AWS.VPN.TunnelState
|
Count |
TunnelState. The average state of the tunnels. For static VPNs, 0 indicates DOWN and 1 indicates UP. |
Infrastructure/Azure metrics
Metrics for Azure entities are collected by integrating SolarWinds Observability SaaS with your Azure cloud account. See Azure cloud platform monitoring.
App Service
Metric | Description |
---|---|
azure.sites.app_connections
|
Average Connections |
azure.sites.app_domains
|
Total App Domains. The average number of AppDomains loaded in this application. |
azure.sites.app_domains.unloaded
|
Total App Domains Unloaded |
azure.sites.collections.gen1
|
Gen 1 Garbage Collections |
azure.sites.collections.gen2
|
Gen 2 Garbage Collections |
azure.sites.cpu_time
|
CPU Time. The total amount of CPU consumed by the app, in seconds. |
azure.sites.current_assemblies
|
Current Assemblies |
azure.sites.function_executions
|
Function execution count |
azure.sites.handles
|
Average Handle Count |
azure.sites.http.101
|
Total HTTP 101 Requests |
azure.sites.http.2xx
|
Http2xx. The total number of requests resulting in an HTTP status code greater than or equal to 200 but less than 300. |
azure.sites.http.3xx
|
Total 3xx Requests |
azure.sites.http.401
|
Total 401 Requests |
azure.sites.http.403
|
Total 403 Requests |
azure.sites.http.404
|
Total 404 Requests |
azure.sites.http.406
|
Total 406 Requests |
azure.sites.http.4xx
|
Http4xx. The total number of requests resulting in an HTTP status code greater than or equal to 400 but less than 500. |
azure.sites.http.5xx
|
Http5xx. The total number of requests resulting in an HTTP status code greater than or equal to 500 but less than 600. |
azure.sites.io.bytes_received
|
Bytes Received. The total amount of incoming bandwidth consumed by the app. |
azure.sites.io.bytes_sent
|
Bytes Sent. The total amount of outgoing bandwidth consumed by the app. |
azure.sites.io.other_bytes
|
IO Other Bytes Per Second |
azure.sites.io.other_ops
|
IO Other Operations Per Second |
azure.sites.io.read_bytes
|
IoReadBytesPerSecond. The number of bytes per second the app is reading from I/O operations. |
azure.sites.io.read_ops
|
IO Read Operations Per Second |
azure.sites.io.write_bytes
|
IO Write Bytes Per Second. The number of bytes per second the app is writing to I/O operations. |
azure.sites.io.write_ops
|
IO Write Operations Per Second |
azure.sites.memory.working_set
|
Memory Working Set. The current amount of memory used by the app. |
azure.sites.memory.working_set.avg
|
Average Memory Working Set. The average amount of memory used by the app, in megabytes. |
azure.sites.private_bytes
|
Private Bytes |
azure.sites.queued_requests
|
Requests In Application Queue. The average number of requests in the application request queue. |
azure.sites.requests
|
Requests. The total number of requests regardless of their resulting HTTP status code. |
azure.sites.response_time
|
Average Response Time. The average time taken for the app to serve requests, in seconds. |
azure.sites.threads
|
Threads. The average number of threads currently active in the app process. |
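The status-class metrics in this table (azure.sites.http.2xx, 3xx, 4xx, and 5xx) each cover a half-open range of HTTP status codes. A minimal sketch of that bucketing follows; `status_class` is a hypothetical helper for illustration, and the 3xx range is assumed to follow the same pattern the table spells out for 2xx, 4xx, and 5xx.

```python
from typing import Optional


def status_class(code: int) -> Optional[str]:
    """Return the status-class suffix for an HTTP status code (hypothetical helper).

    Ranges follow the table: greater than or equal to N00 but less than (N+1)00.
    Codes outside 200-599 (for example, 101) return None because they are
    tracked by their own metrics or not at all.
    """
    if 200 <= code < 600:
        return f"{code // 100}xx"
    return None


for code in (204, 302, 404, 503, 101):
    print(code, status_class(code))
```

Specific codes such as 401, 403, 404, and 406 also have dedicated metrics in the table, so a single request can contribute to both a class-level and a code-level metric.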
Blob Storage
Metric | Description |
---|---|
azure.storage.blob.availability
|
Availability. The average percentage of availability for the storage service or the specified API operation. Availability is calculated by taking the total billable requests value and dividing it by the number of applicable requests. |
azure.storage.blob.blobs
|
BlobCount. The average number of blob objects stored in the storage account. |
azure.storage.blob.capacity
|
BlobCapacity. The average amount of blob storage used in the storage account. |
azure.storage.blob.containers
|
ContainerCount. The average number of containers in the storage account. |
azure.storage.blob.egress
|
Egress. The total amount of egress data. This number includes egress from Azure Storage to external clients as well as egress within Azure. As a result, this number does not reflect billable egress. |
azure.storage.blob.index_capacity
|
IndexCapacity. The average amount of storage used by ADLS Gen2 Hierarchical Index. |
azure.storage.blob.ingress
|
Ingress. The total amount of ingress data. This number includes ingress from an external client into Azure Storage as well as ingress within Azure. |
azure.storage.blob.success.e2e_latency
|
SuccessE2ELatency. The average end-to-end latency of successful requests made to a storage service or the specified API operation. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
azure.storage.blob.success.server_latency
|
SuccessServerLatency. The average time used to process a successful request by Azure Storage. |
azure.storage.blob.transactions
|
Transactions. The total number of requests made to a storage service or the specified API operation. This number includes successful and failed requests, as well as requests that produced errors. |
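The availability calculation described for azure.storage.blob.availability can be written out as a short sketch. It assumes only what the table states: the total billable requests value is divided by the number of applicable requests, and the result is expressed as a percentage. The function name `availability_pct` is hypothetical.

```python
from typing import Optional


def availability_pct(billable_requests: int, applicable_requests: int) -> Optional[float]:
    """Return availability as a percentage (hypothetical helper).

    Returns None when there were no applicable requests, because the
    ratio is undefined in that case.
    """
    if applicable_requests == 0:
        return None
    return 100.0 * billable_requests / applicable_requests


# Example: 990 billable requests out of 1000 applicable requests.
print(availability_pct(990, 1000))
```

Returning `None` for an empty interval keeps a period with no traffic from being reported as either 0% or 100% available.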
CDN
Metric | Description |
---|---|
azure.cdn.byte_hit_ratio
|
ByteHitRatio. Of the total number of response bytes, the percentage that were served from the CDN cache. |
azure.cdn.origin_health_percentage
|
OriginHealthPercentage. The percentage of successful health probes sent to backends. |
azure.cdn.origin_latency
|
OriginLatency. The average time from when the request was sent to the backend to when the last response byte was received. |
azure.cdn.origin_request_count
|
OriginRequestCount. The total number of requests sent to origin. |
azure.cdn.percentage_4XX
|
Percentage4XX. The average percentage of requests with a status code greater than or equal to 400 but less than 500. |
azure.cdn.percentage_5XX
|
Percentage5XX. The average percentage of requests with a status code greater than or equal to 500 but less than 600. |
azure.cdn.request_count
|
RequestCount. The total number of client requests served by CDN. |
azure.cdn.request_size
|
RequestSize. The total number of bytes sent as requests from clients. |
azure.cdn.response_size
|
ResponseSize. The total number of bytes sent as responses from CDN edge to clients. |
azure.cdn.total_latency
|
TotalLatency. The average time from the client request being received by CDN until the last response byte is sent from CDN to the client. |
azure.cdn.web_application_firewall_request_count
|
WebApplicationFirewallRequestCount. The total number of matched WAF requests. |
Cosmos DB
Metric | Description |
---|---|
azure.cosmos.autoscale_max_throughput
|
AutoscaleMaxThroughput. The maximum throughput the autoscale will scale to. |
azure.cosmos.available_storage
|
AvailableStorage. The total amount of available storage reported at 5-minute granularity per region. |
azure.cosmos.cassandra.connection.avg_replication_latency
|
CassandraConnectorAvgReplicationLatency. The average replication latency of the Cassandra Connector. |
azure.cosmos.cassandra.connection.replication_health_status
|
CassandraConnectorReplicationHealthStatus. The replication health status of the Cassandra Connector. |
azure.cosmos.cassandra.connection_closures
|
CassandraConnectionClosures. The total number of Cassandra Connections closed. |
azure.cosmos.cassandra.request_charges
|
CassandraRequestCharges. The total number of request units consumed by the API for Cassandra. |
azure.cosmos.cassandra.requests
|
CassandraRequests. The total number of Cassandra API requests made. |
azure.cosmos.data.usage
|
DataUsage. The total data usage reported at 5-minute granularity per region. |
azure.cosmos.document.count
|
DocumentCount. The total document count reported at 5-minute granularity per region. |
azure.cosmos.document.quota
|
DocumentQuota. The total storage quota reported at 5-minute granularity per region. |
azure.cosmos.gremlin.request_charge
|
GremlinRequestCharges. The total number of request units consumed by Gremlin queries. |
azure.cosmos.gremlin.requests
|
GremlinRequests. The total number of requests made by Gremlin queries. |
azure.cosmos.index_usage
|
IndexUsage. The total Index usage reported at 5-minute granularity per region. |
azure.cosmos.mongo.request_charge
|
MongoRequestCharge. The total number of Mongo request units consumed. |
azure.cosmos.mongo.requests
|
MongoRequests. The total number of Mongo requests made. |
azure.cosmos.normalized_ru_consumption
|
NormalizedRUConsumption. The maximum request unit consumption percentage per minute. |
azure.cosmos.provisioned_throughput
|
ProvisionedThroughput. The maximum provisioned throughput at container granularity. |
azure.cosmos.replication_latency.p99
|
ReplicationLatency. The average replication latency across the source and target regions for a geo-enabled account. |
azure.cosmos.requests.metadata
|
MetadataRequests. The total number of metadata requests. |
azure.cosmos.requests.total
|
TotalRequests. The total number of requests made. |
azure.cosmos.requests.total_units
|
TotalRequestUnits. The total number of request units consumed. |
azure.cosmos.server_side_latency
|
ServerSideLatency. The average amount of time taken by the server to process a request. |
azure.cosmos.service_availability
|
ServiceAvailability. The average account request availability at one-hour granularity. |
Event Hubs
Metric | Description |
---|---|
azure.eventhubs.namespaces.active_connections
|
ActiveConnections. The maximum number of active connections on a namespace and on an entity (event hub) in the namespace. |
azure.eventhubs.namespaces.captured_bytes
|
CapturedBytes. The total number of captured bytes for an event hub. |
azure.eventhubs.namespaces.captured_messages
|
CapturedMessages. The total number of captured messages for an event hub. |
azure.eventhubs.namespaces.connections_closed
|
ConnectionsClosed. The total number of closed connections. |
azure.eventhubs.namespaces.connections_opened
|
ConnectionsOpened. The total number of opened connections. |
azure.eventhubs.namespaces.incoming_bytes
|
IncomingBytes. The number of incoming bytes for an event hub during the specified period. |
azure.eventhubs.namespaces.incoming_messages
|
IncomingMessages. The total number of events or messages sent to Event Hubs over a specified period. |
azure.eventhubs.namespaces.incoming_requests
|
IncomingRequests. The total number of requests made to the Event Hubs service over a specified period. This metric includes all the data and management plane operations. |
azure.eventhubs.namespaces.namespace_cpu_usage
|
NamespaceCpuUsage. The maximum namespace CPU usage. |
azure.eventhubs.namespaces.namespace_memory_usage
|
NamespaceMemoryUsage. The maximum namespace memory usage. |
azure.eventhubs.namespaces.outgoing_bytes
|
OutgoingBytes. The number of outgoing bytes for an event hub during the specified period. |
azure.eventhubs.namespaces.outgoing_messages
|
OutgoingMessages. The total number of events or messages received from Event Hubs over a specified period. |
azure.eventhubs.namespaces.quota_exceeded_errors
|
QuotaExceededErrors. The total number of errors caused by exceeding quotas over a specified period. |
azure.eventhubs.namespaces.server_errors
|
ServerErrors. The total number of requests not processed because of an error in the Event Hubs service over a specified period. |
azure.eventhubs.namespaces.size
|
Size. The average size of an event hub. |
azure.eventhubs.namespaces.successful_requests
|
SuccessfulRequests. The total number of successful requests made to the Event Hubs service over a specified period. |
azure.eventhubs.namespaces.throttled_requests
|
ThrottledRequests. The total number of requests that were throttled because the usage was exceeded. |
azure.eventhubs.namespaces.user_errors
|
UserErrors. The total number of requests not processed because of user errors over a specified period. |
Files
Metric | Description |
---|---|
azure.storage.files.availability
|
Availability. The average percentage of availability for the storage service or the specified API operation. Availability is calculated by taking the total billable requests value and dividing it by the number of applicable requests, including those requests that produced unexpected errors. |
azure.storage.files.egress
|
Egress. The total amount of egress data. This number includes egress from Azure Storage to external clients as well as egress within Azure. |
azure.storage.files.file_capacity
|
FileCapacity. The average amount of file storage used by the storage account. |
azure.storage.files.file_count
|
FileCount. The average number of files in the storage account. |
azure.storage.files.fileshare_count
|
FileShareCount. The average number of file shares in the storage account. |
azure.storage.files.fileshare_quota
|
FileShareQuota. The average upper limit on the amount of storage that can be used by Azure Files service in bytes. |
azure.storage.files.fileshare_snapshotcount
|
FileShareSnapshotCount. The average number of snapshots present on the share in the storage account's Azure Files service. |
azure.storage.files.fileshare_snapshotsize
|
FileShareSnapshotSize. The average amount of storage used by the snapshots in the storage account's Azure Files service. |
azure.storage.files.ingress
|
Ingress. The total amount of ingress data. This number includes ingress from an external client into Azure Storage as well as ingress within Azure. |
azure.storage.files.success.e2e_latency
|
SuccessE2ELatency. The average end-to-end latency of successful requests made to a storage service or the specified API operation. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
azure.storage.files.success.server_latency
|
SuccessServerLatency. The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in SuccessE2ELatency. |
azure.storage.files.transactions
|
Transactions. The total number of requests made to a storage service or the specified API operation. This number includes successful and failed requests, as well as requests that produced errors. |
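The Availability description above amounts to a simple ratio. The following is a minimal Python sketch of that arithmetic, not the way Azure or SolarWinds computes the metric internally. The function name, parameter names, and the handling of an empty period are assumptions made for illustration.

```python
def availability_percent(billable_requests: int, applicable_requests: int) -> float:
    """Availability as a percentage: billable requests divided by applicable requests.

    Both counts are hypothetical inputs; applicable requests include those
    that produced unexpected errors.
    """
    if applicable_requests == 0:
        # Assumption: with no applicable requests in the period,
        # report the service as fully available.
        return 100.0
    return 100.0 * billable_requests / applicable_requests
```

For example, 990 billable requests out of 1,000 applicable requests gives an availability of 99 percent.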
Front Door
Metric | Description |
---|---|
azure.frontdoor.backend_health_percentage
|
BackendHealthPercentage. The average percentage of successful health probes from AFD to origin. |
azure.frontdoor.backend_request_count
|
BackendRequestCount. The total number of requests sent from AFD to origin. |
azure.frontdoor.backend_request_latency
|
BackendRequestLatency. The average time calculated from when the request was sent by AFD edge to the backend until AFD received the last response byte from the backend. |
azure.frontdoor.billable_response_size
|
BillableResponseSize. The total number of billable bytes (minimum 2KB per request) sent as responses from HTTP/S proxy to clients. |
azure.frontdoor.request_count
|
RequestCount. The total number of client requests served by CDN. |
azure.frontdoor.request_size
|
RequestSize. The total number of bytes sent as requests from clients to AFD. |
azure.frontdoor.response_size
|
ResponseSize. The total number of bytes sent as responses from Front Door to clients. |
azure.frontdoor.total_latency
|
TotalLatency. The average time from the client request being received by CDN until the last response byte is sent from CDN to the client. |
azure.frontdoor.web_application_firewall_request_count
|
WebApplicationFirewallRequestCount. The total number of matched WAF requests. |
Functions
Metric | Description |
---|---|
azure.sites.app_domains
|
Total App Domains. The average number of app domains loaded in the application. |
azure.sites.app_domains.unloaded
|
Total App Domains Unloaded. The average number of application domains unloaded. |
azure.sites.collections.gen1
|
Gen 1 Garbage Collections |
azure.sites.collections.gen2
|
Gen 2 Garbage Collections |
azure.sites.current_assemblies
|
Current Assemblies |
azure.sites.function_executions
|
Function Execution Count. The total number of times a function app has executed. This value correlates to the number of times a function runs in an app. |
azure.sites.function_executions.unit
|
Function Execution Units. The number of function execution units. |
azure.sites.http.5xx
|
HTTP 5xx. The total number of requests with a status code greater than or equal to 500 but less than 600. |
azure.sites.io.bytes_received
|
Bytes Received. The number of incoming data bytes. |
azure.sites.io.bytes_sent
|
Bytes Sent. The number of outgoing data bytes. |
azure.sites.io.other_bytes
|
IO Other Bytes Per Second |
azure.sites.io.other_ops
|
IO Other Operations Per Second |
azure.sites.io.read_bytes
|
IO Read Bytes Per Second. The number of bytes per second the app is reading from I/O operations. |
azure.sites.io.read_ops
|
IO Read Operations Per Second. The number of read I/O operations per second the app is issuing. |
azure.sites.io.write_bytes
|
IO Write Bytes Per Second. The number of bytes per second the app is writing to I/O operations. |
azure.sites.io.write_ops
|
IO Write Operations Per Second. The number of write I/O operations per second the app is issuing. |
azure.sites.memory.working_set
|
Memory Working Set. The average amount of memory used by the app. |
azure.sites.memory.working_set.avg
|
Average Memory Working Set. The average amount of memory used by the app. |
azure.sites.private_bytes
|
Private Bytes. The average number of private bytes allocated to the app. |
azure.sites.queued_requests
|
Requests In Application Queue. The average number of requests in the application queue. |
azure.sites.requests
|
Requests. The total number of requests. |
azure.sites.response_time
|
Average Response Time. The average time taken for the app to serve requests. |
Key Vault
Metric | Description |
---|---|
azure.key_vault.service_api.hit
|
Service API Hit. The total number of service API hits. |
azure.key_vault.service_api.latency
|
Service API Latency. The average latency of service API requests. |
azure.key_vault.service_api.result
|
Service API Result. The total number of service API results. |
Service Bus
Metric | Description |
---|---|
azure.servicebus.namespaces.abandon_message
|
AbandonMessage. The total number of messages abandoned over a specified period. |
azure.servicebus.namespaces.active_connections
|
ActiveConnections. The total number of active connections on a namespace and on an entity in the namespace. The value for this metric is a point-in-time value. Connections that were active immediately after that point in time may not be reflected in the metric. |
azure.servicebus.namespaces.active_messages
|
ActiveMessages. The average number of active messages in a queue/topic. |
azure.servicebus.namespaces.complete_message
|
CompleteMessage. The total number of messages completed over a specified period. |
azure.servicebus.namespaces.connections_closed
|
ConnectionsClosed. The average number of connections closed. The value for this metric is an aggregation and includes all connections that were closed in the aggregation time window. |
azure.servicebus.namespaces.connections_opened
|
ConnectionsOpened. The average number of connections opened. The value for this metric is an aggregation and includes all connections that were opened in the aggregation time window. |
azure.servicebus.namespaces.deadlettered_messages
|
DeadletteredMessages. The average number of dead-lettered messages in a queue/topic. |
azure.servicebus.namespaces.incoming_messages
|
IncomingMessages. The total number of events or messages sent to Service Bus over a specified period. For basic and standard tiers, incoming auto-forwarded messages are included in this metric. For the premium tier, they aren't included. |
azure.servicebus.namespaces.incoming_requests
|
IncomingRequests. The total number of requests made to the Service Bus service over a specified period. |
azure.servicebus.namespaces.messages
|
Messages. The average number of messages in a queue/topic. |
azure.servicebus.namespaces.outgoing_messages
|
OutgoingMessages. The total number of events or messages received from Service Bus over a specified period. The outgoing auto-forwarded messages aren't included in this metric. |
azure.servicebus.namespaces.pending_checkpoint_operation_count
|
PendingCheckpointOperationCount. The average number of pending checkpoint operations on the namespace. The service starts to throttle when the pending checkpoint count exceeds the limit of (500,000 + (500,000 * messaging units)) operations. This metric applies only to namespaces using the premium tier. |
azure.servicebus.namespaces.scheduled_messages
|
ScheduledMessages. The average number of scheduled messages in a queue/topic. |
azure.servicebus.namespaces.server_errors
|
ServerErrors. The total number of requests not processed because of an error in the Service Bus service over a specified period. |
azure.servicebus.namespaces.server_send_latency
|
ServerSendLatency. The average time taken by the Service Bus service to complete the request. |
azure.servicebus.namespaces.size
|
Size. The average size of an entity (queue or topic) in bytes. |
azure.servicebus.namespaces.successful_requests
|
SuccessfulRequests. The total number of successful requests made to the Service Bus service over a specified period. |
azure.servicebus.namespaces.throttled_requests
|
ThrottledRequests. The total number of requests that were throttled because the usage was exceeded. |
azure.servicebus.namespaces.user_errors
|
UserErrors. The total number of requests not processed because of user errors over a specified period. |
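The throttling limit quoted for PendingCheckpointOperationCount depends on the number of messaging units. The sketch below works through that formula in Python. The function names are hypothetical and are not part of any Azure or SolarWinds API. It may help when choosing an alert threshold for azure.servicebus.namespaces.pending_checkpoint_operation_count.

```python
def pending_checkpoint_limit(messaging_units: int) -> int:
    """Limit from the description: 500,000 + (500,000 * messaging units)."""
    return 500_000 + 500_000 * messaging_units


def is_throttling_expected(pending_operations: float, messaging_units: int) -> bool:
    """True when the pending checkpoint count exceeds the documented limit."""
    return pending_operations > pending_checkpoint_limit(messaging_units)
```

With 4 messaging units the limit is 2,500,000 operations, so an alert threshold slightly below that value would fire before throttling starts.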
SQL Database
Metric | Description |
---|---|
azure.sql.servers.databases.connection_failed
|
Failed Connections. The total number of connections that failed. |
azure.sql.servers.databases.connection_successful
|
Successful Connections. The total number of successful connections. |
azure.sql.servers.databases.cpu_percent
|
CPU Utilization. The average percentage of CPU used. |
azure.sql.servers.databases.deadlock
|
Deadlocks. The total number of deadlocks. |
azure.sql.servers.databases.log_write_percent
|
Log Write Percentage. The average log I/O percentage based on the limit of the service tier. |
azure.sql.servers.databases.physical_data_read_percent
|
Data IO Percentage. The average data I/O percentage based on the limit of the service tier. |
azure.sql.servers.databases.sessions_percent
|
Sessions Percentage. The average percentage of concurrent sessions based on the limit of the service tier. |
azure.sql.servers.databases.storage
|
Data Space Used. The total amount of space used to store data. |
azure.sql.servers.databases.storage_percent
|
Storage Utilization. The average percentage of space used to store data based on the limit of the service tier. |
Virtual Machines
Metric | Description |
---|---|
azure.vm.cpu.credits_consumed
|
Total number of credits consumed by the Virtual Machine |
azure.vm.cpu.credits_remaining
|
Total number of credits available to burst |
azure.vm.cpu.percentage
|
The percentage of allocated compute units that are currently in use by the Virtual Machine(s) |
azure.vm.disk.read_bytes
|
Bytes read from disk during monitoring period |
azure.vm.disk.read_ops
|
Disk Read IOPS |
azure.vm.disk.write_bytes
|
Bytes written to disk during monitoring period |
azure.vm.disk.write_ops
|
Disk Write IOPS |
azure.vm.network.in
|
The number of billable bytes received on all network interfaces by the Virtual Machine(s) (Incoming Traffic) |
Virtual Machine Scale Sets
Metric | Description |
---|---|
azure.vmss.cpu.percentage
|
Percentage CPU. The percentage of allocated compute units that are currently in use by the VM(s). |
azure.vmss.disk.data.read_bytes
|
Data Disk Read. The average number of bytes per second read from a single disk during the monitoring period. |
azure.vmss.disk.data.write_bytes
|
Data Disk Write. The average number of bytes per second written to a single disk during the monitoring period. |
azure.vmss.disk.read_bytes
|
Disk Read. The total number of bytes read from disk during the monitoring period. |
azure.vmss.disk.read_ops
|
Disk Read Operations. The average number of input operations read in a second from all disks attached to the VM(s). |
azure.vmss.disk.write_bytes
|
Disk Write. The total number of bytes written to disk during the monitoring period. |
azure.vmss.disk.write_ops
|
Disk Write Operations. The average number of output operations written in a second to all disks attached to the VM(s). |
azure.vmss.memory.available_bytes
|
Available Memory Bytes. The amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the VM(s). |
azure.vmss.network.total_in
|
Network In Total. The number of bytes received on all network interfaces by the VM(s) (incoming traffic). |
azure.vmss.network.total_out
|
Network Out Total. The number of bytes out on all network interfaces by the VM(s) (outgoing traffic). |
Infrastructure/Kubernetes metrics
Metrics for Kubernetes entities are collected by installing the SWO K8s Collector on a Kubernetes cluster that has Prometheus installed. See Kubernetes monitoring.
Cluster metrics
Metric | Unit | Description |
---|---|---|
k8s.cluster.cpu.allocatable
|
core |
The amount of CPU on the cluster that is allocatable (available for scheduling). Metric type: Gauge. |
k8s.cluster.cpu.capacity
|
core |
The cluster CPU capacity. Metric type: Gauge. |
k8s.cluster.cpu.utilization
|
Percent (%) |
The cluster CPU usage. Metric type: Gauge. |
k8s.cluster.memory.allocatable |
Binary Bytes |
The amount of memory on the cluster that is allocatable (available for scheduling). Metric type: Gauge. |
k8s.cluster.memory.capacity
|
Binary Bytes |
The cluster memory capacity. Metric type: Gauge. |
k8s.cluster.memory.utilization
|
Percent (%) |
The cluster memory usage. Metric type: Gauge. |
k8s.cluster.nodes
|
Count |
The number of nodes in the cluster. Metric type: Gauge. |
k8s.cluster.nodes.ready
|
Count |
The number of nodes with status condition ready. Metric type: Gauge. |
k8s.cluster.nodes.ready.avg
|
Percent (%) |
The percentage of nodes with status condition ready. Metric type: Gauge. |
k8s.cluster.pods
|
Count |
The number of pods on a cluster. Metric type: Gauge. |
k8s.cluster.pods.running
|
Count |
The number of pods in running phase. Metric type: Gauge. |
k8s.cluster.spec.cpu.requests
|
cores |
The total amount of CPU requested by all containers in a cluster. Metric type: Gauge. |
k8s.cluster.spec.memory.requests
|
Binary Bytes |
The total amount of memory requested by all containers in a cluster. Metric type: Gauge. |
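The cluster table lists both a count of ready nodes and a percentage of ready nodes. The sketch below shows how such a percentage relates to the two counts. It assumes that k8s.cluster.nodes.ready.avg corresponds to k8s.cluster.nodes.ready divided by k8s.cluster.nodes; the source does not state the exact derivation, and the function name is hypothetical.

```python
def nodes_ready_percent(ready_nodes: int, total_nodes: int) -> float:
    """Percentage of nodes whose Ready condition is true."""
    if total_nodes == 0:
        # Assumption: an empty cluster reports 0 percent ready.
        return 0.0
    return 100.0 * ready_nodes / total_nodes
```

A cluster with 3 of 4 nodes ready yields 75 percent.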
Node metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_node_created |
seconds (s) |
Unix creation timestamp. Metric type: Gauge. |
k8s.kube_node_info |
Information about a cluster node. Metric type: Gauge. |
|
k8s.kube_node_spec_unschedulable
|
Whether a node can schedule new pods. Metric type: Gauge. |
|
k8s.kube_node_status_allocatable |
|
The amount of resources allocatable for pods (after reserving some for system daemons). Metric type: Gauge. |
k8s.kube_node_status_capacity |
|
The total amount of resources available for a node. Metric type: Gauge. |
k8s.kube_node_status_condition
|
The condition of a cluster node. Metric type: Gauge. |
|
k8s.kube_node_status_ready
|
Node status (as a tag). Metric type: Gauge. |
|
k8s.node.cpu.allocatable |
core |
CPU Utilization. The amount of CPU on the node that is allocatable (available for scheduling). Metric type: Gauge. |
k8s.node.cpu.capacity
|
core |
CPU Utilization. The node CPU capacity. Metric type: Gauge. |
k8s.node.cpu.usage.seconds.rate
|
core |
CPU Utilization. The rate of node cumulative CPU time consumed. Metric type: Gauge. |
k8s.node.fs.iops |
Disk IOPS. Rate of reads and writes of all pods on node. Metric type: Gauge. |
|
k8s.node.fs.throughput
|
Disk throughput. Rate of bytes read and written of all pods on node. Metric type: Gauge. |
|
k8s.node.fs.usage |
Binary Bytes |
Disk Usage. Number of bytes that are consumed by containers on this node’s filesystem. Metric type: Gauge. |
k8s.node.memory.allocatable |
Binary Bytes |
Memory Utilization. The amount of memory on the node that is allocatable (available for scheduling). Metric type: Gauge. |
k8s.node.memory.capacity |
Binary Bytes |
Memory Utilization. The node memory capacity. Metric type: Gauge. |
k8s.node.memory.working_set
|
Binary Bytes |
Memory Utilization. Current working set on node. Metric type: Gauge. |
k8s.node.network.bytes_received |
Network In. Rate of bytes received of all pods on node. Metric type: Gauge. |
|
k8s.node.network.bytes_transmitted
|
Network Out. Rate of bytes transmitted of all pods on node. Metric type: Gauge. |
|
k8s.node.network.packets_received
|
Rate of packets received of all pods on node. Metric type: Gauge. |
|
k8s.node.network.packets_transmitted
|
Rate of packets transmitted of all pods on node. Metric type: Gauge. |
|
k8s.node.network.receive_packets_dropped
|
Rate of packets dropped while receiving of all pods on node. Metric type: Gauge. |
|
k8s.node.network.transmit_packets_dropped
|
Rate of packets dropped while transmitting of all pods on node. Metric type: Gauge. |
|
k8s.node.pods |
Count |
Number of pods. The number of pods on a node. Metric type: Gauge. |
k8s.node.status.condition.diskpressure
|
The condition diskpressure of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
|
k8s.node.status.condition.memorypressure |
The condition memorypressure of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
|
k8s.node.status.condition.networkunavailable
|
The condition networkunavailable of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
|
k8s.node.status.condition.pidpressure
|
The condition pidpressure of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
|
k8s.node.status.condition.ready
|
The condition ready of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
Pod metrics
Metric | Unit | Description |
---|---|---|
k8s.kube.pod.owner.daemonset
|
Information about the DaemonSet owning the pod. Metric type: Gauge. |
|
k8s.kube.pod.owner.replicaset
|
Information about the ReplicaSet owning the pod. Metric type: Gauge. |
|
k8s.kube.pod.owner.statefulset
|
Information about the StatefulSet owning the pod. Metric type: Gauge. |
|
k8s.kube_pod_completion_time |
seconds (s) |
Completion time in unix timestamp for a pod. Metric type: Gauge. |
k8s.kube_pod_created
|
seconds (s) |
Unix creation timestamp. Metric type: Gauge. |
k8s.kube_pod_info |
Information about the pod. Metric type: Gauge. |
|
k8s.kube_pod_owner
|
Information about the pod owner. Metric type: Gauge. |
|
k8s.kube_pod_start_time
|
seconds (s) |
Start time in unix timestamp for a pod. Metric type: Gauge. |
k8s.kube_pod_status_phase
|
The pod's current phase. Metric type: Gauge. |
|
k8s.kube_pod_status_ready
|
Describes whether the pod is ready to serve requests. Metric type: Gauge. |
|
k8s.kube_pod_status_reason
|
The pod status reasons. Metric type: Gauge. |
|
k8s.pod.containers |
Count |
The number of containers on pod. Metric type: Gauge. |
k8s.pod.containers.running
|
Current number of running containers on pod. Metric type: Gauge. |
|
k8s.pod.cpu.usage.seconds.rate
|
seconds (s) |
CPU Utilization. The rate of pod's cumulative CPU time consumed. Metric type: Gauge. |
k8s.pod.fs.iops
|
Disk IOPS. Rate of reads and writes of all containers on pod. Metric type: Gauge. |
|
k8s.pod.fs.reads.bytes.rate
|
Rate of bytes read of all containers on pod. Metric type: Gauge. |
|
k8s.pod.fs.reads.rate
|
Rate of reads of all containers on pod. Metric type: Gauge. |
|
k8s.pod.fs.throughput
|
Disk Throughput. Rate of bytes read and written of all containers on pod. Metric type: Gauge. |
|
k8s.pod.fs.usage.bytes
|
Binary Bytes |
Disk Usage. Number of bytes that are consumed by containers on this pod's filesystem. Metric type: Gauge. |
k8s.pod.fs.writes.bytes.rate
|
Rate of bytes written of all containers on pod. Metric type: Gauge. |
|
k8s.pod.fs.writes.rate
|
Rate of writes of all containers on pod. Metric type: Gauge. |
|
k8s.pod.memory.working_set
|
Binary Bytes |
Memory Utilization. Current working set on pod. Metric type: Gauge. |
k8s.pod.network.bytes_received
|
Network In. Rate of bytes received of all containers on pod. Metric type: Gauge. |
|
k8s.pod.network.bytes_transmitted
|
Network Out. Rate of bytes transmitted of all containers on pod. Metric type: Gauge. |
|
k8s.pod.network.packets_received
|
Rate of packets received of all containers on pod. Metric type: Gauge. |
|
k8s.pod.network.packets_transmitted
|
Rate of packets transmitted of all containers on pod. Metric type: Gauge. |
|
k8s.pod.network.receive_packets_dropped
|
Rate of packets dropped while receiving of all containers on pod. Metric type: Gauge. |
|
k8s.pod.network.transmit_packets_dropped
|
Rate of packets dropped while transmitting of all containers on pod. Metric type: Gauge. |
|
k8s.pod.spec.cpu.limit
|
cores |
CPU quota of all containers on pod in given CPU period. Metric type: Gauge. |
k8s.pod.spec.cpu.requests
|
cores |
The amount of CPU requested by all containers on the pod. Metric type: Gauge. |
k8s.pod.spec.memory.limit
|
Binary Bytes |
Memory Utilization. Memory limit for all containers on pod. Metric type: Gauge. |
k8s.pod.spec.memory.requests
|
Binary Bytes |
The amount of memory requested by all containers on the pod. Metric type: Gauge. |
k8s.pod.status.reason
|
The current pod status reason. Metric type: Gauge. |
Container metrics
Metric | Unit | Description |
---|---|---|
k8s.container.spec.cpu.requests |
core |
The amount of CPU requested by a container. Metric type: Gauge. |
k8s.container.spec.memory.requests |
Binary Bytes |
The amount of memory requested by a container. Metric type: Gauge. |
k8s.container.status
|
Describes the status of the container (waiting, running, or terminated). Metric type: Gauge. |
|
k8s.container_cpu_cfs_periods_total
|
Number of elapsed enforcement period intervals. Metric type: Counter. |
|
k8s.container_cpu_cfs_throttled_periods_total
|
Number of throttled period intervals. Metric type: Counter. |
|
k8s.container_cpu_usage_seconds_total
|
seconds (s) |
Cumulative CPU time consumed. Metric type: Counter. |
k8s.container_fs_reads_bytes_total
|
Binary Bytes |
Cumulative count of bytes read. Metric type: Counter. |
k8s.container_fs_reads_total
|
Count |
Cumulative count of reads completed. Metric type: Counter. |
k8s.container_fs_usage_bytes
|
Binary Bytes |
Number of bytes that are consumed by the container on this filesystem. Metric type: Gauge. |
k8s.container_fs_writes_bytes_total
|
Binary Bytes |
Cumulative count of bytes written. Metric type: Counter. |
k8s.container_fs_writes_total
|
Count |
Cumulative count of writes completed. Metric type: Counter. |
k8s.container_memory_working_set_bytes |
Binary Bytes |
Current working set. Metric type: Gauge. |
k8s.container_network_receive_bytes_total
|
Binary Bytes |
Cumulative count of bytes received. Metric type: Counter. |
k8s.container_network_receive_packets_dropped_total
|
Count |
Cumulative count of packets dropped while receiving. Metric type: Counter. |
k8s.container_network_receive_packets_total
|
Count |
Cumulative count of packets received. Metric type: Counter. |
k8s.container_network_transmit_bytes_total
|
Binary Bytes |
Cumulative count of bytes transmitted. Metric type: Counter. |
k8s.container_network_transmit_packets_dropped_total
|
Count |
Cumulative count of packets dropped while transmitting. Metric type: Counter. |
k8s.container_network_transmit_packets_total
|
Count |
Cumulative count of packets transmitted. Metric type: Counter. |
k8s.container_spec_cpu_period |
CPU period of the container. Metric type: Gauge. |
|
k8s.container_spec_cpu_quota |
CPU quota of the container. Metric type: Gauge. |
|
k8s.container_spec_memory_limit_bytes |
Binary Bytes |
Memory limit for the container. Metric type: Gauge. |
k8s.kube_pod_container_info |
Information about a container in a pod. Metric type: Gauge. |
|
k8s.kube_pod_container_resource_limits |
cpu =<core>
|
The resource limit set for a container. Metric type: Gauge. |
k8s.kube_pod_container_resource_requests
|
|
The amount of a resource requested by a container. Metric type: Gauge. |
k8s.kube_pod_container_state_started |
seconds (s) |
Start time in unix timestamp for a pod container. Metric type: Gauge. |
k8s.kube_pod_container_status_last_terminated_exitcode
|
Describes the exit code for the last container in terminated state. Metric type: Gauge. |
|
k8s.kube_pod_container_status_last_terminated_reason
|
Describes the last reason the container was in terminated state. Metric type: Gauge. |
|
k8s.kube_pod_container_status_ready |
Describes whether the container's readiness check succeeded. Metric type: Gauge. |
|
k8s.kube_pod_container_status_restarts_total |
The number of container restarts per container. Metric type: Counter. |
|
k8s.kube_pod_container_status_running |
Describes whether the container is currently in running state. Metric type: Gauge. |
|
k8s.kube_pod_container_status_terminated |
Describes whether the container is currently in terminated state. Metric type: Gauge. |
|
k8s.kube_pod_container_status_terminated_reason |
Describes the reason the container is currently in terminated state. Metric type: Gauge. |
|
k8s.kube_pod_container_status_waiting |
Describes whether the container is currently in waiting state. Metric type: Gauge. |
|
k8s.kube_pod_container_status_waiting_reason |
Describes the reason the container is currently in waiting state. Metric type: Gauge. |
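Several container metrics above have the metric type Counter, meaning the value is cumulative and only increases until it resets. To compare such metrics over time, they are usually converted to a per-second rate between two samples. The following Python sketch illustrates that conversion and a throttling ratio built from the two CFS period counters. The function names, the sample tuple shape, and the reset handling are assumptions for illustration, not the collector's implementation.

```python
from typing import Tuple

# (unix_timestamp_seconds, cumulative_counter_value)
Sample = Tuple[float, float]


def counter_rate(prev: Sample, curr: Sample) -> float:
    """Per-second rate between two samples of a cumulative counter."""
    (t0, v0), (t1, v1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # A counter that decreased was reset (for example, by a container restart).
    # Assumption: treat the current value as the increase since the reset.
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / elapsed


def throttled_fraction(throttled_periods_delta: float, periods_delta: float) -> float:
    """Share of CFS enforcement periods in which the container was throttled.

    Inputs are increases over the same window of
    k8s.container_cpu_cfs_throttled_periods_total and
    k8s.container_cpu_cfs_periods_total.
    """
    if periods_delta == 0:
        return 0.0
    return throttled_periods_delta / periods_delta
```

For example, a k8s.container_fs_reads_total value that grows from 100 to 160 over 60 seconds corresponds to one read per second.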
Deployment metrics
Metric | Unit | Description |
---|---|---|
k8s.deployment.condition.available
|
Describes whether the deployment has an Available status condition. Metric type: Gauge. |
|
k8s.deployment.condition.progressing
|
Describes whether the deployment has a Progressing status condition. Metric type: Gauge. |
|
k8s.deployment.condition.replicafailure
|
Describes whether the deployment has a ReplicaFailure status condition. Metric type: Gauge. |
|
k8s.kube_deployment_created
|
seconds (s) |
Unix creation timestamp. Metric type: Gauge. |
k8s.kube_deployment_labels
|
Kubernetes labels converted to Prometheus labels. Metric type: Gauge. |
|
k8s.kube_deployment_spec_paused
|
Whether the deployment is paused and will not be processed by the deployment controller. Metric type: Gauge. |
|
k8s.kube_deployment_spec_replicas
|
Number of desired pods for a deployment. Metric type: Gauge. |
|
k8s.kube_deployment_status_condition
|
The current status conditions of a deployment. Metric type: Gauge. |
|
k8s.kube_deployment_status_replicas
|
The number of replicas per deployment. Metric type: Gauge. |
|
k8s.kube_deployment_status_replicas_available
|
The number of available replicas per deployment. Metric type: Gauge. |
|
k8s.kube_deployment_status_replicas_ready
|
The number of ready replicas per deployment. Metric type: Gauge. |
|
k8s.kube_deployment_status_replicas_unavailable
|
The number of unavailable replicas per deployment. Metric type: Gauge. |
|
k8s.kube_deployment_status_replicas_updated
|
The number of updated replicas per deployment. Metric type: Gauge. |
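The replica gauges above can be combined to judge whether a deployment has finished rolling out. The sketch below is one possible check, assuming the four values have already been read from k8s.kube_deployment_spec_replicas, k8s.kube_deployment_status_replicas_updated, k8s.kube_deployment_status_replicas_available, and k8s.kube_deployment_status_replicas_unavailable. The function name and the exact rule are illustrative assumptions.

```python
def rollout_complete(desired: int, updated: int, available: int, unavailable: int) -> bool:
    """True when every desired replica is both updated and available."""
    return updated == desired and available == desired and unavailable == 0
```

A deployment with 3 desired replicas, 3 updated, 2 available, and 1 unavailable would be reported as still rolling out.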
StatefulSet metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_statefulset_created
|
seconds (s) |
Unix creation timestamp. Metric type: Gauge. |
k8s.kube_statefulset_labels
|
Kubernetes labels converted to Prometheus labels. Metric type: Gauge. |
|
k8s.kube_statefulset_replicas
|
Number of desired pods for a StatefulSet. Metric type: Gauge. |
|
k8s.kube_statefulset_status_replicas_current
|
The number of current replicas per StatefulSet. Metric type: Gauge. |
|
k8s.kube_statefulset_status_replicas_ready
|
The number of ready replicas per StatefulSet. Metric type: Gauge. |
|
k8s.kube_statefulset_status_replicas_updated
|
The number of updated replicas per StatefulSet. Metric type: Gauge. |
DaemonSet metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_daemonset_created
|
seconds (s) |
Unix creation timestamp. Metric type: Gauge. |
k8s.kube_daemonset_labels
|
Kubernetes labels converted to Prometheus labels. Metric type: Gauge. |
|
k8s.kube_daemonset_status_current_number_scheduled
|
The number of nodes that should be running a daemon pod and have at least one daemon pod running. Metric type: Gauge. |
|
k8s.kube_daemonset_status_desired_number_scheduled
|
The number of nodes that should be running the daemon pod. Metric type: Gauge. |
|
k8s.kube_daemonset_status_number_available
|
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available. Metric type: Gauge. |
|
k8s.kube_daemonset_status_number_misscheduled
|
The number of nodes that should not be running a daemon pod and have one or more running anyway. Metric type: Gauge. |
|
k8s.kube_daemonset_status_number_ready
|
The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready. Metric type: Gauge. |
|
k8s.kube_daemonset_status_number_unavailable
|
The number of nodes that should be running the daemon pod and have none of the daemon pod running and available. Metric type: Gauge. |
|
k8s.kube_daemonset_status_updated_number_scheduled
|
The total number of nodes that are running an updated daemon pod. Metric type: Gauge. |
ReplicaSet metrics
Metric | Unit | Description |
---|---|---|
k8s.kube.replicaset.owner.deployment
|
Information about the Deployment owning the ReplicaSet. Metric type: Gauge. |
|
k8s.kube_replicaset_created
|
seconds (s) |
Unix creation timestamp. Metric type: Gauge. |
k8s.kube_replicaset_owner
|
Information about the ReplicaSet's owner. Metric type: Gauge. |
Namespace metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_namespace_created
|
seconds (s) |
Unix creation timestamp. Metric type: Gauge. |
k8s.kube_namespace_status_phase
|
Kubernetes namespace status phase. Metric type: Gauge. |
|
k8s.kube_resourcequota |
ResourceQuota metric. Metric type: Gauge. |
Other metrics
Metric | Unit | Description |
---|---|---|
k8s.apiserver.request.successrate
|
Percent (%) |
Success rate of Kubernetes API server calls. Metric type: Gauge. |
Network metrics
Metrics for network device entities are sent by an installed Network Collector. See Network monitoring.
Standard metrics
Network device metrics
Interface metrics
Metric | Units | Description |
---|---|---|
sw.collector.InterfaceAvailability.Availability
|
Percent (%) |
Availability. Availability of the interface instance or instances. Displayed as a percentage. May be displayed as:
|
sw.collector.InterfaceTraffic.InPercentUtil
|
Percent (%) |
In Percent Utilization. Average utilization of an interface instance or instances. Displayed as a percentage. May be displayed as:
|
sw.collector.InterfaceTraffic.OutPercentUtil
|
Percent (%) |
Out Percent Utilization. Average utilization of an interface instance or instances. Displayed as a percentage. May be displayed as:
|
sw.collector.InterfaceTraffic.InAveragebps
|
Percent (%) |
In Bits Per Second Average. Average utilization of an interface instance or instances. Displayed as a percentage. |
sw.collector.InterfaceTraffic.OutAveragebps
|
Percent (%) |
Out Bits Per Second Average. Average utilization of an interface instance or instances. Displayed as a percentage. |
sw.collector.InterfaceErrors.InDiscards
|
Percent (%) |
In Discards. Average utilization of an interface instance or instances. Displayed as a percentage. May be displayed as:
|
sw.collector.InterfaceErrors.OutDiscards
|
Percent (%) |
Out Discards. Average utilization of an interface instance or instances. Displayed as a percentage. May be displayed as:
|
sw.collector.InterfaceErrors.InErrors
|
Percent (%) |
In Errors. Average utilization of an interface instance or instances. Displayed as a percentage. May be displayed as:
|
sw.collector.InterfaceErrors.OutErrors
|
Percent (%) |
Out Errors. Average utilization of an interface instance or instances. Displayed as a percentage. May be displayed as:
|
Volume metrics
Metric | Units | Description |
---|---|---|
sw.collector.VolumeUsageHistory.PercentDiskUsed
|
Percent (%) |
Percent Disk Used. Indicates the overall disk usage as a percentage. |
sw.collector.VolumeUsageHistory.AvgDiskUsed
|
Gigabytes |
Average Disk Used. Indicates the average disk usage in Gigabytes. |
sw.collector.VolumeUsageHistory.DiskSize
|
Gigabytes |
Volume Size. Indicates the disk size in Gigabytes. |
sw.collector.VolumePerformanceHistory.AvgDiskReads
|
Percent (%) |
Disk Read Average. Indicates the average read speed of the volume. Only for volumes monitored via WMI. |
sw.collector.VolumePerformanceHistory.AvgDiskWrites
|
Percent (%) |
Disk Write Average. Indicates the average write speed. Only for volumes monitored via WMI. |
Sensor metrics
Flow metrics
Metric | Units | Description |
---|---|---|
sw.collector.Netflow.Flows.Bytes
|
GB |
Top Protocols, Top Countries, Top Endpoints, Top Conversations, Top Applications, Top Advanced Applications. Endpoints producing the most traffic on your network, most bandwidth-consuming conversations, protocols used for most traffic, countries hosting endpoints that transmit the most data, or applications responsible for most monitored traffic. |
Wireless Controller and Thin Access Point metrics
Metric | Units | Description |
---|---|---|
sw.collector.Wireless.Interfaces
|
N/A | MAC, SSIDs, Channels and Radio Type details are gathered from wireless interfaces of that AP. |
sw.collector.Wireless.Clients
|
Number | The sum of clients connected to all interfaces of the AP. |
sw.collector.Wireless.HistoricalClients.SignalStrength
|
RSSI (signal strength). The following thresholds are used to convert the dBm value to a strength indicator: -82, -72, -68, -63, -56 (-82 is the worst). |
|
sw.collector.Wireless.HistoricalClients.OutDataRate
|
Data rate on clients |
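The thresholds listed for sw.collector.Wireless.HistoricalClients.SignalStrength can be read as bucket boundaries. The sketch below is one possible interpretation, not the product's implementation: it assumes that every threshold a reading meets or exceeds adds one level, giving an indicator from 0 (weaker than -82 dBm) to 5 (-56 dBm or stronger). The function name and the 0 to 5 scale are hypothetical.

```python
from bisect import bisect_right

# Thresholds from the table above, in dBm, ordered from worst to best.
RSSI_THRESHOLDS_DBM = [-82, -72, -68, -63, -56]


def signal_level(rssi_dbm: float) -> int:
    """Return how many thresholds the reading meets or exceeds (0 to 5)."""
    return bisect_right(RSSI_THRESHOLDS_DBM, rssi_dbm)
```

For example, a reading of -70 dBm meets the -82 and -72 thresholds only, so it maps to level 2 under this assumption.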
Special metrics
Metric | Units | Description |
---|---|---|
|
Percent (%) | Total average bps (transmitted + received). |
OTel metrics
When an OTel receiver is configured to send telemetry data directly to SolarWinds Observability SaaS, the metrics collected depend on what OTel data is sent. See OTel direct ingestion.
When you integrate with Apache, Elasticsearch, NGINX, Redis, or ZooKeeper, the SolarWinds Observability Agent is used to send metrics and log data to SolarWinds Observability SaaS. See Monitor with OTel.
Apache metrics
Metric | Units | Description |
---|---|---|
apache.cpu.load
|
Percent (%) |
The current load of the CPU. |
apache.cpu.time
|
Jiff | The jiffs used by processes of a given category. |
apache.current_connections
|
Connections | The number of active connections currently attached to the HTTP server. |
apache.load.1
|
Percent (%) | The average server load during the last minute. |
apache.load.15
|
Percent (%) | The average server load during the last 15 minutes. |
apache.load.5
|
Percent (%) | The average server load during the last 5 minutes. |
apache.request.time
|
milliseconds (ms) | Total time spent on handling requests. |
apache.request.time.rate
|
milliseconds (ms) | Total time spent on handling requests. |
apache.requests
|
Requests | The number of requests serviced by the HTTP server per second. |
apache.requests.rate
|
Requests per second | The rate at which requests are serviced by the HTTP server. |
apache.scoreboard
|
Workers | The number of workers in each state. |
apache.throughput
|
Byte per request | The average number of bytes served per request. |
apache.time.perrequest
|
milliseconds per request | The average processing time per request. |
apache.traffic
|
Byte | Total HTTP server traffic in bytes. |
apache.traffic.rate
|
Byte per second | HTTP server traffic in bytes per second. |
apache.uptime
|
seconds (s) | The amount of time that the server has been running in seconds. |
apache.workers
|
Workers | The number of workers currently attached to the HTTP server. |
apache.workers.idle
|
Workers | The number of idle workers. |
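The per-request averages in the table above (apache.throughput and apache.time.perrequest) can be thought of as a total divided by a request count. The sketch below assumes that relationship for illustration only; the documentation does not state how the values are actually derived, and the helper name is hypothetical.

```python
def average_per_request(total: float, request_count: int) -> float:
    """Return total / request_count, or 0.0 when no requests were served."""
    if request_count <= 0:
        return 0.0
    return total / request_count
```

Passing total traffic in bytes yields bytes per request; passing total request time in milliseconds yields milliseconds per request.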
Confluent Cloud metrics
Metric | Units | Description |
---|---|---|
confluent_kafka_server_active_connection_count
|
{connections} | The count of active authenticated connections. |
|
{partitions} | The number of partitions. |
confluent_kafka_server_received_bytes
|
By (bytes)/60s | The delta count of bytes of the customer's data received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds. |
confluent_kafka_server_received_records
|
{records}/60s | The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds. |
confluent_kafka_server_request_bytes
|
Bytes/60s | The delta count of total request bytes from the specified request types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_request_count
|
{requests}/60s | The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_response_bytes
|
Bytes/60s | The delta count of total response bytes from the specified response types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_retained_bytes
|
Bytes/60s | The current count of bytes retained by the cluster. The count is sampled every 60 seconds. |
confluent_kafka_server_sent_bytes
|
Bytes/60s | The delta count of bytes sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_sent_records
|
{records}/60s | The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_successful_authentication_count
|
{successful authentications}/60s | The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds. |
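Because most Confluent Cloud metrics above are delta counts sampled every 60 seconds, it is often convenient to convert them to per-second rates or to sum them over a longer window. The following is a minimal sketch of that arithmetic; the function names are hypothetical and not part of any SolarWinds or Confluent API.

```python
SAMPLE_INTERVAL_SECONDS = 60  # sampling interval stated in the table above


def per_second_rate(delta_count: float,
                    interval_s: float = SAMPLE_INTERVAL_SECONDS) -> float:
    """Convert one delta sample into an average per-second rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return delta_count / interval_s


def total_over_window(deltas: list[float]) -> float:
    """Sum consecutive delta samples to get the total for the window."""
    return sum(deltas)
```

For instance, a confluent_kafka_server_received_bytes sample of 6000 corresponds to an average of 100 bytes per second over that minute.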
Docker metrics
Metric | Units | Description |
---|---|---|
container.blockio.io_service_bytes_recursive
|
bytes (By) | The number of bytes transferred to/from the disk by the group and descendant groups. |
container.cpu.throttling_data.periods
|
{periods} | The number of periods with throttling active. |
container.cpu.usage.kernelmode
|
nanosecond (ns) | Time spent by tasks of the cgroup in kernel mode (Linux). Time spent by all container processes in kernel mode (Windows). |
container.cpu.usage.total
|
nanosecond (ns) | Total CPU time consumed. |
container.cpu.usage.usermode
|
nanosecond (ns) | Time spent by tasks of the cgroup in user mode (Linux). Time spent by all container processes in user mode (Windows). |
container.cpu.utilization
|
percentage (%) |
Container CPU Utilization. Percentage of CPU used per container. |
container.memory.file
|
bytes (By) | Amount of memory used to cache filesystem data, including tmpfs and shared memory (Only available with cgroups v2). |
container.memory.percent
|
percentage (%) |
Container Memory Utilization. Percentage of memory used per container. |
container.memory.total_cache
|
bytes (By) | Total amount of memory used by the processes of this cgroup (and descendants) that can be associated with a block on a block device. Also accounts for memory used by tmpfs (Only available with cgroups v1). |
container.memory.usage.limit
|
bytes (By) | Memory limit of the container. |
container.memory.usage.total
|
bytes (By) | Memory usage of the container. This excludes the cache. |
container.network.io.usage.rx_bytes
|
bytes (By) |
Total Received Bytes per Container. Total bytes received by the container. |
container.network.io.usage.rx_dropped
|
{packets} |
Total Incoming Dropped Packets by Container. Total incoming packets dropped by the container. |
container.network.io.usage.tx_bytes
|
bytes (By) |
Total Sent Bytes per Container. Total bytes sent by the container. |
container.network.io.usage.tx_dropped
|
{packets} |
Total Outgoing Dropped Packets by Container. Total outgoing packets dropped by the container. |
container.uptime
|
seconds (s) |
Total Container Uptime. The time elapsed since the start time of the container. |
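The cumulative and percentage Docker metrics above are related by simple arithmetic. The sketch below is illustrative only: it assumes that a memory percentage is usage divided by limit, and that a CPU percentage can be estimated from two samples of the cumulative container.cpu.usage.total counter (nanoseconds) taken a known number of seconds apart. Neither formula is confirmed by this document as the way container.memory.percent or container.cpu.utilization is computed, and both function names are hypothetical.

```python
def memory_percent(usage_bytes: int, limit_bytes: int) -> float:
    """Memory usage as a percentage of the container's memory limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return 100.0 * usage_bytes / limit_bytes


def cpu_percent(prev_total_ns: int, curr_total_ns: int,
                elapsed_s: float) -> float:
    """CPU time consumed between two samples, as a percentage of one CPU."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return 100.0 * (curr_total_ns - prev_total_ns) / (elapsed_s * 1e9)
```

Under these assumptions, a container that used 0.5 seconds of CPU time during a 1 second interval reports 50 percent of one CPU.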
Elasticsearch metrics
Metric | Units | Description |
---|---|---|
elasticsearch.breaker.memory.estimated
|
bytes (By) |
The estimated memory used for the operation. |
elasticsearch.breaker.memory.limit
|
bytes (By) | The memory limit for the circuit breaker. |
elasticsearch.breaker.tripped
|
1 | The total number of times the circuit breaker has been triggered and prevented an out of memory error. |
elasticsearch.cluster.data_nodes
|
{nodes} | Data Nodes. The number of data nodes in the cluster. |
elasticsearch.cluster.health
|
status | Cluster by Status. The health status of the cluster. Health status is based on the state of its primary and replica shards. Green indicates all shards are assigned. Yellow indicates that one or more replica shards are unassigned. Red indicates that one or more primary shards are unassigned, making some data unavailable. |
elasticsearch.cluster.in_flight_fetch
|
{fetches} | The number of unfinished fetches. |
elasticsearch.cluster.nodes
|
{nodes} | Nodes, Top 5 Clusters by Node Count. The total number of nodes in the cluster. |
elasticsearch.cluster.pending_tasks
|
{tasks} | Pending Tasks in Cluster. The number of cluster-level changes that have not yet been executed. |
elasticsearch.cluster.published_states.differences
|
1 | The number of differences between published cluster states. |
elasticsearch.cluster.published_states.full
|
1 | The number of published cluster states. |
elasticsearch.cluster.shards
|
{shards} | Active Shards, Shards by State. The number of shards in the cluster. |
elasticsearch.cluster.state_queue
|
1 | The number of cluster states in queue. |
elasticsearch.cluster.state_update.count
|
1 | The number of cluster state update attempts that changed the cluster state since the node started. |
elasticsearch.cluster.state_update.time
|
milliseconds (ms) | The cumulative amount of time updating the cluster state since the node started. |
elasticsearch.index.operations.completed
|
{operations} | The number of operations completed for an index. |
elasticsearch.index.operations.time
|
milliseconds (ms) | Time spent on operations for an index. |
elasticsearch.index.shards.size
|
bytes (By) | The size of the shards assigned to this index. |
elasticsearch.indexing_pressure.memory.limit
|
bytes (By) | The configured memory limit, in bytes, for the indexing requests. |
elasticsearch.indexing_pressure.memory.total.primary_rejections
|
1 | The cumulative number of indexing requests rejected in the primary stage. |
elasticsearch.indexing_pressure.memory.total.replica_rejections
|
1 | The number of indexing requests rejected in the replica stage. |
elasticsearch.memory.indexing_pressure
|
bytes (By) | Indexing Pressure. The memory consumed, in bytes, by indexing requests in the specified stage. |
elasticsearch.node.cache.count
|
{count} | The total count of query cache misses across all shards assigned to selected nodes. |
elasticsearch.node.cache.evictions
|
{evictions} | The number of evictions from the cache on a node. |
elasticsearch.node.cache.memory.usage
|
bytes (By) | The size in bytes of the cache on a node. |
elasticsearch.node.cluster.connections
|
{connections} | Cluster Connections. The number of open TCP connections for internal cluster communication. |
elasticsearch.node.cluster.io
|
bytes (By) | The number of bytes sent and received on the network for internal cluster communication. |
elasticsearch.node.cluster.io.rate
|
bytes per second (By/s) | Network Traffic. The number of bytes sent and received for internal cluster communication per second. |
elasticsearch.node.disk.io.read
|
kilobytes (KiBy) | Disk Read and Write. The total number of kilobytes read across all file stores for this node. |
elasticsearch.node.disk.io.write
|
kilobytes (KiBy) | Disk Read and Write. The total number of kilobytes written across all file stores for this node. |
elasticsearch.node.documents
|
{documents} | The number of documents on the node. |
elasticsearch.node.fs.disk.available
|
bytes (By) | The amount of disk space available to the JVM across all file stores for this node. Depending on OS or process level restrictions, this might appear less than free. This is the actual amount of free disk space the Elasticsearch node can use. |
elasticsearch.node.fs.disk.free
|
bytes (By) | The amount of unallocated disk space across all file stores for this node. |
elasticsearch.node.fs.disk.total
|
bytes (By) | The amount of disk space across all file stores for this node. |
elasticsearch.node.http.connections
|
{connections} | The number of HTTP connections to the node. |
elasticsearch.node.ingest.documents
|
{documents} | The total number of documents ingested during the lifetime of this node. |
elasticsearch.node.ingest.documents.current
|
{documents} | The total number of documents currently being ingested. |
elasticsearch.node.ingest.operations.failed
|
{operation} | The total number of failed ingest operations during the lifetime of this node. |
elasticsearch.node.open_files
|
{files} | Open File Descriptors. The number of open file descriptors held by the node. |
elasticsearch.node.operations.completed
|
{operations} | The number of operations completed by a node. |
elasticsearch.node.operations.completed.rate
|
{operations} per second | Node Operations Completed per Second. The number of operations completed for an index per second. |
elasticsearch.node.operations.time
|
milliseconds (ms) | Total Time Spent on Operations. The time spent on operations by a node. |
elasticsearch.node.pipeline.ingest.documents.current
|
{documents} | The total number of documents currently being ingested by a pipeline. |
elasticsearch.node.pipeline.ingest.documents.preprocessed
|
{documents} | The number of documents preprocessed by the ingest pipeline. |
elasticsearch.node.pipeline.ingest.operations.failed
|
{operation} | The total number of failed operations for the ingest pipeline. |
elasticsearch.node.script.cache_evictions
|
1 | The total number of times the script cache has evicted old data. |
elasticsearch.node.script.compilation_limit_triggered
|
1 | The total number of times the script compilation circuit breaker has limited inline script compilations. |
elasticsearch.node.script.compilations
|
{compilations} | The total number of inline script compilations performed by the node. |
elasticsearch.node.shards.data_set.size
|
bytes (By) | The total data set size of all shards assigned to the node. This includes the size of shards not stored fully on the node, such as the cache for partially mounted indices. |
elasticsearch.node.shards.reserved.size
|
bytes (By) | A prediction of how much larger the shard stores on this node will eventually grow due to ongoing peer recoveries, restoring snapshots, and similar activities. A value of -1 indicates that this is not available. |
elasticsearch.node.shards.size
|
bytes (By) | The size of the shards assigned to this node. |
elasticsearch.node.thread_pool.tasks.finished
|
{tasks} | The number of tasks finished by the thread pool. |
elasticsearch.node.thread_pool.tasks.queued
|
{tasks} | Queued Tasks in Thread Pool. The number of queued tasks in the thread pool. |
elasticsearch.node.thread_pool.threads
|
{threads} | The number of threads in the thread pool. |
elasticsearch.node.translog.operations
|
{operations} | The number of transaction log operations. |
elasticsearch.node.translog.size
|
bytes (By) | The size of the transaction log. |
elasticsearch.node.translog.uncommitted.size
|
bytes (By) | The size of uncommitted transaction log operations. |
elasticsearch.os.cpu.load_avg.15m
|
1 | CPU Utilization. The fifteen-minute load average on the system. The field is not present if fifteen-minute load average is not available. |
elasticsearch.os.cpu.load_avg.1m
|
1 | CPU Utilization. The one-minute load average on the system. The field is not present if one-minute load average is not available. |
elasticsearch.os.cpu.load_avg.5m
|
1 | CPU Utilization. The five-minute load average on the system. The field is not present if five-minute load average is not available. |
elasticsearch.os.cpu.usage
|
Percent (%) | The recent CPU usage for the whole system, or -1 if not supported. |
elasticsearch.os.memory
|
bytes (By) | The amount of physical memory. |
jvm.classes.loaded
|
1 | The number of loaded classes. |
jvm.gc.collections.count
|
1 | The total number of garbage collections that have occurred. |
jvm.gc.collections.count.rate
|
collections per second | JVM GC Collection Count per Second. The number of Java Virtual Machine garbage collections that have occurred per second. |
jvm.gc.collections.elapsed
|
milliseconds (ms) | Total JVM GC Collection Time. The approximate accumulated collection elapsed time. |
jvm.memory.heap.committed
|
bytes (By) | JVM Memory Heap Committed vs Used. The amount of memory that is guaranteed to be available for the heap. |
jvm.memory.heap.max
|
bytes (By) | The maximum amount of memory that can be used for the heap. |
jvm.memory.heap.used
|
bytes (By) | JVM Memory Heap Committed vs Used. The current heap memory usage. |
jvm.memory.nonheap.committed
|
bytes (By) | The amount of memory that is guaranteed to be available for non-heap purposes. |
jvm.memory.nonheap.used
|
bytes (By) | The current non-heap memory usage. |
jvm.memory.pool.max
|
bytes (By) | The maximum amount of memory that can be used for the memory pool. |
jvm.memory.pool.used
|
bytes (By) | The current memory pool memory usage. |
jvm.threads.count
|
1 | The current number of threads. |
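The status rules given for elasticsearch.cluster.health in the table above can be expressed as a small decision function. This is a sketch of the rule as described (primary shards take precedence over replica shards), not Elasticsearch's own implementation; the function and parameter names are hypothetical.

```python
def cluster_health(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map unassigned shard counts to the green/yellow/red status."""
    if unassigned_primaries > 0:
        return "red"      # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"   # data is available, but redundancy is reduced
    return "green"        # all shards are assigned
```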
IIS metrics
Metric | Units | Description |
---|---|---|
iis.connection.active
|
{active connections} | The number of active connections. |
iis.connection.anonymous
|
{anonymous connections} | The number of connections established anonymously. |
iis.connection.anonymous/rate
|
{anonymous connections}/s | The number of connections established anonymously per second. |
iis.connection.attempt.count
|
{connection attempts} | The total number of attempts to connect to the server. |
iis.connection.attempt.count/rate
|
{connection attempts}/second (s) | The total number of attempts to connect to the server per second. |
iis.network.blocked
|
bytes (By) | The total number of bytes blocked due to bandwidth throttling. |
iis.network.file.count
|
{files} | The number of transmitted files. |
iis.network.io
|
bytes (By) | The total amount of bytes sent and received. |
iis.network.io/rate
|
bytes (By)/second (s) | The total amount of bytes sent and received per second. |
iis.request.count
|
{requests} | The total number of requests of a given type. |
iis.request.queue.count
|
{requests} | The current number of requests in the queue. |
iis.request.rejected
|
{requests} | The total number of requests rejected. |
iis.thread.active
|
{threads} | The total number of active threads. |
iis.uptime
|
seconds (s) | The amount of time the server has been up. |
Kafka metrics
Metric | Units | Description |
---|---|---|
kafka_controller_kafkacontroller_activecontrollercount
|
{active controllers in cluster} | Active Cluster Controllers. The average number of active controllers in the cluster. |
|
Log Flush Rate and Time. The maximum values of log flush rate and time. | |
|
ms (millisecond) | Leader Request Time. The average time taken to process a request at the leader. |
|
ms (millisecond) | Producer Request Time. The average total time to serve a single 'Produce' request. |
kafka_network_socketserver_networkprocessoravgidlepercent
|
% (percentage) | Broker Process Idle Time. The average fraction of time the network processors are idle. |
kafka_server_brokertopicmetrics_bytesin_1minuterate
|
Bytes/second | Broker Incoming Bytes. The one-minute sum of incoming bytes per second. |
kafka_server_brokertopicmetrics_bytesin_1minuterate
|
Bytes/second/{topic} | Broker Incoming Bytes per Topic. The one-minute average rate of incoming bytes per second distributed by Topic. |
kafka_server_brokertopicmetrics_messagesin_1minuterate
|
{messages}/second | Broker Incoming Messages. The one-minute sum of incoming messages per second. |
kafka_server_brokertopicmetrics_messagesin_1minuterate
|
{messages}/second/{topic} | Broker Incoming Messages per Topic. The one-minute average rate of incoming messages per second distributed per topic. |
kafka_server_replicafetchermanager_maxlag
|
{messages} | Max Replica Lag. The average of the maximum number of messages by which the consumer lags behind the producer. |
kafka_server_replicamanager_isrshrinks_1minuterate
|
{shrink events}/minute | ISR Shrink Rate. The one-minute rate of ISR shrink events. If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again and the replicas are fully caught up, ISR will expand. |
kafka_server_replicamanager_leadercount
|
{replica leaders} | Leader Replicas. The average number of replica leaders. |
kafka_server_replicamanager_partitioncount
|
{partitions} | Partitions. The average number of partitions on all brokers. |
kafka_server_replicamanager_underreplicatedpartitions
|
{under-replicated partitions} | Under-Replicated Partitions. The average number of under-replicated partitions. |
Memcached metrics
Metric | Units | Description |
---|---|---|
memcached.bytes
|
bytes (By) | Current Bytes Stored, Bytes Stored. The current number of bytes used by this server to store items. |
memcached.commands
|
{commands} | The commands executed. |
memcached.commands.rate
|
{commands}/second | Commands. The commands executed per second. |
memcached.connections.current
|
{connections} | The current number of open connections. |
memcached.connections.total
|
{connections} | The total number of connections opened since the server started running. |
memcached.cpu.usage
|
seconds (s) | CPU User Time, CPU System Time. The accumulated user and system time. |
memcached.current_items
|
{items} | Current Items in Cache, Active Connections. The number of items currently stored in the cache. |
memcached.evictions
|
{evictions} | Total Evictions. The average total number of cache item evictions. |
memcached.network
|
bytes (By) | Bytes transferred over the network. |
memcached.network.rate
|
bytes/second (By/s) | Network Traffic. The average number of bytes transferred over the network, per second. |
memcached.operation_hit_ratio
|
percentage (%) | Operation Hit Ratio. The hit ratio for operations, expressed as a percentage value between 0.0 and 100.0. |
memcached.operations
|
{operations} | Hits and Misses Total. The average total counts of hits and misses. |
memcached.operations.rate
|
{operations}/second | The average counts of hits and misses per second. |
memcached.threads
|
{threads} | The number of threads used by the Memcached instance. |
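The memcached.operation_hit_ratio metric above is a percentage between 0.0 and 100.0. As a rough illustration of how such a ratio relates to the hit and miss counts reported by memcached.operations, the sketch below computes hits as a share of all lookups. The function name is hypothetical, and returning 0.0 when there have been no lookups is an assumption made here to avoid division by zero.

```python
def hit_ratio_percent(hits: int, misses: int) -> float:
    """Hits as a percentage of all lookups, in the range 0.0 to 100.0."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total
```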
NGINX metrics
Metric | Units | Description |
---|---|---|
nginx.connections
|
Connections |
The current number of nginx connections by state. |
nginx.connections_accepted
|
Connections | The total number of accepted client connections. |
nginx.connections_accepted.gauge
|
Connections | The accepted client connections (gauge). |
nginx.connections_accepted.rate
|
Connections per second | The number of accepted client connections per second. |
nginx.connections_current
|
Connections | The current number of nginx connections by state. |
nginx.connections_dropped
|
Connections | The total number of dropped client connections. |
nginx.connections_dropped.rate
|
Connections per second | The number of dropped client connections per second. |
nginx.connections_handled
|
Connections | The total number of handled connections. Generally, the parameter value is the same as nginx.connections_accepted unless some resource limits have been reached (for example, the worker_connections limit). |
nginx.connections_handled.gauge
|
Connections | The handled client connections (gauge). |
nginx.connections_handled.rate
|
Connections per second | The number of handled client connections per second. |
nginx.requests
|
Requests | The total number of requests made to the server since it started. |
nginx.requests.rate
|
Requests per second |
The number of requests per second. |
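As noted for nginx.connections_handled above, the handled count normally equals the accepted count unless a resource limit is reached. A minimal sketch of using that relationship to estimate dropped connections follows; it assumes dropped connections correspond to accepted minus handled, and the function name is hypothetical.

```python
def dropped_connections(accepted: int, handled: int) -> int:
    """Estimate dropped connections as accepted minus handled (never negative)."""
    return max(accepted - handled, 0)
```

A result greater than zero suggests that a limit such as worker_connections may have been reached.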
Oracle DB metrics
Metric | Units | Description |
---|---|---|
oracledb.cpu_time
|
Seconds (s) |
The cumulative CPU time, in seconds. |
oracledb.dml_locks.limit
|
{locks} | The maximum limit of active Data Manipulation Language (DML) locks, -1 if unlimited. |
oracledb.dml_locks.usage
|
{locks} | The current count of active Data Manipulation Language (DML) locks. |
oracledb.enqueue_deadlocks
|
{deadlocks} | The total number of deadlocks between table or row locks in different sessions. |
oracledb.enqueue_locks.limit
|
{locks} | The maximum limit of active enqueue locks, -1 if unlimited. |
oracledb.enqueue_locks.usage
|
{locks} | The current count of active enqueue locks. |
oracledb.enqueue_resources.limit
|
{resources} | The maximum limit of active enqueue resources, -1 if unlimited. |
oracledb.enqueue_resources.usage
|
{resources} | The current count of active enqueue resources. |
oracledb.exchange_deadlocks
|
{deadlocks} | The number of times that a process detected a potential deadlock when exchanging two buffers and raised an internal, restartable error. Index scans are the only operations that perform exchanges. |
oracledb.executions
|
{executions} | The total number of calls (user and recursive) that executed SQL statements. |
oracledb.hard_parses
|
{parses} | The number of hard parses. |
oracledb.logical_reads
|
{reads} | The number of logical reads. |
oracledb.parse_calls
|
{parses} | The total number of parse calls. |
oracledb.pga_memory
|
bytes (By) | The Session Program Global Area (PGA) memory. |
oracledb.physical_reads
|
{reads} | The number of physical reads. |
oracledb.processes.limit
|
{processes} | The maximum limit of active processes, -1 if unlimited. |
oracledb.processes.usage
|
{processes} | The current count of active processes. |
oracledb.sessions.limit
|
{sessions} | The maximum limit of active sessions, -1 if unlimited. |
oracledb.sessions.usage
|
{sessions} | The count of active sessions. |
oracledb.tablespace_size.limit
|
bytes (By) | The maximum size of tablespace in bytes, -1 if unlimited. |
oracledb.tablespace_size.usage
|
bytes (By) | The used tablespace in bytes. |
oracledb.transactions.limit
|
{transactions} | The maximum limit of active transactions, -1 if unlimited. |
oracledb.transactions.usage
|
{transactions} | The current count of active transactions. |
oracledb.user_commits
|
{commits} | The number of user commits. When a user commits a transaction, the redo generated that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate. |
oracledb.user_rollbacks
|
1 | The number of times users manually issue the ROLLBACK statement or an error occurs during a user's transactions. |
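Several Oracle DB metrics above come in usage and limit pairs, where a limit of -1 means unlimited. When comparing the two, for example to drive an alert threshold, the -1 sentinel needs special handling. The sketch below shows one way to do that; the function name and the choice to return None for unlimited resources are assumptions made for illustration.

```python
from typing import Optional


def utilization_percent(usage: float, limit: float) -> Optional[float]:
    """Usage as a percentage of the limit, or None when the limit is unlimited (-1)."""
    if limit == -1:
        return None  # unlimited: a percentage is not meaningful
    if limit <= 0:
        raise ValueError("limit must be positive or -1")
    return 100.0 * usage / limit
```

For example, pairing oracledb.sessions.usage with oracledb.sessions.limit gives the share of the session limit currently in use.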
RabbitMQ metrics
Metric | Units | Description |
---|---|---|
rabbitmq.message.current.sum
|
{messages} | Current Messages in Queues, Top 10 Queues by Depth. The total number of messages currently in the queues on RabbitMQ by queue name. |
rabbitmq_channels
|
{channels} | Open Channels. The number of channels currently open on RabbitMQ. |
rabbitmq_channel_messages_unacked
|
{messages} | Messages Unacknowledged. The average number of delivered but not yet acknowledged messages on RabbitMQ. |
rabbitmq_consumers
|
{consumers} | Queue Consumers. The number of currently connected consumers on RabbitMQ. |
rabbitmq_disk_space_available_bytes
|
Bytes | Free Disk Space. The average free disk space available on RabbitMQ. |
rabbitmq_erlang_processes_used
|
{processes} | Used Processes. The total number of Erlang processes used by RabbitMQ. |
rabbitmq.message.acknowledged.rate
|
{messages}/s | Messages Acknowledged per Second. The average number of messages acknowledged per second on RabbitMQ. |
rabbitmq.message.delivered.rate
|
{messages}/s | Messages Delivered per Second. The average number of messages delivered per second on RabbitMQ. |
rabbitmq.message.dropped.rate
|
{messages}/s | Messages Dropped per Second. The average number of messages dropped per second on RabbitMQ. |
rabbitmq.message.published.rate
|
{messages}/s | Messages Published per Second. The average number of messages published per second on RabbitMQ. |
rabbitmq_process_open_fds
|
{file descriptors} | Open File Descriptors. The average number of open file descriptors on RabbitMQ. |
rabbitmq_process_open_tcp_sockets
|
{sockets} | Open Sockets. The total number of open TCP sockets on RabbitMQ. |
rabbitmq_process_resident_memory_bytes
|
Bytes | Memory Consumed by Node. The memory used by node on RabbitMQ. |
rabbitmq_queue_consumer_utilisation
|
Consumer Utilization. The average proportion of time that the queues can deliver messages to consumers on RabbitMQ. | |
rabbitmq_queue_process_memory_bytes
|
Bytes | Memory Consumed by Queues. The average memory used by the Erlang queue process on RabbitMQ. |
Redis metrics
Metric | Units | Description |
---|---|---|
redis.clients.blocked
|
Blocked Clients, Clients. The number of clients pending on a blocking call. | |
redis.clients.connected
|
Redis Version, Clients. The number of client connections (excluding connections from replicas). | |
redis.clients.max_input_buffer
|
The biggest input buffer among current client connections. | |
redis.clients.max_output_buffer
|
The longest output list among current client connections. | |
redis.commands
|
operations/s | Processed Commands per Second. The number of commands processed per second. |
redis.commands.processed
|
Total Processed Commands. The total number of commands processed by the server. | |
redis.connections.received
|
Total Connections. The total number of connections accepted by the server. | |
redis.connections.rejected
|
Total Connections. The number of connections rejected because of the maxclients limit. | |
redis.cpu.time
|
seconds (s) | Total CPU Time by State. The system CPU consumed by the Redis server in seconds since the server started. |
redis.db.avg_ttl
|
milliseconds (ms) | The average keyspace keys TTL. |
redis.db.expires
|
The number of keyspace keys with an expiration. | |
redis.db.keys
|
The number of keyspace keys. | |
redis.keys.evicted
|
Total Expired and Evicted Keys. The number of keys evicted due to the maxmemory limit. | |
redis.keys.expired
|
Total Expired and Evicted Keys. The total number of key expiration events. | |
redis.keyspace.hits
|
The number of successful lookups of keys in the main dictionary. | |
redis.keyspace.misses
|
The number of failed lookups of keys in the main dictionary. | |
redis.latest_fork
|
microseconds (μs) | The duration of the latest fork operation in microseconds. |
redis.memory.fragmentation_ratio
|
Fragmentation Ratio. The ratio between used_memory_rss and used_memory. | |
redis.memory.lua
|
bytes (By) | Used Memory. The number of bytes used by the Lua engine. |
redis.memory.peak
|
bytes (By) | Peak memory consumed by Redis (in bytes). |
redis.memory.rss
|
bytes (By) | Used Memory. The number of bytes that Redis allocated as seen by the operating system. |
redis.memory.used
|
bytes (By) | Used Memory. The total number of bytes allocated by Redis using its allocator. |
redis.net.input
|
bytes (By) | The total number of bytes read from the network. |
redis.net.output
|
bytes (By) | Total Network Traffic. The total number of bytes written to the network. |
redis.rdb.changes_since_last_save
|
Changes Since Last Save. The number of changes since the last dump. | |
redis.replication.backlog_first_byte_offset
|
The master offset of the replication backlog buffer. | |
redis.replication.offset
|
The server's current replication offset. | |
redis.role
|
Role. The Redis node's role. | |
redis.slaves.connected
|
Clients. The number of connected replicas. | |
redis.uptime
|
seconds (s) | Uptime. The number of seconds since Redis server started. |
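The redis.memory.fragmentation_ratio metric above is described as the ratio between used_memory_rss and used_memory, which correspond to redis.memory.rss and redis.memory.used in this table. A minimal sketch of that calculation follows; the function name is hypothetical.

```python
def fragmentation_ratio(rss_bytes: int, used_bytes: int) -> float:
    """Ratio of memory seen by the operating system to memory allocated by Redis."""
    if used_bytes <= 0:
        raise ValueError("used_bytes must be positive")
    return rss_bytes / used_bytes
```

A value of 1.5, for instance, means the operating system sees 50 percent more memory in use than Redis has allocated through its allocator.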
Snowflake metrics
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. A health score provides real-time insight into the overall health and performance of your monitored entities. The health score is calculated based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health score is displayed as a single numerical value that ranges from a Good (70-100) to Moderate (40-69) to Bad (0-39) distinction. To view the health score for Snowflake entities in the Metrics Explorer, filter the |
snowflake.database.bytes_scanned.avg
|
Bytes (By) |
Average bytes scanned in a database over the last 24-hour period. |
snowflake.database.query.count
|
{queries} | Query Counts. Total query count for the database over the last 24-hour period. |
snowflake.query.blocked
|
{queries} | Blocked query count for the warehouse over the last 24-hour period. |
snowflake.query.bytes_deleted.avg
|
Bytes (By) | Query Bytes. Average bytes deleted in the database over the last 24-hour period. |
snowflake.query.bytes_written.avg
|
Bytes (By) | Query Bytes. Average bytes written by the database over the last 24-hour period. |
snowflake.query.compilation_time.avg
|
Seconds (s) | Query Times. Average time taken to compile a query over the last 24-hour period. |
snowflake.query.executed
|
{queries} | Executed query count for the warehouse over the last 24-hour period. |
snowflake.query.execution_time.avg
|
Seconds (s) | Query Times. Average time spent executing queries in the database over the last 24-hour period. |
snowflake.query.queued_overload
|
{queries} | Overloaded query count for the warehouse over the last 24-hour period. |
snowflake.query.queued_provision
|
{queries} | Number of compute resources queued for provisioning over the last 24-hour period. |
snowflake.queued_overload_time.avg
|
Seconds (s) | Queued Times. Average time spent in the warehouse queue due to the warehouse being overloaded over the last 24-hour period. |
snowflake.queued_provisioning_time.avg
|
Seconds (s) | Queued Times. Average time spent in the warehouse queue waiting for resources to provision over the last 24-hour period. |
snowflake.queued_repair_time.avg
|
Seconds (s) | Queued Times. Average time spent in the warehouse queue waiting for compute resources to be repaired over the last 24-hour period. |
snowflake.storage.stage_bytes.total
|
Bytes (By) | Storage Bytes. Number of bytes of stage storage used by files in all internal stages (named, table, user). |
snowflake.storage.storage_bytes.total
|
Bytes (By) | Storage Bytes. Number of bytes of table storage used, including bytes for data currently in Time Travel. |
snowflake.total_elapsed_time.avg
|
Seconds (s) | Total Elapsed Time. Average elapsed time over the last 24-hour period. |
Optional metrics
Metric | Units | Description |
---|---|---|
snowflake.billing.cloud_service.total
|
{credits} |
Reported total credits used in the cloud service over the last 24-hour period. |
snowflake.billing.total_credit.total
|
{credits} | Used Credits. Reported total credits used across the account over the last 24-hour period. |
snowflake.billing.virtual_warehouse.total
|
{credits} | Reported total credits used by the virtual warehouse service over the last 24-hour period. |
snowflake.billing.warehouse.cloud_service.total
|
{credits} | Credits used across the cloud service for the given warehouse over the last 24-hour period. |
snowflake.billing.warehouse.total_credit.total
|
{credits} | Total credits used associated with the given warehouse over the last 24-hour period. |
snowflake.billing.warehouse.virtual_warehouse.total
|
{credits} | Total credits used by the virtual warehouse service for the given warehouse over the last 24-hour period. |
snowflake.logins.total
|
{logins} | Total login attempts for the account over the last 24-hour period. |
snowflake.pipe.credits_used.total
|
{credits} | Total Snowpipe credits used over the last 24-hour period. |
snowflake.query.bytes_spilled.local.avg
|
Bytes (By) | Average bytes spilled (intermediate results do not fit in memory) by the local storage over the last 24-hour period. |
snowflake.query.bytes_spilled.remote.avg
|
Bytes (By) | Average bytes spilled (intermediate results do not fit in memory) by the remote storage over the last 24-hour period. |
snowflake.query.data_scanned_cache.avg
|
Percentage (%) | Average percentage of data scanned from cache over the last 24-hour period. |
snowflake.query.partitions_scanned.avg
|
{partitions} | Average number of partitions scanned per query over the last 24-hour period. |
snowflake.rows_deleted.avg
|
{rows} | Row Operations. Number of rows deleted from a table (or tables) over the last 24-hour period. |
snowflake.rows_inserted.avg
|
{rows} | Row Operations. Number of rows inserted into a table (or tables) over the last 24-hour period. |
snowflake.rows_produced.avg
|
{rows} | Row Operations. Average number of rows produced by the statement over the last 24-hour period. |
snowflake.rows_unloaded.avg
|
{rows} | Row Operations. Average number of rows unloaded during data export over the last 24-hour period. |
snowflake.rows_updated.avg
|
{rows} | Row Operations. Average number of rows updated in a table over the last 24-hour period. |
snowflake.session_id.count
|
{session ids} | Distinct session IDs associated with the Snowflake username over the last 24-hour period. |
snowflake.storage.failsafe_bytes.total
|
Bytes (By) | Number of bytes of data in Fail-safe. |
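The health score bands listed for sw.metrics.healthscore (Good 70-100, Moderate 40-69, Bad 0-39) can be applied to a raw value as in the following minimal sketch. The function name `health_band` is hypothetical, and treating fractional values between bands (for example 69.5) as the lower band is an assumption; the documentation only lists whole-number ranges.

```python
def health_band(score: float) -> str:
    """Map a health score (0-100) to the band names used in this document.

    Assumption: values that fall between the documented whole-number
    ranges (for example 69.5) are assigned to the lower band.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"health score out of range: {score}")
    if score >= 70:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Bad"


print(health_band(85))  # Good
```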
ZooKeeper metrics
Metric | Units | Description |
---|---|---|
zookeeper.connection.active
|
Connections |
The number of active clients connected to a ZooKeeper server. |
zookeeper.data_tree.ephemeral_node.count
|
Nodes | The number of ephemeral nodes that a ZooKeeper server has in its data tree. |
zookeeper.data_tree.size
|
Byte | The size of data in bytes that a ZooKeeper server has in its data tree. |
zookeeper.file_descriptor.available
|
File_descriptors | The number of file descriptors that a ZooKeeper server still has available. |
zookeeper.file_descriptor.limit
|
File_descriptors | The maximum number of file descriptors that a ZooKeeper server can open. |
zookeeper.file_descriptor.open
|
File_descriptors | The number of file descriptors that a ZooKeeper server has open. |
zookeeper.latency.max
|
milliseconds (ms) | The maximum time in milliseconds for requests to be processed. |
zookeeper.latency.min
|
milliseconds (ms) | The minimum time in milliseconds for requests to be processed. |
zookeeper.packet.count
|
Packets | The number of ZooKeeper packets received or sent by a server. |
zookeeper.packet.count.rate
|
Packets per second | The rate at which ZooKeeper packets are received or sent by a server. |
zookeeper.request.active
|
Requests | The number of currently executing requests. |
zookeeper.watch.count
|
Watches | The number of watches placed on Z-Nodes on a ZooKeeper server. |
zookeeper.znode.count
|
Znodes | The number of Z-Nodes that a ZooKeeper server has in its data tree. |
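The file descriptor metrics above are often easier to alert on as a single percentage. The following is a minimal sketch that derives utilization from values of zookeeper.file_descriptor.open and zookeeper.file_descriptor.limit. The function name and the sample numbers are hypothetical.

```python
def fd_utilization(open_fds: int, limit: int) -> float:
    """Return the percentage of the file descriptor limit currently in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * open_fds / limit


# Hypothetical sample: 512 descriptors open out of a limit of 4096.
print(fd_utilization(512, 4096))  # 12.5
```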
Telegraf metrics
When you integrate with FluentD, HAProxy, NGINX Plus API, NTPq, StatsD, or Varnish, the SolarWinds Observability Agent is used to send metrics to SolarWinds Observability SaaS. See Monitor with Telegraf.
FluentD metrics
For a comprehensive list of metrics, see Fluentd Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
fluentd_buffer_available_buffer_space_ratios | Percent (%) | Available Buffer Space. The percentage of remaining available buffer space. |
fluentd_buffer_queue_byte_size | Bytes (B) | Buffer Queue Bytes. The current size of queued buffer chunks (in bytes). |
fluentd_buffer_queue_length | | Buffer Queue Length. The length of the buffer queue. |
fluentd_buffer_stage_byte_size | Bytes (B) | Buffer Stage Bytes. The current size of staged buffer chunks (in bytes). |
fluentd_buffer_stage_length | | Buffer Stage Length. The length of staged buffer chunks. |
fluentd_buffer_total_queued_size | Bytes (B) | Buffer Queue Size. The size of the buffer queue. |
fluentd_emit_count | {emits} | Total Record Emit Count. The total number of emit calls. |
fluentd_emit_records | {records} | Total Emit Records. The total number of emitted records. |
fluentd_emit_size | Bytes (B) | Total Emit Size. The total size of emit events. |
fluentd_retry_count | {retries} | Retry Count. The number of retry attempts. |
fluentd_rollback_count | {count} | Total Rollback Count. The total number of rollbacks. Rollbacks happen when write/try_write fails. |
fluentd_slow_flush_count | {count} | Total Slow Flush Count. The total number of slow flushes. This count will be incremented when buffer flush is longer than slow_flush_log_threshold. |
fluentd_write_count | {count} | The total number of writes. |
HAProxy metrics
For a comprehensive list of metrics, see HAProxy Input Plugin at GitHub and HAProxy documentation at docs.haproxy.org.
SolarWinds Observability SaaS expects that metrics return a number. Some HAProxy metrics, such as status, return strings, and thus are not supported.
Metric | Units | Description |
---|---|---|
haproxy_active_servers | {servers} | Active Servers. The number of currently active servers. |
haproxy_backup_servers | {servers} | Backup Servers. The number of available backup servers. |
haproxy_bin | bytes | Total In and Out Traffic. The cumulative total of incoming traffic. |
haproxy_bout | bytes | Total In and Out Traffic. The cumulative total of outgoing traffic. |
haproxy_dreq | {requests} | Total Denied Requests. The cumulative number of requests denied because of security concerns. |
haproxy_dcon | {requests} | Total Denied Requests. The cumulative number of requests denied by the 'tcp-request connection' rules. |
haproxy_dses | {requests} | Total Denied Requests. The cumulative number of requests denied by the 'tcp-request session' rules. |
haproxy_dresp | {responses} | Total Denied Responses. The cumulative number of responses denied because of security concerns. For HTTP, the responses are denied because of a matched http-request rule, or 'option checkcache'. |
haproxy_eresp | {responses} | Total Denied Responses. The cumulative number of response errors, such as srv_abrt, or write errors on the client socket, or failure applying filters to the response. |
haproxy_ereq | {errors} | Total Request Errors. The cumulative number of request errors, such as early termination from the client, read error, client timeout, or client closed connection. |
haproxy_econ | {errors} | Total Request Errors. The cumulative number of request errors encountered when trying to connect to a backend server. The backend stat is the sum of the stat for all servers of that backend, plus any connection errors not associated with a particular server (such as the backend having no active servers). |
haproxy_scur | {sessions} | Current Sessions. The number of current sessions per proxy. |
haproxy_slim | {sessions} | Session Limit. The currently configured session limit. |
haproxy_stot | {sessions} | Total Sessions. The cumulative number of sessions. |
haproxy_req_rate | requests per second | Request Rate. HTTP requests per second over the last elapsed second. |
haproxy_rtime | Milliseconds (ms) | Response Time. The average response time over the last 1024 requests (0 for TCP). |
haproxy_req_tot | {requests} | Total Requests. The total number of received HTTP requests. |
haproxy_ctime | Milliseconds (ms) | Connection Time. The average connect time over the last 1024 responses. |
haproxy_qtime | Milliseconds (ms) | Queue Time. The average queue time over the last 1024 responses. |
haproxy_ttime | Milliseconds (ms) | Session Time. The average session time over the last 1024 responses. |
haproxy_http_response.2xx | {responses} | Total Responses 2xx. The total number of HTTP responses with the 2xx code. |
haproxy_http_response.3xx | {responses} | Total Responses 3xx. The total number of HTTP responses with the 3xx code. |
haproxy_http_response.4xx | {responses} | Total Responses 4xx. The total number of HTTP responses with the 4xx code. |
haproxy_http_response.5xx | {responses} | Total Responses 5xx. The total number of HTTP responses with the 5xx code. |
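Because the haproxy_http_response.* metrics are cumulative totals, a ratio is only meaningful between two samples. The following is a minimal sketch that computes the share of 5xx responses in an interval from two such samples. The function name and sample values are hypothetical, and only the four response classes listed in the table are counted.

```python
from typing import Mapping, Optional

RESPONSE_KEYS = (
    "haproxy_http_response.2xx",
    "haproxy_http_response.3xx",
    "haproxy_http_response.4xx",
    "haproxy_http_response.5xx",
)


def error_ratio_5xx(previous: Mapping[str, int],
                    current: Mapping[str, int]) -> Optional[float]:
    """Return the fraction of 5xx responses between two cumulative samples.

    Returns None when a counter went backwards (for example after a
    restart) or when no responses were recorded in the interval.
    """
    deltas = {key: current[key] - previous[key] for key in RESPONSE_KEYS}
    if any(delta < 0 for delta in deltas.values()):
        return None  # counter reset: the interval cannot be compared
    total = sum(deltas.values())
    if total == 0:
        return None
    return deltas["haproxy_http_response.5xx"] / total
```

For example, if the 2xx, 4xx, and 5xx counters grow by 90, 5, and 5 between two samples, the function returns 0.05.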
NGINX Plus API metrics
For a more comprehensive list of metrics, see Nginx Virtual Host Traffic (VTS) Input Plugin and Nginx Plus API Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
nginx_vts_connections | {connections} | The number of connections of individual types: active, reading, writing, waiting, accepted, handled, requests. |
nginx_vts_server, nginx_vts_filter | | |
nginx_vts_upstream | | |
nginx_vts_cache | | |
NTPq metrics
For a comprehensive list of metrics, see NTPQ Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
ntpq_delay | Milliseconds (ms) | Round Trip Delay. Round trip communication delay to the remote peer or server. |
ntpq_jitter | Milliseconds (ms) | Jitter. Mean deviation (jitter) in the time reported for the remote peer or server (RMS of difference of multiple time samples). |
ntpq_offset | Milliseconds (ms) | Time Offsets. Mean offset (phase) in the times reported between this local host and the remote peer or server (RMS). |
ntpq_poll | Minutes (min) | Polling Frequency. RFC 5905 suggests that in NTPv4 this ranges from 4 (16 s) to 17 (36 h) (log2 seconds); however, observation suggests the actual displayed value is in seconds, for a much smaller range of 64 (2^6) to 1024 (2^10) seconds. |
ntpq_reach | Octal numbers | Reach. An 8-bit left-shift register value recording polls (bit set = successful, bit reset = fail), displayed in octal by default. The type can be changed to decimal/count/ratio by configuring it in the ntpq input section inside telegraf.conf. |
ntpq_when | Minutes (min) | Last Poll. The time since the last poll. |
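The octal ntpq_reach value and the log2 form of the polling interval can be hard to read directly. The following is a minimal sketch of how they can be decoded, assuming reach is reported in its default octal form and that the newest poll occupies the least significant bit of the register. All function names are hypothetical.

```python
from typing import List, Union


def decode_reach(reach: Union[str, int]) -> List[bool]:
    """Decode an octal reach value into 8 poll results, newest poll first.

    Assumption: the value uses the default octal display, so 377 means
    all of the last eight polls succeeded.
    """
    register = int(str(reach), 8) & 0xFF
    return [bool((register >> bit) & 1) for bit in range(8)]


def reach_success_ratio(reach: Union[str, int]) -> float:
    """Return the fraction of the last eight polls that succeeded."""
    return sum(decode_reach(reach)) / 8


def poll_interval_seconds(exponent: int) -> int:
    """Convert a log2 polling exponent (as in RFC 5905) to seconds."""
    return 2 ** exponent


print(reach_success_ratio("376"))  # 0.875 (only the newest poll failed)
print(poll_interval_seconds(4))    # 16
```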
StatsD metrics
The StatsD integration does not include any default metrics. It supports all native StatsD metric types for custom metric submission. See StatsD Input Plugin at GitHub.
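Because every StatsD metric is custom, it may help to see what a submission looks like. The following is a minimal sketch that formats a metric in the StatsD line protocol (`name:value|type`) and sends it over UDP. The metric name, host, and port 8125 (the conventional StatsD port) are assumptions; use the address your StatsD input is configured to listen on.

```python
import socket

# Core StatsD metric types: counter, gauge, timer (milliseconds), set.
VALID_TYPES = {"c", "g", "ms", "s"}


def format_statsd(name: str, value: float, metric_type: str) -> str:
    """Format one metric in the StatsD line protocol: name:value|type."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one formatted metric line over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric name, for illustration only.
print(format_statsd("checkout.orders", 1, "c"))  # checkout.orders:1|c
```

UDP delivery is not acknowledged, so `send_statsd` does not report whether a listener received the metric.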
Varnish metrics
For a comprehensive list of metrics, see Varnish Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
varnish_client_req | {requests} | Total Client Requests. The number of good client requests. |
varnish_s_req_bodybytes | bytes | Total Bytes. The total request body bytes received. |
varnish_s_req_hdrbytes | bytes | Total Bytes. The total request header bytes received. |
varnish_s_resp_bodybytes | bytes | Total Bytes. The total response body bytes transmitted. |
varnish_s_resp_hdrbytes | bytes | Total Bytes. The total response header bytes transmitted. |
varnish_sess_dropped | {sessions} | Total Failed and Dropped Sessions. The number of sessions dropped for thread. The number of times an HTTP/1 session was dropped because the queue was already too long. See thread_queue_limit. |
varnish_sess_fail | {sessions} | Total Failed and Dropped Sessions. Session accept failures. The number of failures to accept a TCP connection. This counter is the sum of the sess_fail_* counters, which give more detailed information. |
varnish_sess_closed | {operations} | Total Session Operations. The number of closed sessions. |
varnish_sess_herd | {operations} | Total Session Operations. The number of times the timeout_linger triggered. |
varnish_sess_readahead | {operations} | Total Session Operations. The number of read ahead sessions. |
varnish_sess_closed_err | {operations} | Total Session Operations. The number of sessions closed with errors. |
varnish_s_sess | {sessions} | Total Sessions. The total number of sessions that occurred. |
varnish_n_expired | {objects} | Total Number of Objects. The number of objects expired because of old age. |
varnish_n_lru_moved | {objects} | Total Number of Objects. The number of moved LRU objects (move operations done on the LRU list). |
varnish_n_lru_nuked | {objects} | Total Number of Objects. The number of objects that have been forcefully evicted from the storage to make room for a new object (LRU nuked objects). |
varnish_cache_miss | {count} | Total Cache Hits and Misses. The number of cache misses. A cache miss indicates that the object was fetched from the backend before delivering it to the client. |
varnish_cache_hit | {count} | Total Cache Hits and Misses. The number of cache hits. A cache hit indicates that the object was delivered to a client without fetching it from a backend server. |
varnish_backend_busy | {connections} | Total Backend Connections. The number of times Varnish encountered a situation where it considered the backend to be too busy to handle additional connections. |
varnish_backend_conn | {connections} | Total Backend Connections. The number of successful backend connections. |
varnish_backend_fail | {connections} | Total Backend Connections. The number of failed backend connections. |
varnish_backend_recycle | {connections} | Total Backend Connections. The number of recycled backend connections. |
varnish_backend_retry | {connections} | Total Backend Connections. The number of retried backend connections. |
varnish_backend_reuse | {connections} | Total Backend Connections. The number of reused backend connections. |
varnish_backend_unhealthy | {connections} | Total Backend Connections. The number of unhealthy backend connections. |
varnish_fetch_length, varnish_fetch_bad, varnish_fetch_eof, varnish_fetch_failed, varnish_fetch_head, varnish_fetch_chunked, varnish_fetch_1xx, varnish_fetch_204, varnish_fetch_304, varnish_fetch_none, varnish_fetch_no_thread | {fetches} | Total HTTP Request Fetches. The number of all request fetches by type. |
varnish_shm_cont | {operations} | Total Shared Memory Operations. The number of contention operations (when multiple threads compete for access to SHM resources). |
varnish_shm_cycles | {operations} | Total Shared Memory Operations. The number of times data cycles through the shared memory. |
varnish_shm_flushes | {operations} | Total Shared Memory Operations. The number of flush operations. |
varnish_shm_records | {operations} | Total Shared Memory Operations. The number of record operations. |
varnish_shm_writes | {operations} | Total Shared Memory Operations. The number of write operations. |
varnish_thread_queue_len | {count} | Total Session Queue Length. The length of the session queue waiting for threads. |
varnish_threads | {workers} | Total Workers. The number of threads in all pools. |
varnish_sess_queued | {sessions} | Total Queued Sessions. Sessions queued for thread. The number of times a session was queued waiting for a thread. |
varnish_threads_created | {threads} | Total Worker Threads. The total number of threads created in all pools. |
varnish_threads_destroyed | {threads} | Total Worker Threads. The total number of threads destroyed in all pools. |
varnish_threads_failed | {threads} | Total Worker Threads. The number of times creating a thread failed. |
varnish_threads_limited | {threads} | Total Worker Threads. The number of times more threads were needed but the limit was reached in a thread pool. |
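The varnish_cache_hit and varnish_cache_miss counters are commonly combined into a cache hit ratio. The following is a minimal sketch. The function name and sample values are hypothetical, and the calculation counts only these two counters. Because both are cumulative, pass the difference between two samples to get the ratio for a specific interval.

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    if hits < 0 or misses < 0:
        raise ValueError("counter values must be non-negative")
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Hypothetical interval: 900 hits and 100 misses.
print(cache_hit_ratio(900, 100))  # 0.9
```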