Documentation for SolarWinds Observability SaaS

Metrics for SolarWinds Observability SaaS entities

Many of the metrics collected from SolarWinds Observability entities are displayed as widgets in SolarWinds Observability explorers; additional metrics may be collected and made available in the Metrics Explorer. You can also create an alert that triggers when an entity's metric value moves out of a specific range. See Entities in SolarWinds Observability SaaS for information about entity types in SolarWinds Observability SaaS.

Common metrics

The following metric is available for all entities in SolarWinds Observability SaaS.

Metric Units Description
sw.metrics.healthscore Percent (%)

Health state. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of four states, each shown in its own color: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health state separately for each specific entity type in the Metrics Explorer, group the sw.metrics.healthscore metric by entity_types.
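The following Python sketch illustrates what grouping by entity_types does: data points are split by the value of that attribute, and each group is aggregated separately. The sample points and the choice of Average as the aggregation are illustrative assumptions; in practice, the Metrics Explorer performs this grouping for you.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data points: (entity_types attribute value, health score in percent).
points = [
    ("Service", 90.0),
    ("Service", 70.0),
    ("Website", 100.0),
]


def group_average(samples):
    """Split samples by entity type and average each group separately."""
    groups = defaultdict(list)
    for entity_type, score in samples:
        groups[entity_type].append(score)
    return {entity_type: mean(scores) for entity_type, scores in groups.items()}


print(group_average(points))  # {'Service': 80.0, 'Website': 100.0}
```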

APM/service metrics

Metrics for service entities are sent by APM libraries installed and configured to monitor your service. See Application performance monitoring (APM) for more information.

Standard metrics

The tables below list the default set of metrics collected by the APM library for all requests. Counts are reported every minute.

Primary service metrics

Metric Units Description
trace.service.errors Count Count of requests that ended with an error status. Aggregate by Sum to see the total error count for the service.
trace.service.error_ratio % Ratio of errors to requests, calculated by dividing the number of requests with errors by the total number of requests.
trace.service.requests Count Count of requests for each HTTP status code (200, 404, etc.). Aggregate by Sum to see the total request count for the service.
trace.service.request_rate Count Rate of requests per second, calculated by dividing the number of requests (trace.service.requests) by the length of the aggregation period in seconds.
trace.service.response_time Milliseconds (ms)

Duration of each entry span for the service, typically meaning the time taken to process an inbound request. This metric is based on the following attributes:

  • sw.transaction
  • service.name
  • http.response.status_code
  • http.request.method
  • sw.is_error

This is the primary metric to track service response time.
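As a rough sketch of the two derived metrics in the table above, the following Python functions compute trace.service.error_ratio and trace.service.request_rate from raw counts. The function names and the choice to return 0% when no requests were received are assumptions for illustration, not documented SolarWinds Observability behavior.

```python
def error_ratio(errors, requests):
    """trace.service.error_ratio: requests with errors / total requests, in percent."""
    if requests == 0:
        return 0.0  # assumption: report 0% when there were no requests
    return 100.0 * errors / requests


def request_rate(requests, period_seconds):
    """trace.service.request_rate: requests divided by the aggregation period in seconds."""
    return requests / period_seconds


# Example: 1,200 requests in a 60-second aggregation period, 30 of them with errors.
print(error_ratio(30, 1200))   # 2.5
print(request_rate(1200, 60))  # 20.0
```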

Service metrics stored with percentiles

Metric Units Description
trace.service.service_response_time ms Duration of each entry span for the service, typically meaning the time taken to process an inbound request. This metric is based on the service.name attribute.
trace.service.service_response_time.p50
trace.service.service_response_time.p95
trace.service.service_response_time.p99
trace.service.service_response_time.p999
ms Percentile values for the trace.service.service_response_time metric.
trace.service.transaction_response_time ms Duration of each entry span for the service, typically meaning the time taken to process an inbound request.
trace.service.transaction_response_time.p50
trace.service.transaction_response_time.p95
trace.service.transaction_response_time.p99
trace.service.transaction_response_time.p999
ms Percentile values for the trace.service.transaction_response_time metric.
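The percentile metrics summarize the distribution of entry span durations. How SolarWinds Observability SaaS computes them is not described here (large systems often use histograms or sketches rather than sorting raw values), so the following Python example only illustrates what p50, p95, p99, and p999 mean, using the nearest-rank method on a hypothetical list of durations.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank of the percentile sample
    return ordered[rank - 1]


# Hypothetical entry span durations: 1 ms, 2 ms, ..., 1000 ms.
durations_ms = list(range(1, 1001))
for label, p in (("p50", 50), ("p95", 95), ("p99", 99), ("p999", 99.9)):
    print(label, nearest_rank_percentile(durations_ms, p))
# p50 500, p95 950, p99 990, p999 999
```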

Service counters

Metrics representing counts of service-related entities.

Metric Units Description
trace.service.count Count Number of services that were reporting data during the selected time period.
trace.service.faas.count Count Number of AWS Lambda functions for which APM Services were reporting data during the selected time period.
trace.service.faas.instance.count Count Number of AWS Lambda instances for which APM Services were reporting data during the selected time period.
trace.service.hosts.count Count

Number of APM Hosts for which APM Services were reporting data during the selected time period.

Unique APM hosts are captured only for Azure VMs, AWS EC2 instances, and hosts monitored with UAMS.

trace.service.instance.count Count Number of service instances that were reporting data during the selected time period.
trace.service.pod.count Count Number of Kubernetes Pods for which APM Services were reporting data during the selected time period.
trace.service.samplecount Count Count of requests that went through a sampling decision. This excludes requests with a valid upstream sampling decision and trigger trace requests.
trace.service.tracecount Count Count of traces generated from requests.
trace.service.transaction.count Count

Number of transactions that were reporting data during the selected time period. This metric is based on the following attributes:

  • service.name
  • sw.transaction
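The following Python sketch shows the idea behind trace.service.transaction.count and trace.service.count: counting distinct attribute values seen during the selected period. The span records are hypothetical, and treating the same transaction name in two services as two transactions is an assumption based on the metric using both service.name and sw.transaction.

```python
# Hypothetical span records containing only the attributes the metrics are based on.
spans = [
    {"service.name": "checkout", "sw.transaction": "POST /orders"},
    {"service.name": "checkout", "sw.transaction": "POST /orders"},
    {"service.name": "checkout", "sw.transaction": "GET /cart"},
    {"service.name": "catalog", "sw.transaction": "GET /cart"},
]


def transaction_count(records):
    """Count distinct (service.name, sw.transaction) pairs seen in the period."""
    return len({(r["service.name"], r["sw.transaction"]) for r in records})


def service_count(records):
    """Count distinct services that reported data in the period."""
    return len({r["service.name"] for r in records})


print(transaction_count(spans))  # 3
print(service_count(spans))      # 2
```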

Sampled trace-derived metrics

Database metrics

Metric Units Description
trace.service.outbound_calls.database.query.response_time ms Duration of traced queries executed by the service to the database.

Cache metrics

Metric Units Description
trace.service.outbound_calls.cache.op.hits Count

The count of successful retrievals from cache get or multiget operations. This metric is collected only by the PHP library.

trace.service.outbound_calls.cache.op.requests Count Number of cache keys returned by the cache call. If the number of keys is not returned, every cache call is counted once.
trace.service.outbound_calls.cache.op.response_time ms Duration of traced cache calls executed by the service to the cache engine.
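The counting rule for trace.service.outbound_calls.cache.op.requests can be sketched in Python as follows. Each entry in the hypothetical input is the number of keys one cache call returned, or None when no key count was reported; the function name is illustrative.

```python
from typing import Iterable, Optional


def cache_requests(keys_returned: Iterable[Optional[int]]) -> int:
    """Sum cache.op.requests over cache calls.

    A call that reports its key count contributes that count; a call that
    does not report one (None) is counted once.
    """
    return sum(1 if n is None else n for n in keys_returned)


# A multiget returning 5 keys, a get returning 1 key, and a call with no key count.
print(cache_requests([5, 1, None]))  # 7
```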

Remote service metrics

Metric Units Description
trace.service.outbound_calls.remote_service.call.response_time ms Duration of spans representing remote calls executed by the service to a remote endpoint or remote instrumented service.

Exception metrics

Metric Units Description
trace.service.exceptions.count Count

Service exceptions count captured in traces.

Total number of error events for traced requests. An event is classified as an error if:

  • exception.message is set
  • An HTTP call returns a 5XX status code
  • sw.event.type is equal to error, error_log, or php_error_cb
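The three conditions that classify an event as an error can be expressed as a single predicate. This Python sketch assumes each event is a dictionary of attributes; the attribute used for the HTTP status code on an event (http.response.status_code, borrowed from the response time attributes listed earlier) and treating an empty exception.message as unset are assumptions.

```python
ERROR_EVENT_TYPES = {"error", "error_log", "php_error_cb"}


def is_error_event(event):
    """Return True if the event meets any of the three error conditions."""
    if event.get("exception.message"):  # assumption: an empty message counts as unset
        return True
    status = event.get("http.response.status_code")  # assumed attribute name
    if isinstance(status, int) and 500 <= status <= 599:
        return True
    return event.get("sw.event.type") in ERROR_EVENT_TYPES


# Hypothetical events: three of them meet an error condition.
events = [
    {"exception.message": "division by zero"},
    {"http.response.status_code": 503},
    {"http.response.status_code": 404},
    {"sw.event.type": "error_log"},
    {"sw.event.type": "info"},
]
print(sum(is_error_event(e) for e in events))  # 3
```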

Other sampled metrics

Metric Units Description
trace.service.breakdown.response_time Microseconds (μs)

Trace Response Time Breakdown, Average Trace Response Time Breakdown. The amount of time it takes to complete a service transaction, broken down by operation type (for example, application, database calls, or remote calls). The average trace response time breakdown is calculated based on collected traces, so it covers only sampled traces. It can be used to analyze the impact that different types of operations performed by the service have on the average response time.
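To illustrate how an average breakdown can be derived from sampled traces, the following Python sketch averages the time spent per operation type. The trace records and operation type names are hypothetical, and counting a missing operation type as 0 μs is an assumption that makes the averages add up to the average trace duration.

```python
from collections import defaultdict

# Hypothetical sampled traces: time spent per operation type, in microseconds.
sampled_traces = [
    {"application": 4000, "database": 2500, "remote": 1500},
    {"application": 3000, "database": 500},
]


def average_breakdown(traces):
    """Average time per operation type across sampled traces (a missing type counts as 0)."""
    if not traces:
        return {}
    totals = defaultdict(int)
    for trace in traces:
        for op_type, micros in trace.items():
            totals[op_type] += micros
    return {op_type: total / len(traces) for op_type, total in totals.items()}


print(average_breakdown(sampled_traces))
# {'application': 3500.0, 'database': 1500.0, 'remote': 750.0}
```

With this choice, the per-type averages (3500 + 1500 + 750 μs) sum to 5750 μs, the mean of the two trace durations (8000 μs and 3500 μs).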

Runtime metrics

See the links below for the metrics for each language runtime and library-specific configuration:

Database metrics

Metrics for database instance entities are sent by the SolarWinds Observability Agent monitoring your databases. See Database monitoring for more information.

Metric Units Description
dbo.host.queries.errors.tput EPS

Errors, Error Rate. The number of recorded errors for your database instances per second; that is, the total number of errors returned per second across your monitored databases. Incorrect database responses may indicate that requests are failing, even while throughput and response time appear healthy.

dbo.host.queries.latency_us milliseconds (ms)

Response Time. The amount of query latency in milliseconds per query execution across your monitored databases. May be displayed as:

  • Average Response Time. An average of the query latency per query execution for all monitored databases during the selected time period.

dbo.host.queries.p99_latency_us milliseconds (ms)

Response Time 99th percentile. The 99th percentile response time for each of the top selected queries.

dbo.host.queries.time_us Count

Load. The load on your monitored databases, as a number of requests executing simultaneously. Concurrency reveals load (or demand) in a way that is orthogonal to variations in request speed or frequency.

dbo.host.queries.tput QPS

Throughput. The number of queries or statements completed per second. This is a metric of traffic intensity and frequency, showing how many requests your servers are processing.
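Load and throughput are related through the total time spent executing queries. The following Python sketch assumes that dbo.host.queries.time_us holds the summed query execution time in microseconds for an interval, and that load is that total divided by the interval length; the function names and figures are hypothetical.

```python
def db_load(total_query_time_us, interval_seconds):
    """Average number of queries executing concurrently during the interval.

    Assumption: load = summed execution time of all queries / interval length,
    with both converted to the same unit (microseconds here).
    """
    return total_query_time_us / (interval_seconds * 1_000_000)


def throughput_qps(queries_completed, interval_seconds):
    """Queries completed per second."""
    return queries_completed / interval_seconds


# 600 queries in 60 s, each running 150 ms on average: 90 s of query time in total.
print(db_load(600 * 150_000, 60))  # 1.5
print(throughput_qps(600, 60))     # 10.0
```

As a cross-check, Little's law gives the same result: 10 queries per second multiplied by an average duration of 0.15 s is an average of 1.5 concurrently executing queries.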

Digital Experience/website metrics

Metrics for website entities are either collected by probes that synthetically test your website's availability, or sent by the RUM script added to your website. See Digital experience monitoring for more information.

Synthetic availability metrics

Metric Units Description
composite.synthetics.availability Percent (%) Overall Availability, Availability History. Indicates whether a website is available or unavailable.

Found in: Metrics Explorer, Entity Explorer (Availability tab)

composite.synthetics.status.downtime.count Count

Value representing the number of times the website entity was down in a given time range.

For example, if the entity was down the entirety of the time range, the count would be 1.

composite.synthetics.status.downtime.total Seconds (s) The total downtime for the entity during the specified time range.
synthetics.attempts Count Value representing the sum of all (successful and unsuccessful) page loads for a selected time period. Used to calculate the success and error rate.
synthetics.error_rate Percent (%)

Error Rate. Percentage of the tests that ran during the specified time period and failed. A test fails if an error prevents the website(s) from loading.

Found in: Metrics Explorer, Entity Explorer, DEM area overview

synthetics.errors Count Value representing the sum of unsuccessful page loads for a selected time period. Used to calculate the error rate.
synthetics.overall_status.duration Seconds (s) Value representing the amount of time the website had its last status (up or down).
synthetics.overall_status.duration.total Seconds (s) Value representing the total time the entity had a given status (up, down, paused, unknown, or in maintenance) for a specified time period.
synthetics.ping.packet_loss Count Value representing the number of failed ping responses.
synthetics.ping.response.time milliseconds (ms) Time required to send data from a user's device to the server.
synthetics.http.response.time milliseconds (ms)

HTTP Response Time, Average HTTP Response Time History. Time required to perform an HTTP GET command to retrieve the webpage(s) during the specified time period.

HTTP communications between SolarWinds Observability SaaS and configured entities are not encrypted.

May be displayed as:

  • Average HTTP Response Time. Average time required to perform an HTTP GET command to retrieve the webpage(s) during the specified time period.

  • Average Response Time History. The average HTTP response times during the specified time period. Use this chart, for example, to identify time periods when response time is typically higher than usual.

Found in: Metrics Explorer, Entity Explorer, Inspector Panel.

synthetics.https.certificates.days_to_certificate_expiration Days The number of days between today and the date the website's SSL/TLS certificate expires.
synthetics.https.response.time milliseconds (ms) HTTPS Response Time, Average HTTPS Response Time History. Time required to perform an HTTPS GET command to retrieve the webpage(s).

HTTPS communications are encrypted using Transport Layer Security (TLS).

May be displayed as:

  • Average HTTPS Response Time. Average time required to perform an HTTPS GET command to retrieve the webpage(s) during the specified time period.

  • Average Response Time History. The average HTTPS response times during the specified time period. Use this chart, for example, to identify time periods when response time is typically higher than usual.

Found in: Metrics Explorer, Entity Explorer, Inspector Panel.

synthetics.response.time.avg   Value representing the average response time across all monitored entities for the specified time period.
synthetics.status Boolean Status of a test result, where 0 indicates the website is unavailable and 1 indicates the website is available.
synthetics.success_rate Percent (%)

Success Rate. Percentage of the tests that ran during the specified time period and were successful. A test is successful if SolarWinds Observability SaaS is able to load the website(s), and it fails if an error prevents the website(s) from loading. An average of this metric is used to include availability in the health state.

Found in: Metrics Explorer, Entity Explorer, DEM area overview

synthetics.successes Count Value representing the sum of successful page loads for a selected time period. Used to calculate the success rate.
synthetics.tcp.response.time milliseconds (ms) Time required to see if a port is open on a specified address.
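The success and error rates above are shares of synthetics.attempts. A minimal Python sketch, assuming attempts is the sum of synthetics.successes and synthetics.errors for the same period and that both rates are expressed in percent; the function name is illustrative:

```python
def synthetic_rates(successes, errors):
    """Return (success_rate, error_rate) in percent from page-load counts."""
    attempts = successes + errors  # synthetics.attempts: successful + unsuccessful loads
    if attempts == 0:
        raise ValueError("no tests ran in the selected period")
    return 100.0 * successes / attempts, 100.0 * errors / attempts


success_rate, error_rate = synthetic_rates(successes=57, errors=3)
print(success_rate, error_rate)  # 95.0 5.0
```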

Synthetic transaction metrics

Metric Units Description
composite.synthetics.availability Percent (%) Status Changes, Status History. Indicates whether a synthetic transaction is available or unavailable.

Found in: Metrics Explorer, Entity Explorer (Overview tab)

composite.synthetics.status.downtime.count Count

Value representing the number of times the synthetic transaction entity was down in a given time range.

For example, if the entity was down the entirety of the time range, the count would be 1.

composite.synthetics.status.downtime.total Seconds (s) The total downtime for the entity during the specified time range.
synthetics.overall_status.duration.total Seconds (s) The total amount of time the entity had a given status.
synthetics.overall_status.duration Seconds (s) The amount of time in seconds the entity had a given status.
synthetics.transaction.attempts Count The number of attempted executions of your synthetic transaction for the selected time period.
synthetics.transaction.duration Seconds (s)

Historical Overview. The amount of time in seconds that it took your synthetic transaction to complete its execution.

May be displayed as:

  • Average Test Duration. Average time it took for your synthetic transaction to complete its execution.

synthetics.transaction.error_rate Percentage (%) Test Success Rate. Value representing the percentage of failed transaction attempts for the selected time period.
synthetics.transaction.errors Count Test Success Rate. Value representing the sum of failed transaction attempts for the selected time period. Used to calculate the Synthetic transaction error rate.
synthetics.transaction.result Count The number of times the synthetic transaction resulted in a success or error.
synthetics.transaction.success_rate Percentage (%) Test Success Rate. Value representing the percentage of successful transaction attempts for the selected time period.
synthetics.transaction.successes Count Test Success Rate. Value representing the sum of successful transaction attempts for the selected time period. Used to calculate the Synthetic transaction success rate.
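The downtime metrics count outages and sum their duration. The following Python sketch derives both from a hypothetical series of status samples (1 = available, 0 = unavailable, as reported by synthetics.status), assuming each sample covers a fixed check interval. SolarWinds Observability SaaS may compute these values differently, for example at the edges of the time range.

```python
def downtime(statuses, interval_seconds):
    """Return (downtime count, total downtime in seconds) from status samples.

    Each sample is 1 (available) or 0 (unavailable) and is assumed to cover
    interval_seconds. A run of consecutive 0 samples counts as one downtime.
    """
    count = 0
    total = 0
    previous = 1
    for status in statuses:
        if status == 0:
            total += interval_seconds
            if previous != 0:
                count += 1  # a new outage starts at this sample
        previous = status
    return count, total


# Two outages (2 checks and 1 check long), with checks every 300 s.
print(downtime([1, 0, 0, 1, 1, 0, 1], 300))  # (2, 900)
# Down for the entire range: counted as a single downtime.
print(downtime([0, 0, 0], 300))              # (1, 900)
```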

RUM metrics

Metric Units Description
composite.rum.session.bounce_rate Percent (%) Bounce Rate. The percentage of users who abandon the website immediately after landing on one of its pages.
rum.pageview.apdex_score  

Apdex score. A measurement of user satisfaction, using the Application Performance Index standard to specify the degree to which measured performance meets user expectations. The satisfactory load time, tolerating, and frustrated load times are defined when creating the website entity. For more information about the Apdex standard, see Defining the Application Performance Index.

A load time is considered satisfied if it is less than the satisfied load time threshold set for your website, tolerating if it is up to four times the satisfied load time threshold, and frustrated if it is longer than four times the satisfied load time threshold.

rum.pageview.client_processing seconds (s) Client Processing Time. Measurement of the time from when the browser sends the initial HTTP request until all synchronous load events have been processed, including layout and running scripts.
rum.pageview.count Count PageViews. Count of the views of your webpage(s).
rum.pageview.load_time seconds (s) Load Time. The amount of time for the website to fully load.
rum.pageview.ttfb seconds (s) Time to First Byte. The amount of time between when the browser requested a page and when it received the first byte of information from the server.
rum.web_vitals.largest_contentful_paint seconds (s)

Largest Contentful Paint. A measurement of how quickly the largest image or text content of a web page is loaded.

Largest contentful paint time is considered good if loading the largest image or text block takes less than 2.5 seconds, needs improvement if it takes up to 4.0 seconds, and poor if it takes longer than 4.0 seconds.

rum.web_vitals.interaction_to_next_paint milliseconds (ms)

Interaction to Next Paint. A measurement of how quickly the website responds to user interactions such as clicks and key presses.

Interaction to next paint time is considered good if the website responds to user interactions in 200 ms or less, needs improvement if it takes up to 500 ms, and poor if it takes longer than 500 ms.

rum.web_vitals.cumulative_layout_shift  

Cumulative Layout Shift. Measures how much a webpage shifts unexpectedly while a user is viewing the webpage. A shift may occur if content loads at different speeds or if elements are added to the website dynamically.

A cumulative layout shift value of less than 0.1 is considered good, a value up to 0.25 needs improvement, and a value greater than 0.25 is poor.

rum.web_vitals.first_input_delay seconds (s)

First Input Delay. Time from when a user first interacts with your site to the time when the browser is able to respond to the interaction. First input delay (FID) helps measure the first impression a user has of your site's responsiveness.

The FID is considered good if responding to a customer’s first interaction with the site takes less than 100 ms, needs improvement if it takes up to 300 ms, and poor if it takes longer than 300 ms.

rum.session.count   Sessions, Top 10 countries by session. The total number of sessions, or visits, to the website during the selected time period and by country. A single session includes every action that the user takes during the entirety of their visit to the website.
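The Apdex score and Web Vitals ratings described above can be sketched in Python as follows. The Apdex formula (satisfied + tolerating / 2) / total comes from the Apdex standard; the boundary handling follows the wording of this page (a load time is satisfied only when strictly below the threshold). The threshold table restates the values from this page in the units the thresholds are written in; the dictionary keys and function names are hypothetical.

```python
def apdex(load_times_s, satisfied_threshold_s):
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    if not load_times_s:
        raise ValueError("at least one page view is required")
    t = satisfied_threshold_s
    satisfied = sum(1 for x in load_times_s if x < t)
    tolerating = sum(1 for x in load_times_s if t <= x <= 4 * t)
    return (satisfied + tolerating / 2) / len(load_times_s)


# Thresholds restated from this page: (good bound, needs-improvement bound, good bound inclusive).
VITALS_THRESHOLDS = {
    "lcp_s": (2.5, 4.0, False),   # Largest Contentful Paint, seconds
    "inp_ms": (200, 500, True),   # Interaction to Next Paint, ms (200 ms is still good)
    "cls": (0.1, 0.25, False),    # Cumulative Layout Shift, unitless
    "fid_ms": (100, 300, False),  # First Input Delay, milliseconds
}


def rate_vital(name, value):
    """Classify a Web Vitals value as good, needs improvement, or poor."""
    good, needs_improvement, good_inclusive = VITALS_THRESHOLDS[name]
    if value < good or (good_inclusive and value == good):
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


# Satisfied threshold of 1 s: two satisfied, two tolerating, one frustrated page view.
print(apdex([0.5, 0.8, 2.0, 3.5, 5.0], 1.0))  # 0.6
print(rate_vital("lcp_s", 3.1))               # needs improvement
print(rate_vital("inp_ms", 200))              # good
print(rate_vital("cls", 0.3))                 # poor
```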

Infrastructure/self-managed host metrics

Metrics for self-managed host entities are sent by the SolarWinds Observability Agent monitoring your host. See Host monitoring for more information.

Metric Units Description
system.cpu.utilization Percent (%)

The percentage of CPU time spent in each CPU state.

system.cpu.utilization.aggregated Percent (%)

CPU Utilization. The average amount of CPU capacity in use, as a percentage.

system.memory.utilization.aggregated Percent (%) Memory Utilization. The average amount of memory in use, as a percentage.
system.filesystem.usage GB (Gigabytes)

The average amount of used space on each drive over time, in Gigabytes.

system.filesystem.utilization Percent (%)

Disk Utilization. The average amount of used space on each drive over time, as a percentage.

system.disk.operations.read.aggregated.rate   Disk Read Operations. The average number of read operations performed on a disk per second.
system.disk.operations.write.aggregated.rate   Disk Write Operations. The average number of write operations performed on a disk per second.
system.network.io.receive.aggregated.rate Binary Bytes Network In. The average amount of data received over the network, in bytes. System metrics report bytes per second.
system.network.io.transmit.aggregated.rate Binary Bytes Network Out. The average amount of data sent over the network, in bytes. System metrics report bytes per second.
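Rate metrics such as system.network.io.receive.aggregated.rate report bytes per second. One common way to obtain a per-second rate is from a cumulative byte counter sampled over time, as in this Python sketch; whether the agent reads cumulative counters, and how it handles counter resets, are assumptions here.

```python
def counter_rates(samples):
    """Convert (timestamp_s, cumulative_bytes) samples into bytes-per-second values."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if t1 <= t0 or b1 < b0:
            continue  # skip out-of-order samples and counter resets
        rates.append((b1 - b0) / (t1 - t0))
    return rates


# Hypothetical samples taken every 60 s from a cumulative "bytes received" counter.
samples = [(0, 1_000), (60, 61_000), (120, 181_000)]
print(counter_rates(samples))  # [1000.0, 2000.0]
```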

SolarWinds Observability Agent metrics

Metrics for agent entities are reported by the SolarWinds Observability Agent about itself. See SolarWinds Observability Agents for more information.

Metric Units Description
swo.uams.agent.status Possible values: ok, updating, update_failed, restarting, disconnected, stopping, jwt_expired

The reported operating status of the Agent.

swo.uams.agent.heartbeat  

Reported by the SolarWinds Observability Agent every minute. A missing heartbeat may indicate a problem with the network or the agent.

swo.uams.agent.cpu Percent (%) The average amount of CPU capacity in use, as a percentage.
swo.uams.agent.memory Percent (%)

The average amount of memory in use, as a percentage.

swo.uams.agent.diskUsage Percent (%) The amount of storage being used by files and data.
swo.uams.agent.networkIn  

The average amount of data received over the network, in bits.

This metric is not collected for Windows due to operating system limitations.

swo.uams.agent.networkOut

The average amount of data sent over the network, in bits.

This metric is not collected for Windows due to operating system limitations.

swo.uams.agent.errors.count   The number of errors in the Agent logs since the most recent Agent restart.
swo.uams.agent.uptime   The amount of time since the most recent SWO Agent restart.
swo.uams.plugin.cpu   The average amount of CPU used by the plugin, as a percentage.
swo.uams.plugin.memory   The average amount of memory used by the plugin, as a percentage.
swo.uams.plugin.uptime   The amount of time since the most recent plugin or SWO Agent restart.
swo.uams.plugin.status   The reported operating status of the plugin. See Possible values for plugin status.
swo.uams.plugin.healthy 0, 1 Calculated from the reported operating status of the plugin; indicates problems with the plugin.
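Because the heartbeat is expected every minute, a gap in heartbeats is a simple signal to watch. This Python sketch flags a missing heartbeat when the last one is older than a chosen number of intervals; the tolerance of three intervals is a hypothetical value, not a SolarWinds default.

```python
from datetime import datetime, timedelta, timezone

# The Agent sends a heartbeat every minute.
HEARTBEAT_INTERVAL = timedelta(minutes=1)


def heartbeat_missing(last_heartbeat, now, missed_intervals_allowed=3):
    """Return True if the last heartbeat is older than the allowed number of intervals."""
    return now - last_heartbeat > HEARTBEAT_INTERVAL * missed_intervals_allowed


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(heartbeat_missing(now - timedelta(seconds=90), now))  # False
print(heartbeat_missing(now - timedelta(minutes=5), now))   # True
```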

Possible values for plugin status

Plugin status Healthy metric value Description
STATUS_CODE_OK 1 The plugin is responding via health checks.
STATUS_CODE_STOPPED 0 The plugin process was stopped by a user, not by an error.
STATUS_CODE_BROKEN 0 The plugin was not deployed correctly.
STATUS_CODE_START_FAILED 0 The plugin process cannot be started, and the Agent repeatedly tries to run it.
STATUS_CODE_NOT_RESPONDING 0 The health check from the plugin process was not received for a defined amount of time, but the plugin process is running.
STATUS_CODE_HEALTHCHECK_FAILED 0 Failed to send a health check request to the plugin process.
STATUS_CODE_CONFIGURATION_ISSUE 0 Reported by the plugin; indicates an invalid or missing configuration.
STATUS_CODE_FAILED 0 The plugin process was stopped unexpectedly.
STATUS_CODE_STARTING 0 Start was called for the plugin process.
STATUS_CODE_RESTARTING 1 A restart was called.
STATUS_CODE_STOPPING 0 Stop was called for the plugin process.
STATUS_CODE_UPDATING 0 Update was called for the plugin.
STATUS_CODE_CRITICAL 0 Reported by the plugin.
STATUS_CODE_WARNING 0 Reported by the plugin.
STATUS_CODE_JWT_EXPIRED 0 The JWT cannot be refreshed.
STATUS_CODE_UPDATE_FAILED 0 Problems occurred during the plugin update.
STATUS_CODE_INVALID 0 Unknown reason.
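The table maps each plugin status to the swo.uams.plugin.healthy value. In Python, that mapping reduces to a set of the two statuses reported as healthy; the function name and the plugin names in the usage example are hypothetical.

```python
# Statuses whose healthy metric value is 1 in the table above; all others map to 0.
HEALTHY_STATUSES = {"STATUS_CODE_OK", "STATUS_CODE_RESTARTING"}


def plugin_healthy(status):
    """Return the swo.uams.plugin.healthy value (1 or 0) for a plugin status."""
    return 1 if status in HEALTHY_STATUSES else 0


reported = {
    "sql-plugin": "STATUS_CODE_OK",
    "log-plugin": "STATUS_CODE_CONFIGURATION_ISSUE",
}
print([name for name, status in reported.items() if plugin_healthy(status) == 0])  # ['log-plugin']
```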

Infrastructure/AWS metrics

Metrics for AWS entities are collected by integrating SolarWinds Observability SaaS with your AWS cloud account. See AWS cloud platform monitoring.

API Gateway

Metric Units Description
AWS.ApiGateway.4XXError Count

4XXError. The total number of client-side errors for REST APIs captured in a given period.

AWS.ApiGateway.4xx Count

4xx. The total number of client-side errors for HTTP APIs captured in a given period.

AWS.ApiGateway.5XXError Count

5XXError. The total number of server-side errors for REST APIs captured in a given period.

AWS.ApiGateway.5xx Count

5xx. The total number of server-side errors for HTTP APIs captured in a given period.

AWS.ApiGateway.CacheHitCount Count

CacheHitCount. The total number of requests served from the API cache in a given period.

AWS.ApiGateway.CacheMissCount Count

CacheMissCount. The total number of requests served from the backend in a given period, when API caching is enabled.

AWS.ApiGateway.ClientError Count

ClientError. The total number of requests that have a 4XX response returned by API Gateway before the integration is invoked.

AWS.ApiGateway.ConnectCount Count

ConnectCount. The total number of messages sent to the connect route integration.

AWS.ApiGateway.Count Count

Count. The total number of API requests in a given period.

AWS.ApiGateway.DataProcessed bytes

DataProcessed. The total amount of data processed in bytes.

AWS.ApiGateway.ExecutionError Count

ExecutionError. The total number of errors that occurred when calling the integration.

AWS.ApiGateway.HttpRateOf5xxError Count

The number of HTTP 5xx errors (server-side errors) that occur in a given period for REST APIs.

AWS.ApiGateway.IntegrationError Count

IntegrationError. The total number of requests that return a 4XX or 5XX response from the integration.

AWS.ApiGateway.IntegrationLatency milliseconds (ms)

IntegrationLatency. The average time between when API Gateway relays a request to the backend and when it receives a response from the backend.

AWS.ApiGateway.Latency milliseconds (ms)

Latency. The average time between when API Gateway receives a request from a client and when it returns a response to the client.

AWS.ApiGateway.MessageCount Count

MessageCount. The total number of messages sent to the WebSocket API, either from or to the client.

AWS.ApiGateway.RestRateOf5xxError Count

The number of 5xx errors for REST APIs.

AWS.ApiGateway.WebsocketRateOfExecutionError Count

The rate of execution errors for WebSocket APIs.
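CacheHitCount and CacheMissCount can be combined into a cache hit ratio for an API stage with caching enabled. This derived value is not one of the metrics listed on this page; the Python sketch below, including the choice to return 0% when there is no cached traffic, is illustrative.

```python
def cache_hit_ratio(hit_count, miss_count):
    """Percentage of cacheable requests served from the API cache."""
    total = hit_count + miss_count
    if total == 0:
        return 0.0  # assumption: no cached traffic in the period
    return 100.0 * hit_count / total


print(cache_hit_ratio(hit_count=450, miss_count=50))  # 90.0
```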

Application ELB

Metric Units Description
AWS.ApplicationELB.ActiveConnectionCount Count

ActiveConnectionCount. The total number of concurrent TCP connections active from clients to the load balancer and from the load balancer to targets.

AWS.ApplicationELB.AnomalousHostCount Count The number of hosts detected with anomalies.
AWS.ApplicationELB.ClientTLSNegotiationErrorCount Count

The number of TLS connections initiated by the client that did not establish a session with the load balancer due to a TLS error.

AWS.ApplicationELB.ConsumedLCUs Count

ConsumedLCUs. The total number of load balancer capacity units (LCU) used by the load balancer.

AWS.ApplicationELB.DesyncMitigationMode_NonCompliant_Request_Count Count

The number of requests that do not comply with RFC 7230 and are classified as non-compliant.

AWS.ApplicationELB.DroppedInvalidHeaderRequestCount Count

The number of requests that were dropped because they contained invalid headers.

AWS.ApplicationELB.ELBAuthError Count

The number of authentication errors encountered by the load balancer.

AWS.ApplicationELB.ELBAuthFailure Count

The number of authentication failures.

AWS.ApplicationELB.ELBAuthLatency milliseconds (ms)

Time taken by the load balancer to authenticate requests. It includes the time from when the request is received to when the authentication process is completed.

AWS.ApplicationELB.ELBAuthRefreshTokenSuccess Count

The number of successful token refresh operations performed by the load balancer.

AWS.ApplicationELB.ELBAuthSuccess Count

The number of successful authentication attempts by the load balancer.

AWS.ApplicationELB.ELBAuthUserClaimsSizeExceeded Count

The number of authentication requests that were rejected because the size of the user claims exceeded the allowed limit.

AWS.ApplicationELB.ForwardedInvalidHeaderRequestCount Count

The number of requests with invalid headers that were forwarded to the backend servers. The load balancer forwards these requests even though they contain invalid headers.

AWS.ApplicationELB.GrpcRequestCount Count

The number of gRPC requests processed by the load balancer. It includes both IPv4 and IPv6 requests.

AWS.ApplicationELB.HealthyHostCount Count

HealthyHostCount. The average number of targets that are considered healthy.

AWS.ApplicationELB.HealthyHostRate Percent (%)

The rate at which the registered targets in an Application Load Balancer (ALB) are healthy.

AWS.ApplicationELB.HealthyStateDNS Count

Indicates the health status of the DNS endpoints for the ALB. It shows whether the DNS endpoints are healthy and able to route traffic correctly.

AWS.ApplicationELB.HealthyStateRouting Count

Reflects the health status of the routing components of the ALB. It indicates whether the load balancer is successfully routing traffic to healthy targets.

AWS.ApplicationELB.HTTP_Fixed_Response_Count Count

The number of fixed-response actions that were successful.

AWS.ApplicationELB.HTTP_Redirect_Count Count

The number of HTTP responses with a status code of 301 (Moved Permanently) or 302 (Found) returned by the ALB.

AWS.ApplicationELB.HTTP_Redirect_Url_Limit_Exceeded_Count Count

The number of redirect actions that could not be completed because the URL in the response Location header is larger than 8K.

AWS.ApplicationELB.HTTPCode_ELB_3XX_Count Count

The number of HTTP responses with a status code in the 300-399 range (Multiple Choices, Redirection) returned by the ALB.

AWS.ApplicationELB.HTTPCode_ELB_4XX_Count Count

HTTPCode_ELB_4XX_Count. The total number of HTTP 4XX client error codes that originate from the load balancer.

AWS.ApplicationELB.HTTPCode_ELB_5XX_Count Count

HTTPCode_ELB_5XX_Count. The total number of HTTP 5XX server error codes that originate from the load balancer.

AWS.ApplicationELB.HTTPCode_ELB_500_Count Count

The number of HTTP 500 (Internal Server Error) responses returned by the Application Load Balancer (ALB).

AWS.ApplicationELB.HTTPCode_ELB_502_Count Count

The number of HTTP 502 (Bad Gateway) responses returned by the ALB. It indicates that the ALB received an invalid response from an inbound server while acting as a gateway or proxy.

AWS.ApplicationELB.HTTPCode_ELB_503_Count Count

The number of HTTP 503 (Service Unavailable) responses returned by the ALB. It indicates that the ALB is temporarily unable to handle the request, usually due to a temporary overloading or maintenance of the server.

AWS.ApplicationELB.HTTPCode_ELB_504_Count Count

The number of HTTP 504 (Gateway Timeout) responses returned by the ALB. It indicates that the ALB did not receive a timely response from an upstream server while acting as a gateway or proxy.

AWS.ApplicationELB.HTTPCode_Target_2XX_Count Count

The number of HTTP 2xx (Success) responses returned by the targets in response to the ALB. It indicates that the request was successfully processed by the target.

AWS.ApplicationELB.HTTPCode_Target_3XX_Count Count

The number of HTTP 3xx (Redirection) responses returned by the targets in response to the ALB. It indicates that further action needs to be taken by the client to complete the request.

AWS.ApplicationELB.HTTPCode_Target_4XX_Count Count

HTTPCode_Target_4XX_Count. The total number of HTTP responses with 4xx status codes generated by the targets. This does not include any response codes generated by the load balancer.

AWS.ApplicationELB.HTTPCode_Target_5XX_Count Count

HTTPCode_Target_5XX_Count. The total number of HTTP responses with 5xx status codes generated by the targets. This does not include any response codes generated by the load balancer.

AWS.ApplicationELB.IPv6ProcessedBytes bytes

The total number of bytes processed by the load balancer for IPv6 traffic.

AWS.ApplicationELB.IPv6RequestCount Count

The total number of IPv6 requests received by the load balancer.

AWS.ApplicationELB.LambdaInternalError Count

The number of errors that occurred within the Lambda function when it was invoked by the load balancer.

AWS.ApplicationELB.LambdaTargetProcessedBytes bytes

The total number of bytes processed by the Lambda target.

AWS.ApplicationELB.LambdaUserError Count

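The number of requests to a Lambda function that failed because of an issue with the Lambda function itself, for example, the load balancer lacked permission to invoke the function or the function returned a malformed response.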

AWS.ApplicationELB.MitigatedHostCount Count The number of targets currently under mitigation by the load balancer.
AWS.ApplicationELB.NewConnectionCount Count

NewConnectionCount. The total number of new TCP connections established from clients to the load balancer and from the load balancer to targets.

AWS.ApplicationELB.NonStickyRequestCount Count

The number of requests that are not handled by sticky sessions. Sticky sessions ensure that a client's requests are always sent to the same target during a session. When sticky sessions are disabled, or if the load balancer cannot determine the session stickiness, the requests are considered non-sticky.

AWS.ApplicationELB.ProcessedBytes bytes

ProcessedBytes. The total number of bytes processed by the load balancer over IPv4 and IPv6 (HTTP header and HTTP payload).

AWS.ApplicationELB.RejectedConnectionCount Count

RejectedConnectionCount. The total number of connections that were rejected because the load balancer had reached its maximum number of connections.

AWS.ApplicationELB.RequestCount Count

RequestCount. The total number of requests processed over IPv4 and IPv6. This metric is only incremented for requests where the load balancer node was able to choose a target.
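A common derived view is the share of requests that ended in target errors. The sketch below divides a per-period count such as AWS.ApplicationELB.HTTPCode_Target_5XX_Count by AWS.ApplicationELB.RequestCount for the same period. The helper name and sample values are hypothetical, and treating a zero request count as a 0% ratio is an assumption. Because RequestCount only counts requests for which a target was chosen, the ratio is best suited to target-generated status codes.

```python
def error_ratio_percent(error_count, request_count):
    """Share of requests that ended with a given error class, as a percentage."""
    if request_count == 0:
        # Assumption: no requests in the period means no error ratio to report.
        return 0.0
    return 100 * error_count / request_count


# Hypothetical one-minute sample: 12 target 5xx responses out of 2,400 requests.
print(error_ratio_percent(12, 2400))
```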

AWS.ApplicationELB.RequestCountPerTarget Count

RequestCountPerTarget. The total number of requests received by each target in a target group.

AWS.ApplicationELB.RuleEvaluations Count

This metric counts the number of times the rules defined for your Application Load Balancer (ALB) are evaluated. Each rule determines how the load balancer routes requests to the targets in one or more target groups.

AWS.ApplicationELB.TargetConnectionErrorCount Count

TargetConnectionErrorCount. The total number of connections that were not successfully established between the load balancer and target. This metric does not apply if the target is a Lambda function.

AWS.ApplicationELB.TargetResponseTime seconds (s)

TargetResponseTime. The average time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received.

AWS.ApplicationELB.TargetResponseTime.p50 seconds (s)

The 50th percentile (median) of the target response times. It means that 50% of the responses have a lower response time, and 50% have a higher response time.

AWS.ApplicationELB.TargetResponseTime.p90 seconds (s) The 90th percentile of the target response times. It indicates that 90% of the responses have a lower response time, and 10% have a higher response time.
AWS.ApplicationELB.TargetResponseTime.p95 seconds (s)

The 95th percentile of the target response times. It means that 95% of the responses have a lower response time, and 5% have a higher response time.

AWS.ApplicationELB.TargetResponseTime.p99 seconds (s)

The 99th percentile of the target response times. It indicates that 99% of the responses have a lower response time, and 1% have a higher response time.
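To illustrate what the percentile statistics mean, the sketch below computes p50 through p99 from a list of response times using the nearest-rank method. This is an illustrative assumption only: CloudWatch computes percentiles over its own data points and may use a different method, and the sample values are made up.

```python
import math


def nearest_rank_percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100) of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Smallest rank such that at least p% of the values are <= the returned value.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical target response times, in seconds.
response_times = [0.12, 0.08, 0.30, 0.05, 0.22, 0.15, 0.09, 0.40, 0.11, 1.20]
for p in (50, 90, 95, 99):
    print(f"p{p}: {nearest_rank_percentile(response_times, p)} s")
```

With only ten samples, p95 and p99 both land on the single slowest response, which is why high percentiles are most meaningful over large request volumes.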

AWS.ApplicationELB.TargetTLSNegotiationErrorCount Count

The number of TLS negotiation errors that occur when the load balancer tries to establish a secure connection with the target.

AWS.ApplicationELB.UnHealthyHostCount Count

UnHealthyHostCount. The average number of targets that are considered unhealthy.

AWS.ApplicationELB.UnhealthyRoutingRequestCount Count The number of requests routed to targets that are marked as unhealthy by the Application Load Balancer (ALB). It indicates how often requests are being sent to targets that may not be able to handle them properly.
AWS.ApplicationELB.UnhealthyStateDNS Count The health status of the DNS endpoints for the ALB when they are in an unhealthy state. It indicates issues with the DNS endpoints that could affect the routing of traffic.
AWS.ApplicationELB.UnhealthyStateRouting Count The health status of the routing components of the ALB when they are in an unhealthy state. It indicates issues with the load balancer's ability to route traffic correctly to healthy targets.

Aurora Cluster

Metric Units Description
AWS.RDS.AuroraGlobalDBReplicationLag milliseconds (ms)

AuroraGlobalDBReplicationLag. The total amount of lag when replicating updates from the primary AWS region.

AWS.RDS.AuroraVolumeBytesLeftTotal bytes

AuroraVolumeBytesLeftTotal. The total available space for the cluster volume.

AWS.RDS.BacktrackChangeRecordsCreationRate Count

BacktrackChangeRecordsCreationRate. The total number of backtrack change records created over five minutes for the DB cluster.

AWS.RDS.BacktrackChangeRecordsStored Count

BacktrackChangeRecordsStored. The total number of backtrack change records used by the DB cluster.

AWS.RDS.ServerlessDatabaseCapacity Count

ServerlessDatabaseCapacity. The total current capacity of an Aurora Serverless DB cluster.

AWS.RDS.SnapshotStorageUsed bytes

SnapshotStorageUsed. The total amount of backup storage consumed by all Aurora snapshots for an Aurora DB cluster outside its backup retention window.

AWS.RDS.VolumeBytesUsed bytes

VolumeBytesUsed. The total amount of storage used by the Aurora DB cluster.

AWS.RDS.VolumeReadIOPs Count

VolumeReadIOPs. The total number of billed read I/O operations from a cluster volume within a five-minute interval.

AWS.RDS.VolumeWriteIOPs Count

VolumeWriteIOPs. The total number of write disk I/O operations to the cluster volume, reported at five-minute intervals.

Aurora Instance

Metric Units Description
AWS.RDS.ActiveTransactions Count per second

ActiveTransactions. The total number of current transactions executing on an Aurora database instance per second.

AWS.RDS.AuroraReplicaLag milliseconds (ms)

AuroraReplicaLag. The total amount of lag when replicating updates from the primary instance.

AWS.RDS.CPUCreditBalance Count

CPUCreditBalance. The total number of CPU credits that an instance has accumulated, reported at five-minute intervals. You can use this metric to determine how long a DB instance can burst beyond its baseline performance level at a given rate.

AWS.RDS.CPUCreditUsage Count

CPUCreditUsage. The total number of CPU credits consumed during the specified period, reported at five-minute intervals. This metric measures the amount of time during which physical CPUs have been used for processing instructions by virtual CPUs allocated to the DB instance.

AWS.RDS.CPUUtilization Percent (%)

CPUUtilization. The total percentage of CPU used by an Aurora DB instance.

AWS.RDS.ConnectionAttempts Count

ConnectionAttempts. The total number of attempts to connect to an instance, whether successful or not.

AWS.RDS.DDLLatency milliseconds (ms)

DDLLatency. The total duration of DDL requests, such as create, alter, and drop requests.

AWS.RDS.DDLThroughput Count per second

DDLThroughput. The total number of DDL requests per second.

AWS.RDS.DMLLatency milliseconds (ms)

DMLLatency. The total duration of inserts, updates, and deletes.

AWS.RDS.DMLThroughput Count per second

DMLThroughput. The total number of inserts, updates, and deletes per second.

AWS.RDS.DatabaseConnections Count

DatabaseConnections. The total number of client network connections to the database instance.

AWS.RDS.FreeableMemory Binary Bytes

FreeableMemory. The total amount of available random access memory.

AWS.RDS.LoginFailures Count per second

LoginFailures. The total number of failed login attempts per second.

AWS.RDS.MaximumUsedTransactionIDs Count

MaximumUsedTransactionIDs. The total age of the oldest unvacuumed transaction ID, in transactions. If this value reaches 2,146,483,648 (2^31 - 1,000,000), the database is forced into read-only mode to avoid transaction ID wraparound.
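The read-only threshold above is simple arithmetic, so headroom can be computed directly from a MaximumUsedTransactionIDs sample. The sketch below is a hypothetical helper, not a SolarWinds or AWS API; the sample value is made up.

```python
# Read-only threshold stated above: 2^31 - 1,000,000 transaction IDs.
WRAPAROUND_READ_ONLY_THRESHOLD = 2**31 - 1_000_000


def txid_headroom(maximum_used_transaction_ids):
    """Return (IDs remaining before read-only mode, percent of the threshold used)."""
    remaining = WRAPAROUND_READ_ONLY_THRESHOLD - maximum_used_transaction_ids
    used_pct = 100 * maximum_used_transaction_ids / WRAPAROUND_READ_ONLY_THRESHOLD
    return remaining, used_pct


# Hypothetical sample value from the metric.
remaining, used_pct = txid_headroom(1_000_000_000)
print(f"{remaining:,} IDs left ({used_pct:.1f}% of threshold used)")
```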

AWS.RDS.ReadIOPS Count per second

ReadIOPS. The total number of disk read I/O operations per second.

AWS.RDS.ReadLatency seconds (s)

ReadLatency. The total amount of time taken per disk read I/O operation.

AWS.RDS.ReadThroughput bytes per second (B/s)

ReadThroughput. The total number of bytes read from disk per second.

AWS.RDS.TransactionLogsDiskUsage Megabytes (MB)

TransactionLogsDiskUsage. The average amount of disk space consumed by transaction logs on the Aurora PostgreSQL DB instance.

AWS.RDS.WriteIOPS Count per second

WriteIOPS. The total number of Aurora storage write records generated per second.

AWS.RDS.WriteLatency seconds (s)

WriteLatency. The total amount of time taken per disk write I/O operation.

AWS.RDS.WriteThroughput bytes per second (B/s)

WriteThroughput. The total number of bytes written to persistent storage every second.

Auto Scaling Group

Metric Units Description
AWS.AutoScaling.GroupAndWarmPoolDesiredCapacity Count

The total number of instances that the Auto Scaling group and warm pool are attempting to maintain. It includes both the desired capacity of the Auto Scaling group and the instances in the warm pool.

AWS.AutoScaling.GroupAndWarmPoolTotalCapacity Count

The total number of instances in the Auto Scaling group and warm pool, including instances that are in service, pending, or terminating.

AWS.AutoScaling.GroupDesiredCapacity Count

GroupDesiredCapacity. The average number of instances that the Auto Scaling group attempts to maintain.

AWS.AutoScaling.GroupInServiceCapacity Count

The total number of instances that are currently in service in the Auto Scaling group. These instances are actively handling requests and are considered part of the desired capacity.

AWS.AutoScaling.GroupInServiceInstances Count

GroupInServiceInstances. The average number of instances that are running as part of the Auto Scaling group.

AWS.AutoScaling.GroupInServiceInstancesPercent Percent (%)

The percentage of instances in the Auto Scaling group that are currently in service. It is calculated as the number of in-service instances divided by the desired capacity, multiplied by 100.
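The formula above can be written as a one-line helper. This is a minimal sketch with a hypothetical function name; returning 0% when the desired capacity is zero is an assumption, not documented AWS behavior.

```python
def in_service_percent(in_service_instances, desired_capacity):
    """In-service instances divided by desired capacity, multiplied by 100."""
    if desired_capacity == 0:
        # Assumption: report 0% instead of dividing by zero.
        return 0.0
    return in_service_instances / desired_capacity * 100


# Hypothetical sample: 3 of 4 desired instances are in service.
print(in_service_percent(3, 4))
```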

AWS.AutoScaling.GroupMaxSize Count

GroupMaxSize. The average maximum size of the Auto Scaling group.

AWS.AutoScaling.GroupMinSize Count

GroupMinSize. The average minimum size of the Auto Scaling group.

AWS.AutoScaling.GroupPendingCapacity Count

The number of instances that are in the process of launching but are not yet in service. These instances are pending and have not yet started handling requests.

AWS.AutoScaling.GroupPendingInstances Count

GroupPendingInstances. The average number of instances that are pending.

AWS.AutoScaling.GroupStandbyCapacity Count

The number of instances in a standby state within an Auto Scaling group. Standby instances are running but not actively serving traffic.

AWS.AutoScaling.GroupStandbyInstances Count

GroupStandbyInstances. The average number of instances that are in standby state.

AWS.AutoScaling.GroupTerminatingCapacity Count

The number of instances that are in the process of terminating and being removed from the Auto Scaling group. These instances are no longer handling requests.

AWS.AutoScaling.GroupTerminatingInstances Count

GroupTerminatingInstances. The average number of instances that are in the process of terminating.

AWS.AutoScaling.GroupTotalCapacity Count

The total number of instances in the Auto Scaling group, including instances that are in service, pending, and terminating.

AWS.AutoScaling.GroupTotalInstances Count

GroupTotalInstances. The average number of total instances.

AWS.AutoScaling.PredictiveScalingCapacityForecast Count A forecast of the capacity needed for predictive scaling. It analyzes historical load data to predict future capacity requirements, helping to proactively scale your resources.
AWS.AutoScaling.PredictiveScalingLoadForecast Count Predictions of hourly load values based on historical load data from CloudWatch and an analysis of historical trends. It helps in forecasting future capacity needs to proactively scale the Auto Scaling group.
AWS.AutoScaling.PredictiveScalingMetricPairCorrelation Count The correlation between a load metric and a scaling metric used in predictive scaling policies. A strong correlation ensures that the predictive scaling policy can accurately forecast and adjust capacity.
AWS.AutoScaling.WarmPoolDesiredCapacity Count

The desired number of instances in the warm pool. The warm pool is a group of pre-initialized instances that can be quickly started to handle sudden increases in load.

AWS.AutoScaling.WarmPoolMinSize Count

The minimum number of instances that should be maintained in the warm pool. The warm pool is a group of pre-initialized EC2 instances that can quickly respond to scale-out events.

AWS.AutoScaling.WarmPoolPendingCapacity Count

The number of instances that are currently being initialized or are in the process of becoming available in the warm pool.

AWS.AutoScaling.WarmPoolTerminatingCapacity Count

The number of instances in the warm pool that are currently being terminated.

AWS.AutoScaling.WarmPoolTotalCapacity Count

The total number of instances in the warm pool, including both pending and active instances.

AWS.AutoScaling.WarmPoolWarmedCapacity Count

The number of instances in the warm pool that are fully initialized and ready to serve traffic.

Certificate Manager

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of AWS Certificate Manager entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awscertificatemanager.

AWS.CertificateManager.DaysToExpiry Count The number of days until a certificate expires. ACM stops publishing the metrics after a certificate expires.
AWS.CertificateManager.CertificateArn   The Amazon Resource Name (ARN) of the certificate.

CloudFront

Metric Units Description
AWS.CloudFront.4xxErrorRate Percent (%)

4xx error rate. The average percentage of all viewer requests for which the response's HTTP status code is 4xx.

AWS.CloudFront.5xxErrorRate Percent (%)

5xx error rate. The average percentage of all viewer requests for which the response's HTTP status code is 5xx.

AWS.CloudFront.BytesDownloaded bytes

Bytes downloaded. The average number of bytes downloaded by viewers for GET, HEAD, and OPTIONS requests.

AWS.CloudFront.BytesUploaded bytes

Bytes uploaded. The average number of bytes that viewers uploaded to your origin with CloudFront using POST and PUT requests.

AWS.CloudFront.CacheHitRate Percent (%) The percentage of viewer requests that are served directly from the CloudFront cache without needing to fetch the content from the origin server. A higher cache hit rate indicates better performance and reduced latency.
AWS.CloudFront.OriginalLatency milliseconds (ms) The time taken by the origin server to respond with the first byte of the requested content. It helps in understanding the performance of the origin server and the overall latency experienced by end users.
AWS.CloudFront.Requests Count

Requests. The total number of viewer requests received by CloudFront for all HTTP methods and for both HTTP and HTTPS requests.

AWS.CloudFront.TotalErrorRate Percent (%)

Total error rate. The average percentage of all viewer requests for which the response's HTTP status code is 4xx or 5xx.

Direct Connect

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of AWS Direct Connect entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsdirectconnect.

aws.dx.ConnectionBpsEgress bps The bit rate for outbound data from the AWS side of the connection.
aws.dx.ConnectionBpsIngress bps The bit rate for inbound data to the AWS side of the connection.
aws.dx.ConnectionEncryptionState Count The encryption state of an AWS Direct Connect connection. It shows whether the data traversing the connection is encrypted.
aws.dx.ConnectionCRCErrorCount Count The number of times cyclic redundancy check (CRC) errors are observed for the data received at the connection.
aws.dx.ConnectionErrorCount Count The number of errors that occur on an AWS Direct Connect connection. This metric helps you monitor the health and stability of your Direct Connect connection by providing insights into the frequency and types of errors encountered.
aws.dx.ConnectionLightLevelRx Count Indicates the health of the fiber connection for ingress (inbound) traffic to the AWS side of the connection.
aws.dx.ConnectionLightLevelTx Count Indicates the health of the fiber connection for egress (outbound) traffic from the AWS side of the connection.
aws.dx.ConnectionPpsEgress Count per second The packet rate for outbound data from the AWS side of the connection.
aws.dx.ConnectionPpsIngress Count per second The packet rate for inbound data to the AWS side of the connection.
aws.dx.ConnectionState Boolean The state of the connection. 0 indicates DOWN and 1 indicates UP.
aws.dx.VirtualInterfaceBpsEgress bps The bitrate for outbound data from the AWS side of the virtual interface. It represents the amount of data leaving AWS in bits per second (bps).
aws.dx.VirtualInterfaceBpsIngress bps The bitrate for inbound data to the AWS side of the virtual interface. It represents the amount of data coming into AWS in bits per second (bps).
aws.dx.VirtualInterfacePpsEgress Count per second The packet rate for outbound data from the AWS side of the virtual interface. It represents the number of packets leaving AWS per second.
aws.dx.VirtualInterfacePpsIngress Count per second The packet rate for inbound data to the AWS side of the virtual interface. It represents the number of packets coming into AWS per second.

DynamoDB

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of DynamoDB entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsdynamodb.

AWS.DynamoDB.AccountMaxReads Count The maximum number of read capacity units that can be provisioned across all tables in your AWS account.
AWS.DynamoDB.AccountMaxTableLevelReads Count The maximum number of read capacity units that can be provisioned for a single table or global secondary index in your AWS account.
AWS.DynamoDB.AccountMaxTableLevelWrites Count The maximum number of write capacity units that can be provisioned for a single table or global secondary index in your AWS account.
AWS.DynamoDB.AccountMaxWrites Count The maximum number of write capacity units that can be provisioned across all tables in your AWS account.
AWS.DynamoDB.AccountProvisionedReadCapacityUtilization Percent (%) The percentage of provisioned read capacity units that are being used across all tables in your AWS account.
AWS.DynamoDB.AgeOfOldestUnreplicatedRecord milliseconds (ms) The age of the oldest record in a DynamoDB table that has not yet been replicated.

AWS.DynamoDB.ConditionalCheckFailedRequests

Count

The number of failed attempts to perform conditional writes.

The PutItem, UpdateItem, and DeleteItem operations let you provide a logical condition that must evaluate to true before the operation can proceed. If this condition evaluates to false, ConditionalCheckFailedRequests is incremented by one.

AWS.DynamoDB.ConsumedChangeDataCaptureUnits Count The number of consumed units for change data capture operations.

AWS.DynamoDB.ConsumedReadCapacityUnits

Count

The number of read capacity units consumed over the specified time period, so you can track how much of your provisioned throughput is used. You can retrieve the total consumed read capacity for a table and all of its global secondary indexes, or for a particular global secondary index.

AWS.DynamoDB.ConsumedWriteCapacityUnits

Count

The number of write capacity units consumed over the specified time period, so you can track how much of your provisioned throughput is used. You can retrieve the total consumed write capacity for a table and all of its global secondary indexes, or for a particular global secondary index.

AWS.DynamoDB.FailedToReplicateRecordCount Count The number of records that failed to replicate.

AWS.DynamoDB.OnlineIndexConsumedWriteCapacity

Count

The number of write capacity units consumed when adding a new global secondary index to a table. If the write capacity of the index is too low, incoming write activity during the backfill phase might be throttled; this can increase the time it takes to create the index.

You should monitor this statistic while the index is being built to determine whether the write capacity of the index is underprovisioned.

AWS.DynamoDB.OnlineIndexPercentageProgress

Count

The percentage of completion when a new global secondary index is being added to a table. DynamoDB must first allocate resources for the new index, and then backfill attributes from the table into the index. For large tables, this process might take a long time.

You should monitor this statistic to view the relative progress as DynamoDB builds the index.

AWS.DynamoDB.OnlineIndexThrottleEvents

Count

The number of write throttle events that occur when adding a new global secondary index to a table. These events indicate that the index creation will take longer to complete, because incoming write activity is exceeding the provisioned write throughput of the index.

AWS.DynamoDB.PendingReplicationCount Count The number of item updates that have been written to one replica but have not yet been written to another replica.

AWS.DynamoDB.ProvisionedReadCapacityUnits

Count

The number of provisioned read capacity units for a table or a global secondary index.

AWS.DynamoDB.ProvisionedWriteCapacityUnits

Count

The number of provisioned write capacity units for a table or a global secondary index.
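Provisioned read and write capacity units are per-second rates, while a Sum of ConsumedReadCapacityUnits or ConsumedWriteCapacityUnits covers a whole period. To compare the two, divide the Sum by the number of seconds in the period. The sketch below assumes you have a per-period Sum; the helper names and sample values are hypothetical.

```python
def average_consumed_per_second(consumed_units_sum, period_seconds):
    """Convert a Sum of consumed capacity units over a period into an average per-second rate."""
    return consumed_units_sum / period_seconds


def capacity_utilization_percent(consumed_units_sum, period_seconds, provisioned_units):
    """Average consumed capacity as a percentage of the provisioned capacity units."""
    return 100 * average_consumed_per_second(consumed_units_sum, period_seconds) / provisioned_units


# Hypothetical one-minute sample: 1,800 read units consumed, 50 read units provisioned.
print(capacity_utilization_percent(1800, 60, 50))
```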

AWS.DynamoDB.AccountProvisionedWriteCapacityUtilization Percent (%) The percentage of provisioned write capacity units that are being used across all tables in your AWS account.

AWS.DynamoDB.ReadThrottleEvents

Count

Requests to DynamoDB that exceed the provisioned read capacity units for a table or a global secondary index.

AWS.DynamoDB.ReplicationLatency milliseconds (ms) The time between when an updated item appears in the DynamoDB stream for one replica and when it appears in another replica.

AWS.DynamoDB.ReturnedBytes

Binary Bytes

The number of bytes returned by GetRecords operations (Amazon DynamoDB Streams) during the specified time period.

AWS.DynamoDB.ReturnedItemCount

Count

The number of items returned by Query or Scan operations during the specified time period.

AWS.DynamoDB.ReturnedRecordsCount

Count

The number of stream records returned by GetRecords operations (Amazon DynamoDB Streams) during the specified time period.

AWS.DynamoDB.SuccessfulRequestLatency

milliseconds (ms)

The latency of successful requests to DynamoDB or Amazon DynamoDB Streams during the specified time period.

AWS.DynamoDB.SystemErrors

Count

Requests to DynamoDB or Amazon DynamoDB Streams that generate an HTTP 500 status code during the specified time period.

AWS.DynamoDB.TimeToLiveDeletedItemCount

Count

The number of items deleted by Time To Live (TTL) during the specified time period. This metric helps you monitor the rate of TTL deletions on your table.

AWS.DynamoDB.ThrottledPutRecordCount Count The number of put records that were throttled.

AWS.DynamoDB.ThrottledRequests

Count

Requests to DynamoDB that exceed the provisioned throughput limits on a resource (such as a table or an index).

AWS.DynamoDB.TransactionConflict Count The number of transaction conflicts that occurred.

AWS.DynamoDB.UserErrors

Count

Requests to DynamoDB or Amazon DynamoDB Streams that generate an HTTP 400 status code during the specified time period.

AWS.DynamoDB.WriteThrottleEvents

Count

Requests to DynamoDB that exceed the provisioned write capacity units for a table or a global secondary index.

EBS

Metric Units Description
AWS.EBS.AverageReadLatency milliseconds (ms)

AverageReadLatency. The average time required to complete a read request during the specified time period.

AWS.EBS.AverageWriteLatency milliseconds (ms)

AverageWriteLatency. The average time required to complete a write request during the specified time period.

AWS.EBS.BurstBalance Percent (%)

Used with General Purpose SSD (gp2), Throughput Optimized HDD (st1) and Cold HDD (sc1) volumes only. Provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket.

AWS.EBS.FastSnapshotRestoreCreditsBalance Count The number of credits available for fast snapshot restore operations. These credits are used to accelerate the snapshot restoration process, and having a balance of credits ensures that you can perform fast snapshot restores when needed.
AWS.EBS.FastSnapshotRestoreCreditsBucketSize Count The maximum number of credits that can be stored in the credit bucket for fast snapshot restore operations. It helps you understand the total capacity of credits you can accumulate for performing these accelerated restores.
AWS.EBS.VolumeConsumedReadWriteOps Count

VolumeConsumedReadWriteOps. The total amount of read and write operations (normalized to 256K capacity units) consumed during the specified time period.
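To illustrate the 256K normalization, the sketch below converts individual I/O sizes into consumed operations. It assumes that an I/O of 256 KiB or smaller counts as one operation and that larger I/Os count in 256 KiB units, so a 1024 KiB I/O counts as 4. Rounding partial units up is an assumption, and the sizes are hypothetical.

```python
import math

CAPACITY_UNIT_KIB = 256  # 256K normalization unit described above


def consumed_ops(io_size_kib):
    """Consumed operations for one I/O: at least 1, otherwise whole 256 KiB units (rounded up)."""
    return max(1, math.ceil(io_size_kib / CAPACITY_UNIT_KIB))


sizes = [16, 256, 1024, 300]  # hypothetical I/O sizes in KiB
print([consumed_ops(s) for s in sizes])
```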

AWS.EBS.VolumeIdleTime seconds (s)

The total number of seconds in a specified period of time when no read or write operations were submitted.

AWS.EBS.VolumeQueueLength Count

VolumeQueueLength. The number of read and write operation requests waiting to be completed during the specified time period.

AWS.EBS.VolumeReadBytes Binary Bytes

VolumeReadBytes. The total number of bytes transferred by read operations during the specified time period.

AWS.EBS.VolumeReadOps Count

VolumeReadOps. The total number of read operations during the specified time period. Read operations are counted on completion.

AWS.EBS.VolumeStalledIOCheck Count The number of stalled I/O operations on an EBS volume. It helps identify potential performance issues or bottlenecks related to I/O operations.
AWS.EBS.VolumeThroughputPercentage Percent (%)

VolumeThroughputPercentage. The I/O operations per second (IOPS) delivered by an Amazon EBS volume, as a percentage of the total IOPS provisioned for that volume.

AWS.EBS.VolumeTotalOps Count

The total number of I/O operations performed on an EBS volume. It includes both read and write operations and provides an overall view of the volume's activity.

AWS.EBS.VolumeTotalReadTime seconds (s)

The total number of seconds spent by all read (input) operations that completed in a specified period of time.

AWS.EBS.VolumeTotalWriteTime seconds (s)

The total number of seconds spent by all write (output) operations that completed in a specified period of time.
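Pairing a total-time metric with the matching operation count for the same period gives the average time per operation, for example AWS.EBS.VolumeTotalReadTime divided by AWS.EBS.VolumeReadOps. The sketch below uses a hypothetical helper and made-up sample values; returning None when no operations completed is an assumption.

```python
def average_latency_ms(total_time_seconds, completed_ops):
    """Average per-operation latency in milliseconds for one period."""
    if completed_ops == 0:
        # No operations completed in the period, so there is no latency to report.
        return None
    return total_time_seconds / completed_ops * 1000


# Hypothetical sample: 3.0 s of total read time across 1,500 completed reads.
print(average_latency_ms(3.0, 1500))
```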

AWS.EBS.VolumeWriteBytes Binary Bytes

VolumeWriteBytes. The total number of bytes transferred by write operations during the specified time period.

AWS.EBS.VolumeWriteOps Count

VolumeWriteOps. The total number of write operations during the specified time period. Write operations are counted on completion.

EC2

Metric Units Description
AWS.EC2.CPUCreditBalance Count

For T2 Instances. The number of CPU credits available for the instance to burst beyond its base CPU utilization. Credits are stored in the credit balance after they are earned and removed from the credit balance after they expire. Credits expire 24 hours after they are earned.

AWS.EC2.CPUCreditUsage Count

For T2 Instances. The number of CPU credits consumed by the instance. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes).
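The credit definition above is a product of vCPUs, utilization, and minutes. The sketch below, with a hypothetical helper name, checks that each equivalent combination in the description consumes exactly one credit.

```python
def cpu_credits(vcpus, utilization_percent, minutes):
    """CPU credits consumed: one credit = one vCPU at 100% utilization for one minute."""
    return vcpus * (utilization_percent / 100) * minutes


# The three combinations from the description.
print(cpu_credits(1, 100, 1), cpu_credits(1, 50, 2), cpu_credits(2, 25, 2))
```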

AWS.EC2.CPUSurplusCreditBalance Count

The number of surplus credits that have been spent by an unlimited instance when its CPUCreditBalance value is zero.

AWS.EC2.CPUSurplusCreditsCharged Count

The number of spent surplus credits that are not paid down by earned CPU credits and therefore incur an additional charge.

AWS.EC2.CPUUtilization Percent (%)

The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application on a selected instance.

AWS.EC2.DedicatedHostCPUUtilization Percent (%)

The percentage of CPU utilization on a dedicated host. It helps in monitoring the overall CPU usage of instances running on a dedicated host.

AWS.EC2.DiskIOps

The number of read and write operations per second (IOPS) on the instance store volumes of an EC2 instance.

AWS.EC2.DiskReadBytes bytes

Bytes read from all instance store volumes available to the instance. Use this metric to determine how much data the application reads from the instance's disks, which can help you gauge application performance.

AWS.EC2.DiskReadOps Count

Completed read operations from all instance store volumes available to the instance in a specified period of time.

AWS.EC2.DiskWriteBytes bytes

Bytes written to all instance store volumes available to the instance. Use this metric to determine how much data the application writes to the instance's disks, which can help you gauge application performance.

AWS.EC2.DiskWriteOps Count

Completed write operations to all instance store volumes available to the instance in a specified period of time.

AWS.EC2.EBSByteBalance Percent (%)

The percentage of throughput credits remaining in the burst bucket for your EBS volumes.

AWS.EC2.EBSIOBalance Percent (%)

The percentage of I/O credits remaining in the burst bucket for your EBS volumes.

AWS.EC2.EBSReadBytes bytes

The total number of bytes read from all EBS volumes attached to the instance during the specified time period.

AWS.EC2.EBSReadOps Count

The total number of read operations completed on all EBS volumes attached to the instance during the specified time period.

AWS.EC2.EBSWriteBytes bytes

The total number of bytes written to all Amazon Elastic Block Store (EBS) volumes attached to the instance during the specified time period.

AWS.EC2.EBSWriteOps Count

The total number of write operations completed on all EBS volumes attached to the instance during the specified time period.
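CloudWatch reports the EC2 EBSReadOps, EBSWriteOps, EBSReadBytes, and EBSWriteBytes values as totals over the collection period, so a per-second rate takes one more step: divide by the period length in seconds. The sketch below assumes five-minute (300 s) basic monitoring; the helper name and sample values are hypothetical.

```python
def per_second(period_total, period_seconds):
    """Convert a per-period total (bytes or operations) into an average per-second rate."""
    return period_total / period_seconds


PERIOD_SECONDS = 300  # assumption: five-minute basic monitoring

read_iops = per_second(90_000, PERIOD_SECONDS)  # EBSReadOps total -> average read IOPS
write_bytes_per_sec = per_second(15_000_000, PERIOD_SECONDS)  # EBSWriteBytes total -> bytes/s
print(read_iops, write_bytes_per_sec)
```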

AWS.EC2.MetadataNoToken Count

The number of requests to the Instance Metadata Service (IMDS) that did not include a token.

AWS.EC2.MetadataNoTokenRejected Count The number of requests to the Instance Metadata Service (IMDS) that were rejected because they did not include a token.
AWS.EC2.NetworkIO bps

The total network input/output (I/O) operations per second for an EC2 instance.

AWS.EC2.NetworkIn bytes

The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance.

AWS.EC2.NetworkOut bytes

The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance.

AWS.EC2.NetworkPacketsIn Count

The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only.

AWS.EC2.NetworkPacketsOut Count

The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only.

AWS.EC2.StatusCheckFailed Count

Reports whether the instance has passed both the instance status check and the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed).

AWS.EC2.StatusCheckFailed_AttachedEBS Count

Indicates whether there is a failure in the status check related to attached EBS volumes.

AWS.EC2.StatusCheckFailed_Instance boolean

Reports whether the instance has passed the instance status check in the last minute. This metric can be either 0 (passed) or 1 (failed).

AWS.EC2.StatusCheckFailed_System boolean

Indicates whether there is a failure in the system status check, detecting underlying problems with the AWS systems on which your instance runs, such as hardware or network issues.
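The status check metrics above report 0 when a check passed and 1 when it failed. As a minimal sketch, assuming you already have the latest value of each metric (for example, read from the Metrics Explorer), the following Python function lists which checks failed. The function diagnose_status_checks is hypothetical and is not part of any SolarWinds or AWS API.

```python
def diagnose_status_checks(instance_failed: int, system_failed: int,
                           attached_ebs_failed: int = 0) -> list[str]:
    """Return readable reasons for failed EC2 status checks.

    Each argument is the latest value of the matching
    AWS.EC2.StatusCheckFailed_* metric: 0 means passed, 1 means failed.
    """
    for value in (instance_failed, system_failed, attached_ebs_failed):
        if value not in (0, 1):
            raise ValueError(f"status check values must be 0 or 1, got {value}")

    reasons = []
    if system_failed:
        # Underlying AWS systems (hardware or network) the instance runs on.
        reasons.append("system check failed: problem with the underlying AWS systems")
    if instance_failed:
        reasons.append("instance check failed: problem with the instance itself")
    if attached_ebs_failed:
        reasons.append("attached EBS check failed: problem with an attached EBS volume")
    return reasons


# Example: only the instance-level check failed.
print(diagnose_status_checks(instance_failed=1, system_failed=0))
```

Reporting each check separately helps distinguish problems with the underlying AWS systems (system check) from problems with the instance itself or its attached EBS volumes.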

ECS Cluster

Metric Unit Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of AWS ECS Cluster entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsecs.

AWS.ECS.CPUUtilization Percent (%) The percentage of CPU units that are used by the cluster.
AWS.ECS.MemoryUtilization Percent (%) The percentage of memory in use by the cluster.
AWS.ECS.GPUReservation Percent (%) The percentage of total available GPUs that are reserved by running tasks in the cluster.
AWS.ECS.EBSFilesystemUtilization Percent (%) The percentage of the Amazon EBS filesystem that is used by tasks in a service.
AWS.ECS.ActiveConnectionCount Count The total number of concurrent connections active from clients to the Amazon ECS Service Connect proxies that run in tasks.
AWS.ECS.NewConnectionCount Count The total number of new connections established from clients to the Amazon ECS Service Connect proxies that run in tasks.
AWS.ECS.ProcessedBytes bytes The total number of bytes of inbound traffic processed by the Service Connect proxies.
AWS.ECS.RequestCount Count The number of inbound traffic requests processed by the Service Connect proxies.
AWS.ECS.GrpcRequestCount Count The number of gRPC inbound traffic requests processed by the Service Connect proxies.
AWS.ECS.HTTPCode_Target_2XX_Count Count The number of HTTP responses with status codes 200 to 299 generated by the applications in the tasks.
AWS.ECS.HTTPCode_Target_3XX_Count Count The number of HTTP responses with status codes 300 to 399 generated by the applications in the tasks.
AWS.ECS.HTTPCode_Target_4XX_Count Count The number of HTTP responses with status codes 400 to 499 generated by the applications in the tasks.
AWS.ECS.HTTPCode_Target_5XX_Count Count The number of HTTP responses with status codes 500 to 599 generated by the applications in the tasks.
AWS.ECS.RequestCountPerTarget Count The average number of requests received by each target.
AWS.ECS.TargetProcessedBytes bytes The total number of bytes processed by the Service Connect proxies.
AWS.ECS.TargetResponseTime milliseconds (ms) The time elapsed, in milliseconds, after the request reached the Service Connect proxy in the target task until a response from the target application is received back to the proxy.
AWS.ECS.ClientTLSNegotiationErrorCount Count The total number of times the TLS connection failed.
AWS.ECS.TargetTLSNegotiationErrorCount Count The total number of times the TLS connection failed due to missing client certificates.
AWS.ECS.CPUReservation Percent (%) The percentage of CPU units that are reserved in the cluster.
AWS.ECS.MemoryReservation Percent (%) The percentage of memory that is reserved by running tasks in the cluster.
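The four HTTPCode_Target_*_Count metrics can be combined into an error share for a time window. The following Python sketch assumes the four counts cover the same window; http_error_share is a hypothetical helper, and the percentages it returns are derived values, not metrics reported by SolarWinds Observability. Requests without one of these status codes are not counted in the total.

```python
def http_error_share(count_2xx: float, count_3xx: float,
                     count_4xx: float, count_5xx: float) -> dict[str, float]:
    """Return the percentage of responses in the 4xx and 5xx classes.

    Inputs are values of the AWS.ECS.HTTPCode_Target_*_Count metrics
    for the same time window.
    """
    total = count_2xx + count_3xx + count_4xx + count_5xx
    if total == 0:
        # No responses in the window: report 0% instead of dividing by zero.
        return {"4xx_percent": 0.0, "5xx_percent": 0.0}
    return {
        "4xx_percent": 100.0 * count_4xx / total,
        "5xx_percent": 100.0 * count_5xx / total,
    }


print(http_error_share(count_2xx=950, count_3xx=20, count_4xx=25, count_5xx=5))
# {'4xx_percent': 2.5, '5xx_percent': 0.5}
```

Keeping 4xx and 5xx separate is deliberate: 4xx responses usually reflect client requests, while 5xx responses point at the applications in the tasks.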

ECS Service

Metric Unit Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of AWS ECS Service entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsecs.

AWS.ECS.CPUUtilization Percent (%) The percentage of CPU units that are used by the service.
AWS.ECS.MemoryUtilization Percent (%) The percentage of memory in use by the service.
AWS.ECS.EBSFilesystemUtilization Percent (%) The percentage of the Amazon EBS filesystem that is used by tasks in a service.
AWS.ECS.ActiveConnectionCount Count The total number of concurrent connections active from clients to the Amazon ECS Service Connect proxies that run in tasks.
AWS.ECS.NewConnectionCount Count The total number of new connections established from clients to the Amazon ECS Service Connect proxies that run in tasks.
AWS.ECS.ProcessedBytes bytes The total number of bytes of inbound traffic processed by the Service Connect proxies.
AWS.ECS.RequestCount Count The number of inbound traffic requests processed by the Service Connect proxies.
AWS.ECS.GrpcRequestCount Count The number of gRPC inbound traffic requests processed by the Service Connect proxies.
AWS.ECS.HTTPCode_Target_2XX_Count Count The number of HTTP responses with status codes 200 to 299 generated by the applications in the tasks.
AWS.ECS.HTTPCode_Target_3XX_Count Count The number of HTTP responses with status codes 300 to 399 generated by the applications in the tasks.
AWS.ECS.HTTPCode_Target_4XX_Count Count The number of HTTP responses with status codes 400 to 499 generated by the applications in the tasks.
AWS.ECS.HTTPCode_Target_5XX_Count Count The number of HTTP responses with status codes 500 to 599 generated by the applications in the tasks.
AWS.ECS.RequestCountPerTarget Count The average number of requests received by each target.
AWS.ECS.TargetProcessedBytes bytes The total number of bytes processed by the Service Connect proxies.
AWS.ECS.TargetResponseTime milliseconds (ms) The time elapsed, in milliseconds, after the request reached the Service Connect proxy in the target task until a response from the target application is received back to the proxy.
AWS.ECS.ClientTLSNegotiationErrorCount Count The total number of times the TLS connection failed.
AWS.ECS.TargetTLSNegotiationErrorCount Count The total number of times the TLS connection failed due to missing client certificates.

EFS

Metric Units Description
AWS.EFS.BurstCreditBalance bytes

BurstCreditBalance. The average number of burst credits that a file system has. Burst credits allow a file system to burst to throughput levels above a file system’s baseline level for periods of time.

AWS.EFS.ClientConnections Count

ClientConnections. The total number of client connections to a file system. When using a standard client, there is one connection per mounted Amazon EC2 instance.

AWS.EFS.DataReadIOBytes bytes

DataReadIOBytes. The average number of bytes for each file system read operation.

AWS.EFS.DataWriteIOBytes bytes

DataWriteIOBytes. The average number of bytes for each file system write operation.

AWS.EFS.MetadataIOBytes bytes

MetadataIOBytes. The average number of bytes for each metadata operation.

AWS.EFS.MeteredIOBytes bytes

MeteredIOBytes. The average number of metered bytes for each file system operation, including data read, data write, and metadata operations, with read operations metered at one-third the rate of other operations.

AWS.EFS.PercentIOLimit Percent (%)

PercentIOLimit. How close a file system is to reaching the I/O limit of the General Purpose performance mode. Data is available only for file systems running with General Purpose performance mode.

AWS.EFS.PermittedThroughput bps

PermittedThroughput. The maximum amount of throughput that a file system can drive.

AWS.EFS.StorageBytes bytes

StorageBytes. The average size of the file system in bytes, including the amount of data stored in the EFS Standard and EFS Standard–Infrequent Access (EFS Standard-IA) storage classes.

AWS.EFS.TimeSinceLastSync seconds (s)

TimeSinceLastSync. The average amount of time that has passed since the last successful sync to the destination file system in a replication configuration.

AWS.EFS.TotalIOBytes bytes

TotalIOBytes. The total number of bytes for each file system operation, including data read, data write, and metadata operations. This is the actual amount that your application is driving, and not the throughput the file system is being metered at.
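To illustrate the one-third weighting of read operations described for MeteredIOBytes, the following Python sketch applies it to byte totals for a single period. It is an illustration of that description under this assumption, not AWS's billing or throughput calculation; metered_io_bytes is a hypothetical helper.

```python
def metered_io_bytes(read_bytes: float, write_bytes: float,
                     metadata_bytes: float) -> float:
    """Weight reads at one-third the rate of writes and metadata operations."""
    return read_bytes / 3 + write_bytes + metadata_bytes


MIB = 1024 ** 2

# 300 MiB read, 100 MiB written, 10 MiB of metadata operations.
metered = metered_io_bytes(300 * MIB, 100 * MIB, 10 * MIB)
print(metered / MIB)  # 210.0
```

For the same workload, TotalIOBytes counts every byte at full weight (410 MiB here), so the two metrics can differ noticeably for read-heavy workloads.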

Elastic Beanstalk

Metric Units Description

AWS.ElasticBeanstalk.ApplicationLatencyP99.9

  • ApplicationLatencyP95
  • ApplicationLatencyP90
  • ApplicationLatencyP85
  • ApplicationLatencyP75
  • ApplicationLatencyP50
  • ApplicationLatencyP10
milliseconds (ms)

P99.9. The average latency for the slowest x percent of requests over the last 10 seconds, where x is 100 minus the percentile in the metric name. For example, a P99 value of 1.403 indicates that the slowest 1% of requests over the last 10 seconds had an average latency of 1.403 seconds.

AWS.ElasticBeanstalk.ApplicationRequests2xx Count

Status 2xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 200 but less than 300.

AWS.ElasticBeanstalk.ApplicationRequests3xx Count

Status 3xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 300 but less than 400.

AWS.ElasticBeanstalk.ApplicationRequests4xx Count

Status 4xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 400 but less than 500.

AWS.ElasticBeanstalk.ApplicationRequests5xx Count

Status 5xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 500 but less than 600.

AWS.ElasticBeanstalk.ApplicationRequestsTotal Count

Request Count. The average number of requests handled by the web server per second over the last 10 seconds.

AWS.ElasticBeanstalk.CPUIdle Percent (%) Percentage of time that the CPU has spent in the Idle state over the last 10 seconds.
AWS.ElasticBeanstalk.CPUIowait Percent (%) Percentage of time that the CPU has spent in the I/O Wait state over the last 10 seconds. Available on Linux environments only.
AWS.ElasticBeanstalk.CPUIrq Percent (%) Percentage of time that the CPU has spent in the IRQ (Interrupt Request) state over the last 10 seconds. Available on Linux environments only.
AWS.ElasticBeanstalk.CPUNice Percent (%) Percentage of time that the CPU has spent in the Nice state over the last 10 seconds. Available on Linux environments only.
AWS.ElasticBeanstalk.CPUPriveleged Percent (%) Percentage of time that the CPU has spent in the Privileged state over the last 10 seconds. Available on Linux environments only.
AWS.ElasticBeanstalk.CPUSoftirq Percent (%) Percentage of time that the CPU has spent in the SoftIRQ state over the last 10 seconds. Available on Linux environments only.
AWS.ElasticBeanstalk.CPUSystem Percent (%) Percentage of time that the CPU has spent in the System state over the last 10 seconds. Available on Linux environments only.
AWS.ElasticBeanstalk.CPUUser Percent (%) Percentage of time that the CPU has spent in the User state over the last 10 seconds.
AWS.ElasticBeanstalk.EnvironmentHealth Count

The health status of the environment. The possible values are 0 (OK), 1 (Info), 5 (Unknown), 10 (No data), 15 (Warning), 20 (Degraded), and 25 (Severe).

AWS.ElasticBeanstalk.InstancesDegraded Count The number of instances in your Elastic Beanstalk environment that are in a degraded state, meaning they are not functioning optimally and may be impacting the performance of your application.
AWS.ElasticBeanstalk.InstanceHealth Count Information about the health of instances in your Elastic Beanstalk environment. It includes attributes such as health status, color, causes, application metrics, and more.
AWS.ElasticBeanstalk.InstancesInfo Count The number of instances in your Elastic Beanstalk environment that are in the Info state.
AWS.ElasticBeanstalk.InstancesNoData Count The number of instances in your Elastic Beanstalk environment that are not reporting any data, which could suggest issues with data collection or instance health.
AWS.ElasticBeanstalk.InstancesOk Count The number of instances in your Elastic Beanstalk environment that are functioning correctly and passing health checks.
AWS.ElasticBeanstalk.InstancesPending Count The number of instances in your Elastic Beanstalk environment that are in a pending state, meaning they are being provisioned or are not yet fully operational.
AWS.ElasticBeanstalk.InstancesSevere Count The number of instances in your Elastic Beanstalk environment that are in a severe state, meaning they are experiencing critical issues that require immediate attention.
AWS.ElasticBeanstalk.InstancesUnknown Count The number of instances whose health status is unknown, meaning Elastic Beanstalk is unable to determine their health status.
AWS.ElasticBeanstalk.InstancesWarning Count The number of instances in your environment that are in a warning state, indicating potential issues that may need to be addressed but are not critical.
AWS.ElasticBeanstalk.LoadAverage1min Count The 1-minute load average of your instances, which is an indicator of the average number of processes that are either in a runnable or uninterruptible state over the past minute.
AWS.ElasticBeanstalk.RootFilesystemUtil Percent (%) The percentage of the root file system's disk space that is being used on your instances.
AWS.ElasticBeanstalk.Status5xxPercent Percent (%)

The percentage of HTTP requests to your instances that resulted in server errors (status codes 5xx).
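Two of these metrics are easier to read with a little post-processing: EnvironmentHealth returns a numeric code, and the ApplicationLatency metrics report the average of the slowest requests rather than a single percentile value. The following Python sketch decodes the health code using the values listed above and computes a tail average from raw latency samples. It assumes you have per-request latency samples; Elastic Beanstalk computes the ApplicationLatency metrics itself over 10-second windows, and this code does not claim to reproduce its exact method. health_status and tail_average_latency are hypothetical helpers.

```python
import math

# AWS.ElasticBeanstalk.EnvironmentHealth values and their status names,
# as listed in the table above.
ENVIRONMENT_HEALTH = {
    0: "OK",
    1: "Info",
    5: "Unknown",
    10: "No data",
    15: "Warning",
    20: "Degraded",
    25: "Severe",
}


def health_status(value: int) -> str:
    """Translate a numeric EnvironmentHealth value into its status name."""
    return ENVIRONMENT_HEALTH.get(value, f"unrecognized value {value}")


def tail_average_latency(latencies: list[float], percentile: float) -> float:
    """Average latency of the slowest (100 - percentile) percent of requests.

    For percentile=99, this averages the slowest 1% of the samples.
    Rounding the tail size up, with a minimum of one request, is an assumption.
    """
    if not latencies:
        raise ValueError("latencies must not be empty")
    if not 0 <= percentile < 100:
        raise ValueError("percentile must be in the range [0, 100)")
    # round() removes floating-point noise (100 - 99.9 is not exactly 0.1).
    tail_size = max(1, math.ceil(round(len(latencies) * (100 - percentile) / 100, 6)))
    slowest = sorted(latencies, reverse=True)[:tail_size]
    return sum(slowest) / len(slowest)


samples_ms = [float(ms) for ms in range(1, 101)]
print(tail_average_latency(samples_ms, 90))  # 95.5
print(health_status(20))  # Degraded
```

Averaging the tail, rather than reading the single value at the percentile, is what makes these metrics sensitive to a few very slow requests.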

ElastiCache Memcached

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of ElastiCache Memcached entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awselasticachememcached.

AWS.ElastiCache.CPUCreditBalance Count The number of earned CPU credits that an instance has accrued since it was launched or started.
AWS.ElastiCache.CPUCreditUsage Count The number of CPU credits spent by the instance for CPU utilization.
AWS.ElastiCache.CPUUtilization Percent (%) The percentage of CPU utilization for the entire host.
AWS.ElastiCache.CurrConnections Count The number of connections connected to the cache at an instant in time.
AWS.ElastiCache.Evictions Count The number of non-expired items the cache evicted to allow space for new writes.
AWS.ElastiCache.FreeableMemory bytes The amount of free memory available on the host.
AWS.ElastiCache.NetworkBandwidthInAllowanceExceeded Count The number of packets shaped because the inbound aggregate bandwidth exceeded the maximum for the instance.
AWS.ElastiCache.NetworkBandwidthOutAllowanceExceeded Count The number of packets shaped because the outbound aggregate bandwidth exceeded the maximum for the instance.
AWS.ElastiCache.NetworkBytesIn bytes The number of bytes the host has read from the network.
AWS.ElastiCache.NetworkBytesOut bytes The number of bytes sent out on all network interfaces by the instance.
AWS.ElastiCache.NetworkConntrackAllowanceExceeded Count The number of packets shaped because connection tracking exceeded the maximum for the instance and new connections could not be established.
AWS.ElastiCache.NetworkMaxBytesIn bytes The maximum burst of received bytes within each minute.
AWS.ElastiCache.NetworkMaxBytesOut bytes The maximum burst of transmitted bytes within each minute.
AWS.ElastiCache.NetworkMaxPacketsIn Count The maximum burst of received packets within each minute.
AWS.ElastiCache.NetworkMaxPacketsOut Count The maximum burst of transmitted packets within each minute.
AWS.ElastiCache.NetworkPacketsIn Count The number of packets received on all network interfaces by the instance.
AWS.ElastiCache.NetworkPacketsOut Count The number of packets sent out on all network interfaces by the instance.
AWS.ElastiCache.NetworkPacketsPerSecondAllowanceExceeded Count The number of packets shaped because the bidirectional packets per second exceeded the maximum for the instance.
AWS.ElastiCache.NewConnections Count The number of new connections the cache has received.
AWS.ElastiCache.NewItems Count The number of new items the cache has stored.
AWS.ElastiCache.SwapUsage bytes The amount of swap used on the host.
AWS.ElastiCache.UnusedMemory bytes The amount of memory not used by data.

ElastiCache Redis

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of ElastiCache Redis entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awselasticacheredis.

AWS.ElastiCache.ActiveDefragHits Count The count of value reallocations per minute performed by the active defragmentation process.
AWS.ElastiCache.AuthenticationFailures Count The total count of failed attempts to authenticate to Redis using the AUTH command.
AWS.ElastiCache.BytesReadFromDisk bytes The total count of bytes read from disk per minute. Supported only for clusters using Data tiering.
AWS.ElastiCache.BytesReadIntoMemcached bytes The number of bytes read from the network by the cache node.
AWS.ElastiCache.BytesUsedForCache bytes The total count of bytes allocated by Redis for all purposes, including the dataset, buffers, and so on.
AWS.ElastiCache.BytesUsedForCacheItems bytes The number of bytes used to store cache items.
AWS.ElastiCache.BytesUsedForHash bytes The number of bytes currently used by hash tables.
AWS.ElastiCache.BytesWrittenOutFromMemcached bytes The number of bytes written to the network by the cache node.
AWS.ElastiCache.BytesWrittenToDisk bytes The total count of bytes written to disk per minute. Supported only for clusters using Data tiering.
AWS.ElastiCache.CacheHitRate Percent (%) Indicates the usage efficiency of the Redis instance, calculated as CacheHits / (CacheHits + CacheMisses).
AWS.ElastiCache.CacheHits Count The count of successful read-only key lookups in the main dictionary.
AWS.ElastiCache.CacheMisses Count The count of unsuccessful read-only key lookups in the main dictionary.
AWS.ElastiCache.CasBadval Count The number of CAS (check and set) requests where the CAS value provided did not match the stored CAS value.
AWS.ElastiCache.CasHits Count The number of CAS requests where the requested key was found and the CAS value matched.
AWS.ElastiCache.CasMisses Count The number of CAS requests where the requested key was not found.
AWS.ElastiCache.ChannelAuthorizationFailures Count The total count of failed attempts by users to access channels they do not have permission to access.
AWS.ElastiCache.ClusterBasedCmds Count The total number of commands executed on your ElastiCache cluster.
AWS.ElastiCache.ClusterBasedCmdsLatency microseconds The latency of commands executed on your ElastiCache cluster.
AWS.ElastiCache.CmdConfigGet Count The number of CONFIG GET commands executed on your ElastiCache cluster.
AWS.ElastiCache.CmdConfigSet Count The number of CONFIG SET commands executed on your ElastiCache cluster.
AWS.ElastiCache.CmdFlush Count The number of FLUSH commands executed on your ElastiCache cluster.
AWS.ElastiCache.CmdGets Count The number of GET commands executed on your ElastiCache cluster.
AWS.ElastiCache.CmdSet Count The number of SET commands executed on your ElastiCache cluster.
AWS.ElastiCache.CmdTouch Count The number of TOUCH commands executed on your ElastiCache cluster.
AWS.ElastiCache.CommandAuthorizationFailures Count The total count of failed attempts by users to run commands they don’t have permission to call.
AWS.ElastiCache.CPUCreditBalance minutes The count of earned CPU credits that an instance has accrued since it was launched or started.
AWS.ElastiCache.CPUCreditUsage minutes The count of CPU credits spent by the instance for CPU utilization.
AWS.ElastiCache.CPUUtilization Percent (%) The percentage of CPU utilization for the entire host. Because Redis is single-threaded, we recommend that you monitor the EngineCPUUtilization metric for nodes with 4 or more vCPUs.
AWS.ElastiCache.CurrConfig Count The current configuration of the ElastiCache cluster. It includes details about the settings and parameters that are currently applied to the cluster.
AWS.ElastiCache.CurrConnections Count The count of client connections, excluding connections from read replicas.
AWS.ElastiCache.CurrItems Count The count of items in the cache.
AWS.ElastiCache.CurrVolatileItems Count The total count of keys in all databases that have a TTL set.
AWS.ElastiCache.DatabaseCapacityUsageCountedForEvictPercentage Percent (%) The percentage of the total data capacity for the cluster that is in use, excluding the memory used for overhead and client output buffers (COB).
AWS.ElastiCache.DatabaseCapacityUsagePercentage Percent (%) The percentage of the database's capacity that is currently being used.
AWS.ElastiCache.DatabaseMemoryUsagecountedForEvictpercentage Percent (%) The percentage of the memory for the cluster that is in use, excluding memory used for overhead and client output buffers (COB).
AWS.ElastiCache.DatabaseMemoryUsagePercentage Percent (%) The percentage of the memory for the cluster that is in use.
AWS.ElastiCache.DBOAverageTTL milliseconds (ms) Exposes the avg_ttl value of database db0 from the keyspace statistics of the Redis INFO command.
AWS.ElastiCache.DecrHits Count The number of successful decrement operations (decr) where the requested key was found in the cache.
AWS.ElastiCache.DecrMisses Count The number of decrement operations (decr) where the requested key was not found in the cache.
AWS.ElastiCache.DeleteHits Count The number of successful delete operations (del) where the requested key was found in the cache.
AWS.ElastiCache.DeleteMisses Count The number of delete operations (del) where the requested key was not found in the cache.
AWS.ElastiCache.ElastiCacheProcessingUnits Count The total number of ElastiCacheProcessingUnits (ECPUs) consumed by the requests executed on your cache.
AWS.ElastiCache.EngineCPUUtilization Percent (%) Provides CPU utilization of the Redis engine thread.
AWS.ElastiCache.EvalBasedCmds Count The total number of EVAL-based commands executed on your ElastiCache cluster.
AWS.ElastiCache.EvalBasedCmdsLatency microseconds The latency of EVAL-based commands.
AWS.ElastiCache.EvictedUnfetched Count The number of valid items that were evicted from the cache because they were never fetched after being set. These items were removed to make space for new writes.
AWS.ElastiCache.Evictions Count The count of keys that have been evicted due to the maxmemory limit.
AWS.ElastiCache.ExpiredUnfetched Count The number of items that expired and were reclaimed from the cache because they were never fetched after being set. These items were removed to make space for new writes.
AWS.ElastiCache.FreeableMemory bytes The amount of free memory available on the host.
AWS.ElastiCache.GeoSpatialBasedCmds Count The number of geospatial commands executed per second.
AWS.ElastiCache.GeoSpatialBasedCmdsLatency microseconds The average latency for geospatial commands.
AWS.ElastiCache.GetHits Count The number of successful get commands (that is, the requested key was found) per second.
AWS.ElastiCache.GetMisses Count The number of unsuccessful get commands (that is, the requested key was not found) per second.
AWS.ElastiCache.GetTypeCmds Count The number of commands of a specific type executed per second.
AWS.ElastiCache.GetTypeCmdsLatency microseconds The average latency for commands of a specific type.
AWS.ElastiCache.GlobalDatastoreReplicationLag seconds (s) This is the lag between the secondary Region's primary node and the primary Region's primary node.
AWS.ElastiCache.HashBasedCmds Count The total number of commands executed on the cache that are based on hash tables.
AWS.ElastiCache.HashBasedCmdsLatency microseconds The latency of commands executed on the cache that are based on hash tables.
AWS.ElastiCache.HyperLogLogBasedCmds Count The total number of commands executed on the cache that are based on HyperLogLog data structures.
AWS.ElastiCache.HyperLogLogBasedCmdsLatency microseconds The latency of commands executed on the cache that are based on HyperLogLog data structures.
AWS.ElastiCache.IamAuthenticationExpirations Count The total count of expired IAM-authenticated Redis connections.
AWS.ElastiCache.IamAuthenticationThrottling Count The total count of throttled IAM-authenticated Redis AUTH or HELLO requests.
AWS.ElastiCache.IncrHits Count The number of successful increment operations (incr) where the requested key was found in the cache and the increment operation was successfully performed.
AWS.ElastiCache.IncrMisses Count The number of increment operations (incr) where the requested key was not found in the cache, resulting in a miss.
AWS.ElastiCache.IsMaster Count Indicates whether the node is the primary node of current shard/cluster.
AWS.ElastiCache.JsonBasedCmds Count The total number of JSON-based commands executed in your ElastiCache cluster.
AWS.ElastiCache.JsonBasedCmdsLatency microseconds The latency of JSON-based commands executed in your ElastiCache cluster.
AWS.ElastiCache.JsonBasedGetCmds Count The number of JSON-based GET commands executed in your ElastiCache cluster.
AWS.ElastiCache.JsonBasedGetCmdsLatency microseconds The latency of JSON-based GET commands executed in your ElastiCache cluster.
AWS.ElastiCache.JsonBasedSetCmds Count The number of JSON-based SET commands executed in your ElastiCache cluster.
AWS.ElastiCache.JsonBasedSetCmdsLatency microseconds The latency of JSON-based SET commands executed in your ElastiCache cluster.
AWS.ElastiCache.KeyAuthorizationFailures Count The total count of failed attempts by users to access keys they don’t have permission to access.
AWS.ElastiCache.KeyBasedCmds Count The total number of key-based commands executed on your ElastiCache cluster. Key-based commands include operations like GET, SET, and DELETE.
AWS.ElastiCache.KeyBasedCmdsLatency microseconds The latency of key-based commands executed on your ElastiCache cluster.
AWS.ElastiCache.KeysTracked Count The count of keys being tracked by Redis key tracking as a percentage of tracking-table-max-keys.
AWS.ElastiCache.ListBasedCmds Count The total number of list-based commands executed on the cache. Examples of list-based commands include LPOP, LPUSH, or LRANGE.
AWS.ElastiCache.ListBasedCmdsLatency microseconds The latency for executing list-based commands on the cache.
AWS.ElastiCache.MasterLinkHealthStatus Count The health status of the master link in a replication group.
AWS.ElastiCache.MemoryFragmentationRatio Count Indicates the efficiency in the allocation of memory of the Redis engine.
AWS.ElastiCache.NetworkBandwidthInAllowanceExceeded Count The count of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.
AWS.ElastiCache.NetworkBandwidthOutAllowanceExceeded Count The count of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.
AWS.ElastiCache.NetworkBytesIn bytes The count of bytes the host has read from the network.
AWS.ElastiCache.NetworkBytesOut bytes The count of bytes sent out on all network interfaces by the instance.
AWS.ElastiCache.NetworkConntrackAllowanceExceeded Count The count of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established.
AWS.ElastiCache.NetworkMaxBytesIn bytes The maximum burst of received bytes within each minute.
AWS.ElastiCache.NetworkMaxBytesOut bytes The maximum burst of transmitted bytes within each minute.
AWS.ElastiCache.NetworkMaxPacketsIn Count The maximum burst of received packets within each minute.
AWS.ElastiCache.NetworkMaxPacketsOut Count The maximum burst of transmitted packets within each minute.
AWS.ElastiCache.NetworkPacketsIn Count The count of packets received on all network interfaces by the instance.
AWS.ElastiCache.NetworkPacketsOut Count The count of packets sent out on all network interfaces by the instance.
AWS.ElastiCache.NetworkPacketsPerSecondAllowanceExceeded Count The count of packets queued or dropped because the bidirectional packets per second exceeded the maximum for the instance.
AWS.ElastiCache.NewConnections Count The total number of new connections accepted by the cache server during a specific period.
AWS.ElastiCache.NewItems Count The number of new items added to the cache.
AWS.ElastiCache.NonKeyTypeCmds Count The total number of non-key type commands executed on the cache. Examples include HGETALL, HSET, ZADD.
AWS.ElastiCache.NonKeyTypeCmdsLatency microseconds The latency for executing non-key type commands on the cache.
AWS.ElastiCache.NumItemsReadFromDisk Count The total count of items retrieved from disk per minute.
AWS.ElastiCache.NumItemsWrittenToDisk Count The total number of items written to disk by the ElastiCache cluster.
AWS.ElastiCache.PubSubBasedCmds Count The number of publish/subscribe commands executed in the ElastiCache cluster.
AWS.ElastiCache.PubSubBasedCmdsLatency microseconds The latency of publish/subscribe commands.
AWS.ElastiCache.Reclaimed Count The number of items that have been evicted from the cache due to expiration.
AWS.ElastiCache.ReplicationBytes bytes The number of bytes transferred between the primary and replica nodes in a replication group.
AWS.ElastiCache.ReplicationLag seconds (s) The time difference (lag) between the primary node and its read replicas. Use this metric to monitor replication delay.
AWS.ElastiCache.SaveInProgress Count Indicates whether a background save to disk is in progress: the value is 1 while a save is running and 0 otherwise.
AWS.ElastiCache.SetBasedCmds Count The number of commands executed that operate on one or more sets, such as SADD, SCARD, SDIFF, and SUNION.
AWS.ElastiCache.SetBasedCmdsLatency microseconds The latency (response time) for SET based commands.
AWS.ElastiCache.SetTypeCmds Count The number of write-type commands executed, that is, commands that modify data, such as SET, HSET, and SADD.
AWS.ElastiCache.SetTypeCmdsLatency microseconds The latency for SET type commands.
AWS.ElastiCache.SlabsMoved Count The number of memory slabs moved during memory allocation and deallocation operations.
AWS.ElastiCache.SortedSetBasedCmds Count The total number of commands executed on your ElastiCache cluster that are based on sorted sets.
AWS.ElastiCache.SortedSetBasedCmdsLatency microseconds The latency of commands executed on your ElastiCache cluster that are based on sorted sets.
AWS.ElastiCache.StreamBasedCmds Count The total number of commands executed on your ElastiCache cluster that are based on streams.
AWS.ElastiCache.StreamBasedCmdsLatency microseconds The latency of commands executed on your ElastiCache cluster that are based on streams.
AWS.ElastiCache.StringsBasedCmds Count The total number of commands executed on your ElastiCache cluster that are based on strings.
AWS.ElastiCache.StringsBasedCmdsLatency microseconds The latency of commands executed on your ElastiCache cluster that are based on strings.
AWS.ElastiCache.SuccessfulReadRequestLatency microseconds Latency of successful read requests.
AWS.ElastiCache.SuccessfulWriteRequestLatency microseconds Latency of successful write requests.
AWS.ElastiCache.SwapUsage bytes The amount of swap used on the host.
AWS.ElastiCache.TotalCmdsCount Count Total count of all commands executed on your cache.
AWS.ElastiCache.TouchHits Count The number of times items in the cache were accessed (touched) and found to be valid.
AWS.ElastiCache.TouchMisses Count The number of times items in the cache were accessed (touched) but were not found, indicating a cache miss.
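TouchHits and TouchMisses can be combined into a touch hit ratio for a given interval. The following is a minimal sketch, assuming both counts were collected over the same interval; the function name touch_hit_ratio is hypothetical and is not a SolarWinds or AWS API.

```python
from typing import Optional


def touch_hit_ratio(touch_hits: int, touch_misses: int) -> Optional[float]:
    """Percentage of touch operations whose key was found in the cache."""
    total = touch_hits + touch_misses
    if total == 0:
        return None  # no touch operations in the interval; ratio undefined
    return 100.0 * touch_hits / total


# Hypothetical counts from the same one-minute interval.
print(touch_hit_ratio(950, 50))  # prints 95.0
```

Returning None for an empty interval avoids reporting a misleading 0% or 100% when no touch commands ran.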

ELB

Metric Units Description
AWS.ELB.BackendConnectionErrors Count

BackendConnectionErrors. The total number of connections that were not successfully established between the load balancer and the registered instances.

AWS.ELB.BackendConnectionErrorsRate Percent (%)

The rate at which connections between the load balancer and backend instances fail. It includes retries and health check-related errors.

AWS.ELB.DesyncMitigationMode_NonCompliant_Request_Count Count

The number of requests that do not comply with RFC 7230, which are potentially harmful and could lead to HTTP desync attacks.

AWS.ELB.EstimatedALBActiveConnectionCount Count The total number of concurrent TCP connections from clients to the load balancer and from the load balancer to targets.
AWS.ELB.EstimatedALBConsumedLCUs Count per second The number of Load Balancer Capacity Units (LCUs) consumed by the Application Load Balancer.
AWS.ELB.EstimatedALBNewConnectionCount Count The number of new TCP connections initiated from clients to the load balancer.
AWS.ELB.EstimatedProcessedBytes bytes The total number of bytes processed by the load balancer.
AWS.ELB.HTTPCode_Backend_2XX Count

The number of HTTP 2XX status codes returned by the backend instances. These status codes indicate successful responses.

AWS.ELB.HTTPCode_Backend_3XX Count

The number of HTTP 3XX status codes returned by the backend instances. These status codes indicate redirection responses.

AWS.ELB.HTTPCode_Backend_4XX Count

The number of HTTP 4XX status codes returned by the backend instances. These status codes indicate client error responses, such as Bad Request or Not Found.

AWS.ELB.HTTPCode_Backend_5XX Count

The number of HTTP 5XX status codes returned by the backend instances. These status codes indicate server error responses, such as Internal Server Error or Service Unavailable.

AWS.ELB.HTTPCode_ELB_4XX Count

HTTPCode_ELB_4XX. The total number of HTTP 4XX client error codes generated by the load balancer.

AWS.ELB.HTTPCode_ELB_5XX Count

HTTPCode_ELB_5XX. The total number of HTTP 5XX server error codes generated by the load balancer.

AWS.ELB.HealthyHostCount Count

HealthyHostCount. The average number of healthy instances registered with your load balancer.

AWS.ELB.HealthyHostPercent Percent (%)

The percentage of healthy hosts in a target group over a specified period.

AWS.ELB.HttpCodeELB5xxRate Percent (%)

The rate of HTTP 5xx error codes (server errors) returned by the load balancer.

AWS.ELB.Latency milliseconds (ms)

The time it takes for the load balancer to respond to requests. It includes the time spent processing the request and the time spent waiting for a response from the backend server.

AWS.ELB.Latency.p50 milliseconds (ms)

The 50th percentile (median) of the latency metric. It represents the middle value of the latency distribution, meaning 50% of the requests have a lower latency and 50% have a higher latency.

AWS.ELB.Latency.p95 milliseconds (ms)

The 95th percentile of the latency metric. It represents the latency below which 95% of the requests fall, providing a sense of the higher end of the latency distribution.

AWS.ELB.Latency.p99 milliseconds (ms)

The 99th percentile of the latency metric. It represents the latency below which 99% of the requests fall, giving you an idea of the very high end of the latency distribution.

AWS.ELB.RequestCount Count

RequestCount. The total number of requests completed or connections made during the specified interval.

AWS.ELB.SpilloverCount Count

SpilloverCount. The total number of requests that were rejected because the surge queue is full.

AWS.ELB.SurgeQueueLength Count

SurgeQueueLength. The total number of requests (HTTP listener) or connections (TCP listener) that are pending routing to a healthy instance.

AWS.ELB.UnHealthyHostCount Count

UnHealthyHostCount. The average number of unhealthy instances registered with your load balancer. An instance is considered unhealthy after it exceeds the unhealthy threshold configured for health checks.
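The Latency.p50, Latency.p95, and Latency.p99 descriptions above rely on the definition of a percentile. The sketch below illustrates that definition with the nearest-rank method on a hypothetical list of request latencies. It is for illustration only: CloudWatch and SolarWinds may compute percentiles with a different algorithm, and nearest_rank_percentile is a hypothetical helper.

```python
import math
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies (ms) for ten requests, two of them slow.
latencies_ms = [12, 15, 11, 240, 14, 13, 18, 16, 900, 17]
for p in (50, 95, 99):
    print(f"p{p}: {nearest_rank_percentile(latencies_ms, p)} ms")
```

With these samples the median stays at 15 ms while the two slow requests determine p95 and p99, which is why the high percentiles are more useful than the median for spotting tail latency.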

FSx

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of FSx entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsfsx.

AWS.FSx.CapacityPoolReadBytes bytes

The total number of bytes read from the capacity pool by clients.

AWS.FSx.CapacityPoolReadOperations Count

The number of read operations performed on the capacity pool by clients.

AWS.FSx.CapacityPoolWriteBytes bytes

The total number of bytes written to the capacity pool by clients.

AWS.FSx.CapacityPoolWriteOperations Count

The number of write operations performed on the capacity pool by clients.

AWS.FSx.ClientConnections Count The total number of active connections between clients and the file server.
AWS.FSx.CompressionRatio ratio Average ratio of compressed storage usage to uncompressed storage usage.
AWS.FSx.CPUUtilization Percent (%)

The average percentage utilization of your file server’s CPU resources.

AWS.FSx.DataReadBytes bytes

Total number of bytes for file system read operations.

AWS.FSx.DataReadOperations Count

Total number of read operations.

AWS.FSx.DataReadOperationsPercent Percent (%)

The percentage of read operations performed by clients on the file system.

AWS.FSx.DataReadOperationTime seconds (s)

Total time spent within the file system for read operations (network I/O) from clients accessing data in the volume.

AWS.FSx.DataReadThroughputPercent Percent (%)

The percentage of network throughput utilized for read operations.

AWS.FSx.DataWriteBytes bytes

Total number of bytes for file system write operations.

AWS.FSx.DataWriteOperations Count

Total number of write operations.

AWS.FSx.DataWriteOperationsPercent Percent (%)

The percentage of write operations performed by clients on the file system.

AWS.FSx.DataWriteOperationTime seconds (s)

Total time spent within the file system for fulfilling write operations (network I/O) from clients accessing data in the volume.

AWS.FSx.DataWriteThroughputPercent Percent (%)

The percentage of network throughput utilized for write operations. It measures how much of the available write throughput capacity is being used.

AWS.FSx.DeduplicationSavedStorage bytes The average amount of storage space saved by data deduplication, if enabled.
AWS.FSx.DiskIopsUtilization Percent (%)

The average disk IOPS between your file server and storage volumes, as a percentage of the provisioned IOPS limit determined by the storage volumes.

AWS.FSx.DiskReadBytes bytes

Total number of bytes for read operations that access storage volumes.

AWS.FSx.DiskReadOperations Count Total number of read operations for the file server accessing storage volumes.
AWS.FSx.DiskThroughputBalance Percent (%) The average percentage of available burst credits for disk throughput for the storage volumes.
AWS.FSx.DiskThroughputUtilization Percent (%) The average disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by the storage volumes.
AWS.FSx.DiskWriteBytes bytes Total number of bytes for write operations that access storage volumes.
AWS.FSx.DiskWriteOperations Count Total number of write operations for the file server accessing storage volumes.
AWS.FSx.FilesCapacity Count The total number of files (or inodes) that can be created on the volume.
AWS.FSx.FileServerCacheHitRatio Percent (%) The ratio of cache hits to the total number of cache requests. A higher cache hit ratio indicates better performance as more data is served from the cache rather than from the disk.
AWS.FSx.FileServerDiskIopsBalance Percent (%) The average percentage of available burst credits for disk IOPS between your file server and its storage volumes.
AWS.FSx.FileServerDiskIopsUtilization Percent (%) The average disk IOPS between your file server and storage volumes, as a percentage of the provisioned limit determined by throughput capacity.
AWS.FSx.FileServerDiskThroughputBalance Percent (%) The average percentage of available burst credits for disk throughput between your file server and its storage volumes.
AWS.FSx.FileServerDiskThroughputUtilization Percent (%) The average disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by throughput capacity.
AWS.FSx.FilesUsed Count The total number of used files (or inodes) on the volume.
AWS.FSx.FreeDataStorageCapacity bytes The average amount of available storage capacity.
AWS.FSx.FreeStorageCapacity bytes The average amount of available storage capacity.
AWS.FSx.LogicalDataStored bytes The average amount of logical data stored on the file system, considering both the SSD tier and the capacity pool tier.
AWS.FSx.LogicalDiskUsage bytes The average amount of logical data stored (uncompressed).
AWS.FSx.MemoryUtilization Percent (%) The average percentage utilization of your file server’s memory resources.
AWS.FSx.MetadataOperations Count The average number of metadata operations.
AWS.FSx.MetadataOperationTime seconds (s) Total time spent within the file system for fulfilling metadata operations (network I/O) from clients that are accessing data in the volume.
AWS.FSx.NetworkReceivedBytes bytes The total number of bytes received by the file system, including data movement to and from linked data repositories.
AWS.FSx.NetworkSentBytes bytes The total number of bytes sent by the file system, including data movement to and from linked data repositories.
AWS.FSx.NetworkThroughputUtilization Percent (%) The average network throughput for clients accessing the file system, as a percentage of the provisioned limit.
AWS.FSx.NfsBadCalls Count Average number of calls rejected by the NFS server Remote Procedure Call (RPC) mechanism.
AWS.FSx.PhysicalDiskUsage bytes The average amount of storage physically occupied by file system data (compressed).
AWS.FSx.StorageCapacity bytes The average storage capacity of the primary (SSD) tier.
AWS.FSx.StorageCapacityUtilization Percent (%) The average used physical storage capacity, as a percentage of total storage capacity.
AWS.FSx.StorageEfficiencySavings bytes The amount of storage savings achieved through data deduplication and compression techniques.
AWS.FSx.StorageUsed bytes The average amount of physical data stored on the file system, on both the primary (SSD) tier and the capacity pool tier.
AWS.FSx.UsedStorageCapacity bytes The total storage used on the volume.
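When you want a utilization figure from the capacity and free-space metrics, you can approximate it as (capacity minus free) divided by capacity. This is a sketch under the assumption that AWS.FSx.StorageCapacity and AWS.FSx.FreeStorageCapacity describe the same storage tier for your file system type; where StorageCapacityUtilization is reported, treat that metric as authoritative. The helper name storage_used_percent is hypothetical.

```python
from typing import Optional


def storage_used_percent(capacity_bytes: int, free_bytes: int) -> Optional[float]:
    """Approximate used storage as a percentage of total capacity."""
    if capacity_bytes <= 0:
        return None  # no capacity reported; the percentage is undefined
    used_bytes = capacity_bytes - free_bytes
    return 100.0 * used_bytes / capacity_bytes


TIB = 1024 ** 4
GIB = 1024 ** 3
# Hypothetical file system: 1 TiB capacity with 256 GiB free.
print(storage_used_percent(1 * TIB, 256 * GIB))  # prints 75.0
```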

Kinesis Data Firehose

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Kinesis Firehose entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsfirehose.

AWS.Firehose.ActivePartitionsLimit Count The maximum number of active partitions that a Firehose stream can process before sending data to the error bucket.
AWS.Firehose.BackupToS3.Bytes bytes The number of bytes that have been backed up to Amazon S3.
AWS.Firehose.BackupToS3.Success Count The number of successful backup operations to Amazon S3.
AWS.Firehose.BytesPerSecondLimit bps The maximum number of bytes that can be processed per second.
AWS.Firehose.DataReadFromKinesisStream.Bytes bytes The number of bytes read from the Kinesis data stream.
AWS.Firehose.DataReadFromKinesisStream.Records Count The number of records read from the Kinesis data stream.
AWS.Firehose.DataReadFromSource.Backpressured bytes Indicates whether Firehose is delayed in reading records from the source partition, either because the per-partition BytesPerSecondLimit was exceeded or because delivery is slow or stopped.
AWS.Firehose.DataReadFromSource.Bytes bytes The number of raw bytes read from the source database.
AWS.Firehose.DataReadFromSource.Records Count The number of records read from the source database.
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.AuthFailure Count The number of delivery attempts that failed due to authentication issues.
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.Bytes bytes The number of bytes delivered to Amazon OpenSearch Serverless.
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.DataFreshness seconds (s) The age of the oldest record in the delivery stream, measured from the time Firehose ingested the data to the present time.
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.DeliveryRejected Count The number of delivery attempts that were rejected.
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.Records Count The number of records delivered to Amazon OpenSearch Serverless.
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.Success Count The number of successful deliveries to Amazon OpenSearch Serverless.
AWS.Firehose.DeliveryToAmazonOpenSearchService.AuthFailure Count The number of delivery failures due to authentication issues when delivering to Amazon OpenSearch Service.
AWS.Firehose.DeliveryToAmazonOpenSearchService.Bytes bytes The total number of bytes delivered to Amazon OpenSearch Service.
AWS.Firehose.DeliveryToAmazonOpenSearchService.DataFreshness seconds (s) The age of the oldest record in the delivery stream, measured from the time Firehose ingested the data to the present time.
AWS.Firehose.DeliveryToAmazonOpenSearchService.DeliveryRejected Count The number of delivery attempts that were rejected by Amazon OpenSearch Service.
AWS.Firehose.DeliveryToAmazonOpenSearchService.Records Count The number of records successfully delivered to Amazon OpenSearch Service.
AWS.Firehose.DeliveryToAmazonOpenSearchService.Success Count The number of successful deliveries to Amazon OpenSearch Service.

AWS.Firehose.DeliveryToElasticsearch.Bytes

bytes

The number of bytes indexed to Amazon ES over the specified time period.

AWS.Firehose.DeliveryToElasticsearch.Records

Count

The number of records indexed to Amazon ES over the specified time period.

AWS.Firehose.DeliveryToElasticsearch.Success

Count

The sum of the successfully indexed records over the sum of records that were attempted.

AWS.Firehose.DeliveryToHttpEndpoint.Bytes bytes The number of bytes sent to the HTTP endpoint.
AWS.Firehose.DeliveryToHttpEndpoint.DataFreshness seconds (s) The age of the oldest record in the delivery stream, measured from the time Firehose ingested the data to the present time.
AWS.Firehose.DeliveryToHttpEndpoint.ProcessedBytes bytes The number of bytes processed by Firehose for delivery to the HTTP endpoint.
AWS.Firehose.DeliveryToHttpEndpoint.ProcessedRecords Count The number of records processed by Firehose for delivery to the HTTP endpoint.
AWS.Firehose.DeliveryToHttpEndpoint.Record Count The number of individual records sent to the HTTP endpoint.
AWS.Firehose.DeliveryToHttpEndpoint.Success Count The number of successful deliveries to the HTTP endpoint.

AWS.Firehose.DeliveryToRedshift.Bytes

bytes

The number of bytes copied to Amazon Redshift over the specified time period.

AWS.Firehose.DeliveryToRedshift.Records

Count

The number of records copied to Amazon Redshift over the specified time period.

AWS.Firehose.DeliveryToRedshift.Success

Count

The sum of successful Amazon Redshift COPY commands over the sum of all Amazon Redshift COPY commands.

AWS.Firehose.DeliveryToS3.Bytes

bytes

The number of bytes delivered to Amazon S3 over the specified time period.

AWS.Firehose.DeliveryToS3.DataFreshness

seconds (s)

The age of the oldest record in Kinesis Firehose, measured from the time Firehose ingested it to the present. Any record older than this age has been delivered to the S3 bucket.

AWS.Firehose.DeliveryToS3.ObjectCount Count The number of objects that are being delivered to your S3 bucket.

AWS.Firehose.DeliveryToS3.Records

Count

The number of records delivered to Amazon S3 over the specified time period.

AWS.Firehose.DeliveryToS3.Success

Count

The sum of successful Amazon S3 put commands over the sum of all Amazon S3 put commands.

AWS.Firehose.DescribeDeliveryStream.Latency

milliseconds (ms)

The time taken per DescribeDeliveryStream operation, measured over the specified time period.

AWS.Firehose.DescribeDeliveryStream.Requests

Count

The total number of DescribeDeliveryStream requests.

AWS.Firehose.FailedValidation.Bytes bytes The number of bytes that failed validation during data processing.
AWS.Firehose.FailedValidation.Records Count The number of records that failed validation during data processing.

AWS.Firehose.IncomingBytes

bytes

The number of bytes ingested into the Kinesis Firehose stream over the specified time period.

AWS.Firehose.IncomingPutRequests Count The number of incoming put requests to the Firehose stream.

AWS.Firehose.IncomingRecords

Count

The number of records ingested into the Kinesis Firehose stream over the specified time period.

AWS.Firehose.JQProcessing.Duration milliseconds (ms) The amount of time it took to execute the JQ expression in the JQ Lambda function.
AWS.Firehose.KafkaOffsetLag Count The difference between the last record written to the Kafka topic and the last record processed by the consumer.
AWS.Firehose.KMSKeyAccessDenied Count Indicates that access to the KMS key was denied. It usually means that the necessary permissions are not set correctly for the Kinesis Data Firehose to use the KMS key.
AWS.Firehose.KMSKeyDisabled Count Indicates that the KMS key is disabled and cannot be used.
AWS.Firehose.KMSKeyInvalidState Count Indicates that the KMS key is in an invalid state and cannot be used.
AWS.Firehose.KMSKeyNotFound Count Indicates that the KMS key was not found. It usually means that the specified key does not exist or the Firehose delivery stream is not configured correctly to use the key.

AWS.Firehose.ListDeliveryStreams.Latency

milliseconds (ms)

The time taken per ListDeliveryStreams operation, measured over the specified time period.

AWS.Firehose.ListDeliveryStreams.Requests

Count

The total number of ListDeliveryStreams requests.

AWS.Firehose.PartitionCount Count The number of partitions that are currently being used in the delivery stream. It helps you monitor the distribution of data across partitions.
AWS.Firehose.PartitionCountExceeded Count Indicates that the number of partitions being used has exceeded the configured limit.
AWS.Firehose.PerPartitionThroughput bps The throughput for each partition in the delivery stream.

AWS.Firehose.PutRecord.Bytes

bytes

The number of bytes put to the Kinesis Firehose delivery stream using PutRecord over the specified time period.

AWS.Firehose.PutRecord.Latency

milliseconds (ms)

The time taken per PutRecord operation, measured over the specified time period.

AWS.Firehose.PutRecord.Requests

Count

The total number of PutRecord requests, which is equal to the total number of records from PutRecord operations.

AWS.Firehose.PutRecordBatch.Bytes

bytes

The number of bytes put to the Kinesis Firehose delivery stream using PutRecordBatch over the specified time period.

AWS.Firehose.PutRecordBatch.Latency

milliseconds (ms)

The time taken per PutRecordBatch operation, measured over the specified time period.

AWS.Firehose.PutRecordBatch.Records

Count

The total number of records from PutRecordBatch operations.

AWS.Firehose.PutRecordBatch.Requests

Count

The total number of PutRecordBatch requests.

AWS.Firehose.PutRequestsPerSecondLimit Count per second The maximum number of put requests that can be processed per second by the Firehose delivery stream.
AWS.Firehose.RecordsPerSecondLimit Count per second The maximum number of records that can be processed per second by the Firehose delivery stream.
AWS.Firehose.SourceThrottled.Delay milliseconds (ms) The amount of time that records were delayed due to throttling at the data source.
AWS.Firehose.ThrottledRecords Count The number of records that were throttled because data ingestion exceeded one of the Firehose delivery stream limits.

AWS.Firehose.UpdateDeliveryStream.Latency

milliseconds (ms)

The time taken per UpdateDeliveryStream operation, measured over the specified time period.

AWS.Firehose.UpdateDeliveryStream.Requests

Count

The total number of UpdateDeliveryStream requests.
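Several Success metrics above, such as DeliveryToS3.Success and DeliveryToRedshift.Success, are defined as a sum of successes over a sum of attempts. The sketch below, using hypothetical per-minute counts and hypothetical helper names, shows why that ratio of sums can differ sharply from averaging per-minute ratios when the number of attempts varies between intervals.

```python
from typing import Iterable, Tuple


def ratio_of_sums(samples: Iterable[Tuple[int, int]]) -> float:
    """Sum of successes divided by sum of attempts, as the table defines Success."""
    samples = list(samples)
    succeeded = sum(s for s, _ in samples)
    attempted = sum(a for _, a in samples)
    if attempted == 0:
        raise ValueError("no attempts in the period")
    return succeeded / attempted


def mean_of_ratios(samples: Iterable[Tuple[int, int]]) -> float:
    """Average of per-interval ratios; shown only for contrast."""
    ratios = [s / a for s, a in samples if a > 0]
    if not ratios:
        raise ValueError("no attempts in the period")
    return sum(ratios) / len(ratios)


# Two hypothetical one-minute intervals: (successful puts, attempted puts).
minutes = [(99, 100), (0, 1)]
print(ratio_of_sums(minutes))   # about 0.98
print(mean_of_ratios(minutes))  # about 0.495
```

A single failed attempt in a quiet minute halves the mean of ratios, while the ratio of sums weights each interval by its number of attempts.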

Kinesis Data Stream

Basic Stream-level

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Kinesis Data Stream entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awskinesis.

AWS.Kinesis.GetRecords.Bytes

bytes

The number of bytes retrieved from the Kinesis stream, measured over the specified time period.

AWS.Kinesis.GetRecords.IteratorAgeMilliseconds

milliseconds (ms)

The age of the last record in all GetRecords calls made against a Kinesis stream, measured over the specified time period.

AWS.Kinesis.GetRecords.Latency

milliseconds (ms)

The time taken per GetRecords operation, measured over the specified time period.

AWS.Kinesis.GetRecords.Records

Count

The number of records retrieved from the shard, measured over the specified time period. Minimum, Maximum, and Average statistics represent the records in a single GetRecords operation for the stream in the specified time period.

AWS.Kinesis.GetRecords.Success

Count

The number of successful GetRecords operations per stream, measured over the specified time period.

AWS.Kinesis.IncomingBytes

bytes

The number of bytes successfully put to the Kinesis stream over the specified time period.

AWS.Kinesis.IncomingRecords

Count

The number of records successfully put to the Kinesis stream over the specified time period.

AWS.Kinesis.PutRecord.Bytes

bytes

The number of bytes put to the Kinesis stream using the PutRecord operation over the specified time period.

AWS.Kinesis.PutRecord.Latency

milliseconds (ms)

The time taken per PutRecord operation, measured over the specified time period.

AWS.Kinesis.PutRecord.Success

Count

The number of successful PutRecord operations per Kinesis stream, measured over the specified time period.

AWS.Kinesis.PutRecords.Bytes

bytes

The number of bytes put to the Kinesis stream using the PutRecords operation over the specified time period.

AWS.Kinesis.PutRecords.FailedRecords Count The number of records that failed to be added to the Kinesis data stream. It helps in identifying issues with data ingestion.

AWS.Kinesis.PutRecords.Latency

milliseconds (ms)

The time taken per PutRecords operation, measured over the specified time period.

AWS.Kinesis.PutRecords.ThrottledRecords Count The number of records rejected due to throttling in PutRecords operations because the provisioned throughput for the stream was exceeded. It helps in monitoring and managing the data flow.

AWS.Kinesis.PutRecords.Success

Count

The number of PutRecords operations where at least one record succeeded, per Kinesis stream, measured over the specified time period.

AWS.Kinesis.PutRecords.SuccessfulRecords Count The number of records that were successfully added to the Kinesis data stream. It provides insights into the overall success rate of data ingestion.

AWS.Kinesis.PutRecords.TotalRecords

Count

The total number of records sent in PutRecords operations per Kinesis stream, measured over the specified time period.

AWS.Kinesis.ReadProvisionedThroughputExceeded

Count

The number of GetRecords calls throttled for the stream over the specified time period.

AWS.Kinesis.uptime bytes The uptime or availability of the Kinesis Data Streams service. It helps in monitoring the reliability and performance of the service.

AWS.Kinesis.WriteProvisionedThroughputExceeded

Count

The number of records rejected due to throttling for the stream over the specified time period. This metric includes throttling from PutRecord and PutRecords operations.
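PutRecords can partially succeed, so a useful derived figure is the share of records that were not written. The sketch below assumes that PutRecords.TotalRecords counts every record sent in PutRecords calls and PutRecords.SuccessfulRecords counts those that were written, over the same period; rejected_record_percent is a hypothetical helper name.

```python
def rejected_record_percent(total_records: int, successful_records: int) -> float:
    """Percentage of records sent via PutRecords that were not written."""
    if total_records <= 0:
        return 0.0  # nothing was sent, so nothing was rejected
    if successful_records > total_records:
        raise ValueError("successful_records cannot exceed total_records")
    rejected = total_records - successful_records
    return 100.0 * rejected / total_records


# Hypothetical counts for one period.
print(rejected_record_percent(10_000, 9_850))  # prints 1.5
```

A nonzero result tells you some records need to be retried by the producer; PutRecords.FailedRecords and PutRecords.ThrottledRecords help distinguish internal failures from throttling.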

Enhanced Shard-level

Metric Units Description

AWS.Kinesis.IncomingBytes

bytes

The number of bytes successfully put to the shard over the specified time period.

AWS.Kinesis.IncomingRecords

Count

The number of records successfully put to the shard over the specified time period.

AWS.Kinesis.IteratorAgeMilliseconds

milliseconds (ms)

The age of the last record in all GetRecords calls made against a shard, measured over the specified time period.

AWS.Kinesis.OutgoingBytes

bytes

The number of bytes retrieved from the shard, measured over the specified time period.

AWS.Kinesis.OutgoingRecords

Count

The number of records retrieved from the shard, measured over the specified time period.

AWS.Kinesis.ReadProvisionedThroughputExceeded

Count

The number of GetRecords calls throttled for the shard over the specified time period. This exception count covers all dimensions of the following limits: 5 reads per shard per second or 2 MB per second per shard.

AWS.Kinesis.SubscribeToShard.RateExceeded Count per second The number of times the rate limit was exceeded when calling SubscribeToShard. It helps you monitor and manage throttling issues.
AWS.Kinesis.SubscribeToShard.Success Count The number of successful SubscribeToShard calls. It helps you track the success rate of your subscription requests.
AWS.Kinesis.SubscribeToShardEvent.Bytes bytes The number of bytes received from the shard in a SubscribeToShardEvent. It helps you monitor the volume of data being processed.
AWS.Kinesis.SubscribeToShardEvent.MillisBehindLatest milliseconds (ms) The number of milliseconds the consumer is behind the latest record in the shard. A value of zero means the consumer is up-to-date with the stream.
AWS.Kinesis.SubscribeToShardEvent.Records Count The number of records received in a SubscribeToShardEvent. It helps you monitor the number of records being processed.
AWS.Kinesis.SubscribeToShardEvent.Success Count The number of successful SubscribeToShardEvent calls. It helps you track the success rate of your event processing.

AWS.Kinesis.WriteProvisionedThroughputExceeded

Count

The number of records rejected due to throttling for the shard over the specified time period. This metric includes throttling from PutRecord and PutRecords operations and covers all dimensions of the following limits: 1,000 records per second per shard or 1 MB per second per shard.
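The shard write limits quoted above (1,000 records per second or 1 MB per second per shard) can be compared with IncomingRecords and IncomingBytes summed over a period. The following is a minimal sketch, assuming one-minute sums and treating "1 MB" as 1,048,576 bytes; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Per-shard write limits from the table above. Treating "1 MB" as
# 1,048,576 bytes is an assumption of this sketch.
MAX_RECORDS_PER_SEC = 1_000
MAX_BYTES_PER_SEC = 1_048_576


@dataclass
class ShardWriteSample:
    incoming_records: int  # AWS.Kinesis.IncomingRecords summed over the period
    incoming_bytes: int    # AWS.Kinesis.IncomingBytes summed over the period
    period_seconds: int = 60


def write_utilization(sample: ShardWriteSample) -> dict:
    """Average write load as a percentage of each per-shard limit."""
    records_per_sec = sample.incoming_records / sample.period_seconds
    bytes_per_sec = sample.incoming_bytes / sample.period_seconds
    return {
        "records_pct": 100.0 * records_per_sec / MAX_RECORDS_PER_SEC,
        "bytes_pct": 100.0 * bytes_per_sec / MAX_BYTES_PER_SEC,
    }


# Hypothetical minute: 30,000 records totalling 30 MiB.
print(write_utilization(ShardWriteSample(30_000, 31_457_280)))
```

Because the limits apply per second, an average below 100% over a minute does not rule out throttling during short bursts; a nonzero WriteProvisionedThroughputExceeded value is the direct signal.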

Kinesis Video Stream

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of AWS Kinesis Video Stream entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awskinesisvideo.

AWS.KinesisVideo.ArchivedFragmentsConsumed.Media Count The number of fragment media quota points that were consumed by all of the APIs.
AWS.KinesisVideo.ArchivedFragmentsConsumed.Metadata Count The number of fragment metadata quota points that were consumed by all of the APIs.
AWS.KinesisVideo.GetClip.Latency milliseconds (ms) The latency of the GetClip API calls.
AWS.KinesisVideo.GetClip.Outgoingbytes bytes The total number of bytes sent out from the service as part of the GetClip API.
AWS.KinesisVideo.GetClip.Requests Count The number of GetClip API requests.
AWS.KinesisVideo.GetClip.Success Count The number of successful GetClip API requests.
AWS.KinesisVideo.GetDASHManifest.Latency milliseconds (ms) The latency of the GetDASHManifest API calls.
AWS.KinesisVideo.GetDASHManifest.Requests Count The number of GetDASHManifest API requests.
AWS.KinesisVideo.GetDASHManifest.Success Count The number of successful GetDASHManifest API requests.
AWS.KinesisVideo.GetDASHStreamingSessionURL.Latency milliseconds (ms) The latency of the GetDASHStreamingSessionURL API calls.
AWS.KinesisVideo.GetDASHStreamingSessionURL.Requests Count The number of GetDASHStreamingSessionURL API requests.
AWS.KinesisVideo.GetDASHStreamingSessionURL.Success Count The number of successful GetDASHStreamingSessionURL API requests.
AWS.KinesisVideo.GetHLSMasterPlaylist.Latency milliseconds (ms) The latency of the GetHLSMasterPlaylist API calls.
AWS.KinesisVideo.GetHLSMasterPlaylist.Requests Count The number of GetHLSMasterPlaylist API requests.
AWS.KinesisVideo.GetHLSMasterPlaylist.Success Count The number of successful GetHLSMasterPlaylist API requests.
AWS.KinesisVideo.GetHLSMediaPlaylist.Latency milliseconds (ms) The latency of the GetHLSMediaPlaylist API calls.
AWS.KinesisVideo.GetHLSMediaPlaylist.Requests Count The number of GetHLSMediaPlaylist API requests.
AWS.KinesisVideo.GetHLSMediaPlaylist.Success Count The number of successful GetHLSMediaPlaylist API requests.
AWS.KinesisVideo.GetHLSStreamingSessionURL.Latency milliseconds (ms) The latency of the GetHLSStreamingSessionURL API calls.
AWS.KinesisVideo.GetHLSStreamingSessionURL.Requests Count The number of GetHLSStreamingSessionURL API requests.
AWS.KinesisVideo.GetHLSStreamingSessionURL.Success Count The number of successful GetHLSStreamingSessionURL API requests.
AWS.KinesisVideo.GetMedia.ConnectionErrors Count The number of connections that were not successfully established.
AWS.KinesisVideo.GetMedia.MillisBehindNow milliseconds (ms) The time difference between the current server timestamp and the server timestamp of the last fragment sent.
AWS.KinesisVideo.GetMedia.Outgoingbytes bytes The total number of bytes sent out from the service as part of the GetMedia API for a given stream.
AWS.KinesisVideo.GetMedia.OutgoingFragments Count The number of fragments sent while doing GetMedia for the stream.
AWS.KinesisVideo.GetMedia.OutgoingFrames Count The number of frames sent during GetMedia on the given stream.
AWS.KinesisVideo.GetMedia.Requests Count The number of GetMedia API requests for a given stream.
AWS.KinesisVideo.GetMedia.Success Count The number of connections that were successfully established.
AWS.KinesisVideo.GetMediaForFragmentList.Outgoingbytes bytes The total number of bytes sent out from the service as part of the GetMediaForFragmentList API for a given stream.
AWS.KinesisVideo.GetMediaForFragmentList.OutgoingFragments Count The total number of fragments sent out from the service as part of the GetMediaForFragmentList API.
AWS.KinesisVideo.GetMediaForFragmentList.OutgoingFrames Count The total number of frames sent out from the service as part of the GetMediaForFragmentList API.
AWS.KinesisVideo.GetMediaForFragmentList.Requests Count The number of GetMediaForFragmentList API requests for a given stream.
AWS.KinesisVideo.GetMediaForFragmentList.Success Count The number of successful GetMediaForFragmentList API requests for a given stream.
AWS.KinesisVideo.GetMP4InitFragment.Latency milliseconds (ms) The latency of the GetMP4InitFragment API calls.
AWS.KinesisVideo.GetMP4InitFragment.Requests Count The number of GetMP4InitFragment API requests.
AWS.KinesisVideo.GetMP4InitFragment.Success Count The number of successful GetMP4InitFragment API requests.
AWS.KinesisVideo.GetMP4MediaFragment.Latency milliseconds (ms) The latency of the GetMP4MediaFragment API calls.
AWS.KinesisVideo.GetMP4MediaFragment.Outgoingbytes bytes The total number of bytes sent out from the service as part of the GetMP4MediaFragment API.
AWS.KinesisVideo.GetMP4MediaFragment.Requests Count The number of GetMP4MediaFragment API requests.
AWS.KinesisVideo.GetMP4MediaFragment.Success Count The number of successful GetMP4MediaFragment API requests.
AWS.KinesisVideo.GetTSFragment.Latency milliseconds (ms) The latency of the GetTSFragment API calls.
AWS.KinesisVideo.GetTSFragment.Outgoingbytes bytes The total number of bytes sent out from the service as part of the GetTSFragment API.
AWS.KinesisVideo.GetTSFragment.Requests Count The number of GetTSFragment API requests.
AWS.KinesisVideo.GetTSFragment.Success Count The number of successful GetTSFragment API requests.
AWS.KinesisVideo.ListFragments.Latency milliseconds (ms) The latency of the ListFragments API calls.
AWS.KinesisVideo.ListFragments.Requests Count The number of ListFragments API requests.
AWS.KinesisVideo.ListFragments.Success Count The number of successful ListFragments API requests.
AWS.KinesisVideo.PutMedia.ActiveConnections Count The total number of connections to the service host.
AWS.KinesisVideo.PutMedia.BufferingAckLatency milliseconds (ms) The time difference between when the first byte of a new fragment is received by Amazon Kinesis Video Streams and when the Buffering ACK is sent for the fragment.
AWS.KinesisVideo.PutMedia.ConnectionErrors Count The number of errors while establishing a PutMedia connection for the stream.
AWS.KinesisVideo.PutMedia.ErrorAckCount Count The number of Error ACKs sent while doing PutMedia for the stream.
AWS.KinesisVideo.PutMedia.FragmentIngestionLatency milliseconds (ms) The time difference between when the first and last bytes of a fragment are received by Amazon Kinesis Video Streams.
AWS.KinesisVideo.PutMedia.FragmentPersistLatency milliseconds (ms) The time taken between when the complete fragment data is received and when it is archived.
AWS.KinesisVideo.PutMedia.Incomingbytes bytes The number of bytes received as part of PutMedia for the stream.
AWS.KinesisVideo.PutMedia.IncomingFragments Count The number of complete fragments received as part of PutMedia for the stream.
AWS.KinesisVideo.PutMedia.IncomingFrames Count The number of complete frames received as part of PutMedia for the stream.
AWS.KinesisVideo.PutMedia.Latency milliseconds (ms) The time difference between the request and the HTTP response from InletService while establishing the connection.
AWS.KinesisVideo.PutMedia.PersistedAckLatency milliseconds (ms) The time difference between when the last byte of a new fragment is received by Amazon Kinesis Video Streams and when the Persisted ACK is sent for the fragment.
AWS.KinesisVideo.PutMedia.ReceivedAckLatency milliseconds (ms) The time difference between when the last byte of a new fragment is received by Amazon Kinesis Video Streams and when the Received ACK is sent for the fragment.
AWS.KinesisVideo.PutMedia.Requests Count The number of PutMedia API requests for a given stream.
AWS.KinesisVideo.PutMedia.Success Count The number of successes sent while doing PutMedia for the stream.
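Many of the Kinesis Video Streams API metrics above come in pairs: a Requests count and a matching Success count. The following Python sketch turns one such pair, read for the same time window, into a success percentage. It is an illustration only and not a SolarWinds feature. The function name `success_ratio` and the choice to return `None` when there were no requests are assumptions.

```python
from typing import Optional


def success_ratio(requests: float, successes: float) -> Optional[float]:
    """Return the percentage of API requests that succeeded.

    Hypothetical helper: pairs a *.Requests value with the matching *.Success
    value (for example, AWS.KinesisVideo.GetClip.Requests and
    AWS.KinesisVideo.GetClip.Success) taken over the same time window.
    Returns None when there were no requests, because the ratio is undefined.
    """
    if requests < 0 or successes < 0:
        raise ValueError("counts must be non-negative")
    if successes > requests:
        raise ValueError("successes cannot exceed requests")
    if requests == 0:
        return None
    return 100.0 * successes / requests


# Example: 480 successful GetClip calls out of 500 requests in one window.
print(success_ratio(500, 480))  # 96.0
```

Returning `None` instead of 0 keeps an idle window from looking like a total failure.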

Lambda

Metric Units Description
AWS.Lambda.AsyncEventsAge milliseconds (ms) The age of asynchronous events that are being processed. It helps in understanding the latency of event processing in the Lambda function.
AWS.Lambda.AsyncEventsDropped Count The number of asynchronous events that were dropped due to errors or exceeded retries. It helps in identifying issues with event processing and potential data loss.
AWS.Lambda.AsyncEventsReceived Count The number of asynchronous events received by the Lambda function. It provides insights into the workload and the volume of events being processed.
AWS.Lambda.ClaimedAccountConcurrency Count The number of concurrent executions claimed by the Lambda function. It helps in monitoring the utilization of the function's reserved concurrency and overall account concurrency.
AWS.Lambda.ConcurrentExecutions Count

ConcurrentExecutions. The maximum number of function instances that are processing events.

AWS.Lambda.DeadLetterErrors Count

DeadLetterErrors. The total number of times that Lambda attempts to send an event to a dead-letter queue but fails. Dead-letter errors can occur due to permissions errors, misconfigured resources, or size limits.

AWS.Lambda.DestinationDeliveryFailures Count The number of times an asynchronous invocation's result could not be delivered to its destination due to issues such as permission errors or unreachable endpoints.
AWS.Lambda.Duration milliseconds (ms)

Duration. The average amount of time that your function code spends processing an event.

AWS.Lambda.ErrorRate Count

The rate of errors that occurred while invoking the Lambda function. It helps in understanding the reliability and stability of the function's execution.

AWS.Lambda.Errors Count

Errors. The total number of invocations that result in a function error.

AWS.Lambda.Invocations Count

Invocations. The total number of times that your function code is invoked, including successful invocations and invocations that result in a function error.

AWS.Lambda.IteratorAge milliseconds (ms)

IteratorAge. The maximum amount of time between when a stream receives the record and when the event source mapping sends the event to the function.

AWS.Lambda.OffsetLag milliseconds (ms) The difference between the last record processed and the latest record available in the event source. It helps in identifying whether the function is keeping up with the incoming data.
AWS.Lambda.OversizedRecordCount Count The number of records that exceed the maximum allowable size for processing. It helps in monitoring and managing large records that might need special handling.
AWS.Lambda.PostRuntimeExtensionsDuration milliseconds (ms)

The time spent running post-runtime extensions after the function execution completes. It helps in understanding the additional overhead introduced by extensions.

AWS.Lambda.ProvisionedConcurrentExecutions Count The number of concurrent executions that are provisioned for the Lambda function. It helps in ensuring that the function has enough capacity to handle incoming requests.
AWS.Lambda.ProvisionedConcurrencyInvocations Count The number of invocations that use provisioned concurrency. It helps you understand how many function executions are benefiting from pre-allocated compute capacity.
AWS.Lambda.ProvisionedConcurrencySpilloverInvocations Count The number of invocations that exceeded the provisioned concurrency and used on-demand capacity instead. It provides insights into how often your function is going beyond its reserved capacity.
AWS.Lambda.ProvisionedConcurrencyUtilization Percent (%) The percentage of provisioned concurrency that is being utilized. It helps you monitor the efficiency of your provisioned resources.
AWS.Lambda.RecursiveInvocationsDropped Count The number of recursive invocations that were dropped to prevent infinite loops or excessive recursion. It's useful for ensuring stability and avoiding resource exhaustion.
AWS.Lambda.ThrottleRate Percent (%)

The rate of throttled invocations due to reaching concurrency limits. It helps in identifying capacity issues and optimizing function performance.

AWS.Lambda.Throttles Count

Throttles. The total number of invocation requests that are throttled. When all function instances are processing requests and no concurrency is available to scale up, Lambda rejects additional requests with a TooManyRequestsException error.

AWS.Lambda.UnreservedConcurrentExecutions Count

The number of concurrent executions that are not using reserved concurrency. It helps in monitoring the utilization of the function's overall concurrency capacity.
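You can combine ProvisionedConcurrencyInvocations and ProvisionedConcurrencySpilloverInvocations to estimate how often a function outgrows its provisioned concurrency. The following Python sketch computes a hypothetical derived ratio; SolarWinds does not report this metric. The sketch makes two assumptions. First, both values come from the same function alias and time window. Second, every invocation in that scope either used provisioned concurrency or spilled over to on-demand capacity. The function name is illustrative.

```python
from typing import Optional


def spillover_share(provisioned_invocations: float,
                    spillover_invocations: float) -> Optional[float]:
    """Percentage of invocations that spilled over to on-demand capacity.

    Hypothetical derived ratio. It assumes that
    AWS.Lambda.ProvisionedConcurrencyInvocations plus
    AWS.Lambda.ProvisionedConcurrencySpilloverInvocations covers every
    invocation of the alias in the chosen window.
    Returns None when there were no invocations.
    """
    if provisioned_invocations < 0 or spillover_invocations < 0:
        raise ValueError("counts must be non-negative")
    total = provisioned_invocations + spillover_invocations
    if total == 0:
        return None
    return 100.0 * spillover_invocations / total


# Example: 900 invocations used provisioned concurrency and 100 spilled over.
print(spillover_share(900, 100))  # 10.0
```

A share that stays well above zero suggests the provisioned concurrency setting may be too low for the traffic.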

Managed Apache Flink

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health and selecting a specific entity type. You can also customize the impact.

To view the health of Managed Apache Flink entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsapacheinstance.

AWS.KinesisAnalysis.uptime milliseconds (ms) The time that the job has been running without interruption.
AWS.KinesisAnalysis.lastCheckpointSize bytes The total size of the last checkpoint.
AWS.KinesisAnalysis.lastCheckpointDuration milliseconds (ms) The time it took to complete the last checkpoint.
AWS.KinesisAnalysis.cpuUtilization Percent (%) Overall percentage of CPU utilization across task managers.
AWS.KinesisAnalysis.containerCPUUtilization Percent (%) Overall percentage of CPU utilization across task manager containers in the Flink application cluster.
AWS.KinesisAnalysis.containerMemoryUtilization Percent (%) Overall percentage of memory utilization across task manager containers in the Flink application cluster.
AWS.KinesisAnalysis.containerDiskUtilization Percent (%) Overall percentage of disk utilization across task manager containers in the Flink application cluster.
AWS.KinesisAnalysis.heapMemoryUtilization Percent (%) Overall heap memory utilization across task managers.
AWS.KinesisAnalysis.downtime milliseconds (ms) For jobs currently in a failing or recovering state, the time elapsed during this outage.
AWS.KinesisAnalysis.fullRestarts Count The total number of times this job has fully restarted since it was submitted.
AWS.KinesisAnalysis.managedMemoryUtilization Percent (%) The percentage of managed memory currently in use, derived as managedMemoryUsed / managedMemoryTotal.
AWS.KinesisAnalysis.numRecordsInPerSecond Count per second The total number of records this application, operator, or task has received per second.
AWS.KinesisAnalysis.numRecordsOutPerSecond Count per second The total number of records this application, operator, or task has emitted per second.
AWS.KinesisAnalysis.threadcount Count The total number of live threads used by the application.
AWS.KinesisAnalysis.backPressuredTimeMsPerSecond milliseconds (ms) The time this task or operator is back pressured per second.
AWS.KinesisAnalysis.busyTimeMsPerSecond milliseconds (ms) The time this task or operator is busy (neither idle nor back pressured) per second.
AWS.KinesisAnalysis.currentInputWatermark milliseconds (ms) The last watermark this application/operator/task/thread has received.
AWS.KinesisAnalysis.currentOutputWatermark milliseconds (ms) The last watermark this application/operator/task/thread has emitted.
AWS.KinesisAnalysis.idleTimeMsPerSecond milliseconds (ms) The time this task or operator is idle per second.
AWS.KinesisAnalysis.managedMemoryUsed bytes The amount of managed memory currently used.
AWS.KinesisAnalysis.managedMemoryTotal bytes The total amount of managed memory.
AWS.KinesisAnalysis.numberOfFailedCheckpoints Count The number of times checkpointing has failed.
AWS.KinesisAnalysis.numRecordsIn Count The total number of records this application, operator, or task has received.
AWS.KinesisAnalysis.numRecordsOut Count The total number of records this application, operator, or task has emitted.
AWS.KinesisAnalysis.numLateRecordsDropped Count The number of records that were dropped because they arrived late and were beyond the processing window.
AWS.KinesisAnalysis.oldGenerationGCcount Count The number of times the old generation garbage collection has occurred.
AWS.KinesisAnalysis.oldGenerationGCTime milliseconds (ms) The total time spent on old generation garbage collection.
AWS.KinesisAnalysis.millisBehindLatest milliseconds (ms) Indicates how many milliseconds behind the latest data the application is.
AWS.KinesisAnalysis.bytesRequestedPerFetch bytes The number of bytes requested per fetch operation from the data stream.
AWS.KinesisAnalysis.currentoffsets Count The current offsets of the data being processed in a Kinesis Data Analytics application.
AWS.KinesisAnalysis.commitsFailed Count The number of failed commit attempts in the application.
AWS.KinesisAnalysis.commitsSucceeded Count The number of successful commit operations.
AWS.KinesisAnalysis.committedoffsets Count The number of offsets that have been successfully committed.
AWS.KinesisAnalysis.records_lag_max Count The maximum lag, measured in number of records, for any partition being consumed.
AWS.KinesisAnalysis.bytes_consumed_rate bytes per second The rate at which data is consumed from the Kinesis stream.
AWS.KinesisAnalysis.zeppelinCpuUtilization Percent (%) The percentage of CPU resources being used by the Zeppelin server.
AWS.KinesisAnalysis.zeppelinHeapMemoryUtilization Percent (%) The percentage of heap memory utilized by the Zeppelin server.
AWS.KinesisAnalysis.zeppelinThreadcount Count The number of active threads being used by the Zeppelin server.
AWS.KinesisAnalysis.zeppelinWaitingJobs Count The number of jobs waiting to be executed in the Zeppelin server.
AWS.KinesisAnalysis.zeppelinServerUptime seconds (s) The uptime of the Zeppelin server, indicating how long it has been running continuously.
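The managedMemoryUtilization description above derives the metric as managedMemoryUsed / managedMemoryTotal. The following minimal Python sketch shows that arithmetic. Multiplying by 100 to match the Percent (%) unit is an assumption, and the function name is hypothetical.

```python
def managed_memory_utilization(used_bytes: float, total_bytes: float) -> float:
    """Percent of Flink managed memory in use.

    Mirrors managedMemoryUsed / managedMemoryTotal. The factor of 100 that
    converts the fraction to a percentage is an assumption based on the
    Percent (%) unit listed for managedMemoryUtilization.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if used_bytes < 0:
        raise ValueError("used_bytes must be non-negative")
    return 100.0 * used_bytes / total_bytes


# Example: 512 MiB used out of 2 GiB of managed memory.
print(managed_memory_utilization(512 * 1024**2, 2048 * 1024**2))  # 25.0
```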

NAT Gateway

Metric Units Description
AWS.NATGateway.ActiveConnectionCount Count

ActiveConnectionCount. The maximum number of concurrent active TCP connections through the NAT gateway.

AWS.NATGateway.Bandwidth bps

The total network bandwidth used by the NAT gateway.

AWS.NATGateway.BytesInFromDestination bytes

BytesInFromDestination. The total number of bytes received by the NAT gateway from the destination.

AWS.NATGateway.BytesInFromSource bytes

BytesInFromSource. The total number of bytes received by the NAT gateway from clients in the VPC.

AWS.NATGateway.BytesOutToDestination bytes

BytesOutToDestination. The total number of bytes sent out through the NAT gateway to the destination.

AWS.NATGateway.BytesOutToSource bytes

BytesOutToSource. The total number of bytes sent through the NAT gateway to the clients in the VPC.

AWS.NATGateway.ConnectionAttemptCount Count

ConnectionAttemptCount. The total number of connection attempts made through the NAT gateway.

AWS.NATGateway.ConnectionEstablishedCount Count

ConnectionEstablishedCount. The total number of connections established through the NAT gateway.

AWS.NATGateway.ConnectionEstablishedPercent Percent (%)

The percentage of connection attempts that successfully establish a connection through the NAT gateway.

AWS.NATGateway.ErrorPortAllocation Count

ErrorPortAllocation. The total number of times the NAT gateway could not allocate a source port.

AWS.NATGateway.IdleTimeoutCount Count

IdleTimeoutCount. The total number of connections that transitioned from the active state to the idle state.

AWS.NATGateway.PacketsDropCount Count

PacketsDropCount. The total number of packets dropped by the NAT gateway.

AWS.NATGateway.PacketsInFromDestination Count

PacketsInFromDestination. The total number of packets received by the NAT gateway from the destination.

AWS.NATGateway.PacketsInFromSource Count

PacketsInFromSource. The total number of packets received by the NAT gateway from clients in the VPC.

AWS.NATGateway.PacketsOutToDestination Count

PacketsOutToDestination. The total number of packets sent out through the NAT gateway to the destination.

AWS.NATGateway.PacketsOutToSource Count

PacketsOutToSource. The total number of packets sent through the NAT gateway to the clients in the VPC.

AWS.NATGateway.PeakBytesPerSecond Count The peak rate of bytes transferred per second through the NAT gateway.
AWS.NATGateway.PeakPacketsPerSecond Count The peak rate of packets transferred per second through the NAT gateway.
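ConnectionEstablishedPercent relates ConnectionEstablishedCount to ConnectionAttemptCount. If you compute this percentage yourself over a longer window from per-minute samples, sum the counts before dividing so that quiet minutes do not distort the result. The following Python sketch assumes hypothetical per-interval lists exported for the same NAT gateway. It is an illustration and does not describe how SolarWinds calculates the metric.

```python
from typing import Optional, Sequence


def connection_established_percent(attempts: Sequence[int],
                                   established: Sequence[int]) -> Optional[float]:
    """Window-level established-connection percentage from per-interval counts.

    Sums ConnectionAttemptCount and ConnectionEstablishedCount samples first
    and divides once, so busy intervals weigh more than quiet ones.
    Averaging per-interval percentages would not do that.
    Returns None when there were no connection attempts in the window.
    """
    if len(attempts) != len(established):
        raise ValueError("both series must have the same length")
    total_attempts = sum(attempts)
    if total_attempts == 0:
        return None
    return 100.0 * sum(established) / total_attempts


# Three one-minute samples of attempts and established connections.
print(connection_established_percent([1000, 10, 990], [990, 5, 955]))  # 97.5
```

For this sample data, averaging the three per-minute percentages (99%, 50%, and about 96.5%) would give roughly 81.8%, even though 1,950 of the 2,000 attempts succeeded.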

Neptune

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health and selecting a specific entity type. You can also customize the impact.

To view the health of Neptune entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsneptune.

AWS.Neptune.BackupRetentionPeriodStorageUsed bytes The total amount of backup storage used to support the Neptune DB cluster's backup retention window.
AWS.Neptune.BufferCacheHitRatio Percent (%) The percentage of requests that are served by the buffer cache.
AWS.Neptune.ClusterReplicaLag milliseconds (ms) For a read replica, the amount of lag when replicating updates from the primary instance.
AWS.Neptune.ClusterReplicaLagMaximum milliseconds (ms) The maximum amount of lag between the primary instance and each Neptune DB instance in the DB cluster.
AWS.Neptune.ClusterReplicaLagMinimum milliseconds (ms) The minimum amount of lag between the primary instance and each Neptune DB instance in the DB cluster.
AWS.Neptune.CPUUtilization Percent (%) The percentage of CPU utilization.
AWS.Neptune.EngineUptime seconds (s) The amount of time that the instance has been running.
AWS.Neptune.FreeableMemory bytes The amount of available random access memory.
AWS.Neptune.GlobalDbDataTransferBytes bytes The number of bytes of redo log data transferred from the primary AWS Region to a secondary AWS Region in a Neptune global database.
AWS.Neptune.GlobalDbProgressLag milliseconds (ms) The number of milliseconds that a secondary cluster is behind the primary cluster for both user transactions and system transactions.
AWS.Neptune.GlobalDbReplicatedWriteIO Count The number of write I/O operations replicated from the primary AWS Region in the global database to the cluster volume in a secondary AWS Region.
AWS.Neptune.GremlinRequestsPerSec Count per second Number of requests per second to the Gremlin engine.
AWS.Neptune.GremlinWebSocketOpenConnections Count The number of open WebSocket connections to Neptune.
AWS.Neptune.LoaderRequestsPerSec Count per second Number of loader requests per second.
AWS.Neptune.MainRequestQueuePendingRequests Count The number of requests waiting in the input queue pending execution. Neptune starts throttling requests when they exceed the maximum queue capacity.
AWS.Neptune.NCUUtilization Percent (%) At a cluster level, NCUUtilization reports the percentage of maximum capacity being used by the cluster as a whole.
AWS.Neptune.NetworkThroughput bps The amount of network throughput both received from and transmitted to clients by each instance in the Neptune DB cluster, in bytes per second. This throughput does not include network traffic between instances in the DB cluster and the cluster volume.
AWS.Neptune.NetworkTransmitThroughput bps The amount of outgoing network throughput transmitted to clients by each instance in the Neptune DB cluster, in bytes per second. This throughput does not include network traffic between instances in the DB cluster and the cluster volume.
AWS.Neptune.NumTxCommitted Count per second The number of transactions successfully committed per second.
AWS.Neptune.NumTxOpened Count per second The number of transactions opened on the server per second.
AWS.Neptune.NumTxRolledBack Count per second For write queries, the number of transactions per second rolled back on the server because of errors. For read-only queries, this metric is equal to the number of completed read-only transactions per second.
AWS.Neptune.OpenCypherBoltOpenConnections Count The number of open Bolt connections to Neptune.
AWS.Neptune.OpenCypherRequestsPerSec Count per second Number of requests per second (both HTTPS and Bolt) to the openCypher engine.
AWS.Neptune.ServerlessDatabaseCapacity Count As an instance-level metric, ServerlessDatabaseCapacity reports the current instance capacity of a given Neptune serverless instance, in NCUs. At the cluster level, ServerlessDatabaseCapacity reports the average of all the ServerlessDatabaseCapacity values of the DB instances in the cluster.
AWS.Neptune.SnapshotStorageUsed bytes The total amount of backup storage consumed by all snapshots for a Neptune DB cluster outside its backup retention window, in bytes. Included in the total reported by the TotalBackupStorageBilled metric.
AWS.Neptune.SparqlRequestsPerSec Count per second The number of requests per second to the SPARQL engine.
AWS.Neptune.StatsNumStatementsScanned Count The total number of statements scanned for DFE statistics since the server started.
AWS.Neptune.TotalBackupStorageBilled bytes The total amount of backup storage for which you are billed for a given Neptune DB cluster, in bytes. Includes the backup storage measured by the BackupRetentionPeriodStorageUsed and SnapshotStorageUsed metrics.
AWS.Neptune.TotalClientErrorsPerSec Count per second The total number per second of requests that errored out because of client-side issues.
AWS.Neptune.TotalRequestsPerSec Count per second The total number of requests per second to the server from all sources.
AWS.Neptune.TotalServerErrorsPerSec Count per second The total number per second of requests that errored out on the server because of internal failures.
AWS.Neptune.UndoLogListSize Count The count of undo logs in the undo log list.
AWS.Neptune.VolumeBytesUsed bytes The total amount of storage allocated to your Neptune DB cluster.
AWS.Neptune.VolumeReadIOPs Count The average number of billed read I/O operations from a cluster volume, reported at 5-minute intervals. Billed read operations are calculated at the cluster volume level and aggregated from all instances in the Neptune DB cluster.
AWS.Neptune.VolumeWriteIOPs Count The average number of write disk I/O operations to the cluster volume, reported at 5-minute intervals.

NetworkELB

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health and selecting a specific entity type. You can also customize the impact.

To view the health of Network ELB entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsnetworkelb.

AWS.NetworkELB.ActiveFlowCount Count

The total number of concurrent flows (or connections) from clients to targets. This metric includes connections in the SYN_SENT and ESTABLISHED states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow.

AWS.NetworkELB.ActiveFlowCount_TCP Count

The total number of concurrent TCP flows (or connections) from clients to targets. This metric includes connections in the SYN_SENT and ESTABLISHED states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow.

AWS.NetworkELB.ActiveFlowCount_TLS Count The total number of concurrent TLS flows (or connections) from clients to targets. This metric includes connections in the SYN_SENT and ESTABLISHED states.
AWS.NetworkELB.ActiveFlowCount_UDP Count The total number of concurrent UDP flows (or connections) from clients to targets.
AWS.NetworkELB.ClientTLSNegotiationErrorCount Count The total number of TLS handshakes that failed during negotiation between a client and a TLS listener.
AWS.NetworkELB.ConsumedLCUs Count

The number of load balancer capacity units (LCU) used by your load balancer. You pay for the number of LCUs that you use per hour. For more information, see Elastic Load Balancing Pricing.

AWS.NetworkELB.ConsumedLCUs_TCP Count

The number of load balancer capacity units (LCU) used by your load balancer for TCP. You pay for the number of LCUs that you use per hour. For more information, see Elastic Load Balancing Pricing.

AWS.NetworkELB.ConsumedLCUs_TLS Count The number of load balancer capacity units (LCU) used by your load balancer for TLS. You pay for the number of LCUs that you use per hour. For more information, see Elastic Load Balancing Pricing.
AWS.NetworkELB.ConsumedLCUs_UDP Count The number of load balancer capacity units (LCU) used by your load balancer for UDP. You pay for the number of LCUs that you use per hour. For more information, see Elastic Load Balancing Pricing.
AWS.NetworkELB.HealthyHostCount Count

The number of targets that are considered healthy. This metric does not include any Application Load Balancers registered as targets.

AWS.NetworkELB.NewFlowCount Count

The total number of new flows (or connections) established from clients to targets in the time period.

AWS.NetworkELB.NewFlowCount_TCP Count

The total number of new TCP flows (or connections) established from clients to targets in the time period.

AWS.NetworkELB.NewFlowCount_TLS Count The total number of new TLS flows (or connections) established from clients to targets in the time period.
AWS.NetworkELB.NewFlowCount_UDP Count The total number of new UDP flows (or connections) established from clients to targets in the time period.
AWS.NetworkELB.PeakPacketsPerSecond Count per second

Highest average packet rate (packets processed per second), calculated every 10 seconds during the sampling window. This metric includes health check traffic.

AWS.NetworkELB.PortAllocationErrorCount Count

The total number of ephemeral port allocation errors during a client IP translation operation. A non-zero value indicates dropped client connections.

Network Load Balancers support 55,000 simultaneous connections, or about 55,000 connections per minute, to each unique target (IP address and port) when performing client address translation. To fix port allocation errors, add more targets to the target group.

AWS.NetworkELB.ProcessedBytes bytes

The total number of bytes processed by the load balancer, including TCP/IP headers. This count includes traffic to and from targets, minus health check traffic.

AWS.NetworkELB.ProcessedBytes_TCP bytes

The total number of bytes processed by TCP listeners.

AWS.NetworkELB.ProcessedBytes_TLS bytes The total number of bytes processed by TLS listeners.
AWS.NetworkELB.ProcessedBytes_UDP bytes The total number of bytes processed by UDP listeners.
AWS.NetworkELB.ProcessedPackets Count

The total number of packets processed by the load balancer. This count includes traffic to and from targets, including health check traffic.

AWS.NetworkELB.RejectedFlowCount Count

The number of network flows rejected by the Network Load Balancer due to security group rules or other policies.

AWS.NetworkELB.SecurityGroupBlockedFlowCount_Inbound_ICMP Count The number of new ICMP messages rejected by the inbound rules of the load balancer security groups.
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Inbound_TCP Count

The number of new TCP flows rejected by the inbound rules of the load balancer security groups.

AWS.NetworkELB.SecurityGroupBlockedFlowCount_Inbound_UDP Count The number of new UDP flows rejected by the inbound rules of the load balancer security groups.
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Outbound_ICMP Count The number of new ICMP messages rejected by the outbound rules of the load balancer security groups.
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Outbound_TCP Count The number of new TCP flows rejected by the outbound rules of the load balancer security groups.
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Outbound_UDP Count The number of new UDP flows rejected by the outbound rules of the load balancer security groups.
AWS.NetworkELB.TargetTLSNegotiationErrorCount Count The total number of TLS handshakes that failed during negotiation between a TLS listener and a target.
AWS.NetworkELB.TCP_Client_Reset_Count Count

The total number of reset (RST) packets sent from a client to a target. These resets are generated by the client and forwarded by the load balancer.

AWS.NetworkELB.TCP_ELB_Reset_Count Count

The total number of reset (RST) packets generated by the load balancer. For more information, see Troubleshooting.

AWS.NetworkELB.TCP_Target_Reset_Count Count

The total number of reset (RST) packets sent from a target to a client. These resets are generated by the target and forwarded by the load balancer.

AWS.NetworkELB.UnHealthyHostCount Count The number of targets that are considered unhealthy. This metric does not include any Application Load Balancers registered as targets. Reporting criteria: Reported if health checks are enabled.
AWS.NetworkELB.UnhealthyRoutingFlowCount Count The number of flows (or connections) that are routed using the routing failover action (fail open).
AWS.NetworkELB.ZonalHealthStatus Status Indicator Represents the health status of a Network Load Balancer in each Availability Zone, which helps identify failover events and potential issues.

OpenSearch Collection

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of OpenSearch Collection entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsopensearchcollection.

AWS.AOSS.2xx Count The number of 2XX HTTP status code responses. These indicate successful requests.
AWS.AOSS.3xx Count The number of 3XX HTTP status code responses. These indicate redirections.
AWS.AOSS.4xx Count The number of 4XX HTTP status code responses. These indicate client errors, such as bad requests or unauthorized access.
AWS.AOSS.5xx Count The number of 5XX HTTP status code responses. These indicate server errors, such as server overload or server-side issues.
AWS.AOSS.ActiveCollection Count

Indicates whether a collection is active. A value of 1 means the collection is in an ACTIVE state. This metric is emitted upon successful creation of a collection and remains 1 until the collection is deleted.

AWS.AOSS.DeletedDocuments Count The total number of documents that have been deleted from the collection. This metric increases after delete requests are processed and decreases after index segments are merged within the cluster.
AWS.AOSS.HotStorageUsed bytes The amount of storage used for hot data, which is data that is frequently accessed and needs to be readily available.
AWS.AOSS.IndexingOCU Count The number of OpenSearch Compute Units (OCUs) used to ingest collection data. This metric applies at the account level and helps monitor the compute resources used for indexing.
AWS.AOSS.IngestionDataRate Gibibytes per second (GiB/s)

The rate, in GiB per second, at which data is indexed to a collection or index. This metric applies only to bulk indexing requests and helps track data ingestion speed.

AWS.AOSS.IngestionDocumentErrors Count

The number of document ingestion errors that occur while indexing data. This metric helps in identifying issues during the data ingestion process.

AWS.AOSS.IngestionDocumentRate Count per second

The rate per second at which documents are being ingested to a collection or index. This metric applies to bulk indexing requests.

AWS.AOSS.IngestionRequestErrors Count

The total number of bulk indexing request errors to a collection. This metric is emitted when a bulk indexing request fails for any reason, such as an authentication or availability issue.

AWS.AOSS.IngestionRequestLatency seconds (s) The time it takes for ingestion requests to be processed and completed. This metric measures the latency from the start to the end of the ingestion request.
AWS.AOSS.IngestionRequestRate Count per second

The rate at which ingestion requests are made to a collection or index, measured as the number of requests per second.

AWS.AOSS.IngestionRequestSuccess Count

The number of successful ingestion requests to a collection or index. This metric counts the requests that were successfully processed without errors.

AWS.AOSS.SearchableDocuments Count The total number of searchable documents in the collection or index.
AWS.AOSS.SearchOCU Count The number of OpenSearch Compute Units (OCUs) used for search operations.
AWS.AOSS.SearchRequestErrors Count per minute

The total number of errors encountered during search requests.

AWS.AOSS.SearchRequestLatency milliseconds (ms) The average latency (response time) for search requests.
AWS.AOSS.SearchRequestRate Count per minute

The number of search requests made per minute.

AWS.AOSS.StorageUsedInS3 bytes The amount of storage used in Amazon S3 for OpenSearch data.
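
The AWS.AOSS.IngestionDataRate description expresses the rate in GiB (gibibytes, 2^30 bytes) per second, whereas a decimal gigabyte (GB) is 10^9 bytes, so the two units differ by about 7.4%. If you compare this metric with throughput values expressed in decimal units, a conversion like the following minimal Python sketch may help. The function names are illustrative only and are not part of SolarWinds Observability or AWS.

```python
# Byte multiples used when reading AWS.AOSS.IngestionDataRate.
GIB = 2 ** 30   # 1 gibibyte (GiB) = 1,073,741,824 bytes
GB = 10 ** 9    # 1 gigabyte (GB) = 1,000,000,000 bytes


def gib_per_s_to_gb_per_s(rate_gib_per_s: float) -> float:
    """Convert a rate in GiB per second to decimal GB per second."""
    return rate_gib_per_s * GIB / GB


def gib_per_s_to_bytes_per_s(rate_gib_per_s: float) -> float:
    """Convert a rate in GiB per second to bytes per second."""
    return rate_gib_per_s * GIB


if __name__ == "__main__":
    sample = 0.5  # hypothetical IngestionDataRate value, in GiB/s
    print(f"{sample} GiB/s = {gib_per_s_to_gb_per_s(sample):.4f} GB/s")
```

For example, a reading of 1 GiB/s corresponds to roughly 1.074 GB/s.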

OpenSearch Domain

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of OpenSearch Domain entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsopensearchdomain.

AWS.ES.2xx Count

The number of successful responses from the OpenSearch Service. For example, a 200 status code indicates that the request was successfully processed.

AWS.ES.3xx Count

The number of redirection responses. For example, a 301 status code means that the requested resource has been moved to a new URL.

AWS.ES.4xx Count

The number of client error responses. For example, a 404 status code indicates that the requested resource was not found.

AWS.ES.5xx Count

The number of server error responses. For example, a 500 status code means that the server encountered an unexpected condition that prevented it from fulfilling the request.

AWS.ES.ADAnomalyDetectorsIndexStatus.red Boolean Indicates the status of the anomaly detectors index in Amazon OpenSearch Service. A red status means that at least one primary shard and its replicas are not allocated to a node.
AWS.ES.ADAnomalyDetectorsIndexStatusIndexExists Boolean Checks if the anomaly detectors index exists in the OpenSearch Service cluster.
AWS.ES.ADAnomalyResultsIndexStatus.red Boolean Indicates the status of the anomaly results index. A red status means that at least one primary shard and its replicas are not allocated to a node.
AWS.ES.ADAnomalyResultsIndexStatusIndexExists Boolean Checks if the anomaly results index exists in the OpenSearch Service cluster.
AWS.ES.ADExecuteFailureCount Count The number of failures when executing anomaly detection tasks.
AWS.ES.ADExecuteRequestCount Count

The number of requests made to execute anomaly detection tasks.

AWS.ES.ADHCExecuteFailureCount Count The number of failures when executing high-cardinality anomaly detection tasks.
AWS.ES.ADHCExecuteRequestCount Count The number of requests made to execute high-cardinality anomaly detection tasks.
AWS.ES.ADModelsCheckpointIndexStatus.red Boolean Indicates the status of the checkpoint index for anomaly detection models. A red status means that at least one primary shard and its replicas are not allocated to a node.
AWS.ES.ADModelsCheckpointIndexStatusIndexExists Boolean Checks if the checkpoint index for anomaly detection models exists.
AWS.ES.ADPluginUnhealthy Boolean Indicates whether the anomaly detection plugin is unhealthy, meaning that it is not functioning correctly.
AWS.ES.AlertingDegraded Boolean

Indicates whether the alerting system is degraded, meaning that it is not performing optimally.

AWS.ES.AlertingIndexExists Boolean

Checks if the alerting index exists.

AWS.ES.AlertingIndexStatus.green Boolean

Indicates that the alerting index is in a green status, meaning that all of its primary and replica shards are allocated to nodes.

AWS.ES.AlertingIndexStatus.red Boolean

Indicates that the alerting index is in a red status, meaning that at least one primary shard and its replicas are not allocated to a node.

AWS.ES.AlertingIndexStatus.yellow Boolean

Indicates that the alerting index is in a yellow status, meaning that at least one replica shard is not allocated to a node.

AWS.ES.AlertingNodesNotOnSchedule Count

The number of alerting nodes that are not on schedule.

AWS.ES.AlertingNodesOnSchedule Count

The number of alerting nodes that are on schedule.

AWS.ES.AlertingScheduledJobEnabled Boolean

Indicates whether the scheduled job for alerting is enabled.

AWS.ES.AsynchronousSearchCancelled Count

The number of asynchronous search requests that were canceled.

AWS.ES.AsynchronousSearchCompletionRate Percent (%)

The rate at which asynchronous search requests are completed successfully.

AWS.ES.AsynchronousSearchFailureRate Percent (%)

Indicates the rate at which asynchronous search requests fail.

AWS.ES.AsynchronousSearchInitializedRate Count

The rate at which asynchronous search requests are initialized.

AWS.ES.AsynchronousSearchPersistFailedRate Percent (%) The rate at which attempts to persist asynchronous search results fail.
AWS.ES.AsynchronousSearchPersistRate Percent (%) The rate at which asynchronous search results are successfully persisted.
AWS.ES.AsynchronousSearchRejected Count The number of asynchronous search requests that are rejected due to various reasons, such as exceeding resource limits.
AWS.ES.AsynchronousSearchRunningCurrent Count The current number of asynchronous search requests that are running.
AWS.ES.AsynchronousSearchSubmissionRate Count The rate at which asynchronous search requests are submitted.
AWS.ES.AsyncQueryCancelApiFailedRequestCusErrCount Count The number of failed asynchronous query cancel API requests due to customer errors.
AWS.ES.AsyncQueryCancelApiFailedRequestSysErrCount Count The number of failed asynchronous query cancel API requests due to system errors.
AWS.ES.AsyncQueryCancelApiRequestCount Count The total number of asynchronous query cancel API requests.
AWS.ES.AsyncQueryCreateApiFailedRequestCusErrCount Count The number of failed asynchronous query create API requests due to customer errors.
AWS.ES.AsyncQueryCreateApiFailedRequestSysErrCount Count The number of failed asynchronous query create API requests due to system errors.
AWS.ES.AsyncQueryCreateApiRequestCount Count The total number of asynchronous query create API requests.
AWS.ES.AsyncQueryGetApiFailedRequestCusErrCount Count The number of failed asynchronous query get API requests due to customer errors.
AWS.ES.AsyncQueryGetApiFailedRequestSysErrCount Count The number of failed asynchronous query get API requests due to system errors.
AWS.ES.AsyncQueryGetApiRequestCount Count The total number of asynchronous query get API requests.
AWS.ES.AutomatedSnapshotFailure Count The number of automated snapshot failures in the OpenSearch Service. Use it to identify issues with the automated snapshot process, such as network problems, insufficient storage, or high CPU utilization.
AWS.ES.AvgPointInTimeAliveTime milliseconds (ms) Average alive time for point-in-time search requests.
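AWS.ES.BurstBalance Percent (%) The percentage of input/output (I/O) credits remaining in the burst bucket for the domain's Amazon EBS volumes.
AWS.ES.ClusterIndexWritesBlocked Count Indicates whether the cluster is blocking incoming write requests. A value of 0 means the cluster is accepting write requests; a value of 1 means it is blocking them.
AWS.ES.ClusterStatus.green Status Indicator Indicates if the cluster status is green, meaning that all primary and replica shards are allocated to nodes.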
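AWS.ES.ClusterStatus.red Status Indicator Indicates if the cluster status is red, meaning that at least one primary shard and its replicas are not allocated to a node.
AWS.ES.ClusterStatus.yellow Status Indicator Indicates if the cluster status is yellow, meaning that all primary shards are allocated but at least one replica shard is not allocated to a node.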
AWS.ES.ClusterUsedSpace Bytes The total space used by the cluster.
AWS.ES.ColdStorageSpaceUtilization Percent (%) The percentage of cold storage space utilized.
AWS.ES.ColdToWarmMigrationFailureCount Count The number of cold to warm migration failures.
AWS.ES.ColdToWarmMigrationLatency milliseconds (ms) The latency of cold to warm migrations.
AWS.ES.ColdToWarmMigrationQueueSize Count The size of the cold to warm migration queue.
AWS.ES.ColdToWarmMigrationSuccessCount Count The number of successful cold to warm migrations.
AWS.ES.ConcurrentSearchLatency milliseconds (ms) The latency of concurrent search requests.
AWS.ES.ConcurrentSearchRate Count per second The rate of concurrent search requests.
AWS.ES.CoordinatingWriteRejected Count The number of coordinating write requests rejected.
AWS.ES.CPUCreditBalance Credits The remaining CPU credit balance.
AWS.ES.CPUUtilization Percent (%) The percentage of CPU utilization.
AWS.ES.DeletedDocuments Count The total number of deleted documents.
AWS.ES.DiskQueueDepth Count The number of pending input/output (I/O) requests for the EBS volume.
AWS.ES.ESReportingFailedRequestSysErrCount Count The number of failed OpenSearch reporting requests due to system errors.
AWS.ES.ESReportingFailedRequestUserErrCount Count The number of failed OpenSearch reporting requests due to user errors.
AWS.ES.ESReportingRequestCount Count The total number of OpenSearch reporting requests.
AWS.ES.ESReportingSuccessCount Count The number of successful OpenSearch reporting requests.
AWS.ES.FreeStorageSpace bytes The amount of free storage space available.
AWS.ES.HasActivePointInTime Boolean Indicates if there is an active point-in-time search.
AWS.ES.HasUsedPointInTime Boolean Indicates if a point-in-time search has been used.
AWS.ES.HotStorageSpaceUtilization Percent (%) The percentage of hot storage space utilized.
AWS.ES.HotToWarmMigrationFailureCount Count The number of hot to warm migration failures.
AWS.ES.HotToWarmMigrationForceMergeLatency milliseconds (ms) The latency of force merge operations during hot to warm migrations.
AWS.ES.HotToWarmMigrationProcessingLatency milliseconds (ms) The processing latency of hot to warm migrations.
AWS.ES.HotToWarmMigrationQueueSize Count Number of migration tasks from hot to warm storage currently in the queue.
AWS.ES.HotToWarmMigrationSnapshotLatency milliseconds (ms) Time taken to create a snapshot during hot to warm migration.
AWS.ES.HotToWarmMigrationSuccessCount Count Total number of successful hot to warm migrations.
AWS.ES.HotToWarmMigrationSuccessLatency milliseconds (ms) Time taken for a successful hot to warm migration.
AWS.ES.IndexingLatency milliseconds (ms) Time taken to index documents.
AWS.ES.IndexingRate Count per second Number of documents indexed per second.
AWS.ES.InFlightFetches Count Number of fetch operations currently in progress.
AWS.ES.InvalidHostHeaderRequests Count Number of requests with an invalid host header.
AWS.ES.IopsThrottle Count Number of IO operations throttled due to exceeding provisioned IOPS limits.
AWS.ES.JVMGCOldCollectionCount Count Number of old generation garbage collection events in the JVM.
AWS.ES.JVMGCOldCollectionTime milliseconds (ms) Time spent in old generation garbage collection in the JVM.
AWS.ES.JVMGCYoungCollectionCount Count Number of young generation garbage collection events in the JVM.
AWS.ES.JVMGCYoungCollectionTime milliseconds (ms) Time spent in young generation garbage collection in the JVM.
AWS.ES.JVMMemoryPressure Percent (%) JVM memory pressure, expressed as the percentage of the Java heap in use.
AWS.ES.KMSKeyError Count Number of errors encountered while accessing KMS keys.
AWS.ES.KMSKeyInaccessible Count Number of times KMS keys were found to be inaccessible.
AWS.ES.KNNCacheCapacityReached Count Number of times the KNN cache capacity was reached.
AWS.ES.KNNCircuitBreakerTriggered Count Number of times the KNN circuit breaker was triggered.
AWS.ES.KNNEvictionCount Count Number of evictions from the KNN cache.
AWS.ES.KNNFaissInitialized Count Number of times FAISS index was initialized for KNN.
AWS.ES.KNNGraphIndexErrors Count Number of errors encountered while building KNN graph indexes.
AWS.ES.KNNGraphIndexRequests Count Number of requests for building KNN graph indexes.
AWS.ES.KNNGraphMemoryUsage bytes Memory usage of the KNN graph indexes.
AWS.ES.KNNGraphMemoryUsagePercentage Percent (%) Memory usage of the KNN graph indexes expressed as a percentage of the total available memory.
AWS.ES.KNNGraphQueryErrors Count Number of errors encountered during KNN graph queries.
AWS.ES.KNNGraphQueryRequests Count Number of KNN graph queries made.
AWS.ES.KNNHitCount Count The number of k-NN (k-Nearest Neighbors) cache hits, that is, queries for a graph that is already loaded into memory.
AWS.ES.KNNLoadExceptionCount Count The number of exceptions encountered while loading k-NN graphs into the cache.
AWS.ES.KNNLoadSuccessCount Count The number of times k-NN graphs were successfully loaded into the cache.
AWS.ES.KNNLuceneInitialized Boolean Indicates whether the Lucene engine for k-NN is initialized.
AWS.ES.KNNMissCount Count The number of k-NN cache misses, that is, queries for a graph that is not yet loaded into memory.
AWS.ES.KNNNmslibInitialized Boolean Indicates whether the NMSLIB engine for k-NN is initialized.
AWS.ES.KNNQueryRequests Count The number of k-NN (k-Nearest Neighbors) query requests.
AWS.ES.KNNScriptCompilationErrors Count The number of errors encountered during the compilation of k-NN scripts.
AWS.ES.KNNScriptCompilations Count The number of k-NN script compilations.
AWS.ES.KNNScriptQueryErrors Count The number of errors encountered during k-NN script queries.
AWS.ES.KNNScriptQueryRequests Count The number of k-NN script query requests.
AWS.ES.KNNTotalLoadTime milliseconds (ms) The total time taken to load k-NN graphs into the cache.
AWS.ES.KNNTrainingErrors Count The number of errors encountered during k-NN model training.
AWS.ES.KNNTrainingMemoryUsage bytes The memory usage during k-NN (k-Nearest Neighbors) model training.
AWS.ES.KNNTrainingMemoryUsagePercentage Percent (%) The percentage of memory used during k-NN model training.
AWS.ES.KNNTrainingRequests Count The number of k-NN model training requests.
AWS.ES.LTRFeatureMemoryUsageInBytes bytes The memory usage of Learning to Rank (LTR) features.
AWS.ES.LTRFeaturesetMemoryUsageInBytes bytes The memory usage of Learning to Rank (LTR) feature sets.
AWS.ES.LTRModelMemoryUsageInBytes bytes The memory usage of Learning to Rank (LTR) models.
AWS.ES.LTRPluginUnhealthy Boolean Indicates whether the Learning to Rank (LTR) plugin is unhealthy.
AWS.ES.LTRRequestErrorCount Count The number of errors encountered during Learning to Rank (LTR) requests.
AWS.ES.LTRRequestTotalCount Count The total number of Learning to Rank (LTR) requests.
AWS.ES.LTRStatus.red Boolean Indicates whether one of the indexes required by the Learning to Rank (LTR) plugin is in a red status, meaning that at least one primary shard and its replicas are not allocated to a node.
AWS.ES.MasterCPUCreditBalance Count The number of CPU credits that the dedicated master nodes (burstable instances) have accrued.
AWS.ES.MasterCPUUtilization Percent (%) The percentage of CPU resources used by the dedicated master nodes.
AWS.ES.MasterJVMMemoryPressure Percent (%) The percentage of the Java heap used by the dedicated master nodes.
AWS.ES.MasterOldGenJVMMemoryPressure Percent (%) The memory pressure in the old generation memory pool of the Java heap on the dedicated master nodes.
AWS.ES.MasterReachableFromNode Boolean Indicates whether the master node is reachable from other nodes in the cluster.
AWS.ES.MasterSysMemoryUtilization Percent (%) The percentage of system memory utilization on the master node.
AWS.ES.MaxProvisionedThroughput Megabytes per second The maximum provisioned throughput for the cluster.
AWS.ES.MlCircuitBreakerTriggerCount Count The number of times the machine learning circuit breaker has been triggered.
AWS.ES.MLCommonsPluginUnhealthy Boolean Indicates whether the ML Commons plugin is unhealthy.
AWS.ES.MlConnectorCount Count The number of machine learning connectors.
AWS.ES.MlConnectorIndexStatus.red Boolean The health status of the machine learning connector index. A red status means that at least one primary shard and its replicas are not allocated to a node.
AWS.ES.MlConnectorIndexStatusIndexExists Boolean Indicates whether the machine learning connector index exists.
AWS.ES.MlDeployedModelCount Count The number of deployed machine learning models.
AWS.ES.MlExecutingTaskCount Count The number of executing machine learning tasks.
AWS.ES.MlFailureCount Count The number of failures encountered during machine learning tasks.
AWS.ES.MlModelCount Count The total number of machine learning models.
AWS.ES.MlModelIndexStatus.red Boolean The health status of the machine learning model index. A red status means that at least one primary shard and its replicas are not allocated to a node.
AWS.ES.MlModelIndexStatusIndexExists Boolean Indicates whether the machine learning model index exists.
AWS.ES.MlRequestCount Count The number of machine learning requests.
AWS.ES.MlTaskIndexStatus.red Boolean The health status of the machine learning task index. A red status means that at least one primary shard and its replicas are not allocated to a node.
AWS.ES.MlTaskIndexStatusIndexExists Boolean Indicates whether the machine learning task index exists.
AWS.ES.Nodes Count The total number of nodes in the OpenSearch Service cluster.
AWS.ES.OldGenJVMMemoryPressure Percent (%) The memory pressure in the old generation memory pool of the Java heap.
AWS.ES.OpenSearchDashboardsConcurrentConnections Count The number of concurrent connections to OpenSearch Dashboards.
AWS.ES.OpenSearchDashboardsHealthyNodes Count The number of healthy OpenSearch Dashboards nodes.
AWS.ES.OpenSearchDashboardsHeapTotal bytes The total heap memory allocated for OpenSearch Dashboards.
AWS.ES.OpenSearchDashboardsHeapUsed bytes The amount of heap memory currently used by OpenSearch Dashboards.
AWS.ES.OpenSearchDashboardsHeapUtilization Percent (%) The percentage of heap memory used out of the total allocated heap memory for OpenSearch Dashboards.
AWS.ES.OpenSearchDashboardsIndexMigrationFailed Count The number of failed index migrations in OpenSearch Dashboards.
AWS.ES.OpenSearchDashboardsOS1MinuteLoad Count The 1-minute load average on the operating system running OpenSearch Dashboards.
AWS.ES.OpensearchDashboardsReportingFailedRequestSysErrCount Count The number of failed reporting requests due to system errors in OpenSearch Dashboards.
AWS.ES.OpensearchDashboardsReportingFailedRequestUserErrCount Count The number of failed reporting requests due to user errors in OpenSearch Dashboards.
AWS.ES.OpensearchDashboardsReportingRequestCount Count The total number of reporting requests in OpenSearch Dashboards.
AWS.ES.OpensearchDashboardsReportingSuccessCount Count The number of successful reporting requests in OpenSearch Dashboards.
AWS.ES.OpenSearchDashboardsRequestTotal Count The total number of requests made to OpenSearch Dashboards.
AWS.ES.OpenSearchDashboardsResponseTimesMaxInMillis milliseconds (ms) The maximum response time for requests to OpenSearch Dashboards.
AWS.ES.OpenSearchRequests Count The total number of requests made to the OpenSearch cluster.
AWS.ES.PPLFailedRequestCountByCusErr Count The number of failed Piped Processing Language (PPL) requests due to customer errors in OpenSearch.
AWS.ES.PPLFailedRequestCountBySysErr Count The number of failed Piped Processing Language (PPL) requests due to system errors in OpenSearch.
AWS.ES.PPLRequestCount Count The total number of Piped Processing Language (PPL) requests in OpenSearch.
AWS.ES.PrimaryWriteRejected Count The number of primary write requests that were rejected in OpenSearch due to resource constraints.
AWS.ES.ReadIOPS Count per second The average number of read input/output operations per second (IOPS) in OpenSearch.
AWS.ES.ReadIOPSMicroBursting Count per second The number of read IOPS micro-bursting events in OpenSearch, indicating short periods of high read IOPS activity.
AWS.ES.ReadLatency milliseconds (ms) The average time taken to complete read operations in OpenSearch.
AWS.ES.ReadThroughput Bytes per second The average number of bytes read from disk per second in OpenSearch.
AWS.ES.ReadThroughputMicroBursting Count The number of read throughput micro-bursting events in OpenSearch, indicating short periods of high read throughput activity.
AWS.ES.RemoteStorageUsedSpace bytes The amount of space used in remote storage by OpenSearch.
AWS.ES.RemoteStorageWriteRejected Count The number of write requests to remote storage that were rejected in OpenSearch due to resource constraints.
AWS.ES.ReplicationNumBootstrappingIndices Count The number of indices in the bootstrapping phase during replication in OpenSearch.
AWS.ES.ReplicationNumFailedIndices Count The number of indices that have failed during replication in OpenSearch.
AWS.ES.ReplicationNumPausedIndices Count The number of indices that have paused during replication in OpenSearch.
AWS.ES.ReplicationNumSyncingIndices Count The number of indices currently syncing during replication in OpenSearch.
AWS.ES.ReplicaWriteRejected Count The number of replica write requests that were rejected in OpenSearch due to resource constraints.
AWS.ES.SearchableDocuments Count The total number of documents that are searchable in the OpenSearch cluster.
AWS.ES.SearchLatency milliseconds (ms) The average time taken to complete search operations in OpenSearch.
AWS.ES.SearchRate Count per second The number of search requests per second in OpenSearch.
AWS.ES.SearchShardTaskCancelled Count The number of search shard tasks that were canceled in OpenSearch.
AWS.ES.SearchTaskCancelled Count The number of search tasks that were canceled in OpenSearch.
AWS.ES.SegmentCount Count The total number of segments in the OpenSearch index.
AWS.ES.Shards.active Count The total number of active primary and replica shards in the OpenSearch cluster.
AWS.ES.Shards.activePrimary Count The total number of active primary shards in the OpenSearch cluster.
AWS.ES.Shards.delayedUnassigned Count The number of shards whose node allocation has been delayed by the timeout settings in OpenSearch.
AWS.ES.Shards.initializing Count The number of shards that are currently in the initializing state in OpenSearch.
AWS.ES.Shards.relocating Count The number of shards that are currently being relocated to different nodes in OpenSearch.
AWS.ES.Shards.unassigned Count The number of shards that are not allocated to any nodes in the OpenSearch cluster.
AWS.ES.SQLDefaultCursorRequestCount Count The total number of SQL default cursor requests in OpenSearch.
AWS.ES.SQLFailedRequestCountByCusErr Count The number of failed SQL requests due to customer errors in OpenSearch.
AWS.ES.SQLFailedRequestCountBySysErr Count The number of failed SQL requests due to system errors in OpenSearch.
AWS.ES.SQLRequestCount Count The total number of SQL requests in OpenSearch.
AWS.ES.SQLUnhealthy Count The number of unhealthy SQL instances in OpenSearch.
AWS.ES.SysMemoryUtilization Percent (%) The percentage of system memory utilized by OpenSearch.
AWS.ES.ThreadCount Count The total number of threads in use by OpenSearch.
AWS.ES.ThreadpoolBulkQueue Count The number of bulk requests waiting in the queue in OpenSearch.
AWS.ES.ThreadpoolBulkRejected Count The number of bulk requests that were rejected due to the thread pool being full in OpenSearch.
AWS.ES.ThreadpoolBulkThreads Count The number of threads in the bulk thread pool in OpenSearch.
AWS.ES.ThreadpoolForce_mergeQueue Count The number of force merge requests waiting in the queue in OpenSearch.
AWS.ES.ThreadpoolForce_mergeRejected Count The number of force merge requests that were rejected due to the thread pool being full in OpenSearch.
AWS.ES.ThreadpoolForce_mergeThreads Count The number of threads in the force merge thread pool in OpenSearch.
AWS.ES.ThreadpoolIndexQueue Count The number of index requests waiting in the queue in OpenSearch.
AWS.ES.ThreadpoolIndexRejected Count The number of index requests that were rejected due to the thread pool being full in OpenSearch.
AWS.ES.ThreadpoolIndexSearcherQueue Count The number of index searcher requests waiting in the queue in OpenSearch.
AWS.ES.ThreadpoolIndexSearcherRejected Count The number of index searcher requests that were rejected due to the thread pool being full in OpenSearch.
AWS.ES.ThreadpoolIndexSearcherThreads Count The number of threads in the index searcher thread pool in OpenSearch.
AWS.ES.ThreadpoolIndexThreads Count The number of threads in the index thread pool in OpenSearch.
AWS.ES.ThreadpoolSearchQueue Count The number of search requests waiting in the queue in OpenSearch.
AWS.ES.ThreadpoolSearchRejected Count The number of search requests that were rejected due to the thread pool being full in OpenSearch.
AWS.ES.ThreadpoolSearchThreads Count The number of threads in the search thread pool in OpenSearch.
AWS.ES.ThreadpoolsqlWorkerQueue Count The number of SQL worker requests waiting in the queue in OpenSearch.
AWS.ES.ThreadpoolsqlWorkerRejected Count The number of SQL worker requests that were rejected due to the thread pool being full in OpenSearch.
AWS.ES.ThreadpoolsqlWorkerThreads Count The number of threads in the SQL worker thread pool in OpenSearch.
AWS.ES.ThreadpoolWriteQueue Count The number of write requests waiting in the queue in OpenSearch.
AWS.ES.ThreadpoolWriteRejected Count The number of write requests that were rejected due to the thread pool being full in OpenSearch.
AWS.ES.ThreadpoolWriteThreads Count The number of threads in the write thread pool in OpenSearch.
AWS.ES.ThroughputThrottle Count The number of times throughput was throttled in OpenSearch.
AWS.ES.TLSNegotiationError Count The number of TLS negotiation errors encountered in OpenSearch.
AWS.ES.WarmCPUUtilization Percent (%) The percentage of CPU utilization for warm nodes in OpenSearch.
AWS.ES.WarmFreeStorageSpace bytes The amount of free storage space available in warm nodes in OpenSearch.
AWS.ES.WarmJVMGCOldCollectionCount Count The number of old generation garbage collection events in the JVM for warm nodes in OpenSearch.
AWS.ES.WarmJVMGCYoungCollectionCount Count The number of young generation garbage collection events in the JVM for warm nodes in OpenSearch.
AWS.ES.WarmJVMGCYoungCollectionTime milliseconds (ms) The total time spent on young generation garbage collection in the JVM for warm nodes in OpenSearch.
AWS.ES.WarmJVMMemoryPressure Percent (%) The percentage of JVM memory pressure for warm nodes in OpenSearch, indicating the overall heap usage including young and old pools.
AWS.ES.WarmNodes Count The number of warm nodes in the OpenSearch cluster.
AWS.ES.WarmOldGenJVMMemoryPressure Percent (%) The percentage of old generation JVM memory pressure for warm nodes in OpenSearch, indicating the usage of the old generation memory pool.
AWS.ES.WarmSearchLatency milliseconds (ms) The average time taken to complete search operations in warm nodes of OpenSearch.
AWS.ES.WarmSearchRate Count per second The number of search requests per second in warm nodes of OpenSearch.
AWS.ES.WarmSearchableDocuments Count The total number of documents that are searchable in warm nodes of OpenSearch.
AWS.ES.WarmStorageSpaceUtilization Percent (%) The percentage of storage space utilized in warm nodes of OpenSearch.
AWS.ES.WarmSysMemoryUtilization Percent (%) The percentage of system memory utilized by warm nodes in OpenSearch.
AWS.ES.WarmThreadpoolSearchQueue Count The number of search requests waiting in the queue in warm nodes of OpenSearch.
AWS.ES.WarmThreadpoolSearchRejected Count The number of search requests that were rejected due to the thread pool being full in warm nodes of OpenSearch.
AWS.ES.WarmThreadpoolSearchThreads Count The number of threads in the search thread pool in warm nodes of OpenSearch.
AWS.ES.WarmToColdMigrationFailureCount Count The number of failed migrations from warm to cold nodes in OpenSearch.
AWS.ES.WarmToColdMigrationLatency milliseconds (ms) The average time taken to migrate data from warm to cold nodes in OpenSearch.
AWS.ES.WarmToColdMigrationQueueSize Count The number of migration tasks from warm to cold nodes that are waiting in the queue in OpenSearch.
AWS.ES.WarmToColdMigrationSuccessCount Count The number of successful migrations from warm to cold nodes in OpenSearch.
AWS.ES.WarmToHotMigrationQueueSize Count The number of migration tasks from warm to hot nodes that are waiting in the queue in OpenSearch.
AWS.ES.WriteIOPS Count per second The average number of write input/output operations per second (IOPS) in OpenSearch.
AWS.ES.WriteIOPSMicroBursting Count per second The number of write IOPS micro-bursting events in OpenSearch, indicating short periods of high write IOPS activity.
AWS.ES.WriteLatency milliseconds (ms) The average time taken to complete write operations in OpenSearch.
AWS.ES.WriteThroughput bps The average number of bytes written to disk per second in OpenSearch.
AWS.ES.WriteThroughputMicroBursting bps The number of write throughput micro-bursting events in OpenSearch, indicating short periods of high write throughput activity.
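Comparing the two warm-to-cold migration counters can show whether data tiering is keeping up. The following Python sketch computes a success percentage from the AWS.ES.WarmToColdMigrationSuccessCount and AWS.ES.WarmToColdMigrationFailureCount values for the same time interval. The function name is hypothetical, and the sketch assumes that every finished migration is counted as either a success or a failure (tasks still waiting in the queue are not included).

```python
from typing import Optional


def migration_success_percent(success_count: float, failure_count: float) -> Optional[float]:
    """Percentage of finished warm-to-cold migrations that succeeded.

    success_count: AWS.ES.WarmToColdMigrationSuccessCount for the interval
    failure_count: AWS.ES.WarmToColdMigrationFailureCount for the same interval
    Returns None when no migration finished in the interval.
    """
    finished = success_count + failure_count
    if finished == 0:
        return None  # nothing finished, so a percentage would be meaningless
    return 100.0 * success_count / finished


# Hypothetical values: 9 successes and 1 failure in the same interval.
print(migration_success_percent(9, 1))  # 90.0
```

A falling percentage alongside a growing AWS.ES.WarmToColdMigrationQueueSize may be worth investigating.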

OpenSearch Ingestion Pipeline

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of OpenSearch Ingestion Pipeline entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsopensearchingestionpipeline.

AWS.OSIS.computeUnits Count

The number of Ingestion OpenSearch Compute Units (Ingestion OCUs) in use by a pipeline.

AWS.OSIS.jvm.memory.committed.value bytes

The amount of memory that is committed for use by the Java virtual machine (JVM).

AWS.OSIS.jvm.memory.max.value bytes

The maximum amount of memory that the JVM can use for memory management.

AWS.OSIS.jvm.memory.used.value bytes

The total amount of memory used by the JVM.

AWS.OSIS.log-pipeline.BlockingBuffer.bufferUsage.value Percent (%)

Percent usage of the buffer_size based on the number of records in the buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.checkpointTimeElapsed.count Count

A count of data points recorded while checkpointing.

AWS.OSIS.log-pipeline.BlockingBuffer.checkpointTimeElapsed.max milliseconds (ms) The maximum time elapsed while checkpointing.
AWS.OSIS.log-pipeline.BlockingBuffer.checkpointTimeElapsed.sum milliseconds (ms)

The total time elapsed while checkpointing.

AWS.OSIS.log-pipeline.BlockingBuffer.readTimeElapsed.count Count

A count of data points recorded while reading from a buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.readTimeElapsed.max milliseconds (ms) The maximum time elapsed while reading from a buffer.
AWS.OSIS.log-pipeline.BlockingBuffer.readTimeElapsed.sum milliseconds (ms)

The total time elapsed while reading from a buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.recordsInBuffer.value Count

The number of records currently in a buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.recordsInFlight.value Count

The number of unchecked records read from a buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.recordsRead.count Count

The number of records read from a buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.recordsWriteFailed.count Count

The number of records that the pipeline failed to write to the sink.

AWS.OSIS.log-pipeline.BlockingBuffer.recordsWritten.count Count

The number of records written to a buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeElapsed.count Count

A count of data points recorded while writing to a buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeElapsed.max milliseconds (ms) The maximum time elapsed while writing to a buffer.
AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeElapsed.sum milliseconds (ms)

The total time elapsed while writing to a buffer.

AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeouts.count Count

The count of write timeouts to a buffer.

AWS.OSIS.log-pipeline.date.recordsIn.count Count The ingress of records to a pipeline component.
AWS.OSIS.log-pipeline.date.recordsOut.count Count The egress of records from a pipeline component.
AWS.OSIS.log-pipeline.date.timeElapsed.count Count A count of data points recorded during execution of a pipeline component.
AWS.OSIS.log-pipeline.date.timeElapsed.max milliseconds (ms) The maximum time elapsed during execution of a pipeline component.
AWS.OSIS.log-pipeline.date.timeElapsed.sum milliseconds (ms) The total time elapsed during execution of a pipeline component.
AWS.OSIS.log-pipeline.http.AuthFailure.count Count The number of failed Signature V4 requests to the pipeline.
AWS.OSIS.log-pipeline.http.AuthServerError.count Count The number of Signature V4 requests to the pipeline that returned server errors.
AWS.OSIS.log-pipeline.http.AuthSuccess.count Count The number of successful Signature V4 requests to the pipeline.
AWS.OSIS.log-pipeline.opensearch.recordsIn.count Count The ingress of records to a pipeline component.
AWS.OSIS.log-pipeline.recordsProcessed.count Count The number of records read from a buffer and processed by a pipeline.
AWS.OSIS.system.cpu.count.value Count The total amount of CPU usage for all data nodes.
AWS.OSIS.system.cpu.usage.value Percent (%) The percentage of available CPU usage for all data nodes.
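Several OpenSearch Ingestion metrics come in .count, .max, and .sum variants. Dividing a .sum value by the matching .count value for the same interval gives the average time per recorded data point. The following is a minimal Python sketch, assuming you have already read both values for the same interval (for example, from the Metrics Explorer); the function name is hypothetical.

```python
from typing import Optional


def average_elapsed_ms(sum_ms: float, count: float) -> Optional[float]:
    """Average elapsed time per recorded data point, in milliseconds.

    sum_ms: a *.sum metric, such as
            AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeElapsed.sum
    count:  the matching *.count metric for the same interval, such as
            AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeElapsed.count
    Returns None when no data points were recorded.
    """
    if count == 0:
        return None  # avoid dividing by zero for idle intervals
    return sum_ms / count


# Hypothetical values: 1200 ms spent across 4 recorded writes.
print(average_elapsed_ms(1200, 4))  # 300.0
```

The same pattern applies to the checkpointTimeElapsed, readTimeElapsed, and date.timeElapsed pairs.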

RDS

Metric Units Description
AWS.RDS.AbortedClients Count

The number of connections that were aborted because the client died and didn't correctly close the connection.

AWS.RDS.ActiveTransactions Count

The number of active transactions in the database.

AWS.RDS.ACUUtilization Percent (%) The percentage of Aurora Capacity Units (ACUs) utilized by the Aurora Serverless v2 cluster.
AWS.RDS.AuroraBinlogReplicaLag seconds (s) The amount of lag in seconds for binary log replication between the primary instance and the replica.
AWS.RDS.AuroraEstimatedSharedMemoryBytes bytes The estimated amount of shared memory used by the Aurora MySQL database.
AWS.RDS.AuroraGlobalDBDataTransferBytes bytes The amount of redo log data transferred from the source AWS Region to a secondary AWS Region in an Aurora Global Database.
AWS.RDS.AuroraGlobalDBProgressLag milliseconds (ms) The measure of how far the secondary cluster is behind the primary cluster for both user transactions and system transactions in an Aurora Global Database.
AWS.RDS.AuroraGlobalDBReplicatedWriteIO Count The number of write I/O operations replicated from the primary AWS Region to the cluster volume in a secondary AWS Region in an Aurora Global Database.
AWS.RDS.AuroraGlobalDBRPOLag milliseconds (ms) The recovery point objective (RPO) lag time, measuring how far the secondary cluster is behind the primary cluster for user transactions in an Aurora Global Database.
AWS.RDS.AuroraOptimizedReadsCacheHitRatio Percent (%) The percentage of read operations that are served from the cache in an Aurora database.
AWS.RDS.AuroraReplicaLag milliseconds (ms) The amount of lag in milliseconds for replication between the primary instance and the Aurora replica.
AWS.RDS.AuroraReplicaLagMaximum milliseconds (ms) The maximum amount of lag in milliseconds for replication between the primary instance and the Aurora replica.
AWS.RDS.AuroraReplicaLagMinimum milliseconds (ms) The minimum amount of lag in milliseconds for replication between the primary instance and the Aurora replica.
AWS.RDS.AuroraSlowConnectionHandleCount Count The number of slow connection handles in Aurora.
AWS.RDS.AuroraSlowHandshakeCount Count The number of slow handshakes in Aurora.
AWS.RDS.AuroraVolumeBytesLeftTotal bytes The remaining available space for the cluster volume in Aurora.
AWS.RDS.BacktrackChangeRecordsCreationRate Count per minute

The number of backtrack change records created over a specified period for your Aurora DB cluster.

AWS.RDS.BacktrackChangeRecordsStored Count

The actual number of backtrack change records stored by your Aurora DB cluster.

AWS.RDS.BacktrackWindowActual minutes

The actual amount of time you can backtrack your Aurora DB cluster, which can be smaller than the target backtrack window.

AWS.RDS.BacktrackWindowAlert Count

The number of times the actual backtrack window is smaller than the target backtrack window for a given period.

AWS.RDS.BackupRetentionPeriodStorageUsed bytes

The amount of storage used by automated backups that are retained for the backup retention period.

AWS.RDS.BinLogDiskUsage bytes

BinLogDiskUsage. The average amount of disk space occupied by binary logs.

AWS.RDS.BlockedTransactions Count

The number of transactions that are blocked due to row-level locks in the database.

AWS.RDS.BufferCacheHitRatio Percent (%)

The percentage of requests that are served from the buffer cache, indicating the efficiency of the cache.

AWS.RDS.BurstBalance Percent (%)

BurstBalance. The average percent of General Purpose SSD (gp2) burst-bucket I/O credits available.

AWS.RDS.CheckpointLag seconds (s) The amount of time since the most recent checkpoint.
AWS.RDS.CommitLatency microseconds

The cumulative commit latency, measured as the time between when a client submits a commit request and when it receives the commit acknowledgment.

AWS.RDS.CommitThroughput Count per second

The number of commit operations per second in the database.

AWS.RDS.ConnectionAttempts Count

The number of attempts to connect to an instance, whether successful or not.

AWS.RDS.CPUCreditBalance Count

CPUCreditBalance. The average number of earned CPU credits that an instance has accrued since it was launched or started.

AWS.RDS.CPUCreditUsage Count

CPUCreditUsage. The average number of CPU credits spent by the instance for CPU utilization.

AWS.RDS.CPUSurplusCreditBalance Count

The number of surplus CPU credits spent to sustain CPU utilization when the CPUCreditBalance value is zero.

AWS.RDS.CPUSurplusCreditsCharged Count

The number of surplus CPU credits exceeding the maximum number of CPU credits that can be earned in a 24-hour period, which incurs an additional charge.

AWS.RDS.CPUUtilization Percent (%)

CPUUtilization. The average percentage of CPU utilization.

AWS.RDS.DatabaseConnections Count

DatabaseConnections. The total number of client network connections to the database instance.

AWS.RDS.DBLoad Average Active Sessions (AAS)

Measures the level of session activity in your database, representing the activity of the DB instance in average active sessions.

AWS.RDS.DBLoadCPU Average Active Sessions (AAS)

The number of active sessions where the wait event type is CPU.

AWS.RDS.DBLoadNonCPU Average Active Sessions (AAS)

The number of active sessions where the wait event type is not CPU.

AWS.RDS.DDLLatency milliseconds (ms)

The average time taken to complete Data Definition Language (DDL) operations in the database.

AWS.RDS.DDLThroughput Count per second

The number of Data Definition Language (DDL) operations per second in the database.

AWS.RDS.Deadlocks Count

The number of deadlock events detected in the database. A deadlock occurs when two or more processes each hold a resource that another process needs, so each waits for the others to release it and none can proceed.

AWS.RDS.DeleteLatency milliseconds (ms)

The average time taken to complete delete operations in the database.

AWS.RDS.DeleteThroughput Count per second

The number of delete operations per second in the database.

AWS.RDS.DiskQueueDepth Count

DiskQueueDepth. The average number of outstanding I/Os (read/write requests) waiting to access the disk.

AWS.RDS.DiskQueueDepthLogVolume Count The number of input and output (I/O) requests that were submitted by the application but haven't been sent to the storage device yet.
AWS.RDS.DMLLatency milliseconds (ms)

The average time taken to complete Data Manipulation Language (DML) operations in the database.

AWS.RDS.DMLThroughput Count per second

The number of Data Manipulation Language (DML) operations per second in the database.

AWS.RDS.EBSByteBalance Percent (%)

The percentage of throughput credits remaining in the burst bucket of your RDS database.

AWS.RDS.EBSIOBalance Percent (%)

The percentage of I/O credits remaining in the burst bucket of your RDS database.

AWS.RDS.EngineUptime seconds (s)

The number of seconds since the last time a DB instance was started.

AWS.RDS.FailedSQLServerAgentJobsCount Count

The number of SQL Server Agent jobs that have failed.

AWS.RDS.ForwardingMasterDMLLatency milliseconds (ms)

The average response time of forwarded DML statements on the master DB instance.

AWS.RDS.ForwardingMasterDMLThroughput Count per second

The number of forwarded DML statements processed each second by the master DB instance.

AWS.RDS.ForwardingMasterOpenSessions Count

The number of open sessions on the master DB instance processing forwarded queries.

AWS.RDS.ForwardingReplicaDMLLatency milliseconds (ms)

The average response time in milliseconds of forwarded DML statements on the replica DB instance.

AWS.RDS.ForwardingReplicaDMLThroughput Count per second

The number of forwarded DML (Data Manipulation Language) statements processed each second by the replica DB instance.

AWS.RDS.ForwardingReplicaOpenSessions Count

The number of open sessions on the replica DB instance that are processing forwarded queries.

AWS.RDS.ForwardingReplicaReadWaitLatency milliseconds (ms)

The average wait time in milliseconds that the replica waits to be consistent with the Log Sequence Number (LSN) of the writer DB instance.

AWS.RDS.ForwardingReplicaReadWaitThroughput Count per second

The total number of SELECT statements processed each second in all sessions that are forwarding writes.

AWS.RDS.ForwardingReplicaSelectLatency milliseconds (ms)

The average response time in milliseconds of forwarded SELECT statements on the replica.

AWS.RDS.ForwardingReplicaSelectThroughput Count per second

The number of forwarded SELECT statements processed each second by the replica DB instance.

AWS.RDS.ForwardingWriterDMLLatency milliseconds (ms)

The average time to process each forwarded DML statement on the writer DB instance. It doesn't include the time for the DB cluster to forward the write request or the time to replicate changes back to the writer.

AWS.RDS.ForwardingWriterDMLThroughput Count per second

The number of forwarded DML statements processed each second by the writer DB instance.

AWS.RDS.ForwardingWriterOpenSessions Count

The number of forwarded sessions on the writer DB instance.

AWS.RDS.FreeEphemeralStorage bytes The amount of free ephemeral storage available in the RDS instance.
AWS.RDS.FreeableMemory bytes

FreeableMemory. The average amount of available random access memory.

AWS.RDS.FreeLocalStorage bytes

The amount of free local storage available in the RDS instance.

AWS.RDS.FreeStorageSpace bytes

FreeStorageSpace. The average amount of available storage space.

AWS.RDS.FreeStorageSpaceLogVolume bytes The amount of free storage space available in the log volume of the RDS instance.
AWS.RDS.InsertLatency microseconds

The average time taken to complete insert operations in the database.

AWS.RDS.InsertThroughput Count per second

The number of insert operations per second in the database.

AWS.RDS.LoginFailures Count

The number of failed login attempts to the database.

AWS.RDS.LVMReadIOPS Count

The average number of read input/output operations per second (IOPS) for the logical volume manager (LVM) in the RDS instance.

AWS.RDS.LVMWriteIOPS Count

The average number of write input/output operations per second (IOPS) for the logical volume manager (LVM) in the RDS instance.

AWS.RDS.MaximumUsedTransactionIDs Count

MaximumUsedTransactionIDs. The maximum transaction IDs that have been used. This metric applies to PostgreSQL.

AWS.RDS.NetworkReceiveThroughput bps

NetworkReceiveThroughput. The average incoming (receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.

AWS.RDS.NetworkThroughput bps

The average number of bytes transmitted and received over the network per second in the RDS instance.

AWS.RDS.NetworkTransmitThroughput bps

NetworkTransmitThroughput. The average outgoing (transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.

AWS.RDS.NumBinaryLogFiles Count

The number of binary log files on the RDS instance.

AWS.RDS.OldestReplicationSlotLag bytes

OldestReplicationSlotLag. The average lagging size of the replica lagging the most in terms of write-ahead log (WAL) data received. This metric applies to PostgreSQL.

AWS.RDS.Queries Count

The number of queries executed on the RDS instance.

AWS.RDS.RDSToAuroraPostgreSQLReplicaLag milliseconds (ms)

The amount of lag in milliseconds for replication from an RDS for PostgreSQL DB instance to an Aurora PostgreSQL DB cluster.

AWS.RDS.ReadIOPS Count per second

ReadIOPS. The average number of disk read I/O operations per second.

AWS.RDS.ReadIOPSEphemeralStorage Count per second The average number of read input/output operations per second (IOPS) for ephemeral storage in the RDS instance.
AWS.RDS.ReadIOPSLogVolume Count per second The average number of read input/output operations per second (IOPS) for the log volume in the RDS instance.
AWS.RDS.ReadLatency seconds (s)

ReadLatency. The average amount of time taken per disk read I/O operation.

AWS.RDS.ReadLatencyEphemeralStorage milliseconds (ms) The average time taken to complete read operations on ephemeral storage in the RDS instance.
AWS.RDS.ReadLatencyLogVolume milliseconds (ms) The average time taken to complete read operations on the log volume in the RDS instance.
AWS.RDS.ReadThroughput bps

ReadThroughput. The average number of bytes read from disk per second.

AWS.RDS.ReadThroughputLogVolume bps The average number of bytes read from the log volume per second in the RDS instance.
AWS.RDS.ReplicaLag seconds (s)

ReplicaLag. For read replica configurations, the average amount of time a read replica DB instance lags behind the source DB instance.

AWS.RDS.ReplicationChannelLag seconds (s) The amount of lag in seconds for replication between the primary instance and the replica in the RDS instance.
AWS.RDS.ReplicationSlotDiskUsage bytes

ReplicationSlotDiskUsage. The average disk space used by replication slot files. This metric applies to PostgreSQL.

AWS.RDS.ResultSetCacheHitRatio Percent (%)

The percentage of read operations that are served from the result set cache in the RDS instance.

AWS.RDS.RollbackSegmentHistoryListLength Count

The length of the undo log or rollback segment history list, which contains the before images of database records used during transaction rollbacks or to provide a consistent read view for long-running transactions.

AWS.RDS.RowLockTime milliseconds (ms)

The average time spent waiting for row locks in the RDS instance.

AWS.RDS.SelectLatency milliseconds (ms)

The average time taken to complete select operations in the database.

AWS.RDS.SelectThroughput Count per second

The number of select operations per second in the database.

AWS.RDS.ServerlessDatabaseCapacity Aurora Capacity Units (ACUs)

The capacity of the Aurora Serverless v2 database, measured in Aurora Capacity Units (ACUs).

AWS.RDS.SnapshotStorageUsed bytes

The amount of storage used by snapshots in the RDS instance.

AWS.RDS.StorageNetworkReceiveThroughput bps

The amount of network throughput received from the storage subsystem by the RDS instance.

AWS.RDS.StorageNetworkThroughput bps

The total network throughput for storage operations in the RDS instance.

AWS.RDS.StorageNetworkTransmitThroughput bps

The amount of network throughput sent to the storage subsystem by the RDS instance.

AWS.RDS.SumBinaryLogSize bytes

The total size of all binary log files on the RDS instance.

AWS.RDS.SwapUsage bytes

SwapUsage. The average amount of swap space used on the DB instance. This metric is not available for SQL Server.

AWS.RDS.TempStorageIOPS Count per second The average number of input/output operations per second (IOPS) for temporary storage in the RDS instance.
AWS.RDS.TempStorageThroughput bps The average number of bytes read from or written to temporary storage per second in the RDS instance.
AWS.RDS.TotalBackupStorageBilled bytes

The total amount of backup storage billed for the RDS instance.

AWS.RDS.TransactionLogsDiskUsage bytes

TransactionLogsDiskUsage. The average disk space used by transaction logs. This metric applies to PostgreSQL.

AWS.RDS.TransactionLogsGeneration bps

TransactionLogsGeneration. The average size of transaction logs generated per second. This metric applies to PostgreSQL.

AWS.RDS.UpdateLatency milliseconds (ms)

The average time taken to complete update operations in the database.

AWS.RDS.UpdateThroughput Count per second

The number of update operations per second in the database.

AWS.RDS.VolumeBytesUsed bytes

The amount of storage space used by the volume in the RDS instance.

AWS.RDS.VolumeReadIOPs Count per minute

The average number of read input/output operations for the volume in the RDS instance.

AWS.RDS.VolumeWriteIOPs Count per minute

The average number of write input/output operations for the volume in the RDS instance.

AWS.RDS.WriteIOPS Count per second

WriteIOPS. The average number of disk write I/O operations per second.

AWS.RDS.WriteIOPSEphemeralStorage Count per second The average number of write input/output operations per second (IOPS) for ephemeral storage in the RDS instance.
AWS.RDS.WriteIOPSLogVolume Count per second The average number of write input/output operations per second (IOPS) for the log volume in the RDS instance.
AWS.RDS.WriteLatency seconds (s)

WriteLatency. The average amount of time taken per disk write I/O operation.

AWS.RDS.WriteLatencyEphemeralStorage milliseconds (ms) The average time taken to complete write operations on ephemeral storage in the RDS instance.
AWS.RDS.WriteLatencyLogVolume seconds (s) The average time taken to complete write operations on the log volume in the RDS instance.
AWS.RDS.WriteThroughput bps

WriteThroughput. The average number of bytes written to disk per second.

AWS.RDS.WriteThroughputEphemeralStorage bps The average number of bytes written to ephemeral storage per second in the RDS instance.
AWS.RDS.WriteThroughputLogVolume bps The average number of bytes written to the log volume per second in the RDS instance.
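The RDS latency metrics in this table are reported in different units: ReadLatency and WriteLatency use seconds, most other latency metrics use milliseconds, and CommitLatency and InsertLatency use microseconds. Before comparing them, it can help to normalize everything to one unit. The following Python sketch converts to milliseconds; the unit strings mirror the labels in the table, and the helper name and lookup table are hypothetical.

```python
# Factors that convert the latency units used in the RDS table to milliseconds.
# The keys mirror the unit labels in the table; the mapping itself is illustrative.
MS_PER_UNIT = {
    "seconds (s)": 1000.0,
    "milliseconds (ms)": 1.0,
    "microseconds": 0.001,
}


def to_milliseconds(value: float, unit: str) -> float:
    """Convert a latency value reported in one of the table's units to milliseconds."""
    if unit not in MS_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * MS_PER_UNIT[unit]


# Hypothetical readings: ReadLatency (seconds) vs. ReadLatencyEphemeralStorage (ms).
read_latency_ms = to_milliseconds(0.004, "seconds (s)")
ephemeral_latency_ms = to_milliseconds(1.5, "milliseconds (ms)")
print(f"{read_latency_ms:.1f} ms vs {ephemeral_latency_ms:.1f} ms")
```

Raising an error for an unknown unit, rather than silently passing the value through, avoids comparing numbers that are off by a factor of 1000.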

Redshift Cluster

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Redshift entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsredshift.

AWS.Redshift.CommitQueueLength Count The number of transactions waiting to commit at a given point in time.
AWS.Redshift.ConcurrencyScalingActiveClusters Count The number of concurrency scaling clusters that are actively processing queries at any given time.
AWS.Redshift.ConcurrencyScalingSecond Count The number of seconds used by concurrency scaling clusters that have active query processing activity.

AWS.Redshift.CPUUtilization Percent (%) The percentage of CPU utilization.
AWS.Redshift.DatabaseConnections Count The number of database connections to a cluster.
AWS.Redshift.HealthStatus Boolean Indicates the health of the cluster. Every minute the cluster connects to its database and performs a simple query. If it is able to perform this operation successfully, the cluster is considered healthy. Otherwise, the cluster is unhealthy.
AWS.Redshift.MaintenanceMode Boolean Indicates whether the cluster is in maintenance mode.

AWS.Redshift.MaxConfiguredConcurrencyScalingClusters Count The maximum number of concurrency scaling clusters allowed when concurrency scaling is enabled.

AWS.Redshift.NetworkReceiveThroughput bps The rate at which the node or cluster receives data.
AWS.Redshift.NetworkTransmitThroughput bps The rate at which the node or cluster transmits data.

AWS.Redshift.NumExceededSchemaQuotas Count The number of times schema quotas have been exceeded in the Redshift cluster.

AWS.Redshift.PercentageDiskSpaceUsed Percent (%) The percentage of disk space used.

AWS.Redshift.PercentageQuotaUsed Percent (%) The percentage of the quota that has been used in the Redshift cluster.
AWS.Redshift.QueriesCompletedPerSecond Count per second The number of queries completed per second in the Redshift cluster.
AWS.Redshift.QueryDuration milliseconds (ms) The average amount of time taken to complete a query in the Redshift cluster.
AWS.Redshift.QueryRuntimeBreakdown milliseconds (ms) The breakdown of query runtime into various stages such as planning, waiting, and execution.

AWS.Redshift.ReadIOPS Count per second The average number of disk read operations per second.
AWS.Redshift.ReadLatency seconds (s) The average amount of time taken for disk read I/O operations.
AWS.Redshift.ReadThroughput bps The average number of bytes read from disk per second.

AWS.Redshift.RedshiftManagedStorageTotalCapacity bytes The total capacity of managed storage available in the Redshift cluster.
AWS.Redshift.SchemaQuota Megabytes The amount of disk space that a schema can use in the Redshift cluster.
AWS.Redshift.StorageUsed bytes The amount of storage space used by the Redshift cluster.
AWS.Redshift.TotalTableCount Count The total number of tables in the Redshift cluster.
AWS.Redshift.UsageLimitAvailable Count The amount of usage limit available for the specified feature in the Redshift cluster.
AWS.Redshift.UsageLimitConsumed Count The amount of usage limit consumed for the specified feature in the Redshift cluster.
AWS.Redshift.WLMQueriesCompletedPerSecond Count per second The number of queries completed per second in the workload management (WLM) queues of the Redshift cluster.
AWS.Redshift.WLMQueryDuration milliseconds (ms) The average amount of time taken to complete a query in the workload management (WLM) queues of the Redshift cluster.
AWS.Redshift.WLMQueueLength Count The number of queries waiting in the workload management (WLM) queues of the Redshift cluster.
AWS.Redshift.WLMQueueWaitTime milliseconds (ms) The amount of time queries wait in the workload management (WLM) queue before being processed.
AWS.Redshift.WLMRunningQueries Count The number of queries currently running in the workload management (WLM) queue.

AWS.Redshift.WriteIOPS Count per second The average number of disk write operations per second.
AWS.Redshift.WriteLatency seconds (s) The average amount of time taken for disk write I/O operations.
AWS.Redshift.WriteThroughput bps The average number of bytes written to disk per second.
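Because AWS.Redshift.HealthStatus is evaluated every minute, the share of healthy samples over a window gives a rough availability figure for the cluster. The following Python sketch assumes the Boolean is exported as 1 (healthy) or 0 (unhealthy), one sample per minute; the function name is hypothetical.

```python
from typing import Optional, Sequence


def healthy_percent(samples: Sequence[int]) -> Optional[float]:
    """Share of AWS.Redshift.HealthStatus samples that reported a healthy cluster.

    Assumes one sample per minute, with 1 meaning healthy and 0 meaning unhealthy.
    Returns None for an empty window.
    """
    if not samples:
        return None
    healthy = sum(1 for sample in samples if sample == 1)
    return 100.0 * healthy / len(samples)


# Hypothetical one-hour window with three unhealthy minutes.
window = [1] * 57 + [0] * 3
print(f"{healthy_percent(window):.1f}%")  # 95.0%
```

Gaps in the data are not counted as unhealthy here; if missing samples should count against availability, add them to the window as 0.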

Route 53

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Route53 entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsroute53.

AWS.Route53.ChildHealthCheckHealthyCount Count For a calculated health check, the number of health checks that are healthy among the health checks that Amazon Route 53 is monitoring.
AWS.Route53.ConnectionTime milliseconds (ms) The average time, in milliseconds, that it took Amazon Route 53 health checkers to establish a TCP connection with the endpoint.

AWS.Route53.DNSQueries Count The number of DNS queries received by Amazon Route 53.

AWS.Route53.HealthCheckPercentageHealthy Percent (%) The percentage of Amazon Route 53 health checkers that consider the selected endpoint to be healthy.
AWS.Route53.HealthCheckStatus Boolean The status of the health check endpoint that CloudWatch is checking: 1 indicates healthy, and 0 indicates unhealthy.
AWS.Route53.SSLHandshakeTime milliseconds (ms) The average time, in milliseconds, that it took Amazon Route 53 health checkers to complete the SSL handshake.
AWS.Route53.TimeToFirstByte milliseconds (ms) The average time, in milliseconds, that it took Amazon Route 53 health checkers to receive the first byte of the response to an HTTP or HTTPS request.
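For a calculated health check, Route 53 considers the parent check healthy when at least the configured number of child health checks (the HealthThreshold setting in Route 53) are healthy. The following Python sketch applies that rule to an AWS.Route53.ChildHealthCheckHealthyCount value. The threshold comes from your Route 53 configuration, not from SolarWinds Observability, and the function name is hypothetical.

```python
def calculated_check_is_healthy(healthy_children: int, health_threshold: int) -> bool:
    """Apply the calculated-health-check rule to a ChildHealthCheckHealthyCount value.

    healthy_children: AWS.Route53.ChildHealthCheckHealthyCount
    health_threshold: the HealthThreshold configured on the calculated check in Route 53
    """
    if health_threshold < 0:
        raise ValueError("health_threshold must not be negative")
    # The parent check is healthy when at least `health_threshold` children are healthy.
    return healthy_children >= health_threshold


# Hypothetical setup: the calculated check needs 2 of its child checks to be healthy.
print(calculated_check_is_healthy(3, 2))  # True
print(calculated_check_is_healthy(1, 2))  # False
```

Alerting when ChildHealthCheckHealthyCount drops to the threshold, rather than below it, gives some warning before the calculated check turns unhealthy.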

S3

Metric Units Description
AWS.S3.4xxErrors Count

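4xxErrors. The number of requests made to an Amazon S3 bucket that returned an HTTP 4xx client error status code. Each request is recorded with a value of either 0 or 1, so the Sum shows the error count and the Average shows the error rate.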

AWS.S3.5xxErrors Count

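5xxErrors. The number of requests made to an Amazon S3 bucket that returned an HTTP 5xx server error status code. Each request is recorded with a value of either 0 or 1, so the Sum shows the error count and the Average shows the error rate.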

AWS.S3.AllRequests Count

AllRequests. The total number of HTTP requests made to an Amazon S3 bucket, regardless of type.

AWS.S3.BucketSizeBytes bytes

BucketSizeBytes. The amount of data that is stored in a bucket, in bytes.

AWS.S3.BytesDownloaded bytes

BytesDownloaded. The number of bytes downloaded for requests made to an Amazon S3 bucket where the response includes a body.

AWS.S3.BytesUploaded bytes

BytesUploaded. The number of bytes uploaded for requests made to an Amazon S3 bucket where the request includes a body.

AWS.S3.DeleteRequests Count

The number of HTTP DELETE requests made for objects in a bucket.

AWS.S3.FirstByteLatency milliseconds (ms)

FirstByteLatency. The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned.

AWS.S3.GetRequests Count

GetRequests. The number of HTTP GET requests made for objects in an Amazon S3 bucket. This doesn't include list operations.

AWS.S3.HeadRequests Count

The number of HTTP HEAD requests made to a bucket.

AWS.S3.InvokedLambda Count The number of times an AWS Lambda function is invoked by Amazon S3.
AWS.S3.LambdaResponse4xx Count The number of 4xx (client error) responses returned by AWS Lambda functions invoked by Amazon S3.
AWS.S3.LambdaResponse5xx Count The number of 5xx (server error) responses returned by AWS Lambda functions invoked by Amazon S3.
AWS.S3.LambdaResponseRequests Count The number of requests made to AWS Lambda functions invoked by Amazon S3.
AWS.S3.ListRequests Count

The number of HTTP requests that list the contents of a bucket.

AWS.S3.NumberOfObjects Count

NumberOfObjects. The total number of objects stored in a bucket for all storage classes. This value is calculated by counting all objects in the bucket (both current and noncurrent objects) and the total number of parts for all incomplete multipart uploads to the bucket.

AWS.S3.PostRequests Count

PostRequests. The number of HTTP POST requests made to an Amazon S3 bucket.

AWS.S3.ProxiedRequests Count The number of requests that are proxied through Amazon S3.
AWS.S3.PutRequests Count

PutRequests. The number of HTTP PUT requests made for objects in an Amazon S3 bucket.

AWS.S3.ReplicationLatency seconds (s)

The maximum number of seconds by which the replication destination bucket is behind the source bucket for a given replication rule.

AWS.S3.SelectBytesReturned bytes

The amount of data returned with Select requests from S3 Standard storage.

AWS.S3.SelectBytesScanned bytes

The amount of data scanned with Select requests from S3 Standard storage.

AWS.S3.SelectRequests Count

The number of requests made to Amazon S3 Select.

AWS.S3.TotalRequestLatency milliseconds (ms)

TotalRequestLatency. The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This metric includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency.
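TotalRequestLatency covers the whole exchange, while FirstByteLatency stops once the response starts. Subtracting one from the other therefore gives a rough estimate of the time spent transferring request and response bodies. The following is a minimal Python sketch. It assumes both values describe the same request, or are averages over the same set of requests. The function name `body_transfer_ms` is hypothetical and is not part of SolarWinds Observability or AWS.

```python
def body_transfer_ms(total_request_latency_ms: float, first_byte_latency_ms: float) -> float:
    """Estimate time spent receiving the request body and sending the response body.

    TotalRequestLatency runs from the first request byte received to the last
    response byte sent. FirstByteLatency runs from the complete request being
    received to the first response byte. The difference is the body-transfer time.
    """
    if first_byte_latency_ms > total_request_latency_ms:
        raise ValueError("FirstByteLatency should not exceed TotalRequestLatency")
    return total_request_latency_ms - first_byte_latency_ms


# Hypothetical values for a single GET request.
print(body_transfer_ms(120.0, 35.0))  # 85.0
```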

SNS

Metric Units Description
AWS.SNS.NotificationSuccessRate Percent (%)

The percentage of successfully delivered notifications out of the total notifications attempted.

AWS.SNS.NumberOfMessagesPublished Count

NumberOfMessagesPublished. The average number of messages published to Amazon SNS topics.

AWS.SNS.NumberOfNotificationsDelivered Count

NumberOfNotificationsDelivered. The average number of messages successfully delivered from Amazon SNS topics to subscribing endpoints.

AWS.SNS.NumberOfNotificationsFailed Count

NumberOfNotificationsFailed. The average number of messages that Amazon SNS failed to deliver.

AWS.SNS.NumberOfNotificationsFailedToRedriveToDlq Count

NumberOfNotificationsFailedToRedriveToDlq. The average number of messages that couldn't be moved to a dead-letter queue.

AWS.SNS.NumberOfNotificationsFilteredOut Count

NumberOfNotificationsFilteredOut. The average number of messages that were rejected by subscription filter policies. A filter policy rejects a message when the message attributes don't match the policy attributes.

AWS.SNS.NumberOfNotificationsFilteredOut-InvalidAttributes Count

NumberOfNotificationsFilteredOut-InvalidAttributes. The average number of messages that were rejected by subscription filter policies because the messages' attributes are invalid.

AWS.SNS.NumberOfNotificationsFilteredOut-InvalidMessageBody Count The number of notifications filtered out due to invalid message body content.
AWS.SNS.NumberOfNotificationsFilteredOut-MessageAttributes Count The number of notifications filtered out due to message attributes not matching the filter policy.
AWS.SNS.NumberOfNotificationsFilteredOut-MessageBody Count The number of notifications filtered out due to message body content not matching the filter policy.
AWS.SNS.NumberOfNotificationsFilteredOut-NoMessageAttributes Count

NumberOfNotificationsFilteredOut-NoMessageAttributes. The average number of messages that were rejected by subscription filter policies because the messages have no attributes.

AWS.SNS.NumberOfNotificationsRedrivenToDlq Count

NumberOfNotificationsRedrivenToDlq. The average number of messages that have been moved to a dead-letter queue.

AWS.SNS.PublishSize bytes

PublishSize. The average size of messages published.

AWS.SNS.SMSMonthToDateSpentUSD USD

The total amount of money spent on SMS messages for the current month.

AWS.SNS.SMSSuccessRate Percent (%) The percentage of successfully delivered SMS messages out of the total SMS messages attempted.
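The NotificationSuccessRate description implies a simple ratio. The minimal Python sketch below makes several assumptions. It treats the attempted notifications for a period as NumberOfNotificationsDelivered plus NumberOfNotificationsFailed, with both values read using the same aggregation (for example, Sum) over the same period. It also treats filtered-out messages as not attempted. It illustrates the ratio and is not necessarily how AWS computes the reported metric. The function name is hypothetical.

```python
from typing import Optional


def notification_success_rate(delivered: float, failed: float) -> Optional[float]:
    """Return the percentage of attempted notifications that were delivered.

    Assumes attempted = delivered + failed. Returns None when nothing was
    attempted, rather than dividing by zero.
    """
    attempted = delivered + failed
    if attempted == 0:
        return None
    return 100.0 * delivered / attempted


print(notification_success_rate(980, 20))  # 98.0
```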

SQS

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Simple Queue Service (SQS) entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awssqs.

AWS.SQS.ApproximateAgeOfOldestMessage seconds (s)

The approximate age of the oldest non-deleted message in the queue.

AWS.SQS.ApproximateNumberOfMessagesDelayed Count

The number of messages in the queue that are delayed and not available for reading immediately.

AWS.SQS.ApproximateNumberOfMessagesNotVisible Count

The number of messages that are "in flight." Messages are considered in flight if they have been sent to a client but have not yet been deleted or have not yet reached the end of their visibility window.

AWS.SQS.ApproximateNumberOfMessagesVisible Count

The number of messages available for retrieval from the queue.

AWS.SQS.NumberOfEmptyReceives Count

The number of ReceiveMessage API calls that did not return a message.

AWS.SQS.NumberOfMessagesDeleted Count

The number of messages deleted from the queue.

AWS.SQS.NumberOfMessagesReceived Count

The number of messages returned by calls to the ReceiveMessage API action.

AWS.SQS.NumberOfMessagesSent Count

The number of messages added to a queue.

AWS.SQS.SentMessageSize bytes

The size of messages added to a queue.

Transfer Family

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Transfer Family entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awstransferfamily.

AWS.Transfer.BytesIn Count

The total number of bytes received by the server.

AWS.Transfer.BytesOut Count

The total number of bytes sent by the server.

AWS.Transfer.FilesIn Count

The total number of files received by the server.

AWS.Transfer.FilesOut Count

The total number of files sent from the server.

AWS.Transfer.InboundMessage Count The total number of AS2 messages successfully received from a trading partner. This metric is emitted as soon as the inbound message has finished processing successfully.
AWS.Transfer.InboundFailedMessage Count The total number of AS2 messages that were unsuccessfully received from a trading partner. This means a trading partner sent a message, but the Transfer Family server was not able to successfully process it.
AWS.Transfer.OnUploadExecutionsStarted Count

The total number of workflow executions started on the server.

AWS.Transfer.OnUploadExecutionsSuccess Count

The total number of successful workflow executions on the server.

AWS.Transfer.OnUploadExecutionsFailed Count

The total number of unsuccessful workflow executions on the server.

AWS.Transfer.OnPartialUploadExecutionsStarted Count

The total number of on-partial-upload workflow executions started on the server.

AWS.Transfer.OnPartialUploadExecutionsSuccess Count

The total number of successful on-partial-upload workflow executions on the server.

AWS.Transfer.OnPartialUploadExecutionsFailed Count

The total number of unsuccessful on-partial-upload workflow executions on the server.
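The started, success, and failed workflow counters can be combined into a failure percentage. The minimal Python sketch below assumes that every finished execution is counted as either a success or a failure. It also assumes that an execution started near the end of a time window may finish in the next window, so started minus finished is only an approximate count of executions still running. The `WorkflowCounts` class is hypothetical. The same arithmetic applies to the OnPartialUpload* metrics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowCounts:
    started: int  # e.g. AWS.Transfer.OnUploadExecutionsStarted
    success: int  # e.g. AWS.Transfer.OnUploadExecutionsSuccess
    failed: int   # e.g. AWS.Transfer.OnUploadExecutionsFailed

    def failure_percent(self) -> Optional[float]:
        """Failed executions as a percentage of executions that finished."""
        finished = self.success + self.failed
        if finished == 0:
            return None
        return 100.0 * self.failed / finished

    def unfinished(self) -> int:
        """Executions started in the window with no outcome recorded in the same window."""
        return max(self.started - self.success - self.failed, 0)


counts = WorkflowCounts(started=50, success=45, failed=3)
print(counts.failure_percent(), counts.unfinished())  # 6.25 2
```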

Transit Gateway

Metric Units Description
AWS.TransitGateway.BytesDropCountBlackhole Count

BytesDropCountBlackhole. The total number of bytes dropped because they matched a blackhole route.

AWS.TransitGateway.BytesDropCountNoRoute Count

BytesDropCountNoRoute. The total number of bytes dropped because they did not match a route.

AWS.TransitGateway.BytesDropPercentage Percent (%)

The percentage of bytes dropped by the transit gateway due to various reasons such as blackhole routes or no matching routes.

AWS.TransitGateway.BytesIn Count

BytesIn. The total number of bytes received by the transit gateway.

AWS.TransitGateway.BytesOut Count

BytesOut. The total number of bytes sent from the transit gateway.

AWS.TransitGateway.PacketDropCountBlackhole Count

PacketDropCountBlackhole. The total number of packets dropped because they matched a blackhole route.

AWS.TransitGateway.PacketDropCountNoRoute Count

PacketDropCountNoRoute. The total number of packets dropped because they did not match a route.

AWS.TransitGateway.PacketsDropPercentage Percent (%)

The percentage of packets dropped by the transit gateway due to various reasons such as blackhole routes, no matching routes, or TTL expiration.

AWS.TransitGateway.PacketsIn Count

PacketsIn. The total number of packets received by the transit gateway.

AWS.TransitGateway.PacketsOut Count

PacketsOut. The total number of packets sent by the transit gateway.
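The drop percentage metrics can be approximated from the drop counters and the inbound traffic counters. The minimal Python sketch below assumes the denominator is BytesIn (or PacketsIn) for the same period. This is an assumption, not the documented AWS formula. PacketsDropPercentage also counts drops caused by TTL expiration, which none of the counters listed here capture, so the sketch can undercount packet drops. The function name is hypothetical.

```python
from typing import Optional


def drop_percentage(drop_blackhole: float, drop_no_route: float, total_in: float) -> Optional[float]:
    """Percentage of inbound traffic dropped for blackhole or no-route reasons.

    Works for bytes (BytesDropCount* over BytesIn) or packets
    (PacketDropCount* over PacketsIn), as long as all three values share a
    unit and a time period.
    """
    if total_in == 0:
        return None
    return 100.0 * (drop_blackhole + drop_no_route) / total_in


print(drop_percentage(1_000, 500, 100_000))  # 1.5
```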

VPN

Metric Units Description
AWS.VPN.TunnelDataIn bytes

TunnelDataIn. The total bytes received on the AWS side of the connection through the VPN tunnel from a customer gateway.

AWS.VPN.TunnelDataOut bytes

TunnelDataOut. The total bytes sent from the AWS side of the connection through the VPN tunnel to the customer gateway. Each metric data point represents the number of bytes sent after the previous data point.

AWS.VPN.TunnelState Count

TunnelState. The average state of the tunnels. For static VPNs, 0 indicates DOWN and 1 indicates UP.
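Because TunnelState is an average of per-tunnel values, a result strictly between 0 and 1 means that at least one tunnel is DOWN. The minimal Python sketch below shows that interpretation for static VPNs. It assumes each tunnel reports exactly 0 (DOWN) or 1 (UP). The helper names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def average_tunnel_state(states: Sequence[int]) -> float:
    """Average of per-tunnel states, where 1 = UP and 0 = DOWN (static VPNs)."""
    if not states:
        raise ValueError("no tunnel states supplied")
    if any(s not in (0, 1) for s in states):
        raise ValueError("tunnel states must be 0 (DOWN) or 1 (UP)")
    return float(mean(states))


def describe(avg: float) -> str:
    """Translate an averaged TunnelState value into a readable status."""
    if avg == 1.0:
        return "all tunnels UP"
    if avg == 0.0:
        return "all tunnels DOWN"
    return "at least one tunnel DOWN"


avg = average_tunnel_state([1, 0])
print(avg, describe(avg))  # 0.5 at least one tunnel DOWN
```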

Infrastructure/Azure metrics

Metrics for Azure entities are collected by integrating SolarWinds Observability SaaS with your Azure cloud account. See Azure cloud platform monitoring.

API Management Service

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of API Management Service entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azureapimanagementservice.

azure.api.management.BackendDuration milliseconds (ms) Duration of backend requests.
azure.api.management.Capacity Percent (%)

Utilization metric for the API Management service.

For SKUs other than Premium, the Max aggregation shows the value as 0.

azure.api.management.ConnectionAttempts Count Count of WebSocket connection attempts based on selected source and destination.
azure.api.management.Duration milliseconds (ms) Overall duration of gateway requests.
azure.api.management.EventHubDroppedEvents Count Number of events skipped because the queue size limit was reached.
azure.api.management.EventHubRejectedEvents Count Number of rejected EventHub events (wrong configuration or unauthorized).
azure.api.management.EventHubSuccessfulEvents Count Number of successful EventHub events.
azure.api.management.EventHubThrottledEvents Count Number of throttled EventHub events.
azure.api.management.EventHubTimedoutEvents Count Number of timed out EventHub events.
azure.api.management.EventHubTotalBytesSent bytes Total size of EventHub events.
azure.api.management.EventHubTotalEvents Count Number of events sent to EventHub.
azure.api.management.EventHubTotalFailedEvents Count Number of failed EventHub events.
azure.api.management.NetworkConnectivity Count Network connectivity status of dependent resource types from the API Management service.
azure.api.management.Requests Count Gateway request metrics with multiple dimensions.
azure.api.management.WebSocketMessages Count Count of WebSocket messages based on selected source and destination.

App Service

Metric Units Description
azure.sites.app_connections Count

The average number of connections established by an application in Azure App Service.

azure.sites.app_domains Count

Total App Domains. The average number of AppDomains loaded in this application.

azure.sites.app_domains.unloaded Count

The number of application domains that have been unloaded in an Azure App Service environment, which can be useful for monitoring app lifecycle events.

azure.sites.collections.gen1 Count

The number of garbage collection events for Generation 1 objects in an Azure App Service instance.

azure.sites.collections.gen2 Count

The number of garbage collection events for Generation 2 objects in an Azure App Service instance.

azure.sites.cpu_time seconds (s)

CPU Time. The total amount of CPU consumed by the app, in seconds.

azure.sites.current_assemblies Count

The number of assemblies currently loaded across all application domains in an Azure App Service instance.

azure.sites.function_executions Count

The total number of function executions in an Azure Functions app, providing insight into function activity and usage.

azure.sites.handles Count

The number of open file handles in an Azure App Service environment. This metric helps monitor resource usage and potential file access issues.

azure.sites.http.101 Count

Tracks HTTP 101 responses, which indicate protocol switching (for example, upgrading from HTTP to WebSockets).

azure.sites.http.2xx Count

Http2xx. The total number of requests resulting in an HTTP status code greater than or equal to 200 but less than 300.

azure.sites.http.3xx Count

HTTP 3xx responses, which indicate redirection. These status codes signal that the requested resource has moved to a different location.

azure.sites.http.401 Count

HTTP 401 responses, which indicate unauthorized access. This occurs when authentication credentials are missing or invalid.

azure.sites.http.403 Count

HTTP 403 responses, which indicate forbidden access. This happens when a request is denied due to insufficient permissions or security restrictions.

azure.sites.http.404 Count

HTTP 404 responses, which indicate that the requested resource was not found. This can occur when a URL is incorrect or the resource has been removed.

azure.sites.http.406 Count

HTTP 406 responses, which indicate that the requested format is not acceptable. This happens when the server cannot provide content in the format specified by the request.

azure.sites.http.4xx Count

Http4xx. The total number of requests resulting in an HTTP status code greater than or equal to 400 but less than 500.

azure.sites.http.5xx Count

Http5xx. The total number of requests resulting in an HTTP status code greater than or equal to 500 but less than 600.

azure.sites.io.bytes_received bytes

Bytes Received. The total amount of incoming bandwidth consumed by the app.

azure.sites.io.bytes_sent bytes

Bytes Sent. The total amount of outgoing bandwidth consumed by the app.

azure.sites.io.other_bytes bps

The rate at which the app process issues bytes to I/O operations that do not involve data transfer, such as control operations.

azure.sites.io.other_ops Count per second

The rate at which the app process issues I/O operations that are not read or write operations.

azure.sites.io.read_bytes bps

IoReadBytesPerSecond. The number of bytes per second the app is reading from I/O operations.

azure.sites.io.read_ops Count per second

The rate at which the app process issues read I/O operations.

azure.sites.io.write_bytes bps

IO Write Bytes Per Second. The number of bytes per second the app is writing to I/O operations.

azure.sites.io.write_ops Count per second

The rate at which the app process issues write I/O operations.

azure.sites.memory.working_set bytes

Memory Working Set. The current amount of memory used by the app.

azure.sites.memory.working_set.avg Megabytes (MB)

Average Memory Working Set. The average amount of memory used by the app, in megabytes.

azure.sites.private_bytes bytes

The amount of memory allocated by the app process that cannot be shared with other processes. This includes allocated memory, local variables, heap memory, and other runtime data.

azure.sites.queued_requests Count

Requests In Application Queue. The average number of requests in the application request queue.

azure.sites.requests Count

Requests. The total number of requests regardless of their resulting HTTP status code.

azure.sites.response_time seconds (s)

Average Response Time. The average time taken for the app to serve requests, in seconds.

azure.sites.threads Count

Threads. The average number of threads currently active in the app process.
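The azure.sites.http.* metrics overlap. A range metric such as azure.sites.http.4xx covers every status code from 400 through 499, so a 404 response would presumably also be counted by azure.sites.http.404. The minimal Python sketch below shows that mapping. It assumes the ranges in the descriptions above and assumes the single-code metrics are subsets of the range metrics. The function name is hypothetical. Every request is also counted by azure.sites.requests, regardless of its status code.

```python
from typing import List

# Status codes that have their own metric in the App Service table.
SINGLE_CODE_METRICS = {101, 401, 403, 404, 406}


def metrics_for_status(code: int) -> List[str]:
    """Return the azure.sites.http.* metric names that would count this status code."""
    names: List[str] = []
    if code in SINGLE_CODE_METRICS:
        names.append(f"azure.sites.http.{code}")
    # Range metrics exist for 2xx through 5xx (for example, 4xx means 400 <= code < 500).
    if 200 <= code < 600:
        names.append(f"azure.sites.http.{code // 100}xx")
    return names


print(metrics_for_status(404))  # ['azure.sites.http.404', 'azure.sites.http.4xx']
print(metrics_for_status(101))  # ['azure.sites.http.101']
```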

Application Gateway

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Application Gateway entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azureapplicationgateway.

azure.applicationgateway.ApplicationGatewayTotalTime milliseconds (ms) Average time that it takes for a request to be processed and its response to be sent.
azure.applicationgateway.AvgRequestCountPerHealthyHost Count

Average request count per minute per healthy backend host in a pool.

azure.applicationgateway.BackendConnectTime milliseconds (ms) Time spent establishing a connection with a backend server.
azure.applicationgateway.BackendFirstByteResponseTime milliseconds (ms) Time interval between the start of establishing a connection to the backend server and receiving the first byte of the response header.
azure.applicationgateway.BackendLastByteResponseTime milliseconds (ms) Time interval between the start of establishing a connection to the backend server and receiving the last byte of the response body.
azure.applicationgateway.BackendResponseStatus Count The number of HTTP response codes generated by the backend members.
azure.applicationgateway.BlockedCount Count Web Application Firewall blocked requests rule distribution.
azure.applicationgateway.BytesReceived bytes The total number of bytes received by the Application Gateway from the clients.
azure.applicationgateway.BytesSent bytes The total number of bytes sent by the Application Gateway to the clients.
azure.applicationgateway.CapacityUnits Count Capacity Units consumed.
azure.applicationgateway.ClientRtt milliseconds (ms) Average round trip time between clients and Application Gateway.
azure.applicationgateway.ComputeUnits Count Compute Units consumed.
azure.applicationgateway.CpuUtilization Percent (%) Current CPU utilization of the Application Gateway.
azure.applicationgateway.CurrentConnections Count Count of current connections established with Application Gateway.
azure.applicationgateway.EstimatedBilledCapacityUnits Count Estimated capacity units that will be charged.
azure.applicationgateway.FailedRequests Count Count of failed requests that Application Gateway has served.
azure.applicationgateway.FixedBillableCapacityUnits Count Minimum capacity units that will be charged.
azure.applicationgateway.HealthyHostCount Count Number of healthy backend hosts.
azure.applicationgateway.MatchedCount Count Web Application Firewall Total Rule Distribution for the incoming traffic.
azure.applicationgateway.NewConnectionsPerSecond Count per second New connections per second established with Application Gateway.
azure.applicationgateway.ResponseStatus Count HTTP response status returned by Application Gateway.
azure.applicationgateway.Throughput bps Number of bytes per second the Application Gateway has served.
azure.applicationgateway.TlsProtocol Count The number of TLS and non-TLS requests initiated by the client that established connection with the Application Gateway.
azure.applicationgateway.TotalRequests Count Count of successful requests that Application Gateway has served.
azure.applicationgateway.UnhealthyHostCount Count Number of unhealthy backend hosts.
azure.applicationgateway.AzwafBotProtection Count Number of matched Bot Rules.
azure.applicationgateway.AzwafCustomRule Count Number of matched Custom Rules.
azure.applicationgateway.AzwafSecRule Count Number of matched Managed Rules.
azure.applicationgateway.AzwafTotalRequests Count Total number of requests evaluated by WAF.
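AvgRequestCountPerHealthyHost normalizes traffic by both time and healthy backend capacity. The rough Python sketch below applies the same idea. It assumes you have a request total for a time window and the HealthyHostCount for that window. TotalRequests counts only successful requests, so the result is an approximation and does not reproduce the Azure metric exactly. The function name is hypothetical.

```python
from typing import Optional


def requests_per_healthy_host_per_minute(
    total_requests: float, window_minutes: float, healthy_hosts: float
) -> Optional[float]:
    """Approximate average requests per minute handled by each healthy backend host."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    if healthy_hosts == 0:
        return None  # No healthy hosts: the ratio is undefined.
    return total_requests / window_minutes / healthy_hosts


# Hypothetical: 6,000 requests over 5 minutes with 4 healthy hosts.
print(requests_per_healthy_host_per_minute(6_000, 5, 4))  # 300.0
```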

Application Insights

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Application Insights entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azureapplicationinsights.

azure.application.insight.Availability Percent (%) Tracks the availability and responsiveness of an application by sending web requests at regular intervals. If the application isn't responding or the response time is too slow, alerts can be triggered.
azure.application.insight.Availability_tests Count Recurring web tests that monitor an application's availability from various locations worldwide. These tests help ensure uptime and detect performance issues.
azure.application.insight.Availability_test_duration milliseconds (ms) The duration of availability tests, helping assess response times and detect slow performance.
azure.application.insight.client_processing_time milliseconds (ms) Represents the time taken by the client to process a request before sending a response. This metric helps analyze client-side performance.
azure.application.insight.Receiving_response_time milliseconds (ms) Tracks the time taken to receive a response from the server after a request is sent. This metric helps monitor network latency and server responsiveness.
azure.application.insight.Send_request_time milliseconds (ms) Measures the time taken to send a request from the client to the server. This metric helps assess network performance and request transmission speed.
azure.application.insight.Browser_page_load_time milliseconds (ms) The time taken for a web page to fully load in the browser. This metric helps assess user experience and identify performance bottlenecks.
azure.application.insight.Dependency_calls Count The number of external service or database calls made by an application. This metric helps monitor interactions with dependencies like APIs, databases, and storage.
azure.application.insight.Dependency_duration milliseconds (ms) The time taken for a dependency call to complete, including connection time and response retrieval. This metric helps analyze performance and detect slow dependencies.
azure.application.insight.Dependency_call_failures Count The number of failed dependency calls, helping identify issues with external services or databases that impact application functionality.
azure.application.insight.Browser_exceptions Count Exceptions that occur in the browser, such as JavaScript errors. This metric helps diagnose client-side issues affecting user experience.
azure.application.insight.Exceptions Count The number of exceptions encountered by the application, including both client-side and server-side errors. This metric helps troubleshoot failures and improve application stability.
azure.application.insight.Server_exceptions Count Tracks exceptions that occur on the server side of an application. These exceptions can be correlated with failed requests and other events to diagnose issues efficiently.
azure.application.insight.Page_views Count The number of times a page is viewed in an application. This metric helps analyze user engagement and behavior.
azure.application.insight.Page_view_load_time milliseconds (ms) The time taken for a web page to fully load in the browser. This metric helps assess user experience and identify performance bottlenecks.
azure.application.insight.Exception_rate Count per second The rate of exceptions occurring in an application. This metric helps monitor application stability and detect potential issues.
azure.application.insight.Process_CPU Percent (%) The percentage of CPU usage by the application process. High values may indicate increased workload or performance bottlenecks.
azure.application.insight.Processor_time Percent (%) The total processor time consumed by the application. This metric helps monitor resource utilization and performance efficiency.
azure.application.insight.Process_private_bytes bytes Represents the amount of memory allocated by an application that cannot be shared with other processes. This metric helps monitor memory usage and detect potential performance issues.
azure.application.insight.HTTP_request_execution_time milliseconds (ms) The time taken to execute an HTTP request within an application. This metric helps assess request processing efficiency and identify bottlenecks.
azure.application.insight.HTTP_requests_in_application_queue Count The number of HTTP requests waiting in the application queue before being processed. A high value may indicate performance issues or resource constraints.
azure.application.insight.HTTP_request_rate Count per second The rate at which HTTP requests are received by the application. This metric helps monitor traffic patterns and detect potential spikes in demand.
azure.application.insight.Server_requests Count The number of requests received by the server, providing insights into application workload and performance.
azure.application.insight.Server_response_time milliseconds (ms) The time taken for the server to respond to incoming requests. This metric helps assess application responsiveness and detect slow performance.
azure.application.insight.Failed_requests Count The number of failed requests in an application. This metric helps diagnose errors, exceptions, and faults affecting application stability.
azure.application.insight.Server_request_rate Count per second The rate at which server requests are received by the application. This metric helps monitor workload and traffic patterns.
azure.application.insight.Traces Count per second Captures trace logs generated by an application, providing insights into debugging, performance monitoring, and distributed tracing.

Blob Storage

Metric Units Description
azure.storage.blob.availability Percent (%)

Availability. The average percentage of availability for the storage service or the specified API operation. Availability is calculated by taking the total billable requests value and dividing it by the number of applicable requests.

azure.storage.blob.blobs Count

BlobCount. The average number of blob objects stored in the storage account.

azure.storage.blob.capacity bytes

BlobCapacity. The average amount of blob storage used in the storage account.

azure.storage.blob.containers Count

ContainerCount. The average number of containers in the storage account.

azure.storage.blob.egress bytes

Egress. The total amount of egress data. This number includes egress from Azure Storage to an external client as well as egress within Azure. As a result, this number does not reflect billable egress.

azure.storage.blob.index_capacity bytes

IndexCapacity. The average amount of storage used by ADLS Gen2 Hierarchical Index.

azure.storage.blob.ingress bytes

Ingress. The total amount of ingress data. This number includes ingress from an external client into Azure Storage as well as ingress within Azure.

azure.storage.blob.success.e2e_latency milliseconds (ms)

SuccessE2ELatency. The average end-to-end latency of successful requests made to a storage service or the specified API operation. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.

azure.storage.blob.success.server_latency milliseconds (ms)

SuccessServerLatency. The average time used to process a successful request by Azure Storage.

azure.storage.blob.transactions Count

Transactions. The total number of requests made to a storage service or the specified API operation. This number includes successful and failed requests, as well as requests that produced errors.
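The availability formula in the Availability description divides billable requests by applicable requests. The minimal Python sketch below assumes you already have both request counts for the same period. The inputs are assumed to correspond to Azure's total billable requests value and the applicable-request count, neither of which is listed as a separate metric in this table. The function name is hypothetical.

```python
from typing import Optional


def availability_percent(billable_requests: int, applicable_requests: int) -> Optional[float]:
    """Availability as a percentage: billable requests / applicable requests * 100."""
    if applicable_requests == 0:
        return None  # No applicable requests in the period.
    return 100.0 * billable_requests / applicable_requests


print(availability_percent(9_990, 10_000))  # 99.9
```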

Cache for Redis

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Cache for Redis entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azureredis.

azure.redis.alloperationsPerSecond Count The number of instantaneous operations per second executed on the cache.
azure.redis.allpercentprocessortime Percent (%) The CPU utilization of the Azure Redis Cache server as a percentage.
azure.redis.cacheLatency Count The latency to the cache in microseconds.
azure.redis.LatencyP99 Count Measures the worst-case (99th percentile) latency of server-side commands in microseconds. Measured by issuing PING commands from the load balancer to the Redis server and tracking the time to respond.
azure.redis.allcachehits Count The number of successful key lookups.
azure.redis.allcachemisses Count The number of failed key lookups.
azure.redis.allconnectedclients Count The number of client connections to the cache.
azure.redis.allserverLoad Percent (%) The percentage of cycles in which the Redis server is busy processing and not waiting idle for messages.
azure.redis.allusedmemorypercentage Percent (%) The percentage of cache memory used for key/value pairs.
azure.redis.allexpiredkeys Count The number of items expired from the cache.
azure.redis.errors Count The number of errors that occurred on the cache.
azure.redis.allcacheRead bps The amount of data read from the cache in bytes per second.
azure.redis.allcacheWrite bps The amount of data written to the cache in bytes per second.
azure.redis.allConnectionsClosedPerSecond Count per second The number of instantaneous connections closed per second on the cache via port 6379 or 6380 (SSL).
azure.redis.allConnectionsCreatedPerSecond Count per second The number of instantaneous connections created per second on the cache via port 6379 or 6380 (SSL).
azure.redis.allevictedkeys Count The number of items evicted from the cache.
azure.redis.allgetcommands Count The number of get operations from the cache.
azure.redis.allsetcommands Count The number of set operations to the cache.
azure.redis.alltotalcommandsprocessed Count The total number of commands processed by the cache server.
azure.redis.alltotalkeys Count The total number of items in the cache.
azure.redis.allusedmemory bytes The amount of cache memory used for key/value pairs in the cache.
azure.redis.allusedmemoryRss bytes The amount of cache memory used, including fragmentation and metadata.
azure.redis.cachehits Count The number of successful key lookups.
azure.redis.cachemisses Count The number of failed key lookups.
azure.redis.cachemissrate Percent (%) The percentage of get requests that miss.
azure.redis.cacheRead bps The amount of data read from the cache in bytes per second.
azure.redis.cacheWrite bps The amount of data written to the cache in bytes per second.
azure.redis.connectedclients Count The number of client connections to the cache.
azure.redis.ConnectedClientsUsingAADToken Count The number of client connections to the cache that use an AAD token.
azure.redis.evictedkeys Count The number of items evicted from the cache.
azure.redis.expiredkeys Count The number of items expired from the cache.
azure.redis.GeoReplicationConnectivityLag seconds (s) Time in seconds since last successful data synchronization with geo-primary cache. Value will continue to increase if the link status is down.
azure.redis.GeoReplicationDataSyncOffset bytes Approximate amount of data in bytes that needs to be synchronized to geo-secondary cache.
azure.redis.GeoReplicationFullSyncEventFinished Count Fired on completion of a full synchronization event between geo-replicated caches. This metric reports 0 most of the time because geo-replication uses partial resynchronizations for any new data added after the initial full synchronization.
azure.redis.GeoReplicationFullSyncEventStarted Count Fired on initiation of a full synchronization event between geo-replicated caches. This metric reports 0 most of the time because geo-replication uses partial resynchronizations for any new data added after the initial full synchronization.
azure.redis.GeoReplicationHealthy Count The health status of the geo-replication link: 1 if healthy, 0 if disconnected or unhealthy.
azure.redis.getcommands Count The number of get operations from the cache.
azure.redis.operationsPerSecond Count The number of instantaneous operations per second executed on the cache.
azure.redis.percentProcessorTime Percent (%) The CPU utilization of the Azure Redis Cache server as a percentage.
azure.redis.serverLoad Percent (%) The percentage of cycles in which the Redis server is busy processing and not waiting idle for messages.
azure.redis.setcommands Count The number of set operations to the cache.
azure.redis.totalcommandsprocessed Count The total number of commands processed by the cache server.
azure.redis.totalkeys Count The total number of items in the cache.
azure.redis.usedmemory bytes The amount of cache memory used for key/value pairs in the cache in MB.
azure.redis.usedmemorypercentage Percent (%) The percentage of cache memory used for key/value pairs.
azure.redis.usedmemoryRss bytes The amount of cache memory used in MB, including fragmentation and metadata.
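The cache hit and miss counts above can be combined into a miss rate for a given interval. The sketch below assumes the miss rate is misses divided by total lookups (hits plus misses); the function name is hypothetical, and the exact aggregation Azure applies to `azure.redis.cachemissrate` may differ.

```python
def cache_miss_rate(cachehits: int, cachemisses: int) -> float:
    """Return the percentage of key lookups that missed.

    Assumes miss rate = misses / (hits + misses) * 100. This is an
    illustrative definition, not necessarily Azure's exact formula.
    """
    total = cachehits + cachemisses
    if total == 0:
        return 0.0  # no lookups in the interval, so report no misses
    return 100.0 * cachemisses / total


# Example: 900 hits and 100 misses in the same interval
print(cache_miss_rate(900, 100))  # 10.0
```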

Cache for Redis Enterprise

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Cache for Redis Enterprise entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azureredisenterprise.

azure.redis.enterprise.operationsPerSecond Count The number of instantaneous operations per second executed on the cache.
azure.redis.enterprise.percentProcessorTime Percent (%) The CPU utilization of the Azure Redis Cache server as a percentage.
azure.redis.enterprise.usedmemorypercentage Percent (%) The percentage of cache memory used for key/value pairs.
azure.redis.enterprise.cachehits Count The number of successful key lookups.
azure.redis.enterprise.cacheLatency Count The latency to the cache in microseconds.
azure.redis.enterprise.cachemisses Count The number of failed key lookups.
azure.redis.enterprise.serverLoad Percent (%) The percentage of cycles in which the Redis server is busy processing and not waiting idle for messages.
azure.redis.enterprise.connectedclients Count The number of client connections to the cache.
azure.redis.enterprise.errors Count The number of errors that occurred on the cache.
azure.redis.enterprise.cacheRead Megabytes per second (MB/s) The amount of data read from the cache in megabytes per second (MB/s).
azure.redis.enterprise.cacheWrite Megabytes per second (MB/s) The amount of data written to the cache in Megabytes per second (MB/s).
azure.redis.enterprise.totalkeys Count The total number of items in the cache.
azure.redis.enterprise.evictedkeys Count The number of items evicted from the cache.
azure.redis.enterprise.expiredkeys Count The number of items expired from the cache.
azure.redis.enterprise.geoReplicationHealthy Count The health of geo replication in an Active Geo-Replication group. 0 represents Unhealthy and 1 represents Healthy.
azure.redis.enterprise.getcommands Count The number of get operations from the cache.
azure.redis.enterprise.setcommands Count The number of set operations to the cache.
azure.redis.enterprise.totalcommandsprocessed Count The total number of commands processed by the cache server.
azure.redis.enterprise.usedmemory Megabytes (MB) The amount of cache memory used for key/value pairs in the cache in MB.
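The Enterprise metrics report cacheRead and cacheWrite in megabytes per second, while `azure.redis.cacheRead` and `azure.redis.cacheWrite` are listed in bytes per second. To compare the two on one chart you may need to convert units. The sketch below assumes decimal megabytes (1 MB = 1,000,000 bytes); if your data uses binary megabytes, change the constant to 2**20.

```python
# Assumption: decimal megabytes (1 MB = 10**6 bytes). Use 2**20 instead
# if the metric source reports binary megabytes (MiB).
BYTES_PER_MB = 10**6


def mb_per_s_to_bytes_per_s(value_mb_per_s: float) -> float:
    """Convert an Enterprise-tier MB/s value to bytes per second."""
    return value_mb_per_s * BYTES_PER_MB


def bytes_per_s_to_mb_per_s(value_bytes_per_s: float) -> float:
    """Convert a bytes-per-second value to MB/s."""
    return value_bytes_per_s / BYTES_PER_MB


print(mb_per_s_to_bytes_per_s(2.5))      # 2500000.0
print(bytes_per_s_to_mb_per_s(750_000))  # 0.75
```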

Container Instances Group

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Container Instances Group entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurecontainerinstancesgroup.

azure.container.instances.group.CpuUsage Count CPU usage on all cores in millicores.
azure.container.instances.group.MemoryUsage bytes Total memory usage in bytes.
azure.container.instances.group.NetworkBytesReceivedPerSecond bytes The network bytes received per second.
azure.container.instances.group.NetworkBytesTransmittedPerSecond bytes The network bytes transmitted per second.
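CpuUsage is reported in millicores, where 1,000 millicores equal one full CPU core. The sketch below converts a millicore reading into cores and into a percentage of the cores allocated to the container group. The allocated core count is a hypothetical input you would take from the container group's configuration; it is not one of the metrics listed above.

```python
MILLICORES_PER_CORE = 1000


def cpu_usage_cores(millicores: float) -> float:
    """Convert a CpuUsage reading (millicores) to cores."""
    return millicores / MILLICORES_PER_CORE


def cpu_utilization_percent(millicores: float, allocated_cores: float) -> float:
    """Express CPU usage as a percentage of the allocated cores.

    allocated_cores is a hypothetical input from the container group's
    resource configuration, not part of the metric itself.
    """
    if allocated_cores <= 0:
        raise ValueError("allocated_cores must be positive")
    return 100.0 * cpu_usage_cores(millicores) / allocated_cores


# Example: 1500 millicores used on a group allocated 2 cores
print(cpu_usage_cores(1500))             # 1.5
print(cpu_utilization_percent(1500, 2))  # 75.0
```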

CDN

Metric Units Description
azure.cdn.byte_hit_ratio Percent (%)

ByteHitRatio. Of the total number of response bytes, the percentage that were served from the CDN cache.

azure.cdn.origin_health_percentage Percent (%)

OriginHealthPercentage. The percentage of successful health probes sent to backends.

azure.cdn.origin_latency milliseconds (ms)

OriginLatency. The average time from when the request was sent to the backend to when the last response byte was received.

azure.cdn.origin_request_count Count

OriginRequestCount. The total number of requests sent to origin.

azure.cdn.percentage_4XX Percent (%)

Percentage4XX. The average percentage of requests with a status code greater than or equal to 400 but less than 500.

azure.cdn.percentage_5XX Percent (%)

Percentage5XX. The average percentage of requests with a status code greater than or equal to 500 but less than 600.

azure.cdn.request_count Count

RequestCount. The total number of client requests served by the CDN.

azure.cdn.request_size bytes

RequestSize. The total number of bytes sent as requests from clients.

azure.cdn.response_size bytes

ResponseSize. The total number of bytes sent as responses from the CDN edge to clients.

azure.cdn.total_latency milliseconds (ms)

TotalLatency. The average time from when the client request is received by the CDN until the last response byte is sent from the CDN to the client.

azure.cdn.web_application_firewall_request_count Count

WebApplicationFirewallRequestCount. The total number of matched WAF requests.
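Percentage4XX and Percentage5XX are defined by status-code ranges: at least 400 but below 500, and at least 500 but below 600. The sketch below applies the same ranges to a hypothetical batch of status codes to illustrate the definitions; it is not how Azure computes the metrics.

```python
from typing import Iterable


def status_percentages(status_codes: Iterable[int]) -> dict[str, float]:
    """Return the share of responses in the 4XX and 5XX ranges, in percent."""
    codes = list(status_codes)
    if not codes:
        return {"percentage_4XX": 0.0, "percentage_5XX": 0.0}
    n4xx = sum(1 for c in codes if 400 <= c < 500)  # >= 400 and < 500
    n5xx = sum(1 for c in codes if 500 <= c < 600)  # >= 500 and < 600
    return {
        "percentage_4XX": 100.0 * n4xx / len(codes),
        "percentage_5XX": 100.0 * n5xx / len(codes),
    }


print(status_percentages([200, 200, 404, 503, 200, 301, 403, 200]))
# {'percentage_4XX': 25.0, 'percentage_5XX': 12.5}
```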

Cosmos DB

Metric Units Description
azure.cosmos.autoscale_max_throughput Count

AutoscaleMaxThroughput. The maximum throughput to which autoscale will scale.

azure.cosmos.available_storage bytes

AvailableStorage. The total amount of available storage reported at 5-minute granularity per region.

azure.cosmos.cassandra.connection.avg_replication_latency milliseconds (ms)

CassandraConnectorAvgReplicationLatency. The average replication latency of the Cassandra Connector.

azure.cosmos.cassandra.connection.replication_health_status

CassandraConnectorReplicationHealthStatus. The replication health status of the Cassandra Connector.

azure.cosmos.cassandra.connection_closures Count

CassandraConnectionClosures. The total number of Cassandra Connections closed.

azure.cosmos.cassandra.request_charges Count

CassandraRequestCharges. The total number of request units consumed by the API for Cassandra.

azure.cosmos.cassandra.requests Count

CassandraRequests. The total number of Cassandra API requests made.

azure.cosmos.data.usage bytes

DataUsage. The total data usage reported at 5-minute granularity per region.

azure.cosmos.document.count Count

DocumentCount. The total document count reported at 5-minute granularity per region.

azure.cosmos.document.quota bytes

DocumentQuota. The total storage quota reported at 5-minute granularity per region.

azure.cosmos.gremlin.request_charge Count

GremlinRequestCharges. The total number of request units consumed by Gremlin queries.

azure.cosmos.gremlin.requests Count

GremlinRequests. The total number of requests made by Gremlin queries.

azure.cosmos.index_usage bytes

IndexUsage. The total index usage reported at 5-minute granularity per region.

azure.cosmos.mongo.request_charge Count

MongoRequestCharge. The total number of Mongo request units consumed.

azure.cosmos.mongo.requests Count

MongoRequests. The total number of Mongo requests made.

azure.cosmos.normalized_ru_consumption Percent (%)

NormalizedRUConsumption. The maximum request unit consumption percentage per minute.

azure.cosmos.provisioned_throughput Count

ProvisionedThroughput. The maximum provisioned throughput at container granularity.

azure.cosmos.replication_latency.p99 milliseconds (ms)

ReplicationLatency. The 99th percentile (P99) replication latency across the source and target regions for a geo-enabled account.

azure.cosmos.requests.metadata Count

MetadataRequests. The total number of metadata requests.

azure.cosmos.requests.total Count

TotalRequests. The total number of requests made.

azure.cosmos.requests.total_units Count

TotalRequestUnits. The total number of request units consumed.

azure.cosmos.server_side_latency milliseconds (ms)

ServerSideLatency. The average amount of time taken by the server to process a request.

azure.cosmos.service_availability Percent (%)

ServiceAvailability. The average account request availability at one-hour granularity.
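NormalizedRUConsumption reports the highest utilization across a container's partition key ranges, so a single hot partition can push it to 100% even when overall consumption is low. The sketch below estimates that value under two assumptions: provisioned throughput is split evenly across partition key ranges, and the inputs are per-second RU consumption figures for each range. The function and its inputs are hypothetical.

```python
def normalized_ru_consumption(ru_per_second_by_range: list[float],
                              provisioned_ru_per_second: float) -> float:
    """Estimate normalized RU consumption (0-100%).

    Assumes provisioned RU/s is divided evenly across partition key
    ranges and reports the busiest range, capped at 100%.
    """
    if not ru_per_second_by_range or provisioned_ru_per_second <= 0:
        return 0.0
    per_range_budget = provisioned_ru_per_second / len(ru_per_second_by_range)
    utilization = 100.0 * max(ru_per_second_by_range) / per_range_budget
    return min(100.0, utilization)


# 1000 RU/s provisioned across 2 ranges (500 RU/s each);
# one range consumes 400 RU/s, the other 50 RU/s.
print(normalized_ru_consumption([400.0, 50.0], 1000.0))  # 80.0
```

Note that total consumption in the example is only 45% of the provisioned throughput, but the busiest range is at 80% of its share.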

Data Factory

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Data Factory entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azuredatafactory.

azure.data.factory.PipelineFailedRuns Count The number of pipeline runs that have failed due to errors or unexpected conditions during execution. This helps identify issues in data workflows and troubleshoot failures.
azure.data.factory.PipelineSucceededRuns Count The number of pipeline runs that have successfully completed without errors. This metric helps monitor the reliability and efficiency of data pipelines.
azure.data.factory.PipelineCancelledRuns Count The number of pipeline runs that were manually or automatically canceled before completion. This can be useful for tracking interruptions in data processing.
azure.data.factory.ActivityCancelledRuns Count The number of individual activities within a pipeline that were canceled before execution or completion. This helps monitor workflow interruptions at the activity level.
azure.data.factory.ActivitySucceededRuns Count The number of activities within a pipeline that have successfully completed without errors. This metric helps assess the effectiveness of individual tasks in a data pipeline.
azure.data.factory.ActivityFailedRuns Count The number of activities within a pipeline that have failed due to errors or unexpected conditions. This helps pinpoint specific issues within a pipeline execution.
azure.data.factory.TriggerFailedRuns Count The number of trigger runs that have failed due to errors or unexpected conditions during execution. This helps identify issues in automated data workflows.
azure.data.factory.TriggerSucceededRuns Count The number of trigger runs that have successfully completed without errors. This metric helps monitor the reliability and efficiency of scheduled or event-driven triggers.
azure.data.factory.TriggerCancelledRuns Count The number of trigger runs that were manually or automatically canceled before completion. This can be useful for tracking interruptions in data processing.
azure.data.factory.MaxAllowedResourceCount Count The maximum number of resources allowed within an Azure Data Factory instance. This metric helps monitor resource allocation limits.
azure.data.factory.ResourceCount Count The total number of resources currently in use within an Azure Data Factory instance. This metric helps track resource consumption and availability.
azure.data.factory.FactorySizeInGbUnits Count The total size of the Azure Data Factory instance in gigabyte units. This metric helps monitor storage and processing capacity.
azure.data.factory.IntegrationRuntimeCpuPercentage Percent (%) The percentage of CPU utilization for the integration runtime. Higher values may indicate increased workload or potential performance bottlenecks.
azure.data.factory.IntegrationRuntimeAvailableMemory bytes The amount of available memory for the integration runtime. This metric helps monitor resource usage and ensure optimal performance.
azure.data.factory.IntegrationRuntimeAvailableNodeNumber Count The number of available nodes in the integration runtime. This metric is useful for assessing scalability and resource allocation.
azure.data.factory.IntegrationRuntimeQueueLength Count The number of tasks waiting in the queue for execution within the integration runtime. A high queue length may indicate processing delays or resource constraints.
azure.data.factory.IntegrationRuntimeAverageTaskPickupDelay seconds (s) The average delay before a task is picked up for execution by the integration runtime. Longer delays may suggest resource contention or inefficiencies in task scheduling.
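The pipeline run counts can be combined into a failure ratio, which is often easier to alert on than raw counts. The sketch below assumes that failed, succeeded, and (optionally) cancelled runs together make up all runs that finished in the interval; in-progress runs are not counted. Whether cancelled runs belong in the denominator is a design choice, so it is exposed as a parameter.

```python
def pipeline_failure_ratio(failed: int, succeeded: int, cancelled: int,
                           include_cancelled: bool = True) -> float:
    """Return failed pipeline runs as a percentage of finished runs.

    Assumes failed + succeeded (+ cancelled, if included) covers every
    run that finished in the interval; in-progress runs are not counted.
    """
    finished = failed + succeeded + (cancelled if include_cancelled else 0)
    if finished == 0:
        return 0.0  # nothing finished in the interval
    return 100.0 * failed / finished


# Values from PipelineFailedRuns, PipelineSucceededRuns, and
# PipelineCancelledRuns for the same interval (illustrative numbers).
print(pipeline_failure_ratio(failed=5, succeeded=90, cancelled=5))  # 5.0
```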

Disk Storage

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Disk Storage entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurediskstorage.

azure.disk.storage.Disk_Read_Bytes_sec bps Bytes per second read from disk during the monitoring period.
azure.disk.storage.Disk_Read_Operations_sec Count per second Number of read IOs performed on a disk during the monitoring period.
azure.disk.storage.Disk_Write_Bytes_sec bps Bytes per second written to disk during the monitoring period.
azure.disk.storage.Disk_Write_Operations_sec Count per second Number of write IOs performed on a disk during the monitoring period.
azure.disk.storage.DiskPaidBurstIOPS Count The accumulated operations of burst transactions used for disks with on-demand burst enabled.
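Dividing the byte rate by the operation rate for the same interval gives the average I/O size, which can help tell whether a disk is limited by throughput or by IOPS. The sketch below assumes both values come from the same disk and the same time interval.

```python
def average_io_size_bytes(bytes_per_sec: float, ops_per_sec: float) -> float:
    """Average bytes per I/O, e.g. Disk_Read_Bytes_sec / Disk_Read_Operations_sec."""
    if ops_per_sec == 0:
        return 0.0  # no operations in the interval
    return bytes_per_sec / ops_per_sec


# 8 MiB/s read at 256 read operations per second -> 32 KiB per read
print(average_io_size_bytes(8 * 1024 * 1024, 256))  # 32768.0
```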

DNS Zone

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of DNS Zone entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurednszone.

azure.dnszone.QueryVolume Count Number of queries served for a DNS zone.
azure.dnszone.RecordSetCapacityUtilization Percent (%) Percent of Record Set capacity utilized by a DNS zone.
azure.dnszone.RecordSetCount Count Number of Record Sets in a DNS zone.
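RecordSetCapacityUtilization relates RecordSetCount to the record set limit of the zone. The sketch below reproduces that ratio. The limit is an input you must supply from the Azure DNS limits that apply to your zone; the value in the example is purely illustrative.

```python
def record_set_capacity_utilization(record_set_count: int,
                                    record_set_limit: int) -> float:
    """Percent of the zone's record set limit that is in use."""
    if record_set_limit <= 0:
        raise ValueError("record_set_limit must be positive")
    return 100.0 * record_set_count / record_set_limit


# Illustrative limit only; check the actual limit that applies to your zone.
print(record_set_capacity_utilization(2500, 10_000))  # 25.0
```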

Event Hubs

Metric Units Description
azure.eventhubs.namespaces.active_connections Count

ActiveConnections. The maximum number of active connections on a namespace and on an entity (event hub) in the namespace.

azure.eventhubs.namespaces.capture_backlog Count

Tracks the backlog of events waiting to be captured in Azure Event Hubs.

azure.eventhubs.namespaces.captured_bytes bytes

CapturedBytes. The total number of captured bytes for an event hub.

azure.eventhubs.namespaces.captured_messages Count

CapturedMessages. The total number of captured messages for an event hub.

azure.eventhubs.namespaces.connections_closed Count

ConnectionsClosed. The total number of closed connections.

azure.eventhubs.namespaces.connections_opened Count

ConnectionsOpened. The total number of opened connections.

azure.eventhubs.namespaces.incoming_bytes bytes

IncomingBytes. The number of incoming bytes for an event hub during the specified period.

azure.eventhubs.namespaces.incoming_messages Count

IncomingMessages. The total number of events or messages sent to Event Hubs over a specified period.

azure.eventhubs.namespaces.incoming_requests Count

IncomingRequests. The total number of requests made to the Event Hubs service over a specified period. This metric includes all the data and management plane operations.

azure.eventhubs.namespaces.namespace_cpu_usage Percent (%)

NamespaceCpuUsage. The maximum namespace CPU usage.

azure.eventhubs.namespaces.namespace_memory_usage Percent (%)

NamespaceMemoryUsage. The maximum namespace memory usage.

azure.eventhubs.namespaces.outgoing_bytes bytes

OutgoingBytes. The number of outgoing bytes for an event hub during the specified period.

azure.eventhubs.namespaces.outgoing_messages Count

OutgoingMessages. The total number of events or messages received from Event Hubs over a specified period.

azure.eventhubs.namespaces.quota_exceeded_errors Count

QuotaExceededErrors. The total number of errors caused by exceeding quotas over a specified period.

azure.eventhubs.namespaces.server_errors Count

ServerErrors. The total number of requests not processed because of an error in the Event Hubs service over a specified period.

azure.eventhubs.namespaces.size bytes

Size. The average size of an event hub.

azure.eventhubs.namespaces.successful_requests Count

SuccessfulRequests. The total number of successful requests made to the Event Hubs service over a specified period.

azure.eventhubs.namespaces.throttled_requests Count

ThrottledRequests. The total number of requests that were throttled because the usage was exceeded.

azure.eventhubs.namespaces.user_errors Count

UserErrors. The total number of requests not processed because of user errors over a specified period.
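Throttled requests and errors scale with traffic, so they are often easier to alert on as a share of IncomingRequests than as raw counts. The sketch below computes those shares for a single interval. The dictionary keys reuse the metric names above, but the input format itself is hypothetical.

```python
def request_error_ratios(metrics: dict[str, float]) -> dict[str, float]:
    """Express throttled, server-error, and user-error requests as a
    percentage of incoming requests for a single interval."""
    prefix = "azure.eventhubs.namespaces."
    incoming = metrics.get(prefix + "incoming_requests", 0.0)
    names = ("throttled_requests", "server_errors", "user_errors")
    if incoming == 0:
        return {name: 0.0 for name in names}
    return {
        name: 100.0 * metrics.get(prefix + name, 0.0) / incoming
        for name in names
    }


sample = {
    "azure.eventhubs.namespaces.incoming_requests": 2000,
    "azure.eventhubs.namespaces.throttled_requests": 50,
    "azure.eventhubs.namespaces.server_errors": 2,
    "azure.eventhubs.namespaces.user_errors": 10,
}
print(request_error_ratios(sample))
# {'throttled_requests': 2.5, 'server_errors': 0.1, 'user_errors': 0.5}
```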

ExpressRoute Gateway

Metric Units Description
sw.metrics.healthscore Percent (%)

Health state. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of ExpressRoute Gateway entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azureexpressroutegateway.

azure.expressroutegateway.ErGatewayConnectionBitsInPerSecond bits per second Bits per second ingressing Azure through the ExpressRoute gateway. This metric can be further split by connection.
azure.expressroutegateway.ErGatewayConnectionBitsOutPerSecond bits per second Bits per second egressing Azure through the ExpressRoute gateway. This metric can be further split by connection.
azure.expressroutegateway.ExpressRouteGatewayActiveFlows Count The number of active flows on the ExpressRoute gateway.
azure.expressroutegateway.ExpressRouteGatewayBitsPerSecond bits per second Total bits received on the ExpressRoute gateway per second.
azure.expressroutegateway.ExpressRouteGatewayCountOfRoutesAdvertisedToPeer Count The number of routes advertised to the peer by the ExpressRoute gateway.
azure.expressroutegateway.ExpressRouteGatewayCountOfRoutesLearnedFromPeer Count The number of routes learned from the peer by the ExpressRoute gateway.
azure.expressroutegateway.ExpressRouteGatewayCpuUtilization Percent (%) CPU utilization of the ExpressRoute gateway.
azure.expressroutegateway.ExpressRouteGatewayFrequencyOfRoutesChanged Count The frequency of route changes on the ExpressRoute gateway.
azure.expressroutegateway.ExpressRouteGatewayMaxFlowsCreationRate Count per second The maximum number of flows created per second on the ExpressRoute gateway.
azure.expressroutegateway.ExpressRouteGatewayNumberOfVmInVnet Count The number of VMs in the virtual network.
azure.expressroutegateway.ExpressRouteGatewayPacketsPerSecond Count per second Total packets received on the ExpressRoute gateway per second.
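The gateway throughput metrics are reported in bits per second. The sketch below converts a reading to megabits per second and to a percentage of a bandwidth figure you supply. The bandwidth argument is a hypothetical input (for example, the bandwidth you provisioned), not a metric from this table, and the conversion uses the usual networking convention of 1 Mbps = 1,000,000 bits per second.

```python
BITS_PER_MEGABIT = 1_000_000  # networking convention: 1 Mbps = 10**6 bits/s


def to_mbps(bits_per_second: float) -> float:
    """Convert a bits-per-second reading to megabits per second."""
    return bits_per_second / BITS_PER_MEGABIT


def bandwidth_utilization(bits_per_second: float, bandwidth_mbps: float) -> float:
    """Percent of a supplied bandwidth (in Mbps) used by the measured traffic."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    return 100.0 * to_mbps(bits_per_second) / bandwidth_mbps


print(to_mbps(250_000_000))                      # 250.0
print(bandwidth_utilization(250_000_000, 1000))  # 25.0
```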

Files

Metric Units Description
azure.storage.files.availability Percent (%)

Availability. The average percentage of availability for the storage service or the specified API operation. Availability is calculated by taking the total billable requests value and dividing it by the number of applicable requests, including those requests that produced unexpected errors.

azure.storage.files.egress bytes

Egress. The total amount of egress data. This number includes egress from an external client into Azure Storage as well as egress within Azure.

azure.storage.files.file_capacity bytes

FileCapacity. The average amount of file storage used by the storage account.

azure.storage.files.file_count Count

FileCount. The average number of files in the storage account.

azure.storage.files.fileshare_count Count

FileShareCount. The average number of file shares in the storage account.

azure.storage.files.fileshare_quota bytes

FileShareQuota. The average upper limit on the amount of storage that can be used by Azure Files service in bytes.

azure.storage.files.fileshare_snapshotcount Count

FileShareSnapshotCount. The average number of snapshots present on the share in the storage account's Azure Files service.

azure.storage.files.fileshare_snapshotsize bytes

FileShareSnapshotSize. The average amount of storage used by the snapshots in the storage account's Azure Files service.

azure.storage.files.ingress bytes

Ingress. The total amount of ingress data. This number includes ingress from an external client into Azure Storage as well as ingress within Azure.

azure.storage.files.success.e2e_latency milliseconds (ms)

SuccessE2ELatency. The average end-to-end latency of successful requests made to a storage service or the specified API operation. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response.

azure.storage.files.success.server_latency milliseconds (ms)

SuccessServerLatency. The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in SuccessE2ELatency.

azure.storage.files.transactions Count

Transactions. The total number of requests made to a storage service or the specified API operation. This number includes successful and failed requests, as well as requests that produced errors.
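The Availability description amounts to a ratio of two request counts, expressed as a percentage. The sketch below computes it from those two counts; obtaining the billable and applicable request counts is outside its scope, and the function name is hypothetical.

```python
def storage_availability(total_billable_requests: int,
                         applicable_requests: int) -> float:
    """Availability (%) = total billable requests / applicable requests * 100.

    applicable_requests includes requests that produced unexpected errors.
    """
    if applicable_requests <= 0:
        raise ValueError("applicable_requests must be positive")
    return 100.0 * total_billable_requests / applicable_requests


# 9,990 billable requests out of 10,000 applicable requests
print(storage_availability(9990, 10_000))  # 99.9
```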

Front Door

Metric Units Description
azure.frontdoor.backend_health_percentage Percent (%)

BackendHealthPercentage. The average percentage of successful health probes from AFD to origin.

azure.frontdoor.backend_request_count Count

BackendRequestCount. The total number of requests sent from AFD to origin.

azure.frontdoor.backend_request_latency milliseconds (ms)

BackendRequestLatency. The average time calculated from when the request was sent by AFD edge to the backend until AFD received the last response byte from the backend.

azure.frontdoor.billable_response_size bytes

BillableResponseSize. The total number of billable bytes (minimum 2KB per request) sent as responses from HTTP/S proxy to clients.

azure.frontdoor.request_count Count

RequestCount. The total number of client requests served by CDN.

azure.frontdoor.request_size bytes

RequestSize. The total number of bytes sent as requests from clients to AFD.

azure.frontdoor.response_size bytes

ResponseSize. The total number of bytes sent as responses from Front Door to clients.

azure.frontdoor.total_latency milliseconds (ms)

TotalLatency. The average time from the client request being received by CDN until the last response byte is sent from CDN to the client.

azure.frontdoor.web_application_firewall_request_count Count

WebApplicationFirewallRequestCount. The total number of matched WAF requests.
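BillableResponseSize applies a 2 KB minimum per request. The sketch below shows that floor applied to a list of response sizes. It assumes 2 KB means 2,048 bytes and that each response is counted at the larger of its actual size and the minimum; this is an interpretation of the description above, not a documented billing formula.

```python
MIN_BILLABLE_BYTES = 2 * 1024  # assumption: 2 KB = 2,048 bytes


def billable_response_size(response_sizes: list[int]) -> int:
    """Sum response sizes, counting each response as at least MIN_BILLABLE_BYTES."""
    return sum(max(size, MIN_BILLABLE_BYTES) for size in response_sizes)


# 500 B and 1 KB responses are each counted as 2 KB; 10 KB is counted as-is.
print(billable_response_size([500, 1024, 10240]))  # 14336
```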

Functions

Metric Units Description
azure.sites.app_connections

Represents the number of active connections established by an application. This metric helps monitor app connectivity and resource usage.

azure.sites.app_domains Count

Total App Domains. The average number of app domains loaded in the application.

azure.sites.app_domains.unloaded Count

Total App Domains Unloaded. The average number of application domains unloaded.

azure.sites.collections.gen1 Count

The number of garbage collection events for Generation 1 objects in an Azure Functions instance. This metric helps assess memory management efficiency.

azure.sites.collections.gen2 Count

The number of garbage collection events for Generation 2 objects in an Azure Functions instance. Higher generation garbage collections include all lower generation collections.

azure.sites.collections.gen3 Count

The number of garbage collection events for Generation 3 objects in an Azure Functions instance.

azure.sites.current_assemblies Count

The number of assemblies currently loaded across all application domains in an Azure Functions instance. This metric helps track application dependencies and runtime behavior.

azure.sites.function_executions Count

Function Execution Count. The total number of times a function app has executed. This value correlates to the number of times a function runs in an app.

azure.sites.function_executions.unit Count

Function Execution Units. The number of function execution units.

azure.sites.handles Count

Tracks the number of open file handles in an Azure Functions environment. This metric helps monitor resource usage and potential file access issues.

azure.sites.http.101 Count

Tracks HTTP 101 responses, which indicate protocol switching (for example, upgrading from HTTP to WebSockets).

azure.sites.http.2xx Count

HTTP 2xx responses, which indicate successful requests. These status codes confirm that the server successfully processed the request.

azure.sites.http.3xx Count

HTTP 3xx responses, which indicate redirection. These status codes signal that the requested resource has moved to a different location.

azure.sites.http.401 Count

HTTP 401 responses, which indicate unauthorized access. This occurs when authentication credentials are missing or invalid.

azure.sites.http.403 Count

HTTP 403 responses, which indicate forbidden access. This happens when a request is denied due to insufficient permissions or security restrictions.

azure.sites.http.404 Count

HTTP 404 responses, which indicate that the requested resource was not found. This can occur when a URL is incorrect or the resource has been removed.

azure.sites.http.406 Count

HTTP 406 responses, which indicate that the requested format is not acceptable. This happens when the server cannot provide content in the format specified by the request.

azure.sites.http.4xx Count

HTTP 4xx responses, which indicate client-side errors. These errors typically occur due to incorrect requests, authentication failures, or missing resources.

azure.sites.http.5xx Count

HTTP 5xx. The total number of requests with a status code greater than or equal to 500 but less than 600.

azure.sites.io.bytes_received bytes

Bytes Received. The number of incoming data bytes.

azure.sites.io.bytes_sent bytes

Bytes Sent. The number of outgoing data bytes.

azure.sites.io.other_bytes bps

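IO Other Bytes Per Second. The rate at which the app is issuing bytes to I/O operations that don't involve data, such as control operations.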

azure.sites.io.other_ops Count per second

IO Other Operations Per Second. The rate at which the app is issuing I/O operations that aren't read or write operations.

azure.sites.io.read_bytes bps

IO Read Bytes Per Second. The number of bytes per second the app is reading from I/O operations.

azure.sites.io.read_ops Count per second

IO Read Operations Per Second. The number of read I/O operations per second the app is issuing.

azure.sites.io.write_bytes bps

IO Write Bytes Per Second. The number of bytes per second the app is writing to I/O operations.

azure.sites.io.write_ops Count per second

IO Write Operations Per Second. The number of write I/O operations per second the app is issuing.

azure.sites.memory.working_set bytes

Memory Working Set. The current amount of memory used by the app.

azure.sites.memory.working_set.avg bytes

Average Memory Working Set. The average amount of memory used by the app.

azure.sites.private_bytes bytes

Private Bytes. The average number of private bytes allocated to the app.

azure.sites.queued_requests Count

Requests In Application Queue. The average number of requests in the application queue.

azure.sites.requests Count

Requests. The total number of requests.

azure.sites.response_time seconds (s)

Average Response Time. The average time taken for the app to serve requests.

azure.sites.threads Count

The number of active threads in an Azure Functions instance.
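The HTTP status metrics overlap: 401, 403, 404, and 406 are specific 4xx codes, and 101 is a 1xx code with no class total in the table. The sketch below maps a status code to the `azure.sites.http.*` metric names it would plausibly count toward. It assumes the specific codes are also included in their class total (for example, a 404 counts toward both azure.sites.http.404 and azure.sites.http.4xx), which the table above does not state explicitly.

```python
# Specific status codes that have their own metric in the table above.
SPECIFIC_CODES = {101, 401, 403, 404, 406}


def http_metric_names(status: int) -> list[str]:
    """Return the azure.sites.http.* metric names a response would count toward.

    Assumption: specific codes (401, 403, ...) are also counted in their
    class total (4xx). There is no 1xx class metric in the table, so 101
    maps only to azure.sites.http.101.
    """
    names: list[str] = []
    if status in SPECIFIC_CODES:
        names.append(f"azure.sites.http.{status}")
    if 200 <= status < 600:
        names.append(f"azure.sites.http.{status // 100}xx")
    return names


print(http_metric_names(404))  # ['azure.sites.http.404', 'azure.sites.http.4xx']
print(http_metric_names(101))  # ['azure.sites.http.101']
print(http_metric_names(502))  # ['azure.sites.http.5xx']
```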

Key Vault

Metric Units Description
azure.key_vault.service_api.hit Count

Service API Hit. The total number of service API hits.

azure.key_vault.service_api.latency milliseconds (ms)

Service API Latency. The average latency of service API requests.

azure.key_vault.service_api.result Count

Service API Result. The total number of service API results.

Load Balancer

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity, and it is displayed as one of four color-coded states: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health of Load Balancer entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azureloadbalancer.

azure.lb.vip_availability Count Average Load Balancer data path availability per time period.
azure.lb.dip_availability Count Average Load Balancer health probe status per time period.
azure.lb.bytes bytes Total number of bytes transmitted within the time period.
azure.lb.packets Count Total number of packets transmitted within the time period.
azure.lb.syns Count Total number of SYN packets transmitted within the time period.
azure.lb.snat_connections Count Total number of new SNAT connections created within the time period.
azure.lb.allocated_snat_ports Count Total number of SNAT ports allocated within the time period.
azure.lb.used_snat_ports Count Total number of SNAT ports used within the time period.
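Comparing azure.lb.used_snat_ports against azure.lb.allocated_snat_ports shows how close a backend is to running out of SNAT ports. A minimal sketch, assuming you already have one value of each metric for the same time window (for example, read from the Metrics Explorer); the function name and the example values are hypothetical.

```python
def snat_port_utilization(used_ports: float, allocated_ports: float) -> float:
    """Return used SNAT ports as a percentage of allocated SNAT ports.

    Hypothetical derived value; both inputs should come from the same time window.
    """
    if allocated_ports <= 0:
        raise ValueError("allocated_ports must be positive")
    return 100.0 * used_ports / allocated_ports


# Hypothetical values for one backend over the same interval.
print(snat_port_utilization(used_ports=768, allocated_ports=1024))  # 75.0
```

A value that stays near 100 suggests the backend is close to exhausting its allocated ports, which may be worth an alert threshold.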

Logic Apps

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity, and it is displayed as one of four color-coded states: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health of Logic Apps entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurelogicapps.

azure.logic.workflow.ActionLatency seconds (s) Latency of completed workflow actions.
azure.logic.workflow.ActionsCompleted Count Number of workflow actions completed.
azure.logic.workflow.ActionsFailed Count Number of workflow actions failed.
azure.logic.workflow.ActionsSkipped Count Number of workflow actions skipped.
azure.logic.workflow.ActionsStarted Count Number of workflow actions started.
azure.logic.workflow.ActionsSucceeded Count Number of workflow actions succeeded.
azure.logic.workflow.ActionSuccessLatency seconds (s) Latency of succeeded workflow actions.
azure.logic.workflow.ActionThrottledEvents Count Number of workflow action throttled events.
azure.logic.workflow.BillableActionExecutions Count Number of workflow action executions getting billed.
azure.logic.workflow.BillableTriggerExecutions Count Number of workflow trigger executions getting billed.
azure.logic.workflow.BillingUsageNativeOperation Count Number of native operation executions getting billed.
azure.logic.workflow.BillingUsageStandardConnector Count Number of standard connector executions getting billed.
azure.logic.workflow.BillingUsageStorageConsumption Count Number of storage consumption executions getting billed.
azure.logic.workflow.RunFailurePercentage Percent (%) Percentage of workflow runs failed.
azure.logic.workflow.RunLatency seconds (s) Latency of completed workflow runs.
azure.logic.workflow.RunsCancelled Count Number of workflow runs cancelled.
azure.logic.workflow.RunsCompleted Count Number of workflow runs completed.
azure.logic.workflow.RunsFailed Count Number of workflow runs failed.
azure.logic.workflow.RunsStarted Count Number of workflow runs started.
azure.logic.workflow.RunsSucceeded Count Number of workflow runs succeeded.
azure.logic.workflow.RunStartThrottledEvents Count Number of workflow run start throttled events.
azure.logic.workflow.RunSuccessLatency seconds (s) Latency of succeeded workflow runs.
azure.logic.workflow.RunThrottledEvents Count Number of workflow action or trigger throttled events.
azure.logic.workflow.TotalBillableExecutions Count Number of workflow executions getting billed.
azure.logic.workflow.TriggerFireLatency seconds (s) Latency of fired workflow triggers.
azure.logic.workflow.TriggerLatency seconds (s) Latency of completed workflow triggers.
azure.logic.workflow.TriggersCompleted Count Number of workflow triggers completed.
azure.logic.workflow.TriggersFailed Count Number of workflow triggers failed.
azure.logic.workflow.TriggersFired Count Number of workflow triggers fired.
azure.logic.workflow.TriggersSkipped Count Number of workflow triggers skipped.
azure.logic.workflow.TriggersStarted Count Number of workflow triggers started.
azure.logic.workflow.TriggersSucceeded Count Number of workflow triggers succeeded.
azure.logic.workflow.TriggerSuccessLatency seconds (s) Latency of succeeded workflow triggers.
azure.logic.workflow.TriggerThrottledEvents Count Number of workflow trigger throttled events.

MySQL Flexible Server

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity, and it is displayed as one of four color-coded states: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health of MySQL Flexible Server entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azuremysqlflexible.

azure.mysql.flexible.aborted_connections Count Aborted Connections.
azure.mysql.flexible.active_connections Count Active Connections.
azure.mysql.flexible.available_memory_bytes bytes Amount of available physical memory, in bytes.
azure.mysql.flexible.cpu_percent Percent (%) Host CPU Percent.
azure.mysql.flexible.backup_storage_used bytes Backup Storage Used.
azure.mysql.flexible.binlog_storage_used bytes Storage used by Binlog files.
azure.mysql.flexible.memory_percent Percent (%) Host Memory Percent.
azure.mysql.flexible.network_bytes_egress bytes Host Network egress in bytes.
azure.mysql.flexible.network_bytes_ingress bytes Host Network ingress in bytes.
azure.mysql.flexible.Queries Count Number of queries.
azure.mysql.flexible.Slow_queries Count The number of queries that have taken more than long_query_time seconds.
azure.mysql.flexible.replication_lag seconds (s) Replication lag in seconds.
azure.mysql.flexible.storage_io_count Count The number of storage I/O operations consumed.
azure.mysql.flexible.storage_limit bytes Storage Limit.
azure.mysql.flexible.storage_used bytes Storage Used.
azure.mysql.flexible.total_connections Count Total Connections.
azure.mysql.flexible.storage_percent Percent (%) Percentage of storage.
azure.mysql.flexible.Threads_running Count The number of threads that are not sleeping.
azure.mysql.flexible.Com_alter_table Count The number of times the ALTER TABLE statement has been executed.
azure.mysql.flexible.Com_create_db Count The number of times the CREATE DATABASE statement has been executed.
azure.mysql.flexible.Com_create_table Count The number of times the CREATE TABLE statement has been executed.
azure.mysql.flexible.Com_delete Count The number of times the DELETE statement has been executed.
azure.mysql.flexible.Com_drop_db Count The number of times the DROP DATABASE statement has been executed.
azure.mysql.flexible.Com_drop_table Count The number of times the DROP TABLE statement has been executed.
azure.mysql.flexible.Com_insert Count The number of times the INSERT statement has been executed.
azure.mysql.flexible.Com_select Count The number of times the SELECT statement has been executed.
azure.mysql.flexible.Com_update Count The number of times the UPDATE statement has been executed.
azure.mysql.flexible.cpu_credits_consumed Count CPU Credits Consumed.
azure.mysql.flexible.cpu_credits_remaining Count CPU Credits Remaining.
azure.mysql.flexible.data_storage_used bytes Storage used by data files.
azure.mysql.flexible.HA_IO_status Count Running status of the HA replication I/O thread.
azure.mysql.flexible.HA_replication_lag seconds (s) HA replication lag in seconds.
azure.mysql.flexible.HA_SQL_status Count Running status of the HA replication SQL thread.
azure.mysql.flexible.ibdata1_storage_used bytes Storage used by ibdata1 files.
azure.mysql.flexible.Innodb_buffer_pool_pages_data Count The number of pages in the InnoDB buffer pool containing data.
azure.mysql.flexible.Innodb_buffer_pool_pages_dirty Count The current number of dirty pages in the InnoDB buffer pool.
azure.mysql.flexible.Innodb_buffer_pool_pages_flushed Count The number of requests to flush pages from the InnoDB buffer pool.
azure.mysql.flexible.Innodb_buffer_pool_pages_free Count The number of free pages in the InnoDB buffer pool.
azure.mysql.flexible.Innodb_buffer_pool_read_requests Count The number of logical read requests.
azure.mysql.flexible.Innodb_buffer_pool_reads Count The number of logical reads that InnoDB could not satisfy from the buffer pool and had to read directly from disk.
azure.mysql.flexible.Innodb_data_writes Count The total number of data writes.
azure.mysql.flexible.Innodb_row_lock_time milliseconds (ms) The total time spent in acquiring row locks for InnoDB tables, in milliseconds.
azure.mysql.flexible.Innodb_row_lock_waits Count The number of times operations on InnoDB tables had to wait for a row lock.
azure.mysql.flexible.io_consumption_percent Percent (%) Storage I/O consumption percent.
azure.mysql.flexible.others_storage_used bytes Storage used by other files.
azure.mysql.flexible.Replica_IO_Running Count Running status of the replication I/O thread.
azure.mysql.flexible.Replica_SQL_Running Count Running status of the replication SQL thread.
azure.mysql.flexible.serverlog_storage_limit bytes Serverlog Storage Limit.
azure.mysql.flexible.serverlog_storage_percent Percent (%) Serverlog Storage Percent.
azure.mysql.flexible.serverlog_storage_usage bytes Serverlog Storage Used.
azure.mysql.flexible.storage_throttle_count Count Storage I/O requests throttled in the selected time range. Deprecated; use Storage I/O consumption percent (azure.mysql.flexible.io_consumption_percent) to check for throttling.
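The two InnoDB buffer pool read metrics above can be combined into a buffer pool hit ratio: the share of logical read requests that did not have to go to disk. A minimal sketch, assuming you take Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads for the same interval; the function name and the example values are hypothetical.

```python
from typing import Optional


def innodb_buffer_pool_hit_ratio(read_requests: int, disk_reads: int) -> Optional[float]:
    """Percentage of InnoDB logical reads served from the buffer pool.

    read_requests: Innodb_buffer_pool_read_requests over an interval.
    disk_reads: Innodb_buffer_pool_reads (reads that went to disk) over the same interval.
    Returns None when there were no read requests, since the ratio is undefined.
    """
    if read_requests <= 0:
        return None
    return 100.0 * (read_requests - disk_reads) / read_requests


print(innodb_buffer_pool_hit_ratio(read_requests=10_000, disk_reads=250))  # 97.5
```

A falling hit ratio typically means more reads are going to disk, which can show up as higher io_consumption_percent.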

NAT Gateway

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity, and it is displayed as one of four color-coded states: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health of NAT Gateway entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurenatgateway.

azure.natgateway.ByteCount bytes Total number of bytes transmitted within the time period.
azure.natgateway.DatapathAvailability Count NAT Gateway Datapath Availability.
azure.natgateway.PacketCount Count Total number of packets transmitted within the time period.
azure.natgateway.PacketDropCount Count Count of dropped packets.
azure.natgateway.SNATConnectionCount Count Total concurrent active connections.
azure.natgateway.TotalConnectionCount Count Total number of active SNAT connections.

OpenAI Service

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity, and it is displayed as one of four color-coded states: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health of OpenAI Service entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azureopenai.

azure.openai.AzureOpenAIProvisionedManagedUtilizationV2 Percent (%) The utilization of provisioned managed throughput in Azure OpenAI. This metric helps track the efficiency of allocated processing capacity for AI workloads.
azure.openai.AzureOpenAITimeToResponse milliseconds (ms) The time taken for Azure OpenAI to generate a response after receiving a request. This metric is useful for monitoring latency and performance.
azure.openai.TotalEvents Count The total number of events processed by Azure OpenAI, including requests, completions, and other interactions.
azure.openai.AzureOpenAIRequests Count The total number of requests sent to Azure OpenAI, helping monitor usage and workload demand.
azure.openai.ActiveTokens Count The number of active tokens being processed in Azure OpenAI, which can indicate the complexity and scale of ongoing operations.
azure.openai.ProcessedPromptTokens Count The number of prompt tokens processed by Azure OpenAI, helping assess input complexity and resource consumption.
azure.openai.TokenTransaction Count Represents the number of token transactions processed by Azure OpenAI, tracking usage and billing-related metrics.
azure.openai.GeneratedTokens Count The total number of tokens generated by Azure OpenAI models in response to user queries.
azure.openai.FineTunedTrainingHours Count The number of hours spent fine-tuning models in Azure OpenAI, helping monitor resource consumption and optimization.
azure.openai.ClientErrors Count Errors caused by incorrect or invalid requests from users, such as authentication failures or malformed API calls.
azure.openai.ServerErrors Count Errors occurring on the server side, such as internal failures or service outages.
azure.openai.AvailabilityRate Percent (%) The percentage of time Azure OpenAI services are available and operational, helping assess reliability and uptime.

PostgreSQL Flexible Server

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity, and it is displayed as one of four color-coded states: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health of PostgreSQL Flexible Server entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurepostgresqlflexible.

azure.postgresql.flexible.active_connections Count Active Connections.
azure.postgresql.flexible.backup_storage_used bytes Backup Storage Used.
azure.postgresql.flexible.client_connections_active Count Connections from clients which are associated with a PostgreSQL connection.
azure.postgresql.flexible.connections_failed Count Failed Connections.
azure.postgresql.flexible.connections_succeeded Count Succeeded Connections.
azure.postgresql.flexible.cpu_percent Percent (%) CPU percent.
azure.postgresql.flexible.cpu_credits_consumed Count Total number of credits consumed by the database server.
azure.postgresql.flexible.disk_queue_depth Count Number of outstanding I/O operations to the data disk.
azure.postgresql.flexible.iops Count I/O operations per second.
azure.postgresql.flexible.is_db_alive Count Indicates whether the database is up.
azure.postgresql.flexible.memory_percent Percent (%) Memory percent.
azure.postgresql.flexible.maximum_used_transactionIDs Count Maximum Used Transaction IDs.
azure.postgresql.flexible.network_bytes_egress bytes Network Out across active connections.
azure.postgresql.flexible.network_bytes_ingress bytes Network In across active connections.
azure.postgresql.flexible.read_iops Count Number of data disk I/O read operations per second.
azure.postgresql.flexible.read_throughput Count Bytes read per second from the data disk during the monitoring period.
azure.postgresql.flexible.server_connections_active Count Connections to PostgreSQL that are in use by a client connection.
azure.postgresql.flexible.storage_free bytes Storage Free.
azure.postgresql.flexible.storage_percent Percent (%) Storage percent.
azure.postgresql.flexible.storage_used bytes Storage used.
azure.postgresql.flexible.write_iops Count Number of data disk I/O write operations per second.
azure.postgresql.flexible.write_throughput Count Bytes written per second to the data disk during the monitoring period.
azure.postgresql.flexible.xact_total Count Number of total transactions executed in this database.
azure.postgresql.flexible.analyze_count_user_tables Count Number of times user-only tables have been manually analyzed in this database.
azure.postgresql.flexible.autoanalyze_count_user_tables Count Number of times user-only tables have been analyzed by the autovacuum daemon in this database.
azure.postgresql.flexible.autovacuum_count_user_tables Count Number of times user-only tables have been vacuumed by the autovacuum daemon in this database.
azure.postgresql.flexible.blks_hit Count Number of times disk blocks were found already in the buffer cache, so that a read was not necessary.
azure.postgresql.flexible.blks_read Count Number of disk blocks read in this database.
azure.postgresql.flexible.bloat_percent Percent (%) Estimated bloat percentage for user-only tables in this database.
azure.postgresql.flexible.client_connections_waiting Count Connections from clients that are waiting for a PostgreSQL connection to service them.
azure.postgresql.flexible.cpu_credits_remaining Count Total number of credits available to burst.
azure.postgresql.flexible.deadlocks Count Number of deadlocks detected in this database.
azure.postgresql.flexible.disk_bandwidth_consumed_percentage Percent (%) Percentage of disk bandwidth consumed per minute.
azure.postgresql.flexible.disk_iops_consumed_percentage Percent (%) Percentage of disk I/Os consumed per minute.
azure.postgresql.flexible.logical_replication_delay_in_bytes bytes Maximum lag across all logical replication slots.
azure.postgresql.flexible.longest_query_time_sec seconds (s) The age in seconds of the longest query that is currently running.
azure.postgresql.flexible.longest_transaction_time_sec seconds (s) The age in seconds of the longest transaction (including idle transactions).
azure.postgresql.flexible.max_connections Count Max connections.
azure.postgresql.flexible.n_dead_tup_user_tables Count Estimated number of dead rows for user-only tables in this database.
azure.postgresql.flexible.n_live_tup_user_tables Count Estimated number of live rows for user-only tables in this database.
azure.postgresql.flexible.n_mod_since_analyze_user_tables Count Estimated number of rows modified since user-only tables were last analyzed.
azure.postgresql.flexible.num_pools Count Total number of connection pools.
azure.postgresql.flexible.numbackends Count Number of backends connected to this database.
azure.postgresql.flexible.oldest_backend_time_sec seconds (s) The age in seconds of the oldest backend (irrespective of the state).
azure.postgresql.flexible.oldest_backend_xmin Count The actual value of the oldest xmin.
azure.postgresql.flexible.oldest_backend_xmin_age Count Age of the oldest xmin, measured in transactions. It indicates how many transactions have passed since the oldest xmin.
azure.postgresql.flexible.physical_replication_delay_in_bytes bytes Maximum lag across all asynchronous physical replication slots.
azure.postgresql.flexible.physical_replication_delay_in_seconds seconds (s) Read Replica lag in seconds.
azure.postgresql.flexible.server_connections_idle Count Connections to PostgreSQL that are idle, ready to service a new client connection.
azure.postgresql.flexible.sessions_by_state Count Overall state of the backends.
azure.postgresql.flexible.sessions_by_wait_event_type Count Sessions by the type of event for which the backend is waiting.
azure.postgresql.flexible.tables_analyzed_user_tables Count Number of user-only tables that have been analyzed in this database.
azure.postgresql.flexible.tables_autoanalyzed_user_tables Count Number of user-only tables that have been analyzed by the autovacuum daemon in this database.
azure.postgresql.flexible.tables_autovacuumed_user_tables Count Number of user-only tables that have been vacuumed by the autovacuum daemon in this database.
azure.postgresql.flexible.tables_counter_user_tables Count Number of user-only tables in this database.
azure.postgresql.flexible.tables_vacuumed_user_tables Count Number of user-only tables that have been vacuumed in this database.
azure.postgresql.flexible.temp_bytes bytes Total amount of data written to temporary files by queries in this database.
azure.postgresql.flexible.temp_files Count Number of temporary files created by queries in this database.
azure.postgresql.flexible.total_pooled_connections Count Current number of pooled connections.
azure.postgresql.flexible.tps Count Number of transactions executed within a second.
azure.postgresql.flexible.tup_deleted Count Number of rows deleted by queries in this database.
azure.postgresql.flexible.tup_fetched Count Number of rows fetched by queries in this database.
azure.postgresql.flexible.tup_inserted Count Number of rows inserted by queries in this database.
azure.postgresql.flexible.tup_returned Count Number of rows returned by queries in this database.
azure.postgresql.flexible.tup_updated Count Number of rows updated by queries in this database.
azure.postgresql.flexible.txlogs_storage_used bytes Transaction Log Storage Used.
azure.postgresql.flexible.vacuum_count_user_tables Count Number of times user-only tables have been manually vacuumed in this database (not counting VACUUM FULL).
azure.postgresql.flexible.xact_commit Count Number of transactions in this database that have been committed.
azure.postgresql.flexible.xact_rollback Count Number of transactions in this database that have been rolled back.
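The blks_hit and blks_read metrics above are commonly combined into a cache hit ratio: the share of block requests satisfied from the buffer cache without a disk read. A minimal sketch, assuming both values cover the same interval; the function name and the example values are hypothetical.

```python
from typing import Optional


def postgres_cache_hit_ratio(blks_hit: int, blks_read: int) -> Optional[float]:
    """Percentage of block requests satisfied from the buffer cache.

    blks_hit and blks_read are values of the corresponding metrics for the same interval.
    Returns None when no blocks were requested, since the ratio is undefined.
    """
    total = blks_hit + blks_read
    if total == 0:
        return None
    return 100.0 * blks_hit / total


print(postgres_cache_hit_ratio(blks_hit=9_900, blks_read=100))  # 99.0
```

Because blks_read counts blocks read into the buffer cache, a read may still have been served by the operating system's file cache; treat the ratio as a trend indicator rather than an exact disk-read rate.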

Recovery Services

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity, and it is displayed as one of four color-coded states: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health of Recovery Services entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurerecoveryservice.

azure.recovery.service.backup_health_event Count The count of health events related to backup job health within a specific time frame. When a backup job completes, Azure Backup generates a backup health event, with dimensions varying based on the job status (for example, succeeded or failed).
azure.recovery.service.restore_health_event Count The count of health events related to restore job health within a specific time frame. When a restore job completes, Azure Backup generates a restore health event, with dimensions varying based on the job status (for example, succeeded or failed).

Service Bus

Metric Units Description
azure.servicebus.namespaces.abandon_message Count

AbandonMessage. The total number of messages abandoned over a specified period.

azure.servicebus.namespaces.active_connections Count

ActiveConnections. The total number of active connections on a namespace and on an entity in the namespace. The value for this metric is a point-in-time value. Connections that were active immediately after that point in time may not be reflected in the metric.

azure.servicebus.namespaces.active_messages Count

ActiveMessages. The average number of active messages in a queue/topic.

azure.servicebus.namespaces.complete_message Count

CompleteMessage. The total number of messages completed over a specified period.

azure.servicebus.namespaces.connections_closed Count

ConnectionsClosed. The average number of connections closed. The value for this metric is an aggregation and includes all connections that were closed in the aggregation time window.

azure.servicebus.namespaces.connections_opened Count

ConnectionsOpened. The average number of connections opened. The value for this metric is an aggregation and includes all connections that were opened in the aggregation time window.

azure.servicebus.namespaces.deadlettered_messages Count

DeadletteredMessages. The average number of dead-lettered messages in a queue/topic.

azure.servicebus.namespaces.incoming_messages Count

IncomingMessages. The total number of events or messages sent to Service Bus over a specified period. For basic and standard tiers, incoming auto-forwarded messages are included in this metric. For the premium tier, they aren't included.

azure.servicebus.namespaces.incoming_requests Count

IncomingRequests. The total number of requests made to the Service Bus service over a specified period.

azure.servicebus.namespaces.messages Count

Messages. The average number of messages in a queue/topic.

azure.servicebus.namespaces.namespace_cpu_usage Percent (%)

The percentage of CPU used by premium namespaces.

azure.servicebus.namespaces.outgoing_messages Count

OutgoingMessages. The total number of events or messages received from Service Bus over a specified period. The outgoing auto-forwarded messages aren't included in this metric.

azure.servicebus.namespaces.pending_checkpoint_operation_count Count

PendingCheckpointOperationCount. The average number of pending checkpoint operations on the namespace. The service starts throttling when the pending checkpoint count exceeds (500,000 + (500,000 * messaging units)) operations. This metric applies only to namespaces using the premium tier.

azure.servicebus.namespaces.scheduled_messages Count

ScheduledMessages. The average number of scheduled messages in a queue/topic.

azure.servicebus.namespaces.server_errors Count

ServerErrors. The total number of requests not processed because of an error in the Service Bus service over a specified period.

azure.servicebus.namespaces.server_send_latency milliseconds (ms)

ServerSendLatency. The average time taken by the Service Bus service to complete the request.

azure.servicebus.namespaces.size bytes

Size. The average size of an entity (queue or topic) in bytes.

azure.servicebus.namespaces.successful_requests Count

SuccessfulRequests. The total number of successful requests made to the Service Bus service over a specified period.

azure.servicebus.namespaces.throttled_requests Count

ThrottledRequests. The total number of requests that were throttled because usage limits were exceeded.

azure.servicebus.namespaces.user_errors Count

UserErrors. The total number of requests not processed because of user errors over a specified period.
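The throttling limit given for PendingCheckpointOperationCount scales with the number of messaging units in a premium namespace. The following sketch evaluates that stated formula, 500,000 + (500,000 * messaging units), and compares a pending checkpoint value against it. The function names and example values are hypothetical; only the formula comes from the metric description.

```python
def checkpoint_throttle_limit(messaging_units: int) -> int:
    """Pending checkpoint count above which the service starts throttling.

    Implements the limit stated for PendingCheckpointOperationCount:
    500,000 + (500,000 * messaging units). Applies to premium namespaces only.
    """
    if messaging_units < 1:
        raise ValueError("messaging_units must be at least 1")
    return 500_000 + 500_000 * messaging_units


def is_throttling(pending_checkpoints: float, messaging_units: int) -> bool:
    """True when the pending checkpoint count exceeds the limit for the namespace."""
    return pending_checkpoints > checkpoint_throttle_limit(messaging_units)


print(checkpoint_throttle_limit(2))  # 1500000
print(is_throttling(1_200_000, messaging_units=2))  # False
print(is_throttling(1_200_000, messaging_units=1))  # True
```

The same pending count can be safe on a larger namespace and over the limit on a smaller one, so an alert on this metric should use a threshold derived from the namespace's messaging units.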

SQL Database

Metric Units Description
azure.sql.servers.databases.allocated_data_storage bytes

The amount of formatted file space allocated for storing database data. This space grows automatically but does not shrink after data is deleted, which keeps future inserts fast.

azure.sql.servers.databases.blocked_by_firewall Count

The number of connection attempts that were blocked due to firewall rules in Azure SQL Database. This helps monitor access control and security settings.

azure.sql.servers.databases.connection_failed Count

Failed Connections. The total number of connections that failed.

azure.sql.servers.databases.connection_successful Count

Successful Connections. The total number of successful connections.

azure.sql.servers.databases.cpu_percent Percent (%)

CPU Utilization. The average percentage of CPU used.

azure.sql.servers.databases.deadlock Count

Deadlocks. The total number of deadlocks.

azure.sql.servers.databases.dtu_consumption_percent Percent (%)

The percentage of Database Transaction Units (DTUs) consumed relative to the allocated DTU limit. DTUs represent a blend of CPU, memory, reads, and writes, helping gauge database performance.

azure.sql.servers.databases.dtu_limit Count

The maximum number of DTUs allocated to a database. This limit determines the available compute, storage, and I/O resources for the database.

azure.sql.servers.databases.dtu_used Count

The actual number of DTUs consumed by the database workload. This metric helps assess resource utilization and performance.

azure.sql.servers.databases.log_write_percent Percent (%)

Log Write Percentage. The average log I/O percentage based on the limit of the service tier.

azure.sql.servers.databases.physical_data_read_percent Percent (%)

Data IO Percentage. The average data I/O percentage based on the limit of the service tier.

azure.sql.servers.databases.rateOfConnectionFailure Percent (%)

The rate of failed connection attempts to an Azure SQL Database. Connection failures can occur due to firewall rules, authentication issues, or transient network errors.

azure.sql.servers.databases.sessions_percent Percent (%)

Sessions Percentage. The average percentage of concurrent sessions based on the limit of the service tier.

azure.sql.servers.databases.storage bytes

Data Space Used. The total amount of space used to store data.

azure.sql.servers.databases.storage_percent Percent (%)

Storage Utilization. The average percentage of space used to store data based on the limit of the service tier.

azure.sql.servers.databases.workers_percent Percent (%)

The percentage of available worker threads being utilized in an Azure SQL Database. High worker utilization can indicate performance bottlenecks, excessive concurrent queries, or inefficient query execution.

azure.sql.servers.databases.xtp_storage_percent Percent (%)

The percentage of In-Memory OLTP storage used in an Azure SQL Database. This metric is relevant for databases using memory-optimized tables and helps monitor available memory for in-memory processing.
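Based on the descriptions of azure.sql.servers.databases.dtu_used, dtu_limit, and dtu_consumption_percent, the consumption percentage corresponds to DTUs used divided by the DTU limit. The sketch below shows that arithmetic, assuming both values come from the same interval; the function is hypothetical, and the value Azure reports for dtu_consumption_percent remains the authoritative one.

```python
def dtu_consumption_percent(dtu_used: float, dtu_limit: float) -> float:
    """Express DTUs used as a percentage of the allocated DTU limit."""
    if dtu_limit <= 0:
        raise ValueError("dtu_limit must be positive")
    return 100.0 * dtu_used / dtu_limit


# Hypothetical values for one database over the same interval.
print(dtu_consumption_percent(dtu_used=30, dtu_limit=50))  # 60.0
```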

Traffic Manager

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. It is determined by anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity, and it is displayed as one of four color-coded states: Good, Moderate, Bad, or Unknown. To see how alerts, anomalies, and statuses affect the health of an entity type, go to Settings > Health and select the entity type. You can also customize the impact there.

To view the health of Traffic Manager entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azuretrafficmanager.

azure.trafficmanager.ProbeAgentCurrentEndpointStateByProfileResourceId Count Status of the endpoint’s probe. 1 indicates the probe is enabled, 0 indicates the probe is disabled.
azure.trafficmanager.QpsByEndpoint Count Number of times a Traffic Manager endpoint was returned.
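QpsByEndpoint counts how often each endpoint was returned. Comparing these counts is one way to check whether a routing method is distributing DNS responses as you expect. The following is a minimal Python sketch. It assumes you have already exported per-endpoint totals for the same time range. The function name and endpoint names are hypothetical.

```python
from typing import Dict


def endpoint_shares(returned_counts: Dict[str, int]) -> Dict[str, float]:
    """Return each endpoint's fraction of all Traffic Manager responses.

    returned_counts maps a (hypothetical) endpoint name to its summed
    azure.trafficmanager.QpsByEndpoint value for one time range.
    """
    total = sum(returned_counts.values())
    if total == 0:
        # No responses recorded: report a zero share for every endpoint.
        return {name: 0.0 for name in returned_counts}
    return {name: count / total for name, count in returned_counts.items()}


counts = {"eastus-endpoint": 300, "westeurope-endpoint": 100}
print(endpoint_shares(counts))  # {'eastus-endpoint': 0.75, 'westeurope-endpoint': 0.25}
```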

Virtual Machines

Metric Units Description
azure.vm.cpu.credits_consumed credits

Total number of credits consumed by the Virtual Machine.

azure.vm.cpu.credits_remaining credits

Total number of credits available to burst.

azure.vm.cpu.percentage Percent (%)

The percentage of allocated compute units that are currently in use by the Virtual Machine(s).

azure.vm.disk.cache.data.read_hit Percent (%)

The percentage of read operations served from the data disk cache. A higher value indicates efficient caching performance.

azure.vm.disk.cache.data.read_miss Percent (%)

The percentage of read operations that were not found in the data disk cache and had to be retrieved from the underlying storage.

azure.vm.disk.cache.os.read_hit Percent (%)

The percentage of read operations served from the OS disk cache. Cache hits improve performance by reducing direct disk access.

azure.vm.disk.cache.os.read_miss Percent (%)

The percentage of read operations that were not found in the OS disk cache and required additional disk access.

azure.vm.disk.data.bandwidth.consumed.percentage Percent (%)

The percentage of allocated bandwidth consumed by data disk operations. A high percentage may suggest bandwidth saturation.

azure.vm.disk.data.iops.consumed.percentage Percent (%)

The percentage of allocated IOPS (Input/Output Operations Per Second) consumed by data disk operations. A high percentage may indicate performance bottlenecks.

azure.vm.disk.data.max.burst.bandwidth Count

The maximum bandwidth that a data disk can achieve when bursting is enabled. This allows temporary performance boosts beyond the provisioned limits.

azure.vm.disk.data.max.burst.iops Count

The maximum IOPS (Input/Output Operations Per Second) a data disk can reach during burst periods. This helps handle short-term spikes in workload demand.

azure.vm.disk.data.queue_depth Count

The number of outstanding I/O requests waiting to be processed by the data disk. A higher queue depth may indicate disk contention or performance bottlenecks.

azure.vm.disk.data.read_bytes bps

The average number of bytes per second read from the data disk during the monitoring period. This metric helps monitor disk read performance and workload patterns.

azure.vm.disk.data.read_ops Count per second

The number of read operations per second performed on the data disk. This metric is useful for analyzing disk activity and optimizing performance.

azure.vm.disk.data.target.bandwidth Count

The expected bandwidth allocation for a data disk based on its provisioned performance tier. This helps ensure consistent throughput for workloads.

azure.vm.disk.data.target.iops Count

The expected IOPS (Input/Output Operations Per Second) allocation for a data disk based on its provisioned performance tier. This helps ensure consistent disk performance.

azure.vm.disk.data.used.burst.bps.credits.percentage Percent (%) The percentage of burst bandwidth credits used by a data disk. Azure premium disks allow temporary performance bursts beyond provisioned limits, and this metric helps track credit consumption.
azure.vm.disk.data.used.burst.io.credits.percentage Percent (%) The percentage of burst IOPS credits used by a data disk. This metric helps monitor how much of the available burst capacity has been consumed.
azure.vm.disk.data.write_bytes bps

The average number of bytes per second written to the data disk during the monitoring period. This metric helps monitor disk write performance and workload patterns.

azure.vm.disk.data.write_ops Count per second

The number of write operations per second performed on the data disk. This metric is useful for analyzing disk activity and optimizing performance.

azure.vm.disk.os.bandwidth.consumed.percentage Percent (%)

The percentage of allocated bandwidth consumed by the OS disk. A high percentage may indicate bandwidth saturation.

azure.vm.disk.os.iops.consumed.percentage Percent (%)

The percentage of allocated IOPS (Input/Output Operations Per Second) consumed by the OS disk. A high percentage may suggest performance bottlenecks.

azure.vm.disk.os.max.burst.bandwidth Count

The maximum bandwidth that the OS disk can achieve when bursting is enabled. This allows temporary performance boosts beyond the provisioned limits.

azure.vm.disk.os.max.burst.iops Count

The maximum IOPS the OS disk can reach during burst periods. This helps handle short-term spikes in workload demand.

azure.vm.disk.os.queue_depth Count

The number of outstanding I/O requests waiting to be processed by the OS disk. A higher queue depth may indicate disk contention or performance bottlenecks.

azure.vm.disk.os.read_bytes bps

The average number of bytes per second read from the OS disk during the monitoring period. This metric helps monitor disk read performance and workload patterns.

azure.vm.disk.os.read_ops Count per second

The number of read operations per second performed on the OS disk. This metric helps monitor disk activity and performance.

azure.vm.disk.os.target.bandwidth Count

The expected bandwidth allocation for the OS disk based on its provisioned performance tier. This helps ensure consistent throughput for workloads.

azure.vm.disk.os.target.iops Count

The expected IOPS (Input/Output Operations Per Second) allocation for the OS disk based on its provisioned performance tier. This helps maintain stable disk performance.

azure.vm.disk.os.used.burst.bps.credits.percentage Percent (%) The percentage of burst bandwidth credits used by the OS disk. Azure premium disks allow temporary performance bursts beyond provisioned limits, and this metric helps track credit consumption.
azure.vm.disk.os.used.burst.io.credits.percentage Percent (%) The percentage of burst IOPS credits used by the OS disk. This metric helps monitor how much of the available burst capacity has been consumed.
azure.vm.disk.os.write_bytes bps

The average number of bytes per second written to the OS disk during the monitoring period. This metric helps monitor disk write performance and workload patterns.

azure.vm.disk.os.write_ops Count per second

The number of write operations per second performed on the OS disk. This metric helps monitor disk activity and performance.

azure.vm.disk.read_bytes bytes

The total number of bytes read from disk during the monitoring period.

azure.vm.disk.read_ops Count per second

Disk Read IOPS.

azure.vm.disk.write_bytes bytes

The total number of bytes written to disk during the monitoring period.

azure.vm.disk.write_ops Count per second

Disk Write IOPS.

azure.vm.memory.available_bytes bytes

The amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the virtual machine.

azure.vm.network.in bytes

The number of billable bytes received on all network interfaces by the Virtual Machine(s) (Incoming Traffic).

azure.vm.network.inbound_flows Count

The number of inbound network flows to the virtual machine. This metric helps monitor network traffic and connection patterns.

azure.vm.network.inbound_flows_maximum_creation_rate Count per second

The maximum rate at which inbound network flows are created for the virtual machine. This metric helps assess network performance and connection handling.

azure.vm.network.out bytes

The total amount of outbound network traffic from the virtual machine. This metric helps monitor bandwidth usage and network performance.

azure.vm.network.outbound_flows Count

The number of outbound network flows from the virtual machine. This metric helps monitor outgoing connections and network activity.

azure.vm.network.outbound_flows_maximum_creation_rate Count per second

The maximum rate at which outbound network flows are created for the virtual machine. This metric helps assess network performance and connection handling.

azure.vm.network.total_in bytes

The total amount of inbound network traffic received by the virtual machine. This metric helps monitor bandwidth usage and network performance.

azure.vm.network.total_out bytes

The total amount of outbound network traffic sent from the virtual machine. This metric helps monitor outgoing bandwidth usage and network efficiency.
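The per-disk bandwidth metrics above use the unit bps (bytes per second) and are rates. By contrast, azure.vm.disk.read_bytes and azure.vm.disk.write_bytes are totals for the monitoring period. The following is a minimal Python sketch that turns evenly spaced rate samples into an approximate byte total. It assumes each sample is the average rate over a fixed interval. The 60-second default is an illustrative assumption, not a documented collection interval, and the function name is hypothetical.

```python
from typing import Sequence


def total_bytes_from_rates(rates_bps: Sequence[float], interval_s: float = 60.0) -> float:
    """Approximate total bytes transferred from evenly spaced bytes-per-second samples.

    Each sample is treated as the average rate over its interval, so
    total = sum(rate * interval).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return sum(rate * interval_s for rate in rates_bps)


# Three one-minute samples of azure.vm.disk.data.read_bytes (bytes per second).
samples = [1_000_000.0, 2_000_000.0, 500_000.0]
print(total_bytes_from_rates(samples))  # 210000000.0
```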

Virtual Machine Scale Sets

Metric Units Description
azure.vmss.cpu.credits_consumed Count The total number of CPU credits consumed by a Virtual Machine Scale Set (VMSS) instance. This metric is relevant for B-series burstable VMs, which use a credit-based system to manage CPU performance.
azure.vmss.cpu.percentage Percent (%)

Percentage CPU. The percentage of allocated compute units that are currently in use by the VM(s).

azure.vmss.cpu.credits_remaining Count The total number of CPU credits available for a VMSS instance to use for bursting. When credits run out, the VM operates at its baseline performance level.
azure.vmss.disk.cache.data.read_hit Percent (%)

The percentage of read operations served from the data disk cache. A higher value indicates efficient caching performance.

azure.vmss.disk.cache.data.read_miss Percent (%)

The percentage of read operations that were not found in the data disk cache and had to be retrieved from the underlying storage.

azure.vmss.disk.cache.os.read_hit Percent (%)

The percentage of read operations served from the OS disk cache. Cache hits improve performance by reducing direct disk access.

azure.vmss.disk.cache.os.read_miss Percent (%)

The percentage of read operations that were not found in the OS disk cache and required additional disk access.

azure.vmss.disk.data.queue_depth Count

The number of outstanding I/O requests waiting to be read from or written to the data disk in a Virtual Machine Scale Set (VMSS). A higher queue depth may indicate disk contention or performance bottlenecks.

azure.vmss.disk.data.read_bytes bps

Data Disk Read. The average number of bytes per second read from a single disk during the monitoring period.

azure.vmss.disk.data.read_ops Count per second

The number of read operations per second performed on the data disk in a VMSS. This metric helps monitor disk activity and performance.

azure.vmss.disk.data.write_bytes bps

Data Disk Write. The average number of bytes per second written to a single disk during the monitoring period.

azure.vmss.disk.data.write_ops Count per second

The number of write operations per second performed on the data disk in a VMSS. This metric is useful for analyzing disk activity and optimizing performance.

azure.vmss.disk.os.queue_depth Count

The number of outstanding I/O requests waiting to be read from or written to the OS disk in a VMSS. A higher queue depth may indicate disk contention or performance bottlenecks.

azure.vmss.disk.os.read_bytes bps

The average number of bytes per second read from the OS disk in a VMSS during the monitoring period. This metric helps monitor disk read performance and workload patterns.

azure.vmss.disk.os.read_ops Count per second

The number of read operations per second performed on the OS disk in a VMSS. This metric provides insight into overall disk activity and utilization.

azure.vmss.disk.os.write_bytes bps

The average number of bytes per second written to the OS disk in a Virtual Machine Scale Set (VMSS) during the monitoring period. This metric helps monitor disk write performance and workload patterns.

azure.vmss.disk.os.write_ops Count per second

The number of write operations per second performed on the OS disk in a VMSS. This metric provides insight into overall disk activity and utilization.

azure.vmss.disk.read_bytes bytes

Disk Read. The total number of bytes read from disk during the monitoring period.

azure.vmss.disk.read_ops Count per second

Disk Read Operations. The average number of input operations read in a second from all disks attached to the VM(s).

azure.vmss.disk.write_bytes bytes

Disk Write. The total number of bytes written to disk during the monitoring period.

azure.vmss.disk.write_ops Count per second

Disk Write Operations. The average number of output operations written in a second to all disks attached to the VM(s).

azure.vmss.memory.available_bytes bytes

Available Memory Bytes. The amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the VM(s).

azure.vmss.network.inbound_flows Count

The number of inbound network flows to a VMSS instance. This metric helps monitor network traffic and connection patterns.

azure.vmss.network.inbound_flows_maximum_creation_rate Count per second

The maximum rate at which inbound network flows are created for a VMSS instance. This metric helps assess network performance and connection handling.

azure.vmss.network.outbound_flows Count

The number of outbound network flows from a VMSS instance. This metric helps monitor outgoing connections and network activity.

azure.vmss.network.outbound_flows_maximum_creation_rate Count per second

The maximum rate at which outbound network flows are created for a VMSS instance. This metric helps assess network performance and connection handling.

azure.vmss.network.total_in bytes

Network In Total. The number of bytes received on all network interfaces by the VM(s) (incoming traffic).

azure.vmss.network.total_out bytes

Network Out Total. The number of bytes sent on all network interfaces by the VM(s) (outgoing traffic).
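Dividing a bytes-per-second metric by the matching operations-per-second metric gives the average I/O size. This helps distinguish workloads made up of many small operations from workloads made up of a few large ones. The following is a minimal Python sketch. It assumes both samples come from the same instance and time bucket, for example azure.vmss.disk.data.read_bytes and azure.vmss.disk.data.read_ops. The function name is hypothetical.

```python
from typing import Optional


def average_io_size_bytes(bytes_per_s: float, ops_per_s: float) -> Optional[float]:
    """Average bytes per I/O operation for one sample.

    Returns None when no operations were recorded, because the
    average size is undefined in that case.
    """
    if ops_per_s <= 0:
        return None
    return bytes_per_s / ops_per_s


# 4 MiB/s spread over 128 read operations per second -> 32 KiB per read.
print(average_io_size_bytes(4_194_304.0, 128.0))  # 32768.0
```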

vNet

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Virtual Network entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurevirtualnetwork.

azure.virtualnetwork.BytesDroppedDDoS bps The maximum inbound bytes dropped per second, DDoS.
azure.virtualnetwork.BytesForwardedDDoS bps The maximum inbound bytes forwarded per second, DDoS.
azure.virtualnetwork.BytesInDDoS bps The maximum inbound bytes per second, DDoS.
azure.virtualnetwork.DDoSTriggerSYNPackets Count per second The maximum inbound SYN packets per second to trigger DDoS mitigation.
azure.virtualnetwork.DDoSTriggerTCPPackets Count per second The maximum inbound TCP packets per second to trigger DDoS mitigation.
azure.virtualnetwork.DDoSTriggerUDPPackets Count per second The maximum inbound UDP packets per second to trigger DDoS mitigation.
azure.virtualnetwork.IfUnderDDoSAttack Count Whether the resource is under DDoS attack: 1 if it is, 0 otherwise.
azure.virtualnetwork.PacketsDroppedDDoS Count per second The maximum inbound packets dropped per second, DDoS.
azure.virtualnetwork.PacketsForwardedDDoS Count per second The maximum inbound packets forwarded per second, DDoS.
azure.virtualnetwork.PacketsInDDoS Count per second The maximum inbound packets per second, DDoS.
azure.virtualnetwork.PingMeshAverageRoundtripMs milliseconds (ms) The average round-trip time for pings sent to a destination VM.
azure.virtualnetwork.PingMeshProbesFailedPercent Percent (%) Of the total number of pings sent to a destination VM, the average percentage of pings that failed.
azure.virtualnetwork.TCPBytesDroppedDDoS bps The maximum inbound TCP bytes dropped per second, DDoS.
azure.virtualnetwork.TCPBytesForwardedDDoS bps The maximum inbound TCP bytes forwarded per second, DDoS.
azure.virtualnetwork.TCPBytesInDDoS bps The maximum inbound TCP bytes per second, DDoS.
azure.virtualnetwork.TCPPacketsDroppedDDoS Count per second The maximum inbound TCP packets dropped per second, DDoS.
azure.virtualnetwork.TCPPacketsForwardedDDoS Count per second The maximum inbound TCP packets forwarded per second, DDoS.
azure.virtualnetwork.TCPPacketsInDDoS Count per second The maximum inbound TCP packets per second, DDoS.
azure.virtualnetwork.UDPBytesDroppedDDoS bps The maximum inbound UDP bytes dropped per second, DDoS.
azure.virtualnetwork.UDPBytesForwardedDDoS bps The maximum inbound UDP bytes forwarded per second, DDoS.
azure.virtualnetwork.UDPBytesInDDoS bps The maximum inbound UDP bytes per second, DDoS.
azure.virtualnetwork.UDPPacketsDroppedDDoS Count per second The maximum inbound UDP packets dropped per second, DDoS.
azure.virtualnetwork.UDPPacketsForwardedDDoS Count per second The maximum inbound UDP packets forwarded per second, DDoS.
azure.virtualnetwork.UDPPacketsInDDoS Count per second The maximum inbound UDP packets per second, DDoS.
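To gauge how aggressively DDoS mitigation is acting, you can compare dropped and inbound packet rates. The following is a minimal Python sketch. It assumes PacketsDroppedDDoS and PacketsInDDoS samples from the same time bucket. Both metrics are reported as maxima, so the ratio is only an approximation, and the function clamps it to 1.0. The function name is hypothetical.

```python
def ddos_drop_ratio(dropped_per_s: float, inbound_per_s: float) -> float:
    """Approximate fraction of inbound packets dropped by DDoS mitigation.

    dropped_per_s: azure.virtualnetwork.PacketsDroppedDDoS sample
    inbound_per_s: azure.virtualnetwork.PacketsInDDoS sample
    """
    if inbound_per_s <= 0:
        # No inbound traffic recorded, so nothing was dropped.
        return 0.0
    # Maxima from one bucket may not coincide, so cap the ratio at 1.0.
    return min(dropped_per_s / inbound_per_s, 1.0)


print(ddos_drop_ratio(7500.0, 10000.0))  # 0.75
```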

VPN Gateway

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of VPN Gateway entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select azurevpngateway.

azure.vpngateway.AverageBandwidth bps Site-to-site bandwidth of a gateway in bytes per second.
azure.vpngateway.BgpPeerStatus Count Status of BGP peer.
azure.vpngateway.BgpRoutesAdvertised Count Count of BGP routes advertised through the tunnel.
azure.vpngateway.BgpRoutesLearned Count Count of BGP routes learned through the tunnel.
azure.vpngateway.MmsaCount Count Number of IKE main mode security associations (MMSAs).
azure.vpngateway.QmsaCount Count Number of IKE quick mode security associations (QMSAs).
azure.vpngateway.TunnelAverageBandwidth bps Average bandwidth of a tunnel in bytes per second.
azure.vpngateway.TunnelEgressBytes bytes Outgoing bytes of a tunnel.
azure.vpngateway.TunnelEgressPacketDropCount Count Count of outgoing packets dropped by tunnel.
azure.vpngateway.TunnelEgressPacketDropTSMismatch Count Outgoing packet drop count from traffic selector mismatch of a tunnel.
azure.vpngateway.TunnelEgressPackets Count Outgoing packet count of a tunnel.
azure.vpngateway.TunnelIngressBytes bytes Incoming bytes of a tunnel.
azure.vpngateway.TunnelIngressPacketDropCount Count Count of incoming packets dropped by tunnel.
azure.vpngateway.TunnelIngressPacketDropTSMismatch Count Incoming packet drop count from traffic selector mismatch of a tunnel.
azure.vpngateway.TunnelIngressPackets Count Incoming packet count of a tunnel.
azure.vpngateway.TunnelNatAllocations Count Count of allocations for a NAT rule on a tunnel.
azure.vpngateway.TunnelNatedBytes bytes Number of bytes that were NATed on a tunnel by a NAT rule.
azure.vpngateway.TunnelNatedPackets Count Number of packets that were NATed on a tunnel by a NAT rule.
azure.vpngateway.TunnelNatFlowCount Count Number of NAT flows on a tunnel by flow type and NAT rule.
azure.vpngateway.TunnelNatPacketDrop Count Number of NATed packets dropped on a tunnel, by drop type and NAT rule.
azure.vpngateway.TunnelPeakPackets Count Peak packets per second on a tunnel.
azure.vpngateway.TunnelReverseNatedBytes bytes Number of bytes that were reverse NATed on a tunnel by a NAT rule.
azure.vpngateway.TunnelReverseNatedPackets Count Number of packets on a tunnel that were reverse NATed by a NAT rule.
azure.vpngateway.TunnelTotalFlowCount Count Total flow count on a tunnel.
azure.vpngateway.VnetAddressPrefixCount Count Count of VNet address prefixes behind the gateway.
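Gateway bandwidth metrics such as AverageBandwidth and TunnelAverageBandwidth are reported in bytes per second. Gateway throughput, however, is usually quoted in megabits per second. The following is a minimal Python sketch of the conversion, using 1 byte = 8 bits and 1 Mbps = 1,000,000 bits per second. The rated throughput passed in is an illustrative value that you would look up for your own gateway, and the function names are hypothetical.

```python
def bytes_per_s_to_mbps(bytes_per_s: float) -> float:
    """Convert a bytes-per-second sample to megabits per second."""
    return bytes_per_s * 8 / 1_000_000


def utilization_percent(bytes_per_s: float, rated_mbps: float) -> float:
    """Share of a (hypothetical) rated throughput that a sample represents."""
    if rated_mbps <= 0:
        raise ValueError("rated_mbps must be positive")
    return bytes_per_s_to_mbps(bytes_per_s) / rated_mbps * 100


print(bytes_per_s_to_mbps(125_000_000))  # 1000.0
print(utilization_percent(62_500_000, 1000.0))  # 50.0
```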

Infrastructure/Google Cloud Platform metrics

Metrics for GCP entities are collected by integrating SolarWinds Observability SaaS with your Google Cloud account. See Google Cloud Platform monitoring.

Google Compute Engine (GCE)

Metric Units Description
sw.metrics.healthscore Percent (%)

Health state. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of GCE entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select gce.

gcp.compute.googleapis.com.instance.cpu.guestVisibleVcpus Count Number of vCPUs visible and available inside the guest VM instance.
gcp.compute.googleapis.com.instance.cpu.reservedCores Count Number of reserved vCPUs on a host.
gcp.compute.googleapis.com.instance.cpu.schedulerWaitTime seconds (s) Time a vCPU spends in the ready state but not scheduled to run.
gcp.compute.googleapis.com.instance.cpu.usageTime seconds (s) vCPU usage per instance for a time interval in vCPU-seconds.
gcp.compute.googleapis.com.instance.cpu.utilization Percent (%) CPU utilization as a fraction of the total allocated capacity (0.0-1.0).
gcp.compute.googleapis.com.instance.disk.averageIOLatency Microseconds Average latency of disk I/O operations over the past minute.
gcp.compute.googleapis.com.instance.disk.averageIOQueueDepth Count Average number of I/O requests in the disk's queue over the past minute.
gcp.compute.googleapis.com.instance.disk.maxReadBytesCount bytes Maximum per-second read throughput, in bytes, of any attached disk.
gcp.compute.googleapis.com.instance.disk.maxReadOpsCount Count Maximum number of per-second read requests on any attached disk during the sampling period.
gcp.compute.googleapis.com.instance.disk.maxWriteBytesCount bytes Maximum per-second write throughput, in bytes, of any attached disk.
gcp.compute.googleapis.com.instance.disk.maxWriteOpsCount Count Maximum number of per-second write requests on any attached disk during the sampling period.
gcp.compute.googleapis.com.instance.disk.performanceStatus Count Disk performance status, indicating whether disk performance is healthy or affected by infrastructure issues or user usage.
gcp.compute.googleapis.com.instance.disk.provisioningIOPS Count User-specified provisioned IOPS for a Google Cloud Platform Persistent Disk.
gcp.compute.googleapis.com.instance.disk.provisioningSize bytes User-specified provisioned size, in bytes, of an attached disk.
gcp.compute.googleapis.com.instance.disk.provisioningThroughput Count User-specified provisioned throughput of an attached disk.
gcp.compute.googleapis.com.instance.disk.readBytesCount bytes Count of bytes read from disk per sample period.
gcp.compute.googleapis.com.instance.disk.readOpsCount Count Disk read I/O operation count.
gcp.compute.googleapis.com.instance.disk.writeBytesCount bytes Count of bytes written to disk per sample period.
gcp.compute.googleapis.com.instance.disk.writeOpsCount Count Disk write I/O operation count.
gcp.compute.googleapis.com.instance.globalDNS.requestCount Count Number of global internal DNS queries from a Google Compute Engine VM.
gcp.compute.googleapis.com.instance.gpu.infraHealth Count VM instance GPU infrastructure health status.
gcp.compute.googleapis.com.instance.gpu.packetRetransmissionCount Count Packet retransmission count observed by GPU NICs per timestamp.
gcp.compute.googleapis.com.instance.gpu.throughputRxBytes bytes Network traffic received, in bytes per minute, for GPU VM machine types.
gcp.compute.googleapis.com.instance.gpu.throughputTxBytes bytes Network traffic transmitted, in bytes per minute, for GPU VM machine types.
gcp.compute.googleapis.com.instance.integrity.earlyBootValidationStatus Count Early boot integrity policy validation status.
gcp.compute.googleapis.com.instance.integrity.lateBootValidationStatus Count Validation status of late boot integrity policy.
gcp.compute.googleapis.com.instance.interruptionCount Count Current count of interruptions by type and reason.
gcp.compute.googleapis.com.instance.memory.balloonRamSize bytes Total memory allocated to a VM instance.
gcp.compute.googleapis.com.instance.memory.balloonRamUsed bytes Current memory usage on a VM instance.
gcp.compute.googleapis.com.instance.memory.balloonSwapInBytesCount bytes Memory read into the guest from its own swap space.
gcp.compute.googleapis.com.instance.memory.balloonSwapOutBytesCount bytes Memory written from the guest to its own swap space.
gcp.compute.googleapis.com.instance.network.receivedBytesCount bytes Bytes received over the network.
gcp.compute.googleapis.com.instance.network.receivedPacketsCount Count Packets received count.
gcp.compute.googleapis.com.instance.network.sentBytesCount bytes Bytes transmitted over a network interface.
gcp.compute.googleapis.com.instance.network.sentPacketsCount Count Number of packets transmitted.
gcp.compute.googleapis.com.instance.uptime seconds (s) Time the VM instance has been running during the sample period (delta), in seconds.
gcp.compute.googleapis.com.instance.uptimeTotal seconds (s) Time elapsed since a VM instance was started.
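Two unit details in this table are easy to miss. The cpu.utilization description gives a 0.0-1.0 fraction even though the Units column lists Percent, and disk.averageIOLatency is reported in microseconds. The following is a minimal Python sketch of both conversions. It assumes the raw values arrive in those units, so check a sample in the Metrics Explorer first. The function names are hypothetical. The helpers do not clamp their input, so a utilization value above 1.0 passes through unchanged.

```python
def utilization_to_percent(fraction: float) -> float:
    """Convert a 0.0-1.0 CPU utilization fraction to a percentage."""
    return fraction * 100.0


def microseconds_to_ms(us: float) -> float:
    """Convert a latency in microseconds to milliseconds."""
    return us / 1000.0


print(utilization_to_percent(0.25))  # 25.0
print(microseconds_to_ms(1500))  # 1.5
```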

Google Cloud Storage (GCS)

Metric Units Description
sw.metrics.healthscore Percent (%)

Health state. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of GCS entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select gcs.

gcp.storage.googleapis.com.anywhereCache.ingestedBytesCount bytes Change in the number of raw bytes ingested into Anywhere Cache, by zone.
gcp.storage.googleapis.com.anywhereCache.requestCount Count The number of API call events per minute, categorized by API method, response code, cache zone, and cache hit status.
gcp.storage.googleapis.com.anywhereCache.sentBytesCount bytes The number of bytes transmitted per API method, response code, cache zone, and cache hit status in a 60-second period.
gcp.storage.googleapis.com.anywhereCacheMetering.cacheStorageBytesCount bytes Total count of bytes stored in caches across all zones.
gcp.storage.googleapis.com.anywhereCacheMetering.cacheStorageKbsecCount Kibibytes Change in Anywhere Cache storage, in kibibyte-seconds (1 KiB = 1024 bytes), for each zone over the sampling interval.
gcp.storage.googleapis.com.anywhereCacheMetering.evictionByteCount bytes Change in bytes evicted from cache per zone.
gcp.storage.googleapis.com.anywhereCacheMetering.ingestedBillableBytesCount bytes The number of new, successfully ingested billable bytes per zone in Anywhere Cache.
gcp.storage.googleapis.com.api.lroCount Count Number of long-running operation completions.
gcp.storage.googleapis.com.api.requestCount Count Change in the number of API calls, grouped by method and response code, within a 60-second window.
gcp.storage.googleapis.com.authn.authentication Count Number of authenticated requests per 60-second interval, grouped by authentication method and access ID.
gcp.storage.googleapis.com.authz.ACLBasedObjectAccessCount Count Number of access grants issued based on an object's Access Control List within a sampling window.
gcp.storage.googleapis.com.authz.ACLOperationsCount Count Aggregated number of Create, Get, Set, and Delete ACL operations per minute.
gcp.storage.googleapis.com.authz.objectSpecificACLMutationCount Count Number of changes to object-specific Access Control Lists (ACLs) within a 60-second sampling window.
gcp.storage.googleapis.com.autoclass.transitionOperationCount Count The total number of Autoclass-initiated storage class transition operations within a sampling period.
gcp.storage.googleapis.com.autoclass.transitionedBytesCount bytes The total number of bytes transitioned between storage classes by Autoclass within a 300-second interval.
gcp.storage.googleapis.com.client.grpc.client.attempt.duration Seconds (s) Time taken for an RPC attempt, including channel selection.
gcp.storage.googleapis.com.client.grpc.client.attempt.RCVDTotalCompressedMessageSize bytes Total compressed bytes received per RPC attempt.
gcp.storage.googleapis.com.client.grpc.client.attempt.sentTotalCompressedMessageSize bytes The total number of compressed bytes transmitted for each RPC attempt, excluding metadata and framing.
gcp.storage.googleapis.com.client.grpc.client.attempt.started Count Number of RPC call initiations.
gcp.storage.googleapis.com.client.grpc.client.call.duration Seconds (s) The elapsed time between a client sending a request and receiving the corresponding response in a gRPC call.
gcp.storage.googleapis.com.client.grpc.lb.rls.cacheEntries Count The number of entries currently in the RLS (Route Lookup Service) cache.
gcp.storage.googleapis.com.client.grpc.lb.rls.cacheSize bytes The current size, in bytes, of the RLS cache.
gcp.storage.googleapis.com.client.grpc.lb.rls.defaultTargetPicks Count The number of load balancer picks directed towards the default target.
gcp.storage.googleapis.com.client.grpc.lb.rls.failedPicks Count Number of failed load balancer (LB) picks due to RLS request failures or RLS channel throttling.
gcp.storage.googleapis.com.client.grpc.lb.rls.targetPicks Count The number of load balancer picks sent to each RLS target.
gcp.storage.googleapis.com.client.grpc.lb.wrr.endpointWeightNotYetUsable Count Number of endpoints in each scheduler update that do not yet have usable weight information.
gcp.storage.googleapis.com.client.grpc.lb.wrr.endpointWeightStale Count Number of endpoints in each scheduler update whose latest weight is older than the expiration period.
gcp.storage.googleapis.com.client.grpc.lb.wrr.endpointWeights Weight Histogram of endpoint weights in each scheduler update.
gcp.storage.googleapis.com.client.grpc.lb.wrr.rrFallback Count Number of scheduler updates in which the weighted round robin (WRR) policy fell back to round robin (RR) because too few healthy endpoints had valid weights.
gcp.storage.googleapis.com.client.grpc.xdsClient.connected Boolean The current state of the ADS connection between xDS client and server (1 for active, 0 for inactive).
gcp.storage.googleapis.com.client.grpc.xdsClient.resourceUpdatesInvalid Count Number of received resources that failed validation.
gcp.storage.googleapis.com.client.grpc.xdsClient.resourceUpdatesValid Count Number of received resources that passed validation.
gcp.storage.googleapis.com.client.grpc.xdsClient.resources Count The number of gRPC xDS resources in use.
gcp.storage.googleapis.com.client.grpc.xdsClient.serverFailure Count Number of xDS servers transitioning from healthy to unhealthy state.
gcp.storage.googleapis.com.network.receivedBytesCount bytes Change in the number of bytes received over the network, grouped by API method and response code, over a 60-second period.
gcp.storage.googleapis.com.network.sentBytesCount bytes The number of bytes sent over the network per API method and response code, measured every minute.
gcp.storage.googleapis.com.quota.anywhereCacheStorageSize.exceeded Count The number of attempts to exceed the quota limit for Anywhere Cache storage size.
gcp.storage.googleapis.com.quota.anywhereCacheStorageSize.limit Kibibytes The current quota limit on Anywhere Cache storage size.
gcp.storage.googleapis.com.quota.anywhereCacheStorageSize.usage Kibibytes The current Anywhere Cache storage size counted against the quota.
gcp.storage.googleapis.com.quota.dualregionAnywhereCacheEgressBandwidth.limit bits The current quota limit on Anywhere Cache egress bandwidth for dual-region buckets.
gcp.storage.googleapis.com.quota.dualregionAnywhereCacheEgressBandwidth.usage bits The current usage of Anywhere Cache egress bandwidth for dual-region buckets.
gcp.storage.googleapis.com.quota.dualregionGoogleEgressBandwidth.limit bits Measures the current limit of dual-region egress bandwidth in bytes per second.
gcp.storage.googleapis.com.quota.dualregionGoogleEgressBandwidth.usage bits Measures the current usage of outbound network bandwidth in bytes per second across all projects and regions in Google Cloud Platform.
gcp.storage.googleapis.com.quota.dualregionInternetEgressBandwidth.limit bits The current limit (in bytes per second) of internet egress bandwidth for a dual-region configuration in Google Cloud Platform.
gcp.storage.googleapis.com.quota.dualregionInternetEgressBandwidth.usage bits Current dual-region internet egress bandwidth usage.
gcp.storage.googleapis.com.replication.meetingRPO Count Whether objects are meeting the replication RPO (recovery point objective).
gcp.storage.googleapis.com.replication.missingRPOMinutesLast30d Count Number of minutes in the last 30 days in which the RPO was missed.
gcp.storage.googleapis.com.replication.objectReplicationsLast30d Count Total number of object replications over the past 30 days, with distinction between RPO met and missed replications.
gcp.storage.googleapis.com.replication.timeSinceMetricsUpdated Seconds (s) Time elapsed since last calculation of 'storage.googleapis.com/replication' metric values.
gcp.storage.googleapis.com.replication.turboMaxDelay Seconds (s) The age, in seconds, of the oldest object not yet replicated in a bucket that uses turbo replication.
gcp.storage.googleapis.com.replication.v2.objectReplicationsLast30d Count Total count of object replications over the last 30 days.
gcp.storage.googleapis.com.replication.v2.timeSinceMetricsUpdated Seconds (s) The elapsed time since the last calculation of missing RPO minutes for Cloud Storage object replications over the past 30 days.
gcp.storage.googleapis.com.storage.objectCount Count The number of objects in the bucket, grouped by storage class.
gcp.storage.googleapis.com.storage.totalByteSeconds byte-seconds The total daily storage, in byte-seconds, used by each storage class in a Google Cloud Storage bucket, excluding soft-deleted objects.
gcp.storage.googleapis.com.storage.totalBytes bytes The total size of all non-soft-deleted objects in the bucket, grouped by storage class.
gcp.storage.googleapis.com.storage.v2.deletedBytes bytes Deltas of deleted bytes per day, by storage class in each bucket.
gcp.storage.googleapis.com.storage.v2.totalByteSeconds byte-seconds The total daily storage, in byte-seconds, used by the bucket, grouped by storage class and type.
gcp.storage.googleapis.com.storage.v2.totalBytes bytes The total size of all objects and multipart-uploads in a Google Cloud Storage bucket.
gcp.storage.googleapis.com.storage.v2.totalCount Count Number of objects and multipart-uploads per bucket, grouped by storage class and type.
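The replication and storage figures above can be turned into more familiar numbers with simple arithmetic. The following is a minimal Python sketch, assuming `missingRPOMinutesLast30d` counts minutes within a fixed 30-day window (43,200 minutes) and that `totalByteSeconds` is a daily total in byte-seconds, as the metric name suggests. The function names are hypothetical and not part of any SolarWinds or Google Cloud API.

```python
# Hypothetical helpers; not part of any SolarWinds or Google Cloud API.
MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds


def rpo_compliance_percent(missing_rpo_minutes: float) -> float:
    """Percentage of the last 30 days during which the RPO was met.

    Assumption: the input is a value of
    gcp.storage.googleapis.com.replication.missingRPOMinutesLast30d.
    """
    if not 0 <= missing_rpo_minutes <= MINUTES_IN_30_DAYS:
        raise ValueError("missing RPO minutes must fall within a 30-day window")
    met_minutes = MINUTES_IN_30_DAYS - missing_rpo_minutes
    return 100.0 * met_minutes / MINUTES_IN_30_DAYS


def average_bytes_stored(byte_seconds_per_day: float) -> float:
    """Average number of bytes stored during one day.

    Assumption: the input is a daily byte-seconds total, such as
    gcp.storage.googleapis.com.storage.totalByteSeconds.
    """
    return byte_seconds_per_day / SECONDS_PER_DAY


print(rpo_compliance_percent(432))          # 99.0
print(average_bytes_stored(86_400 * 1024))  # 1024.0
```

For example, 432 missed minutes is 1% of the 30-day window, so the RPO was met 99% of the time.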

Infrastructure/Kubernetes metrics

Metrics for Kubernetes entities are collected by installing the SWO K8s Collector on a Kubernetes cluster that has Prometheus installed. See Kubernetes monitoring.

Cluster metrics

Metric Unit Description
k8s.cluster.cpu.allocatable core

The amount of CPU on the cluster that is available for scheduling.

Metric type: Gauge.

k8s.cluster.cpu.capacity core

The cluster CPU capacity.

Metric type: Gauge.

k8s.cluster.cpu.utilization Percent (%)

The cluster CPU usage.

Metric type: Gauge.

k8s.cluster.memory.allocatable Binary Bytes

The amount of memory on the cluster that is available for scheduling.

Metric type: Gauge.

k8s.cluster.memory.capacity Binary Bytes

The cluster memory capacity.

Metric type: Gauge.

k8s.cluster.memory.utilization Percent (%)

The cluster memory usage.

Metric type: Gauge.

k8s.cluster.nodes Count

The number of nodes in the cluster.

Metric type: Gauge.

k8s.cluster.nodes.ready Count

The number of nodes with the Ready status condition.

Metric type: Gauge.

k8s.cluster.nodes.ready.avg Percent (%)

The percentage of nodes with the Ready status condition.

Metric type: Gauge.

k8s.cluster.pods Count

The number of pods on a cluster.

Metric type: Gauge.

k8s.cluster.pods.running Count

The number of pods in the Running phase.

Metric type: Gauge.

k8s.cluster.spec.cpu.requests cores

The total CPU requested by all containers in the cluster.

Metric type: Gauge.

k8s.cluster.spec.memory.requests Binary Bytes

The total memory requested by all containers in the cluster.

Metric type: Gauge.
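The cluster gauges can be combined into simple capacity indicators. This Python sketch assumes that `k8s.cluster.nodes.ready.avg` corresponds to ready nodes divided by total nodes, and that comparing `k8s.cluster.spec.cpu.requests` (or its memory equivalent) with the matching `allocatable` metric is a reasonable measure of scheduling headroom. Both relationships are illustrative assumptions, not a description of how the SWO K8s Collector computes its values, and the function names are hypothetical.

```python
# Hypothetical helpers; inputs are values read from the cluster gauges.


def ready_node_percent(nodes: int, nodes_ready: int) -> float:
    """Percentage of nodes with the Ready condition (0 when the cluster reports no nodes).

    Assumption: mirrors k8s.cluster.nodes.ready / k8s.cluster.nodes.
    """
    if nodes == 0:
        return 0.0
    return 100.0 * nodes_ready / nodes


def request_headroom(requested: float, allocatable: float) -> float:
    """Fraction of allocatable capacity not yet requested by containers.

    Works for CPU (cores) or memory (bytes) as long as both inputs share a unit,
    e.g. k8s.cluster.spec.cpu.requests and k8s.cluster.cpu.allocatable.
    """
    if allocatable <= 0:
        raise ValueError("allocatable must be positive")
    return (allocatable - requested) / allocatable


print(ready_node_percent(nodes=4, nodes_ready=3))        # 75.0
print(request_headroom(requested=6.0, allocatable=8.0))  # 0.25
```

A negative headroom would mean requests exceed what the scheduler can place, which is one reason pods can remain Pending.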

Node metrics

Metric Unit Description
k8s.kube_node_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_node_info  

Information about a cluster node.

Metric type: Gauge.

k8s.kube_node_spec_unschedulable  

Whether a node is marked unschedulable, that is, whether new pods can no longer be scheduled on it.

Metric type: Gauge.

k8s.kube_node_status_allocatable

cpu=<core>

ephemeral_storage=<byte>

pods=<integer>

attachable_volumes_*=<byte>

hugepages_*=<byte>

memory=<byte>

The amount of resources allocatable for pods (after reserving some for system daemons).

Metric type: Gauge.

k8s.kube_node_status_capacity

cpu=<core>

ephemeral_storage=<byte>

pods=<integer>

attachable_volumes_*=<byte>

hugepages_*=<byte>

memory=<byte>

The total amount of resources available for a node.

Metric type: Gauge.

k8s.kube_node_status_condition  

The condition of a cluster node.

Metric type: Gauge.

k8s.kube_node_status_ready  

Node status (as tag sw.k8s.node.status).

Metric type: Gauge.

k8s.node.cpu.allocatable core

CPU Utilization. The amount of CPU on the node that is available for scheduling.

Metric type: Gauge.

k8s.node.cpu.capacity core

CPU Utilization. The node CPU capacity.

Metric type: Gauge.

k8s.node.cpu.usage.seconds.rate core

CPU Utilization. The rate of cumulative CPU time consumed by the node.

Metric type: Gauge.

k8s.node.fs.iops  

Disk IOPS. Rate of reads and writes by all pods on the node.

Metric type: Gauge.

k8s.node.fs.throughput  

Disk throughput. Rate of bytes read and written by all pods on the node.

Metric type: Gauge.

k8s.node.fs.usage Binary Bytes

Disk Usage. Number of bytes that are consumed by containers on this node’s filesystem.

Metric type: Gauge.

k8s.node.memory.allocatable Binary Bytes

Memory Utilization. The amount of memory on the node that is available for scheduling.

Metric type: Gauge.

k8s.node.memory.capacity Binary Bytes

Memory Utilization. The node memory capacity.

Metric type: Gauge.

k8s.node.memory.working_set Binary Bytes

Memory utilization. Current working set on node.

Metric type: Gauge.

k8s.node.network.bytes_received  

Network In. Rate of bytes received by all pods on the node.

Metric type: Gauge.

k8s.node.network.bytes_transmitted  

Network Out. Rate of bytes transmitted by all pods on the node.

Metric type: Gauge.

k8s.node.network.packets_received  

Rate of packets received by all pods on the node.

Metric type: Gauge.

k8s.node.network.packets_transmitted  

Rate of packets transmitted by all pods on the node.

Metric type: Gauge.

k8s.node.network.receive_packets_dropped  

Rate of packets dropped while receiving, across all pods on the node.

Metric type: Gauge.

k8s.node.network.transmit_packets_dropped  

Rate of packets dropped while transmitting, across all pods on the node.

Metric type: Gauge.

k8s.node.pods Count

Number of pods. The number of pods on a node.

Metric type: Gauge.

k8s.node.status.condition.diskpressure  

The condition diskpressure of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

k8s.node.status.condition.memorypressure  

The condition memorypressure of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

k8s.node.status.condition.networkunavailable  

The condition networkunavailable of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

k8s.node.status.condition.pidpressure  

The condition pidpressure of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.

k8s.node.status.condition.ready  

The condition ready of a cluster node (1 when true, 0 when false or unknown).

Metric type: Gauge.
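Because `k8s.kube_node_status_capacity` reports a node's total resources and `k8s.kube_node_status_allocatable` the share left for pods, their difference approximates what is reserved for system daemons. The condition gauges are 1 only when the condition is true, so a node that is not Ready or reports any pressure condition deserves a closer look. This Python sketch assumes the values have already been fetched into plain dictionaries; the dictionary layout and function names are hypothetical.

```python
from typing import Dict, List

# Condition names as they appear in the k8s.node.status.condition.* metric names.
PRESSURE_CONDITIONS = ("diskpressure", "memorypressure", "pidpressure", "networkunavailable")


def reserved_resources(capacity: Dict[str, float], allocatable: Dict[str, float]) -> Dict[str, float]:
    """Capacity minus allocatable for every resource reported by both metrics."""
    return {name: capacity[name] - allocatable[name] for name in capacity if name in allocatable}


def nodes_needing_attention(conditions: Dict[str, Dict[str, int]]) -> List[str]:
    """Return nodes that are not Ready or report a pressure/unavailable condition.

    `conditions` maps node name -> {condition: 0 or 1}, mirroring the gauges
    (1 when true, 0 when false or unknown). A missing "ready" entry counts as not Ready.
    """
    flagged = []
    for node, flags in sorted(conditions.items()):
        not_ready = flags.get("ready", 0) != 1
        under_pressure = any(flags.get(c, 0) == 1 for c in PRESSURE_CONDITIONS)
        if not_ready or under_pressure:
            flagged.append(node)
    return flagged


print(reserved_resources({"cpu": 4.0, "pods": 110}, {"cpu": 3.5, "pods": 110}))
print(nodes_needing_attention({
    "node-a": {"ready": 1},
    "node-b": {"ready": 1, "diskpressure": 1},
    "node-c": {"ready": 0},
}))
```

Treating a 0 on `ready` as a problem is deliberate: the gauge reports 0 for both false and unknown, and either is worth investigating.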

Pod metrics

Metric Unit Description
k8s.kube.pod.owner.daemonset  

Information about the DaemonSet owning the pod.

Metric type: Gauge.

k8s.kube.pod.owner.replicaset  

Information about the ReplicaSet owning the pod.

Metric type: Gauge.

k8s.kube.pod.owner.statefulset  

Information about the StatefulSet owning the pod.

Metric type: Gauge.

k8s.kube_pod_completion_time seconds (s)

Completion time of a pod, as a Unix timestamp.

Metric type: Gauge.

k8s.kube_pod_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_pod_info  

Information about the pod.

Metric type: Gauge.

k8s.kube_pod_owner  

Information about the pod owner.

Metric type: Gauge.

k8s.kube_pod_start_time seconds (s)

Start time of a pod, as a Unix timestamp.

Metric type: Gauge.

k8s.kube_pod_status_phase  

The pod's current phase.

Metric type: Gauge.

k8s.kube_pod_status_ready  

Describes whether the pod is ready to serve requests.

Metric type: Gauge.

k8s.kube_pod_status_reason  

The pod status reasons.

Metric type: Gauge.

k8s.pod.containers Count

The number of containers in the pod.

Metric type: Gauge.

k8s.pod.containers.running  

Current number of running containers in the pod.

Metric type: Gauge.

k8s.pod.cpu.usage.seconds.rate cores

CPU Utilization. The rate of cumulative CPU time consumed by the pod.

Metric type: Gauge.

k8s.pod.fs.iops  

Disk IOPS. Rate of reads and writes by all containers in the pod.

Metric type: Gauge.

k8s.pod.fs.reads.bytes.rate  

Rate of bytes read by all containers in the pod.

Metric type: Gauge.

k8s.pod.fs.reads.rate  

Rate of reads by all containers in the pod.

Metric type: Gauge.

k8s.pod.fs.throughput  

Disk Throughput. Rate of bytes read and written by all containers in the pod.

Metric type: Gauge.

k8s.pod.fs.usage.bytes Binary Bytes

Disk Usage. Number of bytes that are consumed by containers on this pod's filesystem.

Metric type: Gauge.

k8s.pod.fs.writes.bytes.rate  

Rate of bytes written by all containers in the pod.

Metric type: Gauge.

k8s.pod.fs.writes.rate  

Rate of writes by all containers in the pod.

Metric type: Gauge.

k8s.pod.memory.working_set Binary Bytes

Memory Utilization. Current working set on pod.

Metric type: Gauge.

k8s.pod.network.bytes_received  

Network In. Rate of bytes received by all containers in the pod.

Metric type: Gauge.

k8s.pod.network.bytes_transmitted  

Network Out. Rate of bytes transmitted by all containers in the pod.

Metric type: Gauge.

k8s.pod.network.packets_received  

Rate of packets received by all containers in the pod.

Metric type: Gauge.

k8s.pod.network.packets_transmitted  

Rate of packets transmitted by all containers in the pod.

Metric type: Gauge.

k8s.pod.network.receive_packets_dropped  

Rate of packets dropped while receiving, across all containers in the pod.

Metric type: Gauge.

k8s.pod.network.transmit_packets_dropped  

Rate of packets dropped while transmitting, across all containers in the pod.

Metric type: Gauge.

k8s.pod.spec.cpu.limit cores

CPU quota of all containers in the pod for a given CPU period.

Metric type: Gauge.

k8s.pod.spec.cpu.requests cores

The number of CPU cores requested by all containers in the pod.

Metric type: Gauge.

k8s.pod.spec.memory.limit Binary Bytes

Memory Utilization. Memory limit for all containers in the pod.

Metric type: Gauge.

k8s.pod.spec.memory.requests Binary Bytes

The amount of memory requested by all containers in the pod.

Metric type: Gauge.

k8s.pod.status.reason  

The current pod status reason.

Metric type: Gauge.
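A common use of the pod metrics is comparing consumption with configured limits. This Python sketch assumes `k8s.pod.cpu.usage.seconds.rate` is expressed in CPU cores (CPU-seconds consumed per second), so it is directly comparable with `k8s.pod.spec.cpu.limit`, and that `k8s.pod.memory.working_set` and `k8s.pod.spec.memory.limit` share a byte unit. A missing or zero limit is treated as "no limit", so no percentage is returned. The function name is hypothetical.

```python
from typing import Optional


def percent_of_limit(usage: float, limit: Optional[float]) -> Optional[float]:
    """Usage as a percentage of its limit, or None when the pod has no limit set.

    Assumption: usage and limit share a unit (cores for CPU, bytes for memory).
    """
    if limit is None or limit <= 0:
        return None
    return 100.0 * usage / limit


# CPU: 0.25 cores used against a 0.5-core limit.
cpu_pct = percent_of_limit(0.25, 0.5)                   # 50.0
# Memory: 384 MiB working set against a 512 MiB limit.
mem_pct = percent_of_limit(384 * 2**20, 512 * 2**20)    # 75.0
print(cpu_pct, mem_pct)
```

Memory is usually the more urgent of the two: a container whose working set reaches its memory limit can be OOM-killed, while CPU above the limit is throttled instead.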

Container metrics

Metric Unit Description
k8s.container.spec.cpu.limit core

CPU quota of the container for a given CPU period.

Metric type: Gauge.

k8s.container.cpu.usage.seconds.rate cores

The rate of cumulative CPU time consumed by the container.

Metric type: Gauge.

k8s.container.fs.iops  

Rate of reads and writes on container.

Metric type: Gauge.

k8s.container.fs.throughput  

Rate of bytes read and written on container.

Metric type: Gauge.

k8s.container.network.bytes_received  

Rate of bytes received on container.

Metric type: Gauge.

k8s.container.network.bytes_transmitted  

Rate of bytes transmitted on container.

Metric type: Gauge.

k8s.kube_pod_container_status_last_terminated_timestamp seconds

The time a pod container was last terminated, as a Unix timestamp.

Metric type: Gauge.

k8s.kube_pod_init_container_info  

Information about an init container in a pod.

Metric type: Gauge.

k8s.kube_pod_init_container_status_waiting  

Describes whether the init container is currently in waiting state.

Metric type: Gauge.

k8s.kube_pod_init_container_status_waiting_reason  

Describes the reason the init container is currently in waiting state.

Metric type: Gauge.

k8s.kube_pod_init_container_status_running  

Describes whether the init container is currently in running state.

Metric type: Gauge.

k8s.kube_pod_init_container_status_terminated  

Describes whether the init container is currently in terminated state.

Metric type: Gauge.

k8s.kube_pod_init_container_status_terminated_reason  

Describes the reason the init container is currently in terminated state.

Metric type: Gauge.

k8s.kube_pod_init_container_status_last_terminated_reason  

Describes the last reason the init container was in terminated state.

Metric type: Gauge.

k8s.kube_pod_init_container_status_ready  

Describes whether the init container's readiness check succeeded.

Metric type: Gauge.

k8s.kube_pod_init_container_status_restarts_total  

The number of restarts for the init container.

Metric type: Gauge.

k8s.kube_pod_init_container_resource_limits  

The resource limits (for example, CPU cores) set for an init container.

Metric type: Gauge.

k8s.kube_pod_init_container_resource_requests  

The number of CPU cores requested by an init container.

Metric type: Gauge.
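Metrics such as `k8s.kube_pod_container_status_last_terminated_timestamp` (and the various `_created` metrics) report Unix timestamps in seconds, so turning them into a readable time or an age takes one conversion. A minimal sketch using Python's standard `datetime` module; the function name and its `now` parameter are illustrative, not part of any SolarWinds API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def time_since_termination(last_terminated_ts: float, now: Optional[datetime] = None) -> timedelta:
    """Elapsed time since a container last terminated.

    `last_terminated_ts` is the gauge value in seconds since the Unix epoch.
    `now` can be supplied for reproducible results; it defaults to the current UTC time.
    """
    terminated_at = datetime.fromtimestamp(last_terminated_ts, tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - terminated_at


reference = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ts = reference.timestamp() - 90  # terminated 90 seconds before the reference time
print(time_since_termination(ts, now=reference))  # 0:01:30
```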

Deployment metrics

Metric Unit Description
k8s.deployment.condition.available  

Describes whether the deployment has an Available status condition.

Metric type: Gauge.

k8s.deployment.condition.progressing  

Describes whether the deployment has a Progressing status condition.

Metric type: Gauge.

k8s.deployment.condition.replicafailure  

Describes whether the deployment has a ReplicaFailure status condition.

Metric type: Gauge.

k8s.kube_deployment_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_deployment_labels  

Kubernetes labels converted to Prometheus labels.

Metric type: Gauge.

k8s.kube_deployment_spec_paused  

Whether the deployment is paused and will not be processed by the deployment controller.

Metric type: Gauge.

k8s.kube_deployment_spec_replicas  

Number of desired pods for a deployment.

Metric type: Gauge.

k8s.kube_deployment_status_condition  

The current status conditions of a deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas  

The number of replicas per deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas_available  

The number of available replicas per deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas_ready  

The number of ready replicas per deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas_unavailable  

The number of unavailable replicas per deployment.

Metric type: Gauge.

k8s.kube_deployment_status_replicas_updated  

The number of updated replicas per deployment.

Metric type: Gauge.
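The replica gauges can tell you whether a rollout has finished. The sketch below borrows the replica-count conditions that `kubectl rollout status` waits for on a Deployment: all desired replicas updated, no old replicas left, and every updated replica available. Applying those conditions to these gauges is an assumption, and the class and function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeploymentReplicas:
    spec_replicas: int    # k8s.kube_deployment_spec_replicas
    status_replicas: int  # k8s.kube_deployment_status_replicas
    updated: int          # k8s.kube_deployment_status_replicas_updated
    available: int        # k8s.kube_deployment_status_replicas_available


def rollout_complete(d: DeploymentReplicas) -> bool:
    """True when all desired replicas are updated and available and no old replicas remain."""
    if d.updated < d.spec_replicas:
        return False  # new pods are still being created
    if d.status_replicas > d.updated:
        return False  # old pods are still terminating
    return d.available >= d.updated


print(rollout_complete(DeploymentReplicas(spec_replicas=3, status_replicas=4, updated=3, available=3)))
```

The example prints `False` because one old replica is still running. A paused deployment (`k8s.kube_deployment_spec_paused` equal to 1) can stay incomplete indefinitely, so it is worth checking that gauge before alerting.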

StatefulSet metrics

Metric Unit Description
k8s.kube_statefulset_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_statefulset_labels  

Kubernetes labels converted to Prometheus labels.

Metric type: Gauge.

k8s.kube_statefulset_replicas  

Number of desired pods for a StatefulSet.

Metric type: Gauge.

k8s.kube_statefulset_status_replicas_current  

The number of current replicas per StatefulSet.

Metric type: Gauge.

k8s.kube_statefulset_status_replicas_ready  

The number of ready replicas per StatefulSet.

Metric type: Gauge.

k8s.kube_statefulset_status_replicas_updated  

The number of updated replicas per StatefulSet.

Metric type: Gauge.

DaemonSet metrics

Metric Unit Description
k8s.kube_daemonset_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_daemonset_labels  

Kubernetes labels converted to Prometheus labels.

Metric type: Gauge.

k8s.kube_daemonset_status_current_number_scheduled  

The number of nodes that should be running a daemon pod and have at least one daemon pod running.

Metric type: Gauge.

k8s.kube_daemonset_status_desired_number_scheduled  

The number of nodes that should be running the daemon pod.

Metric type: Gauge.

k8s.kube_daemonset_status_number_available  

The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and available.

Metric type: Gauge.

k8s.kube_daemonset_status_number_misscheduled  

The number of nodes that should not be running a daemon pod and have one or more running anyway.

Metric type: Gauge.

k8s.kube_daemonset_status_number_ready  

The number of nodes that should be running the daemon pod and have one or more of the daemon pod running and ready.

Metric type: Gauge.

k8s.kube_daemonset_status_number_unavailable  

The number of nodes that should be running the daemon pod and have none of the daemon pod running and available.

Metric type: Gauge.

k8s.kube_daemonset_status_updated_number_scheduled  

The total number of nodes that are running an updated daemon pod.

Metric type: Gauge.
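For a DaemonSet, the scheduling gauges can be combined into a simple health summary: every node that should run the daemon pod has an available copy, and no node runs one it should not. This Python sketch works under that assumption; the field names echo the metric names, but the class and function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DaemonSetStatus:
    desired: int       # k8s.kube_daemonset_status_desired_number_scheduled
    available: int     # k8s.kube_daemonset_status_number_available
    unavailable: int   # k8s.kube_daemonset_status_number_unavailable
    misscheduled: int  # k8s.kube_daemonset_status_number_misscheduled


def daemonset_problems(status: DaemonSetStatus) -> List[str]:
    """Human-readable problems; an empty list means the DaemonSet looks healthy."""
    problems = []
    # Nodes that should run the daemon pod but have no available copy.
    missing = max(status.desired - status.available, status.unavailable)
    if missing > 0:
        problems.append(f"{missing} of {status.desired} node(s) lack an available daemon pod")
    # Nodes running the daemon pod even though they should not.
    if status.misscheduled > 0:
        problems.append(f"{status.misscheduled} node(s) run a daemon pod they should not")
    return problems


print(daemonset_problems(DaemonSetStatus(desired=3, available=2, unavailable=1, misscheduled=0)))
```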

ReplicaSet metrics

Metric Unit Description
k8s.kube_replicaset_spec_replicas  

The number of desired pods for a ReplicaSet.

Metric type: Gauge.

k8s.kube_replicaset_status_ready_replicas  

The number of ready replicas per ReplicaSet.

Metric type: Gauge.

k8s.kube_replicaset_status_replicas  

The number of replicas per ReplicaSet.

Metric type: Gauge.

Namespace metrics

Metric Unit Description
k8s.kube_namespace_created seconds (s)

Unix creation timestamp.

Metric type: Gauge.

k8s.kube_namespace_status_phase  

Kubernetes namespace status phase.

Metric type: Gauge.

k8s.kube_resourcequota

Information about ResourceQuota objects, including hard limits and current usage.

Metric type: Gauge.

Other metrics

Metric Unit Description
k8s.apiserver.request.successrate Percent (%)

Success rate of Kubernetes API server calls.

Metric type: Gauge.

k8s.apiserver_request_total  

Kubernetes API server requests.

Metric type: Counter.

k8s.apiserver_request_duration_seconds  

Kubernetes API server requests latency.

Metric type: Histogram.

k8s.workqueue_adds_total  

Kubernetes workqueue adds.

Metric type: Counter.

k8s.workqueue_depth  

Kubernetes workqueue depth.

Metric type: Gauge.

k8s.workqueue_queue_duration_seconds  

How long an item stays in a Kubernetes workqueue before being requested.

Metric type: Histogram.

k8s.coredns_cache_entries  

The number of elements in the cache.

Metric type: Gauge.

k8s.coredns_cache_hits_total  

The count of cache hits.

Metric type: Counter.

k8s.coredns_cache_misses_total  

The count of cache misses.

Metric type: Counter.

k8s.coredns_dns_requests_total  

Counter of DNS requests made per zone, protocol and family.

Metric type: Counter.

k8s.coredns_dns_request_duration_seconds  

Histogram of the time (in seconds) each request took per zone.

Metric type: Histogram.

k8s.coredns_dns_responses_total  

Counter of response status codes.

Metric type: Counter.
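Several of these metrics are counters (for example, `k8s.coredns_cache_hits_total` and `k8s.coredns_cache_misses_total`), so ratios should be computed from the increase between two samples rather than from the raw totals. A minimal Python sketch, assuming that a drop in a counter value means the process restarted and the counter reset to zero (the usual Prometheus convention); the function names and sample values are hypothetical.

```python
from typing import Optional


def counter_increase(previous: float, current: float) -> float:
    """Increase of a monotonic counter between two samples, tolerating a reset."""
    if current < previous:
        # The counter went backwards, so assume it reset to zero (e.g. CoreDNS restarted).
        return current
    return current - previous


def cache_hit_ratio(hits_prev: float, hits_curr: float,
                    misses_prev: float, misses_curr: float) -> Optional[float]:
    """Share of cache lookups that were hits between two samples, or None if there were none."""
    hits = counter_increase(hits_prev, hits_curr)
    misses = counter_increase(misses_prev, misses_curr)
    lookups = hits + misses
    return hits / lookups if lookups else None


print(cache_hit_ratio(1000, 1900, 200, 300))  # 0.9
```

The same `counter_increase` step applies to `k8s.apiserver_request_total` or `k8s.workqueue_adds_total` before turning them into rates.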

Network metrics

Metrics for network device entities are sent by an installed Network Collector. See Network monitoring.

Standard metrics

Network device metrics

Metric Units Description
sw.collector.CPULoad.AvgLoad Percent (%)

Average CPU Utilization. Average CPU utilization of a network device instance or instances. Displayed as a percentage.

sw.collector.CPULoad.AvgPercentMemoryUsed Percent (%)

Average Memory Utilization. Average memory usage of the network device, as a percentage.

sw.collector.Nodes.DisplayName [name]

Display name polled from the device to be used in custom widgets for filtering, sorting, or grouping data.

sw.collector.ResponseTime.Availability Percent (%)

Availability. Availability of the network device instance or instances. Displayed as a percentage.

May be displayed as:

  • Average Availability. An average availability of network devices.
sw.collector.ResponseTime.AvgResponseTime milliseconds (ms)

Average Response Time. The average time in milliseconds it takes the network device to respond.

sw.collector.ResponseTime.PercentLoss Percent (%)

Packet Loss. The packet loss of the network device as a percentage.

May be displayed as:

  • Average Packet Loss

Interface metrics

Metric Units Description
sw.collector.InterfaceAvailability.Availability Percent (%)

Availability. Availability of the interface instance or instances. Displayed as a percentage.

May be displayed as:

  • Average Availability
sw.collector.InterfaceTraffic.InPercentUtil Percent (%)

In Percent Utilization. Average inbound utilization of an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • In Percent Utilization Average
sw.collector.InterfaceTraffic.OutPercentUtil Percent (%)

Out Percent Utilization. Average outbound utilization of an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • Out Percent Utilization Average
sw.collector.InterfaceTraffic.InAveragebps bits per second (bps)

In Bits Per Second Average. Average inbound traffic of an interface instance or instances, in bits per second.

sw.collector.InterfaceTraffic.OutAveragebps bits per second (bps)

Out Bits Per Second Average. Average outbound traffic of an interface instance or instances, in bits per second.

sw.collector.InterfaceErrors.InDiscards Percent (%)

In Discards. Inbound packets discarded by an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • In Discards Average
sw.collector.InterfaceErrors.OutDiscards Percent (%)

Out Discards. Outbound packets discarded by an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • Out Discards Average
sw.collector.InterfaceErrors.InErrors Percent (%)

In Errors. Inbound packets with errors on an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • In Errors Average
sw.collector.InterfaceErrors.OutErrors Percent (%)

Out Errors. Outbound packets with errors on an interface instance or instances. Displayed as a percentage.

May be displayed as:

  • Out Errors Average

Volume metrics

Metric Units Description
sw.collector.VolumeUsageHistory.PercentDiskUsed Percent (%)

Percent Disk Used. Indicates the overall disk usage as a percentage.

sw.collector.VolumeUsageHistory.AvgDiskUsed Gigabytes

Average Disk Used. Indicates the average disk usage in Gigabytes.

sw.collector.VolumeUsageHistory.DiskSize Gigabytes

Volume Size. Indicates the disk size in Gigabytes.

sw.collector.VolumePerformanceHistory.AvgDiskReads Percent (%)

Disk Read Average. Indicates the average read speed of the volume.

Only for volumes monitored via WMI.

sw.collector.VolumePerformanceHistory.AvgDiskWrites Percent (%)

Disk Write Average. Indicates the average write speed of the volume.

Only for volumes monitored via WMI.

Sensor metrics

Metric Units Description
sw.collector.HardwareHealth.HardwareItemStatistics.AvgValue V/°C/

Average Sensor Value. Indicates the sensor value in appropriate units, as provided by the sensor. Sensors include power supplies, temperature, or fan sensors.

Flow metrics

Metric Units Description
sw.collector.Netflow.Flows.Bytes GB

Top Protocols, Top Countries, Top Endpoints, Top Conversations, Top Applications, Top Advanced Applications.

Endpoints producing the most traffic on your network, most bandwidth-consuming conversations, protocols used for most traffic, countries hosting endpoints that transmit the most data, or applications responsible for most monitored traffic.

Wireless Controller and Thin Access Point metrics

Metric Units Description
sw.collector.Wireless.Interfaces N/A MAC, SSIDs, Channels and Radio Type details are gathered from wireless interfaces of that AP.
sw.collector.Wireless.Clients Number The total number of clients connected to all interfaces of the AP.
sw.collector.Wireless.HistoricalClients.SignalStrength  

RSSI - signal strength

The following thresholds are used to convert a dBm value to a strength indicator: -82, -72, -68, -63, -56 (-82 is the worst).

sw.collector.Wireless.HistoricalClients.OutDataRate   Data rate on clients
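The RSSI thresholds listed for `SignalStrength` can be read as bar boundaries. This Python sketch assumes that each threshold the RSSI value meets or exceeds adds one bar, giving 0 (weaker than -82 dBm) to 5 (-56 dBm or stronger). That bucketing rule is an assumption for illustration, not a description of the Network Collector's internals, and `signal_bars` is a hypothetical name.

```python
from bisect import bisect_right

# Thresholds in dBm, weakest to strongest, as listed for SignalStrength.
RSSI_THRESHOLDS_DBM = [-82, -72, -68, -63, -56]


def signal_bars(rssi_dbm: float) -> int:
    """Number of thresholds the RSSI value meets or exceeds (0 to 5).

    Assumption: one bar per threshold reached; -82 dBm is the weakest boundary.
    """
    # bisect_right places values equal to a threshold after it, so meeting
    # a threshold exactly counts as reaching it.
    return bisect_right(RSSI_THRESHOLDS_DBM, rssi_dbm)


print([signal_bars(v) for v in (-90, -82, -70, -60, -50)])  # [0, 1, 2, 4, 5]
```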

Special metrics

Metric Units Description
sw.collector.InterfaceTraffic.Averagebps bits per second (bps)

Total average bps (transmitted + received).

OTel metrics

When an OTel receiver is configured to send telemetry data directly to SolarWinds Observability SaaS, the metrics collected depend on what OTel data is sent. See OTel direct ingestion.

When you integrate with Apache, Elasticsearch, NGINX, Redis, or ZooKeeper, the SolarWinds Observability Agent is used to send metrics and log data to SolarWinds Observability SaaS. See Monitor with OTel.

Apache metrics

Metric Units Description
apache.cpu.load Percent (%)

The current load of the CPU.

apache.cpu.time Jiffies The jiffies used by processes of a given category.
apache.current_connections Connections The number of active connections currently attached to the HTTP server.
apache.load.1 Percent (%) The average server load during the last minute.
apache.load.15 Percent (%) The average server load during the last 15 minutes.
apache.load.5 Percent (%) The average server load during the last 5 minutes.
apache.request.time milliseconds (ms) Total time spent on handling requests.
apache.request.time.rate milliseconds (ms) The rate of time spent on handling requests.
apache.requests Requests The number of requests serviced by the HTTP server per second.
apache.requests.rate Requests per second The rate of requests serviced by the HTTP server.
apache.scoreboard Workers The number of workers in each state.
apache.throughput Byte per request The average number of bytes served per request.
apache.time.perrequest milliseconds per request The average processing time per request.
apache.traffic Byte Total HTTP server traffic in bytes.
apache.traffic.rate Bytes per second HTTP server traffic in bytes per second.
apache.uptime seconds (s) The amount of time that the server has been running in seconds.
apache.workers Workers The number of workers currently attached to the HTTP server.
apache.workers.idle Workers The number of idle workers.
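`apache.time.perrequest` already reports an average processing time, but the same kind of figure can be derived over any window from two samples of `apache.request.time` and `apache.requests`. This Python sketch assumes both metrics are cumulative totals and treats a decrease as a server restart, returning no value for that interval; the function name is hypothetical.

```python
from typing import Optional


def avg_ms_per_request(time_prev_ms: float, time_curr_ms: float,
                       requests_prev: int, requests_curr: int) -> Optional[float]:
    """Average handling time per request between two samples of cumulative totals.

    Assumption: apache.request.time (ms) and apache.requests only grow
    unless the server restarts.
    """
    delta_requests = requests_curr - requests_prev
    delta_time = time_curr_ms - time_prev_ms
    if delta_requests <= 0 or delta_time < 0:
        return None  # no new requests, or the totals reset (e.g. server restart)
    return delta_time / delta_requests


print(avg_ms_per_request(10_000, 12_500, 400, 500))  # 25.0
```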

Confluent Cloud metrics

Metric Units Description
confluent_kafka_server_active_connection_count {connections} The count of active authenticated connections.
confluent_kafka_server_partition_count {partitions} The number of partitions.
confluent_kafka_server_received_bytes By (bytes)/60s The delta count of bytes of the customer's data received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds.
confluent_kafka_server_received_records {records}/60s The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds.
confluent_kafka_server_request_bytes Bytes/60s The delta count of total request bytes from the specified request types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_request_count {requests}/60s The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_response_bytes Bytes/60s The delta count of total response bytes from the specified response types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_retained_bytes Bytes/60s The current count of bytes retained by the cluster. The count is sampled every 60 seconds.
confluent_kafka_server_sent_bytes Bytes/60s The delta count of bytes sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_sent_records {records}/60s The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds.
confluent_kafka_server_successful_authentication_count {successful authentications}/60s The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds.
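Because most of the Confluent Cloud metrics above are delta counts sampled every 60 seconds, dividing a sample by 60 gives the average per-second rate for that minute, and summing consecutive samples gives a total for a longer window. This does not apply to `confluent_kafka_server_retained_bytes`, which is a current count rather than a delta. A minimal Python sketch; the function names and sample values are hypothetical.

```python
from typing import List

SAMPLE_INTERVAL_S = 60  # each Confluent Cloud delta sample covers 60 seconds


def per_second_rates(deltas: List[float]) -> List[float]:
    """Convert 60-second delta samples (e.g. received_bytes) to average per-second rates."""
    return [d / SAMPLE_INTERVAL_S for d in deltas]


def window_total(deltas: List[float]) -> float:
    """Total over the window covered by consecutive delta samples."""
    return sum(deltas)


samples = [6000, 12000]            # two consecutive received_bytes samples
print(per_second_rates(samples))   # [100.0, 200.0]
print(window_total(samples))       # 18000
```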

Docker metrics

Metric Units Description
container.blockio.io_service_bytes_recursive bytes (By) The number of bytes transferred to and from the disk by the group and descendant groups.
container.cpu.throttling_data.periods {periods} The number of periods with throttling active.
container.cpu.usage.kernelmode nanosecond (ns) Time spent by tasks of the cgroup in kernel mode (Linux). Time spent by all container processes in kernel mode (Windows).
container.cpu.usage.total nanosecond (ns) Total CPU time consumed.
container.cpu.usage.usermode nanosecond (ns) Time spent by tasks of the cgroup in user mode (Linux). Time spent by all container processes in user mode (Windows).
container.cpu.utilization percentage (%)

Container CPU Utilization. Percentage of CPU used per container.

container.memory.file bytes (By) Amount of memory used to cache filesystem data, including tmpfs and shared memory (Only available with cgroups v2).
container.memory.percent percentage (%)

Container Memory Utilization. Percentage of memory used per container.

container.memory.total_cache bytes (By) Total amount of memory used by the processes of this cgroup (and descendants) that can be associated with a block on a block device. Also accounts for memory used by tmpfs (Only available with cgroups v1).
container.memory.usage.limit bytes (By) Memory limit of the container.
container.memory.usage.total bytes (By) Memory usage of the container. This excludes the cache.
container.network.io.usage.rx_bytes bytes (By)

Total Received Bytes per Container. Total bytes received by the container.

container.network.io.usage.rx_dropped {packets}

Total Incoming Dropped Packets by Container. Total incoming packets dropped by the container.

container.network.io.usage.tx_bytes bytes (By)

Total Sent Bytes per Container. Total bytes sent by the container.

container.network.io.usage.tx_dropped {packets}

Total Outgoing Dropped Packets by Container. Total outgoing packets dropped by the container.

container.uptime seconds (s)

Total Container Uptime. The time elapsed since the start time of the container.
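`container.cpu.usage.total` is cumulative CPU time in nanoseconds, so a utilization figure comes from the change between two samples divided by the elapsed wall-clock time. This Python sketch expresses the result as a percentage of one CPU core (values above 100 mean more than one core was busy) and computes a memory ratio from `container.memory.usage.total` and `container.memory.usage.limit`. Both formulas are illustrative assumptions, not necessarily how `container.cpu.utilization` or `container.memory.percent` are computed, and the function names are hypothetical.

```python
NS_PER_SECOND = 1_000_000_000


def cpu_percent_of_one_core(cpu_ns_prev: int, cpu_ns_curr: int, elapsed_s: float) -> float:
    """CPU used between two samples of container.cpu.usage.total, as % of one core."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    cpu_seconds = (cpu_ns_curr - cpu_ns_prev) / NS_PER_SECOND
    return 100.0 * cpu_seconds / elapsed_s


def memory_percent(usage_bytes: int, limit_bytes: int) -> float:
    """Memory usage (excluding cache) as a percentage of the container's limit."""
    if limit_bytes <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * usage_bytes / limit_bytes


# 15 s of CPU time over a 30 s window is half of one core.
print(cpu_percent_of_one_core(0, 15_000_000_000, 30))  # 50.0
print(memory_percent(256 * 2**20, 1024 * 2**20))       # 25.0
```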

Elasticsearch metrics

Metric Units Description
elasticsearch.breaker.memory.estimated bytes (By)

The estimated memory used for the operation.

elasticsearch.breaker.memory.limit bytes (By) The memory limit for the circuit breaker.
elasticsearch.breaker.tripped 1 The total number of times the circuit breaker has been triggered and prevented an out of memory error.
elasticsearch.cluster.data_nodes {nodes} Data Nodes. The number of data nodes in the cluster.
elasticsearch.cluster.health status Cluster by Status. The health status of the cluster. Health status is based on the state of its primary and replica shards. Green indicates all shards are assigned. Yellow indicates that one or more replica shards are unassigned. Red indicates that one or more primary shards are unassigned, making some data unavailable.
elasticsearch.cluster.in_flight_fetch {fetches} The number of unfinished fetches.
elasticsearch.cluster.nodes {nodes} Nodes, Top 5 Clusters by Node Count. The total number of nodes in the cluster.
elasticsearch.cluster.pending_tasks {tasks} Pending Tasks in Cluster. The number of cluster-level changes that have not yet been executed.
elasticsearch.cluster.published_states.differences 1 The number of differences between published cluster states.
elasticsearch.cluster.published_states.full 1 The number of published cluster states.
elasticsearch.cluster.shards {shards} Active Shards, Shards by State. The number of shards in the cluster.
elasticsearch.cluster.state_queue 1 The number of cluster states in queue.
elasticsearch.cluster.state_update.count 1 The number of cluster state update attempts that changed the cluster state since the node started.
elasticsearch.cluster.state_update.time milliseconds (ms) The cumulative amount of time updating the cluster state since the node started.
elasticsearch.index.operations.completed {operations} The number of operations completed for an index.
elasticsearch.index.operations.time milliseconds (ms) Time spent on operations for an index.
elasticsearch.index.shards.size bytes (By) The size of the shards assigned to this index.
elasticsearch.indexing_pressure.memory.limit bytes (By) The configured memory limit, in bytes, for the indexing requests.
elasticsearch.indexing_pressure.memory.total.primary_rejections 1 The cumulative number of indexing requests rejected in the primary stage.
elasticsearch.indexing_pressure.memory.total.replica_rejections 1 The number of indexing requests rejected in the replica stage.
elasticsearch.memory.indexing_pressure bytes (By) Indexing Pressure. The memory consumed, in bytes, by indexing requests in the specified stage.
elasticsearch.node.cache.count {count} The total count of query cache misses across all shards assigned to selected nodes.
elasticsearch.node.cache.evictions {evictions} The number of evictions from the cache on a node.
elasticsearch.node.cache.memory.usage bytes (By) The size in bytes of the cache on a node.
elasticsearch.node.cluster.connections {connections} Cluster Connections. The number of open TCP connections for internal cluster communication.
elasticsearch.node.cluster.io bytes (By) The number of bytes sent and received on the network for internal cluster communication.
elasticsearch.node.cluster.io.rate bytes per second (By/s) Network Traffic. The number of bytes sent and received for internal cluster communication per second.
elasticsearch.node.disk.io.read kilobytes (KiBy) Disk Read and Write. The total number of kilobytes read across all file stores for this node.
elasticsearch.node.disk.io.write kilobytes (KiBy) Disk Read and Write. The total number of kilobytes written across all file stores for this node.
elasticsearch.node.documents {documents} The number of documents on the node.
elasticsearch.node.fs.disk.available bytes (By) The amount of disk space available to the JVM across all file stores for this node. Depending on OS or process-level restrictions, this value might be lower than elasticsearch.node.fs.disk.free. This is the actual amount of free disk space the Elasticsearch node can use.
elasticsearch.node.fs.disk.free bytes (By) The amount of unallocated disk space across all file stores for this node.
elasticsearch.node.fs.disk.total bytes (By) The amount of disk space across all file stores for this node.
elasticsearch.node.http.connections {connections} The number of HTTP connections to the node.
elasticsearch.node.ingest.documents {documents} The total number of documents ingested during the lifetime of this node.
elasticsearch.node.ingest.documents.current {documents} The total number of documents currently being ingested.
elasticsearch.node.ingest.operations.failed {operation} The total number of failed ingest operations during the lifetime of this node.
elasticsearch.node.open_files {files} Open File Descriptors. The number of open file descriptors held by the node.
elasticsearch.node.operations.completed {operations} The number of operations completed by a node.
elasticsearch.node.operations.completed.rate {operations} per second Node Operations Completed per Second. The number of operations completed by a node per second.
elasticsearch.node.operations.time milliseconds (ms) Total Time Spent on Operations. The time spent on operations by a node.
elasticsearch.node.pipeline.ingest.documents.current {documents} The total number of documents currently being ingested by a pipeline.
elasticsearch.node.pipeline.ingest.documents.preprocessed {documents} The number of documents preprocessed by the ingest pipeline.
elasticsearch.node.pipeline.ingest.operations.failed {operation} The total number of failed operations for the ingest pipeline.
elasticsearch.node.script.cache_evictions 1 The total number of times the script cache has evicted old data.
elasticsearch.node.script.compilation_limit_triggered 1 The total number of times the script compilation circuit breaker has limited inline script compilations.
elasticsearch.node.script.compilations {compilations} The total number of inline script compilations performed by the node.
elasticsearch.node.shards.data_set.size bytes (By) The total data set size of all shards assigned to the node. This includes the size of shards not stored fully on the node, such as the cache for partially mounted indices.
elasticsearch.node.shards.reserved.size bytes (By) A prediction of how much larger the shard stores on this node will eventually grow due to ongoing peer recoveries, restoring snapshots, and similar activities. A value of -1 indicates that this is not available.
elasticsearch.node.shards.size bytes (By) The size of the shards assigned to this node.
elasticsearch.node.thread_pool.tasks.finished {tasks} The number of tasks finished by the thread pool.
elasticsearch.node.thread_pool.tasks.queued {tasks} Queued Tasks in Thread Pool. The number of queued tasks in the thread pool.
elasticsearch.node.thread_pool.threads {threads} The number of threads in the thread pool.
elasticsearch.node.translog.operations {operations} The number of transaction log operations.
elasticsearch.node.translog.size bytes (By) The size of the transaction log.
elasticsearch.node.translog.uncommitted.size bytes (By) The size of uncommitted transaction log operations.
elasticsearch.os.cpu.load_avg.15m 1 CPU Utilization. The fifteen-minute load average on the system. The field is not present if fifteen-minute load average is not available.
elasticsearch.os.cpu.load_avg.1m 1 CPU Utilization. The one-minute load average on the system. The field is not present if one-minute load average is not available.
elasticsearch.os.cpu.load_avg.5m 1 CPU Utilization. The five-minute load average on the system. The field is not present if five-minute load average is not available.
elasticsearch.os.cpu.usage Percent (%) The recent CPU usage for the whole system, or -1 if not supported.
elasticsearch.os.memory bytes (By) The amount of physical memory.
jvm.classes.loaded 1 The number of loaded classes.
jvm.gc.collections.count 1 The total number of garbage collections that have occurred.
jvm.gc.collections.count.rate collections per second JVM GC Collection Count per Second. The number of Java Virtual Machine garbage collections that have occurred per second.
jvm.gc.collections.elapsed milliseconds (ms) Total JVM GC Collection Time. The approximate accumulated garbage collection elapsed time.
jvm.memory.heap.committed bytes (By) JVM Memory Heap Committed vs Used. The amount of memory that is guaranteed to be available for the heap.
jvm.memory.heap.max bytes (By) The maximum amount of memory that can be used for the heap.
jvm.memory.heap.used bytes (By) JVM Memory Heap Committed vs Used. The current heap memory usage.
jvm.memory.nonheap.committed bytes (By) The amount of memory that is guaranteed to be available for non-heap purposes.
jvm.memory.nonheap.used bytes (By) The current non-heap memory usage.
jvm.memory.pool.max bytes (By) The maximum amount of memory that can be used for the memory pool.
jvm.memory.pool.used bytes (By) The current memory pool memory usage.
jvm.threads.count 1 The current number of threads.
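The rule described for elasticsearch.cluster.health can be expressed as a small function. This Python sketch assumes a simplified shard model: a flat list of shards, each flagged as primary or replica and as assigned or unassigned. The `Shard` type and `cluster_health` function are hypothetical. They only illustrate how the status follows from shard assignment and do not call Elasticsearch.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Shard:
    """Simplified shard model (hypothetical, for illustration only)."""
    primary: bool   # True for a primary shard, False for a replica
    assigned: bool  # True if the shard is allocated to a node


def cluster_health(shards: Iterable[Shard]) -> str:
    """Derive green, yellow, or red from shard assignment."""
    shards = list(shards)
    # Red: at least one primary shard is unassigned, so some data is unavailable.
    if any(s.primary and not s.assigned for s in shards):
        return "red"
    # Yellow: all primaries are assigned, but at least one replica is not.
    if any(not s.primary and not s.assigned for s in shards):
        return "yellow"
    # Green: every shard is assigned.
    return "green"


print(cluster_health([Shard(True, True), Shard(False, False)]))  # yellow
```

The checks run from most to least severe, so a cluster with both an unassigned primary and an unassigned replica reports red.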

IIS metrics

Metric Units Description
iis.connection.active {active connections} The number of active connections.
iis.connection.anonymous {anonymous connections} The number of connections established anonymously.
iis.connection.anonymous/rate {anonymous connections}/s The number of connections established anonymously per second.
iis.connection.attempt.count {connection attempts} The total number of attempts to connect to the server.
iis.connection.attempt.count/rate {connection attempts}/second (s) The total number of attempts to connect to the server per second.
iis.network.blocked bytes (By) The total number of bytes blocked due to bandwidth throttling.
iis.network.file.count {files} The number of transmitted files.
iis.network.io bytes (By) The total number of bytes sent and received.
iis.network.io/rate bytes (By)/second (s) The total number of bytes sent and received per second.
iis.request.count {requests} The total number of requests of a given type.
iis.request.queue.count {requests} The current number of requests in the queue.
iis.request.rejected {requests} The total number of requests rejected.
iis.thread.active {threads} The total number of active threads.
iis.uptime seconds (s) The amount of time the server has been up.

Kafka metrics

Metric Units Description
kafka_controller_kafkacontroller_activecontrollercount {active controllers in cluster} Active Cluster Controllers. The average number of active controllers in the cluster.

kafka_log_logflushstats_logflushrateandtimems.95th
kafka_log_logflushstats_logflushrateandtimems.999th
kafka_log_logflushstats_logflushrateandtimems.median

Log Flush Rate and Time. The maximum values of log flush rate and time.

kafka_network_requestmetrics_localtimems
kafka_network_requestmetrics_localtimems.95th
kafka_network_requestmetrics_localtimems.999th
kafka_network_requestmetrics_localtimems.median

ms (millisecond) Leader Request Time. The average time taken to process a request at the leader.

kafka_network_requestmetrics_totaltimems
kafka_network_requestmetrics_totaltimems.95th
kafka_network_requestmetrics_totaltimems.999th
kafka_network_requestmetrics_totaltimems.median

ms (millisecond) Producer Request Time. The average total time to serve a single 'Produce' request.
kafka_network_socketserver_networkprocessoravgidlepercent % (percentage) Broker Process Idle Time. The average fraction of time the network processors are idle.
kafka_server_brokertopicmetrics_bytesin_1minuterate Bytes/second Broker Incoming Bytes. The one-minute sum of incoming bytes per second.
kafka_server_brokertopicmetrics_bytesin_1minuterate Bytes/second/{topic} Broker Incoming Bytes per Topic. The one-minute average rate of incoming bytes per second distributed by Topic.
kafka_server_brokertopicmetrics_messagesin_1minuterate {messages}/second Broker Incoming Messages. The one-minute sum of incoming messages per second.
kafka_server_brokertopicmetrics_messagesin_1minuterate {messages}/second/{topic} Broker Incoming Messages per Topic. The one-minute average rate of incoming messages per second distributed per topic.
kafka_server_replicafetchermanager_maxlag {messages} Max Replica Lag. The average of the maximum number of messages by which follower replicas lag behind the leader replica.
kafka_server_replicamanager_isrshrinks_1minuterate {shrink events}/minute ISR Shrink Rate. The one-minute rate of in-sync replica (ISR) shrink events. If a broker goes down, the ISR for some of the partitions shrinks. When that broker is up again and its replicas have fully caught up, the ISR expands.
kafka_server_replicamanager_leadercount {replica leaders} Leader Replicas. The average number of replica leaders.
kafka_server_replicamanager_partitioncount {partitions} Partitions. The average number of partitions on all brokers.
kafka_server_replicamanager_underreplicatedpartitions {under-replicated partitions} Under-Replicated Partitions. The average number of under-replicated partitions.
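In Kafka, a partition is under-replicated when its in-sync replica (ISR) set is smaller than its full replica set. This is why ISR shrink events drive the under-replicated partitions count. The Python sketch below models that relationship with a hypothetical `Partition` type and simulates a broker failure. It illustrates the concept only and does not query Kafka or its JMX metrics. The topic names are made up.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Partition:
    """Hypothetical partition model for illustration."""
    topic: str
    index: int
    replicas: List[int]                         # broker IDs assigned to the partition
    isr: Set[int] = field(default_factory=set)  # broker IDs currently in sync


def under_replicated(partitions: List[Partition]) -> List[Partition]:
    """Partitions whose ISR is smaller than their assigned replica set."""
    return [p for p in partitions if len(p.isr) < len(p.replicas)]


def broker_down(partitions: List[Partition], broker_id: int) -> None:
    """Simulate an ISR shrink: the failed broker leaves every ISR it was in."""
    for p in partitions:
        p.isr.discard(broker_id)


partitions = [
    Partition("orders", 0, [1, 2, 3], {1, 2, 3}),
    Partition("orders", 1, [2, 3, 1], {1, 2, 3}),
    Partition("audit", 0, [1, 2], {1, 2}),
]
print(len(under_replicated(partitions)))  # 0
broker_down(partitions, 3)
print(len(under_replicated(partitions)))  # 2
```

The audit partition stays fully replicated because broker 3 never hosted one of its replicas.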

Memcached metrics

Metric Units Description
memcached.bytes bytes (By) Current Bytes Stored, Bytes Stored. The current number of bytes used by this server to store items.
memcached.commands {commands} The commands executed.
memcached.commands.rate {commands}/second Commands. The commands executed per second.
memcached.connections.current {connections} The current number of open connections.
memcached.connections.total {connections} The total number of connections opened since the server started running.
memcached.cpu.usage seconds (s) CPU User Time, CPU System Time. The accumulated user and system time.
memcached.current_items {items} Current Items in Cache, Active Connections. The number of items currently stored in the cache.
memcached.evictions {evictions} Total Evictions. The average total number of cache item evictions.
memcached.network bytes (By) Bytes transferred over the network.
memcached.network.rate bytes/second (By/s) Network Traffic. The average number of bytes transferred over the network, per second.
memcached.operation_hit_ratio percentage (%) Operation Hit Ratio. The hit ratio for operations, expressed as a percentage value between 0.0 and 100.0.
memcached.operations {operations} Hits and Misses Total. The average total counts of hits and misses.
memcached.operations.rate {operations}/second The average counts of hits and misses per second.
memcached.threads {threads} The number of threads used by the Memcached instance.
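memcached.operation_hit_ratio is reported as a percentage between 0.0 and 100.0. Here is a minimal sketch of the usual calculation. It assumes the ratio is hits divided by hits plus misses for a given operation type, such as the get_hits and get_misses counters from the Memcached `stats` command. The function name is hypothetical.

```python
from typing import Optional


def hit_ratio_percent(hits: int, misses: int) -> Optional[float]:
    """Hits as a percentage of all lookups (0.0 to 100.0).

    Returns None when there were no lookups, because the ratio is undefined.
    """
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


print(hit_ratio_percent(hits=9_000, misses=1_000))  # 90.0
```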

NGINX metrics

Metric Units Description
nginx.connections Connections

The current number of nginx connections by state.

nginx.connections_accepted Connections The total number of accepted client connections.
nginx.connections_accepted.gauge Connections The accepted client connections (gauge).
nginx.connections_accepted.rate Connections per second The number of accepted client connections per second.
nginx.connections_current Connections The current number of nginx connections by state.
nginx.connections_dropped Connections The total number of dropped client connections.
nginx.connections_dropped.rate Connections per second The number of dropped client connections per second.
nginx.connections_handled Connections The total number of handled connections. Generally, this value is the same as nginx.connections_accepted unless some resource limits have been reached (for example, the worker_connections limit).
nginx.connections_handled.gauge Connections The handled client connections (gauge).
nginx.connections_handled.rate Connections per second The number of handled client connections per second.
nginx.requests Requests The total number of requests made to the server since it started.
nginx.requests.rate Requests per second The number of requests per second.
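Because nginx.connections_handled normally equals nginx.connections_accepted, the difference between the two counts connections that NGINX accepted but did not handle. The sketch below reads the accepts, handled, and requests counters from the text output of the NGINX stub_status module and derives that dropped count. It assumes the standard stub_status layout. `parse_stub_status` is a hypothetical helper, not part of NGINX or SolarWinds Observability SaaS, and the sample output is made up.

```python
import re
from typing import Dict

# Matches the counters line that follows the "server accepts handled requests" header.
_COUNTERS = re.compile(r"server accepts handled requests\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_stub_status(text: str) -> Dict[str, int]:
    """Extract the cumulative counters from stub_status output (hypothetical helper)."""
    match = _COUNTERS.search(text)
    if match is None:
        raise ValueError("unrecognized stub_status output")
    accepted, handled, requests = (int(g) for g in match.groups())
    return {
        "accepted": accepted,
        "handled": handled,
        "requests": requests,
        # Accepted connections that were never handled, e.g. because the
        # worker_connections limit was reached.
        "dropped": accepted - handled,
    }


sample = """Active connections: 2
server accepts handled requests
 100 97 250
Reading: 0 Writing: 1 Waiting: 1
"""
print(parse_stub_status(sample)["dropped"])  # 3
```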

Oracle DB metrics

Metric Units Description
oracledb.cpu_time Seconds (s)

The cumulative CPU time, in seconds.

oracledb.dml_locks.limit {locks} The maximum limit of active Data Manipulation Language (DML) locks, -1 if unlimited.
oracledb.dml_locks.usage {locks} The current count of active Data Manipulation Language (DML) locks.
oracledb.enqueue_deadlocks {deadlocks} The total number of deadlocks between table or row locks in different sessions.
oracledb.enqueue_locks.limit {locks} The maximum limit of active enqueue locks, -1 if unlimited.
oracledb.enqueue_locks.usage {locks} The current count of active enqueue locks.
oracledb.enqueue_resources.limit {resources} The maximum limit of active enqueue resources, -1 if unlimited.
oracledb.enqueue_resources.usage {resources} The current count of active enqueue resources.
oracledb.exchange_deadlocks {deadlocks} The number of times that a process detected a potential deadlock when exchanging two buffers and raised an internal, restartable error. Index scans are the only operations that perform exchanges.
oracledb.executions {executions} The total number of calls (user and recursive) that executed SQL statements.
oracledb.hard_parses {parses} The number of hard parses.
oracledb.logical_reads {reads} The number of logical reads.
oracledb.parse_calls {parses} The total number of parse calls.
oracledb.pga_memory bytes (By) The Session Program Global Area (PGA) memory.
oracledb.physical_reads {reads} The number of physical reads.
oracledb.processes.limit {processes} The maximum limit of active processes, -1 if unlimited.
oracledb.processes.usage {processes} The current count of active processes.
oracledb.sessions.limit {sessions} The maximum limit of active sessions, -1 if unlimited.
oracledb.sessions.usage {sessions} The current count of active sessions.
oracledb.tablespace_size.limit bytes (By) The maximum size of tablespace in bytes, -1 if unlimited.
oracledb.tablespace_size.usage bytes (By) The used tablespace in bytes.
oracledb.transactions.limit {transactions} The maximum limit of active transactions, -1 if unlimited.
oracledb.transactions.usage {transactions} The current count of active transactions.
oracledb.user_commits {commits} The number of user commits. When a user commits a transaction, the redo generated that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate.
oracledb.user_rollbacks 1 The number of times users manually issue the ROLLBACK statement or an error occurs during a user's transaction.
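Several Oracle DB metrics come in .usage and .limit pairs, where a limit of -1 means unlimited. This minimal sketch turns such a pair, for example oracledb.sessions.usage and oracledb.sessions.limit, into a utilization percentage and handles the -1 sentinel. The function name is hypothetical.

```python
from typing import Optional

UNLIMITED = -1  # sentinel reported for limits that are not set


def utilization_percent(usage: float, limit: float) -> Optional[float]:
    """Return usage as a percentage of limit (hypothetical helper).

    Returns None when the limit is unlimited (-1) or not positive,
    because a percentage of an unbounded resource is meaningless.
    """
    if limit == UNLIMITED or limit <= 0:
        return None
    return 100.0 * usage / limit


print(utilization_percent(usage=150, limit=600))  # 25.0
print(utilization_percent(usage=150, limit=-1))   # None
```

Checking for the sentinel before dividing keeps an unlimited resource from being reported as a negative utilization.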

RabbitMQ metrics

Metric Units Description
rabbitmq.message.current.sum {messages} Current Messages in Queues, Top 10 Queues by Depth. The total number of messages currently in the queues on RabbitMQ by queue name.
rabbitmq_channels {channels} Open Channels. The number of channels currently open on RabbitMQ.
rabbitmq_channel_messages_unacked {messages} Messages Unacknowledged. The average number of delivered but not yet acknowledged messages on RabbitMQ.
rabbitmq_consumers {consumers} Queue Consumers. The number of currently connected consumers on RabbitMQ.
rabbitmq_disk_space_available_bytes Bytes Free Disk Space. The average free disk space available on RabbitMQ.
rabbitmq_erlang_processes_used {processes} Used Processes. The total number of Erlang processes used by RabbitMQ.
rabbitmq.message.acknowledged.rate {messages}/s Messages Acknowledged per Second. The average number of messages acknowledged per second on RabbitMQ.
rabbitmq.message.delivered.rate {messages}/s Messages Delivered per Second. The average number of messages delivered per second on RabbitMQ.
rabbitmq.message.dropped.rate {messages}/s Messages Dropped per Second. The average number of messages dropped per second on RabbitMQ.
rabbitmq.message.published.rate {messages}/s Messages Published per Second. The average number of messages published per second on RabbitMQ.
rabbitmq_process_open_fds {file descriptors} Open File Descriptors. The average number of open file descriptors on RabbitMQ.
rabbitmq_process_open_tcp_sockets {sockets} Open Sockets. The total number of open TCP sockets on RabbitMQ.
rabbitmq_process_resident_memory_bytes Bytes Memory Consumed by Node. The memory used by the node on RabbitMQ.
rabbitmq_queue_consumer_utilisation   Consumer Utilization. The average proportion of time that the queues can deliver messages to consumers on RabbitMQ.
rabbitmq_queue_process_memory_bytes Bytes Memory Consumed by Queues. The average memory used by the Erlang queue process on RabbitMQ.

Redis metrics

Metric Units Description
redis.clients.blocked   Blocked Clients, Clients. The number of clients pending on a blocking call.
redis.clients.connected   Redis Version, Clients. The number of client connections (excluding connections from replicas).
redis.clients.max_input_buffer   The biggest input buffer among current client connections.
redis.clients.max_output_buffer   The longest output list among current client connections.
redis.commands operations/s Processed Commands per Second. The number of commands processed per second.
redis.commands.processed   Total Processed Commands. The total number of commands processed by the server.
redis.connections.received   Total Connections. The total number of connections accepted by the server.
redis.connections.rejected   Total Connections. The number of connections rejected because of the maxclients limit.
redis.cpu.time seconds (s) Total CPU Time by State. The system CPU consumed by the Redis server in seconds since the server started.
redis.db.avg_ttl milliseconds (ms) The average keyspace keys TTL.
redis.db.expires   The number of keyspace keys with an expiration.
redis.db.keys   The number of keyspace keys.
redis.keys.evicted   Total Expired and Evicted Keys. The number of keys evicted due to the maxmemory limit.
redis.keys.expired   Total Expired and Evicted Keys. The total number of key expiration events.
redis.keyspace.hits   The number of successful lookups of keys in the main dictionary.
redis.keyspace.misses   The number of failed lookups of keys in the main dictionary.
redis.latest_fork microseconds (μs) The duration of the latest fork operation in microseconds.
redis.memory.fragmentation_ratio   Fragmentation Ratio. The ratio between used_memory_rss and used_memory.
redis.memory.lua bytes (By) Used Memory. The number of bytes used by the Lua engine.
redis.memory.peak bytes (By) Peak memory consumed by Redis (in bytes).
redis.memory.rss bytes (By) Used Memory. The number of bytes that Redis allocated as seen by the operating system.
redis.memory.used bytes (By) Used Memory. The total number of bytes allocated by Redis using its allocator.
redis.net.input bytes (By) The total number of bytes read from the network.
redis.net.output bytes (By) Total Network Traffic. The total number of bytes written to the network.
redis.rdb.changes_since_last_save   Changes Since Last Save. The number of changes since the last dump.
redis.replication.backlog_first_byte_offset   The master offset of the replication backlog buffer.
redis.replication.offset   The server's current replication offset.
redis.role   Role. The Redis node's role.
redis.slaves.connected   Clients. The number of connected replicas.
redis.uptime seconds (s) Uptime. The number of seconds since Redis server started.
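As the table notes, redis.memory.fragmentation_ratio is the ratio between used_memory_rss and used_memory. A keyspace hit ratio can be derived in a similar way from keyspace_hits and keyspace_misses. This minimal sketch computes both from the text returned by the Redis INFO command. `parse_info` and the other function names are hypothetical, and the sample values are made up.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO output, skipping section headers (# ...) and blank lines."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fragmentation_ratio(info: Dict[str, str]) -> float:
    """used_memory_rss / used_memory."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])


def keyspace_hit_ratio(info: Dict[str, str]) -> Optional[float]:
    """keyspace_hits / (keyspace_hits + keyspace_misses); None if there were no lookups."""
    hits = int(info["keyspace_hits"])
    misses = int(info["keyspace_misses"])
    total = hits + misses
    return hits / total if total else None


sample = (
    "# Memory\r\nused_memory:1000000\r\nused_memory_rss:1500000\r\n\r\n"
    "# Stats\r\nkeyspace_hits:900\r\nkeyspace_misses:100\r\n"
)
info = parse_info(sample)
print(fragmentation_ratio(info))  # 1.5
print(keyspace_hit_ratio(info))   # 0.9
```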

Snowflake metrics

Metric Units Description
sw.metrics.healthscore Percent (%)

Health. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact.

To view the health of Snowflake entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select the Snowflake entity type.

snowflake.database.bytes_scanned.avg Bytes (By)

Average bytes scanned in a database over the last 24-hour period.

snowflake.database.query.count {queries} Query Counts. Total query count for the database over the last 24-hour period.
snowflake.query.blocked {queries} Blocked query count for the warehouse over the last 24-hour period.
snowflake.query.bytes_deleted.avg Bytes (By) Query Bytes. Average bytes deleted in the database over the last 24-hour period.
snowflake.query.bytes_written.avg Bytes (By) Query Bytes. Average bytes written by the database over the last 24-hour period.
snowflake.query.compilation_time.avg Seconds (s) Query Times. Average time taken to compile a query over the last 24-hour period.
snowflake.query.executed {queries} Executed query count for the warehouse over the last 24-hour period.
snowflake.query.execution_time.avg Seconds (s) Query Times. Average time spent executing queries in the database over the last 24-hour period.
snowflake.query.queued_overload {queries} Overloaded query count for the warehouse over the last 24-hour period.
snowflake.query.queued_provision {queries} Number of queries queued while waiting for compute resources to be provisioned over the last 24-hour period.
snowflake.queued_overload_time.avg Seconds (s) Queued Times. Average time spent in the warehouse queue due to the warehouse being overloaded over the last 24-hour period.
snowflake.queued_provisioning_time.avg Seconds (s) Queued Times. Average time spent in the warehouse queue waiting for resources to provision over the last 24-hour period.
snowflake.queued_repair_time.avg Seconds (s) Queued Times. Average time spent in the warehouse queue waiting for compute resources to be repaired over the last 24-hour period.
snowflake.storage.stage_bytes.total Bytes (By) Storage Bytes. Number of bytes of stage storage used by files in all internal stages (named, table, user).
snowflake.storage.storage_bytes.total Bytes (By) Storage Bytes. Number of bytes of table storage used, including bytes for data currently in Time Travel.
snowflake.total_elapsed_time.avg Seconds (s) Total Elapsed Time. Average elapsed time over the last 24-hour period.

Optional metrics

Metric Units Description
snowflake.billing.cloud_service.total {credits}

Reported total credits used in the cloud service over the last 24-hour period.

snowflake.billing.total_credit.total {credits} Used Credits. Reported total credits used across the account over the last 24-hour period.
snowflake.billing.virtual_warehouse.total {credits} Reported total credits used by the virtual warehouse service over the last 24-hour period.
snowflake.billing.warehouse.cloud_service.total {credits} Credits used across the cloud service for the given warehouse over the last 24-hour period.
snowflake.billing.warehouse.total_credit.total {credits} Total credits used associated with the given warehouse over the last 24-hour period.
snowflake.billing.warehouse.virtual_warehouse.total {credits} Total credits used by the virtual warehouse service for the given warehouse over the last 24-hour period.
snowflake.logins.total {logins} Total login attempts for the account over the last 24-hour period.
snowflake.pipe.credits_used.total {credits} Total Snowpipe credits used over the last 24-hour period.
snowflake.query.bytes_spilled.local.avg Bytes (By) Average bytes spilled to local storage (when intermediate results do not fit in memory) over the last 24-hour period.
snowflake.query.bytes_spilled.remote.avg Bytes (By) Average bytes spilled to remote storage (when intermediate results do not fit in memory) over the last 24-hour period.
snowflake.query.data_scanned_cache.avg Percentage (%) Average percentage of data scanned from cache over the last 24-hour period.
snowflake.query.partitions_scanned.avg {partitions} Average number of partitions scanned by queries over the last 24-hour period.
snowflake.rows_deleted.avg {rows} Row Operations. Average number of rows deleted from a table (or tables) over the last 24-hour period.
snowflake.rows_inserted.avg {rows} Row Operations. Average number of rows inserted into a table (or tables) over the last 24-hour period.
snowflake.rows_produced.avg {rows} Row Operations. Average number of rows produced by the statement over the last 24-hour period.
snowflake.rows_unloaded.avg {rows} Row Operations. Average number of rows unloaded during data export over the last 24-hour period.
snowflake.rows_updated.avg {rows} Row Operations. Average number of rows updated in a table over the last 24-hour period.
snowflake.session_id.count {session ids} Distinct session IDs associated with the Snowflake username over the last 24-hour period.
snowflake.storage.failsafe_bytes.total Bytes (By) Number of bytes of data in Fail-safe.

ZooKeeper metrics

Metric Units Description
zookeeper.connection.active Connections

The number of active clients connected to a ZooKeeper server.

zookeeper.data_tree.ephemeral_node.count Nodes The number of ephemeral nodes that a ZooKeeper server has in its data tree.
zookeeper.data_tree.size Byte The size of data in bytes that a ZooKeeper server has in its data tree.
zookeeper.file_descriptor.available File_descriptors The number of file descriptors that a ZooKeeper server still has available.
zookeeper.file_descriptor.limit File_descriptors The maximum number of file descriptors that a ZooKeeper server can open.
zookeeper.file_descriptor.open File_descriptors The number of file descriptors that a ZooKeeper server has open.
zookeeper.latency.max milliseconds (ms) The maximum time in milliseconds for requests to be processed.
zookeeper.latency.min milliseconds (ms) The minimum time in milliseconds for requests to be processed.
zookeeper.packet.count Packets The number of ZooKeeper packets received or sent by a server.
zookeeper.packet.count.rate Packets per second The number of ZooKeeper packets received and sent by a server per second.
zookeeper.request.active Requests The number of currently executing requests.
zookeeper.watch.count Watches The number of watches placed on Z-Nodes on a ZooKeeper server.
zookeeper.znode.count Znodes The number of Z-Nodes that a ZooKeeper server has in its data tree.
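The three file descriptor metrics are related. This sketch assumes that the available count equals the limit minus the open count. It reads its inputs from the output of ZooKeeper's `mntr` four-letter command, which reports zk_open_file_descriptor_count and zk_max_file_descriptor_count as key/value lines on Unix platforms. The helper names are hypothetical, and the sample values are made up.

```python
from typing import Dict, Tuple


def parse_mntr(text: str) -> Dict[str, str]:
    """Parse mntr output: one key and one value per line, separated by whitespace."""
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            stats[parts[0]] = parts[1].strip()
    return stats


def file_descriptor_headroom(stats: Dict[str, str]) -> Tuple[int, float]:
    """Return (available descriptors, percent of the limit in use).

    Assumes available = limit - open (hypothetical helper).
    """
    open_fds = int(stats["zk_open_file_descriptor_count"])
    limit = int(stats["zk_max_file_descriptor_count"])
    return limit - open_fds, 100.0 * open_fds / limit


sample = (
    "zk_version\t3.8.4\n"
    "zk_open_file_descriptor_count\t256\n"
    "zk_max_file_descriptor_count\t1024\n"
)
print(file_descriptor_headroom(parse_mntr(sample)))  # (768, 25.0)
```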

Telegraf metrics

When you integrate with DNS Query, FluentD, HAProxy, NGINX Plus API, NTPq, PHP-FPM, or Varnish, the SolarWinds Observability Agent is used to send metrics to SolarWinds Observability SaaS. See Monitor with Telegraf.

DNS Query metrics

For a comprehensive list of metrics, see DNS Query Input Plugin at GitHub.

Metric Units Description
query_time_ms Milliseconds (ms) The time it takes the query to run (in milliseconds).
result_code integer

The result code, as an integer: success = 0, timeout = 1, error = 2. See DNS Query Input Plugin.

rcode_value integer Result code value. See DNS Query Input Plugin.
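Because result_code is an integer enumeration, dashboards and alerts typically compare it against the documented values. This small Python sketch shows that mapping and computes a failure ratio over a batch of samples. The names are hypothetical; only the 0, 1, and 2 values come from the table above.

```python
from enum import IntEnum
from typing import Iterable


class ResultCode(IntEnum):
    """DNS Query result_code values as listed in the table."""
    SUCCESS = 0
    TIMEOUT = 1
    ERROR = 2


def describe(code: int) -> str:
    """Human-readable name for a result_code, or 'unknown' for other values."""
    try:
        return ResultCode(code).name.lower()
    except ValueError:
        return "unknown"


def failure_ratio(codes: Iterable[int]) -> float:
    """Fraction of samples that did not succeed (timeouts and errors)."""
    codes = list(codes)
    if not codes:
        return 0.0
    failures = sum(1 for c in codes if c != ResultCode.SUCCESS)
    return failures / len(codes)


print(describe(1))                  # timeout
print(failure_ratio([0, 0, 1, 2]))  # 0.5
```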

FluentD metrics

For a comprehensive list of metrics, see Fluentd Input Plugin at GitHub.

Metric Units Description
fluentd_buffer_available_buffer_space_ratios Percent (%) Available Buffer Space. The percentage of remaining available buffer space.
fluentd_buffer_queue_byte_size Bytes (B) Buffer Queue Bytes. The current size of queued buffer chunks (in bytes).
fluentd_buffer_queue_length   Buffer Queue Length. The length of the buffer queue.
fluentd_buffer_stage_byte_size Bytes (B) Buffer Stage Bytes. The current size of staged buffer chunks (in bytes).
fluentd_buffer_stage_length   Buffer Stage Length. The length of staged buffer chunks.
fluentd_buffer_total_queued_size Bytes (B) Buffer Queue Size. The size of the buffer queue.
fluentd_emit_count {emits} Total Record Emit Count. The total number of emit calls.
fluentd_emit_records {records} Total Emit Records. The total number of emitted records.
fluentd_emit_size Bytes (B) Total Emit Size. The total size of emit events.
fluentd_retry_count {retries} Retry Count. The number of retry attempts.
fluentd_rollback_count {count} Total Rollback Count. The total number of rollbacks. Rollbacks occur when write or try_write fails.
fluentd_slow_flush_count {count} Total Slow Flush Count. The total number of slow flushes. This count is incremented when a buffer flush takes longer than slow_flush_log_threshold.
fluentd_write_count {count} The total number of writes.

HAProxy metrics

For a comprehensive list of metrics, see HAProxy Input Plugin at GitHub and HAProxy documentation at docs.haproxy.org.

SolarWinds Observability SaaS expects that metrics return a number. Some HAProxy metrics, such as status, return strings, and thus are not supported.

MetricUnitsDescription
haproxy_active_servers{servers}Active Servers. The number of currently active servers.
haproxy_backup_servers{servers}Backup Servers. The number of available backup servers.
haproxy_binbytesTotal In and Out Traffic. The cumulative total of incoming traffic.
haproxy_boutbytesTotal In and Out Traffic. The cumulative total of outgoing traffic.
haproxy_dreq{requests}Total Denied Requests. The cumulative number of requests denied because of security concerns.
haproxy_dcon{requests}Total Denied Requests. The cumulative number of requests denied by the 'tcp-request connection' rules.
haproxy_dses{requests}Total Denied Requests. The cumulative number of requests denied by the 'tcp-request session' rules.
haproxy_dresp{responses}Total Denied Responses. The cumulative number of responses denied because of security concerns. For HTTP, the responses are denied because of a matched http-request rule, or 'option checkcache'.
haproxy_eresp {responses} Total Denied Responses. The cumulative number of response errors, such as srv_abrt, write errors on the client socket, or failure applying filters to the response.
haproxy_ereq {errors} Total Request Errors. The cumulative number of request errors, such as early termination from the client, read errors, client timeouts, or a closed client connection.
haproxy_econ {errors} Total Request Errors. The cumulative number of request errors encountered when trying to connect to a backend server. The backend stat is the sum of the stat for all servers of that backend, plus any connection errors not associated with a particular server (such as the backend having no active servers).
haproxy_scur {sessions} Current Sessions. The number of current sessions per proxy.
haproxy_slim {sessions} Session Limit. The currently configured session limit.
haproxy_stot {sessions} Total Sessions. The cumulative number of sessions.
haproxy_req_rate Requests per second Request Rate. HTTP requests per second over the last elapsed second.
haproxy_rtime Milliseconds (ms) Response Time. The average response time over the last 1024 requests (0 for TCP).
haproxy_req_tot {requests} Total Requests. The total number of received HTTP requests.
haproxy_ctime Milliseconds (ms) Connection Time. The average connect time over the last 1024 responses.
haproxy_qtime Milliseconds (ms) Queue Time. The average queue time over the last 1024 responses.
haproxy_ttime Milliseconds (ms) Session Time. The average session time over the last 1024 responses.
haproxy_http_response.2xx {responses} Total Responses 2xx. The total number of HTTP responses with a 2xx code.
haproxy_http_response.3xx {responses} Total Responses 3xx. The total number of HTTP responses with a 3xx code.
haproxy_http_response.4xx {responses} Total Responses 4xx. The total number of HTTP responses with a 4xx code.
haproxy_http_response.5xx {responses} Total Responses 5xx. The total number of HTTP responses with a 5xx code.
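Counters such as haproxy_bout and haproxy_http_response.* only grow, so a rate or a share for a time window comes from the difference between two samples. The following minimal Python sketch assumes you already have two snapshots of these counters taken a known number of seconds apart; the function names and sample values are hypothetical and for illustration only.

```python
def counter_rate(prev: int, curr: int, elapsed_s: float) -> float:
    """Per-second rate of a cumulative counter between two samples."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    delta = curr - prev
    if delta < 0:
        # A decrease usually means the counter was reset (for example, after a restart).
        raise ValueError("counter decreased; it was probably reset")
    return delta / elapsed_s


def share_5xx(prev: dict[str, int], curr: dict[str, int]) -> float:
    """Fraction of HTTP responses in the window that had a 5xx code."""
    deltas = {code: curr[code] - prev.get(code, 0) for code in curr}
    if any(d < 0 for d in deltas.values()):
        raise ValueError("counter decreased; it was probably reset")
    total = sum(deltas.values())
    return deltas.get("5xx", 0) / total if total else 0.0


# Two hypothetical samples taken 60 seconds apart.
bout_prev, bout_curr = 1_000_000, 1_600_000  # haproxy_bout, in bytes
responses_prev = {"2xx": 900, "3xx": 50, "4xx": 40, "5xx": 10}  # haproxy_http_response.*
responses_curr = {"2xx": 1070, "3xx": 60, "4xx": 50, "5xx": 20}

print(counter_rate(bout_prev, bout_curr, 60))     # 10000.0 (bytes per second)
print(share_5xx(responses_prev, responses_curr))  # 0.05
```

The share uses only the four response classes listed in the table as its denominator; responses with other codes, if present, would not be counted.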

NGINX Plus API metrics

For a more comprehensive list of metrics, see Nginx Virtual Host Traffic (VTS) Input Plugin and Nginx Plus API Input Plugin at GitHub.

Metric Units Description
nginx_vts_connections {connections} The number of connections of individual types: active, reading, writing, waiting, accepted, handled, requests.
nginx_vts_server, nginx_vts_filter
nginx_vts_upstream
nginx_vts_cache

NTPq metrics

For a comprehensive list of metrics, see NTPQ Input Plugin at GitHub.

Metric Units Description
ntpq_delay Milliseconds (ms) Round Trip Delay. The round-trip communication delay to the remote peer or server.
ntpq_jitter Milliseconds (ms) Jitter. The mean deviation (jitter) in the time reported for the remote peer or server (RMS of the differences between multiple time samples).
ntpq_offset Milliseconds (ms) Time Offsets. The mean offset (phase) between the times reported by this local host and the remote peer or server (RMS).
ntpq_poll Minutes (min) Polling Frequency. RFC 5905 specifies that in NTPv4 this value ranges from 4 (16 s) to 17 (about 36 h), expressed in log2 seconds. However, observation suggests that the displayed value is in seconds, over a much smaller range of 64 (2^6) to 1024 (2^10) seconds.
ntpq_reach Octal numbers Reach. An 8-bit left-shift register value recording polls (bit set = successful, bit reset = failed), displayed in octal by default. You can change the type to decimal, count, or ratio in the ntpq input section of telegraf.conf.
ntpq_when Minutes (min) Last Poll. The time since the last poll.
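Reading ntpq_reach and an RFC 5905 poll exponent takes a little arithmetic. The following minimal Python sketch assumes the reach value arrives as the default octal string (such as "377") and that, as in RFC 5905, the newest poll result is shifted into the least significant bit; if you configure a decimal format instead, parse with base 10. The function names are hypothetical.

```python
def _reach_register(reach_octal: str) -> int:
    """Parse an ntpq reach value shown in octal, such as "377"."""
    register = int(reach_octal, 8)
    if not 0 <= register <= 0o377:
        raise ValueError("reach is an 8-bit register (000 to 377 octal)")
    return register


def reach_successes(reach_octal: str) -> int:
    """Number of successful polls among the last eight recorded in the register."""
    return bin(_reach_register(reach_octal)).count("1")


def last_poll_succeeded(reach_octal: str) -> bool:
    """The newest poll result occupies the least significant bit."""
    return bool(_reach_register(reach_octal) & 1)


def poll_exponent_to_seconds(exponent: int) -> int:
    """Convert an RFC 5905 poll exponent (log2 seconds) to seconds."""
    return 2 ** exponent


print(reach_successes("377"), last_poll_succeeded("377"))          # 8 True
print(reach_successes("376"), last_poll_succeeded("376"))          # 7 False
print(poll_exponent_to_seconds(6), poll_exponent_to_seconds(10))   # 64 1024
print(round(poll_exponent_to_seconds(17) / 3600, 1))               # 36.4 (hours)
```

A reach of 377 (all eight bits set) means the last eight polls all succeeded; 376 means seven of them did, but the most recent one failed.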

PHP-FPM metrics

For a comprehensive list of metrics, see PHP-FPM Input Plugin at GitHub.

Metric Units Description
phpfpm_accepted_conn Count Total number of accepted connections.
phpfpm_active_processes Count Number of active (busy) processes.
phpfpm_idle_processes Count Number of idle (waiting) processes.
phpfpm_listen_queue Count Number of requests in the queue of pending connections.
phpfpm_listen_queue_len Count Size of the socket queue of pending connections.
phpfpm_max_active_processes Count Maximum number of active processes since FPM started.
phpfpm_max_children_reached Count Number of times the process limit has been reached.
phpfpm_max_listen_queue Count Maximum number of requests in the listen queue since FPM started.
phpfpm_slow_requests Count Number of requests that exceeded the request_slowlog_timeout value.
phpfpm_start_since Seconds Number of seconds since FPM started.
phpfpm_total_processes Count Total number of processes (idle and active).
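A few of these metrics can be combined to judge whether a PHP-FPM pool is running out of workers. The following minimal Python sketch is a hypothetical heuristic, not a SolarWinds alert definition: it treats the pool as saturated when the share of busy processes reaches an assumed 90% threshold or when requests are waiting in the listen queue, and it assumes phpfpm_max_children_reached keeps counting up since FPM started, so it compares two samples to see whether the limit was hit in between.

```python
from typing import NamedTuple


class FpmSample(NamedTuple):
    """Hypothetical snapshot of a few PHP-FPM status metrics."""
    active_processes: int      # phpfpm_active_processes
    total_processes: int       # phpfpm_total_processes
    listen_queue: int          # phpfpm_listen_queue
    max_children_reached: int  # phpfpm_max_children_reached (assumed cumulative)


def busy_ratio(s: FpmSample) -> float:
    """Share of processes that are currently active (busy)."""
    return s.active_processes / s.total_processes if s.total_processes else 0.0


def limit_hit_since(prev: FpmSample, curr: FpmSample) -> bool:
    """True if the process limit was reached between the two samples."""
    return curr.max_children_reached > prev.max_children_reached


def looks_saturated(s: FpmSample, busy_threshold: float = 0.9) -> bool:
    """Hypothetical heuristic: nearly all processes busy, or requests waiting."""
    return busy_ratio(s) >= busy_threshold or s.listen_queue > 0


earlier = FpmSample(active_processes=4, total_processes=10, listen_queue=0, max_children_reached=2)
now = FpmSample(active_processes=10, total_processes=10, listen_queue=3, max_children_reached=3)
print(busy_ratio(now), looks_saturated(now), limit_hit_since(earlier, now))  # 1.0 True True
```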

Varnish metrics

For a comprehensive list of metrics, see Varnish Input Plugin at GitHub.

Metric Units Description
varnish_client_req {requests} Total Client Requests. The number of good client requests.
varnish_s_req_bodybytes Bytes (B) Total Bytes. Total request body bytes.
varnish_s_req_hdrbytes Bytes (B) Total Bytes. Total request header bytes.
varnish_s_resp_bodybytes Bytes (B) Total Bytes. Total response body bytes.
varnish_s_resp_hdrbytes Bytes (B) Total Bytes. Total response header bytes.
varnish_sess_dropped {sessions} Total Failed and Dropped Sessions. The number of sessions dropped while waiting for a thread: the number of times an HTTP/1 session was dropped because the queue was already too long. See thread_queue_limit.
varnish_sess_fail {sessions} Total Failed and Dropped Sessions. Session accept failures: the number of failures to accept a TCP connection. This counter is the sum of the sess_fail_* counters, which give more detailed information.
varnish_sess_closed {operations} Total Session Operations. The number of closed sessions.
varnish_sess_herd {operations} Total Session Operations. The number of times the timeout_linger timer triggered.
varnish_sess_readahead {operations} Total Session Operations. The number of read-ahead sessions.
varnish_sess_closed_err {operations} Total Session Operations. The number of sessions closed with errors.
varnish_s_sess {sessions} Total Sessions. The total number of sessions that occurred.
varnish_n_expired {objects} Total Number of Objects. The number of objects that expired because of old age.
varnish_n_lru_moved {objects} Total Number of Objects. The number of moved LRU objects (move operations done on the LRU list).
varnish_n_lru_nuked {objects} Total Number of Objects. The number of objects forcefully evicted from storage to make room for a new object (LRU nuked objects).
varnish_cache_miss {count} Total Cache Hits and Misses. The number of cache misses. A cache miss means the object was fetched from the backend before being delivered to the client.
varnish_cache_hit {count} Total Cache Hits and Misses. The number of cache hits. A cache hit means the object was delivered to the client without being fetched from a backend server.
varnish_backend_busy {connections} Total Backend Connections. The number of times Varnish considered the backend too busy to handle additional connections.
varnish_backend_conn {connections} Total Backend Connections. The number of successful backend connections.
varnish_backend_fail {connections} Total Backend Connections. The number of failed backend connections.
varnish_backend_recycle {connections} Total Backend Connections. The number of recycled backend connections.
varnish_backend_retry {connections} Total Backend Connections. The number of retried backend connections.
varnish_backend_reuse {connections} Total Backend Connections. The number of reused backend connections.
varnish_backend_unhealthy {connections} Total Backend Connections. The number of unhealthy backend connections.
varnish_fetch_length, varnish_fetch_bad, varnish_fetch_eof, varnish_fetch_failed, varnish_fetch_head, varnish_fetch_chunked, varnish_fetch_1xx, varnish_fetch_204, varnish_fetch_304, varnish_fetch_none, varnish_fetch_no_thread {fetches} Total HTTP Request Fetches. The number of request fetches, by fetch type.
varnish_shm_cont {operations} Total Shared Memory Operations. The number of contention operations (when multiple threads compete for access to SHM resources).
varnish_shm_cycles {operations} Total Shared Memory Operations. The number of times data cycles through the shared memory.
varnish_shm_flushes {operations} Total Shared Memory Operations. The number of flush operations.
varnish_shm_records {operations} Total Shared Memory Operations. The number of record operations.
varnish_shm_writes {operations} Total Shared Memory Operations. The number of write operations.
varnish_thread_queue_len {count} Total Session Queue Length. The length of the session queue waiting for threads.
varnish_threads {workers} Total Workers. The number of threads in all pools.
varnish_sess_queued {sessions} Total Queued Sessions. The number of times a session was queued waiting for a thread.
varnish_threads_created {threads} Total Worker Threads. The total number of threads created in all pools.
varnish_threads_destroyed {threads} Total Worker Threads. The total number of threads destroyed in all pools.
varnish_threads_failed {threads} Total Worker Threads. The number of times creating a thread failed.
varnish_threads_limited {threads} Total Worker Threads. The number of times more threads were needed but the thread pool limit was reached.
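The cache hit and miss counters are commonly combined into a hit ratio, hits / (hits + misses). The following minimal Python sketch computes it both from raw totals and for a window between two samples of the counters. It assumes the counters only grow until Varnish restarts, uses hypothetical function names and made-up values, and considers only varnish_cache_hit and varnish_cache_miss, so any request counted in neither metric is ignored.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Share of lookups counted as hits, using only cache_hit and cache_miss."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


def window_hit_ratio(prev: tuple[int, int], curr: tuple[int, int]) -> float:
    """Hit ratio between two samples of the (hit, miss) counters."""
    d_hits = curr[0] - prev[0]
    d_misses = curr[1] - prev[1]
    if d_hits < 0 or d_misses < 0:
        raise ValueError("counter decreased; Varnish was probably restarted")
    return hit_ratio(d_hits, d_misses)


# Hypothetical (varnish_cache_hit, varnish_cache_miss) samples.
print(hit_ratio(900, 100))                         # 0.9
print(window_hit_ratio((900, 100), (1_380, 220)))  # 0.8
```

The windowed version reflects recent cache behavior, whereas the ratio of raw totals averages over the whole time since the counters started.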