Metrics for SolarWinds Observability SaaS entities
Many of the collected metrics from SolarWinds Observability entities are displayed as widgets in SolarWinds Observability explorers; additional metrics may be collected and available in the Metrics Explorer. You can also create an alert for when an entity's metric value moves out of a specific range. See Entities in SolarWinds Observability SaaS for information about entity types in SolarWinds Observability SaaS.
Common metrics
The following metric is available for all entities in SolarWinds Observability SaaS.
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore | Percent (%) | Health state. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health and selecting a specific entity type. You can also customize the impact. To view the health state separately for each specific entity type in the Metrics Explorer, group the |
APM/service metrics
Metrics for service entities are sent by APM libraries installed and configured to monitor your service. See Application performance monitoring (APM) for more information.
Standard metrics
The tables below list the default set of metrics collected by the APM library for all requests. Counts are reported every minute.
Primary service metrics
Metric | Units | Description |
---|---|---|
trace.service.errors | Count | Count of requests that ended with an error status. Aggregate by Sum to see the total error count for the service. |
trace.service.error_ratio | Percent (%) | Ratio of errors to requests, calculated by dividing the number of requests with errors by the total number of requests. |
trace.service.requests | Count | Count of requests for each HTTP status code (200, 404, etc.). Aggregate by Sum to see the total request count for the service. |
trace.service.request_rate | Count per second | Rate of requests per second, calculated by dividing the number of requests (trace.service.requests) by the length of the aggregation period in seconds. |
trace.service.response_time | Milliseconds (ms) | Duration of each entry span for the service, typically the time taken to process an inbound request. This metric is based on the following attributes: This is the primary metric to track service response time. |
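The two derived metrics above, error ratio and request rate, can be reproduced from the raw counters. The sketch below assumes you have per-minute totals of trace.service.requests and trace.service.errors for one service; the `MinuteCounts` class and the sample numbers are hypothetical. It divides summed counts rather than averaging per-minute ratios, so a quiet minute does not carry the same weight as a busy one.

```python
from dataclasses import dataclass


@dataclass
class MinuteCounts:
    """Hypothetical per-minute counters for one service."""
    requests: int  # trace.service.requests, summed across status codes
    errors: int    # trace.service.errors


def error_ratio(buckets: list[MinuteCounts]) -> float:
    """Errors divided by requests over the whole window, as a percentage."""
    total_requests = sum(b.requests for b in buckets)
    total_errors = sum(b.errors for b in buckets)
    if total_requests == 0:
        return 0.0  # assumption: report 0% when there was no traffic
    return 100.0 * total_errors / total_requests


def request_rate(buckets: list[MinuteCounts], period_seconds: int) -> float:
    """Requests per second: total requests divided by the period length in seconds."""
    return sum(b.requests for b in buckets) / period_seconds


window = [MinuteCounts(1200, 6), MinuteCounts(1800, 9), MinuteCounts(0, 0)]
print(error_ratio(window))        # 0.5 (percent)
print(request_rate(window, 180))  # about 16.67 requests per second
```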
Service metrics stored with percentiles
Metric | Units | Description |
---|---|---|
trace.service.service_response_time | Milliseconds (ms) | Duration of each entry span for the service, typically the time taken to process an inbound request. This metric is based on the service.name attribute. |
trace.service.service_response_time.p50, trace.service.service_response_time.p95, trace.service.service_response_time.p99, trace.service.service_response_time.p999 | Milliseconds (ms) | Percentile values (50th, 95th, 99th, and 99.9th) for the trace.service.service_response_time metric. |
trace.service.transaction_response_time | Milliseconds (ms) | Duration of each entry span for the service, typically the time taken to process an inbound request. |
trace.service.transaction_response_time.p50, trace.service.transaction_response_time.p95, trace.service.transaction_response_time.p99, trace.service.transaction_response_time.p999 | Milliseconds (ms) | Percentile values (50th, 95th, 99th, and 99.9th) for the trace.service.transaction_response_time metric. |
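A percentile such as p95 is the response time that 95% of requests did not exceed. The sketch below uses the nearest-rank method on a hypothetical list of durations to show what the .p50 through .p999 values mean. SolarWinds Observability may compute its percentiles differently (for example, from histograms), so treat this as an illustration of the concept, not the product's algorithm.

```python
import math


def percentile(durations_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not durations_ms:
        raise ValueError("no samples")
    ordered = sorted(durations_ms)
    rank = math.ceil(p / 100.0 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical entry-span durations in milliseconds.
samples = [12.0, 15.0, 11.0, 240.0, 14.0, 13.0, 16.0, 18.0, 17.0, 900.0]
for p in (50, 95, 99, 99.9):
    print(f"p{p}: {percentile(samples, p)} ms")
```

Percentiles do not combine linearly: averaging p95 values from several instances or time buckets does not give the overall p95, so recompute from the underlying samples when you need a combined value.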
Service counters
Metrics representing counts of service-related entities.
Metric | Units | Description |
---|---|---|
trace.service.count | Count | Number of services that were reporting data during the selected time period. |
trace.service.faas.count | Count | Number of AWS Lambda functions for which APM Services were reporting data during the selected time period. |
trace.service.faas.instance.count | Count | Number of AWS Lambda instances for which APM Services were reporting data during the selected time period. |
trace.service.hosts.count | Count | Number of APM Hosts for which APM Services were reporting data during the selected time period. A unique APM Host is captured only for Azure VMs, AWS EC2 instances, and hosts monitored with UAMS. |
trace.service.instance.count | Count | Number of service instances that were reporting data during the selected time period. |
trace.service.pod.count | Count | Number of Kubernetes pods for which APM Services were reporting data during the selected time period. |
trace.service.samplecount | Count | Count of requests that went through a sampling decision. This excludes requests that already carried a valid upstream sampling decision and trigger-trace requests. |
trace.service.tracecount | Count | Count of traces generated from requests. |
trace.service.transaction.count | Count | Number of transactions that were reporting data during the selected time period. This metric is based on the following attributes: |
Sampled trace-derived metrics
Database metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.database.query.response_time | Milliseconds (ms) | Duration of traced queries executed by the service to the database. |
Cache metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.cache.op.hits | Count | Count of successful retrievals from the cache. |
trace.service.outbound_calls.cache.op.requests | Count | Number of cache keys returned by the cache call. If the number of keys is not returned, each cache call is counted once. |
trace.service.outbound_calls.cache.op.response_time | Milliseconds (ms) | Duration of traced cache calls executed by the service to the cache engine. |
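A common way to read the two cache counters together is as a hit ratio. The sketch below assumes both counters were summed over the same time window; the function name and numbers are hypothetical. Because cache.op.requests counts returned keys (or calls, when keys are not reported), treat the result as approximate for multi-key calls.

```python
from typing import Optional


def cache_hit_ratio(hits: int, requests: int) -> Optional[float]:
    """Cache hits as a percentage of cache requests over the same time window."""
    if requests <= 0:
        return None  # no cache traffic, so the ratio is undefined
    return 100.0 * hits / requests


# Hypothetical one-hour sums of ...cache.op.hits and ...cache.op.requests.
print(cache_hit_ratio(hits=870, requests=1000))  # 87.0
print(cache_hit_ratio(hits=0, requests=0))       # None
```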
Remote service metrics
Metric | Units | Description |
---|---|---|
trace.service.outbound_calls.remote_service.call.response_time | Milliseconds (ms) | Duration of spans representing remote calls executed by the service to a remote endpoint or remote instrumented service. |
Exception metrics
Metric | Units | Description |
---|---|---|
trace.service.exceptions.count | Count | Count of service exceptions captured in traces: the total number of error events for traced requests. An event is classified as an error if: |
Other sampled metrics
Metric | Units | Description |
---|---|---|
trace.service.breakdown.response_time | Microseconds (μs) | Trace Response Time Breakdown, Average Trace Response Time Breakdown. The amount of time it takes to complete a service transaction, broken down by operation type (for example, application, database calls, or remote calls). The average breakdown is calculated from collected traces, so it covers only sampled traces. Use it to analyze how much each type of operation performed by the service contributes to the average response time. |
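The average breakdown can be pictured as a per-operation-type average over sampled traces. The sketch below assumes each sampled trace has already been reduced to a mapping from operation type to microseconds spent; that input shape and the sample data are hypothetical, not the product's trace format. An operation type missing from a trace counts as 0 μs for that trace.

```python
from collections import defaultdict

# Hypothetical sampled traces: operation type -> time spent in microseconds.
traces = [
    {"application": 1200, "database": 3400, "remote": 0},
    {"application": 900, "database": 2100, "remote": 5000},
    {"application": 1500},
]


def average_breakdown(sampled: list[dict[str, int]]) -> dict[str, float]:
    """Average time per operation type across all sampled traces."""
    totals: dict[str, int] = defaultdict(int)
    for trace in sampled:
        for op_type, micros in trace.items():
            totals[op_type] += micros
    # Dividing by the number of traces treats absent operation types as 0 us.
    return {op: total / len(sampled) for op, total in totals.items()}


print(average_breakdown(traces))
```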
Runtime metrics
See the links below for the metrics for each language runtime and for library-specific configuration:
Database metrics
Metrics for database instance entities are sent by the SolarWinds Observability Agent monitoring your databases. See Database monitoring for more information.
Metric | Units | Description |
---|---|---|
dbo.host.queries.errors.tput | Errors per second (EPS) | Errors, Error Rate. The number of errors recorded per second across your monitored database instances. Errors may indicate that requests are failing even while throughput and response time appear healthy. |
dbo.host.queries.latency_us | milliseconds (ms) | Response Time. Query latency in milliseconds per query execution across your monitored databases. May be displayed as: |
dbo.host.queries.p99_latency_us | milliseconds (ms) | Response Time 99th percentile. The 99th percentile response time for each of the top selected queries. |
dbo.host.queries.time_us | Count | Load. The load on your monitored databases, expressed as the number of requests executing simultaneously. Concurrency reveals load (or demand) in a way that is orthogonal to variations in request speed or frequency. |
dbo.host.queries.tput | Queries per second (QPS) | Throughput. The number of queries or statements completed per second. This metric shows traffic intensity and frequency: how many requests your servers are processing. |
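Error rate and throughput are per-second counts, and Load is an average concurrency. The sketch below derives all three from hypothetical raw totals for one interval; the `QueryWindow` fields are assumptions, and reading dbo.host.queries.time_us as summed execution time in microseconds is an inference from the metric name, not something stated on this page. Load uses Little's law: total busy time divided by elapsed wall-clock time.

```python
from dataclasses import dataclass


@dataclass
class QueryWindow:
    """Hypothetical raw totals for one database instance over one interval."""
    interval_s: float         # length of the interval in seconds
    queries_completed: int    # statements that finished in the interval
    errors: int               # statements that returned an error
    total_exec_time_us: int   # summed execution time of all statements, in microseconds


def throughput_qps(w: QueryWindow) -> float:
    return w.queries_completed / w.interval_s


def errors_per_second(w: QueryWindow) -> float:
    return w.errors / w.interval_s


def load(w: QueryWindow) -> float:
    """Average number of statements executing at once (busy time / elapsed time)."""
    return (w.total_exec_time_us / 1_000_000) / w.interval_s


w = QueryWindow(interval_s=60, queries_completed=12_000, errors=30, total_exec_time_us=90_000_000)
print(throughput_qps(w))     # 200.0
print(errors_per_second(w))  # 0.5
print(load(w))               # 1.5
```

A load of 1.5 here means that, on average, one and a half statements were running at any moment during the minute, regardless of whether that came from many fast queries or a few slow ones.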
Digital Experience/website metrics
Metrics for website entities are either collected by probes that synthetically test your website's availability, or sent by the RUM script added to your website. See Digital experience monitoring for more information.
Synthetic availability metrics
Synthetic transaction metrics
Metric | Units | Description |
---|---|---|
composite.synthetics.availability | Percent (%) | Status Changes, Status History. Indicates whether a synthetic transaction is available or unavailable. Found in: Metrics Explorer, Entity Explorer (Overview tab). |
composite.synthetics.status.downtime.count | Count | The number of times the synthetic transaction entity was down in a given time range. For example, if the entity was down for the entire time range, the count is 1. |
composite.synthetics.status.downtime.total | Seconds (s) | The total downtime for the entity during the specified time range. |
synthetics.overall_status.duration.total | Seconds (s) | The total amount of time the entity had a given status. |
synthetics.overall_status.duration | Seconds (s) | The amount of time in seconds the entity had a given status. |
synthetics.transaction.attempts | Count | The number of attempted executions of your synthetic transaction for the selected time period. |
synthetics.transaction.duration | Seconds (s) | Historical Overview. The amount of time in seconds that it took your synthetic transaction to complete its execution. May be displayed as: |
synthetics.transaction.error_rate | Percent (%) | Test Success Rate. The percentage of failed transaction attempts for the selected time period. |
synthetics.transaction.errors | Count | Test Success Rate. The number of failed transaction attempts for the selected time period. Used to calculate the synthetic transaction error rate. |
synthetics.transaction.result | Count | The number of times the synthetic transaction resulted in a success or an error. |
synthetics.transaction.success_rate | Percent (%) | Test Success Rate. The percentage of successful transaction attempts for the selected time period. |
synthetics.transaction.successes | Count | Test Success Rate. The number of successful transaction attempts for the selected time period. Used to calculate the synthetic transaction success rate. |
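The rate and downtime metrics above can be illustrated from a sequence of check results. The sketch below assumes each check is reduced to up or down and represents one full check interval; the 300-second interval and sample data are hypothetical. It counts each run of consecutive "down" checks as one outage, which matches the table's example that an entity down for the whole range has a downtime count of 1.

```python
def percent_of_attempts(part: int, attempts: int) -> float:
    """Share of attempts as a percentage; 0.0 when there were no attempts (an assumption)."""
    return 100.0 * part / attempts if attempts else 0.0


def downtime_episodes(is_up: list[bool]) -> int:
    """Count separate outages: each run of consecutive down checks counts once."""
    episodes = 0
    previously_up = True
    for up in is_up:
        if previously_up and not up:
            episodes += 1  # a transition from up to down starts a new outage
        previously_up = up
    return episodes


def downtime_seconds(is_up: list[bool], check_interval_s: int) -> int:
    """Total downtime, assuming each check stands for one full interval."""
    return sum(check_interval_s for up in is_up if not up)


# Hypothetical results of six checks run every 300 seconds.
checks = [True, False, False, True, False, True]
print(downtime_episodes(checks))       # 2
print(downtime_seconds(checks, 300))   # 900
print(downtime_episodes([False] * 4))  # 1: down for the whole range
print(percent_of_attempts(57, 60))     # 95.0 (success rate)
print(percent_of_attempts(3, 60))      # 5.0 (error rate)
```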
RUM metrics
Metric | Units | Description |
---|---|---|
composite.rum.session.bounce_rate | Percent (%) | Bounce Rate. The percentage of users who leave the website immediately after landing on one of its pages. |
rum.pageview.apdex_score | | Apdex score. A measurement of user satisfaction that uses the Application Performance Index (Apdex) standard to express the degree to which measured performance meets user expectations. The satisfied, tolerating, and frustrated load times are defined when you create the website entity. For more information about the Apdex standard, see Defining the Application Performance Index. A page load counts as satisfied if it completes within the satisfied load time threshold set for your website, tolerating if it takes up to four times that threshold, and frustrated if it takes longer than four times that threshold. |
rum.pageview.client_processing | seconds (s) | Client Processing Time. Measurement of the time from when the browser sends the initial HTTP request until all synchronous load events have been processed, including layout and running scripts. |
rum.pageview.count | Count | PageViews. Count of the views of your webpage(s). |
rum.pageview.load_time | seconds (s) | Load Time. The amount of time for the website to fully load. |
rum.pageview.ttfb | seconds (s) | Time to First Byte. The amount of time between when the browser requested a page and when it received the first byte of information from the server. |
rum.web_vitals.largest_contentful_paint | seconds (s) | Largest Contentful Paint. A measurement of how quickly the largest image or text content of a web page is loaded. Largest contentful paint time is considered good if loading the largest image or text block takes less than 2.5 seconds, needs improvement if it takes up to 4.0 seconds, and poor if it takes longer than 4.0 seconds. |
rum.web_vitals.interaction_to_next_paint | milliseconds (ms) | Interaction to Next Paint. A measurement of how quickly the website responds to user interactions such as clicks and key presses. Interaction to next paint is considered good if the page responds to user interactions in 200 ms or less, needs improvement if it takes up to 500 ms, and poor if it takes longer than 500 ms. |
rum.web_vitals.cumulative_layout_shift | | Cumulative Layout Shift. Measures how much a webpage shifts unexpectedly while a user is viewing it. A shift may occur if content loads at different speeds or if elements are added to the website dynamically. A cumulative layout shift value of less than 0.1 is considered good, a value up to 0.25 needs improvement, and a value greater than 0.25 is poor. |
rum.web_vitals.first_input_delay | seconds (s) | First Input Delay. Time from when a user first interacts with your site to the time when the browser is able to respond to the interaction. First input delay (FID) helps measure the first impression a user has of your site's responsiveness. The FID is considered good if responding to a customer's first interaction with the site takes less than 100 ms, needs improvement if it takes up to 300 ms, and poor if it takes longer than 300 ms. |
rum.session.count | Count | Sessions, Top 10 countries by session. The total number of sessions, or visits, to the website during the selected time period and by country. A single session includes every action that the user takes during the entirety of their visit to the website. |
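The Apdex score and the Web Vitals ratings above are simple threshold calculations. The sketch below implements the Apdex formula from the Apdex standard, (satisfied + tolerating / 2) / total samples, with the tolerating limit at four times the satisfied threshold, and rates three Web Vitals using the thresholds in the table. The function names and sample data are hypothetical, and boundary values are treated as the better rating (for example, exactly 2.5 s LCP is "good"), which is an assumption where the table's wording is ambiguous.

```python
def apdex(load_times_s: list[float], satisfied_threshold_s: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    if not load_times_s:
        raise ValueError("Apdex is undefined without samples")
    t = satisfied_threshold_s
    satisfied = sum(1 for x in load_times_s if x <= t)
    tolerating = sum(1 for x in load_times_s if t < x <= 4 * t)
    # Loads slower than 4 * t are frustrated and contribute 0 to the score.
    return (satisfied + tolerating / 2) / len(load_times_s)


# (upper bound for "good", upper bound for "needs improvement"), from the table.
WEB_VITAL_THRESHOLDS = {
    "largest_contentful_paint_s": (2.5, 4.0),
    "interaction_to_next_paint_ms": (200.0, 500.0),
    "cumulative_layout_shift": (0.1, 0.25),
}


def rate_web_vital(name: str, value: float) -> str:
    good, needs_improvement = WEB_VITAL_THRESHOLDS[name]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


loads = [0.8, 1.5, 2.0, 3.5, 7.9, 9.0, 1.0, 1.2]  # hypothetical page loads, in seconds
print(apdex(loads, satisfied_threshold_s=2.0))            # 0.75
print(rate_web_vital("largest_contentful_paint_s", 3.1))  # needs improvement
print(rate_web_vital("cumulative_layout_shift", 0.3))     # poor
```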
Infrastructure/self-managed host metrics
Metrics for self-managed host entities are sent by the SolarWinds Observability Agent monitoring your host. See Host monitoring for more information.
SolarWinds Observability Agent metrics
Metrics for agent entities are sent by the SolarWinds Observability Agent itself. See SolarWinds Observability Agents for more information.
Metric | Units | Description |
---|---|---|
swo.uams.agent.status | Possible values: ok, updating, update_failed, restarting, disconnected, stopping, jwt_expired | The reported operating status of the Agent. |
swo.uams.agent.heartbeat | | Reported by the SolarWinds Observability Agent every minute. A missing heartbeat may indicate a problem with the network or with the Agent. |
swo.uams.agent.cpu | Percent (%) | The average amount of CPU capacity in use, as a percentage. |
swo.uams.agent.memory | Percent (%) | The average amount of memory in use, as a percentage. |
swo.uams.agent.diskUsage | Percent (%) | The amount of storage being used by files and data. |
swo.uams.agent.networkIn | Bits | The average amount of data received over the network, in bits. This metric is not collected for Windows due to operating system limitations. |
swo.uams.agent.networkOut | Bits | The average amount of data sent over the network, in bits. This metric is not collected for Windows due to operating system limitations. |
swo.uams.agent.errors.count | Count | The number of errors in the Agent logs since the most recent Agent restart. |
swo.uams.agent.uptime | | The amount of time since the most recent SWO Agent restart. |
swo.uams.plugin.cpu | Percent (%) | The average amount of CPU used by the plugin, as a percentage. |
swo.uams.plugin.memory | Percent (%) | The average amount of memory used by the plugin, as a percentage. |
swo.uams.plugin.uptime | | The amount of time since the most recent plugin or SWO Agent restart. |
swo.uams.plugin.status | | The reported operating status of the plugin. See Possible values for plugin status. |
swo.uams.plugin.healthy | 0, 1 | Calculated from the reported operating status of the plugin; a value of 0 indicates a problem with the plugin. |
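Because the Agent reports a heartbeat every minute, a gap in heartbeats is a useful signal. The sketch below flags an agent as stale when its last heartbeat is older than a tolerance; the tolerance of three missed heartbeats and the function name are hypothetical choices for illustration, not SolarWinds defaults.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_INTERVAL = timedelta(minutes=1)  # the Agent reports a heartbeat every minute
MISSED_HEARTBEATS_TOLERATED = 3            # hypothetical tolerance; adjust as needed


def heartbeat_is_stale(last_heartbeat: datetime, now: datetime) -> bool:
    """True when the last heartbeat is older than the tolerated number of intervals."""
    return now - last_heartbeat > HEARTBEAT_INTERVAL * MISSED_HEARTBEATS_TOLERATED


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(heartbeat_is_stale(now - timedelta(minutes=2), now))  # False
print(heartbeat_is_stale(now - timedelta(minutes=5), now))  # True
```

A tolerance of more than one interval avoids alerting on a single delayed heartbeat, at the cost of detecting a real outage a few minutes later.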
Possible values for plugin status
Plugin status | Healthy metric value | Description |
---|---|---|
STATUS_CODE_OK | 1 | The plugin is responding to health checks. |
STATUS_CODE_STOPPED | 0 | The plugin process was stopped by the user, not because of an error. |
STATUS_CODE_BROKEN | 0 | The plugin was not deployed correctly. |
STATUS_CODE_START_FAILED | 0 | The plugin process cannot be started, and the Agent keeps retrying it in a loop. |
STATUS_CODE_NOT_RESPONDING | 0 | The plugin process is running, but no health check from it was received within a defined amount of time. |
STATUS_CODE_HEALTHCHECK_FAILED | 0 | Failed to send a health check request to the plugin process. |
STATUS_CODE_CONFIGURATION_ISSUE | 0 | Reported by the plugin; indicates an invalid or missing configuration. |
STATUS_CODE_FAILED | 0 | The plugin process stopped unexpectedly. |
STATUS_CODE_STARTING | 0 | Start of the plugin process was called. |
STATUS_CODE_RESTARTING | 1 | Restart was called. |
STATUS_CODE_STOPPING | 0 | Stop of the plugin process was called. |
STATUS_CODE_UPDATING | 0 | Update of the plugin was called. |
STATUS_CODE_CRITICAL | 0 | Reported by the plugin. |
STATUS_CODE_WARNING | 0 | Reported by the plugin. |
STATUS_CODE_JWT_EXPIRED | 0 | The JWT cannot be refreshed. |
STATUS_CODE_UPDATE_FAILED | 0 | A problem occurred during the plugin update. |
STATUS_CODE_INVALID | 0 | Unknown reason. |
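The status table maps directly to a lookup from plugin status to the swo.uams.plugin.healthy value. The sketch below encodes that mapping and lists plugins reporting an unhealthy status; the plugin names are hypothetical, and treating an unrecognized status as 0 (like STATUS_CODE_INVALID) is an assumption.

```python
# swo.uams.plugin.healthy value for each plugin status, per the status table.
PLUGIN_HEALTHY_VALUE = {
    "STATUS_CODE_OK": 1,
    "STATUS_CODE_STOPPED": 0,
    "STATUS_CODE_BROKEN": 0,
    "STATUS_CODE_START_FAILED": 0,
    "STATUS_CODE_NOT_RESPONDING": 0,
    "STATUS_CODE_HEALTHCHECK_FAILED": 0,
    "STATUS_CODE_CONFIGURATION_ISSUE": 0,
    "STATUS_CODE_FAILED": 0,
    "STATUS_CODE_STARTING": 0,
    "STATUS_CODE_RESTARTING": 1,
    "STATUS_CODE_STOPPING": 0,
    "STATUS_CODE_UPDATING": 0,
    "STATUS_CODE_CRITICAL": 0,
    "STATUS_CODE_WARNING": 0,
    "STATUS_CODE_JWT_EXPIRED": 0,
    "STATUS_CODE_UPDATE_FAILED": 0,
    "STATUS_CODE_INVALID": 0,
}


def plugin_healthy(status: str) -> int:
    # Assumption: an unrecognized status is treated like STATUS_CODE_INVALID (0).
    return PLUGIN_HEALTHY_VALUE.get(status, 0)


def unhealthy_plugins(statuses: dict[str, str]) -> list[str]:
    """Sorted names of plugins whose status maps to a healthy value of 0."""
    return sorted(name for name, status in statuses.items() if plugin_healthy(status) == 0)


# Hypothetical plugin names and reported statuses.
reported = {
    "plugin-a": "STATUS_CODE_OK",
    "plugin-b": "STATUS_CODE_CONFIGURATION_ISSUE",
    "plugin-c": "STATUS_CODE_RESTARTING",
}
print(unhealthy_plugins(reported))  # ['plugin-b']
```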
Infrastructure/AWS metrics
Metrics for AWS entities are collected by integrating SolarWinds Observability SaaS with your AWS cloud account. See AWS cloud platform monitoring.
API Gateway
Metric | Units | Description |
---|---|---|
AWS.ApiGateway.4XXError | Count | 4XXError. The total number of client-side errors for REST APIs captured in a given period. |
AWS.ApiGateway.4xx | Count | 4xx. The total number of client-side errors for HTTP APIs captured in a given period. |
AWS.ApiGateway.5XXError | Count | 5XXError. The total number of server-side errors for REST APIs captured in a given period. |
AWS.ApiGateway.5xx | Count | 5xx. The total number of server-side errors for HTTP APIs captured in a given period. |
AWS.ApiGateway.CacheHitCount | Count | CacheHitCount. The total number of requests served from the API cache in a given period. |
AWS.ApiGateway.CacheMissCount | Count | CacheMissCount. The total number of requests served from the backend in a given period, when API caching is enabled. |
AWS.ApiGateway.ClientError | Count | ClientError. The total number of requests that have a 4XX response returned by API Gateway before the integration is invoked. |
AWS.ApiGateway.ConnectCount | Count | ConnectCount. The total number of messages sent to the connect route integration. |
AWS.ApiGateway.Count | Count | Count. The total number of API requests in a given period. |
AWS.ApiGateway.DataProcessed | bytes | DataProcessed. The total amount of data processed in bytes. |
AWS.ApiGateway.ExecutionError | Count | ExecutionError. The total number of errors that occurred when calling the integration. |
AWS.ApiGateway.HttpRateOf5xxError | Count | The number of HTTP 5xx errors (server-side errors) that occur in a given period for HTTP APIs. |
AWS.ApiGateway.IntegrationError | Count | IntegrationError. The total number of requests that return a 4XX or 5XX response from the integration. |
AWS.ApiGateway.IntegrationLatency | milliseconds (ms) | IntegrationLatency. The average time between when API Gateway relays a request to the backend and when it receives a response from the backend. |
AWS.ApiGateway.Latency | milliseconds (ms) | Latency. The average time between when API Gateway receives a request from a client and when it returns a response to the client. |
AWS.ApiGateway.MessageCount | Count | MessageCount. The total number of messages sent to the WebSocket API, either from or to the client. |
AWS.ApiGateway.RestRateOf5xxError | Count | The number of 5xx errors for REST APIs. |
AWS.ApiGateway.WebsocketRateOfExecutionError | Count | The rate of execution errors for WebSocket APIs. |
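The error counters are most useful relative to traffic. The sketch below expresses 4xx and 5xx errors as a percentage of AWS.ApiGateway.Count for the same period; the function name and sample values are hypothetical, and this page does not state whether the SolarWinds rate metrics (such as RestRateOf5xxError) are computed this way.

```python
def error_percentages(request_count: int, errors_4xx: int, errors_5xx: int) -> dict[str, float]:
    """4xx and 5xx errors as a percentage of all requests in the same period."""
    if request_count == 0:
        return {"4xx": 0.0, "5xx": 0.0}  # assumption: no traffic means 0% errors
    return {
        "4xx": 100.0 * errors_4xx / request_count,
        "5xx": 100.0 * errors_5xx / request_count,
    }


# Hypothetical 5-minute sums of Count, 4XXError, and 5XXError for one REST API.
print(error_percentages(2000, 50, 4))  # {'4xx': 2.5, '5xx': 0.2}
```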
Application ELB
Metric | Units | Description |
---|---|---|
AWS.ApplicationELB.ActiveConnectionCount | Count | ActiveConnectionCount. The total number of concurrent TCP connections active from clients to the load balancer and from the load balancer to targets. |
AWS.ApplicationELB.AnomalousHostCount | Count | The number of hosts detected with anomalies. |
AWS.ApplicationELB.ClientTLSNegotiationErrorCount | Count | The number of TLS connections initiated by the client that did not establish a session with the load balancer due to a TLS error. |
AWS.ApplicationELB.ConsumedLCUs | Count | ConsumedLCUs. The total number of load balancer capacity units (LCUs) used by the load balancer. |
AWS.ApplicationELB.DesyncMitigationMode_NonCompliant_Request_Count | Count | The number of requests that do not comply with RFC 7230 and are classified as non-compliant. |
AWS.ApplicationELB.DroppedInvalidHeaderRequestCount | Count | The number of requests that were dropped because they contained invalid headers. |
AWS.ApplicationELB.ELBAuthError | Count | The number of authentication errors encountered by the load balancer. |
AWS.ApplicationELB.ELBAuthFailure | Count | The number of authentication failures. |
AWS.ApplicationELB.ELBAuthLatency | milliseconds (ms) | The time taken by the load balancer to authenticate requests, from when the request is received to when the authentication process is completed. |
AWS.ApplicationELB.ELBAuthRefreshTokenSuccess | Count | The number of successful token refresh operations performed by the load balancer. |
AWS.ApplicationELB.ELBAuthSuccess | Count | The number of successful authentication attempts by the load balancer. |
AWS.ApplicationELB.ELBAuthUserClaimsSizeExceeded | Count | The number of authentication requests that were rejected because the size of the user claims exceeded the allowed limit. |
AWS.ApplicationELB.ForwardedInvalidHeaderRequestCount | Count | The number of requests with invalid headers that were forwarded to the backend servers. The load balancer forwards these requests even though they contain invalid headers. |
AWS.ApplicationELB.GrpcRequestCount | Count | The number of gRPC requests processed by the load balancer, including both IPv4 and IPv6 requests. |
AWS.ApplicationELB.HealthyHostCount | Count | HealthyHostCount. The average number of targets that are considered healthy. |
AWS.ApplicationELB.HealthyHostRate | Percent (%) | The percentage of registered targets in an Application Load Balancer (ALB) that are healthy. |
AWS.ApplicationELB.HealthyStateDNS | Count | Indicates the health status of the DNS endpoints for the ALB: whether the DNS endpoints are healthy and able to route traffic correctly. |
AWS.ApplicationELB.HealthyStateRouting | Count | Reflects the health status of the routing components of the ALB: whether the load balancer is successfully routing traffic to healthy targets. |
AWS.ApplicationELB.HTTP_Fixed_Response_Count | Count | The number of fixed-response actions that were successful, that is, responses returned directly by the ALB from a fixed-response listener rule. |
AWS.ApplicationELB.HTTP_Redirect_Count | Count | The number of redirect actions that were successful. Redirect actions return HTTP responses with a status code of 301 (Moved Permanently) or 302 (Found). |
AWS.ApplicationELB.HTTP_Redirect_Url_Limit_Exceeded_Count | Count | The number of redirect actions that could not be completed because the URL in the response Location header is larger than 8 KB. |
AWS.ApplicationELB.HTTPCode_ELB_3XX_Count | Count | The number of HTTP 3XX (redirection) responses that originate from the load balancer. |
AWS.ApplicationELB.HTTPCode_ELB_4XX_Count | Count | HTTPCode_ELB_4XX_Count. The total number of HTTP 4XX client error codes that originate from the load balancer. |
AWS.ApplicationELB.HTTPCode_ELB_5XX_Count | Count | HTTPCode_ELB_5XX_Count. The total number of HTTP 5XX server error codes that originate from the load balancer. |
AWS.ApplicationELB.HTTPCode_ELB_500_Count | Count | The number of HTTP 500 (Internal Server Error) responses returned by the Application Load Balancer (ALB). |
AWS.ApplicationELB.HTTPCode_ELB_502_Count | Count | The number of HTTP 502 (Bad Gateway) responses returned by the ALB. A 502 indicates that the ALB received an invalid response from a target while acting as a gateway or proxy. |
AWS.ApplicationELB.HTTPCode_ELB_503_Count | Count | The number of HTTP 503 (Service Unavailable) responses returned by the ALB. A 503 indicates that the ALB is temporarily unable to handle the request, usually because of temporary overload or maintenance. |
AWS.ApplicationELB.HTTPCode_ELB_504_Count | Count | The number of HTTP 504 (Gateway Timeout) responses returned by the ALB. A 504 indicates that the ALB did not receive a timely response from a target while acting as a gateway or proxy. |
AWS.ApplicationELB.HTTPCode_Target_2XX_Count | Count | The number of HTTP 2XX (Success) responses returned by the targets to the ALB, indicating that the target processed the request successfully. |
AWS.ApplicationELB.HTTPCode_Target_3XX_Count | Count | The number of HTTP 3XX (Redirection) responses returned by the targets to the ALB, indicating that the client must take further action to complete the request. |
AWS.ApplicationELB.HTTPCode_Target_4XX_Count | Count | HTTPCode_Target_4XX_Count. The total number of HTTP responses with 4XX status codes generated by the targets. This does not include any response codes generated by the load balancer. |
AWS.ApplicationELB.HTTPCode_Target_5XX_Count | Count | HTTPCode_Target_5XX_Count. The total number of HTTP responses with 5XX status codes generated by the targets. This does not include any response codes generated by the load balancer. |
AWS.ApplicationELB.IPv6ProcessedBytes | bytes | The total number of bytes processed by the load balancer for IPv6 traffic. |
AWS.ApplicationELB.IPv6RequestCount | Count | The total number of IPv6 requests received by the load balancer. |
AWS.ApplicationELB.LambdaInternalError | Count | The number of requests to a Lambda function that failed because of an issue internal to the load balancer or AWS Lambda. |
AWS.ApplicationELB.LambdaTargetProcessedBytes | bytes | The total number of bytes processed by the Lambda target. |
AWS.ApplicationELB.LambdaUserError | Count | The number of requests to a Lambda function that failed because of an issue with the Lambda function itself, for example, the load balancer lacked permission to invoke the function or the function returned a malformed response. |
AWS.ApplicationELB.MitigatedHostCount | Count | The number of targets under anomaly mitigation by the load balancer. |
AWS.ApplicationELB.NewConnectionCount | Count | NewConnectionCount. The total number of new TCP connections established from clients to the load balancer and from the load balancer to targets. |
AWS.ApplicationELB.NonStickyRequestCount | Count | The number of requests that are not handled by sticky sessions. Sticky sessions ensure that a client's requests are always sent to the same target during a session. When sticky sessions are disabled, or if the load balancer cannot determine the session stickiness, the requests are considered non-sticky. |
AWS.ApplicationELB.ProcessedBytes | bytes | ProcessedBytes. The total number of bytes processed by the load balancer over IPv4 and IPv6 (HTTP header and HTTP payload). |
AWS.ApplicationELB.RejectedConnectionCount | Count | RejectedConnectionCount. The total number of connections that were rejected because the load balancer had reached its maximum number of connections. |
AWS.ApplicationELB.RequestCount | Count | RequestCount. The total number of requests processed over IPv4 and IPv6. This metric is incremented only for requests where the load balancer node was able to choose a target. |
AWS.ApplicationELB.RequestCountPerTarget | Count | RequestCountPerTarget. The total number of requests received by each target in a target group. |
AWS.ApplicationELB.RuleEvaluations | Count | The number of times the rules defined for your Application Load Balancer (ALB) are evaluated. Each rule determines how the load balancer routes requests to the targets in one or more target groups. |
AWS.ApplicationELB.TargetConnectionErrorCount | Count | TargetConnectionErrorCount. The total number of connections that were not successfully established between the load balancer and target. This metric does not apply if the target is a Lambda function. |
AWS.ApplicationELB.TargetResponseTime | seconds (s) | TargetResponseTime. The average time elapsed, in seconds, after the request leaves the load balancer until a response from the target is received. |
AWS.ApplicationELB.TargetResponseTime.p50 | seconds (s) | The 50th percentile (median) of the target response times: 50% of the responses have a lower response time, and 50% have a higher response time. |
AWS.ApplicationELB.TargetResponseTime.p90 | seconds (s) | The 90th percentile of the target response times: 90% of the responses have a lower response time, and 10% have a higher response time. |
AWS.ApplicationELB.TargetResponseTime.p95 | seconds (s) | The 95th percentile of the target response times: 95% of the responses have a lower response time, and 5% have a higher response time. |
AWS.ApplicationELB.TargetResponseTime.p99 | seconds (s) | The 99th percentile of the target response times: 99% of the responses have a lower response time, and 1% have a higher response time. |
AWS.ApplicationELB.TargetTLSNegotiationErrorCount | Count | The number of TLS negotiation errors that occur when the load balancer tries to establish a secure connection with the target. |
AWS.ApplicationELB.UnHealthyHostCount | Count | UnHealthyHostCount. The average number of targets that are considered unhealthy. |
AWS.ApplicationELB.UnhealthyRoutingRequestCount | Count | The number of requests routed to targets that the Application Load Balancer (ALB) has marked as unhealthy. It indicates how often requests are sent to targets that may not be able to handle them properly. |
AWS.ApplicationELB.UnhealthyStateDNS | Count | The health status of the DNS endpoints for the ALB when they are in an unhealthy state. It indicates issues with the DNS endpoints that could affect the routing of traffic. |
AWS.ApplicationELB.UnhealthyStateRouting | Count | The health status of the routing components of the ALB when they are in an unhealthy state. It indicates issues with the load balancer's ability to route traffic correctly to healthy targets. |
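Two ALB comparisons are often useful when troubleshooting: the share of targets that are healthy, and whether 5XX errors come from the load balancer or from the targets (HTTPCode_Target_5XX_Count excludes codes generated by the load balancer, so the two counters do not overlap). The sketch below assumes the healthy host rate is healthy targets divided by all registered targets, which this page does not confirm; the function names and values are hypothetical.

```python
from typing import Optional


def healthy_host_rate(healthy: float, unhealthy: float) -> Optional[float]:
    """Percentage of registered targets that are healthy (assumed definition)."""
    total = healthy + unhealthy
    return None if total == 0 else 100.0 * healthy / total


def share_of_5xx_from_load_balancer(elb_5xx: int, target_5xx: int) -> Optional[float]:
    """Percentage of all 5XX responses that the load balancer itself generated."""
    total = elb_5xx + target_5xx
    return None if total == 0 else 100.0 * elb_5xx / total


# Hypothetical values for one target group over one period.
print(healthy_host_rate(healthy=3, unhealthy=1))                    # 75.0
print(share_of_5xx_from_load_balancer(elb_5xx=12, target_5xx=36))  # 25.0
```

A high load balancer share points toward capacity, routing, or target registration problems; a low share points toward errors inside the application on the targets.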
Aurora Cluster
Metric | Units | Description |
---|---|---|
AWS.RDS.AuroraGlobalDBReplicationLag
|
milliseconds (ms) |
AuroraGlobalDBReplicationLag. The total amount of lag when replicating updates from the primary AWS region. |
AWS.RDS.AuroraVolumeBytesLeftTotal
|
bytes |
AuroraVolumeBytesLeftTotal. The total available space for the cluster volume. |
AWS.RDS.BacktrackChangeRecordsCreationRate
|
Count |
BacktrackChangeRecordsCreationRate. The total number of backtrack change records created over five minutes for the DB cluster. |
AWS.RDS.BacktrackChangeRecordsStored
|
Count |
BacktrackChangeRecordsCreationStored. The total number of backtrack change records used by the DB cluster. |
AWS.RDS.ServerlessDatabaseCapacity
|
Count |
ServerlessDatabaseCapacity. The total current capacity of an Aurora Serverless DB cluster. |
AWS.RDS.SnapshotStorageUsed
|
bytes |
SnapshotStorageUsed. The total amount of backup storage consumed by all Aurora snapshots for an Aurora DB cluster outside its backup retention window. |
AWS.RDS.VolumeBytesUsed
|
bytes |
VolumeBytesUsed. The total amount of storage used by the Aurora DB instance. |
AWS.RDS.VolumeReadIOPs
|
Count |
VolumeReadIOPs. The total number of billed read I/O operations from a cluster volume within a five-minute interval. |
AWS.RDS.VolumeWriteIOPs
|
Count |
VolumeWriteIOPs. The total number of write disk I/O operations to the cluster volume, reported at five-minute intervals. |
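VolumeReadIOPs and VolumeWriteIOPs are counts for a five-minute interval, not per-second rates. To compare them with per-second IOPS figures, divide by the interval length. The following is a minimal sketch that assumes the value you read is the count for a single 300-second interval. The function name is hypothetical.

```python
FIVE_MINUTES_S = 300  # reporting interval described for VolumeReadIOPs/VolumeWriteIOPs, in seconds


def average_iops(io_count: float, interval_s: int = FIVE_MINUTES_S) -> float:
    """Convert an I/O count for one reporting interval into an average per-second rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return io_count / interval_s


# Example: 90,000 billed read I/Os reported for one five-minute interval
print(average_iops(90_000))  # 300.0
```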
Aurora Instance
Metric | Units | Description |
---|---|---|
AWS.RDS.ActiveTransactions
|
Count per second |
ActiveTransactions. The total number of current transactions executing on an Aurora database instance per second. |
AWS.RDS.AuroraReplicaLag
|
milliseconds (ms) |
AuroraReplicaLag. The total amount of lag when replicating updates from the primary instance. |
AWS.RDS.CPUCreditBalance
|
Count |
CPUCreditBalance. The total number of CPU credits that an instance has accumulated, reported at five-minute intervals. You can use this metric to determine how long a DB instance can burst beyond its baseline performance level at a given rate. |
AWS.RDS.CPUCreditUsage
|
Count |
CPUCreditUsage. The total number of CPU credits consumed during the specified period, reported at five-minute intervals. This metric measures the amount of time during which physical CPUs have been used for processing instructions by virtual CPUs allocated to the DB instance. |
AWS.RDS.CPUUtilization
|
Percent (%) |
CPUUtilization. The total percentage of CPU used by an Aurora DB instance. |
AWS.RDS.ConnectionAttempts
|
Count |
ConnectionAttempts. The total number of attempts to connect to an instance, whether successful or not. |
AWS.RDS.DDLLatency
|
milliseconds (ms) |
DDLLatency. The total duration of DDL requests, such as create, alter, and drop requests. |
AWS.RDS.DDLThroughput
|
Count per second |
DDLThroughput. The total number of DDL requests per second. |
AWS.RDS.DMLLatency
|
milliseconds (ms) |
DMLLatency. The total duration of inserts, updates, and deletes. |
AWS.RDS.DMLThroughput
|
Count per second |
DMLThroughput. The total number of inserts, updates, and deletes per second. |
AWS.RDS.DatabaseConnections
|
Count |
DatabaseConnections. The total number of client network connections to the database instance. |
AWS.RDS.FreeableMemory
|
Binary Bytes |
FreeableMemory. The total amount of available random access memory. |
AWS.RDS.LoginFailures
|
Count per second |
LoginFailures. The total number of failed login attempts per second. |
AWS.RDS.MaximumUsedTransactionIDs
|
Count |
MaximumUsedTransactionIDs. The total age of the oldest unvacuumed transaction ID, in transactions. If this value reaches 2,146,483,648 (2^31 - 1,000,000), the database is forced into read-only mode to avoid transaction ID wraparound. |
AWS.RDS.ReadIOPS
|
Count per second |
ReadIOPS. The total number of disk read I/O operations per second. |
AWS.RDS.ReadLatency
|
seconds (s) |
ReadLatency. The average amount of time taken per disk read I/O operation. |
AWS.RDS.ReadThroughput
|
Bytes per second |
ReadThroughput. The total number of bytes read from disk per second. |
AWS.RDS.TransactionLogsDiskUsage
|
Megabytes (MB) |
TransactionLogsDiskUsage. The average amount of disk space consumed by transaction logs on the Aurora PostgreSQL DB instance. |
AWS.RDS.WriteIOPS
|
Count per second |
WriteIOPS. The total number of Aurora storage write records generated per second. |
AWS.RDS.WriteLatency
|
seconds (s) |
WriteLatency. The average amount of time taken per disk write I/O operation. |
AWS.RDS.WriteThroughput
|
Bytes per second |
WriteThroughput. The total number of bytes written to persistent storage every second. |
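The MaximumUsedTransactionIDs description gives the read-only threshold as 2,146,483,648 (2^31 - 1,000,000). You can turn the reported value into the share of transaction IDs still available before that threshold, which may be easier to alert on. This is a minimal sketch. The function name is hypothetical, and the threshold is taken directly from the description above.

```python
# Threshold quoted in the MaximumUsedTransactionIDs description: 2^31 - 1,000,000
READ_ONLY_THRESHOLD = 2**31 - 1_000_000  # 2,146,483,648


def wraparound_headroom_pct(max_used_xids: int) -> float:
    """Percentage of transaction IDs still available before the read-only threshold."""
    remaining = max(READ_ONLY_THRESHOLD - max_used_xids, 0)  # clamp at zero past the threshold
    return 100.0 * remaining / READ_ONLY_THRESHOLD


print(READ_ONLY_THRESHOLD)  # 2146483648
print(round(wraparound_headroom_pct(1_073_241_824), 1))  # 50.0
```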
Auto Scaling Group
Metric | Units | Description |
---|---|---|
AWS.AutoScaling.GroupAndWarmPoolDesiredCapacity
|
Count |
The total number of instances that the Auto Scaling group and warm pool are attempting to maintain. It includes both the desired capacity of the Auto Scaling group and the instances in the warm pool. |
AWS.AutoScaling.GroupAndWarmPoolTotalCapacity
|
Count |
The total number of instances in the Auto Scaling group and warm pool, including instances that are in service, pending, or terminating. |
AWS.AutoScaling.GroupDesiredCapacity
|
Count |
GroupDesiredCapacity. The average number of instances that the Auto Scaling group attempts to maintain. |
AWS.AutoScaling.GroupInServiceCapacity
|
Count |
The total number of instances that are currently in service in the Auto Scaling group. These instances are actively handling requests and are considered part of the desired capacity. |
AWS.AutoScaling.GroupInServiceInstances
|
Count |
GroupInServiceInstances. The average number of instances that are running as part of the Auto Scaling group. |
AWS.AutoScaling.GroupInServiceInstancesPercent
|
Percent (%) |
The percentage of instances in the Auto Scaling group that are currently in service. It is calculated as the number of in-service instances divided by the desired capacity, multiplied by 100. |
AWS.AutoScaling.GroupMaxSize
|
Count |
GroupMaxSize. The average maximum size of the Auto Scaling group. |
AWS.AutoScaling.GroupMinSize
|
Count |
GroupMinSize. The average minimum size of the Auto Scaling group. |
AWS.AutoScaling.GroupPendingCapacity
|
Count |
The number of instances that are in the process of launching but are not yet in service. These instances are pending and have not yet started handling requests. |
AWS.AutoScaling.GroupPendingInstances
|
Count |
GroupPendingInstances. The average number of instances that are pending. |
AWS.AutoScaling.GroupStandbyCapacity
|
Count |
The number of instances in a standby state within an Auto Scaling group. Standby instances are running but not actively serving traffic. |
AWS.AutoScaling.GroupStandbyInstances
|
Count |
GroupStandbyInstances. The average number of instances that are in standby state. |
AWS.AutoScaling.GroupTerminatingCapacity
|
Count |
The number of instances that are in the process of terminating and being removed from the Auto Scaling group. These instances are no longer handling requests. |
AWS.AutoScaling.GroupTerminatingInstances
|
Count |
GroupTerminatingInstances. The average number of instances that are in the process of terminating. |
AWS.AutoScaling.GroupTotalCapacity
|
Count |
The total number of instances in the Auto Scaling group, including instances that are in service, pending, and terminating. |
AWS.AutoScaling.GroupTotalInstances
|
Count |
GroupTotalInstances. The average number of total instances. |
AWS.AutoScaling.PredictiveScalingCapacityForecast
|
Count | A forecast of the capacity needed for predictive scaling. It analyzes historical load data to predict future capacity requirements, helping to proactively scale your resources. |
AWS.AutoScaling.PredictiveScalingLoadForecast
|
Count | Predictions of hourly load values based on historical load data from CloudWatch and an analysis of historical trends. It helps in forecasting future capacity needs to proactively scale the Auto Scaling group. |
AWS.AutoScaling.PredictiveScalingMetricPairCorrelation
|
Count | The correlation between a load metric and a scaling metric used in predictive scaling policies. A strong correlation ensures that the predictive scaling policy can accurately forecast and adjust capacity. |
AWS.AutoScaling.WarmPoolDesiredCapacity
|
Count |
The desired number of instances in the warm pool. The warm pool is a group of pre-initialized instances that can be quickly started to handle sudden increases in load. |
AWS.AutoScaling.WarmPoolMinSize
|
Count |
The minimum number of instances that should be maintained in the warm pool. The warm pool is a group of pre-initialized EC2 instances that can quickly respond to scale-out events. |
AWS.AutoScaling.WarmPoolPendingCapacity
|
Count |
The number of instances that are currently being initialized or are in the process of becoming available in the warm pool. |
AWS.AutoScaling.WarmPoolTerminatingCapacity
|
Count |
The number of instances in the warm pool that are currently being terminated. |
AWS.AutoScaling.WarmPoolTotalCapacity
|
Count |
The total number of instances in the warm pool, including both pending and active instances. |
AWS.AutoScaling.WarmPoolWarmedCapacity
|
Count |
The number of instances in the warm pool that are fully initialized and ready to serve traffic. |
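The GroupInServiceInstancesPercent description defines the value as the number of in-service instances divided by the desired capacity, multiplied by 100. The following sketch reproduces that calculation, which can help when you sanity-check the metric against GroupInServiceInstances and GroupDesiredCapacity. The function name is hypothetical. Returning 0% when the desired capacity is zero is an assumption made to avoid dividing by zero, and is not documented behavior.

```python
def in_service_percent(in_service: int, desired: int) -> float:
    """In-service instances divided by desired capacity, multiplied by 100."""
    if desired == 0:
        # Assumption: report 0% for a group scaled to zero instead of dividing by zero
        return 0.0
    return in_service / desired * 100


print(in_service_percent(3, 4))  # 75.0
```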
Certificate Manager
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of AWS Certificate Manager entities in the Metrics Explorer, filter the |
AWS.CertificateManager.DaysToExpiry
|
Count | The number of days until a certificate expires. ACM stops publishing the metrics after a certificate expires. |
AWS.CertificateManager.CertificateArn
|
The Amazon Resource Name (ARN) of the certificate. |
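ACM stops publishing DaysToExpiry after a certificate expires. A missing value is therefore worth investigating rather than ignoring. The following sketch classifies the latest DaysToExpiry value for one certificate. The warning and critical thresholds and the function name are hypothetical examples, so adjust them to match your renewal process.

```python
from typing import Optional

# Hypothetical alert thresholds, in days
WARNING_DAYS = 30
CRITICAL_DAYS = 7


def expiry_status(days_to_expiry: Optional[float]) -> str:
    """Classify the latest DaysToExpiry value for one certificate."""
    if days_to_expiry is None:
        # ACM stops publishing the metric after expiry, so missing data needs a closer look
        return "no data"
    if days_to_expiry <= CRITICAL_DAYS:
        return "critical"
    if days_to_expiry <= WARNING_DAYS:
        return "warning"
    return "ok"


for value in (90, 20, 3, None):
    print(value, expiry_status(value))
```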
CloudFront
Metric | Units | Description |
---|---|---|
AWS.CloudFront.4xxErrorRate
|
Percent (%) |
4xx error rate. The average percentage of all viewer requests for which the response's HTTP status code is 4xx. |
AWS.CloudFront.5xxErrorRate
|
Percent (%) |
5xx error rate. The average percentage of all viewer requests for which the response's HTTP status code is 5xx. |
AWS.CloudFront.BytesDownloaded
|
bytes |
Bytes downloaded. The average number of bytes downloaded by viewers for GET, HEAD, and OPTIONS requests. |
AWS.CloudFront.BytesUploaded
|
bytes |
Bytes uploaded. The average number of bytes that viewers uploaded to your origin with CloudFront using POST and PUT requests. |
AWS.CloudFront.CacheHitRate
|
Percent (%) | The percentage of viewer requests that are served directly from the CloudFront cache without needing to fetch the content from the origin server. A higher cache hit rate indicates better performance and reduced latency. |
AWS.CloudFront.OriginalLatency
|
milliseconds (ms) | The time taken by the origin server to respond with the first byte of the requested content. It helps in understanding the performance of the origin server and the overall latency experienced by end users. |
AWS.CloudFront.Requests
|
Count |
Requests. The total number of viewer requests received by CloudFront for all HTTP methods and for both HTTP and HTTPS requests. |
AWS.CloudFront.TotalErrorRate
|
Percent (%) |
Total error rate. The average percentage of all viewer requests for which the response's HTTP status code is 4xx or 5xx. |
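CloudFront publishes the error-rate metrics directly. Because all three rates share the same denominator (all viewer requests), the total error rate equals the 4xx rate plus the 5xx rate. The following sketch illustrates that relationship, assuming you have request and status-code counts for the same period. The inputs and function name are hypothetical.

```python
def error_rates(requests: int, count_4xx: int, count_5xx: int) -> dict:
    """Return 4xx, 5xx, and total error rates as percentages of all viewer requests."""
    if requests == 0:
        return {"4xx": 0.0, "5xx": 0.0, "total": 0.0}
    rate_4xx = 100.0 * count_4xx / requests
    rate_5xx = 100.0 * count_5xx / requests
    return {"4xx": rate_4xx, "5xx": rate_5xx, "total": rate_4xx + rate_5xx}


print(error_rates(2_000, 30, 10))  # {'4xx': 1.5, '5xx': 0.5, 'total': 2.0}
```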
Direct Connect
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of AWS Direct Connect entities in the Metrics Explorer, filter the |
aws.dx.ConnectionBpsEgress
|
bits per second | The bit rate for outbound data from the AWS side of the connection. |
aws.dx.ConnectionBpsIngress
|
bits per second | The bit rate for inbound data to the AWS side of the connection. |
aws.dx.ConnectionEncryptionState
|
Count | The encryption state of an AWS Direct Connect connection. It shows whether the data traversing the connection is encrypted or not. |
aws.dx.ConnectionCRCErrorCount
|
Count | The number of times cyclic redundancy check (CRC) errors are observed for the data received at the connection. |
aws.dx.ConnectionErrorCount
|
Count | The number of errors that occur on an AWS Direct Connect connection. This metric helps you monitor the health and stability of your Direct Connect connection by providing insights into the frequency and types of errors encountered. |
aws.dx.ConnectionLightLevelRx
|
Count | Indicates the health of the fiber connection for ingress (inbound) traffic to the AWS side of the connection. |
aws.dx.ConnectionLightLevelTx
|
Count | Indicates the health of the fiber connection for egress (outbound) traffic from the AWS side of the connection. |
aws.dx.ConnectionPpsEgress
|
Count per second | The packet rate for outbound data from the AWS side of the connection. |
aws.dx.ConnectionPpsIngress
|
Count per second | The packet rate for inbound data to the AWS side of the connection. |
aws.dx.ConnectionState
|
Boolean | The state of the connection: 0 indicates DOWN and 1 indicates UP. |
aws.dx.VirtualInterfaceBpsEgress
|
bps | The bitrate for outbound data from the AWS side of the virtual interface. It represents the amount of data leaving AWS in bits per second (bps). |
aws.dx.VirtualInterfaceBpsIngress
|
bps | The bitrate for inbound data to the AWS side of the virtual interface. It represents the amount of data coming into AWS in bits per second (bps). |
aws.dx.VirtualInterfacePpsEgress
|
Count per second | The packet rate for outbound data from the AWS side of the virtual interface. It represents the number of packets leaving AWS per second. |
aws.dx.VirtualInterfacePpsIngress
|
Count per second | The packet rate for inbound data to the AWS side of the virtual interface. It represents the number of packets coming into AWS per second. |
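ConnectionState reports 0 for DOWN and 1 for UP. The mean of evenly spaced samples therefore gives the fraction of time the connection was up. The following sketch assumes one-minute samples with no gaps. The sample data and function name are hypothetical.

```python
from typing import Sequence


def availability_pct(states: Sequence[int]) -> float:
    """Share of ConnectionState samples that report UP (1), as a percentage."""
    if not states:
        raise ValueError("no samples")
    if any(s not in (0, 1) for s in states):
        raise ValueError("ConnectionState samples must be 0 (DOWN) or 1 (UP)")
    return 100.0 * sum(states) / len(states)


# One hour of one-minute samples containing a five-minute outage
samples = [1] * 55 + [0] * 5
print(round(availability_pct(samples), 2))  # 91.67
```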
DynamoDB
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of DynamoDB entities in the Metrics Explorer, filter the |
AWS.DynamoDB.AccountMaxReads
|
Count | The maximum number of read capacity units that can be provisioned across all tables in your AWS account. |
AWS.DynamoDB.AccountMaxTableLevelReads
|
Count | The maximum number of read capacity units that can be provisioned for a single table or global secondary index in your AWS account. |
AWS.DynamoDB.AccountMaxTableLevelWrites
|
Count | The maximum number of write capacity units that can be provisioned for a single table or global secondary index in your AWS account. |
AWS.DynamoDB.AccountMaxWrites
|
Count | The maximum number of write capacity units that can be provisioned across all tables in your AWS account. |
AWS.DynamoDB.AccountProvisionedReadCapacityUtilization
|
Percent (%) | The percentage of provisioned read capacity units that are being used across all tables in your AWS account. |
AWS.DynamoDB.AgeOfOldestUnreplicatedRecord
|
milliseconds (ms) | The age of the oldest record in a DynamoDB table that has not yet been replicated. |
|
Count |
The number of failed attempts to perform conditional writes. The PutItem, UpdateItem, and DeleteItem operations let you provide a logical condition that must evaluate to true before the operation can proceed. If this condition evaluates to false, ConditionalCheckFailedRequests is incremented by one. |
AWS.DynamoDB.ConsumedChangeDataCaptureUnits
|
Count | The number of consumed units for change data capture operations. |
|
Count |
The number of read capacity units consumed over the specified time period, so you can track how much of your provisioned throughput is used. You can retrieve the total consumed read capacity for a table and all of its global secondary indexes, or for a particular global secondary index. |
|
Count |
The number of write capacity units consumed over the specified time period, so you can track how much of your provisioned throughput is used. You can retrieve the total consumed write capacity for a table and all of its global secondary indexes, or for a particular global secondary index. |
AWS.DynamoDB.FailedToReplicateRecordCount
|
Count | The number of records that failed to replicate. |
|
Count |
The number of write capacity units consumed when adding a new global secondary index to a table. If the write capacity of the index is too low, incoming write activity during the backfill phase might be throttled; this can increase the time it takes to create the index. You should monitor this statistic while the index is being built to determine whether the write capacity of the index is underprovisioned. |
|
Count |
The percentage of completion when a new global secondary index is being added to a table. DynamoDB must first allocate resources for the new index, and then backfill attributes from the table into the index. For large tables, this process might take a long time. You should monitor this statistic to view the relative progress as DynamoDB builds the index. |
|
Count |
The number of write throttle events that occur when adding a new global secondary index to a table. These events indicate that the index creation will take longer to complete, because incoming write activity is exceeding the provisioned write throughput of the index. |
AWS.DynamoDB.PendingReplicationCount
|
Count | The number of item updates that have been written to one replica but have not yet been written to another replica. |
|
Count |
The number of provisioned read capacity units for a table or a global secondary index. |
|
Count |
The number of provisioned write capacity units for a table or a global secondary index. |
AWS.DynamoDB.AccountProvisionedWriteCapacityUtilization
|
Percent (%) | The percentage of provisioned write capacity units that are being used across all tables in your AWS account. |
AWS.DynamoDB.ReadThrottleEvents
|
Count |
Requests to DynamoDB that exceed the provisioned read capacity units for a table or a global secondary index. |
AWS.DynamoDB.ReplicationLatency
|
milliseconds (ms) | The time between when an updated item appears in the DynamoDB stream for one replica and when it appears in another replica. |
|
Binary Bytes |
The number of bytes returned by GetRecords operations (Amazon DynamoDB Streams) during the specified time period. |
|
Count |
The number of items returned by Query or Scan operations during the specified time period. |
|
Count |
The number of stream records returned by GetRecords operations (Amazon DynamoDB Streams) during the specified time period. |
|
milliseconds (ms) |
The latency of successful requests to DynamoDB or Amazon DynamoDB Streams during the specified time period. |
|
Count |
Requests to DynamoDB or Amazon DynamoDB Streams that generate an HTTP 500 status code during the specified time period. |
|
Count |
The number of items deleted by Time To Live (TTL) during the specified time period. This metric helps you monitor the rate of TTL deletions on your table. |
AWS.DynamoDB.ThrottledPutRecordCount
|
Count | The number of put records that were throttled. |
|
Count |
Requests to DynamoDB that exceed the provisioned throughput limits on a resource (such as a table or an index). |
AWS.DynamoDB.TransactionConflict
|
Count | The number of transaction conflicts that occurred. |
|
Count |
Requests to DynamoDB or Amazon DynamoDB Streams that generate an HTTP 400 status code during the specified time period. |
|
Count |
Requests to DynamoDB that exceed the provisioned write capacity units for a table or a global secondary index. |
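The consumed-capacity rows above report the units consumed over a time period, while the provisioned-capacity rows report capacity units. To express consumption as a percentage of provisioned capacity, first divide the consumed total by the number of seconds in the period. The following is a minimal sketch that assumes the consumed value is the Sum statistic over a known period and that provisioned capacity is a per-second figure. The function name is hypothetical.

```python
def capacity_utilization_pct(consumed_sum: float, period_s: int, provisioned: float) -> float:
    """Average consumed capacity per second as a percentage of provisioned capacity.

    consumed_sum: sum of consumed capacity units over the period (assumed Sum statistic)
    period_s:     period length in seconds
    provisioned:  provisioned capacity units (per second)
    """
    if period_s <= 0 or provisioned <= 0:
        raise ValueError("period_s and provisioned must be positive")
    average_per_second = consumed_sum / period_s
    return 100.0 * average_per_second / provisioned


# 12,000 read capacity units consumed over 60 s against 400 provisioned units
print(capacity_utilization_pct(12_000, 60, 400))  # 50.0
```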
EBS
Metric | Units | Description |
---|---|---|
AWS.EBS.AverageReadLatency
|
milliseconds (ms) |
AverageReadLatency. The average time required to complete a read request during the specified time period. |
AWS.EBS.AverageWriteLatency
|
milliseconds (ms) |
AverageWriteLatency. The average time required to complete a write request during the specified time period. |
AWS.EBS.BurstBalance
|
Percent (%) |
Used with General Purpose SSD (gp2), Throughput Optimized HDD (st1) and Cold HDD (sc1) volumes only. Provides information about the percentage of I/O credits (for gp2) or throughput credits (for st1 and sc1) remaining in the burst bucket. |
AWS.EBS.FastSnapshotRestoreCreditsBalance
|
Count | The number of credits available for fast snapshot restore operations. These credits are used to accelerate the snapshot restoration process, and having a balance of credits ensures that you can perform fast snapshot restores when needed. |
AWS.EBS.FastSnapshotRestoreCreditsBucketSize
|
Count | The maximum number of credits that can be stored in the credit bucket for fast snapshot restore operations. It helps you understand the total capacity of credits you can accumulate for performing these accelerated restores. |
AWS.EBS.VolumeConsumedReadWriteOps
|
Count |
VolumeConsumedReadWriteOps. The total number of read and write operations (normalized to 256K capacity units) consumed during the specified time period. |
AWS.EBS.VolumeIdleTime
|
seconds (s) |
The total number of seconds in a specified period of time when no read or write operations were submitted. |
AWS.EBS.VolumeQueueLength
|
Count |
VolumeQueueLength. The number of read and write operation requests waiting to be completed during the specified time period. |
AWS.EBS.VolumeReadBytes
|
Binary Bytes |
VolumeReadBytes. The total number of bytes transferred by read operations during the specified time period. |
AWS.EBS.VolumeReadOps
|
Count |
VolumeReadOps. The total number of read operations during the specified time period. Read operations are counted on completion. |
AWS.EBS.VolumeStalledIOCheck
|
Count | The number of stalled I/O operations on an EBS volume. It helps identify potential performance issues or bottlenecks related to I/O operations. |
AWS.EBS.VolumeThroughputPercentage
|
Percent (%) |
VolumeThroughputPercentage. The percentage of provisioned I/O operations per second (IOPS) that an Amazon EBS volume actually delivers. |
AWS.EBS.VolumeTotalOps
|
Count |
The total number of I/O operations performed on an EBS volume. It includes both read and write operations and provides an overall view of the volume's activity. |
AWS.EBS.VolumeTotalReadTime
|
seconds (s) |
The total number of seconds spent by all read operations that completed in a specified period of time. |
AWS.EBS.VolumeTotalWriteTime
|
seconds (s) |
The total number of seconds spent by all write operations that completed in a specified period of time. |
AWS.EBS.VolumeWriteBytes
|
Binary Bytes |
VolumeWriteBytes. The total number of bytes transferred by write operations during the specified time period. |
AWS.EBS.VolumeWriteOps
|
Count |
VolumeWriteOps. The total number of write operations during the specified time period. Write operations are counted on completion. |
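VolumeTotalReadTime and VolumeTotalWriteTime are summed over all completed operations. Dividing them by VolumeReadOps or VolumeWriteOps for the same period gives an average latency per operation, and dividing the operation count by the period length gives an average IOPS figure. The following sketch shows both calculations. The function names are hypothetical, and returning 0 when no operations completed is an assumption.

```python
def average_latency_ms(total_time_s: float, completed_ops: int) -> float:
    """Average latency per operation, in milliseconds, from VolumeTotal*Time and Volume*Ops."""
    if completed_ops == 0:
        return 0.0  # assumption: treat a period with no completed operations as zero latency
    return 1000.0 * total_time_s / completed_ops


def ops_per_second(completed_ops: int, period_s: int) -> float:
    """Average operations per second over the period."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    return completed_ops / period_s


# 1.5 s of total read time across 3,000 completed reads in a 60 s period
print(average_latency_ms(1.5, 3_000))  # 0.5
print(ops_per_second(3_000, 60))  # 50.0
```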
EC2
Metric | Units | Description |
---|---|---|
AWS.EC2.CPUCreditBalance
|
Count |
For T2 instances. The number of CPU credits available for the instance to burst beyond its baseline CPU utilization. Credits are stored in the credit balance after they are earned and removed from the credit balance after they expire. Credits expire 24 hours after they are earned. |
AWS.EC2.CPUCreditUsage
|
Count |
For T2 instances. The number of CPU credits consumed by the instance. One CPU credit equals one vCPU running at 100% utilization for one minute or an equivalent combination of vCPUs, utilization, and time (for example, one vCPU running at 50% utilization for two minutes or two vCPUs running at 25% utilization for two minutes). |
AWS.EC2.CPUSurplusCreditBalance
|
Count |
The number of surplus CPU credits spent by an instance in unlimited mode when its CPUCreditBalance value is zero. |
AWS.EC2.CPUSurplusCreditsCharged
|
Count |
The number of spent surplus CPU credits that are not paid down by earned CPU credits and therefore incur an additional charge. |
AWS.EC2.CPUUtilization
|
Percent (%) |
The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application on a selected instance. |
AWS.EC2.DedicatedHostCPUUtilization
|
Percent (%) |
The percentage of CPU utilization on a dedicated host. It helps in monitoring the overall CPU usage of instances running on a dedicated host. |
AWS.EC2.DiskIOps
|
|
The number of read and write operations per second (IOPS) on the instance store volumes of an EC2 instance. |
AWS.EC2.DiskReadBytes
|
bytes |
Bytes read from all instance store volumes available to the instance. This metric is used to determine the volume of the data the application reads from the hard disk of the instance. This can be used to determine the speed of the application. |
AWS.EC2.DiskReadOps
|
Count |
Completed read operations from all instance store volumes available to the instance in a specified period of time. |
AWS.EC2.DiskWriteBytes
|
bytes |
Bytes written to all instance store volumes available to the instance. This metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application. |
AWS.EC2.DiskWriteOps
|
Count |
Completed write operations to all instance store volumes available to the instance in a specified period of time. |
AWS.EC2.EBSByteBalance
|
Percent (%) |
The percentage of throughput credits remaining in the burst bucket for your EBS volumes. |
AWS.EC2.EBSIOBalance
|
Percent (%) |
The percentage of I/O credits remaining in the burst bucket for your EBS volumes. |
AWS.EC2.EBSReadBytes
|
bytes |
The total number of bytes read from all EBS volumes attached to the instance in a specified period of time. |
AWS.EC2.EBSReadOps
|
Count |
Completed read operations from all EBS volumes attached to the instance in a specified period of time. |
AWS.EC2.EBSWriteBytes
|
bytes |
The total number of bytes written to all Amazon Elastic Block Store (EBS) volumes attached to the instance in a specified period of time. |
AWS.EC2.EBSWriteOps
|
Count |
Completed write operations to all EBS volumes attached to the instance in a specified period of time. |
AWS.EC2.MetadataNoToken
|
Count |
The number of requests to the Instance Metadata Service (IMDS) that did not include a token. |
AWS.EC2.MetadataNoTokenRejected
|
Count | The number of requests to the Instance Metadata Service (IMDS) that were rejected because they did not include a token. |
AWS.EC2.NetworkIO
|
bps |
The total network input/output (I/O) operations per second for an EC2 instance. |
AWS.EC2.NetworkIn
|
bytes |
The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance. |
AWS.EC2.NetworkOut
|
bytes |
The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance. |
AWS.EC2.NetworkPacketsIn
|
Count |
The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only. |
AWS.EC2.NetworkPacketsOut
|
Count |
The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance. This metric is available for basic monitoring only. |
AWS.EC2.StatusCheckFailed
|
Count |
Reports whether the instance has passed both the instance status check and the system status check in the last minute. This metric can be either 0 (passed) or 1 (failed). |
AWS.EC2.StatusCheckFailed_AttachedEBS
|
Count | Indicates whether there is a failure in the status check related to attached EBS volumes. |
AWS.EC2.StatusCheckFailed_Instance
|
Boolean |
Reports whether the instance has passed the instance status check in the last minute. This metric can be either 0 (passed) or 1 (failed). |
AWS.EC2.StatusCheckFailed_System
|
boolean |
Indicates whether there is a failure in the system status check, detecting underlying problems with the AWS systems on which your instance runs, such as hardware or network issues. |
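The CPUCreditUsage description defines one CPU credit as one vCPU running at 100% utilization for one minute. The number of credits consumed is therefore vCPUs × (utilization / 100) × minutes. The following sketch encodes that arithmetic and reproduces the equivalences given in the description. The function name is hypothetical.

```python
def cpu_credits(vcpus: int, utilization_pct: float, minutes: float) -> float:
    """CPU credits consumed: one credit = one vCPU at 100% utilization for one minute."""
    return vcpus * (utilization_pct / 100.0) * minutes


print(cpu_credits(1, 100, 1))  # 1.0
print(cpu_credits(1, 50, 2))  # 1.0
print(cpu_credits(2, 25, 2))  # 1.0
```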
ECS Cluster
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of AWS ECS Cluster entities in the Metrics Explorer, filter the |
AWS.ECS.CPUUtilization
|
Percent (%) | The percentage of CPU units that are used by the cluster. |
AWS.ECS.MemoryUtilization
|
Percent (%) | The percentage of memory in use by the cluster. |
AWS.ECS.GPUReservation
|
Percent (%) | The percentage of total available GPUs that are reserved by running tasks in the cluster. |
AWS.ECS.EBSFilesystemUtilization
|
Percent (%) | The percentage of the Amazon EBS filesystem that is used by tasks in a service. |
AWS.ECS.ActiveConnectionCount
|
Count | The total number of concurrent connections active from clients to the Amazon ECS Service Connect proxies that run in tasks. |
AWS.ECS.NewConnectionCount
|
Count | The total number of new connections established from clients to the Amazon ECS Service Connect proxies that run in tasks. |
AWS.ECS.ProcessedBytes
|
bytes | The total number of bytes of inbound traffic processed by the Service Connect proxies. |
AWS.ECS.RequestCount
|
Count | The number of inbound traffic requests processed by the Service Connect proxies. |
AWS.ECS.GrpcRequestCount
|
Count | The number of gRPC inbound traffic requests processed by the Service Connect proxies. |
AWS.ECS.HTTPCode_Target_2XX_Count
|
Count | The number of HTTP responses with status codes 200 to 299 generated by the applications in the tasks. |
AWS.ECS.HTTPCode_Target_3XX_Count
|
Count | The number of HTTP responses with status codes 300 to 399 generated by the applications in the tasks. |
AWS.ECS.HTTPCode_Target_4XX_Count
|
Count | The number of HTTP responses with status codes 400 to 499 generated by the applications in the tasks. |
AWS.ECS.HTTPCode_Target_5XX_Count
|
Count | The number of HTTP responses with status codes 500 to 599 generated by the applications in the tasks. |
AWS.ECS.RequestCountPerTarget
|
Count | The average number of requests received by each target. |
AWS.ECS.TargetProcessedBytes
|
bytes | The total number of bytes processed by the Service Connect proxies. |
AWS.ECS.TargetResponseTime
|
milliseconds (ms) | The time elapsed, in milliseconds, after the request reached the Service Connect proxy in the target task until a response from the target application is received back to the proxy. |
AWS.ECS.ClientTLSNegotiationErrorCount
|
Count | The total number of times the TLS connection failed. |
AWS.ECS.TargetTLSNegotiationErrorCount
|
Count | The total number of times the TLS connection failed due to missing client certificates. |
AWS.ECS.CPUReservation
|
Percent (%) | The percentage of CPU units that are reserved in the cluster. |
AWS.ECS.MemoryReservation
|
Percent (%) | The percentage of memory that is reserved by running tasks in the cluster. |
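The HTTPCode_Target_*_Count metrics can be combined into a server-error ratio for the applications behind Service Connect. The following sketch assumes that the sum of the four status-code counts for the same period is an acceptable denominator. RequestCount may differ from that sum, for example for requests that received no HTTP response. The function name is hypothetical.

```python
def server_error_ratio_pct(c2xx: int, c3xx: int, c4xx: int, c5xx: int) -> float:
    """Percentage of responses from the tasks' applications that were 5xx."""
    total = c2xx + c3xx + c4xx + c5xx  # assumed denominator: all counted responses
    if total == 0:
        return 0.0
    return 100.0 * c5xx / total


print(server_error_ratio_pct(900, 50, 40, 10))  # 1.0
```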
ECS Service
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of AWS ECS Service entities in the Metrics Explorer, filter the |
AWS.ECS.CPUUtilization
|
Percent (%) | The percentage of CPU units that is used by the tasks in the service. |
AWS.ECS.MemoryUtilization
|
Percent (%) | The percentage of memory in use by the tasks in the service. |
AWS.ECS.EBSFilesystemUtilization
|
Percent (%) | The percentage of the Amazon EBS filesystem that is used by tasks in a service. |
AWS.ECS.ActiveConnectionCount
|
Count | The total number of concurrent connections active from clients to the Amazon ECS Service Connect proxies that run in tasks. |
AWS.ECS.NewConnectionCount
|
Count | The total number of new connections established from clients to the Amazon ECS Service Connect proxies that run in tasks. |
AWS.ECS.ProcessedBytes
|
bytes | The total number of bytes of inbound traffic processed by the Service Connect proxies. |
AWS.ECS.RequestCount
|
Count | The number of inbound traffic requests processed by the Service Connect proxies. |
AWS.ECS.GrpcRequestCount
|
Count | The number of gRPC inbound traffic requests processed by the Service Connect proxies. |
AWS.ECS.HTTPCode_Target_2XX_Count
|
Count | The number of HTTP response codes with numbers 200 to 299 generated by the applications in the tasks. |
AWS.ECS.HTTPCode_Target_3XX_Count
|
Count | The number of HTTP response codes with numbers 300 to 399 generated by the applications in the tasks. |
AWS.ECS.HTTPCode_Target_4XX_Count
|
Count | The number of HTTP response codes with numbers 400 to 499 generated by the applications in the tasks. |
AWS.ECS.HTTPCode_Target_5XX_Count
|
Count | The number of HTTP response codes with numbers 500 to 599 generated by the applications in the tasks. |
AWS.ECS.RequestCountPerTarget
|
Count | The average number of requests received by each target. |
AWS.ECS.TargetProcessedBytes
|
bytes | The total number of bytes processed by the Service Connect proxies. |
AWS.ECS.TargetResponseTime
|
milliseconds (ms) | The time elapsed, in milliseconds, after the request reached the Service Connect proxy in the target task until a response from the target application is received back to the proxy. |
AWS.ECS.ClientTLSNegotiationErrorCount
|
Count | The total number of times the TLS connection failed. |
AWS.ECS.TargetTLSNegotiationErrorCount
|
Count | The total number of times the TLS connection failed due to missing client certificates. |
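Each AWS.ECS.HTTPCode_Target_*_Count metric covers one status-code class. You can sum these metrics over the same period and express each class as a share of the total. The Python sketch below computes those shares. It assumes that every response falls into one of the four listed classes. The counts and the function name are hypothetical, and the resulting percentages are derived values, not metrics that SolarWinds Observability collects.

```python
def status_class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each status class as a percentage of all counted responses."""
    total = sum(counts.values())
    if total == 0:
        return {cls: 0.0 for cls in counts}
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Hypothetical one-minute sums of the HTTPCode_Target_*_Count metrics
counts = {"2XX": 940, "3XX": 20, "4XX": 30, "5XX": 10}
shares = status_class_shares(counts)
print(f"5XX share: {shares['5XX']:.1f}%")  # 5XX share: 1.0%
```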
EFS
Metric | Units | Description |
---|---|---|
AWS.EFS.BurstCreditBalance
|
bytes |
BurstCreditBalance. The average number of burst credits that a file system has. Burst credits allow a file system to burst to throughput levels above a file system’s baseline level for periods of time. |
AWS.EFS.ClientConnections
|
Count |
ClientConnections. The total number of client connections to a file system. When using a standard client, there is one connection per mounted Amazon EC2 instance. |
AWS.EFS.DataReadIOBytes
|
bytes |
DataReadIOBytes. The average number of bytes for each file system read operation. |
AWS.EFS.DataWriteIOBytes
|
bytes |
DataWriteIOBytes. The average number of bytes for each file system write operation. |
AWS.EFS.MetadataIOBytes
|
bytes |
MetadataIOBytes. The average number of bytes for each metadata operation. |
AWS.EFS.MeteredIOBytes
|
bytes |
MeteredIOBytes. The average number of metered bytes for each file system operation, including data read, data write, and metadata operations, with read operations metered at one-third the rate of other operations. |
AWS.EFS.PercentIOLimit
|
Percent (%) |
PercentIOLimit. How close a file system is to reaching the I/O limit of the General Purpose performance mode. Data is available only for file systems running with General Purpose performance mode. |
AWS.EFS.PermittedThroughput
|
bytes per second |
PermittedThroughput. The maximum amount of throughput that a file system can drive. |
AWS.EFS.StorageBytes
|
bytes |
StorageBytes. The average size of the file system in bytes, including the amount of data stored in the EFS Standard and EFS Standard–Infrequent Access (EFS Standard-IA) storage classes. |
AWS.EFS.TimeSinceLastSync
|
seconds (s) |
TimeSinceLastSync. The average amount of time that has passed since the last successful sync to the destination file system in a replication configuration. |
AWS.EFS.TotalIOBytes
|
bytes |
TotalIOBytes. The total number of bytes for each file system operation, including data read, data write, and metadata operations. This is the actual amount that your application is driving, and not the throughput the file system is being metered at. |
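AWS.EFS.MeteredIOBytes and AWS.EFS.TotalIOBytes use the same three components (data read, data write, and metadata), but they weight reads differently. The Python sketch below shows the arithmetic described above: reads count in full toward the total but at one-third the rate toward the metered value. It assumes that all three components cover the same period. The byte counts and function names are hypothetical.

```python
def total_io_bytes(read: int, write: int, metadata: int) -> int:
    """Actual bytes driven by the application (compare TotalIOBytes)."""
    return read + write + metadata


def metered_io_bytes(read: int, write: int, metadata: int) -> float:
    """Metered bytes: reads are metered at one-third the rate of other operations."""
    return read / 3 + write + metadata


# Hypothetical byte counts for one file system over the same period
read, write, metadata = 3_000_000, 500_000, 100_000
print(total_io_bytes(read, write, metadata))    # 3600000
print(metered_io_bytes(read, write, metadata))  # 1600000.0
```

For read-heavy workloads, the metered value can therefore be much lower than the bytes the application actually drives.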
Elastic Beanstalk
Metric | Units | Description |
---|---|---|
|
milliseconds (ms) |
P99.9. The average latency for the slowest x percent of requests over the last 10 seconds, where x is the difference between the number and 100. For example, p99 1.403 indicates the slowest 1% of requests over the last 10 seconds had an average latency of 1.403 seconds. |
AWS.ElasticBeanstalk.ApplicationRequests2xx
|
Count |
Status 2xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 200 but less than 300. |
AWS.ElasticBeanstalk.ApplicationRequests3xx
|
Count |
Status 3xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 300 but less than 400. |
AWS.ElasticBeanstalk.ApplicationRequests4xx
|
Count |
Status 4xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 400 but less than 500. |
AWS.ElasticBeanstalk.ApplicationRequests5xx
|
Count |
Status 5xx. The average number of requests over the last 10 seconds that resulted in a status code greater than or equal to 500 but less than 600. |
AWS.ElasticBeanstalk.ApplicationRequestsTotal
|
Count |
Request Count. The average number of requests handled by the web server per second over the last 10 seconds. |
AWS.ElasticBeanstalk.CPUIdle
|
Percent (%) | Percentage of time that the CPU has spent in the Idle state over the last 10 seconds. |
AWS.ElasticBeanstalk.CPUIowait
|
Percent (%) | Percentage of time that the CPU has spent in the I/O Wait state over the last 10 seconds. Available on Linux environments only. |
AWS.ElasticBeanstalk.CPUIrq
|
Percent (%) | Percentage of time that the CPU has spent in the IRQ (Interrupt Request) state over the last 10 seconds. Available on Linux environments only. |
AWS.ElasticBeanstalk.CPUNice
|
Percent (%) | Percentage of time that the CPU has spent in the Nice state over the last 10 seconds. Available on Linux environments only. |
AWS.ElasticBeanstalk.CPUPriveleged
|
Percent (%) | Percentage of time that the CPU has spent in the Privileged state over the last 10 seconds. Available on Windows environments only. |
AWS.ElasticBeanstalk.CPUSoftirq
|
Percent (%) | Percentage of time that the CPU has spent in the SoftIRQ state over the last 10 seconds. Available on Linux environments only. |
AWS.ElasticBeanstalk.CPUSystem
|
Percent (%) | Percentage of time that the CPU has spent in the System state over the last 10 seconds. Available on Linux environments only. |
AWS.ElasticBeanstalk.CPUUser
|
Percent (%) | Percentage of time that the CPU has spent in the User state over the last 10 seconds. |
AWS.ElasticBeanstalk.EnvironmentHealth
|
Count |
The health status of the environment. The possible values are 0 (OK), 1 (Info), 5 (Unknown), 10 (No data), 15 (Warning), 20 (Degraded) and 25 (Severe). |
AWS.ElasticBeanstalk.InstancesDegraded
|
Count | The number of instances in your Elastic Beanstalk environment that are in a degraded state, meaning they are not functioning optimally and may be impacting the performance of your application. |
AWS.ElasticBeanstalk.InstanceHealth
|
Count | Information about the health of instances in your Elastic Beanstalk environment. It includes attributes such as health status, color, causes, application metrics, and more. |
AWS.ElasticBeanstalk.InstancesInfo
|
Count | The number of instances in your Elastic Beanstalk environment that are in the Info health state. |
AWS.ElasticBeanstalk.InstancesNoData
|
Count | The number of instances in your Elastic Beanstalk environment that are not reporting any data, which could suggest issues with data collection or instance health. |
AWS.ElasticBeanstalk.InstancesOk
|
Count | The number of instances in your Elastic Beanstalk environment that are functioning correctly and passing health checks. |
AWS.ElasticBeanstalk.InstancesPending
|
Count | The number of instances in your Elastic Beanstalk environment that are in a pending state, meaning they are being provisioned or are not yet fully operational. |
AWS.ElasticBeanstalk.InstancesSevere
|
Count | The number of instances in your Elastic Beanstalk environment that are in a severe state, meaning they are experiencing critical issues that require immediate attention. |
AWS.ElasticBeanstalk.InstancesUnknown
|
Count | The number of instances whose health status is unknown, meaning Elastic Beanstalk is unable to determine their health status. |
AWS.ElasticBeanstalk.InstancesWarning
|
Count | The number of instances in your environment that are in a warning state, indicating potential issues that may need to be addressed but are not critical. |
AWS.ElasticBeanstalk.LoadAverage1min
|
Count | The 1-minute load average of your instances, which is an indicator of the average number of processes that are either in a runnable or uninterruptible state over the past minute. |
AWS.ElasticBeanstalk.RootFilesystemUtil
|
Percent (%) | The percentage of the root file system's disk space that is being used on your instances. |
AWS.ElasticBeanstalk.Status5xxPercent
|
Percent (%) |
The percentage of HTTP requests to your instances that resulted in server errors (status codes 5xx). |
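AWS.ElasticBeanstalk.EnvironmentHealth reports status codes, not quantities. An averaged value, such as 7.5, does not correspond to any documented status. The Python sketch below maps a reported value to the status names listed above and flags values that do not match a single code. The dictionary and function names are hypothetical.

```python
# Status codes documented for AWS.ElasticBeanstalk.EnvironmentHealth
ENVIRONMENT_HEALTH = {
    0: "OK",
    1: "Info",
    5: "Unknown",
    10: "No data",
    15: "Warning",
    20: "Degraded",
    25: "Severe",
}


def health_label(value: float) -> str:
    """Map a reported EnvironmentHealth value to its status name."""
    code = int(value)
    if code != value or code not in ENVIRONMENT_HEALTH:
        # Averaged or unexpected values do not correspond to a single status
        return f"unmapped ({value})"
    return ENVIRONMENT_HEALTH[code]


print(health_label(20))   # Degraded
print(health_label(7.5))  # unmapped (7.5)
```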
ElastiCache Memcached
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of ElastiCache Memcached entities in the Metrics Explorer, filter the |
AWS.ElastiCache.CPUCreditBalance
|
Count | The number of earned CPU credits that an instance has accrued since it was launched or started. |
AWS.ElastiCache.CPUCreditUsage
|
Count | The number of CPU credits spent by the instance for CPU utilization. |
AWS.ElastiCache.CPUUtilization
|
Percent (%) | The percentage of CPU utilization for the entire host. |
AWS.ElastiCache.CurrConnections
|
Count | The number of connections connected to the cache at an instant in time. |
AWS.ElastiCache.Evictions
|
Count | The number of non-expired items the cache evicted to allow space for new writes. |
AWS.ElastiCache.FreeableMemory
|
bytes | The amount of free memory available on the host. |
AWS.ElastiCache.NetworkBandwidthInAllowanceExceeded
|
Count | The number of packets shaped because the inbound aggregate bandwidth exceeded the maximum for the instance. |
AWS.ElastiCache.NetworkBandwidthOutAllowanceExceeded
|
Count | The number of packets shaped because the outbound aggregate bandwidth exceeded the maximum for the instance. |
AWS.ElastiCache.NetworkBytesIn
|
bytes | The number of bytes the host has read from the network. |
AWS.ElastiCache.NetworkBytesOut
|
bytes | The number of bytes sent out on all network interfaces by the instance. |
AWS.ElastiCache.NetworkConntrackAllowanceExceeded
|
Count | The number of packets shaped because connection tracking exceeded the maximum for the instance and new connections could not be established. |
AWS.ElastiCache.NetworkMaxBytesIn
|
bytes | The maximum burst of received bytes within each minute. |
AWS.ElastiCache.NetworkMaxBytesOut
|
bytes | The maximum burst of transmitted bytes within each minute. |
AWS.ElastiCache.NetworkMaxPacketsIn
|
Count | The maximum burst of received packets within each minute. |
AWS.ElastiCache.NetworkMaxPacketsOut
|
Count | The maximum burst of transmitted packets within each minute. |
AWS.ElastiCache.NetworkPacketsIn
|
Count | The number of packets received on all network interfaces by the instance. |
AWS.ElastiCache.NetworkPacketsOut
|
Count | The number of packets sent out on all network interfaces by the instance. |
AWS.ElastiCache.NetworkPacketsPerSecondAllowanceExceeded
|
Count | The number of packets shaped because the bidirectional packets per second exceeded the maximum for the instance. |
AWS.ElastiCache.NewConnections
|
Count | The number of new connections the cache has received. |
AWS.ElastiCache.NewItems
|
Count | The number of new items the cache has stored. |
AWS.ElastiCache.SwapUsage
|
bytes | The amount of swap used on the host. |
AWS.ElastiCache.UnusedMemory
|
bytes | The amount of memory not used by data. |
ElastiCache Redis
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of ElastiCache Redis entities in the Metrics Explorer, filter the |
AWS.ElastiCache.ActiveDefragHits
|
Count | The count of value reallocations per minute performed by the active defragmentation process. |
AWS.ElastiCache.AuthenticationFailures
|
Count | The total count of failed attempts to authenticate to Redis using the AUTH command. |
AWS.ElastiCache.BytesReadFromDisk
|
bytes | The total count of bytes read from disk per minute. Supported only for clusters using Data tiering. |
AWS.ElastiCache.BytesReadIntoMemcached
|
bytes | The number of bytes read from the network by the cache node. |
AWS.ElastiCache.BytesUsedForCache
|
bytes | The total count of bytes allocated by Redis for all purposes, including the dataset, buffers, and so on. |
AWS.ElastiCache.BytesUsedForCacheItems
|
bytes | The number of bytes used to store cache items. |
AWS.ElastiCache.BytesUsedForHash
|
bytes | The number of bytes currently used by hash tables. |
AWS.ElastiCache.BytesWrittenOutFromMemcached
|
bytes | The number of bytes written to the network by the cache node. |
AWS.ElastiCache.BytesWrittenToDisk
|
bytes | The total count of bytes written to disk per minute. Supported only for clusters using Data tiering. |
AWS.ElastiCache.CacheHitRate
|
Percent (%) | Indicates the usage efficiency of the Redis instance, calculated as CacheHits / (CacheHits + CacheMisses). |
AWS.ElastiCache.CacheHits
|
Count | The count of successful read-only key lookups in the main dictionary. |
AWS.ElastiCache.CacheMisses
|
Count | The count of unsuccessful read-only key lookups in the main dictionary. |
AWS.ElastiCache.CasBadval
|
Count | The number of CAS (check and set) requests where the CAS value provided did not match the stored CAS value. |
AWS.ElastiCache.CasHits
|
Count | The number of CAS requests where the requested key was found and the CAS value matched. |
AWS.ElastiCache.CasMisses
|
Count | The number of CAS requests where the requested key was not found. |
AWS.ElastiCache.ChannelAuthorizationFailures
|
Count | The total count of failed attempts by users to access channels they do not have permission to access. |
AWS.ElastiCache.ClusterBasedCmds
|
Count | The total number of cluster-based commands (commands that act on the cluster, such as CLUSTER SLOTS and CLUSTER INFO) executed on your ElastiCache cluster. |
AWS.ElastiCache.ClusterBasedCmdsLatency
|
microseconds | The latency of commands executed on your ElastiCache cluster. |
AWS.ElastiCache.CmdConfigGet
|
Count | The number of CONFIG GET commands executed on your ElastiCache cluster. |
AWS.ElastiCache.CmdConfigSet
|
Count | The number of CONFIG SET commands executed on your ElastiCache cluster. |
AWS.ElastiCache.CmdFlush
|
Count | The number of FLUSH commands executed on your ElastiCache cluster. |
AWS.ElastiCache.CmdGets
|
Count | The number of GET commands executed on your ElastiCache cluster. |
AWS.ElastiCache.CmdSet
|
Count | The number of SET commands executed on your ElastiCache cluster. |
AWS.ElastiCache.CmdTouch
|
Count | The number of TOUCH commands executed on your ElastiCache cluster. |
AWS.ElastiCache.CommandAuthorizationFailures
|
Count | The total count of failed attempts by users to run commands they don’t have permission to call. |
AWS.ElastiCache.CPUCreditBalance
|
minutes | The count of earned CPU credits that an instance has accrued since it was launched or started. |
AWS.ElastiCache.CPUCreditUsage
|
minutes | The count of CPU credits spent by the instance for CPU utilization. |
AWS.ElastiCache.CPUUtilization
|
Percent (%) | The percentage of CPU utilization for the entire host. Because Redis is single-threaded, we recommend that you monitor the EngineCPUUtilization metric for nodes with 4 or more vCPUs. |
AWS.ElastiCache.CurrConfig
|
Count | The current number of configurations stored. |
AWS.ElastiCache.CurrConnections
|
Count | The count of client connections, excluding connections from read replicas. |
AWS.ElastiCache.CurrItems
|
Count | The count of items in the cache. |
AWS.ElastiCache.CurrVolatileItems
|
Count | The total count of keys in all databases that have a TTL set. |
AWS.ElastiCache.DatabaseCapacityUsageCountedForEvictPercentage
|
Percent (%) | Percentage of the total data capacity for the cluster that is in use, excluding the memory used for overhead and client output buffers (COB). |
AWS.ElastiCache.DatabaseCapacityUsagePercentage
|
Percent (%) | The percentage of the database's capacity that is currently being used. |
AWS.ElastiCache.DatabaseMemoryUsagecountedForEvictpercentage
|
Percent (%) | Percentage of the memory for the cluster that is in use, excluding memory used for overhead and client output buffers (COB). |
AWS.ElastiCache.DatabaseMemoryUsagePercentage
|
Percent (%) | Percentage of the memory for the cluster that is in use. |
AWS.ElastiCache.DBOAverageTTL
|
milliseconds (ms) | Exposes the avg_ttl value of database 0 (db0) from the keyspace section of the Redis INFO command output. |
AWS.ElastiCache.DecrHits
|
Count | The number of successful decrement operations (decr) where the requested key was found in the cache. |
AWS.ElastiCache.DecrMisses
|
Count | The number of decrement operations (decr) where the requested key was not found in the cache. |
AWS.ElastiCache.DeleteHits
|
Count | The number of successful delete operations (del) where the requested key was found in the cache. |
AWS.ElastiCache.DeleteMisses
|
Count | The number of delete operations (del) where the requested key was not found in the cache. |
AWS.ElastiCache.ElastiCacheProcessingUnits
|
Count | The total number of ElastiCacheProcessingUnits (ECPUs) consumed by the requests executed on your cache. |
AWS.ElastiCache.EngineCPUUtilization
|
Percent (%) | Provides CPU utilization of the Redis engine thread. |
AWS.ElastiCache.EvalBasedCmds
|
Count | The total number of EVAL-based commands executed on your ElastiCache cluster. |
AWS.ElastiCache.EvalBasedCmdsLatency
|
microseconds | The latency of EVAL-based commands. |
AWS.ElastiCache.EvictedUnfetched
|
Count | The number of valid items that were evicted from the cache because they were never fetched after being set. These items were removed to make space for new writes. |
AWS.ElastiCache.Evictions
|
Count | The count of keys that have been evicted due to the maxmemory limit. |
AWS.ElastiCache.ExpiredUnfetched
|
Count | The number of items that expired and were reclaimed from the cache because they were never fetched after being set. These items were removed to make space for new writes. |
AWS.ElastiCache.FreeableMemory
|
bytes | The amount of free memory available on the host. |
AWS.ElastiCache.GeoSpatialBasedCmds
|
Count | The number of geospatial commands executed per second. |
AWS.ElastiCache.GeoSpatialBasedCmdsLatency
|
microseconds | The average latency for geospatial commands. |
AWS.ElastiCache.GetHits
|
Count | The number of successful get commands (that is, the requested key was found) per second. |
AWS.ElastiCache.GetMisses
|
Count | The number of unsuccessful get commands (that is, the requested key was not found) per second. |
AWS.ElastiCache.GetTypeCmds
|
Count | The total number of read-only commands executed, such as GET, HGET, SCARD, and LRANGE. |
AWS.ElastiCache.GetTypeCmdsLatency
|
microseconds | The latency of read-only commands. |
AWS.ElastiCache.GlobalDatastoreReplicationLag
|
seconds (s) | The lag between the primary node of the secondary Region and the primary node of the primary Region. |
AWS.ElastiCache.HashBasedCmds
|
Count | The total number of commands executed on the cache that are based on hash tables. |
AWS.ElastiCache.HashBasedCmdsLatency
|
microseconds | The latency of commands executed on the cache that are based on hash tables. |
AWS.ElastiCache.HyperLogLogBasedCmds
|
Count | The total number of commands executed on the cache that are based on HyperLogLog data structures. |
AWS.ElastiCache.HyperLogLogBasedCmdsLatency
|
microseconds | The latency of commands executed on the cache that are based on HyperLogLog data structures. |
AWS.ElastiCache.IamAuthenticationExpirations
|
Count | The total count of expired IAM-authenticated Redis connections. |
AWS.ElastiCache.IamAuthenticationThrottling
|
Count | The total count of throttled IAM-authenticated Redis AUTH or HELLO requests. |
AWS.ElastiCache.IncrHits
|
Count | The number of successful increment operations (incr) where the requested key was found in the cache and the increment operation was successfully performed. |
AWS.ElastiCache.IncrMisses
|
Count | The number of increment operations (incr) where the requested key was not found in the cache, resulting in a miss. |
AWS.ElastiCache.IsMaster
|
Count | Indicates whether the node is the primary node of the current shard or cluster. |
AWS.ElastiCache.JsonBasedCmds
|
Count | The total number of JSON-based commands executed in your ElastiCache cluster. |
AWS.ElastiCache.JsonBasedCmdsLatency
|
microseconds | The latency of JSON-based commands executed in your ElastiCache cluster. |
AWS.ElastiCache.JsonBasedGetCmds
|
Count | The number of JSON-based GET commands executed in your ElastiCache cluster. |
AWS.ElastiCache.JsonBasedGetCmdsLatency
|
microseconds | The latency of JSON-based GET commands executed in your ElastiCache cluster. |
AWS.ElastiCache.JsonBasedSetCmds
|
Count | The number of JSON-based SET commands executed in your ElastiCache cluster. |
AWS.ElastiCache.JsonBasedSetCmdsLatency
|
microseconds | The latency of JSON-based SET commands executed in your ElastiCache cluster. |
AWS.ElastiCache.KeyAuthorizationFailures
|
Count | The total count of failed attempts by users to access keys they don’t have permission to access. |
AWS.ElastiCache.KeyBasedCmds
|
Count | The total number of key-based commands executed on your ElastiCache cluster. Key-based commands act on one or more keys across data structures, for example DEL, EXPIRE, and RENAME. |
AWS.ElastiCache.KeyBasedCmdsLatency
|
microseconds | The latency of key-based commands executed on your ElastiCache cluster. |
AWS.ElastiCache.KeysTracked
|
Count | The count of keys being tracked by Redis key tracking as a percentage of tracking-table-max-keys. |
AWS.ElastiCache.ListBasedCmds
|
Count | The total number of list-based commands executed on the cache. Examples of list-based commands include LPOP, LPUSH, and LRANGE. |
AWS.ElastiCache.ListBasedCmdsLatency
|
microseconds | The latency for executing list-based commands on the cache. |
AWS.ElastiCache.MasterLinkHealthStatus
|
Count | The health status of the master link in a replication group. |
AWS.ElastiCache.MemoryFragmentationRatio
|
Count | Indicates how efficiently the Redis engine allocates memory. |
AWS.ElastiCache.NetworkBandwidthInAllowanceExceeded
|
Count | The count of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance. |
AWS.ElastiCache.NetworkBandwidthOutAllowanceExceeded
|
Count | The count of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance. |
AWS.ElastiCache.NetworkBytesIn
|
bytes | The count of bytes the host has read from the network. |
AWS.ElastiCache.NetworkBytesOut
|
bytes | The count of bytes sent out on all network interfaces by the instance. |
AWS.ElastiCache.NetworkConntrackAllowanceExceeded
|
Count | The count of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. |
AWS.ElastiCache.NetworkMaxBytesIn
|
bytes | The maximum burst of received bytes within each minute. |
AWS.ElastiCache.NetworkMaxBytesOut
|
bytes | The maximum burst of transmitted bytes within each minute. |
AWS.ElastiCache.NetworkMaxPacketsIn
|
Count | The maximum burst of received packets within each minute. |
AWS.ElastiCache.NetworkMaxPacketsOut
|
Count | The maximum burst of transmitted packets within each minute. |
AWS.ElastiCache.NetworkPacketsIn
|
Count | The count of packets received on all network interfaces by the instance. |
AWS.ElastiCache.NetworkPacketsOut
|
Count | The count of packets sent out on all network interfaces by the instance. |
AWS.ElastiCache.NetworkPacketsPerSecondAllowanceExceeded
|
Count | The count of packets queued or dropped because the bidirectional packets per second exceeded the maximum for the instance. |
AWS.ElastiCache.NewConnections
|
Count | The total number of new connections accepted by the cache server during a specific period. |
AWS.ElastiCache.NewItems
|
Count | The number of new items added to the cache. |
AWS.ElastiCache.NonKeyTypeCmds
|
Count | The total number of commands that do not act on a key, such as ACL, DBSIZE, and INFO. |
AWS.ElastiCache.NonKeyTypeCmdsLatency
|
microseconds | The latency for executing non-key type commands on the cache. |
AWS.ElastiCache.NumItemsReadFromDisk
|
Count | The total count of items retrieved from disk per minute. |
AWS.ElastiCache.NumItemsWrittenToDisk
|
Count | The total number of items written to disk by the ElastiCache cluster. |
AWS.ElastiCache.PubSubBasedCmds
|
Count | The number of publish/subscribe commands executed in the ElastiCache cluster. |
AWS.ElastiCache.PubSubBasedCmdsLatency
|
microseconds | The latency of publish/subscribe commands. |
AWS.ElastiCache.Reclaimed
|
Count | The number of items that have been evicted from the cache due to expiration. |
AWS.ElastiCache.ReplicationBytes
|
bytes | The number of bytes transferred between the primary and replica nodes in a replication group. |
AWS.ElastiCache.ReplicationLag
|
seconds (s) | How far behind, in seconds, a read replica is in applying changes from the primary node. Applies only to nodes running as read replicas. |
AWS.ElastiCache.SaveInProgress
|
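Count | Indicates whether a background save is in progress (1) or not (0). Averaged over a period, the value reflects the fraction of time the node spent saving data to disk. |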
AWS.ElastiCache.SetBasedCmds
|
Count | The total number of commands that act on one or more sets, such as SADD, SCARD, SDIFF, and SUNION. |
AWS.ElastiCache.SetBasedCmdsLatency
|
microseconds | The latency of set-based commands. |
AWS.ElastiCache.SetTypeCmds
|
Count | The total number of write-type commands, which modify data, such as SET, HSET, SADD, and LPOP. |
AWS.ElastiCache.SetTypeCmdsLatency
|
microseconds | The latency of write-type commands. |
AWS.ElastiCache.SlabsMoved
|
Count | The number of memory slabs moved during memory allocation and deallocation operations. |
AWS.ElastiCache.SortedSetBasedCmds
|
Count | The total number of commands executed on your ElastiCache cluster that are based on sorted sets. |
AWS.ElastiCache.SortedSetBasedCmdsLatency
|
microseconds | The latency of commands executed on your ElastiCache cluster that are based on sorted sets. |
AWS.ElastiCache.StreamBasedCmds
|
Count | The total number of commands executed on your ElastiCache cluster that are based on streams. |
AWS.ElastiCache.StreamBasedCmdsLatency
|
microseconds | The latency of commands executed on your ElastiCache cluster that are based on streams. |
AWS.ElastiCache.StringsBasedCmds
|
Count | The total number of commands executed on your ElastiCache cluster that are based on strings. |
AWS.ElastiCache.StringsBasedCmdsLatency
|
microseconds | The latency of commands executed on your ElastiCache cluster that are based on strings. |
AWS.ElastiCache.SuccessfulReadRequestLatency
|
microseconds | Latency of successful read requests. |
AWS.ElastiCache.SuccessfulWriteRequestLatency
|
microseconds | Latency of successful write requests. |
AWS.ElastiCache.SwapUsage
|
bytes | The amount of swap used on the host. |
AWS.ElastiCache.TotalCmdsCount
|
Count | Total count of all commands executed on your cache. |
AWS.ElastiCache.TouchHits
|
Count | The number of times items in the cache were accessed (touched) and found to be valid. |
AWS.ElastiCache.TouchMisses
|
Count | The number of times items in the cache were accessed (touched) but were not found, indicating a cache miss. |
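Several of the Redis metrics above correspond to fields in the output of the Redis INFO command. The engine reports key-lookup hits and misses as keyspace_hits and keyspace_misses. The cache hit rate is hits / (hits + misses). The keyspace section contains a db0 line whose avg_ttl field is in milliseconds. The Python sketch below parses a hypothetical excerpt of that output and shows the arithmetic. It does not connect to a real cache, and the function names are illustrative only.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse 'field:value' lines from Redis INFO output, skipping '# Section' headers."""
    fields: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def cache_hit_rate_percent(fields: dict[str, str]) -> float:
    """Hits / (hits + misses) as a percentage; 0.0 when there were no lookups."""
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def db0_avg_ttl_ms(fields: dict[str, str]) -> int:
    """Read avg_ttl (milliseconds) from a line like 'db0:keys=1200,expires=300,avg_ttl=45000'."""
    attrs = dict(item.split("=", 1) for item in fields["db0"].split(","))
    return int(attrs["avg_ttl"])


# Hypothetical excerpt of INFO output (not from a real cache)
sample = """# Stats
keyspace_hits:900
keyspace_misses:100

# Keyspace
db0:keys=1200,expires=300,avg_ttl=45000
"""

fields = parse_info(sample)
print(cache_hit_rate_percent(fields))  # 90.0
print(db0_avg_ttl_ms(fields))          # 45000
```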
ELB
Metric | Units | Description |
---|---|---|
AWS.ELB.BackendConnectionErrors
|
Count |
BackendConnectionErrors. The total number of connections that were not successfully established between the load balancer and the registered instances. |
AWS.ELB.BackendConnectionErrorsRate
|
Percent (%) |
The rate at which connections between the load balancer and backend instances fail. It includes retries and health check-related errors. |
AWS.ELB.DesyncMitigationMode_NonCompliant_Request_Count
|
Count |
The number of requests that do not comply with RFC 7230, which are potentially harmful and could lead to HTTP desync attacks. |
AWS.ELB.EstimatedALBActiveConnectionCount
|
Count | The total number of concurrent TCP connections from clients to the load balancer and from the load balancer to targets. |
AWS.ELB.EstimatedALBConsumedLCUs
|
Count per second | The number of Load Balancer Capacity Units (LCUs) consumed by the Application Load Balancer. |
AWS.ELB.EstimatedALBNewConnectionCount
|
Count | The number of new TCP connections initiated from clients to the load balancer. |
AWS.ELB.EstimatedProcessedBytes
|
bytes | The total number of bytes processed by the load balancer. |
AWS.ELB.HTTPCode_Backend_2XX
|
Count |
The number of HTTP 2XX status codes returned by the backend instances. These status codes indicate successful responses. |
AWS.ELB.HTTPCode_Backend_3XX
|
Count |
The number of HTTP 3XX status codes returned by the backend instances. These status codes indicate redirection responses. |
AWS.ELB.HTTPCode_Backend_4XX
|
Count |
The number of HTTP 4XX status codes returned by the backend instances. These status codes indicate client error responses, such as Bad Request or Not Found. |
AWS.ELB.HTTPCode_Backend_5XX
|
Count |
The number of HTTP 5XX status codes returned by the backend instances. These status codes indicate server error responses, such as Internal Server Error or Service Unavailable. |
AWS.ELB.HTTPCode_ELB_4XX
|
Count |
HTTPCode_ELB_4XX. The total number of HTTP 4XX client error codes generated by the load balancer. |
AWS.ELB.HTTPCode_ELB_5XX
|
Count |
HTTPCode_ELB_5XX. The total number of HTTP 5XX server error codes generated by the load balancer. |
AWS.ELB.HealthyHostCount
|
Count |
HealthyHostCount. The average number of healthy instances registered with your load balancer. |
AWS.ELB.HealthyHostPercent
|
Percent (%) |
The percentage of healthy hosts in a target group over a specified period. |
AWS.ELB.HttpCodeELB5xxRate
|
Percent (%) |
The rate of HTTP 5xx error codes (server errors) returned by the load balancer. |
AWS.ELB.Latency
|
milliseconds (ms) |
The time it takes for the load balancer to respond to requests. It includes the time spent processing the request and the time spent waiting for a response from the backend server. |
AWS.ELB.Latency.p50
|
milliseconds (ms) |
The 50th percentile (median) of the latency metric. It represents the middle value of the latency distribution, meaning 50% of the requests have a lower latency and 50% have a higher latency. |
AWS.ELB.Latency.p95
|
milliseconds (ms) |
The 95th percentile of the latency metric. It represents the latency below which 95% of the requests fall, providing a sense of the higher end of the latency distribution. |
AWS.ELB.Latency.p99
|
milliseconds (ms) |
The 99th percentile of the latency metric. It represents the latency below which 99% of the requests fall, giving you an idea of the very high end of the latency distribution. |
AWS.ELB.RequestCount
|
Count |
The total number of requests completed or connections made during the specified interval. |
AWS.ELB.SpilloverCount
|
Count |
The total number of requests that were rejected because the surge queue is full. |
AWS.ELB.SurgeQueueLength
|
Count |
The total number of requests (HTTP listener) or connections (TCP listener) that are pending routing to a healthy instance. |
AWS.ELB.UnHealthyHostCount
|
Count |
The average number of unhealthy instances registered with your load balancer. An instance is considered unhealthy after it exceeds the unhealthy threshold configured for health checks. |
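The Latency.p50, Latency.p95, and Latency.p99 rows describe percentiles of the request-latency distribution. The following minimal sketch illustrates what those percentiles mean using the nearest-rank method on a hypothetical list of latency samples. It is an illustration only: CloudWatch computes percentile statistics from its own aggregated data, so its values may not match a nearest-rank calculation exactly.

```python
import math

def nearest_rank_percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100) of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical latency samples (ms) for 20 requests, two of them very slow.
latencies_ms = [12, 15, 11, 230, 14, 13, 18, 16, 12, 19,
                17, 14, 13, 15, 21, 16, 480, 14, 12, 15]

for p in (50, 95, 99):
    print(f"p{p}: {nearest_rank_percentile(latencies_ms, p)} ms")
# p50: 15 ms
# p95: 230 ms
# p99: 480 ms
```

The median stays low while p95 and p99 are driven entirely by the two slow requests, which is why the higher percentiles are usually the better signal for tail-latency alerts.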
FSx
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of FSx entities in the Metrics Explorer, filter the |
AWS.FSx.CapacityPoolReadBytes
|
bytes |
The total number of bytes read from the capacity pool by clients. |
AWS.FSx.CapacityPoolReadOperations
|
Count |
The number of read operations performed on the capacity pool by clients. |
AWS.FSx.CapacityPoolWriteBytes
|
bytes |
The total number of bytes written to the capacity pool by clients. |
AWS.FSx.CapacityPoolWriteOperations
|
Count |
The number of write operations performed on the capacity pool by clients. |
AWS.FSx.ClientConnections
|
Count | The total number of active connections between clients and the file server. |
AWS.FSx.CompressionRatio
|
ratio | Average ratio of compressed storage usage to uncompressed storage usage. |
AWS.FSx.CPUUtilization
|
Percent (%) |
The average percentage utilization of your file server’s CPU resources. |
AWS.FSx.DataReadBytes
|
bytes |
Total number of bytes for file system read operations. |
AWS.FSx.DataReadOperations
|
Count |
Total number of read operations. |
AWS.FSx.DataReadOperationsPercent
|
Percent (%) |
The percentage of read operations performed by clients on the file system. |
AWS.FSx.DataReadOperationTime
|
seconds (s) |
Total time spent within the file system for read operations (network I/O) from clients accessing data in the volume. |
AWS.FSx.DataReadThroughputPercent
|
Percent (%) |
The percentage of network throughput utilized for read operations. |
AWS.FSx.DataWriteBytes
|
bytes |
Total number of bytes for file system write operations. |
AWS.FSx.DataWriteOperations
|
Count |
Total number of write operations. |
AWS.FSx.DataWriteOperationsPercent
|
Percent (%) |
The percentage of write operations performed by clients on the file system. |
AWS.FSx.DataWriteOperationTime
|
seconds (s) |
Total time spent within the file system for fulfilling write operations (network I/O) from clients accessing data in the volume. |
AWS.FSx.DataWriteThroughputPercent
|
Percent (%) |
The percentage of network throughput utilized for write operations. It measures how much of the available write throughput capacity is being used. |
AWS.FSx.DeduplicationSavedStorage
|
bytes | The average amount of storage space saved by data deduplication, if enabled. |
AWS.FSx.DiskIopsUtilization
|
Percent (%) |
The average disk IOPS between your file server and storage volumes, as a percentage of the provisioned IOPS limit determined by the storage volumes. |
AWS.FSx.DiskReadBytes
|
bytes |
Total number of bytes for read operations that access storage volumes. |
AWS.FSx.DiskReadOperations
|
Count | Total number of read operations for the file server accessing storage volumes. |
AWS.FSx.DiskThroughputBalance
|
Percent (%) | The average percentage of available burst credits for disk throughput for the storage volumes. |
AWS.FSx.DiskThroughputUtilization
|
Percent (%) | The average disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by the storage volumes. |
AWS.FSx.DiskWriteBytes
|
bytes | Total number of bytes for write operations that access storage volumes. |
AWS.FSx.DiskWriteOperations
|
Count | Total number of write operations for the file server accessing storage volumes. |
AWS.FSx.FilesCapacity
|
Count | The total number of files (or inodes) that can be created on the volume. |
AWS.FSx.FileServerCacheHitRatio
|
Percent (%) | The ratio of cache hits to the total number of cache requests. A higher cache hit ratio indicates better performance as more data is served from the cache rather than from the disk. |
AWS.FSx.FileServerDiskIopsBalance
|
Percent (%) | The average percentage of available burst credits for disk IOPS between your file server and its storage volumes. |
AWS.FSx.FileServerDiskIopsUtilization
|
Percent (%) | The average disk IOPS between your file server and storage volumes, as a percentage of the provisioned limit determined by throughput capacity. |
AWS.FSx.FileServerDiskThroughputBalance
|
Percent (%) | The average percentage of available burst credits for disk throughput between your file server and its storage volumes. |
AWS.FSx.FileServerDiskThroughputUtilization
|
Percent (%) | The average disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by throughput capacity. |
AWS.FSx.FilesUsed
|
Count | The total number of used files (or inodes) on the volume. |
AWS.FSx.FreeDataStorageCapacity
|
bytes | The average amount of available storage capacity. |
AWS.FSx.FreeStorageCapacity
|
bytes | The average amount of available storage capacity. |
AWS.FSx.LogicalDataStored
|
bytes | The average amount of logical data stored on the file system, considering both the SSD tier and the capacity pool tier. |
AWS.FSx.LogicalDiskUsage
|
bytes | The average amount of logical data stored (uncompressed). |
AWS.FSx.MemoryUtilization
|
Percent (%) | The average percentage utilization of your file server’s memory resources. |
AWS.FSx.MetadataOperations
|
Count | The average number of metadata operations. |
AWS.FSx.MetadataOperationTime
|
seconds (s) | Total time spent within the file system for fulfilling metadata operations (network I/O) from clients that are accessing data in the volume. |
AWS.FSx.NetworkReceivedBytes
|
bytes | The total number of bytes received by the file system, including data movement to and from linked data repositories. |
AWS.FSx.NetworkSentBytes
|
bytes | The total number of bytes sent by the file system, including data movement to and from linked data repositories. |
AWS.FSx.NetworkThroughputUtilization
|
Percent (%) | The average network throughput for clients accessing the file system, as a percentage of the provisioned limit. |
AWS.FSx.NfsBadCalls
|
Count | Average number of calls rejected by the NFS server Remote Procedure Call (RPC) mechanism. |
AWS.FSx.PhysicalDiskUsage
|
bytes | The average amount of storage physically occupied by file system data (compressed). |
AWS.FSx.StorageCapacity
|
bytes | The average storage capacity of the primary (SSD) tier. |
AWS.FSx.StorageCapacityUtilization
|
Percent (%) | The average used physical storage capacity as a percentage of total storage capacity. |
AWS.FSx.StorageEfficiencySavings
|
bytes | The amount of storage savings achieved through data deduplication and compression techniques. |
AWS.FSx.StorageUsed
|
bytes | The average amount of physical data stored on the file system, on both the primary (SSD) tier and the capacity pool tier. |
AWS.FSx.UsedStorageCapacity
|
bytes | The total storage used on the volume. |
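Several FSx metrics are most useful when combined. The sketch below assumes you have already exported the relevant values and derives two ratios that are not listed as metrics themselves: inode utilization (FilesUsed as a percentage of FilesCapacity) and compression savings (how much smaller PhysicalDiskUsage is than LogicalDiskUsage). The helper names and sample numbers are hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole, guarding against division by zero."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

def inode_utilization(files_used, files_capacity):
    """Share of the volume's file (inode) capacity in use (FilesUsed / FilesCapacity)."""
    return percent(files_used, files_capacity)

def compression_savings(logical_bytes, physical_bytes):
    """Percentage of logical (uncompressed) data not physically stored after compression."""
    return percent(logical_bytes - physical_bytes, logical_bytes)

# Hypothetical sample values for one volume.
files_used, files_capacity = 750_000, 1_000_000
logical, physical = 400 * 2**30, 300 * 2**30  # 400 GiB logical, 300 GiB physical

print(f"Inode utilization: {inode_utilization(files_used, files_capacity):.1f}%")  # 75.0%
print(f"Compression savings: {compression_savings(logical, physical):.1f}%")  # 25.0%
```

A volume can run out of inodes before it runs out of bytes, so tracking inode utilization alongside StorageCapacityUtilization helps catch that case early.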
Kinesis Data Firehose
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Kinesis Firehose entities in the Metrics Explorer, filter the |
AWS.Firehose.ActivePartitionsLimit
|
Count | The maximum number of active partitions that a Firehose stream can process before sending data to the error bucket. |
AWS.Firehose.BackupToS3.Bytes
|
bytes | The number of bytes that have been backed up to Amazon S3. |
AWS.Firehose.BackupToS3.Success
|
Count | The number of successful backup operations to Amazon S3. |
AWS.Firehose.BytesPerSecondLimit
|
bytes per second | The maximum number of bytes that can be processed per second. |
AWS.Firehose.DataReadFromKinesisStream.Bytes
|
bytes | The number of bytes read from the Kinesis data stream. |
AWS.Firehose.DataReadFromKinesisStream.Records
|
Count | The number of records read from the Kinesis data stream. |
AWS.Firehose.DataReadFromSource.Backpressured
|
Count | Indicates whether the data source is backpressured, meaning it is temporarily unable to accept more data. |
AWS.Firehose.DataReadFromSource.Bytes
|
bytes | The number of raw bytes read from the source database. |
AWS.Firehose.DataReadFromSource.Records
|
Count | The number of records read from the source database. |
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.AuthFailure
|
Count | The number of delivery attempts that failed due to authentication issues. |
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.Bytes
|
bytes | The number of bytes delivered to Amazon OpenSearch Serverless. |
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.DataFreshness
|
seconds (s) | The age of the oldest record in the delivery stream, measured from the time Firehose ingested the data to the present time. |
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.DeliveryRejected
|
Count | The number of delivery attempts that were rejected. |
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.Records
|
Count | The number of records delivered to Amazon OpenSearch Serverless. |
AWS.Firehose.DeliveryToAmazonOpenSearchServerless.Success
|
Count | The number of successful deliveries to Amazon OpenSearch Serverless. |
AWS.Firehose.DeliveryToAmazonOpenSearchService.AuthFailure
|
Count | The number of delivery failures due to authentication issues when delivering to Amazon OpenSearch Service. |
AWS.Firehose.DeliveryToAmazonOpenSearchService.Bytes
|
bytes | The total number of bytes delivered to Amazon OpenSearch Service. |
AWS.Firehose.DeliveryToAmazonOpenSearchService.DataFreshness
|
seconds (s) | The age of the oldest record in the delivery stream, measured from the time Firehose ingested the data to the present time. |
AWS.Firehose.DeliveryToAmazonOpenSearchService.DeliveryRejected
|
Count | The number of delivery attempts that were rejected by Amazon OpenSearch Service. |
AWS.Firehose.DeliveryToAmazonOpenSearchService.Records
|
Count | The number of records successfully delivered to Amazon OpenSearch Service. |
AWS.Firehose.DeliveryToAmazonOpenSearchService.Success
|
Count | The number of successful deliveries to Amazon OpenSearch Service. |
|
bytes |
The number of bytes indexed to Amazon ES over the specified time period. |
|
Count |
The number of records indexed to Amazon ES over the specified time period. |
|
Count |
The sum of the successfully indexed records over the sum of records that were attempted. |
AWS.Firehose.DeliveryToHttpEndpoint.Bytes
|
bytes | The number of bytes sent to the HTTP endpoint. |
AWS.Firehose.DeliveryToHttpEndpoint.DataFreshness
|
seconds (s) | The age of the oldest record in the delivery stream, measured from the time Firehose ingested the data to the present time. |
AWS.Firehose.DeliveryToHttpEndpoint.ProcessedBytes
|
bytes | The number of bytes processed by Firehose for delivery to the HTTP endpoint. |
AWS.Firehose.DeliveryToHttpEndpoint.ProcessedRecords
|
Count | The number of records processed by Firehose for delivery to the HTTP endpoint. |
AWS.Firehose.DeliveryToHttpEndpoint.Record
|
Count | The number of individual records sent to the HTTP endpoint. |
AWS.Firehose.DeliveryToHttpEndpoint.Success
|
Count | The number of successful deliveries to the HTTP endpoint. |
|
bytes |
The number of bytes copied to Amazon Redshift over the specified time period. |
|
Count |
The number of records copied to Amazon Redshift over the specified time period. |
|
Count |
The sum of successful Amazon Redshift COPY commands over the sum of all Amazon Redshift COPY commands. |
|
bytes |
The number of bytes delivered to Amazon S3 over the specified time period. |
|
seconds (s) |
The age of the oldest record in Kinesis Firehose, measured from the time Firehose ingested it to the present. Any record older than this age has been delivered to the S3 bucket. |
AWS.Firehose.DeliveryToS3.ObjectCount
|
Count | The number of objects that are being delivered to your S3 bucket. |
|
Count |
The number of records delivered to Amazon S3 over the specified time period. |
|
Count |
The sum of successful Amazon S3 put commands over the sum of all Amazon S3 put commands. |
|
milliseconds (ms) |
The time taken per DescribeDeliveryStream operation, measured over the specified time period. |
|
Count |
The total number of DescribeDeliveryStream requests. |
AWS.Firehose.FailedValidation.Bytes
|
bytes | The number of bytes that failed validation during data processing. |
AWS.Firehose.FailedValidation.Records
|
Count | The number of records that failed validation during data processing. |
|
bytes |
The number of bytes ingested into the Kinesis Firehose stream over the specified time period. |
AWS.Firehose.IncomingPutRequests
|
Count | The number of incoming put requests to the Firehose stream. |
|
Count |
The number of records ingested into the Kinesis Firehose stream over the specified time period. |
AWS.Firehose.JQProcessing.Duration
|
milliseconds (ms) | The amount of time it took to execute the JQ expression in the JQ Lambda function. |
AWS.Firehose.KafkaOffsetLag
|
Count | The difference between the last record written to the Kafka topic and the last record processed by the consumer. |
AWS.Firehose.KMSKeyAccessDenied
|
Count | Indicates that access to the KMS key was denied. This usually means that the permissions required for Kinesis Data Firehose to use the KMS key are not set correctly. |
AWS.Firehose.KMSKeyDisabled
|
Count | Indicates that the KMS key is disabled and cannot be used. |
AWS.Firehose.KMSKeyInvalidState
|
Count | Indicates that the KMS key is in an invalid state and cannot be used. |
AWS.Firehose.KMSKeyNotFound
|
Count | Indicates that the KMS key was not found. It usually means that the specified key does not exist or the Firehose delivery stream is not configured correctly to use the key. |
|
milliseconds (ms) |
The time taken per ListDeliveryStreams operation, measured over the specified time period. |
|
Count |
The total number of ListDeliveryStreams requests. |
AWS.Firehose.PartitionCount
|
Count | The number of partitions that are currently being used in the delivery stream. It helps you monitor the distribution of data across partitions. |
AWS.Firehose.PartitionCountExceeded
|
Count | Indicates that the number of partitions being used has exceeded the configured limit. |
AWS.Firehose.PerPartitionThroughput
|
bytes per second | The throughput for each partition in the delivery stream. |
|
bytes |
The number of bytes put to the Kinesis Firehose delivery stream using PutRecord over the specified time period. |
|
milliseconds (ms) |
The time taken per PutRecord operation, measured over the specified time period. |
|
Count |
The total number of PutRecord requests, which is equal to the total number of records from PutRecord operations. |
|
bytes |
The number of bytes put to the Kinesis Firehose delivery stream using PutRecordBatch over the specified time period. |
|
milliseconds (ms) |
The time taken per PutRecordBatch operation, measured over the specified time period. |
|
Count |
The total number of records from PutRecordBatch operations. |
|
Count |
The total number of PutRecordBatch requests. |
AWS.Firehose.PutRequestsPerSecondLimit
|
Count per second | The maximum number of put requests that can be processed per second by the Firehose delivery stream. |
AWS.Firehose.RecordsPerSecondLimit
|
Count per second | The maximum number of records that can be processed per second by the Firehose delivery stream. |
AWS.Firehose.SourceThrottled.Delay
|
milliseconds (ms) | The amount of time that records were delayed due to throttling at the data source. |
AWS.Firehose.ThrottledRecords
|
Count | The number of records that were throttled (temporarily paused) due to exceeding the processing capacity of the Firehose delivery stream. |
|
milliseconds (ms) |
The time taken per UpdateDeliveryStream operation, measured over the specified time period. |
|
Count |
The total number of UpdateDeliveryStream requests. |
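Several Firehose rows define a success metric as the sum of successful operations over the sum of attempted operations (for example, Amazon S3 puts or Amazon Redshift COPY commands). The sketch below, built on hypothetical per-minute counts, shows why that ratio of sums can differ noticeably from a simple average of per-minute ratios when traffic is uneven. Treating these metrics as a ratio of sums is an assumption based on the descriptions above; the aggregation used in your charts may differ.

```python
from dataclasses import dataclass

@dataclass
class Period:
    succeeded: int  # successful operations in the period (for example, S3 puts)
    attempted: int  # all operations attempted in the period

def ratio_of_sums(periods):
    """Success ratio over the whole window: sum(succeeded) / sum(attempted)."""
    attempted = sum(p.attempted for p in periods)
    if attempted == 0:
        return None  # nothing attempted, so no meaningful ratio
    return sum(p.succeeded for p in periods) / attempted

def mean_of_ratios(periods):
    """Unweighted average of per-period ratios, shown for comparison only."""
    ratios = [p.succeeded / p.attempted for p in periods if p.attempted]
    return sum(ratios) / len(ratios) if ratios else None

# Hypothetical window: one busy minute with failures, two quiet minutes without.
window = [Period(900, 1000), Period(10, 10), Period(10, 10)]
print(ratio_of_sums(window))   # 920 / 1020, roughly 0.902
print(mean_of_ratios(window))  # (0.9 + 1.0 + 1.0) / 3, roughly 0.967
```

The ratio of sums weights each operation equally, so the busy minute with failures dominates; averaging per-minute ratios lets quiet, error-free minutes hide the problem.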
Kinesis Data Stream
Basic Stream-level
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Kinesis Data Stream entities in the Metrics Explorer, filter the |
|
bytes |
The number of bytes retrieved from the Kinesis stream, measured over the specified time period. |
|
milliseconds (ms) |
The age of the last record in all GetRecords calls made against a Kinesis stream, measured over the specified time period. |
|
milliseconds (ms) |
The time taken per GetRecords operation, measured over the specified time period. |
|
Count |
The number of records retrieved from the shard, measured over the specified time period. Minimum, Maximum, and Average statistics represent the records in a single GetRecords operation for the stream in the specified time period. |
|
Count |
The number of successful GetRecords operations per stream, measured over the specified time period. |
|
bytes |
The number of bytes successfully put to the Kinesis stream over the specified time period. |
|
Count |
The number of records successfully put to the Kinesis stream over the specified time period. |
|
bytes |
The number of bytes put to the Kinesis stream using the PutRecord operation over the specified time period. |
|
milliseconds (ms) |
The time taken per PutRecord operation, measured over the specified time period. |
|
Count |
The number of successful PutRecord operations per Kinesis stream, measured over the specified time period. |
|
bytes |
The number of bytes put to the Kinesis stream using the PutRecords operation over the specified time period. |
AWS.Kinesis.PutRecords.FailedRecords
|
Count | The number of records that failed to be added to the Kinesis data stream. It helps in identifying issues with data ingestion. |
|
milliseconds (ms) |
The time taken per PutRecords operation, measured over the specified time period. |
AWS.Kinesis.PutRecords.PutRecords.ThrottledRecords
|
Count | The number of records that were throttled (temporarily paused) due to exceeding the provisioned throughput for the stream. It helps in monitoring and managing the data flow. |
|
Count |
The number of PutRecords operations where at least one record succeeded, per Kinesis stream, measured over the specified time period. |
AWS.Kinesis.PutRecords.SuccessfulRecords
|
Count | The number of records that were successfully added to the Kinesis data stream. It provides insights into the overall success rate of data ingestion. |
|
Count |
The number of successful records in a PutRecords operation per Kinesis stream, measured over the specified time period. |
|
Count |
The number of GetRecords calls throttled for the stream over the specified time period. |
AWS.Kinesis.uptime
|
bytes | The uptime or availability of the Kinesis Data Streams service. It helps in monitoring the reliability and performance of the service. |
|
Count |
The number of records rejected due to throttling for the stream over the specified time period. This metric includes throttling from PutRecord and PutRecords operations. |
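The stream-level row that reports the age of the last record in GetRecords calls is a measure of consumer lag. If that lag grows past the stream's retention period, records can expire before any consumer reads them. The sketch below assumes the Kinesis default retention of 24 hours and uses a hypothetical warning threshold of 50% of the retention window; neither value is a SolarWinds default, so adjust both for your streams.

```python
from datetime import timedelta

# Assumption: the stream uses the Kinesis default retention period of 24 hours.
DEFAULT_RETENTION = timedelta(hours=24)

def retention_used(iterator_age_ms, retention=DEFAULT_RETENTION):
    """Fraction of the retention window consumed by consumer lag (iterator age)."""
    return timedelta(milliseconds=iterator_age_ms) / retention

def lag_status(iterator_age_ms, warn_at=0.5, retention=DEFAULT_RETENTION):
    """Classify consumer lag; warn_at is a hypothetical threshold."""
    used = retention_used(iterator_age_ms, retention)
    if used >= 1.0:
        return "records may expire unread"
    if used >= warn_at:
        return "warning"
    return "ok"

print(lag_status(60_000))          # 1 minute behind: ok
print(lag_status(14 * 3_600_000))  # 14 hours behind: warning
```

Expressing the lag as a fraction of retention keeps a single alert rule meaningful across streams that use different retention periods.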
Enhanced Shard-level
Metric | Units | Description |
---|---|---|
|
bytes |
The number of bytes successfully put to the shard over the specified time period. |
|
Count |
The number of records successfully put to the shard over the specified time period. |
|
milliseconds (ms) |
The age of the last record in all GetRecords calls made against a shard, measured over the specified time period. |
|
bytes |
The number of bytes retrieved from the shard, measured over the specified time period. |
|
Count |
The number of records retrieved from the shard, measured over the specified time period. |
|
Count |
The number of GetRecords calls throttled for the shard over the specified time period. This exception count covers all dimensions of the following limits: 5 reads per shard per second or 2 MB per second per shard. |
AWS.Kinesis.SubscribeToShard.RateExceeded
|
Count per second | The number of times the rate limit was exceeded when calling SubscribeToShard. It helps you monitor and manage throttling issues. |
AWS.Kinesis.SubscribeToShard.Success
|
Count | The number of successful SubscribeToShard calls. It helps you track the success rate of your subscription requests. |
AWS.Kinesis.SubscribeToShardEvent.Bytes
|
bytes | The number of bytes received from the shard in a SubscribeToShardEvent. It helps you monitor the volume of data being processed. |
AWS.Kinesis.SubscribeToShardEvent.MillisBehindLatest
|
milliseconds (ms) | The number of milliseconds the consumer is behind the latest record in the shard. A value of zero means the consumer is up-to-date with the stream. |
AWS.Kinesis.SubscribeToShardEvent.Records
|
Count | The number of records received in a SubscribeToShardEvent. It helps you monitor the number of records being processed. |
AWS.Kinesis.SubscribeToShardEvent.Success
|
Count | The number of successful SubscribeToShardEvent calls. It helps you track the success rate of your event processing. |
|
Count |
The number of records rejected due to throttling for the shard over the specified time period. This metric includes throttling from PutRecord and PutRecords operations and covers all dimensions of the following limits: 1,000 records per second per shard or 1 MB per second per shard. |
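The per-shard limits quoted in the shard-level rows (1,000 records or 1 MB per second for writes, 2 MB per second for reads) translate directly into a minimum shard count. The sketch below assumes partition keys spread traffic evenly across shards, treats MB as 2^20 bytes, and assumes shared-throughput consumers (with enhanced fan-out, each consumer gets its own 2 MB per second per shard). It does not model the 5-reads-per-second call limit. Hot shards will throttle earlier than this estimate suggests.

```python
import math

# Per-shard limits from the table (MB treated as MiB, 2**20 bytes).
WRITE_RECORDS_PER_SEC = 1_000
WRITE_BYTES_PER_SEC = 1 * 2**20
READ_BYTES_PER_SEC = 2 * 2**20

def min_shards(records_per_sec, write_bytes_per_sec, read_bytes_per_sec=0):
    """Smallest shard count that keeps an evenly spread workload under each per-shard limit.

    read_bytes_per_sec is the total read rate across all shared-throughput consumers.
    """
    needed = (
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(write_bytes_per_sec / WRITE_BYTES_PER_SEC),
        math.ceil(read_bytes_per_sec / READ_BYTES_PER_SEC),
    )
    return max(1, *needed)

# Hypothetical workload: 2,500 records/s of about 800 bytes each, read by two consumers.
write_rate = 2_500 * 800
print(min_shards(2_500, write_rate, 2 * write_rate))  # 3 (the record-count limit dominates)
```

If the throttling counts in the table stay above zero even though the stream has at least this many shards, uneven partition keys (hot shards) are a likely cause.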
Kinesis Video Stream
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of AWS Kinesis Video Stream entities in the Metrics Explorer, filter the |
AWS.KinesisVideo.ArchivedFragmentsConsumed.Media
|
Count | The number of fragment media quota points that were consumed by all of the APIs. |
AWS.KinesisVideo.ArchivedFragmentsConsumed.Metadata
|
Count | The number of fragment metadata quota points that were consumed by all of the APIs. |
AWS.KinesisVideo.GetClip.Latency
|
milliseconds (ms) | The latency of the GetClip API calls. |
AWS.KinesisVideo.GetClip.Outgoingbytes
|
bytes | The total number of bytes sent out from the service as part of the GetClip API. |
AWS.KinesisVideo.GetClip.Requests
|
Count | The number of GetClip API requests. |
AWS.KinesisVideo.GetClip.Success
|
Count | The number of successful GetClip API requests. |
AWS.KinesisVideo.GetDASHManifest.Latency
|
milliseconds (ms) | The latency of the GetDASHManifest API calls. |
AWS.KinesisVideo.GetDASHManifest.Requests
|
Count | The number of GetDASHManifest API requests. |
AWS.KinesisVideo.GetDASHManifest.Success
|
Count | The number of successful GetDASHManifest API requests. |
AWS.KinesisVideo.GetDASHStreamingSessionURL.Latency
|
milliseconds (ms) | The latency of the GetDASHStreamingSessionURL API calls. |
AWS.KinesisVideo.GetDASHStreamingSessionURL.Requests
|
Count | The number of GetDASHStreamingSessionURL API requests. |
AWS.KinesisVideo.GetDASHStreamingSessionURL.Success
|
Count | The number of successful GetDASHStreamingSessionURL API requests. |
AWS.KinesisVideo.GetHLSMasterPlaylist.Latency
|
milliseconds (ms) | The latency of the GetHLSMasterPlaylist API calls. |
AWS.KinesisVideo.GetHLSMasterPlaylist.Requests
|
Count | The number of GetHLSMasterPlaylist API requests. |
AWS.KinesisVideo.GetHLSMasterPlaylist.Success
|
Count | The number of successful GetHLSMasterPlaylist API requests. |
AWS.KinesisVideo.GetHLSMediaPlaylist.Latency
|
milliseconds (ms) | The latency of the GetHLSMediaPlaylist API calls. |
AWS.KinesisVideo.GetHLSMediaPlaylist.Requests
|
Count | The number of GetHLSMediaPlaylist API requests. |
AWS.KinesisVideo.GetHLSMediaPlaylist.Success
|
Count | The number of successful GetHLSMediaPlaylist API requests. |
AWS.KinesisVideo.GetHLSStreamingSessionURL.Latency
|
milliseconds (ms) | The latency of the GetHLSStreamingSessionURL API calls. |
AWS.KinesisVideo.GetHLSStreamingSessionURL.Requests
|
Count | The number of GetHLSStreamingSessionURL API requests. |
AWS.KinesisVideo.GetHLSStreamingSessionURL.Success
|
Count | The number of successful GetHLSStreamingSessionURL API requests. |
AWS.KinesisVideo.GetMedia.ConnectionErrors
|
Count | The number of connections that were not successfully established. |
AWS.KinesisVideo.GetMedia.MillisBehindNow
|
milliseconds (ms) | The time difference between the current server timestamp and the server timestamp of the last fragment sent. |
AWS.KinesisVideo.GetMedia.Outgoingbytes
|
bytes | The total number of bytes sent out from the service as part of the GetMedia API for a given stream. |
AWS.KinesisVideo.GetMedia.OutgoingFragments
|
Count | The number of fragments sent while doing GetMedia for the stream. |
AWS.KinesisVideo.GetMedia.OutgoingFrames
|
Count | The number of frames sent during GetMedia on the given stream. |
AWS.KinesisVideo.GetMedia.Requests
|
Count | The number of GetMedia API requests for a given stream. |
AWS.KinesisVideo.GetMedia.Success
|
Count | The number of connections that were successfully established. |
AWS.KinesisVideo.GetMediaForFragmentList.Outgoingbytes
|
bytes | The total number of bytes sent out from the service as part of the GetMediaForFragmentList API for a given stream. |
AWS.KinesisVideo.GetMediaForFragmentList.OutgoingFragments
|
Count | The total number of fragments sent out from the service as part of the GetMediaForFragmentList API. |
AWS.KinesisVideo.GetMediaForFragmentList.OutgoingFrames
|
Count | The total number of frames sent out from the service as part of the GetMediaForFragmentList API. |
AWS.KinesisVideo.GetMediaForFragmentList.Requests
|
Count | The number of GetMediaForFragmentList API requests for a given stream. |
AWS.KinesisVideo.GetMediaForFragmentList.Success
|
Count | The number of successful GetMediaForFragmentList API requests for a given stream. |
AWS.KinesisVideo.GetMP4InitFragment.Latency
|
milliseconds (ms) | The latency of the GetMP4InitFragment API calls. |
AWS.KinesisVideo.GetMP4InitFragment.Requests
|
Count | The number of GetMP4InitFragment API requests. |
AWS.KinesisVideo.GetMP4InitFragment.Success
|
Count | The number of successful GetMP4InitFragment API requests. |
AWS.KinesisVideo.GetMP4MediaFragment.Latency
|
milliseconds (ms) | The latency of the GetMP4MediaFragment API calls. |
AWS.KinesisVideo.GetMP4MediaFragment.Outgoingbytes
|
bytes | The total number of bytes sent out from the service as part of the GetMP4MediaFragment API. |
AWS.KinesisVideo.GetMP4MediaFragment.Requests
|
Count | The number of GetMP4MediaFragment API requests. |
AWS.KinesisVideo.GetMP4MediaFragment.Success
|
Count | The number of successful GetMP4MediaFragment API requests. |
AWS.KinesisVideo.GetTSFragment.Latency
|
milliseconds (ms) | The latency of the GetTSFragment API calls. |
AWS.KinesisVideo.GetTSFragment.Outgoingbytes
|
bytes | The total number of bytes sent out from the service as part of the GetTSFragment API. |
AWS.KinesisVideo.GetTSFragment.Requests
|
Count | The number of GetTSFragment API requests. |
AWS.KinesisVideo.GetTSFragment.Success
|
Count | The number of successful GetTSFragment API requests. |
AWS.KinesisVideo.ListFragments.Latency
|
milliseconds (ms) | The latency of the ListFragments API calls. |
AWS.KinesisVideo.ListFragments.Requests
|
Count | The number of ListFragments API requests. |
AWS.KinesisVideo.ListFragments.Success
|
Count | The number of successful ListFragments API requests. |
AWS.KinesisVideo.PutMedia.ActiveConnections
|
Count | The total number of connections to the service host. |
AWS.KinesisVideo.PutMedia.BufferingAckLatency
|
milliseconds (ms) | The time difference between when the first byte of a new fragment is received by Amazon Kinesis Video Streams and when the Buffering ACK is sent for the fragment. |
AWS.KinesisVideo.PutMedia.ConnectionErrors
|
Count | The number of errors that occurred while establishing a PutMedia connection for the stream. |
AWS.KinesisVideo.PutMedia.ErrorAckCount
|
Count | The number of Error ACKs sent while doing PutMedia for the stream. |
AWS.KinesisVideo.PutMedia.FragmentIngestionLatency
|
milliseconds (ms) | The time difference between when the first and last bytes of a fragment are received by Amazon Kinesis Video Streams. |
AWS.KinesisVideo.PutMedia.FragmentPersistLatency
|
milliseconds (ms) | The time between when the complete fragment data is received and when the fragment is archived. |
AWS.KinesisVideo.PutMedia.Incomingbytes
|
bytes | The number of bytes received as part of PutMedia for the stream. |
AWS.KinesisVideo.PutMedia.IncomingFragments
|
Count | The number of complete fragments received as part of PutMedia for the stream. |
AWS.KinesisVideo.PutMedia.IncomingFrames
|
Count | The number of complete frames received as part of PutMedia for the stream. |
AWS.KinesisVideo.PutMedia.Latency
|
milliseconds (ms) | The time difference between the request and the HTTP response from InletService while establishing the connection. |
AWS.KinesisVideo.PutMedia.PersistedAckLatency
|
milliseconds (ms) | The time difference between when the last byte of a new fragment is received by Amazon Kinesis Video Streams and when the Persisted ACK is sent for the fragment. |
AWS.KinesisVideo.PutMedia.ReceivedAckLatency
|
milliseconds (ms) | The time difference between when the last byte of a new fragment is received by Amazon Kinesis Video Streams and when the Received ACK is sent for the fragment. |
AWS.KinesisVideo.PutMedia.Requests
|
Count | The number of PutMedia API requests for a given stream. |
AWS.KinesisVideo.PutMedia.Success
|
Count | The number of successful PutMedia operations for the stream. |
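The four PutMedia ACK and ingestion latency metrics differ only in which two events they measure between. As a minimal sketch, assuming you have per-fragment timestamps in milliseconds (the `FragmentTimeline` class and its field names are hypothetical, not part of any AWS or SolarWinds API), the latencies can be derived like this:

```python
from dataclasses import dataclass


@dataclass
class FragmentTimeline:
    """Hypothetical timestamps (ms) observed for one PutMedia fragment."""
    first_byte_received_ms: float
    last_byte_received_ms: float
    buffering_ack_sent_ms: float
    received_ack_sent_ms: float
    persisted_ack_sent_ms: float


def putmedia_latencies(t: FragmentTimeline) -> dict:
    """Derive the per-fragment PutMedia latencies from the event timestamps."""
    return {
        # First byte received -> last byte received
        "FragmentIngestionLatency": t.last_byte_received_ms - t.first_byte_received_ms,
        # First byte received -> Buffering ACK sent
        "BufferingAckLatency": t.buffering_ack_sent_ms - t.first_byte_received_ms,
        # Last byte received -> Received ACK sent
        "ReceivedAckLatency": t.received_ack_sent_ms - t.last_byte_received_ms,
        # Last byte received -> Persisted ACK sent
        "PersistedAckLatency": t.persisted_ack_sent_ms - t.last_byte_received_ms,
    }


if __name__ == "__main__":
    sample = FragmentTimeline(0.0, 2000.0, 50.0, 2030.0, 2400.0)
    print(putmedia_latencies(sample))
```

Note that BufferingAckLatency starts from the first byte of the fragment, while ReceivedAckLatency and PersistedAckLatency start from the last byte, so a long FragmentIngestionLatency does not inflate the latter two.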
Lambda
Metric | Units | Description |
---|---|---|
AWS.Lambda.AsyncEventsAge
|
milliseconds (ms) | The age of asynchronous events that are being processed. It helps in understanding the latency of event processing in the Lambda function. |
AWS.Lambda.AsyncEventsDropped
|
Count | The number of asynchronous events that were dropped due to errors or exceeded retries. It helps in identifying issues with event processing and potential data loss. |
AWS.Lambda.AsyncEventsReceived
|
Count | The number of asynchronous events received by the Lambda function. It provides insights into the workload and the volume of events being processed. |
AWS.Lambda.ClaimedAccountConcurrency
|
Count | The number of concurrent executions claimed by the Lambda function. It helps in monitoring the utilization of the function's reserved concurrency and overall account concurrency. |
AWS.Lambda.ConcurrentExecutions
|
Count |
ConcurrentExecutions. The maximum number of function instances that are processing events. |
AWS.Lambda.DeadLetterErrors
|
Count |
DeadLetterErrors. The total number of times that Lambda attempts to send an event to a dead-letter queue but fails. Dead-letter errors can occur due to permissions errors, misconfigured resources, or size limits. |
AWS.Lambda.DestinationDeliveryFailures
|
Count | The number of times an asynchronous invocation's result could not be delivered to its destination due to issues such as permission errors or unreachable endpoints. |
AWS.Lambda.Duration
|
milliseconds (ms) |
Duration. The average amount of time that your function code spends processing an event. |
AWS.Lambda.ErrorRate
|
Count |
The rate of errors that occurred while invoking the Lambda function. It helps in understanding the reliability and stability of the function's execution. |
AWS.Lambda.Errors
|
Count |
Errors. The total number of invocations that result in a function error. |
AWS.Lambda.Invocations
|
Count |
Invocations. The total number of times that a function code is invoked, including successful invocations and invocations that result in a function error. |
AWS.Lambda.IteratorAge
|
milliseconds (ms) |
IteratorAge. The maximum amount of time between when a stream receives the record and when the event source mapping sends the event to the function. |
AWS.Lambda.OffsetLag
|
milliseconds (ms) | The difference between the last record processed and the latest record available in the event source. It helps in identifying whether the function is keeping up with the incoming data. |
AWS.Lambda.OversizedRecordCount
|
Count | The number of records that exceed the maximum allowable size for processing. It helps in monitoring and managing large records that might need special handling. |
AWS.Lambda.PostRuntimeExtensionsDuration
|
milliseconds (ms) |
The time spent running post-runtime extensions after the function execution completes. It helps in understanding the additional overhead introduced by extensions. |
AWS.Lambda.ProvisionedConcurrentExecutions
|
Count | The number of concurrent executions that are provisioned for the Lambda function. It helps in ensuring that the function has enough capacity to handle incoming requests. |
AWS.Lambda.ProvisionedConcurrencyInvocations
|
Count | The number of invocations that use provisioned concurrency. It helps you understand how many function executions are benefiting from pre-allocated compute capacity. |
AWS.Lambda.ProvisionedConcurrencySpilloverInvocations
|
Count | The number of invocations that exceeded the provisioned concurrency and used on-demand capacity instead. It provides insights into how often your function is going beyond its reserved capacity. |
AWS.Lambda.ProvisionedConcurrencyUtilization
|
Percent (%) | The percentage of provisioned concurrency that is being utilized. It helps you monitor the efficiency of your provisioned resources. |
AWS.Lambda.RecursiveInvocationsDropped
|
Count | The number of recursive invocations that were dropped to prevent infinite loops or excessive recursion. It's useful for ensuring stability and avoiding resource exhaustion. |
AWS.Lambda.ThrottleRate
|
Percent (%) |
The rate of throttled invocations due to reaching concurrency limits. It helps in identifying capacity issues and optimizing function performance. |
AWS.Lambda.Throttles
|
Count |
Throttles. The total number of invocation requests that are throttled. When all function instances are processing requests and no concurrency is available to scale up, Lambda rejects additional requests with a TooManyRequestsException error. |
AWS.Lambda.UnreservedConcurrentExecutions
|
Count |
The number of concurrent executions that are not using reserved concurrency. It helps in monitoring the utilization of the function's overall concurrency capacity. |
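If you want a ratio alongside the raw Lambda counts, for example to reason about an alert threshold, you can divide one count by another. The sketch below assumes an error percentage of Errors / Invocations and a spillover share of ProvisionedConcurrencySpilloverInvocations / (ProvisionedConcurrencyInvocations + ProvisionedConcurrencySpilloverInvocations). These formulas are illustrations only; SolarWinds Observability SaaS may compute AWS.Lambda.ErrorRate and AWS.Lambda.ThrottleRate differently, and the function names are hypothetical.

```python
def ratio_percent(part: float, whole: float) -> float:
    """Return part / whole as a percentage; 0.0 when there is nothing to divide by."""
    if whole <= 0:
        return 0.0
    return 100.0 * part / whole


def error_percent(errors: float, invocations: float) -> float:
    """Share of invocations that ended in a function error (assumed formula)."""
    return ratio_percent(errors, invocations)


def spillover_percent(provisioned_invocations: float, spillover_invocations: float) -> float:
    """Share of invocations that ran on on-demand capacity instead of provisioned concurrency."""
    total = provisioned_invocations + spillover_invocations
    return ratio_percent(spillover_invocations, total)


if __name__ == "__main__":
    print(error_percent(errors=5, invocations=200))
    print(spillover_percent(provisioned_invocations=90, spillover_invocations=10))
```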
Managed Apache Flink
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Managed Apache Flink entities in the Metrics Explorer, filter the |
AWS.KinesisAnalysis.uptime
|
milliseconds (ms) | The time that the job has been running without interruption. |
AWS.KinesisAnalysis.lastCheckpointSize
|
bytes | The total size of the last checkpoint. |
AWS.KinesisAnalysis.lastCheckpointDuration
|
milliseconds (ms) | The time it took to complete the last checkpoint. |
AWS.KinesisAnalysis.cpuUtilization
|
Percent (%) | Overall percentage of CPU utilization across task managers. |
AWS.KinesisAnalysis.containerCPUUtilization
|
Percent (%) | Overall percentage of CPU utilization across task manager containers in the Flink application cluster. |
AWS.KinesisAnalysis.containerMemoryUtilization
|
Percent (%) | Overall percentage of memory utilization across task manager containers in the Flink application cluster. |
AWS.KinesisAnalysis.containerDiskUtilization
|
Percent (%) | Overall percentage of disk utilization across task manager containers in the Flink application cluster. |
AWS.KinesisAnalysis.heapMemoryUtilization
|
Percent (%) | Overall heap memory utilization across task managers. |
AWS.KinesisAnalysis.downtime
|
milliseconds (ms) | For jobs currently in a failing/recovering situation, the time elapsed during this outage. |
AWS.KinesisAnalysis.fullRestarts
|
Count | The total number of times this job has fully restarted since it was submitted. |
AWS.KinesisAnalysis.managedMemoryUtilization
|
Percent (%) | Derived as managedMemoryUsed / managedMemoryTotal. |
AWS.KinesisAnalysis.numRecordsInPerSecond
|
Count per second | The total number of records this application, operator or task has received per second. |
AWS.KinesisAnalysis.numRecordsOutPerSecond
|
Count per second | The total number of records this application, operator or task has emitted per second. |
AWS.KinesisAnalysis.threadcount
|
Count | The total number of live threads used by the application. |
AWS.KinesisAnalysis.backPressuredTimeMsPerSecond
|
milliseconds (ms) | The time this task or operator is back pressured per second. |
AWS.KinesisAnalysis.busyTimeMsPerSecond
|
milliseconds (ms) | The time this task or operator is busy (neither idle nor back pressured) per second. |
AWS.KinesisAnalysis.currentInputWatermark
|
milliseconds (ms) | The last watermark this application/operator/task/thread has received. |
AWS.KinesisAnalysis.currentOutputWatermark
|
milliseconds (ms) | The last watermark this application/operator/task/thread has emitted. |
AWS.KinesisAnalysis.idleTimeMsPerSecond
|
milliseconds (ms) | The time this task or operator is idle per second. |
AWS.KinesisAnalysis.managedMemoryUsed
|
bytes | The amount of managed memory currently used. |
AWS.KinesisAnalysis.managedMemoryTotal
|
bytes | The total amount of managed memory. |
AWS.KinesisAnalysis.numberOfFailedCheckpoints
|
Count | The number of times checkpointing has failed. |
AWS.KinesisAnalysis.numRecordsIn
|
Count | The total number of records this application, operator, or task has received. |
AWS.KinesisAnalysis.numRecordsOut
|
Count | The total number of records this application, operator or task has emitted. |
AWS.KinesisAnalysis.numLateRecordsDropped
|
Count | The number of records that were dropped because they arrived late and were beyond the processing window. |
AWS.KinesisAnalysis.oldGenerationGCcount
|
Count | The number of times the old generation garbage collection has occurred. |
AWS.KinesisAnalysis.oldGenerationGCTime
|
milliseconds (ms) | The total time spent on old generation garbage collection. |
AWS.KinesisAnalysis.millisBehindLatest
|
milliseconds (ms) | Indicates how many milliseconds behind the latest data the application is. |
AWS.KinesisAnalysis.bytesRequestedPerFetch
|
bytes | The number of bytes requested per fetch operation from the data stream. |
AWS.KinesisAnalysis.currentoffsets
|
Count | The current offsets of the data being processed in a Kinesis Data Analytics application. |
AWS.KinesisAnalysis.commitsFailed
|
Count | The number of failed commit attempts in the application. |
AWS.KinesisAnalysis.commitsSucceeded
|
Count | The number of successful commit operations. |
AWS.KinesisAnalysis.committedoffsets
|
Count | The number of offsets that have been successfully committed. |
AWS.KinesisAnalysis.records_lag_max
|
Count | The maximum lag, in number of records, for any partition being consumed. |
AWS.KinesisAnalysis.bytes_consumed_rate
|
bytes per second | The rate at which data is consumed from the Kinesis stream. |
AWS.KinesisAnalysis.zeppelinCpuUtilization
|
Percent (%) | The percentage of CPU resources being used by the Zeppelin server. |
AWS.KinesisAnalysis.zeppelinHeapMemoryUtilization
|
Percent (%) | The percentage of heap memory utilized by the Zeppelin server. |
AWS.KinesisAnalysis.zeppelinThreadcount
|
Count | The number of active threads being used by the Zeppelin server. |
AWS.KinesisAnalysis.zeppelinWaitingJobs
|
Count | The number of jobs waiting to be executed in the Zeppelin server. |
AWS.KinesisAnalysis.zeppelinServerUptime
|
seconds (s) | The uptime of the Zeppelin server, indicating how long it has been running continuously. |
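The managedMemoryUtilization derivation, and the relationship between busy, idle, and back-pressured time, can be checked with a small calculation. This is a sketch that assumes the utilization is expressed as a percentage and that busy, idle, and back-pressured time together make up the 1000 ms of each second, as the "neither idle nor back pressured" definition of busyTimeMsPerSecond suggests. The function names are hypothetical.

```python
def managed_memory_utilization(used_bytes: float, total_bytes: float) -> float:
    """managedMemoryUsed / managedMemoryTotal, expressed as a percentage."""
    if total_bytes <= 0:
        raise ValueError("managedMemoryTotal must be greater than zero")
    return 100.0 * used_bytes / total_bytes


def busy_time_ms_per_second(idle_ms: float, back_pressured_ms: float) -> float:
    """Busy time as the remainder of a 1000 ms second (assumption), clamped at zero."""
    return max(1000.0 - idle_ms - back_pressured_ms, 0.0)


if __name__ == "__main__":
    print(managed_memory_utilization(used_bytes=256, total_bytes=1024))
    print(busy_time_ms_per_second(idle_ms=600, back_pressured_ms=150))
```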
NAT Gateway
Metric | Units | Description |
---|---|---|
AWS.NATGateway.ActiveConnectionCount
|
Count |
ActiveConnectionCount. The maximum number of concurrent active TCP connections through the NAT gateway. |
AWS.NATGateway.Bandwidth
|
bps |
The total network bandwidth used by the NAT gateway. |
AWS.NATGateway.BytesInFromDestination
|
bytes |
BytesInFromDestination. The total number of bytes received by the NAT gateway from the destination. |
AWS.NATGateway.BytesInFromSource
|
bytes |
BytesInFromSource. The total number of bytes received by the NAT gateway from clients in VPC. |
AWS.NATGateway.BytesOutToDestination
|
bytes |
BytesOutToDestination. The total number of bytes sent out through the NAT gateway to the destination. |
AWS.NATGateway.BytesOutToSource
|
bytes |
BytesOutToSource. The total number of bytes sent through the NAT gateway to the clients in VPC. |
AWS.NATGateway.ConnectionAttemptCount
|
Count |
ConnectionAttemptCount. The total number of connection attempts made through the NAT gateway. |
AWS.NATGateway.ConnectionEstablishedCount
|
Count |
ConnectionEstablishedCount. The total number of connections established through the NAT gateway. |
AWS.NATGateway.ConnectionEstablishedPercent
|
Percent (%) |
The percentage of connection attempts that successfully establish a connection through the NAT gateway. |
AWS.NATGateway.ErrorPortAllocation
|
Count |
ErrorPortAllocation. The total number of times the NAT gateway could not allocate a source port. |
AWS.NATGateway.IdleTimeoutCount
|
Count |
IdleTimeoutCount. The total number of connections that transitioned from the active state to the idle state. |
AWS.NATGateway.PacketsDropCount
|
Count |
PacketsDropCount. The total number of packets dropped by the NAT gateway. |
AWS.NATGateway.PacketsInFromDestination
|
Count |
PacketsInFromDestination. The total number of packets received by the NAT gateway from the destination. |
AWS.NATGateway.PacketsInFromSource
|
Count |
PacketsInFromSource. The total number of packets received by the NAT gateway from clients in VPC. |
AWS.NATGateway.PacketsOutToDestination
|
Count |
PacketsOutToDestination. The total number of packets sent out through the NAT gateway to the destination. |
AWS.NATGateway.PacketsOutToSource
|
Count |
PacketsOutToSource. The total number of packets sent through the NAT gateway to the clients in VPC. |
AWS.NATGateway.PeakBytesPerSecond
|
bytes per second | The peak rate of bytes transferred per second through the NAT gateway. |
AWS.NATGateway.PeakPacketsPerSecond
|
Count per second | The peak rate of packets transferred per second through the NAT gateway. |
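AWS.NATGateway.ConnectionEstablishedPercent relates the ConnectionEstablishedCount and ConnectionAttemptCount counters. A minimal sketch, assuming the percentage is ConnectionEstablishedCount divided by ConnectionAttemptCount over the same period (the function name is hypothetical):

```python
from typing import Optional


def connection_established_percent(established: int, attempts: int) -> Optional[float]:
    """ConnectionEstablishedCount / ConnectionAttemptCount as a percentage.

    Returns None when there were no attempts, so an idle gateway is not
    mistaken for one where every connection attempt failed.
    """
    if attempts <= 0:
        return None
    return 100.0 * established / attempts


if __name__ == "__main__":
    print(connection_established_percent(established=390, attempts=400))
```

Returning None instead of 0 for a period with no attempts keeps an alert on a low percentage from firing when the gateway simply had no traffic.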
Neptune
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Neptune entities in the Metrics Explorer, filter the |
AWS.Neptune.BackupRetentionPeriodStorageUsed
|
bytes | The total amount of backup storage used to support the Neptune DB cluster's backup retention window. Included in the total reported by the TotalBackupStorageBilled metric. |
AWS.Neptune.BufferCacheHitRatio
|
Percent (%) | The percentage of requests that are served by the buffer cache. |
AWS.Neptune.ClusterReplicaLag
|
milliseconds (ms) | For a read replica, the amount of lag when replicating updates from the primary instance. |
AWS.Neptune.ClusterReplicaLagMaximum
|
milliseconds (ms) | The maximum amount of lag between the primary instance and each Neptune DB instance in the DB cluster. |
AWS.Neptune.ClusterReplicaLagMinimum
|
milliseconds (ms) | The minimum amount of lag between the primary instance and each Neptune DB instance in the DB cluster. |
AWS.Neptune.CPUUtilization
|
Percent (%) | The percentage of CPU utilization. |
AWS.Neptune.EngineUptime
|
seconds (s) | The amount of time that the instance has been running. |
AWS.Neptune.FreeableMemory
|
bytes | The amount of available random access memory. |
AWS.Neptune.GlobalDbDataTransferBytes
|
bytes | The number of bytes of redo log data transferred from the primary AWS Region to a secondary AWS Region in a Neptune global database. |
AWS.Neptune.GlobalDbProgressLag
|
milliseconds (ms) | The number of milliseconds that a secondary cluster is behind the primary cluster for both user transactions and system transactions. |
AWS.Neptune.GlobalDbReplicatedWriteIO
|
Count | The number of write I/O operations replicated from the primary AWS Region in the global database to the cluster volume in a secondary AWS Region. |
AWS.Neptune.GremlinRequestsPerSec
|
Count per second | Number of requests per second to the Gremlin engine. |
AWS.Neptune.GremlinWebSocketOpenConnections
|
Count | The number of open WebSocket connections to Neptune. |
AWS.Neptune.LoaderRequestsPerSec
|
Count per second | Number of loader requests per second. |
AWS.Neptune.MainRequestQueuePendingRequests
|
Count | The number of requests waiting in the input queue pending execution. Neptune starts throttling requests when they exceed the maximum queue capacity. |
AWS.Neptune.NCUUtilization
|
Percent (%) | At a cluster level, NCUUtilization reports the percentage of maximum capacity being used by the cluster as a whole. |
AWS.Neptune.NetworkThroughput
|
bytes per second | The amount of network throughput both received from and transmitted to clients by each instance in the Neptune DB cluster, in bytes per second. This throughput does not include network traffic between instances in the DB cluster and the cluster volume. |
AWS.Neptune.NetworkTransmitThroughput
|
bytes per second | The amount of outgoing network throughput transmitted to clients by each instance in the Neptune DB cluster, in bytes per second. This throughput does not include network traffic between instances in the DB cluster and the cluster volume. |
AWS.Neptune.NumTxCommitted
|
Count per second | The number of transactions successfully committed per second. |
AWS.Neptune.NumTxOpened
|
Count per second | The number of transactions opened on the server per second. |
AWS.Neptune.NumTxRolledBack
|
Count per second | For write queries, the number of transactions per second rolled back on the server because of errors. For read-only queries, this metric is equal to the number of completed read-only transactions per second. |
AWS.Neptune.OpenCypherBoltOpenConnections
|
Count | The number of open Bolt connections to Neptune. |
AWS.Neptune.OpenCypherRequestsPerSec
|
Count per second | Number of requests per second (both HTTPS and Bolt) to the openCypher engine. |
AWS.Neptune.ServerlessDatabaseCapacity
|
Count | As an instance-level metric, ServerlessDatabaseCapacity reports the current instance capacity of a given Neptune serverless instance, in NCUs. At the cluster level, ServerlessDatabaseCapacity reports the average of the ServerlessDatabaseCapacity values of all DB instances in the cluster. |
AWS.Neptune.SnapshotStorageUsed
|
bytes | The total amount of backup storage consumed by all snapshots for a Neptune DB cluster outside its backup retention window, in bytes. Included in the total reported by the TotalBackupStorageBilled metric. |
AWS.Neptune.SparqlRequestsPerSec
|
Count per second | The number of requests per second to the SPARQL engine. |
AWS.Neptune.StatsNumStatementsScanned
|
Count | The total number of statements scanned for DFE statistics since the server started. |
AWS.Neptune.TotalBackupStorageBilled
|
bytes | The total amount of backup storage for which you are billed for a given Neptune DB cluster, in bytes. Includes the backup storage measured by the BackupRetentionPeriodStorageUsed and SnapshotStorageUsed metrics. |
AWS.Neptune.TotalClientErrorsPerSec
|
Count per second | The total number per second of requests that errored out because of client-side issues. |
AWS.Neptune.TotalRequestsPerSec
|
Count per second | The total number of requests per second to the server from all sources. |
AWS.Neptune.TotalServerErrorsPerSec
|
Count per second | The total number per second of requests that errored out on the server because of internal failures. |
AWS.Neptune.UndoLogListSize
|
Count | The count of undo logs in the undo log list. |
AWS.Neptune.VolumeBytesUsed
|
bytes | The total amount of storage allocated to your Neptune DB cluster. |
AWS.Neptune.VolumeReadIOPs
|
Count | The average number of billed read I/O operations from a cluster volume. Billed read operations are calculated at the cluster volume level, aggregated from all instances in the Neptune DB cluster, and reported at 5-minute intervals. |
AWS.Neptune.VolumeWriteIOPs
|
Count | The average number of write disk I/O operations to the cluster volume, reported at 5-minute intervals. |
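Several Neptune metrics are cluster-level aggregates of instance-level values. ClusterReplicaLagMinimum and ClusterReplicaLagMaximum are the smallest and largest replica lag, and the cluster-level ServerlessDatabaseCapacity is the average of the instance capacities. A sketch of those aggregations, using hypothetical input dictionaries keyed by instance name and hypothetical function names:

```python
from statistics import mean


def replica_lag_bounds(lag_ms_by_instance: dict) -> tuple:
    """Return (minimum, maximum) replica lag in ms across the given instances."""
    if not lag_ms_by_instance:
        raise ValueError("need at least one replica lag sample")
    lags = list(lag_ms_by_instance.values())
    return min(lags), max(lags)


def cluster_serverless_capacity(ncu_by_instance: dict) -> float:
    """Cluster-level ServerlessDatabaseCapacity: average of instance-level NCUs."""
    return mean(list(ncu_by_instance.values()))


if __name__ == "__main__":
    print(replica_lag_bounds({"replica-1": 12.0, "replica-2": 40.0, "replica-3": 25.0}))
    print(cluster_serverless_capacity({"writer": 2.0, "reader-1": 4.0, "reader-2": 3.0}))
```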
NetworkELB
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Network ELB entities in the Metrics Explorer, filter the |
AWS.NetworkELB.ActiveFlowCount
|
Count |
The total number of concurrent flows (or connections) from clients to targets. This metric includes connections in the SYN_SENT and ESTABLISHED states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow. |
AWS.NetworkELB.ActiveFlowCount_TCP
|
Count |
The total number of concurrent TCP flows (or connections) from clients to targets. This metric includes connections in the SYN_SENT and ESTABLISHED states. TCP connections are not terminated at the load balancer, so a client opening a TCP connection to a target counts as a single flow. |
AWS.NetworkELB.ActiveFlowCount_TLS
|
Count | The total number of concurrent TLS flows (or connections) from clients to targets. This metric includes connections in the SYN_SENT and ESTABLISHED states. |
AWS.NetworkELB.ActiveFlowCount_UDP
|
Count | The total number of concurrent UDP flows (or connections) from clients to targets. |
AWS.NetworkELB.ClientTLSNegotiationErrorCount
|
Count | The total number of TLS handshakes that failed during negotiation between a client and a TLS listener. |
AWS.NetworkELB.ConsumedLCUs
|
Count |
The number of load balancer capacity units (LCU) used by your load balancer. You pay for the number of LCUs that you use per hour. For more information, see Elastic Load Balancing Pricing. |
AWS.NetworkELB.ConsumedLCUs_TCP
|
Count |
The number of load balancer capacity units (LCU) used by your load balancer for TCP. You pay for the number of LCUs that you use per hour. For more information, see Elastic Load Balancing Pricing. |
AWS.NetworkELB.ConsumedLCUs_TLS
|
Count | The number of load balancer capacity units (LCU) used by your load balancer for TLS. You pay for the number of LCUs that you use per hour. For more information, see Elastic Load Balancing Pricing. |
AWS.NetworkELB.ConsumedLCUs_UDP
|
Count | The number of load balancer capacity units (LCU) used by your load balancer for UDP. You pay for the number of LCUs that you use per hour. For more information, see Elastic Load Balancing Pricing. |
AWS.NetworkELB.HealthyHostCount
|
Count |
The number of targets that are considered healthy. This metric does not include any Application Load Balancers registered as targets. |
AWS.NetworkELB.NewFlowCount
|
Count |
The total number of new flows (or connections) established from clients to targets in the time period. |
AWS.NetworkELB.NewFlowCount_TCP
|
Count |
The total number of new TCP flows (or connections) established from clients to targets in the time period. |
AWS.NetworkELB.NewFlowCount_TLS
|
Count | The total number of new TLS flows (or connections) established from clients to targets in the time period. |
AWS.NetworkELB.NewFlowCount_UDP
|
Count | The total number of new UDP flows (or connections) established from clients to targets in the time period. |
AWS.NetworkELB.PeakPacketsPerSecond
|
Count per second |
Highest average packet rate (packets processed per second), calculated every 10 seconds during the sampling window. This metric includes health check traffic. |
AWS.NetworkELB.PortAllocationErrorCount
|
Count |
The total number of ephemeral port allocation errors during a client IP translation operation. A non-zero value indicates dropped client connections. Network Load Balancers support 55,000 simultaneous connections or about 55,000 connections per minute to each unique target (IP address and port) when performing client address translation. To fix port allocation errors, add more targets to the target group. |
AWS.NetworkELB.ProcessedBytes
|
bytes |
The total number of bytes processed by the load balancer, including TCP/IP headers. This count includes traffic to and from targets, minus health check traffic. |
AWS.NetworkELB.ProcessedBytes_TCP
|
bytes |
The total number of bytes processed by TCP listeners. |
AWS.NetworkELB.ProcessedBytes_TLS
|
bytes | The total number of bytes processed by TLS listeners. |
AWS.NetworkELB.ProcessedBytes_UDP
|
bytes | The total number of bytes processed by UDP listeners. |
AWS.NetworkELB.ProcessedPackets
|
Count |
The total number of packets processed by the load balancer. This count includes traffic to and from targets, including health check traffic. |
AWS.NetworkELB.RejectedFlowCount
|
Count |
The number of network flows rejected by the Network Load Balancer due to security group rules or other policies. |
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Inbound_ICMP
|
Count | The number of new ICMP messages rejected by the inbound rules of the load balancer security groups. |
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Inbound_TCP
|
Count |
The number of new TCP flows rejected by the inbound rules of the load balancer security groups. |
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Inbound_UDP
|
Count | The number of new UDP flows rejected by the inbound rules of the load balancer security groups. |
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Outbound_ICMP
|
Count | The number of new ICMP messages rejected by the outbound rules of the load balancer security groups. |
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Outbound_TCP
|
Count | The number of new TCP flows rejected by the outbound rules of the load balancer security groups. |
AWS.NetworkELB.SecurityGroupBlockedFlowCount_Outbound_UDP
|
Count | The number of new UDP flows rejected by the outbound rules of the load balancer security groups. |
AWS.NetworkELB.TargetTLSNegotiationErrorCount
|
Count | The total number of TLS handshakes that failed during negotiation between a TLS listener and a target. |
AWS.NetworkELB.TCP_Client_Reset_Count
|
Count |
The total number of reset (RST) packets sent from a client to a target. These resets are generated by the client and forwarded by the load balancer. |
AWS.NetworkELB.TCP_ELB_Reset_Count
|
Count |
The total number of reset (RST) packets generated by the load balancer. For more information, see Troubleshooting. |
AWS.NetworkELB.TCP_Target_Reset_Count
|
Count |
The total number of reset (RST) packets sent from a target to a client. These resets are generated by the target and forwarded by the load balancer. |
AWS.NetworkELB.UnHealthyHostCount
|
Count | The number of targets that are considered unhealthy. This metric does not include any Application Load Balancers registered as targets. Reporting criteria: Reported if health checks are enabled. |
AWS.NetworkELB.UnhealthyRoutingFlowCount
|
Count | The number of flows (or connections) that are routed using the routing failover action (fail open). |
AWS.NetworkELB.ZonalHealthStatus
|
Status Indicator | Represents the health status of a Network Load Balancer in each Availability Zone, helping to identify failover events and potential issues. |
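The PortAllocationErrorCount guidance (about 55,000 connections to each unique target, and adding targets to fix port allocation errors) suggests a rough sizing rule. As a back-of-the-envelope sketch only, assuming connections are spread evenly across targets, the minimum number of targets for a given connection count or per-minute rate is estimated below. The function and constant names are hypothetical.

```python
import math

# Per unique target (IP address and port) when performing client address
# translation, as stated in the PortAllocationErrorCount description.
CONNECTIONS_PER_TARGET = 55_000


def min_targets_for_translation(connections: int) -> int:
    """Smallest target count that keeps each target at or under the limit."""
    return max(1, math.ceil(connections / CONNECTIONS_PER_TARGET))


if __name__ == "__main__":
    print(min_targets_for_translation(120_000))
```

Because traffic is rarely spread perfectly evenly, treat the result as a lower bound rather than a recommended target count.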
OpenSearch Collection
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of OpenSearch Collection entities in the Metrics Explorer, filter the |
AWS.AOSS.2xx
|
Count | The number of 2XX HTTP status code responses. These indicate successful requests. |
AWS.AOSS.3xx
|
Count | The number of 3XX HTTP status code responses. These indicate redirections. |
AWS.AOSS.4xx
|
Count | The number of 4XX HTTP status code responses. These indicate client errors, such as bad requests or unauthorized access. |
AWS.AOSS.5xx
|
Count | The number of 5XX HTTP status code responses. These indicate server errors, such as server overload or server-side issues. |
AWS.AOSS.ActiveCollection
|
Count |
Indicates whether a collection is active. A value of 1 means the collection is in an ACTIVE state. This metric is emitted upon successful creation of a collection and remains 1 until the collection is deleted. |
AWS.AOSS.DeletedDocuments
|
Count | The total number of documents that have been deleted from the collection. This metric increases after delete requests are processed and decreases after index segments are merged within the cluster. |
AWS.AOSS.HotStorageUsed
|
bytes | The amount of storage used for hot data, which is data that is frequently accessed and needs to be readily available. |
AWS.AOSS.IndexingOCU
|
Count | The number of OpenSearch Compute Units (OCUs) used to ingest collection data. This metric applies at the account level and helps monitor the compute resources used for indexing. |
AWS.AOSS.IngestionDataRate
|
Gibibytes per second (GiB/s) |
The indexing rate in GiB per second to a collection or index. This metric only applies to bulk indexing requests and helps track the data ingestion speed. |
AWS.AOSS.IngestionDocumentErrors
|
Count |
The number of document ingestion errors that occur while indexing data. This metric helps in identifying issues during the data ingestion process. |
AWS.AOSS.IngestionDocumentRate
|
Count per second |
The rate per second at which documents are being ingested to a collection or index. This metric applies to bulk indexing requests. |
AWS.AOSS.IngestionRequestErrors
|
Count |
The total number of bulk indexing request errors to a collection. This metric is emitted when a bulk indexing request fails for any reason, such as an authentication or availability issue. |
AWS.AOSS.IngestionRequestLatency
|
seconds (s) | The time it takes for ingestion requests to be processed and completed. This metric measures the latency from the start to the end of the ingestion request. |
AWS.AOSS.IngestionRequestRate
|
Count per second |
The rate at which ingestion requests are being made to a collection or index, in requests per second. |
AWS.AOSS.IngestionRequestSuccess
|
Count |
The number of successful ingestion requests to a collection or index. This metric counts the requests that were successfully processed without errors. |
AWS.AOSS.SearchableDocuments
|
Count | The total number of searchable documents in the collection. |
AWS.AOSS.SearchOCU
|
Count | The number of OpenSearch Compute Units (OCUs) used for search operations. |
AWS.AOSS.SearchRequestErrors
|
Count per minute |
The total number of errors encountered during search requests. |
AWS.AOSS.SearchRequestLatency
|
milliseconds (ms) | The average latency (response time) for search requests. |
AWS.AOSS.SearchRequestRate
|
Count per minute |
The rate at which search requests are being made. |
AWS.AOSS.StorageUsedInS3
|
bytes | The amount of storage used in Amazon S3 for OpenSearch data. |
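The ingestion counters in this table can be combined into an error percentage, which is often a more useful alert condition than a raw error count. The following Python sketch assumes that, within one collection interval, every bulk ingestion request is counted exactly once, in either AWS.AOSS.IngestionRequestErrors or AWS.AOSS.IngestionRequestSuccess. The function name and sample values are hypothetical and are not part of SolarWinds Observability or AWS.

```python
from typing import Optional


def ingestion_error_pct(request_errors: float, request_success: float) -> Optional[float]:
    """Percentage of bulk ingestion requests that failed in one interval.

    Assumes each request is counted once, either in AWS.AOSS.IngestionRequestErrors
    or in AWS.AOSS.IngestionRequestSuccess (hypothetical helper, for illustration).
    """
    total = request_errors + request_success
    if total == 0:
        return None  # no ingestion traffic in this interval
    return 100.0 * request_errors / total


# Hypothetical sample: 3 failed and 297 successful requests in one interval.
print(ingestion_error_pct(request_errors=3, request_success=297))  # 1.0
```

Use the Sum aggregation of both metrics over the same interval so that the two counts line up.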
OpenSearch Domain
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of OpenSearch Domain entities in the Metrics Explorer, filter the |
AWS.ES.2xx
|
Count |
The number of successful responses from the OpenSearch Service. For example, a 200 status code indicates that the request was successfully processed. |
AWS.ES.3xx
|
Count |
The number of redirection responses. For example, a 301 status code means that the requested resource has been moved to a new URL. |
AWS.ES.4xx
|
Count |
The number of client error responses. For example, a 404 status code indicates that the requested resource was not found. |
AWS.ES.5xx
|
Count |
The number of server error responses. For example, a 500 status code means that the server encountered an unexpected condition that prevented it from fulfilling the request. |
AWS.ES.ADAnomalyDetectorsIndexStatus.red
|
Boolean | Indicates the status of the anomaly detectors index in Amazon OpenSearch Service. A red status means that at least one primary shard and its replicas are not allocated to a node. |
AWS.ES.ADAnomalyDetectorsIndexStatusIndexExists
|
Boolean | Checks if the anomaly detectors index exists in the OpenSearch Service cluster. |
AWS.ES.ADAnomalyResultsIndexStatus.red
|
Boolean | Indicates the status of the anomaly results index. A red status means that at least one primary shard and its replicas are not allocated to a node. |
AWS.ES.ADAnomalyResultsIndexStatusIndexExists
|
Boolean | Checks if the anomaly results index exists in the OpenSearch Service cluster. |
AWS.ES.ADExecuteFailureCount
|
Count | The number of failures when executing anomaly detection tasks. |
AWS.ES.ADExecuteRequestCount
|
Count |
The number of requests made to execute anomaly detection tasks. |
AWS.ES.ADHCExecuteFailureCount
|
Count | The number of failed anomaly detection requests for high-cardinality detectors. |
AWS.ES.ADHCExecuteRequestCount
|
Count | The number of anomaly detection requests for high-cardinality detectors. |
AWS.ES.ADModelsCheckpointIndexStatus.red
|
Boolean | The health status of the checkpoint index for anomaly detection models. A red status means that at least one primary shard and its replicas are not allocated to a node. |
AWS.ES.ADModelsCheckpointIndexStatusIndexExists
|
Boolean | Checks if the checkpoint index for anomaly detection models exists. |
AWS.ES.ADPluginUnhealthy
|
Boolean | Indicates the health status of the anomaly detection plugin. A status of "unhealthy" means the plugin is not functioning correctly. |
AWS.ES.AlertingDegraded
|
Boolean |
Indicates the health status of the alerting system. A status of "degraded" means the alerting system is not performing optimally. |
AWS.ES.AlertingIndexExists
|
Boolean |
Checks if the alerting index exists. |
AWS.ES.AlertingIndexStatus.green
|
Boolean |
Indicates the health status of the alerting index. A status of "green" means the index is healthy and functioning correctly. |
AWS.ES.AlertingIndexStatus.red
|
Boolean |
Indicates that the alerting index is in a red status, meaning that at least one primary shard and its replicas are not allocated to a node. |
AWS.ES.AlertingIndexStatus.yellow
|
Boolean |
Indicates that the alerting index is in a yellow status, meaning that at least one replica shard is not allocated to a node. |
AWS.ES.AlertingNodesNotOnSchedule
|
Count |
The number of alerting nodes that are not on schedule. |
AWS.ES.AlertingNodesOnSchedule
|
Count |
The number of alerting nodes that are on schedule. |
AWS.ES.AlertingScheduledJobEnabled
|
Boolean |
Indicates whether the scheduled job for alerting is enabled. |
AWS.ES.AsynchronousSearchCancelled
|
Count |
The number of asynchronous search requests that were canceled. |
AWS.ES.AsynchronousSearchCompletionRate
|
Percent (%) |
The rate at which asynchronous search requests are completed successfully. |
AWS.ES.AsynchronousSearchFailureRate
|
Percent (%) |
Indicates the rate at which asynchronous search requests fail. |
AWS.ES.AsynchronousSearchInitializedRate
|
Count |
The rate at which asynchronous search requests are initialized. |
AWS.ES.AsynchronousSearchPersistFailedRate
|
Percent (%) | The rate at which attempts to persist asynchronous search results fail. |
AWS.ES.AsynchronousSearchPersistRate
|
Percent (%) | The rate at which asynchronous search results are successfully persisted. |
AWS.ES.AsynchronousSearchRejected
|
Count | The number of asynchronous search requests that are rejected due to various reasons, such as exceeding resource limits. |
AWS.ES.AsynchronousSearchRunningCurrent
|
Count | The current number of asynchronous search requests that are running. |
AWS.ES.AsynchronousSearchSubmissionRate
|
Count | The rate at which asynchronous search requests are submitted. |
AWS.ES.AsyncQueryCancelApiFailedRequestCusErrCount
|
Count | The number of failed asynchronous query cancel API requests due to customer errors. |
AWS.ES.AsyncQueryCancelApiFailedRequestSysErrCount
|
Count | The number of failed asynchronous query cancel API requests due to system errors. |
AWS.ES.AsyncQueryCancelApiRequestCount
|
Count | The total number of asynchronous query cancel API requests. |
AWS.ES.AsyncQueryCreateApiFailedRequestCusErrCount
|
Count | The number of failed asynchronous query create API requests due to customer errors. |
AWS.ES.AsyncQueryCreateApiFailedRequestSysErrCount
|
Count | The number of failed asynchronous query create API requests due to system errors. |
AWS.ES.AsyncQueryCreateApiRequestCount
|
Count | The total number of asynchronous query create API requests. |
AWS.ES.AsyncQueryGetApiFailedRequestCusErrCount
|
Count | The number of failed asynchronous query get API requests due to customer errors. |
AWS.ES.AsyncQueryGetApiFailedRequestSysErrCount
|
Count | The number of failed asynchronous query get API requests due to system errors. |
AWS.ES.AsyncQueryGetApiRequestCount
|
Count | The total number of asynchronous query get API requests. |
AWS.ES.AutomatedSnapshotFailure
|
Count | The number of automated snapshot failures in the OpenSearch Service. It helps in identifying issues with the automated snapshot process, such as network problems, insufficient storage, or high CPU utilization. |
AWS.ES.AvgPointInTimeAliveTime
|
milliseconds (ms) | Average alive time for point-in-time search requests. |
AWS.ES.BurstBalance
|
Percent (%) | The percentage of input/output (I/O) credits remaining in the burst bucket for the EBS volume. |
AWS.ES.ClusterIndexWritesBlocked
|
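Count | Indicates whether the cluster is accepting or blocking incoming write requests. A value of 0 means the cluster is accepting writes; a value of 1 means it is blocking them. |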
AWS.ES.ClusterStatus.green
|
Status Indicator | Indicates if the cluster status is green. |
AWS.ES.ClusterStatus.red
|
Status Indicator | Indicates if the cluster status is red. |
AWS.ES.ClusterStatus.yellow
|
Status Indicator | Indicates if the cluster status is yellow. |
AWS.ES.ClusterUsedSpace
|
Bytes | The total space used by the cluster. |
AWS.ES.ColdStorageSpaceUtilization
|
Percent (%) | The percentage of cold storage space utilized. |
AWS.ES.ColdToWarmMigrationFailureCount
|
Count | The number of cold to warm migration failures. |
AWS.ES.ColdToWarmMigrationLatency
|
milliseconds (ms) | The latency of cold to warm migrations. |
AWS.ES.ColdToWarmMigrationQueueSize
|
Count | The size of the cold to warm migration queue. |
AWS.ES.ColdToWarmMigrationSuccessCount
|
Count | The number of successful cold to warm migrations. |
AWS.ES.ConcurrentSearchLatency
|
milliseconds (ms) | The latency of concurrent search requests. |
AWS.ES.ConcurrentSearchRate
|
Count per second | The rate of concurrent search requests. |
AWS.ES.CoordinatingWriteRejected
|
Count | The number of coordinating write requests rejected. |
AWS.ES.CPUCreditBalance
|
Credits | The remaining CPU credit balance. |
AWS.ES.CPUUtilization
|
Percent (%) | The percentage of CPU utilization. |
AWS.ES.DeletedDocuments
|
Count | The total number of deleted documents. |
AWS.ES.DiskQueueDepth
|
Count | The number of pending input/output (I/O) requests for the EBS volume. |
AWS.ES.ESReportingFailedRequestSysErrCount
|
Count | The number of failed OpenSearch reporting requests due to system errors. |
AWS.ES.ESReportingFailedRequestUserErrCount
|
Count | The number of failed OpenSearch reporting requests due to user errors. |
AWS.ES.ESReportingRequestCount
|
Count | The total number of OpenSearch reporting requests. |
AWS.ES.ESReportingSuccessCount
|
Count | The number of successful OpenSearch reporting requests. |
AWS.ES.FreeStorageSpace
|
bytes | The amount of free storage space available. |
AWS.ES.HasActivePointInTime
|
Boolean | Indicates if there is an active point-in-time search. |
AWS.ES.HasUsedPointInTime
|
Boolean | Indicates if a point-in-time search has been used. |
AWS.ES.HotStorageSpaceUtilization
|
Percent (%) | The percentage of hot storage space utilized. |
AWS.ES.HotToWarmMigrationFailureCount
|
Count | The number of hot to warm migration failures. |
AWS.ES.HotToWarmMigrationForceMergeLatency
|
milliseconds (ms) | The latency of force merge operations during hot to warm migrations. |
AWS.ES.HotToWarmMigrationProcessingLatency
|
milliseconds (ms) | The processing latency of hot to warm migrations. |
AWS.ES.HotToWarmMigrationQueueSize
|
Count | Number of migration tasks from hot to warm storage currently in the queue. |
AWS.ES.HotToWarmMigrationSnapshotLatency
|
milliseconds (ms) | Time taken to create a snapshot during hot to warm migration. |
AWS.ES.HotToWarmMigrationSuccessCount
|
Count | Total number of successful hot to warm migrations. |
AWS.ES.HotToWarmMigrationSuccessLatency
|
milliseconds (ms) | Time taken for a successful hot to warm migration. |
AWS.ES.IndexingLatency
|
milliseconds (ms) | Time taken to index documents. |
AWS.ES.IndexingRate
|
Count per second | Number of documents indexed per second. |
AWS.ES.InFlightFetches
|
Count | Number of fetch operations currently in progress. |
AWS.ES.InvalidHostHeaderRequests
|
Count | Number of requests with an invalid host header. |
AWS.ES.IopsThrottle
|
Count | Number of IO operations throttled due to exceeding provisioned IOPS limits. |
AWS.ES.JVMGCOldCollectionCount
|
Count | Number of old generation garbage collection events in the JVM. |
AWS.ES.JVMGCOldCollectionTime
|
milliseconds (ms) | Time spent in old generation garbage collection in the JVM. |
AWS.ES.JVMGCYoungCollectionCount
|
Count | Number of young generation garbage collection events in the JVM. |
AWS.ES.JVMGCYoungCollectionTime
|
milliseconds (ms) | Time spent in young generation garbage collection in the JVM. |
AWS.ES.JVMMemoryPressure
|
Percent (%) | JVM memory pressure expressed as a percentage of the total available memory. |
AWS.ES.KMSKeyError
|
Count | Number of errors encountered while accessing KMS keys. |
AWS.ES.KMSKeyInaccessible
|
Count | Number of times KMS keys were found to be inaccessible. |
AWS.ES.KNNCacheCapacityReached
|
Count | Number of times the KNN cache capacity was reached. |
AWS.ES.KNNCircuitBreakerTriggered
|
Count | Number of times the KNN circuit breaker was triggered. |
AWS.ES.KNNEvictionCount
|
Count | Number of evictions from the KNN cache. |
AWS.ES.KNNFaissInitialized
|
Count | Number of times FAISS index was initialized for KNN. |
AWS.ES.KNNGraphIndexErrors
|
Count | Number of errors encountered while building KNN graph indexes. |
AWS.ES.KNNGraphIndexRequests
|
Count | Number of requests for building KNN graph indexes. |
AWS.ES.KNNGraphMemoryUsage
|
bytes | Memory usage of the KNN graph indexes. |
AWS.ES.KNNGraphMemoryUsagePercentage
|
Percent (%) | Memory usage of the KNN graph indexes expressed as a percentage of the total available memory. |
AWS.ES.KNNGraphQueryErrors
|
Count | Number of errors encountered during KNN graph queries. |
AWS.ES.KNNGraphQueryRequests
|
Count | Number of KNN graph queries made. |
AWS.ES.KNNHitCount
|
Count | The number of k-NN (k-Nearest Neighbors) graph cache hits, that is, queries against a graph that is already loaded into memory. |
AWS.ES.KNNLoadExceptionCount
|
Count | The number of exceptions encountered while loading k-NN graphs into the cache. |
AWS.ES.KNNLoadSuccessCount
|
Count | The number of times k-NN graphs were successfully loaded into the cache. |
AWS.ES.KNNLuceneInitialized
|
Boolean | Indicates whether the Lucene engine for k-NN is initialized. |
AWS.ES.KNNMissCount
|
Count | The number of k-NN graph cache misses, that is, queries against a graph that is not yet loaded into memory. |
AWS.ES.KNNNmslibInitialized
|
Boolean | Indicates whether the NMSLIB engine for k-NN is initialized. |
AWS.ES.KNNQueryRequests
|
Count | The number of k-NN (k-Nearest Neighbors) query requests. |
AWS.ES.KNNScriptCompilationErrors
|
Count | The number of errors encountered during the compilation of k-NN scripts. |
AWS.ES.KNNScriptCompilations
|
Count | The number of k-NN script compilations. |
AWS.ES.KNNScriptQueryErrors
|
Count | The number of errors encountered during k-NN script queries. |
AWS.ES.KNNScriptQueryRequests
|
Count | The number of k-NN script query requests. |
AWS.ES.KNNTotalLoadTime
|
milliseconds (ms) | The total time taken to load k-NN models. |
AWS.ES.KNNTrainingErrors
|
Count | The number of errors encountered during k-NN model training. |
AWS.ES.KNNTrainingMemoryUsage
|
bytes | The memory usage during k-NN (k-Nearest Neighbors) model training. |
AWS.ES.KNNTrainingMemoryUsagePercentage
|
Percent (%) | The percentage of memory used during k-NN model training. |
AWS.ES.KNNTrainingRequests
|
Count | The number of k-NN model training requests. |
AWS.ES.LTRFeatureMemoryUsageInBytes
|
bytes | The memory usage of Learning to Rank (LTR) features. |
AWS.ES.LTRFeaturesetMemoryUsageInBytes
|
bytes | The memory usage of Learning to Rank (LTR) feature sets. |
AWS.ES.LTRModelMemoryUsageInBytes
|
bytes | The memory usage of Learning to Rank (LTR) models. |
AWS.ES.LTRPluginUnhealthy
|
Boolean | Indicates whether the Learning to Rank (LTR) plugin is unhealthy. |
AWS.ES.LTRRequestErrorCount
|
Count | The number of errors encountered during Learning to Rank (LTR) requests. |
AWS.ES.LTRRequestTotalCount
|
Count | The total number of Learning to Rank (LTR) requests. |
AWS.ES.LTRStatus.red
|
Boolean | The health status of the Learning to Rank (LTR) plugin. A red status means that at least one primary shard and its replicas are not allocated to a node. |
AWS.ES.MasterCPUCreditBalance
|
Count | The remaining CPU credits available to the dedicated master nodes in the cluster. |
AWS.ES.MasterCPUUtilization
|
Percent (%) | The maximum percentage of CPU resources used by the dedicated master nodes. |
AWS.ES.MasterJVMMemoryPressure
|
Percent (%) | The maximum percentage of the Java heap used by the dedicated master nodes in the cluster. |
AWS.ES.MasterOldGenJVMMemoryPressure
|
Percent (%) | The memory pressure in the old generation memory pool of the Java heap on the dedicated master nodes. |
AWS.ES.MasterReachableFromNode
|
Boolean | Indicates whether the master node is reachable from other nodes in the cluster. |
AWS.ES.MasterSysMemoryUtilization
|
Percent (%) | The percentage of system memory utilization on the master node. |
AWS.ES.MaxProvisionedThroughput
|
Megabytes per second | The maximum provisioned throughput for the cluster. |
AWS.ES.MlCircuitBreakerTriggerCount
|
Count | The number of times the machine learning circuit breaker has been triggered. |
AWS.ES.MLCommonsPluginUnhealthy
|
Boolean | Indicates whether the ML Commons plugin is unhealthy. |
AWS.ES.MlConnectorCount
|
Count | The number of machine learning connectors. |
AWS.ES.MlConnectorIndexStatus.red
|
Boolean | The health status of the machine learning connector index. A red status means that at least one primary shard and its replicas are not allocated to a node. |
AWS.ES.MlConnectorIndexStatusIndexExists
|
Boolean | Indicates whether the machine learning connector index exists. |
AWS.ES.MlDeployedModelCount
|
Count | The number of deployed machine learning models. |
AWS.ES.MlExecutingTaskCount
|
Count | The number of executing machine learning tasks. |
AWS.ES.MlFailureCount
|
Count | The number of failures encountered during machine learning tasks. |
AWS.ES.MlModelCount
|
Count | The total number of machine learning models. |
AWS.ES.MlModelIndexStatus.red
|
Boolean | The health status of the machine learning model index. A red status means that at least one primary shard and its replicas are not allocated to a node. |
AWS.ES.MlModelIndexStatusIndexExists
|
Boolean | Indicates whether the machine learning model index exists. |
AWS.ES.MlRequestCount
|
Count | The number of machine learning requests. |
AWS.ES.MlTaskIndexStatus.red
|
Boolean | The health status of the machine learning task index. A red status means that at least one primary shard and its replicas are not allocated to a node. |
AWS.ES.MlTaskIndexStatusIndexExists
|
Boolean | Indicates whether the machine learning task index exists. |
AWS.ES.Nodes
|
Count | The total number of nodes in the OpenSearch Service cluster. |
AWS.ES.OldGenJVMMemoryPressure
|
Percent (%) | The memory pressure in the old generation memory pool of the Java heap. |
AWS.ES.OpenSearchDashboardsConcurrentConnections
|
Count | The number of concurrent connections to OpenSearch Dashboards. |
AWS.ES.OpenSearchDashboardsHealthyNodes
|
Count | The number of healthy OpenSearch Dashboards nodes. |
AWS.ES.OpenSearchDashboardsHeapTotal
|
bytes | The total heap memory allocated for OpenSearch Dashboards. |
AWS.ES.OpenSearchDashboardsHeapUsed
|
bytes | The amount of heap memory currently used by OpenSearch Dashboards. |
AWS.ES.OpenSearchDashboardsHeapUtilization
|
Percent (%) | The percentage of heap memory used out of the total allocated heap memory for OpenSearch Dashboards. |
AWS.ES.OpenSearchDashboardsIndexMigrationFailed
|
Count | The number of failed index migrations in OpenSearch Dashboards. |
AWS.ES.OpenSearchDashboardsOS1MinuteLoad
|
Count | The 1-minute load average on the operating system running OpenSearch Dashboards. |
AWS.ES.OpensearchDashboardsReportingFailedRequestSysErrCount
|
Count | The number of failed reporting requests due to system errors in OpenSearch Dashboards. |
AWS.ES.OpensearchDashboardsReportingFailedRequestUserErrCount
|
Count | The number of failed reporting requests due to user errors in OpenSearch Dashboards. |
AWS.ES.OpensearchDashboardsReportingRequestCount
|
Count | The total number of reporting requests in OpenSearch Dashboards. |
AWS.ES.OpensearchDashboardsReportingSuccessCount
|
Count | The number of successful reporting requests in OpenSearch Dashboards. |
AWS.ES.OpenSearchDashboardsRequestTotal
|
Count | The total number of requests made to OpenSearch Dashboards. |
AWS.ES.OpenSearchDashboardsResponseTimesMaxInMillis
|
milliseconds (ms) | The maximum response time for requests to OpenSearch Dashboards. |
AWS.ES.OpenSearchRequests
|
Count | The total number of requests made to the OpenSearch cluster. |
AWS.ES.PPLFailedRequestCountByCusErr
|
Count | The number of failed Piped Processing Language (PPL) requests due to customer errors in OpenSearch. |
AWS.ES.PPLFailedRequestCountBySysErr
|
Count | The number of failed Piped Processing Language (PPL) requests due to system errors in OpenSearch. |
AWS.ES.PPLRequestCount
|
Count | The total number of Piped Processing Language (PPL) requests in OpenSearch. |
AWS.ES.PrimaryWriteRejected
|
Count | The number of primary write requests that were rejected in OpenSearch due to resource constraints. |
AWS.ES.ReadIOPS
|
Count per second | The average number of read input/output operations per second (IOPS) in OpenSearch. |
AWS.ES.ReadIOPSMicroBursting
|
Count per second | The number of read IOPS micro-bursting events in OpenSearch, indicating short periods of high read IOPS activity. |
AWS.ES.ReadLatency
|
milliseconds (ms) | The average time taken to complete read operations in OpenSearch. |
AWS.ES.ReadThroughput
|
Bytes per second | The average number of bytes read from disk per second in OpenSearch. |
AWS.ES.ReadThroughputMicroBursting
|
Count | The number of read throughput micro-bursting events in OpenSearch, indicating short periods of high read throughput activity. |
AWS.ES.RemoteStorageUsedSpace
|
bytes | The amount of space used in remote storage by OpenSearch. |
AWS.ES.RemoteStorageWriteRejected
|
Count | The number of write requests to remote storage that were rejected in OpenSearch due to resource constraints. |
AWS.ES.ReplicationNumBootstrappingIndices
|
Count | The number of indices in the bootstrapping phase during replication in OpenSearch. |
AWS.ES.ReplicationNumFailedIndices
|
Count | The number of indices that have failed during replication in OpenSearch. |
AWS.ES.ReplicationNumPausedIndices
|
Count | The number of indices that have paused during replication in OpenSearch. |
AWS.ES.ReplicationNumSyncingIndices
|
Count | The number of indices currently syncing during replication in OpenSearch. |
AWS.ES.ReplicaWriteRejected
|
Count | The number of replica write requests that were rejected in OpenSearch due to resource constraints. |
AWS.ES.SearchableDocuments
|
Count | The total number of documents that are searchable in the OpenSearch cluster. |
AWS.ES.SearchLatency
|
milliseconds (ms) | The average time taken to complete search operations in OpenSearch. |
AWS.ES.SearchRate
|
Count per second | The number of search requests per second in OpenSearch. |
AWS.ES.SearchShardTaskCancelled
|
Count | The number of search shard tasks that were cancelled in OpenSearch. |
AWS.ES.SearchTaskCancelled
|
Count | The number of search tasks that were cancelled in OpenSearch. |
AWS.ES.SegmentCount
|
Count | The total number of segments in the OpenSearch index. |
AWS.ES.Shards.active
|
Count | The total number of active primary and replica shards in the OpenSearch cluster. |
AWS.ES.Shards.activePrimary
|
Count | The total number of active primary shards in the OpenSearch cluster. |
AWS.ES.Shards.delayedUnassigned
|
Count | The number of shards whose node allocation has been delayed by the timeout settings in OpenSearch. |
AWS.ES.Shards.initializing
|
Count | The number of shards that are currently in the initializing state in OpenSearch. |
AWS.ES.Shards.relocating
|
Count | The number of shards that are currently being relocated to different nodes in OpenSearch. |
AWS.ES.Shards.unassigned
|
Count | The number of shards that are not allocated to any nodes in the OpenSearch cluster. |
AWS.ES.SQLDefaultCursorRequestCount
|
Count | The total number of SQL default cursor requests in OpenSearch. |
AWS.ES.SQLFailedRequestCountByCusErr
|
Count | The number of failed SQL requests due to customer errors in OpenSearch. |
AWS.ES.SQLFailedRequestCountBySysErr
|
Count | The number of failed SQL requests due to system errors in OpenSearch. |
AWS.ES.SQLRequestCount
|
Count | The total number of SQL requests in OpenSearch. |
AWS.ES.SQLUnhealthy
|
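Count | Indicates whether the SQL plugin is unhealthy. A value of 1 means that, in response to certain requests, the SQL plugin is returning 5xx response codes or passing invalid query DSL to OpenSearch. |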
AWS.ES.SysMemoryUtilization
|
Percent (%) | The percentage of system memory utilized by OpenSearch. |
AWS.ES.ThreadCount
|
Count | The total number of threads in use by OpenSearch. |
AWS.ES.ThreadpoolBulkQueue
|
Count | The number of bulk requests waiting in the queue in OpenSearch. |
AWS.ES.ThreadpoolBulkRejected
|
Count | The number of bulk requests that were rejected due to the thread pool being full in OpenSearch. |
AWS.ES.ThreadpoolBulkThreads
|
Count | The number of threads in the bulk thread pool in OpenSearch. |
AWS.ES.ThreadpoolForce_mergeQueue
|
Count | The number of force merge requests waiting in the queue in OpenSearch. |
AWS.ES.ThreadpoolForce_mergeRejected
|
Count | The number of force merge requests that were rejected due to the thread pool being full in OpenSearch. |
AWS.ES.ThreadpoolForce_mergeThreads
|
Count | The number of threads in the force merge thread pool in OpenSearch. |
AWS.ES.ThreadpoolIndexQueue
|
Count | The number of index requests waiting in the queue in OpenSearch. |
AWS.ES.ThreadpoolIndexRejected
|
Count | The number of index requests that were rejected due to the thread pool being full in OpenSearch. |
AWS.ES.ThreadpoolIndexSearcherQueue
|
Count | The number of index searcher requests waiting in the queue in OpenSearch. |
AWS.ES.ThreadpoolIndexSearcherRejected
|
Count | The number of index searcher requests that were rejected due to the thread pool being full in OpenSearch. |
AWS.ES.ThreadpoolIndexSearcherThreads
|
Count | The number of threads in the index searcher thread pool in OpenSearch. |
AWS.ES.ThreadpoolIndexThreads
|
Count | The number of threads in the index thread pool in OpenSearch. |
AWS.ES.ThreadpoolSearchQueue
|
Count | The number of search requests waiting in the queue in OpenSearch. |
AWS.ES.ThreadpoolSearchRejected
|
Count | The number of search requests that were rejected due to the thread pool being full in OpenSearch. |
AWS.ES.ThreadpoolSearchThreads
|
Count | The number of threads in the search thread pool in OpenSearch. |
AWS.ES.ThreadpoolsqlWorkerQueue
|
Count | The number of SQL worker requests waiting in the queue in OpenSearch. |
AWS.ES.ThreadpoolsqlWorkerRejected
|
Count | The number of SQL worker requests that were rejected due to the thread pool being full in OpenSearch. |
AWS.ES.ThreadpoolsqlWorkerThreads
|
Count | The number of threads in the SQL worker thread pool in OpenSearch. |
AWS.ES.ThreadpoolWriteQueue
|
Count | The number of write requests waiting in the queue in OpenSearch. |
AWS.ES.ThreadpoolWriteRejected
|
Count | The number of write requests that were rejected due to the thread pool being full in OpenSearch. |
AWS.ES.ThreadpoolWriteThreads
|
Count | The number of threads in the write thread pool in OpenSearch. |
AWS.ES.ThroughputThrottle
|
Count | The number of times throughput was throttled in OpenSearch. |
AWS.ES.TLSNegotiationError
|
Count | The number of TLS negotiation errors encountered in OpenSearch. |
AWS.ES.WarmCPUUtilization
|
Percent (%) | The percentage of CPU utilization for warm nodes in OpenSearch. |
AWS.ES.WarmFreeStorageSpace
|
bytes | The amount of free storage space available in warm nodes in OpenSearch. |
AWS.ES.WarmJVMGCOldCollectionCount
|
Count | The number of old generation garbage collection events in the JVM for warm nodes in OpenSearch. |
AWS.ES.WarmJVMGCYoungCollectionCount
|
Count | The number of young generation garbage collection events in the JVM for warm nodes in OpenSearch. |
AWS.ES.WarmJVMGCYoungCollectionTime
|
milliseconds (ms) | The total time spent on young generation garbage collection in the JVM for warm nodes in OpenSearch. |
AWS.ES.WarmJVMMemoryPressure
|
Percent (%) | The percentage of JVM memory pressure for warm nodes in OpenSearch, indicating the overall heap usage including young and old pools. |
AWS.ES.WarmNodes
|
Count | The number of warm nodes in the OpenSearch cluster. |
AWS.ES.WarmOldGenJVMMemoryPressure
|
Percent (%) | The percentage of old generation JVM memory pressure for warm nodes in OpenSearch, indicating the usage of the old generation memory pool. |
AWS.ES.WarmSearchLatency
|
milliseconds (ms) | The average time taken to complete search operations in warm nodes of OpenSearch. |
AWS.ES.WarmSearchRate
|
Count per second | The number of search requests per second in warm nodes of OpenSearch. |
AWS.ES.WarmSearchableDocuments
|
Count | The total number of documents that are searchable in warm nodes of OpenSearch. |
AWS.ES.WarmStorageSpaceUtilization
|
Percent (%) | The percentage of storage space utilized in warm nodes of OpenSearch. |
AWS.ES.WarmSysMemoryUtilization
|
Percent (%) | The percentage of system memory utilized by warm nodes in OpenSearch. |
AWS.ES.WarmThreadpoolSearchQueue
|
Count | The number of search requests waiting in the queue in warm nodes of OpenSearch. |
AWS.ES.WarmThreadpoolSearchRejected
|
Count | The number of search requests that were rejected due to the thread pool being full in warm nodes of OpenSearch. |
AWS.ES.WarmThreadpoolSearchThreads
|
Count | The number of threads in the search thread pool in warm nodes of OpenSearch. |
AWS.ES.WarmToColdMigrationFailureCount
|
Count | The number of failed migrations from warm to cold nodes in OpenSearch. |
AWS.ES.WarmToColdMigrationLatency
|
milliseconds (ms) | The average time taken to migrate data from warm to cold nodes in OpenSearch. |
AWS.ES.WarmToColdMigrationQueueSize
|
Count | The number of migration tasks from warm to cold nodes that are waiting in the queue in OpenSearch. |
AWS.ES.WarmToColdMigrationSuccessCount
|
Count | The number of successful migrations from warm to cold nodes in OpenSearch. |
AWS.ES.WarmToHotMigrationQueueSize
|
Count | The number of migration tasks from warm to hot nodes that are waiting in the queue in OpenSearch. |
AWS.ES.WriteIOPS
|
Count per second | The average number of write input/output operations per second (IOPS) in OpenSearch. |
AWS.ES.WriteIOPSMicroBursting
|
Count per second | The number of write IOPS micro-bursting events in OpenSearch, indicating short periods of high write IOPS activity. |
AWS.ES.WriteLatency
|
milliseconds (ms) | The average time taken to complete write operations in OpenSearch. |
AWS.ES.WriteThroughput
|
Bytes per second | The average number of bytes written to disk per second in OpenSearch. |
AWS.ES.WriteThroughputMicroBursting
|
bps | The number of write throughput micro-bursting events in OpenSearch, indicating short periods of high write throughput activity. |
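Several OpenSearch Domain metrics are most useful when combined. The Python sketch below shows a few hypothetical helpers: collapsing the three AWS.ES.ClusterStatus indicators into one label, computing the share of 5xx responses from the AWS.ES.2xx through AWS.ES.5xx counters, computing a k-NN cache hit percentage, and deriving the number of active replica shards. It assumes each status indicator is 1 when set and 0 otherwise, and that KNNHitCount and KNNMissCount track graph cache hits and misses in the same way as the k-NN plugin's hit_count and miss_count statistics. The function names and sample values are illustrative assumptions, not a SolarWinds Observability API.

```python
from typing import Optional


def cluster_status(green: float, yellow: float, red: float) -> str:
    """Collapse AWS.ES.ClusterStatus.{green,yellow,red} into one label, worst first."""
    if red:
        return "red"      # at least one primary shard is not allocated
    if yellow:
        return "yellow"   # primaries allocated, at least one replica is not
    if green:
        return "green"
    return "unknown"      # no indicator set, e.g. no data in the interval


def server_error_pct(c2xx: int, c3xx: int, c4xx: int, c5xx: int) -> Optional[float]:
    """Share of all responses in the interval that were 5xx (server) errors."""
    total = c2xx + c3xx + c4xx + c5xx
    if total == 0:
        return None  # no requests recorded
    return 100.0 * c5xx / total


def knn_cache_hit_pct(hit_count: int, miss_count: int) -> Optional[float]:
    """Share of k-NN graph cache lookups that were hits (assumed semantics)."""
    lookups = hit_count + miss_count
    if lookups == 0:
        return None
    return 100.0 * hit_count / lookups


def active_replica_shards(active: int, active_primary: int) -> int:
    """Shards.active counts primaries and replicas, so the difference is replicas."""
    return active - active_primary


# Hypothetical sample values for one collection interval.
print(cluster_status(green=0, yellow=1, red=0))               # yellow
print(server_error_pct(c2xx=950, c3xx=10, c4xx=30, c5xx=10))  # 1.0
print(knn_cache_hit_pct(hit_count=900, miss_count=100))       # 90.0
print(active_replica_shards(active=20, active_primary=10))    # 10
```

Checking the red indicator first means a cluster that briefly reports both yellow and red within one interval is treated as red, which is the safer choice for alerting.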
OpenSearch Ingestion Pipeline
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of OpenSearch Ingestion Pipeline entities in the Metrics Explorer, filter the |
AWS.OSIS.computeUnits
|
Count |
The number of Ingestion OpenSearch Compute Units (Ingestion OCUs) in use by a pipeline. |
AWS.OSIS.jvm.memory.committed.value
|
bytes |
The amount of memory that is committed for use by the Java virtual machine (JVM). |
AWS.OSIS.jvm.memory.max.value
|
bytes |
The maximum amount of memory that can be used for memory management. |
AWS.OSIS.jvm.memory.used.value
|
bytes |
The total amount of memory used. |
AWS.OSIS.log-pipeline.BlockingBuffer.bufferUsage.value
|
Percent (%) |
Percent usage of the |
AWS.OSIS.log-pipeline.BlockingBuffer.checkpointTimeElapsed.count
|
Count |
A count of data points recorded while checkpointing. |
AWS.OSIS.log-pipeline.BlockingBuffer.checkpointTimeElapsed.max
|
milliseconds (ms) | The maximum time elapsed while checkpointing. |
AWS.OSIS.log-pipeline.BlockingBuffer.checkpointTimeElapsed.sum
|
milliseconds (ms) |
The total time elapsed while checkpointing. |
AWS.OSIS.log-pipeline.BlockingBuffer.readTimeElapsed.count
|
Count |
A count of data points recorded while reading from a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.readTimeElapsed.max
|
milliseconds (ms) | The maximum time elapsed while reading from a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.readTimeElapsed.sum
|
milliseconds (ms) |
The total time elapsed while reading from a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.recordsInBuffer.value
|
Count |
The number of records currently in a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.recordsInFlight.value
|
Count |
The number of unchecked records read from a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.recordsRead.count
|
Count |
The number of records read from a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.recordsWriteFailed.count
|
Count |
The number of records that the pipeline failed to write to the sink. |
AWS.OSIS.log-pipeline.BlockingBuffer.recordsWritten.count
|
Count |
The number of records written to a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeElapsed.count
|
Count |
A count of data points recorded while writing to a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeElapsed.max
|
milliseconds (ms) | The maximum time elapsed while writing to a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeElapsed.sum
|
milliseconds (ms) |
The total time elapsed while writing to a buffer. |
AWS.OSIS.log-pipeline.BlockingBuffer.writeTimeouts.count
|
Count |
The count of write timeouts to a buffer. |
AWS.OSIS.log-pipeline.date.recordsIn.count
|
Count | The ingress of records to a pipeline component. |
AWS.OSIS.log-pipeline.date.recordsOut.count
|
Count | The egress of records from a pipeline component. |
AWS.OSIS.log-pipeline.date.timeElapsed.count
|
Count | A count of data points recorded during execution of a pipeline component. |
AWS.OSIS.log-pipeline.date.timeElapsed.max
|
milliseconds (ms) | The maximum time elapsed during execution of a pipeline component. |
AWS.OSIS.log-pipeline.date.timeElapsed.sum
|
milliseconds (ms) | The total time elapsed during execution of a pipeline component. |
AWS.OSIS.log-pipeline.http.AuthFailure.count
|
Count | The number of failed Signature V4 requests to the pipeline. |
AWS.OSIS.log-pipeline.http.AuthServerError.count
|
Count | The number of Signature V4 requests to the pipeline that returned server errors. |
AWS.OSIS.log-pipeline.http.AuthSuccess.count
|
Count | The number of successful Signature V4 requests to the pipeline. |
AWS.OSIS.log-pipeline.opensearch.recordsIn.count
|
Count | The ingress of records to a pipeline component. |
AWS.OSIS.log-pipeline.recordsProcessed.count
|
Count | The number of records read from a buffer and processed by a pipeline. |
AWS.OSIS.system.cpu.count.value
|
Count | The total amount of CPU usage for all data nodes. |
AWS.OSIS.system.cpu.usage.value
|
Percent (%) | The percentage of available CPU usage for all data nodes. |
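Several OpenSearch Ingestion metrics above come in .sum and .count pairs, such as checkpointTimeElapsed.sum and checkpointTimeElapsed.count. Dividing the sum by the count for the same time window gives the mean time per recorded data point. The following Python sketch assumes you have already exported both values for one window (retrieval is not shown); the function name is hypothetical.

```python
from typing import Optional


def average_elapsed_ms(sum_ms: float, count: float) -> Optional[float]:
    """Return the mean elapsed time per data point, or None if nothing was recorded.

    Hypothetical helper. Assumes:
      sum_ms: value of a *.timeElapsed.sum metric (milliseconds) for one window.
      count:  value of the matching *.timeElapsed.count metric for the same window.
    """
    if count <= 0:
        # No data points in the window, so an average is undefined.
        return None
    return sum_ms / count


# Hypothetical values for one window, for illustration only.
print(average_elapsed_ms(1200.0, 48))  # 25.0
```

Comparing this average with the matching .max metric can help show whether a few slow operations, rather than a general slowdown, are driving the total.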
RDS
Metric | Units | Description |
---|---|---|
AWS.RDS.AbortedClients
|
Count |
The number of connections that were aborted because the client died and didn't correctly close the connection. |
AWS.RDS.ActiveTransactions
|
Count |
The number of active transactions in the database. |
AWS.RDS.ACUUtilization
|
Percent (%) | The percentage of Aurora Capacity Units (ACUs) utilized by the Aurora Serverless v2 cluster. |
AWS.RDS.AuroraBinlogReplicaLag
|
seconds (s) | The amount of lag in seconds for binary log replication between the primary instance and the replica. |
AWS.RDS.AuroraEstimatedSharedMemoryBytes
|
bytes | The estimated amount of shared memory used by the Aurora MySQL database. |
AWS.RDS.AuroraGlobalDBDataTransferBytes
|
bytes | The amount of redo log data transferred from the source AWS Region to a secondary AWS Region in an Aurora Global Database. |
AWS.RDS.AuroraGlobalDBProgressLag
|
milliseconds (ms) | The measure of how far the secondary cluster is behind the primary cluster for both user transactions and system transactions in an Aurora Global Database. |
AWS.RDS.AuroraGlobalDBReplicatedWriteIO
|
Count | The number of write I/O operations replicated from the primary AWS Region to the cluster volume in a secondary AWS Region in an Aurora Global Database. |
AWS.RDS.AuroraGlobalDBRPOLag
|
milliseconds (ms) | The recovery point objective (RPO) lag time, measuring how far the secondary cluster is behind the primary cluster for user transactions in an Aurora Global Database. |
AWS.RDS.AuroraOptimizedReadsCacheHitRatio
|
Percent (%) | The percentage of read operations that are served from the cache in an Aurora database. |
AWS.RDS.AuroraReplicaLag
|
milliseconds (ms) | The amount of lag in milliseconds for replication between the primary instance and the Aurora replica. |
AWS.RDS.AuroraReplicaLagMaximum
|
milliseconds (ms) | The maximum amount of lag in milliseconds for replication between the primary instance and the Aurora replica. |
AWS.RDS.AuroraReplicaLagMinimum
|
milliseconds (ms) | The minimum amount of lag in milliseconds for replication between the primary instance and the Aurora replica. |
AWS.RDS.AuroraSlowConnectionHandleCount
|
Count | The number of slow connection handles in Aurora. |
AWS.RDS.AuroraSlowHandshakeCount
|
Count | The number of slow handshakes in Aurora. |
AWS.RDS.AuroraVolumeBytesLeftTotal
|
bytes | The remaining available space for the cluster volume in Aurora. |
AWS.RDS.BacktrackChangeRecordsCreationRate
|
Count per minute |
The number of backtrack change records created over a specified period for your Aurora DB cluster. |
AWS.RDS.BacktrackChangeRecordsStored
|
Count |
The actual number of backtrack change records stored by your Aurora DB cluster. |
AWS.RDS.BacktrackWindowActual
|
minutes |
The actual amount of time you can backtrack your Aurora DB cluster, which can be smaller than the target backtrack window. |
AWS.RDS.BacktrackWindowAlert
|
Count |
The number of times the actual backtrack window is smaller than the target backtrack window for a given period. |
AWS.RDS.BackupRetentionPeriodStorageUsed
|
bytes |
The amount of storage used by automated backups that are retained for the backup retention period. |
AWS.RDS.BinLogDiskUsage
|
bytes |
BinLogDiskUsage. The average amount of disk space occupied by binary logs. |
AWS.RDS.BlockedTransactions
|
Count |
The number of transactions that are blocked due to row-level locks in the database. |
AWS.RDS.BufferCacheHitRatio
|
Percent (%) |
The percentage of requests that are served from the buffer cache, indicating the efficiency of the cache. |
AWS.RDS.BurstBalance
|
Percent (%) |
BurstBalance. The average percent of General Purpose SSD (gp2) burst-bucket I/O credits available. |
AWS.RDS.CheckpointLag
|
seconds (s) | The amount of time since the most recent checkpoint. |
AWS.RDS.CommitLatency
|
microseconds |
The cumulative commit latency, measured as the time between when a client submits a commit request and when it receives the commit acknowledgment. |
AWS.RDS.CommitThroughput
|
Count per second |
The number of commit operations per second in the database. |
AWS.RDS.ConnectionAttempts
|
Count |
The number of attempts to connect to an instance, whether successful or not. |
AWS.RDS.CPUCreditBalance
|
Count |
CpuCreditBalance. The average number of earned CPU credits that an instance has accrued since it was launched or started. |
AWS.RDS.CPUCreditUsage
|
Count |
CpuCreditUsage. The average number of CPU credits spent by the instance for CPU utilization. |
AWS.RDS.CPUSurplusCreditBalance
|
Count |
The number of surplus CPU credits spent to sustain CPU utilization when the CPUCreditBalance value is zero. |
AWS.RDS.CPUSurplusCreditsCharged
|
Count |
The number of surplus CPU credits that exceed the maximum number of CPU credits that can be earned in a 24-hour period and therefore incur an additional charge. |
AWS.RDS.CPUUtilization
|
Percent (%) |
CpuUtilization. The average percentage of CPU utilization. |
AWS.RDS.DatabaseConnections
|
Count |
DatabaseConnections. The total number of client network connections to the database instance. |
AWS.RDS.DBLoad
|
Average Active Sessions (AAS) |
Measures the level of session activity in your database, representing the activity of the DB instance in average active sessions. |
AWS.RDS.DBLoadCPU
|
Average Active Sessions (AAS) |
The number of active sessions where the wait event type is CPU. |
AWS.RDS.DBLoadNonCPU
|
Average Active Sessions (AAS) |
The number of active sessions where the wait event type is not CPU. |
AWS.RDS.DDLLatency
|
milliseconds (ms) |
The average time taken to complete Data Definition Language (DDL) operations in the database. |
AWS.RDS.DDLThroughput
|
Count per second |
The number of Data Definition Language (DDL) operations per second in the database. |
AWS.RDS.Deadlocks
|
Count |
The number of deadlock events detected in the database. A deadlock occurs when two or more processes each hold a resource that another one needs, so each waits for the other to release it and none can move forward. |
AWS.RDS.DeleteLatency
|
milliseconds (ms) |
The average time taken to complete delete operations in the database. |
AWS.RDS.DeleteThroughput
|
Count per second |
The number of delete operations per second in the database. |
AWS.RDS.DiskQueueDepth
|
Count |
DiskQueueDepth. The average number of outstanding I/Os (read/write requests) waiting to access the disk. |
AWS.RDS.DiskQueueDepthLogVolume
|
Count | The number of input and output (I/O) requests that were submitted by the application but haven't been sent to the storage device yet. |
AWS.RDS.DMLLatency
|
milliseconds (ms) |
The average time taken to complete Data Manipulation Language (DML) operations in the database. |
AWS.RDS.DMLThroughput
|
Count per second |
The number of Data Manipulation Language (DML) operations per second in the database. |
AWS.RDS.EBSByteBalance
|
Percent (%) |
The percentage of throughput credits remaining in the burst bucket of your RDS database. |
AWS.RDS.EBSIOBalance
|
Percent (%) |
The percentage of I/O credits remaining in the burst bucket of your RDS database. |
AWS.RDS.EngineUptime
|
seconds (s) |
The number of seconds since the last time a DB instance was started. |
AWS.RDS.FailedSQLServerAgentJobsCount
|
Count |
The number of SQL Server Agent jobs that have failed. |
AWS.RDS.ForwardingMasterDMLLatency
|
milliseconds (ms) |
The average response time of forwarded DML statements on the master DB instance. |
AWS.RDS.ForwardingMasterDMLThroughput
|
Count per second |
The number of forwarded DML statements processed each second by the master DB instance. |
AWS.RDS.ForwardingMasterOpenSessions
|
Count |
The number of open sessions on the master DB instance processing forwarded queries. |
AWS.RDS.ForwardingReplicaDMLLatency
|
milliseconds (ms) |
The average response time in milliseconds of forwarded DML statements on the replica DB instance. |
AWS.RDS.ForwardingReplicaDMLThroughput
|
Count per second |
The number of forwarded DML (Data Manipulation Language) statements processed each second by the replica DB instance. |
AWS.RDS.ForwardingReplicaOpenSessions
|
Count |
The number of open sessions on the replica DB instance that are processing forwarded queries. |
AWS.RDS.ForwardingReplicaReadWaitLatency
|
milliseconds (ms) |
The average wait time in milliseconds that the replica waits to be consistent with the Log Sequence Number (LSN) of the writer DB instance. |
AWS.RDS.ForwardingReplicaReadWaitThroughput
|
Count per second |
The total number of SELECT statements processed each second in all sessions that are forwarding writes. |
AWS.RDS.ForwardingReplicaSelectLatency
|
milliseconds (ms) |
The average response time in milliseconds of forwarded SELECT statements on the replica. |
AWS.RDS.ForwardingReplicaSelectThroughput
|
Count per second |
The number of forwarded SELECT statements processed each second by the replica DB instance. |
AWS.RDS.ForwardingWriterDMLLatency
|
milliseconds (ms) |
The average time to process each forwarded DML statement on the writer DB instance. It doesn't include the time for the DB cluster to forward the write request or the time to replicate changes back to the writer. |
AWS.RDS.ForwardingWriterDMLThroughput
|
Count per second |
The number of forwarded DML statements processed each second by the writer DB instance. |
AWS.RDS.ForwardingWriterOpenSessions
|
Count |
The number of forwarded sessions on the writer DB instance. |
AWS.RDS.FreeEphemeralStorage
|
bytes | The amount of free ephemeral storage available in the RDS instance. |
AWS.RDS.FreeableMemory
|
bytes |
FreeableMemory. The average amount of available random access memory. |
AWS.RDS.FreeLocalStorage
|
bytes |
The amount of free local storage available in the RDS instance. |
AWS.RDS.FreeStorageSpace
|
bytes |
FreeStorageSpace. The average amount of available storage space. |
AWS.RDS.FreeStorageSpaceLogVolume
|
bytes | The amount of free storage space available in the log volume of the RDS instance. |
AWS.RDS.InsertLatency
|
milliseconds (ms) |
The average time taken to complete insert operations in the database. |
AWS.RDS.InsertThroughput
|
Count per second |
The number of insert operations per second in the database. |
AWS.RDS.LoginFailures
|
Count |
The number of failed login attempts to the database. |
AWS.RDS.LVMReadIOPS
|
Count per second |
The average number of read input/output operations per second (IOPS) for the logical volume manager (LVM) in the RDS instance. |
AWS.RDS.LVMWriteIOPS
|
Count per second |
The average number of write input/output operations per second (IOPS) for the logical volume manager (LVM) in the RDS instance. |
AWS.RDS.MaximumUsedTransactionIDs
|
Count |
MaximumUsedTransactionIDs. The maximum transaction IDs that have been used. This metric applies to PostgreSQL. |
AWS.RDS.NetworkReceiveThroughput
|
bps |
NetworkReceiveThroughput. The average incoming (receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. |
AWS.RDS.NetworkThroughput
|
bps |
The average number of bytes transmitted and received over the network per second in the RDS instance. |
AWS.RDS.NetworkTransmitThroughput
|
bps |
NetworkTransmitThroughput. The average outgoing (transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication. |
AWS.RDS.NumBinaryLogFiles
|
Count |
The number of binary log files on the RDS instance. |
AWS.RDS.OldestReplicationSlotLag
|
bytes |
OldestReplicationSlotLag. The average lagging size of the replica lagging the most in terms of write-ahead log (WAL) data received. This metric applies to PostgreSQL. |
AWS.RDS.Queries
|
Count |
The number of queries executed on the RDS instance. |
AWS.RDS.RDSToAuroraPostgreSQLReplicaLag
|
milliseconds (ms) |
The amount of lag in milliseconds for replication from an RDS for PostgreSQL DB instance to an Aurora PostgreSQL DB cluster. |
AWS.RDS.ReadIOPS
|
Count per second |
ReadIOPS. The average number of disk read I/O operations per second. |
AWS.RDS.ReadIOPSEphemeralStorage
|
Count per second | The average number of read input/output operations per second (IOPS) for ephemeral storage in the RDS instance. |
AWS.RDS.ReadIOPSLogVolume
|
Count per second | The average number of read input/output operations per second (IOPS) for the log volume in the RDS instance. |
AWS.RDS.ReadLatency
|
seconds (s) |
ReadLatency. The average amount of time taken per disk read I/O operation. |
AWS.RDS.ReadLatencyEphemeralStorage
|
milliseconds (ms) | The average time taken to complete read operations on ephemeral storage in the RDS instance. |
AWS.RDS.ReadLatencyLogVolume
|
milliseconds (ms) | The average time taken to complete read operations on the log volume in the RDS instance. |
AWS.RDS.ReadThroughput
|
bps |
ReadThroughput. The average number of bytes read from disk per second. |
AWS.RDS.ReadThroughputLogVolume
|
bps | The average number of bytes read from the log volume per second in the RDS instance. |
AWS.RDS.ReplicaLag
|
seconds (s) |
ReplicaLag. For read replica configurations, the average amount of time a read replica DB instance lags behind the source DB instance. |
AWS.RDS.ReplicationChannelLag
|
seconds (s) | The amount of lag in seconds for replication between the primary instance and the replica in the RDS instance. |
AWS.RDS.ReplicationSlotDiskUsage
|
bytes |
ReplicationSlotDiskUsage. The average disk space used by replication slot files. This metric applies to PostgreSQL. |
AWS.RDS.ResultSetCacheHitRatio
|
Percent (%) |
The percentage of read operations that are served from the result set cache in the RDS instance. |
AWS.RDS.RollbackSegmentHistoryListLength
|
Count |
The length of the undo log or rollback segment history list, which contains the before images of database records used during transaction rollbacks or to provide a consistent read view for long-running transactions. |
AWS.RDS.RowLockTime
|
milliseconds (ms) |
The average time spent waiting for row locks in the RDS instance. |
AWS.RDS.SelectLatency
|
milliseconds (ms) |
The average time taken to complete select operations in the database. |
AWS.RDS.SelectThroughput
|
Count per second |
The number of select operations per second in the database. |
AWS.RDS.ServerlessDatabaseCapacity
|
Aurora Capacity Units (ACUs) |
The capacity of the Aurora Serverless v2 database, measured in Aurora Capacity Units (ACUs). |
AWS.RDS.SnapshotStorageUsed
|
bytes |
The amount of storage used by snapshots in the RDS instance. |
AWS.RDS.StorageNetworkReceiveThroughput
|
bps |
The amount of network throughput received from the storage subsystem by the RDS instance. |
AWS.RDS.StorageNetworkThroughput
|
bps |
The total network throughput for storage operations in the RDS instance. |
AWS.RDS.StorageNetworkTransmitThroughput
|
bps |
The amount of network throughput sent to the storage subsystem by the RDS instance. |
AWS.RDS.SumBinaryLogSize
|
bytes |
The total size of all binary log files on the RDS instance. |
AWS.RDS.SwapUsage
|
bytes |
SwapUsage. The average amount of swap space used on the DB instance. This metric is not available for SQL Server. |
AWS.RDS.TempStorageIOPS
|
Count per second | The average number of input/output operations per second (IOPS) for temporary storage in the RDS instance. |
AWS.RDS.TempStorageThroughput
|
bps | The average number of bytes read from or written to temporary storage per second in the RDS instance. |
AWS.RDS.TotalBackupStorageBilled
|
bytes |
The total amount of backup storage billed for the RDS instance. |
AWS.RDS.TransactionLogsDiskUsage
|
bytes |
TransactionLogsDiskUsage. The average disk space used by transaction logs. This metric applies to PostgreSQL. |
AWS.RDS.TransactionLogsGeneration
|
bps |
TransactionLogsGeneration. The average size of transaction logs generated per second. This metric applies to PostgreSQL. |
AWS.RDS.UpdateLatency
|
milliseconds (ms) |
The average time taken to complete update operations in the database. |
AWS.RDS.UpdateThroughput
|
Count per second |
The number of update operations per second in the database. |
AWS.RDS.VolumeBytesUsed
|
bytes |
The amount of storage space used by the volume in the RDS instance. |
AWS.RDS.VolumeReadIOPs
|
Count per minute |
The number of read input/output (I/O) operations on the volume in the RDS instance. |
AWS.RDS.VolumeWriteIOPs
|
Count per minute |
The number of write input/output (I/O) operations on the volume in the RDS instance. |
AWS.RDS.WriteIOPS
|
Count per second |
WriteIOPS. The average number of disk write I/O operations per second. |
AWS.RDS.WriteIOPSEphemeralStorage
|
Count per second | The average number of write input/output operations per second (IOPS) for ephemeral storage in the RDS instance. |
AWS.RDS.WriteIOPSLogVolume
|
Count per second | The average number of write input/output operations per second (IOPS) for the log volume in the RDS instance. |
AWS.RDS.WriteLatency
|
seconds (s) |
WriteLatency. The average amount of time taken per disk write I/O operation. |
AWS.RDS.WriteLatencyEphemeralStorage
|
milliseconds (ms) | The average time taken to complete write operations on ephemeral storage in the RDS instance. |
AWS.RDS.WriteLatencyLogVolume
|
seconds (s) | The average time taken to complete write operations on the log volume in the RDS instance. |
AWS.RDS.WriteThroughput
|
bps |
WriteThroughput. The average number of bytes written to disk per second. |
AWS.RDS.WriteThroughputEphemeralStorage
|
bps | The average number of bytes written to ephemeral storage per second in the RDS instance. |
AWS.RDS.WriteThroughputLogVolume
|
bps | The average number of bytes written to the log volume per second in the RDS instance. |
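ReadThroughput and ReadIOPS (and their write counterparts) can be combined to estimate the average size of each disk I/O operation, which helps distinguish many small I/Os from fewer large ones. The Python sketch below assumes both values cover the same period and that bps in this table means bytes per second, as the descriptions state. It also converts ReadLatency and WriteLatency from seconds to milliseconds for readability. The function names are illustrative only and are not part of any SolarWinds or AWS API.

```python
from typing import Optional


def average_io_size_bytes(throughput_bytes_per_s: float, iops: float) -> Optional[float]:
    """Estimate mean bytes per I/O operation from a throughput/IOPS pair.

    Hypothetical helper. Pair ReadThroughput with ReadIOPS, or WriteThroughput
    with WriteIOPS, sampled over the same period.
    """
    if iops <= 0:
        # No I/O in the period, so the average size is undefined.
        return None
    return throughput_bytes_per_s / iops


def seconds_to_ms(latency_s: float) -> float:
    """Convert a latency reported in seconds (ReadLatency, WriteLatency) to milliseconds."""
    return latency_s * 1000.0


# Hypothetical sample values, for illustration only.
print(average_io_size_bytes(1_638_400.0, 200.0))  # 8192.0 bytes (8 KiB per read)
print(seconds_to_ms(0.0025))
```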
Redshift Cluster
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Redshift entities in the Metrics Explorer, filter the |
AWS.Redshift.CommitQueueLength
|
Count | The number of transactions waiting to commit at a given point in time. |
AWS.Redshift.ConcurrencyScalingActiveClusters
|
Count | The number of concurrency scaling clusters that are actively processing queries at any given time. |
AWS.Redshift.ConcurrencyScalingSecond
|
Count | The number of seconds used by concurrency scaling clusters that have active query processing activity. |
|
Percent (%) |
The percentage of CPU utilization. |
|
Count |
The number of database connections to a cluster. |
|
Boolean |
Indicates the health of the cluster. Every minute the cluster connects to its database and performs a simple query. If it is able to perform this operation successfully, the cluster is considered healthy. Otherwise, the cluster is unhealthy. |
|
Boolean |
Indicates whether the cluster is in maintenance mode. |
AWS.Redshift.MaxConfiguredConcurrencyScalingClusters
|
Count | The maximum number of concurrency scaling clusters allowed when concurrency scaling is enabled. |
|
bps |
The rate at which the node or cluster receives data. |
|
bps |
The rate at which the node or cluster transmits data. |
AWS.Redshift.NumExceededSchemaQuotas
|
Count | The number of times schema quotas have been exceeded in the Redshift cluster. |
|
Percent (%) |
The percent of disk space used. |
AWS.Redshift.PercentageQuotaUsed
|
Percent (%) | The percentage of the quota that has been used in the Redshift cluster. |
AWS.Redshift.QueriesCompletedPerSecond
|
Count per second | The number of queries completed per second in the Redshift cluster. |
AWS.Redshift.QueryDuration
|
milliseconds (ms) | The average amount of time taken to complete a query in the Redshift cluster. |
AWS.Redshift.QueryRuntimeBreakdown
|
milliseconds (ms) | The breakdown of query runtime into various stages such as planning, waiting, and execution. |
|
Count per second |
The average number of disk read operations per second. |
|
seconds (s) |
The average amount of time taken for disk read I/O operations. |
|
bps |
The average number of bytes read from disk per second. |
AWS.Redshift.RedshiftManagedStorageTotalCapacity
|
bytes | The total capacity of managed storage available in the Redshift cluster. |
AWS.Redshift.SchemaQuota
|
Megabytes | The amount of disk space that a schema can use in the Redshift cluster. |
AWS.Redshift.StorageUsed
|
bytes | The amount of storage space used by the Redshift cluster. |
AWS.Redshift.TotalTableCount
|
Count | The total number of tables in the Redshift cluster. |
AWS.Redshift.UsageLimitAvailable
|
Count | The amount of usage limit available for the specified feature in the Redshift cluster. |
AWS.Redshift.UsageLimitConsumed
|
Count | The amount of usage limit consumed for the specified feature in the Redshift cluster. |
AWS.Redshift.WLMQueriesCompletedPerSecond
|
Count per second | The number of queries completed per second in the workload management (WLM) queues of the Redshift cluster. |
AWS.Redshift.WLMQueryDuration
|
milliseconds (ms) | The average amount of time taken to complete a query in the workload management (WLM) queues of the Redshift cluster. |
AWS.Redshift.WLMQueueLength
|
Count | The number of queries waiting in the workload management (WLM) queues of the Redshift cluster. |
AWS.Redshift.WLMQueueWaitTime
|
milliseconds (ms) | The amount of time queries wait in the workload management (WLM) queue before being processed. |
AWS.Redshift.WLMRunningQueries
|
Count | The number of queries currently running in the workload management (WLM) queue. |
|
Count per second |
The average number of disk write operations per second. |
|
seconds (s) |
The average amount of time taken for disk write I/O operations. |
|
bps |
The average number of bytes written to disk per second. |
Route 53
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Route53 entities in the Metrics Explorer, filter the |
|
Count |
For a calculated health check, the number of health checks that are healthy among the health checks that Amazon Route 53 is monitoring. |
|
milliseconds (ms) |
The average time, in milliseconds, that it took Amazon Route 53 health checkers to establish a TCP connection with the endpoint. |
AWS.Route53.DNSQueries
|
Count | The number of DNS queries received by Amazon Route 53. |
|
Percent (%) |
The percentage of Amazon Route 53 health checkers that consider the selected endpoint to be healthy. |
|
Boolean |
The status of the health check endpoint that CloudWatch is checking: 1 indicates healthy, and 0 indicates unhealthy. |
|
milliseconds (ms) |
The average time, in milliseconds, that it took Amazon Route 53 health checkers to complete the SSL handshake. |
|
milliseconds (ms) |
The average time, in milliseconds, that it took Amazon Route 53 health checkers to receive the first byte of the response to an HTTP or HTTPS request. |
S3
Metric | Units | Description |
---|---|---|
AWS.S3.4xxErrors
|
Count |
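4xxErrors. The number of requests made to an Amazon S3 bucket that returned an HTTP 4xx client error status code. Each request is recorded as a value of either 0 or 1, so the Average statistic shows the error rate and the Sum statistic shows the error count. |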
AWS.S3.5xxErrors
|
Count |
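5xxErrors. The number of requests made to an Amazon S3 bucket that returned an HTTP 5xx server error status code. Each request is recorded as a value of either 0 or 1, so the Average statistic shows the error rate and the Sum statistic shows the error count. |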
AWS.S3.AllRequests
|
Count |
AllRequests. The total number of HTTP requests made to an Amazon S3 bucket, regardless of type. |
AWS.S3.BucketSizeBytes
|
bytes |
BucketSizeBytes. The amount of data that is stored in a bucket, in bytes. |
AWS.S3.BytesDownloaded
|
bytes |
BytesDownloaded. The number of bytes downloaded for requests made to an Amazon S3 bucket where the response includes a body. |
AWS.S3.BytesUploaded
|
bytes |
BytesUploaded. The number of bytes uploaded for requests made to an Amazon S3 bucket where the request includes a body. |
AWS.S3.DeleteRequests
|
Count |
The number of HTTP DELETE requests made for objects in a bucket. |
AWS.S3.FirstByteLatency
|
milliseconds (ms) |
FirstByteLatency. The per-request time from the complete request being received by an Amazon S3 bucket to when the response starts to be returned. |
AWS.S3.GetRequests
|
Count |
GetRequests. The number of HTTP GET requests made for objects in an Amazon S3 bucket. This doesn't include list operations. |
AWS.S3.HeadRequests
|
Count |
The number of HTTP HEAD requests made to a bucket. |
AWS.S3.InvokedLambda
|
Count | The number of times an AWS Lambda function is invoked by Amazon S3. |
AWS.S3.LambdaResponse4xx
|
Count | The number of 4xx (client error) responses returned by AWS Lambda functions invoked by Amazon S3. |
AWS.S3.LambdaResponse5xx
|
Count | The number of 5xx (server error) responses returned by AWS Lambda functions invoked by Amazon S3. |
AWS.S3.LambdaResponseRequests
|
Count | The number of requests made to AWS Lambda functions invoked by Amazon S3. |
AWS.S3.ListRequests
|
Count |
The number of HTTP requests that list the contents of a bucket. |
AWS.S3.NumberOfObjects
|
Count |
NumberOfObjects. The total number of objects stored in a bucket for all storage classes. This value is calculated by counting all objects in the bucket (both current and noncurrent objects) and the total number of parts for all incomplete multipart uploads to the bucket. |
AWS.S3.PostRequests
|
Count |
PostRequests. The number of HTTP POST requests made to an Amazon S3 bucket. |
AWS.S3.ProxiedRequests
|
Count | The number of requests that are proxied through Amazon S3. |
AWS.S3.PutRequests
|
Count |
PutRequests. The number of HTTP PUT requests made for objects in an Amazon S3 bucket. |
AWS.S3.ReplicationLatency
|
milliseconds (ms) |
The maximum time by which the replication destination bucket lags behind the source bucket for a given replication rule. |
AWS.S3.SelectBytesReturned
|
bytes |
The amount of data returned with Select requests from S3 Standard storage. |
AWS.S3.SelectBytesScanned
|
bytes |
The amount of data scanned with Select requests from S3 Standard storage. |
AWS.S3.SelectRequests
|
Count |
The number of requests made to Amazon S3 Select. |
AWS.S3.TotalRequestLatency
|
milliseconds (ms) |
TotalRequestLatency. The elapsed per-request time from the first byte received to the last byte sent to an Amazon S3 bucket. This metric includes the time taken to receive the request body and send the response body, which is not included in FirstByteLatency. |
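Because 4xxErrors and 5xxErrors record each request as 0 or 1, the Sum of either metric over a period is an error count, and dividing it by the Sum of AllRequests for the same period gives an error rate that should closely match the Average statistic of the error metric. A minimal Python sketch, assuming you already have both summed values; the function name is hypothetical.

```python
from typing import Optional


def error_rate_percent(error_sum: float, all_requests_sum: float) -> Optional[float]:
    """Return errors as a percentage of all requests for one period.

    Hypothetical helper. Assumes:
      error_sum:        Sum of AWS.S3.4xxErrors or AWS.S3.5xxErrors for the period.
      all_requests_sum: Sum of AWS.S3.AllRequests for the same period.
    """
    if all_requests_sum <= 0:
        # No requests in the period, so there is no meaningful rate.
        return None
    return 100.0 * error_sum / all_requests_sum


# Hypothetical period: 12 client errors out of 4,800 requests.
print(error_rate_percent(12, 4800))  # 0.25
```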
SNS
Metric | Units | Description |
---|---|---|
AWS.SNS.NotificationSuccessRate
|
Percent (%) |
The percentage of successfully delivered notifications out of the total notifications attempted. |
AWS.SNS.NumberOfMessagesPublished
|
Count |
NumberOfMessagesPublished. The average number of messages published to Amazon SNS topics. |
AWS.SNS.NumberOfNotificationsDelivered
|
Count |
NumberOfNotificationsDelivered. The average number of messages successfully delivered from Amazon SNS topics to subscribing endpoints. |
AWS.SNS.NumberOfNotificationsFailed
|
Count |
NumberOfNotificationsFailed. The average number of messages that Amazon SNS failed to deliver. |
AWS.SNS.NumberOfNotificationsFailedToRedriveToDlq
|
Count |
NumberOfNotificationsFailedToRedriveToDlq. The average number of messages that couldn't be moved to a dead-letter queue. |
AWS.SNS.NumberOfNotificationsFilteredOut
|
Count |
NumberOfNotificationsFilteredOut. The average number of messages that were rejected by subscription filter policies. A filter policy rejects a message when the message attributes don't match the policy attributes. |
AWS.SNS.NumberOfNotificationsFilteredOut-InvalidAttributes
|
Count |
NumberOfNotificationsFilteredOut-InvalidAttributes. The average number of messages that were rejected by subscription filter policies because the messages' attributes are invalid. |
AWS.SNS.NumberOfNotificationsFilteredOut-InvalidMessageBody
|
Count | The number of notifications filtered out due to invalid message body content. |
AWS.SNS.NumberOfNotificationsFilteredOut-MessageAttributes
|
Count | The number of notifications filtered out due to message attributes not matching the filter policy. |
AWS.SNS.NumberOfNotificationsFilteredOut-MessageBody
|
Count | The number of notifications filtered out due to message body content not matching the filter policy. |
AWS.SNS.NumberOfNotificationsFilteredOut-NoMessageAttributes
|
Count |
NumberOfNotificationsFilteredOut-NoMessageAttributes. The average number of messages that were rejected by subscription filter policies because the messages have no attributes. |
AWS.SNS.NumberOfNotificationsRedrivenToDlq
|
Count |
NumberOfNotificationsRedrivenToDlq. The average number of messages that have been moved to a dead-letter queue. |
AWS.SNS.PublishSize
|
bytes |
PublishSize. The average size of messages published. |
AWS.SNS.SMSMonthToDateSpentUSD
|
USD |
The total amount of money spent on SMS messages for the current month. |
AWS.SNS.SMSSuccessRate
|
Percent (%) | The percentage of successfully delivered SMS messages out of the total SMS messages attempted. |
SQS
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Simple Queue Service (SQS) entities in the Metrics Explorer, filter the |
|
seconds (s) |
The approximate age of the oldest non-deleted message in the queue. |
|
Count |
The number of messages in the queue that are delayed and not available for reading immediately. |
|
Count |
The number of messages that are "in flight." Messages are considered in flight if they have been sent to a client but have not yet been deleted or have not yet reached the end of their visibility window. |
|
Count |
The number of messages available for retrieval from the queue. |
|
Count |
The number of ReceiveMessage API calls that did not return a message. |
|
Count |
The number of messages deleted from the queue. |
|
Count |
The number of messages returned by calls to the ReceiveMessage API action. |
|
Count |
The number of messages added to a queue. |
|
bytes |
The size of messages added to a queue. |
Transfer Family
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Transfer Family entities in the Metrics Explorer, filter the |
AWS.Transfer.BytesIn
|
Count |
The total number of bytes received by the server. |
AWS.Transfer.BytesOut
|
Count |
The total number of bytes sent by the server. |
AWS.Transfer.FilesIn
|
Count |
The total number of files received by the server. |
AWS.Transfer.FilesOut
|
Count |
The total number of files sent from the server. |
AWS.Transfer.InboundMessage
|
Count | The total number of AS2 messages successfully received from a trading partner. This metric is emitted as soon as the inbound message has finished processing successfully. |
AWS.Transfer.InboundFailedMessage
|
Count | The total number of AS2 messages that were unsuccessfully received from a trading partner. This means a trading partner sent a message, but the Transfer Family server was not able to successfully process it. |
AWS.Transfer.OnUploadExecutionsStarted
|
Count |
The total number of workflow executions started on the server. |
AWS.Transfer.OnUploadExecutionsSuccess
|
Count |
The total number of successful workflow executions on the server. |
AWS.Transfer.OnUploadExecutionsFailed
|
Count |
The total number of unsuccessful workflow executions on the server. |
AWS.Transfer.OnPartialUploadExecutionsStarted
|
Count |
The total number of on-partial-upload workflow executions started on the server. |
AWS.Transfer.OnPartialUploadExecutionsSuccess
|
Count |
The total number of successful on-partial-upload workflow executions on the server. |
AWS.Transfer.OnPartialUploadExecutionsFailed
|
Count |
The total number of unsuccessful on-partial-upload workflow executions on the server. |
Transit Gateway
Metric | Units | Description |
---|---|---|
AWS.TransitGateway.BytesDropCountBlackhole
|
Count |
BytesDropCountBlackhole. The total number of bytes dropped because they matched a blackhole route. |
AWS.TransitGateway.BytesDropCountNoRoute
|
Count |
BytesDropCountNoRoute. The total number of bytes dropped because they did not match a route. |
AWS.TransitGateway.BytesDropPercentage
|
Percent (%) |
The percentage of bytes dropped by the transit gateway due to various reasons such as blackhole routes or no matching routes. |
AWS.TransitGateway.BytesIn
|
Count |
BytesIn. The total number of bytes received by the transit gateway. |
AWS.TransitGateway.BytesOut
|
Count |
BytesOut. The total number of bytes sent from the transit gateway. |
AWS.TransitGateway.PacketDropCountBlackhole
|
Count |
PacketDropCountBlackhole. The total number of packets dropped because they matched a blackhole route. |
AWS.TransitGateway.PacketDropCountNoRoute
|
Count |
PacketDropCountNoRoute. The total number of packets dropped because they did not match a route. |
AWS.TransitGateway.PacketsDropPercentage
|
Percent (%) |
The percentage of packets dropped by the transit gateway due to various reasons such as blackhole routes, no matching routes, or TTL expiration. |
AWS.TransitGateway.PacketsIn
|
Count |
PacketsIn. The total number of packets received by the transit gateway. |
AWS.TransitGateway.PacketsOut
|
Count |
PacketsOut. The total number of packets sent by the transit gateway. |
VPN
Metric | Units | Description |
---|---|---|
AWS.VPN.TunnelDataIn
|
bytes |
TunnelDataIn. The total bytes received on the AWS side of the connection through the VPN tunnel from a customer gateway. |
AWS.VPN.TunnelDataOut
|
bytes |
TunnelDataOut. The total bytes sent from the AWS side of the connection through the VPN tunnel to the customer gateway. Each metric data point represents the number of bytes sent after the previous data point. |
AWS.VPN.TunnelState
|
Count |
TunnelState. The average state of the tunnels. For static VPNs, 0 indicates DOWN and 1 indicates UP. |
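Because TunnelState is an average of per-tunnel values (0 for DOWN and 1 for UP on static VPNs), a value between 0 and 1 suggests that only some tunnels are up. The following Python sketch estimates how many tunnels are up from a single averaged reading. It assumes the value is averaged across tunnels at one point in time (an average over a time window would instead reflect the fraction of time tunnels were up), and the default of two tunnels per connection is an assumption. The function name `tunnels_up` is hypothetical.

```python
def tunnels_up(avg_tunnel_state: float, tunnel_count: int = 2) -> int:
    """Estimate how many VPN tunnels are UP from an averaged TunnelState value.

    Assumes each tunnel reports 0 (DOWN) or 1 (UP) and the metric is their mean.
    The default of two tunnels per connection is an assumption.
    """
    if not 0.0 <= avg_tunnel_state <= 1.0:
        raise ValueError("averaged TunnelState must be between 0 and 1")
    if tunnel_count <= 0:
        raise ValueError("tunnel_count must be positive")
    # mean * count recovers the number of UP tunnels; round() absorbs float noise
    return round(avg_tunnel_state * tunnel_count)


if __name__ == "__main__":
    for value in (1.0, 0.5, 0.0):
        print(f"TunnelState {value}: {tunnels_up(value)} of 2 tunnels up")
```

A reading of 0.5 on a two-tunnel connection therefore points to one tunnel being down, which is usually worth alerting on even though the connection is still passing traffic.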
Infrastructure/Azure metrics
Metrics for Azure entities are collected by integrating SolarWinds Observability SaaS with your Azure cloud account. See Azure cloud platform monitoring.
API Management Service
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of API Management Service entities in the Metrics Explorer, filter the |
azure.api.management.BackendDuration
|
milliseconds (ms) | Duration of backend requests. |
azure.api.management.Capacity
|
Percent (%) |
Utilization metric for the API Management service. For SKUs other than Premium, the Max aggregation shows the value as 0. |
azure.api.management.ConnectionAttempts
|
Count | Count of WebSocket connection attempts based on selected source and destination. |
azure.api.management.Duration
|
milliseconds (ms) | Overall duration of gateway requests. |
azure.api.management.EventHubDroppedEvents
|
Count | Number of events skipped because the queue size limit was reached. |
azure.api.management.EventHubRejectedEvents
|
Count | Number of rejected EventHub events (wrong configuration or unauthorized). |
azure.api.management.EventHubSuccessfulEvents
|
Count | Number of successful EventHub events. |
azure.api.management.EventHubThrottledEvents
|
Count | Number of throttled EventHub events. |
azure.api.management.EventHubTimedoutEvents
|
Count | Number of timed-out EventHub events. |
azure.api.management.EventHubTotalBytesSent
|
bytes | Total size of EventHub events. |
azure.api.management.EventHubTotalEvents
|
Count | Number of events sent to EventHub. |
azure.api.management.EventHubTotalFailedEvents
|
Count | Number of failed EventHub events. |
azure.api.management.NetworkConnectivity
|
Count | Network connectivity status of dependent resource types from the API Management service. |
azure.api.management.Requests
|
Count | Gateway request metrics with multiple dimensions. |
azure.api.management.WebSocketMessages
|
Count | Count of WebSocket messages based on selected source and destination. |
App Service
Metric | Units | Description |
---|---|---|
azure.sites.app_connections
|
Count |
The average number of connections established by an application in Azure App Service. |
azure.sites.app_domains
|
Count |
Total App Domains. The average number of AppDomains loaded in this application. |
azure.sites.app_domains.unloaded
|
Count |
The number of application domains that have been unloaded in an Azure App Service environment, which can be useful for monitoring app lifecycle events. |
azure.sites.collections.gen1
|
Count |
The number of garbage collection events for Generation 1 objects in an Azure App Service instance. |
azure.sites.collections.gen2
|
Count |
The number of garbage collection events for Generation 2 objects in an Azure App Service instance. |
azure.sites.cpu_time
|
seconds (s) |
CPU Time. The total amount of CPU consumed by the app, in seconds. |
azure.sites.current_assemblies
|
Count |
The number of assemblies currently loaded across all application domains in an Azure App Service instance. |
azure.sites.function_executions
|
Count |
The total number of function executions in an Azure Functions app, providing insight into function activity and usage. |
azure.sites.handles
|
Count |
The number of open file handles in an Azure App Service environment. This metric helps monitor resource usage and potential file access issues. |
azure.sites.http.101
|
Count |
Tracks HTTP 101 responses, which indicate protocol switching (for example, upgrading from HTTP to WebSockets). |
azure.sites.http.2xx
|
Count |
Http2xx. The total number of requests resulting in an HTTP status code greater than or equal to 200 but less than 300. |
azure.sites.http.3xx
|
Count |
HTTP 3xx responses, which indicate redirection. These status codes signal that the requested resource has moved to a different location. |
azure.sites.http.401
|
Count |
HTTP 401 responses, which indicate unauthorized access. This occurs when authentication credentials are missing or invalid. |
azure.sites.http.403
|
Count |
HTTP 403 responses, which indicate forbidden access. This happens when a request is denied due to insufficient permissions or security restrictions. |
azure.sites.http.404
|
Count |
HTTP 404 responses, which indicate that the requested resource was not found. This can occur when a URL is incorrect or the resource has been removed. |
azure.sites.http.406
|
Count |
HTTP 406 responses, which indicate that the requested format is not acceptable. This happens when the server cannot provide content in the format specified by the request. |
azure.sites.http.4xx
|
Count |
Http4xx. The total number of requests resulting in an HTTP status code greater than or equal to 400 but less than 500. |
azure.sites.http.5xx
|
Count |
Http5xx. The total number of requests resulting in an HTTP status code greater than or equal to 500 but less than 600. |
azure.sites.io.bytes_received
|
bytes |
Bytes Received. The total amount of incoming bandwidth consumed by the app. |
azure.sites.io.bytes_sent
|
bytes |
Bytes Sent. The total amount of outgoing bandwidth consumed by the app. |
azure.sites.io.other_bytes
|
bps |
The rate at which the app process issues bytes to I/O operations that do not involve data transfer, such as control operations. |
azure.sites.io.other_ops
|
Count per second |
The rate at which the app process issues I/O operations that are not read or write operations. |
azure.sites.io.read_bytes
|
bps |
IoReadBytesPerSecond. The number of bytes per second the app is reading from I/O operations. |
azure.sites.io.read_ops
|
Count per second |
The number of read operations per second performed by the app process. |
azure.sites.io.write_bytes
|
bps |
IO Write Bytes Per Second. The number of bytes per second the app is writing to I/O operations. |
azure.sites.io.write_ops
|
Count per second |
The number of write operations per second performed by the app process. |
azure.sites.memory.working_set
|
bytes |
Memory Working Set. The current amount of memory used by the app. |
azure.sites.memory.working_set.avg
|
Megabytes (MB) |
Average Memory Working Set. The average amount of memory used by the app, in megabytes. |
azure.sites.private_bytes
|
bytes |
The amount of memory allocated by the app process that cannot be shared with other processes. This includes allocated memory, local variables, heap memory, and other runtime data. |
azure.sites.queued_requests
|
Count |
Requests In Application Queue. The average number of requests in the application request queue. |
azure.sites.requests
|
Count |
Requests. The total number of requests regardless of their resulting HTTP status code. |
azure.sites.response_time
|
seconds (s) |
Average Response Time. The average time taken for the app to serve requests, in seconds. |
azure.sites.threads
|
Count |
Threads. The average number of threads currently active in the app process. |
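The HTTP metrics above combine specific status codes (101, 401, 403, 404, 406) with range buckets (for example, 4xx counts codes greater than or equal to 400 but less than 500), while `azure.sites.requests` counts every request regardless of status. The following Python sketch lists which of these metrics a single response would increment. It assumes the range buckets include every code in their range, so a 404 counts toward both `azure.sites.http.404` and `azure.sites.http.4xx`; the helper name is hypothetical.

```python
# Status codes that have their own azure.sites.http.<code> metric in the table above
SPECIFIC_CODES = {101, 401, 403, 404, 406}


def app_service_metrics_for(status: int) -> list[str]:
    """Return the azure.sites.* request metrics a response with `status` would count toward.

    Assumption: range buckets (2xx..5xx) include specific codes in their range.
    """
    names = ["azure.sites.requests"]  # counted regardless of status code
    if status in SPECIFIC_CODES:
        names.append(f"azure.sites.http.{status}")
    status_class = status // 100  # 2 for 2xx, 4 for 4xx, and so on
    if 2 <= status_class <= 5:
        names.append(f"azure.sites.http.{status_class}xx")
    return names


if __name__ == "__main__":
    for code in (101, 200, 302, 404, 503):
        print(code, app_service_metrics_for(code))
```

Under this assumption, summing the specific 4xx metrics will generally be lower than `azure.sites.http.4xx`, because the bucket also includes codes such as 400 or 429 that have no dedicated metric.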
Application Gateway
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Application Gateway entities in the Metrics Explorer, filter the |
azure.applicationgateway.ApplicationGatewayTotalTime
|
milliseconds (ms) | Average time that it takes for a request to be processed and its response to be sent. |
azure.applicationgateway.AvgRequestCountPerHealthyHost
|
Count |
Average request count per minute per healthy backend host in a pool. |
azure.applicationgateway.BackendConnectTime
|
milliseconds (ms) | Time spent establishing a connection with a backend server. |
azure.applicationgateway.BackendFirstByteResponseTime
|
milliseconds (ms) | Time interval between the start of establishing a connection to the backend server and receiving the first byte of the response header. |
azure.applicationgateway.BackendLastByteResponseTime
|
milliseconds (ms) | Time interval between the start of establishing a connection to the backend server and receiving the last byte of the response body. |
azure.applicationgateway.BackendResponseStatus
|
Count | The number of HTTP response codes generated by the backend members. |
azure.applicationgateway.BlockedCount
|
Count | Web Application Firewall blocked requests rule distribution. |
azure.applicationgateway.BytesReceived
|
Count | The total number of bytes received by the Application Gateway from the clients. |
azure.applicationgateway.BytesSent
|
Count | The total number of bytes sent by the Application Gateway to the clients. |
azure.applicationgateway.CapacityUnits
|
Count | Capacity units consumed. |
azure.applicationgateway.ClientRtt
|
milliseconds (ms) | Average round-trip time between clients and Application Gateway. |
azure.applicationgateway.ComputeUnits
|
Count | Compute Units consumed. |
azure.applicationgateway.CpuUtilization
|
Percent (%) | Current CPU utilization of the Application Gateway. |
azure.applicationgateway.CurrentConnections
|
Count | Count of current connections established with Application Gateway. |
azure.applicationgateway.EstimatedBilledCapacityUnits
|
Count | Estimated capacity units that will be charged. |
azure.applicationgateway.FailedRequests
|
Count | Count of failed requests that Application Gateway has served. |
azure.applicationgateway.FixedBillableCapacityUnits
|
Count | Minimum capacity units that will be charged. |
azure.applicationgateway.HealthyHostCount
|
Count | Number of healthy backend hosts. |
azure.applicationgateway.MatchedCount
|
Count | Web Application Firewall total rule distribution for incoming traffic. |
azure.applicationgateway.NewConnectionsPerSecond
|
Count per second | New connections per second established with Application Gateway. |
azure.applicationgateway.ResponseStatus
|
Count | HTTP response status returned by Application Gateway. |
azure.applicationgateway.Throughput
|
bps | Number of bytes per second the Application Gateway has served. |
azure.applicationgateway.TlsProtocol
|
Count | The number of TLS and non-TLS requests initiated by the client that established connection with the Application Gateway. |
azure.applicationgateway.TotalRequests
|
Count | Count of successful requests that Application Gateway has served. |
azure.applicationgateway.UnhealthyHostCount
|
Count | Number of unhealthy backend hosts. |
azure.applicationgateway.AzwafBotProtection
|
Count | Number of matched Bot Rules. |
azure.applicationgateway.AzwafCustomRule
|
Count | Number of matched Custom Rules. |
azure.applicationgateway.AzwafSecRule
|
Count | Number of matched Managed Rules. |
azure.applicationgateway.AzwafTotalRequests
|
Count | Total number of requests evaluated by WAF. |
Application Insights
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore |
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Application Insights entities in the Metrics Explorer, filter the |
azure.application.insight.Availability
|
Percent (%) | Tracks the availability and responsiveness of an application by sending web requests at regular intervals. If the application isn't responding or the response time is too slow, alerts can be triggered. |
azure.application.insight.Availability_tests
|
Count | Recurring web tests that monitor an application's availability from various locations worldwide. These tests help ensure uptime and detect performance issues. |
azure.application.insight.Availability_test_duration
|
milliseconds (ms) | The duration of availability tests, helping assess response times and detect slow performance. |
azure.application.insight.client_processing_time
|
milliseconds (ms) | Represents the time taken by the client to process a request before sending a response. This metric helps analyze client-side performance. |
azure.application.insight.Receiving_response_time
|
milliseconds (ms) | Tracks the time taken to receive a response from the server after a request is sent. This metric helps monitor network latency and server responsiveness. |
azure.application.insight.Send_request_time
|
milliseconds (ms) | Measures the time taken to send a request from the client to the server. This metric helps assess network performance and request transmission speed. |
azure.application.insight.Browser_page_load_time
|
milliseconds (ms) | The time taken for a web page to fully load in the browser. This metric helps assess user experience and identify performance bottlenecks. |
azure.application.insight.Dependency_calls
|
Count | The number of external service or database calls made by an application. This metric helps monitor interactions with dependencies like APIs, databases, and storage. |
azure.application.insight.Dependency_duration
|
milliseconds (ms) | The time taken for a dependency call to complete, including connection time and response retrieval. This metric helps analyze performance and detect slow dependencies. |
azure.application.insight.Dependency_call_failures
|
Count | The number of failed dependency calls, helping identify issues with external services or databases that impact application functionality. |
azure.application.insight.Browser_exceptions
|
Count | Exceptions that occur in the browser, such as JavaScript errors. This metric helps diagnose client-side issues affecting user experience. |
azure.application.insight.Exceptions
|
Count | The number of exceptions encountered by the application, including both client-side and server-side errors. This metric helps troubleshoot failures and improve application stability. |
azure.application.insight.Server_exceptions
|
Count | Tracks exceptions that occur on the server side of an application. These exceptions can be correlated with failed requests and other events to diagnose issues efficiently. |
azure.application.insight.Page_views
|
Count | The number of times a page is viewed in an application. This metric helps analyze user engagement and behavior. |
azure.application.insight.Page_view_load_time
|
milliseconds (ms) | The time taken for a web page to fully load in the browser. This metric helps assess user experience and identify performance bottlenecks. |
azure.application.insight.Exception_rate
|
Count per second | The rate of exceptions occurring in an application. This metric helps monitor application stability and detect potential issues. |
azure.application.insight.Process_CPU
|
Percent (%) | The percentage of CPU usage by the application process. High values may indicate increased workload or performance bottlenecks. |
azure.application.insight.Processor_time
|
Percent (%) | The total processor time consumed by the application. This metric helps monitor resource utilization and performance efficiency. |
azure.application.insight.Process_private_bytes
|
bytes | Represents the amount of memory allocated by an application that cannot be shared with other processes. This metric helps monitor memory usage and detect potential performance issues. |
azure.application.insight.HTTP_request_execution_time
|
milliseconds (ms) | The time taken to execute an HTTP request within an application. This metric helps assess request processing efficiency and identify bottlenecks. |
azure.application.insight.HTTP_requests_in_application_queue
|
Count | The number of HTTP requests waiting in the application queue before being processed. A high value may indicate performance issues or resource constraints. |
azure.application.insight.HTTP_request_rate
|
Count per second | The rate at which HTTP requests are received by the application. This metric helps monitor traffic patterns and detect potential spikes in demand. |
azure.application.insight.Server_requests
|
Count | The number of requests received by the server, providing insights into application workload and performance. |
azure.application.insight.Server_response_time
|
milliseconds (ms) | The time taken for the server to respond to incoming requests. This metric helps assess application responsiveness and detect slow performance. |
azure.application.insight.Failed_requests
|
Count | The number of failed requests in an application. This metric helps diagnose errors, exceptions, and faults affecting application stability. |
azure.application.insight.Server_request_rate
|
Count per second | The rate at which server requests are received by the application. This metric helps monitor workload and traffic patterns. |
azure.application.insight.Traces
|
Count per second | Captures trace logs generated by an application, providing insights into debugging, performance monitoring, and distributed tracing. |
Blob Storage
Metric | Units | Description |
---|---|---|
azure.storage.blob.availability
|
Percent (%) |
Availability. The average percentage of availability for the storage service or the specified API operation. Availability is calculated by taking the total billable requests value and dividing it by the number of applicable requests. |
azure.storage.blob.blobs
|
Count |
BlobCount. The average number of blob objects stored in the storage account. |
azure.storage.blob.capacity
|
bytes |
BlobCapacity. The average amount of blob storage used in the storage account. |
azure.storage.blob.containers
|
Count |
ContainerCount. The average number of containers in the storage account. |
azure.storage.blob.egress
|
bytes |
Egress. The total amount of egress data. This number includes egress from Azure Storage to an external client as well as egress within Azure. As a result, this number does not reflect billable egress. |
azure.storage.blob.index_capacity
|
bytes |
IndexCapacity. The average amount of storage used by the ADLS Gen2 hierarchical index. |
azure.storage.blob.ingress
|
bytes |
Ingress. The total amount of ingress data. This number includes ingress from an external client into Azure Storage as well as ingress within Azure. |
azure.storage.blob.success.e2e_latency
|
milliseconds (ms) |
SuccessE2ELatency. The average end-to-end latency of successful requests made to a storage service or the specified API operation. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
azure.storage.blob.success.server_latency
|
milliseconds (ms) |
SuccessServerLatency. The average time used to process a successful request by Azure Storage. |
azure.storage.blob.transactions
|
Count |
Transactions. The total number of requests made to a storage service or the specified API operation. This number includes successful and failed requests, as well as requests that produced errors. |
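The Availability metric is described as total billable requests divided by applicable requests, expressed as a percentage. The following Python sketch shows that arithmetic for a single reporting interval. The function and parameter names are hypothetical, and the inputs stand in for whatever request counts Azure Storage uses internally; the sketch does not query any API.

```python
def availability_percent(billable_requests: int, applicable_requests: int) -> float:
    """Availability (%) = billable requests / applicable requests * 100.

    Sketch of the calculation described for azure.storage.blob.availability;
    the inputs are assumed per-interval request counts.
    """
    if applicable_requests <= 0:
        raise ValueError("applicable_requests must be positive")
    return 100.0 * billable_requests / applicable_requests


if __name__ == "__main__":
    # Hypothetical interval: 10 of 10,000 applicable requests were not billable
    print(f"{availability_percent(9_990, 10_000):.2f}%")
```

Because the metric is reported as an average, a short burst of failures within an otherwise healthy interval may only lower the value slightly.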
Cache for Redis
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Cache for Redis entities in the Metrics Explorer, filter the |
azure.redis.alloperationsPerSecond
|
Count | The number of instantaneous operations per second executed on the cache. |
azure.redis.allpercentprocessortime
|
Percent (%) | The CPU utilization of the Azure Redis Cache server as a percentage. |
azure.redis.cacheLatency
|
Count | The latency to the cache in microseconds. |
azure.redis.LatencyP99
|
Count | Measures the worst-case (99th percentile) latency of server-side commands in microseconds. Measured by issuing PING commands from the load balancer to the Redis server and tracking the time to respond. |
azure.redis.allcachehits
|
Count | The number of successful key lookups. |
azure.redis.allcachemisses
|
Count | The number of failed key lookups. |
azure.redis.allconnectedclients
|
Count | The number of client connections to the cache. |
azure.redis.allserverLoad
|
Percent (%) | The percentage of cycles in which the Redis server is busy processing and not waiting idle for messages. |
azure.redis.allusedmemorypercentage
|
Percent (%) | The percentage of cache memory used for key/value pairs. |
azure.redis.allexpiredkeys
|
Count | The number of items expired from the cache. |
azure.redis.errors
|
Count | The number of errors that occurred on the cache. |
azure.redis.allcacheRead
|
bps | The amount of data read from the cache in bytes per second. |
azure.redis.allcacheWrite
|
bps | The amount of data written to the cache in bytes per second. |
azure.redis.allConnectionsClosedPerSecond
|
Count per second | The number of instantaneous connections closed per second on the cache via port 6379 or 6380 (SSL). |
azure.redis.allConnectionsCreatedPerSecond
|
Count per second | The number of instantaneous connections created per second on the cache via port 6379 or 6380 (SSL). |
azure.redis.allevictedkeys
|
Count | The number of items evicted from the cache. |
azure.redis.allgetcommands
|
Count | The number of get operations from the cache. |
azure.redis.allsetcommands
|
Count | The number of set operations to the cache. |
azure.redis.alltotalcommandsprocessed
|
Count | The total number of commands processed by the cache server. |
azure.redis.alltotalkeys
|
Count | The total number of items in the cache. |
azure.redis.allusedmemory
|
bytes | The amount of cache memory used for key/value pairs in the cache. |
azure.redis.allusedmemoryRss
|
bytes | The amount of cache memory used, including fragmentation and metadata. |
azure.redis.cachehits
|
Count | The number of successful key lookups. |
azure.redis.cachemisses
|
Count | The number of failed key lookups. |
azure.redis.cachemissrate
|
Percent (%) | The percentage of get requests that miss. |
azure.redis.cacheRead
|
bps | The amount of data read from the cache in bytes per second. |
azure.redis.cacheWrite
|
bps | The amount of data written to the cache in bytes per second. |
azure.redis.connectedclients
|
Count | The number of client connections to the cache. |
azure.redis.ConnectedClientsUsingAADToken
|
Count | The number of client connections to the cache using an AAD token. |
azure.redis.evictedkeys
|
Count | The number of items evicted from the cache. |
azure.redis.expiredkeys
|
Count | The number of items expired from the cache. |
azure.redis.GeoReplicationConnectivityLag
|
seconds (s) | Time in seconds since the last successful data synchronization with the geo-primary cache. The value continues to increase while the link is down. |
azure.redis.GeoReplicationDataSyncOffset
|
bytes | Approximate amount of data in bytes that needs to be synchronized to geo-secondary cache. |
azure.redis.GeoReplicationFullSyncEventFinished
|
Count | Fired on completion of a full synchronization event between geo-replicated caches. This metric reports 0 most of the time because geo-replication uses partial resynchronizations for any new data added after the initial full synchronization. |
azure.redis.GeoReplicationFullSyncEventStarted
|
Count | Fired on initiation of a full synchronization event between geo-replicated caches. This metric reports 0 most of the time because geo-replication uses partial resynchronizations for any new data added after the initial full synchronization. |
azure.redis.GeoReplicationHealthy
|
Count | The health status of the geo-replication link: 1 if healthy and 0 if disconnected or unhealthy. |
azure.redis.getcommands
|
Count | The number of get operations from the cache. |
azure.redis.operationsPerSecond
|
Count | The number of instantaneous operations per second executed on the cache. |
azure.redis.percentProcessorTime
|
Percent (%) | The CPU utilization of the Azure Redis Cache server as a percentage. |
azure.redis.serverLoad
|
Percent (%) | The percentage of cycles in which the Redis server is busy processing and not waiting idle for messages. |
azure.redis.setcommands
|
Count | The number of set operations to the cache. |
azure.redis.totalcommandsprocessed
|
Count | The total number of commands processed by the cache server. |
azure.redis.totalkeys
|
Count | The total number of items in the cache. |
azure.redis.usedmemory
|
bytes | The amount of cache memory used for key/value pairs in the cache. |
azure.redis.usedmemorypercentage
|
Percent (%) | The percentage of cache memory used for key/value pairs. |
azure.redis.usedmemoryRss
|
bytes | The amount of cache memory used, including fragmentation and metadata. |
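The two memory metrics above can be combined into a fragmentation estimate. The sketch below divides `azure.redis.usedmemoryRss` by `azure.redis.usedmemory`, which mirrors how Redis reports `mem_fragmentation_ratio` in its `INFO` output; values well above 1 usually point to fragmentation. It assumes both values come from the same cache and time bucket and use the same unit. The function name and sample values are hypothetical.

```python
def fragmentation_ratio(used_memory_rss: float, used_memory: float) -> float:
    """Ratio of azure.redis.usedmemoryRss to azure.redis.usedmemory.

    Both inputs must use the same unit (for example, bytes) and come from
    the same cache and time bucket.
    """
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


# Hypothetical sample values, in bytes.
ratio = fragmentation_ratio(used_memory_rss=150_000_000, used_memory=100_000_000)
print(f"fragmentation ratio: {ratio:.2f}")  # fragmentation ratio: 1.50
```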
Cache for Redis Enterprise
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Cache for Redis Enterprise entities in the Metrics Explorer, filter the |
azure.redis.enterprise.operationsPerSecond
|
Count | The number of instantaneous operations per second executed on the cache. |
azure.redis.enterprise.percentProcessorTime
|
Percent (%) | The CPU utilization of the Azure Redis Cache server as a percentage. |
azure.redis.enterprise.usedmemorypercentage
|
Percent (%) | The percentage of cache memory used for key/value pairs. |
azure.redis.enterprise.cachehits
|
Count | The number of successful key lookups. |
azure.redis.enterprise.cacheLatency
|
microseconds (µs) | The latency to the cache in microseconds. |
azure.redis.enterprise.cachemisses
|
Count | The number of failed key lookups. |
azure.redis.enterprise.serverLoad
|
Percent (%) | The percentage of cycles in which the Redis server is busy processing and not waiting idle for messages. |
azure.redis.enterprise.connectedclients
|
Count | The number of client connections to the cache. |
azure.redis.enterprise.errors
|
Count | The number of errors that occurred on the cache. |
azure.redis.enterprise.cacheRead
|
Megabytes per second (MB/s) |
The amount of data read from the cache in Megabytes per second (MB/s). |
azure.redis.enterprise.cacheWrite
|
Megabytes per second (MB/s) | The amount of data written to the cache in Megabytes per second (MB/s). |
azure.redis.enterprise.totalkeys
|
Count | The total number of items in the cache. |
azure.redis.enterprise.evictedkeys
|
Count | The number of items evicted from the cache. |
azure.redis.enterprise.expiredkeys
|
Count | The number of items expired from the cache. |
azure.redis.enterprise.geoReplicationHealthy
|
Count | The health of geo-replication in an Active Geo-Replication group: 0 represents Unhealthy and 1 represents Healthy. |
azure.redis.enterprise.getcommands
|
Count | The number of get operations from the cache. |
azure.redis.enterprise.setcommands
|
Count | The number of set operations to the cache. |
azure.redis.enterprise.totalcommandsprocessed
|
Count | The total number of commands processed by the cache server. |
azure.redis.enterprise.usedmemory
|
Megabytes (MB) | The amount of cache memory used for key/value pairs in the cache in MB. |
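Because `cachehits` counts successful key lookups and `cachemisses` counts failed ones, a cache hit ratio follows directly from the two. The sketch below assumes both counts cover the same cache and interval; returning 0.0 when there were no lookups is a choice made here to avoid dividing by zero, not documented product behavior. The function name is hypothetical.

```python
def cache_hit_ratio(cache_hits: int, cache_misses: int) -> float:
    """Percentage of key lookups that succeeded, from cachehits and cachemisses."""
    lookups = cache_hits + cache_misses
    if lookups == 0:
        return 0.0  # No lookups in this interval; report 0 rather than divide by zero.
    return 100.0 * cache_hits / lookups


print(cache_hit_ratio(cache_hits=9_500, cache_misses=500))  # 95.0
```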
Container Instances Group
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Container Instances Group entities in the Metrics Explorer, filter the |
azure.container.instances.group.CpuUsage
|
Count | CPU usage on all cores in millicores. |
azure.container.instances.group.MemoryUsage
|
bytes | Total memory usage in bytes. |
azure.container.instances.group.NetworkBytesReceivedPerSecond
|
bytes per second | The number of network bytes received per second. |
azure.container.instances.group.NetworkBytesTransmittedPerSecond
|
bytes per second | The number of network bytes transmitted per second. |
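`CpuUsage` is reported in millicores across all cores, so relating it to the CPU allocated to a container group takes a unit conversion (1000 millicores = 1 core). In the sketch below, `allocated_cores` is an assumption: it is a value you supply from your container group configuration, not one of the metrics listed above. The function name is hypothetical.

```python
def cpu_percent_of_allocation(cpu_usage_millicores: float, allocated_cores: float) -> float:
    """Convert azure.container.instances.group.CpuUsage to a percentage of allocated CPU.

    allocated_cores is the CPU allocated to the container group (a value you
    supply; it is not one of the metrics listed above).
    """
    if allocated_cores <= 0:
        raise ValueError("allocated_cores must be positive")
    used_cores = cpu_usage_millicores / 1000.0  # 1000 millicores = 1 core
    return 100.0 * used_cores / allocated_cores


print(cpu_percent_of_allocation(cpu_usage_millicores=750, allocated_cores=1.5))  # 50.0
```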
CDN
Metric | Units | Description |
---|---|---|
azure.cdn.byte_hit_ratio
|
Percent (%) |
ByteHitRatio. Of the total number of response bytes, the percentage that were served from the CDN cache. |
azure.cdn.origin_health_percentage
|
Percent (%) |
OriginHealthPercentage. The percentage of successful health probes sent to backends. |
azure.cdn.origin_latency
|
milliseconds (ms) |
OriginLatency. The average time from when the request was sent to the backend to when the last response byte was received. |
azure.cdn.origin_request_count
|
Count |
OriginRequestCount. The total number of requests sent to origin. |
azure.cdn.percentage_4XX
|
Percent (%) |
Percentage4XX. The average percentage of requests with a status code greater than or equal to 400 but less than 500. |
azure.cdn.percentage_5XX
|
Percent (%) |
Percentage5XX. The average percentage of requests with a status code greater than or equal to 500 but less than 600. |
azure.cdn.request_count
|
Count |
RequestCount. The total number of client requests served by CDN. |
azure.cdn.request_size
|
bytes |
RequestSize. The total number of bytes sent as requests from clients. |
azure.cdn.response_size
|
bytes |
ResponseSize. The total number of bytes sent as responses from CDN edge to clients. |
azure.cdn.total_latency
|
milliseconds (ms) |
TotalLatency. The average time from the client request being received by CDN until the last response byte is sent from CDN to the client. |
azure.cdn.web_application_firewall_request_count
|
Count |
WebApplicationFirewallRequestCount. The total number of matched WAF requests. |
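`byte_hit_ratio` is defined as the share of response bytes served from the CDN cache, so combining it with `response_size` for the same interval gives a rough split between cached and non-cached bytes. The sketch below is an estimate built from two aggregates under that assumption, not a figure the platform reports directly; the function name is hypothetical.

```python
def split_response_bytes(response_size: float, byte_hit_ratio_pct: float) -> tuple[float, float]:
    """Estimate (bytes served from cache, bytes not served from cache).

    Uses azure.cdn.response_size and azure.cdn.byte_hit_ratio for the same
    interval. This is an estimate derived from the two aggregates.
    """
    if not 0.0 <= byte_hit_ratio_pct <= 100.0:
        raise ValueError("byte_hit_ratio_pct must be between 0 and 100")
    cached = response_size * byte_hit_ratio_pct / 100.0
    return cached, response_size - cached


cached, uncached = split_response_bytes(response_size=10_000_000, byte_hit_ratio_pct=80.0)
print(cached, uncached)  # 8000000.0 2000000.0
```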
Cosmos DB
Metric | Units | Description |
---|---|---|
azure.cosmos.autoscale_max_throughput
|
Count |
AutoscaleMaxThroughput. The maximum throughput the autoscale will scale to. |
azure.cosmos.available_storage
|
bytes |
AvailableStorage. The total amount of available storage reported at 5-minute granularity per region. |
azure.cosmos.cassandra.connection.avg_replication_latency
|
milliseconds (ms) |
CassandraConnectorAvgReplicationLatency. The average replication latency of the Cassandra Connector. |
azure.cosmos.cassandra.connection.replication_health_status
|
|
CassandraConnectorReplicationHealthStatus. The replication health status of the Cassandra Connector. |
azure.cosmos.cassandra.connection_closures
|
Count |
CassandraConnectionClosures. The total number of Cassandra Connections closed. |
azure.cosmos.cassandra.request_charges
|
Count |
CassandraRequestCharges. The total number of request units consumed by the API for Cassandra. |
azure.cosmos.cassandra.requests
|
Count |
CassandraRequests. The total number of Cassandra API requests made. |
azure.cosmos.data.usage
|
bytes |
DataUsage. The total data usage reported at 5-minute granularity per region. |
azure.cosmos.document.count
|
Count |
DocumentCount. The total document count reported at 5-minute granularity per region. |
azure.cosmos.document.quota
|
bytes |
DocumentQuota. The total storage quota reported at 5-minute granularity per region. |
azure.cosmos.gremlin.request_charge
|
Count |
GremlinRequestCharges. The total number of request units consumed by Gremlin queries. |
azure.cosmos.gremlin.requests
|
Count |
GremlinRequests. The total number of requests made by Gremlin queries. |
azure.cosmos.index_usage
|
bytes |
IndexUsage. The total Index usage reported at 5-minute granularity per region. |
azure.cosmos.mongo.request_charge
|
Count |
MongoRequestCharge. The total number of Mongo request units consumed. |
azure.cosmos.mongo.requests
|
Count |
MongoRequests. The total number of Mongo requests made. |
azure.cosmos.normalized_ru_consumption
|
Percent (%) |
NormalizedRUConsumption. The maximum request unit consumption percentage per minute. |
azure.cosmos.provisioned_throughput
|
Count |
ProvisionedThroughput. The maximum provisioned throughput at container granularity. |
azure.cosmos.replication_latency.p99
|
milliseconds (ms) |
ReplicationLatency. The 99th percentile (P99) replication latency across the source and target regions for a geo-enabled account. |
azure.cosmos.requests.metadata
|
Count |
MetadataRequests. The total number of metadata requests. |
azure.cosmos.requests.total
|
Count |
TotalRequests. The total number of requests made. |
azure.cosmos.requests.total_units
|
Count |
TotalRequestUnits. The total number of request units consumed. |
azure.cosmos.server_side_latency
|
milliseconds (ms) |
ServerSideLatency. The average amount of time taken by the server to process a request. |
azure.cosmos.service_availability
|
Percent (%) |
ServiceAvailability. The average account request availability at one-hour granularity. |
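Dividing `requests.total_units` by `requests.total` for the same interval gives the average request-unit (RU) charge per request, which can help spot operations that became more expensive over time. The sketch below assumes both values share the same account, scope, and time bucket; the function name is hypothetical.

```python
def average_ru_per_request(total_request_units: float, total_requests: int) -> float:
    """Average RU charge per request over one interval.

    Divides azure.cosmos.requests.total_units by azure.cosmos.requests.total.
    """
    if total_requests == 0:
        return 0.0  # No requests in this interval.
    return total_request_units / total_requests


print(average_ru_per_request(total_request_units=12_500.0, total_requests=2_500))  # 5.0
```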
Data Factory
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Data Factory entities in the Metrics Explorer, filter the |
azure.data.factory.PipelineFailedRuns
|
Count | The number of pipeline runs that have failed due to errors or unexpected conditions during execution. This helps identify issues in data workflows and troubleshoot failures. |
azure.data.factory.PipelineSucceededRuns
|
Count | The number of pipeline runs that have successfully completed without errors. This metric helps monitor the reliability and efficiency of data pipelines. |
azure.data.factory.PipelineCancelledRuns
|
Count | The number of pipeline runs that were manually or automatically canceled before completion. This can be useful for tracking interruptions in data processing. |
azure.data.factory.ActivityCancelledRuns
|
Count | The number of individual activities within a pipeline that were canceled before execution or completion. This helps monitor workflow interruptions at the activity level. |
azure.data.factory.ActivitySucceededRuns
|
Count | The number of activities within a pipeline that have successfully completed without errors. This metric helps assess the effectiveness of individual tasks in a data pipeline. |
azure.data.factory.ActivityFailedRuns
|
Count | The number of activities within a pipeline that have failed due to errors or unexpected conditions. This helps pinpoint specific issues within a pipeline execution. |
azure.data.factory.TriggerFailedRuns
|
Count | The number of trigger runs that have failed due to errors or unexpected conditions during execution. This helps identify issues in automated data workflows. |
azure.data.factory.TriggerSucceededRuns
|
Count | The number of trigger runs that have successfully completed without errors. This metric helps monitor the reliability and efficiency of scheduled or event-driven triggers. |
azure.data.factory.TriggerCancelledRuns
|
Count | The number of trigger runs that were manually or automatically canceled before completion. This can be useful for tracking interruptions in data processing. |
azure.data.factory.MaxAllowedResourceCount
|
Count | The maximum number of resources allowed within an Azure Data Factory instance. This metric helps monitor resource allocation limits. |
azure.data.factory.ResourceCount
|
Count | The total number of resources currently in use within an Azure Data Factory instance. This metric helps track resource consumption and availability. |
azure.data.factory.FactorySizeInGbUnits
|
Count | The total size of the Azure Data Factory instance in gigabyte units. This metric helps monitor storage and processing capacity. |
azure.data.factory.IntegrationRuntimeCpuPercentage
|
Percent (%) | The percentage of CPU utilization for the integration runtime. Higher values may indicate increased workload or potential performance bottlenecks. |
azure.data.factory.IntegrationRuntimeAvailableMemory
|
bytes | The amount of available memory for the integration runtime. This metric helps monitor resource usage and ensure optimal performance. |
azure.data.factory.IntegrationRuntimeAvailableNodeNumber
|
Count | The number of available nodes in the integration runtime. This metric is useful for assessing scalability and resource allocation. |
azure.data.factory.IntegrationRuntimeQueueLength
|
Count | The number of tasks waiting in the queue for execution within the integration runtime. A high queue length may indicate processing delays or resource constraints. |
azure.data.factory.IntegrationRuntimeAverageTaskPickupDelay
|
seconds (s) | The average delay before a task is picked up for execution by the integration runtime. Longer delays may suggest resource contention or inefficiencies in task scheduling. |
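The run counters and the resource counters above lend themselves to two simple ratios: the share of finished pipeline runs that failed, and how close `ResourceCount` is to `MaxAllowedResourceCount`. In this sketch, counting cancelled runs as "finished" is an assumption you may want to change; the function names are hypothetical.

```python
def pipeline_failure_rate(failed: int, succeeded: int, cancelled: int) -> float:
    """Percentage of finished pipeline runs that failed (cancelled runs count as finished)."""
    finished = failed + succeeded + cancelled
    return 100.0 * failed / finished if finished else 0.0


def resource_usage_pct(resource_count: int, max_allowed: int) -> float:
    """ResourceCount as a percentage of MaxAllowedResourceCount."""
    if max_allowed <= 0:
        raise ValueError("max_allowed must be positive")
    return 100.0 * resource_count / max_allowed


print(pipeline_failure_rate(failed=3, succeeded=95, cancelled=2))  # 3.0
print(resource_usage_pct(resource_count=4_500, max_allowed=5_000))  # 90.0
```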
Disk Storage
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Disk Storage entities in the Metrics Explorer, filter the |
azure.disk.storage.Disk_Read_Bytes_sec
|
bps | Bytes per second read from disk during the monitoring period. |
azure.disk.storage.Disk_Read_Operations_sec
|
Count per second | Number of read IOs performed on a disk during the monitoring period. |
azure.disk.storage.Disk_Write_Bytes_sec
|
bps | Bytes per second written to disk during the monitoring period. |
azure.disk.storage.Disk_Write_Operations_sec
|
Count per second | Number of write IOs performed on a disk during the monitoring period. |
azure.disk.storage.DiskPaidBurstIOPS
|
Count | The accumulated operations of burst transactions used for disks with on-demand burst enabled. |
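The byte-rate and operation-rate metrics can be combined into an average I/O size, for example `Disk_Read_Bytes_sec / Disk_Read_Operations_sec` for reads. The sketch below assumes the byte metric is in bytes per second and that both values come from the same disk and time bucket; the function name is hypothetical.

```python
def average_io_size_kib(bytes_per_sec: float, ops_per_sec: float) -> float:
    """Average I/O size in KiB, e.g. Disk_Read_Bytes_sec / Disk_Read_Operations_sec."""
    if ops_per_sec == 0:
        return 0.0  # No I/O in this interval.
    return bytes_per_sec / ops_per_sec / 1024.0


print(average_io_size_kib(bytes_per_sec=4_194_304, ops_per_sec=64))  # 64.0
```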
DNS Zone
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of DNS Zone entities in the Metrics Explorer, filter the |
azure.dnszone.QueryVolume
|
Count | Number of queries served for a DNS zone. |
azure.dnszone.RecordSetCapacityUtilization
|
Percent (%) | Percent of Record Set capacity utilized by a DNS zone. |
azure.dnszone.RecordSetCount
|
Count | Number of Record Sets in a DNS zone. |
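Since `RecordSetCapacityUtilization` expresses `RecordSetCount` as a percentage of the zone's record set capacity, the two metrics together let you estimate that capacity and the remaining headroom without looking up the limit separately. This is an estimate that assumes both values are sampled for the same zone at the same time; the function name is hypothetical.

```python
def estimated_record_set_headroom(record_set_count: int, utilization_pct: float) -> tuple[float, float]:
    """Estimate (capacity, remaining record sets) from RecordSetCount and
    RecordSetCapacityUtilization for the same zone and time bucket."""
    if utilization_pct <= 0:
        raise ValueError("utilization_pct must be positive to estimate capacity")
    capacity = record_set_count / (utilization_pct / 100.0)
    return capacity, capacity - record_set_count


capacity, remaining = estimated_record_set_headroom(record_set_count=2_500, utilization_pct=25.0)
print(round(capacity), round(remaining))  # 10000 7500
```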
Event Hubs
Metric | Units | Description |
---|---|---|
azure.eventhubs.namespaces.active_connections
|
Count |
ActiveConnections. The maximum number of active connections on a namespace and on an entity (event hub) in the namespace. |
azure.eventhubs.namespaces.capture_backlog
|
Count |
CaptureBacklog. The backlog of events waiting to be captured for an event hub. |
azure.eventhubs.namespaces.captured_bytes
|
bytes |
CapturedBytes. The total number of captured bytes for an event hub. |
azure.eventhubs.namespaces.captured_messages
|
Count |
CapturedMessages. The total number of captured messages for an event hub. |
azure.eventhubs.namespaces.connections_closed
|
Count |
ConnectionsClosed. The total number of closed connections. |
azure.eventhubs.namespaces.connections_opened
|
Count |
ConnectionsOpened. The total number of connections opened. |
azure.eventhubs.namespaces.incoming_bytes
|
bytes |
IncomingBytes. The number of incoming bytes for an event hub during the specified period. |
azure.eventhubs.namespaces.incoming_messages
|
Count |
IncomingMessages. The total number of events or messages sent to Event Hubs over a specified period. |
azure.eventhubs.namespaces.incoming_requests
|
Count |
IncomingRequests. The total number of requests made to the Event Hubs service over a specified period. This metric includes all the data and management plane operations. |
azure.eventhubs.namespaces.namespace_cpu_usage
|
Percent (%) |
NamespaceCpuUsage. The maximum namespace CPU usage. |
azure.eventhubs.namespaces.namespace_memory_usage
|
Percent (%) |
NamespaceMemoryUsage. The maximum namespace memory usage. |
azure.eventhubs.namespaces.outgoing_bytes
|
bytes |
OutgoingBytes. The number of outgoing bytes for an event hub during the specified period. |
azure.eventhubs.namespaces.outgoing_messages
|
Count |
OutgoingMessages. The total number of events or messages received from Event Hubs over a specified period. |
azure.eventhubs.namespaces.quota_exceeded_errors
|
Count |
QuotaExceededErrors. The total number of errors caused by exceeding quotas over a specified period. |
azure.eventhubs.namespaces.server_errors
|
Count |
ServerErrors. The total number of requests not processed because of an error in the Event Hubs service over a specified period. |
azure.eventhubs.namespaces.size
|
bytes |
Size. The average size of an event hub. |
azure.eventhubs.namespaces.successful_requests
|
Count |
SuccessfulRequests. The total number of successful requests made to the Event Hubs service over a specified period. |
azure.eventhubs.namespaces.throttled_requests
|
Count |
ThrottledRequests. The total number of requests that were throttled because the usage was exceeded. |
azure.eventhubs.namespaces.user_errors
|
Count |
UserErrors. The total number of requests not processed because of user errors over a specified period. |
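Expressing `successful_requests` and `throttled_requests` as shares of `incoming_requests` makes throttling easier to compare across namespaces of different sizes. The sketch below assumes all three counts cover the same namespace and interval; the function name and the result keys are hypothetical.

```python
def request_outcomes(incoming: int, successful: int, throttled: int) -> dict[str, float]:
    """Percentages of incoming requests that succeeded or were throttled."""
    if incoming == 0:
        return {"success_pct": 0.0, "throttled_pct": 0.0}
    return {
        "success_pct": 100.0 * successful / incoming,
        "throttled_pct": 100.0 * throttled / incoming,
    }


print(request_outcomes(incoming=20_000, successful=19_800, throttled=150))
# {'success_pct': 99.0, 'throttled_pct': 0.75}
```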
ExpressRoute Gateway
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health state. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of ExpressRoute Gateway entities in the Metrics Explorer, filter the |
azure.expressroutegateway.ErGatewayConnectionBitsInPerSecond
|
bits per second | Bits per second entering Azure through the ExpressRoute gateway. Can be split by individual connection. |
azure.expressroutegateway.ErGatewayConnectionBitsOutPerSecond
|
bits per second | Bits per second leaving Azure through the ExpressRoute gateway. Can be split by individual connection. |
azure.expressroutegateway.ExpressRouteGatewayActiveFlows
|
Count | Number of active flows on the ExpressRoute gateway. |
azure.expressroutegateway.ExpressRouteGatewayBitsPerSecond
|
bits per second | Total bits per second received on the ExpressRoute gateway. |
azure.expressroutegateway.ExpressRouteGatewayCountOfRoutesAdvertisedToPeer
|
Count | Number of routes advertised to the peer by the ExpressRoute gateway. |
azure.expressroutegateway.ExpressRouteGatewayCountOfRoutesLearnedFromPeer
|
Count | Number of routes learned from the peer by the ExpressRoute gateway. |
azure.expressroutegateway.ExpressRouteGatewayCpuUtilization
|
Percent (%) | CPU utilization of the ExpressRoute gateway. |
azure.expressroutegateway.ExpressRouteGatewayFrequencyOfRoutesChanged
|
Count | Frequency of route changes on the ExpressRoute gateway. |
azure.expressroutegateway.ExpressRouteGatewayMaxFlowsCreationRate
|
Count per second | Maximum number of flows created per second on the ExpressRoute gateway. |
azure.expressroutegateway.ExpressRouteGatewayNumberOfVmInVnet
|
Count | Number of VMs in the virtual network. |
azure.expressroutegateway.ExpressRouteGatewayPacketsPerSecond
|
Count per second | Total packets received on the ExpressRoute gateway per second. |
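The gateway throughput metrics are reported in bits per second, so comparing them with a bandwidth figure in Mbps needs a conversion (1 Mbps = 1,000,000 bits per second). In the sketch below, `provisioned_mbps` is an assumption: a bandwidth value you supply for your circuit or gateway, not one of the metrics above. The function name is hypothetical.

```python
def link_utilization_pct(bits_per_second: float, provisioned_mbps: float) -> float:
    """Traffic rate as a percentage of a provisioned bandwidth in Mbps.

    provisioned_mbps is a value you supply (for example, your circuit or
    gateway bandwidth); it is not one of the metrics above.
    """
    if provisioned_mbps <= 0:
        raise ValueError("provisioned_mbps must be positive")
    mbps = bits_per_second / 1_000_000  # 1 Mbps = 10**6 bits per second
    return 100.0 * mbps / provisioned_mbps


print(link_utilization_pct(bits_per_second=250_000_000, provisioned_mbps=1_000))  # 25.0
```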
Files
Metric | Units | Description |
---|---|---|
azure.storage.files.availability
|
Percent (%) |
Availability. The average percentage of availability for the storage service or the specified API operation. Availability is calculated by taking the total billable requests value and dividing it by the number of applicable requests, including those requests that produced unexpected errors. |
azure.storage.files.egress
|
bytes |
Egress. The total amount of egress data. This number includes egress from Azure Storage to external clients as well as egress within Azure. |
azure.storage.files.file_capacity
|
bytes |
FileCapacity. The average amount of file storage used by the storage account. |
azure.storage.files.file_count
|
Count |
FileCount. The average number of files in the storage account. |
azure.storage.files.fileshare_count
|
Count |
FileShareCount. The average number of file shares in the storage account. |
azure.storage.files.fileshare_quota
|
bytes |
FileShareQuota. The average upper limit on the amount of storage that can be used by Azure Files service in bytes. |
azure.storage.files.fileshare_snapshotcount
|
Count |
FileShareSnapshotCount. The average number of snapshots present on the share in the storage account's Azure Files service. |
azure.storage.files.fileshare_snapshotsize
|
bytes |
FileShareSnapshotSize. The average amount of storage used by the snapshots in the storage account's Azure Files service. |
azure.storage.files.ingress
|
bytes |
Ingress. The total amount of ingress data. This number includes ingress from an external client into Azure Storage as well as ingress within Azure. |
azure.storage.files.success.e2e_latency
|
milliseconds (ms) |
SuccessE2ELatency. The average end-to-end latency of successful requests made to a storage service or the specified API operation. This value includes the required processing time within Azure Storage to read the request, send the response, and receive acknowledgment of the response. |
azure.storage.files.success.server_latency
|
milliseconds (ms) |
SuccessServerLatency. The average time used to process a successful request by Azure Storage. This value does not include the network latency specified in SuccessE2ELatency. |
azure.storage.files.transactions
|
Count |
Transactions. The total number of requests made to a storage service or the specified API operation. This number includes successful and failed requests, as well as requests that produced errors. |
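The `availability` description above defines the value as billable requests divided by applicable requests. The sketch below restates that as a percentage; the two request counts are inputs you would have to obtain yourself (they are not separate metrics in this table), and returning `None` when there are no applicable requests is a choice made here because the ratio is undefined. The function name is hypothetical.

```python
from typing import Optional


def availability_pct(billable_requests: int, applicable_requests: int) -> Optional[float]:
    """Billable requests divided by applicable requests, as a percentage.

    Returns None when there were no applicable requests, because the ratio
    is undefined for that interval.
    """
    if applicable_requests == 0:
        return None
    return 100.0 * billable_requests / applicable_requests


print(availability_pct(billable_requests=9_990, applicable_requests=10_000))  # 99.9
```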
Front Door
Metric | Units | Description |
---|---|---|
azure.frontdoor.backend_health_percentage
|
Percent (%) |
BackendHealthPercentage. The average percentage of successful health probes from Azure Front Door (AFD) to the origin. |
azure.frontdoor.backend_request_count
|
Count |
BackendRequestCount. The total number of requests sent from AFD to origin. |
azure.frontdoor.backend_request_latency
|
milliseconds (ms) |
BackendRequestLatency. The average time calculated from when the request was sent by AFD edge to the backend until AFD received the last response byte from the backend. |
azure.frontdoor.billable_response_size
|
bytes |
BillableResponseSize. The total number of billable bytes (minimum 2KB per request) sent as responses from HTTP/S proxy to clients. |
azure.frontdoor.request_count
|
Count |
RequestCount. The total number of client requests served by Azure Front Door. |
azure.frontdoor.request_size
|
bytes |
RequestSize. The total number of bytes sent as requests from clients to AFD. |
azure.frontdoor.response_size
|
bytes |
ResponseSize. The total number of bytes sent as responses from Front Door to clients. |
azure.frontdoor.total_latency
|
milliseconds (ms) |
TotalLatency. The average time from when Azure Front Door receives the client request until the last response byte is sent from Front Door to the client. |
azure.frontdoor.web_application_firewall_request_count
|
Count |
WebApplicationFirewallRequestCount. The total number of matched WAF requests. |
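`billable_response_size` counts each response as at least 2 KB. The sketch below applies that floor to a list of individual response sizes to show how small responses inflate the billable total. Treating 2 KB as 2048 bytes is an assumption, so the minimum is a parameter; the function name and sample sizes are hypothetical.

```python
def billable_bytes(response_sizes: list[int], minimum_bytes: int = 2048) -> int:
    """Sum response sizes, counting each response as at least minimum_bytes.

    minimum_bytes=2048 assumes 2 KB means 2 x 1024 bytes; adjust if your
    billing uses a different definition.
    """
    return sum(max(size, minimum_bytes) for size in response_sizes)


print(billable_bytes([500, 3_000, 2_048]))  # 7096
```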
Functions
Metric | Units | Description |
---|---|---|
azure.sites.app_connections
|
Count |
The number of active connections established by the application. This metric helps monitor app connectivity and resource usage. |
azure.sites.app_domains
|
Count |
Total App Domains. The average number of app domains loaded in the application. |
azure.sites.app_domains.unloaded
|
Count |
Total App Domains Unloaded. The average number of application domains unloaded. |
azure.sites.collections.gen1
|
Count |
The number of garbage collection events for Generation 1 objects in an Azure Functions instance. This metric helps assess memory management efficiency. |
azure.sites.collections.gen2
|
Count |
The number of garbage collection events for Generation 2 objects in an Azure Functions instance. Higher generation garbage collections include all lower generation collections. |
azure.sites.collections.gen3
|
Count |
The number of garbage collection events for Generation 3 objects in an Azure Functions instance. |
azure.sites.current_assemblies
|
Count |
The number of assemblies currently loaded across all application domains in an Azure Functions instance. This metric helps track application dependencies and runtime behavior. |
azure.sites.function_executions
|
Count |
Function Execution Count. The total number of times a function app has executed. This value correlates to the number of times a function runs in an app. |
azure.sites.function_executions.unit
|
Count |
Function Execution Units. The number of function execution units. |
azure.sites.handles
|
Count |
Tracks the number of open file handles in an Azure Functions environment. This metric helps monitor resource usage and potential file access issues. |
azure.sites.http.101
|
Count |
Tracks HTTP 101 responses which indicate protocol switching (for example, upgrading from HTTP to WebSockets). |
azure.sites.http.2xx
|
Count |
HTTP 2xx responses which indicate successful requests. These status codes confirm that the server successfully processed the request. |
azure.sites.http.3xx
|
Count |
HTTP 3xx responses which indicate redirection. These status codes signal that the requested resource has moved to a different location. |
azure.sites.http.401
|
Count |
HTTP 401 responses which indicate unauthorized access. This occurs when authentication credentials are missing or invalid. |
azure.sites.http.403
|
Count |
HTTP 403 responses which indicate forbidden access. This happens when a request is denied due to insufficient permissions or security restrictions. |
azure.sites.http.404
|
Count |
HTTP 404 responses which indicate that the requested resource was not found. This can occur when a URL is incorrect or the resource has been removed. |
azure.sites.http.406
|
Count |
HTTP 406 responses which indicate that the requested format is not acceptable. This happens when the server cannot provide content in the format specified by the request. |
azure.sites.http.4xx
|
Count |
HTTP 4xx responses which indicate client-side errors. These errors typically occur due to incorrect requests, authentication failures, or missing resources. |
azure.sites.http.5xx
|
Count |
HTTP 5xx. The total number of requests with a status code greater than or equal to 500 but less than 600. |
azure.sites.io.bytes_received
|
bytes |
Bytes Received. The number of incoming data bytes. |
azure.sites.io.bytes_sent
|
bytes |
Bytes Sent. The number of outgoing data bytes. |
azure.sites.io.other_bytes
|
bps |
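IO Other Bytes Per Second. The rate at which the app is issuing bytes to I/O operations that don't involve data, such as control operations. |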
azure.sites.io.other_ops
|
Count per second |
IO Other Operations Per Second. The rate at which the app is issuing I/O operations that are neither read nor write operations. |
azure.sites.io.read_bytes
|
bps |
IO Read Bytes Per Second. The number of bytes per second the app is reading from I/O operations. |
azure.sites.io.read_ops
|
Count per second |
IO Read Operations Per Second. The number of read I/O operations per second the app is issuing. |
azure.sites.io.write_bytes
|
bps |
IO Write Bytes Per Second. The number of bytes per second the app is writing to I/O operations. |
azure.sites.io.write_ops
|
Count per second |
IO Write Operations Per Second. The number of write I/O operations per second the app is issuing. |
azure.sites.memory.working_set
|
bytes |
Memory Working Set. The current amount of memory used by the app. |
azure.sites.memory.working_set.avg
|
bytes |
Average Memory Working Set. The average amount of memory used by the app. |
azure.sites.private_bytes
|
bytes |
Private Bytes. The average number of private bytes allocated to the app. |
azure.sites.queued_requests
|
Count |
Requests In Application Queue. The average number of requests in the application queue. |
azure.sites.requests
|
Count |
Requests. The total number of requests. |
azure.sites.response_time
|
seconds (s) |
Average Response Time. The average time taken for the app to serve requests. |
azure.sites.threads
|
Count |
The number of active threads in an Azure Functions instance. |
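An HTTP error rate for a function app can be derived from `azure.sites.requests`, `azure.sites.http.4xx`, and `azure.sites.http.5xx`. The sketch below assumes the 4xx counter already includes the individual 401, 403, 404, and 406 counts, so those per-code metrics are not added again; verify that against your data before relying on it. The function name is hypothetical.

```python
def http_error_pct(requests: int, http_4xx: int, http_5xx: int) -> float:
    """Percentage of requests that returned 4xx or 5xx status codes.

    Assumes azure.sites.http.4xx already includes the 401/403/404/406
    responses, so those per-code counters are not added again.
    """
    if requests == 0:
        return 0.0
    return 100.0 * (http_4xx + http_5xx) / requests


print(http_error_pct(requests=4_000, http_4xx=60, http_5xx=20))  # 2.0
```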
Key Vault
Metric | Units | Description |
---|---|---|
azure.key_vault.service_api.hit
|
Count |
Service API Hit. The total number of service API hits. |
azure.key_vault.service_api.latency
|
milliseconds (ms) |
Service API Latency. The average latency of service API requests. |
azure.key_vault.service_api.result
|
Count |
Service API Result. The total number of service API results. |
Load Balancer
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Load Balancer entities in the Metrics Explorer, filter the |
azure.lb.vip_availability
|
Count | Average Load Balancer data path availability per time duration. |
azure.lb.dip_availability
|
Count | Average Load Balancer health probe status per time duration. |
azure.lb.bytes
|
bytes | Total number of bytes transmitted within the time period. |
azure.lb.packets
|
Count | Total number of packets transmitted within the time period. |
azure.lb.syns
|
Count | Total number of SYN packets transmitted within the time period. |
azure.lb.snat_connections
|
Count | Total number of new SNAT connections created within the time period. |
azure.lb.allocated_snat_ports
|
Count | Total number of SNAT ports allocated within the time period. |
azure.lb.used_snat_ports
|
Count | Total number of SNAT ports used within the time period. |
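Comparing `used_snat_ports` with `allocated_snat_ports` shows how close a load balancer backend is to running out of SNAT ports. The sketch below assumes both values come from the same load balancer, backend, and time bucket; the function name is hypothetical.

```python
def snat_port_usage_pct(used_ports: int, allocated_ports: int) -> float:
    """azure.lb.used_snat_ports as a percentage of azure.lb.allocated_snat_ports."""
    if allocated_ports == 0:
        return 0.0  # Nothing allocated in this interval.
    return 100.0 * used_ports / allocated_ports


print(snat_port_usage_pct(used_ports=896, allocated_ports=1_024))  # 87.5
```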
Logic Apps
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Logic Apps entities in the Metrics Explorer, filter the |
azure.logic.workflow.ActionLatency
|
seconds (s) | Latency of completed workflow actions. |
azure.logic.workflow.ActionsCompleted
|
Count | Number of workflow actions completed. |
azure.logic.workflow.ActionsFailed
|
Count | Number of workflow actions failed. |
azure.logic.workflow.ActionsSkipped
|
Count | Number of workflow actions skipped. |
azure.logic.workflow.ActionsStarted
|
Count | Number of workflow actions started. |
azure.logic.workflow.ActionsSucceeded
|
Count | Number of workflow actions succeeded. |
azure.logic.workflow.ActionSuccessLatency
|
seconds (s) | Latency of succeeded workflow actions. |
azure.logic.workflow.ActionThrottledEvents
|
Count | Number of workflow action throttled events. |
azure.logic.workflow.BillableActionExecutions
|
Count | Number of workflow action executions getting billed. |
azure.logic.workflow.BillableTriggerExecutions
|
Count | Number of workflow trigger executions getting billed. |
azure.logic.workflow.BillingUsageNativeOperation
|
Count | Number of native operation executions getting billed. |
azure.logic.workflow.BillingUsageStandardConnector
|
Count | Number of standard connector executions getting billed. |
azure.logic.workflow.BillingUsageStorageConsumption
|
Count | Number of storage consumption executions getting billed. |
azure.logic.workflow.RunFailurePercentage
|
Percent (%) | Percentage of workflow runs failed. |
azure.logic.workflow.RunLatency
|
seconds (s) | Latency of completed workflow runs. |
azure.logic.workflow.RunsCancelled
|
Count | Number of workflow runs cancelled. |
azure.logic.workflow.RunsCompleted
|
Count | Number of workflow runs completed. |
azure.logic.workflow.RunsFailed
|
Count | Number of workflow runs failed. |
azure.logic.workflow.RunsStarted
|
Count | Number of workflow runs started. |
azure.logic.workflow.RunsSucceeded
|
Count | Number of workflow runs succeeded. |
azure.logic.workflow.RunStartThrottledEvents
|
Count | Number of workflow run start throttled events. |
azure.logic.workflow.RunSuccessLatency
|
seconds (s) | Latency of succeeded workflow runs. |
azure.logic.workflow.RunThrottledEvents
|
Count | Number of workflow action or trigger throttled events. |
azure.logic.workflow.TotalBillableExecutions
|
Count | Number of workflow executions getting billed. |
azure.logic.workflow.TriggerFireLatency
|
seconds (s) | Latency of fired workflow triggers. |
azure.logic.workflow.TriggerLatency
|
seconds (s) | Latency of completed workflow triggers. |
azure.logic.workflow.TriggersCompleted
|
Count | Number of workflow triggers completed. |
azure.logic.workflow.TriggersFailed
|
Count | Number of workflow triggers failed. |
azure.logic.workflow.TriggersFired
|
Count | Number of workflow triggers fired. |
azure.logic.workflow.TriggersSkipped
|
Count | Number of workflow triggers skipped. |
azure.logic.workflow.TriggersStarted
|
Count | Number of workflow triggers started. |
azure.logic.workflow.TriggersSucceeded
|
Count | Number of workflow triggers succeeded. |
azure.logic.workflow.TriggerSuccessLatency
|
seconds (s) | Latency of succeeded workflow triggers. |
azure.logic.workflow.TriggerThrottledEvents
|
Count | Number of workflow trigger throttled events. |
MySQL Flexible Server
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of MySQL Flexible Server entities in the Metrics Explorer, filter the |
azure.mysql.flexible.aborted_connections
|
Count | Number of aborted connections. |
azure.mysql.flexible.active_connections
|
Count | Number of active connections. |
azure.mysql.flexible.available_memory_bytes
|
bytes | Amount of available physical memory, in bytes. |
azure.mysql.flexible.cpu_percent
|
Percent (%) | Host CPU utilization percentage. |
azure.mysql.flexible.backup_storage_used
|
bytes | Backup storage used. |
azure.mysql.flexible.binlog_storage_used
|
bytes | Storage used by binary log (binlog) files. |
azure.mysql.flexible.memory_percent
|
Percent (%) | Host memory utilization percentage. |
azure.mysql.flexible.network_bytes_egress
|
bytes | Host network egress, in bytes. |
azure.mysql.flexible.network_bytes_ingress
|
bytes | Host network ingress, in bytes. |
azure.mysql.flexible.Queries
|
Count | Number of queries. |
azure.mysql.flexible.Slow_queries
|
Count | The number of queries that have taken more than long_query_time seconds. |
azure.mysql.flexible.replication_lag
|
seconds (s) | Replication lag in seconds. |
azure.mysql.flexible.storage_io_count
|
Count | The number of storage I/O operations consumed. |
azure.mysql.flexible.storage_limit
|
bytes | Storage limit. |
azure.mysql.flexible.storage_used
|
bytes | Storage used. |
azure.mysql.flexible.total_connections
|
Count | Total number of connections. |
azure.mysql.flexible.storage_percent
|
Percent (%) | Percentage of storage used. |
azure.mysql.flexible.Threads_running
|
Count | The number of threads that are not sleeping. |
azure.mysql.flexible.Com_alter_table
|
Count | The number of times the ALTER TABLE statement has been executed. |
azure.mysql.flexible.Com_create_db
|
Count | The number of times the CREATE DATABASE statement has been executed. |
azure.mysql.flexible.Com_create_table
|
Count | The number of times the CREATE TABLE statement has been executed. |
azure.mysql.flexible.Com_delete
|
Count | The number of times the DELETE statement has been executed. |
azure.mysql.flexible.Com_drop_db
|
Count | The number of times the DROP DATABASE statement has been executed. |
azure.mysql.flexible.Com_drop_table
|
Count | The number of times the DROP TABLE statement has been executed. |
azure.mysql.flexible.Com_insert
|
Count | The number of times the INSERT statement has been executed. |
azure.mysql.flexible.Com_select
|
Count | The number of times the SELECT statement has been executed. |
azure.mysql.flexible.Com_update
|
Count | The number of times the UPDATE statement has been executed. |
azure.mysql.flexible.cpu_credits_consumed
|
Count | CPU credits consumed. |
azure.mysql.flexible.cpu_credits_remaining
|
Count | CPU credits remaining. |
azure.mysql.flexible.data_storage_used
|
bytes | Storage used by data files. |
azure.mysql.flexible.HA_IO_status
|
Count | Status of the replication I/O thread (whether it is running). |
azure.mysql.flexible.HA_replication_lag
|
seconds (s) | High-availability (HA) replication lag, in seconds. |
azure.mysql.flexible.HA_SQL_status
|
Count | Status of the replication SQL thread (whether it is running). |
azure.mysql.flexible.ibdata1_storage_used
|
bytes | Storage used by ibdata1 files. |
azure.mysql.flexible.Innodb_buffer_pool_pages_data
|
Count | The number of pages in the InnoDB buffer pool containing data. |
azure.mysql.flexible.Innodb_buffer_pool_pages_dirty
|
Count | The current number of dirty pages in the InnoDB buffer pool. |
azure.mysql.flexible.Innodb_buffer_pool_pages_flushed
|
Count | The number of requests to flush pages from the InnoDB buffer pool. |
azure.mysql.flexible.Innodb_buffer_pool_pages_free
|
Count | The number of free pages in the InnoDB buffer pool. |
azure.mysql.flexible.Innodb_buffer_pool_read_requests
|
Count | The number of logical read requests. |
azure.mysql.flexible.Innodb_buffer_pool_reads
|
Count | The number of logical reads that InnoDB could not satisfy from the buffer pool, and had to read directly from disk. |
azure.mysql.flexible.Innodb_data_writes
|
Count | The total number of data writes. |
azure.mysql.flexible.Innodb_row_lock_time
|
milliseconds (ms) | The total time spent in acquiring row locks for InnoDB tables, in milliseconds. |
azure.mysql.flexible.Innodb_row_lock_waits
|
Count | The number of times operations on InnoDB tables had to wait for a row lock. |
azure.mysql.flexible.io_consumption_percent
|
Percent (%) | Storage I/O consumption percent. |
azure.mysql.flexible.others_storage_used
|
bytes | Storage used by other files. |
azure.mysql.flexible.Replica_IO_Running
|
Count | Status of the replication I/O thread (whether it is running). |
azure.mysql.flexible.Replica_SQL_Running
|
Count | Status of the replication SQL thread (whether it is running). |
azure.mysql.flexible.serverlog_storage_limit
|
bytes | Server log storage limit. |
azure.mysql.flexible.serverlog_storage_percent
|
Percent (%) | Server log storage percentage. |
azure.mysql.flexible.serverlog_storage_usage
|
bytes | Server log storage used. |
azure.mysql.flexible.storage_throttle_count
|
Count | Number of storage I/O requests throttled in the selected time range. Deprecated; check the Storage IO Percent metric (azure.mysql.flexible.io_consumption_percent) for throttling instead. |
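The descriptions of azure.mysql.flexible.Innodb_buffer_pool_read_requests (logical read requests) and azure.mysql.flexible.Innodb_buffer_pool_reads (logical reads that had to go to disk) suggest a derived InnoDB buffer pool hit ratio. The following is a minimal sketch, assuming both values cover the same time period; the function name is hypothetical, and this ratio is not itself one of the metrics listed above.

```python
from typing import Optional


def innodb_buffer_pool_hit_ratio(read_requests: float, disk_reads: float) -> Optional[float]:
    """Percentage of logical reads served from the InnoDB buffer pool.

    read_requests: value of azure.mysql.flexible.Innodb_buffer_pool_read_requests
    disk_reads:    value of azure.mysql.flexible.Innodb_buffer_pool_reads
    Both are assumed to cover the same time period.
    """
    if read_requests <= 0:
        # No logical reads in the period, so the ratio is undefined.
        return None
    # Reads not satisfied from the buffer pool are the "misses".
    return 100.0 * (1.0 - disk_reads / read_requests)


if __name__ == "__main__":
    print(innodb_buffer_pool_hit_ratio(read_requests=10_000, disk_reads=50))
```

A ratio that drops noticeably over time can indicate that the working set no longer fits in the buffer pool; what counts as "too low" depends on your workload.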
NAT Gateway
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of NAT Gateway entities in the Metrics Explorer, filter the |
azure.natgateway.ByteCount
|
bytes | Total number of bytes transmitted within the time period. |
azure.natgateway.DatapathAvailability
|
Count | NAT Gateway datapath availability. |
azure.natgateway.PacketCount
|
Count | Total number of packets transmitted within the time period. |
azure.natgateway.PacketDropCount
|
Count | Count of dropped packets. |
azure.natgateway.SNATConnectionCount
|
Count | Total concurrent active connections. |
azure.natgateway.TotalConnectionCount
|
Count | Total number of active SNAT connections. |
OpenAI Service
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of OpenAI Service entities in the Metrics Explorer, filter the |
azure.openai.AzureOpenAIProvisionedManagedUtilizationV2
|
Percent (%) | The utilization of provisioned managed throughput in Azure OpenAI. This metric helps track the efficiency of allocated processing capacity for AI workloads. |
azure.openai.AzureOpenAITimeToResponse
|
milliseconds (ms) | The time taken for Azure OpenAI to generate a response after receiving a request. This metric is useful for monitoring latency and performance. |
azure.openai.TotalEvents
|
Count | The total number of events processed by Azure OpenAI, including requests, completions, and other interactions. |
azure.openai.AzureOpenAIRequests
|
Count | The total number of requests sent to Azure OpenAI, helping monitor usage and workload demand. |
azure.openai.ActiveTokens
|
Count | The number of active tokens being processed in Azure OpenAI, which can indicate the complexity and scale of ongoing operations. |
azure.openai.ProcessedPromptTokens
|
Count | The number of prompt tokens processed by Azure OpenAI, helping assess input complexity and resource consumption. |
azure.openai.TokenTransaction
|
Count | The number of token transactions processed by Azure OpenAI, useful for tracking usage and billing. |
azure.openai.GeneratedTokens
|
Count | The total number of tokens generated by Azure OpenAI models in response to user queries. |
azure.openai.FineTunedTrainingHours
|
Count | The number of hours spent fine-tuning models in Azure OpenAI, helping monitor resource consumption and optimization. |
azure.openai.ClientErrors
|
Count | Errors caused by incorrect or invalid requests from users, such as authentication failures or malformed API calls. |
azure.openai.ServerErrors
|
Count | Errors occurring on the server side, such as internal failures or service outages. |
azure.openai.AvailabilityRate
|
Percent (%) | The percentage of time Azure OpenAI services are available and operational, helping assess reliability and uptime. |
PostgreSQL Flexible Server
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of PostgreSQL Flexible Server entities in the Metrics Explorer, filter the |
azure.postgresql.flexible.active_connections
|
Count | Number of active connections. |
azure.postgresql.flexible.backup_storage_used
|
bytes | Backup storage used. |
azure.postgresql.flexible.client_connections_active
|
Count | Connections from clients that are associated with a PostgreSQL connection. |
azure.postgresql.flexible.connections_failed
|
Count | Number of failed connections. |
azure.postgresql.flexible.connections_succeeded
|
Count | Number of successful connections. |
azure.postgresql.flexible.cpu_percent
|
Percent (%) | CPU utilization percentage. |
azure.postgresql.flexible.cpu_credits_consumed
|
Count | Total number of credits consumed by the database server. |
azure.postgresql.flexible.disk_queue_depth
|
Count | Number of outstanding I/O operations to the data disk. |
azure.postgresql.flexible.iops
|
Count | I/O operations per second. |
azure.postgresql.flexible.is_db_alive
|
Count | Indicates whether the database is up. |
azure.postgresql.flexible.memory_percent
|
Percent (%) | Memory utilization percentage. |
azure.postgresql.flexible.maximum_used_transactionIDs
|
Count | Maximum used transaction IDs. |
azure.postgresql.flexible.network_bytes_egress
|
bytes | Network traffic out across active connections. |
azure.postgresql.flexible.network_bytes_ingress
|
bytes | Network traffic in across active connections. |
azure.postgresql.flexible.read_iops
|
Count | Number of data disk I/O read operations per second. |
azure.postgresql.flexible.read_throughput
|
Count | Bytes read per second from the data disk during the monitoring period. |
azure.postgresql.flexible.server_connections_active
|
Count | Connections to PostgreSQL that are in use by a client connection. |
azure.postgresql.flexible.storage_free
|
bytes | Storage free. |
azure.postgresql.flexible.storage_percent
|
Percent (%) | Percentage of storage used. |
azure.postgresql.flexible.storage_used
|
bytes | Storage used. |
azure.postgresql.flexible.write_iops
|
Count | Number of data disk I/O write operations per second. |
azure.postgresql.flexible.write_throughput
|
Count | Bytes written per second to the data disk during the monitoring period. |
azure.postgresql.flexible.xact_total
|
Count | Number of total transactions executed in this database. |
azure.postgresql.flexible.analyze_count_user_tables
|
Count | Number of times user-only tables have been manually analyzed in this database. |
azure.postgresql.flexible.autoanalyze_count_user_tables
|
Count | Number of times user-only tables have been analyzed by the autovacuum daemon in this database. |
azure.postgresql.flexible.autovacuum_count_user_tables
|
Count | Number of times user-only tables have been vacuumed by the autovacuum daemon in this database. |
azure.postgresql.flexible.blks_hit
|
Count | Number of times disk blocks were found already in the buffer cache, so that a read was not necessary. |
azure.postgresql.flexible.blks_read
|
Count | Number of disk blocks read in this database. |
azure.postgresql.flexible.bloat_percent
|
Percent (%) | Estimated bloat percentage for user-only tables in this database. |
azure.postgresql.flexible.client_connections_waiting
|
Count | Connections from clients that are waiting for a PostgreSQL connection to service them. |
azure.postgresql.flexible.cpu_credits_remaining
|
Count | Total number of credits available to burst. |
azure.postgresql.flexible.deadlocks
|
Count | Number of deadlocks detected in this database. |
azure.postgresql.flexible.disk_bandwidth_consumed_percentage
|
Percent (%) | Percentage of disk bandwidth consumed per minute. |
azure.postgresql.flexible.disk_iops_consumed_percentage
|
Percent (%) | Percentage of disk I/Os consumed per minute. |
azure.postgresql.flexible.logical_replication_delay_in_bytes
|
bytes | Maximum lag across all logical replication slots. |
azure.postgresql.flexible.longest_query_time_sec
|
seconds (s) | The age in seconds of the longest query that is currently running. |
azure.postgresql.flexible.longest_transaction_time_sec
|
seconds (s) | The age in seconds of the longest transaction (including idle transactions). |
azure.postgresql.flexible.max_connections
|
Count | Maximum number of connections. |
azure.postgresql.flexible.n_dead_tup_user_tables
|
Count | Estimated number of dead rows for user-only tables in this database. |
azure.postgresql.flexible.n_live_tup_user_tables
|
Count | Estimated number of live rows for user-only tables in this database. |
azure.postgresql.flexible.n_mod_since_analyze_user_tables
|
Count | Estimated number of rows modified since user-only tables were last analyzed. |
azure.postgresql.flexible.num_pools
|
Count | Total number of connection pools. |
azure.postgresql.flexible.numbackends
|
Count | Number of backends connected to this database. |
azure.postgresql.flexible.oldest_backend_time_sec
|
seconds (s) | The age in seconds of the oldest backend (irrespective of the state). |
azure.postgresql.flexible.oldest_backend_xmin
|
Count | The actual value of the oldest xmin. |
azure.postgresql.flexible.oldest_backend_xmin_age
|
Count | Age of the oldest xmin, in transactions; that is, how many transactions have passed since the oldest xmin. |
azure.postgresql.flexible.physical_replication_delay_in_bytes
|
bytes | Maximum lag across all asynchronous physical replication slots. |
azure.postgresql.flexible.physical_replication_delay_in_seconds
|
seconds (s) | Read replica lag, in seconds. |
azure.postgresql.flexible.server_connections_idle
|
Count | Connections to PostgreSQL that are idle, ready to service a new client connection. |
azure.postgresql.flexible.sessions_by_state
|
Count | Overall state of the backends. |
azure.postgresql.flexible.sessions_by_wait_event_type
|
Count | Sessions by the type of event for which the backend is waiting. |
azure.postgresql.flexible.tables_analyzed_user_tables
|
Count | Number of user-only tables that have been analyzed in this database. |
azure.postgresql.flexible.tables_autoanalyzed_user_tables
|
Count | Number of user-only tables that have been analyzed by the autovacuum daemon in this database. |
azure.postgresql.flexible.tables_autovacuumed_user_tables
|
Count | Number of user-only tables that have been vacuumed by the autovacuum daemon in this database. |
azure.postgresql.flexible.tables_counter_user_tables
|
Count | Number of user-only tables in this database. |
azure.postgresql.flexible.tables_vacuumed_user_tables
|
Count | Number of user-only tables that have been vacuumed in this database. |
azure.postgresql.flexible.temp_bytes
|
bytes | Total amount of data written to temporary files by queries in this database. |
azure.postgresql.flexible.temp_files
|
Count | Number of temporary files created by queries in this database. |
azure.postgresql.flexible.total_pooled_connections
|
Count | Current number of pooled connections. |
azure.postgresql.flexible.tps
|
Count | Number of transactions executed within a second. |
azure.postgresql.flexible.tup_deleted
|
Count | Number of rows deleted by queries in this database. |
azure.postgresql.flexible.tup_fetched
|
Count | Number of rows fetched by queries in this database. |
azure.postgresql.flexible.tup_inserted
|
Count | Number of rows inserted by queries in this database. |
azure.postgresql.flexible.tup_returned
|
Count | Number of rows returned by queries in this database. |
azure.postgresql.flexible.tup_updated
|
Count | Number of rows updated by queries in this database. |
azure.postgresql.flexible.txlogs_storage_used
|
bytes | Transaction log storage used. |
azure.postgresql.flexible.vacuum_count_user_tables
|
Count | Number of times user-only tables have been manually vacuumed in this database (not counting VACUUM FULL). |
azure.postgresql.flexible.xact_commit
|
Count | Number of transactions in this database that have been committed. |
azure.postgresql.flexible.xact_rollback
|
Count | Number of transactions in this database that have been rolled back. |
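Two commonly watched ratios can be derived from the PostgreSQL metrics above: the buffer cache hit ratio (from azure.postgresql.flexible.blks_hit and blks_read) and the transaction rollback ratio (from xact_rollback and xact_commit). The following is a minimal sketch, assuming each pair of values covers the same time period; the function names are hypothetical, and neither ratio is a metric reported in the table. Note that in PostgreSQL a block counted in blks_read may still have been served from the operating system cache rather than the physical disk.

```python
from typing import Optional


def percent_of_total(part: float, other: float) -> Optional[float]:
    """Return part / (part + other) as a percentage, or None if both are zero."""
    total = part + other
    if total <= 0:
        return None
    return 100.0 * part / total


def cache_hit_ratio(blks_hit: float, blks_read: float) -> Optional[float]:
    # Share of block requests found in the buffer cache instead of being read.
    return percent_of_total(blks_hit, blks_read)


def rollback_ratio(xact_rollback: float, xact_commit: float) -> Optional[float]:
    # Share of finished transactions that were rolled back instead of committed.
    return percent_of_total(xact_rollback, xact_commit)


if __name__ == "__main__":
    print(cache_hit_ratio(blks_hit=990, blks_read=10))
    print(rollback_ratio(xact_rollback=5, xact_commit=95))
```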
Recovery Services
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Recovery Services entities in the Metrics Explorer, filter the |
azure.recovery.service.backup_health_event
|
Count | The count of health events related to backup job health within a specific time frame. When a backup job completes, Azure Backup generates a backup health event, with dimensions varying based on the job status (for example, succeeded or failed). |
azure.recovery.service.restore_health_event
|
Count | The count of health events related to restore job health within a specific time frame. When a restore job completes, Azure Backup generates a restore health event, with dimensions varying based on the job status (for example, succeeded or failed). |
Service Bus
Metric | Units | Description |
---|---|---|
azure.servicebus.namespaces.abandon_message
|
Count |
AbandonMessage. The total number of messages abandoned over a specified period. |
azure.servicebus.namespaces.active_connections
|
Count |
ActiveConnections. The total number of active connections on a namespace and on an entity in the namespace. The value for this metric is a point-in-time value. Connections that were active immediately after that point in time may not be reflected in the metric. |
azure.servicebus.namespaces.active_messages
|
Count |
ActiveMessages. The average number of active messages in a queue/topic. |
azure.servicebus.namespaces.complete_message
|
Count |
CompleteMessage. The total number of messages completed over a specified period. |
azure.servicebus.namespaces.connections_closed
|
Count |
ConnectionsClosed. The average number of connections closed. The value for this metric is an aggregation and includes all connections that were closed in the aggregation time window. |
azure.servicebus.namespaces.connections_opened
|
Count |
ConnectionsOpened. The average number of connections opened. The value for this metric is an aggregation and includes all connections that were opened in the aggregation time window. |
azure.servicebus.namespaces.deadlettered_messages
|
Count |
DeadletteredMessages. The average number of dead-lettered messages in a queue/topic. |
azure.servicebus.namespaces.incoming_messages
|
Count |
IncomingMessages. The total number of events or messages sent to Service Bus over a specified period. For basic and standard tiers, incoming auto-forwarded messages are included in this metric. For the premium tier, they aren't included. |
azure.servicebus.namespaces.incoming_requests
|
Count |
IncomingRequests. The total number of requests made to the Service Bus service over a specified period. |
azure.servicebus.namespaces.messages
|
Count |
Messages. The average number of messages in a queue/topic. |
azure.servicebus.namespaces.namespace_cpu_usage
|
Percent (%) |
The percentage of CPU used by premium namespaces. |
azure.servicebus.namespaces.outgoing_messages
|
Count |
OutgoingMessages. The total number of events or messages received from Service Bus over a specified period. The outgoing auto-forwarded messages aren't included in this metric. |
azure.servicebus.namespaces.pending_checkpoint_operation_count
|
Count |
PendingCheckpointOperationCount. The average number of pending checkpoint operations on the namespace. The service starts throttling when the pending checkpoint count exceeds a limit of (500,000 + (500,000 * messaging units)) operations. This metric applies only to namespaces using the premium tier. |
azure.servicebus.namespaces.scheduled_messages
|
Count |
ScheduledMessages. The average number of scheduled messages in a queue/topic. |
azure.servicebus.namespaces.server_errors
|
Count |
ServerErrors. The total number of requests not processed because of an error in the Service Bus service over a specified period. |
azure.servicebus.namespaces.server_send_latency
|
milliseconds (ms) |
ServerSendLatency. The average time taken by the Service Bus service to complete the request. |
azure.servicebus.namespaces.size
|
bytes |
Size. The average size of an entity (queue or topic) in bytes. |
azure.servicebus.namespaces.successful_requests
|
Count |
SuccessfulRequests. The total number of successful requests made to the Service Bus service over a specified period. |
azure.servicebus.namespaces.throttled_requests
|
Count |
ThrottledRequests. The total number of requests that were throttled because usage limits were exceeded. |
azure.servicebus.namespaces.user_errors
|
Count |
UserErrors. The total number of requests not processed because of user errors over a specified period. |
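The PendingCheckpointOperationCount description gives the throttling limit as 500,000 + (500,000 * messaging units). The following Python sketch computes that limit and flags when the pending count gets close to it. The function names and the 80% margin are hypothetical choices for illustration; the check that a premium namespace has at least one messaging unit is also an assumption of this sketch.

```python
def checkpoint_throttle_limit(messaging_units: int) -> int:
    """Pending checkpoint operations above which throttling starts (premium tier).

    Implements the limit stated in the table: 500,000 + (500,000 * messaging units).
    """
    if messaging_units < 1:
        # Assumption: premium namespaces always have at least one messaging unit.
        raise ValueError("messaging_units must be at least 1")
    base = 500_000
    return base + base * messaging_units


def is_near_throttling(pending_checkpoints: float, messaging_units: int,
                       margin: float = 0.8) -> bool:
    """True if the pending count has reached `margin` (a fraction) of the limit."""
    return pending_checkpoints >= margin * checkpoint_throttle_limit(messaging_units)


if __name__ == "__main__":
    for mu in (1, 2, 4):
        print(mu, checkpoint_throttle_limit(mu))
```

For example, a namespace with 4 messaging units would start throttling above 2,500,000 pending checkpoint operations.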
SQL Database
Metric | Units | Description |
---|---|---|
azure.sql.servers.databases.allocated_data_storage
|
bytes |
The amount of formatted file space allocated for storing database data. This space grows automatically but does not decrease after data deletions, ensuring faster future inserts. |
azure.sql.servers.databases.blocked_by_firewall
|
Count |
The number of connection attempts that were blocked due to firewall rules in Azure SQL Database. This helps monitor access control and security settings. |
azure.sql.servers.databases.connection_failed
|
Count |
Failed Connections. The total number of connections that failed. |
azure.sql.servers.databases.connection_successful
|
Count |
Successful Connections. The total number of successful connections. |
azure.sql.servers.databases.cpu_percent
|
Percent (%) |
CPU Utilization. The average percentage of CPU used. |
azure.sql.servers.databases.deadlock
|
Count |
Deadlocks. The total number of deadlocks. |
azure.sql.servers.databases.dtu_consumption_percent
|
Percent (%) |
The percentage of Database Transaction Units (DTUs) consumed relative to the allocated DTU limit. DTUs represent a blend of CPU, memory, reads, and writes, helping gauge database performance. |
azure.sql.servers.databases.dtu_limit
|
Count |
The maximum number of DTUs allocated to a database. This limit determines the available compute, storage, and I/O resources for the database. |
azure.sql.servers.databases.dtu_used
|
Count |
The actual number of DTUs consumed by the database workload. This metric helps assess resource utilization and performance. |
azure.sql.servers.databases.log_write_percent
|
Percent (%) |
Log Write Percentage. The average log I/O percentage based on the limit of the service tier. |
azure.sql.servers.databases.physical_data_read_percent
|
Percent (%) |
Data IO Percentage. The average data I/O percentage based on the limit of the service tier. |
azure.sql.servers.databases.rateOfConnectionFailure
|
Percent (%) |
The rate of failed connection attempts to an Azure SQL Database. Connection failures can occur due to firewall rules, authentication issues, or transient network errors. |
azure.sql.servers.databases.sessions_percent
|
Percent (%) |
Sessions Percentage. The average percentage of concurrent sessions based on the limit of the service tier. |
azure.sql.servers.databases.storage
|
bytes |
Data Space Used. The total amount of space used to store data. |
azure.sql.servers.databases.storage_percent
|
Percent (%) |
Storage Utilization. The average percentage of space used to store data, based on the limit of the service tier.
azure.sql.servers.databases.workers_percent
|
Percent (%) |
The percentage of available worker threads being utilized in an Azure SQL Database. High worker utilization can indicate performance bottlenecks, excessive concurrent queries, or inefficient query execution. |
azure.sql.servers.databases.xtp_storage_percent
|
Percent (%) |
The percentage of In-Memory OLTP storage used in an Azure SQL Database. This metric is relevant for databases using memory-optimized tables and helps monitor available memory for in-memory processing. |
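Because azure.sql.servers.databases.dtu_consumption_percent is described as DTU consumption relative to the allocated DTU limit, it can be cross-checked from dtu_used and dtu_limit. The following is a minimal sketch, assuming both values come from the same sampling interval. Azure may aggregate these metrics differently (for example, averages versus maxima over an interval), so a recomputed value is not guaranteed to match the reported metric exactly. The function names are hypothetical.

```python
def dtu_consumption_percent(dtu_used: float, dtu_limit: float) -> float:
    """DTUs used as a percentage of the allocated DTU limit."""
    if dtu_limit <= 0:
        raise ValueError("dtu_limit must be positive")
    return 100.0 * dtu_used / dtu_limit


def dtu_headroom(dtu_used: float, dtu_limit: float) -> float:
    """DTUs still available before reaching the limit (never negative)."""
    return max(dtu_limit - dtu_used, 0.0)


if __name__ == "__main__":
    print(dtu_consumption_percent(dtu_used=50, dtu_limit=200))
    print(dtu_headroom(dtu_used=50, dtu_limit=200))
```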
Traffic Manager
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Traffic Manager entities in the Metrics Explorer, filter the |
azure.trafficmanager.ProbeAgentCurrentEndpointStateByProfileResourceId
|
Count | Status of the endpoint’s probe. 1 indicates the probe is enabled, 0 indicates the probe is disabled. |
azure.trafficmanager.QpsByEndpoint
|
Count | Number of times a Traffic Manager endpoint was returned. |
Virtual Machines
Metric | Units | Description |
---|---|---|
azure.vm.cpu.credits_consumed
|
credits |
Total number of credits consumed by the Virtual Machine.
azure.vm.cpu.credits_remaining
|
credits |
Total number of credits available to burst.
azure.vm.cpu.percentage
|
Percent (%) |
The percentage of allocated compute units that are currently in use by the Virtual Machine(s).
azure.vm.disk.cache.data.read_hit
|
Percent (%) |
The percentage of read operations served from the data disk cache. A higher value indicates efficient caching performance.
azure.vm.disk.cache.data.read_miss
|
Percent (%) |
The percentage of read operations that were not found in the data disk cache, requiring retrieval from the underlying storage.
azure.vm.disk.cache.os.read_hit
|
Percent (%) |
The percentage of read operations served from the OS disk cache, which improves performance by reducing direct disk access.
azure.vm.disk.cache.os.read_miss
|
Percent (%) |
The percentage of read operations that were not found in the OS disk cache, leading to additional disk access.
azure.vm.disk.data.bandwidth.consumed.percentage
|
Percent (%) |
The percentage of allocated bandwidth consumed by data disk operations. A high percentage may suggest bandwidth saturation. |
azure.vm.disk.data.iops.consumed.percentage
|
Percent (%) |
The percentage of allocated IOPS (Input/Output Operations Per Second) consumed by data disk operations. A high percentage may indicate performance bottlenecks. |
azure.vm.disk.data.max.burst.bandwidth
|
Count |
The maximum bandwidth that a data disk can achieve when bursting is enabled. This allows temporary performance boosts beyond the provisioned limits. |
azure.vm.disk.data.max.burst.iops
|
Count |
The maximum IOPS (Input/Output Operations Per Second) a data disk can reach during burst periods. This helps handle short-term spikes in workload demand. |
azure.vm.disk.data.queue_depth
|
Count |
The number of outstanding I/O requests waiting to be processed by the data disk. A higher queue depth may indicate disk contention or performance bottlenecks. |
azure.vm.disk.data.read_bytes
|
bps |
The number of bytes per second read from the data disk during the monitoring period. This metric helps monitor disk read performance and workload patterns. |
azure.vm.disk.data.read_ops
|
Count per second |
The number of read operations per second performed on the data disk. This metric is useful for analyzing disk activity and optimizing performance. |
azure.vm.disk.data.target.bandwidth
|
Count |
The expected bandwidth allocation for a data disk based on its provisioned performance tier. This helps ensure consistent throughput for workloads. |
azure.vm.disk.data.target.iops
|
Count |
The expected IOPS (Input/Output Operations Per Second) allocation for a data disk based on its provisioned performance tier. This helps ensure consistent disk performance. |
azure.vm.disk.data.used.burst.bps.credits.percentage
|
Percent (%) | The percentage of burst bandwidth credits used by a data disk. Azure premium disks allow temporary performance bursts beyond provisioned limits, and this metric helps track credit consumption. |
azure.vm.disk.data.used.burst.io.credits.percentage
|
Percent (%) | The percentage of burst IOPS credits used by a data disk. This metric helps monitor how much of the available burst capacity has been consumed. |
azure.vm.disk.data.write_bytes
|
bps |
The number of bytes per second written to the data disk during the monitoring period. This metric helps monitor disk write performance and workload patterns. |
azure.vm.disk.data.write_ops
|
Count per second |
The number of write operations per second performed on the data disk. This metric is useful for analyzing disk activity and optimizing performance. |
azure.vm.disk.os.bandwidth.consumed.percentage
|
Percent (%) |
The percentage of allocated bandwidth consumed by the OS disk. A high percentage may indicate bandwidth saturation. |
azure.vm.disk.os.iops.consumed.percentage
|
Percent (%) |
The percentage of allocated IOPS (Input/Output Operations Per Second) consumed by the OS disk. A high percentage may suggest performance bottlenecks. |
azure.vm.disk.os.max.burst.bandwidth
|
Count |
The maximum bandwidth that the OS disk can achieve when bursting is enabled. This allows temporary performance boosts beyond the provisioned limits. |
azure.vm.disk.os.max.burst.iops
|
Count |
The maximum IOPS the OS disk can reach during burst periods. This helps handle short-term spikes in workload demand. |
azure.vm.disk.os.queue_depth
|
Count |
The number of outstanding I/O requests waiting to be processed by the OS disk. A higher queue depth may indicate disk contention or performance bottlenecks. |
azure.vm.disk.os.read_bytes
|
bps |
The number of bytes per second read from the OS disk during the monitoring period. This metric helps monitor disk read performance and workload patterns. |
azure.vm.disk.os.read_ops
|
Count per second |
The number of read operations per second performed on the OS disk. This metric helps monitor disk activity and performance. |
azure.vm.disk.os.target.bandwidth
|
Count |
The expected bandwidth allocation for the OS disk based on its provisioned performance tier. This helps ensure consistent throughput for workloads. |
azure.vm.disk.os.target.iops
|
Count |
The expected IOPS (Input/Output Operations Per Second) allocation for the OS disk based on its provisioned performance tier. This helps maintain stable disk performance. |
azure.vm.disk.os.used.burst.bps.credits.percentage
|
Percent (%) | The percentage of burst bandwidth credits used by the OS disk. Azure premium disks allow temporary performance bursts beyond provisioned limits, and this metric helps track credit consumption. |
azure.vm.disk.os.used.burst.io.credits.percentage
|
Percent (%) | The percentage of burst IOPS credits used by the OS disk. This metric helps monitor how much of the available burst capacity has been consumed. |
azure.vm.disk.os.write_bytes
|
bps |
The number of bytes per second written to the OS disk during the monitoring period. This metric helps monitor disk write performance and workload patterns. |
azure.vm.disk.os.write_ops
|
Count per second |
The number of write operations per second performed on the OS disk. This metric helps monitor disk activity and performance. |
azure.vm.disk.read_bytes
|
bytes |
Bytes read from disk during monitoring period |
azure.vm.disk.read_ops
|
Count per second |
Disk Read IOPS. |
azure.vm.disk.write_bytes
|
bytes |
Bytes written to disk during monitoring period. |
azure.vm.disk.write_ops
|
Count per second |
Disk Write IOPS. |
azure.vm.memory.available_bytes
|
bytes |
The amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the virtual machine. |
azure.vm.network.in
|
bytes |
The number of billable bytes received on all network interfaces by the Virtual Machine(s) (Incoming Traffic). |
azure.vm.network.inbound_flows
|
Count |
The number of inbound network flows to the virtual machine. This metric helps monitor network traffic and connection patterns. |
azure.vm.network.inbound_flows_maximum_creation_rate
|
Count per second |
The maximum rate at which inbound network flows are created for the virtual machine. This metric helps assess network performance and connection handling. |
azure.vm.network.out
|
bytes |
The total amount of outbound network traffic from the virtual machine. This metric helps monitor bandwidth usage and network performance. |
azure.vm.network.outbound_flows
|
Count |
The number of outbound network flows from the virtual machine. This metric helps monitor outgoing connections and network activity. |
azure.vm.network.outbound_flows_maximum_creation_rate
|
Count per second |
The maximum rate at which outbound network flows are created for the virtual machine. This metric helps assess network performance and connection handling. |
azure.vm.network.total_in
|
bytes |
The total amount of inbound network traffic received by the virtual machine. This metric helps monitor bandwidth usage and network performance. |
azure.vm.network.total_out
|
bytes |
The total amount of outbound network traffic sent from the virtual machine. This metric helps monitor outgoing bandwidth usage and network efficiency. |
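Several byte metrics in the table above, such as azure.vm.network.total_in and azure.vm.disk.read_bytes, report a total for each monitoring period rather than a rate. The following Python sketch converts such a total into an average throughput. It assumes the metric value is the sum of bytes transferred during one interval and that you know the interval length (60 seconds in the example); the helper name and sample value are hypothetical and are not part of SolarWinds Observability or Azure.

```python
def average_throughput(total_bytes: float, interval_seconds: float) -> dict:
    """Convert a byte total collected over one interval into average rates.

    Hypothetical helper: assumes the metric value is the sum of bytes
    transferred during the interval, not an already-computed rate.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    bytes_per_second = total_bytes / interval_seconds
    return {
        "bytes_per_second": bytes_per_second,
        # 8 bits per byte; 1 Mbit = 1,000,000 bits
        "megabits_per_second": bytes_per_second * 8 / 1_000_000,
    }


# Hypothetical sample: 450,000,000 bytes of inbound traffic in a 60-second interval.
print(average_throughput(450_000_000, 60))
# {'bytes_per_second': 7500000.0, 'megabits_per_second': 60.0}
```

Metrics already reported in bps, such as azure.vm.disk.data.read_bytes, need no conversion.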
Virtual Machine Scale Sets
Metric | Units | Description |
---|---|---|
azure.vmss.cpu.credits_consumed
|
Count | The total number of CPU credits consumed by a Virtual Machine Scale Set (VMSS) instance. This metric is relevant for B-series burstable VMs, which use a credit-based system to manage CPU performance. |
azure.vmss.cpu.percentage
|
Percent (%) |
Percentage CPU. The percentage of allocated compute units that are currently in use by the VM(s). |
azure.vmss.cpu.credits_remaining
|
Count | The total number of CPU credits available for a VMSS instance to use for bursting. When credits run out, the VM operates at its baseline performance level. |
azure.vmss.disk.cache.data.read_hit
|
Percent (%) |
The percentage of data disk read operations served from the cache. A higher value indicates more effective caching. |
azure.vmss.disk.cache.data.read_miss
|
Percent (%) |
The percentage of data disk read operations that were not found in the cache and had to be retrieved from the underlying storage. |
azure.vmss.disk.cache.os.read_hit
|
Percent (%) |
The percentage of OS disk read operations served from the cache, which improves performance by reducing direct disk access. |
azure.vmss.disk.cache.os.read_miss
|
Percent (%) |
The percentage of OS disk read operations that were not found in the cache, leading to additional disk access. |
azure.vmss.disk.data.queue_depth
|
Count |
The number of outstanding I/O requests waiting to be read from or written to the data disk in a Virtual Machine Scale Set (VMSS). A higher queue depth may indicate disk contention or performance bottlenecks. |
azure.vmss.disk.data.read_bytes
|
bps |
Data Disk Read. The average number of bytes per second read from a single disk during the monitoring period. |
azure.vmss.disk.data.read_ops
|
Count per second |
The number of read operations per second performed on the data disk in a VMSS. This metric helps monitor disk activity and performance. |
azure.vmss.disk.data.write_bytes
|
bps |
Data Disk Write. The average number of bytes per second written to a single disk during the monitoring period. |
azure.vmss.disk.data.write_ops
|
Count per second |
The number of write operations per second performed on the data disk in a VMSS. This metric is useful for analyzing disk activity and optimizing performance. |
azure.vmss.disk.os.queue_depth
|
Count |
The number of outstanding I/O requests waiting to be read from or written to the OS disk in a VMSS. A higher queue depth may indicate disk contention or performance bottlenecks. |
azure.vmss.disk.os.read_bytes
|
bps |
The average number of bytes per second read from the OS disk in a VMSS during the monitoring period. This metric helps monitor disk read performance and workload patterns. |
azure.vmss.disk.os.read_ops
|
Count per second |
The number of read operations per second performed on the OS disk in a VMSS. This metric provides insight into overall disk activity and utilization. |
azure.vmss.disk.os.write_bytes
|
bps |
The average number of bytes per second written to the OS disk in a VMSS during the monitoring period. This metric helps monitor disk write performance and workload patterns. |
azure.vmss.disk.os.write_ops
|
Count per second |
The number of write operations per second performed on the OS disk in a VMSS. This metric provides insight into overall disk activity and utilization. |
azure.vmss.disk.read_bytes
|
bytes |
Disk Read. The total number of bytes read from disk during the monitoring period. |
azure.vmss.disk.read_ops
|
Count per second |
Disk Read Operations. The average number of input operations read in a second from all disks attached to the VM(s). |
azure.vmss.disk.write_bytes
|
bytes |
Disk Write. The total number of bytes written to disk during the monitoring period. |
azure.vmss.disk.write_ops
|
Count per second |
Disk Write Operations. The average number of output operations written in a second to all disks attached to the VM(s). |
azure.vmss.memory.available_bytes
|
bytes |
Available Memory Bytes. The amount of physical memory, in bytes, immediately available for allocation to a process or for system use in the VM(s). |
azure.vmss.network.inbound_flows
|
Count |
The number of inbound network flows to a VMSS instance. This metric helps monitor network traffic and connection patterns. |
azure.vmss.network.inbound_flows_maximum_creation_rate
|
Count per second |
The maximum rate at which inbound network flows are created for a VMSS instance. This metric helps assess network performance and connection handling. |
azure.vmss.network.outbound_flows
|
Count |
The number of outbound network flows from a VMSS instance. This metric helps monitor outgoing connections and network activity. |
azure.vmss.network.outbound_flows_maximum_creation_rate
|
Count per second |
The maximum rate at which outbound network flows are created for a VMSS instance. This metric helps assess network performance and connection handling. |
azure.vmss.network.total_in
|
bytes |
Network In Total. The number of bytes received on all network interfaces by the VM(s) (incoming traffic). |
azure.vmss.network.total_out
|
bytes |
Network Out Total. The number of bytes out on all network interfaces by the VM(s) (outgoing traffic). |
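azure.vmss.cpu.credits_remaining and azure.vmss.cpu.credits_consumed describe the credit balance of B-series burstable instances. One rough way to anticipate when an instance will fall back to its baseline performance is to extrapolate the recent change in credits_remaining. The Python sketch below assumes a constant net burn rate between two samples, which real workloads rarely sustain, so treat the result as an estimate; the helper name and sample values are hypothetical.

```python
from typing import Optional


def minutes_until_credits_exhausted(
    earlier_remaining: float, later_remaining: float, minutes_between: float
) -> Optional[float]:
    """Linearly extrapolate when the CPU credit balance reaches zero.

    Hypothetical helper: uses two samples of credits_remaining taken
    `minutes_between` minutes apart and assumes the net burn rate stays
    constant. Returns None when the balance is steady or growing.
    """
    if minutes_between <= 0:
        raise ValueError("minutes_between must be positive")
    net_burn_per_minute = (earlier_remaining - later_remaining) / minutes_between
    if net_burn_per_minute <= 0:
        return None  # credits are accruing at least as fast as they are spent
    return later_remaining / net_burn_per_minute


# Hypothetical samples: 120 credits, then 90 credits 30 minutes later.
print(minutes_until_credits_exhausted(120, 90, 30))  # 90.0
```

Using the net change in credits_remaining, rather than credits_consumed alone, keeps credit accrual in the estimate.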
vNet
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Virtual Network entities in the Metrics Explorer, filter the |
azure.virtualnetwork.BytesDroppedDDoS
|
bps | The maximum inbound bytes dropped per second, DDoS. |
azure.virtualnetwork.BytesForwardedDDoS
|
bps | The maximum inbound bytes forwarded per second, DDoS. |
azure.virtualnetwork.BytesInDDoS
|
bps | The maximum inbound bytes per second, DDoS. |
azure.virtualnetwork.DDoSTriggerSYNPackets
|
Count per second | The maximum inbound SYN packets per second to trigger DDoS mitigation. |
azure.virtualnetwork.DDoSTriggerTCPPackets
|
Count per second | The maximum inbound TCP packets per second to trigger DDoS mitigation. |
azure.virtualnetwork.DDoSTriggerUDPPackets
|
Count per second | The maximum inbound UDP packets per second to trigger DDoS mitigation. |
azure.virtualnetwork.IfUnderDDoSAttack
|
Count | Whether the monitored resource is under DDoS attack (1) or not (0). |
azure.virtualnetwork.PacketsDroppedDDoS
|
Count per second | The maximum inbound packets dropped per second, DDoS. |
azure.virtualnetwork.PacketsForwardedDDoS
|
Count per second | The maximum inbound packets forwarded per second, DDoS. |
azure.virtualnetwork.PacketsInDDoS
|
Count per second | The maximum inbound packets per second, DDoS. |
azure.virtualnetwork.PingMeshAverageRoundtripMs
|
milliseconds (ms) | The average round-trip time for pings sent to a destination VM. |
azure.virtualnetwork.PingMeshProbesFailedPercent
|
Percent (%) | Of the total number of pings sent to a destination VM, the average percentage of pings that failed. |
azure.virtualnetwork.TCPBytesDroppedDDoS
|
bps | The maximum inbound TCP bytes dropped per second, DDoS. |
azure.virtualnetwork.TCPBytesForwardedDDoS
|
bps | The maximum inbound TCP bytes forwarded per second, DDoS. |
azure.virtualnetwork.TCPBytesInDDoS
|
bps | The maximum inbound TCP bytes per second, DDoS. |
azure.virtualnetwork.TCPPacketsDroppedDDoS
|
Count per second | The maximum inbound TCP packets dropped per second, DDoS. |
azure.virtualnetwork.TCPPacketsForwardedDDoS
|
Count per second | The maximum inbound TCP packets forwarded per second, DDoS. |
azure.virtualnetwork.TCPPacketsInDDoS
|
Count per second | The maximum inbound TCP packets per second, DDoS. |
azure.virtualnetwork.UDPBytesDroppedDDoS
|
bps | The maximum inbound UDP bytes dropped per second, DDoS. |
azure.virtualnetwork.UDPBytesForwardedDDoS
|
bps | The maximum inbound UDP bytes forwarded per second, DDoS. |
azure.virtualnetwork.UDPBytesInDDoS
|
bps | The maximum inbound UDP bytes per second, DDoS. |
azure.virtualnetwork.UDPPacketsDroppedDDoS
|
Count per second | The maximum inbound UDP packets dropped per second, DDoS. |
azure.virtualnetwork.UDPPacketsForwardedDDoS
|
Count per second | The maximum inbound UDP packets forwarded per second, DDoS. |
azure.virtualnetwork.UDPPacketsInDDoS
|
Count per second | The maximum inbound UDP packets per second, DDoS. |
VPN Gateway
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health score. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of VPN Gateway entities in the Metrics Explorer, filter the |
azure.vpngateway.AverageBandwidth
|
bps | Site-to-site bandwidth of a gateway in bytes per second. |
azure.vpngateway.BgpPeerStatus
|
Count | Status of BGP peer. |
azure.vpngateway.BgpRoutesAdvertised
|
Count | Number of BGP routes advertised through the tunnel. |
azure.vpngateway.BgpRoutesLearned
|
Count | Number of BGP routes learned through the tunnel. |
azure.vpngateway.MmsaCount
|
Count | Number of IKE main mode security associations (MMSA). |
azure.vpngateway.QmsaCount
|
Count | Number of IKE quick mode security associations (QMSA). |
azure.vpngateway.TunnelAverageBandwidth
|
bps | Average bandwidth of a tunnel in bytes per second. |
azure.vpngateway.TunnelEgressBytes
|
bytes | Outgoing bytes of a tunnel. |
azure.vpngateway.TunnelEgressPacketDropCount
|
Count | Number of outgoing packets dropped by the tunnel. |
azure.vpngateway.TunnelEgressPacketDropTSMismatch
|
Count | Outgoing packet drop count from traffic selector mismatch of a tunnel. |
azure.vpngateway.TunnelEgressPackets
|
Count | Outgoing packet count of a tunnel. |
azure.vpngateway.TunnelIngressBytes
|
bytes | Incoming bytes of a tunnel. |
azure.vpngateway.TunnelIngressPacketDropCount
|
Count | Number of incoming packets dropped by the tunnel. |
azure.vpngateway.TunnelIngressPacketDropTSMismatch
|
Count | Incoming packet drop count from traffic selector mismatch of a tunnel. |
azure.vpngateway.TunnelIngressPackets
|
Count | Incoming packet count of a tunnel. |
azure.vpngateway.TunnelNatAllocations
|
Count | Count of allocations for a NAT rule on a tunnel. |
azure.vpngateway.TunnelNatedBytes
|
bytes | Number of bytes that were NATed on a tunnel by a NAT rule. |
azure.vpngateway.TunnelNatedPackets
|
Count | Number of packets that were NATed on a tunnel by a NAT rule. |
azure.vpngateway.TunnelNatFlowCount
|
Count | Number of NAT flows on a tunnel by flow type and NAT rule. |
azure.vpngateway.TunnelNatPacketDrop
|
Count | Number of NATed packets on a tunnel that were dropped, by drop type and NAT rule. |
azure.vpngateway.TunnelPeakPackets
|
Count | Peak number of packets per second on a tunnel. |
azure.vpngateway.TunnelReverseNatedBytes
|
bytes | Number of bytes that were reverse NATed on a tunnel by a NAT rule. |
azure.vpngateway.TunnelReverseNatedPackets
|
Count | Number of packets on a tunnel that were reverse NATed by a NAT rule. |
azure.vpngateway.TunnelTotalFlowCount
|
Count | Total flow count on a tunnel. |
azure.vpngateway.VnetAddressPrefixCount
|
Count | Number of VNet address prefixes behind the gateway. |
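When alerting on VPN tunnel health, a drop ratio can be easier to threshold than a raw count. This Python sketch expresses azure.vpngateway.TunnelIngressPacketDropCount as a percentage of azure.vpngateway.TunnelIngressPackets. It assumes both values cover the same time window and that the packet count is an appropriate denominator; the helper name and sample values are hypothetical.

```python
def drop_percentage(dropped_packets: float, total_packets: float) -> float:
    """Return dropped packets as a percentage of packets for the same interval.

    Hypothetical helper: assumes both counts cover the same time window.
    Returns 0.0 when no packets were seen, to avoid dividing by zero.
    """
    if total_packets <= 0:
        return 0.0
    return dropped_packets * 100 / total_packets


# Hypothetical interval: 25 incoming packets dropped out of 10,000 incoming packets.
print(drop_percentage(25, 10_000))  # 0.25
```

The same calculation applies to the egress pair, TunnelEgressPacketDropCount and TunnelEgressPackets.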
Infrastructure/Google Cloud Platform metrics
Metrics for GCP entities are collected by integrating SolarWinds Observability SaaS with your Google Cloud account. See Google Cloud Platform monitoring.
Google Compute Engine (GCE)
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health state. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of GCE entities in the Metrics Explorer, filter the |
gcp.compute.googleapis.com.instance.cpu.guestVisibleVcpus
|
Count | Number of vCPUs visible and available inside the guest VM instance. |
gcp.compute.googleapis.com.instance.cpu.reservedCores
|
Count | Number of reserved vCPUs on a host. |
gcp.compute.googleapis.com.instance.cpu.schedulerWaitTime
|
seconds (s) | Time a vCPU spends in the ready state but not scheduled to run. |
gcp.compute.googleapis.com.instance.cpu.usageTime
|
seconds (s) | vCPU usage per instance for a time interval in vCPU-seconds. |
gcp.compute.googleapis.com.instance.cpu.utilization
|
Percent (%) | CPU utilization as a fraction of the total allocated capacity, reported as a value from 0.0 to 1.0 (multiply by 100 for a percentage). |
gcp.compute.googleapis.com.instance.disk.averageIOLatency
|
Microseconds | Average latency of disk I/O operations over the past minute. |
gcp.compute.googleapis.com.instance.disk.averageIOQueueDepth
|
Count | Average number of I/O requests in the disk's queue over the past minute. |
gcp.compute.googleapis.com.instance.disk.maxReadBytesCount
|
bytes | Maximum number of bytes read per second from the instance's disk, measured over a short sampling period. |
gcp.compute.googleapis.com.instance.disk.maxReadOpsCount
|
Count | Maximum number of read requests per second on the instance's disk, measured over a short sampling period. |
gcp.compute.googleapis.com.instance.disk.maxWriteBytesCount
|
bytes | Maximum number of bytes written per second to the instance's disk, measured over a short sampling period. |
gcp.compute.googleapis.com.instance.disk.maxWriteOpsCount
|
Count | Maximum number of write requests per second on the instance's disk, measured over a short sampling period. |
gcp.compute.googleapis.com.instance.disk.performanceStatus
|
Count | Whether the disk's performance is typical or is being degraded by an issue. |
gcp.compute.googleapis.com.instance.disk.provisioningIOPS
|
Count | User-specified provisioned IOPS for a Google Cloud Platform Persistent Disk. |
gcp.compute.googleapis.com.instance.disk.provisioningSize
|
bytes | Size, in bytes, of the disk as provisioned by the user. |
gcp.compute.googleapis.com.instance.disk.provisioningThroughput
|
Count | User-specified provisioned throughput for a Google Cloud Platform Persistent Disk. |
gcp.compute.googleapis.com.instance.disk.readBytesCount
|
bytes | Count of bytes read from disk per sample period. |
gcp.compute.googleapis.com.instance.disk.readOpsCount
|
Count | Disk read I/O operation count. |
gcp.compute.googleapis.com.instance.disk.writeBytesCount
|
bytes | Count of bytes written to disk per sample period. |
gcp.compute.googleapis.com.instance.disk.writeOpsCount
|
Count | Disk write I/O operation count. |
gcp.compute.googleapis.com.instance.globalDNS.requestCount
|
Count | Number of global internal DNS queries from a Google Compute Engine VM. |
gcp.compute.googleapis.com.instance.gpu.infraHealth
|
Count | VM instance GPU infrastructure health status. |
gcp.compute.googleapis.com.instance.gpu.packetRetransmissionCount
|
Count | Packet retransmission count observed by GPU NICs per timestamp. |
gcp.compute.googleapis.com.instance.gpu.throughputRxBytes
|
bytes | Network bytes received (RX) per minute for GPU VM machine types. |
gcp.compute.googleapis.com.instance.gpu.throughputTxBytes
|
bytes | Network bytes transmitted (TX) per minute for GPU VM machine types. |
gcp.compute.googleapis.com.instance.integrity.earlyBootValidationStatus
|
Count | Early boot integrity policy validation status. |
gcp.compute.googleapis.com.instance.integrity.lateBootValidationStatus
|
Count | Validation status of late boot integrity policy. |
gcp.compute.googleapis.com.instance.interruptionCount
|
Count | Current count of interruptions by type and reason. |
gcp.compute.googleapis.com.instance.memory.balloonRamSize
|
bytes | Total memory allocated to a VM instance. |
gcp.compute.googleapis.com.instance.memory.balloonRamUsed
|
bytes | Current memory usage on a VM instance. |
gcp.compute.googleapis.com.instance.memory.balloonSwapInBytesCount
|
bytes | Amount of memory read into the guest from its own swap space. |
gcp.compute.googleapis.com.instance.memory.balloonSwapOutBytesCount
|
bytes | Amount of memory written from the guest to its own swap space. |
gcp.compute.googleapis.com.instance.network.receivedBytesCount
|
bytes | Bytes received over the network. |
gcp.compute.googleapis.com.instance.network.receivedPacketsCount
|
Count | Packets received count. |
gcp.compute.googleapis.com.instance.network.sentBytesCount
|
bytes | Bytes transmitted over a network interface. |
gcp.compute.googleapis.com.instance.network.sentPacketsCount
|
Count | Number of packets transmitted. |
gcp.compute.googleapis.com.instance.uptime
|
seconds (s) | Elapsed time since the VM instance was started, reported as a delta for each sample period. |
gcp.compute.googleapis.com.instance.uptimeTotal
|
seconds (s) | Time elapsed since a VM instance was started. |
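Two GCE relationships are worth spelling out. First, gcp.compute.googleapis.com.instance.cpu.usageTime is reported in vCPU-seconds, so dividing it by the interval length times the vCPU count (for example, from instance.cpu.guestVisibleVcpus) gives a fraction roughly comparable to instance.cpu.utilization. Second, Little's Law (average queue length = throughput × average wait time) links instance.disk.averageIOQueueDepth and instance.disk.averageIOLatency to an approximate I/O operation rate. The Python sketch below shows both under these assumptions; the helper names and sample values are hypothetical, and the results are estimates rather than values reported by Google Cloud.

```python
def cpu_fraction_from_usage(
    usage_vcpu_seconds: float, interval_seconds: float, vcpus: int
) -> float:
    """Approximate CPU utilization (0.0-1.0) from vCPU-seconds used in an interval.

    Hypothetical helper: assumes every vCPU could have run for the whole interval.
    """
    if interval_seconds <= 0 or vcpus <= 0:
        raise ValueError("interval_seconds and vcpus must be positive")
    return usage_vcpu_seconds / (interval_seconds * vcpus)


def estimated_iops(avg_queue_depth: float, avg_latency_microseconds: float) -> float:
    """Estimate I/O operations per second with Little's Law (rate = queue / latency).

    Latency is given in microseconds, matching instance.disk.averageIOLatency.
    """
    if avg_latency_microseconds <= 0:
        raise ValueError("avg_latency_microseconds must be positive")
    # 1 s = 1,000,000 us; multiplying before dividing avoids a tiny fractional latency.
    return avg_queue_depth * 1_000_000 / avg_latency_microseconds


# Hypothetical samples: 96 vCPU-seconds used by a 4-vCPU VM in 60 seconds,
# and an average queue depth of 2 with an average latency of 500 microseconds.
print(cpu_fraction_from_usage(96, 60, 4))  # 0.4
print(estimated_iops(2, 500))  # 4000.0
```

Comparing the IOPS estimate with instance.disk.provisioningIOPS can hint at whether a disk is approaching its provisioned limit.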
Google Cloud Storage (GCS)
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health state. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of GCS entities in the Metrics Explorer, filter the |
gcp.storage.googleapis.com.anywhereCache.ingestedBytesCount
|
bytes | Delta count of raw bytes successfully ingested into Anywhere Cache, by zone. |
gcp.storage.googleapis.com.anywhereCache.requestCount
|
Count | The number of API call events per minute, categorized by API method, response code, cache zone, and cache hit status. |
gcp.storage.googleapis.com.anywhereCache.sentBytesCount
|
bytes | The number of bytes transmitted per API method, response code, cache zone, and cache hit status in a 60-second period. |
gcp.storage.googleapis.com.anywhereCacheMetering.cacheStorageBytesCount
|
bytes | Total count of bytes stored in caches across all zones. |
gcp.storage.googleapis.com.anywhereCacheMetering.cacheStorageKbsecCount
|
Kibibyte-seconds | Delta count of Anywhere Cache storage in kibibyte-seconds (KiB × seconds, where 1 KiB = 1024 bytes), by zone. |
gcp.storage.googleapis.com.anywhereCacheMetering.evictionByteCount
|
bytes | Change in bytes evicted from cache per zone. |
gcp.storage.googleapis.com.anywhereCacheMetering.ingestedBillableBytesCount
|
bytes | The number of new billable bytes successfully ingested into Anywhere Cache, by zone. |
gcp.storage.googleapis.com.api.lroCount
|
Count | Number of long-running operation completions. |
gcp.storage.googleapis.com.api.requestCount
|
Count | Delta count of API calls, grouped by API method and response code, within a 60-second window. |
gcp.storage.googleapis.com.authn.authentication
|
Count | Number of authenticated requests per 60-second interval, grouped by authentication method and access ID. |
gcp.storage.googleapis.com.authz.ACLBasedObjectAccessCount
|
Count | Number of access grants issued based on an object's Access Control List within a sampling window. |
gcp.storage.googleapis.com.authz.ACLOperationsCount
|
Count | Aggregated number of Create, Get, Set, and Delete ACL operations per minute. |
gcp.storage.googleapis.com.authz.objectSpecificACLMutationCount
|
Count | Delta count of changes to object-specific Access Control Lists (ACLs) within a 60-second sampling window. |
gcp.storage.googleapis.com.autoclass.transitionOperationCount
|
Count | The total number of Autoclass-initiated storage class transition operations within a sampling period. |
gcp.storage.googleapis.com.autoclass.transitionedBytesCount
|
bytes | The total number of bytes transitioned between storage classes by Autoclass within a 300-second interval. |
gcp.storage.googleapis.com.client.grpc.client.attempt.duration
|
Seconds (s) | Time taken for an RPC attempt, including channel selection. |
gcp.storage.googleapis.com.client.grpc.client.attempt.RCVDTotalCompressedMessageSize
|
bytes | Total compressed bytes received per RPC attempt. |
gcp.storage.googleapis.com.client.grpc.client.attempt.sentTotalCompressedMessageSize
|
bytes | The total number of compressed bytes transmitted for each RPC attempt, excluding metadata and framing. |
gcp.storage.googleapis.com.client.grpc.client.attempt.started
|
Count | Number of RPC call initiations. |
gcp.storage.googleapis.com.client.grpc.client.call.duration
|
Seconds (s) | The elapsed time between a client sending a request and receiving the corresponding response in a gRPC call. |
gcp.storage.googleapis.com.client.grpc.lb.rls.cacheEntries
|
Count | Number of entries in the gRPC Route Lookup Service (RLS) cache. |
gcp.storage.googleapis.com.client.grpc.lb.rls.cacheSize
|
bytes | Current size, in bytes, of the RLS cache. |
gcp.storage.googleapis.com.client.grpc.lb.rls.defaultTargetPicks
|
Count | The number of load balancer picks directed towards the default target. |
gcp.storage.googleapis.com.client.grpc.lb.rls.failedPicks
|
Count | Number of failed load balancer (LB) picks due to RLS request failures or RLS channel throttling. |
gcp.storage.googleapis.com.client.grpc.lb.rls.targetPicks
|
Count | Number of load balancer picks sent to each RLS target. |
gcp.storage.googleapis.com.client.grpc.lb.wrr.endpointWeightNotYetUsable
|
Count | Number of endpoints in each scheduler update that do not yet have usable weight information. |
gcp.storage.googleapis.com.client.grpc.lb.wrr.endpointWeightStale
|
Count | Number of endpoints in each scheduler update whose latest weight is older than the expiration period. |
gcp.storage.googleapis.com.client.grpc.lb.wrr.endpointWeights
|
Weight | Histogram of the endpoint weights computed in each scheduler update. |
gcp.storage.googleapis.com.client.grpc.lb.wrr.rrFallback
|
Count | Number of occurrences where WRR policy fell back to RR due to insufficient healthy endpoints with valid weights. |
gcp.storage.googleapis.com.client.grpc.xdsClient.connected
|
Boolean | The current state of the ADS connection between xDS client and server (1 for active, 0 for inactive). |
gcp.storage.googleapis.com.client.grpc.xdsClient.resourceUpdatesInvalid
|
Count | Number of received xDS resources that failed validation. |
gcp.storage.googleapis.com.client.grpc.xdsClient.resourceUpdatesValid
|
Count | Number of received xDS resources that passed validation. |
gcp.storage.googleapis.com.client.grpc.xdsClient.resources
|
Count | The number of gRPC xDS resources in use. |
gcp.storage.googleapis.com.client.grpc.xdsClient.serverFailure
|
Count | Number of xDS servers transitioning from healthy to unhealthy state. |
gcp.storage.googleapis.com.network.receivedBytesCount
|
bytes | Delta count of bytes received over the network, grouped by API method and response code, over a 60-second period. |
gcp.storage.googleapis.com.network.sentBytesCount
|
bytes | The number of bytes sent over the network per API method and response code, measured every minute. |
gcp.storage.googleapis.com.quota.anywhereCacheStorageSize.exceeded
|
Count | The number of attempts to exceed the quota limit for Cloud Storage anywhere cache size. |
gcp.storage.googleapis.com.quota.anywhereCacheStorageSize.limit
|
Kibibytes | Current limit of the Anywhere Cache storage size quota. |
gcp.storage.googleapis.com.quota.anywhereCacheStorageSize.usage
|
Kibibytes | Current usage of the Anywhere Cache storage size quota. |
gcp.storage.googleapis.com.quota.dualregionAnywhereCacheEgressBandwidth.limit
|
bits | Current limit of the egress bandwidth quota for Anywhere Cache in dual-region buckets. |
gcp.storage.googleapis.com.quota.dualregionAnywhereCacheEgressBandwidth.usage
|
bits | Measures the current usage of egress bandwidth for anywhere cache in dual-region setup. |
gcp.storage.googleapis.com.quota.dualregionGoogleEgressBandwidth.limit
|
bits | Current limit of the dual-region Google egress bandwidth quota. |
gcp.storage.googleapis.com.quota.dualregionGoogleEgressBandwidth.usage
|
bits | Current usage of the dual-region Google egress bandwidth quota. |
gcp.storage.googleapis.com.quota.dualregionInternetEgressBandwidth.limit
|
bits | The current limit (in bytes per second) of internet egress bandwidth for a dual-region configuration in Google Cloud Platform. |
gcp.storage.googleapis.com.quota.dualregionInternetEgressBandwidth.usage
|
bits | Current dual-region internet egress bandwidth usage. |
gcp.storage.googleapis.com.replication.meetingRPO
|
Count | Whether objects are meeting their recovery point objective (RPO). |
gcp.storage.googleapis.com.replication.missingRPOMinutesLast30d
|
Count | Number of minutes in the last 30 days during which the RPO was missed. |
gcp.storage.googleapis.com.replication.objectReplicationsLast30d
|
Count | Total number of object replications over the last 30 days, broken down by whether the RPO was met or missed. |
gcp.storage.googleapis.com.replication.timeSinceMetricsUpdated
|
Seconds (s) | Time elapsed since the 'storage.googleapis.com/replication' metric values were last calculated. |
gcp.storage.googleapis.com.replication.turboMaxDelay
|
Seconds (s) | Age of the oldest data that has not yet been replicated to the secondary region. Applies to buckets with turbo replication enabled. |
gcp.storage.googleapis.com.replication.v2.objectReplicationsLast30d
|
Count | Total count of object replications over the last 30 days. |
gcp.storage.googleapis.com.replication.v2.timeSinceMetricsUpdated
|
Seconds (s) | The elapsed time since the last calculation of missing RPO minutes for Cloud Storage object replications over the past 30 days. |
gcp.storage.googleapis.com.storage.objectCount
|
Count | Total number of objects per bucket, grouped by storage class. |
gcp.storage.googleapis.com.storage.totalByteSeconds
|
Byte-seconds | Total daily storage, in byte-seconds, used by each storage class in the bucket, excluding soft-deleted objects. |
gcp.storage.googleapis.com.storage.totalBytes
|
bytes | The total size of all non-soft-deleted objects in the bucket, grouped by storage class. |
gcp.storage.googleapis.com.storage.v2.deletedBytes
|
bytes | Daily delta of deleted bytes per bucket, grouped by storage class. |
gcp.storage.googleapis.com.storage.v2.totalByteSeconds
|
Byte-seconds | Total daily storage, in byte-seconds, used by the bucket, grouped by storage class and type. |
gcp.storage.googleapis.com.storage.v2.totalBytes
|
bytes | The total size of all objects and multipart uploads in a Google Cloud Storage bucket. |
gcp.storage.googleapis.com.storage.v2.totalCount
|
Count | Number of objects and multipart uploads per bucket, grouped by storage class and type. |
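The `totalByteSeconds` metrics report storage as byte-seconds accumulated over a day rather than as a size. The following minimal sketch, which assumes each daily sample covers a full 24-hour day, converts a byte-seconds value back into the average number of bytes stored. The function name is hypothetical and not part of any SolarWinds or Google Cloud API.

```python
# Hypothetical helper: converts a byte-seconds total into an average size.
# Assumes the sample covers a full day unless a different window is passed in.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_bytes_stored(total_byte_seconds: float,
                         window_seconds: int = SECONDS_PER_DAY) -> float:
    """Return the average number of bytes held during the window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return total_byte_seconds / window_seconds


if __name__ == "__main__":
    # 1 GiB stored for an entire day averages out to 1 GiB.
    one_gib = 1024 ** 3
    print(average_bytes_stored(one_gib * SECONDS_PER_DAY))  # 1073741824.0
```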
Infrastructure/Kubernetes metrics
Metrics for Kubernetes entities are collected by installing the SWO K8s Collector on a Kubernetes cluster that has Prometheus installed. See Kubernetes monitoring.
Cluster metrics
Metric | Unit | Description |
---|---|---|
k8s.cluster.cpu.allocatable | core | The allocatable CPU on the cluster, that is, the CPU available for scheduling. Metric type: Gauge. |
k8s.cluster.cpu.capacity | core | The cluster CPU capacity. Metric type: Gauge. |
k8s.cluster.cpu.utilization | Percent (%) | The cluster CPU usage. Metric type: Gauge. |
k8s.cluster.memory.allocatable | Binary Bytes | The allocatable memory on the cluster, that is, the memory available for scheduling. Metric type: Gauge. |
k8s.cluster.memory.capacity | Binary Bytes | The cluster memory capacity. Metric type: Gauge. |
k8s.cluster.memory.utilization | Percent (%) | The cluster memory usage. Metric type: Gauge. |
k8s.cluster.nodes | Count | The number of nodes in the cluster. Metric type: Gauge. |
k8s.cluster.nodes.ready | Count | The number of nodes whose Ready status condition is true. Metric type: Gauge. |
k8s.cluster.nodes.ready.avg | Percent (%) | The percentage of nodes whose Ready status condition is true. Metric type: Gauge. |
k8s.cluster.pods | Count | The number of pods in the cluster. Metric type: Gauge. |
k8s.cluster.pods.running | Count | The number of pods in the Running phase. Metric type: Gauge. |
k8s.cluster.spec.cpu.requests | core | The total CPU requested by all containers in the cluster. Metric type: Gauge. |
k8s.cluster.spec.memory.requests | Binary Bytes | The total memory requested by all containers in the cluster. Metric type: Gauge. |
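`k8s.cluster.nodes.ready.avg` expresses ready nodes as a percentage of all nodes. Assuming it equals `k8s.cluster.nodes.ready` divided by `k8s.cluster.nodes` (the exact aggregation SolarWinds applies is not documented here), the relationship can be sketched as follows. The function and its input format are hypothetical.

```python
# Hypothetical sketch: ready-node percentage from per-node Ready flags,
# where 1 means the node's Ready condition is true and 0 means it is not.
from typing import Optional, Sequence


def ready_node_percentage(ready_flags: Sequence[int]) -> Optional[float]:
    """Return the percentage of nodes that are Ready, or None for an empty cluster."""
    if not ready_flags:
        return None
    nodes = len(ready_flags)   # comparable to k8s.cluster.nodes
    ready = sum(ready_flags)   # comparable to k8s.cluster.nodes.ready
    return ready / nodes * 100


if __name__ == "__main__":
    print(ready_node_percentage([1, 1, 1, 0]))  # 75.0
```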
Node metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_node_created | seconds (s) | Unix creation timestamp. Metric type: Gauge. |
k8s.kube_node_info | | Information about a cluster node. Metric type: Gauge. |
k8s.kube_node_spec_unschedulable | | Whether the node is marked as unschedulable, that is, whether new pods are prevented from being scheduled on it. Metric type: Gauge. |
k8s.kube_node_status_allocatable | | The amount of resources allocatable for pods (after reserving some for system daemons). Metric type: Gauge. |
k8s.kube_node_status_capacity | | The total amount of resources available for a node. Metric type: Gauge. |
k8s.kube_node_status_condition | | The condition of a cluster node. Metric type: Gauge. |
k8s.kube_node_status_ready | | Node status, reported as a tag. Metric type: Gauge. |
k8s.node.cpu.allocatable | core | CPU Utilization. The allocatable CPU on the node, that is, the CPU available for scheduling. Metric type: Gauge. |
k8s.node.cpu.capacity | core | CPU Utilization. The node CPU capacity. Metric type: Gauge. |
k8s.node.cpu.usage.seconds.rate | core | CPU Utilization. The rate of cumulative CPU time consumed by the node. Metric type: Gauge. |
k8s.node.fs.iops | | Disk IOPS. Rate of reads and writes by all pods on the node. Metric type: Gauge. |
k8s.node.fs.throughput | | Disk Throughput. Rate of bytes read and written by all pods on the node. Metric type: Gauge. |
k8s.node.fs.usage | Binary Bytes | Disk Usage. Number of bytes consumed by containers on the node's filesystem. Metric type: Gauge. |
k8s.node.memory.allocatable | Binary Bytes | Memory Utilization. The allocatable memory on the node, that is, the memory available for scheduling. Metric type: Gauge. |
k8s.node.memory.capacity | Binary Bytes | Memory Utilization. The node memory capacity. Metric type: Gauge. |
k8s.node.memory.working_set | Binary Bytes | Memory Utilization. Current working set on the node. Metric type: Gauge. |
k8s.node.network.bytes_received | | Network In. Rate of bytes received by all pods on the node. Metric type: Gauge. |
k8s.node.network.bytes_transmitted | | Network Out. Rate of bytes transmitted by all pods on the node. Metric type: Gauge. |
k8s.node.network.packets_received | | Rate of packets received by all pods on the node. Metric type: Gauge. |
k8s.node.network.packets_transmitted | | Rate of packets transmitted by all pods on the node. Metric type: Gauge. |
k8s.node.network.receive_packets_dropped | | Rate of packets dropped while receiving, across all pods on the node. Metric type: Gauge. |
k8s.node.network.transmit_packets_dropped | | Rate of packets dropped while transmitting, across all pods on the node. Metric type: Gauge. |
k8s.node.pods | Count | Number of pods. The number of pods on the node. Metric type: Gauge. |
k8s.node.status.condition.diskpressure | | The DiskPressure condition of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
k8s.node.status.condition.memorypressure | | The MemoryPressure condition of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
k8s.node.status.condition.networkunavailable | | The NetworkUnavailable condition of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
k8s.node.status.condition.pidpressure | | The PIDPressure condition of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
k8s.node.status.condition.ready | | The Ready condition of a cluster node (1 when true, 0 when false or unknown). Metric type: Gauge. |
Pod metrics
Metric | Unit | Description |
---|---|---|
k8s.kube.pod.owner.daemonset | | Information about the DaemonSet that owns the pod. Metric type: Gauge. |
k8s.kube.pod.owner.replicaset | | Information about the ReplicaSet that owns the pod. Metric type: Gauge. |
k8s.kube.pod.owner.statefulset | | Information about the StatefulSet that owns the pod. Metric type: Gauge. |
k8s.kube_pod_completion_time | seconds (s) | Completion time of the pod, as a Unix timestamp. Metric type: Gauge. |
k8s.kube_pod_created | seconds (s) | Unix creation timestamp. Metric type: Gauge. |
k8s.kube_pod_info | | Information about the pod. Metric type: Gauge. |
k8s.kube_pod_owner | | Information about the pod owner. Metric type: Gauge. |
k8s.kube_pod_start_time | seconds (s) | Start time of the pod, as a Unix timestamp. Metric type: Gauge. |
k8s.kube_pod_status_phase | | The pod's current phase. Metric type: Gauge. |
k8s.kube_pod_status_ready | | Describes whether the pod is ready to serve requests. Metric type: Gauge. |
k8s.kube_pod_status_reason | | The pod status reasons. Metric type: Gauge. |
k8s.pod.containers | Count | The number of containers in the pod. Metric type: Gauge. |
k8s.pod.containers.running | | The current number of running containers in the pod. Metric type: Gauge. |
k8s.pod.cpu.usage.seconds.rate | core | CPU Utilization. The rate of cumulative CPU time consumed by the pod. Metric type: Gauge. |
k8s.pod.fs.iops | | Disk IOPS. Rate of reads and writes by all containers in the pod. Metric type: Gauge. |
k8s.pod.fs.reads.bytes.rate | | Rate of bytes read by all containers in the pod. Metric type: Gauge. |
k8s.pod.fs.reads.rate | | Rate of reads by all containers in the pod. Metric type: Gauge. |
k8s.pod.fs.throughput | | Disk Throughput. Rate of bytes read and written by all containers in the pod. Metric type: Gauge. |
k8s.pod.fs.usage.bytes | Binary Bytes | Disk Usage. Number of bytes consumed by containers on the pod's filesystem. Metric type: Gauge. |
k8s.pod.fs.writes.bytes.rate | | Rate of bytes written by all containers in the pod. Metric type: Gauge. |
k8s.pod.fs.writes.rate | | Rate of writes by all containers in the pod. Metric type: Gauge. |
k8s.pod.memory.working_set | Binary Bytes | Memory Utilization. Current working set of the pod. Metric type: Gauge. |
k8s.pod.network.bytes_received | | Network In. Rate of bytes received by all containers in the pod. Metric type: Gauge. |
k8s.pod.network.bytes_transmitted | | Network Out. Rate of bytes transmitted by all containers in the pod. Metric type: Gauge. |
k8s.pod.network.packets_received | | Rate of packets received by all containers in the pod. Metric type: Gauge. |
k8s.pod.network.packets_transmitted | | Rate of packets transmitted by all containers in the pod. Metric type: Gauge. |
k8s.pod.network.receive_packets_dropped | | Rate of packets dropped while receiving, across all containers in the pod. Metric type: Gauge. |
k8s.pod.network.transmit_packets_dropped | | Rate of packets dropped while transmitting, across all containers in the pod. Metric type: Gauge. |
k8s.pod.spec.cpu.limit | core | CPU quota of all containers in the pod in a given CPU period. Metric type: Gauge. |
k8s.pod.spec.cpu.requests | core | The total CPU requested by all containers in the pod. Metric type: Gauge. |
k8s.pod.spec.memory.limit | Binary Bytes | Memory Utilization. Memory limit for all containers in the pod. Metric type: Gauge. |
k8s.pod.spec.memory.requests | Binary Bytes | The total memory requested by all containers in the pod. Metric type: Gauge. |
k8s.pod.status.reason | | The current pod status reason. Metric type: Gauge. |
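Comparing a usage metric with the corresponding limit metric, for example `k8s.pod.memory.working_set` with `k8s.pod.spec.memory.limit`, or `k8s.pod.cpu.usage.seconds.rate` with `k8s.pod.spec.cpu.limit`, shows how close a pod is to its limits. The following is a minimal sketch, assuming both values are already expressed in the same unit. The function is hypothetical and returns `None` when no limit is set, because a pod without a limit has nothing to compare against.

```python
from typing import Optional


def percent_of_limit(usage: float, limit: Optional[float]) -> Optional[float]:
    """Return usage as a percentage of the limit, or None when no limit is set."""
    if limit is None or limit <= 0:
        return None  # A missing or zero limit means there is nothing to compare against.
    return usage / limit * 100


if __name__ == "__main__":
    print(percent_of_limit(0.5, 2.0))           # 25.0 (0.5 core used of a 2-core limit)
    print(percent_of_limit(256 * 2**20, None))  # None (no memory limit configured)
```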
Container metrics
Metric | Unit | Description |
---|---|---|
k8s.container.spec.cpu.limit | core | CPU quota of the container in a given CPU period. Metric type: Gauge. |
k8s.container.cpu.usage.seconds.rate | core | The rate of cumulative CPU time consumed by the container. Metric type: Gauge. |
k8s.container.fs.iops | | Rate of reads and writes on the container. Metric type: Gauge. |
k8s.container.fs.throughput | | Rate of bytes read and written on the container. Metric type: Gauge. |
k8s.container.network.bytes_received | | Rate of bytes received on the container. Metric type: Gauge. |
k8s.container.network.bytes_transmitted | | Rate of bytes transmitted on the container. Metric type: Gauge. |
k8s.kube_pod_container_status_last_terminated_timestamp | seconds (s) | Last termination time of a pod container, as a Unix timestamp. Metric type: Gauge. |
k8s.kube_pod_init_container_info | | Information about an init container in a pod. Metric type: Gauge. |
k8s.kube_pod_init_container_status_waiting | | Describes whether the init container is currently in the waiting state. Metric type: Gauge. |
k8s.kube_pod_init_container_status_waiting_reason | | Describes the reason the init container is currently in the waiting state. Metric type: Gauge. |
k8s.kube_pod_init_container_status_running | | Describes whether the init container is currently in the running state. Metric type: Gauge. |
k8s.kube_pod_init_container_status_terminated | | Describes whether the init container is currently in the terminated state. Metric type: Gauge. |
k8s.kube_pod_init_container_status_terminated_reason | | Describes the reason the init container is currently in the terminated state. Metric type: Gauge. |
k8s.kube_pod_init_container_status_last_terminated_reason | | Describes the last reason the init container was in the terminated state. Metric type: Gauge. |
k8s.kube_pod_init_container_status_ready | | Describes whether the init container's readiness check succeeded. Metric type: Gauge. |
k8s.kube_pod_init_container_status_restarts_total | | The number of restarts for the init container. Metric type: Gauge. |
k8s.kube_pod_init_container_resource_limits | | The number of CPU cores set as the limit for an init container. Metric type: Gauge. |
k8s.kube_pod_init_container_resource_requests | | The number of CPU cores requested by an init container. Metric type: Gauge. |
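Metrics such as `k8s.container.cpu.usage.seconds.rate` are rates of a cumulative CPU-time counter: CPU seconds consumed per elapsed second, which is equivalent to the number of cores in use. The following sketch shows one common way to derive such a rate from two counter samples, treating a decrease as a counter reset (for example, after a container restart). It illustrates the general technique under that assumption and is not the collector's exact implementation; the function name is hypothetical.

```python
def per_second_rate(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> float:
    """Per-second rate of a cumulative counter between two samples.

    Timestamps are in seconds. If the counter decreased, it is assumed to
    have reset to zero between samples, so the current value is the increase.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = curr_value - prev_value
    if increase < 0:  # Counter reset, for example because the container restarted.
        increase = curr_value
    return increase / elapsed


if __name__ == "__main__":
    # 30 CPU-seconds consumed over 60 s of wall-clock time -> 0.5 core.
    print(per_second_rate(100.0, 0.0, 130.0, 60.0))  # 0.5
    # After a reset, only the post-reset value counts.
    print(per_second_rate(500.0, 0.0, 30.0, 60.0))   # 0.5
```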
Deployment metrics
Metric | Unit | Description |
---|---|---|
k8s.deployment.condition.available | | Describes whether the deployment has an Available status condition. Metric type: Gauge. |
k8s.deployment.condition.progressing | | Describes whether the deployment has a Progressing status condition. Metric type: Gauge. |
k8s.deployment.condition.replicafailure | | Describes whether the deployment has a ReplicaFailure status condition. Metric type: Gauge. |
k8s.kube_deployment_created | seconds (s) | Unix creation timestamp. Metric type: Gauge. |
k8s.kube_deployment_labels | | Kubernetes labels converted to Prometheus labels. Metric type: Gauge. |
k8s.kube_deployment_spec_paused | | Whether the deployment is paused and will not be processed by the deployment controller. Metric type: Gauge. |
k8s.kube_deployment_spec_replicas | | Number of desired pods for a deployment. Metric type: Gauge. |
k8s.kube_deployment_status_condition | | The current status conditions of a deployment. Metric type: Gauge. |
k8s.kube_deployment_status_replicas | | The number of replicas per deployment. Metric type: Gauge. |
k8s.kube_deployment_status_replicas_available | | The number of available replicas per deployment. Metric type: Gauge. |
k8s.kube_deployment_status_replicas_ready | | The number of ready replicas per deployment. Metric type: Gauge. |
k8s.kube_deployment_status_replicas_unavailable | | The number of unavailable replicas per deployment. Metric type: Gauge. |
k8s.kube_deployment_status_replicas_updated | | The number of updated replicas per deployment. Metric type: Gauge. |
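The replica metrics above can be combined to judge whether a rollout has finished. The sketch below mirrors the replica-count checks that `kubectl rollout status` performs on a deployment's status: updated, total, and available replicas compared with the desired count. It omits the observed-generation and progress-deadline checks, and the function name is hypothetical.

```python
def rollout_complete(spec_replicas: int, status_replicas: int,
                     updated_replicas: int, available_replicas: int) -> bool:
    """Return True when a deployment rollout looks finished.

    Inputs correspond to k8s.kube_deployment_spec_replicas,
    k8s.kube_deployment_status_replicas, k8s.kube_deployment_status_replicas_updated,
    and k8s.kube_deployment_status_replicas_available.
    """
    if updated_replicas < spec_replicas:
        return False  # Not every desired replica runs the new version yet.
    if status_replicas > updated_replicas:
        return False  # Old replicas are still pending termination.
    if available_replicas < updated_replicas:
        return False  # Some updated replicas are not available yet.
    return True


if __name__ == "__main__":
    print(rollout_complete(3, 4, 3, 3))  # False: one old replica is still terminating
    print(rollout_complete(3, 3, 3, 3))  # True
```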
StatefulSet metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_statefulset_created | seconds (s) | Unix creation timestamp. Metric type: Gauge. |
k8s.kube_statefulset_labels | | Kubernetes labels converted to Prometheus labels. Metric type: Gauge. |
k8s.kube_statefulset_replicas | | Number of desired pods for a StatefulSet. Metric type: Gauge. |
k8s.kube_statefulset_status_replicas_current | | The number of current replicas per StatefulSet. Metric type: Gauge. |
k8s.kube_statefulset_status_replicas_ready | | The number of ready replicas per StatefulSet. Metric type: Gauge. |
k8s.kube_statefulset_status_replicas_updated | | The number of updated replicas per StatefulSet. Metric type: Gauge. |
DaemonSet metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_daemonset_created | seconds (s) | Unix creation timestamp. Metric type: Gauge. |
k8s.kube_daemonset_labels | | Kubernetes labels converted to Prometheus labels. Metric type: Gauge. |
k8s.kube_daemonset_status_current_number_scheduled | | The number of nodes that should be running a daemon pod and have at least one daemon pod running. Metric type: Gauge. |
k8s.kube_daemonset_status_desired_number_scheduled | | The number of nodes that should be running the daemon pod. Metric type: Gauge. |
k8s.kube_daemonset_status_number_available | | The number of nodes that should be running the daemon pod and have at least one daemon pod running and available. Metric type: Gauge. |
k8s.kube_daemonset_status_number_misscheduled | | The number of nodes that should not be running a daemon pod but have one running anyway. Metric type: Gauge. |
k8s.kube_daemonset_status_number_ready | | The number of nodes that should be running the daemon pod and have at least one daemon pod running and ready. Metric type: Gauge. |
k8s.kube_daemonset_status_number_unavailable | | The number of nodes that should be running the daemon pod but have no daemon pod running and available. Metric type: Gauge. |
k8s.kube_daemonset_status_updated_number_scheduled | | The total number of nodes that are running the updated daemon pod. Metric type: Gauge. |
ReplicaSet metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_replicaset_spec_replicas | | Number of desired pods for a ReplicaSet. Metric type: Gauge. |
k8s.kube_replicaset_status_ready_replicas | | The number of ready replicas per ReplicaSet. Metric type: Gauge. |
k8s.kube_replicaset_status_replicas | | The number of replicas per ReplicaSet. Metric type: Gauge. |
Namespace metrics
Metric | Unit | Description |
---|---|---|
k8s.kube_namespace_created | seconds (s) | Unix creation timestamp. Metric type: Gauge. |
k8s.kube_namespace_status_phase | | Kubernetes namespace status phase. Metric type: Gauge. |
k8s.kube_resourcequota | | Information about resource quotas. Metric type: Gauge. |
Other metrics
Metric | Unit | Description |
---|---|---|
k8s.apiserver.request.successrate | Percent (%) | Success rate of Kubernetes API server calls. Metric type: Gauge. |
k8s.apiserver_request_total | | Kubernetes API server requests. Metric type: Counter. |
k8s.apiserver_request_duration_seconds | seconds (s) | Kubernetes API server request latency. Metric type: Histogram. |
k8s.workqueue_adds_total | | Kubernetes workqueue adds. Metric type: Counter. |
k8s.workqueue_depth | | Kubernetes workqueue depth. Metric type: Gauge. |
k8s.workqueue_queue_duration_seconds | seconds (s) | How long an item stays in the Kubernetes workqueue. Metric type: Histogram. |
k8s.coredns_cache_entries | | The number of elements in the CoreDNS cache. Metric type: Gauge. |
k8s.coredns_cache_hits_total | | The count of CoreDNS cache hits. Metric type: Counter. |
k8s.coredns_cache_misses_total | | The count of CoreDNS cache misses. Metric type: Counter. |
k8s.coredns_dns_requests_total | | Counter of DNS requests made per zone, protocol, and family. Metric type: Counter. |
k8s.coredns_dns_request_duration_seconds | seconds (s) | Histogram of the time each request took per zone. Metric type: Histogram. |
k8s.coredns_dns_responses_total | | Counter of DNS responses per response status code. Metric type: Counter. |
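`k8s.coredns_cache_hits_total` and `k8s.coredns_cache_misses_total` are counters, so a cache hit ratio is usually computed from their increases over the same interval: hits divided by hits plus misses. The following is a minimal sketch, assuming the increases have already been calculated (for example, as the difference between two samples). The function name is hypothetical.

```python
from typing import Optional


def cache_hit_ratio(hits_increase: float, misses_increase: float) -> Optional[float]:
    """Fraction of cache lookups that were hits during an interval, or None if there were none."""
    lookups = hits_increase + misses_increase
    if lookups <= 0:
        return None  # No lookups in the interval, so the ratio is undefined.
    return hits_increase / lookups


if __name__ == "__main__":
    print(cache_hit_ratio(90, 10))  # 0.9
```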
Network metrics
Metrics for network device entities are sent by an installed Network Collector. See Network monitoring.
Standard metrics
Network device metrics
Interface metrics
Metric | Units | Description |
---|---|---|
sw.collector.InterfaceAvailability.Availability | Percent (%) | Availability. Availability of the interface instance or instances. Displayed as a percentage. |
sw.collector.InterfaceTraffic.InPercentUtil | Percent (%) | In Percent Utilization. Average inbound utilization of the interface instance or instances. Displayed as a percentage. |
sw.collector.InterfaceTraffic.OutPercentUtil | Percent (%) | Out Percent Utilization. Average outbound utilization of the interface instance or instances. Displayed as a percentage. |
sw.collector.InterfaceTraffic.InAveragebps | bps | In Bits Per Second Average. Average inbound traffic of the interface instance or instances, in bits per second. |
sw.collector.InterfaceTraffic.OutAveragebps | bps | Out Bits Per Second Average. Average outbound traffic of the interface instance or instances, in bits per second. |
sw.collector.InterfaceErrors.InDiscards | Percent (%) | In Discards. Inbound packets discarded on the interface instance or instances. Displayed as a percentage. |
sw.collector.InterfaceErrors.OutDiscards | Percent (%) | Out Discards. Outbound packets discarded on the interface instance or instances. Displayed as a percentage. |
sw.collector.InterfaceErrors.InErrors | Percent (%) | In Errors. Inbound packets with errors on the interface instance or instances. Displayed as a percentage. |
sw.collector.InterfaceErrors.OutErrors | Percent (%) | Out Errors. Outbound packets with errors on the interface instance or instances. Displayed as a percentage. |
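Interface utilization percentages relate the traffic rate to the interface speed. A common way to compute this is the average bits per second divided by the interface speed in bits per second, multiplied by 100. The sketch below assumes the collector uses the same basic formula; its exact method, including how the interface speed is obtained, is not described on this page. The function name is hypothetical.

```python
def percent_utilization(average_bps: float, interface_speed_bps: float) -> float:
    """Average traffic as a percentage of interface speed (both in bits per second)."""
    if interface_speed_bps <= 0:
        raise ValueError("interface speed must be positive")
    return average_bps / interface_speed_bps * 100


if __name__ == "__main__":
    # 250 Mbps average on a 1 Gbps interface -> 25% utilization.
    print(percent_utilization(250_000_000, 1_000_000_000))  # 25.0
```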
Volume metrics
Metric | Units | Description |
---|---|---|
sw.collector.VolumeUsageHistory.PercentDiskUsed | Percent (%) | Percent Disk Used. Indicates the overall disk usage as a percentage. |
sw.collector.VolumeUsageHistory.AvgDiskUsed | Gigabytes | Average Disk Used. Indicates the average disk usage in gigabytes. |
sw.collector.VolumeUsageHistory.DiskSize | Gigabytes | Volume Size. Indicates the disk size in gigabytes. |
sw.collector.VolumePerformanceHistory.AvgDiskReads | Percent (%) | Disk Read Average. Indicates the average read speed of the volume. Only for volumes monitored via WMI. |
sw.collector.VolumePerformanceHistory.AvgDiskWrites | Percent (%) | Disk Write Average. Indicates the average write speed of the volume. Only for volumes monitored via WMI. |
Sensor metrics
Flow metrics
Metric | Units | Description |
---|---|---|
sw.collector.Netflow.Flows.Bytes | GB | Top Protocols, Top Countries, Top Endpoints, Top Conversations, Top Applications, Top Advanced Applications. Endpoints producing the most traffic on your network, the most bandwidth-consuming conversations, the protocols used for the most traffic, countries hosting endpoints that transmit the most data, or applications responsible for the most monitored traffic. |
Wireless Controller and Thin Access Point metrics
Metric | Units | Description |
---|---|---|
sw.collector.Wireless.Interfaces | N/A | MAC address, SSID, channel, and radio type details gathered from the wireless interfaces of the AP. |
sw.collector.Wireless.Clients | Number | The total number of clients connected to all interfaces of the AP. |
sw.collector.Wireless.HistoricalClients.SignalStrength | | RSSI (signal strength). The following thresholds, in dBm, are used to convert the value to a strength indicator: -82, -72, -68, -63, -56 (-82 is the worst). |
sw.collector.Wireless.HistoricalClients.OutDataRate | | Data rate on clients. |
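The signal strength conversion described above can be sketched with a sorted threshold list and a binary search. This sketch assumes that a reading equal to a threshold counts as reaching that level and that the result is a 0 to 5 scale; neither detail is specified on this page, and the function name is hypothetical.

```python
from bisect import bisect_right

# Thresholds in dBm from the table above, ordered from worst to best.
RSSI_THRESHOLDS_DBM = [-82, -72, -68, -63, -56]


def signal_strength_level(rssi_dbm: float) -> int:
    """Return how many thresholds the reading meets or exceeds (0 = weakest, 5 = strongest)."""
    return bisect_right(RSSI_THRESHOLDS_DBM, rssi_dbm)


if __name__ == "__main__":
    for reading in (-90, -82, -60, -50):
        print(reading, signal_strength_level(reading))
    # -90 0, -82 1, -60 4, -50 5
```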
Special metrics
Metric | Units | Description |
---|---|---|
|
Percent (%) | Total average bps (transmitted + received). |
OTel metrics
When an OTel receiver is configured to send telemetry data directly to SolarWinds Observability SaaS, the metrics collected depend on what OTel data is sent. See OTel direct ingestion.
When you integrate with Apache, Elasticsearch, NGINX, Redis, or ZooKeeper, the SolarWinds Observability Agent is used to send metrics and log data to SolarWinds Observability SaaS. See Monitor with OTel.
Apache metrics
Metric | Units | Description |
---|---|---|
apache.cpu.load | Percent (%) | The current load of the CPU. |
apache.cpu.time | Jiffies | The jiffies used by processes of a given category. |
apache.current_connections | Connections | The number of active connections currently attached to the HTTP server. |
apache.load.1 | Percent (%) | The average server load during the last minute. |
apache.load.15 | Percent (%) | The average server load during the last 15 minutes. |
apache.load.5 | Percent (%) | The average server load during the last 5 minutes. |
apache.request.time | milliseconds (ms) | Total time spent on handling requests. |
apache.request.time.rate | milliseconds (ms) | Rate of time spent on handling requests. |
apache.requests | Requests | The number of requests serviced by the HTTP server per second. |
apache.requests.rate | Requests per second | The rate of requests serviced by the HTTP server. |
apache.scoreboard | Workers | The number of workers in each state. |
apache.throughput | Bytes per request | The average number of bytes served per request. |
apache.time.perrequest | milliseconds per request | The average processing time per request. |
apache.traffic | Bytes | Total HTTP server traffic in bytes. |
apache.traffic.rate | Bytes per second | HTTP server traffic in bytes per second. |
apache.uptime | seconds (s) | The amount of time that the server has been running, in seconds. |
apache.workers | Workers | The number of workers currently attached to the HTTP server. |
apache.workers.idle | Workers | The number of idle workers. |
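`apache.throughput` and `apache.time.perrequest` are averages per request. Assuming they are derived from the increases in `apache.traffic`, `apache.request.time`, and `apache.requests` over the same interval (the exact derivation is not documented here), the arithmetic looks like this. The function is hypothetical.

```python
from typing import Optional, Tuple


def per_request_averages(traffic_bytes: float, request_time_ms: float,
                         requests: int) -> Optional[Tuple[float, float]]:
    """Return (bytes per request, milliseconds per request), or None if no requests were served."""
    if requests <= 0:
        return None  # Averages are undefined when no requests were served.
    return traffic_bytes / requests, request_time_ms / requests


if __name__ == "__main__":
    # 1,000,000 bytes and 2,500 ms of handling time across 500 requests.
    print(per_request_averages(1_000_000, 2_500, 500))  # (2000.0, 5.0)
```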
Confluent Cloud metrics
Metric | Units | Description |
---|---|---|
confluent_kafka_server_active_connection_count | {connections} | The count of active authenticated connections. |
|
{partitions} | The number of partitions. |
confluent_kafka_server_received_bytes | Bytes/60s | The delta count of bytes of the customer's data received from the network. Each sample is the number of bytes received since the previous data sample. The count is sampled every 60 seconds. |
confluent_kafka_server_received_records | {records}/60s | The delta count of records received. Each sample is the number of records received since the previous data sample. The count is sampled every 60 seconds. |
confluent_kafka_server_request_bytes | Bytes/60s | The delta count of total request bytes from the specified request types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_request_count | {requests}/60s | The delta count of requests received over the network. Each sample is the number of requests received since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_response_bytes | Bytes/60s | The delta count of total response bytes from the specified response types sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_retained_bytes | Bytes | The current count of bytes retained by the cluster. The count is sampled every 60 seconds. |
confluent_kafka_server_sent_bytes | Bytes/60s | The delta count of bytes sent over the network. Each sample is the number of bytes sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_sent_records | {records}/60s | The delta count of records sent. Each sample is the number of records sent since the previous data point. The count is sampled every 60 seconds. |
confluent_kafka_server_successful_authentication_count | {successful authentications}/60s | The delta count of successful authentications. Each sample is the number of successful authentications since the previous data point. The count is sampled every 60 seconds. |
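Each Confluent Cloud delta sample above covers 60 seconds, so dividing a sample by 60 gives the average per-second rate for that interval, and summing consecutive samples gives the total for a longer period. The following is a minimal sketch, assuming complete, non-overlapping 60-second samples; the function names are hypothetical.

```python
from typing import Iterable

SAMPLE_INTERVAL_SECONDS = 60  # Documented sampling interval for these delta counts.


def per_second_rate(delta_sample: float) -> float:
    """Average per-second rate represented by one 60-second delta sample."""
    return delta_sample / SAMPLE_INTERVAL_SECONDS


def total_over_period(delta_samples: Iterable[float]) -> float:
    """Total count across consecutive, non-overlapping delta samples."""
    return sum(delta_samples)


if __name__ == "__main__":
    print(per_second_rate(6_000_000))                  # 100000.0 bytes per second
    print(total_over_period([6_000_000, 3_000_000]))   # 9000000 bytes over two minutes
```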
Docker metrics
Metric | Units | Description |
---|---|---|
container.blockio.io_service_bytes_recursive | bytes (By) | The number of bytes transferred to and from the disk by the cgroup and its descendant cgroups. |
container.cpu.throttling_data.periods | {periods} | The number of periods with throttling active. |
container.cpu.usage.kernelmode | nanosecond (ns) | Time spent by tasks of the cgroup in kernel mode (Linux). Time spent by all container processes in kernel mode (Windows). |
container.cpu.usage.total | nanosecond (ns) | Total CPU time consumed. |
container.cpu.usage.usermode | nanosecond (ns) | Time spent by tasks of the cgroup in user mode (Linux). Time spent by all container processes in user mode (Windows). |
container.cpu.utilization | percentage (%) | Container CPU Utilization. Percentage of CPU used per container. |
container.memory.file | bytes (By) | Amount of memory used to cache filesystem data, including tmpfs and shared memory (only available with cgroups v2). |
container.memory.percent | percentage (%) | Container Memory Utilization. Percentage of memory used per container. |
container.memory.total_cache | bytes (By) | Total amount of memory used by the processes of this cgroup (and descendants) that can be associated with a block on a block device. Also accounts for memory used by tmpfs (only available with cgroups v1). |
container.memory.usage.limit | bytes (By) | Memory limit of the container. |
container.memory.usage.total | bytes (By) | Memory usage of the container. This excludes the cache. |
container.network.io.usage.rx_bytes | bytes (By) | Total Received Bytes per Container. Total bytes received by the container. |
container.network.io.usage.rx_dropped | {packets} | Total Incoming Dropped Packets by Container. Total incoming packets dropped by the container. |
container.network.io.usage.tx_bytes | bytes (By) | Total Sent Bytes per Container. Total bytes sent by the container. |
container.network.io.usage.tx_dropped | {packets} | Total Outgoing Dropped Packets by Container. Total outgoing packets dropped by the container. |
container.uptime | seconds (s) | Total Container Uptime. The time elapsed since the start time of the container. |
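`container.cpu.usage.total` is a cumulative CPU time in nanoseconds. One way to turn two samples of it into a utilization percentage is to divide the CPU time consumed by the elapsed wall-clock time, so that 100% corresponds to one fully used core. This is a simplified sketch: the Docker stats receiver may normalize differently (for example, by system CPU time or the number of online CPUs), so results may not match `container.cpu.utilization` exactly. The function name is hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def cpu_utilization_percent(prev_cpu_ns: int, curr_cpu_ns: int,
                            elapsed_seconds: float) -> float:
    """CPU time consumed between two samples as a percentage of one core."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    cpu_seconds = (curr_cpu_ns - prev_cpu_ns) / NANOSECONDS_PER_SECOND
    return cpu_seconds / elapsed_seconds * 100


if __name__ == "__main__":
    # 30 s of CPU time over a 60 s interval -> 50% of one core.
    print(cpu_utilization_percent(0, 30 * NANOSECONDS_PER_SECOND, 60))  # 50.0
```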
Elasticsearch metrics
Metric | Units | Description |
---|---|---|
elasticsearch.breaker.memory.estimated
|
bytes (By) |
The estimated memory used for the operation. |
elasticsearch.breaker.memory.limit
|
bytes (By) | The memory limit for the circuit breaker. |
elasticsearch.breaker.tripped
|
1 | The total number of times the circuit breaker has been triggered and prevented an out-of-memory error. |
elasticsearch.cluster.data_nodes
|
{nodes} | Data Nodes. The number of data nodes in the cluster. |
elasticsearch.cluster.health
|
status | Cluster by Status. The health status of the cluster. Health status is based on the state of its primary and replica shards. Green indicates all shards are assigned. Yellow indicates that one or more replica shards are unassigned. Red indicates that one or more primary shards are unassigned, making some data unavailable. |
elasticsearch.cluster.in_flight_fetch
|
{fetches} | The number of unfinished fetches. |
elasticsearch.cluster.nodes
|
{nodes} | Nodes, Top 5 Clusters by Node Count. The total number of nodes in the cluster. |
elasticsearch.cluster.pending_tasks
|
{tasks} | Pending Tasks in Cluster. The number of cluster-level changes that have not yet been executed. |
elasticsearch.cluster.published_states.differences
|
1 | The number of differences between published cluster states. |
elasticsearch.cluster.published_states.full
|
1 | The number of published cluster states. |
elasticsearch.cluster.shards
|
{shards} | Active Shards, Shards by State. The number of shards in the cluster. |
elasticsearch.cluster.state_queue
|
1 | The number of cluster states in queue. |
elasticsearch.cluster.state_update.count
|
1 | The number of cluster state update attempts that changed the cluster state since the node started. |
elasticsearch.cluster.state_update.time
|
milliseconds (ms) | The cumulative amount of time updating the cluster state since the node started. |
elasticsearch.index.operations.completed
|
{operations} | The number of operations completed for an index. |
elasticsearch.index.operations.time
|
milliseconds (ms) | Time spent on operations for an index. |
elasticsearch.index.shards.size
|
bytes (By) | The size of the shards assigned to this index. |
elasticsearch.indexing_pressure.memory.limit
|
bytes (By) | The configured memory limit, in bytes, for the indexing requests. |
elasticsearch.indexing_pressure.memory.total.primary_rejections
|
1 | The cumulative number of indexing requests rejected in the primary stage. |
elasticsearch.indexing_pressure.memory.total.replica_rejections
|
1 | The number of indexing requests rejected in the replica stage. |
elasticsearch.memory.indexing_pressure
|
bytes (By) | Indexing Pressure. The memory consumed, in bytes, by indexing requests in the specified stage. |
elasticsearch.node.cache.count
|
{count} | The total count of query cache misses across all shards assigned to selected nodes. |
elasticsearch.node.cache.evictions
|
{evictions} | The number of evictions from the cache on a node. |
elasticsearch.node.cache.memory.usage
|
bytes (By) | The size in bytes of the cache on a node. |
elasticsearch.node.cluster.connections
|
{connections} | Cluster Connections. The number of open TCP connections for internal cluster communication. |
elasticsearch.node.cluster.io
|
bytes (By) | The number of bytes sent and received on the network for internal cluster communication. |
elasticsearch.node.cluster.io.rate
|
bytes per second (By/s) | Network Traffic. The number of bytes sent and received for internal cluster communication per second. |
elasticsearch.node.disk.io.read
|
kilobytes (KiBy) | Disk Read and Write. The total number of kilobytes read across all file stores for this node. |
elasticsearch.node.disk.io.write
|
kilobytes (KiBy) | Disk Read and Write. The total number of kilobytes written across all file stores for this node. |
elasticsearch.node.documents
|
{documents} | The number of documents on the node. |
elasticsearch.node.fs.disk.available
|
bytes (By) | The amount of disk space available to the JVM across all file stores for this node. Depending on OS or process-level restrictions, this value might be lower than elasticsearch.node.fs.disk.free. This is the actual amount of free disk space the Elasticsearch node can use. |
elasticsearch.node.fs.disk.free
|
bytes (By) | The amount of unallocated disk space across all file stores for this node. |
elasticsearch.node.fs.disk.total
|
bytes (By) | The amount of disk space across all file stores for this node. |
elasticsearch.node.http.connections
|
{connections} | The number of HTTP connections to the node. |
elasticsearch.node.ingest.documents
|
{documents} | The total number of documents ingested during the lifetime of this node. |
elasticsearch.node.ingest.documents.current
|
{documents} | The total number of documents currently being ingested. |
elasticsearch.node.ingest.operations.failed
|
{operation} | The total number of failed ingest operations during the lifetime of this node. |
elasticsearch.node.open_files
|
{files} | Open File Descriptors. The number of open file descriptors held by the node. |
elasticsearch.node.operations.completed
|
{operations} | The number of operations completed by a node. |
elasticsearch.node.operations.completed.rate
|
{operations} per second | Node Operations Completed per Second. The number of operations completed by a node per second. |
elasticsearch.node.operations.time
|
milliseconds (ms) | Total Time Spent on Operations. The time spent on operations by a node. |
elasticsearch.node.pipeline.ingest.documents.current
|
{documents} | The total number of documents currently being ingested by a pipeline. |
elasticsearch.node.pipeline.ingest.documents.preprocessed
|
{documents} | The number of documents preprocessed by the ingest pipeline. |
elasticsearch.node.pipeline.ingest.operations.failed
|
{operation} | The total number of failed operations for the ingest pipeline. |
elasticsearch.node.script.cache_evictions
|
1 | The total number of times the script cache has evicted old data. |
elasticsearch.node.script.compilation_limit_triggered
|
1 | The total number of times the script compilation circuit breaker has limited inline script compilations. |
elasticsearch.node.script.compilations
|
{compilations} | The total number of inline script compilations performed by the node. |
elasticsearch.node.shards.data_set.size
|
bytes (By) | The total data set size of all shards assigned to the node. This includes the size of shards not stored fully on the node, such as the cache for partially mounted indices. |
elasticsearch.node.shards.reserved.size
|
bytes (By) | A prediction of how much larger the shard stores on this node will eventually grow due to ongoing peer recoveries, restoring snapshots, and similar activities. A value of -1 indicates that this is not available. |
elasticsearch.node.shards.size
|
bytes (By) | The size of the shards assigned to this node. |
elasticsearch.node.thread_pool.tasks.finished
|
{tasks} | The number of tasks finished by the thread pool. |
elasticsearch.node.thread_pool.tasks.queued
|
{tasks} | Queued Tasks in Thread Pool. The number of queued tasks in the thread pool. |
elasticsearch.node.thread_pool.threads
|
{threads} | The number of threads in the thread pool. |
elasticsearch.node.translog.operations
|
{operations} | The number of transaction log operations. |
elasticsearch.node.translog.size
|
bytes (By) | The size of the transaction log. |
elasticsearch.node.translog.uncommitted.size
|
bytes (By) | The size of uncommitted transaction log operations. |
elasticsearch.os.cpu.load_avg.15m
|
1 | CPU Utilization. The fifteen-minute load average on the system. The field is not present if fifteen-minute load average is not available. |
elasticsearch.os.cpu.load_avg.1m
|
1 | CPU Utilization. The one-minute load average on the system. The field is not present if one-minute load average is not available. |
elasticsearch.os.cpu.load_avg.5m
|
1 | CPU Utilization. The five-minute load average on the system. The field is not present if five-minute load average is not available. |
elasticsearch.os.cpu.usage
|
Percent (%) | The recent CPU usage for the whole system, or -1 if not supported. |
elasticsearch.os.memory
|
bytes (By) | The amount of physical memory. |
jvm.classes.loaded
|
1 | The number of loaded classes. |
jvm.gc.collections.count
|
1 | The total number of garbage collections that have occurred. |
jvm.gc.collections.count.rate
|
collections per second | JVM GC Collection Count per Second. The number of Java Virtual Machine garbage collections that have occurred per second. |
jvm.gc.collections.elapsed
|
milliseconds (ms) | Total JVM GC Collection Time. The approximate accumulated collection elapsed time. |
jvm.memory.heap.committed
|
bytes (By) | JVM Memory Heap Committed vs Used. The amount of memory that is guaranteed to be available for the heap. |
jvm.memory.heap.max
|
bytes (By) | The maximum amount of memory that can be used for the heap. |
jvm.memory.heap.used
|
bytes (By) | JVM Memory Heap Committed vs Used. The current heap memory usage. |
jvm.memory.nonheap.committed
|
bytes (By) | The amount of memory that is guaranteed to be available for non-heap purposes. |
jvm.memory.nonheap.used
|
bytes (By) | The current non-heap memory usage. |
jvm.memory.pool.max
|
bytes (By) | The maximum amount of memory that can be used for the memory pool. |
jvm.memory.pool.used
|
bytes (By) | The current memory pool memory usage. |
jvm.threads.count
|
1 | The current number of threads. |
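Several metrics in this table are most useful as a ratio of two others, for example jvm.memory.heap.used against jvm.memory.heap.max, or elasticsearch.node.fs.disk.available against elasticsearch.node.fs.disk.total. The following is a minimal Python sketch, assuming you have already exported sample values for both metrics. The function names are hypothetical, and this is not how SolarWinds Observability SaaS calculates any widget or alert. The sketch returns None for a non-positive maximum because the JVM reports an undefined heap maximum as -1.

```python
from typing import Optional


def heap_used_percent(heap_used: float, heap_max: float) -> Optional[float]:
    """Percentage of the maximum JVM heap currently in use.

    Returns None when heap_max is not positive (the JVM reports -1 when the
    maximum is undefined), because a percentage is meaningless then.
    """
    if heap_max <= 0:
        return None
    return heap_used / heap_max * 100.0


def disk_used_percent(disk_total: float, disk_available: float) -> Optional[float]:
    """Percentage of disk space the Elasticsearch node can no longer use.

    Uses the "available" value rather than "free", because available is
    the space the node can actually write to.
    """
    if disk_total <= 0:
        return None
    return (disk_total - disk_available) / disk_total * 100.0


# Hypothetical sample values, in bytes.
print(heap_used_percent(heap_used=768 * 2**20, heap_max=1024 * 2**20))  # 75.0
print(disk_used_percent(disk_total=500e9, disk_available=125e9))         # 75.0
```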
IIS metrics
Metric | Units | Description |
---|---|---|
iis.connection.active
|
{active connections} | The number of active connections. |
iis.connection.anonymous
|
{anonymous connections} | The number of connections established anonymously. |
iis.connection.anonymous/rate
|
{anonymous connections}/s | The number of connections established anonymously per second. |
iis.connection.attempt.count
|
{connection attempts} | The total number of attempts to connect to the server. |
iis.connection.attempt.count/rate
|
{connection attempts}/second (s) | The total number of attempts to connect to the server per second. |
iis.network.blocked
|
bytes (By) | The total number of bytes blocked due to bandwidth throttling. |
iis.network.file.count
|
{files} | The number of transmitted files. |
iis.network.io
|
bytes (By) | The total number of bytes sent and received. |
iis.network.io/rate
|
bytes per second (By/s) | The total number of bytes sent and received per second. |
iis.request.count
|
{requests} | The total number of requests of a given type. |
iis.request.queue.count
|
{requests} | The current number of requests in the queue. |
iis.request.rejected
|
{requests} | The total number of requests rejected. |
iis.thread.active
|
{threads} | The total number of active threads. |
iis.uptime
|
seconds (s) | The amount of time the server has been up. |
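Cumulative counters such as iis.connection.attempt.count and iis.network.io have per-second counterparts (the /rate metrics). Assuming you have two timestamped samples of the same cumulative counter, a per-second rate is the increase in value divided by the elapsed seconds. The minimal Python sketch below (the helper name is hypothetical) also treats a decrease as a counter reset, such as after an IIS restart. It illustrates the arithmetic and is not the SolarWinds rate calculation.

```python
from typing import Optional


def counter_rate(prev_value: float, prev_ts: float,
                 curr_value: float, curr_ts: float) -> Optional[float]:
    """Per-second rate between two samples of a cumulative counter.

    Timestamps are in seconds. If the counter went down, assume it was reset
    (for example, the server restarted) and counted up from zero again, so
    curr_value alone is the increase since the reset.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        return None  # samples are duplicated or out of order
    delta = curr_value - prev_value
    if delta < 0:
        delta = curr_value  # counter reset
    return delta / elapsed


# Two hypothetical samples of iis.connection.attempt.count, 60 seconds apart.
print(counter_rate(1_200, 0.0, 1_500, 60.0))  # 5.0
```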
Kafka metrics
Metric | Units | Description |
---|---|---|
kafka_controller_kafkacontroller_activecontrollercount
|
{active controllers in cluster} | Active Cluster Controllers. The average number of active controllers in the cluster. |
|
| Log Flush Rate and Time. The maximum values of log flush rate and time. |
|
ms (millisecond) | Leader Request Time. The average time taken to process a request at the leader. |
|
ms (millisecond) | Producer Request Time. The average total time to serve a single 'Produce' request. |
kafka_network_socketserver_networkprocessoravgidlepercent
|
% (percentage) | Broker Process Idle Time. The average fraction of time the network processors are idle. |
kafka_server_brokertopicmetrics_bytesin_1minuterate
|
Bytes/second | Broker Incoming Bytes. The one-minute sum of incoming bytes per second. |
kafka_server_brokertopicmetrics_bytesin_1minuterate
|
Bytes/second/{topic} | Broker Incoming Bytes per Topic. The one-minute average rate of incoming bytes per second distributed by Topic. |
kafka_server_brokertopicmetrics_messagesin_1minuterate
|
{messages}/second | Broker Incoming Messages. The one-minute sum of incoming messages per second. |
kafka_server_brokertopicmetrics_messagesin_1minuterate
|
{messages}/second/{topic} | Broker Incoming Messages per Topic. The one-minute average rate of incoming messages per second distributed per topic. |
kafka_server_replicafetchermanager_maxlag
|
{messages} | Max Replica Lag. The average of the maximum number of messages by which follower replicas lag behind the leader replicas. |
kafka_server_replicamanager_isrshrinks_1minuterate
|
{shrink events}/minute | ISR Shrink Rate. The one-minute rate of ISR shrink events. If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again and the replicas are fully caught up, ISR will expand. |
kafka_server_replicamanager_leadercount
|
{replica leaders} | Leader Replicas. The average number of replica leaders. |
kafka_server_replicamanager_partitioncount
|
{partitions} | Partitions. The average number of partitions on all brokers. |
kafka_server_replicamanager_underreplicatedpartitions
|
{under-replicated partitions} | Under-Replicated Partitions. The average number of under-replicated partitions. |
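In a healthy Kafka cluster, exactly one broker acts as the active controller and no partitions are under-replicated. The following minimal Python sketch checks both conditions using kafka_controller_kafkacontroller_activecontrollercount and kafka_server_replicamanager_underreplicatedpartitions. It assumes you have the controller count for each broker (1 on the active controller, 0 elsewhere) rather than the cluster-wide average shown in the widget. The function name is hypothetical. In SolarWinds Observability SaaS you would typically express such conditions as alerts instead.

```python
from typing import Dict, List


def kafka_health_warnings(active_controllers_by_broker: Dict[str, float],
                          under_replicated_partitions: float) -> List[str]:
    """Return human-readable warnings; an empty list means both checks pass."""
    warnings = []
    # Exactly one broker should report itself as the active controller.
    total_controllers = sum(active_controllers_by_broker.values())
    if total_controllers != 1:
        warnings.append(
            f"expected exactly 1 active controller, found {total_controllers:g}")
    # Any under-replicated partition means some replicas are not caught up.
    if under_replicated_partitions > 0:
        warnings.append(
            f"{under_replicated_partitions:g} under-replicated partitions")
    return warnings


# Hypothetical per-broker samples.
print(kafka_health_warnings({"broker-1": 1, "broker-2": 0, "broker-3": 0}, 0))
print(kafka_health_warnings({"broker-1": 0, "broker-2": 0}, 3))
```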
Memcached metrics
Metric | Units | Description |
---|---|---|
memcached.bytes
|
bytes (By) | Current Bytes Stored, Bytes Stored. The current number of bytes used by this server to store items. |
memcached.commands
|
{commands} | The number of commands executed. |
memcached.commands.rate
|
{commands}/second | Commands. The commands executed per second. |
memcached.connections.current
|
{connections} | The current number of open connections. |
memcached.connections.total
|
{connections} | The total number of connections opened since the server started running. |
memcached.cpu.usage
|
seconds (s) | CPU User Time, CPU System Time. The accumulated user and system time. |
memcached.current_items
|
{items} | Current Items in Cache, Active Connections. The number of items currently stored in the cache. |
memcached.evictions
|
{evictions} | Total Evictions. The average total number of cache item evictions. |
memcached.network
|
bytes (By) | Bytes transferred over the network. |
memcached.network.rate
|
bytes/second (By/s) | Network Traffic. The average number of bytes transferred over the network, per second. |
memcached.operation_hit_ratio
|
percentage (%) | Operation Hit Ratio. The hit ratio for operations, expressed as a percentage value between 0.0 and 100.0. |
memcached.operations
|
{operations} | Hits and Misses Total. The average total counts of hits and misses. |
memcached.operations.rate
|
{operations}/second | The average counts of hits and misses per second. |
memcached.threads
|
{threads} | The number of threads used by the Memcached instance. |
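memcached.operation_hit_ratio is expressed as a percentage between 0.0 and 100.0. Assuming it follows the usual definition (hits divided by the sum of hits and misses), you can reproduce it from hit and miss counts taken from memcached.operations, as in this minimal Python sketch. The function name is hypothetical, and whether the integration calculates the ratio exactly this way, or separately per operation type, is an assumption.

```python
from typing import Optional


def hit_ratio_percent(hits: int, misses: int) -> Optional[float]:
    """Hit ratio as a percentage between 0.0 and 100.0.

    Returns None when there were no lookups, instead of dividing by zero.
    """
    lookups = hits + misses
    if lookups == 0:
        return None
    return hits * 100.0 / lookups


print(hit_ratio_percent(hits=900, misses=100))  # 90.0
```

For a recent window rather than the lifetime of the server, pass the difference between two samples of each counter.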
NGINX metrics
Metric | Units | Description |
---|---|---|
nginx.connections
|
Connections |
The current number of nginx connections by state. |
nginx.connections_accepted
|
Connections | The total number of accepted client connections. |
nginx.connections_accepted.gauge
|
Connections | The accepted client connections (gauge). |
nginx.connections_accepted.rate
|
Connections per second | The number of accepted client connections per second. |
nginx.connections_current
|
Connections | The current number of nginx connections by state. |
nginx.connections_dropped
|
Connections | The total number of dropped client connections. |
nginx.connections_dropped.rate
|
Connections per second | The number of dropped client connections per second. |
nginx.connections_handled
|
Connections | The total number of handled connections. Generally, the parameter value is the same as nginx.connections_accepted unless some resource limits have been reached (for example, the worker_connections limit). |
nginx.connections_handled.gauge
|
Connections | The handled client connections (gauge). |
nginx.connections_handled.rate
|
Connections per second | The number of handled client connections per second. |
nginx.requests
|
Requests | The total number of requests made to the server since it started. |
nginx.requests.rate
|
Requests per second |
The number of requests per second. |
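The description of nginx.connections_handled implies a simple relationship between the cumulative counters: a connection that was accepted but never handled was dropped, typically because a resource limit such as worker_connections was reached. The minimal Python sketch below assumes that nginx.connections_dropped is derived as accepted minus handled. The function name is hypothetical, and the integration's exact derivation is not documented here.

```python
def dropped_connections(accepted: int, handled: int) -> int:
    """Connections accepted but never handled, cumulative since NGINX started.

    Both arguments must come from the same sample of the cumulative counters.
    """
    if handled > accepted:
        raise ValueError("handled cannot exceed accepted for the same sample")
    return accepted - handled


# Hypothetical values of nginx.connections_accepted and nginx.connections_handled.
print(dropped_connections(accepted=10_250, handled=10_245))  # 5
```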
Oracle DB metrics
Metric | Units | Description |
---|---|---|
oracledb.cpu_time
|
Seconds (s) |
The cumulative CPU time, in seconds. |
oracledb.dml_locks.limit
|
{locks} | The maximum limit of active Data Manipulation Language (DML) locks, -1 if unlimited. |
oracledb.dml_locks.usage
|
{locks} | The current count of active Data Manipulation Language (DML) locks. |
oracledb.enqueue_deadlocks
|
{deadlocks} | The total number of deadlocks between table or row locks in different sessions. |
oracledb.enqueue_locks.limit
|
{locks} | The maximum limit of active enqueue locks, -1 if unlimited. |
oracledb.enqueue_locks.usage
|
{locks} | The current count of active enqueue locks. |
oracledb.enqueue_resources.limit
|
{resources} | The maximum limit of active enqueue resources, -1 if unlimited. |
oracledb.enqueue_resources.usage
|
{resources} | The current count of active enqueue resources. |
oracledb.exchange_deadlocks
|
{deadlocks} | The number of times that a process detected a potential deadlock when exchanging two buffers and raised an internal, restartable error. Index scans are the only operations that perform exchanges. |
oracledb.executions
|
{executions} | The total number of calls (user and recursive) that executed SQL statements. |
oracledb.hard_parses
|
{parses} | The number of hard parses. |
oracledb.logical_reads
|
{reads} | The number of logical reads. |
oracledb.parse_calls
|
{parses} | The total number of parse calls. |
oracledb.pga_memory
|
bytes (By) | The Session Program Global Area (PGA) memory. |
oracledb.physical_reads
|
{reads} | The number of physical reads. |
oracledb.processes.limit
|
{processes} | The maximum limit of active processes, -1 if unlimited. |
oracledb.processes.usage
|
{processes} | The current count of active processes. |
oracledb.sessions.limit
|
{sessions} | The maximum limit of active sessions, -1 if unlimited. |
oracledb.sessions.usage
|
{sessions} | The current count of active sessions. |
oracledb.tablespace_size.limit
|
bytes (By) | The maximum size of tablespace in bytes, -1 if unlimited. |
oracledb.tablespace_size.usage
|
bytes (By) | The used tablespace in bytes. |
oracledb.transactions.limit
|
{transactions} | The maximum limit of active transactions, -1 if unlimited. |
oracledb.transactions.usage
|
{transactions} | The current count of active transactions. |
oracledb.user_commits
|
{commits} | The number of user commits. When a user commits a transaction, the redo generated that reflects the changes made to database blocks must be written to disk. Commits often represent the closest thing to a user transaction rate. |
oracledb.user_rollbacks
|
1 | The number of times users manually issue the ROLLBACK statement or an error occurs during a user's transactions. |
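The Oracle DB *.limit metrics report -1 when no limit is configured, so a utilization percentage, such as oracledb.sessions.usage against oracledb.sessions.limit, must handle that case explicitly. The following minimal Python sketch shows one way to do it; the helper name is hypothetical.

```python
from typing import Optional

UNLIMITED = -1  # value the Oracle DB *.limit metrics report when no limit is set


def limit_utilization_percent(usage: float, limit: float) -> Optional[float]:
    """Percentage of a configured Oracle limit currently in use.

    Returns None when the limit is unlimited (-1), because there is no ceiling
    to compare against, or when the limit is zero or otherwise not positive.
    """
    if limit == UNLIMITED or limit <= 0:
        return None
    return usage * 100.0 / limit


# Hypothetical values of oracledb.sessions.usage and oracledb.sessions.limit.
print(limit_utilization_percent(usage=120, limit=300))  # 40.0
print(limit_utilization_percent(usage=5, limit=-1))     # None
```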
RabbitMQ metrics
Metric | Units | Description |
---|---|---|
rabbitmq.message.current.sum
|
{messages} | Current Messages in Queues, Top 10 Queues by Depth. The total number of messages currently in the queues on RabbitMQ by queue name. |
rabbitmq_channels
|
{channels} | Open Channels. The number of channels currently open on RabbitMQ. |
rabbitmq_channel_messages_unacked
|
{messages} | Messages Unacknowledged. The average number of delivered but not yet acknowledged messages on RabbitMQ. |
rabbitmq_consumers
|
{consumers} | Queue Consumers. The number of currently connected consumers on RabbitMQ. |
rabbitmq_disk_space_available_bytes
|
Bytes | Free Disk Space. The average free disk space available on RabbitMQ. |
rabbitmq_erlang_processes_used
|
{processes} | Used Processes. The total number of Erlang processes used by RabbitMQ. |
rabbitmq.message.acknowledged.rate
|
{messages}/s | Messages Acknowledged per Second. The average number of messages acknowledged per second on RabbitMQ. |
rabbitmq.message.delivered.rate
|
{messages}/s | Messages Delivered per Second. The average number of messages delivered per second on RabbitMQ. |
rabbitmq.message.dropped.rate
|
{messages}/s | Messages Dropped per Second. The average number of messages dropped per second on RabbitMQ. |
rabbitmq.message.published.rate
|
{messages}/s | Messages Published per Second. The average number of messages published per second on RabbitMQ. |
rabbitmq_process_open_fds
|
{file descriptors} | Open File Descriptors. The average number of open file descriptors on RabbitMQ. |
rabbitmq_process_open_tcp_sockets
|
{sockets} | Open Sockets. The total number of open TCP sockets on RabbitMQ. |
rabbitmq_process_resident_memory_bytes
|
Bytes | Memory Consumed by Node. The memory used by the node on RabbitMQ. |
rabbitmq_queue_consumer_utilisation
|
| Consumer Utilization. The average proportion of time that the queues can deliver messages to consumers on RabbitMQ. |
rabbitmq_queue_process_memory_bytes
|
Bytes | Memory Consumed by Queues. The average memory used by the Erlang queue process on RabbitMQ. |
Redis metrics
Metric | Units | Description |
---|---|---|
redis.clients.blocked
|
| Blocked Clients, Clients. The number of clients pending on a blocking call. |
redis.clients.connected
|
| Redis Version, Clients. The number of client connections (excluding connections from replicas). |
redis.clients.max_input_buffer
|
| The biggest input buffer among current client connections. |
redis.clients.max_output_buffer
|
| The longest output list among current client connections. |
redis.commands
|
operations/s | Processed Commands per Second. The number of commands processed per second. |
redis.commands.processed
|
| Total Processed Commands. The total number of commands processed by the server. |
redis.connections.received
|
| Total Connections. The total number of connections accepted by the server. |
redis.connections.rejected
|
| Total Connections. The number of connections rejected because of the maxclients limit. |
redis.cpu.time
|
seconds (s) | Total CPU Time by State. The system CPU consumed by the Redis server in seconds since the server started. |
redis.db.avg_ttl
|
milliseconds (ms) | The average TTL of keyspace keys. |
redis.db.expires
|
| The number of keyspace keys with an expiration. |
redis.db.keys
|
| The number of keyspace keys. |
redis.keys.evicted
|
| Total Expired and Evicted Keys. The number of keys evicted due to the maxmemory limit. |
redis.keys.expired
|
| Total Expired and Evicted Keys. The total number of key expiration events. |
redis.keyspace.hits
|
| The number of successful lookups of keys in the main dictionary. |
redis.keyspace.misses
|
| The number of failed lookups of keys in the main dictionary. |
redis.latest_fork
|
microseconds (μs) | The duration of the latest fork operation in microseconds. |
redis.memory.fragmentation_ratio
|
| Fragmentation Ratio. The ratio between used_memory_rss and used_memory. |
redis.memory.lua
|
bytes (By) | Used Memory. The number of bytes used by the Lua engine. |
redis.memory.peak
|
bytes (By) | Peak memory consumed by Redis (in bytes). |
redis.memory.rss
|
bytes (By) | Used Memory. The number of bytes that Redis allocated as seen by the operating system. |
redis.memory.used
|
bytes (By) | Used Memory. The total number of bytes allocated by Redis using its allocator. |
redis.net.input
|
bytes (By) | The total number of bytes read from the network. |
redis.net.output
|
bytes (By) | Total Network Traffic. The total number of bytes written to the network. |
redis.rdb.changes_since_last_save
|
| Changes Since Last Save. The number of changes since the last dump. |
redis.replication.backlog_first_byte_offset
|
| The master offset of the replication backlog buffer. |
redis.replication.offset
|
| The server's current replication offset. |
redis.role
|
| Role. The Redis node's role. |
redis.slaves.connected
|
| Clients. The number of connected replicas. |
redis.uptime
|
seconds (s) | Uptime. The number of seconds since Redis server started. |
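redis.memory.fragmentation_ratio is the ratio of used_memory_rss (redis.memory.rss) to used_memory (redis.memory.used), and a keyspace hit ratio can be derived from redis.keyspace.hits and redis.keyspace.misses. The following minimal Python sketch computes both from sample values; the function names are hypothetical and the values are illustrative only.

```python
from typing import Optional


def fragmentation_ratio(memory_rss: float, memory_used: float) -> Optional[float]:
    """used_memory_rss / used_memory, from redis.memory.rss and redis.memory.used."""
    if memory_used <= 0:
        return None
    return memory_rss / memory_used


def keyspace_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Fraction of key lookups that found a key (0.0 to 1.0)."""
    lookups = hits + misses
    if lookups == 0:
        return None
    return hits / lookups


# Illustrative sample values.
print(fragmentation_ratio(memory_rss=150 * 2**20, memory_used=100 * 2**20))  # 1.5
print(keyspace_hit_ratio(hits=750, misses=250))                               # 0.75
```

Because the Redis keyspace hit and miss counters accumulate from server start, pass the differences between two samples to get a hit ratio for a recent interval.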
Snowflake metrics
Metric | Units | Description |
---|---|---|
sw.metrics.healthscore
|
Percent (%) |
Health. The health state provides real-time insight into the overall health and performance of your monitored entities. The health state is determined based on anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health state is displayed as one of the following four states and colors: Good, Moderate, Bad, or Unknown. You can determine the impact of the alerts, anomalies, and statuses on the health of an entity type by going to Settings > Health, and selecting a specific entity type. You can also customize the impact. To view the health of Snowflake entities in the Metrics Explorer, filter the |
snowflake.database.bytes_scanned.avg
|
Bytes (By) |
Average bytes scanned in a database over the last 24-hour period. |
snowflake.database.query.count
|
{queries} | Query Counts. Total query count for the database over the last 24-hour period. |
snowflake.query.blocked
|
{queries} | Blocked query count for the warehouse over the last 24-hour period. |
snowflake.query.bytes_deleted.avg
|
Bytes (By) | Query Bytes. Average bytes deleted in the database over the last 24-hour period. |
snowflake.query.bytes_written.avg
|
Bytes (By) | Query Bytes. Average bytes written by the database over the last 24-hour period. |
snowflake.query.compilation_time.avg
|
Seconds (s) | Query Times. Average time taken to compile a query over the last 24-hour period. |
snowflake.query.executed
|
{queries} | Executed query count for the warehouse over the last 24-hour period. |
snowflake.query.execution_time.avg
|
Seconds (s) | Query Times. Average time spent executing queries in the database over the last 24-hour period. |
snowflake.query.queued_overload
|
{queries} | Overloaded query count for the warehouse over the last 24-hour period. |
snowflake.query.queued_provision
|
{queries} | Number of compute resources queued for provisioning over the last 24-hour period. |
snowflake.queued_overload_time.avg
|
Seconds (s) | Queued Times. Average time spent in the warehouse queue due to the warehouse being overloaded over the last 24-hour period. |
snowflake.queued_provisioning_time.avg
|
Seconds (s) | Queued Times. Average time spent in the warehouse queue waiting for resources to provision over the last 24-hour period. |
snowflake.queued_repair_time.avg
|
Seconds (s) | Queued Times. Average time spent in the warehouse queue waiting for compute resources to be repaired over the last 24-hour period. |
snowflake.storage.stage_bytes.total
|
Bytes (By) | Storage Bytes. Number of bytes of stage storage used by files in all internal stages (named, table, user). |
snowflake.storage.storage_bytes.total
|
Bytes (By) | Storage Bytes. Number of bytes of table storage used, including bytes for data currently in Time Travel. |
snowflake.total_elapsed_time.avg
|
Seconds (s) | Total Elapsed Time. Average elapsed time over the last 24-hour period. |
Optional metrics
Metric | Units | Description |
---|---|---|
snowflake.billing.cloud_service.total
|
{credits} |
Reported total credits used in the cloud service over the last 24-hour period. |
snowflake.billing.total_credit.total
|
{credits} | Used Credits. Reported total credits used across the account over the last 24-hour period. |
snowflake.billing.virtual_warehouse.total
|
{credits} | Reported total credits used by the virtual warehouse service over the last 24-hour period. |
snowflake.billing.warehouse.cloud_service.total
|
{credits} | Credits used across the cloud service for the given warehouse over the last 24-hour period. |
snowflake.billing.warehouse.total_credit.total
|
{credits} | Total credits used by the given warehouse over the last 24-hour period. |
snowflake.billing.warehouse.virtual_warehouse.total
|
{credits} | Total credits used by the virtual warehouse service for the given warehouse over the last 24-hour period. |
snowflake.logins.total
|
{logins} | Total login attempts for the account over the last 24-hour period. |
snowflake.pipe.credits_used.total
|
{credits} | Total Snowpipe credits used over the last 24-hour period. |
snowflake.query.bytes_spilled.local.avg
|
Bytes (By) | Average bytes spilled to local storage because intermediate results did not fit in memory, over the last 24-hour period. |
snowflake.query.bytes_spilled.remote.avg
|
Bytes (By) | Average bytes spilled to remote storage because intermediate results did not fit in memory, over the last 24-hour period. |
snowflake.query.data_scanned_cache.avg
|
Percentage (%) | Average percentage of data scanned from cache over the last 24-hour period. |
snowflake.query.partitions_scanned.avg
|
{partitions} | Number of partitions scanned during the query so far over the last 24-hour period. |
snowflake.rows_deleted.avg
|
{rows} | Row Operations. Number of rows deleted from a table (or tables) over the last 24-hour period. |
snowflake.rows_inserted.avg
|
{rows} | Row Operations. Number of rows inserted into a table (or tables) over the last 24-hour period. |
snowflake.rows_produced.avg
|
{rows} | Row Operations. Average number of rows produced by the statement over the last 24-hour period. |
snowflake.rows_unloaded.avg
|
{rows} | Row Operations. Average number of rows unloaded during data export over the last 24-hour period. |
snowflake.rows_updated.avg
|
{rows} | Row Operations. Average number of rows updated in a table over the last 24-hour period. |
snowflake.session_id.count
|
{session ids} | The number of distinct session IDs associated with the Snowflake username over the last 24-hour period. |
snowflake.storage.failsafe_bytes.total
|
Bytes (By) | Number of bytes of data in Fail-safe. |
ZooKeeper metrics
Metric | Units | Description |
---|---|---|
zookeeper.connection.active
|
Connections |
The number of active clients connected to a ZooKeeper server. |
zookeeper.data_tree.ephemeral_node.count
|
Nodes | The number of ephemeral nodes that a ZooKeeper server has in its data tree. |
zookeeper.data_tree.size
|
Bytes | The size of data in bytes that a ZooKeeper server has in its data tree. |
zookeeper.file_descriptor.available
|
File descriptors | The number of file descriptors that a ZooKeeper server still has available. |
zookeeper.file_descriptor.limit
|
File descriptors | The maximum number of file descriptors that a ZooKeeper server can open. |
zookeeper.file_descriptor.open
|
File descriptors | The number of file descriptors that a ZooKeeper server has open. |
zookeeper.latency.max
|
milliseconds (ms) | The maximum time in milliseconds for requests to be processed. |
zookeeper.latency.min
|
milliseconds (ms) | The minimum time in milliseconds for requests to be processed. |
zookeeper.packet.count
|
Packets | The number of ZooKeeper packets received or sent by a server. |
zookeeper.packet.count.rate
|
Packets per second | The number of ZooKeeper packets received and sent by a server per second. |
zookeeper.request.active
|
Requests | The number of currently executing requests. |
zookeeper.watch.count
|
Watches | The number of watches placed on Z-Nodes on a ZooKeeper server. |
zookeeper.znode.count
|
Znodes | The number of Z-Nodes that a ZooKeeper server has in its data tree. |
Telegraf metrics
When you integrate with DNS Query, FluentD, HAProxy, NGINX Plus API, NTPq, PHP-FPM, or Varnish, the SolarWinds Observability Agent is used to send metrics to SolarWinds Observability SaaS. See Monitor with Telegraf.
DNS Query metrics
For a comprehensive list of metrics, see DNS Query Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
query_time_ms | Millisecond (ms) | The time it takes the query to run (in milliseconds). |
result_code | integer | The result code, as an integer: 0 (success), 1 (timeout), or 2 (error). See DNS Query Input Plugin. |
rcode_value | integer | Result code value. See DNS Query Input Plugin. |
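The Telegraf DNS Query input plugin documents result_code as 0 for success, 1 for timeout, and 2 for error. If you export query results and want readable labels, a lookup like the following minimal Python sketch works, assuming that mapping still matches the plugin version you run. The names are hypothetical.

```python
# Mapping documented for Telegraf's dns_query input plugin (verify for your version).
RESULT_CODES = {0: "success", 1: "timeout", 2: "error"}


def describe_result_code(code: int) -> str:
    """Readable label for a dns_query result_code value."""
    return RESULT_CODES.get(code, f"unknown ({code})")


print(describe_result_code(1))  # timeout
```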
FluentD metrics
For a comprehensive list of metrics, see Fluentd Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
fluentd_buffer_available_buffer_space_ratios | Percent (%) | Available Buffer Space. The percentage of remaining available buffer space. |
fluentd_buffer_queue_byte_size | Bytes (B) | Buffer Queue Bytes. The current size of queued buffer chunks (in bytes). |
fluentd_buffer_queue_length | | Buffer Queue Length. The length of the buffer queue. |
fluentd_buffer_stage_byte_size | Bytes (B) | Buffer Stage Bytes. The current size of staged buffer chunks (in bytes). |
fluentd_buffer_stage_length | | Buffer Stage Length. The length of staged buffer chunks. |
fluentd_buffer_total_queued_size | Bytes (B) | Buffer Queue Size. The size of the buffer queue. |
fluentd_emit_count | {emits} | Total Record Emit Count. The total number of emit calls. |
fluentd_emit_records | {records} | Total Emit Records. The total number of emitted records. |
fluentd_emit_size | Bytes (B) | Total Emit Size. The total size of emit events. |
fluentd_retry_count | {retries} | Retry Count. The number of retry attempts. |
fluentd_rollback_count | {count} | Total Rollback Count. The total number of rollbacks. Rollbacks happen when write/try_write fails. |
fluentd_slow_flush_count | {count} | Total Slow Flush Count. The total number of slow flushes. This count is incremented when a buffer flush takes longer than slow_flush_log_threshold. |
fluentd_write_count | {count} | The total number of writes. |
HAProxy metrics
For a comprehensive list of metrics, see HAProxy Input Plugin at GitHub and HAProxy documentation at docs.haproxy.org.
SolarWinds Observability SaaS expects that metrics return a number. Some HAProxy metrics, such as status, return strings, and thus are not supported.
Metric | Units | Description |
---|---|---|
haproxy_active_servers | {servers} | Active Servers. The number of currently active servers. |
haproxy_backup_servers | {servers} | Backup Servers. The number of available backup servers. |
haproxy_bin | bytes | Total In and Out Traffic. The cumulative total of incoming traffic. |
haproxy_bout | bytes | Total In and Out Traffic. The cumulative total of outgoing traffic. |
haproxy_dreq | {requests} | Total Denied Requests. The cumulative number of requests denied because of security concerns. |
haproxy_dcon | {requests} | Total Denied Requests. The cumulative number of requests denied by the 'tcp-request connection' rules. |
haproxy_dses | {requests} | Total Denied Requests. The cumulative number of requests denied by the 'tcp-request session' rules. |
haproxy_dresp | {responses} | Total Denied Responses. The cumulative number of responses denied because of security concerns. For HTTP, responses are denied because of a matched http-response rule or 'option checkcache'. |
haproxy_eresp | {responses} | Total Denied Responses. The cumulative number of response errors, such as srv_abrt, or write errors on the client socket, or failure applying filters to the response. |
haproxy_ereq | {errors} | Total Request Errors. The cumulative number of request errors, such as early termination from the client, a read error from the client, a client timeout, or the client closing the connection. |
haproxy_econ | {errors} | Total Request Errors. The cumulative number of request errors encountered when trying to connect to a backend server. The backend stat is the sum of the stat for all servers of that backend, plus any connection errors not associated with a particular server (such as the backend having no active servers). |
haproxy_scur | {sessions} | Current Sessions. The number of current sessions per proxy. |
haproxy_slim | {sessions} | Session Limit. The currently configured session limit. |
haproxy_stot | {sessions} | Total Sessions. The cumulative number of sessions. |
haproxy_req_rate | requests per second | Request Rate. HTTP requests per second over the last elapsed second. |
haproxy_rtime | Milliseconds (ms) | Response Time. The average response time over the last 1024 requests (0 for TCP). |
haproxy_req_tot | {requests} | Total Requests. The total number of received HTTP requests. |
haproxy_ctime | Milliseconds (ms) | Connection Time. The average connect time over the last 1024 responses. |
haproxy_qtime | Milliseconds (ms) | Queue Time. The average queue time over the last 1024 responses. |
haproxy_ttime | Milliseconds (ms) | Session Time. The average session time over the last 1024 responses. |
haproxy_http_response.2xx | {responses} | Total Responses 2xx. The total number of HTTP responses with the 2xx code. |
haproxy_http_response.3xx | {responses} | Total Responses 3xx. The total number of HTTP responses with the 3xx code. |
haproxy_http_response.4xx | {responses} | Total Responses 4xx. The total number of HTTP responses with the 4xx code. |
haproxy_http_response.5xx | {responses} | Total Responses 5xx. The total number of HTTP responses with the 5xx code. |
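Several HAProxy metrics above (for example haproxy_bin, haproxy_stot, haproxy_req_tot, and the haproxy_http_response.* counters) are cumulative, so trends are usually read from the difference between two samples. The following Python sketch shows one way to derive a per-second rate and a 5xx percentage from such counters. The `Sample` type, the function names, and the sample values are hypothetical, and the 5xx share is computed only against the 2xx to 5xx classes listed in this table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Sample:
    timestamp: float  # seconds when the counter was read
    value: float      # cumulative counter value, e.g. haproxy_req_tot


def per_second_rate(prev: Sample, curr: Sample) -> Optional[float]:
    """Average per-second rate of a cumulative counter between two samples."""
    elapsed = curr.timestamp - prev.timestamp
    delta = curr.value - prev.value
    if elapsed <= 0 or delta < 0:
        # Non-increasing time, or a counter that went down (for example after
        # an HAProxy restart): no meaningful rate for this interval.
        return None
    return delta / elapsed


def percent_5xx(deltas: Dict[str, float]) -> Optional[float]:
    """Share of 5xx responses among the 2xx-5xx responses counted in an interval."""
    total = sum(deltas.get(code, 0) for code in ("2xx", "3xx", "4xx", "5xx"))
    if total == 0:
        return None
    return 100.0 * deltas.get("5xx", 0) / total


print(per_second_rate(Sample(0, 1000), Sample(60, 4000)))  # 50.0
print(percent_5xx({"2xx": 90, "3xx": 5, "4xx": 3, "5xx": 2}))  # 2.0
```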
NGINX Plus API metrics
For a more comprehensive list of metrics, see Nginx Virtual Host Traffic (VTS) Input Plugin and Nginx Plus API Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
nginx_vts_connections | {connections} | The number of connections of individual types: active, reading, writing, waiting, accepted, handled, and requests. |
nginx_vts_server, nginx_vts_filter | | |
nginx_vts_upstream | | |
nginx_vts_cache | | |
NTPq metrics
For a comprehensive list of metrics, see NTPQ Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
ntpq_delay | Milliseconds (ms) | Round Trip Delay. Round trip communication delay to the remote peer or server. |
ntpq_jitter | Milliseconds (ms) | Jitter. Mean deviation (jitter) in the time reported for the remote peer or server (RMS or difference of multiple time samples). |
ntpq_offset | Milliseconds (ms) | Time Offsets. Mean offset (phase) in the times reported between this local host and the remote peer or server (RMS). |
ntpq_poll | Minutes (min) | Polling Frequency. RFC 5905 specifies a range for NTPv4 poll intervals from 4 (16 s) to 17 (about 36 h), expressed in log2 seconds. In practice, however, the value reported by ntpq is in seconds and falls within a much smaller range of 64 (2^6) to 1024 (2^10) seconds. |
ntpq_reach | Octal numbers | Reach. An 8-bit left-shift register value recording polls (bit set = successful, bit reset = failed), displayed in octal by default. The format can be changed to decimal, count, or ratio in the ntpq input section of telegraf.conf. |
ntpq_when | Minutes (min) | Last Poll. The time since the last poll. |
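The reach register and the log2-seconds poll exponent are easier to read with a quick conversion. The following Python sketch assumes the reach value arrives as its octal digits (for example 377), as in the default octal display, and that the newest poll result occupies the lowest bit. The function names are hypothetical helpers, not part of Telegraf or SolarWinds Observability SaaS.

```python
def reach_bits(reach) -> int:
    """Interpret the reach value's digits as octal and return the 8-bit register."""
    value = int(str(reach), 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must fit in 8 bits (0-377 octal)")
    return value


def successful_polls(reach) -> int:
    """Number of successful polls among the last eight."""
    return bin(reach_bits(reach)).count("1")


def last_poll_succeeded(reach) -> bool:
    """The newest poll result is shifted into the lowest bit."""
    return (reach_bits(reach) & 1) == 1


def poll_interval_seconds(exponent: int) -> int:
    """Convert an NTPv4 poll exponent (log2 seconds) to seconds."""
    return 2 ** exponent


print(successful_polls(377), last_poll_succeeded(377))  # 8 True
print(successful_polls("376"), last_poll_succeeded("376"))  # 7 False
print(poll_interval_seconds(6), poll_interval_seconds(10))  # 64 1024
```

For example, a reach of 377 means all of the last eight polls succeeded, while 376 means the most recent poll failed.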
PHP-FPM metrics
For a comprehensive list of metrics, see PHP-FPM Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
phpfpm_accepted_conn | Count | Total number of accepted connections. |
phpfpm_active_processes | Count | Number of active (busy) processes. |
phpfpm_idle_processes | Count | Number of idle (waiting) processes. |
phpfpm_listen_queue | Count | Number of requests in the queue of pending connections. |
phpfpm_listen_queue_len | Count | Size of the socket queue of pending connections. |
phpfpm_max_active_processes | Count | Maximum number of active processes since FPM has started. |
phpfpm_max_children_reached | Count | Number of times the process limit has been reached. |
phpfpm_max_listen_queue | Count | Maximum number of requests in the listen queue since FPM has started. |
phpfpm_slow_requests | Count | Number of requests that exceeded the request_slowlog_timeout value. |
phpfpm_start_since | Seconds | Time since FPM has started. |
phpfpm_total_processes | Count | Total number of processes. |
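The PHP-FPM process and queue metrics can be combined into a simple capacity check. The following Python sketch treats the pool as saturated when most processes are busy, requests are waiting in the listen queue, or phpfpm_max_children_reached increased between two readings. The `FpmSnapshot` type, the `looks_saturated` heuristic, and the 0.9 threshold are hypothetical illustrations, not SolarWinds alert defaults.

```python
from dataclasses import dataclass


@dataclass
class FpmSnapshot:
    active_processes: int
    total_processes: int
    listen_queue: int
    max_children_reached: int  # cumulative since FPM started


def busy_ratio(s: FpmSnapshot) -> float:
    """Fraction of pool processes that are currently busy."""
    if s.total_processes == 0:
        return 0.0
    return s.active_processes / s.total_processes


def looks_saturated(prev: FpmSnapshot, curr: FpmSnapshot, threshold: float = 0.9) -> bool:
    """Heuristic: pool near capacity, requests queuing, or the process limit was hit."""
    hit_limit = curr.max_children_reached > prev.max_children_reached
    return busy_ratio(curr) >= threshold or curr.listen_queue > 0 or hit_limit


before = FpmSnapshot(active_processes=3, total_processes=10, listen_queue=0, max_children_reached=1)
after = FpmSnapshot(active_processes=10, total_processes=10, listen_queue=4, max_children_reached=2)
print(busy_ratio(after), looks_saturated(before, after))  # 1.0 True
```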
Varnish metrics
For a comprehensive list of metrics, see Varnish Input Plugin at GitHub.
Metric | Units | Description |
---|---|---|
varnish_client_req | {requests} | Total Client Requests. The number of good client requests. |
varnish_s_req_bodybytes | bytes | Total Bytes. The total number of request body bytes received. |
varnish_s_req_hdrbytes | bytes | Total Bytes. The total number of request header bytes received. |
varnish_s_resp_bodybytes | bytes | Total Bytes. The total number of response body bytes transmitted. |
varnish_s_resp_hdrbytes | bytes | Total Bytes. The total number of response header bytes transmitted. |
varnish_sess_dropped | {sessions} | Total Failed and Dropped Sessions. The number of sessions dropped while waiting for a thread, that is, the number of times an HTTP/1 session was dropped because the queue was already too long. See the thread_queue_limit parameter. |
varnish_sess_fail | {sessions} | Total Failed and Dropped Sessions. The number of session accept failures, that is, failures to accept a TCP connection. This counter is the sum of the sess_fail_* counters, which give more detailed information. |
varnish_sess_closed | {operations} | Total Session Operations. The number of closed sessions. |
varnish_sess_herd | {operations} | Total Session Operations. The number of times the timeout_linger triggered. |
varnish_sess_readahead | {operations} | Total Session Operations. The number of read ahead sessions. |
varnish_sess_closed_err | {operations} | Total Session Operations. The number of sessions closed with errors. |
varnish_s_sess | {sessions} | Total Sessions. The total number of sessions that occurred. |
varnish_n_expired | {objects} | Total Number of Objects. The number of objects expired because of old age. |
varnish_n_lru_moved | {objects} | Total Number of Objects. The number of moved LRU objects (move operations done on the LRU list). |
varnish_n_lru_nuked | {objects} | Total Number of Objects. The number of objects that have been forcefully evicted from the storage to make room for a new object (LRU nuked objects). |
varnish_cache_miss | {count} | Total Cache Hits and Misses. The number of cache misses. A cache miss indicates that the object was fetched from the backend before delivering it to the client. |
varnish_cache_hit | {count} | Total Cache Hits and Misses. The number of cache hits. A cache hit indicates that the object was delivered to a client without fetching it from a backend server. |
varnish_backend_busy | {connections} | Total Backend Connections. The number of times Varnish encountered a situation where it considered the backend to be too busy to handle additional connections. |
varnish_backend_conn | {connections} | Total Backend Connections. The number of successful backend connections. |
varnish_backend_fail | {connections} | Total Backend Connections. The number of failed backend connections. |
varnish_backend_recycle | {connections} | Total Backend Connections. The number of recycled backend connections. |
varnish_backend_retry | {connections} | Total Backend Connections. The number of retried backend connections. |
varnish_backend_reuse | {connections} | Total Backend Connections. The number of reused backend connections. |
varnish_backend_unhealthy | {connections} | Total Backend Connections. The number of unhealthy backend connections. |
varnish_fetch_length, varnish_fetch_bad, varnish_fetch_eof, varnish_fetch_failed, varnish_fetch_head, varnish_fetch_chunked, varnish_fetch_1xx, varnish_fetch_204, varnish_fetch_304, varnish_fetch_none, varnish_fetch_no_thread | {fetches} | Total HTTP Request Fetches. The number of request fetches, broken down by fetch type. |
varnish_shm_cont | {operations} | Total Shared Memory Operations. The number of contention operations (when multiple threads compete for access to SHM resources). |
varnish_shm_cycles | {operations} | Total Shared Memory Operations. The number of times data cycles through the shared memory. |
varnish_shm_flushes | {operations} | Total Shared Memory Operations. The number of flush operations. |
varnish_shm_records | {operations} | Total Shared Memory Operations. The number of record operations. |
varnish_shm_writes | {operations} | Total Shared Memory Operations. The number of write operations. |
varnish_thread_queue_len | {count} | Total Session Queue Length. The length of the session queue waiting for threads. |
varnish_threads | {workers} | Total Workers. The number of threads in all pools. |
varnish_sess_queued | {sessions} | Total Queued Sessions. Sessions queued for thread. The number of times a session was queued waiting for a thread. |
varnish_threads_created | {threads} | Total Worker Threads. The total number of threads created in all pools. |
varnish_threads_destroyed | {threads} | Total Worker Threads. The total number of threads destroyed in all pools. |
varnish_threads_failed | {threads} | Total Worker Threads. The number of times creating a thread failed. |
varnish_threads_limited | {threads} | Total Worker Threads. The number of times more threads were needed but the limit was reached in a thread pool. |
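Because varnish_cache_hit and varnish_cache_miss are cumulative counters, a cache hit ratio is usually computed from the change in both counters over an interval. The following Python sketch uses only these two counters from the table above; the function name and sample values are hypothetical, and treating a decreasing counter as a restart is an assumption.

```python
from typing import Optional


def interval_hit_ratio(prev_hit: int, prev_miss: int, curr_hit: int, curr_miss: int) -> Optional[float]:
    """Cache hit ratio over the interval between two readings of the cumulative counters."""
    hits = curr_hit - prev_hit
    misses = curr_miss - prev_miss
    if hits < 0 or misses < 0:
        # A counter went down, most likely because varnishd restarted.
        return None
    lookups = hits + misses
    if lookups == 0:
        return None
    return hits / lookups


print(interval_hit_ratio(900, 100, 1800, 200))  # 0.9
```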