Documentation for SolarWinds Observability SaaS

Managed Apache Flink metrics

Amazon Managed Apache Flink enables you to author and run code against streaming sources to perform time-series analytics, feed real-time dashboards, and create real-time metrics. Ensure your cloud platform is configured in SolarWinds Observability SaaS to collect this service's data. See Add an AWS cloud account. CloudWatch metrics must also be enabled for this service in the AWS Console for the metric data to be available.

Many of the metrics collected from Managed Apache Flink entities are displayed as widgets in SolarWinds Observability explorers; additional metrics may be collected and available in the Metrics Explorer. You can also create an alert that triggers when an entity's metric value moves out of a specific range. See Entities in SolarWinds Observability SaaS for information about entity types.
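As a rough illustration of the out-of-range condition such an alert evaluates, the following Python sketch checks sample values against a lower and upper bound. It is not the SolarWinds alerting API: the function name `out_of_range`, the sample values, and the 0-80 percent bounds are hypothetical and chosen only for the example.

```python
def out_of_range(value: float, low: float, high: float) -> bool:
    """Return True when value falls outside the inclusive range [low, high]."""
    return value < low or value > high


# Hypothetical CPU utilization samples (percent) checked against an assumed 0-80 range.
samples = [42.0, 79.9, 80.0, 93.5]
breaches = [s for s in samples if out_of_range(s, 0.0, 80.0)]
print(breaches)
```

Only the last sample lies outside the assumed range, so it is the only value reported as a breach.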

The following table lists some of the metrics collected for these entities. To see the Managed Apache Flink metrics in the Metrics Explorer, type AWS.KinesisAnalytics in the search box.

Metric Units Description
sw.metrics.healthscore Percent (%)

Health score. A health score provides real-time insight into the overall health and performance of your monitored entities. It is calculated from anomalies detected for the entity, alerts triggered for the entity's metrics, and the status of the entity. The health score is displayed as a single numerical value from 0 to 100 and is classified as Good (70-100), Moderate (40-69), or Bad (0-39).

To view the health score for Managed Apache Flink entities in the Metrics Explorer, filter the sw.metrics.healthscore metric by entity_types and select awsapacheinstance.

AWS.KinesisAnalytics.uptime milliseconds (ms) The time that the job has been running without interruption.
AWS.KinesisAnalytics.lastCheckpointSize bytes The total size of the last checkpoint.
AWS.KinesisAnalytics.lastCheckpointDuration milliseconds (ms) The time it took to complete the last checkpoint.
AWS.KinesisAnalytics.cpuUtilization Percent (%) Overall percentage of CPU utilization across task managers.
AWS.KinesisAnalytics.containerCPUUtilization Percent (%) Overall percentage of CPU utilization across task manager containers in the Flink application cluster.
AWS.KinesisAnalytics.containerMemoryUtilization Percent (%) Overall percentage of memory utilization across task manager containers in the Flink application cluster.
AWS.KinesisAnalytics.containerDiskUtilization Percent (%) Overall percentage of disk utilization across task manager containers in the Flink application cluster.
AWS.KinesisAnalytics.heapMemoryUtilization Percent (%) Overall heap memory utilization across task managers.
AWS.KinesisAnalytics.downtime milliseconds (ms) For jobs currently in a failing or recovering state, the time elapsed during this outage.
AWS.KinesisAnalytics.fullRestarts Count The total number of times this job has fully restarted since it was submitted.
AWS.KinesisAnalytics.managedMemoryUtilization Percent (%) Derived from managedMemoryUsed/managedMemoryTotal.
AWS.KinesisAnalytics.numRecordsInPerSecond Count per second The total number of records this application, operator, or task has received per second.
AWS.KinesisAnalytics.numRecordsOutPerSecond Count per second The total number of records this application, operator, or task has emitted per second.
AWS.KinesisAnalytics.threadcount Count The total number of live threads used by the application.
AWS.KinesisAnalytics.backPressuredTimeMsPerSecond milliseconds (ms) The time this task or operator is back pressured per second.
AWS.KinesisAnalytics.busyTimeMsPerSecond milliseconds (ms) The time this task or operator is busy (neither idle nor back pressured) per second.
AWS.KinesisAnalytics.currentInputWatermark milliseconds (ms) The last watermark this application, operator, task, or thread has received.
AWS.KinesisAnalytics.currentOutputWatermark milliseconds (ms) The last watermark this application, operator, task, or thread has emitted.
AWS.KinesisAnalytics.idleTimeMsPerSecond milliseconds (ms) The time this task or operator is idle per second.
AWS.KinesisAnalytics.managedMemoryUsed bytes The amount of managed memory currently used.
AWS.KinesisAnalytics.managedMemoryTotal bytes The total amount of managed memory.
AWS.KinesisAnalytics.numberOfFailedCheckpoints Count The number of times checkpointing has failed.
AWS.KinesisAnalytics.numRecordsIn Count The total number of records this application, operator, or task has received.
AWS.KinesisAnalytics.numRecordsOut Count The total number of records this application, operator, or task has emitted.
AWS.KinesisAnalytics.numLateRecordsDropped Count The number of records that were dropped because they arrived late and were beyond the processing window.
AWS.KinesisAnalytics.oldGenerationGCcount Count The number of times the old generation garbage collection has occurred.
AWS.KinesisAnalytics.oldGenerationGCTime milliseconds (ms) The total time spent on old generation garbage collection.
AWS.KinesisAnalytics.millisBehindLatest milliseconds (ms) Indicates how many milliseconds behind the latest data the application is.
AWS.KinesisAnalytics.bytesRequestedPerFetch bytes The number of bytes requested per fetch operation from the data stream.
AWS.KinesisAnalytics.currentoffsets Count The current offsets of the data being processed in a Kinesis Data Analytics application.
AWS.KinesisAnalytics.commitsFailed Count The number of failed commit attempts in the application.
AWS.KinesisAnalytics.commitsSucceeded Count The number of successful commit operations.
AWS.KinesisAnalytics.committedoffsets Count The number of offsets that have been successfully committed.
AWS.KinesisAnalytics.records_lag_max Count The maximum lag, in number of records, of the data being processed.
AWS.KinesisAnalytics.bytes_consumed_rate bytes The rate at which data is consumed from the Kinesis stream.
AWS.KinesisAnalytics.zeppelinCpuUtilization Percent (%) The percentage of CPU resources being used by the Zeppelin server.
AWS.KinesisAnalytics.zeppelinHeapMemoryUtilization Percent (%) The percentage of heap memory utilized by the Zeppelin server.
AWS.KinesisAnalytics.zeppelinThreadcount Count The number of active threads being used by the Zeppelin server.
AWS.KinesisAnalytics.zeppelinWaitingJobs Count The number of jobs waiting to be executed in the Zeppelin server.
AWS.KinesisAnalytics.zeppelinServerUptime seconds (s) The uptime of the Zeppelin server, indicating how long it has been running continuously.
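
The health score bands and the derived and per-second metrics in the table can be interpreted with a few lines of arithmetic. The Python sketch below is illustrative only: the helper names (`health_band`, `managed_memory_utilization`, `share_of_second`) are hypothetical, the input values are made up rather than read from SolarWinds Observability or CloudWatch, and assigning fractional scores such as 69.5 to the lower band is an assumption.

```python
def health_band(score: float) -> str:
    """Map a 0-100 health score to the Good/Moderate/Bad bands listed above.

    Assumption: a fractional score between bands (e.g. 69.5) falls in the lower band.
    """
    if not 0 <= score <= 100:
        raise ValueError("health score must be between 0 and 100")
    if score >= 70:
        return "Good"
    if score >= 40:
        return "Moderate"
    return "Bad"


def managed_memory_utilization(used_bytes: float, total_bytes: float) -> float:
    """Compute managedMemoryUsed / managedMemoryTotal as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total managed memory must be positive")
    return used_bytes / total_bytes * 100.0


def share_of_second(ms_per_second: float) -> float:
    """Convert a ...TimeMsPerSecond value to a percentage of each second."""
    return ms_per_second / 1000.0 * 100.0


print(health_band(85))                                      # Good
print(managed_memory_utilization(256 * 1024**2, 1024**3))   # 25.0
print(share_of_second(250))                                 # 25.0
```

For example, a busyTimeMsPerSecond value of 250 means the task or operator was busy for about a quarter of each second.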