Documentation for SolarWinds Observability

Kubernetes monitoring

Included in Infrastructure monitoring is the ability to monitor Kubernetes (K8s) clusters. If you use K8s clusters to distribute resources across multiple nodes, you can use SolarWinds Observability to monitor the resource usage, responsiveness, and error rate of your cluster and its nodes.

Monitor a K8s cluster

To monitor a K8s cluster, install a SWO K8s Collector on your cluster. The collector gathers Prometheus-compatible metrics, events, and logs and sends them to SolarWinds Observability. For detailed instructions, see Add a Kubernetes cluster; for advanced configuration options, see values.yaml in the SWO K8s Collector GitHub repository.

The SWO K8s Collector is deployed with Helm, an industry-standard package manager for Kubernetes. Helm simplifies deploying, upgrading, and changing the collector in your K8s cluster. To get the latest enhancements and avoid an incomplete view of the data in your dashboards, keep your configuration up to date with the latest version of the SWO K8s Collector. See Upgrade Kubernetes monitoring whenever updates are available.

Review K8s entity details

A Kubernetes cluster entity is created for each monitored cluster, along with a child Kubernetes node entity for each node in the cluster. High-level summaries of the Kubernetes cluster and node entities are available in widgets in the Entity Explorer and Infrastructure Overview, as are widgets containing the metrics gathered for each entity. In addition to general performance metrics for the cluster itself, each Kubernetes cluster entity monitors the following items as individual entities: Namespaces, Pods, Deployments, StatefulSets, DaemonSets, Jobs, Services, Persistent volumes, and Persistent volume claims. To view details about a monitored element, click the tab for its element type and click the element in the list. These metrics are also available in the Metrics Explorer.

Kubernetes annotations with a custom swo prefix can be used to display workload metadata on inspector panels and in the Kubernetes workload detail views of the Entity Explorer. For more information, see Annotations in the Kubernetes documentation.

The swo annotations use the following formats:

links (swo.cloud.solarwinds.com/link.{link-name}: "{link-source}")

values (swo.cloud.solarwinds.com/value.{value-name}: "{value-value}")

Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  annotations:
    swo.cloud.solarwinds.com/value.team: "example"
    swo.cloud.solarwinds.com/link.github-repo: "https://github.com/example"
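The two annotation formats above can also be generated programmatically, for example when templating manifests. The following is a minimal sketch, not part of any SolarWinds tooling: the function name swo_annotations and its parameters are hypothetical, and only the key formats come from this page.

```python
# Prefix and key formats are taken from the annotation formats described above.
SWO_PREFIX = "swo.cloud.solarwinds.com"


def swo_annotations(values=None, links=None):
    """Build swo-prefixed annotation entries from plain name/value mappings.

    Hypothetical helper: `values` maps value names to values and `links`
    maps link names to link sources (URLs).
    """
    annotations = {}
    for name, value in (values or {}).items():
        annotations[f"{SWO_PREFIX}/value.{name}"] = str(value)
    for name, source in (links or {}).items():
        annotations[f"{SWO_PREFIX}/link.{name}"] = str(source)
    return annotations


# Reproduces the two annotations from the example manifest.
example = swo_annotations(
    values={"team": "example"},
    links={"github-repo": "https://github.com/example"},
)
```

The resulting dictionary can be placed under metadata.annotations of a workload manifest.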

Topology tab

The Topology tab on the Kubernetes Cluster details page of the Entity Explorer displays a topological graph illustrating the communication links between all workloads and services within the cluster.

  • Nodes are grouped by namespace by default. To expand or collapse a group, double-click it.

  • A context menu is displayed when you right-click any node, providing the following options:

    • If the node is associated with a Kubernetes workload, an option to navigate to its details is provided.

    • If the node has an associated integration, DEM probe, APM service, or database, an option to navigate to the details is provided.

    • If an integration on the node is discovered, you are provided an option to add the integration.

    • If a database on the node is discovered, you are provided an option to add the database.

    • You can hide nodes. This helps clean up the topology graph.

  • Arrows on the connections indicate the direction of communication. If no arrow is present, the direction could not be identified.

  • The color of a connection indicates the health of the communication, based on the HTTP success rate percentage, according to the following rules:

    • Green: 100% success rate.

    • Yellow: Success rate of at least 70% but below 100%.

    • Red: Success rate below 70%.

    • Gray: The success rate could not be calculated, or the communication is not HTTP-based.

  • The colors on the nodes indicate the status of the related workload or the associated integration, DEM probe, APM service, or database.

  • Clicking a connection displays an inspector panel with detailed metrics of the connection.

  • Clicking a node displays an inspector panel with detailed information about the related workload or associated integration, DEM probe, APM service, or database.

  • You can temporarily rearrange the nodes using drag and drop. Custom arrangements are not displayed after refreshes.

  • With filter panels, you can filter nodes and connections via the following options:

    • Type - Shows only the nodes and connections that communicate using the selected types.

    • Scope - Filters nodes and their connections based on whether they run within the Kubernetes cluster.

      • Internal - The node is associated with a workload that runs within the cluster.

      • External - The node is not associated with any workload that runs within the cluster.

    • Namespace - Filters nodes and their connections that are related to a specific Kubernetes namespace.

    • Hide node - Applies filters using the Hide node option in the node context menu.
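The connection color rules listed above can be expressed as a small function. This is an illustrative sketch only: the function name and parameters are hypothetical, and treating exactly 70% as yellow is an assumption inferred from "Red: Success rate below 70%".

```python
def connection_color(success_rate, is_http=True):
    """Map an HTTP success rate (percentage, 0-100) to a connection color.

    Hypothetical helper. `success_rate` is None when the rate could not
    be calculated; `is_http` is False for non-HTTP communication.
    """
    if not is_http or success_rate is None:
        return "gray"
    if success_rate >= 100.0:
        return "green"
    if success_rate >= 70.0:  # assumption: exactly 70% counts as yellow
        return "yellow"
    return "red"
```

For example, a connection with a 99.9% success rate is shown in yellow under these rules, and any non-HTTP connection is gray regardless of the rate passed in.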

The Topology tab on the Kubernetes Deployment, StatefulSet, and DaemonSet details pages of the Entity Explorer displays a topological graph illustrating the communication links between the chosen workload and other workloads and services within the cluster.

The topology is computed using OTel metrics or Istio metrics, depending on which are available. If both are available, the displayed topology is computed from Istio metrics by default. You can switch to OTel metrics in the top-right corner of the topological graph.

  • The collection of OTel metrics is not activated by default and must be enabled in the Helm values.yaml using the setting ebpfNetworkMonitoring.enabled: true.
  • The collection of Istio metrics is activated by default. Istio must be installed and enabled on the cluster.
  • Fargate nodes are not supported for OTel metrics or for Istio metrics.
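The choice of metric source described above amounts to a simple preference order. The sketch below only illustrates that rule; the function name and its boolean inputs are hypothetical and do not reflect how the product detects which metrics are present.

```python
def default_topology_source(istio_available, otel_available):
    """Return the metric source the topology is computed from by default.

    Hypothetical helper: Istio metrics take precedence when both sources
    are available; None means no topology can be computed.
    """
    if istio_available:
        return "istio"
    if otel_available:
        return "otel"
    return None
```

When both sources are available, the function returns "istio", which matches the default view before you switch sources in the top-right corner of the graph.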

Events tab

The Events tab of the Kubernetes cluster details view of the Entity Explorer lists events collected from your cluster during the specified time period. To create alerts for Kubernetes events, hover over the event you want to create an alert for, click the vertical ellipsis on the right, and select Create alert. To find a specific event, do any of the following:

  • To change the time period, click the drop-down menu in the upper-right corner and choose how much historical data to display. You can choose to view all recent data for standard lengths of time, or to view data between two dates.

    To show data from a custom time period, choose Custom. In the calendar that appears, click the starting date of your time period and click the ending date. The time period's start and end times default to the current time. To change a start or end time, click the clock next to the time you want to change and click the desired time.

  • Enter text in the search field to find an event based on the message text.

  • Click Filter to open the Filters pane, and select one or more filters to specify what events to include.

Deployment tab Version column

The Version column on the Deployment tab of the Kubernetes Cluster details page displays the tag of the Docker image used by the container that has the same name as its Kubernetes Deployment. This lets you identify the version of the deployed application without inspecting the container or image details.

In the following scenario, the Version column for the Autopilot deployment displays dev-123, indicating that the deployment is currently using the dev-123 tag of the autopilot Docker image.

  • Deployment Name: Autopilot

  • Container Name: Autopilot

  • Docker Image: autopilot:dev-123

This gives you a straightforward way to verify the versions of your deployed applications across your environments, which simplifies version tracking where rapid iterations and deployments take place.
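The matching rule behind the Version column can be sketched as follows. This is an assumption-laden illustration, not the product's implementation: the function names are hypothetical, the name comparison is assumed to be an exact match, and the tag parsing follows the common image reference layout of an optional registry, a repository, an optional tag, and an optional digest.

```python
def image_tag(image):
    """Return the tag of a container image reference, or None if it has none."""
    image = image.split("@", 1)[0]            # drop a trailing digest, if any
    last_segment = image.rsplit("/", 1)[-1]   # ignore a registry port such as host:5000
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


def deployment_version(deployment_name, containers):
    """Return the image tag of the container named like the deployment.

    Hypothetical helper: `containers` is a list of (container_name, image)
    pairs. Returns None if no container shares the deployment's name.
    """
    for name, image in containers:
        if name == deployment_name:  # assumption: exact name match
            return image_tag(image)
    return None


# The scenario from this section: deployment and container both named Autopilot.
version = deployment_version("Autopilot", [("Autopilot", "autopilot:dev-123")])
```

A deployment whose containers all have different names (for example, only a sidecar) yields None in this sketch.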

Review K8s logs and events

By default, some logs and events from the K8s cluster are collected. To change which K8s logs are sent to SolarWinds Observability, see Update Kubernetes cluster logs collection. Events and logs can also be found in the Logs Explorer. To search for your specific event or container log, use the following key:value pairs as search parameters:

Key                   Valid values
sw.k8s.log.type       String representing the type of log. Valid values include: event, container, journal.
k8s.cluster.name      String representing the cluster name.
k8s.container.name    String representing the container name.
k8s.node.name         String representing the node name.
k8s.pod.name          String matching the pod name.
k8s.namespace.name    String matching the namespace name.
k8s.event.count       For Kubernetes events, the count of how many of the same event occurred.
k8s.event.name        For Kubernetes events, the name of the event.
k8s.event.reason      For Kubernetes events, the reason for the event.
severity_text         String representing the severity of the event/log. Valid values include: Error, Warning, Normal.

Example Kubernetes search in the Logs Explorer:

sw.k8s.log.type:container AND k8s.cluster.name:xyz AND k8s.pod.name:abc "some text to search"
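If you compose such searches often, the query string can be assembled from key:value pairs. The helper below is a hypothetical sketch that only mirrors the syntax of the example above (pairs joined with AND, followed by a quoted phrase). It does not escape special characters, and it assumes that values contain no spaces.

```python
def build_logs_query(filters, text=None):
    """Join key:value pairs with AND and append an optional quoted phrase.

    Hypothetical helper: `filters` maps search keys to values, and `text`
    is free text to search for in the message.
    """
    parts = [f"{key}:{value}" for key, value in filters.items()]
    query = " AND ".join(parts)
    if text:
        phrase = f'"{text}"'
        query = f"{query} {phrase}" if query else phrase
    return query


# Rebuilds the example search shown above.
query = build_logs_query(
    {
        "sw.k8s.log.type": "container",
        "k8s.cluster.name": "xyz",
        "k8s.pod.name": "abc",
    },
    text="some text to search",
)
```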

Create Kubernetes alerts from alert templates

You can use pre-filled alert templates to create alerts related to your Kubernetes clusters and nodes.

The following Kubernetes alert templates are available:

  • Kubernetes container out of memory: This alert is triggered when a Kubernetes container is terminated because it ran out of memory.

  • Kubernetes node disk pressure: This alert is triggered when the Kubernetes node has an active disk pressure status condition.

  • Kubernetes node memory pressure: This alert is triggered when the Kubernetes node has an active memory pressure status condition.

  • Kubernetes node network unavailable: This alert is triggered when the network for a Kubernetes node is not configured correctly.

  • Kubernetes node PID pressure: This alert is triggered when the Kubernetes node has an active PID pressure status condition.
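The conditions named in these templates correspond to standard Kubernetes status fields: the node condition types DiskPressure, MemoryPressure, PIDPressure, and NetworkUnavailable, and the container termination reason OOMKilled. The sketch below illustrates that correspondence only. It is not how SolarWinds Observability evaluates alerts, and the function names and input shape are assumptions.

```python
# Kubernetes node condition types mapped to the alert template names above.
NODE_CONDITION_TEMPLATES = {
    "DiskPressure": "Kubernetes node disk pressure",
    "MemoryPressure": "Kubernetes node memory pressure",
    "NetworkUnavailable": "Kubernetes node network unavailable",
    "PIDPressure": "Kubernetes node PID pressure",
}


def matching_node_templates(conditions):
    """Return template names whose node condition is currently active.

    Hypothetical helper: `conditions` is a list of dicts shaped like the
    entries of a node's status.conditions, with "type" and "status" keys.
    A condition is active when its status is the string "True".
    """
    return [
        NODE_CONDITION_TEMPLATES[condition["type"]]
        for condition in conditions
        if condition.get("status") == "True"
        and condition.get("type") in NODE_CONDITION_TEMPLATES
    ]


def container_out_of_memory(terminated_reason):
    """Return True if a container's termination reason indicates an OOM kill."""
    return terminated_reason == "OOMKilled"
```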

You can create Kubernetes alerts from alert templates through the Alert Settings page in SolarWinds Observability. You can also manage your Kubernetes-specific alerts on the Alerts tab of Kubernetes Cluster details or Kubernetes Node details pages of the Entity Explorer.

Kubernetes widgets

In addition to standard visualizations of metric data, the following widgets display detailed insights into your Kubernetes containers, clusters, and nodes.

Clusters per Infrastructure Provider

The Clusters per Infrastructure Provider widget shows the number of Kubernetes clusters for each Infrastructure Provider. Infrastructure Provider categories include Azure, AWS, and Other.

Events

The Events widget shows the number and type of events collected for your cluster.

Top 5 Clusters (Alerts)

The Top 5 Clusters (Alerts) widget shows the top five Kubernetes clusters with the greatest number of monitored alerts.

Top 5 Clusters (Nodes)

The Top 5 Clusters (Nodes) widget shows the top five Kubernetes clusters with the greatest number of monitored nodes.

Top 5 Clusters (Events)

The Top 5 Clusters (Warning Events) widget shows the top five Kubernetes clusters with the greatest number of warning events.

Top 5 Clusters (CPU Utilization)

The Top 5 Clusters (CPU Utilization) widget shows the top five Kubernetes clusters with the greatest percentage of CPU utilization.

Top 5 Clusters (Memory Utilization)

The Top 5 Clusters (Memory Utilization) widget shows the top five Kubernetes clusters with the greatest percentage of memory utilization.
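Each Top 5 widget is a ranking of clusters by a single measure. As a rough illustration of that ranking, the following sketch sorts sample data in descending order and keeps the first five entries. The function name, field names, and figures are hypothetical and are not taken from the product.

```python
def top_clusters(clusters, metric, n=5):
    """Return the n clusters with the highest value for `metric`.

    Hypothetical helper: `clusters` is a list of dicts that each contain
    a "name" key and the numeric field named by `metric`.
    """
    return sorted(clusters, key=lambda cluster: cluster[metric], reverse=True)[:n]


# Invented sample data for illustration only.
sample = [
    {"name": "alpha", "cpu_percent": 41.0},
    {"name": "bravo", "cpu_percent": 87.5},
    {"name": "charlie", "cpu_percent": 12.3},
    {"name": "delta", "cpu_percent": 66.0},
    {"name": "echo", "cpu_percent": 93.1},
    {"name": "foxtrot", "cpu_percent": 54.2},
]
top_by_cpu = [cluster["name"] for cluster in top_clusters(sample, "cpu_percent")]
```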

Monitored element summary widgets

Widgets showing status summaries for Pods, Deployments, DaemonSets, and StatefulSets show the number of each respective monitored element in the cluster, categorized by the status of the elements.

Details

On the Overview tab, the Details widget shows the cluster name, universally unique identifier (UID), infrastructure provider, and collector status. For Kubernetes nodes, the Details widget includes the cluster name, pods, capacity details, and more.