Documentation for SolarWinds Platform Self-Hosted

Troubleshoot container monitoring

This topic applies only to the following products:

SolarWinds Observability Self-Hosted

SAM

VMAN

This topic provides tips to resolve issues you may encounter when using the Container Monitoring feature. You can also search the Success Center or THWACK.

Before proceeding, note these details:

  • (Recommended) Review Container monitoring requirements.
  • SolarWinds documentation describes how to display container data in the SolarWinds Platform Web Console. To manipulate containers directly, refer to third-party vendor documentation. For example, to learn about swarm mode, see Docker docs (© 2021 Docker, Inc., available at docs.docker.com, obtained on June 14, 2021.)

Starting in Orion Platform 2020.2.6, container monitoring uses SolarWinds Tokens. Edit any container services added in earlier versions to use SolarWinds Tokens. Otherwise, polling stops.


Issues

Cannot add a container service

Containers are not supported in High Availability (HA) or FIPS-enabled environments. If containers were added before HA or FIPS was enabled, remove them from nodes and delete container services. Otherwise, container polling continues.

Resolve orion-aggregator log messages

If messages similar to "Node with IP: {IP_address} is not added to Orion. Data is not polled" appear in the logs, verify that the Linux server that hosts the containers exists as a managed Orion node with ICMP configured as the Polling Method. To add a node, see Add a single node for monitoring to the SolarWinds Platform. See also Locate logs.

Container data does not appear in the SolarWinds Platform Web Console. Container service status appears as Unknown and Last Seen time does not update

In Orion Platform 2020.2.6 or later, polling stops until you edit container services added in earlier versions to use SolarWinds Tokens.

You can also review the container monitoring port requirements to rule out firewall issues.
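
To rule out a blocked port from the container host, a quick TCP connection test can help. The sketch below defines a hypothetical helper, port_open, which is not a SolarWinds tool. It assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available on the host; the host name and port in the usage comment are placeholders, so substitute the values from the port requirements for your deployment.

```shell
# Hypothetical helper: succeed (exit 0) if a TCP connection to HOST:PORT
# opens within SECONDS (default 5); fail (non-zero) otherwise.
port_open() {
  local host="$1" port="$2" secs="${3:-5}"
  # /dev/tcp/HOST/PORT is a bash feature, so run the probe in bash even if
  # the calling shell is not bash. The connection closes when bash exits.
  timeout "$secs" bash -c '</dev/tcp/"$1"/"$2"' probe "$host" "$port" 2>/dev/null
}

# Example (placeholder host and port):
# if port_open orion.example.com 443; then echo open; else echo "blocked or closed"; fi
```

A failed check does not say whether a firewall, a routing problem, or a stopped service is responsible; it only narrows down where to look.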

What happened to my container data?

By default, the SolarWinds Platform clears all data, including images, for containers that have been reported as deleted for more than 7 days.

Containers switch to Unknown status, or inactive containers cause AppStack and PerfStack errors

Verify the credentials for the container service and edit them if necessary. If you upgraded to Orion Platform 2020.2.6 or later, edit existing container services to use SolarWinds Tokens.

See Manage container services to learn about the following types of containers that are deployed to container environments:

  • Orion Monitor containers track status and metrics for each node in a cluster.
  • Orion Aggregator containers on orchestrator master nodes collect data from Orion Monitor containers in the cluster and report status to the SolarWinds Platform server every five minutes.

Containers are missing after an upgrade

If you recently upgraded to Orion Platform 2020.2.6 or later, edit container services added in earlier versions to use SolarWinds Tokens.

In earlier versions, you can delete the related container service and then add it to the SolarWinds Platform again to refresh the YAML files. The Orion Monitor and Orion Aggregator containers will detect containers during the next polling cycle.

Choose a polling engine for container monitoring in Orion Platform 2020.2.5 and earlier

Upgrade to Orion Platform 2020.2.6 or later to specify polling engines when adding container services.

As a workaround in earlier versions, change the Orion URL property in the script that runs on the host machine when you add a container service. By default, that value is set to the IP address of the SolarWinds Platform server, which acts as the Main Polling Engine, but you can point it to an Additional Polling Engine (APE) instead.

Note that the exact property name used in scripts varies:

  • Docker Compose file: ORION_URL
  • Kubernetes YAML file: ORION_URL
  • Apache Mesos file: ORION_CONSOLE_URL
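
On a Linux host, this edit can be scripted. The following sketch defines a hypothetical helper, repoint_orion_url, that replaces the host part of every URL pointing at the Main Polling Engine with the address of an APE, whichever property name (ORION_URL or ORION_CONSOLE_URL) the file uses. It assumes GNU sed, that the old host is a host name or IPv4 address, and that it appears in the file as part of a URL such as https://10.0.0.5:443/...; all addresses shown are examples only.

```shell
# Hypothetical helper: in FILE, rewrite "://OLD_HOST" to "://NEW_HOST"
# (for example, the address of an Additional Polling Engine).
# Assumes GNU sed and a host name or IPv4 address (no IPv6 literals).
repoint_orion_url() {
  local file="$1" old_host="$2" new_host="$3" old_re
  # Keep a backup so the original file can be compared or restored.
  cp -- "$file" "$file.bak" || return 1
  # Escape dots so they match literally in the regular expression.
  old_re=$(printf '%s' "$old_host" | sed 's/\./\\./g')
  # Replace the host only when it is followed by a slash, port, quote, or
  # end of line, so 10.0.0.5 does not also rewrite 10.0.0.50.
  sed -i -E "s#://${old_re}([/:\"']|\$)#://${new_host}\1#g" "$file"
}
```

For example, repoint_orion_url docker-compose.yml 10.0.0.5 ape01.example.com (example file name and addresses) leaves the original in docker-compose.yml.bak, so you can compare the two files before using the edited one.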

Docker hosts cannot reach the Orion website using an IP address, domain name, or host name

When you add a container service, a configuration file with details about the SolarWinds Platform Web Console is downloaded to the host. If the host cannot access the SolarWinds Platform Web Console or resolve the IP address, domain name, or host name provided, the script fails.

Here are possible workarounds: 

  • If the container host cannot reach the SolarWinds Platform Web Console using the domain name or host name in the script, edit the script on the host to change the domain or host name to the IP address of the SolarWinds Platform Web Console. If the opposite is true, and the connection cannot be made using the IP address in the script, edit the script to change the IP address to the domain/host name.
  • Navigate to the URL in the script and download the file manually. Copy the script to the host server, and then run it without the curl command that transfers data automatically.
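
Before editing the script, you can confirm whether the host can resolve the name it contains. The sketch below uses two hypothetical helpers: url_host extracts the host from a URL with POSIX parameter expansion, and check_console_host asks the system resolver through getent hosts, which follows the same /etc/hosts and DNS configuration that most programs on glibc-based Linux hosts use. It assumes getent is installed and handles host names and IPv4 addresses only; IPv4 addresses are reported without a lookup because getent would attempt a reverse lookup for them.

```shell
# Hypothetical helper: print the host part of a URL such as
# https://orion.example.com:443/path (host names and IPv4 only).
url_host() {
  local rest
  rest=${1#*://}                # drop the scheme, if any
  rest=${rest%%/*}              # drop the path
  printf '%s\n' "${rest%%:*}"   # drop the port
}

# Hypothetical helper: report whether this host can resolve the URL's host.
check_console_host() {
  local host
  host=$(url_host "$1")
  if [ -z "$host" ]; then
    echo "no host found in: $1"
    return 1
  fi
  case "$host" in
    *[!0-9.]*) ;;  # contains other characters: treat it as a name
    *) echo "IP address, no lookup needed: $host"; return 0 ;;
  esac
  if getent hosts "$host" >/dev/null 2>&1; then
    echo "resolves: $host"
  else
    echo "cannot resolve: $host"
    return 1
  fi
}

# Example (placeholder URL; use the one from the downloaded script):
# check_console_host "https://orion.example.com"
```

If the name does not resolve, the first workaround above (switching to the IP address) is the likely fix; if it resolves but the connection still fails, the problem is more likely routing or a firewall.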

What happened to the EnvironmentVariable available in Orion Platform 2020.2.5 and earlier?

The EnvironmentVariable was deprecated in Orion Platform 2020.2.6. Several other SolarWinds Information Service (SWIS) changes were made in the same release.

The following entities were removed:

  • Cortex.Orion.Cman.Container
  • Cortex.Orion.Cman.ContainerAgent
  • Cortex.Orion.Cman.ContainerEnvironmentVariable
  • Cortex.Orion.Cman.ContainerHost
  • Cortex.Orion.Cman.ContainerImage
  • Cortex.Orion.Cman.Container.CpuMetrics
  • Cortex.Orion.Cman.Container.MemoryMetrics
  • Cortex.Orion.Cman.Container.Statistics

Similar data is available under:

  • Orion.Cman.Container
  • Orion.Cman.ContainerAgent
  • Orion.Cman.ContainerCpuMetrics
  • Orion.Cman.ContainerImage
  • Orion.Cman.ContainerMemoryMetrics

AKS container monitoring stops

After Azure Kubernetes Service (AKS) switched node pools from Docker to containerd as the container runtime, container monitoring stops and the following messages appear in logs:

  • Cannot connect to Docker endpoint
  • Error doing controls for orionaggregator-service.orion.svc.cluster.local

No workaround is currently available. See the latest SAM Release Notes for updates.
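
To confirm that a cluster is affected, you can check which container runtime each node reports; kubectl get nodes -o wide shows it in the CONTAINER-RUNTIME column. The sketch below defines a hypothetical helper, count_runtimes, that summarizes runtime strings such as containerd://<version> or docker://<version>. The commented kubectl command assumes you have kubectl access to the AKS cluster.

```shell
# Hypothetical helper: read container runtime versions on stdin, one per
# line (for example "containerd://<version>"), and print how many nodes
# report each runtime.
count_runtimes() {
  cut -d: -f1 | sort | uniq -c | awk '{print $2": "$1}'
}

# Typical use against a live cluster (requires kubectl access):
# kubectl get nodes \
#   -o jsonpath='{range .items[*]}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' \
#   | count_runtimes
```

Any node pool that reports containerd is subject to the issue described above.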

Adjust default settings

You can edit the following settings on the Advanced Configuration page, if necessary.

  • IncludeExceptionDetailsInWcfWeb: Disabled by default. SolarWinds Support may ask you to enable this option to gather stack traces and exception details related to container monitoring.

  • PollingInterval: By default, container services are polled every five minutes.

  • WebHttpsServicePort: The SolarWinds Platform server uses port 38012 internally to send data received from Orion Aggregator containers to the SolarWinds Orion API.

Changes to the polling interval apply only to new container services.

You can edit the default data retention settings in the SolarWinds.Orion.ContainerMgmt.BusinessLayer.dll.config file, typically stored in C:\Program Files\SolarWinds\Orion:

  • CleanupJobInterval: 480 minutes
  • MaxDeletedContainerAge: 60 minutes

Locate logs

Starting in Orion Platform 2020.2.6, logs are typically stored here:

C:\ProgramData\SolarWinds\Logs\Orion\ContainerManagement

In earlier versions, logs were located in C:\ProgramData\SolarWinds\Logs\Cortex.

When examining logs, search for the following keywords:

  • CMAN
  • ContainerMgmt
  • ContainerManagement
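
If you prefer to search from a shell, the following sketch defines a hypothetical helper, find_cman_logs, that lists each log file mentioning any of these keywords, with the number of matching lines. It assumes a POSIX shell with a GNU-compatible grep is available where the logs are (for example, Git Bash on the SolarWinds Platform server, or a copy of the log folder on another machine), and it matches case-insensitively.

```shell
# Hypothetical helper: print "path:count" for every file under DIR that
# contains at least one line matching a container monitoring keyword.
find_cman_logs() {
  local dir="$1"
  # -r: recurse, -c: count matching lines per file, -i: ignore case.
  # grep -c also prints files with zero matches, so filter those out.
  grep -rciE -e 'CMAN|ContainerMgmt|ContainerManagement' -- "$dir" | grep -v ':0$'
}
```

For example, in Git Bash: find_cman_logs /c/ProgramData/SolarWinds/Logs/Orion/ContainerManagement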

For logs about data collected from each node in a cluster, run the following command on the node that hosts the OrionAggregator container:

sudo docker logs -f [container_id]

To determine the container_id for the OrionAggregator container, run sudo docker ps and note the value in the CONTAINER ID column.

In YAML files, Orion Aggregator containers are referenced as "Scope2Orion".
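
To avoid copying the ID by hand, you can filter the docker ps output. The sketch below defines a hypothetical helper, aggregator_ids, that reads docker ps output formatted with the {{.ID}}, {{.Image}}, and {{.Names}} template fields and prints the IDs of rows whose image or name contains "aggregator" or "scope2orion" (case-insensitive). That match pattern is an assumption based on the names above; check it against what sudo docker ps shows on your node.

```shell
# Hypothetical helper: read lines of "ID IMAGE NAMES" on stdin and print
# the ID of each row whose image or name looks like the aggregator.
# The pattern below is an assumption; adjust it to your environment.
aggregator_ids() {
  awk 'tolower($2 " " $3) ~ /aggregator|scope2orion/ {print $1}'
}

# Typical use on the node that hosts the aggregator container:
# id=$(sudo docker ps --format '{{.ID}} {{.Image}} {{.Names}}' | aggregator_ids | head -n 1)
# [ -n "$id" ] && sudo docker logs -f "$id"
```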