Troubleshoot container monitoring
This topic applies only to the following products:
SolarWinds Observability Self-Hosted
SAM — VMAN
This topic provides tips to resolve issues you may encounter when using the Container Monitoring feature. You can also search the Success Center or THWACK.
Before proceeding, note these details:
- (Recommended) Review Container monitoring requirements.
- SolarWinds documentation describes how to display container data in the SolarWinds Platform Web Console. To manipulate containers directly, refer to third-party vendor documentation. For example, to learn about swarm mode, see Docker docs (© 2021 Docker, Inc., available at docs.docker.com, obtained on June 14, 2021).
Starting in Orion Platform 2020.2.6, use SolarWinds Tokens for container monitoring. Update any container services added in earlier versions. Otherwise, polling stops.
Issues
Cannot add a container service
Containers are not supported in High Availability (HA) or FIPS-enabled environments. If containers were added before HA or FIPS was enabled, remove them from nodes and delete container services. Otherwise, container polling continues.
Resolve orion-aggregator log messages
If messages similar to "Node with IP: {IP_address} is not added to Orion. Data is not polled" appear in logs, verify that the Linux server that hosts containers exists as a managed Orion node, with ICMP configured as the Polling Method. To add a node, see Add a single node for monitoring to the SolarWinds Platform. See also Locate logs.
Container data does not appear in the SolarWinds Platform Web Console. Container service status appears as Unknown and Last Seen time does not update
In Orion Platform 2020.2.6 or later, polling stops until you edit container services added in earlier versions to use SolarWinds Tokens.
You can also review port requirements in the following sections to rule out firewall issues:
- Container monitoring requirements
- Docker requirements, deployment command examples, and container removal steps
- Docker Swarm requirements, deployment command examples, and container removal steps
- Kubernetes requirements, deployment command examples, and container removal steps
- Apache Mesos requirements, deployment command examples, and container removal steps
What happened to my container data?
By default, the SolarWinds Platform clears all data, including images, for containers that report as being deleted for over 7 days.
Containers switch to Unknown status, or inactive containers cause AppStack and PerfStack errors
Verify credentials for the container service and edit, if necessary. If you upgraded to Orion Platform 2020.2.6 or later, edit existing container services to use SolarWinds Tokens.
See Manage container services to learn about the following types of containers that are deployed to container environments:
- Orion Monitor containers track status and metrics for each node in a cluster.
- Orion Aggregator containers on orchestrator master nodes collect data from Orion Monitor containers in the cluster and report status to the SolarWinds Platform server every five minutes.
Containers are missing after an upgrade
If you recently upgraded to Orion Platform 2020.2.6 or later, edit container services added in earlier versions to use SolarWinds Tokens.
In earlier versions, you can delete the related container service and then add it back to the SolarWinds Platform to refresh YAML files. The Orion Monitor and Orion Aggregator containers will detect containers during the next polling cycle.
Choose a polling engine for container monitoring in Orion Platform 2020.2.5 and earlier
Upgrade to Orion Platform 2020.2.6 or later to specify polling engines when adding container services.
As a workaround in earlier versions, change the Orion URL property in the script that runs on the host machine when you add a container service. By default, that value is set to the IP address of the SolarWinds Platform server, which acts as the Main Polling Engine, but you can point it to an Additional Polling Engine (APE) instead.
Note that the exact property name used in scripts varies:
- Docker Compose file: ORION_URL
- Kubernetes YAML file: ORION_URL
- Apache Mesos file: ORION_CONSOLE_URL
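As a rough illustration of the workaround above, the following Python sketch swaps one address for another in the text of a deployment script. It is not a SolarWinds tool. The function name repoint_address and the sample values are hypothetical, and the sketch deliberately ignores file format: it replaces the old address wherever it appears as a whole token rather than parsing the ORION_URL or ORION_CONSOLE_URL property.

```python
import re


def repoint_address(script_text: str, old_address: str, new_address: str) -> str:
    """Swap old_address for new_address wherever it appears as a whole token."""
    # The lookarounds stop 10.0.0.1 from matching inside 10.0.0.10 and stop
    # a short host name from matching inside a longer dotted or hyphenated one.
    token = re.compile(r"(?<![\w.-])" + re.escape(old_address) + r"(?![\w.-])")
    # A function replacement keeps backslashes in new_address literal.
    return token.sub(lambda _match: new_address, script_text)


# Hypothetical script fragment; both addresses are made up for illustration.
sample = "ORION_URL=https://10.0.0.1\nSOME_OTHER_HOST=10.0.0.10\n"
print(repoint_address(sample, "10.0.0.1", "10.0.0.25"))
```

Because the replacement touches every whole-token occurrence, compare the edited script with the original before running it on the host.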
Docker hosts cannot reach the Orion website using an IP address, domain name, or host name
When you add a container service, a configuration file with details about the SolarWinds Platform Web Console is downloaded to the host. If the host cannot access the SolarWinds Platform Web Console or resolve the IP address, domain name, or host name provided, the script fails.
Here are possible workarounds:
- If the container host cannot reach the SolarWinds Platform Web Console using the domain name or host name in the script, edit the script on the host to change the domain or host name to the IP address of the SolarWinds Platform Web Console. If the opposite is true, and the connection cannot be made using the IP address in the script, edit the script to change the IP address to the domain/host name.
- Navigate to the URL in the script and download the file manually. Copy the script to the host server, and then run it without the curl command that transfers data automatically.
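Before choosing a workaround, it can help to know whether the host fails at name resolution or at the connection itself. The sketch below, meant to be run on the container host, separates the two cases using only the Python standard library. The function names and the example URL are hypothetical, and the port falls back to 443 or 80 based on the URL scheme, which is an assumption about how your web console is published.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def split_host_port(url: str) -> Tuple[str, int]:
    """Extract host and port from a URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL must include a scheme and host: %r" % url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def diagnose(url: str, timeout: float = 5.0) -> str:
    """Report whether the host in url resolves and accepts a TCP connection."""
    host, port = split_host_port(url)
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return "cannot resolve host"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except OSError:
        return "resolved but cannot connect"


if __name__ == "__main__":
    # Hypothetical URL; substitute the one found in the downloaded script.
    print(diagnose("https://orion.example.com"))
```

A "cannot resolve host" result points toward swapping the name for an IP address, while "resolved but cannot connect" suggests a firewall or routing problem instead.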
What happened to the EnvironmentVariable available in Orion Platform 2020.2.5 and earlier?
The EnvironmentVariable was deprecated in Orion Platform 2020.2.6. Several other SolarWinds Information Service (SWIS) changes occurred.
The following entities were removed:
- Cortex.Orion.Cman.Container
- Cortex.Orion.Cman.ContainerAgent
- Cortex.Orion.Cman.ContainerEnvironmentVariable
- Cortex.Orion.Cman.ContainerHost
- Cortex.Orion.Cman.ContainerImage
- Cortex.Orion.Cman.Container.CpuMetrics
- Cortex.Orion.Cman.Container.MemoryMetrics
- Cortex.Orion.Cman.Container.Statistics
Similar data is available under:
- Orion.Cman.Container
- Orion.Cman.ContainerAgent
- Orion.Cman.ContainerCpuMetrics
- Orion.Cman.ContainerImage
- Orion.Cman.ContainerMemoryMetrics
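If you maintain saved queries that reference the removed entities, a small script can flag and rewrite them. The sketch below is a starting point only. The pairing of old and new entity names is an assumption based on name similarity, since the lists above do not state a one-to-one mapping, and property names inside each entity may also differ. Entities with no similarly named replacement are reported rather than changed. The column name in the sample query is hypothetical.

```python
import re
from typing import List, Tuple

# Assumed pairing by name; verify each one against your SWIS schema.
ENTITY_MAP = {
    "Cortex.Orion.Cman.Container": "Orion.Cman.Container",
    "Cortex.Orion.Cman.ContainerAgent": "Orion.Cman.ContainerAgent",
    "Cortex.Orion.Cman.ContainerImage": "Orion.Cman.ContainerImage",
    "Cortex.Orion.Cman.Container.CpuMetrics": "Orion.Cman.ContainerCpuMetrics",
    "Cortex.Orion.Cman.Container.MemoryMetrics": "Orion.Cman.ContainerMemoryMetrics",
}

# Removed entities with no similarly named entry in the replacement list.
NO_LISTED_REPLACEMENT = (
    "Cortex.Orion.Cman.ContainerEnvironmentVariable",
    "Cortex.Orion.Cman.ContainerHost",
    "Cortex.Orion.Cman.Container.Statistics",
)

# Longest names first so Container.CpuMetrics wins over Container.
_ALL_REMOVED = sorted(list(ENTITY_MAP) + list(NO_LISTED_REPLACEMENT), key=len, reverse=True)
_PATTERN = re.compile(
    r"(?<![\w.])(?:" + "|".join(re.escape(name) for name in _ALL_REMOVED) + r")(?![\w.])"
)


def migrate_query(query: str) -> Tuple[str, List[str]]:
    """Return the rewritten query and any removed entities left unchanged."""
    unresolved: List[str] = []

    def swap(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name in ENTITY_MAP:
            return ENTITY_MAP[name]
        unresolved.append(name)
        return name

    return _PATTERN.sub(swap, query), unresolved


# "Name" is a placeholder column, not a confirmed property.
print(migrate_query("SELECT Name FROM Cortex.Orion.Cman.Container.CpuMetrics"))
```

Any entity returned in the second element needs a manual decision, because no direct replacement is listed for it.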
AKS container monitoring stops
On Azure Kubernetes Service (AKS) node pools that use containerd instead of Docker, container monitoring stops and the following messages appear in logs:
Cannot connect to Docker endpoint
Error doing controls for orionaggregator-service.orion.svc.cluster.local
No workaround is currently available. See the latest SAM Release Notes for updates.
Adjust default settings
You can edit the following settings on the Advanced Configuration page, if necessary.
- IncludeExceptionDetailsInWcfWeb: Disabled by default. SolarWinds Support may ask you to enable this option to gather stack trace and exception details related to container monitoring.
- PollingInterval: By default, container services are polled every five minutes.
- WebHttpsServicePort: The SolarWinds Platform server uses port 38012 internally to send data received from Orion Aggregator containers to the SolarWinds Orion API.
Polling interval changes only apply to new container services.
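The default five-minute PollingInterval gives a simple yardstick for judging whether a Last Seen time has stopped updating. The sketch below counts how many full polling intervals have elapsed since a given timestamp. The function names and the tolerance of two intervals are assumptions chosen for illustration, not product behavior.

```python
from datetime import datetime, timedelta, timezone

# Default from the PollingInterval setting described above.
DEFAULT_POLLING_INTERVAL = timedelta(minutes=5)


def intervals_since(last_seen: datetime, now: datetime,
                    interval: timedelta = DEFAULT_POLLING_INTERVAL) -> int:
    """Count full polling intervals between last_seen and now (never negative)."""
    elapsed = now - last_seen
    return max(elapsed // interval, 0)


def is_stale(last_seen: datetime, now: datetime, allowed_intervals: int = 2,
             interval: timedelta = DEFAULT_POLLING_INTERVAL) -> bool:
    """Flag a Last Seen time once more than allowed_intervals have passed."""
    return intervals_since(last_seen, now, interval) > allowed_intervals


if __name__ == "__main__":
    # Made-up timestamp standing in for a Last Seen value.
    last_seen = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(is_stale(last_seen, datetime.now(timezone.utc)))
```

If you changed PollingInterval for a service, pass the matching interval so the comparison stays meaningful.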
You can edit default data retention settings in the SolarWinds.Orion.ContainerMgmt.BusinessLayer.dll.config file, typically stored in C:\Program Files\SolarWinds\Orion:
- CleanupJobInterval: 480 minutes
- MaxDeletedContainerAge: 60 minutes
Locate logs
Starting in Orion Platform 2020.2.6, logs are typically stored here:
C:\ProgramData\Solarwinds\Logs\Orion\ContainerManagement
In earlier versions, logs were located in C:\ProgramData\SolarWinds\Logs\Cortex.
When examining logs, search for the following keywords: CMAN, ContainerMgmt, ContainerManagement.
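To search a whole log folder for these keywords at once, a short script such as the following sketch may help. It assumes log files end in .log and matches keywords case-insensitively. Both are assumptions, so adjust the glob pattern and matching to suit your environment. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Sequence

# Keywords listed above.
KEYWORDS = ("CMAN", "ContainerMgmt", "ContainerManagement")


def matching_lines(lines: Iterable[str], keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return the lines that contain any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        line.rstrip("\r\n")
        for line in lines
        if any(keyword in line.lower() for keyword in lowered)
    ]


def scan_log_dir(log_dir: str) -> Dict[str, List[str]]:
    """Map each *.log file under log_dir to its matching lines."""
    hits: Dict[str, List[str]] = {}
    for path in sorted(Path(log_dir).rglob("*.log")):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(errors="replace") as handle:
            found = matching_lines(handle)
        if found:
            hits[str(path)] = found
    return hits


if __name__ == "__main__":
    folder = r"C:\ProgramData\Solarwinds\Logs\Orion\ContainerManagement"
    for name, lines in scan_log_dir(folder).items():
        print(name, len(lines))
```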
For logs about data collected from each node in a cluster, run the following command on the node that hosts the OrionAggregator container:
sudo docker logs -f [container_id]
To determine the container_id for the OrionAggregator container, run: sudo docker ps
In YAML files, Orion Aggregator containers are referenced as "Scope2Orion".
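On hosts that run many containers, picking the right ID out of docker ps output by eye can be tedious. The sketch below filters that output by a search term applied to the image and container names. The term to search for is an assumption, because the name given to the aggregator container varies by deployment, and the sample output shown is made up. The script calls docker directly, so it needs the same privileges that the sudo commands above provide.

```python
import subprocess
from typing import List

# Tab-separated columns: container ID, image, names.
PS_FORMAT = "{{.ID}}\t{{.Image}}\t{{.Names}}"


def find_container_ids(ps_output: str, needle: str) -> List[str]:
    """Return IDs of containers whose image or name contains needle."""
    needle = needle.lower()
    ids: List[str] = []
    for line in ps_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        container_id, image, names = fields
        if needle in image.lower() or needle in names.lower():
            ids.append(container_id)
    return ids


def list_running_containers() -> str:
    """Run docker ps with a fixed format and return its raw output."""
    result = subprocess.run(
        ["docker", "ps", "--format", PS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "aggregator" is a guess at the naming; adjust for your deployment.
    print(find_container_ids(list_running_containers(), "aggregator"))
```

Each ID returned can then be passed to sudo docker logs -f in place of [container_id].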