Documentation for SolarWinds Observability

Troubleshoot the SWO K8s Collector

If you encounter trouble with Kubernetes data collection during or after installing the SWO K8s Collector, try the following steps:

Handle Helm chart installation failures

Occasionally, the installation of a Helm chart may fail due to certain policies enforced within the cluster. To address such scenarios, see the instructions below.

Identify the issue

  1. Review installation errors.

    • Look into the error messages provided during the Helm installation failure.

    • Use kubectl get events --sort-by='.metadata.creationTimestamp' -n YourNamespace to review recent events in the namespace where the installation was attempted.

  2. Check cluster policies.

    • Ensure that your configuration values within the Helm chart comply with the existing policies in your cluster.

    • For instance, the AutoUpdate feature may conflict with some cluster policies because it requires permission to update almost any resource in the cluster. Make sure the configuration of such features adheres to your cluster's policies, or disable the feature.

  3. Disable automatic cleanup after a failed installation to collect additional information.

    • After a failed installation or upgrade, the default installation commands try to clean up any K8s resources created during the process. However, this cleanup may also remove information about why the installation or upgrade failed.

    • To keep these resources after a failed installation, run

      helm install -f values.yaml swo-k8s-collector solarwinds/swo-k8s-collector --namespace <YourK8sNamespace> --wait
      instead of
      helm install -f values.yaml swo-k8s-collector solarwinds/swo-k8s-collector --namespace <YourK8sNamespace> --atomic

    • For an upgrade, run

      helm upgrade swo-k8s-collector solarwinds/swo-k8s-collector --namespace <YourNamespace> --wait
      instead of
      helm upgrade swo-k8s-collector solarwinds/swo-k8s-collector --namespace <YourNamespace> --cleanup-on-fail --atomic

    • Running without --atomic (and, for upgrades, without --cleanup-on-fail) keeps all created resources in the K8s cluster even if the installation or upgrade fails, so you can inspect their logs, configuration, and so on.
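When a failed installation leaves its resources in place, it can help to capture their state before you change anything. The following is a minimal sketch of a hypothetical helper, `swo_collect_diagnostics` (the name, argument order, and default output directory `swo-diagnostics` are assumptions, not part of the chart), that saves the namespace's recent events, remaining resources, pod descriptions, and Helm release status to files. It assumes kubectl and helm are already configured for the target cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshot what a failed SWO K8s Collector install left behind.
# Usage: swo_collect_diagnostics NAMESPACE [RELEASE] [OUTDIR]
# Assumes kubectl and helm already point at the target cluster.
swo_collect_diagnostics() {
  local namespace="$1" release="${2:-swo-k8s-collector}" outdir="${3:-swo-diagnostics}"
  mkdir -p "$outdir" || return

  # Recent events, oldest first. Policy or admission rejections typically
  # appear here, for example as FailedCreate events on a ReplicaSet or DaemonSet.
  kubectl get events --sort-by='.metadata.creationTimestamp' -n "$namespace" \
    > "$outdir/events.txt" 2>&1

  # Resources still present in the namespace. After a failed install,
  # these remain only if the install ran without --atomic.
  kubectl get all -n "$namespace" -o wide > "$outdir/resources.txt" 2>&1

  # Full pod descriptions: container states, restart reasons, and pod events.
  kubectl describe pods -n "$namespace" > "$outdir/pods.txt" 2>&1

  # Helm's view of the release, including its status and revision.
  helm status "$release" -n "$namespace" > "$outdir/helm-status.txt" 2>&1

  printf 'Diagnostics written to %s\n' "$outdir"
}
```

For example, `swo_collect_diagnostics YourNamespace` writes events.txt, resources.txt, pods.txt, and helm-status.txt to ./swo-diagnostics. Errors from individual commands are written into the corresponding file instead of stopping the script, so a missing release still leaves the other files usable.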

Resolve the issue

To resolve the issue, align your Helm chart configuration with your cluster policies, remove the failed installation so you start from a clean slate, and then re-install the chart.

  1. Modify the Helm chart configuration to align with cluster policies by updating the values.yaml file or providing override values.

  2. Uninstall the failed Helm chart installation.

    Before attempting re-installation, clean up the failed installation to prevent any conflicts or resource leaks.

    Execute helm uninstall RELEASE_NAME -n YourNamespace to remove the failed installation, replacing RELEASE_NAME with the name of your release and YourNamespace with the appropriate namespace.

  3. Re-install the Helm chart. After making the necessary modifications, use the helm install command to start the Helm chart installation.

  4. Verify the installation was successful.

    • Make sure all resources are deployed successfully and are operating as expected.

    • Use helm status RELEASE_NAME -n YourNamespace to check the status of your release.
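Steps 2 through 4 can also be scripted. The sketch below is a hypothetical wrapper, not part of the chart or its tooling: `swo_reinstall` uninstalls the release, re-installs it from your values file with --atomic, and prints the release status. The `DRY_RUN=1` convention is an assumption of this sketch; it prints the commands instead of running them so you can review them first.

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 (hypothetical convention).
swo_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: uninstall a failed release, re-install it, show its status.
# Usage: swo_reinstall RELEASE NAMESPACE [VALUES_FILE]
swo_reinstall() {
  local release="$1" namespace="$2" values="${3:-values.yaml}"
  # Step 2: remove the failed installation to avoid conflicts with leftovers.
  swo_run helm uninstall "$release" -n "$namespace" || return
  # Step 3: re-install with the corrected values; --atomic removes the
  # release again if this attempt also fails.
  swo_run helm install -f "$values" "$release" solarwinds/swo-k8s-collector \
    --namespace "$namespace" --atomic || return
  # Step 4: print the release status so you can confirm it is deployed.
  swo_run helm status "$release" -n "$namespace"
}
```

For example, `DRY_RUN=1 swo_reinstall swo-k8s-collector YourNamespace` previews the three commands; run it without DRY_RUN to execute them. If you still need to inspect resources after a failure, replace --atomic with --wait as described under Identify the issue.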

Use error logs to identify the cause of problems

To help identify the cause of the SWO K8s Collector problems, use the kubectl logs command to retrieve the logs from the containers running in your pods and search for logs with the severity of WARN or ERROR.

Gather the logs

Run the following command to retrieve the logs for all containers running the SWO K8s Collector, replacing YourNamespace with the Kubernetes namespace where the SWO K8s Collector is deployed.

kubectl logs --selector=app.kubernetes.io/part-of=swo-k8s-collector --all-containers -n=YourNamespace --prefix

Review the error logs to determine the cause of the issue and help identify the appropriate steps to take to resolve the issue. If the logs do not provide immediate insight into the problem or resolution, continue with the remaining troubleshooting steps.
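To narrow the log output to WARN and ERROR entries, you could pipe it through a small filter such as the sketch below. The function names are hypothetical. It assumes the severity appears as a separate word (for example `warn`, `WARN`, `warning`, or `error`, matched case-insensitively), and it relies on the `[pod/<pod>/<container>]` prefix that --prefix adds to each line to count problems per container.

```shell
#!/usr/bin/env bash
# Keep only lines that contain a WARN, WARNING, or ERROR severity as a whole word.
swo_problem_lines() {
  grep -iwE 'warn|warning|error'
}

# Count matching lines per container, most problems first, using the
# "[pod/<pod>/<container>]" prefix that `kubectl logs --prefix` adds.
swo_problem_summary() {
  swo_problem_lines | awk '{ print $1 }' | sort | uniq -c | sort -rn
}
```

For example, append `| swo_problem_summary` to the kubectl logs command above to see which containers report the most warnings and errors, then use `| swo_problem_lines` to read the matching lines themselves.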

Verify and fix connection to the SolarWinds Observability OTel endpoint

The SWO K8s Collector must be able to connect to the SolarWinds Observability OTel endpoint. The main reasons that the SWO K8s Collector cannot connect to the OTel endpoint are a missing or invalid API token and firewall or access control issues.

To check whether the API token is configured correctly, verify that the solarwinds-api-token secret exists in the Kubernetes namespace where the SWO K8s Collector is deployed and that it contains a valid token.

  1. Run the following command to verify the solarwinds-api-token secret exists, replacing YourNamespace with the Kubernetes namespace where the SWO K8s Collector is deployed.

    kubectl get secret solarwinds-api-token -n=YourNamespace
  2. Run the following command in Bash with the yq tool to get the value of the API token stored in the solarwinds-api-token secret, replacing YourNamespace with the Kubernetes namespace where the SWO K8s Collector is deployed.

    kubectl get secrets solarwinds-api-token -n=YourNamespace -o yaml | yq e ".data.SOLARWINDS_API_TOKEN" - | base64 -d
  3. Compare the API token value returned with the API token stored in SolarWinds Observability settings and make sure they're the same value. See the API Tokens settings page.
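If yq is not installed, kubectl's JSONPath output can read the secret directly. The sketch below uses hypothetical function names: `swo_encoded_token` fetches the base64 value, and `swo_token_matches` compares it with the token you expect without printing either one. It compares the base64 forms rather than decoded strings, so a stray trailing newline in the stored token (for example, if the token was encoded with `echo` rather than `printf`) is reported as a mismatch instead of being silently trimmed.

```shell
#!/usr/bin/env bash
# Print the base64-encoded SOLARWINDS_API_TOKEN stored in the secret.
# Usage: swo_encoded_token NAMESPACE
swo_encoded_token() {
  kubectl get secret solarwinds-api-token -n "$1" \
    -o jsonpath='{.data.SOLARWINDS_API_TOKEN}'
}

# Return 0 if ENCODED (base64, as stored in the secret) encodes exactly
# EXPECTED, 1 otherwise. Nothing is printed, so the token stays off screen.
# Usage: swo_token_matches ENCODED EXPECTED
swo_token_matches() {
  local encoded="$1" expected="$2" want
  # Encode the expected token without a trailing newline, and strip the
  # line wrapping that some base64 implementations add to long input.
  want="$(printf '%s' "$expected" | base64 | tr -d '\n')"
  [ "$encoded" = "$want" ]
}
```

For example, with the token from the API Tokens settings page in a hypothetical SWO_API_TOKEN environment variable, `swo_token_matches "$(swo_encoded_token YourNamespace)" "$SWO_API_TOKEN"` exits with status 0 when the secret holds exactly that token.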

Firewall or access control settings block communications

Review your firewall or access control configuration and ensure it permits connections between the SWO K8s Collector and the SolarWinds Observability OTel endpoint, otel.collector.xx-yy.cloud.solarwinds.com, where xx-yy is determined by the URL you use to access SolarWinds Observability (see Data centers and endpoint URIs).

See Firewall or access control requirements.
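A quick reachability test can help confirm whether traffic is blocked. The sketch below uses hypothetical function names: it builds the endpoint hostname from your data center code and tries to open a TCP connection to it. It assumes the endpoint accepts connections on port 443 (confirm this in the firewall requirements), that Bash and the coreutils `timeout` command are available, and that you run it from a machine whose outbound traffic follows the same firewall rules as your cluster nodes; a check from your workstation may pass even when the nodes are blocked.

```shell
#!/usr/bin/env bash
# Build the OTel endpoint hostname for a data center code
# (the xx-yy part of your SolarWinds Observability URL).
swo_otel_endpoint() {
  printf 'otel.collector.%s.cloud.solarwinds.com\n' "$1"
}

# Return 0 if a TCP connection to HOST:PORT opens within 5 seconds.
# PORT defaults to 443, which is an assumption of this sketch.
# Uses Bash's /dev/tcp pseudo-device, so it needs bash and timeout
# but no network tools such as nc or curl.
swo_can_connect() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-443}" 2>/dev/null
}
```

For example, `swo_can_connect "$(swo_otel_endpoint xx-yy)"` (with xx-yy replaced by your data center code) exits with status 0 if the connection opens. A successful connection only shows that the network path is open; it does not validate the API token.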

Review your Helm chart configuration

If there is unexpected behavior unrelated to connection issues between the SWO K8s Collector and the SolarWinds Observability endpoint, verify whether the Helm chart is configured correctly. The following steps use Bash with diff and the yq (v4.x) tool installed to compare the default Helm chart configuration with your current configuration.

  1. Run the following command to save the default Helm chart configuration to default_values.yaml.

    helm show values solarwinds/swo-k8s-collector > default_values.yaml
  2. Run the following command to save a copy of your current deployment configuration to installed_values.yaml, replacing YourNamespace with the Kubernetes namespace where the SWO K8s Collector is deployed.

    helm get values swo-k8s-collector -n=YourNamespace -o yaml > installed_values.yaml
  3. Run the following command to show any differences between your configuration and the default configuration.

    diff <(yq -P e '... comments=""' default_values.yaml) <(yq -P e '... comments=""' installed_values.yaml)
  4. Review each difference listed and verify whether the change was intentional or could be causing a problem. If you do not remember making the change, review the release notes for changes to default values in the Helm chart.

  5. If in your review you identify a setting that needs to be updated in your Helm chart configuration, modify the installed_values.yaml file accordingly. Use the following command to apply the new configuration, replacing YourNamespace with the Kubernetes namespace where the SWO K8s Collector is deployed.

    helm upgrade -f installed_values.yaml swo-k8s-collector solarwinds/swo-k8s-collector --namespace YourNamespace --cleanup-on-fail --atomic
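Steps 1 to 3 can be combined into one helper. This is a sketch under the same assumptions as those steps (Bash, helm, and yq v4 installed, and the solarwinds Helm repository added); the function name `swo_values_diff` is hypothetical. It writes comment-free copies to temporary files instead of using process substitution, keeps installed_values.yaml for editing in step 5, and returns diff's exit status: 0 when the configurations match and 1 when they differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper combining steps 1 to 3: save both configurations and
# diff them with comments removed. Files are written to the current directory.
# Usage: swo_values_diff NAMESPACE [RELEASE]
swo_values_diff() {
  local namespace="$1" release="${2:-swo-k8s-collector}" status
  helm show values solarwinds/swo-k8s-collector > default_values.yaml || return
  # -o yaml omits the "USER-SUPPLIED VALUES:" header of the default table output
  helm get values "$release" -n "$namespace" -o yaml > installed_values.yaml || return
  # Strip comments (yq v4) so only value differences remain
  yq -P e '... comments=""' default_values.yaml > default_values.clean.yaml || return
  yq -P e '... comments=""' installed_values.yaml > installed_values.clean.yaml || return
  diff default_values.clean.yaml installed_values.clean.yaml
  status=$?
  # Remove the temporary comment-free copies; keep the original files
  rm -f default_values.clean.yaml installed_values.clean.yaml
  return "$status"
}
```

Because `helm get values` without --all returns only the values you supplied, lines marked `<` in the output also include defaults you simply did not override; lines marked `>` show your own settings.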