CRI + containerd plugin
In December 2020, Kubernetes v1.20 deprecated its built-in support for the Docker container runtime in favor of the Container Runtime Interface (CRI), a generalized abstraction layer between Kubernetes and the underlying container runtime. Containerd is currently the default CRI implementation used in Kubernetes clusters.
This plugin collects metrics from containerized environments using either of two data sources, whichever is running and can be queried through its API:
- the CRI that is defined and used inside the Kubernetes cluster as a communication protocol with the underlying container runtime
- containerd's own extended telemetry data
The data tree returned by the API is flattened into a list of metrics for use in AppOptics, keeping the metric names as close as possible to the names the APIs return. The plugin connects to the CRI and containerd APIs using the standardized gRPC protocol over a UNIX socket or, in Windows-based environments, a named pipe.
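As a rough illustration of the flattening step, the sketch below walks a nested stats structure and joins the keys into dotted metric names. The `flatten` helper and the sample payload are hypothetical and are not taken from the plugin source; only the resulting name pattern mirrors the metric tables in this document.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(tree: Dict[str, Any], prefix: str = "cri.container") -> Iterator[Tuple[str, float]]:
    """Yield (metric_name, value) pairs for every numeric leaf in a nested dict."""
    for key, value in tree.items():
        name = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Descend into sub-trees, extending the dotted namespace.
            yield from flatten(value, name)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Numeric leaves become float metrics.
            yield name, float(value)
        # Non-numeric leaves (strings, lists, None) are skipped in this sketch.


# Hypothetical payload shaped loosely like a container stats response.
sample = {
    "cpu": {"timestamp": 1700000000, "usage_core_nano_seconds": {"value": 123456}},
    "writable_layer": {"used_bytes": {"value": 4096}},
}

metrics = dict(flatten(sample))
for name, value in sorted(metrics.items()):
    print(name, value)
```

Each path from the root to a numeric leaf becomes one metric name, which is why the names in AppOptics track the structure of the API response.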
Setup
The cri plugin is included with the SolarWinds Snap Agent by default. To enable the plugin for an agent instance, ensure the Prerequisites are met before you Configure the plugin.
Prerequisites
This plugin requires that the CRI socket or named pipe exposing the CRI (and/or containerd) metrics API endpoint is available to the plugin.
Verify that the socket file permissions allow read access for the solarwinds:solarwinds user that runs the SWISnap agent.
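One way to inspect those permissions is sketched below. The `describe_socket_access` helper is a hypothetical name, and the socket path is only one of the Linux defaults listed in this document. The sketch reports both read and write access, because on Linux connecting to a UNIX socket generally also requires write permission on the socket file.

```python
import os
import stat


def describe_socket_access(path: str) -> dict:
    """Summarize file type, mode bits, ownership, and the current user's access."""
    info = os.stat(path)
    return {
        "is_socket": stat.S_ISSOCK(info.st_mode),
        "mode": stat.filemode(info.st_mode),  # for example "srw-rw----"
        "uid": info.st_uid,
        "gid": info.st_gid,
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Assumed path; adjust to the socket used in your environment.
    socket_path = "/run/containerd/containerd.sock"
    if os.path.exists(socket_path):
        print(describe_socket_access(socket_path))
    else:
        print(f"{socket_path} not found")
```

Running the script as the agent user (for example with `sudo -u solarwinds`) shows the access that user actually has.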
The collector autodiscovers the UNIX socket or named pipe. It first attempts to connect to the CRI and/or containerd API using the user-provided address in the task configuration, if one is set. If that connection attempt fails, the collector tries each address in a predefined list, in order, until it connects successfully. The following table lists the addresses that are tried.
Linux | Windows |
---|---|
"<optional-user-provided-socket-path>" | "<optional-user-provided-named-pipe>" |
"unix:///var/run/dockershim.sock" | "npipe:////./pipe/dockershim" |
"unix:///run/containerd/containerd.sock" | "npipe:////./pipe/containerd" |
"unix:///run/crio/crio.sock" | "npipe:////./pipe/crio" |
"unix:///var/run/frakti.sock" | |
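The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's implementation: `can_connect` and `discover` are hypothetical helper names, only the Linux addresses are covered, and a plain socket connection stands in for the gRPC handshake the plugin performs. The 5-second default mirrors the timeout default shown in the task configuration.

```python
import socket
from typing import Callable, Iterable, Optional

# Fallback addresses for Linux, in the order listed in the table.
DEFAULT_LINUX_ADDRESSES = [
    "unix:///var/run/dockershim.sock",
    "unix:///run/containerd/containerd.sock",
    "unix:///run/crio/crio.sock",
    "unix:///var/run/frakti.sock",
]


def can_connect(address: str, timeout: float = 5.0) -> bool:
    """Return True if a UNIX stream socket at `address` accepts a connection."""
    if not address.startswith("unix://"):
        return False  # named pipes are out of scope for this sketch
    path = address[len("unix://"):]
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Missing file, refused connection, permission error, or timeout.
        return False
    finally:
        sock.close()


def discover(user_address: Optional[str] = None,
             fallbacks: Iterable[str] = DEFAULT_LINUX_ADDRESSES,
             probe: Callable[[str], bool] = can_connect) -> Optional[str]:
    """Try the user-provided address first, then each fallback in order."""
    candidates = ([user_address] if user_address else []) + list(fallbacks)
    for address in candidates:
        if probe(address):
            return address
    return None


if __name__ == "__main__":
    print(discover() or "no CRI/containerd socket found")
```

Passing `probe` as a parameter keeps the ordering logic separate from the connection check, so the order can be exercised without a running container runtime.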
This plugin fully supports CRIs that implement any of the following protocols:
- CRI v1
- CRI v1alpha2
- containerd (both cgroups v1 and cgroups v2)
To monitor the Docker container runtime, refer to the Docker integration plugin instructions.
Configure the plugin
Included with the agent is an example v2 task manifest file with a predefined task configuration that helps you get started collecting Kubernetes metrics. When the task configuration is enabled, the Snap Agent loads the plugin when needed.
- To enable the task, make a copy of the example file and name it task-cri.yaml:
    sudo cp -p /opt/SolarWinds/Snap/etc/tasks-autoload.d/task-cri.yaml.example /opt/SolarWinds/Snap/etc/tasks-autoload.d/task-cri.yaml
- Edit the /opt/SolarWinds/Snap/etc/tasks-autoload.d/task-cri.yaml v2 task manifest file to match your custom settings (optional):
    ---
    version: 2
    schedule:
      type: cron
      interval: "0 * * * * *"
    plugins:
      - plugin_name: cri
        config:
          ## CRI API address
          ## Defaults to autodiscovery based on the following lists (in order of connection trial-and-error):
          ## for linux:
          # address: "unix:///var/run/dockershim.sock"
          # address: "unix:///run/containerd/containerd.sock"
          # address: "unix:///run/crio/crio.sock"
          # address: "unix:///var/run/frakti.sock"
          ## for windows:
          # address: "npipe:////./pipe/dockershim"
          # address: "npipe:////./pipe/containerd"
          # address: "npipe:////./pipe/crio"
          ## CRI API connection timeout
          ## Defaults to 5s
          # timeout: 5s
          ## CRI API connection insecure mode
          ## Defaults to false
          # insecure: false
        # metrics:
        ## optionally filter metrics here
        publish:
          - plugin_name: publisher-appoptics
- To apply the changes, restart the agent.
  On Linux, run:
    sudo service swisnapd restart
  On Windows, run the following from the command line:
    net stop swisnapd
    net start swisnapd
- Enable the CRI plugin in AppOptics:
  On the Integrations Page you will see the CRI plugin available. It may take a couple of minutes before the cri integration is identified. Select CRI + containerd to open the configuration menu in AppOptics, and enable it. If you do not see CRI + containerd as an option, see Troubleshooting Linux or Troubleshooting Windows.
  The cri and/or containerd metrics will soon be reported to your dashboard.
Metrics and Tags
Default Metric Tags
All metrics are tagged by default with some additional information that helps to identify the origin of the values:
- CRI metadata:
  - version
  - runtime
  - runtime-version
  - runtime-api-version
- container meta-information:
  - state
  - id
  - image
All Metrics
The metrics namespaces shown in AppOptics match the API endpoints' names in the CRI/containerd stats API, with the exception of the containerd blkio statistics. Containerd blkio statistics are modified to use simplified metrics namespaces for better readability in AppOptics.
All metrics are float64 values. Use the form /cri/container/... to filter or request the metrics. They can also be queried for a particular container by the <id> that is passed as one of the tags.
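To make the filtering idea concrete, the sketch below selects metrics by namespace and by the container `id` tag. It is only an illustration: `to_dotted` and `select` are hypothetical helpers, the data points are invented, and the conversion from the slash form to the dotted names used in the tables is an assumption about how the two notations relate.

```python
from typing import Dict, Iterable, List


def to_dotted(namespace: str) -> str:
    """Convert a slash-form namespace such as /cri/container/cpu to dotted form."""
    return namespace.strip("/").replace("/", ".")


def select(metrics: Iterable[Dict], namespace: str, container_id: str = "") -> List[Dict]:
    """Keep metrics under `namespace`, optionally restricted to one container id tag."""
    prefix = to_dotted(namespace)
    selected = []
    for metric in metrics:
        name = metric["name"]
        in_namespace = name == prefix or name.startswith(prefix + ".")
        id_matches = not container_id or metric["tags"].get("id") == container_id
        if in_namespace and id_matches:
            selected.append(metric)
    return selected


# Hypothetical data points: names come from the metric tables, ids and values are made up.
points = [
    {"name": "cri.container.cpu.usage_core_nano_seconds.value", "value": 1.5e9,
     "tags": {"id": "abc123", "state": "running"}},
    {"name": "cri.container.memory.timestamp", "value": 1.7e18,
     "tags": {"id": "abc123", "state": "running"}},
    {"name": "cri.container.cpu.usage_core_nano_seconds.value", "value": 2.0e8,
     "tags": {"id": "def456", "state": "running"}},
]

for point in select(points, "/cri/container/cpu", container_id="abc123"):
    print(point["name"], point["value"])
```

Matching on the prefix followed by a dot avoids accidentally selecting a sibling namespace that merely shares the same leading characters.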
CRI metrics
All available CRI metrics are listed in the table below.
CRI metrics |
---|
cri.container.attributes.annotations.io.kubernetes.container.restartcount |
cri.container.attributes.annotations.io.kubernetes.pod.terminationgraceperiod |
cri.container.cpu.timestamp |
cri.container.cpu.usage_core_nano_seconds.value |
cri.container.memory.timestamp |
cri.container.uptime |
cri.container.writable_layer.timestamp |
cri.container.writable_layer.used_bytes.value |
Additional containerd metrics
The additional containerd metrics are listed in the table below.
containerd metrics |
---|
cri.container.attributes.metadata.attempt |
cri.container.blkio.io_merged_recursive.[device].async |
cri.container.blkio.io_merged_recursive.[device].read |
cri.container.blkio.io_merged_recursive.[device].sync |
cri.container.blkio.io_merged_recursive.[device].total |
cri.container.blkio.io_merged_recursive.[device].write |
cri.container.blkio.io_queued_recursive.[device].async |
cri.container.blkio.io_queued_recursive.[device].read |
cri.container.blkio.io_queued_recursive.[device].sync |
cri.container.blkio.io_queued_recursive.[device].total |
cri.container.blkio.io_queued_recursive.[device].write |
cri.container.blkio.io_service_bytes_recursive.[device].async |
cri.container.blkio.io_service_bytes_recursive.[device].read |
cri.container.blkio.io_service_bytes_recursive.[device].sync |
cri.container.blkio.io_service_bytes_recursive.[device].total |
cri.container.blkio.io_service_bytes_recursive.[device].write |
cri.container.blkio.io_service_time_recursive.[device].async |
cri.container.blkio.io_service_time_recursive.[device].read |
cri.container.blkio.io_service_time_recursive.[device].sync |
cri.container.blkio.io_service_time_recursive.[device].total |
cri.container.blkio.io_service_time_recursive.[device].write |
cri.container.blkio.io_serviced_recursive.[device].async |
cri.container.blkio.io_serviced_recursive.[device].read |
cri.container.blkio.io_serviced_recursive.[device].sync |
cri.container.blkio.io_serviced_recursive.[device].total |
cri.container.blkio.io_serviced_recursive.[device].write |
cri.container.blkio.io_time_recursive.0.major |
cri.container.blkio.io_time_recursive.0.value |
cri.container.blkio.io_wait_time_recursive.[device].async |
cri.container.blkio.io_wait_time_recursive.[device].read |
cri.container.blkio.io_wait_time_recursive.[device].sync |
cri.container.blkio.io_wait_time_recursive.[device].total |
cri.container.blkio.io_wait_time_recursive.[device].write |
cri.container.blkio.sectors_recursive.0.major |
cri.container.blkio.sectors_recursive.0.value |
cri.container.created_at |
cri.container.exit_code |
cri.container.finished_at |
cri.container.memory.working_set_bytes.value |
cri.container.pids.current |
cri.container.started_at |
cri.container.writable_layer.inodes_used.value |