System Metrics
CloudLinux limits the number of system users that can read special files from /proc. This can prevent some metrics from being gathered. Refer to the CloudLinux OS documentation for instructions on how to grant access to the solarwinds group.
Overview
By default, the SolarWinds Snap Agent collects basic metrics from the host, such as CPU, memory, disk, and network utilization. There should be no need to modify the default configuration.
Setup
System monitoring is performed by the system plugin. If you need to modify its default settings, follow the instructions below.
Configuration
The system task configuration is placed in the following location:
-
On Windows:
C:\Program Data\SolarWinds\Snap\tasks-autoload.d\task-aosystem.yaml
On the Windows Server family, metrics under /system/io/ are available only after you enable them by running the command diskperf -y in a command prompt. For more information, refer to the psutil disk I/O counters documentation.
-
On Linux:
/opt/SolarWinds/Snap/etc/tasks-autoload.d/task-aosystem.yaml
To make changes to default setup:
-
Edit the task file with settings specific to your use case:
---
version: 2

schedule:
  type: cron
  interval: "0 * * * * *"

plugins:
  - plugin_name: aosystem

    config:
      ## mount_points allows filtering mount points that will be monitored.
      ## The default behavior is to monitor only physical devices (ie. hard disks, USB, etc.).
      ## To enable monitoring of all devices, use '*'.
      # mount_points:
      #   - /dev
      #   - /run
      #   - C:

      ## exclude_disks allows to define which mount points should not be monitored.
      ## Basic globbing patterns are supported.
      # exclude_disks:
      #   - /dev/loop*
      #   - /dev/sdb1
      #   - D:

      ## Timeout for collecting system metrics. Default value is 20s.
      # system_query_timeout: "20s"

    metrics:
      - /system/cpu/guest
      - /system/cpu/idle
      - /system/cpu/interrupt
      - /system/cpu/iowait
      - /system/cpu/steal
      - /system/cpu/system
      - /system/cpu/user
      - /system/cpu/utilization
      - /system/cpu/per_cpu/[cpu]/guest
      - /system/cpu/per_cpu/[cpu]/idle
      - /system/cpu/per_cpu/[cpu]/interrupt
      - /system/cpu/per_cpu/[cpu]/iowait
      - /system/cpu/per_cpu/[cpu]/steal
      - /system/cpu/per_cpu/[cpu]/system
      - /system/cpu/per_cpu/[cpu]/user
      - /system/cpu/per_cpu/[cpu]/utilization
      - /system/disk/[mount_point]/bytes/free
      - /system/disk/[mount_point]/bytes/total
      - /system/disk/[mount_point]/bytes/used
      - /system/disk/[mount_point]/percent/free
      - /system/disk/[mount_point]/percent/used
      - /system/io/[mount_point]/bytes/read
      - /system/io/[mount_point]/bytes/write
      - /system/io/[mount_point]/io_time
      - /system/io/[mount_point]/io_weighted_time
      - /system/io/[mount_point]/ops/read
      - /system/io/[mount_point]/ops/write
      - /system/io/[mount_point]/time/read
      - /system/io/[mount_point]/time/write
      - /system/load/15
      - /system/load/15_rel
      - /system/load/1
      - /system/load/1_rel
      - /system/load/5
      - /system/load/5_rel
      - /system/load/procs_blocked
      - /system/load/procs_running
      - /system/mem/buffered
      - /system/mem/cached
      - /system/mem/free
      - /system/mem/inactive
      - /system/mem/nonpaged
      - /system/mem/paged
      - /system/mem/percent/free
      - /system/mem/percent/used
      - /system/mem/total
      - /system/mem/used
      - /system/mem/wired
      - /system/net/all/bytes/rx
      - /system/net/all/bytes/tx
      - /system/net/all/drop/rx
      - /system/net/all/drop/tx
      - /system/net/all/errors/rx
      - /system/net/all/errors/tx
      - /system/net/all/packets/rx
      - /system/net/all/packets/tx
      - /system/swap/ins
      - /system/swap/outs
      - /system/swap/page/fault
      - /system/swap/page/ins
      - /system/swap/page/outs
      - /system/swap/percent/free
      - /system/swap/percent/used
      - /system/swap/total

      ## For backwards compatibility
      - /system/net/bytes/rx
      - /system/net/bytes/tx
      - /system/net/drop/rx
      - /system/net/drop/tx
      - /system/net/errors/rx
      - /system/net/errors/tx
      - /system/net/packets/rx
      - /system/net/packets/tx

      ## optional metrics
      # - /system/net/interface/[interface]/bytes/rx
      # - /system/net/interface/[interface]/bytes/tx
      # - /system/net/interface/[interface]/drop/rx
      # - /system/net/interface/[interface]/drop/tx
      # - /system/net/interface/[interface]/errors/rx
      # - /system/net/interface/[interface]/errors/tx
      # - /system/net/interface/[interface]/packets/rx
      # - /system/net/interface/[interface]/packets/tx
      # - /system/io/[mount_point]/io_merged/read
      # - /system/io/[mount_point]/io_merged/write

    publish:
      - plugin_name: publisher-appoptics
-
Restart the agent:
On Windows command line:
net stop swisnapd
net start swisnapd
On Linux command line:
sudo service swisnapd restart
Testing Integration
To check whether metrics can be collected with a given configuration, and which ones, run the system plugin in debug mode:
On Windows command line:
"C:\Program Files\SolarWinds\Snap\bin\snap-plugin-collector-aosystem.exe" --debug-mode --plugin-config "{}"
On Linux command line:
/opt/SolarWinds/Snap/bin/snap-plugin-collector-aosystem --debug-mode --plugin-config "{}"
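If the debug run needs to be scripted across hosts, the two commands above can be assembled from one place. The following is only a sketch: it assumes the agent is installed at the default paths shown above, and `debug_command` is a hypothetical helper name, not part of the agent.

```python
import platform

# Default install paths taken from the commands above; adjust if the agent
# was installed elsewhere (assumption: default install locations).
BINARIES = {
    "Windows": r"C:\Program Files\SolarWinds\Snap\bin\snap-plugin-collector-aosystem.exe",
    "Linux": "/opt/SolarWinds/Snap/bin/snap-plugin-collector-aosystem",
}


def debug_command(system=None):
    """Return the debug-mode command as an argument list for the given OS."""
    system = system or platform.system()
    if system not in BINARIES:
        raise ValueError("no known install path for %r" % system)
    # "{}" is the empty plugin configuration used in the commands above.
    return [BINARIES[system], "--debug-mode", "--plugin-config", "{}"]


print(" ".join(debug_command("Linux")))
```

The returned list can be passed to `subprocess.run` on the host where the agent is installed.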
Metrics and Tags
The tables below list the default set, and optional extended set, of system metrics collected by the SolarWinds Snap Agent.
CPU Metrics
Metric | Description |
---|---|
system.cpu.guest | Time spent in guest mode by all CPUs (only for Linux) |
system.cpu.idle | Time spent in the idle task by all CPUs. This value should be USER_HZ times the second entry in the /proc/uptime pseudo-file |
system.cpu.interrupt | Time servicing interrupts by all CPUs |
system.cpu.iowait | Time waiting for I/O to complete by all CPUs (only for Linux) |
system.cpu.steal | Stolen time, which is the time spent in other operating systems when running in a virtualized environment by all CPUs (only for Linux) |
system.cpu.system | Time spent in system mode by all CPUs |
system.cpu.user | Time spent in user mode by all CPUs |
system.cpu.utilization | Total CPU utilization |
system.per_cpu.guest | Time spent in guest mode |
system.per_cpu.idle | Time spent in the idle task. This value should be USER_HZ times the second entry in the /proc/uptime pseudo-file |
system.per_cpu.interrupt | Time servicing interrupts |
system.per_cpu.iowait | Time waiting for I/O to complete |
system.per_cpu.steal | Stolen time, which is the time spent in other operating systems when running in a virtualized environment |
system.per_cpu.system | Time spent in system mode |
system.per_cpu.user | Time spent in user mode |
system.per_cpu.utilization | Total CPU utilization |
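The time-based CPU metrics are cumulative counters, so a utilization figure is normally derived from the difference between two samples. The sketch below shows one common way to do that. It is an illustration only: the document does not state how the plugin computes system.cpu.utilization, and treating only idle time as "not busy" is an assumption.

```python
def cpu_utilization(prev, curr):
    """Percent of non-idle time between two cumulative counter samples.

    Both arguments map a CPU state name to its cumulative time. Guest time
    is left out of the samples because Linux already counts it in user time.
    """
    deltas = {state: curr[state] - prev[state] for state in curr}
    total = sum(deltas.values())
    if total <= 0:
        return 0.0  # no time elapsed between the samples
    busy = total - deltas["idle"]  # assumption: iowait counts as busy
    return 100.0 * busy / total


# Hypothetical counter values for illustration.
prev = {"user": 100, "system": 50, "idle": 800, "iowait": 50}
curr = {"user": 160, "system": 70, "idle": 900, "iowait": 70}
print(cpu_utilization(prev, curr))  # 50.0
```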
CPU Metric Tags
Tag Name | Description |
---|---|
hostname | Name of the host. Instead of using this tag we recommend using the @host alias |
cpu | Number of the core for per_cpu metrics |
Disk Metrics
Metric | Description |
---|---|
system.disk.bytes.free | Free space available to users on the mount point |
system.disk.bytes.total | Total space available to root on the mount point |
system.disk.bytes.used | Used user space on the mount point |
system.disk.percent.free | Percentage of free space, relative to the total amount of space users can use on the mount point |
system.disk.percent.used | Percentage of used space, relative to the total amount of space users can use on the mount point |
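The percent metrics are relative to the space users can use, not to the root-visible total, so the two percentages add up to 100 even when part of the disk is reserved. A minimal sketch of that arithmetic follows. It assumes the user-usable total equals used + free (the convention psutil follows on POSIX systems); the plugin's exact calculation is not confirmed by this document.

```python
def disk_percentages(used, free):
    """Return (percent_used, percent_free) relative to user-usable space."""
    user_total = used + free  # assumption: excludes space reserved for root
    if user_total == 0:
        return 0.0, 0.0
    percent_used = 100.0 * used / user_total
    return percent_used, 100.0 - percent_used


# Hypothetical 100-unit disk: 60 used, 20 free for users, 20 reserved for root.
print(disk_percentages(used=60, free=20))  # (75.0, 25.0)
```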
If you want to control which disks metrics are gathered for, you can use the mount_points and exclude_disks settings in task-aosystem.yaml.
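To get a feel for how exclude_disks patterns select devices, the sketch below applies shell-style globbing to a list of candidate names. It assumes the plugin's "basic globbing" behaves like Python's `fnmatch` patterns, which this document does not confirm, and `is_excluded` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def is_excluded(name, patterns):
    """Return True when a device or mount point matches any exclude pattern."""
    # fnmatchcase keeps matching case-sensitive on every platform.
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Patterns mirror the commented-out exclude_disks example in the task file.
exclude_disks = ["/dev/loop*", "/dev/sdb1", "D:"]
candidates = ["/dev/loop0", "/dev/sda1", "/dev/sdb1", "C:", "D:"]

monitored = [name for name in candidates if not is_excluded(name, exclude_disks)]
print(monitored)  # ['/dev/sda1', 'C:']
```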
Disk Metric Tags
Tag Name | Description |
---|---|
hostname | Name of the host. Instead of using this tag we recommend using the @host alias |
device | Device name |
mount_point | Mount point |
IO Metrics
Metric | Description |
---|---|
system.io.bytes.read | Bytes in read operations on given device |
system.io.bytes.write | Bytes in write operations on given device |
system.io.io_time | Time spent on I/O (ms/s) |
system.io.io_weighted_time | Time spent on I/O multiplied by the I/O queue length |
system.io.ops.read | Number of read operations on given device |
system.io.ops.write | Number of write operations on given device |
system.io.time.read | Cumulative duration of read operations on given device |
system.io.time.write | Cumulative duration of write operations on given device |
IO Metric Tags
Tag Name | Description |
---|---|
hostname | Name of the host. Instead of using this tag we recommend using the @host alias |
device | Device name |
Load Metrics
Metric | Description |
---|---|
system.load.load1 | Load average over the last 1 minute |
system.load.load15 | Load average over the last 15 minutes |
system.load.load5 | Load average over the last 5 minutes |
system.load.load1_rel | Load average over the last 1 minute, normalized to number of cores |
system.load.load15_rel | Load average over the last 15 minutes, normalized to number of cores |
system.load.load5_rel | Load average over the last 5 minutes, normalized to number of cores |
system.load.procs_blocked | The number of processes currently blocked, waiting for I/O to complete |
system.load.procs_running | The number of processes currently running on CPUs |
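The _rel variants divide the load average by the number of cores, which makes hosts of different sizes comparable. The following sketch shows that normalization using the standard library. It illustrates the arithmetic only and is not the plugin's implementation; `relative_load` is a hypothetical helper name.

```python
import os


def relative_load(load, cores):
    """Normalize a load average to the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


try:
    load1, load5, load15 = os.getloadavg()  # not available on Windows
    cores = os.cpu_count() or 1
    print("load1_rel:", relative_load(load1, cores))
except (AttributeError, OSError):
    print("load average is not available on this platform")
```

For example, a 1-minute load of 2.0 on a 4-core host gives a relative load of 0.5.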
Load Metric Tags
Tag Name | Description |
---|---|
hostname | Name of the host. Instead of using this tag we recommend using the @host alias |
Memory Metrics
Metric | Description |
---|---|
system.mem.buffered | Cache for things like file system metadata (bytes) |
system.mem.cached | Cache for various things (bytes) |
system.mem.free | Memory not being used at all (zeroed) that is readily available (bytes); note that this doesn't reflect the actual memory available (use system.mem.available instead). |
system.mem.inactive | Memory that is marked as not used (bytes) |
system.mem.total | Total physical memory available (bytes) |
system.mem.used | Memory used, calculated differently depending on the platform and designed for informational purposes only (bytes) |
system.mem.wired | Memory that is marked to always stay in RAM (bytes). It is never moved to disk |
system.mem.paged | Memory that is used for objects that can be written to disk when they are not being used (only for Windows) |
system.mem.nonpaged | Memory that is used for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated (only for Windows) |
system.mem.percent.free | Percentage of memory that is available |
system.mem.percent.used | Percentage of memory that is not available |
Memory Metric Tags
Tag Name | Description |
---|---|
hostname | Name of the host. Instead of using this tag we recommend using the @host alias |
Network Metrics
Metric | Description |
---|---|
system.net.all.bytes.rx | Number of bytes received |
system.net.all.bytes.tx | Number of bytes sent |
system.net.all.packets.rx | Number of packets received |
system.net.all.packets.tx | Number of packets sent |
system.net.all.drop.rx | Number of packets dropped instead of being received |
system.net.all.drop.tx | Number of packets dropped instead of being sent |
system.net.all.errors.rx | Number of packets errored while receiving |
system.net.all.errors.tx | Number of packets errored while sending |
system.net.bytes.rx | Number of bytes received on given interface |
system.net.bytes.tx | Number of bytes sent on given interface |
system.net.packets.rx | Number of packets received on given interface |
system.net.packets.tx | Number of packets sent on given interface |
system.net.drop.rx | Number of packets dropped instead of being received on given interface |
system.net.drop.tx | Number of packets dropped instead of being sent on given interface |
system.net.errors.rx | Number of packets errored while receiving on given interface |
system.net.errors.tx | Number of packets errored while sending on given interface |
The following interface metrics allow filtering by interface name. If you would like to use them, disable the per-interface metrics listed above.
Metric | Description |
---|---|
system.net.interface.bytes.rx | Number of bytes received on given interface |
system.net.interface.bytes.tx | Number of bytes sent on given interface |
system.net.interface.packets.rx | Number of packets received on given interface |
system.net.interface.packets.tx | Number of packets sent on given interface |
system.net.interface.drop.rx | Number of packets dropped instead of being received on given interface |
system.net.interface.drop.tx | Number of packets dropped instead of being sent on given interface |
system.net.interface.errors.rx | Number of packets errored while receiving on given interface |
system.net.interface.errors.tx | Number of packets errored while sending on given interface |
Network Metric Tags
Tag Name | Description |
---|---|
hostname | Name of the host. Instead of using this tag we recommend using the @host alias |
interface | Interface [1] |
hardware_addr | Hardware address [1] |
mtu | Maximum transmission unit [1] |
[1] Only on system.net.bytes.* and system.net.packets.* metrics.
Swap Metrics
Metric | Description |
---|---|
system.swap.total | Total amount of swap available (bytes) |
system.swap.percent.free | Percentage of swap available |
system.swap.percent.used | Percentage of swap used |
system.swap.ins | Number of kilobytes the system has swapped in from disk per second (only for Linux) |
system.swap.outs | Number of kilobytes the system has swapped out to disk per second (only for Linux) |
system.swap.page.fault | On Linux, number of page faults, the virtual memory statistics. On Windows, the average number of pages faulted per second |
system.swap.page.ins | On Linux, total number of kilobytes the system paged in from disk per second. Note: With old kernels (2.2.x) this value is a number of blocks per second (and not kilobytes). On Windows, the rate at which pages are read from disk to resolve hard page faults |
system.swap.page.outs | On Linux, total number of kilobytes the system paged out to disk per second. Note: With old kernels (2.2.x) this value is a number of blocks per second (and not kilobytes). On Windows, the rate at which pages are written to disk to free up space in physical memory |
Swap Metric Tags
Tag Name | Description |
---|---|
hostname | Name of the host. Instead of using this tag we recommend using the @host alias |
Optional Metrics
Optional metrics can be activated by editing the task YAML file. For more information, read the SolarWinds Snap Agent Task File article.
Metric | Description |
---|---|
system.cpu.guest_nice | Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel) |
system.cpu.nice | Time spent in user mode with low priority (nice) |
system.cpu.softirq | Time spent servicing softirqs |
system.cpu.stolen | CPU cycles that are reclaimed by a virtual machine's hypervisor because it reached maximum processing capacity performing other tasks. |
system.mem.active | Memory currently in use or very recently used, and so it is in RAM |
system.mem.available | On both Linux and Windows, the actual amount of available memory (bytes) that can be given instantly to processes that request more memory; this is calculated by summing different memory values depending on the platform (e.g. free + buffers + cached on Linux) and is intended for monitoring actual memory usage in a cross-platform fashion |
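The "free + buffers + cached" example for system.mem.available can be reproduced by hand from /proc/meminfo-style text. The sketch below does so on a made-up sample. It illustrates the sum named in the table only; the plugin may combine different fields, and `available_estimate` is a hypothetical helper name.

```python
def available_estimate(meminfo_text):
    """Sum MemFree, Buffers and Cached from /proc/meminfo-style text (bytes)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # /proc/meminfo reports these fields in kB
        fields[name.strip()] = value
    return fields["MemFree"] + fields["Buffers"] + fields["Cached"]


# Made-up sample values for illustration.
SAMPLE = """MemTotal:       8000000 kB
MemFree:        1000000 kB
Buffers:         200000 kB
Cached:         1800000 kB
"""
print(available_estimate(SAMPLE))  # 3072000000
```

On a Linux host, the same function can be fed the contents of /proc/meminfo.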
Optional Metric Tags
Tag Name | Description |
---|---|
hostname | Name of the host. Instead of using this tag we recommend using the @host alias |
cpu | (only on system.cpu.* metrics) CPU core number or total |
Troubleshooting
Timeout For Querying System Statistics
If you encounter timeouts while collecting system metrics (such as disk metrics), you can adjust the system_query_timeout setting. By default, it is set to 20s.