Documentation for Server & Application Monitor

AIX

This template assesses the performance of the AIX operating system installed on the target server by running Perl scripts that query performance statistics. It supports all versions of IBM AIX. SolarWinds recommends installing and using NET-SNMP for monitoring. For details, visit the SolarWinds Success Center and see How to configure SNMP on Linux and Unix.

Prerequisite

SSH and Perl installed on the target server

Credentials

Root credentials on the target server

Port

Use port 1161 for the template.

Component monitors

Set thresholds for counters according to your environment. SolarWinds recommends monitoring counters for a period of time to learn their typical value ranges, and then setting the thresholds accordingly. See the Server & Application Monitor documentation on thresholds to learn more.
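As one way to turn such a baseline into thresholds, the following Python sketch suggests warning and critical values from samples collected while monitoring. It assumes a "mean plus or minus k standard deviations" heuristic; the function name, the sigma multipliers, and the sample values are hypothetical and are not a SolarWinds-defined method.

```python
import statistics


def suggest_thresholds(samples, warn_sigma=2.0, crit_sigma=3.0, higher_is_worse=True):
    """Suggest (warning, critical) thresholds from baseline samples.

    Hypothetical helper: uses the mean plus or minus k standard deviations,
    one common baselining heuristic rather than a SolarWinds-defined rule.
    """
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    # Counters such as Idle or Free_Memory are bad when LOW, so subtract instead.
    sign = 1 if higher_is_worse else -1
    return mean + sign * warn_sigma * stdev, mean + sign * crit_sigma * stdev


# Made-up baseline of context switches/sec, for illustration only.
warning, critical = suggest_thresholds([10_000, 12_000, 11_000, 13_000, 9_000])
```

For counters where the guidance is to "use the highest threshold possible," such as Idle or Free_Memory, pass higher_is_worse=False so the thresholds fall below the baseline mean.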

CPU statistic (%)

Returns the percentage of CPU time spent in each state. The returned values are as follows:

  • User – The percentage of CPU time spent running non-kernel code (user time). This value depends on the programs that users are running. It is recommended to use the lowest threshold possible.
  • System – The percentage of CPU time spent running kernel code (system time). It is recommended to use the lowest threshold possible.
  • Wait – The percentage of CPU time spent waiting for I/O. It is recommended to use the lowest threshold possible.
  • Idle – The percentage of CPU time spent idle. It is recommended to use the highest threshold possible at all times.
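The four values are shares of the same sampling interval. The following Python sketch shows one way such percentages can be derived from two snapshots of cumulative CPU tick counters. The CpuTicks structure and the tick values are hypothetical; the template's own Perl scripts may collect these numbers differently.

```python
from dataclasses import dataclass


@dataclass
class CpuTicks:
    """Cumulative CPU ticks per state (hypothetical counter source)."""
    user: int
    system: int
    wait: int
    idle: int


def cpu_percentages(before: CpuTicks, after: CpuTicks) -> dict:
    """Return the share of elapsed ticks spent in each state, in percent."""
    deltas = {
        "User": after.user - before.user,
        "System": after.system - before.system,
        "Wait": after.wait - before.wait,
        "Idle": after.idle - before.idle,
    }
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between samples")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


pct = cpu_percentages(CpuTicks(1000, 500, 100, 8400), CpuTicks(1300, 600, 100, 9000))
```

Here 1,000 ticks elapsed, 300 of them in user code, so User is 30%.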

System faults statistic/sec

Returns the rate of system faults, per second. The returned values are as follows:

  • Interrupts – The number of interrupts per second. The appropriate threshold depends on the processor; for modern CPUs, a threshold of 1,500 interrupts/sec is acceptable. A dramatic increase in this value without a corresponding increase in system activity can indicate a hardware problem.
  • System_Calls – The number of system calls per second. This measures how busy the system is handling applications and services; a high rate indicates high utilization caused by software. On modern CPUs, 20,000 system calls/sec is a reasonable threshold.
  • Context_Switches – The number of context switches per second. High rates can result from inefficient hardware or poorly designed applications. The normal rate depends on your servers and applications. Because the threshold for Context_Switches is cumulative across all processors, set it to at least 14,000 per processor (single=14,000, dual=28,000, quad=56,000, and so forth).
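The following Python sketch applies the thresholds given above, scaling the context-switch threshold by the number of processors. The constant and function names are hypothetical; the numbers come from the text.

```python
INTERRUPTS_THRESHOLD = 1_500         # interrupts/sec, modern CPUs
SYSTEM_CALLS_THRESHOLD = 20_000      # system calls/sec, modern CPUs
CONTEXT_SWITCHES_PER_CPU = 14_000    # minimum per processor; threshold is cumulative


def context_switch_threshold(num_cpus: int) -> int:
    """Cumulative context-switch threshold for all processors."""
    if num_cpus < 1:
        raise ValueError("need at least one processor")
    return CONTEXT_SWITCHES_PER_CPU * num_cpus


def faults_over_threshold(interrupts, system_calls, context_switches, num_cpus):
    """Return True for each system-fault value that exceeds its threshold."""
    return {
        "Interrupts": interrupts > INTERRUPTS_THRESHOLD,
        "System_Calls": system_calls > SYSTEM_CALLS_THRESHOLD,
        "Context_Switches": context_switches > context_switch_threshold(num_cpus),
    }
```

For a quad-processor server, context_switch_threshold(4) returns 56,000, matching the "quad" figure above.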

Kernel threads statistic

Returns the number of kernel threads in different states. The returned values are as follows:

  • In_Run_Queue – This component returns the average number of runnable kernel threads over the sampling interval. This should be as low as possible. If the run queue is constantly growing, it may indicate the need for a more powerful CPU or more CPUs. Set the thresholds appropriately for your environment.
  • Waiting_For_resources – This component returns the average number of kernel threads placed in the VMM wait queue (awaiting resource, awaiting input/output) over the sampling interval. This should be as low as possible. Set the thresholds appropriately for your environment.
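To check whether the run queue is "constantly growing," one option is to fit a trend line to recent In_Run_Queue samples. The following Python sketch uses an ordinary least-squares slope; the min_slope cutoff and the function names are hypothetical and should be tuned to your environment.

```python
def run_queue_slope(samples):
    """Least-squares slope of run-queue samples, per sampling interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2                  # x values are 0, 1, ..., n-1
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def run_queue_growing(samples, min_slope=0.1):
    """True if the run queue trends upward faster than min_slope per interval."""
    return run_queue_slope(samples) > min_slope
```

A slope fit is less sensitive to a single spike than comparing only the first and last samples.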

Memory and Swap statistic (MB)

Returns memory and swap statistics in MB. The returned values are as follows:

  • Free_Memory – The amount of available memory in MB. Use the highest threshold possible at all times. Set the thresholds appropriately for your environment.
  • Used_Memory – The amount of used memory in MB. Use the lowest threshold possible.
  • Free_Swap – The amount of available swap in MB. Use the highest threshold possible at all times. Set the thresholds appropriately for your environment.
  • Used_Swap – The amount of used swap in MB. Use the lowest threshold possible.
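The following Python sketch derives the four values from raw page counters. It assumes the counters are reported in 4 KB pages; that page size, the function names, and the sample figures are assumptions for illustration.

```python
PAGE_SIZE_KB = 4  # assumption: raw counters are in 4 KB pages


def pages_to_mb(pages: int) -> float:
    """Convert a page count to MB."""
    return pages * PAGE_SIZE_KB / 1024


def memory_stats(total_pages, free_pages, swap_total_pages, swap_free_pages):
    """Build the four returned values; "used" is total minus free."""
    return {
        "Free_Memory": pages_to_mb(free_pages),
        "Used_Memory": pages_to_mb(total_pages - free_pages),
        "Free_Swap": pages_to_mb(swap_free_pages),
        "Used_Swap": pages_to_mb(swap_total_pages - swap_free_pages),
    }


# 1 GB of memory with 256 MB free; 512 MB of swap with 10 MB used.
stats = memory_stats(262_144, 65_536, 131_072, 128_512)
```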

Paging statistic/sec

Returns the different paging statistics. The returned values are as follows:

  • Page_Faults – The number of page faults per second. Some page faults are resolved without I/O, so this is not a count of page faults that generate I/O. Use the lowest threshold possible.
  • Paged_In – The rate, in kB per second, at which pages are paged in from paging space. Reading one inactive page or a cluster of inactive memory pages from disk is called a "page in." Use the lowest threshold possible.
  • Paged_Out – The rate, in kB per second, at which pages are paged out to paging space. Writing one inactive page or a cluster of inactive memory pages to disk is called a "page out." Use the lowest threshold possible. Values above roughly 20 pages (80 kB) per second indicate a significant performance problem; in this situation, install more memory.
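The text equates 20 pages with 80 kB, which implies a 4 kB page. The following Python sketch uses that ratio to convert a Paged_Out rate in kB per second to pages per second and check it against the 20-page guideline. The names are hypothetical.

```python
PAGE_SIZE_KB = 4             # implied by "20 pages (80 kB)"
PAGE_OUT_LIMIT_PAGES = 20    # pages/sec above which paging is a problem


def kb_to_pages(kb_per_sec: float) -> float:
    """Convert a kB/sec paging rate to pages/sec."""
    return kb_per_sec / PAGE_SIZE_KB


def page_out_problem(paged_out_kb_per_sec: float) -> bool:
    """True if the Paged_Out rate exceeds roughly 20 pages per second."""
    return kb_to_pages(paged_out_kb_per_sec) > PAGE_OUT_LIMIT_PAGES
```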

Processes in different states

Returns the number of processes in each state. The returned values are as follows:

  • Zombie – The number of processes that have terminated but whose parent has not yet collected their exit status. This should always be zero. A zombie process cannot be killed directly; if the value is not zero, identify the parent process and fix or stop it, after which the zombies are reaped. Use the following command to see zombie processes: ps -ef | grep defunct.
  • Active – The number of processes that are on the run queue.
  • Swapped – The number of processes that are currently in swap.
  • Idle – The number of processes that are idle (waiting for startup).
  • Canceled – The number of processes that were canceled.
  • Stopped – The number of processes that are stopped, either by a job control signal or because they are being traced.
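The following Python sketch counts zombie processes from ps -ef output, the same source the grep command in the Zombie entry uses. It matches the literal <defunct> marker rather than the bare word defunct; this assumes ps prints zombies with that marker. The function names and the sample output are hypothetical.

```python
import subprocess


def count_defunct(ps_output: str) -> int:
    """Count lines of `ps -ef` output that contain the <defunct> marker."""
    return sum(1 for line in ps_output.splitlines() if "<defunct>" in line)


def live_zombie_count() -> int:
    """Run `ps -ef` on the local system and count zombie entries."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return count_defunct(result.stdout)


SAMPLE = """\
     UID   PID  PPID   C    STIME    TTY  TIME CMD
    root     1     0   0   Jan 01      -  0:01 /etc/init
    root  4242  1000   0                  0:00 <defunct>
    root  5000  4999   0 10:00:00  pts/0  0:00 grep defunct
"""
```

Matching <defunct> also avoids counting the grep defunct process itself, which a ps -ef | grep defunct pipeline can list.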

Space on root (/) partition (MB)

This monitor returns the available and used space of the root (/) partition in MB. The returned values are as follows:

  • Available_Space – The available space on the root (/) partition in MB. Use the highest threshold possible at all times.
  • Used_Space – The used space on the root (/) partition in MB.
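The following Python sketch computes both values for / with os.statvfs, as df-style tools typically do: available space counts blocks available to unprivileged users (f_bavail), and used space is total blocks minus free blocks. The function name is hypothetical, and the template's Perl script may calculate these values differently.

```python
import os

MB = 1024 * 1024


def root_space_mb(path: str = "/") -> dict:
    """Return available and used space of the filesystem holding `path`, in MB."""
    st = os.statvfs(path)
    return {
        # Blocks available to non-root users, times the fragment size.
        "Available_Space": st.f_bavail * st.f_frsize / MB,
        # Total blocks minus free blocks (including root-reserved ones).
        "Used_Space": (st.f_blocks - st.f_bfree) * st.f_frsize / MB,
    }
```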

Percentage of using system devices

Returns the name of the system device and the percentage of time the device was busy servicing a transfer request.

After applying this template on the target node, navigate to the Edit Application Page and click Get Script Output in the Script section. This will build the list of system devices that should be monitored.
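The following Python sketch shows the arithmetic behind a busy percentage and one way to print per-device results. It assumes the SAM script-monitor convention of paired Message.<name> and Statistic.<name> output lines, which lets Get Script Output discover one entry per device. The function names, the busy-time inputs, and the device names (hdisk0, hdisk1) are illustrative.

```python
def busy_percent(busy_ms: float, interval_ms: float) -> float:
    """Percentage of the interval the device spent servicing transfers."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return min(100.0, 100.0 * busy_ms / interval_ms)  # clamp rounding overshoot


def format_script_output(busy_by_device: dict) -> str:
    """Emit a Message/Statistic pair per device, sorted by device name."""
    lines = []
    for device, pct in sorted(busy_by_device.items()):
        lines.append(f"Message.{device}: {device}")
        lines.append(f"Statistic.{device}: {pct:.1f}")
    return "\n".join(lines)


output = format_script_output({"hdisk1": busy_percent(50, 1000), "hdisk0": busy_percent(250, 1000)})
```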

Disk operations/sec of system devices

Returns the name of each system device and the number of read/write transfers per second to or from the device.

After applying this template on the target node, navigate to the Edit Application Page and click Get Script Output in the Script section. This will build the list of system devices that should be monitored.
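Operations per second is the change in a device's cumulative read and write counts divided by the sampling interval. The following Python sketch assumes hypothetical per-device (reads, writes) counter snapshots; it is not necessarily how the template's Perl script gathers them.

```python
def transfers_per_sec(before: dict, after: dict, interval_s: float) -> dict:
    """Return read+write transfers per second for each device in both snapshots.

    `before` and `after` map a device name to a hypothetical (reads, writes)
    tuple of cumulative transfer counts.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    rates = {}
    for device, (reads, writes) in after.items():
        if device not in before:
            continue  # device appeared mid-interval; no baseline to diff against
        prev_reads, prev_writes = before[device]
        rates[device] = ((reads - prev_reads) + (writes - prev_writes)) / interval_s
    return rates
```

For example, 60 reads and 40 writes over a 10-second interval give 10 transfers per second.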

Top 10 active processes

Returns the top 10 active processes and each one's share of CPU usage, in percent.
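The following Python sketch selects the 10 processes with the highest CPU share using heapq.nlargest, which avoids sorting the full process list. The Proc record and its fields are hypothetical stand-ins for whatever ps reports.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Proc(NamedTuple):
    pid: int
    command: str
    cpu_percent: float


def top_processes(procs: Iterable[Proc], n: int = 10) -> List[Proc]:
    """Return the n processes with the highest CPU share, highest first."""
    return heapq.nlargest(n, procs, key=lambda p: p.cpu_percent)


procs = [Proc(pid, f"proc{pid}", float(pid)) for pid in range(15)]
top = top_processes(procs)
```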