Microsoft Windows Server 2012 Failover Cluster
This template assesses the status and overall performance of a Microsoft Windows Server 2012 Failover Cluster by retrieving information from performance counters and the Windows System Event Log.
Prerequisites
WMI access to the target server.
Credentials
Windows Administrator on the target server.
All Windows Event Log monitors should return zero. Any value other than zero indicates an abnormality. Examine the Windows System Event Log for details about the issue.
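The zero-value rule above can be sketched as a small check. This is only an illustration: the function name and the monitor names in the usage note are hypothetical, and the input is assumed to be a mapping of monitor name to the value it returned.

```python
def abnormal_event_log_monitors(results):
    """Return the names of event log monitors whose returned value is not zero.

    `results` is an assumed mapping of monitor name -> returned value;
    it is not a SAM data structure.
    """
    return sorted(name for name, value in results.items() if value != 0)
```

For example, calling it with {"Example event monitor A": 0, "Example event monitor B": 2} returns a list containing only "Example event monitor B", which is the monitor to investigate in the System Event Log.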
Component monitors
For an overview of SAM application monitor templates and component monitors, see the SAM documentation. SAM API Poller templates are also available.
You need to set thresholds for counters according to your environment. Monitor the counters for a period of time to learn their typical value ranges, and then set the thresholds accordingly.
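One possible way to turn a baseline period into starting thresholds is sketched below. It assumes you have exported the observed counter values as a list of numbers. The function name and the "mean plus a multiple of the standard deviation" rule are illustrative assumptions, not a SAM feature or a recommendation from the template.

```python
import statistics


def suggest_thresholds(samples, warn_sigma=2.0, crit_sigma=3.0):
    """Suggest warning and critical thresholds from baseline samples.

    Assumption: a value more than `warn_sigma` (or `crit_sigma`) sample
    standard deviations above the baseline mean is worth flagging.
    """
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    return {
        "warning": mean + warn_sigma * stdev,
        "critical": mean + crit_sigma * stdev,
    }


# Hypothetical baseline values collected during normal operation.
baseline = [10, 12, 11, 13, 14]
thresholds = suggest_thresholds(baseline)
```

Treat the result as a starting point only. Counters that normally sit at zero, or that have a skewed distribution, may need thresholds chosen by hand.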
Service: Windows Time
This monitor returns the CPU and memory usage of the Windows Time service. This service maintains date and time synchronization on all clients and servers in the network. If this service is stopped, date and time synchronization will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.
Service: Cluster Service
This monitor returns the CPU and memory usage of the Cluster service. This service enables servers to work together as a cluster to keep server-based applications highly available, regardless of individual component failures. If this service is stopped, clustering will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.
Network Reconnections: Reconnect Count
This monitor returns the number of times the nodes have reconnected.
The instance field is installation-specific. You need to specify the hostname of your cluster node (for example: node1). By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.
Network Reconnections: Normal Message Queue Length
This monitor returns the number of normal messages that are in the queue waiting to be sent. Normally this number is 0, but if the TCP connection breaks, you might observe it going up until the TCP connection is re-established, thereby allowing all messages to be sent.
The instance field is installation-specific. You need to specify the hostname of your cluster node (for example: node1). By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.
Network Reconnections: Urgent Message Queue Length
This monitor returns the number of urgent messages that are in the queue waiting to be sent. Normally this number is 0, but if the TCP connection breaks, you might observe it going up until the TCP connection is re-established, thereby allowing all messages to be sent.
The instance field is installation-specific. You need to specify the hostname of your cluster node (for example: node1). By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.
Messages Outstanding
This monitor returns the number of outstanding cluster multicast request-response (MRR) messages. The returned value should be near zero.
Resource Control Manager: Groups Online
This monitor returns the number of online cluster resource groups on this node. The returned value should be above zero at all times.
Resource Control Manager: RHS Processes
This monitor returns the number of running resource host subsystem processes (rhs.exe). The returned value should be above zero at all times.
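The "above zero at all times" expectation for Groups Online and RHS Processes can be sketched as a simple check. The function name and the input format (a mapping of monitor name to its latest polled value) are hypothetical and only illustrate the rule.

```python
# Monitors that, per the descriptions above, should always return a value above zero.
MUST_BE_POSITIVE = ("Groups Online", "RHS Processes")


def zero_value_alerts(polled):
    """Return alert messages for monitors that are missing or not above zero.

    `polled` is an assumed mapping of monitor name -> latest returned value.
    """
    alerts = []
    for name in MUST_BE_POSITIVE:
        value = polled.get(name)
        if value is None:
            alerts.append(f"{name}: no data")
        elif value <= 0:
            alerts.append(f"{name}: returned {value}, expected above zero")
    return alerts
```

With polled values of 3 for Groups Online and 0 for RHS Processes, the function reports only the RHS Processes monitor.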
Resource Control Manager: RHS Restarts
This monitor returns the number of resource host subsystem process (rhs.exe) restarts.
By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.
Resources: Resource Failure
This monitor returns the number of resource failures. The returned value should be as low as possible.
Resources: Resource Failure Access Violation
This monitor returns the number of resource failures caused by access violations. The returned value should be as low as possible.
By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.
Resources: Resource Failure Deadlock
This monitor returns the number of resource failures caused by deadlocks. Deadlocks are usually caused by the resource taking too long to execute certain operations. The returned value should be as low as possible.
By default, this component monitor is disabled and should only be enabled for troubleshooting purposes.