Microsoft Lync Server 2013 (Front-End Role)
This SAM template assesses the status and overall health of services, as well as the performance, of the Microsoft Lync Server 2013 Front-End role.
Prerequisites
WMI access to the target server.
Credentials
Windows Administrator on the target server.
Component monitors
You need to set thresholds for these counters according to your environment. Monitor the counters for a period of time to learn their typical value ranges, and then set the thresholds accordingly.
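One way to turn a baselining period into candidate thresholds is sketched below. It assumes you have exported the polled values of a counter as a list of numbers, and it uses the mean plus a multiple of the standard deviation as a starting point. The function name and the multipliers are illustrative assumptions, not values prescribed by the template.

```python
import statistics


def suggest_thresholds(samples, warn_sigmas=2.0, crit_sigmas=3.0):
    """Suggest (warning, critical) thresholds from baseline samples.

    The sigma multipliers are assumptions; tune them for your environment.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.mean(samples)
    spread = statistics.stdev(samples)  # sample standard deviation
    return mean + warn_sigmas * spread, mean + crit_sigmas * spread


# Hypothetical baseline values collected for one counter.
baseline = [100, 110, 90, 105, 95]
warning, critical = suggest_thresholds(baseline)
```

Treat the result as a first estimate and review it against real alert behavior before relying on it.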
Service: Lync Server Application Sharing
This component monitor returns the CPU and memory usage of the Lync Server Application Sharing service.
Service: Lync Server Audio Test Service
This component monitor returns the CPU and memory usage of the Lync Server Audio Test Service. This service offers users the ability to subjectively test the quality of a call before placing the call. The user checks the call quality by making a test call.
Service: Lync Server Audio/Video Conferencing
This component monitor returns the CPU and memory usage of the Lync Server Audio/Video Conferencing service.
Service: Lync Server File Transfer Agent
This component monitor returns the CPU and memory usage of the Lync Server File Transfer Agent. The File Transfer Agent is responsible for replicating configuration settings with the Replica Replicator Agent that runs on every Lync Server.
Service: Lync Server Front-End
This component monitor returns the CPU and memory usage of the Lync Server Front-End service. The Front-End Servers maintain transient information, such as logged-on state and control information for an IM, Web, or audio/video (A/V) conference.
Service: Lync Server IM Conferencing
This component monitor returns the CPU and memory usage of the Lync Server IM Conferencing service. The IM Conferencing service is responsible for multiplexing the instant message data feed from the leader to all participants in the session.
Service: Lync Server Master Replicator Agent
This component monitor returns the CPU and memory usage of the Lync Server Master Replicator Agent. This service is used by the File Transfer Agent to replicate configuration settings.
Service: Lync Server Replica Replicator Agent
This component monitor returns the CPU and memory usage of the Lync Server Replica Replicator Agent. This service is used by the File Transfer Agent to replicate configuration settings.
Peers: Connections Active
This component monitor returns the number of established connections that are currently active. A connection is considered established when peer credentials are verified (for example, via MTLS), or the peer receives a 2xx response. You will need to baseline this counter by testing and monitoring the user load. The returned value should be less than 15,000 connections per Front-End Server.
Peers: TLS Connections Active
This component monitor returns the number of established TLS connections that are currently active. A TLS connection is considered established when the peer certificate, and possibly the host name, are verified for a trust relationship. You will need to baseline this counter by testing and monitoring the user load.
Peers: Sends Outstanding
This component monitor returns the number of messages that are currently present in the outgoing queues. If you receive error message 504, investigate the results from this counter to identify which servers are having problems. To do so, change the instance from _Total to the server hostname. You can check this in perfmon.exe.
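As a hedged illustration of switching the instance, the sketch below builds a perfmon-style counter path for either _Total or a single peer. The category name "LS:SIP - Peers", the counter name "SIP - Sends Outstanding", and the hostname fe02.example.com are assumptions for illustration; confirm the exact names in perfmon.exe on your server.

```python
def counter_path(category, counter, instance="_Total", host=None):
    """Build a perfmon-style counter path string."""
    # Resulting format: [\\host]\category(instance)\counter
    prefix = f"\\\\{host}" if host else ""
    return f"{prefix}\\{category}({instance})\\{counter}"


# Assumed category and counter names; verify them in perfmon.exe.
CATEGORY = "LS:SIP - Peers"
COUNTER = "SIP - Sends Outstanding"

total_path = counter_path(CATEGORY, COUNTER)
# Hypothetical peer hostname used as the instance.
peer_path = counter_path(CATEGORY, COUNTER, instance="fe02.example.com")
```

Comparing the per-peer value with the _Total value helps show whether one peer accounts for most of the outstanding sends.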
Peers: Average Outgoing Queue Delay
This component monitor returns the average time, in seconds, that messages have been delayed in outgoing queues. Check the Outgoing Queue Delay for delays in sending messages to other servers or clients that could be causing messages to be accumulated in the server. The server will drop client connections if it is in a throttle state and messages stay in the outgoing queue for more than 32 seconds.
Peers: Flow-controlled Connections
This component monitor returns the number of connections that are currently being flow-controlled (no socket receives are posted).
Peers: Average Flow-Control Delay
This component monitor returns the average delay, in seconds, in message processing when the socket is flow-controlled. You will need to baseline this counter by testing and monitoring the server's health. The returned value should be as low as possible.
Peers: Incoming Requests/sec
This component monitor returns the rate of received requests, per second. You will need to baseline this counter by testing and monitoring the user load.
Peers: Incoming Responses/sec
This component monitor returns the rate of received responses, per second.
Peers: Outgoing Requests/sec
This component monitor returns the rate of outgoing requests, per second.
Peers: Outgoing Responses/sec
This component monitor returns the rate of outgoing responses, per second.
Protocol: Average Event Processing Time
This component monitor returns the average time (in seconds) it takes to process a SIP transaction or dialog state change event.
Protocol: Average Incoming Message Processing Time
This component monitor returns the average time (in seconds) it takes to process an incoming message.
Protocol: Average Local Message Processing Time
This component monitor returns the average time (in seconds) it takes to process a locally generated message.
Protocol: Average Number Of Active Worker Threads
This component monitor returns the average number of active SIP worker threads that process incoming messages.
Protocol: Events In Processing
This component monitor returns the number of SIP transactions, or dialog state change events, that are currently being processed. You will need to baseline this counter by testing and monitoring the user load.
Protocol: Events Processed/sec
This component monitor returns the rate of SIP transaction or dialog state change events that were delivered for processing, per second.
Protocol: Incoming Messages/sec
This component monitor returns the rate of received messages, per second. You will need to baseline this counter by testing and monitoring the user load.
Protocol: Messages In Server
This component monitor returns the number of messages currently being processed by the server.
Protocol: Outgoing Messages/sec
This component monitor returns the rate of sent messages, per second.
Responses: Local 500 Responses/sec
This component monitor returns the rate of 500 responses generated by the server, per second. This can indicate that there is a server component that is not functioning correctly.
Responses: Local 503 Responses/sec
This component monitor returns the rate of 503 responses generated by the server, per second. The 503 code corresponds to the server being unavailable. On a healthy server, you should not receive this code at a steady rate. However, during ramp up, after a server has been brought back online, there may be some 503 responses. Once all users get back in and the server returns to a stable state, there should no longer be any 503 responses returned.
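To tell a short ramp-up burst from a steady rate of 503 responses, you could check whether the rate stays above zero for several consecutive polls. The sketch below assumes a list of polled rates in time order; the function name and the default of five consecutive samples are illustrative assumptions.

```python
def has_steady_503s(rates, min_consecutive=5):
    """Return True if the rate was above zero for min_consecutive polls in a row."""
    run = 0
    for rate in rates:
        # Extend the current run on a nonzero rate, otherwise reset it.
        run = run + 1 if rate > 0 else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical samples: a burst after the server comes back online, then quiet.
ramp_up = [4, 2, 1, 0, 0, 0, 0]
# Hypothetical samples: a low but persistent rate.
persistent = [1, 1, 2, 1, 1, 1]
```

With the default setting, the ramp-up series is not flagged and the persistent series is.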
Responses: Local 504 Responses/sec
This component monitor returns the rate of 504 responses generated by the server, per second. A few 504 responses to clients (for clients disconnecting abruptly) are to be expected, but this counter mainly indicates connectivity issues with other servers. It can indicate connection failures or delays connecting to remote servers.
Load Management: Address space usage
This component monitor returns the percentage of available address space currently in use by the server process. The returned value should be as low as possible.
Load Management: Average Holding Time For Incoming Messages
This component monitor returns the average time that the server held the incoming messages currently being processed. This should usually be less than one second, on average, but it is normal to see short spikes of up to three seconds. The server throttles new incoming messages once the number of messages rises above the high watermark, and continues to do so until the number falls below the low watermark. The server starts rejecting new connections when the average holding time is greater than the overload time of 15 seconds.
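The holding-time ranges described above can be sketched as a simple classification. The function name and the state labels are assumptions chosen for illustration; only the 1, 3, and 15 second boundaries come from the description.

```python
def classify_holding_time(seconds):
    """Map an average holding time (in seconds) to an illustrative state label."""
    if seconds > 15:
        return "overloaded"  # server starts rejecting new connections
    if seconds > 3:
        return "elevated"    # above the normal spike range
    if seconds >= 1:
        return "spike"       # acceptable if short-lived
    return "normal"
```

For example, classify_holding_time(0.4) returns "normal" and classify_holding_time(16) returns "overloaded".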
Load Management: Page file usage
This component monitor returns the percentage of available page file space currently in use by the server process. The returned value should be as low as possible.
IMMcu Conferences: Active Conferences
This component monitor returns the number of active conferences. You will need to baseline this counter by testing and monitoring the user load.
IMMcu Conferences: Connected Users
This component monitor returns the number of connected users in all conferences. You will need to baseline this counter by testing and monitoring the user load.
IMMcu Conferences: Throttled Sip Connections
This component monitor returns the number of throttled SIP connections. If the value is greater than ten, it could indicate that a peer is not processing requests in a timely fashion. This can happen if the peer machine is overloaded. A peer is defined as a connected server, an adjacent Front-End server, or an MCU in the same EE Pool. The same set of counters applies.
MCU Health And Performance: MCU Draining State
This component monitor returns the current draining status of the MCU.
Possible values:
- 0 = Not requesting to drain.
- 1 = Requesting to drain.
- 2 = Draining.
When a server is drained, it stops taking new connections and calls. These new connections and calls are routed through other servers in the pool. A server being drained allows its sessions on existing connections to continue until they naturally end. When all existing sessions have ended, the server is ready to be taken offline.
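The draining values and the offline condition described above can be sketched as follows. The enum and function names are assumptions, and active_sessions is a hypothetical input that you would obtain from your own session monitoring.

```python
from enum import IntEnum


class DrainingState(IntEnum):
    NOT_REQUESTING = 0
    REQUESTING = 1
    DRAINING = 2


def ready_for_offline(state_value, active_sessions):
    """Return True when the MCU is draining and no existing sessions remain."""
    state = DrainingState(state_value)  # raises ValueError for unknown values
    return state is DrainingState.DRAINING and active_sessions == 0
```

A server that reports 2 (Draining) with three sessions still active is not yet ready, so ready_for_offline(2, 3) returns False.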
MCU Health And Performance: MCU Health State
This component monitor returns the current health of the MCU.
Possible values:
- 0 = Normal.
- 1 = Loaded.
- 2 = Full.
- 3 = Unavailable.
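If you post-process polled values in a script, a small lookup keeps the numeric health states readable. This is a minimal sketch; the names are assumptions, and it assumes the polled value may arrive as a float such as 2.0.

```python
MCU_HEALTH_STATES = {
    0: "Normal",
    1: "Loaded",
    2: "Full",
    3: "Unavailable",
}


def describe_health(value):
    """Translate a polled MCU health value into its label."""
    # Values outside the documented range are reported as unknown.
    return MCU_HEALTH_STATES.get(int(value), f"Unknown ({value})")
```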
USrv - DBStore: Queue Latency (msec)
This component monitor returns the average time, in milliseconds, that a request is held in the database queue. This counter represents the time that a request spends in the queue of the Back-End Database Server. If the topology is healthy, this counter averages less than 100 ms. Occasional spikes are acceptable. The value will be higher on Front-End Servers that are located at a different site from the Back-End Database Servers. This value can increase if the Back-End Database Server is having performance problems or if network latency is too high. If the returned value is high, check both network latency and the health of the Back-End Database Server. Server health decreases as latency increases to 12 seconds, when server throttling begins.
USrv - DBStore: Sproc Latency (msec)
This component monitor returns the average time, in milliseconds, it takes to execute a stored procedure call. A healthy state is considered to be less than 100 ms. Server health decreases as latency increases to 12 seconds, when server throttling begins.
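The latency figures for the two DBStore counters can be expressed as a small check, sketched below. The function name and the state labels are assumptions; the boundaries are the 100 ms healthy average and the 12 second throttling point, converted to milliseconds.

```python
HEALTHY_MS = 100       # a healthy average stays below this value
THROTTLE_MS = 12_000   # 12 seconds, the point where throttling begins


def classify_db_latency(latency_ms):
    """Map an average DBStore latency (in milliseconds) to an illustrative label."""
    if latency_ms >= THROTTLE_MS:
        return "throttling"
    if latency_ms >= HEALTHY_MS:
        return "degraded"
    return "healthy"
```

Because occasional spikes are acceptable, apply the check to an averaged value rather than to a single poll.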
USrv - Https Transport: Active HTTPS connections
This component monitor returns the number of active HTTPS connections.
USrv - Https Transport: Number of failed connection attempts / Sec
This component monitor returns the rate of connection attempt failures, per second. You will need to baseline this counter by testing and monitoring the server's health.