Understand the impact IP SLA operations have on your network
When configured properly, IP SLA operations have a minimal impact on your overall network health. However, problems can arise when configurations force operations to run too frequently, or when too many overlapping operations are performed across similar paths.
Most problems occur when using IP SLA operations on a fully meshed network. For example, in a fully meshed network with seven devices, a simple ICMP Echo operation would require 42 operations to test each link in each direction. The number of links is calculated in the following way:
Hub-and-Spoke Links = N - 1
Full Mesh Links = N(N - 1)/2
N is the number of devices on the network. Therefore, the number of links in a seven-device fully meshed network would be 7(7 - 1)/2, or 7(6)/2, or 21.
To test each link bi-directionally, twice as many operations are needed. The number of bi-directional links is found using the following calculation:
Hub-and-Spoke Links = 2(N - 1)
Full Mesh Links = N(N - 1)
Therefore, the total number of operations for the seven-site hub-and-spoke and seven-site full mesh networks are as follows:
Hub-and-Spoke Operations = 2(7 - 1) = 12
Full Mesh Operations = 7(7 - 1) = 42
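The formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function names and topology labels are hypothetical and are not part of any SolarWinds or Cisco tool.

```python
def link_count(n: int, topology: str) -> int:
    """Return the number of links among n devices for the given topology."""
    if topology == "hub-and-spoke":
        return n - 1
    if topology == "full-mesh":
        return n * (n - 1) // 2
    raise ValueError(f"unknown topology: {topology}")


def bidirectional_operations(n: int, topology: str) -> int:
    """Return the operations needed to test every link in both directions."""
    return 2 * link_count(n, topology)


if __name__ == "__main__":
    # Seven devices, as in the example above.
    for topology in ("hub-and-spoke", "full-mesh"):
        links = link_count(7, topology)
        operations = bidirectional_operations(7, topology)
        print(f"{topology}: {links} links, {operations} operations")
```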
Adding three more operation types to that network would increase the operations from 42 to 168 (42 x 4). 168 operations will not have a significant impact on this small network.
In a typical mid-sized network with 30 devices, the number of operations increases quickly, according to the following calculation:
Links = 30 x 29/2 = 435
Total Operations = 435 x 2 x 4 = 3480
The number of operations grows quadratically with the number of devices. Here is the same arithmetic for a 180-device network:
Links = 180 x 179/2 = 16,110
Total Operations = 16,110 x 2 x 4 = 128,880
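To see how quickly the total grows, the same calculation can be run for several network sizes. The sketch below assumes four operation types per link direction, matching the multiplier used in the calculations above; the function name and the chosen device counts are illustrative only.

```python
def total_operations(devices: int, operation_types: int = 4) -> int:
    """Operations needed to test every full-mesh link in both directions."""
    links = devices * (devices - 1) // 2
    return links * 2 * operation_types


if __name__ == "__main__":
    previous = None
    for devices in (30, 60, 120, 180):
        total = total_operations(devices)
        line = f"{devices:>4} devices: {total:>8,} operations"
        if previous is not None:
            # Ratio to the previous row shows how the total scales.
            line += f" ({total / previous:.2f}x the previous row)"
        print(line)
        previous = total
```

Each time the device count doubles, the total roughly quadruples, because the number of links depends on N(N - 1).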
Dangers of overusing IP SLA operations
As you continue to add operations and devices to a network, especially in a fully meshed environment, overall network performance starts to degrade. In addition to burdening the network with test packets, a large number of IP SLA operations can cause the following effects:
- Several thousand test results stored every five minutes can create a large database, which can affect other services that use the database.
- Most of the historical results are likely never examined because there are too many results to filter.
- Adding thousands of IP SLA operations could add a significant burden to the SNMP poller.
Strategies for the proper use of IP SLA operations
IP SLA operations can negatively affect network performance when they are implemented improperly. To avoid affecting the performance of your network, use the following strategies:
Keep local tests local
Not all test types are used to test WAN services (DHCP is one example). A large network may have several distributed DHCP servers. If each site has a local DHCP server, users at that site receive IP addresses from the local server if it is available. For 40 sites, you could accomplish DHCP testing by deploying an operation from the local switch or router of each site to that site's local DHCP server. This creates only 40 tests with 40 results to poll and store every five minutes. You might also add tests for some secondary DHCP servers and have around 50 tests in total. If you added DHCP tests from every site to every server, you would have approximately 40 x 40, or 1600 tests. Most of these tests are DHCP requests to remote sites, which users never actually make when obtaining an IP address.
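The difference between the two DHCP test plans can be illustrated by enumerating the source and server pairs. In this sketch the site names are hypothetical placeholders, and each site is assumed to host exactly one DHCP server.

```python
from itertools import product

# Hypothetical names for the 40 sites; each is assumed to host one DHCP server.
sites = [f"site-{i:02d}" for i in range(1, 41)]

# Local plan: each site tests only its own DHCP server.
local_tests = [(site, site) for site in sites]

# Exhaustive plan: every site tests every DHCP server.
all_to_all_tests = list(product(sites, repeat=2))

if __name__ == "__main__":
    print(f"local tests:      {len(local_tests)}")
    print(f"all-to-all tests: {len(all_to_all_tests)}")
```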
Test paths only for supported traffic
For this example, UDP jitter, a common IP SLA test, will be used. On a 40-site MPLS network, the UDP jitter operation is implemented between five sites that use UDP to deliver video conferencing. Because video conferencing is sensitive to network jitter and delay, implementing jitter operations between these sites is recommended. Using the formula for a full mesh network such as an MPLS network, the five sites need 5 x 4/2 = 10 operations. However, if full mesh is deployed to test the links between all sites, there would be 40 x 39/2 = 780 tests, and only 1.3% of the tests would be for valid video paths. Therefore, a custom deployment of the operations is the recommended option in this scenario.
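A targeted deployment like this one can be planned by enumerating only the site pairs that carry the traffic of interest. The following is a minimal sketch: the site names are hypothetical, and the first five sites are arbitrarily assumed to be the video conferencing sites.

```python
from itertools import combinations


def path_pairs(sites):
    """Return every unordered pair of sites, one pair per link to test."""
    return list(combinations(sites, 2))


# Hypothetical site names; the first five are assumed to be the video sites.
all_sites = [f"site-{i:02d}" for i in range(1, 41)]
video_sites = all_sites[:5]

targeted = path_pairs(video_sites)   # only the video conferencing paths
full_mesh = path_pairs(all_sites)    # every path in the 40-site network
useful_share = len(targeted) / len(full_mesh)

if __name__ == "__main__":
    print(f"targeted jitter operations:  {len(targeted)}")
    print(f"full mesh jitter operations: {len(full_mesh)}")
    print(f"share covering video paths:  {useful_share:.1%}")
```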
Consider decreasing the test frequency when possible
Decreasing or increasing the test frequency has a significant impact on the network load. For example, decreasing the test frequency from once every 300 seconds to once every 360 seconds lessens the test impact on the source device and network by about 17 percent (300/360 is roughly 0.83). Increasing the frequency to once every 150 seconds increases the load by one hundred percent.
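The effect of an interval change can be estimated with a small helper. This sketch assumes that load is proportional to the number of test runs per unit of time, so it is inversely proportional to the interval between runs; the function name is hypothetical.

```python
def load_change_percent(old_interval_s: float, new_interval_s: float) -> float:
    """Percent change in test load when the interval between runs changes.

    Assumes load is proportional to runs per second (1 / interval).
    A negative result means the load decreases.
    """
    if old_interval_s <= 0 or new_interval_s <= 0:
        raise ValueError("intervals must be positive")
    return (old_interval_s / new_interval_s - 1) * 100


if __name__ == "__main__":
    for new_interval in (360, 150):
        change = load_change_percent(300, new_interval)
        print(f"300 s -> {new_interval} s: {change:+.1f}% load")
```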
Avoid overlapping tests
It is possible to deploy a DNS test to an internal DNS server, an HTTP test to an intranet page, a ping test to the HTTP server, and a TCP connect to the HTTP server from a local switch. While there are four individual operations testing four services, there are now three redundant tests overlapping each other. The HTTP operation performs the following:
- Resolves the URL to an IP address using the DNS server.
- Performs a TCP port 80 request to the HTTP server.
- Requests the page over HTTP and detects a successful page load.
- Records the DNS resolve time, TCP open time, and page load time.
Using the HTTP test, the other three tests can be eliminated because they yield the same results.
To prevent overloading the network with IP SLA operations, SolarWinds VoIP and Network Quality Manager limits the number of operations that can be created at one time to 306, or 18 nodes in a fully meshed environment.
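The relationship between an operation limit and the largest fully meshed deployment it allows can be checked with the bi-directional full mesh formula, N(N - 1). The helper below is only an illustration of that arithmetic and is not how SolarWinds VoIP and Network Quality Manager enforces the limit; the function name is hypothetical.

```python
def max_full_mesh_nodes(operation_limit: int) -> int:
    """Largest node count whose bi-directional full mesh fits within the limit."""
    nodes = 1
    # (nodes + 1) * nodes is the operation count for one more node.
    while (nodes + 1) * nodes <= operation_limit:
        nodes += 1
    return nodes


if __name__ == "__main__":
    limit = 306
    nodes = max_full_mesh_nodes(limit)
    print(f"limit {limit}: up to {nodes} nodes ({nodes * (nodes - 1)} operations)")
```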