SolarWinds NPM 12 NetPath
I’ve been writing a bit on SolarWinds’ Thwack site lately about distributed monitoring of application performance on the network. Many networking professionals still seem to default to thinking in terms of centralized observation of devices and interfaces, so I figured the topic could use a little bit of a boost. Thursday morning at Networking Field Day 13, serendipity arrived in the form of SolarWinds’ presentation of the new NetPath functionality in Network Performance Monitor (NPM) 12, which gave me even more to go on about.
Evolution
The old tried-and-true method of finding paths through the network and measuring latency between the hops is the venerable traceroute command (or tracert, for the Windows users out there). It gleans a path from a source to a destination by sending probes with an incrementing IP time-to-live (TTL) and recording the router that answers with an ICMP Time Exceeded message at each step; depending on the implementation, the probes are UDP, ICMP or TCP SYN packets. Unfortunately, for reasons ranging from control plane protection to aggressive filtering policies and deficient multipath reporting, traceroute has become less useful as networks have evolved. Enter NetPath, which is essentially a modern rethinking of the requirements that gave birth to traceroute in the first place.
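To make the TTL mechanism concrete, here is a minimal Python simulation rather than a real packet-level tool. The path is a hypothetical list of router addresses (taken from documentation ranges), and any router in `silent` stands in for a device whose control plane protection drops or rate-limits its ICMP Time Exceeded replies. It shows how such hops become the familiar `*` gaps.

```python
from dataclasses import dataclass
from typing import AbstractSet, List, Optional


@dataclass
class Reply:
    address: str
    reached_destination: bool


def send_probe(path: List[str], ttl: int, silent: AbstractSet[str]) -> Optional[Reply]:
    """Simulate one probe along `path` (routers in order, destination last).

    Each router decrements the TTL; the router where it reaches zero would send
    ICMP Time Exceeded unless it is in `silent`. A probe that makes it to the
    end of the path is answered by the destination itself.
    """
    if ttl >= len(path):
        return Reply(path[-1], True)
    router = path[ttl - 1]
    if router in silent:
        return None  # reply dropped or rate-limited, so traceroute prints "*"
    return Reply(router, False)


def traceroute(path: List[str], silent: AbstractSet[str] = frozenset(),
               max_hops: int = 30) -> List[str]:
    """Probe with TTL 1, 2, 3, ... until the destination answers."""
    hops: List[str] = []
    for ttl in range(1, max_hops + 1):
        reply = send_probe(path, ttl, silent)
        hops.append(reply.address if reply else "*")
        if reply is not None and reply.reached_destination:
            break
    return hops
```

With `path = ["10.0.0.1", "192.0.2.1", "198.51.100.7"]` and `silent={"192.0.2.1"}`, the result is `["10.0.0.1", "*", "198.51.100.7"]`: the middle router still forwards traffic, it just won't say so.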
SolarWinds first presented NetPath as a lab project at Tech Field Day 10 in February of 2016 and released it as a component of NPM four months later. In the five months since then, it has enjoyed significant adoption and has already reached a milestone of 140,000 endpoint deployments.
The Clockwork
The NetPath poller is a component of the SolarWinds NPM agent, which means that it needs to be loaded onto a Windows 7 Professional or newer installation. The pollers can then be configured from the NPM management console, which also receives telemetry data as the pollers continue to monitor.
It begins by behaving like the application traffic being monitored, using the same TCP ports that the application would. If a firewall isn’t going to block application traffic, it isn’t likely to block the corresponding NetPath traffic. This puts it one step ahead of traceroute’s approach right out of the gate.
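As a rough illustration of probing on the application’s own port, the sketch below times a TCP handshake using Python’s standard `socket.create_connection`. It is a simplified stand-in, not NetPath’s probe: it measures only end-to-end handshake time, and discovering individual hops would also require TTL-limited probes and capture of the ICMP replies they trigger, which is out of scope here. The function name is hypothetical.

```python
import socket
import time


def tcp_connect_time(host: str, port: int, timeout: float = 2.0) -> float:
    """Return the seconds taken to complete a TCP handshake with host:port.

    Using the application's own port means that, to a port-based firewall,
    the probe looks just like the application's traffic.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed  # the connection is closed when the with-block exits
```

For example, `tcp_connect_time("www.example.com", 443)` would time a handshake to a web application’s HTTPS port; a firewall that admits that application’s traffic should admit this probe too.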
The pollers send multiple probes every 10 minutes to establish the initial paths to the destination and to catch any subsequent changes. They report the delay along the various paths at each hop, accounting for control plane delay so that only the relevant transit delay is shown. The result is an easy-to-read diagram that allows offending portions of the path to be pinpointed quickly.
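SolarWinds didn’t detail how NetPath separates control plane delay from transit delay, so the following is only a common traceroute-reading heuristic, not their method. Given several RTT samples per hop, it takes each hop’s fastest reply, then caps it at the fastest reply from any hop further along, since traffic reaching a later hop must have passed through the earlier one; any excess is treated as time the router spent generating its own reply. The differences between consecutive hops then approximate per-hop transit delay. The function name and input layout are assumptions for illustration.

```python
from typing import List


def per_hop_delay(samples: List[List[float]]) -> List[float]:
    """samples[i] holds RTTs in ms for hop i+1 across several probes.

    Returns an estimate of the transit delay added by each hop.
    """
    # The fastest of several probes is the least affected by queuing
    # and by a busy control plane.
    best = [min(rtts) for rtts in samples]
    # A hop can't genuinely be slower than a hop beyond it, because traffic to
    # the later hop transited it; cap each value at the minimum seen downstream.
    effective = list(best)
    for i in range(len(effective) - 2, -1, -1):
        effective[i] = min(effective[i], effective[i + 1])
    deltas: List[float] = []
    previous = 0.0
    for rtt in effective:
        deltas.append(rtt - previous)
        previous = rtt
    return deltas
```

With samples `[[1.2, 1.0, 1.5], [31.0, 30.0, 45.0], [5.0, 6.0, 5.5]]` the result is `[1.0, 4.0, 0.0]`: the second hop’s 30 ms replies are mostly its own control plane, because the third hop answers in 5 ms.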
A “green”, “yellow” or “red” indicator is then applied to each hop to designate the health of the path, based on historical information and a number of secret sauce calculations. Even better, NetPath maintains a history of these changes for those moments when problems are reported days later.
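The actual health calculations are SolarWinds’ secret sauce, so this is a purely hypothetical stand-in. It scores a hop’s current latency against that hop’s own history (a z-score with an arbitrary 1 ms floor on the spread and arbitrary 2 and 3 standard deviation cut-offs) and logs only status transitions, which is the kind of record you want when a complaint arrives days later. All names are illustrative.

```python
import statistics
from dataclasses import dataclass, field
from typing import List, Tuple


def classify(history_ms: List[float], current_ms: float,
             min_spread_ms: float = 1.0) -> str:
    """Hypothetical health rule: compare current latency with the hop's history."""
    if len(history_ms) < 2:
        return "green"  # not enough history to call anything abnormal
    mean = statistics.mean(history_ms)
    # Floor the spread so a very stable hop doesn't go red over a fraction of a ms.
    spread = max(statistics.stdev(history_ms), min_spread_ms)
    score = (current_ms - mean) / spread
    if score < 2.0:
        return "green"
    if score < 3.0:
        return "yellow"
    return "red"


@dataclass
class HopStatusLog:
    """Keeps (timestamp, status) entries only when a hop's status changes."""
    changes: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, timestamp: str, status: str) -> None:
        if not self.changes or self.changes[-1][1] != status:
            self.changes.append((timestamp, status))
```

Recording polls of 10.2 ms, 10.4 ms and 20.0 ms against a history of `[10.0, 11.0, 9.0, 10.0]` leaves just two log entries: the initial green and the later switch to red.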
Favourite Comments
“What’s the network truth here? The truth is that we’re being blocked because we don’t look like everything else.” This gets us into discussions of why firewalls treat control traffic like undesirables, but we’ll save that for another post.
“NAT is dirty. Sometimes it’s messing with your payload.” NAT has always been dirty… particularly when there are multiple layers of it. When building a product like this, there’s no way of coming out of it unsoiled.
“The intent here is to point fingers intelligently.” The potential to minimize blamestorming sessions is really impressive.
The Whisper in the Wires
At version 1.0 and only five months after its release, the NetPath functionality of NPM 12 is already very impressive. It demystifies not only known quantities like internal application paths, but also what’s happening beyond the edge of the enterprise network, at the carriers and in the destination networks.
For future versions, there are a few things that will be worth looking at:
Bringing the pollers down to a size where they can be deployed economically at scale. Depending on the environment, a full-blown Windows workstation or server isn’t always going to be available wherever a poller is needed. Windows IoT might be worth investigating as an option.
Poller-to-poller connectivity, to allow effective monitoring of the non-TCP applications that aren’t currently supported. This is essential for things like VoIP analysis. It would also call for more granular options in the application polling setup; the ability to set DSCP and other options to account for traffic classes would be useful in a QoS-controlled environment.
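Marking probe traffic with a DSCP value is straightforward at the socket level on many platforms. A minimal Python sketch, assuming a UDP probe socket: DSCP occupies the upper six bits of the old IPv4 TOS byte, so the value is shifted left by two before being passed as `IP_TOS`. Some platforms, Windows in particular, may ignore `IP_TOS` set this way and expect their own QoS APIs instead, so treat this as illustrative. The helper names are hypothetical.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """DSCP sits in the upper six bits of the IPv4 TOS byte; the low two are ECN."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def marked_udp_socket(dscp: int) -> socket.socket:
    """Open an IPv4 UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

With Expedited Forwarding (DSCP 46), the usual marking for voice, the TOS byte becomes 184.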
While I’m at it, I’ll throw detection of path MTU problems onto my wish list.
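Path MTU trouble often shows up as a black hole: packets with the Don’t Fragment bit set are dropped because a router’s ICMP “fragmentation needed” message is filtered on the way back. One way a prober could hunt for the usable size is a binary search over probe sizes. This is a hypothetical sketch: `probe_fits` stands in for whatever would actually send a DF-marked probe of a given size and report whether it arrived, and the search assumes that a size that fits means every smaller size fits too. The 68-byte floor is the IPv4 minimum MTU.

```python
from typing import Callable


def find_path_mtu(probe_fits: Callable[[int], bool],
                  low: int = 68, high: int = 1500) -> int:
    """Return the largest size in [low, high] for which probe_fits(size) is True.

    probe_fits is hypothetical: it should send a size-byte packet with the
    Don't Fragment bit set and report whether it reached the destination.
    """
    if not probe_fits(low):
        raise ValueError("even the smallest probe failed; the path may be down")
    while low < high:
        # Round the midpoint up so the loop always makes progress.
        mid = (low + high + 1) // 2
        if probe_fits(mid):
            low = mid       # mid fits, so the answer is at least mid
        else:
            high = mid - 1  # mid is too big
    return low
```

Against a simulated tunnel that only passes packets up to 1400 bytes, `find_path_mtu(lambda size: size <= 1400)` settles on 1400 in about ten probes.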
I’ve always had great respect for those who seek truth. SolarWinds underlined “seeking network truth” repeatedly as their goal for NetPath, and their first step has been a respectable one. This is going to be a very interesting product to watch, and I’m looking forward to seeing more.
Disclaimer: I was invited to SolarWinds’ session at the Networking Field Day 13 event in San Jose. I was not compensated in any way by the presenter for my attendance. Neither the Tech Field Day staff nor the presenting vendor has had an opportunity to review what I have written. I have no obligation to write about the presenter, nor is there an assumption that I will show any positive bias towards their presentation. The expectation is only that I be honest in any writing that I do.
(Originally published at Packet Pushers.)