Post date: Oct 29, 2012 4:26:58 PM
There is an idea in economics of leading indicators. These are items that can be tracked in the economy, such as housing starts or new unemployment claims, that change direction before the rest of the economy. So housing starts will go down before the GDP for the country as a whole goes down. There are entire baskets of indicators used by the Fed and other organizations to figure out which way the economy is going before it actually gets there.
In the economy, the trailing indicators such as high unemployment, shrinking GDP, high bankruptcy rates and inflation are all things we would like to avoid by taking action once we see the leading indicators moving in a specific direction.
So what does this have to do with networking? Funny you should ask: it turns out that a network also has leading and trailing indicators of its performance. Most network engineers are used to tracking the utilization of the links in the network. They manage the utilization through traffic engineering and the addition of bandwidth to keep this factor within specific limits.
But why are they doing this? 100% utilization sounds good, since it means you are using your bandwidth as efficiently as possible. So why don't networks run this way?
It turns out that utilization is actually a leading indicator of the things that really matter in a network, namely delay, jitter and packet loss. It is very difficult to get a network to 100% utilization without incurring delay, jitter or packet loss. The utilization on a link can be used as a leading indicator of when packet loss, delay and jitter will start occurring in the network. These are all trailing indicators that a problem is already affecting users and applications, so you will start to get complaints.
The key is to calibrate utilization rates in your network against the acceptable network performance metrics. This is something that will also need to change over time. If you graph delay against utilization, the curve stays nearly flat at low utilization and then bends sharply upward at a knee.
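To see why the curve bends, here is a minimal sketch using the textbook M/M/1 queueing approximation, where delay grows roughly as 1/(1 - utilization); the service time below is an assumed placeholder, not a measurement from any real network:

```python
# Sketch: why delay "knees" upward as utilization climbs, using the
# textbook M/M/1 approximation (delay ~ service_time / (1 - utilization)).
# This is an illustration, not a model of any particular network.

SERVICE_TIME_MS = 1.0  # assumed time to serve one packet at the bottleneck

def approx_delay_ms(utilization: float) -> float:
    """Approximate queueing + service delay for a given link utilization (0..1)."""
    if utilization >= 1.0:
        return float("inf")  # the queue grows without bound at 100% load
    return SERVICE_TIME_MS / (1.0 - utilization)

if __name__ == "__main__":
    for pct in (10, 30, 50, 70, 80, 90, 95, 99):
        print(f"{pct:3d}% utilization -> ~{approx_delay_ms(pct / 100):7.1f} ms")
    # Delay stays roughly flat up to the 50-70% range, then climbs steeply:
    # that steep region is the knee you calibrate your utilization limits against.
```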
All of the QoS mechanisms such as WRED, LLQ and others help to move the knee out for specific traffic types, so that traffic that does not do well with packet drops gets better treatment. This means that for each traffic aggregate and per-hop behavior (PHB) there may be a different utilization curve.
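As a rough illustration of how the drop profile differs per class, here is a minimal sketch of a WRED-style drop curve; the thresholds, mark probabilities and class names are hypothetical values chosen for illustration, not vendor defaults:

```python
# Sketch of a WRED-style drop profile: no drops below min_threshold,
# a linearly increasing drop probability up to max_threshold, then tail drop.
# Thresholds and mark probabilities here are hypothetical per-class values.

def wred_drop_probability(avg_queue_depth: float,
                          min_threshold: float,
                          max_threshold: float,
                          max_drop_prob: float) -> float:
    """Return the probability of dropping an arriving packet."""
    if avg_queue_depth < min_threshold:
        return 0.0              # below the minimum threshold: queue everything
    if avg_queue_depth >= max_threshold:
        return 1.0              # above the maximum threshold: tail drop
    # Between the thresholds the drop probability ramps up linearly.
    span = max_threshold - min_threshold
    return max_drop_prob * (avg_queue_depth - min_threshold) / span

if __name__ == "__main__":
    # A loss-sensitive class gets higher thresholds than a bulk class,
    # which is one way the knee is pushed out for a given PHB.
    for depth in (5, 15, 25, 35, 45):
        bulk = wred_drop_probability(depth, min_threshold=10, max_threshold=40, max_drop_prob=0.10)
        priority = wred_drop_probability(depth, min_threshold=30, max_threshold=60, max_drop_prob=0.05)
        print(f"queue depth {depth:2d}: bulk drop {bulk:.3f}, priority drop {priority:.3f}")
```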
The other thing to bear in mind is that in most high availability networks, links work in pairs, so the aggregate utilization across the pair of links needs to be kept below 50% to make sure a single link will be able to carry the entire load if the other were to fail.
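That rule of thumb is easy to encode in monitoring logic. A minimal sketch, with placeholder utilization figures:

```python
# Sketch: check that a redundant link pair keeps enough headroom so that
# either member could carry the combined load after a failure.
# The 50% ceiling is the failover budget described above; figures are placeholders.

FAILOVER_CEILING = 0.50

def pair_has_headroom(util_a: float, util_b: float) -> bool:
    """True if the pair's aggregate utilization leaves room for a single-link failure."""
    return (util_a + util_b) / 2.0 <= FAILOVER_CEILING

print(pair_has_headroom(0.30, 0.15))  # True: the surviving link would run at 45%
print(pair_has_headroom(0.60, 0.55))  # False: the survivor would need 115% of its capacity
```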
It is also interesting to note that the average utilization of links will creep up over time as new applications and users are added. This is a direct effect of more users and applications. These are items that can be tracked in a linear regression model, with new applications entered as dummy variables (1, 0) and the user count as a continuous variable.
So between treating utilization as a leading indicator for packet loss, delay and jitter, and adding user population and application growth into the model, a very good linear regression model for a network can be created.
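As a sketch of what that regression might look like, here is a small example fit with ordinary least squares; the monthly data, the application dummies and the resulting coefficients are illustrative assumptions, not real measurements:

```python
# Sketch of the regression described above: predict link utilization from the
# user count (continuous) plus dummy variables for newly deployed applications.
# Data values, application names and the fitting method are illustrative assumptions.
import numpy as np

# Monthly observations: [users, app_A_deployed (0/1), app_B_deployed (0/1)]
X = np.array([
    [100, 0, 0],
    [120, 0, 0],
    [140, 1, 0],   # application A turned up this month
    [160, 1, 0],
    [180, 1, 1],   # application B turned up this month
    [200, 1, 1],
], dtype=float)
y = np.array([22.0, 26.0, 35.0, 39.0, 51.0, 55.0])  # average link utilization (%)

# Add an intercept column and solve ordinary least squares.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, per_user, app_a_step, app_b_step = coef

print(f"utilization ~ {intercept:.1f} + {per_user:.2f}*users "
      f"+ {app_a_step:.1f}*appA + {app_b_step:.1f}*appB")
# The per-user slope captures steady growth; each dummy captures the one-time
# jump in baseline utilization when a new application is introduced.
```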