IT organisations have invested millions of pounds in implementing fault management tools and processes to maximise network availability. However, while availability management is critical, infrastructure reliability has improved to the point at which 99.9% availability is not uncommon. At the same time, network traffic is growing in both volume and complexity, creating performance issues. This is why real network and application improvements require focusing on performance, not just availability.
Below are eight rules that will help your organisation take a performance-first approach to network management. This approach will not only help you understand how infrastructure and application changes affect performance, but also enable you to manage your network for application performance, which, after all, is the most important thing.
Rule 1: If you can’t measure it, you can’t manage it
Network and application performance issues are growing dramatically due to data centre consolidation, the rise of multimedia traffic, increasing numbers of remote users, and other trends. As a result, the responsibility for application delivery is increasingly falling on the shoulders of network professionals. Measuring infrastructure availability and utilisation alone is no longer enough to understand network health and make informed management decisions.
Today’s network professionals must shift their focus from fault management - which is largely under control - to performance-based management in order to deliver better services and make themselves more relevant to the business units they serve. It is crucial for organisations to implement application service level agreements (SLAs) with baselines to measure against, providing a quantifiable goal to work towards and a way to measure progress. If you aren’t measuring performance metrics, you are managing to availability rather than performance, and in today’s IT environment, that’s not enough.
Rule 2: Performance is relative
The best way to understand the notion that all performance is relative is to ask someone who uses a networked system or application: “Is a three-second application response time good or bad?” The answer is, it depends. If the normal response time is ten seconds, a three-second response time is very good. But if the normal response time is one second or less, three seconds is not very good at all. For the same measurement, different circumstances lead directly to different interpretations.
Performance is usually judged against either previous experience (‘it took 15 seconds to download this page yesterday’) or users’ changing expectations. Employees nowadays expect their SAP or customer relationship management (CRM) system to perform as fast as eBay’s website.
What users care most about are large variations in performance. Therefore, what should concern you most from a performance management perspective is finding and addressing the places in your network where there are large variations in performance.
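As a minimal sketch of this idea, the Python below compares each site's current response time with that site's own baseline and ranks the sites by relative deviation. The site names, the sample values and the function name are hypothetical, chosen to mirror the three-second example above; a real deployment would feed in measured response times.

```python
from statistics import median

def rank_by_deviation(baseline, current):
    """Return (site, ratio) pairs, largest relative slowdown first.

    ratio = current response time / median of that site's baseline samples.
    """
    ranked = []
    for site, samples in baseline.items():
        normal = median(samples)
        ratio = current[site] / normal
        ranked.append((site, round(ratio, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical response times in seconds
baseline = {
    "london": [9.5, 10.0, 10.5],  # normally about ten seconds
    "leeds": [0.9, 1.0, 1.1],     # normally about one second
}
current = {"london": 3.0, "leeds": 3.0}

ranking = rank_by_deviation(baseline, current)
print(ranking)
```

Both sites report the same three-second measurement, yet only the second one has become slower relative to what its users are used to, so it is the one that ranks first.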
Rule 3: Link utilisation is insufficient
Utilisation is not an effective metric to assess performance.
The best indication of how applications are performing for the end user is to measure response times by monitoring real traffic. High utilisation is only a problem if it actually impacts application performance. Response time measurements, not utilisation, should be the foundation for effective network performance monitoring.
Rule 4: Bandwidth doesn’t solve all your problems
Increasing bandwidth is not a panacea for performance problems. Make sure you understand the cause of the problem before taking corrective action such as throwing bandwidth at it. Delay, for example, could be caused by the server, the application or even the transit path. The ability to measure the right performance metrics is key.
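One rough way to see where delay comes from is to split a single transaction's response time into network, server and transfer components. The sketch below is an illustrative approximation, not a description of any particular tool: the function name and the sample timings are assumptions, and it treats everything before the first response byte, minus one round trip, as server time.

```python
def split_delay(rtt, time_to_first_byte, total_time):
    """Roughly attribute one transaction's response time (in seconds).

    network  : one round trip between client and server
    server   : wait for the first response byte, minus that round trip
    transfer : time from first to last response byte
    """
    network = rtt
    server = max(time_to_first_byte - rtt, 0.0)
    transfer = total_time - time_to_first_byte
    return {"network": network, "server": server, "transfer": transfer}

# Hypothetical timings for a slow transaction
parts = split_delay(rtt=0.05, time_to_first_byte=2.05, total_time=2.25)
dominant = max(parts, key=parts.get)
print(parts, dominant)
```

With these made-up numbers almost all of the delay sits in the server component, so extra bandwidth would shorten only the small transfer portion.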
Rule 5: TCP’s not a utility
Using the network is not like turning on a faucet or plugging into an electrical outlet. Understanding and managing TCP (Transmission Control Protocol) is important to getting the most out of applications and services that depend on its reliable, connection-oriented transport. It is crucial that the network group responsible for application delivery work closely with application developers, helping them understand how to get better performance from their applications and to qualify those applications more thoroughly before rolling them out.
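A simple calculation shows why TCP behaviour matters. A single TCP connection can have at most one window of unacknowledged data in flight, so its throughput cannot exceed the window size divided by the round-trip time, whatever the link speed. The sketch below applies that bound; the round-trip times are hypothetical, 65,535 bytes is the largest window available without the TCP window scale option, and the estimate ignores packet loss and slow start.

```python
def max_tcp_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on one TCP connection's throughput, in bits per second."""
    return window_bytes * 8 / rtt_seconds

WINDOW = 65535  # bytes; maximum window without window scaling

lan_bps = max_tcp_throughput_bps(WINDOW, 0.001)  # assumed 1 ms LAN round trip
wan_bps = max_tcp_throughput_bps(WINDOW, 0.080)  # assumed 80 ms WAN round trip

print(f"LAN ceiling: {lan_bps / 1e6:.1f} Mbit/s")
print(f"WAN ceiling: {wan_bps / 1e6:.2f} Mbit/s")
```

Under these assumptions the same connection is limited to a few megabits per second across the WAN, even if the circuit itself is far faster.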
Rule 6: Any user can mess it all up
Although it is often unintentional, any employee using the corporate network could cause a traffic surge that the network can’t handle, for example through a virus or a popular website. To protect your network from these ‘accidents’, you need controls in place to block traffic on specific ports, and to do that it is vital to understand the composition of traffic on the network. You wouldn’t shut off a port without knowing what it is used for, right? If you did, good luck explaining to your CEO why the sales port has been shut down…
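Understanding traffic composition can start with something as simple as totalling bytes per destination port. The sketch below assumes flow records are already available as dictionaries; the field names and sample figures are hypothetical and do not reflect any specific flow export format.

```python
from collections import Counter

def bytes_by_port(flows):
    """Total the bytes seen for each destination port."""
    totals = Counter()
    for flow in flows:
        totals[flow["dst_port"]] += flow["bytes"]
    return totals

# Hypothetical flow records
flows = [
    {"dst_port": 443, "bytes": 1200},
    {"dst_port": 8080, "bytes": 90000},
    {"dst_port": 443, "bytes": 800},
    {"dst_port": 25, "bytes": 500},
]

totals = bytes_by_port(flows)
grand_total = sum(totals.values())

# Print ports from busiest to quietest with their share of traffic
for port, count in totals.most_common():
    print(port, count, f"{count / grand_total:.1%}")
```

A breakdown like this shows which port carries the surge, and prompts the question of what that port is used for before anyone blocks it.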
Rule 7: Nobody’s responsible but the network gets the blame
For some reason, “The network is slow” is every user’s favourite catchphrase whenever anything isn’t performing up to their expectations. Why is it always the network manager’s fault? Network managers need performance metrics to prove whether or not a problem is network-related. Then, if it is a network problem, they have the data to understand the cause and how it can be fixed.
Rule 8: To monitor is great, to test is divine
Trends such as the rise of multimedia applications, increasing numbers of remote users and branch offices, data centre consolidation, software as a service, and others are forcing organisations to rely increasingly on the WAN to deliver business-critical traffic. Applications designed for the LAN do not operate as well on the WAN. Testing allows network managers to see how applications are going to behave before they are rolled out.
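One common reason for the gap between LAN and WAN behaviour is the number of request/response round trips an application makes per transaction. The sketch below uses a deliberately simplified model, response time = round trips × round-trip time + payload ÷ bandwidth, to estimate the effect before roll-out. The model, the function name and every figure are illustrative assumptions; real testing would measure actual behaviour.

```python
def estimated_response_time(app_turns, rtt_s, payload_bytes, bandwidth_bps):
    """Simplified estimate of one transaction's response time in seconds."""
    latency_cost = app_turns * rtt_s                  # waiting on round trips
    transfer_cost = payload_bytes * 8 / bandwidth_bps  # pushing the bytes
    return latency_cost + transfer_cost

# Hypothetical application: 200 round trips and 500 kB per transaction
lan = estimated_response_time(200, 0.001, 500_000, 100_000_000)
wan = estimated_response_time(200, 0.080, 500_000, 10_000_000)

print(f"LAN: {lan:.2f} s, WAN: {wan:.2f} s")
```

In this example the WAN estimate is dominated by round trips rather than by bandwidth, which is the kind of behaviour testing is meant to expose in advance.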
Joel Trammell, CEO and co-founder, NetQoS, has more than 20 years of experience in network solutions and large-scale IT systems deployments.