
Do SDN and NFV Make MTTR Meaningless?

Maybe I am just in a strange mood. It happens every now and then. However, based on some recent work I have been involved with, the following question keeps running around my head: A few years from now, will anybody really care about MTTR?

One of the challenges in writing about MTTR is that the acronym means different things to different people. To some it means the mean time to repair a problem while to others it means the mean time to respond to a problem. Within this blog, MTTR will refer to the mean time to repair a problem.

There is a classic three-step process associated with troubleshooting a problem and computing the associated MTTR. The first step, problem identification, gets the clock ticking relative to MTTR, while the second step, problem diagnosis, takes time and hence adds to the MTTR. The third step, solution selection and repair, also adds time but it stops the clock from ticking. In the traditional environment, repairing the problem usually means taking something which isn’t working well, like a line card on a switch, and replacing it. Hopefully when that happens the problem goes away.
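To make the clock in that three-step process concrete, here is a minimal sketch of how MTTR could be computed from incident records. The `Incident` class, its field names, and the sample timestamps are all hypothetical and made up for illustration; the only thing taken from the process above is that the clock starts at identification and stops at repair.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One problem, with a timestamp for each of the three steps (hypothetical record layout)."""
    identified: datetime  # step 1: the MTTR clock starts
    diagnosed: datetime   # step 2: diagnosis finished, clock still running
    repaired: datetime    # step 3: repair finished, clock stops

    def time_to_repair(self) -> timedelta:
        # Elapsed time from identification to repair; diagnosis time is included.
        return self.repaired - self.identified


def mean_time_to_repair(incidents):
    """Average time to repair across a non-empty list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((i.time_to_repair() for i in incidents), timedelta())
    return total / len(incidents)


# Made-up sample data: one incident takes 2 hours, the other 4 hours.
incidents = [
    Incident(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 11, 0)),
    Incident(datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 14, 45), datetime(2024, 1, 2, 18, 0)),
]
mttr = mean_time_to_repair(incidents)
print(mttr)  # 3:00:00
```

Note that the diagnosis timestamp does not change the result. It is kept only because it shows how much of the total is spent finding the problem rather than fixing it.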

But the traditional IT environment is physical and static, and the emerging environment is virtual and dynamic. One of the promised benefits of Software Defined Networking (SDN) is the ability to dynamically set up multiple end-to-end virtual networks on top of a physical network. In similar fashion, one of the promised benefits of Network Functions Virtualization (NFV) is the ability to dynamically spin up, move around or spin down network functions such as network optimization or firewalls.

So, let’s come back to the line card on a switch that is dropping packets and causing application performance to degrade. With SDN it should be possible to dynamically set up additional virtual networks that circumvent the troubled switch and restore acceptable application performance. Alternatively, consider an important application flow whose traffic is assigned a medium priority class. If the network becomes congested and it’s not possible to find another end-to-end path through the network, it should be possible to dynamically raise the traffic classification to high in order to continue to meet an established SLA.
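The decision described above can be sketched in a few lines. This is not how any particular SDN controller works; it is a simplified illustration that assumes the controller holds the topology as an adjacency map, picks paths by hop count, and treats both the faulty switch and the congested case the same way: look for a path that avoids the troubled node, and if none exists, keep the existing path and raise the priority class. The names `find_path`, `remediate`, and the toy topology are hypothetical.

```python
from collections import deque


def find_path(topology, src, dst, avoid=frozenset()):
    """Breadth-first search for a fewest-hops path that skips every node in `avoid`."""
    if src in avoid or dst in avoid:
        return None
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in topology.get(node, ()):
            if neighbor not in visited and neighbor not in avoid:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path exists under the given constraints


def remediate(topology, src, dst, troubled, priority):
    """Return (path, priority): reroute if possible, otherwise reclassify the flow."""
    detour = find_path(topology, src, dst, avoid={troubled})
    if detour is not None:
        return detour, priority  # rerouted, classification unchanged
    # No way around the troubled node: stay on a path through it, but raise priority.
    return find_path(topology, src, dst), "high"


# Toy topology (assumed): two hosts joined by four switches, with sw2 and sw3 in parallel.
topology = {
    "host_a": ["sw1"],
    "sw1": ["host_a", "sw2", "sw3"],
    "sw2": ["sw1", "sw4"],
    "sw3": ["sw1", "sw4"],
    "sw4": ["sw2", "sw3", "host_b"],
    "host_b": ["sw4"],
}

print(remediate(topology, "host_a", "host_b", "sw2", "medium"))
# (['host_a', 'sw1', 'sw3', 'sw4', 'host_b'], 'medium')
print(remediate(topology, "host_a", "host_b", "sw4", "medium"))
# (['host_a', 'sw1', 'sw2', 'sw4', 'host_b'], 'high')
```

In the first call there is a parallel switch, so the flow is simply moved. In the second call every path runs through sw4, so the only option left is to change the traffic classification.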

But networks aren’t the only cause of degraded application performance. Any component of the end-to-end service chain can cause it. That includes servers, WAN optimization controllers (WOCs), application delivery controllers (ADCs), and the whole gamut of security appliances. That’s where NFV comes in. Part of the promise of NFV is that SLAs will be negotiated dynamically as virtual network functions (VNFs) are chained together or configurations are modified. This new, dynamic model will require that SLAs be generated by a set of flexible policies that can take into account the end-to-end characteristics of the service, including the available SLA metrics, SLA enforcement capabilities, and the overall ability to measure service quality and guarantee SLAs. The bottom line is that if the performance of an application is beginning to degrade because it doesn’t have enough resources, then additional resources can be dynamically spun up to restore acceptable application performance.
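As a rough illustration of that bottom line, the sketch below shows the kind of policy calculation an orchestrator might run to decide how many instances of a virtual network function to spin up or down. Everything here is an assumption made for the example: the function names, the idea of a fixed capacity per instance, and the 20 percent headroom are not taken from any NFV specification or product.

```python
import math


def instances_needed(offered_load, capacity_per_instance, headroom=0.2):
    """Instances required to carry the load while keeping `headroom` spare on each one."""
    usable = capacity_per_instance * (1 - headroom)
    return max(1, math.ceil(offered_load / usable))


def scale_decision(current_instances, offered_load, capacity_per_instance, headroom=0.2):
    """Positive result: instances to spin up. Negative: instances to spin down. Zero: no change."""
    target = instances_needed(offered_load, capacity_per_instance, headroom)
    return target - current_instances


# Assumed numbers: each instance handles 500 units of load, and two are running.
print(scale_decision(2, offered_load=900, capacity_per_instance=500))  # 1
print(scale_decision(2, offered_load=300, capacity_per_instance=500))  # -1
```

A real policy would also have to weigh the SLA metrics and enforcement capabilities mentioned above; this sketch looks at load alone.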

Remember the three-step troubleshooting process? Those three steps don’t go away. However, if you accept that the promise of SDN and NFV will be realized, the first two steps happen dynamically. The third step changes. Instead of replacing a faulty piece of equipment, new resources are spun up and/or the configurations of the appropriate pieces of equipment are changed. MTTR becomes meaningless in a dynamic, virtualized environment because it becomes negligible, in part because the time-consuming task of physically replacing the faulty piece of equipment becomes an offline activity.

Does the situation described above seem far-fetched? Yes, it certainly does. However, I will close this blog with a quote from Bill Gates: “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten. Don’t let yourself be lulled into inaction.”

About the author
Jim Metzler
Jim has a broad background in the IT industry. This includes serving as a software engineer, an engineering manager for high-speed data services for a major network service provider, a product manager for network hardware, a network manager at two Fortune 500 companies, and the principal of a consulting organization. In addition, Jim has created software tools for designing customer networks for a major network service provider and directed and performed market research at a major industry analyst firm. Jim’s current interests include both cloud networking and application and service delivery. Jim has a Ph.D. in Mathematics from Boston University.