It’s hard to have any discussion with any networking person today without the term “Software Defined Networks” (SDNs) coming up within the first five minutes. However, what I’m starting to see are discussions occurring a level down from the über term of just “SDN”. Topics like Network Functions Virtualization (NFV) and programmability of the network bring the blurry term of SDN into a bit of focus. One of the subtopics of SDN that I’ve seen a recent rise in inquiries about is virtual overlay networks.
In theory, virtual overlay networks can be used to do things like isolate traffic for multi-tenancy, make virtual machine migrations simpler, and enable networks to scale past the VLAN limit of 4094. That all sounds great, but what’s the risk of running virtual overlay networks?
I think there’s a significant risk of something that I’ve been calling network overlay sprawl. To help understand what network overlay sprawl is, I’ll use an analogy that any VM administrator is familiar with.
Back in the early part of last decade, server virtualization was about 5 years old and just starting to move out of the labs and into mainstream data centers. The technology proved invaluable, as the time taken to provision a new server went from up to a month (or more) to however long it took to spin one up, which was nearly instantaneous in some cases. This meant anyone could create a virtual server, which was a huge time saver. Instead of Alice the developer having to procure a server and keep it under her desk, she could simply provision one virtually.
The downside of this model was that as Alice the developer was creating her server, David the web administrator was creating one, Joe the Windows manager was creating another one, and so on. When a problem occurred, everyone called Kim the server manager to find out why things were not working, and she had no idea what servers had been created where. People were creating servers and there was no central point of control or management. I remember talking to some companies and being told that there were twice as many virtual servers as there had been physical servers prior to the consolidation process. Plus, any time anyone needed a new server, they would spin one up and no one would ever get rid of them.
This problem was known as VM sprawl. Virtual servers were everywhere and there was no way of tracking which ones were in use, who owned what server, what virtual machines were on any particular server, and so on. Over time though, better management tools were created, including VMware vCenter and Microsoft System Center. These gave data center managers the control and visibility they needed to manage VM sprawl, and it’s been smooth sailing since (well, relatively smooth).
Now bring this analogy into the network world. What if Sandra, the person who runs video surveillance, spins up a virtual network? Then Gary the security manager temporarily creates a network to test a new appliance. The company then rolls out video conferencing and Chris the AV manager decides to create another virtual network. Oh, and let’s not forget about Suzanne the application developer, who wants to test how a new application works over a WAN.
So far, so good. But now there’s a network problem and everyone calls Wes the network manager, and he has no idea why there’s a problem, as he has no real concept of who created what virtual network and for what purpose. Even if he were aware of the virtual networks, it’s not like all of them look the same. The security network might only be in the main headquarters locations, the video network might be across all branches, and so on. This adds a layer of complexity that didn’t exist in the server world, as back then each VM was constrained to a single device. Overlay networks can touch every device on the network, or just a couple. Also, the virtual servers were initially constrained to just tier 2 and tier 3 workloads, but there’s no such tiering in the network space: there’s just one network, and it’s pretty important.
What’s missing today in the network industry are visibility tools that map virtual networks to physical networks. Such tools would help organizations leverage virtual overlay networks without the risk of virtual network sprawl. I know there are several management vendors today who solve part of the problem, but a holistic, end-to-end tool still doesn’t exist. Given that the server management tools took the better part of 5-7 years to get to the point where VM sprawl could be managed, we’re likely still a few years away from having the right tools to guard against virtual network sprawl.
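To make the idea of mapping virtual networks to physical ones a little more concrete, here is a minimal sketch in Python. It is not how any vendor's tool works; it simply assumes each overlay is recorded with an owner, a purpose, and the set of physical devices it touches. The `Overlay` type, the `overlays_on_device` helper, and all device names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Overlay:
    """One virtual overlay network (hypothetical record layout)."""
    name: str
    owner: str
    purpose: str
    devices: frozenset  # physical devices the overlay touches


def overlays_on_device(overlays, device):
    """Return the sorted names of overlays that traverse a physical device."""
    return sorted(o.name for o in overlays if device in o.devices)


# Illustrative inventory; the device names are made up for this sketch.
INVENTORY = [
    Overlay("video-surveillance", "Sandra", "camera traffic",
            frozenset({"hq-core-1", "branch-1-sw", "branch-2-sw"})),
    Overlay("appliance-test", "Gary", "temporary security appliance test",
            frozenset({"hq-core-1"})),
    Overlay("wan-app-test", "Suzanne", "application behavior over a WAN",
            frozenset({"hq-core-1", "wan-edge-1"})),
]

if __name__ == "__main__":
    # When hq-core-1 misbehaves, list every overlay that rides on it.
    for name in overlays_on_device(INVENTORY, "hq-core-1"):
        print(name)
```

Even a record this simple would let a network manager answer the first question in an outage: which overlays ride on the failing device, and who owns them. The hard part, and the gap in today's tooling, is keeping such a map accurate automatically.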
In the meantime, I’m certainly not saying companies shouldn’t invest in virtual overlay networks. I’m just saying to go into the deployment understanding that overlay sprawl is a real risk.
Image credit: WikiMedia Commons / CC-BY-SA