You’ve accelerated your WAN, virtualized your servers, and switched over to 10Gig Ethernet, but VM management, and VM migration in particular, still takes too long, and when you do it, it hammers the servers. What is going wrong, and what can you do about it?
In many cases the culprit will be network overhead. Like many virtualization adopters, you have probably sized your workloads to maximize the utilization of your physical servers, for example pushing utilization from under 20% to 70% or more. However, moving VMs around requires the host servers to push multiple gigabytes on and off the LAN, and this sudden demand for huge amounts of network packet processing can cause other VMs running on the same hardware to slow down, or even stall.
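To put "multiple gigabytes" in perspective, here is a rough estimate of how long a VM’s memory image takes to cross the wire. It is a sketch under loud assumptions: the VM sizes and link speeds are hypothetical examples, the migration gets the whole link to itself, and protocol overhead and the re-copying of pages the VM dirties mid-migration are ignored, so real migrations will take longer.

```python
# Back-of-envelope estimate only. Assumptions (not from any vendor figures):
# the migration has the whole link to itself, protocol overhead is ignored,
# and pages the VM re-dirties during the copy are not re-sent.

def transfer_seconds(vm_memory_gib: float, link_gbit_per_s: float) -> float:
    """Ideal time, in seconds, to push a VM's memory image across a link."""
    bits = vm_memory_gib * 2**30 * 8          # GiB -> bits
    return bits / (link_gbit_per_s * 1e9)     # Gbit/s -> bits per second

# Hypothetical VM sizes (GiB) and link speeds (Gbit/s), for illustration only.
for vm_gib in (4, 16, 64):
    for link in (1, 10, 40):
        t = transfer_seconds(vm_gib, link)
        print(f"{vm_gib:>3} GiB over {link:>2} Gbit/s: {t:7.1f} s")
```

Even in this idealized case a 64 GiB VM ties up a 10Gig link for close to a minute, and without offload the host has to push every byte of that through its own network stack while its other VMs compete for the same CPUs.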
Enter converged network offload. This means adding an intelligent network interface card (NIC), typically a converged or triple-play device capable of offloading the processing of network, compute, and storage protocols, which in turn means TCP/IP, RDMA, iSCSI, and FCoE (earlier NICs of this type offloaded only TCP/IP). These devices typically also support NIC virtualization, so they can be carved up into multiple virtual NICs, which can then be allocated to different functions in the VMs.
These intelligent NICs have been around for several years from smaller, and often more innovative, players such as Chelsio and NetXen (the latter was acquired by QLogic for its expertise). They have been slow to take off, though: one intelligent NIC start-up, Neterion, was bought in 2010 and then closed barely a year later by its new owner, Exar, when expected sales did not materialize.
But with the growing uptake of server virtualization and 40Gig Ethernet, and with signs that economic recovery is at last starting to happen, this could be the year intelligent NICs finally take off. One sign of confidence here is that Emulex, the number two in NICs after Intel, said it planned to begin shipping triple-offload NICs for 40Gig Ethernet in February 2014.
When I spoke recently with Mike Jochimsen, Emulex’s Senior Partner Alliances Manager, he predicted that “the 10 to 40Gig transition will be the inflection point for a more intelligent Converged Ethernet NIC.” (Mind you, similar things were said about the Gigabit to 10Gig Ethernet transition, and it didn’t exactly happen; then again, there was also the small matter of an intervening global economic crisis.)
According to Jochimsen, the other key factor driving demand for these types of devices is Software Defined Networking, because once you have used SDN to flatten your network to Layer 2, you are asking your servers to do a lot more. “We need to make users aware of the issues they might encounter as they scale,” he adds. “We see opportunities in the enterprise where they need balance – it’s a better, more efficient low-latency pipe onto IOPS and Gigabits per second. For example, with RDMA and SMB-Direct [SMB over RDMA] offload turned on, it can accelerate workload migration.” He claims the technology can boost a server’s small-packet processing capacity fourfold and achieve network speeds near line rate with 50% less CPU utilization.
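Why single out small packets? The host’s work scales largely with the number of frames it handles, and small frames mean many more of them at the same line rate. The sketch below uses standard Ethernet framing (64-byte minimum and 1518-byte maximum untagged frames, plus 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap on the wire); it only illustrates the scale of the problem and does not model Emulex’s figures, which are the company’s own claims.

```python
# Standard Ethernet puts 20 extra bytes on the wire per frame:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

def frames_per_second(link_gbit_per_s: float, frame_bytes: int) -> float:
    """Maximum frames per second at line rate for a given frame size.

    frame_bytes includes the 14-byte Ethernet header and 4-byte FCS.
    """
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbit_per_s * 1e9 / bits_per_frame

for link in (10, 40):
    for frame in (64, 1518):  # minimum and standard maximum untagged frames
        pps = frames_per_second(link, frame)
        print(f"{link} Gbit/s, {frame}-byte frames: {pps / 1e6:6.2f} million frames/s")
```

At 10Gig that works out to roughly 14.9 million minimum-size frames per second versus about 0.8 million full-size ones, which is why per-packet processing, rather than raw bandwidth, is where a host CPU tends to run out of headroom.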
Of course, the drawback of innovations like this is that they fix tomorrow’s problems, not today’s. Who among us has had the time, let alone the courage, to take down working VM host servers, fit them with a new type of NIC that needs new drivers and so on, and then get everything up and working again?
But as virtualization gathers pace, and as you specify and commission new host servers and then migrate VMs to them, fitting intelligent NICs becomes a logical step, which is why Emulex, QLogic, Chelsio, and others are all targeting the major server OEMs. It could even be a viable upgrade if you are already doing a network refresh, which is why Jochimsen says one of his tasks is to build up his company’s distribution partnerships and sell more under the Emulex brand.
There are also other ways to use that added intelligence on the NIC. For example, it could accelerate additional workloads for vertical markets, such as packet processing built on Intel’s DPDK (Data Plane Development Kit) in the telco business. And once your servers can shift VMs around without worrying about the overhead, you can make proper use of all that accelerated WAN and LAN capacity by taking your virtualization density to the next level.
Image credit: Wikimedia Commons