It’s Time to Chop Down the Spanning Tree

I believe it was the esteemed singer/songwriter, John Lennon, who uttered the words:

Imagine there’s no spanning tree, it’s easy if you try,
No multi-tiers below us, just a flat network like the sky,
Imagine all the users, living for the apps…

I recently spoke at Avaya’s Technology Forum, the company’s data networking event (yes, Avaya does networking). After my presentation, Avaya’s Jean Turgeon gave a presentation in which he asked the audience to “imagine” a world without Spanning Tree. It got me thinking about Imagine, one of my all-time favorite Lennon songs, and, more importantly, about all the reasons why companies should look to move past Spanning Tree Protocol (STP) today.

The tried-and-true STP has been under significant fire ever since data center networking vendors started touting “fabric” solutions based primarily on TRILL or Shortest Path Bridging (SPB), both of which are being positioned as replacements for STP. As a network engineer, I relied heavily on STP, but I have to admit it’s time to move on, and I’d like to share the reasons why I think that’s the case.

I know not everyone is fluent in networking speak, so I’ll define STP first. STP is a protocol that operates at layer 2 of the OSI stack. It prevents bridging loops and broadcast storms by creating a “spanning tree” within a meshed Ethernet network. The protocol disables all links that are not part of the active tree, resulting in a single active path between any two points in the network. A disabled link becomes active only when one of the active links fails.
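For readers who prefer code to prose, here is a minimal sketch of the idea in Python. It is not how a real switch implements STP: there are no BPDUs, port costs or timers, every link is assumed to cost the same, and the "lowest name wins" root election is only a stand-in for the real bridge ID comparison. The switch names and the `spanning_tree` function are hypothetical, made up for illustration.

```python
from collections import deque

def spanning_tree(links):
    """Split an undirected mesh into active (tree) links and blocked links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Stand-in for root bridge election: the lowest switch name wins.
    root = min(neighbors)

    # Breadth-first search from the root; each switch keeps exactly one
    # link back toward the root (all links assumed equal cost).
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbors[node]):
            if peer not in parent:
                parent[peer] = node
                queue.append(peer)

    active = {frozenset((child, up)) for child, up in parent.items() if up is not None}
    blocked = {frozenset(link) for link in links} - active
    return root, active, blocked

# Hypothetical four-switch full mesh.
mesh = [("sw1", "sw2"), ("sw1", "sw3"), ("sw1", "sw4"),
        ("sw2", "sw3"), ("sw2", "sw4"), ("sw3", "sw4")]
root, active, blocked = spanning_tree(mesh)
print("root:", root)
print("active:", sorted(tuple(sorted(link)) for link in active))
print("blocked:", sorted(tuple(sorted(link)) for link in blocked))
```

In this toy mesh, three of the six links end up blocked, and traffic between sw2 and sw3 has to go through sw1 even though the two switches are directly cabled together.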

As an analogy, consider the highway system between San Francisco and San Jose. There are multiple paths between the two: one could take 280, 101, 880, 580 to 680, or some other combination. If STP were to manage the traffic on this route, all of the roads would be disabled except for one. Let’s use 101 as the example. The only way another route would open up is if 101 were closed. This analogy should let you see the inherent inefficiency of STP.

Specifically, STP has the following limitations:

  • Optimized for North-South traffic. Legacy networks were designed for the client-server era. Client-server traffic moves in a North-South direction through the network, meaning from the client down to the server and back up. Today, virtualization is driving more East-West traffic across the data center. The single-active-path tree architecture has far too many hops and inefficient traffic flows to meet the needs of an increasingly virtualized data center.
  • Inefficient network. The redundant, idle links are highly inefficient. Because STP keeps only the lowest-cost path active, a network could have only half of its ports carrying traffic. While this does achieve the goal of redundancy, it’s an incredibly inefficient way of running a network. Can you imagine how big you would have to make every road between San Francisco and San Jose to accommodate all of the traffic? And what a waste it would be to have multiple roads of that size closed because traffic can only take one route?
  • Long recovery times when failure occurs. When a primary link does fail, STP needs to go through a process known as “reconvergence”, which recalculates all of the paths and their relative “cost” to determine where to send the active traffic. Depending on the size of the network, these recovery times can be quite long and cause performance problems while the network is reconverging.
  • High total cost of ownership. Spanning tree means idle ports. Idle ports mean wasted money. Since up to half the ports in an STP-based network can be inactive, network managers need to over-build the network to provide a high quality user experience.
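To put a rough number on the idle-port problem, note that a loop-free tree connecting N switches uses exactly N - 1 links, so every link beyond that sits idle. The short Python sketch below does the arithmetic for a hypothetical design of my own invention: two core switches cabled to each other and ten access switches, each dual-homed to both cores. It assumes a single spanning tree instance and ignores link speeds.

```python
def idle_links(num_switches, num_links):
    """Return (idle link count, idle fraction) for a single spanning tree."""
    active = num_switches - 1          # a tree over N switches has N - 1 links
    if num_links < active:
        raise ValueError("too few links to connect every switch")
    idle = num_links - active
    return idle, idle / num_links

# Hypothetical design: 2 core switches + 10 dual-homed access switches.
switches = 2 + 10
links = 1 + 10 * 2                     # one core-to-core link, two uplinks per access switch
idle, fraction = idle_links(switches, links)
print(f"{idle} of {links} links idle ({fraction:.0%})")
```

In this made-up design, 10 of the 21 links carry no traffic in normal operation, which is close to the "half the ports" figure above.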

SPB and TRILL are competing standards to replace STP. Both protocols use some form of shortest-path, multi-hop routing to overcome the slow network convergence times associated with STP. In a TRILL- or SPB-based network, all paths are equally valid and every path is active, which can as much as double the total bandwidth available on a network while using fewer physical ports. This “active-active” configuration is the reason these new networks are described as “fabrics” rather than legacy networks.
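To make the “every path is active” idea concrete, here is a small Python sketch that counts the equal-cost shortest paths between two switches. It does not implement TRILL or SPB, and it does not model how either protocol spreads flows across paths. It only shows how many shortest paths exist to be used. The leaf and spine names and the `shortest_path_count` function are hypothetical, and every link is assumed to cost the same.

```python
from collections import deque

def shortest_path_count(links, src, dst):
    """Return (hop count, number of equal-cost shortest paths) from src to dst."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    dist = {src: 0}    # hops from src
    count = {src: 1}   # number of shortest paths from src
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in dist:
                # First time we reach this switch: record distance, inherit path count.
                dist[peer] = dist[node] + 1
                count[peer] = count[node]
                queue.append(peer)
            elif dist[peer] == dist[node] + 1:
                # Another equally short way to reach the same switch.
                count[peer] += count[node]
    return dist.get(dst), count.get(dst, 0)

# Hypothetical fabric: two leaf switches, each cabled to four spine switches.
spines = ["spine1", "spine2", "spine3", "spine4"]
fabric = [(leaf, spine) for leaf in ("leaf1", "leaf2") for spine in spines]
hops, paths = shortest_path_count(fabric, "leaf1", "leaf2")
print(f"{paths} equal-cost paths of {hops} hops")
```

In this toy fabric there are four two-hop paths between the leaves. A single spanning tree would leave exactly one of them in service.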

Any company looking to expand its use of virtualization should look to drop STP and investigate a TRILL- or SPB-based fabric to meet the new demands created by cloud and virtualization.

Image credit: archer10 (Dennis) (flickr) – CC-BY-SA

  • Mike

    We have a customer with an ELAN WAN setup. I found out yesterday they have STP running on the WAN links. They are all single homed to the ELAN cloud. They had some STP issues and I thought the best option would be to disable STP on the WAN links. Thoughts?