Multi-Path Ethernet: The Flying Cars of the Data Center

Update 8/23/11: I’ve added a bit of info on Brocade VCS

In the movie Ghostbusters, Dr. Egon Spengler gave a dire warning to the other Ghostbusters: “Don’t cross the streams”.

That’s a bridging loop waiting to happen

In Ethernet, we have a similar warning: “Don’t ever let there be more than one way to get anywhere”. Ethernet is too stupid to handle a condition when a single source MAC address has the ability to get to a destination MAC address by more than one path.

One of the reasons for this is the Ethernet format lacks the Layer 2 version of IP’s TTL. TTLs are decremented at each hop, so if an IP packet does find itself hitting the same place over and over again, eventually the TTL will go to zero and the packet will get dropped. Annoying, but the network isn’t going to be flooded with an ever-increasing barrage of lost packets. With Ethernet and no TTL, a frame could be forwarded in a loop indefinitely. It’s not total protonic reversal, but it’s still bad. (Ever cause a bridging loop? I have. It’s a hoot.)
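To make the TTL point concrete, here is a minimal Python sketch. It is a toy model, not real switch or router behavior: the function name `forward_in_loop` and the `max_hops` safety cap are made up for illustration, and it follows a single packet rather than the flood of copies a real bridging loop produces.

```python
def forward_in_loop(ttl, max_hops=1000):
    """Send one packet around a forwarding loop and count how often it is forwarded.

    ttl=None models an Ethernet frame (no hop limit in the header).
    An integer models an IP packet whose TTL is decremented at each hop.
    max_hops is only a safety cap so the simulation itself terminates.
    """
    hops = 0
    while hops < max_hops:
        if ttl is not None:
            ttl -= 1               # each hop decrements the TTL
            if ttl == 0:
                return hops, "dropped"
        hops += 1                  # the packet is forwarded once more
    return hops, "still looping"


ip_result = forward_in_loop(ttl=64)        # IP packet caught in a routing loop
frame_result = forward_in_loop(ttl=None)   # Ethernet frame caught in a bridging loop
print(ip_result)      # (63, 'dropped')
print(frame_result)   # (1000, 'still looping')
```

The IP packet dies on its own after a bounded number of hops. The Ethernet frame only stops here because the simulation gives up, which is the whole problem.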

We tend to build redundant networks because it’s a good idea to have, you know, redundancy. And redundancy means multiple paths, which violates Egon’s golden rule. A bit of a conundrum.

Of course, we’ve been doing multiple paths without bridging loops.  The primary solution for the past 21 years has been the spanning-tree protocol. (Fancy that, spanning-tree is legal to drink in the US).

If spanning tree is drinking, it will be buying its own drinks, because no one likes it. Its annoying attributes include (but are not limited to):

  • Links are active/standby
  • Topology changes can cause network-wide connectivity outages for 60+ seconds
  • Even rapid spanning-tree causes network outages for several seconds
  • There are several dozen ways to mess up and royally screw up your network (root bridge priority, timer values too low/too high, etc.)
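The active/standby point can be sketched in a few lines of Python. This is a simplification under stated assumptions: real STP elects a root and picks paths using bridge IDs and port costs, while this sketch just builds a breadth-first tree from a root chosen by hand. The topology and switch names are hypothetical.

```python
from collections import deque


def spanning_tree_links(links, root):
    """Return the links kept active by a breadth-first tree rooted at `root`.

    Any link that is not in the tree would be blocked to keep the
    topology loop-free, which is the effect STP has on redundant links.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    active = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        for peer in neighbors[switch]:
            if peer not in visited:
                visited.add(peer)
                active.add(frozenset((switch, peer)))
                queue.append(peer)
    return active


# Hypothetical topology: two core switches, two dual-homed access switches.
links = [
    ("core1", "core2"),
    ("core1", "acc1"), ("core2", "acc1"),
    ("core1", "acc2"), ("core2", "acc2"),
]
active = spanning_tree_links(links, root="core1")
blocked = [link for link in links if frozenset(link) not in active]

print(len(active), "active links")   # 3 active links
print(blocked)                       # [('core2', 'acc1'), ('core2', 'acc2')]
```

In this toy topology both redundant uplinks to `core2` end up blocked: they are paid for and cabled, but carry no traffic until something fails.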

This best expresses our collective feeling for spanning-tree protocol

Some network architectures would avoid STP altogether, by making every pair of switches its own isolated Layer 2 network, with Layer 3 routing between the pairs. Using a routing protocol like OSPF and fast enough MLS (multi-layer switches), you can build a completely meshed network with plenty of multi-pathing.
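As a rough illustration of the multi-pathing a routed design can offer, the Python sketch below counts equal-cost shortest paths between two switches. It assumes every link has the same cost (OSPF actually derives cost from bandwidth or configuration), and the function name and four-core topology are hypothetical.

```python
from collections import deque


def count_equal_cost_paths(links, src, dst):
    """Count the shortest paths from src to dst, treating every link as cost 1."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    dist = {src: 0}     # hop count of the shortest path to each node
    paths = {src: 1}    # number of distinct shortest paths to each node
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for peer in neighbors.get(node, []):
            if peer not in dist:
                # First time we reach this node: record distance and inherit paths.
                dist[peer] = dist[node] + 1
                paths[peer] = paths[node]
                queue.append(peer)
            elif dist[peer] == dist[node] + 1:
                # Another shortest path into an already-discovered node.
                paths[peer] += paths[node]
    return paths.get(dst, 0)


# Hypothetical topology: two access switches, each routed to four core switches.
cores = ["core1", "core2", "core3", "core4"]
links = [(acc, core) for acc in ("acc1", "acc2") for core in cores]

print(count_equal_cost_paths(links, "acc1", "acc2"))   # 4
```

A routing protocol that supports equal-cost multi-path can spread traffic over all four of those paths, where spanning tree would leave only one of them forwarding.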

Radia Perlman is the mother of STP, although it appears she didn’t quite intend it to end up being used the way it was. She came up with a replacement called TRILL. IEEE brushed her off apparently, so she went to the IETF. The IEEE then said “wait a minute” and came up with their own 802.1aq, (shortest path bridging or SPB). You can take a look at an interesting TRILL/SPB smack down at NANOG here.

IETF/IEEE is the biggest beef since west coast/east coast

Cisco, the largest network vendor, is going the TRILL route, but since TRILL isn’t done yet, they came up with a pre-standard implementation they call Fabric Path. Juniper has come up with their own multi-path technology, not remotely based on a standard, called QFabric.

So why the need for multi-path Ethernet?

For one, STP sucks. A lot. No one likes it. It would sit alone at the lunch table, and not because the other kids are mean, but because STP is a total jerk and we’re sick of its shit. We could go with Layer 3, but that won’t work with virtualization.

On top of that, virtualization has really kicked the need for multi-path up another notch by adding in a lot of east/west traffic that occurs during virtual machine live migrations (vMotion). If I have to traverse the core every time I do a vMotion from one access switch to another, that’s going to be a very busy core. It’s a choke point, where multi-path lets us build more of a mesh.

Also we’re putting a lot more than data on these networks; we’re adding storage too (iSCSI/FCoE/NFS). So we’re greatly increasing the demand for bandwidth, and it doesn’t make sense to have 10 Gbit links sitting idle.

Another advantage is convergence time. If you go to page 4 of the Fabric Path review at Network World, they found that re-routing of a path failure was 162 milliseconds, a helluva lot quicker than even rapid spanning tree.
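To put those convergence numbers in perspective, here is a back-of-the-envelope sketch in Python. It assumes a single 10 Gbit link running at full line rate for the whole outage, which is a worst case picked purely for illustration. The function name is made up, and nothing here comes from the Network World test beyond the 162 millisecond figure.

```python
def bytes_during_outage(link_bits_per_sec, outage_ms):
    """Bytes a fully loaded link would have carried during an outage."""
    return link_bits_per_sec * outage_ms // 1000 // 8


TEN_GBIT = 10_000_000_000

fabric_path = bytes_during_outage(TEN_GBIT, 162)      # 162 ms reroute from the review
classic_stp = bytes_during_outage(TEN_GBIT, 60_000)   # 60 second STP topology change

print(fabric_path / 1e6, "MB")   # 202.5 MB
print(classic_stp / 1e9, "GB")   # 75.0 GB
```

Under those assumptions the difference is roughly 200 MB of undeliverable traffic versus 75 GB, per link.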

So multi-path is great, yada yada. The trick is, you can’t really implement it yet.

It looks like SPB as a protocol is done, but it’s been mostly metro Ethernet vendors that have adopted it (it works with metro and data center Ethernet). TRILL hasn’t been finalized as far as I can tell, although it’s supposed to be real soon now(tm).

Fabric Path from Cisco is shipping, however it only runs right now on the Nexus 7000 series, which hasn’t exactly taken data centers by storm (Cisco’s choice of selling a huge 7010 and an even bigger 7018 is curious). The most popular data center switch is the venerable “old man of the data center”, the Catalyst 6500, and it probably won’t ever do Fabric Path or TRILL.

The 6500 could *potentially* do SPB from what I can tell (doesn’t change the Ethernet format in a way that needs new ASICs), but that would require significant development in IOS, which isn’t exactly easy to develop (one of the reasons why Cisco is moving to NX-OS). The 6500 is also preventing a lot of other data center technologies, like FCoE.

QFabric from Juniper is apparently running in some customer locations, but it’s not released to the general public and isn’t likely to be any time soon. QFabric is also proprietary, and not in the “pre-standard” proprietary way that can be changed to interoperate with other vendors at a later date, but in the “not a chance in hell will you mesh another vendor’s product in” way. As far as I can tell, at least.

Brocade has VCS, which apparently implements a TRILL-like multipath setup, although instead of the IS-IS protocol that TRILL uses, VCS uses FSPF (the routing protocol that Fibre Channel uses). It makes sense that Brocade would use FSPF, since they’re a company with heavy FC chops, and obtained Ethernet chops through acquisition of Foundry. It looks like it only runs on two of their ToR switches, however.

So it looks like we’re stuck with single-path Ethernet networks for the foreseeable future, and using all sorts of tricks/hacks to get around Ethernet’s limitations, like EtherChannel, as well as VSS/vPC and other multi-chassis aggregation. However, those don’t really let us do a “full mesh” network like we’re dreaming of.

So it seems like for now, multi-path Ethernet is like flying cars: We should have had them by now, but we don’t.

9 Responses to Multi-Path Ethernet: The Flying Cars of the Data Center

  1. tonybourke says:

    Oh man, I didn’t watch that whole IBM commercial. I remember it from over a decade ago, and thought it was quite clever. I forgot it was for Lotus Notes, which is a pox upon the IT landscape.

    • Thomas C says:

      From an IBMer having to use Lotus on a daily basis, amen.
      Great article Tony, this helps clarify a lot of these upcoming multi-path L2 yadda yadda terms that newbies like me don’t quite understand yet (STP is good enough, right? :P)

  2. Michael Schipp says:

    Very good read, well laid out.

    Would also point out that there are others in the pre-standard camp, e.g. Brocade VCS/VDX. It is TRILL based and available (shipping). Note it uses FSPF instead of IS-IS with the extensions.

    But like you said if you want fully open standards based TRILL – nobody has one yet 🙂 but real soon 🙂

  3. Duro says:

    Juniper claimed that they will release QFabric in 3Q 2011, so by the end of September we should have the whole QFabric (if they keep their promises). So we might have a solution soon (one of them, not the standard).

  4. Enjoyed the frank discussion, and the Aerobatics Video!! I know just enough about both to be dangerous (I own an aerobatic plane and also am in the thick of this multi pathing stuff).

    While I have no idea how long flying cars will take to become available, I can assure you that multipath Ethernet is pretty close. I’ve had 10 node networks up in my lab between different vendors and we’ve added emulations into the hundreds. Have a look on wikipedia under IEEE 802.1aq and you can see the interop test summaries completed this summer.

    So don’t give up just yet!!

    Oh, and not to be outdone, have a look at videos on youtube for aerobatics under “GZRO” 😉

  5. tonybourke says:

    Hi Peter,

    Just have to say wow. That is a nice airplane. I’m a private pilot, and I’ve only done two aerobatic flights with an instructor.

    You certainly outdo me. Just to warn you, next time I’m in Montreal, I’m stealing that beautiful plane of yours 🙂


  6. Peter Ashwood-Smith says:

    It’s addictive. I started with a couple of flights too. Definitely a good way to unwind after a week of work.
