Multi-Path Ethernet: The Flying Cars of the Data Center
August 19, 2011
Update 8/23/11: I’ve added a bit of info on Brocade VCS
In the movie Ghostbusters, Dr. Egon Spengler gave a dire warning to the other Ghostbusters: “Don’t cross the streams”.
That’s a bridging loop waiting to happen
In Ethernet, we have a similar warning: “Don’t ever let there be more than one way to get anywhere”. Ethernet is too stupid to handle a condition when a single source MAC address has the ability to get to a destination MAC address by more than one path.
One of the reasons for this is that the Ethernet frame format lacks a Layer 2 version of IP’s TTL. TTLs are decremented at each hop, so if an IP packet does find itself hitting the same place over and over again, eventually the TTL will go to zero and the packet will get dropped. Annoying, but the network isn’t going to be flooded with an ever-increasing barrage of lost packets. With Ethernet and no TTL, a frame could be forwarded in a loop indefinitely. It’s not total protonic reversal, but it’s still bad. (Ever cause a bridging loop? I have, it’s a hoot.)
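To make the TTL point concrete, here is a minimal Python sketch of a packet stuck in a forwarding loop. The function name `devices_visited` and the `cap` safety limit are made up for illustration; real gear obviously doesn't work this way internally. The only thing it models is a per-hop decrement, and what happens when there is no such field.

```python
def devices_visited(initial_ttl=None, cap=10_000):
    """Walk a packet around a loop and report how far it got.

    initial_ttl: starting TTL (IP-style), or None for a frame with no
                 TTL field at all (Ethernet-style).
    cap:         hypothetical safety limit so the simulation itself
                 terminates; a real loop has no such limit.

    Returns (devices_that_handled_it, was_dropped).
    """
    visited = 0
    ttl = initial_ttl
    while visited < cap:
        visited += 1              # another device receives the packet
        if ttl is not None:
            ttl -= 1              # each hop decrements the TTL
            if ttl == 0:
                return visited, True   # TTL expired: packet is dropped
    return visited, False         # still circulating when we gave up


ip_result = devices_visited(initial_ttl=64)
eth_result = devices_visited(initial_ttl=None)
print("IP packet, TTL 64:", ip_result)
print("Ethernet frame, no TTL:", eth_result)
```

The IP packet dies after a bounded number of hops no matter how broken the topology is. The Ethernet frame only stops because the simulation gives up. This sketch also ignores flooding: in a real bridging loop, broadcast frames get copied out multiple ports, so the number of looping frames grows rather than staying at one.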
We tend to build redundant networks because it’s a good idea to have, you know, redundancy. And redundancy means multiple paths, which violates Egon’s golden rule. A bit of a conundrum.
Of course, we’ve been doing multiple paths without bridging loops. The primary solution for the past 21 years has been the spanning-tree protocol. (Fancy that, spanning-tree is legal to drink in the US.)
If spanning-tree is drinking, it will be buying its own drinks, because no one likes it. Its annoying attributes include (but are not limited to):
- Links are active/standby
- Topology changes can cause network-wide connectivity outages for 60+ seconds
- Even rapid spanning-tree causes network outages for several seconds
- There are several dozen ways to mess up and royally screw up your network (root bridge priority, timer values too low/too high, etc.)
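The active/standby complaint is easy to see with a toy topology. The sketch below is a simplification and not how STP really works: real STP elects a root by bridge ID and picks ports by path cost using BPDUs, while this just runs a breadth-first search from a hand-picked root. The switch names and the `spanning_tree` function are hypothetical. The end result is the same in spirit, though: every link that would close a loop gets blocked.

```python
from collections import deque


def spanning_tree(links, root):
    """Return (forwarding, blocked) link sets for a loop-free tree.

    Simplified stand-in for STP: breadth-first search from `root`.
    Any link not needed to reach a switch is treated as blocked.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    seen = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        switch = queue.popleft()
        for peer in sorted(neighbors[switch]):
            if peer not in seen:
                seen.add(peer)
                forwarding.add(frozenset((switch, peer)))
                queue.append(peer)

    blocked = {frozenset(link) for link in links} - forwarding
    return forwarding, blocked


# Two core switches, two access switches, each access switch dual-homed.
links = [
    ("core1", "core2"),
    ("core1", "acc1"), ("core2", "acc1"),
    ("core1", "acc2"), ("core2", "acc2"),
]
forwarding, blocked = spanning_tree(links, root="core1")
print(f"{len(blocked)} of {len(links)} links are blocked:")
for link in sorted(sorted(pair) for pair in blocked):
    print("  ", " <-> ".join(link))
```

In this made-up topology, both redundant uplinks to the second core switch end up carrying nothing until something fails.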
Some network architectures avoid STP altogether by making every pair of switches its own isolated Layer 2 network, with Layer 3 routing between the pairs. Using a routing protocol like OSPF and fast enough multi-layer switches (MLS), you can build a completely meshed network with plenty of multi-pathing.
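As a rough illustration of what routed multi-pathing buys you, the sketch below enumerates every shortest path between two switches, treating each link as cost 1. That is an assumption for simplicity; OSPF uses configurable link costs, and real devices pick among equal-cost paths per flow rather than listing them like this. The topology and the `equal_cost_paths` function are hypothetical.

```python
from collections import deque


def equal_cost_paths(links, src, dst):
    """List every shortest path from src to dst, with each link costing 1."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    # Breadth-first search gives the hop count from src to every switch.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)

    if dst not in dist:
        return []

    # Walk backwards from dst, only stepping to switches one hop closer to src.
    def walk(node):
        if node == src:
            return [[src]]
        paths = []
        for peer in sorted(neighbors[node]):
            if dist.get(peer) == dist[node] - 1:
                for path in walk(peer):
                    paths.append(path + [node])
        return paths

    return walk(dst)


# Two access switches, each connected to both core switches.
links = [
    ("core1", "core2"),
    ("core1", "acc1"), ("core2", "acc1"),
    ("core1", "acc2"), ("core2", "acc2"),
]
paths = equal_cost_paths(links, "acc1", "acc2")
for path in paths:
    print(" -> ".join(path))
```

Both paths between the access switches are usable at the same time here, where a single spanning tree over the same links would leave one of them idle.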
Radia Perlman is the mother of STP, although it appears she didn’t quite intend it to end up being used the way it was. She came up with a replacement called TRILL. The IEEE apparently brushed her off, so she went to the IETF. The IEEE then said “wait a minute” and came up with their own 802.1aq (shortest path bridging, or SPB). You can take a look at an interesting TRILL/SPB smackdown at NANOG here.
Cisco, the largest network vendor, is going the TRILL route, but since TRILL isn’t done yet, they came up with a pre-standard implementation they call Fabric Path. Juniper has come up with their own multi-path technology, not remotely based on a standard, called QFabric.
So why the need for multi-path Ethernet?
For one, STP sucks. A lot. No one likes it. It would sit alone at the lunch table, and not because the other kids are mean, but because STP is a total jerk and we’re sick of its shit. We could go with Layer 3, but that won’t work with virtualization.
On top of that, virtualization has really kicked the need for multi-path up another notch by adding in a lot of east/west traffic that occurs during virtual machine live migrations (vMotion). If I have to traverse the core every time I do a vMotion from one access switch to another, that’s going to be a very busy core. The core becomes a choke point, whereas multi-path lets us build more of a mesh.
Also we’re putting a lot more than data on these networks; we’re adding storage too (iSCSI/FCoE/NFS). So we’re greatly increasing the demand for bandwidth, and it doesn’t make sense to have 10 Gbit links sitting idle.
Another advantage is convergence time. If you go to page 4 of the Fabric Path review at Network World, they found that re-routing of a path failure was 162 milliseconds, a helluva lot quicker than even rapid spanning tree.
So multi-path is great, yada yada. The trick is, you can’t really implement it yet.
It looks like SPB as a protocol is done, but it’s been mostly metro Ethernet vendors that have adopted it (it works with metro and data center Ethernet). TRILL hasn’t been finalized as far as I can tell, although it’s supposed to be real soon now(tm).
Fabric Path from Cisco is shipping, however it only runs right now on the Nexus 7000 series, which hasn’t exactly taken data centers by storm (Cisco’s choice of selling a huge 7010 and an even bigger 7018 is curious). The most popular data center switch is the venerable “old man of the data center”, the Catalyst 6500, and it probably won’t ever do Fabric Path or TRILL.
The 6500 could *potentially* do SPB from what I can tell (it doesn’t change the Ethernet format in a way that needs new ASICs), but that would require significant development in IOS, which isn’t exactly easy to develop for (one of the reasons why Cisco is moving to NX-OS). The 6500 is also holding back a lot of other data center technologies, like FCoE.
QFabric from Juniper is apparently running in some customer locations, but it’s not released to the general public and isn’t likely to be any time soon. QFabric is also proprietary, and not in the “pre-standard” proprietary way that can be changed to interoperate with other vendors at a later date, but in the “not a chance in hell will you mesh another vendor’s product in” way. As far as I can tell, at least.
Brocade has VCS, which apparently implements a TRILL-like multipath setup, although instead of the IS-IS protocol that TRILL uses, VCS uses FSPF (the routing protocol that Fibre Channel uses). It makes sense that Brocade would use FSPF, since they’re a company with heavy FC chops, and obtained Ethernet chops through acquisition of Foundry. It looks like it only runs on two of their ToR switches, however.
So it looks like we’re stuck with single-path Ethernet networks for the foreseeable future, and using all sorts of tricks/hacks to get around Ethernet’s limitations, like EtherChannel, as well as VSS/vPC and other multi-chassis aggregation. However, those don’t really let us do a “full mesh” network like we’re dreaming of.
So it seems like for now, multi-path Ethernet is like flying cars: We should have had them by now, but we don’t.