Po-tay-to, Po-ta-to: Analogies and NPIV/NPV

In a recent post, I took a look at the Fibre Channel subjects of NPIV and NPV, both topics covered in the CCIE Data Center written exam (currently in beta, take yours now, $50!). The post generated a lot of comments. I mean, a lot. Over 50 so far (and still going).  An epic battle (although very unInternet-like in that it was very civil and respectful) brewed over how Fibre Channel compares to Ethernet/IP. The comments look like the aftermath of the battle of Wolf 359.

Captain, the analogy regarding squirrels and time travel didn’t survive

One camp, led by Erik Smith from EMC (who co-wrote the best Fibre Channel book I’ve seen so far, and it’s free), compares WWPNs to IP addresses and FCIDs to MAC addresses. Others, such as Ivan Pepelnjak and myself, compare WWPNs to MAC addresses and FCIDs to IP addresses. There were many points and counter-points. Valid arguments were made supporting each position. Eventually, people agreed to disagree. So which one is right? They both are.

Wait, what? Two sides can’t be right, not on the Internet!

When comparing Fibre Channel to Ethernet/IP, it’s important to remember that they are different. In fact, significantly different. The only reason to relate Fibre Channel to Ethernet/IP at all is to give people who are familiar with Ethernet/IP a way into the world of Fibre Channel. Many (most? all?) people learn by building associations from known subjects (in our case Ethernet/IP) to lesser-known subjects (in this case Fibre Channel).

Of course, any association carries its own inherent inaccuracies. We purposefully sacrifice some accuracy in order to attain relatability. Specific details and inaccuracies are glossed over. To some, introducing any inaccuracy is sacrilege. To me, that’s being overly pedantic. Pedantic details are for the expert level. Using pedantic facts as an admonishment of an analogy misses the point entirely. With any analogy there will always be inaccuracies, and there will always be many analogies to be made.

Personally, I still prefer the WWPN ~= MAC / FC_ID ~= IP approach, and will continue to use it when I teach. But I believe the other approach is completely valid as well. At that point, it’s just a matter of preference. Both roads lead to the same destination, and that is what’s really important.

Learning always happens in layers. Coat after coat is applied, increasing in accuracy and pedantic detail as you go along. Analogies are a very useful and effective tool for learning any subject.

TRILLapalooza

If there’s one thing people lament in the routing and switching world, it’s the spanning tree protocol and the way Ethernet forwarding is done (or more specifically, its limitations). I made my own lament last year (don’t cross the streams), and it’s come up recently on Ivan Pepelnjak’s blog. Even server admins who’ve never logged into a switch in their lives know what spanning-tree is: the destroyer of uptime, causer of Sev1 events, a pox among switches.

I am become spanning-tree, destroyer of networks

The root of the problem is the way Ethernet forwarding is done: There can’t be more than one path for an Ethernet frame to take from a source MAC to a destination MAC. This basic limitation has not changed for the past few decades.

And yet, for all the outages and all of the configuration issues and problems spanning-tree has caused, there doesn’t seem to be much enthusiasm for the more fundamental cures: TRILL (and the current proprietary implementations), SPB, QFabric, and to a lesser extent OpenFlow (for data center use cases), among others. And although OpenFlow has been getting a lot of hype, it’s more because VCs are drooling over it than for its STP-slaying ways.

For a while in the early 2000s, it looked like we might get rid of it for the most part. There was a glorious time when we started to see multi-layer switches that could route Layer 3 as fast as they could switch Layer 2, giving us the option of getting rid of spanning-tree entirely. Every pair of switches, even at the access layer, would be its own Layer 3 domain. Everything was routed to everywhere, and the broadcast domains were so small there was no opportunity for Ethernet to take multiple paths. And with Layer 3 routing, multi-pathing was easy through ECMP. Convergence on a failed link was way faster than spanning tree.
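To make that concrete, here’s roughly what a routed access switch looks like — a minimal sketch in NX-OS-style syntax, with hypothetical interface numbers and addressing. With two equal-cost uplinks, ECMP load-shares flows across both, and a dead link is an OSPF reconvergence event rather than a spanning-tree one:

feature ospf
router ospf 1
! each uplink to the aggregation layer is a routed point-to-point /31
interface ethernet 1/49
  no switchport
  ip address 10.255.0.1/31
  ip router ospf 1 area 0.0.0.0
interface ethernet 1/50
  no switchport
  ip address 10.255.0.3/31
  ip router ospf 1 area 0.0.0.0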

Then virtualization came along and screwed it all up. Now Layer 3 wasn’t going to work for a lot of the workloads, and we needed to build huge Layer 2 networks. For some non-virtualization uses, though, the Layer 3 everywhere approach still works great. To see a wonderfully multi-pathed, high-bandwidth environment, check out Brad Hedlund’s own blog entry on creating a Hadoop super network with shit-tons of bandwidth out of 10G and 40G low latency ports.

Overlays

Which brings me to overlays. There are some who propose overlay networks, such as VXLAN, NVGRE, and Nicira, as solutions to the Ethernet multipathing problem (among other problems). An overlay technology like VXLAN not only brings us back to the glory days of no spanning-tree by routing to the access layer, but solves another issue that plagues large scale deployments: 4000+ VLANs ain’t enough. VXLAN, for instance, has a 24-bit identifier on top of the normal 12-bit 802.1Q VLAN identifier, so that’s 2^36 separate broadcast domains, giving us the ability to support 68,719,476,736 VLANs. Hrm.. that would be….

While I like (and am enthusiastic about) overlay technologies in general, I’m not convinced they are the final solution we need for Ethernet’s current forwarding limitations. Building an overlay infrastructure (at least right now) is a more complicated (and potentially more expensive) prospect than TRILL/SPB, depending on how you look at it. Availability is also an issue currently (likely to change, of course): NVGRE has no implementations I’m aware of, and VXLAN only has one (Cisco’s Nexus 1000v). And VXLAN doesn’t currently terminate into any hardware, making it difficult to put in load balancers and firewalls that aren’t virtual (as mentioned in the Packet Pushers’ VXLAN podcast).

Of course, I’m afraid TRILL doesn’t have it much better in the way of availability. Only two vendors that I’m aware of ship TRILL-based products, Brocade with VCS and Cisco with FabricPath, and both FabricPath and VCS run on only a few switches out of their respective vendors’ offerings. As has often been discussed (and lamented), TRILL has a new header format, so new silicon is needed to implement TRILL (or TRILL-based) offerings in any switch. Sadly it’s not just a matter of adding new firmware; the underlying hardware needs to support it too. For instance, the Nexus 5500s from Cisco can do TRILL (and the code has recently been released) while the Nexus 5000 series cannot.

It had been assumed that the vendors that use merchant silicon for their switches (such as Arista and Dell Force10) couldn’t do TRILL, because the merchant silicon didn’t. Turns out that’s not the case. I’m still not sure which chips from the merchant vendors can and can’t do TRILL, but the much-ballyhooed Broadcom Trident/Trident+ chipset (BCM56840, I believe, thanks to #packetpushers) can do TRILL. So anything built on Trident should be able to do TRILL, which right now is a ton of switches. Broadcom is making it rain Tridents right now. The new Intel/Fulcrum chipsets can do TRILL as well, I believe.

TRILL does have the advantage of being stupid easy, though. Ethan Banks and I were paired up during NFD2 at Brocade and tasked with configuring VCS (built on pre-standard TRILL). It took us 5 minutes and just a few commands. FabricPath (Cisco’s pre-standard implementation built on TRILL) is also easy: 3 commands. If you can’t configure FabricPath, you deserve the smug look you get from Smug Cisco Guy. Here is how you turn on FabricPath on a Nexus 7K:

switch# configure terminal
! enable the FabricPath feature set
switch(config)# feature-set fabricpath
! use conversational MAC learning on the FabricPath VLANs
switch(config)# mac address learning-mode conversational vlan 1-10
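Those commands turn the feature on; to actually put traffic onto the fabric, the VLANs you want carried and the switch-to-switch links get flagged as FabricPath as well. A minimal sketch, with hypothetical VLAN and port numbers:

switch(config)# vlan 10-19
switch(config-vlan)# mode fabricpath
switch(config)# interface ethernet 1/1-2
switch(config-if-range)# switchport mode fabricpath

From there the fabric links run IS-IS at Layer 2 and multipath on their own, with no spanning-tree involved.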

Non-overlay alternatives to STP that don’t involve TRILL/SPB/QFabric/etc. include link aggregation (commonly known by Cisco’s trademarked term EtherChannel) and MC-LAG (Multi-chassis Link Aggregation), also known as VLAG, vPC, or VSS depending on the vendor. These provide multi-pathing in the sense that there are multiple active physical paths, but no single flow will ever have more than one possible path, giving both redundancy and full link utilization. But it’s all manually configured link by link, and not nearly as flexible (or easy) as TRILL to instantiate. MLAG/MC-LAG can provide simple multi-path scenarios, while TRILL is so flexible you can actually get yourself into trouble (as Ivan has mentioned here). So while MLAG/MC-LAG work as workarounds, why not just fix what they work around? It would be much simpler.
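For reference, here’s roughly what the MC-LAG flavor looks like in Cisco vPC terms — a minimal sketch with hypothetical addresses and port-channel numbers. Note that every downstream port-channel has to be defined by hand, on both peers:

feature vpc
vpc domain 10
  peer-keepalive destination 192.168.10.2 source 192.168.10.1
! the peer-link carries vPC control traffic and handles failover between the two switches
interface port-channel 1
  switchport mode trunk
  vpc peer-link
! each downstream switch or server bundle gets its own vPC number, configured on both peers
interface port-channel 20
  switchport mode trunk
  vpc 20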

Vendor Lock-In or FUD?

Brocade’s VCS and Cisco’s FabricPath are currently proprietary implementations of TRILL, and won’t work with each other or any other version of TRILL. The assumption is that when TRILL becomes more prevalent, they will have standards-based implementations that will interoperate (Cisco and Brocade have both said they will). But for now, it’s proprietary. Oh noes! Some vendors have decried this as vendor lock-in, but I disagree. For one, you’re not going to build a multi-vendor fabric, like staggering two different vendors every other rack. You might not have just one vendor amongst your networking gear, but your server switch blocks, core/aggregation, and other such groupings of switches are very likely to be single vendor. Every product has a “proprietary boundary” (new term! I made it!). Even Token Ring, for all practical purposes an IBM affair, could be bridged to traditional Ethernet networks. You can also connect your proprietary TRILL fabrics to traditional STP domains at the edge (although there are design concerns, as Ivan Pepelnjak has noted).

QFabric will never interoperate with another vendor; that’s Juniper’s secret sauce (running on Broadcom Trident+, if the rumors are to be believed). Still, QFabric is STP-less, so I’m a fan. And like TRILL, it’s easy. My only complaint about QFabric right now is that it requires a huge port count (500+ 10 Gbit ports) to make sense (so does the Nexus 7000 with TRILL, but you can also do 5500s now). Interestingly enough, Juniper’s Anjan Venkatramani did a hit piece on TRILL, but the joke is on them because it’s on TechTarget behind a registration wall, so no one will read it.

So far, the solutions for Ethernet forwarding are as follows: overlay networks (may be fantastic for large environments, though very complex), Layer 3 everywhere (doable, but with challenges in certain environments), and MLAG/MC-LAG (tough to scale and manually configured, but workable). All of that is fine. I’ve nothing against any of those technologies. In fact, I’m getting rather excited about VXLAN/Nicira overlays. I still think we should fix Layer 2 forwarding with TRILL, SPB, or something like them. And even if every vendor went full bore on one standard, it would be several years before we were able to totally rid our networks of spanning-tree.

But wouldn’t it be grand?

Further resources on TRILL (and where to feast on brains)

OpenFlow’s Awkward Teen Years

During the Networking Field Day 3 (The Nerdening) event, I attended a presentation by NEC (yeah, I know, turns out they make switches and have a shipping OpenFlow product, who knew?).

This was actually the second time I’ve seen this presentation. The first was at Networking Field Day 2 (The Electric Boogaloo), and my initial impressions can be found on my post here.

So what is OpenFlow? Well, it could be a lot of things. More generically it could be considered Software Defined Networking (SDN). (All OpenFlow is SDN, but not all SDN is OpenFlow?) It’s adding a layer of intelligence to a network that we have previously lacked. On a more technical level, OpenFlow is a way to program the forwarding tables of L2/L3 switches using a standard API.

Rather than each switch building its own forwarding tables through MAC learning, routing protocols, and policy-based routing, the switches (which could be from multiple vendors) are brainless zombies, accepting forwarding table updates from a maniacal controller that can perceive the entire network.

This could be used for large data centers or large global networks (WAN), but the NEC demonstration was a data center fabric built on OpenFlow. In some respects, it’s similar to Juniper’s QFabric and Brocade’s VCS. They all have a boundary, and the connections to networks outside of that boundary are done through traditional routing and switching mechanisms, such as OSPF or MLAG (Multi-chassis Link Aggregation). Juniper’s QFabric is based on Juniper’s own proprietary machinations, Brocade’s VCS is based on TRILL (although not [yet?] interoperable with other TRILL implementations), and NEC’s OpenFlow is based on, well, OpenFlow.

While it was the same presentation, the delegates (a few of us had seen the presentation before, most had not) were much tougher on NEC than we were last time. Or at least I’m sure it seemed that way. The fault wasn’t NEC’s; it was the fact that our understanding of OpenFlow and its capabilities has increased, and that’s the awkward teen years of any technology. We were poking and prodding, thinking of new uses and new implications, limitations and caveats.

As we start to understand a technology better, we start to see its potential, as well as probe its potential drawbacks. Everything works great in PowerPoint, after all. So while our questions seemed tougher, don’t take that as a sign that we’re souring on OpenFlow. We’re only tough because we care.

One thing that OpenFlow will likely not be is a way to manipulate individual flows, or at least not all individual flows. The “flow” in OpenFlow does not necessarily (and in all likelihood would rarely, if ever) represent an individual 6-tuple TCP connection or UDP flow (a source and destination MAC, IP, and TCP/UDP port). One of the common issues that OpenFlow haters/doubters/skeptics have brought up is that an OpenFlow controller can only program about 700 flows per second into a switch. That’s certainly not enough to handle a site that may see hundreds of thousands of new TCP connections/UDP flows per second.

But that’s not what OpenFlow is meant to do, nor is it a huge issue. Think about routing protocols and Ethernet forwarding mechanisms. Neither deals with specific flows, only general ones (all traffic to a particular MAC address goes to this port, this /8 network goes out this interface, etc.). OpenFlow isn’t any different. So a limit of 700 flows per second per switch? Not a big deal.

OpenFlow is another way to build an Ethernet fabric, which means offering services and configurations beyond just a Layer 2/Layer 3 network.

Think of how we deal with firewalls now. Scaling is an issue, so they’re often choke points. You have to direct traffic in and out of them (NAT, transparent mode, routed mode), and they’re often deployed in pairs. N+1 firewalls are not that common (and often a huge pain in the ass to configure, although it’s been a while). With OpenFlow (or SDN in general) it’s possible to define an endpoint (VM or physical) and say that endpoint requires its flows to pass through a firewall. Since we can steer flows (again, not on a per-individual-flow basis, but with general catch-most rules), scaling a firewall isn’t an issue. Need more capacity? Throw another firewall/IPS/IDS on the barbie. OpenFlow could put forwarding rules on the switches and steer flows between active/active/active firewalls. These services could also be tied to the VM, and not the individual switch port (which is a characteristic of a fabric).
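As a concrete (if simplified) illustration of that kind of steering, here’s what the rules might look like if the controller were programming an Open vSwitch instance — purely a sketch with hypothetical port numbers and addresses, not NEC’s controller or anything they showed us:

# traffic that has already been through the firewall (arriving on its switch port) continues on to the server
ovs-ofctl add-flow br0 "priority=400,ip,in_port=10,nw_dst=192.0.2.80,actions=output:2"
# everything else headed for the protected server gets steered to the firewall first
ovs-ofctl add-flow br0 "priority=300,ip,nw_dst=192.0.2.80,actions=output:10"
# anything that matches no steering rule is just switched normally
ovs-ofctl add-flow br0 "priority=0,actions=NORMAL"

Add a second firewall and the controller just installs another set of rules pointing some of the traffic at its port; nothing about the endpoints changes.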

Myself, I’m all for any technology that abolishes spanning-tree/traditional Layer 2 forwarding. It’s an assbackwards way of doing things, and if that’s how we ran Layer 3 networks, networking administrators would have revolted by now. (I’m surprised more network administrators aren’t outraged by spanning-tree, but that’s another blog post).

NEC’s OpenFlow (and OpenFlow in general) is an interesting direction in the effort to make networks more intelligent (aka “fabrics”). Some vendors are a bit dismissive of OpenFlow, and of fabrics in general. My take is that fabrics are a good thing, especially with virtualization. But more on that later. NEC’s OpenFlow product is interesting, but like any new technology it’s got a lot to prove (and the fact that it’s NEC means they have even more to prove).

Disclaimer: I was invited graciously by Stephen Foskett and Greg Ferro to be a delegate at Networking Field Day 3. My time was unpaid. The vendors who presented indirectly paid for my meals and lodging, but otherwise the vendors did not give me any compensation for my time, and there was no expectation (specific or implied) that I write about them, or write positively about them. I’ve even called a presenter’s product shit, because it was. Also, I wrote this blog post mostly in Aruba while drunk.

A High Fibre Diet: Twisted Pair Strikes Back

I saw a tweet recently from storage and virtualization expert Stu Miniman regarding Emulex announcing copper 10GBase-T Converged Network Adapters, running 10 Gigabit Ethernet over copper (specifically Cat 6a cable).

I recalled a comment Greg Ferro made on a Packet Pushers episode (and subsequent blog post) about copper not being reliable enough for storage, the specific issue being the bit error rate (BER): how many errors the standard (FC, Ethernet, etc.) will allow over a physical medium. As we’ve talked about before, networking people tend to be a little more devil-may-care about their bits, whereas storage folks get all anal-retentive chef about their bits.

For 1 Gigabit Ethernet over copper (802.3ab/1000Base-T), the standard calls for a goal BER of less than 10^-10, or one wrong bit in every 10,000,000,000 bits. Incidentally, that’s one error every second at line rate on 10 Gigabit Ethernet. For Gigabit, that’s one error every 10 seconds, or 6 per minute.

Fibre Channel has a BER goal of less than 10^-12, or one error in every 1,000,000,000,000 bits. That would be about one error every 100 seconds at 10 Gigabit Ethernet line rate. It’s also 100 times less error-prone than Ethernet’s target, which if you think about it, is a lot.

To give a little scale, that’s like comparing the badassery of Barney Fife from The Andy Griffith Show to Jason Statham’s character in.. well, any movie he’s ever been in.

Holy shit, is he fighting… truancy?

Barney Fife, the 10^-10 error rate of law enforcement. Wait… Wow, did I really just say that?

So given how fastidious storage folks can be about their storage networks, it’s understandable that storage administrators wouldn’t want their precious SCSI commands running over a network that’s 100 times less reliable than Fibre Channel.

However, while the Gigabit Ethernet standard has a BER target of less than 10^-10, the 802.3an standard for 10 Gigabit Ethernet over copper (10GBase-T) has a BER goal of less than 10^-12, which is in line with Fibre Channel’s goal. So is 10 Gigabit Ethernet over Cat 6A good enough for storage (specifically FCoE)? Sounds like it.

But the discussion also got me thinking: how close do we get to 10^-10 as an error rate in Gigabit Ethernet? I just checked all the physical interfaces in my data center (laundry room), and every error counter is zero (presumably most errors would show up as CRC errors). And all it takes to hit 10^10 bits is 1.25 gigabytes of data transfer, which I do every time I download a movie off of iTunes. I know I’ve put dozens of gigs through my desktop since it was last rebooted, and nary an error. And my cabling isn’t exactly data center standard; one cable I use came with a cheap wireless access point I got a while ago. It makes me curious what the actual BER is in reality with decent cables that don’t come close to 100 meters.

Of course, there are still the power consumption issues and other drawbacks Greg mentioned when compared to fiber (or coax). However, it’ll be good to have another option. There are some shops that won’t likely ever have fiber optics deployed.

Gigamon Side Story

The modern data center is a lot like modern air transportation. It’s not nearly as sexy as it used to be, the food isn’t nearly as good as it used to be, and there are more choke points than we used to deal with.

With 10 Gigabit Ethernet fabrics available from vendors like Cisco, Juniper, Brocade, et al., we can conceive of these great, non-blocking, lossless networks that let us zip VMs and data to and fro.

And then reality sets in. The security team needs inspection points. That means firewalls, IPS, and IDS devices. And one thing they’re not terribly good at? Gigs and gigs of traffic. Also scaling. And not pissing me off.

Pictured: Firewall Choke Points

This battle between scalability and security has data center administrators and security groups rumbling like some sort of West Side Data Center Story.

Dun dun da dun! Scalability!

Dun doo doo ta doo! Inspection!

So what to do? Enter Gigamon, the makers of the orangiest network devices you’ll find in a data center. They were part of Networking Field Day 2, which I participated in back in October.

Essentially what Gigamon allows you to do is scale out your SPAN/Mirror ports. On most Cisco switches, only two ports at a time can be spitting mirrored traffic. For something like a Nexus 7000 with up to 256 10 Gigabit Interfaces, that’s usually not sufficient for monitoring anything but a small smattering of your traffic.

A product like Gigamon can tap fiber and copper links, or take in the output of a SPAN port, classify the traffic, and send it out an appropriate port. This would allow a data center to effectively scale traffic monitoring in a way that’s not possible with mirrored ports alone. It would effectively remove the choke points we normally associate with security; you’d just need to scale out with the appropriate number of IDS/IPS devices.
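For context, this is the sort of plain local SPAN session you’d feed into a device like that — a minimal sketch in IOS-style syntax with hypothetical interfaces, and on most platforms you only get two of these sessions per switch:

! mirror everything the server-facing port sends and receives...
monitor session 1 source interface GigabitEthernet1/0/1 both
! ...out to the port where the tap aggregation box (or IDS) is plugged in
monitor session 1 destination interface GigabitEthernet1/0/48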

But with great power comes the ability to do some unsavory things. During the presentation Gigamon mentioned they’d just done a huge install in Russia (note: I wouldn’t bring that up in your next presentation), allowing the government to monitor the data of its citizens. That made me less than comfortable (and it’s also why it scares the shit out of Jeremy Gaddis). But “hey, that’s how Russia rolls,” you might say. We do it here in the US as well, through the concept of “lawful interception”. Yeah, I did feel a little dirty after that discussion.

Still, it could be used for good by removing the standard security choke points. Even if you didn’t need to run every packet in your data center through an IPS, I would still consider architecting a design with Gigamon (or another vendor like them) in mind. It wouldn’t be difficult to work out where to put the devices, and it could save loads of time in the long run. If a security edict came down from on high, the appropriate devices could be put in place with Gigamon providing the piping without choking your traffic.

In the meantime, I’m going to make sure everything I do is SSL’d.

Note: As a delegate/blogger, my travel and accommodations were covered by Gestalt IT, who vendors paid to have spots during the Networking Field Day. Vendors pay Gestalt IT to present, so while my travel (hotel, airfare, meals) was covered indirectly by the vendors, no other remuneration (save for the occasional tchotchke) was received from any of the vendors, directly or indirectly, or from Gestalt IT. Vendors were not promised, nor did they ask for, any of us to write about them, or write about them positively. In fact, we sometimes say their products are shit (when, to be honest, sometimes they are, although this one wasn’t). My time was unpaid.

A Tale of Two FCoEs

A favorite topic of discussion among the data center infrastructure crowd is the state of FCoE. Depending on who you ask, FCoE is dead, stillborn, or thriving.

So, which is it? Are we dealing with FUD or are we dealing with vendor hype?  Is FCoE a success, or is it a failure? The quick answer is.. yes? FCoE is both thriving, and yet-to-launch. So… are we dealing with Schrödinger’s protocol?

Not quite. To understand the answer, it’s important to make the distinction between two very different ways that FCoE is implemented: Edge FCoE and Multi-hop FCoE (a subject I’ve written about before, although I’ve renamed things a bit).

Edge FCoE

Edge FCoE is thriving, and has been for the past few years. Edge FCoE is when you take a server (or sometimes a storage array) and connect it to an FCoE switch. Everything beyond that first switch is either native Fibre Channel or native Ethernet.

Edge FCoE is distinct from multi-hop for one main reason: it’s a helluva lot easier to pull off. With edge FCoE, the only switch that needs to understand FCoE is that edge FCoE switch. It plugs into the traditional Fibre Channel network over traditional Fibre Channel links (typically in NPV mode).

Essentially, no other part of your network needs to do anything you haven’t done already. You do traditional Ethernet, and traditional Fibre Channel. FCoE only exists in that first switch, and is invisible to the rest of your LAN and SAN.
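To give a sense of how contained it is, here’s roughly what the FCoE-relevant bit of an edge switch config looks like on a Nexus 5000 running in NPV mode — a minimal sketch with hypothetical VLAN, VSAN, and port numbers:

feature npv
feature fcoe
! a dedicated FCoE VLAN is mapped to the VSAN carried up to the Fibre Channel core
vlan 100
  fcoe vsan 10
! the virtual Fibre Channel interface rides on top of the server-facing Ethernet port
interface vfc 1
  bind interface ethernet 1/1
  no shutdown
vsan database
  vsan 10 interface vfc 1

The rest of the switch, and everything upstream of it, is configured exactly like it always has been.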

Here are the things you (for the most part) don’t have to worry about configuring on your network with Edge FCoE:

  • Data Center Bridging (DCB) technologies
    • Priority Flow Control (PFC) which enables lossless Ethernet
    • Enhanced Transmission Selection (ETS) allowing the ability to dedicate bandwidth to various traffic (not required but recommended -Ivan Pepelnjak)
    • DCBx: A method to communicate DCB functionality between switches over LLDP (oh, hey, you do PFC? Me too!)
  • Whether or not your aggregation and core switches support FCoE (they probably don’t, or at least the line cards don’t)

There is PFC and DCBx on the server-to-edge FCoE link, but it’s typically inherent, supported by the CNA and the edge FCoE switch, and turned on by default or auto-negotiated. In some implementations there’s nothing to configure; PFC is there, and unalterable. Even where there are settings to tweak, it’s generally easier to do on edge ports than across an aggregation/core network.

Edge FCoE is the vast majority of how FCoE is implemented today. Everything from Cisco’s UCS to HP’s C7000 series can do it, and do it well.

Multi-Hop

The very term multi-hop FCoE is controversial in nature (just check the comments section of my FCoE terminology article), but for the sake of this article, multi-hop FCoE is any topological implementation of FCoE where FCoE frames move around a converged network beyond a single switch.

Multi-hop FCoE requires a few things: a Fibre Channel-aware network, losslessness through Priority Flow Control (PFC), DCBx (Data Center Bridging Exchange), and Enhanced Transmission Selection (ETS). Put those together and you’ve got a recipe for a switch that I’m pretty sure ain’t in your rack right now. For instance, the old man of the data center, the Cisco Catalyst 6500, doesn’t do FCoE now, and likely never will.
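To put some shape on what those DCB requirements mean in practice, here’s roughly what carving out a lossless, FCoE-sized traffic class looks like in NX-OS-style QoS — a minimal sketch; on platforms that support FCoE the class-fcoe class below is predefined and much of this is default behavior:

! PFC in effect: the FCoE class is marked no-drop (lossless) and sized for full FC frames
policy-map type network-qos fcoe-nq
  class type network-qos class-fcoe
    pause no-drop
    mtu 2158
system qos
  service-policy type network-qos fcoe-nq

ETS would be layered on with a queuing policy that guarantees class-fcoe a slice of bandwidth, and DCBx advertises all of it to the CNA over LLDP.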

Switch-wise, there are two ways to do multi-hop FCoE: a switch can either forward FCoE frames based on the Ethernet headers (MAC address source/destination), or forward frames based on the Fibre Channel headers (FCID source/destination).

Ethernet-forwarded/Pass-through Multi-hop

If you build a multi-hop network with switches that forward based on Ethernet headers (as Juniper and Brocade do), then you’ll want something other than spanning-tree to do loop prevention and enable multi-pathing. Brocade uses a method based on TRILL, and Juniper uses their proprietary QFabric (based on unicorn tears).

Ethernet-forwarded FCoE switches don’t have a full Fibre Channel stack, so they’re unaware of what goes on in the Fibre Channel world, such as zoning. The exception is FIP (FCoE Initialization Protocol), which handles discovery of attached Fibre Channel devices (connecting virtual N_Ports to virtual F_Ports).

FC-Forwarded/Dual-stack Multi-hop

If you build a multi-hop network with switches that forward based on Fibre Channel headers, your FCoE switch needs to have both a full DCB-enabled Ethernet stack, and a full Fibre Channel stack. This is the way Cisco does it on their Nexus 5000s, Nexus 7000s, and MDS 9000 (with FCoE line cards), although the Nexus 4000 blade switch is the Ethernet-forwarded kind of switch.

The benefit of an FC-forwarded switch is that you don’t need a network that does TRILL or anything fancier than spanning-tree (and spanning-tree isn’t enabled on any VLAN that passes FCoE). It’s pretty much a Fibre Channel network, with the ports being Ethernet instead of Fibre Channel. In fact, in Cisco’s FCoE reference design, storage and networking traffic are still port-gapped (a subject of a future blog post): FCoE frames and regular networking frames don’t run over the same links; there are dedicated FCoE links.

It’s like running a Fibre Channel SAN that just happens to sit on top of your Ethernet network. As Victor Moreno, the LISP project manager at Cisco, says: “The only way is to overlay”.

State of FCoE

It’s not accurate to say that FCoE is dead, or that FCoE is a success, or anything in between, really, because the answer is very different once you separate multi-hop from edge FCoE.

Currently, multi-hop has yet to launch in a significant way. In the past two months I have heard rumors of a customer here or there implementing it, but I’ve yet to hear any confirmed reports or firsthand tales. I haven’t even configured it personally. I’m not sure I’m quite as wary as Greg Ferro is, but I do agree with his wariness. It’s new, it’s not widely deployed, and that makes it riskier. There are interoperability issues, which in some ways are obviated by the fact that no one is doing Ethernet fabrics in a multi-vendor way, and NPV/NPIV can help keep things “native”. But historically, Fibre Channel vendors haven’t played well together. Stephen Foskett lists interoperability among his reasonable concerns with multi-hop FCoE. (Greg, Stephen, and everyone else I know are totally fine with edge FCoE.)

Edge FCoE, of course, is vibrant and thriving. I’ve configured it personally, and it fits easily and seamlessly into an existing FC/Ethernet network. I have no qualms about deploying it, and anyone doing convergence should at least consider it.

Crystal Ball

In terms of networking and storage, it’s impossible to tell what the future will hold. There are a number of different directions FCoE, iSCSI, NFS, DCB, Ethernet fabrics, et al. could go. FCoE could end up replacing Fibre Channel entirely, or it could be relegated to the edge and never move from there. Another possibility, as suggested to me by Stephen Foskett, is that Ethernet will become the connection standard for Fibre Channel devices. They would still be called Fibre Channel switches, and SANs would be set up just like they always have been, but instead of having 8/16/32 Gbit FC ports, they’d have 10/40/100 Gbit Ethernet ports. To paraphrase Bob Metcalfe, “I don’t know what will come after Fibre Channel, but it will be called Ethernet”.

I, For One, Welcome Our New OpenFlow Overlords

When I first signed up for Networking Field Day 2 (The Electric Boogaloo), I really had no idea what OpenFlow was. I’d read a few articles and listened to a few podcasts, but still only had a vague idea of what it was. People I respect highly, like Greg Ferro of Packet Pushers, were into it, so it had my attention. But still, not much of a clue what it was. I attended the OpenFlow Symposium, which preceded the activities of Networking Field Day 2, and came out with even less of an idea of what it was.

Then I saw NEC (really? NEC?) do a demonstration. And my mind was blown.

Side note: let this be a lesson to all vendors. Everything works great in a PowerPoint presentation. It also conveys very little about what a product actually does. Live demonstrations are what get grumpy network admins (and we’re all grumpy) giddy like schoolgirls at a Justin Bieber concert. You should have seen Ivan Pepelnjak.

I’m not sure if I got all my assumptions right about OpenFlow, so feel free to point out if I got something completely bone-headedly wrong. But from what I could gather, OpenFlow could potentially do a lot of things:

  • Replace traditional Layer 2 MAC learning and propagation mechanisms
  • Replace traditional Layer 3 protocols
  • Make policy-based routing (routing based on TCP/UDP port) something useful, instead of the one-off, pain-in-the-ass, OK-just-this-one-time creature it is now
  • Create “traceroute on steroids”

Switching (Layer 2)

Switching is, well, rather stupid. At least learning MAC addresses and their locations is. To forward frames, switches need to learn which ports the various MAC addresses live on. Right now the only way they learn this is by listening to the cacophony of hosts broadcasting and spewing frames. And when one switch learns a MAC address, it’s not like it tells the others. No, in switching, every switch is on its own for learning. In a single Layer 2 domain, every switch needs to learn where to find every MAC address on its own.

Probably the three biggest consequences of this method are as follows:

  • No loop avoidance. The only way to prevent loops is to prevent redundant paths (i.e. spanning-tree protocol)
  • Every switch in a Layer 2 domain needs to know every frickin’ MAC address. The larger the Layer 2 domain, the more MAC addresses need to be learned. Suddenly, a CAM table size of 8,000 MAC addresses doesn’t seem quite enough.
  • Flooding like whoa. What happens when a switch gets a frame that it doesn’t have a CAM entry for? FLOOD IT OUT ALL PORTS BUT THE RECEIVING PORT. It’s the all-caps typing of the network world.

For a while in the early 2000s we could get away with all this. Multi-layer switches (switches that did Layer 3 routing as well) got fast enough to route as fast as they could switch, so we could easily keep our Layer 2 domains small and just route everything.

That is, until VMware came and screwed it all up. Now we had to have Layer 2 domains much larger than we’d planned for. 4,000 entry CAM tables quickly became cramped.

MAC learning would be more centralized with OpenFlow. ARP would still be there at the edge, so a server would still think it was communicating with a regular switch network. But OpenFlow could determine which switches need to know what MAC addresses are where, so every switch doesn’t need to learn everything.

And no spanning-tree. Loops are prevented by the OpenFlow controller itself. No spanning-tree (although you can certainly run spanning-tree at the edge to communicate with legacy segments).

Routing (Layer 3)

Routing isn’t quite as stupid as switching. There are a number of good protocols out there that scale pretty well, but they do require configuration on each device. Routing is dynamic in that it can do multi-pathing (where traditional Layer 2 can’t), as well as recover from dead links without taking down the network for several (dozens of) seconds. But it doesn’t quite allow for centralized control, and it has limited dynamic ability. For instance, there’s no mechanism to say “oh, hey, for right now why don’t we just move all these packets over that path instead” in an efficient way. Sure, you can inject some host routes to do that, but it’s got to come from some sort of centralized controller.

Flow Routing (Layer 4)

So why stop at Layer 3? Why not route based on TCP/UDP header information? It can be done with policy-based routing (PBR) today, but it’s not something that can be communicated from router to router (OSPF cares not how you want to direct a TCP port 80 flow versus a TCP port 443 flow). There is also WCCP, the Web Cache Communication Protocol, which today is mostly used not for web caches but for WAN Optimization Controllers, like Cisco’s WAAS, or Cisco’s sworn enemy, Riverbed (seriously, just say the word ‘Riverbed’ at a Cisco office).
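For contrast, this is what that one-off, just-this-one-time PBR looks like in IOS terms — a minimal sketch with hypothetical addresses, and note it lives only on the box you type it into; nothing propagates it to the rest of the network:

! match web traffic...
access-list 101 permit tcp any any eq 80
! ...and shove it at a different next hop than the routing table would have picked
route-map STEER-WEB permit 10
 match ip address 101
 set ip next-hop 192.0.2.2
! applied per ingress interface, by hand, on this one router only
interface GigabitEthernet0/1
 ip policy route-map STEER-WEB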

Sure it’s watery and tastes like piss, but at least it’s not policy-based routing

A switch with modern silicon can look at Layer 3 and Layer 4 headers as easily as it can look at Layer 2 headers. It’s all just bits in the flow, man. OpenFlow takes advantage of this, and creates, for lack of a cooler term, a Layer 2/3/4 overlord.

I, for one, welcome our new OpenFlow overlords

TCAMs, or shared memory, or whatever you want to call the forwarding tables in your multi-layer switches, can be programmed at will by an OpenFlow overlord, instead of being populated by the lame-ass Layer 2, Layer 3, and sometimes Layer 4 mechanisms on a switch-by-switch basis.

Since we can direct traffic based on flows throughout a multi-switch network, there are lots of interesting things we can do with respect to load balancers, firewalls, IPS, caches, etc. Pretty powerful stuff.

Flow View (or Traceroute on Steroids)

I think one of the coolest demonstrations from NEC was when they showed the flow maps. They could punch up any source and destination address (IP or MAC) and get a graphical representation of the flow (and which devices it went through) on the screen. The benefits of that are obvious. Server admins complain about slowness? Trace the flow, and check the interfaces on all the transit devices. That’s something that might take quite a while in a regular route/switch network, but can be done in a few seconds with an OpenFlow controller.

An OpenFlow Controller Tracks a Flow

To some extent, there are other technologies that can take care of some of these issues. For instance, TRILL and SPB take a good whack at the Layer 2 bullshit. Juniper’s QFabric does a lot of the ain’t-nothin-but-a-tuple thang and switches based on Layer 2/3 information. But in terms of potential, I think OpenFlow has them all beat.

Don’t get too excited right now though, as NEC is the only vendor with a working OpenFlow controller implementation; other vendors are working on theirs. Stanford apparently has OpenFlow up and running in their environment, but it’s all still in the early stages.

Will OpenFlow become the future? Possibly, quite possibly. But even if what we now call OpenFlow isn’t victorious, something like it will be. There’s no denying that this approach, or something similar, is a much better way to handle traffic engineering in the future than our current approach. I’ve only scratched the surface of what can be done with this type of network design. There’s also a lot that can be gained in terms of virtualization (an OpenFlow vSwitch?) as well as applications telling the network what to do. Cool stuff.

Note: As a delegate/blogger, my travel and accommodations were covered by Gestalt IT, who vendors paid to have spots during the Networking Field Day. Vendors pay Gestalt IT to present, so while my travel (hotel, airfare, meals) was covered indirectly by the vendors, no other remuneration (save for the occasional tchotchke) was received from any of the vendors, directly or indirectly, or from Gestalt IT. Vendors were not promised, nor did they ask for, any of us to write about them, or write about them positively. In fact, we sometimes say their products are shit (when, to be honest, sometimes they are, although this one wasn’t). My time was unpaid.