Peak Fibre Channel

There have been several articles talking about the death of Fibre Channel. This isn’t one of them. However, it is an article about “peak Fibre Channel”. I think, as a technology, Fibre Channel is in the process of (if it hasn’t already) peaking.

There’s a lot of technology in IT that doesn’t simply die. Instead, it grows, peaks, then slowly (or perhaps very slowly) fades. Consider Unix/RISC. The Unix/RISC market right now is a caretaker platform. Very few new projects are built on Unix/RISC. Typically a new Unix server is purchased to replace an existing but no-longer-supported Unix server to run an older application that we can’t or won’t move onto a more modern platform. The Unix market has been shrinking for over a decade (2004 was probably the year of Peak Unix), yet the market is still a multi-billion dollar revenue market. It’s just a (slowly) shrinking one.

I think that is what is happening to Fibre Channel, and it may have already started. It will become (or already is) a caretaker platform. It will run the workloads of yesterday (or rather the workloads that were designed yesterday), while the workloads of today and tomorrow have a vastly different set of requirements, and where Fibre Channel doesn’t make as much sense.

Why Fibre Channel Doesn’t Make Sense in the Cloud World

There are a few trends in storage that are working against Fibre Channel:

  • Public cloud growth outpaces private cloud
  • Private cloud storage endpoints are more ephemeral and storage connectivity is more dynamic
  • Block storage is taking a back seat to object (and file) storage
  • RAIN versus RAID
  • IP storage is as performant as Fibre Channel, and more flexible

Cloudy With A Chance of Obsolescence

The transition to cloud-style operations isn’t a great for Fibre Channel. First, we have the public cloud providers: Amazon AWS, Microsoft Azure, Rackspace, Google, etc. They tend not to use much Fibre Channel (if any at all) and rely instead on IP-based storage or other solutions. And what Fibre Channel they might consume, it’s still far fewer ports purchased (HBAs, switches) as workloads migrate to public cloud versus private data centers.

The Ephemeral Data Center

In enterprise datacenters, most operations are what I would call traditional virtualization. And that is dominated by VMware’s vSphere. However, vSphere isn’t a private cloud. According to NIST, to be a private cloud you need to be self service, multi-tenant, programmable, dynamic, and show usage. That ain’t vSphere.

For VMware’s vSphere, I believe Fibre Channel is the hands down best storage platform. vSphere likes very static block storage, and Fibre Channel is great at providing that. Everything is configured by IT staff, a few things are automated though Fibre Channel configurations are still done mostly by hand.

Probably the biggest difference between traditional virtualization (i.e. VMware vSphere) and private cloud is the self-service aspect. This also makes it a very dynamic environment. Developers, DevOpsers, and overall consumers of IT resources configure spin-up and spin-down their own resources. This leads to a very, very dynamic environment.


Endpoints are far more ephemeral, as demonstrated here by Mr Mittens.

Where we used to deal with virtual machines as everlasting constructs (pets), we’re moving to a more ephemeral model (cattle). In Netflix’s infrastructure, the average lifespan of a virtual machine is 36 hours. And compared to virtual machines, containers (such as Docker containers) tend to live for even shorter periods of time. All of this means a very dynamic environment, and that requires self-service portals and automation.

And one thing we’re not used to in the Fibre Channel world is a dynamic environment.

scaredgifA SAN administrator at the thought of automated zoning and zonesets

Virtual machines will need to attach to block storage on the fly, or they’ll rely on other types of storage, such as container images, retrieved from an object store, and run on a local file system. For these reasons, Fibre Channel is not usually a consideration for Docker, OpenStack (though there is work on Fibre Channel integration), and very dynamic, ephemeral workloads.


Block storage isn’t growing, at least not at the pace that object storage is. Object storage is becoming the de-facto way to store the deluge of unstructured data being stored. Object storage consumption is growing at 25% per year according to IDC, while traditional RAID revenues seem to be contracting.

Making it RAIN


In order to handle the immense scale necessary, storage is moving from RAID to RAIN. RAID is of course Redundant Array of Inexpensive Disks, and RAIN is Redundant Array of Inexpensive Nodes. RAID-based storage typically relies on controllers and shelves. This is a scale-up style approach. RAIN is a scale-out approach.

For these huge scale storage requirements, such as Hadoop’s HDFS, Ceph, Swift, ScaleIO, and other RAIN handle the exponential increase in storage requirements better than traditional scale-up storage arrays. And primarily these technologies are using IP connectivity/Ethernet as the node-to-node and node-to-client communication, and not Fibre Channel. Fibre Channel is great for many-to-one communication (many initiators to a few storage arrays) but is not great at many-to-many meshing.

Ethernet and Fibre Channel

It’s been widely regarded in many circles that Fibre Channel is a higher performance protocol than say, iSCSI. That was probably true in the days of 1 Gigabit Ethernet, however these days there’s not much of a difference between IP storage and Fibre Channel in terms of latency and IOPS. Provided you don’t saturate the link (neither handles eliminates congestion issues when you oversaturate a link) they’re about the same, as shown in several tests such as this one from NetApp and VMware.

Fibre Channel is currently at 16 Gigabit per second maximum. Ethernet is 10, 40, and 100, though most server connections are currently at 10 Gigabit, with some storage arrays being 40 Gigabit. Iin 2016 Fibre Channel is coming out with 32 Gigabit Fibre Channel HBAs and switches, and Ethernet is coming out with 25 Gigabit Ethernet interfaces and switches. They both provide nearly identical throughput.

Wait, what?

But isn’t 32 Gigabit Fibre Channel faster than 25 Gigabit Ethernet? Yes, but barely.

  • 25 Gigabit Ethernet raw throughput: 3125 MB/s
  • 32 Gigabit Fibre Channel raw throughput: 3200 MB/s

Do what now?

32 Gigabit Fibre Channel isn’t really 32 Gigabit Fibre Channel. It actually runs at about 28 Gigabits per second. This is a holdover from the 8/10 encoding in 1/2/4/8 Gigabit FC, where every Gigabit of speed brought 100 MB/s of throughput (instead of 125 MB/s like in 1 Gigabit Ethernet). When FC switched to 64/66 encoding for 16 Gigabit FC, they kept the 100 MB/s per gigabit, and as such lowered the speed (16 Gigabit FC is really 14 Gigabit FC). This concept is outlined here in this screencast I did a while back. 16 Gigabit Fibre Channel is really 14 Gigabit Fibre Channel. 32 Gigabit Fibre Channel is 28 Gigabit Fibre Channel.

As a result, 32 Gigabit Fibre Channel is only about 2% faster than 25 Gigabit Ethernet. 128 Gigabit Fibre Channel (12800 MB/s) is only 2% faster than 100 Gigabit Ethernet (12500 MB/s).

Ethernet/IP Is More Flexible

In the world of bare metal server to storage array, and virtualization hosts to storage array, Fibre Channel had a lot of advantages over Ethernet/IP. These advantages included a fairly easy to learn distributed access control system, a purpose-built network designed exclusively to carry storage traffic, and a separately operated fabric.  But those advantages are turning into disadvantages in a more dynamic and scaled-out environment.

In terms of scaling, Fibre Channel has limits on how big a fabric can get. Typically it’s around 50 switches and a couple thousand endpoints. The theoretical maximums are higher (based on the 24-bit FC_ID address space) but both Brocade and Cisco have practical limits that are much lower. For the current (or past) generations of workloads, this wasn’t a big deal. Typically endpoints numbered in the dozens or possibly hundreds for the large scale deployments. With a large OpenStack deployment, it’s not unusual to have tens of thousands of virtual machines in a large OpenStack environment, and if those virtual machines need access to block storage, Fibre Channel probably isn’t the best choice. It’s going to be iSCSI or NFS. Plus, you can run it all on a good Ethernet fabric, so why spend money on extra Fibre Channel switches when you can run it all on IP? And IP/Ethernet fabrics scale far beyond Fibre Channel fabrics.

Another issue is that Fibre Channel doesn’t play well with others. There’s only two vendors that make Fibre Channel switches today, Cisco and Brocade (if you have a Fibre Channel switch that says another vendor made it, such as IBM, it’s actually a re-badged Brocade). There are ways around it in some cases (NPIV), though you still can’t mesh two vendor fabrics reliably.


Pictured: Fibre Channel Interoperability Mode

And personally, one of my biggest pet peeves regarding Fibre Channel is the lack of ability to create a LAG to a host. There’s no way to bond several links together to a host. It’s all individual links, which requires special configurations to make a storage array with many interfaces utilize them all (essentially you zone certain hosts).

None of these are issues with Ethernet. Ethernet vendors (for the most part) play well with others. You can build an Ethernet Layer 2 or Layer 3 fabric with multiple vendors, there are plenty of vendors that make a variety of Ethernet switches, and you can easily create a LAG/MCLAG to a host.


My name is MCLAG and my flows be distributed by a deterministic hash of a header value or combination of header values.

What About FCoE?

FCoE will share the fate of Fibre Channel. It has the same scaling, multi-node communication, multi-vendor interoperability, and dynamism problems as native Fibre Channel. Multi-hop FCoE never really caught on, as it didn’t end up being less expensive than Fibre Channel, and it tended to complicate operations, not simplify them. Single-hop/End-host FCoE, like the type used in Cisco’s popular UCS server system, will continue to be used in environments where blades need Fibre Channel connectivity. But again, I think that need has peaked, or will peak shortly.

Fibre Channel isn’t going anywhere anytime soon, just like Unix servers can still be found in many datacenters. But I think we’ve just about hit the peak. The workload requirements have shifted. It’s my belief that for the current/older generation of workloads (bare metal, traditional/pet virtualization), Fibre Channel is the best platform. But as we transition to the next generation of platforms and applications, the needs have changed and they don’t align very well with Fibre Channel’s strengths.

It’s an IP world now. We’re just forwarding packets in it.



Traditional versus Cloud Native Web Applications

Here’s a quick whiteboard session of the differences between traditional and cloud native web applications.

The Cloud Is Now A Thing

In the networking world, we’re starting to see the term “cloud” more and more. When I teach classes, if I so much as mention the word cloud, I start to see some eyes roll. That’s completely understandable, as the term cloud was such an overused buzzword, only having been recently supplanted only by “software defined”.

Here’s real-life supervillain (dude owns an MiG 29 and an island with a volcano on it… seriously) Larry Ellison freaking out about the term cloud.

“It’s not water vapor! All it is, is a computer attached to a network!”

But here’s the thing, it’s actually a thing now. Rather than a catch-all buzzword, it’s being used more and more to define a particular type of operational model. And it’s defined by NIST, the National Institute of Standards and Technology, part of the US Department of Commerce. With the term cloud, we now get a higher degree of specificity.

The NIST definition of cloud is as follows:

  • On-demand self service
  • Broad network access
  • Resource pooling (multi-tenant)
  • Rapid Elasticity
  • Measured service

That first item on the list, the on-demand self service, is a huge change in how we will be doing networking. Right now network configurations are mostly done by network administrators. If you have a network need and aren’t a network admin, you open up a ticket and wait.

In (private) cloud computing, which will include a large networking component, the network elements, end points, and devices will be configured by end-users/developers, not the IT staff. The IT staff will maintain the overall cloud infrastructure, but will not do the day-to-day changes. The changes will happen far too frequently, and they will happen in the middle of the day. Change control will probably be handled for the underlying infrastructure, but the tenants will likely make many changes during the day. The fault domains will be a lot smaller, making mistakes impactful to a small segment for these changes, and the automation will make chance that a change (such as adding a new load balancing VIP) will be done correctly much higher.

This is how things have been done in public clouds (Amazon, Rackspace, etc.) for a while now.

When people talk about the death of the CLI, this is what they’re referring to. The configuration changes we make won’t be on a Cisco or Juniper CLI, but through some sort of portal (which can be either GUI, CLI, or API calls) and will be largely automated. We’ve hit the twilight of the age of Conf T.

With OpenStack, Docker, CoreOS, containers, DevOps, ACI, NSX, and all of the new operational models, technologies, and platforms, the next generation data center will be a self-service data center.

Ethernet over Fibre Channel

Since the 80’s, Ethernet has dominated the networking world. The LAN, the WAN, and the MAN are all now dominated by Ethernet links. FIDDI, HIPPI, ATM, Frame Relay, they’ve all gone by the wayside. But there is one protocol that has stuck around to run alongside Ethernet, and that’s Fibre Channel. While Fibre Channel has mostly sat in the shadow of Ethernet, relegated to only storage traffic, it’s now poised to overtake Ethernet in the battle for the LAN. And the way that Fibre Channel is taking on Ethernet is with Ethernet over Fibre Channel.


Suck it, Metcalfe

While Ethernet has enjoyed tremendous popularity, it has several (debilitating) limitations. For one, forwarding is haunted the possibility of a loop, and Spanning Tree Protocol is required to keep a watchful eye. Unfortunately, STP is almost as bad as a loop, with the ample opportunity for misconfigurations (rouge root bridges) and other shenanigans.  TRILL, a Layer 2 overlay for Ethernet that allows multi-pathing, hasn’t found its way into a commercial product yet, and its derivatives (FabricPath from Cisco and VCS from Brocade) haven’t seen much in the way of adoption.

Rathern than pile fix upon fix on Ethernet, SAN administrators (known for being the loose canons of the data center) are making a bold push to take over LAN networks as well… and they’re winning.

The T17 committe had been established by the INCITS, which is the standards body that is responsible for Fibre Channel, FCoE, and now EoFC. The T17 is responsible for all the specifications around EoFC, and in particular the interface between the two.

We really have a lot of advantages over Ethernet in terms of topology and forwarding. For one, we’re a lossless network, providing a lot more reliability than a traditional Ethernet network. We also have multi-pathing built in with FSPF routing, while still providing Layer 2 adjacencies that are still required by the old crusty crapplications that are still on people’s networks, somehow.” -John Etherman, T17 committee chair.

They’ve made a lot of progress in a relatively short time, from ironing out the specifications to getting ASICs spun, and their work is bearing fruit. Products are starting to ship, and several marquee clients have announced fabrics built entirely with EoFC.

A Day in the life of a EoFC Frame

To keep compatibility with older Ethernet/TCP/IP stacks, CNHs (Converged Network HBAs) provide Ethernet interfaces to the host operating system. The frame is formed by the host, and the CNH encapsulates the Ethernet frame into a Fibre Channel frame. Since standard Ethernet MTU is only 1500 bytes, they fit quite nicely into the maximum 2048 byte Fibre Channel frame. The T13 working group also provides specifications for Jumbo Ethernet frames up to 9216 bytes, by either fragmenting the frame into multiple 2048-byte Fibre Channel frames,

WWPNs are derived from the MAC addresses that the hosts sees. Since MAC addresses aren’t a full 64-bits, the T17 working group has allocated the 80:08 prefix to EoFC. So if your MAC address was 00:25:B6:01:23:45, the WWPN would be 80:08:00:25:B6:01:23:45. This keeps the EoFC WWPNs out of the range of the initiators (starting with 1 or 2) and targets (starting with 5).


FC_IDs are assigned to the WWPNs on a transitory basis, and are what the Fibre Channel headers have in terms of source/destination addresses. When the Fibre Channel frame reaches its destination NX_Port (Node LAN port), the Ethernet frame is de-encapsulated from the Fibre Channel frame, and the hosts networking stack takes care of the rest. From a host’s perspective, it has no idea the transport is Fibre Channel.


The biggest benefit to EoFC is the lossless network that Fibre Channel provides. Since the majority of traffic is East/West in modern data center workloads, busy hosts can suffer from an incast problem, where the buffers can be overloaded as a single 10 Gigabit link receives packets from multiple sources, all operating at 10 Gigabit. Fibre Channel transport provides port to port flow control, and can ensure that nothing gets dropped.


Configuration of EoFC is fairly straightforward. I’ve got access to a new Nexus 8008, with a 32 Gbit EoFC line card that I’ve connected to a Cisco C-series server with a CNH.

nexus1# feature eofc
EoFC feature checked out
Loading Ethernet module...
Loading Spanning Tree module...
Loading LLDP...
Grace period license remaining: 110 days

nexus1# vlan 10
nexus1(vlandb)# vsan 10
nexus1(vsandb)# 10 name Storage-A
nexus1(vsandb)# vsan 1010
nexus1(vsandb)# vsan 1010 name Ethernet transport
nexus1(vsandb)# eofc vlan 10
nexus1(vsandb)# interface veth1
nexus1(vif)# switchport
nexus1(vif)# switchport mode access
nexus1(vif)# switchport access vlan 10
nexus1(vif)# bind interface fc1/1
nexus1(vif)# no shut
nexus1(vif)# int fc1/1
nexus1(if)# switchport mode F
nexus1(if)# switchport allowed vsan 10,1010
nexus1(if)# no shut 

Doing a show interface shows me that my connection is live.

 nexus1# show interface ethernet veth1 
 vEthernet1 is up
 Hardware: 1000/10000 Ethernet, address: 000d.ece7.df48 (bia 000d.ece7.df48)
 Attached to: fc1/1 (pWWN: 80:08:00:0D:EC:E7:DF:48)
 MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
 reliability 255/255, txload 1/255, rxload 1/255
 Encapsulation EoFC/ARPA
 Port mode is EoFC
 full-duplex, 32 Gb/s, media type is 1/2/4/8/16/32g
 Beacon is turned off
 Input flow-control is off, output flow-control is off
 Rate mode is dedicated
 Switchport monitor is off
 Last link flapped 09:03:57
 Last clearing of "show interface" counters never
 30 seconds input rate 2376 bits/sec, 0 packets/sec
 30 seconds output rate 1584 bits/sec, 0 packets/sec
 Load-Interval #2: 5 minute (300 seconds)
 input rate 1.58 Kbps, 0 pps; output rate 792 bps, 0 pps
 0 unicast packets 10440 multicast packets 0 broadcast packets
 10440 input packets 11108120 bytes
 0 jumbo packets 0 storm suppression packets
 0 runts 0 giants 0 CRC 0 no buffer
 0 input error 0 short frame 0 overrun 0 underrun 0 ignored
 0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
 0 input with dribble 0 input discard
 0 Rx pause
 0 unicast packets 20241 multicast packets 105 broadcast packets
 20346 output packets 7633280 bytes
 0 jumbo packets
 0 output errors 0 collision 0 deferred 0 late collision
 0 lost carrier 0 no carrier 0 babble
 0 Tx pause
 1 interface resets

Speeds and Feeds

EoFC is backwards compatible with 1/2/4/8 and 16 Gigabit Fibre Channel, but it’s really expected to take off with the newest 32/128 Gbit interfaces that are being released from vendors like Cisco, Juniper, and Brocade. Brocade, QLogix, Intel, and Emulex are all expected to provide CNHs operating at 32 Gbit speeds, with 32 and 128 Gbit interfaces on line cards and fixed switches to operate as ISLs.

Nexus 8009

Nexus 8008: 384 ports of 32 Gbit EoFC

Switches are already shipping from Cisco and Brocade, with Juniper to release their newest QFC line before the end of Q2.

Differences in how Fibre Channel and Ethernet Measure Speed

Fibre Channel and FCoE: Some Basics

There’s been some misconceptions and misinformation lately about FCoE. Like any technology, there are times when it makes sense and times when it doesn’t, but much of the anti-FCoE talk lately has been primarily ignorance and/or wilful misrepresentation.

In an effort to fight that ignorance, I put together a quick introduction to how FC and FCoE works. They both operate on the basic premise that you can’t drop any frames. Fibre Channel was built as a lossless protocol, and with a bit of work, Ethernet can also be lossless.

Check it out:

Learn what Russ Fellows Doesn’t Know

So how’s this for a condescending tweet?

It’s from Russ Fellows, author of the infamous FCoE “study” (which has been widely debunked for its many hilarious errors):

Interesting article (check it out). But the sad/amusing irony is that he’s wrong. How is he wrong? Here’s what Russ Fellows doesn’t know about storage:

1, 2, 4, and 8 Gbit Fibre Channel (as he points out) uses 8/10 bit encoding. That means about a 20% of the bandwidth available was lost due to encoding overhead (as Russ pointed out). That’s why 8 Gbit Fibre Channel only provides 800 MB/s of connectivity, even though 8,000 Megabits per second equates to 1,000 Megabytes per second (8000 Megabits / (8 bits per byte) = 1,000 Megabytes).

With this overhead in mind, Fibre Channel was designed to give 100 MB/s for every Gigabit of speed. It never increased the baud rate to make up for the overhead.

Ethernet, on the other hand, did increase the baud rate to make up for the overhead. Gigabit Ethernet uses the same 8/10 bit encoding, but they kicked the baud rate up to 1.25 gigabaud to make up the differences. As such, Gigabit Ethernet provides true 1 gigabit of throughput, or 125 Megabytes per second.

10 Gigabit Ethernet moved to 64/66 encoding, and kept to the approach of not letting the overhead impact throughput. 10 Gigabit Ethernet then provides 1250 Megabytes per second of throughput. The baud rate is 10.3125, giving true 10 Gigabit per second of data.

When Fibre Channel moved to the more efficient 64/66 bit encoding, rather than change the 100 MB/s per gigabit to 125 MB/s (which you get with all Ethernet speeds), they left the ratio (1 Gigabit to 100 MB/s) the same. Thus, every Gigabit = 100 MB/s, just like in previous speeds (1/2/4/8 FC). So while 16 Gbit Fibre Channel provides 1600 MB/s of throughput, the baud rate is actually only 14 gigabaud, and not true 16 Gbit. And don’t take my word for it, check out page 7 of Scott Shimomura‘s (of Brocade) presentation at the SPDE conference.

  • 1 Gbit Fibre Channel = 100 MB/s
  • 1 Gbit Ethernet = 125 MB/s
  • 2 Gbit Fibre Channel = 200 MB/s
  • 4 Gbit Fibre Channel = 400 MB/s
  • 8 Gbit Fibre Channel = 800 MB/s
  • 10 Gbit Ethernet/FCoE = 1250 MB/s
  • 16 Gbit Fibre Channel = 1600 MB/s

10 Gigabit Ethernet provides 1250 MB/s, providing true 10 Gigabit Ethernet, and not putting the slight overhead into the equation. So while 10 Gigabit Ethernet is true 10 Gigabit, 16 Gigabit Fibre Channel is actually 14 Gigabit Fibre Channel (14.025, to be exact).

And that’s what Russ Fellows doesn’t know. His entire article is based on a false premise: Thinking that the move to 64/66 makes 16 Gbit pass more than twice as much traffic as 8 Gbit. But it’s not. He says that with 8 Gbit FC, 1+1 = 1.6 (when compared to 16 Gbit FC), which is factually incorrect for the reasons I’ve just explained. Yes, 64/66 bit encoding is more efficient. But they dropped the baud rate, negating the efficiency gains

8 Gigabit Fibre Channel provides 800 Megabytes per second of data transfer. 16 Gigabit Fibre Channel (really 14 Gigabit Fibre Channel) provides 1600 Megabytes per second of data transfer. 800 + 800 = 1600.

Sorry Russ, 1+1 really does equal 2. Even in Fibre Channel.



Get every new post delivered to your Inbox.

Join 97 other followers