LACP is not Link Aggregation

So there’s a mistake I’ve been making for years: I’ve referred to what is actually link aggregation as “LACP”, as in “I’m setting up an LACP between two switches”. While you can certainly set up LACP between two switches, the more correct term for the technology is link aggregation (as defined by the IEEE), and an instance of it is generically called a LAG (Link Aggregation Group). LACP is an optional part of this technology.
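
To make the distinction concrete, here’s a minimal IOS-style sketch (the interfaces and group numbers are just examples) of a LAG built two ways: statically with no LACP at all, and with LACP negotiating the bundle.

! Static LAG: link aggregation with no LACP involved
interface range GigabitEthernet1/0/1 - 2
 channel-group 10 mode on
!
! LACP-negotiated LAG: the same bundling, but LACP handles the negotiation
interface range GigabitEthernet1/0/3 - 4
 channel-group 20 mode active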

Here I am explaining this and more in an 18-minute YouTube video.

EtherChannel and Port Channel

In the networking world, you’ve no doubt heard the terms EtherChannel, port channel, LAG, MLAG, etc. These of course refer to taking multiple Ethernet connections and treating them as a single link. But one of the more confusing aspects I’ve run into is the difference, if any, between the terms EtherChannel and port channel. Well, I’m here to break it down for you.

OK, not that kind of break-it-down

First, let’s talk about what is vendor-neutral and what is a Cisco trademark. EtherChannel is a Cisco-trademarked term (I’m not sure if port channel is), while the vendor-neutral term is LAG (Link Aggregation Group). Colloquially, however, I’ve seen both Cisco terms used with non-Cisco gear. For instance: “Let’s set up an EtherChannel between the Arista switch and the Juniper switch.” It’s kind of like using the term “hoovering” in the UK when the vacuum cleaner says Dyson on the side.

So what’s the difference between EtherChannel and port channel? That’s a good question. I used to think that EtherChannel was the name of the technology, and port channel was a single instance of that technology. But in researching the terms, it’s a bit more complicated than that.

Both EtherChannel and port channel appear in early Cisco documentation, such as this CatOS configuration guide. (Remember configuring switches with the “set” command?) In that document, it seems that port channel was used as the name of an individual instance of EtherChannel, just as I had assumed.

I love it when I’m right

And that seems to hold true in this fairly recent document on Catalyst IOS 15, where EtherChannel is the technology and port channel is the individual instance.

But wait… in this older CatOS configuration guide, it explicitly states:

This document uses the term “EtherChannel” to refer to GEC (Gigabit EtherChannel), FEC (Fast EtherChannel), port channel, channel, and port group.

So it’s a bit murkier than I thought. And that’s just the IOS world. In the Nexus world, EtherChannel as a term seems to be falling out of favor.

Take a look at this Nexus 5000 CLI configuration guide for NX-OS 4.0, and you’ll see they use the term EtherChannel. By NX-OS 5.2, the term seems to have changed to just port channel. In the great book NX-OS and Cisco Nexus Switching, port-channel is used as the term almost exclusively; EtherChannel is mentioned once that I can see.

So in the IOS world, it seems that EtherChannel is the technology, and port channel is the interface. In the Nexus world, port channel is used as the term for the technology and the individual interface, though sometimes EtherChannel is referenced.

It’s likely that port channel is preferred in the Nexus world because NX-OS is an offspring of SAN-OS, which Cisco initially developed for the MDS line of Fibre Channel switches. Bundling Fibre Channel ports on Cisco switches isn’t called EtherChannel, since those interfaces aren’t, well, Ethernet. The Fibre Channel bundling technology is instead called a SAN port channel. The command on a Nexus switch to look at a port channel is “show port-channel”, while on IOS switches it’s “show etherchannel”.
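
You can see the naming split right in the CLI. A rough sketch (exact syntax and output vary by platform and release):

! IOS: EtherChannel is the technology, port channel is the interface
Switch# show etherchannel summary
Switch(config)# interface port-channel 10

! NX-OS: port channel is both the technology and the interface
switch# show port-channel summary
switch(config)# interface port-channel 10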

When a dual-homed technology was developed on the Nexus platform, it was called vPC (Virtual Port Channel) instead of VEC (Virtual EtherChannel).

Style Guide

Another interesting aspect to this discussion is that EtherChannel is capitalized as a proper noun, while port channel is not. In the IOS world, it’s EtherChannel, though when it’s even mentioned in the Nexus world, it’s sometimes Etherchannel, without the capital “C”. Port channel is often written as either port channel or port-channel (the latter is used almost exclusively in the NX-OS book).

So where does that leave the discussion? Well, I think in very general terms, if you’re talking about Cisco technology, Etherchannel, EtherChannel, port channel, port-channel, and LAG are all acceptable terms for the same concept. When discussing IOS, it’s probably more correct to use the term EtherChannel; when discussing NX-OS, port channel. But again, either way would work.

Jumbo Fibre Channel Frames

In the world of Ethernet, jumbo frames (technically any Ethernet frame larger than 1,500 bytes) are often recommended for certain workloads, such as iSCSI, vMotion, and backups: basically anything that doesn’t communicate with the Internet, because of MTU issues. And in fact, MTU issues are one of the biggest hurdles with Ethernet jumbo frames, since every device on a given LAN must have the correct MTU size set if you want them to successfully communicate with each other.
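
A quick sanity check on the Ethernet side (a hedged example: the address is made up, the keywords vary by platform, and whether “size” includes the IP header depends on the implementation) is a don’t-fragment ping at the jumbo size; if any device in the path is still at 1,500 bytes, it fails.

Switch# ping 192.168.10.20 size 9000 df-bit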

What’s less known is that you can also enable jumbo frames for Fibre Channel as well, and that doing so can have dramatic benefits with certain workloads. Best of all, since SANs are typically smaller in scope, ensuring every device has jumbo frames enabled is much easier.

Fibre Channel Frames

Fibre Channel frames typically have a maximum payload size of 2,112 bytes, which with headers makes the MTU 2,148 bytes. However, you can increase the payload up to 9,000 bytes, which with headers is 9,036 bytes. You can play with the payload size if you like, but just to make it easier I either do the regular MTU (2,148 bytes) or the max for most switches (9,036 bytes). My perfunctory tests don’t seem to indicate much benefit with anything in between.
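
If the math looks odd, the difference in both cases is the same 36 bytes of framing overhead: a 4-byte SOF delimiter, a 24-byte frame header, a 4-byte CRC, and a 4-byte EOF delimiter.

2,112 + 36 = 2,148 bytes (standard)
9,000 + 36 = 9,036 bytes (jumbo)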

Supported

Of course, not all Fibre Channel devices can handle jumbo frames. It’s been part of the T10 standard since 2005, though, so most modern devices support it (anything capable of 4 Gbit or more, for the most part). This includes switches from Cisco and Brocade, as well as HBAs from QLogic and Emulex.

Configuration

The way you configure jumbo frames varies, of course, and since I don’t have lots of different types of equipment, I’m just going to demonstrate on an MDS that I have access to.

Note: Do not do anything in a production environment until you’ve tested it in a dev environment. If you do otherwise, you’re an idiot.

If you go into an interface configuration, you’ll probably notice there’s no MTU option. That’s because we’re dealing with a fabric here, and the MTU is set per VSAN, not per port. Multi-VSAN ISLs (TE_Ports) will do multiple MTUs without any problem. Keep in mind, changing MTUs on an MDS/Nexus requires a reboot (I think mostly because the N_Ports need to do a new FLOGI). I’m not sure about Brocade, however.

Switch#1# config t 
Enter configuration commands, one per line.  End with CNTL/Z. 
Switch#1(config)# mtu size vsan 10 9036
VSAN 10 MTU changed. Reboot required.
Switch#1(config)# mtu size vsan 20 9036
VSAN 20 MTU changed. Reboot required.
Switch#1(config)# exit 
Switch#1# reload

Right now the maximum MTU size per the standard is 9,036 bytes, though it depends on the vendor. Brocade, with its Gen 6 FC (32 Gbit), is proposing a maximum MTU size of around 15,000 bytes, though it’s not set in stone yet. It’ll be interesting to see how this all plays out, that’s for sure.

Testing

How well does it work? I haven’t tested it for VDI or other high-IOPS environments, but my suspicion is that it’s not going to help there. Given the longer serialization delay, it would probably hurt performance. For other workloads, such as backups, jumbo frames can make a dramatic difference. As a test, I did a couple of vSphere Storage vMotions, and I was surprised at how fast they went.

Hosts were B200 M2 blades with 96 GB of RAM running ESXi 5.1. Fibre Channel cards were Cisco M81KR, configured for 9,036-byte frames (9,100 bytes on the FCoE interface, since VICs are CNAs, so I had to add several dozen bytes for FCoE headers/VNTAG, etc.). The FC switch was actually a Cisco 6248 Fabric Interconnect set for Fibre Channel switching mode. All tests were done on the A fabric. I initiated Storage vMotions on a 2 TB VMDK file. The VMDK wasn’t receiving an active workload, so it was mostly idle. I ran the test 20 times for each size and averaged the results.

[Chart: average Storage vMotion times with standard (2,148-byte) versus jumbo (9,036-byte) FC frames]

It’s just a quick and dirty test on a barely-active VMDK, but you can see that with 9,000-byte FC frames, it’s almost twice as fast.

Is The OS Relevant Anymore?

I started out my career as a condescending Unix administrator, and while I’m not a Unix administrator anymore, I’m still quite condescending. In the past, I’ve run data centers based on Linux, FreeBSD, and Solaris, and I’ve administered Windows boxes, OpenBSD and NetBSD, and even NeXTSTEP (best desktop in the 90s).


In my role as a network administrator (and network instructor), this experience has become invaluable. Why? One reason is that most networking devices these days have an open source operating system as their underlying OS.

And recently, I got into a discussion on Twitter (OK, kind of a Twitter fight, but it’s all good with the other party) about the underlying operating systems for these network devices and their relevance. My position? The underlying OS is mostly irrelevant.

First of all, the term OS can mean a great many things. In the context of this post, when I talk about OS I’m referring only to the underlying OS: the kernel, libraries, command line, drivers, networking stack, and file system. I’m not referring to the GUI stack (GNOME, KDE, or Unity for the Unixes, Mac OS X’s GUI stack, Win32 for Windows) or other types of stack, such as a web application stack like LAMP (Linux, Apache, MySQL, and PHP).

Most routers and MLSes (multi-layer switches, switches that can route as fast as they can switch) run an open source operating system as their control plane. The biggest exception is of course Cisco’s IOS, which is proprietary as hell. But IOS has reached its limits, and Cisco’s NX-OS, which runs on Cisco’s next-gen Nexus switches, is based on Linux. Arista famously runs Linux (Fedora Core) and doesn’t hide it from the users (which allows it to do some really cool things). Juniper’s Junos is based on FreeBSD.
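
As an illustration of how little Arista hides it, you can drop straight from the EOS CLI into the underlying Linux shell (a rough sketch; prompts and output vary by version):

switch# bash
[admin@switch ~]$ uname -s
Linux
[admin@switch ~]$ exit
switch#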

In almost every router and multi-layer switch, however, the operating system doesn’t forward any packets. That is all handled in specialized silicon. The operating system is only responsible for the control plane, running processes like OSPF, spanning tree, BGP, and other services to decide on a set of rules for forwarding incoming packets and frames. These rules, sometimes called a FIB (Forwarding Information Base), are programmed into hardware forwarding engines (such as the much-used Broadcom Trident chipset). These forwarding engines do the actual switching/routing. Packets don’t hit the general x86 CPU; they’re all handled in the hardware. The control plane (running as various coordinated processes on top of one of these open source operating systems) tells the hardware how to handle packets.
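
You can see this split on a Nexus switch, for example, by comparing the software routing table with what’s actually been programmed into the forwarding hardware (a rough sketch: the prefix is made up, and exact syntax and output vary by platform and release):

! The RIB: what the control plane processes decided
switch# show ip route 10.1.1.0/24
! The FIB: what’s been programmed into the forwarding hardware
switch# show forwarding ipv4 route 10.1.1.0/24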

So the only thing the operating system does (other than handling the occasional punted packet) is tell the hardware how to handle traffic the general CPU will never see. This is the way it has to be, because x86 hardware can’t scale nearly as well as special-purpose silicon can, especially when you consider power and cooling. Latency is way lower in silicon as well.

In fact, hardware-wise, most vendors (Juniper, Arista, Huawei, Alcatel-Lucent, etc.) have been using the exact same chip in their latest switches. So the differentiation isn’t the silicon. Is the differentiation the underlying operating system? No, it makes little difference for the end user. The operating systems are instead a (mostly) invisible platform on which the services (CLI, APIs, routing protocols, SDN hooks, etc.) are built. Networking vendors are in the middle of a transition into software developers (and motherboard gluers).

All you need to create a 10 Gigabit Switch

The biggest holdout against open source in networking devices is, of course, Cisco’s IOS, which is proprietary as hell. Still, the future for Cisco appears to be NX-OS running on all of the Nexus switches, and that’s based on Linux.

Let’s also take a look at networking devices where the underlying OS may actually touch the data plane, a genre I’m very well acquainted with: load balancers (and no, I’m not calling them Application Delivery Controllers).

F5’s venerable BIG-IPs were initially based on BSDI (a years-dead BSD), and then switched to Linux. CoyotePoint was based on FreeBSD, and is now based on NetBSD. Cisco’s ACE is based on Linux (although Cisco’s shitty CSS runs proprietary VxWorks, but it’s not shitty because of VxWorks). Most of the other vendors are based on Linux. However, the baseline operating system makes very little difference these days.

Most load balancers have SSL offload (to push the CPU-intensive asymmetric encryption onto a specialized processor). This is especially important as we move to 2048-bit SSL certificates. Some load balancers have Layer 2/3/4 silicon (either ASICs or FPGAs, which are flexible ASICs) to help out with forwarding traffic, and use general CPUs (usually x86) for the Layer 7 parsing. So does the operating system touch the traffic going through a load balancer? Usually, but not always; well, it depends.

So with Cisco on Linux and Juniper on FreeBSD, would either company benefit from switching to a different OS? Does either company enjoy a competitive advantage by having chosen its respective platform? No. In fact, switching platforms would likely be a colossal waste of time and resources. The underlying operating systems just provide some common services to run the networking services that program the line cards and silicon.

When I brought up Arista and their Fedora Core-based control plane, which they open up to customers, here’s how someone (a BSD fan) described Fedora: “Inconsistent and convoluted”, “building/testing/development as painful”, and “hasn’t a stable file system after 10 years”.

Reading that statement, you’d think that dealing with Fedora is a nightmare. That’s not remotely true. Some of that statement is exaggeration (and you could find specific examples to support it for any operating system) and some of it is fantasy. No stable file system? Linux has had several solid file systems for a while now, including ext2, ext3, ext4, XFS, and more.

In a general sense, I think the operating system is less relevant than it used to be. Take OpenBSD, for example. Its well-deserved reputation for security is legendary. Still, would there be any advantage today to running your web application stack on OpenBSD? Would your site be any more secure? Probably not. Not because OpenBSD is any less secure today than it was a while ago, quite the opposite. It’s because the attack vectors have changed. The attacks are hitting the web stack and other pieces rather than the underlying operating system. Local exploits aren’t that big of a deal because few systems let anyone but a few users log in anyway. The biggest attacks lately have come from either SQL injection or attacks on desktop operating systems (mostly Windows, but recently Apple as well).

If you’re going to expose a server directly to the Internet on a DMZ or (gasp) without any firewall at all, OpenBSD is an attractive choice. But that doesn’t happen much anymore. Servers are typically protected by layers of firewalls, IPS/IDS, and load balancers.

Would Android be more successful or less successful if Google switched from Linux as the underpinnings to one of the BSDs? Would it be more secure if they switched to OpenBSD? No, and it would be an entirely wasted effort. It’s not likely any of the security benefits of OpenBSD would translate into the Dalvik stack that is the heart of Android.

As much as fanboys/girls don’t want to admit it, it’s likely the number one reason people choose an OS is familiarity. I tend to go with Linux (although I have FreeBSD- and OpenBSD-based VMs running in my infrastructure) because I’m more familiar with it. For my day-to-day uses, Linux or FreeBSD would both work. Neither has a competitive advantage over the other in that regard. Linux outright wins in some cases, such as virtualization (the BSDs have been very behind in that technology, though they run fine as guests), but for most stuff it doesn’t matter. I use FreeNAS, which is FreeBSD-based, but I don’t care what it runs. I’d use FreeNAS if it were based on Linux, OpenBSD, or whatever. (Because it’s based on FreeBSD, FreeNAS does run ZFS, which for some uses is better than any of the Linux file systems, although I don’t run FreeNAS’s ZFS since it’s missing encryption.)

So fanboy/girlism aside, for the most part today, the choice of an operating system isn’t the huge deal it may once have been. People succeed using Linux, FreeBSD, OpenBSD, NetBSD, Windows, and more as the basis for their platforms (web stack, mobile stack, network device OS, etc.).