
Link Aggregation Confusion

September 2, 2013 9 Comments

In a previous article, I discussed the somewhat pedantic question: “What’s the difference between EtherChannel and port channel?” The answer, as it turns out, is none. EtherChannel is mostly an IOS term, and port channel is mostly an NX-OS term. But either is correct.

But I did get one thing wrong: I was using the term LAG incorrectly. I had assumed it was short for Link Aggregation (the umbrella term for most of this). In fact, LAG is short for Link Aggregation Group, which is a particular instance of link aggregation, not the umbrella term. So wait, what do we call the technology that links links together?

LAG? Link Aggregation? No wait, LACP. It’s gotta be LACP.

Before you answer that, let’s throw in some more terms, like MLAG, MC-LAG, VLAG, 802.3ad, 802.1AX, link bonding, and more.

In case you haven’t noticed, the terminology for one of the most critical technologies in networking (especially in the data center) is still quite murky.

The term “link aggregation” can mean a number of things. Certainly EtherChannel and port channels are a form of link aggregation. 802.3ad and 802.1AX count as well. Wait, what’s 802.1AX?

802.3ad versus 802.1AX

What is 802.3ad? It’s the old IEEE designation for what is now known as 802.1AX. The standard that we often refer to colloquially as port channel, EtherChannel, or link aggregation was moved from the 802.3 working group to the 802.1 working group sometime in 2008. However, it is still sometimes referred to as 802.3ad. Or LAG. Or link aggregation. Or link group things. Whatever.

What about LACP? LACP is part of the 802.1AX standard, but it is neither the entirety of the 802.1AX standard, nor is it required in order to stand up a LAG. LACP is also not link aggregation itself. It is a protocol for building LAGs dynamically, as opposed to configuring them statically. You can usually build an 802.1AX LAG without using LACP, and many devices support both static and dynamic LAGs. VMware ESXi 5.0 only supported static LAGs, while ESXi 5.1 added LACP as an option.

Some devices only support dynamic LAGs, while some only support static. For example, Cisco UCS fabric interconnects require LACP in order to set up a LAG (the alternative is to use pinning, which is another type of link aggregation, but not 802.1AX). The discontinued Cisco ACE 4710 doesn’t support LACP at all; only static LAGs are supported.
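To make the static-versus-dynamic distinction concrete, here’s a rough Python sketch of whether two ends of a link will both agree to bundle. It assumes three per-port modes: static (bundle without sending any LACPDUs, what Cisco’s configuration calls “on”), LACP active, and LACP passive. A passive LACP port only answers LACPDUs, so at least one end has to be active. This is a simplified model, not any vendor’s actual logic; real implementations add timers, priorities, and fallback behavior.

```python
"""Sketch: will both ends of a link agree to bundle it into a LAG?

Simplified model with three per-port modes:
  "static"  - bundle unconditionally, never send LACPDUs (Cisco's "on")
  "active"  - run LACP and send LACPDUs on its own
  "passive" - run LACP but only respond to LACPDUs it receives
"""

VALID_MODES = {"static", "active", "passive"}


def lag_forms(mode_a: str, mode_b: str) -> bool:
    """Return True if ends configured with mode_a and mode_b would both bundle."""
    for mode in (mode_a, mode_b):
        if mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {mode!r}")
    # Two static ends bundle without any negotiation at all.
    if mode_a == "static" and mode_b == "static":
        return True
    # A static end never speaks LACP, so an LACP end has nobody to negotiate with.
    if "static" in (mode_a, mode_b):
        return False
    # Both ends run LACP: someone has to start the conversation.
    return "active" in (mode_a, mode_b)


if __name__ == "__main__":
    print(lag_forms("active", "static"))    # False: LACP-only vs. static-only
    print(lag_forms("active", "passive"))   # True
    print(lag_forms("passive", "passive"))  # False: both wait, nobody talks
```

In this model, a device that requires LACP (like the UCS fabric interconnects above) and a device that only does static LAGs (like the ACE 4710) can’t form a LAG with each other.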

One way to think of LACP is that it is a control-plane protocol, while 802.1AX is a data-plane standard.
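Here’s one way to picture what that control plane buys you. In this simplified Python sketch, each port records who its partner is (learned from the partner’s LACPDUs), and ports are only grouped together when their local key and their partner’s system ID and key all match. That’s roughly the idea behind 802.1AX’s LAG identifier, though the real one carries more fields (system priorities, plus port details for links that stay individual). The field names, keys, and system IDs here are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PortInfo:
    name: str
    actor_key: int       # this port's local operational key
    partner_system: str  # partner's system ID, learned from its LACPDUs
    partner_key: int     # partner's operational key, learned from its LACPDUs


def group_ports(ports):
    """Group ports into candidate LAGs keyed by (actor key, partner system, partner key)."""
    groups = defaultdict(list)
    for p in ports:
        groups[(p.actor_key, p.partner_system, p.partner_key)].append(p.name)
    return dict(groups)


ports = [
    PortInfo("eth1", 10, "00:11:22:33:44:55", 7),
    PortInfo("eth2", 10, "00:11:22:33:44:55", 7),
    # eth3 was cabled to a different switch by mistake
    PortInfo("eth3", 10, "66:77:88:99:aa:bb", 7),
]
print(group_ports(ports))
```

Because eth3’s partner is a different system, it lands in its own group instead of silently joining the bundle with eth1 and eth2. A static LAG has no partner information to compare, so it would bundle all three.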

Is Cisco’s EtherChannel/port channel proprietary?

As far as I can tell, no, they’re not. There’s no difference (functional, at least) between 802.3ad/802.1AX and what Cisco calls EtherChannel/port channel, and you can set up LAGs between Cisco and non-Cisco gear without any issue. PAgP (Port Aggregation Protocol), the precursor to LACP, was proprietary, but Cisco has mostly moved to LACP for its devices. Cisco Nexus kit doesn’t even support PAgP.

Even LACP has no mechanism for negotiating the load distribution method. Each side picks its own. In fact, you don’t have to configure the same load distribution method on both ends of a LAG (though it’s usually a good idea).
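Since each end picks its own load distribution method, a quick sketch helps show why that works. The Python below hashes a few header fields to choose a member link; one end hashes on MAC addresses, the other on IP addresses and ports. The policy functions, the CRC32 hash, and the addresses are assumptions for illustration, not any vendor’s actual algorithm.

```python
import zlib


def pick_link(fields, n_links):
    """Map a tuple of header fields to a member link index in 0..n_links-1."""
    key = "|".join(str(f) for f in fields).encode()
    # CRC32 is deterministic across runs, unlike Python's salted str hash().
    return zlib.crc32(key) % n_links


# Two hypothetical load distribution policies.
def src_dst_mac(frame):
    return (frame["src_mac"], frame["dst_mac"])


def src_dst_ip_port(frame):
    return (frame["src_ip"], frame["dst_ip"], frame["src_port"], frame["dst_port"])


frame = {  # one frame of a flow going from end A to end B
    "src_mac": "00:00:5e:00:53:01", "dst_mac": "00:00:5e:00:53:02",
    "src_ip": "192.0.2.10", "dst_ip": "198.51.100.20",
    "src_port": 49152, "dst_port": 443,
}
# The reply travelling from B back to A has source and destination swapped.
reply = {
    "src_mac": frame["dst_mac"], "dst_mac": frame["src_mac"],
    "src_ip": frame["dst_ip"], "dst_ip": frame["src_ip"],
    "src_port": frame["dst_port"], "dst_port": frame["src_port"],
}

N_LINKS = 4
link_a = pick_link(src_dst_mac(frame), N_LINKS)      # end A hashes on MACs
link_b = pick_link(src_dst_ip_port(reply), N_LINKS)  # end B hashes on IPs/ports
print(f"A->B on link {link_a}, B->A on link {link_b}")
```

Each direction is decided independently, so the two ends may pick different links for the same conversation, and nothing breaks. What matters is that a given flow hashes to the same link every time in a given direction, which keeps its frames in order. It also shows why the policy choice still matters: hashing only on MAC addresses sends everything between the same two MACs (say, two routers) down a single link.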

There are also types of link aggregation that aren’t part of 802.1AX or any other standard. I group these into two categories: pinning, and fake link aggregation, or FLAG.

First, let’s talk about pinning. In Ethernet, we have the rule that there can’t be more than one way to get anywhere. Ethernet can’t handle multi-pathing, which is why we have spanning tree and other tricks to prevent there from being more than one logical path for an Ethernet frame to get from a source MAC to a given destination MAC. Pinning is a clever way to get around this.

The most common place we tend to see pinning is VMware. Most ESXi hosts have multiple connections to a switch. But they don’t have to go to the same switch. And look at that, we have multiple paths. And no spanning tree. So how do we not melt down the network?

The answer is pinning. VMware refers to this as load balancing by virtual port ID. Each VM’s vNIC has a virtual port ID, and that ID is pinned to one and only one of the external physical NICs (pNICs). To utilize all your links, you need at least as many virtual ports as physical ports, and load distribution can be uneven. But generally, pinning works great. Cisco UCS also uses pinning for both Ethernet and Fibre Channel when 802.1AX-style link aggregation isn’t used.
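A minimal sketch of pinning, assuming a plain modulo of the virtual port ID over the list of uplinks. ESXi’s actual selection logic may differ, and the port IDs here are small made-up numbers. The property that matters is that each virtual port maps to exactly one pNIC, so each VM’s MAC shows up on only one switch port and there’s nothing for spanning tree to block.

```python
def pin_ports(virtual_port_ids, uplinks):
    """Pin each virtual port ID to exactly one uplink (pNIC).

    Assumption: a plain modulo of the port ID over the uplink list. The real
    ESXi algorithm may differ; what matters is one port -> one uplink.
    """
    if not uplinks:
        raise ValueError("need at least one uplink")
    return {vp: uplinks[vp % len(uplinks)] for vp in virtual_port_ids}


uplinks = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]

# Two vNICs (hypothetical port IDs 0 and 1): only two of four pNICs get used.
print(pin_ports([0, 1], uplinks))
# Eight vNICs are enough to put traffic on all four pNICs.
print(sorted(set(pin_ports(range(8), uplinks).values())))
# vmnic1 fails: re-pin over the surviving uplinks.
print(pin_ports(range(8), [u for u in uplinks if u != "vmnic1"]))
```

The first case is the “at least as many virtual ports as physical ports” caveat in action. On failure, re-pinning keeps every virtual port on a working link, though this naive modulo also reshuffles ports whose uplink never failed.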

It’s a fantastic way to get active/active links without running into spanning-tree issues, and it doesn’t require 802.1AX.

Then there’s… a type of link aggregation that scares me. This is FLAG.

Some operating systems, such as FreeBSD and Linux, support a weird kind of link aggregation where packets are sent out multiple active links but received on only one. It requires no special configuration on the switch, but the server is oddly blasting out packets on various switch ports. Transmit is active/active, but receive is active/standby.
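Why does receive end up active/standby? It comes down to ARP and MAC learning. This toy Python model has a host transmit round-robin across two NICs while a switch learns source MACs; the host’s IP resolves via ARP to just one MAC, which the switch has learned on just one port. It assumes each NIC transmits with its own source MAC, so the switch isn’t watching one MAC flap between ports. Linux’s bonding documentation describes its balance-tlb mode along similar lines (no special switch support, incoming traffic received by the current slave), but this is a simplified illustration, not that driver’s logic, and the MACs and port numbers are hypothetical.

```python
from itertools import cycle


class LearningSwitch:
    """Minimal MAC-learning switch: remembers which port each source MAC came from."""

    def __init__(self):
        self.mac_table = {}

    def learn(self, src_mac, port):
        self.mac_table[src_mac] = port

    def port_for(self, dst_mac):
        return self.mac_table.get(dst_mac)  # None would mean "flood"


# Hypothetical host with two NICs cabled to switch ports 1 and 2.
nic_mac = {"eth0": "02:00:00:00:00:01", "eth1": "02:00:00:00:00:02"}
nic_port = {"eth0": 1, "eth1": 2}
arp_reply_mac = nic_mac["eth0"]  # the host's IP is bound to ONE MAC via ARP

switch = LearningSwitch()
tx_links = cycle(["eth0", "eth1"])  # transmit is spread across both NICs
for _ in range(4):
    nic = next(tx_links)
    # Assumption: each NIC sends with its own source MAC, so the switch
    # learns two stable entries instead of one MAC flapping between ports.
    switch.learn(nic_mac[nic], nic_port[nic])

# Inbound traffic is addressed to the ARP-resolved MAC, which lives on one port.
print(switch.port_for(arp_reply_mac))  # 1
```

However the outbound frames are spread, every inbound frame addressed to the host’s one ARP-visible MAC comes in on port 1.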

What’s the point? I’d prefer a saner active/standby configuration. I think it would make troubleshooting much easier.

There’s not much need for this type of fake link aggregation anymore. Most managed switches support 802.1AX, and end hosts support either the aforementioned pinning or 802.1AX (static or LACP). So there are easier ways to do it.

So as you can see, link aggregation is a pretty broad term: too broad to refer to just 802.1AX, since it also includes pinning and fake link aggregation. LAG isn’t a good term either, since it refers to a specific instance and isn’t suited as the catch-all term for the methodology of inverse multiplexing. 802.1AX is probably the best term, but it’s not widely known, and it also includes the optional LACP control-plane protocol. Perhaps we need a new term. But if you’ve found the terms confusing, you’re not alone.

Filed under data center, Ethernet, GIFmadness, Virtualization, VMware

9 Responses to Link Aggregation Confusion

Zack says:

September 4, 2013 at 10:57 am

I’ve frequently also run into server people who refer to LAG/bonding as “trunking”… Usually “LACP” or “link aggregation” is less likely to cause confusion when communicating cross discipline.

IJdoD says:

September 5, 2013 at 6:16 am

The ‘FLAG’ has a benefit over the real LAG options: it doesn’t require the switchports to be part of the same switch or (virtual or physical) chassis. This means you can have the downstream capacity of both server NICs to use, even if both ports are connected to different switches (default redundancy scenario).

One obvious benefit is more efficient use of resources if most of your traffic is downstream, as this allows both NICs to be used.

- tonybourke says:
  
  September 5, 2013 at 2:35 pm
  
  For pinning (which I don’t include in “FLAG”) I totally agree. In a virtualization environment, multiple active uplinks can be plugged into various switches without any special configuration. This works because there are usually a lot of MACs from various VMs. If it’s not a virtualization host, pinning isn’t really an option.
  
  For non-virtualization hosts, the options are some type of LAG or FLAG, where packets are sent out the various uplinks but traffic is only received on one of the links. Receiving is then active/standby, since any IP needs to be bound to a MAC via ARP, and the switch (without special configuration) can have a MAC on only one port.
  
  Active/active upstream and active/standby downstream doesn’t sit well with me for troubleshooting, NetFlow, security, etc. I’d rather just make it active/standby, period. Or set up a (real) LAG.
  
  - IJdoD says:
    
    September 5, 2013 at 3:04 pm
    
    I also prefer not to use them, for much of the same reasons. Just wanted to point out the practical side. FLAGs don’t rank high on my list of technologies (and vendors) I’d like to kill with fire :D.
Pingback: Configuring EtherChannel (PAgb) « Cisco Skills
Pingback: How To Link Aggregation | Adagemma
wisoni says:

July 10, 2018 at 9:39 pm

Thank you, really helpful. I’ve read so many articles and posts, and your illustration is so clear and logical. I see most people describe LACP as a dynamic protocol and LAG as a static one. Now I understand that LACP is an optional protocol for building a LAG: you can build a static LAG without LACP, or a dynamic LAG using LACP.

Robert says:

October 22, 2019 at 5:06 am

Why do you pollute your blog entry with gratuitous, utterly worthless animated GIFs that make it more difficult to read? Do you think children are going to be more interested in looking at this if you include that junk?

- tonybourke says:
  
  October 25, 2019 at 10:32 am
  
  I understand your concerns, and I apologize. However, if I might add a suggestion: Have you tried fucking off?
  