VXLAN: Millions or Billions?

I was putting slides together for my upcoming talk, and there is some confusion about VXLAN, in particular about how many VLANs it provides.

The VXLAN header provides a 24-bit address space called the VNI (VXLAN Network Identifier) to separate out tenant segments, which works out to about 16 million segments. And that’s the number I see quoted with regard to VXLAN (and NVGRE, which also has a 24-bit identifier). However, take a look at the entire VXLAN packet (from the VXLAN standard… whoa, how did I get so sleepy?):

[Image: VXLAN packet format]

Tony’s amazing Technicolor Packet Nightmare
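To make the 24-bit VNI a little more concrete, here is a minimal Python sketch of the 8-byte VXLAN header as I read it in the draft: one byte of flags (with the "I" bit set to mark the VNI as valid), 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The function and constant names are my own, purely for illustration.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI (illustrative helper)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Pull the 24-bit VNI back out of a VXLAN header."""
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8


header = pack_vxlan_header(5000)
print(header.hex())        # 0800000000138800
print(unpack_vni(header))  # 5000
```

The range check is the whole story behind the "16 million" figure: anything from 0 up to 2^24 - 1 fits, and nothing larger does.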

The full 802.1Q Ethernet frame can be encapsulated, providing the full 12-bit VLAN ID space (4096 VLANs) per VXLAN segment. 16 million multiplied by 4096 is about 68 billion (with a “B”). However, most material discussing VXLAN refers to 16 million.
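As a quick sanity check on that arithmetic, here is the calculation in Python. The variable names are just mine; the bit widths are the 24-bit VNI and the 12-bit 802.1Q VLAN ID discussed above.

```python
VNI_BITS = 24   # width of the VXLAN Network Identifier
VLAN_BITS = 12  # width of the 802.1Q VLAN ID

vni_count = 2 ** VNI_BITS
vlan_count = 2 ** VLAN_BITS
combined = vni_count * vlan_count  # same as 2 ** 36

print(f"{vni_count:,}")   # 16,777,216
print(f"{vlan_count:,}")  # 4,096
print(f"{combined:,}")    # 68,719,476,736
```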

So is it 16 million or 68 billion?

[Image: billions and billions of VLANs]

The answer is: Yes?

So according to the standard, each VXLAN segment is capable of carrying an 802.1Q encoded Ethernet frame, so each VXLAN can have a total of 4096(ish) VLANs. The question is whether or not this is actually feasible. Can we run multiple VLANs over VXLAN? Or is each VXLAN only going to realistically carry a single (presumably non-tagged) VLAN?

I think much of this depends on how smart the VTEP is. The VTEP is the termination point, the encap/decap point for the VXLANs. Regular frames enter a VTEP, get encapsulated, and are sent over the VXLAN overlay (across a regular Layer 3 fabric) to another VTEP, the terminating endpoint, where they are decap’d.

The trick is the MAC learning process of the VTEPs. Each VTEP is responsible for learning the local MAC addresses as well as the destination MAC addresses, just like a traditional switch’s CAM table. Otherwise, each VTEP would act kind of like a hub, and send every single unicast frame to every other VTEP associated with that VXLAN.
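Here is a toy Python sketch of that learning behavior for a single VXLAN segment. It is not any vendor's implementation; the `Vtep` class, its method names, and the IP addresses are all made up for illustration. The idea is simply that an unknown destination MAC goes to every remote VTEP (hub-like), while a learned one goes to exactly one.

```python
class Vtep:
    """Toy VTEP for one VXLAN segment (hypothetical, for illustration only)."""

    def __init__(self, remote_vteps):
        self.remote_vteps = set(remote_vteps)  # other VTEPs in this segment
        self.mac_table = {}                    # inner MAC -> remote VTEP IP

    def learn(self, inner_src_mac, outer_src_ip):
        # On decap, remember which VTEP the inner source MAC sits behind.
        self.mac_table[inner_src_mac] = outer_src_ip

    def destinations(self, inner_dst_mac):
        # Known MAC: send to one VTEP. Unknown MAC: flood to all of them.
        vtep = self.mac_table.get(inner_dst_mac)
        if vtep is None:
            return set(self.remote_vteps)
        return {vtep}


vtep = Vtep(["10.0.0.2", "10.0.0.3"])
print(vtep.destinations("00:11:22:33:44:55"))  # both remote VTEPs (flooded)
vtep.learn("00:11:22:33:44:55", "10.0.0.3")
print(vtep.destinations("00:11:22:33:44:55"))  # {'10.0.0.3'}
```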

What I’m wondering is, do VTEPs keep separate MAC tables per VLAN?

I’m thinking it must create a multi-VLAN table, because what happens if we have the same MAC address in two different VLANs? A rare occurrence to be sure, but I don’t think it violates any standards (could be wrong on that). If it only keeps a single MAC table for all VLANs, then we really can’t run multiple VLANs per VXLAN. But I imagine it has to keep a separate table per VLAN. Or at least, it should.
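To show why the keying matters, here is a small Python sketch comparing the two options. This is my own illustration, with made-up VNI, VLAN, MAC, and VTEP values; it says nothing about what real VTEPs actually do. A table keyed only on (VNI, MAC) loses an entry when the same MAC shows up in two inner VLANs, while one keyed on (VNI, VLAN, MAC) keeps both.

```python
MAC = "00:11:22:33:44:55"  # same MAC seen in two inner VLANs (hypothetical)
VNI = 5000

# Option 1: one MAC table per VNI, inner VLAN tag ignored.
per_vni = {}
per_vni[(VNI, MAC)] = "10.0.0.1"  # learned from VLAN 10
per_vni[(VNI, MAC)] = "10.0.0.2"  # learned from VLAN 20, overwrites the first

# Option 2: the inner VLAN ID is part of the lookup key.
per_vlan = {}
per_vlan[(VNI, 10, MAC)] = "10.0.0.1"
per_vlan[(VNI, 20, MAC)] = "10.0.0.2"

print(len(per_vni))   # 1: the VLAN 10 entry is gone
print(len(per_vlan))  # 2: both entries survive
```

With the first layout, frames for the VLAN 10 host would be sent to the wrong VTEP after the overwrite, which is exactly the failure I'm worried about above.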

I can’t imagine there would be any situation where different tenants would get VLANs in the same VXLAN/VNI, so there are still only 16 million multi-tenant segments. It’s not exactly 68 billion VLANs, but each tenant might be able to have multiple VLANs.

Letting tenants have multiple VLAN segments may prove to be useful, though I doubt any tenant would have more than a handful of VLANs (perhaps DMZ, internal, etc.). I haven’t played enough with VXLAN software yet to figure this one out, and discussions on Twitter (many thanks to @dkalintsev for great discussions), while educational, haven’t seemed to solidify the answer.

10 Responses to VXLAN: Millions or Billions?

  1. You know.. I was just thinking about how we might be able to ID the tenant by embedding a tenant ID right in the packet. This inception of VLANs makes me wonder, what if the VXLAN ID was the tenant ID and virtual networks were VLANs embedded in the VXLAN?

    It’s probably a terrible idea. But if we could wrap counters around the VXLAN ID (dropped packets) then we could see what tenants are impacted by dropped packets? If we divide the VXLAN space into two fields.. like a tenant ID, then a “class” of virtual networks.. we could make differentiating policies in the network based on the class of the virtual network…

    Anyway, that’s my completely not-thought-out reply.

    • > what if the VXLAN ID was the tenant ID

      You can run VXLANs over 802.1Q (or QinQ, or MAC-in-MAC), giving you your tenant separation – VXLAN “domain” per tenant; each with 16M VNIs.

      > counters around the VXLAN ID (dropped packets)

      Since VXLAN rides on top of UDP, I can’t quite imagine how you would figure out how many packets you’ve dropped (unless you’re talking about the local ingress/egress queue drops). That can work with STT, though.

  2. Hi Tony,

    > do VTEPs keep separate MAC tables per VLAN?

    My take on this is “no, it doesn’t”. Here’s one of the “why”s that makes me think so:

    http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03#page-18 reads:

    “6.1. Inner VLAN Tag Handling

    Inner VLAN Tag Handling in VTEP and VXLAN Gateway should conform to the following:

    Decapsulated VXLAN frames with the inner VLAN tag SHOULD be discarded unless configured otherwise. On the encapsulation side, a VTEP SHOULD NOT include an inner VLAN tag on tunnel packets unless configured otherwise. When a VLAN-tagged packet is a candidate for VXLAN tunneling, the encapsulating VTEP SHOULD strip the VLAN tag unless configured otherwise.”

    Essentially, it reads to me like “the inner VLAN tags are undesirable”.

    The second reason is because I think individual VLANs are widely expected to have independent lists of endpoints, which most definitely isn’t the case with VXLAN – it has no mechanisms to control the BUM flooding bound to the inner VLAN tags.

    I suspect that in the “VXLAN think” there is no longer a concept of traditional VLAN; there’s only VNIs, period. And the VNIs are the ones that get their individual MAC tables.

    • tonybourke says:

      Thanks for posting that. I’m inclined to agree that perhaps inner 802.1Q is undesirable. However, since the 802.1Q header is in the standard, it seems they did leave some room. I wonder if the current implementations of VXLAN (such as the 1000v) could handle it. And if they do, how they’d handle multiple tables.

      As for BUM traffic, if it’s just a single tenant with a few VLANs, the traffic would likely go to each endpoint VTEP anyway, since each endpoint would have the same VLANs attached. Though I can’t think of any reason why forwarding mechanisms couldn’t be put into place to handle multiple tables (especially if multicast is disfavored, as it likely will be) and prune VLANs on certain VTEPs.

      Though it seems reasonable that handling multiple VLANs in a VNI won’t be in the first versions of the various implementations (or perhaps future ones).

    • “The second reason is because I think individual VLANs are widely expected to have independent lists of endpoints, which most definitely isn’t the case with VXLAN – it has no mechanisms to control the BUM flooding bound to the inner VLAN tags.”

      VLANs (and Ethernet switching before that) had to mimic the technology they were replacing – hubs – and flooding is simply mimicking how a hub worked.

      The VXLAN spec says nothing about how to handle flooding, and it’s one of those areas where I’d foresee a variety of optimizations taking place.

  3. Eelco Nieuwstad says:

    > I’m thinking it must create a multi-VLAN table, because what happens if we have the same MAC address in two different VLANs? A rare occurrence to be sure

    This is a very common occurrence with dot1q layer 3 subinterfaces. Each subinterface will inherit the MAC address of the main interface. I have also seen it with NetApp filers, which have a subinterface in each VLAN where they have NFS or iSCSI clients. The same MAC address appears in all VLANs.

    • “because what happens if we have the same MAC address in two different VLANs? A rare occurrence to be sure.”
      Not as rare as you think. Just about every L3 switch in existence conserves mac-address space by reusing the same mac-address across every interface it has for the L3 ‘gateway’ mac entry.

  4. Carl Sagan was awesome.

  5. “do VTEPs keep separate MAC tables per VLAN?” Yes, absolutely. It must.

  6. Tony, did you have any takeaways after talking to Kulin from Arista about this?
