Did VMware vSphere 6.0 Remove the Layer 2 Adjacency Requirement For vMotion? No.


I’ve seen this misconception a few times on message boards, reddit, and even comments on this blog: That Layer 2 adjacency is no longer required with vSphere 6.0, as VMware now supports Layer 3 vMotion. The (mis)perception is that you no longer need to stretch a Layer 2 domain between ESXi hosts.

That is incorrect. VMware did remove a Layer 2 adjacency requirement for the vMotion Network, but not for the VMs. Lemme explain.

It used to be (before vSphere 6.0) that you were required to have the VMkernel interfaces that performed vMotion on the same subnet. You weren’t supposed to go through a default gateway (though I think you could, it just wasn’t supported). So not only did your VM networks need to be stretched between hosts, but so did your VMkernel interfaces that performed the vMotion sending/receiving.

What vSphere 6.0 added was a separate TCP/IP stack for vMotion, so you can give vMotion traffic its own default gateway, allowing your vMotion VMkernel interfaces to be on different subnets.
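As a rough sketch of what that looks like from the ESXi shell (the interface name, port group, and addresses below are made up for illustration, and the exact esxcli option names can vary a bit between builds):

# Put a vMotion VMkernel interface on the dedicated vMotion TCP/IP stack
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=vMotion-PG --netstack=vmotion
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.168.50.11 --netmask=255.255.255.0 --type=static

# Give the vMotion stack its own default gateway, separate from the management stack
esxcli network ip route ipv4 add --gateway=192.168.50.1 --network=default --netstack=vmotion

With that in place, the vMotion VMkernel interface on another host can live on, say, 192.168.60.0/24, and the vMotion traffic will happily route between the two.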

This does not remove the requirement that the same Layer 2 network exist on both the sending and receiving ESXi hosts. The VM keeps its IP address through the move, so the VM network you vMotion to needs to have the same default gateway (for outbound packets) and the same inbound routing (for inbound packets). If a VM sits at 10.1.50.20/24 with a gateway of 10.1.50.1, that same subnet and gateway need to be waiting for it on the destination host, or the VM’s traffic dies the moment it lands.

Inside a data center, this adjacency is typically handled by simply making the same VLAN available (natively, or now through VXLAN) on all the ESXi hosts in the cluster.

If it’s between data centers, things tend to get more complicated. As in dumpster fire. Here’s a presentation I recently did on the topic, and Ivan Pepelnjak has far more high-brow explanations of why it’s a bad idea.

You’ll need solutions like LISP (for inbound traffic), FHRP filtering (for outbound traffic), OTV (for stretching the VLAN), and a whole host of other tools to handle all the other problems long-distance vMotion can introduce.

[Image: Where is your God now?!?!?]

So when you hear that vSphere 6 no longer requires Layer 2 adjacency between ESXi hosts, that’s only for the VMkernel interfaces, not the VM networks. So yes, Virginia, you still need Layer 2 adjacency for vMotion. Even in vSphere 6.0.

 

Long Distance vMotion Is A Dumpster Fire

In this screencast, I go on a rant about why long-distance vMotion is a dumpster fire. Seriously, don’t do it.

Fibre Channel in the Cloud: FCaaS

Public cloud providers such as Amazon Web Services, Microsoft Azure, and Rackspace, as well as private cloud systems such as OpenStack, have dominated the computing landscape for the past several years. And once a joke of a marketing term (remember Larry Ellison’s supervillain monologue on the topic?), the cloud is now A Thing, with a definition and everything.

One technology that seemed like it was getting left behind in all these cloud games, however, was Fibre Channel. Ephemeral compute nodes, object storage, extreme scale, elastic provisioning — all of these were characteristics that were initially thought to be bad fits for Fibre Channel.

[GIF: Sad Fibre Channel is Sad]

As it turns out, Fibre Channel is right at home in the cloud.


Amazon Web Services has recently rolled out Fibre Channel as a Service (FCaaS), as have Rackspace, Digital Ocean, and Microsoft Azure.

All of those public cloud providers have some sort of block storage offering, but it’s typically based on something like iSCSI or another back-end block protocol. Customers have been demanding block storage in the public cloud that lets them control zoning and zonesets, just like they do in their traditional data center worlds.

The problem with that historically is that AWS and the others haven’t been able to provide this to customers because of the limitations of Fibre Channel at scale. I’ll explain.

Fibre Channel uses FC_IDs, which are like IP addresses, to send Fibre Channel frames around a given SAN. Here’s an FC_ID: 0x510121.

It’s a 24-bit number, typically written in hexadecimal notation. The first octet (two digits) is known as the domain ID, which is assigned to the switch, so a given fabric tops out at about 239 switches (some domain IDs are reserved). On top of that, the two vendors of Fibre Channel switches (Brocade and Cisco) limit domain IDs to around 50 per fabric, so no more than 50 or so switches in a given fabric.
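Breaking that example FC_ID apart byte by byte (standard Fibre Channel addressing, nothing exotic):

0x51 = domain ID (the switch the device is logged into)
0x01 = area ID (typically a port or group of ports on that switch)
0x21 = port ID (the end device itself)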

For a private data center with a single tenant, this isn’t a problem, as a 50-switch Fibre Channel fabric is huge. But for Amazon, 50 switches is minuscule.

So enter VXSAN. The SNIA introduced VXSAN recently under the T18 working group, which provides an extension of typical Fibre Channel frame formats. Like VXLAN, VXSAN adds a higher degree of segmentation.

Cisco has VSANs, of course, and Brocade has Virtual Fabrics. The two aren’t compatible with each other, and neither provides the kind of scale a massive cloud requires. VXSAN fixes both of those problems, and it will work on a traditional Fibre Channel SAN from either Brocade or Cisco, without modification, through use of the Open Virtual Fibre Channel Switch.

Wait, what?

That’s right, part of any VXSAN implementation is the Open Virtual Fibre Channel Switch (kind of a mouthful, even with the acronym OVFCS).

Similar to how VXLAN operates an overlay network on a traditional IP network as an underlay, VXSAN operates as an overlay SAN on top of a traditional Fibre Channel SAN.

Instead of VTEPs, OVFC switches terminate the VXSAN segments on virtualization hosts and VXSAN-aware storage arrays (both EMC and NetApp support it in their latest software revs), presenting the VXSAN LUN to a given virtual machine.


A given virtualization host has two virtual Fibre Channel switches (A/B), each connected to their own Fibre Channel interface (A/B).


The virtual Fibre Channel switches rely on upstream NPIV to get their connectivity, so they can run alongside the hypervisor’s traditional SCSI subsystem. In the example below, both virtual Fibre Channel switches do FLOGIs, as does the hypervisor.

[Diagram: FLOGIs from both virtual Fibre Channel switches and the hypervisor]

The virtual machines, however, do a vFLOGI into the VXSAN segment, not into the traditional switching infrastructure. The upstream physical switches have no idea a FLOGI happened from the VM.


The VXSAN header, like VXLAN’s, carries a 24-bit segment ID, providing 16 million segments, each capable of holding a full Fibre Channel fabric of up to 239 virtual Fibre Channel switches. So while 239 Fibre Channel switches won’t work for Amazon, 3.8 billion will (16 million x 239).

You will have to enable Fibre Channel jumbo frames on your traditional Fibre Channel fabric, as the VXSAN header adds 62 bytes to the frame format.

VXSAN is designed to run on VXSAN-unaware switches, since it takes a while for new header formats to make it into silicon, but both Cisco and Brocade have said they plan to release VXSAN-aware switches by the end of the year.

VXSAN is built to be multi-tenant, so customers of Amazon and the others can do their own zoning. I got to play with a beta of the FCaaS offering from AWS, and I did just a quick configuration with a single VM and a virtual LUN.

First, you log into the A or B virtual Fibre Channel switch. There’s no password; you use the keys you’ve uploaded to Amazon.

Linux Foundation Open Virtual Fibre Channel Switch (Read the Apache 2.0 License for licensing details)
switch#
switch# config
switch(config)# zone Host1
switch(config-zone)# member pwwn 20:00:00:12:34:45:67:aa
switch(config-zone)# member pwwn 50:00:00:00:00:ab:cd:ef

I was able to push a zoneset and connected my instance to storage pretty quickly. All in all, it only took about 10 minutes to get it up and running.
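For reference, pushing a zoneset in that same MDS-flavored syntax looks something like the sketch below; the zoneset name is a placeholder I made up for illustration, not anything from the actual beta:

switch(config)# zoneset name Tenant1
switch(config-zoneset)# member Host1
switch(config-zoneset)# exit
switch(config)# zoneset activate name Tenant1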

OpenStack is preparing to include FCaaS and the Open Virtual Fibre Channel Switch in the next release (Mitaka), due out this month.

So check out FCaaS on Amazon, Azure, and the others. FCaaS should bring Fibre Channel into the cloud world.

Edit: Also, this is an April Fool’s joke. 5 years running.

Traditional versus Cloud Native Web Applications

Here’s a quick whiteboard session of the differences between traditional and cloud native web applications.

Differences in how Fibre Channel and Ethernet Measure Speed

That Moment When You Realized You “write erase” the Wrong Device…


Nexus versus Catalyst

When someone asks me to explain Nexus versus Catalyst:

[Image: Old and busted. New hotness.]
