February 9, 2015 3 Comments
Where servers, storage and networking combine to form Voltron.
February 9, 2015 3 Comments
September 3, 2014 Leave a comment
There’s been some misconceptions and misinformation lately about FCoE. Like any technology, there are times when it makes sense and times when it doesn’t, but much of the anti-FCoE talk lately has been primarily ignorance and/or wilful misrepresentation.
In an effort to fight that ignorance, I put together a quick introduction to how FC and FCoE works. They both operate on the basic premise that you can’t drop any frames. Fibre Channel was built as a lossless protocol, and with a bit of work, Ethernet can also be lossless.
Check it out:
August 27, 2014 Leave a comment
So how’s this for a condescending tweet?
— EGI Russ (@russtystorage) August 27, 2014
Interesting article (check it out). But the sad/amusing irony is that he’s wrong. How is he wrong? Here’s what Russ Fellows doesn’t know about storage:
1, 2, 4, and 8 Gbit Fibre Channel (as he points out) uses 8/10 bit encoding. That means about a 20% of the bandwidth available was lost due to encoding overhead (as Russ pointed out). That’s why 8 Gbit Fibre Channel only provides 800 MB/s of connectivity, even though 8,000 Megabits per second equates to 1,000 Megabytes per second (8000 Megabits / (8 bits per byte) = 1,000 Megabytes).
With this overhead in mind, Fibre Channel was designed to give 100 MB/s for every Gigabit of speed. It never increased the baud rate to make up for the overhead.
Ethernet, on the other hand, did increase the baud rate to make up for the overhead. Gigabit Ethernet uses the same 8/10 bit encoding, but they kicked the baud rate up to 1.25 gigabaud to make up the differences. As such, Gigabit Ethernet provides true 1 gigabit of throughput, or 125 Megabytes per second.
10 Gigabit Ethernet moved to 64/66 encoding, and kept to the approach of not letting the overhead impact throughput. 10 Gigabit Ethernet then provides 1250 Megabytes per second of throughput. The baud rate is 10.3125, giving true 10 Gigabit per second of data.
When Fibre Channel moved to the more efficient 64/66 bit encoding, rather than change the 100 MB/s per gigabit to 125 MB/s (which you get with all Ethernet speeds), they left the ratio (1 Gigabit to 100 MB/s) the same. Thus, every Gigabit = 100 MB/s, just like in previous speeds (1/2/4/8 FC). So while 16 Gbit Fibre Channel provides 1600 MB/s of throughput, the baud rate is actually only 14 gigabaud, and not true 16 Gbit. And don’t take my word for it, check out page 7 of Scott Shimomura‘s (of Brocade) presentation at the SPDE conference.
10 Gigabit Ethernet provides 1250 MB/s, providing true 10 Gigabit Ethernet, and not putting the slight overhead into the equation. So while 10 Gigabit Ethernet is true 10 Gigabit, 16 Gigabit Fibre Channel is actually 14 Gigabit Fibre Channel (14.025, to be exact).
And that’s what Russ Fellows doesn’t know. His entire article is based on a false premise: Thinking that the move to 64/66 makes 16 Gbit pass more than twice as much traffic as 8 Gbit. But it’s not. He says that with 8 Gbit FC, 1+1 = 1.6 (when compared to 16 Gbit FC), which is factually incorrect for the reasons I’ve just explained. Yes, 64/66 bit encoding is more efficient. But they dropped the baud rate, negating the efficiency gains.
8 Gigabit Fibre Channel provides 800 Megabytes per second of data transfer. 16 Gigabit Fibre Channel (really 14 Gigabit Fibre Channel) provides 1600 Megabytes per second of data transfer. 800 + 800 = 1600.
Sorry Russ, 1+1 really does equal 2. Even in Fibre Channel.
July 14, 2014 Leave a comment
While prepping for CCIE Data Center and playing around with a lab environment, I ran into a problem I’d like to share.
I was setting up a basic OTV setup with three VDCs running OTV, connecting to a core VDC running the multicast core (which is a lot easier than it sounds). I’m running it in a lab environment we have at Firefly, but I’m not going by our normal lab guide, instead making it up as I go along in order to save some time, and make sure I can stand up OTV without a lab guide.
Each VDC will set up an adjacency with the other two, with the core VDC providing unicast and multicast connectivity. That part was pretty easy to setup (even the multicast part, which had previously freaked me the shit out). Each VDC would be its own site, so no redundant AEDs.
On each OTV VDC, I setup the following as per my pre-OTV checklist:
So I got all that configured and then configured the OTV setup. Very basic:
feature otv otv site-vlan 10 interface Overlay1 otv join-interface Ethernet1/2 otv control-group 126.96.36.199 otv data-group 188.8.131.52/28 otv extend-vlan 100 no shutdown otv site-identifier 0000.0000.0002 ip pim rp-address 10.11.200.1 group-list 184.108.40.206/4 ip pim ssm range 220.127.116.11/8
The only difference between the three OTV VDC configurations was the site-identifier and the join interface. Everything else was identical, pretty easy configuration. But… it didn’t work. Shit. Time for some show commands:
N7K-11-vdc-2# show otv adjacency Overlay Adjacency database
Overlay-Interface Overlay1 : Hostname System-ID Dest Addr Up Time State VDC-3 18ef.63e9.5d43 10.11.3.2 01:36:52 UP vdc-4 18ef.63e9.5d44 10.11.101.2 01:41:57 UP vdc-2#
OK, so the adjacencies are built. I’ve at least got IP4 unicast and multicast going on. How about “show otv”?
N7K-11-vdc-2# show otv OTV Overlay Information Site Identifier 0000.0000.0002 Overlay interface Overlay1 VPN name : Overlay1 VPN state : UP Extended vlans : 100 (Total:1) Control group : 18.104.22.168 Data group range(s) : 22.214.171.124/28 Join interface(s) : Eth1/2 (10.11.2.2) Site vlan : 11 (up) AED-Capable : No (Site-ID mismatch) Capability : Multicast-Reachable N7K-11-vdc-2#
Site-ID mismatch? What the shit? They’re supposed to mismatch. I try another command:
N7K-11-vdc-2# show otv site Dual Adjacency State Description Full - Both site and overlay adjacency up Partial - Either site/overlay adjacency down Down - Both adjacencies are down (Neighbor is down/unreachable) (!) - Site-ID mismatch detected Local Edge Device Information: Hostname vdc-2 System-ID 18ef.63e9.5d42 Site-Identifier 0000.0000.0002 Site-VLAN 11 State is Up Site Information for Overlay1: Local device is not AED-Capable (Site-ID mismatch) Neighbor Edge Devices in Site: 1 Hostname System-ID Adjacency- Adjacency- AED- State Uptime Capable -------------------------------------------------------------------------------- VDC-3 18ef.63e9.5d43 Partial (!) 00:17:39 Yes
Now this show command confused me for a while. I was trying to figure out the Site-ID mismatch. I was also wondering why I could see VDC-3 but couldn’t see VDC-4. Then it dawned on me (after am embarrassing amount of time) I’m not supposed to. I’m not supposed to see VDC-3, either. The “show site” command is only looking at the local area. For my configuration, I shouldn’t see any other VDCs with “show otv site”.
This means that there’s some type of Layer 2 connectivity between the different sites. VDC-3 and VDC-4 both somehow see each other as Layer 2 adjacent. That shouldn’t happen if they’re supposedly on remote sites. This is a lab environment, so there’s some sort of Layer 2 connectivity for the Site-VLAN that I need to kill.
OTV edge devices are like highlanders, if there’s Layer 2 adjacency, they sense each other.
“I could sense you by your VLAN”
It probably happened on the interface that I assigned the site-VLAN to as an access port. A VLAN will not show “active” unless you have an active physical link (interface VLANs don’t count).
So I went through and re-configured the site VLAN. Instead of VLAN 10 (which was probably active on the other ends of those interfaces somehow) I created new VLANs, and used a unique VLAN for each VDC. The site-VLANs do not need to be identical between sites. I put the VLAN on a physical link that was up, and voila.
In the real world, you probably won’t run into this. However, it’s possible if there are other Layer 2 interconnects going on in your data center (perhaps dark fiber) or you’re transitioning from one DCI to another, you may hit this.
April 25, 2014 Leave a comment
It’s been a while since the trainwreck of a “study” commissioned by Brocade and performed by The Evaluator Group, but it’s still being discussed in various storage circles (and that’s not good news for Brocade). Some pretty much parroted the results, seemingly without reading the actual test. Then got all pissy when confronted about it. I did a piece on my interpretations of the results, as did Dave Alexander of WWT and J Metz of Cisco. Our mutual conclusion can be best summed up with a single animated GIF.
But since a bit of time has passed, I’ve had time to absorb Dave and J’s opinions, as well as others, I’ve come up with a list of the Top 5 Reasons by The Evaluator Group Screwed Up. This isn’t the complete list, of course, but some of the more glaring problems. Let’s start with #1:
Reason #1: I Have No Idea What I’m Doing
Their hilariously bad conclusion to the higher variance in response times and higher CPU usage was that it was the cause of the software initiators. Except, they didn’t use software initiators. The had actually configured hardware initiators, and didn’t know it. Let that sink in: They’re charged with performing an evaluation, without knowing what they’re doing.
The Cisco UCS VIC 1240 hardware CNA’s were utilized. Referring to them as software initiators caused some confusion. The Cisco VIC is a hardware initiator and we configured them with virtual HBAs. Evaluator Group has no knowledge of the internal architecture of the VIC or its driver. Our commentary of the possible cause for higher CPU utilization is our opinion and further analysis would be required to pinpoint the specific root cause.
Of course, it wasn’t the software initiator. They didn’t use a software initiator, but they were so clueless, they didn’t know they’d actually used a hardware initiator. Without knowing how they performed their tests (since they didn’t publish their methodology) it’s purely speculation, but it looks like the problem was caused by congestion (from them architecting the UCS solution incorrectly).
Reason #2: They’re Hilariously Bad At Math.
They claimed FCoE required 50% more cables, based on the fact that there were 50% more cables in the FCoE solution than the FC solution. Which makes sense… except that the FC system had zero Ethernet.
That’s right, in the HP/Fibre Channel solution, each blade had absolutely zero Ethernet connectivity. In the Cisco UCS solution, every blade had full Ethernet and Fibre Channel connectivity. None. Zilch. Why did they do that? Probably because had they included any network connectivity to the HP system, the cable count would have shifted to FCoE’s favor. Let me state this again, because it’s astonishingly stupid: They claimed FCoE (which included Ethernet and FC connectivity) required more cables without including any network connectivity for the HP/FC system.
Also, they made some power/cooling claims, despite the fact that the UCS solution didn’t require a separate FC switch (it’s capable of being a full-fledged Fibre Channel switch by itself), though the HP solution would have required a separate pair of Ethernet switches (which wasn’t included). So yeah, their math is a bit off. Had they done things, you know, correctly, the power, cooling, and cable count would have flipped in favor of FCoE.
Reason #3: UCS is Hard, You Guys!
They whinged about UCS being more difficult to setup. Anytime you’re dealing with unfamiliar technology, it’s natural that it’s going to be more difficult. However, they claimed that they had zero experience with HP as well (seriously, who at Brocade hired these guys?) How easy is UCS? Here is a video done from Amsterdam where a couple of Cisco techs added a new chassis and blade and had it booted up and running ESXi in less than 30 minutes from in the box to booted. Cisco UCS is different than other blade systems, but it’s also very easy (and very quick) to stand up. And keep in mind, the video I linked was done in Amsterdam, so they were probably baked.
Reason #4: It Contradicts Everyone Else’s Results (Especially those that know what they’re doing)
For the past couple of years, VMware and NetApp have been doing performance tests on various storage protocols. Here’s one from a few years ago, which includes (native) 4 and 8 Gbit Fibre Channel, 10 Gbit FCoE, 10 Gbit iSCSI, and 10 Gbit NFS. The conclusion? The protocol doesn’t much matter. They all came out about the same when normalized for bandwidth. The big difference is in the storage backend. At least they published their methodology (I’m looking at you, Evaluator Group). Here’s one from Demartek that shows a mixture of storage protocols saturating 10 Gbit Ethernet. Again, the limitation is only the link speed itself, not the protocol. And again, again, Demartek published their methodology.
Reason #5: How Did They Set Everything Up? Magic!
Most of the time with these commissioned reports, the details of how it’s configured are given so that the results can be reproduced and audited. How did the Evaluator Group set up their environment?
As far as I can tell, magic. There’s several things they could have easily gotten wrong with the UCS setup, and given their mistake about software/hardware initiators, quite likely. They didn’t even mention which storage vendor they used.
So there you have it. A bit of a re-hash, but hey, it was a dumb report. The upside though is that it did provide me with some entertainment.
April 1, 2014 Leave a comment
From Juniper to Cisco to VMware, companies are spouting up new SDN solutions. Juniper’s Contrail, Cisco’s ACI, VMware’s NSX, and more are all vying to be the next generation of data center networking. What is surprising, however, is what’s at the heart of these new technologies.
Is it VXLAN, NVGRE, Openflow? Nope. It’s Fibre Channel.
If you think about it, it makes sense. Fibre Channel has been doing fabrics since before we ever called Ethernet fabrics, well, fabrics. And this isn’t the first time that Fibre Channel has shown up in unusual places. There’s a version of Fibre Channel that runs inside certain airplanes, including jet fighters like the F-22.
Keep the skies safe from FCoE (sponsored by the Evaluator Group)
New generation of switches have been capable of Data Center Bridging (DCB), which enables Fibre Channel over Ethernet. These chips are also capable of doing native Fibre Channel So rather than build complicated VPLS fabrics or routed networks, various data center switching companies are leveraging the inherent Fibre Channel capabilities of the merchant silicon and building Fibre Channel-based underlay networks to support an IP-based overlay.
Buffer-to-buffer (B2B) credit system and losslessness of Fibre Channel, plus the new 32/128 Gigabit interfaces with the newest Fibre Channel standard are all being leveraged for these underlays. I find it surprising that so many companies are adopting this, you’d think it’d be just Brocade. But Cisco, Arista (who notoriously shunned FCoE) and Juniper are all on board with new or announced SDN offerings that are based mostly or in part on Fibre Channel.
However, most of the switches from various vendors are primarily Ethernet today, so the 10/40 Gigabit interfaces can run FCoE until more switches are available with native FC interfaces. Of course, these switches will still be required to have a number of native Ethernet ports in order to connect to border networks that aren’t part of the overlay network, so there will be still a need for Ethernet. But it seems the market has spoken, and they want Fibre Channel.
March 19, 2014 1 Comment
Hey, remember vTax/vRAM? It’s dead and gone, but with 6 Terabyte of RAM servers now available, imagine what could have been (your insanely high licensing costs).
Set the wayback machine to 2011, when VMware introduced vSphere version 5. It had some really great enhancements over version 4, but no one was talking about the new features. Instead, they talked about the new licensing scheme and how much it sucked.
While some defended VMware’s position, most were critical, and my own opinion… let’s just say I’ve likely ensured I’ll never be employed by VMware. Fortunately, VMware came to their senses and realized what a bone-headed, dumbass move that vRAM/vTax was, and repealed the vRAM licensing one year later in 2012. So while I don’t want to beat a dead horse (which, seriously, disturbing idiom), I do think it’s worth looking back for just a moment to see how monumentally stupid that licensing scheme was for customers, and serve as a lesson in the economies of scaling for the x86 platform, and as a reminder about the ramifications of CapEx versus OpEx-oriented licensing.
Why am I thinking about this almost 2 years after they got rid of vRAM/vTax? I’ve been reading up on the newly released Intel’s E7 v2 processors, and among the updates to Intel’s high-end server chip is the ability to have 24 DIMMs per socket (the previous limit was 12) and the support of 64 GB DIMMs. This means that a 4-way motherboard (which you can order now from Cisco, HP, and others) can support up to 6 TB of RAM, using 96 DIMM slots and 64 GB DIMMs. And you’d get up to 60 cores/120 threads with that much RAM, too.
And I remembered one (of many) aspects about vRAM that I found horrible, which was just how quickly costs could spiral out of control, because server vendors (which weren’t happy about vRAM either) are cramming more and more RAM into these servers.
The original vRAM licensing with vSphere 5 was that for every socket you paid for, you were entitled to/limited to 48 GB of vRAM with Enterprise Plus. To be fair the licensing scheme didn’t care how much physical RAM (pRAM) you had, only how much of the RAM was consumed by spun-up VMs (vRAM). With vSphere 4 (and the current vSphere licensing, thankfully), RAM had been essentially free: you only paid per socket. You could use as much RAM as you could cram into a server. But with the vRAM licensing, if you had a dual-socket motherboard with 256 GB of RAM you would have to buy 6 licenses instead of 2. At the time, 256 GB servers weren’t super common, but you could order them from the various server vendors (IBM, Cisco, HP, etc.). So with vSphere 4, you would have paid about $7,000 to license that system. With vSphere 5, assuming you used all the RAM, you’d pay about $21,000 to license the system, a 300% increase in licensing costs. And that was day 1.
Now lets see how much it would cost to license a system with 6 TB of RAM. If you use the original vRAM allotment amounts from 2011, each socket granted you 48 GB of vRAM with Enterprise Plus (they did up the allotments after all of the backlash, but that ammended vRAM licensing model was so convoluted you literally needed an application to tell you how much you owed). That means to use all 6 TB (and after all, why would you buy that much RAM and not use it), you would need 128 socket licences, which would have cost $448,000 in licensing. A cluster of 4 vSphere hosts would cost just shy of $2 million to license. With current, non-insane licensing, the same 4-way 6 TB server costs a whopping $14,000. That’s a 32,000% price differential.
Again, this is all old news. VMware got rid of the awful licensing, so it’s a non-issue now. But still important to remember what almost happened, and how insane licensing costs could have been just a few years later.
My graph from 2011 was pretty accurate.
Rumor has it VMware is having trouble getting customers to go for OpEx-oriented licensing for NSX. While VMware hasn’t publicly discussed licensing, it’s a poorly kept secret that VMware is looking to charge for NSX on a per VM, per month basis. The number I’d been hearing is $10 per month ($120 per year), per VM. I’ve also heard as high as $40, and as low as $5. But whatever the numbers are, VMware is gunning for OpEx-oriented licensing, and no one seems to be biting. And it’s not the technology, everyone agrees that it’s pretty nifty, but the licensing terms are a concern. NSX is viewed as network infrastructure, and in that world we’re used to CapEx-oriented licensing. Some of VMware’s products are OpEx-oriented, but their attempt to switch vSphere over to OpEx was disastrous. And it seems to be the same for NSX.