April 25, 2014 Leave a comment
It’s been a while since the trainwreck of a “study” commissioned by Brocade and performed by The Evaluator Group, but it’s still being discussed in various storage circles (and that’s not good news for Brocade). Some pretty much parroted the results, seemingly without reading the actual test. Then got all pissy when confronted about it. I did a piece on my interpretations of the results, as did Dave Alexander of WWT and J Metz of Cisco. Our mutual conclusion can be best summed up with a single animated GIF.
But since a bit of time has passed, I’ve had time to absorb Dave and J’s opinions, as well as others, I’ve come up with a list of the Top 5 Reasons by The Evaluator Group Screwed Up. This isn’t the complete list, of course, but some of the more glaring problems. Let’s start with #1:
Reason #1: I Have No Idea What I’m Doing
Their hilariously bad conclusion to the higher variance in response times and higher CPU usage was that it was the cause of the software initiators. Except, they didn’t use software initiators. The had actually configured hardware initiators, and didn’t know it. Let that sink in: They’re charged with performing an evaluation, without knowing what they’re doing.
The Cisco UCS VIC 1240 hardware CNA’s were utilized. Referring to them as software initiators caused some confusion. The Cisco VIC is a hardware initiator and we configured them with virtual HBAs. Evaluator Group has no knowledge of the internal architecture of the VIC or its driver. Our commentary of the possible cause for higher CPU utilization is our opinion and further analysis would be required to pinpoint the specific root cause.
Of course, it wasn’t the software initiator. They didn’t use a software initiator, but they were so clueless, they didn’t know they’d actually used a hardware initiator. Without knowing how they performed their tests (since they didn’t publish their methodology) it’s purely speculation, but it looks like the problem was caused by congestion (from them architecting the UCS solution incorrectly).
Reason #2: They’re Hilariously Bad At Math.
They claimed FCoE required 50% more cables, based on the fact that there were 50% more cables in the FCoE solution than the FC solution. Which makes sense… except that the FC system had zero Ethernet.
That’s right, in the HP/Fibre Channel solution, each blade had absolutely zero Ethernet connectivity. In the Cisco UCS solution, every blade had full Ethernet and Fibre Channel connectivity. None. Zilch. Why did they do that? Probably because had they included any network connectivity to the HP system, the cable count would have shifted to FCoE’s favor. Let me state this again, because it’s astonishingly stupid: They claimed FCoE (which included Ethernet and FC connectivity) required more cables without including any network connectivity for the HP/FC system.
Also, they made some power/cooling claims, despite the fact that the UCS solution didn’t require a separate FC switch (it’s capable of being a full-fledged Fibre Channel switch by itself), though the HP solution would have required a separate pair of Ethernet switches (which wasn’t included). So yeah, their math is a bit off. Had they done things, you know, correctly, the power, cooling, and cable count would have flipped in favor of FCoE.
Reason #3: UCS is Hard, You Guys!
They whinged about UCS being more difficult to setup. Anytime you’re dealing with unfamiliar technology, it’s natural that it’s going to be more difficult. However, they claimed that they had zero experience with HP as well (seriously, who at Brocade hired these guys?) How easy is UCS? Here is a video done from Amsterdam where a couple of Cisco techs added a new chassis and blade and had it booted up and running ESXi in less than 30 minutes from in the box to booted. Cisco UCS is different than other blade systems, but it’s also very easy (and very quick) to stand up. And keep in mind, the video I linked was done in Amsterdam, so they were probably baked.
Reason #4: It Contradicts Everyone Else’s Results (Especially those that know what they’re doing)
For the past couple of years, VMware and NetApp have been doing performance tests on various storage protocols. Here’s one from a few years ago, which includes (native) 4 and 8 Gbit Fibre Channel, 10 Gbit FCoE, 10 Gbit iSCSI, and 10 Gbit NFS. The conclusion? The protocol doesn’t much matter. They all came out about the same when normalized for bandwidth. The big difference is in the storage backend. At least they published their methodology (I’m looking at you, Evaluator Group). Here’s one from Demartek that shows a mixture of storage protocols saturating 10 Gbit Ethernet. Again, the limitation is only the link speed itself, not the protocol. And again, again, Demartek published their methodology.
Reason #5: How Did They Set Everything Up? Magic!
Most of the time with these commissioned reports, the details of how it’s configured are given so that the results can be reproduced and audited. How did the Evaluator Group set up their environment?
As far as I can tell, magic. There’s several things they could have easily gotten wrong with the UCS setup, and given their mistake about software/hardware initiators, quite likely. They didn’t even mention which storage vendor they used.
So there you have it. A bit of a re-hash, but hey, it was a dumb report. The upside though is that it did provide me with some entertainment.