Top 5 Reasons The Evaluator Group Screwed Up

It’s been a while since the trainwreck of a “study” commissioned by Brocade and performed by The Evaluator Group, but it’s still being discussed in various storage circles (and that’s not good news for Brocade). Some pretty much parroted the results, seemingly without reading the actual test, then got all pissy when confronted about it. I did a piece on my interpretations of the results, as did Dave Alexander of WWT and J Metz of Cisco. Our mutual conclusion can be best summed up with a single animated GIF.



But now that a bit of time has passed and I’ve had time to absorb Dave and J’s opinions, as well as others, I’ve come up with a list of the Top 5 Reasons The Evaluator Group Screwed Up. This isn’t the complete list, of course, but some of the more glaring problems. Let’s start with #1:

Reason #1: I Have No Idea What I’m Doing

Their hilariously bad conclusion about the higher variance in response times and higher CPU usage was that it was caused by the software initiators. Except, they didn’t use software initiators. They had actually configured hardware initiators, and didn’t know it. Let that sink in: They’re charged with performing an evaluation, without knowing what they’re doing.

The Cisco UCS VIC 1240 hardware CNA’s were utilized.  Referring to them as software initiators caused some confusion. The Cisco VIC is a hardware initiator and we configured them with virtual HBAs. Evaluator Group has no knowledge of the internal architecture of the VIC or its driver.  Our commentary of the possible cause for higher CPU utilization is our opinion and further analysis would be required to pinpoint the specific root cause.

Of course, it wasn’t the software initiator. They didn’t use a software initiator, but they were so clueless, they didn’t know they’d actually used a hardware initiator. Without knowing how they performed their tests (since they didn’t publish their methodology) it’s purely speculation, but it looks like the problem was caused by congestion (from them architecting the UCS solution incorrectly).

Reason #2: They’re Hilariously Bad At Math.

They claimed FCoE required 50% more cables, based on the fact that there were 50% more cables in the FCoE solution than the FC solution. Which makes sense… except that the FC system had zero Ethernet.

That’s right, in the HP/Fibre Channel solution, each blade had absolutely zero Ethernet connectivity. None. Zilch. In the Cisco UCS solution, every blade had full Ethernet and Fibre Channel connectivity. Why did they do that? Probably because had they included any network connectivity to the HP system, the cable count would have shifted to FCoE’s favor. Let me state this again, because it’s astonishingly stupid: They claimed FCoE (which included Ethernet and FC connectivity) required more cables without including any network connectivity for the HP/FC system.
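To make the cable argument concrete, here is a minimal sketch of the comparison. The cable counts are hypothetical, picked only so the as-tested case reproduces the "50% more" claim; they are not the report’s actual figures, and the function name is made up for illustration.

```python
# Hypothetical cable counts, for illustration only (NOT the report's numbers).
def extra_cables_pct(fcoe_cables, fc_cables):
    """Percent more (positive) or fewer (negative) cables FCoE needs versus FC."""
    return (fcoe_cables - fc_cables) / fc_cables * 100

FCOE_CONVERGED = 6   # assumed: converged links carrying both Ethernet and FC
FC_STORAGE_ONLY = 4  # assumed: FC links only, no Ethernet at all (as tested)
FC_ETHERNET = 4      # assumed: Ethernet links the FC solution would also need

# As tested: the FC side has no network connectivity whatsoever.
as_tested = extra_cables_pct(FCOE_CONVERGED, FC_STORAGE_ONLY)

# Apples to apples: give the FC side the Ethernet it would need in real life.
apples_to_apples = extra_cables_pct(FCOE_CONVERGED, FC_STORAGE_ONLY + FC_ETHERNET)

print(f"as tested: {as_tested:+.0f}%  apples to apples: {apples_to_apples:+.0f}%")
```

With these assumed numbers the comparison flips sign as soon as the FC side gets any Ethernet, which is the whole point.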


Also, they made some power/cooling claims, despite the fact that the UCS solution didn’t require a separate FC switch (it’s capable of being a full-fledged Fibre Channel switch by itself), though the HP solution would have required a separate pair of Ethernet switches (which wasn’t included). So yeah, their math is a bit off. Had they done things, you know, correctly, the power, cooling, and cable count would have flipped in favor of FCoE.

Reason #3: UCS is Hard, You Guys!

They whinged about UCS being more difficult to set up. Anytime you’re dealing with unfamiliar technology, it’s natural that it’s going to be more difficult. However, they claimed that they had zero experience with HP as well (seriously, who at Brocade hired these guys?). How easy is UCS? Here is a video done from Amsterdam where a couple of Cisco techs added a new chassis and blade and had it booted up and running ESXi in less than 30 minutes, from in the box to booted. Cisco UCS is different than other blade systems, but it’s also very easy (and very quick) to stand up. And keep in mind, the video I linked was done in Amsterdam, so they were probably baked.

Reason #4: It Contradicts Everyone Else’s Results (Especially those that know what they’re doing)

For the past couple of years, VMware and NetApp have been doing performance tests on various storage protocols. Here’s one from a few years ago, which includes (native) 4 and 8 Gbit Fibre Channel, 10 Gbit FCoE, 10 Gbit iSCSI, and 10 Gbit NFS. The conclusion? The protocol doesn’t much matter. They all came out about the same when normalized for bandwidth. The big difference is in the storage backend. At least they published their methodology (I’m looking at you, Evaluator Group). Here’s one from Demartek that shows a mixture of storage protocols saturating 10 Gbit Ethernet. Again, the limitation is only the link speed itself, not the protocol. And again, again, Demartek published their methodology.

Reason #5: How Did They Set Everything Up? Magic!

Most of the time with these commissioned reports, the details of how it’s configured are given so that the results can be reproduced and audited. How did the Evaluator Group set up their environment?


As far as I can tell, magic. There are several things they could have easily gotten wrong with the UCS setup, and given their mistake about software/hardware initiators, it’s quite likely they did. They didn’t even mention which storage vendor they used.

So there you have it. A bit of a re-hash, but hey, it was a dumb report. The upside though is that it did provide me with some entertainment.

Fibre Channel: The Heart of New SDN Solutions

From Juniper to Cisco to VMware, companies are sprouting up new SDN solutions. Juniper’s Contrail, Cisco’s ACI, VMware’s NSX, and more are all vying to be the next generation of data center networking. What is surprising, however, is what’s at the heart of these new technologies.

Is it VXLAN, NVGRE, Openflow? Nope. It’s Fibre Channel.


If you think about it, it makes sense. Fibre Channel has been doing fabrics since before we ever called Ethernet fabrics, well, fabrics. And this isn’t the first time that Fibre Channel has shown up in unusual places. There’s a version of Fibre Channel that runs inside certain airplanes, including jet fighters like the F-22.


Keep the skies safe from FCoE (sponsored by the Evaluator Group)

New generations of switches have been capable of Data Center Bridging (DCB), which enables Fibre Channel over Ethernet. These chips are also capable of doing native Fibre Channel. So rather than build complicated VPLS fabrics or routed networks, various data center switching companies are leveraging the inherent Fibre Channel capabilities of the merchant silicon and building Fibre Channel-based underlay networks to support an IP-based overlay.

The buffer-to-buffer (B2B) credit system and losslessness of Fibre Channel, plus the new 32/128 Gigabit interfaces with the newest Fibre Channel standard, are all being leveraged for these underlays. I find it surprising that so many companies are adopting this; you’d think it’d be just Brocade. But Cisco, Arista (who notoriously shunned FCoE) and Juniper are all on board with new or announced SDN offerings that are based mostly or in part on Fibre Channel.

However, most of the switches from various vendors are primarily Ethernet today, so the 10/40 Gigabit interfaces can run FCoE until more switches are available with native FC interfaces. Of course, these switches will still be required to have a number of native Ethernet ports in order to connect to border networks that aren’t part of the overlay network, so there will still be a need for Ethernet. But it seems the market has spoken, and they want Fibre Channel.


CCIE DC Attempt #1: Did Not Pass

Earlier this month, I drove my rental car up to Cisco’s infamous 150 Tasman Drive after being stuck on the 101 for about an hour. I checked in, sat down, and dug into my very first CCIE lab attempt. A bit over 8 hours later, I knew I didn’t pass, but I got a good feel for what the lab is like.

My preparation for the exam had been very unbalanced, working extensively with some parts of the blueprint, while other aspects of the blueprint I hadn’t really touched in over a year. So I was not surprised at all to see the “FAIL” notice when I got my score.

The good news is that I think with the right preparation on my weak parts, I can pass on the next attempt (which I haven’t yet scheduled, but will soon).

The following animated GIF is what it’s like to do parts of a CCIE lab exam that you haven’t prepared for.







How It Feels Studying for my CCIE DC Lab


First Call I Made When I First Heard About “Gen5 Fibre Channel”


Is The OS Relevant Anymore?

I started out my career as a condescending Unix administrator, and while I’m not a Unix administrator anymore, I’m still quite condescending. In the past, I’ve run data centers based on Linux, FreeBSD, Solaris, as well as administered Windows boxes, OpenBSD and NetBSD, and even NeXTSTEP (best desktop in the 90s).

In my role as a network administrator (and network instructor), this experience has become invaluable. Why? One reason is that most networking devices these days have an open source-based operating system as the underlying OS.

And recently, I got into a discussion on Twitter (OK, kind of a twitter fight, but it’s all good with the other party) about the underlying operating systems for these network devices, and their relevance. My position? The underlying OS is mostly irrelevant.

First of all, the term OS can mean a great many things. In the context of this post, when I talk about OS I’m referring to only the underlying OS. That’s the kernel, libraries, command line, drivers, networking stack, and file system. I’m not referring to the GUI stack (GNOME, KDE, or Unity for the Unixes, Mac OS X’s GUI stack, Win32 for Windows) or other types of stack such as a web application stack like LAMP (Linux, Apache, MySQL, and PHP).

Most routers and MLS (multi-layer switches, switches that can route as fast as they can switch) run an open source operating system as their control plane. The biggest exception is of course Cisco’s IOS, which is proprietary as hell. But IOS has reached its limits, and Cisco’s NX-OS, which runs on Cisco’s next-gen Nexus switches, is based on Linux. Arista famously runs Linux (Fedora Core) and doesn’t hide it from the users (which allows it to do some really cool things). Juniper’s Junos is based on FreeBSD.

In almost every case of router and multi-layer switch however, the operating system doesn’t forward any packets. That is all handled in specialized silicon. The operating system is only responsible for the control plane, running processes like OSPF, spanning-tree, BGP, and other services to decide on a set of rules for forwarding incoming packets and frames. These rules, sometimes called a FIB (Forwarding Information Base), are programmed into the hardware forwarding engines (such as the much-used Broadcom Trident chipset). These forwarding engines do the actual switching/routing. Packets don’t hit the general x86 CPU, they’re all handled in the hardware. The control plane (running as various coordinated processes on top of one of these open source operating systems) tells the hardware how to handle packets.

So the only thing the operating system does (other than the occasional punted packet) is tell the hardware how to handle traffic the general CPU will never see. This is the way it has to be, because x86 hardware can’t scale nearly as well as special purpose silicon can, especially considering power and cooling consumption. Latency is way lower as well.
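As a rough illustration of that split, here is a toy model in Python: the control plane programs prefixes into a FIB, and the forwarding path only does a longest-prefix-match lookup. This is a minimal sketch, not how any vendor’s silicon works (real forwarding engines use TCAMs or similar lookup hardware, not a Python dictionary). The class name, method names, and port labels are all made up for the example.

```python
import ipaddress


class Fib:
    """Toy forwarding table: prefixes go in from the control plane, lookups come out."""

    def __init__(self):
        self._routes = {}  # ip_network -> next hop label

    def program(self, prefix, next_hop):
        """Control plane side: install (or overwrite) a forwarding rule."""
        self._routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, destination):
        """Data plane side: longest-prefix match, no routing protocol logic here."""
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None  # no matching rule; a real device would drop or punt
        best = max(matches, key=lambda net: net.prefixlen)
        return self._routes[best]


# The "control plane" (OSPF, BGP, etc. in real life) decides the rules once...
fib = Fib()
fib.program("0.0.0.0/0", "uplink")
fib.program("10.0.0.0/8", "port1")
fib.program("10.1.0.0/16", "port2")

# ...and the "hardware" just answers lookups for every packet after that.
for dst in ("10.1.2.3", "10.9.9.9", "8.8.8.8"):
    print(dst, "->", fib.lookup(dst))
```

Nothing in the lookup path cares whether the code that called `program()` was running on Linux or FreeBSD.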

In fact, hardware wise, most vendors (Juniper, Arista, Huawei, Alcatel-Lucent, etc.) have been using the exact same chip in their latest switches. So the differentiation isn’t the silicon. Is the differentiation the underlying operating system? No, it makes little difference for the end user. The operating systems are instead a (mostly) invisible platform on which the services (CLI, APIs, routing protocols, SDN hooks, etc.) are built. Networking vendors are in the middle of a transition into software developers (and motherboard gluers).

All you need to create a 10 Gigabit Switch

The biggest non-open-source holdout among networking devices is, of course, Cisco’s IOS, which is proprietary as hell. Still, the future for Cisco appears to be NX-OS running on all of the Nexus switches, and that’s based on Linux.

Let’s also take a look at networking devices where the underlying OS may actually touch the data plane, and a genre with which I’m very much acquainted: Load balancers (and no, I’m not calling them Application Delivery Controllers).

F5’s venerable BIG-IPs used to be based on BSDI initially (a years-dead BSD), and then switched to Linux. CoyotePoint was based on FreeBSD, and is now based on NetBSD. Cisco’s ACE is based on Linux (although Cisco’s shitty CSS runs proprietary vxWorks, but it’s not shitty because of vxWorks). Most of the other vendors are based on Linux. However, the baseline operating system makes very little difference these days.

Most load balancers have SSL offload (to push the CPU-intensive asymmetric encryption onto a specialized processor). This is especially important as we move to 2048-bit SSL certificates. Some load balancers have Layer 2/3/4 silicon (either ASICs or FPGAs, which are flexible ASICs) to help out with forwarding traffic, and hit general CPUs (usually x86) for the Layer 7 parsing. So does the operating system touch the traffic going through a load balancer? Usually, not always, and well, it depends.

So with Cisco on Linux and Juniper with FreeBSD, would either company benefit from switching to a different OS? Does either company enjoy a competitive advantage by having chosen their respective platform? No. In fact, switching platforms would likely be a colossal waste of time and resources. The underlying operating systems just provide some common services to run the networking services that program the line cards and silicon.

When I brought up Arista and their Fedora Core-based control plane which they open up to customers, here’s what someone (a BSD fan) described Fedora as: “Inconsistent and convoluted”, “building/testing/development as painful”, and “hasn’t a stable file system after 10 years”.

Reading that statement, you’d think that dealing with Fedora is a nightmare. That’s not remotely true. Some of that statement is exaggeration (and you could find specific examples to support that statement for any operating system) and some of it is fantasy. No stable file system? Linux has had several file systems, including ext2, ext3, ext4, XFS, and more for a while, and they’ve been solid.

In a general sense, I think the operating system is less relevant than it used to be. Take OpenBSD for example. Its well-deserved reputation for security is legendary. Still, would there be any advantage today to running your web application stack on OpenBSD? Would your site be any more secure? Probably not. Not because OpenBSD is any less secure today than it was a while ago, quite the opposite. It’s because the attack vectors have changed. The attacks are hitting the web stack and other pieces rather than the underlying operating system. Local exploits aren’t that big of a deal because few systems let anyone but a few users log in anyway. The biggest attacks lately have come from either SQL injection or attacks on desktop operating systems (mostly Windows, but now recently Apple as well).

If you’re going to expose a server directly to the Internet on a DMZ or (gasp) without any firewall at all, OpenBSD is an attractive choice. But that doesn’t happen much anymore. Servers are typically protected by layers of firewalls, IPS/IDS, and load balancers.

Would Android be more successful or less successful if Google switched from Linux as the underpinnings to one of the BSDs? Would it be more secure if they switched to OpenBSD? No, and it would be an entirely wasted effort. It’s not likely any of the security benefits of OpenBSD would translate into the Dalvik stack that is the heart of Android.

As much as fanboys/girls don’t want to admit it, it’s likely the number one reason people choose an OS is familiarity. I tend to go with Linux (although I have FreeBSD and OpenBSD-based VMs running in my infrastructure) because I’m more familiar with it. For my day to day uses, Linux or FreeBSD would both work. There’s not a competitive advantage either has over the other in that regard. Linux outright wins in some cases, such as virtualization (BSDs have been very behind in that technology, though they run fine as guests), but for most stuff it doesn’t matter. I use FreeNAS, which is FreeBSD based, but I don’t care what it runs. I’d use FreeNAS if it were based on Linux, OpenBSD, or whatever. (Because it’s based on FreeBSD, FreeNAS does run ZFS, which for some uses is better than any of the Linux file systems, although I don’t run FreeNAS’s ZFS since it’s missing encryption).

So fanboy/girlism aside, for the most part today, choice of an operating system isn’t the huge deal it may once have been. People succeed with using Linux, FreeBSD, OpenBSD, NetBSD, Windows, and more as the basis for their platforms (web stack, mobile stack, network device OS, etc.).

It May Already Be Too Late!

I’m very enthusiastic about anything that makes corporate IT suck less (such as BYOD, Bring Your Own Device), and despite not working for any company other than myself, I’m still quite sensitive to things that increase IT suckitude. And I’ve found the latter recently in a blog post over at Juniper called “BYOD Isn’t As Scary As You Think, Mr. or Ms. CIO”.

The title of the article seems to say that BYOD isn’t scary for corporate environments. But the article reads as if the author intended to induce a panic attack.

The article is frustrating for a couple of reasons. One, CIOs might take that shit seriously, and while huffing on a paper bag because of panic-induced hyperventilation, might fire off a new bone-headed security policy. One would hope that someone at the CIO level would know better, but I’ve known CIOs that don’t.

Two, one of the great things about smart phones is the lack of shitty security products on them. And you want to go ruin that? If I’m bringing my own device, with saucy texts from my supermodel girlfriends, I’m not likely to let any company put anything on my phone.

Why Ensign Ro, those are not bridge-duty appropriate texts you’re sending to Commander Data

Three, of the possible security implications with smart phones, only a couple of edge cases would even be solved by the software that Juniper offers as a solution. For instance, the threat of a rogue employee. You used to be able to tell if you were let go because your passwords didn’t work, now you could know when your phone reboots and wipes. But how do you know they’ve gone rogue? Why, monitor photos and texts on that employee’s phone of course.

Wait, what?

You can monitor emails, texts, and camphone images? With Junos Pulse mobile security, you can.

Hi there Brett Favre, Big Brother here. We, uhh, couldn’t help but notice that photo you texted from your personal phone that we are always monitoring…

This is just making corporate security, which already sucks, even worse. It’s a mentality that is lose-lose. The IT organization would get additional complexity for very little gain, and the users would get more hindrance, little security, and a huge invasion of privacy. Maybe I’m alone in this, but if any company offered me a job and required my personal device be subjected to this, the compensation package would need to include a mega-yacht to make it worthwhile.

I’ve been self employed since 2007, and having been free of corporate laptop builds, moldy email systems, and maniacal IT managers, I can say this: Being independent is 30% about calling the shots on my own schedule, 70% is calling the shots on my own equipment.

“That’s a very attractive offer, however judging from that crusty-ass laptop you have and the bizarre no-Mac policy by your brain-dead IT head/security officer, working for your company would eat away at my soul and cause me to activate the genesis device out of frustration.”

I really like Juniper, I do. But one of the things you do with friends is call them on their shit. I do it with Cisco all the time, now it’s Juniper’s turn.

Really? I Need A Mainframe?

On the twitters today I came across a tweet by @mreferre pointing to a blog about mainframes aptly named The Mainframe Blog. The catchphrase for that blog seems to be “blank needs a mainframe”, with links to various outages like those by AWS. So my first thought is…

Are you f#@%ing kidding me?

Actually, no, that wasn’t my first thought. My first thought was Beeper King, Liz Lemon’s boyfriend’s pager business from 30 Rock. A pager store, in the 21st century.

I cut my teeth in the mid to late 90s on the Unix systems that usurped the mainframe. I even took a mainframe class at IBM’s headquarters, installing Linux on LPARs (IBM came out with the first hypervisor in the 60s after all). It was a weird, bizarre experience. 31-bit Linux. That’s not a typo. 31-bit. And it was painfully slow. Mainframes have a reputation for power, but really the power that they have isn’t in the processors. Or RAM for that matter. Take a look at the specs for the new IBM zEnterprise 114 mainframe system.

10 processors (a variation of the Power7s I believe, fast but not game changing processors) and 256 GB of RAID’d RAM. Only 256 GB and 10 CPUs in a full rack? I can get 512 GB and 2 CPUs into a single blade with a Cisco UCS B230 M2 blade, 8 slots per chassis for 4 TB and 16 CPUs in a chassis, and 6 chassis in a rack. That’s 24 TB of RAM versus 256 GB of RAM, 10 CPUs versus 96 CPUs (960 cores with the new Intel E7s). 10 CPUs and 256 GB is also the max, so you know it’ll cost an arm and a leg.
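For anyone who wants to check the rack math, here it is as a quick Python sketch. The per-blade, per-chassis, and per-rack figures are the ones quoted above. The 10 cores per CPU is an assumption inferred from 960 cores across 96 CPUs, and the names are just for illustration.

```python
# Figures quoted in the text for a rack of Cisco UCS B230 M2 blades.
GB_PER_BLADE = 512
CPUS_PER_BLADE = 2
BLADES_PER_CHASSIS = 8
CHASSIS_PER_RACK = 6
CORES_PER_CPU = 10  # assumption: inferred from 960 cores / 96 CPUs

# Figures quoted in the text for the mainframe (its stated maximum).
MAINFRAME_GB = 256
MAINFRAME_CPUS = 10


def rack_totals():
    """Return (RAM in TB, CPU count, core count) for one full rack of blades."""
    blades = BLADES_PER_CHASSIS * CHASSIS_PER_RACK
    ram_tb = blades * GB_PER_BLADE / 1024
    cpus = blades * CPUS_PER_BLADE
    return ram_tb, cpus, cpus * CORES_PER_CPU


ram_tb, cpus, cores = rack_totals()
ram_ratio = ram_tb * 1024 / MAINFRAME_GB  # how many times more RAM per rack
print(f"{ram_tb:.0f} TB, {cpus} CPUs, {cores} cores; {ram_ratio:.0f}x the mainframe's RAM")
```

That works out to 96 times the RAM in the same floor space, before anyone gets a price quote.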

I admit, I’m not very schooled on mainframes, and I have minimal stick time on them, but they don’t seem all that flexible. They seem to me to be designed to do a few very important tasks very reliably (not necessarily fast). Something like keeping track of money, or airline reservations. But these days even those systems are typically front ended by a web application, not a TN3270 terminal. Given the processing and memory limitations, I don’t think mainframes could possibly handle running web applications in any kind of cost effective or scalable way.

Add to that the cost of mainframes (wheelbarrows of cash), and a host of proprietary storage and networking connections that also cost an arm and a leg. Also, last time I checked, mainframes still booted via a virtual punchcard. Seriously. Most mainframes will have an IBM/Lenovo laptop plugged into them at all times because they can’t boot without them. Probably not a reason to avoid mainframes by itself, but it is strange.

Another reason not to trust mainframes. If a computerized voice asks you to play a game, you say no.

Besides, I don’t think “blank needs a mainframe” is usable in all cases. You could say the same thing about Air New Zealand’s outage in 2009, but they had a mainframe. Run by IBM.

I could be wrong about a lot of things in this article, but I just don’t see mainframes as a viable data center technology except for speciality cases. But maybe that’s just me (and everyone else I know).

Old Man of the Datacenter

I’d been thinking about how the ubiquity of Catalyst 6500 was impressive in many ways (always been there for me, etc.), but also how it prevents a lot of new technology adoption, such as FCoE, Fabric Path/TRILL, etc. because it doesn’t support them. And today I heard that there’s a new supervisor module for the Cat6K. On an episode of Packet Pushers, I’d heard the Cat6K referred to by Ethan as the “Old Man of the Datacenter”.  So I hit up some Photoshop (or rather, GIMP), and voila.

