Requiem for the ACE

Ah, the Cisco ACE. As we mourn our fallen product, I’ll take a moment to reflect on this development as well as what the future holds for Cisco and load balancing/ADC. First off, a disclosure: while I teach Cisco ACE courses for Firefly and develop Firefly’s courseware for both ACE products and the bootcamp material for the CCIE Data Center, I’m not an employee of Cisco and have no inside knowledge of their plans. What follows is pure speculation.

Also, it should be made clear that Cisco has not EOL’d (End of Life) or even EOS’d (End of Sale) the ACE product, and in a post on the CCIE Data Center group, Walid Issa, the project manager for CCIE Data Center, made a statement reiterating this. And just as I was about to publish this post, Brad Casemore put up a great post also reflecting on the ACE, with an interesting comment from Steven Schuchart of Cisco (analyst relations?) claiming that the ACE is, in fact, not dead.

However, there was a statement Cisco sent to CRN confirming the rumor, and my conversations with people inside Cisco have confirmed that yes, the ACE is dead. Or at least, that’s the understanding of Cisco employees in several areas. The word I’m getting is that the ACE will be bug-fixed and security-fixed, but further development will halt. The ACE may not officially be EOL/EOS, but for all intents and purposes, and until I hear otherwise, it’s a dead-end product.

The news of ACE’s probable demise was kind of like a red-shirt getting killed. We all knew it was coming, and you’re not going to see a Spock-like funeral, either. 

We do know one thing: For now at least, the ACE 4710 appliance is staying inside the CCIE Data Center exam. Presumably in the written (I’ve yet to sit the non-beta written) as well as in the lab. Though it seems certain now that the next iteration (2.0) of the CCIE Data Center will be ACE-less.

Now let’s take a look down memory lane, to the Ghosts of Load Balancers Past…

Ghosts of Load Balancers Past

As many are aware, Cisco has had a long yet… imperfect relationship with load balancing. This is somewhat ironic considering that Cisco was, in fact, the very first vendor to bring a load balancer to market. In 1996, Cisco released the LocalDirector, the world’s first load balancer. The product itself sprang from Cisco’s purchase of Network Translation Incorporated in 1996, which also brought about the PIX firewall platform.

The LocalDirector did relatively well in the market, at least at first. It addressed a growing need for scaling out websites (rather than the more expensive, less resilient approach of scaling up). It had a bit of a cult following, especially among the routing and switching crowd, which I suspect had a lot to do with its relatively simple functionality: for most of its product life, the LocalDirector was just a simple Layer 4 device, and it only moved up the stack in its last few years. While other vendors went higher up the stack with Layer 7 functionality, the LocalDirector stayed Layer 4 (until near the end, when it got cookie-based persistence). In terms of functionality and performance, however, other vendors were able to surpass the LocalDirector pretty quickly.

The most important feature that the other vendors developed in the late 90s was arguably cookie persistence. (The LocalDirector didn’t get this feature until about 2001, if I recall correctly.) It allowed the load balancer to treat multiple people coming from the same IP address as separate users. Without cookie-based persistence, load balancers could only do persistence based on IP address, and were thus susceptible to the AOL megaproxy problem (you could have thousands of individual users coming from a single IP address). More than once in the 1999-2000 time frame I had to yank out a LocalDirector and put in a Layer 7-capable device because of AOL.

Cookie persistence is a tough habit to break

At some point Cisco came to terms with the fact that the LocalDirector was pretty far behind and must have concluded it was an evolutionary dead end, so it paid $6.7 billion (with a B) to buy ArrowPoint, a load balancing company that had a much better product than the LocalDirector. That product became the Cisco CSS, and for a short time Cisco was on par with the offerings from other vendors. Unfortunately, as with the LocalDirector, development and innovation seemed to stop after the purchase, and the CSS was forever a product frozen in the year 2000. Other vendors innovated (especially F5), and as time went on the CSS won fewer and fewer deals. By 2007, the CSS was largely a joke in load balancing circles. Many sites were happily running the CSS, of course (and some still are today), but feature-wise, it was getting its ass handed to it by the competition.

The next load balancer Cisco came up with had a very short lifecycle. The Cisco CSM (Content Switching Module), a load balancing module for the Catalyst 6500 series, never had a significant install base as far as I can remember. I don’t recall ever using one, and know it only through legend (as being not very good). It was quickly replaced by Cisco’s next load balancing product.

And that brings us to the Cisco ACE. Available in two iterations, the Service Module and the ACE 4710 Appliance, it looked like Cisco might have learned from its mistakes when it released the ACE. Out of the gate it was a bit more of a modern load balancer, offering features and capabilities that the CSS lacked, such as a three-tiered VIP configuration mechanism (real servers, server farms, and VIPs, which made URL rules much easier) and the ability to insert the client’s true source IP address into an HTTP header in SNAT situations. The latter was a critical function that the CSS never had.
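As an aside for anyone who hasn’t dealt with SNAT’d load balancers, here’s a minimal sketch of why that header matters to the application sitting behind the load balancer. (The header name on the ACE is configurable; X-Forwarded-For is the common convention, and the function below is just an illustration, not ACE or CSS behavior.)

```
def real_client_ip(headers, peer_ip):
    # With SNAT, peer_ip is the load balancer's address, so any logging or
    # geo/abuse logic based on the TCP peer sees every user as the LB.
    # If the LB inserts the original client IP into a header (commonly
    # X-Forwarded-For), the application can recover it from there.
    xff = headers.get("X-Forwarded-For")
    if xff:
        # The header may carry a chain ("client, proxy1, proxy2");
        # the first entry is the original client.
        return xff.split(",")[0].strip()
    return peer_ip
```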

But the ACE certainly had its downsides. The biggest issue was that the ACE could never go toe-to-toe with the other big names in load balancing in terms of features. F5 and NetScaler, as well as A10, Radware, and others, always had a far richer feature set than the ACE. It was, as Greg Ferro said, a moderately competent load balancer: it did what it was supposed to do, but it lacked the features the other guys had.

The number one feature that kept the ACE from eating at the big-boy table was the lack of an answer to F5’s iRules. iRules give a huge amount of control over how to load balance and manipulate traffic. You can use them to create a login page on the F5 that authenticates against AD (without ever touching a web server), rewrite http:// URLs to https:// (very useful in certain SSL termination setups), and even calculate Pi every time someone hits a web page. Many of the other high-end vendors have something similar, but F5’s iRules is the king of the hill.
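iRules themselves are TCL scripts that run on the BIG-IP; purely as an illustration of the kind of per-response logic involved, here’s the http-to-https rewrite idea as a Python sketch (the function and parameter names are mine, not F5’s):

```
def fix_insecure_links(headers, body, site_host):
    # When SSL terminates on the load balancer, the backend thinks it's
    # speaking plain HTTP and may emit http:// redirects and links.
    # Rewriting them to https:// on the way out keeps clients on the
    # encrypted side -- roughly what the iRule version does per response.
    headers = {
        name: (value.replace(f"http://{site_host}", f"https://{site_host}")
               if name.lower() == "location" else value)
        for name, value in headers.items()
    }
    body = body.replace(f"http://{site_host}".encode(),
                        f"https://{site_host}".encode())
    return headers, body
```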

In contrast, the ACE can evaluate existing HTTP headers, and can manipulate headers to a certain extent, but it cannot do anything with HTTP content. More than once I had to replace an ACE with another load balancer because of that limitation.

The ACE never had a FIPS-compliant SSL implementation either, which kept it out of a lot of deals, especially with government and financial institutions. The ACE was very late to the game with OCSP support and IPv6 (both were part of the 5.0 release in 2011), and the ACE10 and ACE20 Service Modules will never, ever be able to do IPv6. You’d have to upgrade to the ACE30 Module for that, though right now you’d be better off with another vendor.

For some reason, Cisco decided to use MQC (Modular QoS CLI) as the configuration framework in the ACE. This meant configuring a VIP required setting up class-maps, policy-maps, and service-policies in addition to real servers and server farms. This was far more complicated than the configuration on most of the competition, despite the fact that the ACE had less functionality. If you weren’t at a CCNP level or higher, the MQC could be maddening. (On the upside, if you mastered it on the ACE, QoS was a lot easier to learn, as was the case for me.)

If the CLI was too daunting, there was always the GUI on the ACE 4710 Appliance and/or the ACE Network Manager (ANM), a separate user interface that ran on Red Hat and later became its own OVA-based virtual appliance. The GUI in the beginning wasn’t very good, and the ACE Service Modules (ACE10, ACE20, and now the ACE30) lacked a built-in GUI. Also, when it hits the fan, the CLI is the best way to quickly diagnose an issue, and if you weren’t fluent in the MQC and the ACE’s rather esoteric use of it, the ACE was tough to troubleshoot.

There was also a brief period of time when Cisco was selling the ACE XML Gateway, a product obtained through the purchase of Reactivity in 2007, which provided some (but not nearly all) of the features the ACE lacked. It still couldn’t do something like iRules, but it did have Web Application Firewall capabilities and FIPS compliance, and it could do some interesting XML validation and other security functions. Of course, that product was short-lived as well; Cisco pulled the plug in 2010.

Despite these shortcomings, the ACE was a decent load balancer. The ACE Service Module was a popular addition to the Catalyst 6500 series, and could push up to 16 Gbps of traffic, making it suitable for just about any site. The ACE 4710 appliance was also a popular option at a lower price point, and could push 4 Gbps (although it only had four 1 Gbit ports, never 10 Gbit). Those who were comfortable with the ACE enjoyed it, and there are thousands of happy customers with ACE deployments.

But “decent” isn’t good enough in the highly competitive load balancing/ADC market. Industry juggernauts like F5 and scrappy startups like A10 smoke the ACE in terms of features, and unless a shop is going all-Cisco, the ACE almost never wins a bake-off. I even know of more than one occasion where Cisco had to essentially invite itself to a bake-off (and in those cases it never won). The ACE’s market share has dropped steadily since its release; from what I’ve heard it’s in the low teens in terms of percentage, while F5 holds about 50%.

In short, the ACE was the knife that Cisco brought to the gunfight. And F5 had a machine gun.

I’d thought for years that Cisco might just up and decide to drop the ACE. Even with the marketing might and sales channels of Cisco, the ACE could never hope to usurp F5 with the feature set it had. Cisco didn’t seem committed to developing new features, and it fell further behind.

Then Cisco included ACE in the CCIE Data Center blueprint, so I figured they were sticking with it for the long haul. Then the CRN article came out, and surprised everybody (including many in Cisco from what I understand).

So now the big question is whether or not Cisco is bowing out of load balancing entirely, or coming out with something new. We’re certainly getting conflicting information out of Cisco.

I think either is possible. Cisco has made a commitment (one it seems to be living up to) to drop businesses and products it isn’t successful in. While Cisco has shipped tens of thousands of load balancing units since the first LocalDirector was unboxed, except for the very beginning it has never led the market. Since somewhere in the early 2000s, that title has belonged almost exclusively to F5.

For a company as broad as Cisco is, load balancing as a technology is especially tough to sell and support. It takes a particular skill set that doesn’t relate fully to Cisco’s traditional routing and switching strengths, as load balancing sits in two distinct worlds: Server/app development, and networking. With companies like F5, A10, Citrix, and Radware, it’s all they do, and every SE they have knows their products forwards and backwards.

I think the hardware platform the ACE is based on (Cavium Octeon network processors) is one of the reasons the ACE never caught up in terms of features. To do things like iRules, you need fast, generalized processors, and most of the other vendors have gone with x86 cores, and lots of them. Vendors can use pure x86 power to do both Layer 4 and Layer 7 load balancing, or, like F5 and A10, incorporate FPGAs to hardware-assist the Layer 4 load balancing and distribute flows to x86 cores for the more advanced Layer 7 processing.

The Cavium network processors don’t have the horsepower to handle the advanced Layer 7 functionality, and the ACE Modules don’t have x86 at all. The ACE 4710 Appliance has an x86 core, but it’s several generations back (it’s seriously a single Pentium 4 with one core). As Greg Ferro mentioned, they could be transitioning completely away from that dead-end hardware platform, and going all virtualized x86. That would make a lot more sense, and would allow Cisco to add features that it desperately needs.

But for now, I’m treating the ACE as dead.

Health Checking On Load Balancers: More Art Than Science

One of the trickiest aspects of load balancing (and load balancing has lots of tricky aspects) is how to handle health checking. Health checking is, of course, the process whereby the load balancer (or application delivery controller) does periodic checks on the servers to make sure they’re up and responding. If a server is down for any reason, the load balancer should detect this and stop sending traffic its way.

Pretty simple functionality, really. Some load balancers call it keep-alives or other terms, but it’s all the same: Make sure the server is still alive.

One of the misconceptions about health checking is that it can instantly detect a failed server. It can’t. Instead, a load balancer can detect a server failure within a window of time. And that window of time is dependent upon a couple of factors:

  • Interval (how often is the health check performed)
  • Timeout (how long does the load balancer wait before it gives up)
  • Count (some load balancers will try several times before marking a server as “down”)

As an example, take a very common interval setting of 15 seconds, a timeout of 5 seconds, and a count of 2. If I took a shotgun to a server (which would ensure that it’s down), how long would it take the load balancer to detect the failure?

In the worst-case scenario for time to detection, the failure occurred right after the last successful health check, so that would be about 14 seconds before the first failure was even detected. The health check fails once, so we wait another 15 seconds for the second health check. Now that’s two failures, and the server is marked as down.

So that’s about 29 seconds in a worst-case scenario, or 16 seconds in a best case. Sometimes server administrators hear that and want you to tune the variables down so they can detect a failure quicker. However, those settings are about as low as they should go.
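As a minimal sketch of that arithmetic (a deliberately simplified model; exactly how the timeout figures into the window varies by load balancer, so treat the output as ballpark):

```
def detection_window(interval_s, count):
    """Ballpark bounds on how long a hard-down server goes unnoticed.

    Simplified model from the example above: probes go out every
    interval_s seconds and the server is marked down after `count`
    consecutive failures. Best case, the failure lands just before a
    probe; worst case, just after one.
    """
    best = (count - 1) * interval_s
    worst = count * interval_s
    return best, worst

# Interval 15s, count 2: roughly the "16 to 29 seconds" window described above.
print(detection_window(15, 2))   # (15, 30)
```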

If you set the interval to less than 15 seconds, depending on the load balancer, it can unduly burden the control plane processor with all those health checks. This is especially true if you have hundreds of servers in your server farm. You can adjust the count down to 1, which is common, but remember that a server would then be marked down on just a single health check failure.

I see you have failed a single health check. Pity.

The worst value to tune down, however, is the timeout value. I had a client once tell me that the load balancer was causing all sorts of performance issues in their environment. A little bit of investigating, and it turned out that they had set the timeout value to 1 second. If a server didn’t come up with the appropriate response to the health check in 1 second, the server would be marked down. As a result, every server in the farm was bouncing up and down more than a low-rider in a Dr Dre video.

As a result, users were being bounced from one server to another, with lots of TCP RSTs and re-logging in (the application was stateful, requiring users to be tied to a specific server to keep their session going). Also, when one server took 1.1 seconds to respond, it was taken out of rotation. The other servers had to pick up the slack, and thus had more load. It wasn’t long before one of them took more than a second to respond. And it would cascade over and over again.

When I talked to the customer about this, they said they wanted their site to be as fast as possible, so they set the timeout very low. They didn’t want users going onto a slow server. A noble aspiration, but the wrong way to accomplish that goal. The right way would be to add more servers. We tweaked the timeout value to 5 seconds (about as low as I would set it), and things calmed down considerably. The servers were much happier.

So tweaking those knobs (interval, timeout, count) is always a compromise between detecting a server failure quickly and giving a server a decent chance to respond, without overwhelming the control plane. As a result, it’s not an exact science. Still, there are guidelines to keep in mind, and if you set the expectations correctly, the server/application team will be a lot happier.

Ode To MRTG

I was wasting time/procrastinating keeping up with current events on Twitter when I saw a tweet from someone with a familiar name, but I couldn’t quite place where I knew it from: Tobi Oetiker (@oetiker). Then it came to me. He’s the author of the fantastic MRTG, among other tools.

MRTG was my favorite trending utility back in the day. “But Tony, weren’t you a condescending Unix administrator back then, and isn’t MRTG a networking tool?” Yes, yes I was. But MRTG isn’t just for trending network links; you can use it to graph bandwidth in and out of servers as well as other metrics like CPU utilization, memory utilization, number of processes, etc. I had a whole set of standard metrics I would graph with MRTG, depending on the device.

Connection rate, open connections, and bandwidth for an F5 load balancer back when “Friends” was still on the air

With MRTG combined with net-snmp (or in Windows’ case, the built-in SNMP service), I could graph just about anything on the servers I was responsible for. This saved my ass so many times. Here are a couple of examples where it saved me:

Customer: “We were down for 5 hours!”

Me: “No, actually your server was down for 5 minutes. Here’s the graph.”

Another customer: “Your network is slow!”

Me: “Our network graphs show very low latency and plenty of capacity. In addition, here’s a graph showing CPU utilization on your servers spiking to 100% for several hours at a time. It’s either time to expand your capacity, or perhaps look at your application to see why it’s using up so many resources.”

In the late 90s, I set up a huge server farm for a major music television network. As part of my automated installs, I included MRTG monitoring for every server’s switch port, server NIC, CPU, and memory, as well as other server-related metrics. I also graphed the F5 load balancer’s various metrics for all of the VIPs (bandwidth, connection rate). Feeling proud of myself, I showed them to one of the customer’s technical executives, thinking they’d look at it and say “oh, that’s nice.”

Instead, he called me several times a day for a month asking me (very good) questions about what all the data meant. He absolutely loved it, and I never built a server farm without it (or something like it).

Plenty of tools can show you graphs, but MRTG and tools like it trend not just when you’re looking, but when you’re not. When you’re sleeping, it collects data. When you’re out to lunch, it’s collecting data. When you’re listening to the Beastie Boys or whoever the kids are listening to these days, it collects data. Data that you can pull up at a later date. MRTG was fairly simple, but extremely powerful.
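To make the collect-everything habit concrete, here’s a minimal stand-in in Python. MRTG itself polls SNMP counters and renders graphs; this just illustrates the idea of sampling a metric on a schedule and keeping every data point whether or not anyone is looking (the file name and interval are arbitrary):

```
#!/usr/bin/env python3
import csv, os, time

def sample_load():
    # 1-minute load average (Unix only); stands in for whatever SNMP
    # counter you'd actually poll: interface octets, CPU, memory, etc.
    return os.getloadavg()[0]

def collect(path="load_trend.csv", interval=300):
    # Append a timestamped sample every `interval` seconds, forever.
    # The point is that the data is there later, when you need it.
    while True:
        with open(path, "a", newline="") as f:
            csv.writer(f).writerow([int(time.time()), sample_load()])
        time.sleep(interval)

if __name__ == "__main__":
    collect()
```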

MRTG taught me several important lessons about system monitoring. Perhaps the most important is that monitoring is really two very different disciplines: trending and alerting. A mistake a lot of operations teams make is confusing the two. Probably the biggest difference between trending and alerting is that with trending, you can never do too much. With alerting, it’s very easy to over-alert.

How many times have you, in either a server or network administrator role, been the victim of “alert creep”? When alarm after alarm is configured in your network monitoring tool, sending out emails and traps, until you’re so inundated with noise that you can’t tell the difference between the system crying wolf and a real issue?

It’s easy to over-alert. However, it’s very difficult to over-trend. And honestly, trending data is far more useful to me than 99% of alerting. Usually a customer is my best alerting mechanism; they almost always seem to know well before my monitoring system does. And having historical trending data helps me get to the bottom of things much quicker.

Many have improved upon the art of trending with tools like Observium and even RRDtool (also written by Tobi Oetiker). Many more tried but succeeded only in making overly complicated messes that ignored the strength of MRTG, which was its simplicity: the simplicity of graphing and keeping various metrics and providing a simple way to get at them when needed. MRTG was the first killer app not only for network administrators, but for server administrators as well. And it proved how important the old adage is:

If you didn’t write it down, it didn’t happen.

Automation Conundrum

During Tech Field Day 8, we saw lots of great presentations from amazing companies. One presentation that disappointed, however, was Symantec. It was dull (lots of marketing) and a lot of the stuff they had to offer (like dedupe and compression in their storage file system) had been around in competing products for many years. If you use the products, that’s probably pretty exciting, but it’s not exactly making me want to jump in. And I think Robin Harris had the best description of their marketing position when he called it “Cloudwashing”. I’ve taken the liberty of creating a dictionary-style definition for Cloudwashing:

Cloudwashing: Verb. 1. The act of taking your shitty products and contriving some relevance to the blah blah cloud. 

One product that I particularly wasn’t impressed by was Symantec’s Veritas Operations Manager. It’s a suite that’s supposed to automate and report on a disparate set of operating systems and platforms, providing a single pane of glass for virtualized data center operations. “With just a few clicks on Veritas Operations Manager, you can start and stop multi-tier applications decreasing downtime.” That’s the marketing, anyway.

In reality, what they seemed to have created was an elaborate system to… automatically restart a service if it failed. You’d install this pane of glass, pay the licensing fee or whatever, configure the hooks into all your application servers, web servers, database servers… and what does it do? It restarts the process if it fails. What does it do beyond that? Not much more, from what I could see in the demo. I pressed them on a few issues during the presentation (which you can see here; the Virtual Business Services part starts around the 32-minute mark), and that’s all they seemed to have. Restarting a process.

So, not terribly useful. But I don’t think the problem is one of engineering; instead, I think it’s the overall philosophy of top-down automation.

See, we all have visions of Iron Man in our heads.

Iron Man Says: Cloud That Shit

Wait, what? Iron Man?

Yes, Iron Man. In the first Iron Man movie, billionaire industrialist Tony Stark built three suits: one to get him out of the cave; a second, all-silver model that had an icing problem; and a third, which he used for the rest of the movie to punch the shit out of a lot of bad guys. He built the first two by hand. The third, however, was built through complete automation. He said “build it,” and Jarvis, his computer, said: “Commencing automated assembly. Estimated completion time is five hours.”

And then he takes his super car to go to a glamorous party.

Fuck you, Tony Stark. I bet you never had to restart a service manually.

How many times have you said “Jarvis, spin up 1,000 new desktop VMs, replicate our existing environment to the standby datacenter, and resolve the last three trouble tickets” and then went off in an Audi R8 to a glamorous party? I’m guessing none.

So, are we stuck doing everything manually, by hand, like chumps? No, but I don’t believe the solution will be top-down. It will be bottom-up.

The real benefit of automation we’re seeing today comes from automating the simple, mundane tasks here and there, not from orchestrating some amazing AI into a self-healing, self-replicating SkyNet-type singularity. The time savings are enormous, and while it isn’t as glamorous as a self-healing SkyNet-style data center, it does give us a lot more time to do actual glamorous things.

Since I teach Cisco’s UCS blade systems, I’ll use them as an example (sorry, HP). In UCS, there is the concept of service profiles, which abstract the aspects of a server that are usually tied to the physical hardware and scattered across disparate places. Boot order (BIOS), connectivity (SAN and LAN switches), BIOS and HBA firmware (typically flashed separately and manually), MAC and WWN addresses (burnt in), and more are all stored and configured via a single service profile, and that profile is then assigned to a blade. Cisco even made a demonstration video showing they could get a new chassis with a single blade up and online with ESXi in less than 30 minutes, from sitting in the box to installed.
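As a rough sketch of what that abstraction looks like as a data structure (the field names below are illustrative, not the actual UCS object model):

```
from dataclasses import dataclass, field

@dataclass
class ServiceProfile:
    # Everything that used to be scattered across BIOS screens, switch
    # configs, and burnt-in addresses, collected into one assignable object.
    name: str
    boot_order: list            # e.g. ["san", "local-disk"]
    vnic_macs: list             # MAC addresses, normally burnt into the NIC
    vhba_wwpns: list            # WWNs for SAN connectivity
    firmware: dict              # BIOS/HBA versions, normally flashed by hand
    vlans: list = field(default_factory=list)
    vsans: list = field(default_factory=list)

    def associate(self, blade_slot):
        # Assigning the profile is what turns a bare blade into "this server".
        return f"{self.name} associated with blade {blade_slot}"
```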

The Cisco UCS system isn’t particularly intelligent and doesn’t respond dynamically to increased load, but it automates a lot of tasks that we used to have to do manually. It “lubes” the process, to borrow the way Chris Sacca used the term in a great talk he did at Le Web 2009. I’ll take that over some overly complicated pane-of-glass solution that essentially restarts processes when they stop any day.

Perhaps at some point we’ll get to the uber-smart self-healing data center, but right now everyone who has tried has come up really, really short. Instead, there have been tremendous benefits in automating the mundane tasks, the unsexy tasks.

The BEAST That Slayed SSL?

We’re screwed.

Well, maybe. As you’ve probably seen all over the Twitters and in coverage from The Register, security researchers Juliano Rizzo and Thai Duong may have found a way to exploit a previously known (but thought to be far too impractical) flaw in SSL/TLS, one that would allow an observer to break the encryption with relative ease. And they claim to have working code that automates the process, taking it out of the realm of the mathematical researcher or cryptographic developer with a PhD and into the realm of script kiddies.

Wait, what?

Are we talking skeleton key here?

They’re not presenting their paper and exploit until September 23rd, 2011 (tomorrow), so we don’t know the full scale of the issue. So it’s not time to panic. Just yet, at least. When it’s time to panic, I have prepared a Homeland Security-inspired color-coded scale.

What, too subtle? 

There’s some speculation about the attack, but so far here is the best description I’ve found of a likely scenario. Essentially, you become a man in the middle (pretty easy if you’re a rogue government, have a three-letter acronym in your agency’s name (NSA, FSB, CIA, etc.), or sit at a coffee shop with a rogue access point). You could then start guessing (and testing) a session cookie, one character at a time, until you had the full cookie.
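To see why character-at-a-time guessing is so devastating, here’s a toy Python illustration. The prefix oracle below is a stand-in for what the researchers’ technique reportedly provides (the ability to test a guess against the encrypted cookie); it is not the exploit itself, and the cookie value is made up:

```
import string

ALPHABET = string.ascii_letters + string.digits

def make_prefix_oracle(secret):
    # Stand-in for the attacker's ability to confirm "does the cookie
    # start with this guess?" one position at a time.
    return lambda guess: secret.startswith(guess)

def recover(oracle, length):
    known = ""
    for _ in range(length):
        for ch in ALPHABET:
            if oracle(known + ch):
                known += ch
                break
    return known

secret_cookie = "a1B2c3D4e5F6g7H8"           # hypothetical session cookie value
oracle = make_prefix_oracle(secret_cookie)
print(recover(oracle, len(secret_cookie)))   # falls in ~16 * 62 guesses, not 62**16
```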

Wait, plugging away, one character at a time, until you get the code? Where have I seen this before?

War Games: Prophetic Movie

That’s right, War Games. The WOPR (or Joshua) breaks the nuclear launch codes one character at a time. Let’s hope the Department of Defense doesn’t have a web interface for launching the nukes. (I can see it now, as the President gets ready to hit the button, he’s asked to fill out a short survey.)

If someone gets your session cookie, they can imitate you. They can read your emails, send email as you, transfer money, etc., all without your password. This is all part of the HTTP standard.

Oh, you don’t know HTTP? You should.

With the HTTP protocol, we have the data unit: the HTTP message. And there are two types of HTTP messages, a request and a response. In HTTP, every object on a page requires a request. Every GIF image, every CSS style sheet, and every hilariously captioned JPEG requires its own request. In HTTP, there is no way to make one request and ask for multiple objects. And don’t forget the HTML page itself; that’s also a separate request and response. One object, one request, one response.

HTTP by itself has no way to tie these requests together. The web server doesn’t know that the JPEG and the GIF it’s been asked for are even part of the same page, let alone the same user. That’s where the session ID cookie comes in. The session ID cookie is a specific kind of HTTP cookie, and it’s the glue that joins all the requests you make together, which allows the web site to build a unique relationship with you.

HTTP by itself has no way to differentiate the IMG2.JPG request from one user versus another

When you log into a website, you’re asked for a username and password (and perhaps a token ID or client certificate if you’re doing two-factor). Once you supply them, you don’t supply them again for every object you request. Instead, you’re granted a session cookie. They often have familiar names, like JSESSIONID, ASPSESSIONID, or PHPSESSID. But the cookie name could be anything. They’ll typically have some random value, a long string of alphanumeric characters.

With HTTP session ID cookies, the web server knows whose requests came from whom, and can deliver customized responses

For every web site you log into, whether it’s Twitter (auth_token), Facebook (there are a lot), or Gmail (also a lot), your browser is assigned a session ID cookie (or, depending on how they do authentication, several session ID cookies). Session ID cookies are the glue that holds it all together.
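Here’s a minimal sketch of that exchange (the host name and cookie value are made up). The login response hands the browser a session ID; every later request carries it back, with no username or password in sight:

```
# What the server sends back right after a successful login:
login_response = (
    "HTTP/1.1 200 OK\r\n"
    "Set-Cookie: JSESSIONID=a1b2c3d4e5f6; Path=/; Secure; HttpOnly\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

# What every subsequent request looks like -- the cookie is the only thing
# tying this request to the logged-in user:
next_request = (
    "GET /inbox HTTP/1.1\r\n"
    "Host: webmail.example.com\r\n"
    "Cookie: JSESSIONID=a1b2c3d4e5f6\r\n"
    "\r\n"
)

# Whoever presents that cookie value *is* the user, which is exactly why
# stealing it is so valuable.
```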

These values don’t last forever. You know how Gmail requires you to re-login every two weeks? You’re issued new session ID cookies at that point. That keeps an old session ID from working indefinitely.

If someone gets this cookie value, they can pretend to be you. If you work at a company that uses key cards for access into various areas, what happens if someone steals your keycard? They can walk into all the areas you can. Same with getting the session ID cookie.

This attack, the one outlined, is based on getting this cookie. If they get it, they can pretend to be you. For any site they can grab the cookie for.

“Comrade General, we haf cracked ze cookie of the caplitalists! Permishun to ‘Buy Eet Now’?”

“Permission granted, comrade. Soon, ze secrets of ze easy bake ofen vill be ours!”

As the caption above illustrates, this is indeed serious.

Since this is fixed in TLS 1.1 and 1.2, which are used exactly nowhere, it’ll be interesting to see how much of our infrastructure we need to replace to remediate this potential hack. It could be as simple as a software update, or we could have to replace every load balancer on the planet (since they do SSL acceleration). That is a frightening prospect, indeed.

Fibre Channel and Ethernet: The Odd Couple

Fibre Channel? Meet Ethernet. Ethernet? Meet Fibre Channel. Hilarity ensues.

The entire thesis of this blog is that the traditional data center silos are collapsing. We are witnessing the rapid convergence of networking, storage, virtualization, server administration, security, and who knows what else. It’s becoming more and more difficult to be “just a networking/server/storage/etc person”.

One of the byproducts of this is the often hilarious fallout from conflicting interests, philosophies, and mentalities. And perhaps the greatest friction comes from the conflict of storage and network administrators. They are the odd couple of the data center.

Storage and Networking: The Odd Couple

Ethernet is the messy roommate. Ethernet just throws its shit all over the place, dirty clothes never end up in the hamper, and I think you can figure out Ethernet’s policy on dish washing. It’s disorganized and loses stuff all the time. Overflow a receive buffer? No problem. Hey, Ethernet, why’d you drop that frame? Oh, I dunno, because WRED, that’s why.

WRED is the Yosemite Sam of Networking

But Ethernet is also really flexible, and compared to Fibre Channel (and virtually all other networking technologies) inexpensive. Ethernet can be messy, because it either relies on higher protocols to handle dropped frames (TCP) or it just doesn’t care (UDP).

Fibre Channel, on the other hand, is the anal-retentive network: A place for everything, and everything in its place. Fibre Channel never loses anything, and keeps track of it all.

There now, we’re just going to put this frame right here in this reserved buffer space.

The overall philosophies are vastly different between the two. Ethernet (and TCP/IP on top of it) is meant to be flexible, mostly reliable, and lossy. You’ll probably get the Layer 2 frames and Layer 3 packets from one end to the other, but there’s no guarantee. Fibre Channel is meant to be inflexible (compared with Ethernet), absolutely reliable, and lossless.

Fibre Channel and Ethernet have very different philosophies when it comes to building out a network. For instance, in Ethernet networks, we cross-connect the hell out of everything. Network administrators haven’t met two switches they didn’t want to cross connect.


Did I miss a way to cross-connect? Because I totally have more cables

It’s just one big cloud to Ethernet administrators. For Fibre Channel administrators, one “SAN” is an abomination. There are always two air-gapped, completely separate fabrics.

The greatest SAN diagram ever created

The Fibre Channel host at the bottom is connected into two separate, Gandalf-separated, non-overlapping Fibre Channel fabrics. This gives the host two independent paths to the same storage array for full redundancy. You’ll note that the Fibre Channel switches on both sides have two links from switch to switch within the same fabric. Guess what? They’re both active. Multi-pathing in Fibre Channel is possible through the use of the FSPF protocol (Fabric Shortest Path First). Fibre Channel switch-to-switch traffic is, in what we would consider Ethernet-world terms, Layer 3 routed. It’s enough to give one multi-path envy.

One of the common ways (although by no means the only way) that an Ethernet frame can meet an unfortunate demise is through tail drop or WRED at a receive buffer. As a buffer fills, WRED or a similar technology will typically start to drop frames at random; the closer the buffer gets to full, the more aggressively frames are dropped. WRED prevents tail drop, which is bad for TCP, by dropping frames before the buffer fills completely.

Essentially, an Ethernet buffer is a bit like Thunderdome: Many frames enter, not all frames leave. With Ethernet, if you tried to do full line rate of two 10 Gbit links through a single 10 Gbit choke point, half the frames would be dropped.
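A rough sketch of that behavior (real implementations work on a weighted average queue depth with per-class thresholds; this is just the shape of the drop curve):

```
import random

def wred_should_drop(queue_depth, min_th, max_th, max_p):
    # Below the minimum threshold: never drop. Above the maximum: always
    # drop (effectively tail drop). In between: drop at random with a
    # probability that ramps up linearly toward max_p as the buffer fills.
    if queue_depth <= min_th:
        return False
    if queue_depth >= max_th:
        return True
    drop_probability = max_p * (queue_depth - min_th) / (max_th - min_th)
    return random.random() < drop_probability

# Example: a buffer of 100 frames, start dropping at 40, tail-drop at 80.
print(wred_should_drop(queue_depth=60, min_th=40, max_th=80, max_p=0.1))
```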

To a Fibre Channel administrator, this is barbaric. Fibre Channel is much more civilized, with its use of Buffer-to-Buffer (B2B) credits. Before a Fibre Channel frame is sent from one port to another, the sending port reserves space in the receiving port’s buffer. A Fibre Channel frame won’t get sent unless there’s guaranteed space at the receiving end. This ensures that no matter how much you oversubscribe a port, no frames will get lost. Also, when a Fibre Channel frame meets another Fibre Channel frame in a buffer, it asks for the Grey Poupon.

With Fibre Channel, if you tried to push two 8 Gbit links through a single 8 Gbit choke point, no frames would be lost, and each 8 Gbit port would end up throttled back to roughly 4 Gbit through the use of B2B credits.
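And a minimal sketch of the buffer-to-buffer credit idea (class and method names are mine; in real Fibre Channel the credit count is negotiated at fabric login and the receiver returns an R_RDY primitive for each buffer it frees):

```
class B2BLink:
    def __init__(self, receiver_buffers):
        # The sender starts with one credit per buffer the receiver advertised.
        self.credits = receiver_buffers

    def try_send(self, frame):
        if self.credits == 0:
            return False        # no guaranteed buffer space: hold the frame, never drop it
        self.credits -= 1       # one credit spent per frame on the wire
        return True

    def on_r_rdy(self):
        self.credits += 1       # receiver freed a buffer; sender may transmit again
```

Run out of credits and the sender simply stops transmitting; nothing is ever discarded, the port is just throttled, which is exactly the roughly-4-Gbit behavior described above.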

Why is Fibre Channel so anal retentive? Because SCSI, that’s why. SCSI is the protocol that most enterprise servers use to communicate with storage. (I mean, there’s also SATA, but SCSI makes fun of SATA behind SATA’s back.) Fibre Channel runs the Fibre Channel Protocol (FCP), which encapsulates SCSI commands into Fibre Channel frames (as odd as it sounds, Fibre Channel and Fibre Channel Protocol are two distinct technologies). FCP is essentially SCSI over Fibre Channel.

SCSI doesn’t take kindly to dropped commands. It’s a bit of a misconception that SCSI can’t tolerate a lost command. It can, it just takes a long time to recover (relatively speaking). I’ve seen plenty of SCSI errors, and they’ll slow a system down to a crawl. So it’s best not to lose any SCSI commands.

The Converged Clusterfu… Network

We used to have separate storage and networking environments. Now we’re seeing an explosion of convergence: Putting data and storage onto the same (Ethernet) wire.

Ethernet is the obvious choice, because it’s the most popular networking technology. Port for port, Ethernet is the most inexpensive, most flexible, most widely deployed networking technology around. It has slain the FDDI dragon and put down the Token Ring revolution, and now it has its sights on the Fibre Channel Jabberwocky.

The two competing technologies for this convergence at the moment are iSCSI and FCoE. SCSI doesn’t tolerate failed delivery of a SCSI command very well, so both iSCSI and FCoE have ways to guarantee delivery. With iSCSI, delivery is guaranteed because iSCSI runs on TCP, the reliable Layer 4 protocol. If a lower-level frame or packet carrying a TCP segment gets lost, no big deal: TCP uses sequence numbers, which are like FedEx tracking numbers, and can re-send a lost segment. So go ahead, WRED, do your worst.

FCoE provides losslessness through priority flow control (PFC), which serves a similar purpose to B2B credits in Fibre Channel. Instead of reserving space on the receiving buffer, PFC keeps track of how full a particular buffer is, the one dedicated to FCoE traffic. If that FCoE buffer gets close to full, the receiving Ethernet port sends a PAUSE MAC control frame to the sending port, and the sending port stops. This is done on a port-by-port basis, so end to end, FCoE traffic gets through without dropped frames. For this to work, though, the Ethernet switches need to speak PFC, which isn’t part of the regular Ethernet standard; it’s part of the DCB (Data Center Bridging) set of standards.
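A similarly rough sketch of the PFC idea (the thresholds and names are illustrative; the real mechanism is a per-priority PAUSE MAC control frame carrying a pause time, with a quanta of zero meaning resume):

```
class PfcQueue:
    def __init__(self, high_water, low_water):
        self.high, self.low = high_water, low_water
        self.depth = 0
        self.paused_upstream = False

    def enqueue(self, frames=1):
        # The buffer dedicated to the FCoE priority is filling up: tell the
        # neighbor to pause that traffic class before anything drops.
        self.depth += frames
        if self.depth >= self.high and not self.paused_upstream:
            self.paused_upstream = True
            return "send PFC PAUSE for the FCoE priority"

    def dequeue(self, frames=1):
        # The buffer drained: let the neighbor resume (PAUSE with quanta 0).
        self.depth = max(0, self.depth - frames)
        if self.depth <= self.low and self.paused_upstream:
            self.paused_upstream = False
            return "send PFC PAUSE release"
```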

Hilarity Ensues

Like the shields of the Enterprise, converged networking is in a state of flux. Network administrators and storage administrators are not very happy with the result. Network administrators don’t want storage traffic (and its silly demands for losslessness) on their data networks. Storage administrators are appalled by Ethernet and its devil-may-care attitude toward frames. They’re also not terribly fond of iSCSI, and only grudgingly accepting of FCoE. But convergence is happening, whether they like it or not.

Personally, I’m not invested in any particular technology. I’m a bit more pro-iSCSI than pro-FCoE, but I’m warming to the latter (and certainly curious about it).

But given how dyed-in-the-wool some network administrators and server administrators are, the biggest problems in convergence won’t be the technology; they’ll be the Layer 8 issues it generates. My take is that it’s time to think like a data center administrator, and not a storage or network administrator. That will take time, though. Until then, hilarity ensues.

How To Talk To Detractors In Technology

I read with great morbid curiosity the open letter to RIM from a RIM employee. In the letter, the anonymous author mentions the video below of Steve Jobs from the WWDC developer conference in 1997. It’s a fascinating video, and as the author says, it really does speak to RIM today as much as it did to Apple and its developers in 1997.

In this video, I watched perhaps one of the best examples of how to deal with what I would call “The Righteous Pedant.” In IT, we’ve all probably dealt with this personality type, especially those of us who do public speaking or teaching.

At this point in 1997, Steve Jobs had not yet taken over Apple, but he had come back into the fold after Apple bought NeXT, one of the two companies Jobs started after he left Apple (the other being Pixar). He did something you don’t see a lot: he sat down and took frank and sometimes confrontational questions from the audience at WWDC. The entire video is a great example of how to talk to detractors, and how to explain a controversial strategy. He’s thoughtful and respectful, and rather than counter-attack, he makes his case and acknowledges his fallibility.

Apple made a lot of tough decisions in those years, including killing a platform called OpenDoc. OpenDoc had some of the same goals as Java, but Steve was involved in (in his own words) putting a bullet in the head of OpenDoc. A very controversial decision at the time.

And so, during this WWDC talk, at 50:22 in the video, a very angry developer took the mic and asked Steve this question, his tone dripping with contempt, anger, and righteousness.

“Mr Jobs, you’re a bright and influential man… it’s sad and clear that on several counts you’ve discussed, you don’t know what you’re talking about. I would like, for example, for you to express in clear terms how, say, Java in any of its incarnations addresses the ideas embodied in OpenDoc. And when you’re finished with that, perhaps you could tell us what you’ve personally been doing for the past seven years.”

His answer was a brilliant, frank, and honest way to handle the question. It wasn’t an executive dodge; he made his case, and he took ownership of the decision. That’s not something you see a lot of, and it’s a great example not only of how to talk to a detractor, but of how to do it in an honest and effective way.