Run a Cisco ACE? Then Do This Command Right Now!

It may already be too late! OK, it’s not too late, but there’s a common scenario I run into with Cisco ACE load balancers. Around 25% of the ACE deployments I come across (both the 4710 appliance and the Service Module) are sitting in a condition called STANDBY_COLD.

The horror of STANDBY_COLD

So here’s a command you should run when logged into the Admin context of your redundant ACE deployment:

show ft group detail

You’re looking for the “Peer State” field to say STANDBY_HOT. STANDBY_HOT is good, and it means you don’t need to do anything else. However, it’s very common to see something else:

FT Group                     : 1
Configured Status            : in-service
Maintenance mode             : MAINT_MODE_OFF
My State                     : FSM_FT_STATE_ACTIVE
Peer State                   : FSM_FT_STATE_STANDBY_COLD
Peer Id                      : 1
No. of Contexts              : 1

STANDBY_COLD is a peer state in which the standby ACE context is not receiving automatic configuration syncs from the active ACE. If a failover happened right now while the peer is in STANDBY_COLD, you would be running on an older version of the configuration, potentially months old.

How Did We Get Here?

When you make a configuration change on the primary ACE, it DOES get copied automatically to the standby ACE.

When you upload a certificate and key to the primary ACE, it DOES NOT get automatically copied to the standby.

The problem is typically that the configuration on the standby ACE references a key and certificate file that exist only on the active, not on the standby. The standby ACE looks for the files, can’t find them, and stops accepting configuration updates.

How Do We Fix It?

The fix is to manually upload to the standby ACE all of the certificates and keys referenced in the configuration. You can import them into the ACE with the crypto import command, either through the terminal (cut and paste in the SSH/Telnet window) or via SFTP, TFTP, or FTP.
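For example, a terminal import on the standby might look roughly like this (the host name and file names are placeholders, and the exact syntax and prompts vary by ACE software version, so check it against your documentation):

ACE/Admin# crypto import terminal www-example-com.pem
(paste the PEM-formatted certificate, then end with “quit” on a line by itself)
ACE/Admin# crypto import terminal www-example-com.key
(paste the PEM-formatted key the same way)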

Then, reboot the standby; to fix STANDBY_COLD you need to reboot. It will do a fresh configuration sync (it might take a few minutes), and then it should be in STANDBY_HOT again. You’ll need to do this on a context-by-context basis, as you can have some contexts in STANDBY_HOT and others in STANDBY_COLD. If that doesn’t fix it, make sure that you’ve got the file names matched exactly.
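A rough verification pass once the standby comes back up might look like this (host name and context name are placeholders; run the checks in each context):

ACE/Admin# changeto PROD-CONTEXT
ACE/PROD-CONTEXT# show crypto files
ACE/PROD-CONTEXT# changeto Admin
ACE/Admin# show ft group detail

The file names listed by show crypto files should match exactly on both peers, and Peer State should come back as FSM_FT_STATE_STANDBY_HOT.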

How Do We Avoid It In The Future?

Keep in mind that when you add SSL certificates and keys, you must add them manually to both the active and standby ACE contexts. So far, no version of the ACE code (that I’m aware of) automatically syncs certificates and keys. And make sure to add the files before you reference them in the configuration.
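In other words, a rough order of operations when adding a new certificate might look like this (the service and file names here are made-up placeholders):

! 1. Import the certificate and key on the active context, then again on the standby context.
! 2. Confirm both peers list identical file names:
show crypto files
! 3. Only then reference the files in the configuration:
ssl-proxy service MY-SSL-SERVICE
  key www-example-com.key
  cert www-example-com.pem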

It’s All About The IOPS, Baby

“It’s all about the benjamins, baby” – Puff Daddy

One of the new terms we network administrators are starting to wrap our brains around is IOPS, or I/O Operations Per Second. Traditionally, IOPS have been of the utmost importance to database administrators, but they haven’t shown up on our radar as network admins.

So what is an IOP? It’s a transaction with the disk: how long it takes a storage client to send a read or write command to a storage device and receive the data back (or the confirmation that the data has been written). Applications like databases, desktops, and virtual desktops are fairly sensitive to IOPS. You know your laptop is suffering from deficient IOPS when you hear the hard drive grinding away as you open up too many VMs or too many apps simultaneously.

Traditional hard drives (spinning rust, as Ivan Pepelnjak likes to call them) have varying degrees of IOPS. Your average 7200 RPM SATA drive can perform roughly 80 IOPS. A really fast 15,000 RPM SAS drive can do around 180 IOPS (figures are from Wikipedia, and need citation). Of course, it all depends on how you measure an IOP (reads, writes, how big the read or write is), but those are general figures to keep in mind. One I/O operation may take place on one part of a disk, and the next I/O operation may take place on a completely different part.
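As a rough sanity check on those figures (back-of-the-envelope math, assuming typical average seek times): a 7200 RPM drive spends about half a rotation waiting for the data to come around, 60 s ÷ 7,200 ÷ 2 ≈ 4.2 ms, plus roughly 8 ms of average seek time, so each random I/O costs around 12 ms, and 1 second ÷ 12 ms ≈ 80 IOPS. A 15,000 RPM drive cuts that to about 2 ms of rotational delay plus 3-4 ms of seek, or roughly 5.5 ms per I/O, which works out to around 180 IOPS.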

That may not sound like a lot, and you’re right, it isn’t. For databases and virtual desktops, we tend to need lots of IOPS. IOPS scale almost linearly by adding more disks. Mo spindles, mo IOPS. That’s one of the primary purposes of SANs: to provide access to lots and lots of disks. Some servers will let you cram up to 16 drives in them, although most have space for 4 or 5. A storage array on a SAN can provide access to a LUN made up of hundreds or even thousands of disks.

Single hard drives are pretty good at sequential reading and writing, which is different from IOPS. If you have a DVR at home, it depends mostly on sequential reading and writing. A 7200 RPM drive can do about 50-60 Megabytes per second of sequential reads, less for writes. And that’s the absolute best-case scenario: a single file located in a contiguous space on the platter (so the heads don’t have to bounce around the platter getting bits of data here and there). If the file is discontiguous, or if there are lots of smaller files scattered across various areas of the platter, the numbers go down significantly.

What does that mean to network admins? A single 7200 RPM SATA drive, doing best-case sequential reads, can push about 400-500 Megabits per second onto your network (50-60 Megabytes per second × 8 bits per byte).

Bandwidth

In networking, we tend to think in terms of bandwidth.  The more the better.  We also think of things like latency and congestion, but bandwidth is usually on our minds. In storage, it’s not so much bandwidth as IOPS.

Let’s say I tell you that I have two different storage arrays for you to choose from. One of the arrays is accessible via a 1 Gbit end-to-end FC connection. The other array is accessible via an 8 Gbit end-to-end FC connection. You know nothing of how the arrays are configured, or how many drives are in them. They both present to you a drive (LUN) with a 2 Terabyte capacity.

Which one do you choose?

The networkers in us would tend to go for the 8 Gbit FC connected array. After all, 8 Gbit FC is 8 times faster than 1 Gbit FC. There’s no arguing that.

But is it the right choice? Maybe not.

Now you learn a little more about the arrays. The LUN in the 1 Gbit storage array is comprised of 20 drives in a RAID 10 setup with 100 IOPS per drive. That’s about 1,000 usable IOPS (RAID 10 mirrors every write, so the 2,000 raw IOPS from 20 drives works out to roughly 1,000 for writes).

The 8 Gbit FC connected array is comprised of two 2 Terabyte drives at 100 IOPS per drive in a RAID 1 setup (two drives, mirrored). That’s about 100 IOPS.

So while the 8 Gbit array has 8 times the bandwidth, the 1 Gbit array has 10 times the IOPS. Also, those two mirrored 2 Terabyte drives aren’t going to be able to push the 800 Megabytes per second that 8 Gbit FC provides. They’d be lucky to push 80 Megabytes per second.
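Put another way (rough numbers): 1 Gbit FC gives you roughly 100 Megabytes per second of bandwidth with about 1,000 IOPS behind it, while 8 Gbit FC gives you roughly 800 Megabytes per second of bandwidth with only about 100 IOPS behind it, and two spindles can’t come close to filling that pipe anyway.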

If you’re using those LUNs for databases or virtualization (especially desktop virtualization), you’re going to want IOPS.

Operating systems love IOPS, especially desktop operating systems. Chances are the slowest part of your computer is your spinning rust. When your computer slows to a crawl, you’ll typically hear the hard drive grinding away.

Solid State Drives

SSDs have changed the game in terms of performance. While your highest-performing enterprise spinning disk can do about 180 IOPS, the SSD you get from Best Buy can do over 5,000. (By the way, I highly recommend your next laptop have an SSD.)

There are hybrid solutions that use combinations of SATA drives, SSDs, and battery-backed RAM to provide a good mix of IOPS and economy. There are a lot of other interesting solutions out there, too, for providing access to lots of IOPS.

CCIE Data Center?

The CCIE certification from Cisco is widely considered to be one of the best, toughest certifications to get.

Generally, obtaining this certification requires months, if not years, of preparation, abandoning free time (and in some cases, hope). You hear of CCIE widows/widowers. It has a high failure rate on the first attempt, and some candidates (really smart people, too) take several attempts.

I haven’t seriously considered getting a CCIE, despite working a lot in the Cisco realm (I’m a Cisco Certified Systems Instructor).  And it’s not because of the insane prep and soul crushing defeats.  I mean, something difficult and insane? Sign me up. (I enjoy insane goals, like running marathons and training to be an aerobatic pilot.)

The problem is relevance. Right now there are six different CCIE tracks: CCIE Routing & Switching, CCIE Storage, CCIE Service Provider, CCIE Security, CCIE Wireless, and CCIE Voice. The vast majority of CCIEs are R&S; CCIE Wireless had fewer than 50 holders at last count.

Not one of them would dramatically increase my skills in the areas I typically work in. I deal with switching, a bit of spanning tree, virtualization, and storage (some FC, more FCoE, and iSCSI). Things I never deal with, ever: ATM, voice, metro Ethernet, routing protocols (although IS-IS may be a new skill I need to pick up).


This will require intense study. Right after I check Twitter.

For a year or so now, however, there’s been a rumor that a CCIE Data Center is coming. It would likely involve MDS/storage, FCoE, Nexus switching, UCS, and even some load balancing and WAAS.

So I’m hoping it gets released soon. I would be all over that shit.

“Do Said Skills Pay The Bills?”

“Do said skills pay the bills?” -Professor Hubert Farnsworth

The data center landscape is changing rapidly. If you’re a network admin, you’re dealing with server stuff you never thought you’d have to put up with. If you’re a server admin, there’s all this networking stuff that you can’t ignore anymore. If you’re storage-oriented, Fibre Channel is about to jump onto an Ethernet network near you, or you’ll find your storage connecting via iSCSI.

We all need additional skills.

So where do we start? I have a few suggestions.

Networking Admins

If you’re a networking admin, I’d start looking at virtualization as soon as possible. If you can get your employer to pay for it, I recommend getting the VCP certification from VMware (which requires taking a VMware class). While there are other virtualization technologies out there, VMware has about 90% of the server virtualization market and it’s a good foundation for virtualization technology in general.  The VMware training is generally excellent, and the VCP (currently VCP4) is a good certification to have in the industry.

Also, look into setting up your own home lab running the free version of ESXi, or some other virtualization technology such as Xen or Hyper-V, both of which can be obtained for free (I think that’s the case for Hyper-V).

Also, Linux. Learn Linux. From Juniper to Cisco’s Nexus to Arista, most of the routers and switches coming out are based on Linux (or one of the BSDs, where the skills are very transferable). Plus, most of the virtualization technologies are based on Linux. So yeah, Linux.

Server Admins

If you’re a server admin, you really, really need to learn some networking. Specifically, Ethernet switching. You’ll also need to learn IPv4 and IPv6, TCP behavior (like sliding windows), and the HTTP protocol. Fortunately, routing protocols aren’t something you’ll typically need to deal with, even today.

Cisco’s CCNA exam is a good start. It’s tough for a server admin (I failed my first attempt so bad I thought John Chambers was going to get a phone call), but it fills in a lot of blanks about networking.

You can play with routers using GNS3, a graphical front end for the Cisco IOS router emulator Dynamips/Dynagen (you need to provide your own copy of an IOS router image).

A cohort of mine at Firefly Communications, Chris Welsh, has put together an Ubuntu-based virtual machine with GNS3 pre-installed and ready to go; it can be found on his site, rednectar.net.

Always Be Learning

Podcasts and webinars are a great way to brush up on and expand your skills. A couple of months ago I found the fantastic Packet Pushers Podcast, and I’ve listened to just about every episode (and some, like the episode on Shortest Path Bridging, a multi-path Layer 2 protocol meant to replace Spanning Tree Protocol, several times).

Ivan Pepelnjak at ioshints.info has a great blog, very technical, and also a series of webinars you can purchase (I bought the year subscription for $199, and it’s already paid for itself in brain filling goodness).

And I’m amazed how fantastic Twitter is for keeping up with technical stuff. Starting off with myself and @etherealmind and moving from there is a great way to branch out.

If you’re comfortable in the traditional silo’d environment, prepare to be uncomfortable very soon. There’s no turning back. Time to get more skillz.

I’d love to hear any other tips, resources, sites, etc., that you would suggest for the overlord conversions.

“But It’s Got Electrolytes, It’s What Plants Crave…”

“Fibre Channel has what VMware craves.”

In the movie Idiocracy, among other hilarious hijinks, humanity is stuck parroting statements that sound authoritative without really understanding anything beyond the superficial.

Sound familiar? Like maybe tech?

Take, for example, the claim that VMware datastores are best on Fibre Channel SAN arrays. Now, I’ve caught myself parroting this before. In fact, here is the often-parroted pecking order for performance:

  • Fibre Channel/FCoE (Best)
  • iSCSI (Wannabe)
  • NFS (Newb)

NFS has long been considered the slowest of the three options for accessing data stores in VMware, with some VMware administrators deriding it mercilessly. iSCSI has also been considered behind Fibre Channel in terms of performance.

But how true is that, actually? What proof are those assertions based on? Likely they date from when Fibre Channel was rocking 4 Gbit while most Ethernet networks were a single Gigabit. But with the advent of 10 Gbit Ethernet/FCoE and 8 Gbit FC (and even 16 Gbit FC), how true is any of this anymore?

It seems, though, that the conventional wisdom may be wrong. NetApp and VMware got together and did some tests to see what the performance difference was between the various ways to access a data store (FC, FCoE, iSCSI, NFS).

This mirrors some earlier performance tests by VMware comparing 1 Gbit NFS and iSCSI to 4 Gbit FC. 4 Gbit FC was faster, but more interesting was that iSCSI and NFS were very close to each other in terms of performance. Here’s part of the conclusion from VMware’s 10 Gbit smackdown (FC, FCoE, iSCSI, NFS):

All four storage protocols for shared storage on ESX are shown to be capable of achieving throughput levels that are only limited by the capabilities of the storage array and the connection between it and the ESX server…

Another assumption is that jumbo frames (Ethernet frames above 1,500 bytes, typically around 9,000 bytes) improve iSCSI performance. But here’s a performance test that challenges that assumption: jumbo frames didn’t seem to matter much.

In fact, it shows that if you’re having an argument about which transport to use, it’s typically the wrong argument to have. The choice of storage array is far more important.

Another surprise in the recent batch of tests is that iSCSI hardware initiators didn’t seem to add a whole lot of benefit, and neither did jumbo frames, both of which were part of the conventional wisdom (that I myself have parroted before) about iSCSI performance.

I remember the same phenomenon with Sun’s Solaris years ago, when it was nicknamed “Slowlaris”. When Solaris was released in the 90s, it was replacing the BSD-based SunOS. Like a lot of newer operating systems, it required beefier hardware, and as such the new Solaris tended to run slower than SunOS on the same hardware, hence the derisive name. That gap closed in the late 90s, but the name (and perception) still stuck in the noughts (00s), despite numerous benchmarks showing Solaris going toe-to-toe with Linux.

In technology, we’re always going to get stuff wrong. Things change too often to be 100% accurate at all times, but we should be open to being wrong when new data is presented, and we should also occasionally test our assumptions.

Data Center Overlords

For the past 15 years in data center work, you have been either a network admin, a server admin, a storage admin, a developer, and so on. We were pretty silo’d, and the walls between the various career tracks were well defined and solid.

Today, the lines are blurring. We have switches inside of servers (e.g., VMware’s vSwitch), servers inside of switches (Cisco’s UCS, HP’s Virtual Connect), storage sitting on a network (Fibre Channel), and developers programming switches (devops). Dogs and cats living together (mass hysteria).

The old silo’d approach is collapsing, and it’s becoming more and more difficult to be just one of anything. That’s what this blog is about: the new role, combining knowledge of server administration, virtualization, storage networking, networking, and more. It’s not even a specialization; it’s a requirement to keep everything from flying apart. This blog is about the issues we all face, as well as what it takes to keep up with the explosion of emerging technologies (virtualization, VDI, FCoE, TRILL, VEPA/VN-Link, and about 400 new 802.1xxx standards).

We’ve had cracks in the silos for years. The first technology I ran across that really started to blur the lines was the load balancer. Load balancers straddled the world between servers and networks, as well as application development. They started out as simple Layer 4 devices, but have become a whole industry of their own, with devops, protocol awareness, control languages, and more.

But virtualization has brought it all crashing down.  And that’s why we’re here.

We are The Data Center Overlords.