Is The OS Relevant Anymore?

I started out my career as a condescending Unix administrator, and while I'm not a Unix administrator anymore, I'm still quite condescending. In the past, I've run data centers based on Linux, FreeBSD, and Solaris, and I've administered Windows boxes, OpenBSD, NetBSD, and even NeXTSTEP (the best desktop of the 90s).

In my role as a network administrator (and network instructor), this experience has become invaluable. Why? One reason is that most networking devices these days are built on an open source operating system.

And recently, I got into a discussion on Twitter (OK, kind of a Twitter fight, but it's all good with the other party) about the underlying operating systems for these network devices and their relevance. My position? The underlying OS is mostly irrelevant.

First of all, the term OS can mean a great many things. In the context of this post, when I talk about OS I'm referring only to the underlying OS. That's the kernel, libraries, command line, drivers, networking stack, and file system. I'm not referring to the GUI stack (GNOME, KDE, or Unity for the Unixes, Mac OS X's GUI stack, Win32 for Windows) or other types of stack such as a web application stack like LAMP (Linux, Apache, MySQL, and PHP).

Most routers and MLS (multi-layer switches, switches that can route as fast as they can switch) run an open source operating system as their control plane. The biggest exception is of course Cisco's IOS, which is proprietary as hell. But IOS has reached its limits, and Cisco's NX-OS, which runs on Cisco's next-gen Nexus switches, is based on Linux. Arista famously runs Linux (Fedora Core) and doesn't hide it from the users (which allows it to do some really cool things). Juniper's Junos is based on FreeBSD.
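As a quick illustration of how un-hidden that Linux is: on the Arista gear I've had my hands on, you can drop straight from the EOS CLI into a regular shell and poke around like it's any other Fedora box (prompts approximated from memory):

switch# bash
[admin@switch ~]$ uname -a
[admin@switch ~]$ df -h
[admin@switch ~]$ exit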

In almost every case of router and multi-layer switch, however, the operating system doesn't forward any packets. That is all handled in specialized silicon. The operating system is only responsible for the control plane, running processes like OSPF, spanning tree, BGP, and other services to decide on a set of rules for forwarding incoming packets and frames. These rules, sometimes called a FIB (Forwarding Information Base), are programmed into the hardware forwarding engines (such as the much-used Broadcom Trident chipset). These forwarding engines do the actual switching/routing. Packets don't hit the general x86 CPU; they're all handled in the hardware. The control plane (running as various coordinated processes on top of one of these open source operating systems) tells the hardware how to handle packets.

So the only thing the operating system does (other than handle the occasional punted packet) is tell the hardware how to handle traffic the general CPU will never see. This is the way it has to be, because x86 hardware can't scale nearly as well as special-purpose silicon can, especially when you factor in power and cooling. Latency is way lower as well.

In fact, hardware-wise, most vendors (Juniper, Arista, Huawei, Alcatel-Lucent, etc.) have been using the exact same chip in their latest switches. So the differentiation isn't the silicon. Is the differentiation the underlying operating system? No, it makes little difference to the end user. The OS is instead a (mostly) invisible platform upon which the services (CLI, APIs, routing protocols, SDN hooks, etc.) are built. Networking vendors are in the middle of a transition into being software developers (and motherboard gluers).

All you need to create a 10 Gigabit Switch

Let's also take a look at networking devices where the underlying OS may actually touch the data plane, a genre with which I'm very well acquainted: load balancers (and no, I'm not calling them Application Delivery Controllers).

F5's venerable BIG-IPs were initially based on BSDI (a years-dead BSD), and then switched to Linux. CoyotePoint was based on FreeBSD, and is now based on NetBSD. Cisco's ACE is based on Linux (although Cisco's shitty CSS runs proprietary VxWorks, but it's not shitty because of VxWorks). Most of the other vendors are based on Linux. However, the baseline operating system makes very little difference these days.

Most load balancers have SSL offload (to push the CPU-intensive asymmetric encryption onto a specialized processor). This is especially important as we move to 2048-bit SSL certificates. Some load balancers have Layer 2/3/4 silicon (either ASICs or FPGAs, which are flexible ASICs) to help out with forwarding traffic, and hit general CPUs (usually x86) for the Layer 7 parsing. So does the operating system touch the traffic going through a load balancer? Usually, but not always. Well, it depends.

So with Cisco on Linux and Juniper on FreeBSD, would either company benefit from switching to a different OS? Does either company enjoy a competitive advantage from having chosen their respective platform? No. In fact, switching platforms would likely be a colossal waste of time and resources. The underlying operating systems just provide some common services to run the networking services that program the line cards and silicon.

When I brought up Arista and their Fedora Core-based control plane, which they open up to customers, here's how someone (a BSD fan) described Fedora: "inconsistent and convoluted", "building/testing/development as painful", and "hasn't a stable file system after 10 years".

Reading that statement, you'd think that dealing with Fedora is a nightmare. That's not remotely true. Some of that statement is exaggeration (and you could find specific examples to support that statement for any operating system) and some of it is fantasy. No stable file system? Linux has had several solid file systems, including ext2, ext3, ext4, and XFS, for quite a while.

In a general sense, I think the operating system is less relevant than it used to be. Take OpenBSD for example. Its well-deserved reputation for security is legendary. Still, would there be any advantage today to running your web application stack on OpenBSD? Would your site be any more secure? Probably not. Not because OpenBSD is any less secure today than it was a while ago, quite the opposite. It's because the attack vectors have changed. The attacks are hitting the web stack and other pieces rather than the underlying operating system. Local exploits aren't that big of a deal because few systems let anyone but a few users log in anyway. The biggest attacks lately have come from either SQL injection or attacks on desktop operating systems (mostly Windows, but recently Apple as well).

If you're going to expose a server directly to the Internet on a DMZ or (gasp) without any firewall at all, OpenBSD is an attractive choice. But that doesn't happen much anymore. Servers are typically protected by layers of firewalls, IPS/IDS, and load balancers.

Would Android be more successful or less successful if Google switched from Linux as the underpinnings to one of the BSDs? Would it be more secure if they switched to OpenBSD? No, and it would be an entirely wasted effort. It's not likely any of the security benefits of OpenBSD would translate into the Dalvik stack that is the heart of Android.

As much as fanboys/girls don't want to admit it, it's likely the number one reason people choose an OS is familiarity. I tend to go with Linux (although I have FreeBSD and OpenBSD-based VMs running in my infrastructure) because I'm more familiar with it. For my day-to-day uses, Linux or FreeBSD would both work. Neither has a competitive advantage over the other in that regard. Linux outright wins in some cases, such as virtualization (the BSDs have been well behind in that technology, though they run fine as guests), but for most stuff it doesn't matter. I use FreeNAS, which is FreeBSD-based, but I don't care what it runs. I'd use FreeNAS if it were based on Linux, OpenBSD, or whatever. (Because it's based on FreeBSD, FreeNAS does run ZFS, which for some uses is better than any of the Linux file systems, although I don't run FreeNAS's ZFS since it's missing encryption.)

So fanboy/girlism aside, for the most part today, the choice of an operating system isn't the huge deal it may once have been. People succeed using Linux, FreeBSD, OpenBSD, NetBSD, Windows, and more as the basis for their platforms (web stack, mobile stack, network device OS, etc.).

TRILLapalooza

If there's one thing people lament in the routing and switching world, it's the spanning tree protocol and the way Ethernet forwarding is done (or more specifically, its limitations). I made my own lament last year (don't cross the streams), and it's come up recently in Ivan Pepelnjak's blog. Even server admins who've never logged into a switch in their lives know what spanning tree is: it is the destroyer of uptime, causer of Sev1 events, a pox among switches.

I am become spanning-tree, destroyer of networks

The root of the problem is the way Ethernet forwarding is done: there can't be more than one active path for an Ethernet frame to take from a source MAC to a destination MAC. This basic limitation hasn't changed for the past few decades.

And yet, for all the outages and all of the configuration issues and problems spanning tree has caused, there doesn't seem to be much enthusiasm for the more fundamental cures: TRILL (and the current proprietary implementations), SPB, QFabric, and to a lesser extent OpenFlow (data center use cases), among others. And although OpenFlow has been getting a lot of hype, that's more because VCs are drooling over it than because of its STP-slaying ways.

For a while in the early 2000s, it looked like we might get rid of it for the most part. There was a glorious time when we started to see multi-layer switches that could route Layer 3 as fast as they could switch Layer 2, giving us the option of getting rid of spanning tree entirely. Every pair of switches, even at the access layer, would be its own Layer 3 domain. Everything was routed to everywhere, and the broadcast domains were so small there was no possibility for Ethernet to take multiple paths. And with Layer 3 routing, multi-pathing was easy through ECMP. Convergence on a failed link was way faster than spanning tree.

Then virtualization came and screwed it all up. Now Layer 3 wasn't going to work for a lot of the workloads, and we needed to build huge Layer 2 networks. For some non-virtualization uses, though, the Layer 3-everywhere solution works great. To take a look at a wonderfully multi-path, high-bandwidth environment, check out Brad Hedlund's blog entry on creating a Hadoop super network with shit-tons of bandwidth out of 10G and 40G low-latency ports.

Overlays

Which brings me to overlays. There are some who propose overlay networks, such as VXLAN, NVGRE, and Nicira's, as solutions to the Ethernet multipathing problem (among other problems). An overlay technology like VXLAN not only brings us back to the glory days of no spanning tree by routing to the access layer, but solves another issue that plagues large-scale deployments: 4000+ VLANs ain't enough. VXLAN for instance has a 24-bit identifier on top of the normal 12-bit 802.1Q VLAN identifier, so that's 2^36 separate broadcast domains, giving us the ability to support 68,719,476,736 VLANs. Hrm… that would be…
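If you want the shell to do the napkin math for you (any shell with arithmetic expansion will do):

$ echo $((2**12))
4096
$ echo $((2**24))
16777216
$ echo $((2**12 * 2**24))
68719476736

That's the 12-bit VLAN space, the 24-bit VXLAN space, and the two multiplied together.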

While I like (and am enthusiastic about) overlay technologies in general, I'm not convinced they are the final solution we need for Ethernet's current forwarding limitations. Building an overlay infrastructure (at least right now) is a more complicated (and potentially more expensive) prospect than TRILL/SPB, depending on how you look at it. Availability is also an issue currently (likely to change, of course), since NVGRE has no implementations I'm aware of, and VXLAN only has one (Cisco's Nexus 1000v). Also, VXLAN doesn't terminate into any hardware currently, making it difficult to put in load balancers and firewalls that aren't virtual (as mentioned in the Packet Pushers' VXLAN podcast).

Of course, I'm afraid TRILL doesn't have it much better in the way of availability. Only two vendors that I'm aware of ship TRILL-based products, Brocade with VCS and Cisco with FabricPath, and both FabricPath and VCS only run on a few switches out of their respective vendors' offerings. As has often been discussed (and lamented), TRILL has a new header format, so new silicon is needed to implement TRILL (or TRILL-based) offerings in any switch. Sadly, it's not just a matter of adding new firmware; the underlying hardware needs to support it too. For instance, the Nexus 5500s from Cisco can do TRILL (and the code has recently been released) while the older Nexus 5000 series cannot.

It had been assumed that the vendors that use merchant silicon for their switches (such as Arista and Dell Force10) couldn't do TRILL, because the merchant silicon didn't support it. Turns out, that's not the case. I'm still not sure which chips from the merchant vendors can and can't do TRILL, but the much-ballyhooed Broadcom Trident/Trident+ chipset (BCM56840, I believe, thanks to #packetpushers) can do TRILL. So anything built on Trident should be able to do TRILL, which right now is a ton of switches. Broadcom is making it rain Tridents right now. The new Intel/Fulcrum chipsets can do TRILL as well, I believe.

TRILL does have the advantage of being stupid easy, though. Ethan Banks and I were paired up during NFD2 at Brocade, and tasked with configuring VCS (built on pre-standard TRILL). It took us 5 minutes and just a few commands. FabricPath (Cisco's pre-standard implementation built on TRILL) is also easy: 3 commands. If you can't configure FabricPath, you deserve the smug look you get from Smug Cisco Guy. Here is how you turn on FabricPath on a Nexus 7K:

switch# config terminal 
switch(config)# feature-set fabricpath
switch(config)# mac address learning-mode conversational vlan 1-10 
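And if you want to look at what the fabric actually built afterwards, NX-OS has a couple of show commands for it (syntax from memory, so it may vary by release):

switch# show fabricpath switch-id
switch# show fabricpath route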

Non-overlay solutions to STP that don't involve TRILL/SPB/QFabric/etc. include link aggregation (commonly known by Cisco's trademarked term EtherChannel) and MC-LAG (Multi-Chassis Link Aggregation), also known as MLAG, vLAG, vPC, or VSS depending on the vendor. They provide multi-pathing in the sense that while there are multiple active physical paths, no single flow will have more than one possible path, providing both redundancy and full link utilization. But it's all manually configured at each link, and not nearly as flexible (or easy) as TRILL to instantiate. MLAG/MC-LAG can provide simple multi-path scenarios, while TRILL is so flexible you can actually get yourself into trouble (as Ivan has mentioned here). So while MLAG/MC-LAG work as workarounds, why not just fix what they work around? It would be much simpler.

Vendor Lock-In or FUD?

Brocade's VCS and Cisco's FabricPath are currently proprietary implementations based on TRILL, and won't work with each other or any other version of TRILL. The assumption is that when TRILL becomes more prevalent, they will have standards-based implementations that will interoperate (Cisco and Brocade have both said they will). But for now, it's proprietary. Oh noes! Some vendors have decried this as vendor lock-in, but I disagree. For one, you're not going to build a multi-vendor fabric, like staggering two different vendors every other rack. You might not have just one vendor amongst your networking gear, but your server switch blocks, core/aggregation, and other such groupings of switches are very likely to be single vendor. Every product has a "proprietary boundary" (new term! I made it!). Even Token Ring, totally proprietary, could be bridged to traditional Ethernet networks. You can also connect your proprietary TRILL fabrics to traditional STP domains at the edge (although there are design concerns, as Ivan Pepelnjak has noted).

QFabric will never interoperate with another vendor; that's Juniper's secret sauce (running on Broadcom Trident+, if the rumors are to be believed). Still, QFabric is STP-less, so I'm a fan. And like TRILL, it's easy. My only complaint about QFabric right now is that it requires a huge port count (500+ 10 Gbit ports) to make sense (so does the Nexus 7000 with TRILL, but you can also do 5500s now). Interestingly enough, Juniper's Anjan Venkatramani did a hit piece on TRILL, but the joke is on them because it's on TechTarget behind a registration wall, so no one will read it.

So far, the solutions for Ethernet forwarding are as follows: overlay networks (may be fantastic for large environments, though very complex), Layer 3 everywhere (doable, but challenging in certain environments), and MLAG/MC-LAG (tough to scale and manually configured, but workable). All of that is fine. I've nothing against any of those technologies. In fact, I'm getting rather excited about VXLAN/Nicira overlays. I still think we should fix Layer 2 forwarding with TRILL, SPB, or something like it. And even if every vendor went full bore on one standard, it would be several years before we could totally rid our networks of spanning tree.

But wouldn’t it be grand?

The VDI Delusion Book Review

Sitting on a beach in Aruba (sorry, I had to rub that one in), I finished Madden & Company’s take on VDI: The VDI Delusion. The book is from the folks at brianmadden.com, a great resource for all things application and desktop delivery-related.

The book title suggests a bit of animosity towards VDI, but that's not actually how the authors feel about it. The delusion isn't about the actual technology of VDI, but the hype surrounding it (and the assumption many have that it's a solve-all solution).

So the book isn’t necessarily anti-VDI, just anti-hype. They like VDI (and state so several times) in certain situations, but in most situations VDI isn’t warranted nor is it beneficial. And they lay out why, as well as the alternative solutions that are similar to VDI (app streaming, OS streaming, etc.).

It’s not a deep-dive technical book, but it really doesn’t need to be. It talks frankly about the general infrastructure issues that come with VDI, as well as delivering other types of desktop services to users across a multitude of organizations.

It's good for the technical person (such as myself) who deals with VDI in an ancillary way (I've dealt with the network and storage aspects, but have never configured a VDI solution), as well as the salespeople and SEs who deal with VDI. In that regard, it has a wide audience.

I think Brian Drew over at Dell summed it up best:

For anyone dealing with VDI (who isn’t totally immersed in the realities of it and similar technologies) this is a must-read. It’s quick and easy, and really gets down to the details.

Creating Your Own SSL Certificate Authority (and Dumping Self Signed Certs)

Jan 11th, 2016: New Year! Also, there was a comment below about adding -sha256 to the signing (both self-signed and CSR signing) since browsers are starting to reject SHA1. Added (I ran through a test, it worked out for me at least).

November 18th, 2015: Oops! A few have mentioned additional errors that I missed. Fixed.

July 11th, 2015: There were a few bugs in this article that went unfixed for a while. They’ve been fixed.

SSL (or TLS if you want to be super totally correct) gives us many things (despite many of the recent shortcomings).

  • Privacy (stop looking at my password)
  • Integrity (data has not been altered in flight)
  • Trust (you are who you say you are)

All three of those are needed when you're buying stuff from, say, Amazon (damn you, Amazon Prime!). But we also use SSL for web user interfaces and other GUIs when administering devices in our control. When a website gets an SSL certificate, they typically purchase one from a major certificate authority such as DigiCert, Symantec (they bought VeriSign's certificate business), or, if you like the murder of elephants and freedom, GoDaddy. Certificates range from around $12 USD a year to several hundred, depending on the company and level of trust. The benefit that these certificate authorities provide is a chain of trust. Your browser trusts them, they trust a website, therefore your browser trusts the website (check my article on SSL trust, which contains the best SSL diagram ever conceived).

Your devices, on the other hand, the ones you configure and only your organization accesses, don't need that trust chain built upon the public infrastructure. For one, it could get really expensive buying an SSL certificate for each device you control. And secondly, you set the devices up, so you don't really need that level of trust. So web user interfaces (and other SSL-based interfaces) are almost always protected with self-signed certificates. They're easy to create, and they're free. They also provide you with the privacy that comes with encryption, although they don't do anything about trust. Which is why when you connect to a device with a self-signed certificate, you get one of those scary browser warnings. So you have the choice: buy an overpriced SSL certificate from a CA (certificate authority), or get those errors. Well, there's a third option, one where you can create a private certificate authority, and setting it up is absolutely free.

OpenSSL

OpenSSL is a free utility that comes with most installations of Mac OS X, Linux, the *BSDs, and Unixes. You can also download a binary copy to run on your Windows installation. And OpenSSL is all you need to create your own private certificate authority. The process for creating your own certificate authority is pretty straightforward:

  1. Create a private key
  2. Self-sign
  3. Install root CA on your various workstations
Once you do that, every device that you manage via HTTPS just needs to have its own certificate created with the following steps:
  1. Create CSR for device
  2. Sign CSR with root CA key
You can have your own private CA set up in less than an hour. And here's how to do it.
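Before anything else, it's worth two seconds to confirm OpenSSL is actually on your box:

openssl version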

Create the Root Certificate (Done Once)

Creating the root certificate is easy and can be done quickly. Once you do these steps, you’ll end up with a root SSL certificate that you’ll install on all of your desktops, and a private key you’ll use to sign the certificates that get installed on your various devices.

Create the Root Key

The first step is to create the private root key, which takes just one command. In the example below, I'm creating a 2048-bit key:

openssl genrsa -out rootCA.key 2048

The standard key sizes today are 1024, 2048, and to a much lesser extent, 4096. I go with 2048, which is what most people use now. 4096 is usually overkill (a 4096-bit key is about 5 times more computationally intensive than a 2048-bit key), and people are transitioning away from 1024. Important note: keep this private key very private. This is the basis of all trust for your certificates, and if someone gets a hold of it, they can generate certificates that your browser will accept. You can also create a key that is password-protected by adding -des3:

openssl genrsa -des3 -out rootCA.key 2048

You'll be prompted to give a password, and from then on you'll be challenged for the password every time you use the key. Of course, if you forget the password, you'll have to do all of this all over again.

The next step is to self-sign this certificate.

openssl req -x509 -new -nodes -key rootCA.key -sha256 -days 1024 -out rootCA.pem

This will start an interactive script which will ask you for various bits of information. Fill it out as you see fit.

You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:Oregon
Locality Name (eg, city) []:Portland
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Overlords
Organizational Unit Name (eg, section) []:IT
Common Name (eg, YOUR name) []:Data Center Overlords
Email Address []:none@none.com

Once done, this will create an SSL certificate called rootCA.pem, signed by itself, valid for 1024 days, and it will act as our root certificate. The interesting thing about traditional certificate authorities is that their root certificates are also self-signed. But before you think about starting your own public certificate authority, remember the trick is getting that cert into every browser in the entire world.
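If you want to double-check what you just created, OpenSSL will happily dump the certificate back out in human-readable form:

openssl x509 -in rootCA.pem -text -noout
openssl x509 -in rootCA.pem -noout -subject -dates

The first command spits out everything; the second just the subject and the valid-from/valid-to dates.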

Install Root Certificate Into Workstations

For your laptops/desktops/workstations, you'll need to install the root certificate into your trusted certificate repositories. This can get a little tricky. Some browsers use the operating system's repository. For instance, in Windows both IE and Chrome use the default certificate management. Go to IE, Internet Options, the Content tab, then hit the Certificates button. In Chrome, go to Options, Under the Hood, and Manage Certificates. They both take you to the same place, the Windows certificate repository. You'll want to install the root CA certificate (not the key) under the Trusted Root Certification Authorities tab. However, on Windows Firefox has its own certificate repository, so if you use IE or Chrome as well as Firefox, you'll have to install the root certificate into both the Windows repository and the Firefox repository. On a Mac, Safari and Chrome use the Mac OS X certificate management system (the Keychain), so you just have to install it once there, though Firefox again keeps its own repository. With Linux, I believe it's on a browser-per-browser basis.
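If you'd rather do it from the command line (treat these as a sketch; exact behavior varies by OS version), the same root certificate can be pushed into the system stores with something like the following.

On Windows (from an administrator command prompt):

certutil -addstore -f "Root" rootCA.pem

On Debian/Ubuntu (this is the system store used by curl and friends; Firefox and Chrome on Linux keep their own NSS databases):

sudo cp rootCA.pem /usr/local/share/ca-certificates/rootCA.crt
sudo update-ca-certificates

On Mac OS X:

sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain rootCA.pem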

Create A Certificate (Done Once Per Device)

Every device on which you wish to install a trusted certificate will need to go through this process. First, just like with the root CA step, you'll need to create a private key (different from the root CA's).

openssl genrsa -out device.key 2048

Once the key is created, you’ll generate the certificate signing request.

openssl req -new -key device.key -out device.csr

You'll be asked various questions (Country, State/Province, etc.). Answer them how you see fit. The important question to answer, though, is the Common Name.

Common Name (eg, YOUR name) []: 10.0.0.1

Whatever you see in the address field of your browser when you go to your device must be what you put under Common Name, even if it's an IP address. Yes, even an IP (IPv4 or IPv6) address works under Common Name. If it doesn't match, even a properly signed certificate will not validate correctly and you'll get the "cannot verify authenticity" error.
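(Side note: if answering the same interactive questions for every single device gets old, OpenSSL will also take the whole subject on the command line with -subj, at least on the builds I've used:

openssl req -new -key device.key -out device.csr -subj "/C=US/ST=Oregon/L=Portland/O=Overlords/OU=IT/CN=10.0.0.1"

Same rule applies: the CN has to match whatever goes in the address bar.)

Once that's done, you'll sign the CSR, which requires the CA root key.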

openssl x509 -req -in device.csr -CA rootCA.pem -CAkey rootCA.key -CAcreateserial -out device.crt -days 500 -sha256

This creates a signed certificate called device.crt which is valid for 500 days (you can adjust the number of days, of course, although it doesn't make sense to have a certificate that lasts longer than the root certificate). The next step is to take the key and the certificate and install them on your device. Most network devices that are controlled via HTTPS have some mechanism for you to install them. For example, I'm running F5's LTM VE (Virtual Edition) as a VM on my ESXi 4 host. Log into F5's web GUI (this should be the last time you're greeted by the warning), and go to System, Device Certificates, and Device Certificate. In the drop-down select Certificate and Key, and either paste the contents of the key and certificate files, or upload them from your workstation.
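(Before you bother uploading anything, you can also have OpenSSL confirm that the new certificate chains back to your root:

openssl verify -CAfile rootCA.pem device.crt

If it prints "device.crt: OK", you're in business.)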

After that, all you need to do is close your browser and hit the GUI site again. If you did it right, you’ll see no warning and a nice greenness in your address bar.
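You can also check it from the command line with s_client, pointing it at whatever address you used as the Common Name (10.0.0.1 in my example above):

openssl s_client -connect 10.0.0.1:443 -CAfile rootCA.pem

Somewhere in the output you should see "Verify return code: 0 (ok)".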

And speaking of VMware, you know that annoying message you always get when connecting to an ESXi host?

You can get rid of that by creating a key and certificate for your ESXi server and installing them as /etc/vmware/ssl/rui.crt and /etc/vmware/ssl/rui.key.
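It's the same song, different verse; here's roughly what it looks like (the hostname and the restart step are mine, so adjust for your environment, and remember the Common Name needs to match however you connect to the host):

openssl genrsa -out rui.key 2048
openssl req -new -key rui.key -out rui.csr
openssl x509 -req -in rui.csr -CA rootCA.pem -CAkey rootCA.key -CAcreateserial -out rui.crt -days 500 -sha256
scp rui.crt root@esxi-host:/etc/vmware/ssl/rui.crt
scp rui.key root@esxi-host:/etc/vmware/ssl/rui.key

Then restart the management agents (or just reboot the host) and the warning goes away. (SSH needs to be enabled on the host for the scp, of course.)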

I, For One, Welcome Our New OpenFlow Overlords

When I first signed up for Networking Field Day 2 (The Electric Boogaloo), I really had no idea what OpenFlow was. I'd read a few articles, listened to a few podcasts, but still only had a vague idea of what it was. People I respect highly, like Greg Ferro of Packet Pushers, were into it, so it had my attention. But still, not much of a clue what it was. I attended the OpenFlow Symposium, which preceded the activities of Networking Field Day 2, and had even less of an idea of what it was.

Then I saw NEC (really? NEC?) do a demonstration. And my mind was blown.

Side note: let this be a lesson to all vendors. Everything works great in a PowerPoint presentation. It also conveys very little about what a product actually does. Live demonstrations are what get grumpy network admins (and we're all grumpy) giddy like schoolgirls at a Justin Bieber concert. You should have seen Ivan Pepelnjak.

I’m not sure if I got all my assumptions right about OpenFlow, so feel free to point out if I got something completely bone-headedly wrong. But from what I could gather, OpenFlow could potentially do a lot of things:

  • Replace traditional Layer 2 MAC learning and propagation mechanisms
  • Replace traditional Layer 3 protocols
  • Make policy-based routing (routing based on TCP/UDP port) something useful instead of the one-off, pain-in-the-ass, OK-just-this-one-time creature it is now
  • Create “traceroute on steroids”

Switching (Layer 2)

Switching is, well, rather stupid. At least the learning of MAC addresses and their locations is. To forward frames, switches need to learn which ports the various MAC addresses can be found on. Right now the only way they learn is by listening to the cacophony of hosts broadcasting and spewing frames. And when one switch learns a MAC address, it's not like it tells the others. No, in switching, every switch is on its own for learning. In a single Layer 2 domain, every switch needs to learn where to find every MAC address on its own.

Probably the three biggest consequences of this method are as follows:

  • No loop avoidance. The only way to prevent loops is to prevent redundant paths (i.e. spanning-tree protocol)
  • Every switch in a Layer 2 domain needs to know every frickin’ MAC address. The larger the Layer 2 domain, the more MAC addresses need to be learned. Suddenly, a CAM table size of 8,000 MAC addresses doesn’t seem quite enough.
  • Broadcasts like woah. What happens when a switch gets a frame that it doesn’t have a CAM entry for? BROADCAST IT OUT ALL PORTS BUT THE RECEIVING PORT. It’s the all-caps typing of the network world.
For a while in the early 2000s we could get away with all this. Multi-layer switches (switches that did Layer 3 routing as well) got fast enough to route as fast as they could switch, so we could easily keep our Layer 2 domains small and just route everything.

That is, until VMware came and screwed it all up. Now we had to have Layer 2 domains much larger than we’d planned for. 4,000 entry CAM tables quickly became cramped.

MAC learning would be more centralized with OpenFlow. ARP would still be there at the edge, so a server would still think it was communicating with a regular switched network. But OpenFlow could determine which switches need to know which MAC addresses are where, so not every switch needs to learn everything.

And no spanning tree. Loops are prevented by the OpenFlow controller. No spanning tree (although you can certainly do spanning tree at the edge to communicate with legacy segments).

Routing (Layer 3)

Routing isn't quite as stupid as switching. There are a number of good protocols out there that scale pretty well, but they do require configuration on each device. Routing is dynamic in that it can do multi-pathing (where traditional Layer 2 can't), as well as recover from dead links without taking down the network for several (dozens of) seconds. But it doesn't quite allow for centralized control, and it has limited dynamic ability. For instance, there's no mechanism to say "oh, hey, for right now why don't we just move all these packets from this source to that source" in an efficient way. Sure, you can inject some host routes to do that, but it's got to come from some sort of centralized controller.

Flow Routing (Layer 4)

So why stop at Layer 3? Why not route based on TCP/UDP header information? It can be done with policy-based routing (PBR) today, but it's not something that can be communicated from router to router (OSPF cares not how you want to direct a TCP port 80 flow versus a TCP port 443 flow). There is also WCCP, the Web Cache Communication Protocol, which today is mostly used not for web caches but for WAN optimization controllers, like Cisco's WAAS, or Cisco's sworn enemy, Riverbed (seriously, just say the word 'Riverbed' at a Cisco office).

Sure it’s watery and tastes like piss, but at least it’s not policy-based routing

A switch with modern silicon can look at Layer 3 and Layer 4 headers as easily as they can look at Layer 2 headers. It’s all just bits in the flow, man. OpenFlow takes advantage of this, and creates, for lack of a cooler term, a Layer 2/3/4 overlord.

I, for one, welcome our new OpenFlow overlords

TCAMs or shared memory, or whatever you want to call the forwarding tables in your multi-layer switches can be programmed at will by an OpenFlow overlord, instead of being populated by the lame-ass Layer 2, Layer 3, and sometimes Layer 4 mechanisms on a switch-by-switch basis.

Since we can direct traffic based on flows throughout a multi-switch network, there's a lot we can do with respect to load balancers, firewalls, IPS, caches, etc. Pretty interesting stuff.
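To make that a little more concrete, here's roughly what a flow entry looks like if you poke at Open vSwitch with ovs-ofctl (not NEC's controller, obviously, just an OpenFlow-speaking switch you can play with for free). This says "send anything destined for TCP port 80 out port 2", then dumps the flow table:

ovs-ofctl add-flow br0 "priority=100,tcp,tp_dst=80,actions=output:2"
ovs-ofctl dump-flows br0

A Layer 4 match and a forwarding decision, programmed into the switch from outside the switch. That's the whole trick.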

Flow View (or Traceroute on Steroids)

I think one of the coolest demonstrations from NEC was when they showed the flow maps. They could punch up any source and destination address (IP or MAC) and get a graphical representation of the flow (and which devices it went through) on the screen. The benefits of that are obvious. Server admins complain about slowness? Trace the flow, and check the interfaces on all the transit devices. That's something that might take quite a while in a regular route/switch network, but can be done in a few seconds with an OpenFlow controller.

An OpenFlow Controller Tracks a Flow

To some extent, there are other technologies that can take care of some of these issues. For instance, TRILL and SPB take a good whack at the Layer 2 bullshit. Juniper's QFabric does a lot of the ain't-nothin-but-a-tuple thang and switches based on Layer 2/3 information. But in terms of potential, I think OpenFlow has them all beat.

Don't get too excited right now though, as NEC is the only vendor that has a working OpenFlow controller, and other vendors are working on theirs. Stanford apparently has OpenFlow up and running in their environment, but it's all still in the early stages.

Will OpenFlow become the future? Possibly, quite possibly. But even if what we now call OpenFlow isn’t victorious, something like it will be. There’s no denying that this approach, or something similar, is a much better way to handle traffic engineering in the future than our current approach. I’ve only scratched the surface of what can be done with this type of network design. There’s also a lot that can be gained in terms of virtualization (an OpenFlow vSwitch?) as well as applications telling the network what to do. Cool stuff.

Note: As a delegate/blogger, my travel and accommodations were covered by Gestalt IT, which the vendors paid to have spots during Networking Field Day. Vendors pay Gestalt IT to present, so while my travel (hotel, airfare, meals) was covered indirectly by the vendors, no other remuneration (save for the occasional tchotchke) was received from any of the vendors, directly or indirectly, or from Gestalt IT. Vendors were not promised, nor did they ask for, any of us to write about them, or to write about them positively. In fact, we sometimes say their products are shit (when, to be honest, sometimes they are, although this one wasn't). My time was unpaid.

The Problem

One recurring theme from virtually every one of the Network Field Day 2 vendor presentations last week (as well as the OpenFlow symposium) was affectionately referred to as “The Problem”.

It was a theme because, as vendor after vendor gave a presentation, they essentially said the same thing when describing the problem they were going to solve. For us delegates/bloggers, it quickly went from the problem to "The Problem". We'd heard it so often that during the (5th?) iteration of the same problem we all started laughing like a group of Beavis and Butt-Heads during a vendor's presentation, and we had to apologize profusely (it wasn't their fault, after all).

Huh huhuhuhuhuh… he said “scalability issues”

In fact, I created a simple diagram with some crayons brought by another delegate to save everyone some time.

Hello my name is Simon, and I like to do draw-wrings

But with The Problem on repeat it became very clear that the majority of networking companies are all tackling the very same Problem. And imagine the VC funding that’s chasing the solution as well.

So what is "The Problem"? It's a multi-faceted and interrelated set of issues:

Virtualization Has Messed Things Up, Big Time

The biggest problem of them all was caused by the rise of virtualization. Virtualization has disrupted much of the server world, but the impact that it's had on the network is arguably orders of magnitude greater. Virtualization wants big, flat networks, just when we'd gotten to the point where we could route Layer 3 as fast as we could switch Layer 2 and keep our networks small.

And it's not just virtualization in general; much of its impact comes from the very simple act of vMotion. VMs want to keep their IPs the same when they move, so now we have to bend over backwards to get it done. Add to that the vSwitch sitting inside the hypervisor, and the limited functionality of that switch (and who the hell manages it anyway? Server team? Network team?).

4000 VLANs Ain’t Enough

If you’re a single enterprise running your own network, chances are 4000+ VLANs are sufficient (or perhaps not). In multi-tenant environments with thousands of customers, 4000+ VLANs quickly becomes a problem. There is a need for some type of VLAN multiplier, something like QinQ or VXLAN, which gives us 4096 times 4096 VLANs (16 million or so).

Spanning Tree Sucks

One of my first introductions to networking was accidentally causing a bridging loop on a 10 megabit Ethernet switch (with a 100 Mbit uplink) as a green Solaris admin. I'd accidentally double-connected a hub, and I noticed the utilization LED on the switch went from 0% to 100% when I plugged a certain cable in. I entertained myself by plugging in and unplugging the port to watch the utilization LED fluctuate (that is, until the network admin stormed in and asked what the hell was going on with his network).

And thus began my love affair with bridging loops. After the Brocade presentation where we built a TRILL-based Fabric very quickly, with active-active uplinks and nary a port in blocking mode, Ethan Banks became a convert to my anti-spanning tree cause.

OpenFlow offers an even more comprehensive (and potentially more impressive) solution as well. More on that later.

Layer 2 Switching Isn’t Scaling

The current method by which MAC addresses are learned in modern switches causes two problems: only one viable path can be active at a time (the only way to prevent loops is to prevent multiple paths by blocking ports), and large Layer 2 networks involve so many MAC addresses that learning them all doesn't scale.

From QFabric, to TRILL, to OpenFlow (to half a dozen other solutions), Layer 2 transforms into something Layer 3-like. MAC addresses are routed just like IP addresses, and the MAC address becomes just another tuple (another recurring word) for a frame/packet/segment traveling from one end of your datacenter to another. In the simplest solution (probably TRILL?) MAC learning is done at the edge.

There’s A Lot of Shit To Configure

Automation is coming, and in a big way. Whether it’s a centralized controller environment, or magical software powered by unicorn tears, vendors are chomping at the bit to provide some sort of automation for all the shit we need to do in the network and server world. While certainly welcomed, it’s a tough nut to crack (as I’ve mentioned before in Automation Conundrum).

Data center automation is a little bit like the Gom Jabbar. They tried and failed, you ask? They tried and died.

“What’s in the box?”

“Pain. And an EULA that you must agree to. Also, man-years of customization. So yeah, pain.”

Ethernet Rules Everything Around Me

It’s quite clear that Ethernet has won the networking wars. Not that this is any news to anyone who’s worked in a data center for the past ten years, but it has struck me that no other technology has been so much as even mentioned as one for the future. Bob Metcalfe had the prophetic quote that Stephen Foskett likes to use: “I don’t know what will come after Ethernet, but it will be called Ethernet.”

But there are limitations (Layer 2 MAC learning, virtualization, VLANs, storage) that need to be addressed for it to become what comes after Ethernet. Fibre Channel is holding ground, but isn’t exactly expanding, and some crazy bastards are trying to merge the two.

Oof. Storage.

Most people agree that storage is going to end up on our network (converged networking), but there are as many opinions on how to achieve this network/storage convergence as there are nerd and pop culture references in my blog posts. Some companies are pro-iSCSI, others pro-FC/NFS, and some, like Greg Ferro, have the purest hate of all: he hates SCSI.

“Yo iSCSI, I’m really happy for you and imma let you finish, but Fibre Channel is the best storage protocol of all time”

So that’s “The Problem”. And for the most part, the articles on Networking Field Day, and the solutions the vendors propose will be framed around The Problem.

Fsck It, We’ll Do It All With SSDs!

At Tech Field Day 8, we saw presentations from two vendors with all-flash SAN offerings, taking on a storage problem that's been brewing in data centers for a while now: the skewed performance/capacity scale.

While storage capacity has been increasing exponentially, storage performance hasn't kept up nearly that fast. In fact, performance has been mostly stagnant, especially in the areas where it counts: latency and IOPS (I/O Operations Per Second).

It’s all about the IOPS, baby

In modern data centers, capacity isn’t so much of an issue with storage. Neither is the traditional throughput metric, such as megabytes per second. What really counts is IOPS and latency/seek time. Don’t get me wrong, some data center applications certainly have capacity requirements, as well as potential throughput requirements, but for the most part these are easily met by today’s technology.

IOPS and latency are super critical for virtual desktops (and desktops in general) and databases. If your computer is sluggish, it's probably not a lack of RAM or CPU; by and large it's a factor of IOPS (or the lack thereof).

There are a few tricks that storage administrators and vendors have up their sleeve to increase IOPS and drop latency.

In a RAID array, you can scale IOPS linearly by just throwing more disks at the array. If you have a drive that does 100 IOPS, add a second drive in a RAID 0 stripe and you've got double the IOPS. Add a third and you've got 300 IOPS (and of course add more for redundancy).

Another trick that storage administrators have up their sleeve is the technique known as "short stroking", where only a portion of the drive is used. On a spinning platter, the outside edge is moving the fastest, giving the best performance. If you only format that outer portion, the physical drive head doesn't have to travel as far. This can reduce seek time substantially.

Tiered storage can help with both latency and IOPS, where a combination of NVRAM, SSDs, and hard drives is used. "Hot" data is accessed from high-speed RAM cache, "warm" data is on a bank of SSDs, and "cold" data is stored on cheaper SAS or (increasingly) consumer SATA drives.

And still our demand for IOPS is insatiable, and in some cases the tricks aren't keeping up. Short stroking only goes so far, and cache misses can really hurt performance on tiered storage. While IOPS scale linearly with spindle count, getting the IOPS we need can mean racks full of spinning rust while only using a tenth of the actual capacity. That's a lot of wasted space and wasted power.

And want to hear a depressing fact?

A high-end enterprise SAS 15,000 RPM drive (which spins faster than most jet engines) gives you about 150 IOPS in performance (depending on the workload of course). A good consumer grade SSD from Newegg gives you around 85,000 IOPS. That means you would need almost 600 drives to equal the performance of one consumer grade SSD.
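The napkin math, if you want the shell to do it (integer division, but close enough):

$ echo $((85000 / 150))
566

Call it almost 600 once you round up and add a couple of spares.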

That’s enough to cause anyone to have a Bill O’Reilly moment.

600 drives? Fuck it, we’ll do it with all flash!

No one is going to put their entire database or virtual desktop infrastructure on a single flash drive of course. And that’s where vendors like Pure Storage and SolidFire come into play. (You can see Pure Storage’s presentation at Tech Field Day 8 here. SolidFire’s can be seen here.)

The overall premise of the we'll-do-it-all-in-flash play is that you can take a lot of consumer-grade flash drives, use the shitload of IOPS that they bring, and combine it with a lot of storage controller CPU power for deduplication and compression. With that combination, they can offer an all-flash array at the same price per gig as traditional arrays comprised of spinning rust (disk drives).

How many IOPS are we talking about? SolidFire's SF3010 claims 50,000 IOPS per 1 RU node. That would replace over 300 traditional drives, which I don't think you can fit in 1 RU. Pure Storage claims 300,000 IOPS in 8 RU of space. With a traditional array, you'd need over 2,000 drives, also unlikely to fit in 8 RU. Also, imagine the power savings, with only 250 watts needed for SolidFire's node, and 1,300 watts for the Pure Storage cluster. And both allow you to scale up by adding more nodes.

You wire them into your SAN the traditional way as well. The Pure Storage solution has options for 10 Gbit iSCSI and 8 Gbit Fibre Channel, while the SolidFire solution is iSCSI only. (Sadly, neither supports FCoE or FCoTR.)

For organizations that are doing virtual desktops or databases, an all-flash storage array with the power savings and monster IOPS must look more tantalizing than a starship full of green-skinned girls does to Captain Kirk.

There is a bit of a controversy in that many of the all-flash vendors will tell you capacity numbers with deduplication and compression taken into account. At the same time, if the performance is better than spinning rust even with the compression/dedupe, then who cares?

So SSD it is. And as anyone who has an SSD in their laptop or desktop will tell you, that shit is choice. Seriously, I get all Charlton Heston about my SSD.

You’ll pry this SSD from my cold, dead hands

It's not all roses and unicorn-powered SSDs. There are two issues with the all-flash solutions thus far. One is that they don't have a name like NetApp, EMC, or Fujitsu, so there is a bit of a trust issue there. The other is that many have negative preconceptions about flash, such as a high failure rate (due to a series of bad firmwares from vendors) and the limited write cycles of memory cells (true, but mitigatable). Pure Storage claims to have never had a drive fail on them (Amy called them flash drive whisperers).

Still though, check them (and any other all-SSD vendor) out. This is clearly the future in terms of high performance storage where IOPS is needed. Spinning rust will probably rule the capacity play for a while, but you have to imagine its days are numbered.