CCIE Data Center Prep | The Data Center Overlords

OTV AEDs Are Like Highlanders

July 14, 2014

While prepping for CCIE Data Center and playing around with a lab environment, I ran into a problem I’d like to share.

I was setting up a basic OTV topology with three VDCs running OTV, connecting to a core VDC running the multicast core (which is a lot easier than it sounds). I’m running it in a lab environment we have at Firefly, but rather than following our normal lab guide, I’m making it up as I go along, both to save some time and to make sure I can stand up OTV without a lab guide.

Each VDC will set up an adjacency with the other two, with the core VDC providing unicast and multicast connectivity. That part was pretty easy to set up (even the multicast part, which had previously freaked me the shit out). Each VDC would be its own site, so no redundant AEDs.

On each OTV VDC, I set up the following as per my pre-OTV checklist:

Bi-directional IPv4 unicast connectivity to each join interface (I used a single OSPF area)
MTU of 9216 end-to-end (easy since OTV requires M line cards, and it’s just an MTU command on the interface)
An OTV site VLAN which requires:
- That the VLAN is configured on the VDC
- That the VLAN is active on a physical port that is up
Multicast configuration
- IP pim sparse-mode configuration on every interface, end-to-end
- IP igmp version 3 on every interface end-to-end
- Rendezvous point (RP) configured on the loopback address of the core VDC (I used the bidir tag)
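If you want to tick that checklist off mechanically, here’s a rough Python sketch of it. It’s not an NX-OS tool: the `OtvVdc` and `InterfaceState` classes and their fields are hypothetical stand-ins for what you’d actually read off show commands, and it only checks the items on my list.

```python
from dataclasses import dataclass, field


@dataclass
class InterfaceState:
    """One interface on the path toward the multicast core (hypothetical model)."""
    name: str
    mtu: int
    pim_sparse: bool
    igmp_version: int


@dataclass
class OtvVdc:
    """What we'd need to know about one OTV VDC (hypothetical model)."""
    hostname: str
    site_vlan: int
    configured_vlans: set[int]
    up_physical_vlans: set[int]  # VLANs carried on physical ports that are up
    join_peers_reachable: dict[str, bool] = field(default_factory=dict)
    core_path: list[InterfaceState] = field(default_factory=list)
    rp_configured: bool = False


def preflight(vdc: OtvVdc) -> list[str]:
    """Return a list of checklist failures; an empty list means go."""
    problems = []
    # Unicast reachability to every other join interface
    for peer, ok in vdc.join_peers_reachable.items():
        if not ok:
            problems.append(f"no unicast path to join interface {peer}")
    # MTU, PIM sparse mode, and IGMPv3 on every hop toward the core
    for intf in vdc.core_path:
        if intf.mtu < 9216:
            problems.append(f"{intf.name}: MTU {intf.mtu} is below 9216")
        if not intf.pim_sparse:
            problems.append(f"{intf.name}: ip pim sparse-mode missing")
        if intf.igmp_version != 3:
            problems.append(f"{intf.name}: IGMP version {intf.igmp_version}, expected 3")
    # Site VLAN must exist and be active on an up physical port
    if vdc.site_vlan not in vdc.configured_vlans:
        problems.append(f"site VLAN {vdc.site_vlan} is not configured")
    elif vdc.site_vlan not in vdc.up_physical_vlans:
        problems.append(f"site VLAN {vdc.site_vlan} is not active on an up physical port")
    # Rendezvous point
    if not vdc.rp_configured:
        problems.append("no PIM RP configured")
    return problems
```

Feed it a VDC whose join interface has a 1500-byte MTU and whose site VLAN only lives on an SVI, and you get back two complaints instead of an empty list.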

So I got all that configured and then set up OTV itself. Very basic:

feature otv

otv site-vlan 10

interface Overlay1
  otv join-interface Ethernet1/2
  otv control-group 239.1.1.1
  otv data-group 232.1.1.0/28
  otv extend-vlan 100
  no shutdown
otv site-identifier 0000.0000.0002

ip pim rp-address 10.11.200.1 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8

The only difference between the three OTV VDC configurations was the site-identifier and the join interface. Everything else was identical, pretty easy configuration. But… it didn’t work. Shit. Time for some show commands:

N7K-11-vdc-2# show otv adjacency
Overlay Adjacency database

Overlay-Interface Overlay1 :
Hostname         System-ID       Dest Addr        Up Time   State
VDC-3            18ef.63e9.5d43  10.11.3.2        01:36:52  UP
vdc-4            18ef.63e9.5d44  10.11.101.2      01:41:57  UP
vdc-2#

OK, so the adjacencies are built. I’ve at least got IPv4 unicast and multicast going. How about “show otv”?

N7K-11-vdc-2# show otv

OTV Overlay Information
Site Identifier 0000.0000.0002

Overlay interface Overlay1

 VPN name : Overlay1
 VPN state : UP
 Extended vlans : 100 (Total:1)
 Control group : 239.1.1.1
 Data group range(s) : 232.1.1.0/28
 Join interface(s) : Eth1/2 (10.11.2.2)
 Site vlan : 11 (up)
 AED-Capable : No (Site-ID mismatch)
 Capability : Multicast-Reachable
N7K-11-vdc-2#

Site-ID mismatch? What the shit? They’re supposed to mismatch. I try another command:

N7K-11-vdc-2# show otv site

Dual Adjacency State Description
 Full - Both site and overlay adjacency up
 Partial - Either site/overlay adjacency down
 Down - Both adjacencies are down (Neighbor is down/unreachable)
 (!) - Site-ID mismatch detected

Local Edge Device Information:
 Hostname vdc-2
 System-ID 18ef.63e9.5d42
 Site-Identifier 0000.0000.0002
 Site-VLAN 11 State is Up

Site Information for Overlay1:

Local device is not AED-Capable (Site-ID mismatch)
Neighbor Edge Devices in Site: 1

Hostname         System-ID       Adjacency-   Adjacency-  AED-
                                 State        Uptime      Capable
--------------------------------------------------------------------------------
VDC-3            18ef.63e9.5d43  Partial (!)  00:17:39    Yes

Now this show command confused me for a while. I was trying to figure out the Site-ID mismatch, and I was also wondering why I could see VDC-3 but not VDC-4. Then it dawned on me (after an embarrassing amount of time) that I’m not supposed to see VDC-4. I’m not supposed to see VDC-3, either. The “show otv site” command only looks at the local site. With my configuration, I shouldn’t see any other VDCs in “show otv site”.

This means that there’s some type of Layer 2 connectivity between the different sites. VDC-3 and VDC-4 both somehow see each other as Layer 2 adjacent. That shouldn’t happen if they’re supposedly on remote sites. This is a lab environment, so there’s some sort of Layer 2 connectivity for the Site-VLAN that I need to kill.
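Here’s roughly how I understand the logic, as a simplified Python model (not how NX-OS actually implements it). `EdgeDevice` and `aed_status()` are made-up names, and VDC-3’s site-identifier of 0000.0000.0003 is assumed. The idea is that any edge device heard on the site VLAN is treated as part of the same site, so a different site-identifier there knocks the local device out of AED eligibility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeDevice:
    hostname: str
    site_id: str


def aed_status(local: EdgeDevice, site_neighbors: list[EdgeDevice]) -> str:
    """Simplified take on the "AED-Capable" line in "show otv".

    site_neighbors are the edge devices the local device hears on its
    site VLAN; every one of them is assumed to be in the same physical site.
    """
    mismatched = [n.hostname for n in site_neighbors if n.site_id != local.site_id]
    if mismatched:
        return "No (Site-ID mismatch with " + ", ".join(mismatched) + ")"
    return "Yes"


vdc2 = EdgeDevice("vdc-2", "0000.0000.0002")
vdc3 = EdgeDevice("VDC-3", "0000.0000.0003")  # assumed site-identifier
print(aed_status(vdc2, [vdc3]))  # leaked site VLAN -> No (Site-ID mismatch with VDC-3)
print(aed_status(vdc2, []))      # one edge device per site, as intended -> Yes
```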

OTV edge devices are like Highlanders: if there’s Layer 2 adjacency, they sense each other.

“I could sense you by your VLAN”

It probably happened through the interface I assigned to the site VLAN as an access port. That port had to be there, since a VLAN will not show “active” unless you have an active physical link (interface VLANs don’t count).

So I went through and re-configured the site VLAN. Instead of VLAN 10 (which was probably active on the other ends of those interfaces somehow) I created new VLANs, and used a unique VLAN for each VDC. The site-VLANs do not need to be identical between sites. I put the VLAN on a physical link that was up, and voila.
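The two rules at play can be sketched in Python under some simplifying assumptions: a VLAN only counts as active when an up physical port carries it, and two edge devices only hear each other’s site hellos when they share a site VLAN that some stray Layer 2 path also carries. `Port`, `vlan_active()`, and `hears_site_hellos()` are hypothetical helpers, and `Ethernet1/5` is an invented interface name.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    physical: bool  # False for an SVI ("interface vlan")
    up: bool
    vlans: set[int] = field(default_factory=set)


def vlan_active(vlan: int, ports: list[Port]) -> bool:
    """A VLAN counts as active only if an up physical port carries it."""
    return any(p.physical and p.up and vlan in p.vlans for p in ports)


def hears_site_hellos(site_vlan_a: int, site_vlan_b: int, stray_l2_vlans: set[int]) -> bool:
    """Two edge devices end up in the same "site" only if they use the same
    site VLAN and an unintended Layer 2 path between them carries it."""
    return site_vlan_a == site_vlan_b and site_vlan_a in stray_l2_vlans


svi_only = [Port("Vlan11", physical=False, up=True, vlans={11})]
with_uplink = svi_only + [Port("Ethernet1/5", physical=True, up=True, vlans={11})]
print(vlan_active(11, svi_only), vlan_active(11, with_uplink))  # False True
print(hears_site_hellos(10, 10, {10}))          # shared VLAN 10 leaks: True
print(hears_site_hellos(11, 12, {10, 11, 12}))  # unique site VLANs: False
```

Giving each VDC its own site VLAN is what makes the second check come back False, even if some stray link still carries a pile of VLANs.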

In the real world, you probably won’t run into this. However, you may hit it if there are other Layer 2 interconnects in your data center (perhaps dark fiber) or if you’re transitioning from one DCI to another.

Filed under Always Be Learning, CCIE Data Center Prep, Certifications, data center

CCIE DC Attempt #1: Did Not Pass

July 22, 2013

Earlier this month, I drove my rental car up to Cisco’s infamous 150 Tasman Drive after being stuck on the 101 for about an hour. I checked in, sat down, and dug into my very first CCIE lab attempt. A bit over 8 hours later, I knew I didn’t pass, but I got a good feel for what the lab is like.

My preparation for the exam had been very unbalanced: I’d worked extensively with some parts of the blueprint, while others I hadn’t really touched in over a year. So I was not surprised at all to see the “FAIL” notice when I got my score.

The good news is that I think with the right preparation on my weak parts, I can pass on the next attempt (which I haven’t yet scheduled, but will soon).

The following animated GIF is what it’s like to do parts of a CCIE lab exam that you haven’t prepared for.

Filed under Always Be Learning, Attempts at Humor, CCIE Data Center Prep, Certifications

#CCNADC CCNA Data Center (my short journey)

November 24, 2012

On Monday (I think it was), Cisco announced the completion of the Data Center track: the CCNA Data Center and CCNP Data Center certifications, with tests available immediately. And you know me, I live in PearsonVUE test centers, and I’m a data center nut, so I signed that shit right up.

I’m now CCNA Data Center certified.

CCNA Data Center within a week of it coming out

Took the first test (640-911) on Wednesday 11/21/12 (first day I could schedule) and passed with an 830. I booked the next available date (today 11/24/12) for the 640-916 test and passed, squeaking by with a 798 (797 required).

How I felt when I saw that I passed by one point

I found 640-911 tougher, and thought I got more answers wrong. 640-916 seemed easier, since it’s more of the topics I teach on a regular basis (UCS, ACE, Fibre Channel). But for some reason I scored higher on the 640-911. Go figure.

I took them both blind, without studying or reading up (and no, no “study guides”). I didn’t even look at the exam topics for 640-911, and I barely glanced at them for 640-916. Generally, the questions were all data center specific, and covered topics you’d find in the various non-track (specialization) data center certs from Cisco. Also, I’ve gotten the question “Is there WAAS on the CCNA Data Center?” It’s not in the exam topics, and I don’t think I’m violating the confidentiality agreement by confirming what the topics list already says: no, there’s no WAAS. Thankfully, because ugh WAAS.

So why take the trouble for a CCNA Data Center when I’m working on the CCIE Data Center? The reason is the CCNP Data Center. To get the CCNP Data Center, I need the CCNA Data Center. My goal is CCIE Data Center, but I’m impatient. There are very limited seats for the CCIE Data Center because right now, I think there’s only a single pod for the entire world (I think CCIE Wireless is like that too, or at least it was when it started out). Thus it’ll be a while before I get it (I’m guessing Summer 2013), even assuming I make it on the first try (which, odds are, I won’t). My highest Cisco certification is a CCSI, which is the teaching certification. I don’t have an NP-level at all, having dropped pursuit of my CCNP R&S a while ago in pursuit of other certs.

So by January I hope to have the CCNP Data Center hammered out. I’ve already got one of the tests done (DCUCI from like, ages ago), and I can’t recall if I did DCUCD or not. I need DCUFI and DCUFD, both of which I need to get anyway. Plus one of the troubleshooting exams (DCUFT/DCUCT) and I’ll be a CCNP Data Center.

Edit (11/25/12): Turns out my DCUCI pass won’t cut it. It’s an older version of the test, and they need either the V4 or the V5. So I’m back to square zero. Also, I got the required tests wrong:

You need to pass only four exams.

You have to pass DCUCI and DCUFI (V4 or V5), and you can either do the two design exams (DCUCD and DCUFD) or do the two troubleshooting exams (DCUCT and DCUFT). In all likelihood, I’ll end up doing all 6 tests because I’m a Cisco instructor and I need certs like woah, but I think I’ll go design first.

Overall, I’m very pleased that Cisco now has a full data center track. They’ve had several specializations, but unless you’re an instructor like me or have a partner-level requirement, those certs are pretty much worthless career wise. They have zero brand recognition. For example, if I told you I’m a Cisco Data Center Application Services Support Specialist, would you care? Probably not. You’ve never heard of it, so you have no idea how difficult/easy it is. That’s the benefit of a CCIE, since it has probably the best brand recognition of any certification in any genre of IT. Whether you’re a Linux admin, Microsoft developer, or Juniper router jockey, you likely are aware of the CCIE (and the difficulty associated with it). CCNP is not too far down that list either.

So, onward to the CCNP Data Center.

Filed under CCIE Data Center Prep, data center

Requiem for the ACE

September 25, 2012

Ah, the Cisco ACE. As we mourn our fallen product, I’ll take a moment to reflect on this development as well as what the future holds for Cisco and load balancing/ADC. First off, let me state that I have no inside knowledge of Cisco’s plans in this regard. While I teach Cisco ACE courses for Firefly and develop Firefly’s courseware for both ACE products and bootcamp material for the CCIE Data Center, I’m not a Cisco employee, so this is pure speculation.

Also, it should be made clear that Cisco has not EOL’d (End of Life) or even EOS’d (End of Sale) the ACE product, and in a post on the CCIE Data Center group Walid Issa, the project manager for CCIE Data Center, made a statement reiterating this. And just as I was about to publish this post, there’s a great post by Brad Casemore also reflecting on the ACE, and there’s an interesting comment from Steven Schuchart of Cisco (analyst relations?) making a claim that ACE is, in fact, not dead.

However, there was a statement Cisco sent to CRN confirming the rumor, and my conversations with people inside Cisco have confirmed that yes, the ACE is dead. Or at least, that’s the understanding of Cisco employees in several areas. The word I’m getting is that it will still get bug fixes and security fixes, but further development will halt. The ACE may not officially be EOL/EOS, but for all intents and purposes, and until I hear otherwise, it’s a dead-end product.

The news of ACE’s probable demise was kind of like a red-shirt getting killed. We all knew it was coming, and you’re not going to see a Spock-like funeral, either.

We do know one thing: For now at least, the ACE 4710 appliance is staying inside the CCIE Data Center exam. Presumably in the written (I’ve yet to sit the non-beta written) as well as in the lab. Though it seems certain now that the next iteration (2.0) of the CCIE Data Center will be ACE-less.

Now let’s take a trip down memory lane, to the Ghosts of Load Balancers Past…

Ghosts of Load Balancers Past

As many are aware, Cisco has had a long yet… imperfect relationship with load balancing. This is somewhat ironic considering that Cisco was, in fact, the very first vendor to bring a load balancer to market. In 1996, Cisco released the LocalDirector, the world’s first load balancer. The product itself sprang from Cisco’s purchase of Network Translation Incorporated in 1996, which also brought about the PIX firewall platform.

The LocalDirector did relatively well in the market, at least at first. It addressed a growing need for scaling out websites (rather than the more expensive, less resilient method of scaling up). The LocalDirector had a bit of a cult following, especially from the routing and switching crowd, which I suspect had a lot to do with its relatively simple functionality: for most of its product life, the LocalDirector was just a simple Layer 4 device. While other vendors went higher up the stack with Layer 7 functionality, the LocalDirector stayed at Layer 4 until near the end, when it got cookie-based persistence. In terms of functionality and performance, other vendors were able to surpass the LocalDirector pretty quickly.

The most important feature that the other vendors developed in the late 90s was arguably cookie persistence. (The LocalDirector didn’t get this feature until about 2001 if I recall correctly.) This allowed the load balancer to treat multiple people coming from the same IP address as separate users. Without cookie-based persistence, load balancers could only do persistence based on an IP address, and were thus susceptible to the AOL megaproxy problem (you could have thousands of individual users coming from a single IP address). There was more than one client in the 1999-2000 time period where I had to yank out a LocalDirector and put in a Layer 7-capable device because of AOL.

Cookie persistence is a tough habit to break
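To make the megaproxy problem concrete, here’s a small Python simulation (a sketch, not any vendor’s algorithm). It assumes round-robin for new sessions and models cookie persistence as the load balancer inserting a cookie that names the chosen server. The server names and the proxy address `198.51.100.7` are made up.

```python
import itertools
from collections import Counter


class Balancer:
    """Round-robin for new sessions, plus two simplified persistence methods."""

    def __init__(self, servers):
        self._next = itertools.cycle(servers)
        self._by_ip = {}  # source IP -> server

    def source_ip_persistence(self, client_ip):
        # The first connection from an IP picks a server; every later one from
        # that IP (i.e. everyone behind the same proxy) is pinned to it.
        if client_ip not in self._by_ip:
            self._by_ip[client_ip] = next(self._next)
        return self._by_ip[client_ip]

    def cookie_persistence(self, cookie=None):
        # No cookie yet: pick a server and "insert" a cookie naming it.
        # Cookie present: send the user back to the server it names.
        server = cookie if cookie is not None else next(self._next)
        return server, server  # (chosen server, cookie value to set)


def simulate(users=3000, proxy_ip="198.51.100.7"):
    servers = ["web1", "web2", "web3"]
    ip_lb, cookie_lb = Balancer(servers), Balancer(servers)
    by_ip = Counter(ip_lb.source_ip_persistence(proxy_ip) for _ in range(users))
    by_cookie = Counter(cookie_lb.cookie_persistence()[0] for _ in range(users))
    return by_ip, by_cookie


by_ip, by_cookie = simulate()
print(by_ip)      # every user behind the proxy lands on one server
print(by_cookie)  # users spread evenly across the three servers
```

With source-IP persistence all 3,000 users behind the proxy end up on `web1`; with cookie persistence they split 1,000 per server, and each user still sticks to the server named in their own cookie.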

At some point Cisco came to terms with the fact that the LocalDirector was pretty far behind and must have concluded it was an evolutionary dead end, so it paid $5.7 billion (with a B) to buy ArrowPoint, a load balancing company that had a much better product than the LocalDirector. That product became the Cisco CSS, and for a short time Cisco was on par with other vendors. Unfortunately, as with the LocalDirector, development and innovation seemed to stop after the purchase, and the CSS was forever a product frozen in the year 2000. Other vendors innovated (especially F5), and as time went on the CSS won fewer and fewer deals. By 2007, the CSS was largely a joke in load balancing circles. Many sites were happily running the CSS, of course (and some still are today), but feature-wise, it was getting its ass handed to it by the competition.

The next load balancer Cisco came up with had a very short lifecycle. The Cisco CSM (Content Switching Module), a load balancing module for the Catalyst 6500 series, didn’t last very long and, as far as I can remember, never had a significant install base. I don’t recall ever using it, and know it only through legend (as being not very good). It was quickly replaced by the next load balancing product from Cisco.

And that brings us to the Cisco ACE. Available in two iterations, the Service Module and the ACE 4710 Appliance, the ACE made it look like Cisco might have learned from its mistakes. Out of the gate it was a more modern load balancer, offering features and capabilities that the CSS lacked, such as a three-tiered VIP configuration mechanism (real servers, server farms, and VIPs, which made URL rules much easier) and the ability to insert the client’s true source IP address in an HTTP header in SNAT situations. The latter was a critical function that the CSS never had.

But the ACE certainly had its downsides. The biggest issue is that the ACE could never go toe-to-toe with the other big names in load balancing in terms of features. F5 and NetScaler, as well as A10, Radware, and others, always had a far richer feature set than the ACE. It is, as Greg Ferro said, a moderately competent load balancer in that it does what it’s supposed to do, but it lacked the features the other guys had.

The number one thing keeping the ACE from eating at the big-boy table is the lack of an answer to F5’s iRules. F5’s iRules give a huge amount of control over how to load balance and manipulate traffic. You can use them to create a login page on the F5 that authenticates against AD (without ever touching a web server), rewrite http:// URLs to https:// (very useful in certain SSL termination setups), and even calculate Pi every time someone hits a web page. Many of the other high-end vendors have something similar, but F5’s iRules is the king of the hill.

In contrast, the ACE can evaluate existing HTTP headers, and can manipulate headers to a certain extent, but the ACE cannot do anything with HTTP content. There’s more than one installation where I had to replace the ACE with another load balancer because of that issue.
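Here’s a Python illustration of the kind of content rewrite I mean: the http:// to https:// fix-up from the iRules example above. It’s not iRules syntax (iRules are Tcl-based) and not anything the ACE can run; `rewrite_http_links()` and the `www.example.com` host are hypothetical. The point is that it has to read and modify the response body, which is exactly the part the ACE can’t touch.

```python
import re


def rewrite_http_links(body: str, host: str) -> str:
    """Rewrite absolute http:// links to our own host into https:// links.

    Only exact host matches are touched: the lookahead requires the host to be
    followed by "/", a quote, whitespace, ">" or the end of the body, so a
    look-alike such as http://www.example.com.evil.test is left alone.
    """
    pattern = re.compile(r"http://" + re.escape(host) + r"(?=[/\"'\s>]|$)")
    return pattern.sub("https://" + host, body)


page = '<a href="http://www.example.com/login">Log in</a>'
print(rewrite_http_links(page, "www.example.com"))
```

On a real load balancer this runs on every response passing through, which is why it needs the general-purpose horsepower discussed further down.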

The ACE never had a FIPS-compliant SSL implementation either, which prevented the ACE from being in a lot of deals, especially with government and financial institutions. ACE was very late to the game with OCSP support and IPv6 (both were part of the 5.0 release in 2011), and the ACE10 and ACE20 Service Modules will never, ever be able to do IPv6. You’d have to upgrade to the ACE30 Module to do IPv6, though right now you’d be better off with another vendor.

For some reason, Cisco decided to use MQC (Modular QoS CLI) as the configuration framework in the ACE. This meant configuring a VIP required setting up class-maps, policy-maps, and service-policies in addition to real servers and server farms. This was far more complicated than configuring most of the competition, despite the fact that the ACE had less functionality. If you weren’t at CCNP level or higher, the MQC could be maddening. (On the upside, if you mastered it on the ACE, QoS was a lot easier to learn, as was my case.)

If the CLI was too daunting, there was always the GUI on the ACE 4710 Appliance and/or the ACE Network Manager (ANM), which was a separate user interface that ran on Red Hat and later became its own OVA-based virtual appliance. The GUI in the beginning wasn’t very good, and the ACE Service Modules (ACE10, ACE20, and now the ACE30) lacked a built-in GUI. Also, when it hits the fan, the CLI is the best way to quickly diagnose an issue. If you weren’t fluent in the MQC and the ACE’s rather esoteric use of it, it was tough to troubleshoot.

There was also a brief period of time when Cisco was selling the ACE XML Gateway, a product obtained through the purchase of Reactivity in 2007, which provided some (but not nearly all) of the features the ACE lacked. It still couldn’t do something like iRules, but it did have Web Application Firewall abilities, FIPS compliance, and could do some interesting XML validation and other security. Of course, that product was short lived as well, and Cisco pulled the plug in 2010.

Despite these shortcomings, the ACE was a decent load balancer. The ACE service module was a popular service module for the Catalyst 6500 series, and could push up to 16 Gbps of traffic, making it suitable for just about any site. The ACE 4710 appliance was also a popular option at a lower price point, and could push 4 Gbps (although it only had (4) 1 Gbit ports, never 10 Gbit). Those that were comfortable with the ACE enjoyed it, and there are thousands of happy ACE customers with deployments.

But “decent” isn’t good enough in the highly competitive load balancing/ADC market. Industry juggernauts like F5 and scrappy startups like A10 smoke the ACE in terms of features, and unless a shop is going all-Cisco, the ACE almost never wins in a bake-off. I even know of more than one occasion where Cisco had to essentially invite itself to a bake-off (and in those cases it never won). The ACE’s market share has continued to drop since its release, and from what I’ve heard is in the low teens in terms of percentage, while F5 has about 50%.

In short, the ACE was the knife that Cisco brought to the gunfight. And F5 had a machine gun.

I’d thought for years that Cisco might just up and decide to drop the ACE. Even with the marketing might and sales channels of Cisco, the ACE could never hope to usurp F5 with the feature set it had. Cisco didn’t seem committed to developing new features, and it fell further behind.

Then Cisco included ACE in the CCIE Data Center blueprint, so I figured they were sticking with it for the long haul. Then the CRN article came out, and surprised everybody (including many in Cisco from what I understand).

So now the big question is whether or not Cisco is bowing out of load balancing entirely, or coming out with something new. We’re certainly getting conflicting information out of Cisco.

I think both are possible. Cisco has made a commitment (that they seem to be living up to) to drop businesses and products that they aren’t successful in. While Cisco has shipped tens of thousands of load balancing units since the first LocalDirector was unboxed, except for the beginning they’ve never led the market. Somewhere in the early 2000s, that title came to belong almost exclusively to F5.

For a company as broad as Cisco is, load balancing as a technology is especially tough to sell and support. It takes a particular skill set that doesn’t relate fully to Cisco’s traditional routing and switching strengths, as load balancing sits in two distinct worlds: Server/app development, and networking. With companies like F5, A10, Citrix, and Radware, it’s all they do, and every SE they have knows their products forwards and backwards.

I think the hardware platform that the ACE is based on (Cavium Octeon network processors) is one of the reasons why the ACE hasn’t caught up in terms of features. To do things like iRules, you need fast, generalized processors, and most of the vendors have gone with x86 cores, lots of them. Vendors can use pure x86 power to do both Layer 4 and Layer 7 load balancing, or, like F5 and A10, incorporate FPGAs to hardware-assist the Layer 4 load balancing and distribute flows to x86 cores for the more advanced Layer 7 processing.

The Cavium network processors don’t have the horsepower to handle the advanced Layer 7 functionality, and the ACE Modules don’t have x86 at all. The ACE 4710 Appliance has an x86 core, but it’s several generations back (it’s seriously a single Pentium 4 with one core). As Greg Ferro mentioned, they could be transitioning completely away from that dead-end hardware platform, and going all virtualized x86. That would make a lot more sense, and would allow Cisco to add features that it desperately needs.

But for now, I’m treating the ACE as dead.

Filed under CCIE Data Center Prep, data center, Layer 8, Load Balancing, Virtualization

CCIE Data Center Beta Written Results Are In! (351-080)

September 20, 2012

And Cisco probably couldn’t be happier that the results are finally in. It’s been more than 3 months since the beta closed, and after a few promises of “soon”, we finally got our results today. Over at the Cisco learning community message boards for CCIE DC, there was a virtual riot going on.

Guys? I think we’d better get those results posted…

Once I got word they were live on PearsonVUE, I logged in and…. I failed.

Smug Cisco Guy: Way to go, dumbass.

At least we got our results.

To find out your status, go to PearsonVUE, log into your account, and check your history. It’ll show the pass or fail. Beyond pass/fail, we have to await the score report to find out what our weak areas were. My guess is I was really weak on the 7K/5K stuff. I know I got all the ACE-related questions right, and most of the storage and UCS seemed pretty evident to me. I’ll have to wait and see, of course. I’ve scheduled a retake for October 5th, so I’ve got some books to hit. Cue the montage…

Filed under Always Be Learning, CCIE Data Center Prep, Certifications

RIP ACE

September 17, 2012

Oh don’t be like that ACE, you had to know it was coming

Looks like the rumor was true, and it’s not just the ACE30 Service Module: Cisco will stop developing the ACE load balancers. The article quotes a statement from Cisco that includes this:

…Cisco has decided it will not develop further generations of its ACE load-balancing products.

I wonder if that means they won’t even bother to release the vACE or the Nexus 7000 Service Module (both of which have been mentioned in Cisco Live! presentations I believe).

Not sure what this means for the CCIE Data Center either. I think it would be relatively easy to strip the ACE out. It doesn’t appear to be a key component, just sort of tacked on anyway. I used to joke that the CCIE Data Center exam would be more relevant if it included F5’s LTM. We’ll see.

Filed under CCIE Data Center Prep, Load Balancing

Rumor: Cisco To Stop Selling ACE?

September 15, 2012

Update 9/17/12: It’s true. The Cisco ACE is dead.

Filed under things that make you go Hrmm…. I saw this on @IPv6freely’s (Chris Jones) Twitter feed: an article in Barron’s stating that Cisco has told its sales people to stop selling the ACE ~~application delivery controller~~ load balancer.

The article mentions specifically the ACE30 module, the current service module. It makes no mention of the ACE 4710 appliance. Also, it’s just a rumor, so while interesting, certainly nothing definitive.

My speculation? Of course, it could be utter bullshit. I haven’t found or heard anything to substantiate it. Though even if it were true, it wouldn’t necessarily mean Cisco has given up on ACE. It could signal that Cisco is no longer interested in selling the ACE30 service module, preferring instead the ACE 4710 appliance and/or gearing up for a service module for the Nexus 7000 series. They could be making a move to go all virtual, with the vACE to possibly be announced shortly. (I heard November 2012, but just a rumor.)

Less likely, but certainly possible, is that Cisco is going to drop the ACE. It seems particularly unlikely given that the ACE was featured in the CCIE Data Center lab blueprint. However, the lab exam was moved to December; it could be that they’re re-tooling it for an ACE-less lab (the ACE looked to be a relatively minor part of the lab anyway).

The ACE doesn’t have many fans outside of Cisco (or even inside, honestly). Though I wouldn’t say the ACE is a bad load balancer. It does what it’s supposed to do, and it does it relatively well. It’s just been… disappointing, in terms of both market share and features. The ACE’s market share continues to drop, and in a competitive environment (F5, A10, Citrix NetScaler, etc.) ACE just can’t go toe-to-toe in features (especially against something like iRules/aFlex, FIPS, IPv6, etc.).

Tony, I find your lack of faith in ACE disturbing

So we’ll wait and see.

Filed under CCIE Data Center Prep, Load Balancing

CCIE Data Center Dates Pushed Back

September 8, 2012

Originally, the CCIE Data Center written exam was to be available September 3rd, and the lab exams available sometime in October. It looks like those dates have been pushed back.

As of this writing, the written exam will be available starting September 17th, 2012 (it still isn’t live on PearsonVUE). The first lab exam dates will be sometime in December 2012.

I also heard that the CCIE Data Center written beta exam results will be available at about September 15th, 2012. Like many of you, I’m constantly checking my cert tracker…

A bit disappointing, but we’ve waited this long, we can wait a few more weeks.

Filed under CCIE Data Center Prep

Cisco ACE 101: Tony’s 5 Steps to a Happy VIP

August 14, 2012

I’ve been teaching Cisco ACE for over four years now, and I developed a quick trick/checklist to teach students the minimum configuration to get a virtual service (VIP) up and running. And since the CCIE Data Center lab will soon be upon us, I’m sharing this little trick with you. I call it “Tony’s 5 Steps to a Happy VIP”. And here it is:

Step #1: ACL
Step #2: class-map: Defines the VIP address and port
Step #3: policy-map: Which server farm(s) do we send traffic to
Step #4: policy-map: Multi-match, will pair every class-map to its policy-map
Step #5: service-policy: Apply step #4 to the VLAN interface

Using that checklist, you can quickly troubleshoot/understand most ACE configurations. So what does that list mean?

First off, let’s define what a VIP even is: in load balancing terms, it refers to an IP address and TCP or UDP port combination. In that regard, it’s a bit of a misnomer, since VIP is an acronym for “Virtual IP”, which implies only an IP address. Depending on the vendor, a VIP can be called a “Virtual Server” or “Virtual Service”, although it’s commonly referred to simply as a “VIP”. It’s whatever you point the firehose of network traffic at.

I’m not anti-GUI (in fact, I think the GUI is increasingly necessary in the network world), but in the case of the ACE (and CCIE DC) you’re going to want to use the CLI. It’s just faster, and you’re going to feel the need for speed in that 8 hour window. Also, when things go wrong, the CLI (and config file) is going to allow you to troubleshoot much more quickly than the GUI in the case of the ACE.

The CLI for Cisco ACE can be a little overwhelming. For some reason, Cisco decided to use the Modular QoS CLI (MQC) configuration framework. To me, it seems overly complicated. Other vendors have CLIs that tend to make a lot more sense, or at least are a lot easier to parse with your eyes. If you’re familiar with class-maps, policy-maps, and service-policies, the transition to the ACE CLI won’t be all that difficult; it works much like setting up QoS. However, if you’re new to MQC, it’s going to be a bit of a bumpy ride.

How I felt learning MQC for the first time

The Configuration

Here is a very basic configuration for an ACE:

access-list ANYANY line 10 extended permit ip any any 

rserver host SERVER1
  ip address 192.168.10.100
  inservice
rserver host SERVER2
  ip address 192.168.10.101
  inservice
rserver host SERVER3
  ip address 192.168.10.102
  inservice

serverfarm host SERVERFARM1
  rserver SERVER1
    inservice
  rserver SERVER2
    inservice
  rserver SERVER3
    inservice 

class-map match-all VIP1-80 
  2 match virtual-address 192.168.1.200 tcp eq http

class-map match-all VIP1-443
  2 match virtual-address 192.168.1.200 tcp eq https

policy-map type loadbalance first-match VIP1-POLICY
  class class-default 
    serverfarm SERVERFARM1 

policy-map multi-match CLIENT-VIPS 
  class VIP1-80
    loadbalance vip inservice 
    loadbalance policy VIP1-POLICY
  class VIP1-443
    loadbalance vip inservice
    loadbalance policy VIP1-POLICY

interface vlan 200 
  description Client-facing interface 
  ip address 192.168.1.10 255.255.255.0 
  access-group input ANYANY
  service-policy input CLIENT-VIPS 
  no shutdown
interface vlan 100
  description Server VLAN
  ip address 192.168.10.1 255.255.255.0
  no shutdown
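To see how the five steps chain together in the configuration above, here’s a Python model of the lookup path for one client connection. It’s a teaching sketch, not how the ACE works internally: the ACL is reduced to “present or not” (an any/any rule), the load balance policy only has `class-default`, and the class and function names are mine. Each failure message points at the step you’d go check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassMap:
    """Step 2: a VIP is one address plus one protocol and port."""
    address: str
    protocol: str
    port: int


@dataclass
class VlanInterface:
    access_group: Optional[str] = None    # Step 1 (treated as an any/any permit)
    service_policy: Optional[str] = None  # Step 5


def resolve(intf, dst_ip, protocol, port, class_maps, lb_policies, multi_match):
    """Follow one client connection through the five steps.

    class_maps:  {class-map name: ClassMap}                          (step 2)
    lb_policies: {loadbalance policy-map name: serverfarm name}      (step 3)
    multi_match: {multi-match name: [(class-map, lb policy-map)]}    (step 4)
    Returns (serverfarm or None, explanation).
    """
    if intf.access_group is None:
        return None, "step 1: no ACL on the interface, so traffic is denied"
    pairs = multi_match.get(intf.service_policy)
    if pairs is None:
        return None, "step 5: no multi-match service-policy on this interface"
    wanted = ClassMap(dst_ip, protocol, port)
    for class_name, policy_name in pairs:
        if class_maps[class_name] == wanted:
            farm = lb_policies.get(policy_name)
            if farm is None:
                return None, f"step 3: policy-map {policy_name} is not defined"
            return farm, f"matched {class_name}, using {policy_name}"
    return None, "step 2/4: no class-map in the policy matches this address and port"


# The configuration from this post, expressed as data
class_maps = {
    "VIP1-80": ClassMap("192.168.1.200", "tcp", 80),
    "VIP1-443": ClassMap("192.168.1.200", "tcp", 443),
}
lb_policies = {"VIP1-POLICY": "SERVERFARM1"}
multi_match = {"CLIENT-VIPS": [("VIP1-80", "VIP1-POLICY"), ("VIP1-443", "VIP1-POLICY")]}
vlan200 = VlanInterface(access_group="ANYANY", service_policy="CLIENT-VIPS")

print(resolve(vlan200, "192.168.1.200", "tcp", 443, class_maps, lb_policies, multi_match))
print(resolve(VlanInterface(service_policy="CLIENT-VIPS"), "192.168.1.200", "tcp", 80,
              class_maps, lb_policies, multi_match))
```

The second call is the classic forgotten-ACL mistake: everything else is configured, and nothing gets through.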

Step #1: ACL

It’s not necessarily part of the VIP setup, but you do need to have an ACL rule in before a VIP will work. The reason is that the ACE, unlike most load balancers, is deny all by default. Without an ACL you can’t pass any traffic through the ACE. (However, ACLs have no effect on traffic to the ACE for management.)

Many an ACE configuration problem has been caused by forgetting to put an ACL rule in. My recommendation? Even if you plan on using specific ACLs, start out with an “any/any” rule.

access-list ANYANY line 10 extended permit ip any any

And don’t forget to apply the ACL to the interface facing the client (the outside VLAN).

interface vlan 200 
  description Client-facing interface 
  ip address 192.168.1.10 255.255.255.0 
  access-group input ANYANY
  service-policy input CLIENT-VIPS 
  no shutdown

Once everything is working, you can build a more locked-down ACL if required, although most people don’t, since there is likely a firewall in place anyway (even Cisco’s example configurations typically have only an any-any rule in place).

If you do use a more specific ACL, it’s often a good idea to switch back to any-any for troubleshooting. Put the more specific rule in place only when you’re sure your config works.
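To make the deny-by-default behaviour concrete, here is a minimal Python sketch of how an interface ACL is evaluated: entries are checked in order, the first match decides, and anything that matches nothing (or arrives on an interface with no access-group at all) is dropped. The `AclEntry` class and `acl_permits` function are hypothetical illustrations, not part of the ACE software, and the sketch only looks at source and destination addresses.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence


@dataclass(frozen=True)
class AclEntry:
    """One simplified ACL line (hypothetical model, addresses only)."""
    action: str  # "permit" or "deny"
    src: str     # CIDR network; "0.0.0.0/0" stands in for "any"
    dst: str


def acl_permits(acl: Optional[Sequence[AclEntry]], src_ip: str, dst_ip: str) -> bool:
    """Return True if traffic from src_ip to dst_ip may pass through the ACE."""
    if acl is None:
        # No access-group on the interface: nothing passes through.
        return False
    src, dst = ip_address(src_ip), ip_address(dst_ip)
    for entry in acl:
        if src in ip_network(entry.src) and dst in ip_network(entry.dst):
            return entry.action == "permit"  # first matching entry wins
    return False  # implicit deny at the end of every ACL


# Rough model of "access-list ANYANY line 10 extended permit ip any any"
ANYANY = [AclEntry("permit", "0.0.0.0/0", "0.0.0.0/0")]
# A tighter (hypothetical) ACL that only lets clients reach the VIP address
VIP_ONLY = [AclEntry("permit", "0.0.0.0/0", "192.168.1.200/32")]

print(acl_permits(None, "10.1.1.5", "192.168.1.200"))      # False: no ACL applied
print(acl_permits(ANYANY, "10.1.1.5", "192.168.1.200"))    # True
print(acl_permits(VIP_ONLY, "10.1.1.5", "192.168.1.201"))  # False: implicit deny
```

The last line is the classic trap: a tighter ACL silently drops anything it doesn’t list, which is why falling back to any-any is such a useful troubleshooting step.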

Step #2: class-map (VIP declaration)

The next step is to create a class-map that will catch traffic destined for the VIP. You should always include an IP address as well as a single TCP or UDP port. I’ve seen configurations that match any TCP/UDP port on a specific IP address, and this is usually a really, really bad idea.

class-map match-all VIP1-80
  2 match virtual-address 192.168.1.200 tcp eq http

This defines a VIP with an address of 192.168.1.200 on port http (port 80). Even if you set up multiple ports on the same IP address, such as ports 80 and 443, use different class-maps and configure them separately.
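A match-all class-map only fires when every one of its match lines is true, and a single `match virtual-address` line already pins down the address, protocol, and port together. The Python sketch below models that idea; `Packet`, `ClassMap`, and `match_virtual_address` are hypothetical names chosen for illustration, not how the ACE actually implements matching.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    dst_port: int


Predicate = Callable[[Packet], bool]


def match_virtual_address(vip: str, protocol: str, port: int) -> Predicate:
    """Hypothetical model of one 'match virtual-address <ip> <proto> eq <port>' line."""
    def predicate(pkt: Packet) -> bool:
        return pkt.dst_ip == vip and pkt.protocol == protocol and pkt.dst_port == port
    return predicate


@dataclass
class ClassMap:
    name: str
    match_lines: List[Predicate]

    def matches(self, pkt: Packet) -> bool:
        # match-all: every match line in the class-map must be true
        return all(line(pkt) for line in self.match_lines)


# One class-map per IP/port pair, as recommended above
VIP1_80 = ClassMap("VIP1-80", [match_virtual_address("192.168.1.200", "tcp", 80)])
VIP1_443 = ClassMap("VIP1-443", [match_virtual_address("192.168.1.200", "tcp", 443)])

web = Packet("192.168.1.200", "tcp", 80)
print(VIP1_80.matches(web), VIP1_443.matches(web))  # True False
```

Keeping one IP/port pair per class-map means each VIP can later be tied to its own policy without untangling a shared match.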

Step #3: policy-map (what do we do with traffic hitting the VIP)

Here is where the VIP is defined as either a Layer 4 VIP or a Layer 7 VIP. The example below is a simple Layer 4 VIP (the ACE is not aware of anything that happens above Layer 4). You can get a lot fancier in this section, such as sending certain matched traffic to one server farm, and other traffic to others, and/or setting up persistence. Again, this is the most basic configuration.

policy-map type loadbalance first-match VIP1-POLICY
  class class-default <-- This matches everything
    serverfarm SERVERFARM1 <-- And sends it all right here

Step #4: policy-map (the round-up policy-map, which pairs each VIP with a decision process and joins all the pairs into a single statement)

You will typically have multiple Step 2s and Step 3s, but they exist as independent declarations, so you need something to round them all up in a single place and join them. Most configurations have only one multi-match policy-map, and this is where you marry each Step 2 class-map to a Step 3 policy-map. In this example, two separate class-maps use the same policy-map (which is fine).

policy-map multi-match CLIENT-VIPS 
  class VIP1-80 <-- This VIP...
    loadbalance vip inservice 
    loadbalance policy VIP1-POLICY <-- ...sends traffic to this policy
  class VIP1-443 <-- This VIP...
    loadbalance vip inservice
    loadbalance policy VIP1-POLICY <-- ...sends traffic to this policy
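Conceptually, the multi-match policy is a lookup table from VIP class-maps to load-balancing policies. The Python sketch below models it that way, under the simplifying assumption that the first matching VIP class handles a new connection and that a VIP without `loadbalance vip inservice` doesn’t answer at all. `VipClass`, `vip`, and `lookup` are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Matcher = Callable[[str, str, int], bool]  # (dst_ip, protocol, dst_port) -> bool


@dataclass
class VipClass:
    """One 'class' entry in the multi-match policy (hypothetical model)."""
    class_map: str
    matches: Matcher
    loadbalance_policy: str
    vip_inservice: bool = True  # mirrors "loadbalance vip inservice"


def vip(address: str, protocol: str, port: int) -> Matcher:
    """Hypothetical stand-in for a VIP class-map's match virtual-address line."""
    return lambda ip, proto, p: (ip, proto, p) == (address, protocol, port)


CLIENT_VIPS: List[VipClass] = [
    VipClass("VIP1-80", vip("192.168.1.200", "tcp", 80), "VIP1-POLICY"),
    VipClass("VIP1-443", vip("192.168.1.200", "tcp", 443), "VIP1-POLICY"),
]


def lookup(policy: List[VipClass], dst_ip: str, protocol: str, dst_port: int) -> Optional[str]:
    """Return the load-balancing policy for a new connection, or None if no VIP takes it."""
    for entry in policy:
        if entry.matches(dst_ip, protocol, dst_port):
            # Assumption: a matching VIP only answers if it has been put in service
            return entry.loadbalance_policy if entry.vip_inservice else None
    return None


print(lookup(CLIENT_VIPS, "192.168.1.200", "tcp", 443))   # VIP1-POLICY
print(lookup(CLIENT_VIPS, "192.168.1.200", "tcp", 8080))  # None
```

Both VIP classes point at the same VIP1-POLICY here, which mirrors the configuration above: the pairing is many-to-one, and that’s perfectly legal.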

Step #5: service-policy (apply the round-up to the client-facing interface)

Finally, for any of this to work, you’ll need to apply the Step 4 multi-match policy-map to a VLAN interface, the one that faces the client.

interface vlan 200
  description Client-facing interface
  ip address 192.168.1.10 255.255.255.0
  access-group input ANYANY <-- Step 1's ACL is applied
  service-policy input CLIENT-VIPS <-- Step 4's multi-match policy-map is applied
  no shutdown <-- Don't forget the no shut!

Hope this helps demystify the ACE configuration. A short checklist can really help save time, especially in a time-constrained environment like a CCIE lab.

Filed under CCIE Data Center Prep, Certifications, data center, Load Balancing

Po-tay-to, Po-ta-to: Analogies and NPIV/NPV

May 21, 2012 2 Comments

In a recent post, I took a look at the Fibre Channel subjects of NPIV and NPV, both topics covered in the CCIE Data Center written exam (currently in beta, take yours now, $50!). The post generated a lot of comments. I mean, a lot. Over 50 so far (and still going). An epic battle (although very unInternet-like in that it was very civil and respectful) brewed over how Fibre Channel compares to Ethernet/IP. The comments look like the aftermath of the battle of Wolf 359.

Captain, the analogy regarding squirrels and time travel didn’t survive

One camp, led by Erik Smith from EMC (who co-wrote the best Fibre Channel book I’ve seen so far, and it’s free), compares WWPNs to IP addresses and FCIDs to MAC addresses. Others, such as Ivan Pepelnjak and myself, compare WWPNs to MAC addresses and FCIDs to IP addresses. There were many points and counter-points. Valid arguments were made supporting each position. Eventually, people agreed to disagree. So which one is right? They both are.

Wait, what? Two sides can’t be right, not on the Internet!

When comparing Fibre Channel to Ethernet/IP, it’s important to remember that they are different. In fact, significantly different. The only reason to relate Fibre Channel to Ethernet/IP is to help those who are familiar with Ethernet/IP find their way into the world of Fibre Channel. Many (most? all?) people learn by building associations between known subjects (in our case Ethernet/IP) and lesser-known subjects (in this case Fibre Channel).

Of course, any association includes its inherent inaccuracies. We purposefully sacrifice some accuracy in order to attain relatability, and specific details and inaccuracies are glossed over. To some, introducing any inaccuracy is sacrilege. To me, that’s being overly pedantic. Pedantic details are for the expert level, and using pedantic facts as an admonishment of an analogy misses the point entirely. With any analogy, there will always be inaccuracies, and there will always be many analogies to be made.

Personally, I still prefer the WWPN ~= MAC / FC_ID ~= IP approach, and will continue to use it when I teach. But I believe the other approach is completely valid as well. At that point, it’s just a matter of preference. Both roads lead to the same destination, and that is what’s really important.

Learning always happens in layers. Coat after coat is applied, increasing in accuracy and pedantic detail as you go along. Analogies are a very useful and effective tool for learning any subject.

Filed under Always Be Learning, CCIE Data Center Prep, Certifications, data center, Ethernet, Fibre Channel, Learning, Storage
