Common ACE Gotchas
April 9, 2012
OK, so the Cisco ACE is not my favorite load balancer. It’s certainly not my go-to load balancer when clients come a-callin. It lacks many features that its competitors have, and market share-wise it’s getting its ass handed to it by F5. (And honestly? Deservedly so.) Also, that blue light will sear your retinas like a fine tofu steak.
This is what the ACE 4710 Appliance’s blue light can do if you’re not careful
But with the Cisco ACE being part of the CCIE Data Center track, it’s kind of like a distant relative who won the lottery: I suddenly find it more interesting.
So my thoughts turn to the CCIE Data Center lab test and what it could test me on. That got me thinking about the common gotchas I see in the field with ACE configuration, so I’ve listed a few of them here.
HTTP Health Probes Failed (With Healthy Server)
With the ACE software (including the latest 5.x code) there’s a requirement for an HTTP/HTTPS-based health probe that often trips people up: HTTP status code (min/max). HTTP responses have a status code associated with them, ranging from 200 (OK) to 404 (Not Found) or even 418 “I’m a teapot”. (No seriously, it’s a valid code.) Most load balancers will accept 200 by default, and fail on something 400 or above. But not the ACE. You must explicitly configure what the ACE will accept, or it will accept nothing.
When you configure a health probe in the ACE, it has defaults for URL (/), timeout (10 seconds), and interval (15 seconds). But there’s no default for “expect status”. If you don’t set it, all health probes will fail, even though the server is perfectly healthy.
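To make the logic concrete, here is a minimal Python sketch of how a probe with explicit status ranges behaves. This is only a model of the behavior described above, not ACE code; the function name `probe_passes` and the list-of-ranges representation are my own assumptions for illustration.

```python
from typing import List, Tuple


def probe_passes(status_code: int, expected_ranges: List[Tuple[int, int]]) -> bool:
    """Return True only if status_code falls inside a configured (min, max) range."""
    # With no ranges configured, any() over an empty sequence is False,
    # so every response fails, even a 200.
    return any(low <= status_code <= high for low, high in expected_ranges)


print(probe_passes(200, []))            # no "expect status" configured -> False
print(probe_passes(200, [(200, 200)]))  # like "expect status 200 200" -> True
print(probe_passes(404, [(200, 200)]))  # outside the range -> False
```

The point of the sketch is the empty list: nothing configured means nothing matches, which is exactly why a perfectly healthy server gets marked down.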
No. Probes skipped  : 0          Last status code   : 200
No. Out of Sockets  : 0          No. Internal error : 0
Last disconnect err : Received invalid status code
As you can see in the show probe detail above, the last HTTP response was a 200 (which is good), but the ACE still considered that an invalid status code. The solution is to set the status.
probe http HTTP_PROBE1
  expect status 200 200
Once you do that, it will take the passdetect interval and timeout to mark the server up (default is 1m30s, since the interval and timeout values when a server is down can be different than when a server is considered up). You can have it do it faster by disabling and re-enabling the probe.
Cold Standby
When you upload a certificate and key to the ACE, you typically do so on the active ACE in an HA pair. What a lot of people don’t realize is that you need to upload the certificate and key on both ACEs, and you need to do so before you reference the key and certificate in the configuration file.
If you put the key and certificate in the configuration file, and they don’t exist on both the active and standby ACE, the standby goes into a state called STANDBY_COLD. In STANDBY_COLD, the standby ACE will still take over in the event of the active ACE going down, but it no longer accepts configuration updates from the active ACE. You can still fail over, but the config could be months old.
The correct order is:
- Upload certificate and key to active ACE
- Upload certificate and key to standby ACE
- Go into config mode, set up the ssl-proxy referencing the key and cert
What happens a lot of the time is people do this:
- Upload certificate and key to active ACE
- Go into config mode, set up the ssl-proxy referencing the key and cert
If you’re in STANDBY_COLD (many are and don’t even realize it), make sure all referenced certificates and keys are uploaded to both ACEs, then reboot the standby box or run the command no ft auto-sync run followed by ft auto-sync run. When it comes up, it’ll sync again, and you should be in STANDBY_HOT, which is what you want.
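If you want to catch this before touching the config, the check itself is simple: every certificate and key the config references must be present on both boxes. Here is a rough Python sketch of that pre-flight check. The function name, the file names, and the `inventory` dictionary are all hypothetical; how you collect the file lists from each ACE is up to you and is not shown.

```python
from typing import Dict, Iterable, Set


def missing_crypto_files(
    referenced: Iterable[str],
    files_on_peer: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each peer name to the referenced cert/key files it does not have."""
    needed = set(referenced)
    gaps: Dict[str, Set[str]] = {}
    for peer, present in files_on_peer.items():
        absent = needed - present
        if absent:
            gaps[peer] = absent
    return gaps


# Hypothetical inventory: the key was uploaded to the active ACE only.
inventory = {
    "active": {"www.example.com.crt", "www.example.com.key"},
    "standby": {"www.example.com.crt"},
}
gaps = missing_crypto_files(["www.example.com.crt", "www.example.com.key"], inventory)
print(gaps)  # {'standby': {'www.example.com.key'}}
```

An empty result means it should be safe to reference the files in the ssl-proxy config. Anything else means you are about to put the standby into STANDBY_COLD.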
You Forgot the Intermediate Certificate
This isn’t an ACE thing, this is a load balancing thing. As I outlined in my article on SSL and trust, many CAs (including Verisign) require an intermediate certificate in addition to the server certificate you obtain. Both the server and intermediate certificate need to be installed on the ACE (or other load balancer) for the certificate chain to be complete.
This is an often-missed step, as it’s not always obvious from the CA whether you need one, and even when it is explicitly stated, they often don’t tell you which is the right intermediate (some CAs have several to choose from).
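One way to think about picking the right intermediate: the server certificate's issuer has to match the intermediate's subject. The Python sketch below illustrates just that matching step using plain dictionaries as stand-ins for certificates. The CA names are made up, and matching names is not the same as verifying a signature, so treat this as a first-pass filter rather than real chain validation.

```python
from typing import Dict, List, Optional

# Simplified stand-in for a certificate: only the two name fields matter here.
Cert = Dict[str, str]


def pick_intermediate(server: Cert, candidates: List[Cert]) -> Optional[Cert]:
    """Return the candidate whose subject equals the server cert's issuer, if any."""
    for candidate in candidates:
        if candidate["subject"] == server["issuer"]:
            return candidate
    return None


# Hypothetical names, for illustration only.
server_cert = {"subject": "CN=www.example.com", "issuer": "CN=Example CA G2"}
intermediates = [
    {"subject": "CN=Example CA G1", "issuer": "CN=Example Root"},
    {"subject": "CN=Example CA G2", "issuer": "CN=Example Root"},
]

match = pick_intermediate(server_cert, intermediates)
print(match["subject"] if match else "no matching intermediate")
```

If nothing matches, the chain is incomplete and you have the wrong intermediate (or none at all).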
Cert Expired: Health Checks Fail
If you’re doing health checks against an HTTPS device, and the certificate (whether self-signed or certificate authority-signed) has expired, the health checks will fail. No matter what. So make sure your back-end servers don’t have an expired cert.
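The expiry test itself is just a comparison between the certificate's "not after" date and the clock of whatever is doing the checking. Here is a small Python sketch of that comparison using the standard library's `ssl.cert_time_to_seconds`. The function name `cert_expired` and the sample dates are my own, and fetching the date from a live server is left out.

```python
import ssl
import time
from typing import Optional


def cert_expired(not_after: str, now: Optional[float] = None) -> bool:
    """Return True if the certificate's notAfter time is at or before `now`.

    not_after uses the format found in certificates, e.g. "Jan 15 00:00:00 2012 GMT".
    now is seconds since the epoch; it defaults to the local clock.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return now >= expires_at


# Pretend the checker's clock reads April 9, 2012.
checker_clock = ssl.cert_time_to_seconds("Apr 09 00:00:00 2012 GMT")

print(cert_expired("Jan 15 00:00:00 2012 GMT", now=checker_clock))  # True
print(cert_expired("Jan 15 00:00:00 2013 GMT", now=checker_clock))  # False
```

Note that the answer depends on the checker's clock as much as on the certificate: a perfectly valid cert looks expired to a device whose date is wrong.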
That’s all I can think of for now. Feel free to post questions or other possible gotchas in the comments section.
We recently went through the process of renewing the ssl certificates on our web servers. They are running IIS 6 on windows 2003 servers. Our development server was renewed and worked immediately. It does not sit behind the ACE.
For our servers that sit behind the ACE, the https health probe fails when we renew the certificate, and the ACE will not send traffic to the web server with the renewed certificate. The certificate is good because when the probe is removed, the ACE will send traffic to the web server and people can login without any problems. When the probe is put back on, it again says the web server is down.
One thing that is interesting, is that when the probe is on, a netstat -an on the web server shows all port 443 connections in FIN_WAIT_1 status. We saw the connections stay in this state for over 15 hours until the probe was removed and then they cleared out. Then we saw Established connections until the probe was put back on and the connections on 443 went to FIN_WAIT_1 again.
I am fairly certain that the probe is not recognizing the new certificate, but we are not doing SSL on the ACE. It is just functioning as a load balancer.
Hrm, the HTTPS probe on the ACE is quite picky. I’ve had issues where health probes failed because the certificate had expired. I’d check the certificate to make sure it was installed correctly, and also check the date you have on the ACE (sometimes the clock can be really off, and that will affect both SSL and persistence cookies).