OpenFlow/SDN Won’t Scale?
December 17, 2013
I got into a conversation today on Twitter about SDN/SDF (software defined forwarding), a term I totally made up to describe the programmatic and centralized control of forwarding tables on switches and multi-layer switches. The comment was made that OpenFlow in particular won’t scale, which reminded me of an article by Doug Gourlay of Arista on scalability issues with OpenFlow.
Doug Gourlay’s argument is essentially that OpenFlow can’t keep up with the number of new flows in a network (check out points 2 and 3). In a given data center, there could be tens of thousands (or millions, or tens of millions) of individual flows running through the network at any given moment. And by flows, I mean individual stateful TCP connections or UDP pseudo-flows. The connection rate would also be pretty high if you’re talking dozens or hundreds of VMs, all taking in new connections.
My answer is that yeah, if you’re going to try to put the state of every TCP connection and UDP flow into the network operating system and into the forwarding tables of the devices, that’s not going to scale. I totally agree.
But why would you do that?
Why wouldn’t you, instead of keeping track of every flow, do destination-based forwarding table entries, which is how forwarding tables on switches are currently programmed? The network operating system/controller would learn (or be told about) the specific VMs and other devices within the data center. It could learn this through conversational learning, flooding, static entries (configured by hand), automated static entries (where an API programs them in, such as via connectivity to vCenter), or externally through traditional MAC flooding and routing protocols.
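To make that concrete, here’s a rough sketch in plain Python of what destination-based programming might look like. The endpoint inventory, switch names, and entry format are all invented for illustration; this isn’t any real controller’s API. The point is that the controller pushes one entry per destination, so table churn tracks endpoint changes, not connection setups.

```python
# Hypothetical sketch of destination-based programming; names and formats are invented.

# Endpoint inventory, learned via an API (e.g. vCenter), static config, or flooding.
endpoints = {
    "10.1.1.10": {"switch": "leaf1", "port": 12, "vlan": 100},
    "10.1.1.11": {"switch": "leaf1", "port": 13, "vlan": 100},
    "10.1.2.20": {"switch": "leaf2", "port": 5,  "vlan": 200},
}

def build_forwarding_tables(endpoints):
    """One entry per destination endpoint, not one per TCP/UDP flow."""
    tables = {}
    for ip, loc in endpoints.items():
        entry = {"match": {"ip_dst": ip}, "action": {"output": loc["port"]}}
        tables.setdefault(loc["switch"], []).append(entry)
    return tables

for switch, entries in build_forwarding_tables(endpoints).items():
    print(switch, entries)

# Whether 10.1.2.20 is handling 5 connections or 50,000, it still costs one entry.
```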
In that case, the rate of change in the forwarding tables would be relatively low, and not much different than how switches are currently programmed with Layer 3 routes and Layer 2 adjacencies via traditional methods. This would actually be more likely to shrink the size of the forwarding tables compared with traditional Ethernet/IP forwarding, as the controller could intelligently prune the forwarding tables of the switches rather than flood and learn every MAC address on every switch that has a given VLAN configured (similar to what TRILL/FabricPath/VCS do).
We don’t track every TCP/UDP flow in a data center with our traditional networking, and I can’t think of any value-add to keeping track of every flow in a given network, even if you could. So why would OpenFlow or any other type of SDF need to be any different? We’d have roughly the same size tables, with the added benefit of being able to include Layer 4, VXLAN, NVGRE, or even VLANs in the forwarding decisions.
I honestly don’t know if keeping track of every flow was the original concept with OpenFlow (I can’t imagine it would be, but there are a lot of gaps in my OpenFlow knowledge), but it seems an API that programs a forwarding table could do its job without keeping track of every gosh darn flow.
You don’t need an OpenFlow/SDN controller to automate networks. Cisco has its OnePK SDK, where you can program directly against the hardware of the switch. You can also use a Puppet master to automate servers and networks.
But these are still static networks. Yes, you can program them, but they are still static. SDN also has a “dynamic mode” where you can change the path of traffic flows on the fly (e.g., because of an increase in latency, you want to steer traffic away from that path). The best SDN controller out there only does 100,000 flows in “dynamic mode” (NEC’s SDN controller).
With Cisco ACI, things are different. Forwarding is done in hardware, but the policy is governed in the ACI controller. With ACI, by default no traffic is forwarded unless there is a policy that says so. It’s the combination of hardware and software that makes it scale.
Please check with Konrad Rzadzinski of Firefly. He has a nice course on SDN / OpenFlow / Cisco ACI.
Hi Peter,
Of course, you don’t need SDN to automate networks. This post wasn’t about automation, though; it was about forwarding, and about the claim that OpenFlow, or any programming of forwarding tables other than the traditional methods, doesn’t scale.
If you want to change flows dynamically, I would imagine it would be more efficient to do prefix matching and just go fairly deep into the headers. The Trident II chips in the new 40 Gbit switches from various vendors (Arista, Cisco’s ACI switches, etc.) can do LAG with load balancing based on a 16-bit hash (traditionally we’ve done 3-bit hashes, though some more recent switches can do 8). You wouldn’t need to track every individual flow, just do some prefix matching to move flows around.
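As a rough illustration (plain Python, invented numbers, not any switch’s actual hash algorithm): hashing the 5-tuple into buckets is what spreads flows across LAG/ECMP members, and “moving” traffic off a path is just remapping buckets to different links, with no per-flow state anywhere.

```python
import zlib

# Hypothetical 4-member LAG and a 16-bit hash space (65,536 buckets).
LAG_MEMBERS = ["eth1", "eth2", "eth3", "eth4"]
HASH_BITS = 16
bucket_to_link = {b: LAG_MEMBERS[b % len(LAG_MEMBERS)] for b in range(2 ** HASH_BITS)}

def pick_link(src_ip, dst_ip, src_port, dst_port, proto):
    """Hash the 5-tuple into a bucket; the bucket, not the flow, maps to a link."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    bucket = zlib.crc32(key) & (2 ** HASH_BITS - 1)
    return bucket, bucket_to_link[bucket]

print(pick_link("10.1.1.10", "10.1.2.20", 49152, 443, "tcp"))

# "Steering" traffic away from a congested member = remapping some buckets.
# No flow is tracked individually; its hash simply lands in a re-pointed bucket.
for b in range(0, 2 ** HASH_BITS, 4):
    bucket_to_link[b] = "eth4"
```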
The forwarding-programming model for ACI is similar to datacenter-style OpenFlow (such as NEC’s controller). Both OpenFlow and ACI have a controller with a view of the network, and both program forwarding tables. OpenFlow can also be “white-list” based (no connection unless it’s allowed). Cisco does take it several steps further, however, doing service chaining, VXLAN termination/encap, and some other really cool stuff at the packet level.
ACI also doesn’t keep track (on a state level) of every flow. It’s end-point based.
And I know Konrad, great instructor. We both teach (including ACI) at Firefly 🙂
-Tony
Thanks for helping shoot down the folks who are shooting down a naive strawman version of SDN, Tony. Just because the earliest (academic) uses of OpenFlow did reactive per-flow rule installation does not mean that people who use it in practice will do something so foolish.
Based on what (little) I know of OpenFlow, I don’t think Doug is referring to tracking the state of all these flows, but rather just setting them up in the first place.
Last I checked, OpenFlow “reactive” mode sends the first packet of *EVERY* flow up to the controller in order to set up the flow entry in the appropriate switch(es). How does this handle a major network recovery event (e.g., a large chunk of the network coming back online), when all traffic received becomes a “first packet” in a very short space of time and flows need to be pushed down to all involved switches?
I’m not sure flow tracking (in terms of state) was ever an objective of OpenFlow, but until hybrid mode was defined, all of the destination-based forwarding entries you’re describing were installed more like 5-tuple ACLs than traditional forwarding-table entries, and that definitely had implications for scaling.
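(For illustration, here is the contrast being described, sketched in plain Python with invented field names rather than actual OpenFlow structures: a reactive exact-match entry is needed per connection, while a destination-based entry wildcards everything except the destination, so one entry covers all flows to that host.)

```python
# Invented field names for illustration; these are not real OpenFlow messages.

# Reactive / exact-match style: one entry per 5-tuple, installed on first packet.
reactive_entry = {
    "match": {"ip_src": "10.1.1.10", "ip_dst": "10.1.2.20",
              "l4_src": 49152, "l4_dst": 443, "proto": "tcp"},
    "action": {"output": 5},
}

# Proactive / destination-based style: everything but the destination is wildcarded,
# so a single entry covers every flow headed to that host.
destination_entry = {
    "match": {"ip_dst": "10.1.2.20"},   # src, ports, protocol all wildcarded
    "action": {"output": 5},
}
```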
Why would the first packet hit the controller, though? I think that would be rare. The networks would be learned or statically defined, either via an API or manual configuration, through Layer 2 flooding, through traditional routing protocols, or even just a “default gateway.”
Perhaps a campus environment would be different, but I’m thinking of it from a DC perspective (and so is Arista, I would imagine).
If you’re installing 5-tuple forwarding rules in a TCAM (or equivalent) with a NOS, I would think you could scale a lot better than you would with traditional MAC learning/IP forwarding.
In a DC, Layer 2 segment sizes are limited by the number of (T)CAM entries on a DC switch. Since every switch needs to learn every MAC address for a given VLAN, that’s a lot of MAC addresses. If the location of each endpoint is known to the controller, and access is controlled by the controller, the forwarding tables of the individual switches can be pruned to only the relevant entries (much like how, say, a Nexus 7000 line card only gets TCAM entries for VLANs assigned to its ports).
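Here’s a rough sketch of the pruning idea in plain Python (the topology and entry format are invented, and this isn’t how any particular controller implements it): each switch only gets the entries it can actually use, instead of every switch flooding and learning every MAC in the VLAN.

```python
# Hypothetical topology: which VLANs are actually present on each switch.
switch_vlans = {
    "leaf1": {100, 200},
    "leaf2": {100},
    "leaf3": {300},
}

# Endpoints known to the controller (MAC -> VLAN), learned via an API or configured.
endpoints = {
    "00:50:56:aa:00:01": 100,
    "00:50:56:aa:00:02": 200,
    "00:50:56:aa:00:03": 300,
}

def pruned_tables(switch_vlans, endpoints):
    """Install a MAC entry on a switch only if that switch actually carries the VLAN."""
    tables = {sw: [] for sw in switch_vlans}
    for mac, vlan in endpoints.items():
        for sw, vlans in switch_vlans.items():
            if vlan in vlans:
                tables[sw].append((vlan, mac))
    return tables

for sw, entries in pruned_tables(switch_vlans, endpoints).items():
    print(sw, entries)   # leaf2 never carries the VLAN 200/300 entries, and so on.
```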
Learning of new endpoints should be pretty rare, and changes easy to orchestrate (vMotion, host moves, etc.).