Once again collocated with SIGCOMM is the HotSDN conference. I'm here live blogging the proceedings, SDN controllers permitting....
Session 1 - Controller Architecture
The Beacon OpenFlow Controller
- Back in the day, the state of the art was NOX
- Python - inconsistent API
- C++ - STL (strange errors), Linux
- Decided to build a controller
- Decided to use Java because it's fast and cross platform.
- Decided against C# because it's not x-platform
- Decided against Python because it's single threaded? [ed: huh?]
- SDN controller ~= operating system [ed: interesting insight]
- Core module, with lots of modules above it.
- Run-to-completion threading: one thread per core, and each switch pinned to a single thread (see the sketch after the Q&A).
- Beacon scales up to about 13M responses per second on 12 cores, much better than the other controllers tested.
- Floodlight is an open source version of Beacon.
Q: Why is it fast? A: Limited by programming languages and I/O interfaces.
Q: The metric you're using, what does it tell you about apps? A: It's the lowest bar - the base overhead that your controller adds.
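A rough sketch of that threading model as I understood it (a Python stand-in for illustration only; Beacon itself is Java, and Worker, handle and worker_for are my own hypothetical names):

```python
import queue
import threading

NUM_WORKERS = 12  # one event-loop thread per core, as in the talk

class Worker(threading.Thread):
    """Run-to-completion event loop; each switch is pinned to exactly one worker."""
    def __init__(self):
        super().__init__(daemon=True)
        self.events = queue.Queue()

    def run(self):
        while True:
            switch, msg = self.events.get()
            switch.handle(msg)  # no locks needed: this switch is never served elsewhere

workers = [Worker() for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

def worker_for(dpid):
    # Hypothetical pinning policy: hash the switch's datapath id onto a worker.
    return workers[dpid % NUM_WORKERS]
```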
Exploiting Locality in Distributed SDN Control
- Controllers don't dictate the network; you have a distributed system.
- 2 Dimensions -
- There is a spectrum. Fully centralised through to totally distributed.
- Visibility of the network: from fully centralised to totally distributed.
- How to control a network if we have a local view only?
- Local view means we can scale easily.
- Load balancing == Link assignment "semi-matching problem"
- Loop free forwarding == Spanning tree
- Both tasks are trivial if you have a global view, not trivial with a local view.
- Local control - we try to minimise interactions between controllers.
- Take home 1 - Go for local approximations - Semi-matching is global, but you can do it almost optimally.
- Take home 2 - Verification is easier than computation - verification can be made local
- Take home 3 - Not purely local, pre-processing can help.
- Take home > 3 - Make sure that your controller graph has a low degree.
Towards an Elastic Distributed SDN Controller
- We'd like a distributed control plane, but we don't want to overload controllers.
- So, we want dynamic load distribution, and elastic control expansion/shrinking.
- This requires load estimation at the controllers, and a switch migration protocol.
- This paper focuses on a switch migration protocol.
- Naive switch migration - Master + State. Migration = swap master and slave roles.
- Pretty simple, just send a change role message.
- But you can drop packets if you migrate halfway through message exchanges.
- 4-phase migration protocol (see the sketch after this talk's notes).
- Change the new controller from slave to equal: both now receive messages, but only one responds.
- New master adds a dummy entry and removes it. This creates a message that is sent to both controllers. Sync point.
- Old master flushes out any old messages.
- Equal now switches to master.
- Evaluated with Mininet.
- Cannot generate enough traffic on a single host to test this.
- So we enhanced Mininet to be a distributed emulated network using GRE tunnels.
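To make the four phases concrete, here is my pseudo-Python reading of the protocol (migrate, send_role_request, insert_dummy_flow, etc. are hypothetical names, not the paper's API; only the ordering matters):

```python
def migrate(switch, old_master, new_master):
    # Phase 1: the new controller moves from SLAVE to EQUAL. Both
    # controllers now receive the switch's messages, but only the
    # old master responds to them.
    new_master.send_role_request(switch, role="EQUAL")

    # Phase 2: the new master inserts and immediately removes a dummy
    # flow entry. The resulting flow-removed message is delivered to
    # both controllers and acts as a synchronisation point.
    new_master.insert_dummy_flow(switch)
    new_master.remove_dummy_flow(switch)

    # Phase 3: the old master flushes any messages that arrived before
    # the sync point, then demotes itself to SLAVE.
    old_master.flush_pending_messages(switch)
    old_master.send_role_request(switch, role="SLAVE")

    # Phase 4: the controller in EQUAL mode promotes itself to MASTER.
    new_master.send_role_request(switch, role="MASTER")
```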
Cementing High Availability in OpenFlow with RuleBricks
- We need to plan for failure in our SDN networks.
- How can high availability policies be added to OpenFlow's forwarding rules?
- We want a backup plan in case something fails.
- Exploits two features
- Hierarchical structure of wildcards
- Rule precedence extraction
- If you look at matching as blocks (bricks) instead of a binary tree - each brick represents some part of the matching space.
- We colour the bricks depending on which flows they belong to.
- Dropping bricks down as new replicas come online.
- Insert - slide a brick in underneath other bricks (see the sketch after the Q&A).
- Reduce transforms - Fragment, defragment, duplicate.
- Built a greedy algorithm that frags/defrags bricks.
- 1. Fragment, 2. Deduplicate, 3. Defragment.
- What can you do with it?
- 1 brick to cover the entire space.
- Add a replica, another rule for the whole space, plus an active rule.
- Implemented RuleBricks in Python.
- How does it affect the rule list size? Not too bad.
- Take away: failure is elasticity's evil twin.
Q: Do you end up with a fragmented mess in the end? A: We don't just drop from the top. The reduction operations help to defragment. No proof of optimality (yet).
Q: How do you detect failure? A: We don't.
Q: How do you detect host failure? A: We assume that you're running in a VM.
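Here is a toy model of the brick stack as I understood it (Brick, insert_under and lookup are my own names, and a 1-D integer match space stands in for the real hierarchical wildcard space):

```python
class Brick:
    """A brick covers a slice of the match space and is coloured by the
    replica it sends traffic to."""
    def __init__(self, lo, hi, colour):
        self.lo, self.hi, self.colour = lo, hi, colour

def insert_under(stack, brick):
    # "Slide a brick in underneath": appended bricks have lower precedence,
    # so they only match where nothing above them covers the space.
    stack.append(brick)

def lookup(stack, point):
    # The highest-precedence (earliest) brick covering the point wins,
    # mirroring OpenFlow rule priorities.
    for brick in stack:
        if brick.lo <= point <= brick.hi:
            return brick.colour
    return None

# One brick covering the entire space, then a backup replica slid in
# underneath; it takes over only if the bricks above it are removed.
stack = []
insert_under(stack, Brick(0, 255, "replica-1"))  # active rule
insert_under(stack, Brick(0, 255, "replica-2"))  # backup for the whole space
assert lookup(stack, 42) == "replica-1"
```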
Session 2 - Testing, Simulation, and Debugging
Leveraging SDN Layering to Systematically Troubleshoot Networks
- How do you automate troubleshooting?
- Know the intended policy
- Check behaviour against policy
- Meeting these requirements in a traditional network is hard.
- But in SDN??
- Key insight: Modularity allows us to check the behaviour of any given layer. Any bug will manifest as a mistranslation between layers.
- Allows a systematic approach.
- Two hosts can't communicate.
- Check the policy to make sure the routing policy is OK (Anteater).
- Use "soft" to figure out if switches are implementing rules correctly. Then we know there is a firmware bug.
- So, what has changed?
- We no longer need to understand all of the protocols and all of their interactions. Tools can do that for us.
- Automated troubleshooting appears to be possible.
- Thinking about troubleshooting in terms of tools and layers.
- Plenty of opportunities left.
High-Fidelity Switch Models for Software-Defined Network Emulation
- Suppose you want to buy an SDN switch. Which one should you buy?
- Different switches have different performance.
- How do you predict the performance of 1000 switches given just 1?
- Emulators/simulators are great, but they aren't good enough for the control plane.
- Open vSwitch doesn't model the data plane.
- We want an emulator. But what are the differences?
- Flow table size, flow table management policy, different CPUs in different switches.
- Just going to model switch CPU speed for this talk.
- In a flow setup stage, the flow is handled by the CPU. Which may be wimpy.
- Ingress, egress delays matter.
- Measure ingress/egress delay.
- OVS is really fast. Ingress/egress delay is almost zero - so we can slow down OVS to match a real switch (see the sketch below).
- Emulated performance is a pretty reasonable approximation.
- Not quite the same because we only capture ingress/egress delays.
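I imagine the slowdown amounts to something like this (my own sketch: the constants are placeholders for the measured hardware delays, and the delayed wrapper is hypothetical, not the authors' code):

```python
import time

INGRESS_DELAY = 0.002  # seconds; placeholder for the measured ingress delay
EGRESS_DELAY = 0.004   # seconds; placeholder for the measured egress delay

def delayed(handler):
    """Wrap a control-path handler so the (near-zero-latency) emulated
    switch exhibits the hardware switch's ingress/egress delays."""
    def wrapper(msg):
        time.sleep(INGRESS_DELAY)  # model the wimpy switch CPU on the way in
        result = handler(msg)
        time.sleep(EGRESS_DELAY)   # and on the way out
        return result
    return wrapper
```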
Fast, Accurate Simulation for SDN Prototyping
- Prototyping and debugging SDN is hard.
- The main challenge is the 'S' in SDN: writing good software is hard.
- Goal: develop an SDN simulation capability that complements existing development and debugging tools.
- fs - Accurate to 1 second time scales. FS-SDN uses the POX controller.
- Monkey patching to get POX and FS to talk to each other.
- Results: Significant speed up. 900 second experiments down to 6 - 72 seconds.
- Code available on github
EtherPIPE: an Ethernet character device for network scripting
- We provide a character device that allows packets to be written and read.
- Two interfaces: ASCII and binary formats.
- Can use standard linux text tools for creating and capturing packets.
- "optimized" interface with binary tools.
- Uses LatticeECP3 FPGA NIC.
- 1.4M pps with the character device at 1G == line rate.
Session 3 - Improved Abstractions and State Management
Protocol-Oblivious Forwarding: Unleash the Power of SDN through a Future-Proof Forwarding Plane
- We want to make switches oblivious to the SDN protocol used.
- SDN should operate like a PC. With generic instructions.
- It can be done efficiently with microcode.
- We built a prototype.
- We need to standardise the instruction set.
Q: How do you handle diversity in the instruction sets? Vendors always add proprietary extensions!
Incremental Consistent Updates
- How do you update policy in the controllers consistently?
- Past work: Packet sees either old policy or new policy, so you keep both.
- But this incurs 100% overhead, and rule space is expensive. Can you do an incremental update?
- Trade space for time.
- Update the policy slice by slice.
- How do you compute a slice?
- Compute using symbolic execution.
- Update slices like garbage collection. Use reference counting to make sure that packets see only one version of the rules (sketched below).
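My reading of the reference-counting idea, sketched in Python (all names are hypothetical and delete_rules is a stub):

```python
from collections import defaultdict

refcounts = defaultdict(int)  # rule version -> packets still in flight
retired = set()               # versions superseded by the update

def on_packet_enter(version):
    refcounts[version] += 1

def on_packet_exit(version):
    refcounts[version] -= 1
    # Garbage collection: once no in-flight packet can still match the
    # old version, its rules can be safely removed from the switches.
    if version in retired and refcounts[version] == 0:
        delete_rules(version)

def delete_rules(version):
    print(f"removing rules for version {version}")  # placeholder
```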
HotSwap: Correct and Efficient Controller Upgrades for Software-Defined Networks
- How do you upgrade your SDN controllers in a way that is disruption-free and correct?
- SDN controllers must be upgraded to:
- fix bugs
- new services
- improve performance.
- Popular OpenFlow controllers have had many releases (3-33) and thousands of patches.
- We upgrade controllers by rebooting. This risks:
- network failure
- rule timeout
- packet loss
- The new controller wipes out the forwarding table, so there are no remaining rules.
- Is it really a problem?
- Simulated using Mininet. Up to 83% of packets lost.
- Consider a controller running a stateful firewall.
- When a controller starts, you can block allowed traffic, or allow forbidden traffic.
- HotSwap is a hypervisor - it warms up the upgraded controller before handing over to it (sketched below).
- Record stage. - Abstract view of network state
- Replay - plays the network events over to the new controller.
- Compute the delta between the rules.
- Push the rules out.
- Return to 1
- Scalability - recording everything doesn't scale, but we only need some subset of the events.
- Correctness - we want version 2 to behave as if it had been the controller all along.
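The record/replay loop, as I picture it (a sketch with hypothetical names; rules() is assumed to return the set of installed rules):

```python
class HotSwapHypervisor:
    def __init__(self, controller):
        self.active = controller
        self.log = []  # abstract view of network events

    def handle(self, event):
        self.log.append(event)             # record stage
        return self.active.handle(event)

    def upgrade(self, new_controller):
        for event in self.log:             # replay stage: warm up the new controller
            new_controller.handle(event)
        delta = diff(self.active.rules(), new_controller.rules())
        push_to_switches(delta)            # push only the rule delta
        self.active = new_controller       # hand over, then return to recording

def diff(old_rules, new_rules):
    return {"remove": old_rules - new_rules, "add": new_rules - old_rules}

def push_to_switches(delta):
    print("pushing", delta)  # placeholder for the real OpenFlow messages
```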
CAP for Networks
- CAP theorem: correctness or availability; you can't have both.
- E.g., SQL servers pick correctness; key-value stores pick availability.
- What about networks? With the move to SDN, we wanted richer functions.
- Control plane partitions no longer imply data plane partitions.
- Data plane connected --> Delivers packets, but not necessarily correct.
- Can one provide correct isolation and availability in the presence of link failures?
- Only considers out-of-band control networks.
- You cannot, in general, provide both isolation and availability in the presence of failure.
- You must pick one or the other.
Session 4 - Theoretical Foundations
Software Transactional Networking: Concurrent and Consistent Policy Composition
- How does the design of the distributed control plane affect replication?
- Can we realise the benefits of software transactional memory (STM) in the network? "Software Transactional Networking" (STN)
FatTire: Declarative Fault Tolerance for Software-Defined Networks
- OpenFlow 1.1+ -> fast failover.
- Frenetic provides a declarative language for expressing forwarding policies.
- But doesn't know how to cope if there is a failure because it's hop-by-hop
- FatTire -> Lifts the abstraction up from hop-by-hop to full path forwarding.
- Write programs in terms of regular expressions
- Use annotations to define fault-tolerance conditions.
- Combine policies with intersections and unions.
Resource/Accuracy Tradeoffs in Software-Defined Measurement
- Lots of good reasons to do measurement
- Management policies often want measurement.
- Accounting for pricing also wants it
- As does troubleshooting
- Would like to use SDN to do measurement.
- Limited resources CPU/memory in switches.
- Limited control network bandwidth.
- Different primitives used in switches.
- Different measurements require different resources.
- E.g.: Monitor 10.0.0.0/24; when traffic exceeds a threshold, figure out where it's from.
- Use sketches built from hash-based counters (see the example below).
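The speaker didn't show code, but a count-min sketch is the textbook example of such a hash-based counter structure, and it makes the resource/accuracy trade-off concrete: width and depth are the memory you spend, and estimation error shrinks as they grow.

```python
import random

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.tables = [[0] * width for _ in range(depth)]
        self.seeds = [random.randrange(1 << 32) for _ in range(depth)]

    def add(self, key, count=1):
        # Each row hashes the key with a different seed and bumps one counter.
        for row, seed in enumerate(self.seeds):
            self.tables[row][hash((seed, key)) % self.width] += count

    def estimate(self, key):
        # Never under-estimates; collisions only inflate counters, so the
        # minimum across rows is the tightest available bound.
        return min(self.tables[row][hash((seed, key)) % self.width]
                   for row, seed in enumerate(self.seeds))
```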
Towards Secure and Dependable Software-Defined Networks
- SDN allows us to program the network. Cool!
- But bad guys can do it too.
- 7 new threat vectors.
- Tools we can use:
- Diversity -> making it hard to find vulnerabilities.
- Autonomic trust between controllers and devices, and between apps and controllers.
- Security domains (e.g. rings in OSs; e.g. FortNOX)
- Use a collection of different controllers, with a common API. Diversity is strength.
- Main message: SDN increases the threat surface.
Q: IOS already has attack vectors. Multi-chassis machines already have controllers. Where is the new attack vector? A: ??
A Balance of Power: Expressive, Analyzable Controller Programming
- When you program the switch you can use any language.
- But the data plane expects OpenFlow.
- Programming in OpenFlow is a pain.
- So, we can use higher level languages. (Frenetic, Pyretic etc)
- Abstractions aren't quite right.
- Want a hash table, or a set or a graph.
- In reality, these high level languages are wrong: half lives on the controller, half in the data plane.
- When you use a bug finding tool, it finds bugs in the flow tables, but you want to relate them back to the control plane programming language.
- Missing: Platform for programming and verification of control and data plane.
- FlowLog: A limited and restricted language. Unified for doing both control and data plane code.
- Based on Prolog: no recursion support, no arithmetic.
- Cannot express a shortest-path algorithm in FlowLog.
OF.CPP: Consistent Packet Processing for OpenFlow
- I'm not really following this talk.
Session 5 - Dataplane and Wireless Techniques
FlowTags: Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions
- Add flow tags to packets to bridge gaps across middleboxes.
- Middleboxes produce and consume tags (sketched below).
- FlowTags make flow context visible.
- Minimal modifications to middleboxes.
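A toy illustration of the produce/consume pattern (dict-based packets and TAG_TABLE are my own stand-ins, not the FlowTags API):

```python
TAG_TABLE = {}  # tag -> original flow context; conceptually shared via the controller

def nat_produce_tag(pkt, original_src):
    # A NAT middlebox rewrites the source address; the tag preserves the
    # pre-NAT context so downstream policy can still be applied.
    tag = len(TAG_TABLE) + 1
    TAG_TABLE[tag] = {"orig_src": original_src}
    pkt["flow_tag"] = tag
    return pkt

def firewall_consume_tag(pkt):
    ctx = TAG_TABLE.get(pkt.get("flow_tag"), {})
    return ctx.get("orig_src")  # enforce policy on the original source
```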
SoftRAN: Software Defined Radio Access Network
- Radio access network: high capacity wireless connectivity.
- Increasing demand for wireless resources.
- Solution: dense deployment, but then base stations can cross paths (interfere).
- In dense deployments higher frequency of handovers.
- We need tightly co-ordinated management.
- Currently limited coordination between base stations.
- SoftRAN - Big BaseStation Abstraction.
- Abstract capacity as a single large 3D capacity structure.
- Logically centralised management.
The FlowAdapter: Enable Flexible Multi-Table Processing on Legacy Hardware
- Goal is to address the mismatch between the controller's multi-table pipeline and legacy hardware.
- I can't follow this talk, and I don't understand the abstract. :-(
Cheap Silicon: a Myth or Reality? Picking the Right Data Plane Hardware for Software Defined Networking
- Traditional landscape: if we reduce programmability, we increase performance.
- NPUs ~4-5W / 10G
- CPUs ~25W / 10G
- Switches ~0.5W / 10G
- But this is not apples to apples.
- We implement a switching load. (on paper)
- CPU packets-per-second-per-watt isn't that bad by comparison. And maximum packet rate is quite good.
- Performance depends mostly on the use case.
- NPU is probably the right model. Proof by prototyping.
Open Transport Switch - A Software Defined Networking Architecture for Transport Networks
- What about the WAN?
- There's a lot of complexity in the WAN.
- Circuit oriented.
- Static pipes/configuration