Liveblog from SIGCOMM13 – HotSDN Workshop

Once again co-located with SIGCOMM is the HotSDN workshop. I'm here live blogging the proceedings, SDN controllers permitting...

Session 1 - Controller Architecture

The Beacon OpenFlow Controller


  • Back in the day, the state of the art was NOX
    • Python - inconsistent API
    • C++ - STL (strange errors), Linux
  • Decided to build a controller
    • Decided to use Java because it's fast and cross-platform.
    • Decided against C# because it's not cross-platform.
    • Decided against Python because it's single-threaded? [ed huh?]
  • SDN controller ~= operating system [ed: interesting insight] 
  • Beacon:
    • Core module, with lots of modules above it.
    • Single-threaded event processing: 1 thread per core, with each switch handled by 1 thread.
  • Beacon scales up to about 13M responses per second on 12 cores, much better than anyone else.
  • Floodlight is an open source controller forked from Beacon.
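To make the "controller ~= operating system" analogy concrete, here is a minimal Python sketch of a core that passes each event along a chain of modules (any module can stop the chain) and maps every switch to exactly one worker. The names `Core`, `CONTINUE`, `STOP` and `worker_for` are hypothetical, and this is a simplification, not Beacon's actual Java API.

```python
from typing import Callable, List

CONTINUE, STOP = "continue", "stop"

# A module is a callable taking (switch_id, event) and returning CONTINUE or STOP.
Module = Callable[[int, str], str]


class Core:
    """Hypothetical controller core: modules register above it and see events in order."""

    def __init__(self, num_workers: int) -> None:
        self.modules: List[Module] = []
        self.num_workers = num_workers

    def register(self, module: Module) -> None:
        self.modules.append(module)

    def worker_for(self, switch_id: int) -> int:
        # Each switch maps to exactly one worker, so per-switch state is only
        # touched by that worker (a Beacon-style design runs one worker thread per core).
        return switch_id % self.num_workers

    def dispatch(self, switch_id: int, event: str) -> List[str]:
        """Run the module chain for one event; return the names of modules that saw it."""
        seen = []
        for module in self.modules:
            seen.append(module.__name__)
            if module(switch_id, event) == STOP:
                break
        return seen


def firewall(switch_id: int, event: str) -> str:
    return STOP if event == "packet_in:blocked" else CONTINUE


def learning_switch(switch_id: int, event: str) -> str:
    return CONTINUE
```

With `firewall` registered before `learning_switch`, a blocked packet-in never reaches the learning switch, which is the kind of layering an OS-style core provides.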



Q: Why is it fast? A: Limited by programming languages, and I/O interfaces.

Q: The metric you're using, what does it tell you about apps? A: It's the lowest bar: it measures the overhead of the controller itself.


Exploiting Locality in Distributed SDN Control


  • Controllers don't dictate the whole network alone: you have a distributed system.
  • 2 Dimensions -
    • Control: a spectrum from fully centralised through to totally distributed.
    • Visibility of the network: from a global view through to a purely local view.
  • How to control a network if we have a local view only?
  • Local view means we can scale easily.
  • Applications
    • Load balancing == link assignment, a "semi-matching" problem
    • Loop-free forwarding == spanning tree
  • Both tasks are trivial if you have a global view, not trivial with a local view.
  • Local control - we try to minimise interactions between controllers.
  • Take home 1 - Go for local approximations: semi-matching is a global problem, but you can approximate it almost optimally with local algorithms.
  • Take home 2 - Verification is easier than computation: verification can be made local.
  • Take home 3 - Not purely local: pre-processing can help.
  • Take home > 3 - Make sure that your controller graph has a low degree.
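Take home 2 can be illustrated with a classic locally checkable labelling (a standard distributed-computing technique, not necessarily the one in the paper). Each node stores a parent pointer and a hop distance to the root, and checks only its own label and its neighbours' labels. If every node's check passes, distances strictly decrease along parent pointers, so the pointers form a loop-free spanning tree. The sketch assumes a connected graph and that each node knows whether it is the root; all names are hypothetical.

```python
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]


def local_check(v: str, graph: Graph, parent: Dict[str, Optional[str]],
                dist: Dict[str, int], root: str) -> bool:
    """Runs at node v, reading only v's label and its neighbours' labels."""
    if dist[v] < 0:
        return False
    if v == root:
        return parent[v] is None and dist[v] == 0
    p = parent[v]
    # The parent must be a direct neighbour exactly one hop closer to the root.
    return p is not None and p in graph[v] and dist[v] == dist[p] + 1


def verify_tree(graph: Graph, parent: Dict[str, Optional[str]],
                dist: Dict[str, int], root: str) -> bool:
    # Every node checks itself independently; the labelling is accepted iff all pass.
    return all(local_check(v, graph, parent, dist, root) for v in graph)
```

A forwarding loop such as `b -> c -> b` cannot pass, because both nodes would need a smaller distance than the other.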


Towards an Elastic Distributed SDN Controller

  • We'd like a distributed control plane, but we don't want to overload controllers.
  • So, we want dynamic load distribution, and elastic control expansion/shrinking.
  • This requires load estimation at the controllers and a switch migration protocol.
  • This paper focuses on a switch migration protocol.
  • Naive switch migration: each switch has a master controller (plus its state) and a slave. Migration = swap the master and slave roles.
  • Pretty simple: just send a role-change message.
  • But you can drop packets if you switch roles halfway through a message exchange.
  • 4-phase migration protocol:
    1. The new controller changes from slave to equal: both now receive messages, but only the old master responds.
    2. The new master adds a dummy flow entry and then removes it. The resulting message is sent to both controllers and acts as a sync point.
    3. The old master flushes out any old messages.
    4. The new controller switches from equal to master.
  • Evaluated with Mininet.
  • Cannot generate enough traffic on a single host to test this.
  • So we extended Mininet into a distributed emulated network using GRE tunnels.
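The four phases can be sketched as a small Python simulation of OpenFlow controller roles. It assumes, as in OpenFlow, that a switch sends asynchronous messages (such as packet-ins) only to controllers in the master or equal role, and that promoting one controller to master demotes the old master to slave. The dummy flow add/remove is modelled simply as a marker message delivered to both controllers; all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

MASTER, SLAVE, EQUAL = "master", "slave", "equal"
MARKER = "FLOW_REMOVED(dummy)"   # stands in for the dummy flow add/remove


@dataclass
class Controller:
    name: str
    role: str
    inbox: List[str] = field(default_factory=list)
    handled: List[str] = field(default_factory=list)


@dataclass
class Switch:
    controllers: List[Controller]

    def send_async(self, msg: str) -> None:
        # Slaves do not receive asynchronous messages such as packet-ins.
        for c in self.controllers:
            if c.role in (MASTER, EQUAL):
                c.inbox.append(msg)

    def set_role(self, ctrl: Controller, role: str) -> None:
        if role == MASTER:
            # Promoting a controller to master demotes the existing master.
            for c in self.controllers:
                if c.role == MASTER:
                    c.role = SLAVE
        ctrl.role = role


def migrate(switch: Switch, old: Controller, new: Controller,
            traffic_before: List[str], traffic_during: List[str]) -> None:
    # Phase 1: new controller goes slave -> equal; both now see messages.
    switch.set_role(new, EQUAL)
    for m in traffic_before:
        switch.send_async(m)
    # Phase 2: the marker reaches both controllers and acts as the sync point.
    switch.send_async(MARKER)
    for m in traffic_during:
        switch.send_async(m)
    # Phase 3: old master handles everything before the marker, new one after it.
    idx_old, idx_new = old.inbox.index(MARKER), new.inbox.index(MARKER)
    old.handled.extend(old.inbox[:idx_old])
    new.handled.extend(new.inbox[idx_new + 1:])
    # Phase 4: new controller goes equal -> master (old master becomes slave).
    switch.set_role(new, MASTER)
```

Every message is handled by exactly one controller, which is the property the naive role swap cannot guarantee.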

Cementing High Availability in OpenFlow with RuleBricks

  • We need to plan for failure in our SDN networks. 
  • How can high availability policies be added to OpenFlow's forwarding rules?
  • We want a backup plan in case something fails.
  • Exploits two features:
    • Hierarchical structure of wildcards
    • Precedence of rules
  • If you look at matching as blocks (bricks) instead of a binary tree, each brick represents some part of the matching space.
  • We colour the bricks depending on which flows they belong to.
    1. Drop - drop bricks down as new replicas come online.
    2. Insert - slide a brick in underneath other bricks.
    3. Reduce - transforms: fragment, defragment, deduplicate.
  • Built a greedy algorithm that fragments/defragments bricks:
  • 1. Fragment, 2. deduplicate, 3. defragment.
  • What can you do with it?
  • Eg:
  • 1 brick to cover the entire space.
  • Add a replica, another rule for the whole space, plus an active rule.
  • Implemented RuleBricks in Python.
    • How does it affect the rule list size? Not too bad.
  • Take-away: failure is elasticity's evil twin.
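A toy Python model of the brick idea: bricks are wildcard prefixes over a small binary match space, later bricks sit on top of earlier ones (standing in for OpenFlow rule priority), and a backup brick can be slid underneath. This is a hypothetical simplification for illustration, not the RuleBricks implementation.

```python
from typing import List, Optional, Tuple

# A brick covers every address starting with `prefix` and sends it to `replica`.
Brick = Tuple[str, str]


def lookup(bricks: List[Brick], addr: str) -> Optional[str]:
    # Bricks later in the list sit on top, like higher-priority rules.
    for prefix, replica in reversed(bricks):
        if addr.startswith(prefix):
            return replica
    return None


def drop(bricks: List[Brick], prefix: str, replica: str) -> List[Brick]:
    """Drop a brick on top, e.g. when a new replica comes online."""
    return bricks + [(prefix, replica)]


def insert_under(bricks: List[Brick], prefix: str, replica: str) -> List[Brick]:
    """Slide a brick underneath everything else, e.g. a backup rule."""
    return [(prefix, replica)] + bricks


def fail(bricks: List[Brick], replica: str) -> List[Brick]:
    """Remove a failed replica's bricks, exposing whatever lies beneath them."""
    return [b for b in bricks if b[1] != replica]
```

Failure handling is just removal of bricks, the mirror image of dropping them in as replicas come online.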



Q: Do you end up with a mess in the end (fragmentation)? A: We don't just drop from the top. The reduction operations help to defragment. No proof of optimality (yet).

Q: How do you detect failure? A: We don't.

Q: How do you detect host failure? A: We assume that you're running in a VM.



Session 2 - Testing, Simulation, and Debugging

Leveraging SDN Layering to Systematically Troubleshoot Networks


  • How do you automate troubleshooting?
  • Requirements:
    1. Know the intended policy
    2. Check behaviour against policy
  • Meeting these requirements in a traditional network is hard.
  • But in SDN?
  • Key insight: Modularity allows us to check the behaviour of any given layer. Any bug will manifest as a mistranslation between layers.
  • Allows a systematic approach.
  • Eg:
    • Two hosts can't communicate.
    • Check the policy to make sure the routing policy is OK (Anteater).
    • Use SOFT to figure out whether switches implement rules correctly; if they don't, we know there is a firmware bug.
  •  So, what has changed? 
    • We no longer need to understand all of the protocols and all of their interactions. Tools can do that for us.
    • Automated troubleshooting appears to be possible.
  • Thinking about troubleshooting in terms of tools and layers.
  • Plenty of opportunities left.
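The "any bug is a mistranslation between layers" idea suggests a simple search: compare each layer's view (here, the set of host pairs allowed to communicate) with the layer above, and report the first pair of layers that disagree. A minimal Python sketch; the layer names and the reachability-set representation are assumptions for illustration.

```python
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]
Layer = Tuple[str, Set[Pair]]   # (layer name, host pairs that layer lets communicate)


def first_mistranslation(layers: List[Layer]) -> Optional[Tuple[str, str, Set[Pair]]]:
    """Walk down the stack; return (upper, lower, differing pairs) at the first mismatch."""
    for (upper_name, upper), (lower_name, lower) in zip(layers, layers[1:]):
        diff = upper ^ lower     # pairs present in one layer but not the other
        if diff:
            return upper_name, lower_name, diff
    return None
```

If the mismatch shows up between the installed device state and the hardware's actual behaviour, that points at the switch itself, the case a tool like SOFT targets.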

High-Fidelity Switch Models for Software-Defined Network Emulation


  • Suppose you want to buy an SDN switch. Which one should you buy?
  • Different switches have different performance.
  • How do you predict the performance of 1000 switches given just 1?
  • Emulators/simulators are great, but they aren't good enough for the control plane.
  • Open vSwitch doesn't model a real switch's data plane.
  • We want an emulator. But what are the differences?
  • Flow table size, flow table management policy, different CPUs in different switches.
  • Just going to model switch CPU speed for this talk.
  • During flow setup, the flow is handled by the switch CPU, which may be wimpy.
  • Ingress, egress delays matter.
  • Measure ingress/egress delay.
  • OVS is really fast: its ingress/egress delay is almost zero, so we can slow OVS down to match a real switch.
  • Emulated performance is a pretty reasonable approximation.
  • Not quite the same because we only capture ingress/egress delays.
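The calibration step amounts to simple arithmetic: measure a hardware switch's ingress and egress delays during flow setup, subtract what OVS already adds, and inject the difference. A Python sketch; the function names, the additive flow-setup model, and the example numbers are assumptions for illustration, not measurements from the talk.

```python
def injected_delay_ms(hw_ms: float, ovs_ms: float) -> float:
    """Extra delay to add to OVS so it matches the hardware (OVS can only be slowed)."""
    return max(0.0, hw_ms - ovs_ms)


def emulated_flow_setup_ms(ovs_ingress_ms: float, ovs_egress_ms: float,
                           hw_ingress_ms: float, hw_egress_ms: float,
                           controller_rtt_ms: float) -> float:
    """Assumed model: flow setup = ingress (packet-in) + controller round trip + egress (flow-mod)."""
    ingress = ovs_ingress_ms + injected_delay_ms(hw_ingress_ms, ovs_ingress_ms)
    egress = ovs_egress_ms + injected_delay_ms(hw_egress_ms, ovs_egress_ms)
    return ingress + controller_rtt_ms + egress
```

For example, with OVS at 0.25 ms each way and a hypothetical hardware switch at 2 ms ingress and 5 ms egress, a 1 ms controller round trip gives an emulated flow setup of about 8 ms.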

Fast, Accurate Simulation for SDN Prototyping

  • Prototyping and debugging SDN is hard. 
  • The main challenge is the S (software): writing good software is hard.
  • Goal: develop an SDN simulation capability that complements existing development and debugging tools.
  • fs is accurate at 1-second time scales. fs-sdn uses the POX controller.
  • Monkey patching gets POX and fs to talk to each other.
  • Results: significant speed-up. 900-second experiments run in 6-72 seconds.
  • Code available on GitHub.
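Monkey patching here means replacing a method at runtime so that a controller's outgoing messages go to the simulator instead of a real socket. A generic Python illustration; `Connection`, `send`, `SimulatorLink` and `simulated_send` are hypothetical stand-ins, not POX's or fs-sdn's actual classes.

```python
from typing import List


class Connection:
    """Stand-in for a controller's switch connection that normally writes to a socket."""

    def __init__(self, dpid: int) -> None:
        self.dpid = dpid

    def send(self, msg: str) -> None:
        raise RuntimeError("would write to a real TCP socket")


class SimulatorLink:
    """Collects controller-to-switch messages inside the simulator process."""

    def __init__(self) -> None:
        self.delivered: List[tuple] = []


sim = SimulatorLink()


def simulated_send(self: Connection, msg: str) -> None:
    # Same signature as Connection.send, but hands the message to the simulator.
    sim.delivered.append((self.dpid, msg))


# The monkey patch: every Connection instance now uses simulated_send.
Connection.send = simulated_send
```

The controller code itself is untouched; only the transport is swapped, which is why the technique keeps the controller unmodified.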


EtherPIPE: an Ethernet character device for network scripting


  • EtherPIPE provides a character device that allows packets to be read and written.
  • Two interfaces: ASCII and binary formats.
  • Can use standard Linux text tools for creating and capturing packets.
  • An "optimized" binary interface for use with binary tools.
  • Uses a LatticeECP3 FPGA NIC.
  • 1.4M pps with character packets at 1G == line rate.
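Because the ASCII interface turns frames into text, packet crafting becomes string manipulation. The Python sketch below converts a frame to and from one line of space-separated hex fields (destination MAC, source MAC, EtherType, payload). That line layout is an assumption for illustration and may differ from EtherPIPE's exact format.

```python
def frame_to_line(frame: bytes) -> str:
    """Render an Ethernet frame as 'DST SRC TYPE PAYLOAD' in upper-case hex."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    fields = (frame[0:6], frame[6:12], frame[12:14], frame[14:])
    return " ".join(f.hex().upper() for f in fields if f)


def line_to_frame(line: str) -> bytes:
    """Parse a line produced by frame_to_line, or written by hand with text tools."""
    return bytes.fromhex("".join(line.split()))
```

Lines in this form can be generated or filtered with ordinary text tools before being turned back into frames.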


Session 3 -  Improved Abstractions and State Management

Protocol-Oblivious Forwarding: Unleash the Power of SDN through a Future-Proof Forwarding Plane


  • We want to make the forwarding plane oblivious to packet protocols (header formats).
  • The SDN forwarding plane should operate like a PC, with generic instructions.
  • It can be done efficiently with microcode.
  • We built a prototype.
  • We need to standardise the instruction set.
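In a protocol-oblivious forwarding plane, the switch doesn't parse Ethernet or IP; it matches raw bit fields identified only by an offset and a length, and the controller supplies the meaning. A Python sketch of such a lookup; the table layout and action strings are hypothetical, not the POF instruction set.

```python
from typing import List, Optional, Tuple

# (offset in bits, length in bits, value to match, action)
Entry = Tuple[int, int, int, str]


def extract(packet: bytes, offset: int, length: int) -> int:
    """Return `length` bits starting `offset` bits into the packet, as an integer."""
    total = len(packet) * 8
    if offset < 0 or length <= 0 or offset + length > total:
        raise ValueError("field outside packet")
    shift = total - offset - length
    return (int.from_bytes(packet, "big") >> shift) & ((1 << length) - 1)


def lookup(table: List[Entry], packet: bytes) -> Optional[str]:
    # First matching entry wins; the switch never needs to know what the bits mean.
    for offset, length, value, action in table:
        if extract(packet, offset, length) == value:
            return action
    return None


# Hypothetical table: bits 96-111 happen to hold the EtherType in an Ethernet frame.
TABLE = [
    (96, 16, 0x86DD, "to_ipv6_pipeline"),
    (96, 16, 0x0800, "to_ipv4_pipeline"),
]
```

Supporting a new protocol then means installing new (offset, length, value) entries rather than upgrading switch firmware.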

Q: How do you handle diversity in the instruction sets? Vendors always add proprietary extensions!

Incremental Consistent Updates

  • How do you update the policy in the network consistently?
  • Past work: each packet sees either the old policy or the new policy, so you keep both installed.
  • But this costs 100% rule-space overhead, and rule space is expensive. Can you do an incremental update?
  • Trade space for time.
  • Update the policy slice by slice.
  • How do you compute a slice?
  • Compute using symbolic execution.
  • Update slices like garbage collection: use reference counting to make sure that each packet sees only 1 policy version.
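The reference-counting idea can be sketched in Python: a rule is physically installed when the first policy slice that needs it arrives, and deleted only when the last slice using it is retired, so a rule shared by the old and new policies occupies one table entry. The class and rule names are hypothetical; this shows only the bookkeeping, not the paper's slice computation.

```python
from collections import Counter
from typing import Iterable, Set


class RuleTable:
    """Tracks how many live policy slices reference each switch rule."""

    def __init__(self) -> None:
        self.refs: Counter = Counter()
        self.peak = 0                # most rules ever installed at once

    def install_slice(self, rules: Iterable[str]) -> None:
        for r in rules:
            self.refs[r] += 1        # a count going 0 -> 1 means "write rule to switch"
        self.peak = max(self.peak, len(self.refs))

    def retire_slice(self, rules: Iterable[str]) -> None:
        for r in rules:
            self.refs[r] -= 1
            if self.refs[r] == 0:
                del self.refs[r]     # last reference gone: delete rule from switch

    def installed(self) -> Set[str]:
        return set(self.refs)
```

Installing an old slice `{r1, r2}` and a new slice `{r2, r3}` peaks at 3 rules rather than 4, since `r2` is never duplicated.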


HotSwap: Correct and Efficient Controller Upgrades for Software-Defined Networks

  • How do you upgrade your SDN controllers in a way that is disruption-free and correct?
  • SDN controllers must be upgraded to:
    • fix bugs
    • add new services
    • improve performance.
  • Popular OpenFlow controllers see frequent releases (3-33) and 1000s of patches.
  • Today we upgrade controllers by rebooting them, which causes:
    • Network failure
    • rule timeout
    • packet failure
  • The new controller wipes out the forwarding table, so no rules remain.
  • Is it really a problem?
  • Simulated using Mininet. Up to 83% of packets lost.
  • Consider a controller running a stateful firewall.
    • When the new controller starts without the old state, it can block allowed traffic or allow forbidden traffic.
  • HotSwap is a hypervisor that warms up the upgraded controller before handing control over to it.
    1. Record - capture an abstract view of the network state.
    2. Replay - replay the recorded network events to the new controller.
    3. Compute the delta between the rules.
    4. Push the rules out.
    5. Return to 1
  • Challenges:
    • Scalability - Recording everything doesn't scale, but we only need some subset of the events.
    • Correctness - We want version 2 to behave as if it had always been running.
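The record/replay/delta loop can be sketched in Python by treating each controller version as a function from an event history to a set of rules. The stateful-firewall controllers, the event format, and the "relevant events" filter are hypothetical examples, not HotSwap's interfaces.

```python
from typing import Callable, List, Set, Tuple

Event = Tuple[str, str]                       # (kind, "src->dst" or other detail)
Controller = Callable[[List[Event]], Set[str]]


def firewall_v1(events: List[Event]) -> Set[str]:
    """Stateful firewall: allow return traffic for outbound connections."""
    rules = set()
    for kind, flow in events:
        if kind == "outbound":
            src, dst = flow.split("->")
            rules.add(f"allow {dst}->{src}")
    return rules


def firewall_v2(events: List[Event]) -> Set[str]:
    """Upgraded version: same policy plus a counter on each allowed return flow."""
    rules = firewall_v1(events)
    return rules | {r.replace("allow", "count", 1) for r in rules}


def hotswap(recorded: List[Event], old: Controller, new: Controller,
            relevant: Callable[[Event], bool]) -> Tuple[Set[str], Set[str]]:
    history = [e for e in recorded if relevant(e)]       # record only events that matter
    old_rules, new_rules = old(history), new(history)    # replay to warm up the new version
    return new_rules - old_rules, old_rules - new_rules  # (rules to push, rules to remove)
```

Because the new version is warmed up on the recorded history, the firewall's return-traffic rules survive the upgrade instead of being wiped.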


CAP for Networks

  • CAP theorem: in the presence of partitions, you can have consistency (correctness) or availability, but not both.
  • E.g. SQL databases choose correctness; key-value stores (KVS) choose availability.
  • What about networks? With the move to SDN, we wanted richer functions.
  • Control plane partitions no longer imply data plane partitions.
  • If the data plane is connected, it delivers packets, but not necessarily correctly.
  • Can one provide correct isolation and availability in the presence of link failures?
  • We only consider out-of-band control networks.
  • You cannot, in general, provide both isolation and availability in the presence of failure.
  • You must pick one or the other.



Session 4 - Theoretical Foundations

Software Transactional Networking: Concurrent and Consistent Policy Composition

  • How does the design of the distributed control plane affect replication?
  • Can we realise the benefits of software transactional memory (STM) in the network? "Software Transactional Networking" (STN)
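For readers unfamiliar with STM, the core idea is optimistic concurrency: each controller reads a snapshot, computes its update, and commits only if nothing it read has changed in the meantime; otherwise it aborts and retries. A Python sketch of that idea applied to policy updates keyed by flow space. It is a generic STM-style illustration with hypothetical names, not the algorithm from the paper.

```python
from typing import Dict, Iterable


class PolicyStore:
    """Hypothetical shared policy store with a version number per flow-space key."""

    def __init__(self) -> None:
        self.rules: Dict[str, str] = {}
        self.version: Dict[str, int] = {}

    def begin(self, keys: Iterable[str]) -> Dict[str, int]:
        # Snapshot the versions of every flow-space entry the transaction reads.
        return {k: self.version.get(k, 0) for k in keys}

    def commit(self, snapshot: Dict[str, int], updates: Dict[str, str]) -> bool:
        if not set(updates) <= set(snapshot):
            raise ValueError("transaction wrote a key it did not read")
        # Validate: abort if another controller committed to any key we read.
        # (In this single-threaded sketch, validate-then-write is trivially atomic.)
        if any(self.version.get(k, 0) != v for k, v in snapshot.items()):
            return False
        for k, rule in updates.items():
            self.rules[k] = rule
            self.version[k] = self.version.get(k, 0) + 1
        return True
```

Two controllers racing on the same prefix both start a transaction, but only the first commit succeeds; the second aborts instead of silently overwriting the first one's policy.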


FatTire: Declarative Fault Tolerance for Software-Defined Networks

  • OpenFlow 1.1+ supports fast failover.
  • Frenetic provides a declarative language for expressing forwarding policies.
  • But it doesn't know how to cope with a failure, because its policies are hop-by-hop.
  • FatTire -> Lifts the abstraction up from hop-by-hop to full path forwarding.
  • Write programs in terms of regular expressions over paths.
  • Use annotations to define fault-tolerance conditions.
  • Combine policies with intersections and unions.
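To give a flavour of path-level policies, the Python sketch below treats a path as a space-separated list of switches, expresses the policy as a regular expression (here: start at S1, traverse firewall FW, end at S4), and fails over to the first candidate path that still matches the policy and avoids failed links. The syntax and helper names are assumptions for illustration, not FatTire's language.

```python
import re
from typing import List, Optional, Set, Tuple

Link = Tuple[str, str]


def links(path: str) -> Set[Link]:
    hops = path.split()
    return set(zip(hops, hops[1:]))


def choose_path(policy: str, candidates: List[str], failed: Set[Link]) -> Optional[str]:
    """Return the first candidate path that satisfies the policy and avoids failed links."""
    pattern = re.compile(policy)
    for path in candidates:
        if pattern.fullmatch(path) and not (links(path) & failed):
            return path
    return None


# Hypothetical policy: start at S1, pass through firewall FW, end at S4.
POLICY = r"S1( \w+)* FW( \w+)* S4"
CANDIDATES = ["S1 FW S4", "S1 S2 FW S3 S4", "S1 S2 S4"]
```

`S1 S2 S4` is never chosen, even if it is the only surviving path, because it skips the firewall; the policy constrains the whole path, not individual hops.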


Resource/Accuracy Tradeoffs in Software-Defined Measurement


  • Lots of good reasons to do measurement
    • Management policies often want measurement. 
    • Accounting for pricing also wants it
    • As does troubleshooting
  • Would like to use SDN to do measurement.
  • Limited resources CPU/memory in switches.
  • Limited control network bandwidth.
  • Different primitives used in switches.
  • Different measurements require different resources.
  • E.g. monitor traffic and, when it exceeds a threshold, figure out where it's from.
  • Use sketches built from hash-based counters.
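As one concrete example of a sketch built from hash-based counters, here is a Count-Min sketch in Python. The talk didn't say which sketch was used, so treat this choice as an assumption. Each key increments one counter per row; the estimate is the minimum over rows, which can overestimate (because of hash collisions) but never underestimates, so a source that really exceeds a threshold is never missed.

```python
import hashlib
from typing import List


class CountMinSketch:
    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Prefixing the row number gives each row an independent-looking hash.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key: str) -> int:
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))
```

Memory is fixed at `width * depth` counters regardless of the number of flows; more counters mean fewer collisions, a small instance of the resource/accuracy trade-off.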


Towards Secure and Dependable Software-Defined Networks

  •  SDN allows us to program the network. Cool!
  • But bad guys can do it too.
  • Threats?
  • 7 new threat vectors.
  • Tools we can use:
    • Replication
    • Self-Healing
    • Diversity -> makes it harder to find a vulnerability shared by every replica.
    • Autonomic trust between controllers and devices, and between apps and controllers.
    • Security domains (e.g. protection rings in OSs, or FortNOX)
  • Use a collection of different controllers, with a common API. Diversity is strength.
  • Main message: SDN increases the threat surface.
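Replication combined with diversity can be sketched as majority voting: several independently implemented controllers (a common API means they see the same input) each propose a rule, and a rule is installed only if a strict majority agrees, so a single compromised or buggy controller cannot push a bad rule on its own. A simplified Python illustration with hypothetical names; a real system would need a proper replication protocol rather than this one-shot vote.

```python
from collections import Counter
from typing import Dict, Optional


def vote(proposals: Dict[str, str]) -> Optional[str]:
    """proposals maps controller name -> proposed rule; return the majority rule, if any."""
    if not proposals:
        return None
    rule, votes = Counter(proposals.values()).most_common(1)[0]
    return rule if votes > len(proposals) // 2 else None
```

With three diverse controllers, one compromised replica proposing `mirror to attacker` is simply outvoted.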



Q: IOS already has attack vectors. Multi-chassis machines already have controllers. Where is the new attack vector? A: ??


A Balance of Power: Expressive, Analyzable Controller Programming

  • When you program the controller you can use any language.
  • But the data plane expects OpenFlow.
  • Programming in OpenFlow is a pain.
  • So, we can use higher level languages. (Frenetic, Pyretic etc)
  • Abstractions aren't quite right.
    • Want a hash table, or a set or a graph.
  • In reality, these high-level languages fall short: programs end up half for the controller, half for the data plane.
  • When you use a bug-finding tool, it finds bugs in the flow tables, but you want to relate them back to the controller program.
  • Missing: Platform for programming and verification of control and data plane.
  • FlowLog: a deliberately limited language, unified across control-plane and data-plane code.
  • Based on Prolog: no recursion, no arithmetic.
  • So you cannot express a shortest-path algorithm in FlowLog.


OF.CPP: Consistent Packet Processing for OpenFlow

  • I'm not really following this talk. 



Session 5 - Dataplane and Wireless Techniques

FlowTags: Enforcing Network-Wide Policies in the Presence of Dynamic Middlebox Actions

  • Add flow tags to packets to bridge gaps across middleboxes.
  • Middleboxes produce and consume tags.
  • FlowTags make flow context visible.
  • Minimal modifications to middleboxes.
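The tag mechanism can be sketched in Python around a NAT, where flow context is lost: the NAT (a tag-producing middlebox) records which original host each rewritten packet came from, and a downstream element consumes the tag to apply a per-host policy that the rewritten source address alone would hide. The class names, tag format, and policy are hypothetical.

```python
import itertools
from typing import Dict, Set, Tuple

Packet = Tuple[str, str, int]   # (source IP, destination IP, tag)


class Nat:
    """Rewrites private sources to one public IP and tags packets with their origin."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self.tag_to_host: Dict[int, str] = {}    # shared with the controller
        self._host_to_tag: Dict[str, int] = {}
        self._next_tag = itertools.count(1)

    def process(self, src: str, dst: str) -> Packet:
        if src not in self._host_to_tag:
            tag = next(self._next_tag)
            self._host_to_tag[src] = tag
            self.tag_to_host[tag] = src
        return self.public_ip, dst, self._host_to_tag[src]


def downstream_action(packet: Packet, tag_to_host: Dict[int, str],
                      blocked_hosts: Set[str]) -> str:
    # Without the tag, every packet looks like it comes from the NAT's public IP.
    _src, _dst, tag = packet
    return "drop" if tag_to_host.get(tag) in blocked_hosts else "forward"
```

Two packets that leave the NAT with the same source address can still be treated differently, because the tag carries the context the rewrite destroyed.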


SoftRAN: Software Defined Radio Access Network

  • Radio access network (RAN): provides high-capacity wireless connectivity.
  • Increasing demand for wireless resources.
  • Solution: dense deployment, but then neighbouring base stations interfere with each other.
  • Dense deployments also mean more frequent handovers.
  • We need tightly co-ordinated management.
  • Currently limited coordination between base stations.
  • SoftRAN - big-base-station abstraction.
    • Abstract capacity as a single large 3D capacity structure.
    • Logically centralised management.
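One way to picture the big-base-station abstraction is a single grid of resource blocks indexed by (base station, time slot, frequency block), allocated by a logically centralised controller that can see every cell at once. The greedy Python sketch below uses that global view to avoid giving neighbouring cells the same time/frequency block. The grid dimensions, names, and greedy policy are assumptions for illustration, not SoftRAN's scheduler.

```python
import itertools
from typing import Dict, List, Optional, Set, Tuple

Block = Tuple[int, int, int]   # (base station, time slot, frequency block)


class BigBaseStation:
    def __init__(self, slots: int, freqs: int, neighbours: Dict[int, Set[int]]) -> None:
        self.slots, self.freqs = slots, freqs
        self.neighbours = neighbours
        self.owner: Dict[Block, str] = {}

    def _usable(self, block: Block) -> bool:
        s, t, f = block
        if block in self.owner:
            return False
        # Central view: avoid reusing the same time/frequency on a neighbouring cell.
        return all((n, t, f) not in self.owner for n in self.neighbours.get(s, set()))

    def allocate(self, user: str, station: int, blocks: int) -> Optional[List[Block]]:
        """Give `user` `blocks` interference-free blocks at its station, or nothing."""
        chosen: List[Block] = []
        for t, f in itertools.product(range(self.slots), range(self.freqs)):
            if len(chosen) == blocks:
                break
            if self._usable((station, t, f)):
                chosen.append((station, t, f))
        if len(chosen) < blocks:
            return None
        for b in chosen:
            self.owner[b] = user
        return chosen
```

Here a second user on a neighbouring cell is pushed to a different time slot rather than colliding with the first user's block.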

The FlowAdapter: Enable Flexible Multi-Table Processing on Legacy Hardware

  • Goal is to address the mismatch between what controllers expect (flexible multi-table processing) and what legacy hardware provides.
  • I can't follow this talk, and I don't understand the abstract. :-(


Cheap Silicon: a Myth or Reality?  Picking the Right Data Plane Hardware for Software Defined Networking

  • Traditional landscape: if we reduce programmability, we increase performance.
    • NPUs ~4-5W / 10G
    • CPUS ~25W / 10G
    • Switches ~0.5W / 10G
  • But this is not apples to apples.
  • We implement a switching workload (on paper).
  • CPU packets per second per watt isn't that bad when we compare today, and the maximum packet rate is quite good.
  • Performance depends mostly on the use case.
  • NPU is probably the right model. Proof by prototyping.
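To put the watts-per-10G figures on a per-packet footing, one can divide by the 10GbE packet rate for minimum-size frames: a 64-byte frame plus 20 bytes of preamble and inter-frame gap gives 10^10 / (84 × 8) ≈ 14.88 Mpps. A Python sketch of that arithmetic, using the midpoint of the 4-5W NPU range; it assumes every device runs at full line rate with minimum-size packets, which is exactly the kind of comparison the talk warns is not apples to apples.

```python
LINE_RATE_BPS = 10e9
WIRE_BYTES_PER_MIN_FRAME = 64 + 20   # minimum frame + preamble/SFD + inter-frame gap


def min_frame_pps(line_rate_bps: float = LINE_RATE_BPS) -> float:
    return line_rate_bps / (WIRE_BYTES_PER_MIN_FRAME * 8)


def nanojoules_per_packet(watts_per_10g: float) -> float:
    return watts_per_10g / min_frame_pps() * 1e9


for name, watts in [("NPU", 4.5), ("CPU", 25.0), ("Switch", 0.5)]:
    print(f"{name}: {nanojoules_per_packet(watts):.1f} nJ/packet")
# NPU: 302.4 nJ/packet
# CPU: 1680.0 nJ/packet
# Switch: 33.6 nJ/packet
```

The roughly 50x gap between CPU and switch silicon under this worst-case assumption shrinks once real workloads and packet sizes are considered, which is the talk's point that performance depends mostly on the use case.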


Open Transport Switch - A Software Defined Networking Architecture for Transport Networks

  • What about the WAN?
  • There's a lot of complexity in the WAN.
  • Circuit oriented.
  • Static pipes/configuration