I have spent the last week in Prague for the 80th IETF meeting, and will be writing a few articles on the most interesting of the sessions I attended. Here's the first.
Multipath TCP in a nutshell
Multipath TCP is a neat way to allow TCP sessions to make use of more than one interface on the source and/or destination - the oft-touted example being wifi and 3G on a laptop or phone. In effect, multiple standard (ish) TCP connections - termed subflows - are opened between every pair of source and destination addresses, and these subflows are aggregated together into what is, from the application's perspective, a single MPTCP connection. The load is balanced between subflows so as to use the available capacity of every subflow while favouring less-congested paths.
Furthermore - and this is the really cool bit - subflows can be added and removed at any time, so as you wander in and out of wifi coverage and in and out of 3G coverage, your connections will stay up, making use of whatever is available at the time - provided, of course, you're only talking to MPTCP-capable servers (or proxies: see below).
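To make the subflow idea concrete, here is a toy model in Python. It is only a sketch of the bookkeeping, not of any real implementation: the class and method names are my own invention, and the assumption that each subflow carries traffic in proportion to its congestion window divided by its round trip time is a rough approximation rather than anything taken from the drafts.

```python
from dataclasses import dataclass, field


@dataclass
class Subflow:
    cwnd: float  # congestion window, in packets
    rtt: float   # round trip time, in seconds

    def rate(self) -> float:
        # Rough send rate in packets per second (assumption: cwnd / rtt).
        return self.cwnd / self.rtt


@dataclass
class Connection:
    """Hypothetical stand-in for one MPTCP connection as the application sees it."""
    subflows: dict = field(default_factory=dict)

    def add_subflow(self, name: str, cwnd: float, rtt: float) -> None:
        self.subflows[name] = Subflow(cwnd, rtt)

    def remove_subflow(self, name: str) -> None:
        # Removing an unknown subflow is silently ignored.
        self.subflows.pop(name, None)

    def is_up(self) -> bool:
        # The connection survives as long as at least one subflow remains.
        return bool(self.subflows)

    def shares(self) -> dict:
        # Fraction of the total traffic carried by each subflow.
        total = sum(s.rate() for s in self.subflows.values())
        return {name: s.rate() / total for name, s in self.subflows.items()}


conn = Connection()
conn.add_subflow("wifi", cwnd=20, rtt=0.02)
conn.add_subflow("3g", cwnd=10, rtt=0.1)
print(conn.shares())        # wifi carries most of the traffic
conn.remove_subflow("wifi")  # walked out of wifi coverage
print(conn.is_up(), conn.shares())
```

The point of the model is simply that the application only ever holds the `Connection`; which subflows sit underneath it can change at any time.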
Now personally, being a lower-layer person generally, I would have implemented this in IP (and it already exists in Ethernet bridges :-) ). Indeed, the IETF has previously standardised various IP-layer solutions to this problem, e.g. Mobile IP and Shim6. But nevertheless there are advantages to implementing this in TCP, most obviously that it makes a lot of sense for the flow control algorithm to also be in control of the load balancing.
Apart from the ubiquitous pedantic debates on minor protocol details, a few noteworthy developments were presented in Prague. I will elaborate on those which interested me the most, but I refer anyone interested in the gory detail to the meeting materials and (when they appear) the minutes.
Sébastien Barré of UCLouvain presented a very neat demo of MPTCP in use, using their Linux implementation on UCL's (London, not Louvain!) testbed. In particular he demonstrated how the linked congestion control algorithm responds to changes (simulated using tc) in link speed, loss and round trip time on the two paths, using a live graph of each subflow's throughput whilst the various parameters were tweaked. Summary: with zero synthetic loss, all available capacity will be used as one would hope; tweaking the RTT only causes transient effects. With synthetic loss on one link (and all other parameters set identically), the majority of traffic chooses the lossless link. With equal synthetic loss on both links, traffic is equally balanced once more. (In fact I think the total bandwidth with 1% loss on both links was if anything slightly higher than the total with one lossless link, but this was not commented upon; it is likely that it will be some time before the ramifications of the new congestion control algorithm are fully understood - especially bearing in mind that "simple" TCP still surprises people occasionally!)
The UCLouvain team have produced an MPTCP live CD based on Ubuntu 10.10 (i386 and amd64) which can be downloaded from the UCL IP Networking Lab MPTCP site. They also have implementations for Android and the Nokia N900 phone.
Unfortunately the live CD appears not to work in VMWare Workstation (at least on my laptop) so I have not yet got MPTCP running. I intend to try it out soon though, and will report back if there is interest.
Notably, though, the UCLouvain implementation does not implement any security mechanisms yet.
The mobility use case I mentioned above has not been a primary focus of this working group. Mark Handley presented his take on the benefits of considering this mode of operation; he has preliminary results (detailed in his paper) which suggest that MPTCP can help mobile clients achieve better performance as well as reduced battery usage (since wifi generally requires less transmit power, so using it in favour of 3G when available is advantageous).
However, the problem is that for now we cannot depend on servers to support MPTCP. Mark proposes the use of a proxy - probably provided by the mobile operator - which speaks MPTCP to the handset's various interfaces and standard TCP to legacy servers on its behalf. (And if the server does in fact support it, MPTCP provides a handy way to short-circuit the proxy: the proxy can instruct the client to add subflows direct to the server so it can send and receive data directly.)
There is a protocol design sticking-point though: it is not clear how the client should tell the proxy the address for the onward connection (noting that this is not an application-layer proxy, and this functionality would be implemented by the OS in its TCP stack). The target address would ideally be stated in the SYN packet, for security and to avoid the possibility of having a connection open to the proxy but not the final destination. But there is no space in the SYN header. (Alan Ford commented that perhaps they should wait until Google has some results on what happens if you put data in the SYN packet - they are trying this as part of SPDY.) Various other alternatives were discussed, which rapidly spiralled down towards evil hacks (scrounging bits from fields which the proxy can interpret in spec-violating ways, running connections over a UDP tunnel to the proxy, etc.).
Unfortunately, useful as this would be, the sense of the room was that the protocol spec should not be delayed in order to add proxy support. This will come later, if at all.
Another stumbling block: an ex-Cisco-employee in the room drew attention to intellectual property rights on this idea (as he was obliged to); a patent now owned by Cisco may cover multipath proxying. They developed this functionality for SCTP but the patent was allegedly written in sufficiently general terms to cover the MPTCP equivalent.
Mark Handley summarised UCL's paper, presented at NSDI11 the previous day, on the linked congestion control mechanism. Since I couldn't be at IETF80 and NSDI11 simultaneously, I refer you to the NSDI11 liveblog by my esteemed colleagues Chris Smowton and Malte Schwarzkopf. Mark noted that as an introduction to the workings of this algorithm, the paper serves rather better than the current internet draft.
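For a flavour of how the coupling works, here is a minimal Python sketch of the window increase. It assumes the rule in the form given in RFC 6356 (the "linked increases" algorithm), which may differ in detail from the draft discussed at the meeting; the function names are mine, and windows are counted in packets rather than bytes to keep the arithmetic simple.

```python
def coupling_alpha(subflows):
    """Aggressiveness factor shared by all subflows.

    subflows is a list of (cwnd, rtt) pairs: cwnd in packets, rtt in seconds.
    """
    total_cwnd = sum(cwnd for cwnd, _ in subflows)
    best_path = max(cwnd / (rtt * rtt) for cwnd, rtt in subflows)
    sum_rates = sum(cwnd / rtt for cwnd, rtt in subflows)
    return total_cwnd * best_path / (sum_rates ** 2)


def increase_per_ack(subflows, i):
    """Window increase (in packets) for one ACK received on subflow i."""
    total_cwnd = sum(cwnd for cwnd, _ in subflows)
    cwnd_i = subflows[i][0]
    coupled = coupling_alpha(subflows) / total_cwnd
    uncoupled = 1.0 / cwnd_i  # what ordinary TCP would do on this path
    # Never be more aggressive than a single TCP flow on the same path.
    return min(coupled, uncoupled)


# One subflow: behaves like ordinary TCP (increase of 1/cwnd per ACK).
print(increase_per_ack([(10, 0.1)], 0))

# Two identical subflows: each grows more slowly than a lone TCP flow would.
print(increase_per_ack([(10, 0.1), (10, 0.1)], 0))
```

In this sketch only the increase is coupled; on loss, each subflow would halve its own window as ordinary TCP does. That asymmetry is what moves traffic away from lossy paths, which matches the behaviour seen in the demo.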
There is another paper, not yet published, which presents the results of running MPTCP in a datacentre environment in order to make use of multiple paths between nodes; he tested this using EC2, amongst other things, which claims to provide redundant paths (except when it doesn't - it has apparently been known to provide two paths which actually share the same wire!). MPTCP, he shows, does very well at pushing traffic off a congested hotspot when an alternative exists - and transitively pushing traffic off other paths to make room for that traffic, and so on. He has a great slide (slide 17 of this PPT deck) illustrating this graphically.
Furthermore, where there are multiple relatively-uncongested paths available, MPTCP gives much better throughput: just under twice TCP with two subflows, and about 3x TCP with four subflows (slide 18).
There is not much new to say about the MPTCP API; the only changes since the last meeting have been bug fixes and clarifications. One point stood out which may interest some: the authors considered using the already-defined SCTP API for MPTCP, but were not in favour of this as not all SCTP functionality can be mapped onto MPTCP (the possibility was left open though, since this is just an abstract API spec: "API developers MAY wish to integrate SCTP and MPTCP calls to provide a consistent interface to the application").
More details on pedantic changes are included in the slide deck (PPT).
Network (rather than host) multihoming
Rolf Winter presented his idea for allowing a home gateway (performing NAT) to inform hosts that it has more than one uplink to the internet. Essentially it's a DHCP hack whereby clients can be told to set up additional virtual interfaces on which they will receive addresses in multiple different RFC1918 subnets; each subnet would be directed to a different uplink. (He calls the agent responsible for managing this an "MPTCP proxy" - but this is not in any way related to Mark Handley's term.) Alternatively, new DHCP options can be avoided by sending multiple DHCP offers, one per uplink (again using separate RFC1918 subnets); this would also mean the multiple uplinks could be handled by different gateways.
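The gateway side of this boils down to choosing an uplink from the source address of each packet. A rough Python sketch of that lookup follows; the subnets and uplink names are made up for illustration, and nothing here is taken from Rolf's proposal beyond the one-subnet-per-uplink idea.

```python
import ipaddress

# Hypothetical mapping: one RFC1918 subnet per uplink.
UPLINK_BY_SUBNET = {
    ipaddress.ip_network("192.168.1.0/24"): "uplink-a",
    ipaddress.ip_network("192.168.2.0/24"): "uplink-b",
}


def uplink_for(source: str):
    """Return the uplink serving this source address, or None if unknown."""
    addr = ipaddress.ip_address(source)
    for subnet, uplink in UPLINK_BY_SUBNET.items():
        if addr in subnet:
            return uplink
    return None


# A host holding one address in each subnet can open one subflow per uplink.
for src in ("192.168.1.10", "192.168.2.10"):
    print(src, "->", uplink_for(src))
```

From the host's point of view the two addresses look like two interfaces, so an MPTCP stack would open a subflow from each without needing to know anything about the uplinks behind the gateway.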
As noted by someone from the audience, this hack is only necessary for IPv4. IPv6 specifies how multihoming should work, which can be used directly without any further extensions to gateway or host behaviour.
The main people behind MPTCP are very interested in hearing from anybody implementing MPTCP, or even considering doing so (especially if you are Microsoft or Apple! - one piece of good news was that an Oracle rep stated at the meeting that a Solaris implementation is in the works). The authors are also actively seeking feedback on their design choices and the detailed draft specifications. If you have anything to say, I urge you on their behalf to get in touch with them ASAP: the last call for comments will likely happen before the next meeting.
The MPTCP working group is close to completing its chartered work (this will probably happen around the time of the next IETF meeting). It is undecided what will happen next; the default will be to pause for a while until the community has some implementation experience.