{"id":285,"date":"2011-04-15T22:09:17","date_gmt":"2011-04-15T22:09:17","guid":{"rendered":"http:\/\/www.syslog.cl.cam.ac.uk\/?p=285"},"modified":"2011-04-15T22:09:17","modified_gmt":"2011-04-15T22:09:17","slug":"ietf80-congestion-control","status":"publish","type":"post","link":"https:\/\/www.syslog.cl.cam.ac.uk\/2011\/04\/15\/ietf80-congestion-control\/","title":{"rendered":"IETF 80 highlights: congestion control"},"content":{"rendered":"

There were several interesting talks on various aspects of congestion control at IETF 80, spread across several working groups and research groups; the majority of work that I would classify as actual research<\/em> being done in the IETF and IRTF at the moment seems to concern congestion control in some way or other. I've already written about Multipath TCP<\/a> and Bufferbloat<\/a>; here's a potpourri of other TCP problems and proposed solutions. Most of these came out of the meeting of the Internet Congestion Control Research Group (ICCRG)<\/a> - strictly part of the IRTF<\/a> rather than the IETF - but the presentation on SPDY came from the IETF Transport Area open meeting.<\/p>\n

Chirping for Congestion Control<\/h1>\n

This work, being undertaken by Mirja K\u00fchlewind<\/a> and Bob Briscoe<\/a> at BT, is a really neat way for TCP to react quickly and accurately to changes in available capacity. As things stand, if a significant amount of new capacity appears after a connection has left the slow-start (exponential) phase, classic TCP can take a long time to make use of this capacity via additive increase of the congestion window. The state of the art prior to this work reacts more quickly but will overshoot and cause unnecessary congestion.<\/p>\n
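To give a feel for the scale of the problem, here is a back-of-the-envelope calculation of how long classic additive increase - roughly one extra segment per round trip - takes to absorb newly-available capacity. The link, RTT, and segment-size figures are entirely illustrative, not taken from the talk:<\/p>\n

```python
# How many RTTs does classic additive increase (about one extra MSS
# of congestion window per RTT) need to absorb new bottleneck capacity?
# All numbers below are assumed, for illustration only.

MSS = 1460           # bytes per segment
RTT = 0.1            # seconds (100 ms round trip)
NEW_CAPACITY = 50e6  # 50 Mbit/s of extra capacity suddenly appears

# Extra in-flight packets per RTT needed to fill the new capacity.
extra_pkts = NEW_CAPACITY / 8 * RTT / MSS

# Additive increase gains ~1 packet of window per RTT, so the number
# of RTTs needed is roughly the number of extra packets.
rtts_needed = extra_pkts
print(f"~{rtts_needed:.0f} RTTs, i.e. ~{rtts_needed * RTT:.0f} s")
```

With these assumed numbers the connection needs hundreds of round trips - tens of seconds - before it is using the new capacity, which is the sluggishness that chirping aims to eliminate.<\/p>\n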

Their solution is to "chirp" groups of data packets. This is an adaptation of an old radio concept<\/a>; in the analogue domain, a chirp is a signal with increasing or decreasing frequency over time. Here, chirping means to transmit a group of packets with steadily-decreasing inter-packet intervals, or in other words steadily-increasing data rates. At some point, the sender will pass the capacity of the bottleneck; the part of the chirp sent too quickly for the network to cope with will have been spread out when it reaches the receiver, i.e. the inter-packet intervals will flatten out at a minimum value from which the available capacity can be easily computed.<\/p>\n
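A minimal receiver-side sketch of this idea - the packet size, the toy timings, and the function name are all illustrative assumptions, not taken from the implementation:<\/p>\n

```python
# Sketch: the sender emits a chirp of equal-sized packets with steadily
# shrinking gaps; once the instantaneous send rate exceeds the bottleneck
# capacity, arrival gaps stop shrinking and flatten at the bottleneck's
# per-packet serialisation time, from which capacity follows.

PKT_BYTES = 1500  # assumed packet size

def estimate_capacity(arrival_times, pkt_bytes=PKT_BYTES):
    """Estimate bottleneck capacity (bit/s) from the flattened
    (minimum) inter-arrival gap observed within one chirp."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    # The tail of the chirp was sent faster than the bottleneck can
    # forward, so its arrival gaps are pinned at the serialisation
    # time; take the smallest observed gap as that floor.
    floor = min(gaps)
    return pkt_bytes * 8 / floor

# Toy arrival trace: gaps shrink (5, 4, 3 ms) then flatten at 2 ms,
# the serialisation time of a 1500-byte packet on a 6 Mbit/s link.
t, times = 0.0, [0.0]
for gap in [0.005, 0.004, 0.003, 0.002, 0.002, 0.002]:
    t += gap
    times.append(t)

print(estimate_capacity(times))  # ~6 Mbit/s
```

A real implementation has to contend with noisy timestamps and cross-traffic rather than a clean flattening point, but the principle is the same.<\/p>\n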

This application of chirping to networking was originally devised by Ribeiro et al. - pathChirp [postscript]<\/a> - as a means of testing a link; K\u00fchlewind and Briscoe have implemented this as a congestion control mechanism for TCP, sending every<\/em> user data packet as part of a chirp (on the order of 32 packets, or half a second, long) in order to continuously reevaluate the available bandwidth. TCP no longer has to rely on loss to detect the optimal window, nor does it have to fill the buffer next to the bottleneck - no more buffer-induced latency<\/a>! A member of the audience commented that this could work particularly well on wireless links.<\/p>\n

There are a few open research questions, though: for instance it is not clear what will happen when everybody is chirping; it may be that chirps interact with each other at the bottleneck.<\/p>\n

Updating TCP to support Variable-Rate Traffic<\/h1>\n

Gorry Fairhurst<\/a> of the University of Aberdeen made the case for revisiting an assumption inherent in TCP: that flows are either "bulk" (transmitting data as fast as possible) or "thin" (sitting idle for most of the time), but never both. More recently we have seen flows which are both, or neither, such as audio\/video streams where the transmission rate is governed by the application, or persistent HTTP connections which sit idle but switch to bulk operation occasionally.<\/p>\n

Standard TCP does very badly with such connections for two reasons:<\/p>\n

    \n
  1. If the connection goes idle (e.g. in an interactive application), the congestion window drops to 1 packet - i.e. all information about the probed available capacity is discarded and the connection must go through slow start again when it has more data to transmit. Performance after a connection has been idle is therefore very poor.<\/li>\n
  2. When the transmission rate is application-governed, the link capacity is massively overestimated: the congestion window continues to increase linearly as ACKs come back without loss. TCP-CWV (RFC 2861<\/a>) proposed to solve this particular problem by exponentially decaying the congestion window whilst idle, but this can apparently make the interactive case even worse. (Linux turns on CWV by default, but many interactive applications turn it off; most other operating systems do not use CWV.) CWV is also not entirely satisfactory in the case of a variable-rate stream, where the congestion window will lag behind the application's transmission rate by a significant amount.<\/li>\n<\/ol>\n
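The RFC 2861 decay mentioned in point 2 can be sketched as follows. The per-RTO halving schedule and the four-packet restart window floor are my reading of the RFC, and the numbers are illustrative:<\/p>\n

```python
# Sketch of TCP-CWV (RFC 2861) idle decay: while the connection is
# idle, cwnd is halved once per retransmission timeout (RTO), never
# falling below the restart window.

RESTART_WINDOW = 4  # packets; an assumed restart-window floor

def cwnd_after_idle(cwnd, idle_time, rto):
    """Congestion window (packets) remaining after idle_time seconds."""
    halvings = int(idle_time // rto)
    for _ in range(halvings):
        cwnd = max(cwnd // 2, RESTART_WINDOW)
    return cwnd

# A 100-packet window decays quickly: after five RTOs of idle time
# almost all of the probed capacity information is gone.
print(cwnd_after_idle(100, idle_time=5.0, rto=1.0))  # -> 4
```

This is exactly the behaviour that makes the interactive case worse: a connection that pauses for a few seconds between bursts returns to find its window decayed to almost nothing.<\/p>\n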

The proposed fix is to preserve the existing congestion window unmodified (for up to six minutes) whenever the application transmits at less than two-thirds of this window - i.e. the congestion window found during bursts of transmission is tracked. (The six-minute timeout is an arbitrary compromise.) A connection which is idle (or transmitting slowly) for less than six minutes can resume transmission immediately at any time using its previously-determined congestion window.<\/p>\n
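A minimal sketch of that proposal: the two-thirds threshold and six-minute timeout are from the talk, but the class structure, names, and the halving fallback after the timeout are my own assumptions about how it might be realised:<\/p>\n

```python
# Sketch of the proposed congestion-window validation: the window is
# "validated" whenever the application genuinely uses most of it, and
# a validated window is preserved unmodified for up to six minutes of
# idle or rate-limited operation.

NVP = 6 * 60                 # non-validated period: six minutes
VALIDATION_FRACTION = 2 / 3  # "less than two-thirds of this window"

class CwndValidator:
    """Illustrative tracker; names and structure are assumptions."""

    def __init__(self, cwnd, now=0.0):
        self.cwnd = cwnd            # congestion window, in packets
        self.last_validated = now   # seconds; injectable clock for clarity

    def on_rtt(self, bytes_sent, now, mss=1460):
        if bytes_sent >= VALIDATION_FRACTION * self.cwnd * mss:
            # The application used most of the window: freshly validated,
            # so the window is preserved through subsequent idle periods.
            self.last_validated = now
        elif now - self.last_validated > NVP:
            # Unvalidated for over six minutes: the old estimate is
            # stale. (Halving here is an assumed fallback, not a detail
            # given in the talk.)
            self.cwnd = max(self.cwnd // 2, 4)
            self.last_validated = now

v = CwndValidator(cwnd=100, now=0.0)
v.on_rtt(0, now=120.0)   # idle for two minutes: window preserved
print(v.cwnd)            # -> 100
v.on_rtt(0, now=400.0)   # unvalidated for >6 minutes: window reduced
print(v.cwnd)            # -> 50
```

The appeal is that a burst-then-idle connection resumes at full speed; the risk, as the next paragraph notes, is that the preserved window may no longer match network conditions.<\/p>\n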

As noted by Mark Handley, however, this behaviour does carry a risk: it is quite possible for the network conditions - in particular, competing traffic - to change significantly during a sub-six-minute idle period, which could cause the connection to transmit at a wildly inappropriate rate when it returns from idle, leading to massive congestion.<\/p>\n

    Datacentre problems and DCTCP<\/h1>\n

Murari Sridharan of Microsoft is a vociferous proponent of getting the IETF interested in the datacentre, which is approaching and surpassing the limits of several protocols (for example ARP, which is why I was there, but that's another story). From his perspective, the prevailing opinion in the IETF is that the datacentre is "not the internet" and tends to be the domain of proprietary equipment running proprietary protocols. However, some of the problems experienced in the datacentre relate specifically to TCP\/IP and may have to be solved by modifying the OS or hypervisor; if that happens, then these modifications will doubtless interact with the internet at some point, either deliberately or accidentally. Furthermore, as echoed by BT and Google, the line between datacentre and access network is blurring as access technologies increase in bandwidth and datacentres become more distributed.<\/p>\n

    The specifics of the datacentre from Sridharan's perspective (which are of course biased towards the situation in Microsoft datacentres, but probably apply more generally) are:<\/p>\n