Continuing my series of articles about noteworthy happenings at last week's IETF meeting, here is a summary of Jim Gettys's presentation on "Bufferbloat" to the Transport Area open meeting, which is in turn a summary of his many articles on the topic. This is significant not because it is new, but because it is a very thorough treatment of an old problem which has gone ignored by much of the IETF community - and probably by others too.
The central premise is that the design of TCP assumes that feedback of information about congestion to the sender happens in a timely manner - but this assumption is flawed on today's internet due to buffering. Most links will, when congested, only start to drop packets (the only means generally available for telling the sender to slow down) when the transmit buffer in front of the link becomes full, and even then drop-from-tail means that the following packets have to traverse the full buffer and trigger ACKs before the drop is noticed. Many components of the internet contain bloated buffers which take several seconds to empty even at full line rate.
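To put rough numbers on "several seconds to empty", here is a minimal sketch of the arithmetic. The function name, the 256 KiB buffer and the 1 Mbit/s link are illustrative assumptions of mine, not figures from Jim's talk.

```python
def drain_time_seconds(buffer_bytes: int, link_bits_per_second: float) -> float:
    """Time for a full FIFO buffer to empty at the given line rate.

    This is also the extra queueing delay seen by a packet that arrives
    at the tail of the full buffer.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return buffer_bytes * 8 / link_bits_per_second


# Hypothetical example: a 256 KiB buffer in front of a 1 Mbit/s uplink.
delay = drain_time_seconds(256 * 1024, 1_000_000)
print(f"{delay:.2f} s of queueing delay when the buffer is full")
```

Under those assumptions a drop at the tail of the queue is not even visible to the receiver for about two seconds, before the return trip to the sender is counted.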
In the several seconds between a TCP flow causing congestion and feedback of that congestion arriving back at the sender, the transmission rate continues to increase further, worsening the congestion. The result is a cycle of repeated bursts of massive congestion; TCP is expected to cause some burstiness but the reality is much worse than the intention. (If you watch a packet dump you will see bursts of duplicate ACKs and retransmits, repeated on the order of tens of seconds.)
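A toy model gives a feel for how far the window can run ahead while the bad news is still in transit. It rests on two simplifying assumptions of my own: the sender is in slow start and adds one packet to its window per ACK, and once the bottleneck is saturated ACKs come back at the bottleneck's packet rate. The names and example numbers are hypothetical.

```python
def packets_per_second(link_bits_per_second: float, packet_bytes: int = 1500) -> float:
    """Packet rate of a link for a given (assumed) packet size."""
    if link_bits_per_second < 0 or packet_bytes <= 0:
        raise ValueError("rate must be non-negative and packet size positive")
    return link_bits_per_second / (packet_bytes * 8)


def window_growth_during_feedback(bottleneck_pps: float, feedback_delay_s: float) -> float:
    """Packets added to the congestion window before feedback arrives.

    Toy model: one ACK returns per packet the bottleneck forwards, and
    slow start grows the window by one packet per ACK.
    """
    if bottleneck_pps < 0 or feedback_delay_s < 0:
        raise ValueError("rate and delay must be non-negative")
    return bottleneck_pps * feedback_delay_s


# Hypothetical example: 1 Mbit/s bottleneck, feedback delayed by 2 seconds.
rate = packets_per_second(1_000_000)
extra = window_growth_during_feedback(rate, 2.0)
print(f"window grows by roughly {extra:.0f} packets before the sender hears of the drop")
```

This ignores delayed ACKs and everything else a real stack does, so treat the result as an order-of-magnitude illustration only.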
Traffic classification doesn't help; what you need is active queue management at the point of bottleneck. Unfortunately the bottleneck could arise in any of a vast selection of places, some of which are surprising. Jim calls these "dark buffers" - buffers which most of the time sit empty causing no harm, but when there is a bottleneck the corresponding buffer will suddenly start to fill and cause pain. Generally the buffers either side of home or mobile access links get the blame, but in his experiments Jim has succeeded in reproducing this problem in buffers in pretty much every component and at every layer, including the application layer.
In fact, applications have recently begun to make the problem much worse. There have been proposals to increase TCP's initial congestion window, and some implementations do this already, notably Google's (in so doing they are violating the standard in order to, in their eyes, improve performance). Even if the initial congestion window is left alone, modern web browsers open several simultaneous connections to the same server - in effect this is exactly the same as increasing the initial congestion window if you consider the number of packets initially in flight simultaneously. Jim makes the point that either of these behaviours may be too much for a congested link to cope with, leading to more-permanent congestion since TCP will not back down far enough - and meanwhile, the responsible flows are in slow start, exponentially increasing their congestion window for several seconds before they see that they have overshot by a very long way.
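The equivalence between parallel connections and a larger initial window is simple multiplication, sketched below. The connection counts, window sizes and segment size are illustrative assumptions, not measurements of any particular browser or server.

```python
def initial_burst_packets(connections: int, initial_window_segments: int) -> int:
    """Packets that can be in flight before any ACK has returned."""
    if connections < 0 or initial_window_segments < 0:
        raise ValueError("arguments must be non-negative")
    return connections * initial_window_segments


def burst_serialisation_seconds(packets: int, link_bits_per_second: float,
                                segment_bytes: int = 1460) -> float:
    """Time the burst occupies a link of the given rate (assumed segment size)."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return packets * segment_bytes * 8 / link_bits_per_second


# Hypothetical comparison: one connection with a large initial window
# versus several parallel connections with a small one.
single = initial_burst_packets(connections=1, initial_window_segments=10)
parallel = initial_burst_packets(connections=6, initial_window_segments=3)
print(single, parallel)
print(f"{burst_serialisation_seconds(parallel, 1_000_000):.3f} s on a 1 Mbit/s link")
```

With these made-up values the six "polite" connections put more packets on the wire at once than the single connection with the enlarged window.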
Jim considers this to be evidence that HTTP is broken in encouraging network-unfriendly behaviour and should be replaced. I think, however, that he may underestimate the difficulty of replacing the most popular application-layer protocol. Google are quietly doing just that in their corner of the internet, but that is another story about which I will write later.
(In fact, as noted by Joe Touch, even a single standard TCP connection's initial congestion window of 1 maximum segment size may be excessive in some cases.)
Jim went rapidly through a list of implications. For example, carriers' managed telephony services have a considerable quality advantage over any VoIP system using the internet - this could perhaps be considered a network neutrality issue!
He also alleges that BitTorrent's problems may have been misdiagnosed and were actually due to bufferbloat. This statement confuses me, since to my mind congestion and buffering of competing traffic is the single major problem with (pre-LEDBAT) BitTorrent. Perhaps I am being unfair, but it seems to me (having not read his articles yet) that Jim is approaching this problem without considering that it may in fact already be well-understood and well-documented. He has rediscovered the problems himself, and done a remarkably thorough job of documenting them, but I am not sure he realises that he is not the first to tread this path. (My undergraduate dissertation was on this very topic!) Having said that, regardless of how he got here, Jim is getting this message through to the right people to actually (I hope) fix the problems properly.
He does acknowledge random early detection (RED), the prevalent approach to active queue management devised by Van Jacobson and Sally Floyd in 1993, as a step in the right direction. However, as he points out, RED is not widely deployed (and nor is ECN); he suspects that this is due to a feeling amongst operators that it can have undesired side-effects, which may well be justified, as two problems with RED have been known to the authors for well over a decade. Van Jacobson has a paper describing a fixed version which he has held off from publishing for a long time (apparently it may actually be published soon). Apparently the revised RED algorithm detailed in a draft of this paper, circulated in 1999, also has a bug.
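For readers who have not met RED, the core idea of the classic 1993 scheme can be sketched in a few lines: track a smoothed average of the queue length and drop arriving packets with a probability that rises as that average moves between two thresholds. This is a simplified illustration of my own and not the revised algorithm mentioned above; it leaves out the spacing of drops by the count of packets since the last drop, and the parameter values are arbitrary.

```python
def update_average_queue(avg_queue: float, current_queue: float,
                         weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue length."""
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must be in (0, 1]")
    return (1.0 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue: float, min_th: float, max_th: float,
                         max_p: float = 0.1) -> float:
    """Drop probability for an arriving packet (simplified classic RED).

    Below min_th nothing is dropped, at or above max_th everything is,
    and in between the probability rises linearly from 0 to max_p.
    """
    if not 0.0 <= min_th < max_th:
        raise ValueError("need 0 <= min_th < max_th")
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Arbitrary illustrative thresholds, in packets.
for avg in (2, 10, 14, 20):
    print(avg, red_drop_probability(avg, min_th=5, max_th=15))
```

The point is that senders start to receive congestion signals while the queue is still short, rather than only once the buffer is completely full.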
Jim offers no concrete solutions as yet. Bufferbloat can't be fixed in TCP without changing every TCP implementation out there, and perhaps some application layer protocols; any remaining unmodified TCP will cause problems for competing better-behaved traffic if allowed to fill a buffer. His ultimate solution is non-broken active queue management (i.e. not classic RED) everywhere in the network, but there are many operators and vendors whose minds are not easy to change.
Perhaps the Congestion Exposure (conex) working group has the right idea in aiming to fix the incentives for users and operators alike. I'll write about that work in a future post.