Liveblogging IMC 2013 – Day 3

25 Oct 2013

Last day of IMC 2013, starting with large networks and their analysis at scale; some rather funky, if not weird, papers follow later today.

Big or Fast

Missed the beginning of the session and did not take notes for the rest of it, but here is the list of papers together with links to the PDFs.

Indexing Millions of Packets per Second using GPUs (review) (short)

F. Fusco (ETH Zurich), M. Vlachos (IBM Research Zurich), X. Dimitropoulos (ETH Zurich) and L. Deri (ntop)

On the Benefits of Using a Large IXP as an Internet Vantage Point (review) (long)

N. Chatzis (TU Berlin), G. Smaragdakis (T-Labs/TU Berlin), J. Boettger, T. Krenc, and A. Feldmann (TU Berlin)

Growth Analysis of a Large ISP (review) (short)

A. Ferguson, J. Place and R. Fonseca (Brown University)

Understanding the Super-sized Traffic of the Super Bowl (review) (short)

J. Erman and K.K. Ramakrishnan (AT&T Research)

Detective Stories

Appraising the Delay Accuracy in Browser-based Network Measurement (review) (short)

W. Li, R. Mok, R. Chang and W. Fok (The Hong Kong Polytechnic University)

How much accuracy is there in browser-based measurement results?

Used 10 techniques (protocols - http, websocket, etc.) and different browser-os combinations.

They cause object reuse by immediately rerunning the experiment (not convinced how that interacts with the different components...). Not sure I buy their methodology - the numbers might be interesting, but 1-10 ms of latency from JS alone and 10-200 ms from Flash, for example, do not seem plausible, possibly because of the methodology. Need to take a look at the paper.

Network Fingerprinting: TTL-based Router Signatures (review) (short)

Y. Vanaubel (Université de Liège), J.-J. Pansiot, P. Merindol (Université de Strasbourg) and B. Donnet (Université de Liège)

Grouping network devices into disjoint classes. Using TTL for fingerprinting.

The TTL should be initialised to 64, but in reality that is not always the case (it depends on hardware, OS, protocol, type of message...). They use ICMP messages (time-exceeded, echo-reply and destination-unreachable) and candidate initial TTLs of 32, 64, 128 and 255. The inferred initial TTL is then the smallest candidate that is not below the TTL observed in the reply...

Only the pair of inferred initial TTLs <time-exceeded, echo-reply> is used as a signature. It turns out to be specific to the hardware vendor, so more info about the hardware in the network can be deduced from these simple signatures...
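The inference rule and the two-message signature can be sketched in a few lines. This is a minimal sketch based only on my notes, not the authors' code: the candidate list comes from the talk, while the function names and the example TTL values are made up for illustration.

```python
# Candidate initial TTLs mentioned in the talk.
CANDIDATE_INITIAL_TTLS = (32, 64, 128, 255)


def infer_initial_ttl(received_ttl):
    """Return the smallest candidate initial TTL that is >= the received TTL."""
    if not 0 < received_ttl <= 255:
        raise ValueError("TTL must be in the range 1..255")
    for candidate in CANDIDATE_INITIAL_TTLS:
        if received_ttl <= candidate:
            return candidate


def router_signature(time_exceeded_ttl, echo_reply_ttl):
    """Build the <time-exceeded, echo-reply> pair of inferred initial TTLs."""
    return (infer_initial_ttl(time_exceeded_ttl),
            infer_initial_ttl(echo_reply_ttl))


# Hypothetical TTLs observed in replies from one router.
print(router_signature(247, 56))  # (255, 64)
```

Routers that produce the same pair fall into the same class, which is all the signature is meant to provide.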

Q: why do you need to infer TTLs as powers of 2? Are there any weird TTLs that people use?

A: (shortened answer) we did not think about this

Comment from twittersphere: traceroute -D on FreeBSD/OSX...

Peeking Behind the NAT: An Empirical Study of Home Networks (review) (long)

S. Grover, M.S. Park, S. Sundaresan, S. Burnett, H. Kim, B. Ravi and N. Feamster (Georgia Institute of Technology)

Posits that design studies rely exclusively on human subject interviews. Suggests measuring from the gateway instead.

Want to answer: how frequently do home networks disconnect? Are there connectivity patterns, and how crowded is the wifi? Do users saturate their links, and does usage depend on devices?

Part 1: user behaviour can affect connectivity.

Study based on occasional pings
Median number of downtimes per day - 0.11
0.06 for a developed nation and 0.9 for a developing nation!
Strong diurnal pattern - some users switch off their routers when not in use
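
One plausible way to turn periodic pings into a "downtimes per day" figure is to count long gaps between consecutive pings. This is a rough sketch, not the authors' method: the 10-minute gap threshold, the 60-second ping interval and the function names are all my assumptions, since the talk did not give these details.

```python
def find_downtimes(ping_times, max_gap_s=600.0):
    """Return (start, end) pairs for gaps between pings longer than max_gap_s.

    ping_times are arrival times in seconds; max_gap_s is an assumed threshold.
    """
    times = sorted(ping_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap_s]


def downtimes_per_day(ping_times, max_gap_s=600.0):
    """Number of detected downtimes divided by the observed span in days."""
    times = sorted(ping_times)
    if len(times) < 2 or times[-1] == times[0]:
        return 0.0
    observed_days = (times[-1] - times[0]) / 86400.0
    return len(find_downtimes(times, max_gap_s)) / observed_days


# Hypothetical home: a ping every 60 s for two days, with a one-hour outage.
pings = [t for t in range(0, 2 * 86400 + 1, 60) if not (3600 < t < 7200)]
print(find_downtimes(pings))     # [(3600, 7200)]
print(downtimes_per_day(pings))  # 0.5
```

The median over all homes would then give a number comparable to the 0.11 quoted above.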

Part 2: Connectivity patterns over wifi

Strong diurnal patterns... some stats about large numbers of APs seen...

Part 3: usage patterns

Do users saturate their links? Bandwidth is often 100 Mbps, but half the houses use less than 50% of the link (related to wifi link capacity being smaller than access link capacity?). Most of the traffic is due to one bandwidth-hungry device, even in households with 3+ devices.
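
Both usage questions come down to simple ratios over gateway counters. A minimal sketch, assuming per-device byte counts and throughput samples are available from the gateway; the function names, device names and numbers are hypothetical.

```python
def peak_utilisation(throughput_samples_mbps, link_capacity_mbps):
    """Highest observed throughput as a fraction of the access link capacity."""
    return max(throughput_samples_mbps) / link_capacity_mbps


def dominant_device_share(bytes_by_device):
    """Return (device, share of total bytes) for the heaviest device in a home."""
    total = sum(bytes_by_device.values())
    if total == 0:
        return None, 0.0
    device = max(bytes_by_device, key=bytes_by_device.get)
    return device, bytes_by_device[device] / total


# Hypothetical household with three devices on a 100 Mbps link.
print(peak_utilisation([5, 12, 40], 100))  # 0.4
print(dominant_device_share({"laptop": 8000, "phone": 1500, "tv": 500}))
# ('laptop', 0.8)
```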

More popular content is served over long-lived connections - interesting note, sadly didn't go into more detail.

Different types of devices have different most popular domains (more detail in paper) - propose that could be used for device fingerprinting.

Identifiability of Link Metrics Based on End-to-End Path Measurements (review) (long)

L. Ma (Imperial College, London, UK), T. He (IBM T.J. Watson Research, Yorktown, NY, USA), K.K. Leung (Imperial College, London, UK), A. Swami (Army Research Laboratory, Adelphi, MD, USA) and D. Towsley (UMass Amherst)

Intriguing Results

Profiling High-School Students with Facebook: How Online Privacy Laws Can Actually Increase Minors' Risk (review) (long)

R. Dey, Y. Ding and K.W. Ross (Polytechnic Institute of New York University)

Question: is it possible to automatically build detailed profiles (names, addresses, gender, etc.) of most of the teenagers in a target high school?

By default, FB profiles for minors expose much less information publicly; they still contain name, gender, networks and profile photo. FB will not even allow showing more information publicly. Challenge: for a given high school, how do we find the students on Facebook and build profiles? Minors are not searchable by school and expose little information.

Clustering idea: kids often have many friends from the same high school and the same graduating year. In addition, children lie about their age, which gets around the restrictions and circumvents all the special privacy policies. Previous research: almost 44% of kids lie (Sorbone?) and 1 in 5 in the UK.

I don't really feel like summarising the method they used to find minors on fb publicly...

Scary thing is that they contacted fb and google+ about the issues and potential measures, but they did not really do anything.

Demystifying Porn 2.0: A Look into a Major Adult Video Streaming Website (review) (long)

G. Tyson (Queen Mary, University of London), Y. El-khatib (Lancaster University), N. Sastry (King's College London) and S. Uhlig (Queen Mary, University of London)

Well, read the paper.. :)

The comparison with YouTube is interesting though: far fewer uploads, but each video gets viewed much more - on average over 10k views. The site needs fresh content - people don't watch content twice and don't share content...

Content is being vetted - only 18% seem to become live (only 10% of content seems to be removed afterwards). Users are not very choosy - content differentiation is tough. Furthermore, popularity trends are largely dictated by browsing order (least effort). People browse by category - the more categories a video is tagged in, the more viewing grows...

A lot of implications for delivery, caching and UI design.

Q: costs are high... how do these websites make any money?

A: advertising based. Commercial content providers use it as a distribution channel...

Q: what kind of content distribution did you find? No externalities increasing video viewing, how does that compare to youtube?

A: still skewed, but less than youtube. Very little content curation in ones like youtube... The skew is created by social sharing. Vetting becomes the information bottleneck and another part is UI content.

Q: (comment): recent article shows that sandwiches receive good sales from these adverts..

Paths and Flows

From Paris to Tokyo: On the Suitability of Ping to Measure Latency (review) (short)

C. Pelsser (Internet Initiative Japan), L. Cittadini (Roma Tre University), S. Vissicchio (Université Catholique de Louvain) and R. Bush (Internet Initiative Japan)

short answer - not suitable for RTT distributions. OK for min/max RTT, but not suitable for jitter. everything that uses ping to measure latency (unless detecting min/max) should be re-evaluated.

A Comparison of Syslog and IS-IS for Monitoring Link State (review) (short)

D. Turner, K. Levchenko, S. Savage and A.C. Snoeren (UC San Diego)

Syslog has been popular for network reliability measurements. Not the gold standard though - the gold standard is IGP routing messages...

Question: how accurate is syslog compared to IS-IS data?

Can syslog be used as a drop in replacement for IS-IS data? short answer is no - 18% for DL and 15% UL syslog messages are never received by either end; syslog misses ~1k hours of downtime; 20% of syslog failures are false positives.
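
The two headline metrics (false positives and missed downtime) can be computed by comparing failure intervals from both sources for a single link. This is a rough sketch of that comparison, not the authors' pipeline: intervals are (start, end) pairs in seconds, and the matching rule (any overlap counts as a match), the function names and the example data are my assumptions.

```python
def overlaps(a, b):
    """True if the half-open intervals a and b share any time."""
    return a[0] < b[1] and b[0] < a[1]


def false_positive_fraction(syslog_failures, isis_failures):
    """Fraction of syslog failures that overlap no IS-IS failure."""
    if not syslog_failures:
        return 0.0
    unmatched = [s for s in syslog_failures
                 if not any(overlaps(s, i) for i in isis_failures)]
    return len(unmatched) / len(syslog_failures)


def missed_downtime(syslog_failures, isis_failures):
    """Seconds of IS-IS downtime not covered by any syslog failure."""
    missed = 0.0
    for start, end in isis_failures:
        covered = 0.0
        cursor = start  # avoids double-counting overlapping syslog intervals
        for s_start, s_end in sorted(syslog_failures):
            lo = max(s_start, cursor)
            hi = min(s_end, end)
            if hi > lo:
                covered += hi - lo
                cursor = hi
        missed += (end - start) - covered
    return missed


# Hypothetical failures on one link, IS-IS taken as ground truth.
isis = [(0, 100), (200, 260)]
syslog = [(10, 90), (500, 520)]
print(false_positive_fraction(syslog, isis))  # 0.5
print(missed_downtime(syslog, isis))          # 80.0
```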

Data collection: CENIC network, 13 months of data.

Scap: Stream-Oriented Network Traffic Capture and Analysis for High-Speed Networks (review) (long)

A. Papadogiannakis (FORTH-ICS), M. Polychronakis (Columbia University) and E.P. Markatos (FORTH-ICS)

A Measurement-based Study of Multipath TCP performance over Wireless Networks (review) (long)

Y.-C. Chen, Y.-S. Lim (UMass Amherst), R.J. Gibbens (University of Cambridge), E.M. Nahum (IBM Research), R. Khalili (Deutsche Telekom) and D. Towsley (UMass Amherst)

Experiment setup: apache with one interface as a server and WiFi + 4G/3G cellular client (multiple nets in US). Presenting the case of 2-flow MPTCP with AT&T LTE. Measure download time, RTT, loss rate and out-of-order delay.

Loads of pair-by-pair comparisons - difficult to summarise, but worth taking a look at the paper. Overall - in many cases end up with performance degradation with MPTCP on cellular networks and energy consumption is problematic because of two radios active simultaneously.