Detecting and Defending Against Third-Party Tracking on the Web, UW
I circulated this paper recently - it's nice work - and relates closely to recent work by Christian Kreibich too (cf. …).
It also contains one of the best tutorial introductions to how third-party tracking in the browser/server
interaction actually works, so it is a very nice read if you want to catch up fast in this area.
(see also Ghostery, if you are already paranoid:)
You can get their code from sharemenot.cs.washington.edu
Towards Statistical Queries over Distributed Private User Data, MPI
We had a talk about this in CL recently by Paul Francis - this basically solves a problem we needed to solve too. If you take an online service (e.g. an OSN) and want to fund resources by advertising and market research over the aggregate data, then you probably want to take away the incentive or risk of bad behaviour by the OSN provider too. To do this you have to a) encrypt the data and b) decentralize the data (cf. Safebook, Diaspora, PeerSoN). But then how do you do queries over the decentralized encrypted data? a) is solved by work like Privad, and b) is addressed by this work, which basically extends differential privacy to the distributed database case (oversimplifying somewhat).
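To make the flavour concrete, here's a toy sketch of one classic way to get aggregate statistics without trusting any single party with raw data: randomized response, where each client perturbs its own answer locally and the aggregator unbiases the noisy sum. This is my illustration of the general idea, not the protocol from the paper.

```python
import random

def randomized_response(bit, p=0.75):
    """Client-side perturbation: report truthfully with prob p, else flip.
    No single report reveals the true bit with certainty."""
    return bit if random.random() < p else 1 - bit

def estimate_count(reports, p=0.75):
    """Aggregator: unbias the noisy sum.
    E[noisy] = p*true + (1-p)*(n - true), so solve for true."""
    n = len(reports)
    noisy = sum(reports)
    return (noisy - (1 - p) * n) / (2 * p - 1)

random.seed(0)
true_bits = [1] * 600 + [0] * 400      # 600 of 1000 users hold the attribute
reports = [randomized_response(b) for b in true_bits]
print(round(estimate_count(reports)))  # estimate of the true count (600)
```

The accuracy/privacy tradeoff is in `p`: closer to 0.5 means stronger deniability per user but noisier aggregates.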
Koi: A Location-Privacy Platform for Smartphone Apps, MSR India
This is a privacy-by-design approach to the location problem - the threat model needs reading carefully as it is slightly different (I think) from earlier systems, which just offer location fuzzing at the source.
The novel trick here is that the Koi platform doesn't push location reports, but instead responds to queries about the device's location, so it can decide at query time, based on various criteria, how accurately to respond, to whom, when and where.
I guess this is also potentially a reduction in network/battery load, since the phone doesn't do unnecessary periodic reports.
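The query-time idea can be sketched in a few lines - note this is a hypothetical illustration (the policy table, names, and rounding-based fuzzing are all mine, not Koi's API):

```python
# Assumed per-requester policy: how many decimal places of lat/lon to keep.
POLICY = {
    "weather_app": 0,  # roughly city-level
    "taxi_app": 3,     # roughly street-level
}

def answer_location_query(requester, lat, lon, policy=POLICY):
    """The device never pushes raw fixes; each query is answered at a
    granularity chosen at query time by the requester's policy."""
    places = policy.get(requester)
    if places is None:
        return None                      # unknown requester: refuse
    return (round(lat, places), round(lon, places))

print(answer_location_query("weather_app", 52.2053, 0.1218))  # (52.0, 0.0)
print(answer_location_query("taxi_app", 52.2053, 0.1218))     # (52.205, 0.122)
```

The point is where the decision lives: answering queries lazily lets the policy depend on who is asking and when, which pushed periodic reports cannot do.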
Very amusing speaker - grabbed attention well, and pointed out real problem (us) and motivated their solution well. One point here - MSR (like Yahoo! Research) do some nice things and yet their company just doesn't seem to be able to get traction with them in the service context the way Google (or Amazon or Apple, or insert your evil empire here) do....it is NOT obvious why not!
6 Security and Availability
Aiding the Detection of Fake Accounts in Large Scale Social Online Services, Duke
This is yet another social-graph-based Sybil defense (cf. EigenTrust) - n.b. PageRank is NOT sufficient.
The system is called SybilRank and uses short random walks on the graph (i.e. relies on clustering, I guess).
I thought there was a paper from Cambridge folks that showed this can't work that well... must check.
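The short-random-walk trick can be sketched as degree-normalised trust propagation from known-honest seeds, cut off after about log(n) steps so the trust hasn't had time to leak across the few attack edges into the Sybil region. A toy version (my simplification of the SybilRank idea, not their code):

```python
import math

def sybilrank(adj, seeds, iters=None):
    """Seed trust on known-honest nodes, propagate for ~log2(n) power
    iterations, then rank nodes by degree-normalised trust."""
    n = len(adj)
    iters = iters or max(1, int(math.log2(n)))
    trust = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for v, nbrs in adj.items():
            share = trust[v] / len(nbrs)   # spread trust evenly to neighbours
            for u in nbrs:
                nxt[u] += share
        trust = nxt
    return sorted(adj, key=lambda v: trust[v] / len(adj[v]), reverse=True)

# Honest clique {a,b,c,d} with one attack edge d-x into sybil pair {x,y}.
g = {"a": ["b", "c", "d"], "b": ["a", "c", "d"], "c": ["a", "b", "d"],
     "d": ["a", "b", "c", "x"], "x": ["d", "y"], "y": ["x"]}
print(sybilrank(g, seeds={"a", "b"}))  # honest a-d rank above sybils x, y
```

The cutoff matters: run the walk to stationarity and you get PageRank-like mixing over the whole graph, which (as the talk noted) is not sufficient.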
Don't Lose Sleep Over Availability: The GreenUp Decentralized Wakeup Service, MSR India
This is a tradeoff between power savings (turning stuff off) and systems being up for maintenance, backup and stuff like that. GreenUp solves both with a sleep proxy (seen before in Austin's work:). A solid piece of good, useful engineering - a fairly typical thing for mid-range NSDI papers, which I like and which, in discussion with other people, is appreciated a lot: no gimmicky ideas, just lots of well-executed solutions to actual real-world problems.
Some nice tricks for doing the sleep/wakeup distribution of work & probing - a lot like distributed sensor net duty cycling.
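One way to get that decentralized division of probing labour - purely my sketch of the flavour, not GreenUp's actual rule - is rendezvous hashing: every awake machine can compute, with no coordination, whether it is the one responsible for watching a given sleeper.

```python
import hashlib

def guardian(sleeper, awake):
    """Pick which awake machine watches a given sleeping machine.
    Every awake machine computes the same answer independently, so no
    coordination traffic is needed (rendezvous / highest-random-weight
    hashing - an assumption of mine, not the paper's mechanism)."""
    def score(m):
        return hashlib.sha256(f"{m}|{sleeper}".encode()).hexdigest()
    return min(awake, key=score)

awake = ["m1", "m2", "m3"]
asleep = ["m4", "m5", "m6", "m7"]
duties = {s: guardian(s, awake) for s in asleep}
print(duties)  # each sleeper gets exactly one guardian, spread by hash
```

Like sensor-net duty cycling, the load of staying vigilant is spread across whoever happens to be awake, and the assignment self-heals as the awake set changes.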
7 Data Center Networking
Jellyfish: Networking Data Centers Randomly, UIUC
Jellyfish is clever - based on observation of the properties and evolution of random graphs / random attachment: a random-graph data center.
Eval - compare Jellyfish with Fat-tree
"less like structured solid, more like a liquid"
The paper has some nice results on ECMP routing and congestion control for these random graphs (see also … for more info). The last bit of the talk spoke to the problem of cabling costs for these sorts of interconnect - they then introduce optimisations to get more realistic cabling costs.
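The core construction is simple enough to sketch: keep joining random pairs of switches that still have free ports and aren't already linked. This is a minimal sketch of that idea (the paper also does edge swaps to absorb the last stray ports, omitted here):

```python
import random

def jellyfish_topology(n_switches=10, ports=4, seed=1):
    """Build a Jellyfish-style random topology: repeatedly link a random
    pair of switches that both have free ports and no existing link."""
    rng = random.Random(seed)
    free = {s: ports for s in range(n_switches)}   # free ports per switch
    links = set()
    while True:
        candidates = [s for s in free if free[s] > 0]
        pairs = [(a, b) for i, a in enumerate(candidates)
                 for b in candidates[i + 1:] if (a, b) not in links]
        if not pairs:
            break                                  # no legal link left
        a, b = rng.choice(pairs)
        links.add((a, b))
        free[a] -= 1
        free[b] -= 1
    return links

topo = jellyfish_topology()
print(len(topo), "links")   # bounded by n_switches * ports / 2 = 20
```

Incremental growth falls out for free: adding a switch is just more rounds of the same random linking, which is much of the paper's pitch against rigid fat-trees.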
OSA: An Optical Switching Architecture for Data Center Networks with Unprecedented Flexibility, Northwestern etc
n.b. same folks as did Wavecube (coming up in Sigcomm in August). Lots of problems with existing solutions for speeding up data center nets - c-Through is criticised - the claim here is that this is the first "all-optical" one...
It looks like they do standard lambda switching between ToRs (or other nodes) and pretty much solve the usual graph colouring problem. Not sure how this is significantly different from lots of optical WAN stuff, just in the data center.
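For reference, the colouring problem in question is edge colouring: two optical links sharing a ToR must use different wavelengths. A greedy sketch (illustrative only - not OSA's actual algorithm):

```python
def assign_wavelengths(links):
    """Greedy edge colouring: give each ToR-to-ToR link the lowest
    wavelength not already in use at either endpoint."""
    used = {}                      # ToR -> set of wavelengths in use there
    assignment = {}
    for a, b in links:
        taken = used.setdefault(a, set()) | used.setdefault(b, set())
        w = 0
        while w in taken:          # first free wavelength at both ends
            w += 1
        assignment[(a, b)] = w
        used[a].add(w)
        used[b].add(w)
    return assignment

links = [("t1", "t2"), ("t1", "t3"), ("t2", "t3"), ("t3", "t4")]
print(assign_wavelengths(links))
```

Greedy needs at most 2*maxdegree - 1 colours; Vizing's theorem says maxdegree + 1 suffice, which is where the smarter algorithms earn their keep when wavelengths are scarce.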
Less Is More: Trading a Little Bandwidth for Ultra-Low Latency in the Data Center, Stanford
Good motivation - latency is the key problem. Uses "phantom queues" == virtual queues (cites Kelly/Gibbens and others) - gives an early congestion signal; a hack to get TCP to back off early. So, as with the original ECN + VQ work, you run the net somewhat below saturation, so the queue length is way smaller, and then the latency is lower. This was always the original motivation for ECN+AQM with a decent occupancy estimator, so I don't see what this paper adds to knowledge, except applying it to the data center. What isn't obvious is whether you can build a decent VQ implementation in big-iron data center switches (especially since half of the people building them are trying to replace the electronics with optics:). But even in an Arista box, can we consider a VQ (phantom)?
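The phantom-queue mechanism is easy to show in a toy fluid model: the virtual queue drains at gamma * capacity with gamma < 1, so it fills - and starts ECN-marking - while the real link is still unsaturated. Numbers below are illustrative, not from the paper:

```python
def phantom_queue_marks(arrivals, capacity=10, gamma=0.95, thresh=5):
    """Toy fluid model of a phantom (virtual) queue. It counts the same
    arrivals as the real link but drains slower (gamma * capacity), so it
    builds up and signals congestion before the real queue ever grows."""
    vq = 0.0
    marks = []
    for pkts in arrivals:                         # packets arriving per tick
        vq = max(0.0, vq + pkts - gamma * capacity)
        marks.append(vq > thresh)                 # ECN-mark above threshold
    return marks

# Offered load of 10 pkt/tick exactly matches the real link (capacity 10),
# so the real queue stays empty - yet the phantom queue grows by
# (1 - gamma) * capacity per tick and eventually signals congestion.
marks = phantom_queue_marks([10] * 30)
print(marks.index(True))   # first tick at which ECN marking kicks in
```

That is the whole bandwidth-for-latency trade: sources get throttled to gamma * capacity, buying near-empty real queues.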
8 Big Data (2)
PACMan: Coordinated Memory Caching for Parallel Jobs, UCB
The starting point is that 90% of Facebook Hadoop jobs can fit in memory, plus median memory utilisation is 20%. So: put a big cache on top of HDFS, map tasks to exploit memory locality, and use simple LRU - and they got only a 10% speedup (with a hit ratio of 47% :().
So... coordinate: the system treats a set of caches in waves, and also needs a "sticky" policy (nice - another example of a systems design principle), so incomplete tasks stick; once all files are complete, look at eviction. The eval looks at completion time of jobs, plus cluster utilisation. Very clear exposition. Sort of affinity + LFU? PACMan might end up self-configuring to look a little bit like the Google stream processing stuff in some cases, I suppose.
Results for throughput and utilisation show that the sticky policy helps, despite not getting quite the same cache hit rate, with improvements of as much as 3-6x.
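The intuition behind the sticky/all-or-nothing policy can be sketched like so (my simplification of the idea - roughly the spirit of their LIFE policy, not their code): a wave of parallel tasks only finishes faster if *every* input block hits, so evict from files that are already incompletely cached, preferring the largest.

```python
def pick_eviction_victim(cache, file_blocks):
    """Sketch of all-or-nothing eviction. cache is a set of (file, block)
    pairs; file_blocks maps file -> total number of blocks. Fully cached
    files 'stick'; we evict from an already-incomplete file instead,
    since breaking it further costs no wave any speedup."""
    cached_per_file = {}
    for f, _ in cache:
        cached_per_file[f] = cached_per_file.get(f, 0) + 1
    incomplete = [f for f, n in cached_per_file.items() if n < file_blocks[f]]
    pool = incomplete or list(cached_per_file)   # fall back if all complete
    victim_file = max(pool, key=lambda f: file_blocks[f])
    return next(b for b in sorted(cache) if b[0] == victim_file)

cache = {("A", 0), ("A", 1), ("B", 0), ("B", 1), ("C", 0)}
sizes = {"A": 2, "B": 3, "C": 4}   # A fully cached; B and C incomplete
print(pick_eviction_victim(cache, sizes))  # a block of C, largest incomplete
```

This is why the 3-6x job speedups coexist with a lower raw hit rate: hit rate counts blocks, but completion time only drops when whole input sets survive.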
Reoptimizing Data Parallel Computing, UCB
Query optimisation - I tuned out for a bit here, alas. Quite a bit of work on smart data collection on which to base the prediction for query execution planning. One point they made about RoPE is that it doesn't work so well with declarative systems.
Optimizing Data Shuffling in Data-Parallel Computation by Understanding User-Defined Functions, MSR Asia
Does what it says on the tin :)
9 Poster Session and Reception