syslog
4 Apr 2011

20th International World Wide Web Conference – WWW 2011

I am just back from Hyderabad, in India, where I attended the 20th International World Wide Web Conference, also known as WWW 2011, to present our work on tracking geographic social cascades to improve video delivery. This conference, organised as usual by the International World Wide Web Conference Committee (IW3C2), represents the annual opportunity for the international community to discuss and debate the evolution of the Web, providing a mixture of academic and industrial content.

The word cloud captures the main themes of the conference this year, which revolved around two large pivotal topics: "social" and "search". Interestingly, there was no attempt to merge the two, as Aardvark did last year. Not surprisingly, "networks" are still popular in the community, and "Twitter" still attracts a lot of interest, even though this may change with its new and controversial Terms of Service, which are likely to hamper the harvesting of social media data.

Overall it is a fairly big conference, with two initial days of workshops, tutorials and panels, followed by three days with 81 research papers. In addition, three world-renowned personalities, Dr. Abdul Kalam, Sir Tim Berners-Lee and Christos Papadimitriou, each gave a keynote. I will give a brief summary of the main research themes, with pointers to the most interesting papers. However, it was physically impossible to attend all the research sessions, as they often ran simultaneously: you can find much more information on the conference website and in the official proceedings.

The first keynote was given by Dr. Abdul Kalam, the 11th President of India (2002-2007): in his talk he advocated a truly multilingual and democratic Web, pushing for societal transformations that can happen only when a larger share of the planet's population is connected and online. In particular, he discussed how the main hindrance to making the Web truly democratic is the language barrier, and how researchers should work harder on making information available across different languages and cultures.

The second keynote was given by Sir Tim Berners-Lee, the inventor of the World Wide Web: he talked about how the Internet should remain neutral so that the Web can truly support democracy and science. The resilience of the Internet is no longer about topology, but about ownership of the topology, as the disconnection of Egypt demonstrated. Another interesting topic was the Semantic Web: governments should publish all their data and companies should link it all together. Finally, he complained about mobile applications: we shouldn't build mobile apps but web apps, so that we can keep things on the Web and link them all together.

The third keynote was given by Christos Papadimitriou, Professor of Computer Science at UC Berkeley. His talk was about the rise of a new discipline, Algorithmic Economics, and how it is changing the way we think about, experience and design the Web. Computer scientists should realise that large-scale, well-performing systems can emerge from the interaction of selfish agents, and that incentives are a quintessential part of good system design. Overall, Papadimitriou depicted a new way of addressing research questions involving the Web, where end users are a key part of the systems, algorithms and applications we create and deploy.


I will now discuss the main topics of the conference, giving some pointers to interesting papers I came across. This is hardly a complete overview of the topics of interest to the conference and, more generally, of the future trends in the evolution of the Web: I encourage you to explore the official conference proceedings, which contain much more material.

Recommender systems still play a large role on the Web, as shown by the interesting tutorial given by Ido Guy (IBM) on Social Recommender Systems: building applications that help users discover things they may like remains of paramount importance, with particular emphasis on recommending new friends on online social networks. Related to this theme, Yahoo! Research presented a nifty paper about a new method that extracts templates from observed queries in order to recommend new, never-observed long-tail search queries: as always, diversity and serendipity remain highly valuable in any recommender system.
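
To make the template idea concrete, here is a minimal sketch of the general approach; the queries and the entity dictionary below are invented for illustration, and this is not Yahoo!'s actual algorithm. Queries containing a known entity are generalised into templates, and unseen long-tail queries are produced by instantiating each template with entities never observed in it.

```python
# Minimal sketch of template-based query generation (toy data and entity
# dictionary invented for illustration; not Yahoo!'s actual algorithm).
from collections import defaultdict

ENTITIES = {"paris", "london", "rome", "berlin"}  # hypothetical entity list

observed_queries = [
    "paris cheap hotels",
    "london cheap hotels",
    "paris weather forecast",
    "rome weather forecast",
]

def to_template(query):
    """Replace the first known entity in a query with a placeholder slot."""
    tokens = query.split()
    for i, tok in enumerate(tokens):
        if tok in ENTITIES:
            template = " ".join(tokens[:i] + ["<entity>"] + tokens[i + 1:])
            return template, tok
    return None, None

# Record which entities have already been observed with each template.
seen = defaultdict(set)
for q in observed_queries:
    template, entity = to_template(q)
    if template is not None:
        seen[template].add(entity)

# Recommend never-observed queries: templates crossed with unseen entities.
for template, used in seen.items():
    for entity in sorted(ENTITIES - used):
        print(template.replace("<entity>", entity))
# e.g. "berlin cheap hotels" is suggested although it never appeared above
```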

A lot of effort on improving Web search goes into analysing queries to better understand the true user intent. Google presented NearestCompletion, their new approach to context-sensitive query autocompletion: the goal is to predict the user's query after only one character has been typed. The inherent lack of information is overcome by exploiting recent user activity to provide useful context. While this approach shows extremely good results, it may also raise controversy, since the user's behaviour is tracked and analysed.
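
As a rough illustration of the idea (not Google's implementation; the candidate list, similarity measure and context handling are all my own invention), a context-sensitive completer can rank the completions of a one-character prefix by their similarity to the user's recent queries rather than by global popularity alone:

```python
# Hedged sketch of context-sensitive autocompletion in the spirit of
# NearestCompletion (toy candidates, invented scoring; not Google's code).

CANDIDATES = ["weather hyderabad", "www 2011 program", "wimbledon results",
              "web conference hyderabad", "world cup schedule"]

def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def complete(prefix, recent_queries):
    """Rank completions of `prefix` by similarity to the recent context."""
    context = " ".join(recent_queries)
    matches = [c for c in CANDIDATES if c.startswith(prefix)]
    return sorted(matches, key=lambda c: jaccard(c, context), reverse=True)

# After typing just "w", recent activity about the conference pushes
# conference-related completions to the top of the list.
print(complete("w", ["www 2011 hyderabad", "conference venue map"]))
```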

The business aspects of the Web were also discussed, with many papers about monetisation. The one I liked most was from Yahoo! Research, proposing a game-theoretic model of the problem of incentivising high-quality user-generated content. This is clearly a big issue: social platforms rely on users to create content, but they also need high-quality content to engage their users and make a profit. Their model is based on the assumption that users are motivated by the amount of exposure their content will receive, and they show how elimination mechanisms are able to filter out low-quality content, generating optimal results overall.
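
A toy numerical illustration of why elimination can help (this is my own drastic simplification with invented parameters, not the paper's actual mechanism): if content below a quality threshold receives no exposure at all, the payoff-maximising quality for a contributor facing a convex effort cost moves upward.

```python
# Toy illustration of an elimination mechanism (invented parameters; not
# the paper's model): contributors pick the quality q that maximises
# exposure-derived payoff minus a convex effort cost.
import numpy as np

qualities = np.linspace(0.0, 1.0, 101)   # candidate quality levels
EXPOSURE = 100.0                          # total attention at stake

def cost(q):
    """Convex effort cost of producing content of quality q."""
    return 80.0 * q ** 2

def best_quality(threshold):
    """Payoff-maximising quality when content below `threshold` is dropped."""
    exposure = np.where(qualities >= threshold, EXPOSURE * qualities, 0.0)
    payoff = exposure - cost(qualities)
    return qualities[np.argmax(payoff)]

print("no elimination: q* =", best_quality(0.0))  # ~0.62
print("threshold 0.8:  q* =", best_quality(0.8))  # 0.8: low q earns nothing
```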

The Web is built on top of systems and networks, and an interesting paper by Case Western Reserve University presented the results of a measurement study of Akamai, a large commercial Content Delivery Network. The authors investigated the key architectural question faced by CDN designers: should servers be distributed across as many ISPs as possible, or consolidated into a few large clusters? Their results show that quite significant consolidation into fewer network locations is possible without appreciably degrading the platform's performance, and their methodology seems applicable to other CDNs. However, they only consider performance as a design metric, while other considerations, chiefly business agreements with ISPs, may also be shaping CDN architecture.

As expected, many papers addressed different aspects of online social networks, with Twitter by far the most discussed and studied service. Yahoo! Research and Georgia Tech presented an innovative approach that exploits homophily on online social networks to perform joint friend prediction and interest recommendation. Other papers focused on how information spreads and diffuses on online social networks. A joint work by Cornell University and CMU investigates how hashtags spread on Twitter by analysing their temporal evolution, identifying characteristics such as "stickiness" and "persistence" which exhibit different patterns across different topics. Another interesting work, by Northeastern University and IBM, studies how information flows in email communication networks along shallow spreading trees: overall, they find that at the macroscopic level the structure of information flow does not depend on user characteristics, while at the microscopic level it strongly depends on people's interests and profiles. It is also worth mentioning the excellent keynote "Temporal Analytics in Online Social Networks", given by the Program Chair Ravi Kumar within the Temporal Web workshop, which addressed the temporal properties of social networks and their structural characteristics, and advocated data-driven modelling of the evolution of such networks.
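
To give a flavour of the "stickiness"/"persistence" analysis, here is a hedged sketch of an exposure curve p(k), the probability that a user adopts a hashtag right after their k-th exposure to it; the data below is invented, whereas the real study reconstructs exposures from the Twitter follower graph.

```python
# Sketch of an exposure curve p(k) for a hashtag (toy data invented for
# illustration): p(k) = fraction of users at risk at their k-th exposure
# who adopt the hashtag at exactly that exposure.
from collections import Counter

# (total exposures a user saw, exposure at which they adopted, or None)
toy_users = [(3, 2), (5, None), (2, 1), (4, 4), (6, None)]

at_risk = Counter()   # users who reached their k-th exposure unadopted
adopted = Counter()   # users who adopted exactly at their k-th exposure

for total_seen, adopted_at in toy_users:
    last = adopted_at if adopted_at is not None else total_seen
    for k in range(1, last + 1):
        at_risk[k] += 1
    if adopted_at is not None:
        adopted[adopted_at] += 1

for k in sorted(at_risk):
    print(f"p({k}) = {adopted[k] / at_risk[k]:.2f}")

# "Stickiness" relates to the peak of p(k); "persistence" to how slowly
# p(k) decays as the number of repeated exposures grows.
```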

Facebook also presented an interesting example of the algorithmic questions that arise when dealing with social services. A/B testing is often used to trial new features on a small fraction of the users of a social networking service, in order to assess their reaction and estimate the overall impact. The problem is that some social features need to be tested on users and on their friends at the same time, so choosing users at random will not work. They address this combinatorial problem with a novel walk-based sampling method that produces samples of nodes that are internally well connected yet approximately uniform over the population.
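
As a sketch of the flavour of walk-based sampling (using a standard Metropolis-Hastings degree correction on a toy graph, not the exact method in the Facebook paper), the following collects a connected set of nodes while avoiding the usual random-walk bias towards high-degree users:

```python
# Hedged sketch of walk-based graph sampling (toy graph; a standard
# Metropolis-Hastings walk, not the Facebook/WWW'11 algorithm itself).
import random

# Toy undirected social graph as an adjacency list (hypothetical data).
GRAPH = {
    "a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b", "d"],
    "d": ["b", "c", "e"], "e": ["d", "f"], "f": ["e"],
}

def mh_walk_sample(graph, start, sample_size, seed=0):
    """Collect `sample_size` distinct nodes via a Metropolis-Hastings walk."""
    rng = random.Random(seed)
    current, sample = start, {start}
    while len(sample) < sample_size:
        candidate = rng.choice(graph[current])
        # Accept with min(1, deg(current)/deg(candidate)), so high-degree
        # nodes are not over-represented in the resulting sample.
        if rng.random() < min(1.0, len(graph[current]) / len(graph[candidate])):
            current = candidate
        sample.add(current)
    return sample

# The sample grows along walk steps, so it stays internally well connected.
print(mh_walk_sample(GRAPH, "a", 4))
```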

Finally, it is worth mentioning the work that won the Best Paper Award, "Towards a Theory Model for Product Search" by Beibei Li, Anindya Ghose and Panagiotis G. Ipeirotis from New York University. Their work builds a theoretical model of the process of buying a product online, based on expected utility theory from economics. This seems the sort of work that will be cited for years to come, with a promising future impact.
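
As a trivially simplified illustration of the expected-utility idea (the weights, features and prices below are invented, and the paper's model is considerably richer), products can be ranked by the surplus a user is expected to get rather than by keyword relevance alone:

```python
# Very simplified sketch of utility-based product ranking (toy data; an
# illustration of the expected-utility idea, not the paper's model).

WEIGHTS = {"stars": 30.0, "near_beach": 40.0}  # hypothetical user weights

hotels = [  # toy product data: name, characteristics, price per night
    {"name": "Hotel A", "stars": 4, "near_beach": 1, "price": 120},
    {"name": "Hotel B", "stars": 5, "near_beach": 0, "price": 150},
    {"name": "Hotel C", "stars": 3, "near_beach": 1, "price": 80},
]

def surplus(h):
    """Consumer surplus: utility from characteristics minus price paid."""
    utility = sum(WEIGHTS[f] * h[f] for f in WEIGHTS)
    return utility - h["price"]

# Rank results by surplus instead of keyword relevance.
for h in sorted(hotels, key=surplus, reverse=True):
    print(h["name"], round(surplus(h), 1))
```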

Overall, it was a great conference with many interesting papers and smart researchers. Next year it will be Lyon hosting WWW 2012, and it will surely be another fascinating opportunity to understand where the Web is heading.

Comments
  1. guys, I wonder why Tim Berners-Lee keeps on repeating the same thing: “we
    shouldn’t make mobile apps, but web apps, so that we can keep things
    on the web and link them all together.”

  2. @hamed, just after that keynote there was some debate on exactly this
    point. The organizers of WWW had provided an Android/iPhone application
    with the full schedule of the conference, and someone complained that
    the main Web conference should not provide mobile applications but
    websites. Well, it turned out that

    1 – on the first day the WiFi at the conference venue completely
    failed, so the application was the only way to get information when
    the “Web” was not working.

    2 – the organizers already had a comprehensive website, and the mobile
    app was just an addition.

    So the point is that we should actually use the Web and link everything
    together, and this is good. But as long as there are connectivity
    problems, having the data physically residing on your phone is too
    important.

  3. The last point — that there is no solution for smart prefetching in the web — is a nice area for future work and definitely something that needs fixing. I’d claim that this is inhibited by the fact that the way the web names documents is broken. Embedding hostnames is just too location-dependent… I want something like DOI or pURL. (But I guess the official argument is that you’re just embedding a domain name, a.k.a. an authority not a location. So a very smart HTTP caching layer might do.)

    I was also a bit confused about why wifi outage means that mobile apps win — since surely you can just use 3G on your phone to get to the web site. But presumably it’s the same issue — the app has preloaded all the data, so once you have it, you don’t need any connectivity at all.

    The other point — which I guess not many people would want to put directly to Sir Tim — is that building a web site so that it’s accessible from a mobile device is Just Too Hard right now, so people fall back on writing mobile apps separately. Writing web apps is too hard in general, in fact. It would be awesome (and doable) if someone could hack the Android/iPhone SDKs so that they generate a web app (HTML, Javascript + whatever code is deemed to need running server-side) from an unmodified Objective-C (or whatever) codebase. Objective-C -> LLVM -> Javascript — sounds doable?

  4. I’m still a bit baffled as to why TBL is so keen on the semantic web – it seems to me that an attempt to build a single global ‘ontology’ (as they call it) is doomed to failure… seems perfectly sane in the context of e.g. computational chemistry, but not for general purpose. What’s wrong with per-user views enabled by customised search?

  5. > Objective-C -> LLVM -> Javascript

    Stephen: see Cappuccino http://cappuccino.org/ ; it extends JavaScript to Objective-J, with Cocoa-like bindings. It can target the iPhone (via WebKit) with apps that are fairly indistinguishable from native ObjC apps.

