I am just back from Hyderabad, in India, where I attended the 20th International World Wide Web Conference, also known as WWW 2011, to present our work on tracking geographic social cascades to improve video delivery. This conference, organised as usual by the International World Wide Web Conference Committee (IW3C2), represents the annual opportunity for the international community to discuss and debate the evolution of the Web, providing a mixture of academic and industrial content.
The word cloud captures the main themes of the conference this year, which heavily revolved around two large pivotal aspects: "social" and "search". Interestingly, there was no attempt to merge the two, as Aardvark did last year. Not surprisingly, "networks" are still popular in the community, and "Twitter" still enjoys a lot of interest, even though this may change with their new controversial Terms of Service, which are likely to hamper social media data harvesting.
Overall it is a fairly big conference, with two initial days of workshops, tutorials and panels, followed by three days with 81 research papers. In addition, three world-renowned personalities, Dr. Abdul Kalam, Sir Tim Berners-Lee and Christos Papadimitriou, each gave a keynote. I will give a brief summary of the main research themes, with pointers to the most interesting papers. However, it was physically impossible to attend all the research sessions, as they often ran simultaneously: you can find much more information on the conference website and in the official proceedings.
The first keynote was given by Dr. Abdul Kalam, the 11th President of India (2002-2007): in his talk he advocated for a truly multilingual and democratic Web, pushing for societal transformations that can happen only when a larger part of the planet population will be connected and online. In particular, he discussed how the main hindrance in making the Web truly democratic is the language barrier and how researchers should work more on making information available across different languages and cultures.
The second keynote was given by Sir Tim Berners-Lee, the inventor of the World Wide Web: he talked about how the Internet should remain neutral so that the Web can truly support democracy and science. The resilience of the Internet is no longer about topology, but about ownership of the topology, as the Egypt disconnection demonstrated. Another interesting topic was the Semantic Web: governments should publish all their data and companies should link it all together. Finally, he complained about mobile applications: we shouldn't make mobile apps, but web apps, so that we keep things on the Web and can link them all together.
The third keynote was given by Christos Papadimitriou, Professor of Computer Science at UC Berkeley. His talk was about the rise of a new discipline, Algorithmic Economics, and how it is changing the way we think about, experience and design the Web. Computer scientists should realise that large-scale, well-performing systems can emerge from the interaction of selfish agents, and that incentives are a quintessential part of good system design. Overall, Papadimitriou depicted a new way of addressing research questions involving the Web, where end users are a key part of the systems, the algorithms and the applications we create and deploy.
I will now discuss the main topics of the conference, giving some pointers to interesting papers I have come across. This is by no means a complete overview of the topics of interest to the conference or, more generally, of the future trends in the evolution of the Web, so I encourage you to explore the official conference proceedings, which contain much more material.
Recommender systems still play a large role on the Web, as attested by the interesting tutorial given by Ido Guy (IBM) on Social Recommender Systems: building applications that help users discover things they may like is still of paramount importance, with particular emphasis on recommending new friends on online social networks. Related to this theme, Yahoo! Research presented a nifty paper about a new method that extracts templates from previously observed queries to recommend new, never-observed long-tail search queries: as always, diversity and serendipity remain highly valuable in any recommender system.
Much of the effort to improve Web search involves analysing queries and improving our understanding of true user intent. Google presented NearestCompletion, their new effort to provide context-sensitive query autocompletion: the goal is to predict the user's query after the user has entered only one character. The inherent lack of information is overcome by exploiting recent user activity to provide useful context. While this approach shows extremely good results, it may also raise controversy, as the user's behaviour is tracked and analysed.
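The general idea behind context-sensitive autocompletion can be sketched in a few lines: candidates matching the typed prefix are re-ranked by their similarity to the user's recent queries. This is only an illustrative toy (the candidate list, scoring function and names are my own assumptions), not Google's actual NearestCompletion algorithm, which relies on much richer context representations.

```python
def autocomplete(prefix, candidates, recent_queries):
    """Rank prefix matches by word overlap with recent user activity.

    Illustrative sketch only; real systems use far richer context models.
    """
    # Collect the words the user has recently searched for.
    context_words = set()
    for query in recent_queries:
        context_words.update(query.lower().split())

    def context_score(candidate):
        # Fraction of the candidate's words that appear in recent activity.
        words = candidate.lower().split()
        overlap = sum(1 for w in words if w in context_words)
        return overlap / len(words)

    matches = [c for c in candidates if c.lower().startswith(prefix.lower())]
    return sorted(matches, key=context_score, reverse=True)


candidates = ["python tutorial", "paris hotels", "paris metro map"]
recent = ["cheap flights to paris", "eiffel tower tickets"]
print(autocomplete("p", candidates, recent))
# → ['paris hotels', 'paris metro map', 'python tutorial']
```

After typing just "p", the Paris-related completions outrank the unrelated one because they share vocabulary with the user's recent queries, which is exactly the extra signal that makes single-character prediction feasible.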
The business aspects of the Web were also discussed, with many papers about monetisation. The one I liked the most was from Yahoo! Research, proposing a game-theoretic model to study the problem of incentivizing high-quality user generated content. This is clearly a big issue, as social platforms rely on users to create content, but they also require high-quality content to engage their users and make profits. Their model is based on the assumption that users are motivated by the amount of exposure their content will receive. They show how elimination mechanisms are able to filter out low-quality content, generating overall optimal results.
The Web is built on top of systems and networks, and an interesting paper by Case Western Reserve University presented the results of a measurement study of Akamai, a large commercial Content Delivery Network. The authors investigated the key architectural question faced by CDN designers: should servers be distributed across as many ISPs as possible, or consolidated into a few large clusters? Their results show that quite significant consolidation into fewer network locations is possible without appreciably degrading the platform's performance, and their methodology seems applicable to other CDNs. However, they only consider performance as a design metric, while other considerations, mainly business agreements with ISPs, may also influence CDN architecture.
As expected, many papers addressed different aspects of online social networks, with Twitter being by far the most discussed and studied service. Yahoo! Research and Georgia Tech presented an innovative approach which exploits homophily on online social networks to do joint friend prediction and interest recommendation. Other papers focused on how information spreads and diffuses on online social networks. A joint work by Cornell University and CMU investigates how hashtags spread on Twitter by analysing their temporal evolution and finding universal characteristics such as "stickiness" and "persistence", which exhibit different patterns across different topics. Another interesting work by Northeastern University and IBM studies how information flows in email communication networks along shallow spreading trees: overall, they find that at the macroscopic level the structure of information flow does not depend on user characteristics, while at the microscopic level it strongly depends on people's interests and profiles. It is also worth mentioning the excellent keynote "Temporal Analytics in Online Social Networks", given by the Program Chair Ravi Kumar within the Temporal Web workshop, which addressed the temporal properties of social networks and their structural characteristics, and advocated a data-driven modelling of the evolution of such networks.
Finally, Facebook presented an interesting example of the algorithmic questions that arise when dealing with social services. A/B testing is often used to try new features on a small fraction of the users of a social networking service, in order to assess their reaction and estimate the overall impact. The problem is that sometimes social features need to be tested on users and on their friends at the same time, so choosing users at random will not work. This combinatorial problem is solved with a novel walk-based sampling method that produces samples of nodes which are internally well-connected but also approximately uniform over the population.
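To see why a walk helps, consider the simplest possible version of the idea: grow a test group by randomly walking the friendship graph, so sampled users tend to be surrounded by other sampled users. This toy sketch is my own illustration of the intuition, not Facebook's actual method, which additionally corrects the sample to be approximately uniform over the population.

```python
import random

def walk_sample(graph, start, size, seed=None):
    """Grow an internally well-connected sample via a random walk.

    graph: dict mapping each node to a list of its friends.
    Illustrative sketch only; no uniformity correction is applied.
    """
    rng = random.Random(seed)
    sample = {start}
    current = start
    while len(sample) < size:
        neighbours = graph[current]
        if not neighbours:
            break  # dead end: cannot grow the sample further
        current = rng.choice(neighbours)  # step to a random friend
        sample.add(current)
    return sample


# Toy friendship graph: two triangles joined by the edge (2, 3).
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(walk_sample(graph, start=0, size=3, seed=7))
```

Because each step only moves to a friend of the current node, the sampled users form a connected cluster, so a feature involving both a user and their friends can be switched on for the whole group coherently.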
Finally, I think it's worth mentioning the work that won the Best Paper Award, "Towards a Theory Model for Product Search", by Beibei Li, Anindya Ghose and Panagiotis G. Ipeirotis from New York University. Their work focuses on building a theoretical model of the process of buying a product online, based on expected utility theory from economics. This seems to be the sort of work that will be cited in the coming years, with a promising future impact.
Overall, it was a great conference with many interesting papers and smart researchers. Next time it will be Lyon hosting WWW 2012, and it will surely be another fascinating opportunity to understand where the Web is heading.