Live blog from OSDI 2012 — Day 2

Posted by Malte Schwarzkopf

Here we are, reporting back from OSDI 2012 in Hollywood today.

Today's live-blog coverage continues below the fold. Note that some of the coverage is a little spotty due to our blog machine being overwhelmed by the number of requests.


Blogging OSDI 2012 — Day 1

Posted by Malte Schwarzkopf

For the next couple of days, I am attending OSDI in Hollywood. However, due to various scheduling constraints on both sides of the Atlantic, I only arrived at lunchtime on Monday and missed the first session. Fortunately, in addition to my delay-tolerant "live blog" from the plane, where I read the first session's papers, Derek Murray was kind enough to take some notes on the actual talks. Normal live-blogging service of the talks will be provided for the other days! :)


RecSys 2012: a few things I remember

Posted by Daniele Quercia

random notes & thoughts


From Sunday's workshops, I remember the paper "Dating Sites and the Split-complex Numbers". It uses split-complex numbers to represent dating preferences in an elegant way, and it seems promising. It would be great to connect this work to previous papers on trust and distrust and to structural balance theory... I also heard that two presentations were quite good: 1) Content, Connections, and Context; 2) Joseph Konstan's talk about the different decision strategies people use in different contexts.

On Thursday, we ran a workshop on mobile recommender systems. Francesco Calabrese of IBM Smart Cities gave an interesting invited talk about current projects on transportation systems. Then we had a set of really good talks and one outdoor activity. What did I learn? Well, most existing mobile systems assume that the recommendation process unfolds in a single step: get restaurant recommendations and choose one of them. In reality, recommendations in the built environment should go beyond that. For example,

  • To mimic humans, a restaurant recommender should return at least 3 different recommendations (or facets): the closest restaurant, the best restaurant, and a trade-off between the two.
  • One should understand WHY people visit certain places. How did they make those decisions? Which criteria did they employ?
  • Recommender systems need to tap into established findings in the area of urban studies. For example, in our RecSys paper "Ads & the City", we exploited the fact that people are boring - they generally do not travel very far - unless what they are looking for is not readily available where they are.
  • Temporal patterns in recommender systems have not been widely studied. They have only recently been studied on Web platforms (and Neal Lathia has done great work on that!) and have been neglected on mobile platforms. That is why we had another paper at the conference titled "Spotting Trends: The Wisdom of the Few".
  • Finally, and more importantly, we need far more user studies of how these systems are ACTUALLY used! Recommendations do not matter much; the experience counts ;)

And this is just scratching the surface ;)
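The three-facet idea from the first point is easy to prototype. Below is a minimal Python sketch, assuming each candidate restaurant comes with a distance from the user and an average rating. The `Restaurant` class, the `three_facets` function, the 5 km cap and the equal-weight trade-off score are all hypothetical choices for illustration, not something presented at the workshop.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    distance_km: float  # distance from the user (hypothetical field)
    rating: float       # average rating on a 0-5 scale (hypothetical field)


def three_facets(candidates, max_km=5.0):
    """Return (closest, best, trade_off) picks from a non-empty candidate list."""
    if not candidates:
        raise ValueError("need at least one candidate")
    closest = min(candidates, key=lambda r: r.distance_km)
    best = max(candidates, key=lambda r: r.rating)

    def trade_off_score(r):
        # Assumed scoring: equal-weight blend of closeness and quality, both in [0, 1].
        closeness = 1.0 - min(r.distance_km, max_km) / max_km
        quality = r.rating / 5.0
        return 0.5 * closeness + 0.5 * quality

    trade_off = max(candidates, key=trade_off_score)
    return closest, best, trade_off


# Hypothetical toy candidates.
places = [
    Restaurant("Corner Cafe", 0.2, 3.1),
    Restaurant("Harbour Grill", 4.5, 4.9),
    Restaurant("Market Bistro", 1.0, 4.4),
]
closest, best, trade_off = three_facets(places)
print(closest.name, "|", best.name, "|", trade_off.name)
# Corner Cafe | Harbour Grill | Market Bistro
```

An equal-weight linear blend is only one way to pick the compromise option; filtering the Pareto front over (distance, rating) would be another reasonable choice.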


I remember only a few things from the conference (the industry track was pretty good):

  • Multiple Objective Optimization in Recommendation Systems (LinkedIn). A nice example of A/B testing.
  • Towards Personality-Based Personalization (Thore Graepel of Microsoft Research). A nice talk about how easy it is to predict personal attributes of Facebook users from their likes. If you are interested in personality and social media, you should check out our work on Facebook and Twitter (we can predict the personality traits of Twitter users from only their follower, following, and listed counts).
  • Building Industrial-scale Real-world Recommender Systems (Xavier Amatriain of Netflix). Brilliant (& fully packed) tutorial. Check this out for a summary.
  • Controlled experiments at Microsoft Bing (very good work): I encourage you to read the 2009 guide [pdf], the 2012 KDD paper, and the slides of the talk.
  • Pareto-efficient hybridization for multi-objective recommender systems (UFMG). Here the question is how to combine different types of algorithms (hybridization).
  • User Effort vs. Accuracy in Rating-based Elicitation (PoliMI). What's the optimal number of user ratings for movie recommendations? It seems to be between 5 and 20.
  • TasteWeights: A Visual Interactive Hybrid Recommender System (UCSB). A visualization platform for your social media stream.
  • Learning to rank optimizing MRR for recommendations. Very cool work. It taps into the less-is-more concept, which I'm a big fan of.
  • Thumbs up to real-world stuff: Beyond Lists: Studying the Effect of Different Recommendation Visualizations;  Yokie - Explorations in Curated Real-Time Search & Discovery Using Twitter; A System for Twitter User List Curation; The Demonstration of the Reviewer’s Assistant; CubeThat: News Article Recommender (browser extension for Chrome displays recommended additional news stories related to the same topic as the current news story)
  • Challenges in music recommendation (@plamere from @echonest). A couple of interesting insights: "Understanding the specifics of your domain is critical to building a good recommender"; and recommending down-tail is OK, while recommending up-tail (Britney to someone who likes Tom Waits) is risky, as it might offend one's music identity. So make your recommendations Hipster-Friendly ;)
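For readers unfamiliar with MRR (mean reciprocal rank): for each user, take the reciprocal of the position of the first relevant item in the ranked list (zero if none appears), then average over users. Because only the first hit counts, a list with one good item at the very top scores well, which is where the less-is-more flavour comes from. The Python sketch below only computes the metric on hypothetical toy data; it does not reproduce the learning-to-rank method from the talk.

```python
def reciprocal_rank(ranked_items, relevant):
    """1 / position of the first relevant item (1-based), or 0.0 if none appears."""
    for position, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average reciprocal rank over users; rankings[i] pairs with relevant_sets[i]."""
    if len(rankings) != len(relevant_sets):
        raise ValueError("one relevant set per ranking is required")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets))
    return total / len(rankings)


# Hypothetical top-3 lists for three users and the items each actually liked.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
liked = [{"a"}, {"f"}, {"x"}]
mrr = mean_reciprocal_rank(rankings, liked)
print(round(mrr, 4))  # (1 + 1/3 + 0) / 3 = 0.4444
```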

The San Diego Trip: An Overview of this year’s SIGKDD Conference

Posted by an346

This year's SIGKDD conference returned to San Diego, California, after 12 years, hosting Data Mining and Knowledge Discovery experts from around the world. The elite of heavyweight data scientists gathered at the largest hotel on the West Coast and, together with industry experts and government technologists, numbered more than 1,100 attendees, a record in the conference's history.

The gathering kicked off with tutorials, including two classics running in parallel: David Blei on topic models and Jure Leskovec on his extensive work in Social Media Analytics. Blei offered a refreshing talk that stretched from the very basics of text-based learning to the most up-to-date extensions of his work, with applications to streaming data and an online version of the paradigm that scales the model up to huge datasets, satisfying the requirements of modern data analysis. Leskovec elaborated on a broad spectrum of his past work, covering topics including the temporal dynamics of news articles, sentiment polarisation in social networks, and information diffusion in graphs, modelled through the influence of participating nodes. The first day's social menu was completed by Lada Adamic's presentation on the relationship between structure and content in social networks. Her talk at the Mining and Learning with Graphs Workshop provided an empirical analysis of a variety of online domains, showing how the flow of novel content in those systems reflected variations in the patterns of interaction among individuals. The day closed with the conference's plenary open session, which featured submission and reviewing highlights and the usual KDD award ceremonies. These honoured the decision-tree man, Ross Quinlan, who presented a historical overview of his work, and a data mining legion of 25 students from NTU who won this year's KDD Cup on music recommendation.

After a second night of sleep broken by jetlag-induced wake-ups, Monday rolled in and the conference opened with sessions on user classification and web user modelling. In the afternoon, the presentation of the (student) award-winning work applying topic models to scientific article recommendation attracted the interest of many. The conference's dedicated session on online social networks also signalled the data mining community's interest in this currently hot domain. That session opened with an interesting work on predicting semantic annotations in location-based social networks, in particular predicting missing labels for venues that lacked user-generated semantic information. While the machine learning part of the work was sound, whether it addressed a real problem was questioned, suggesting the need to identify the essential challenges in a relatively new application area. Nonetheless, the keyword of the day was scalability: two talks focused on an ever-classic machine learning problem, clustering, in the context of the trendy MapReduce model. Alina Ene from the University of Illinois introduced the basics, whereas the Brazilian Robson Cordeiro offered novel insights with a cutting-edge algorithm for clustering huge graphs. The work, driven by the guru Christos Faloutsos, combined the elegance of simplicity with the virtues of effectiveness, showing that, for some, size does not matter and petabytes of data can be crunched in minutes. A poster session brought down the curtain on another day. The crowd was not discouraged by the organisers' offer of only one free drink, and a vibrant set of interactions took place. Some were discussing techniques, some were looking for new datasets, while social cliques were also forming in the corners of the hotel's huge Douglas Pavilion.

Day 3 drove the conference participants into the dark technical depths of the well-established topic of matrix factorisation, followed by the user modelling session. Yahoo!'s Bee-Chung Chen gave an intriguing presentation on user reputation in a comment rating environment, followed by a lucid talk by Panayiotis Tsaparas on selecting a useful subset of reviews for Amazon products plagued by tons of reviews. The Boston-based Greek gang of Microsoft Research also showed how Mechanical Turk can be used to assess the effectiveness of review selection in such systems. Poster session number 2 closed the day, and the group's work on link prediction in location-based social networks was up. Three exhausting but fruitful hours of interaction with location-based enthusiasts, agnostics and doubters were a good opportunity to get the vibe of the community on an up-and-coming hot topic. For application developers and online service providers, the work was an excellent example of how location-based data can be used to deliver personalised, geo-temporally aware content to users. For data mining geeks, it presents unexplored territory where existing techniques can be tested and novel ones devised. At the end of the poster session, many participants headed downtown for a taste of San Diego, while relaxing boat trips on the bay were also popular.

The final day of the conference was marked by Kaggle's visionary entrepreneur Jeremy Howard and a panel of experts on data mining competitions. The panel aimed to analyse the problems that had arisen in previous competitions and the lessons learned for creating successful new ones. Howard presented radical views, suggesting that the future of data mining and problem solving would be delivered in the form of competitions. Not only could competitions attract an army of approximately 10 million data analysts around the globe, but their design could also promise a sustainable economic model that would bring money to all participants (even non-winners) and would perhaps put a respectable number of PhD careers at stake. His philosophy was driven by the idea that to solve challenging problems effectively, you need to awaken the diverse pool of minds out there, which can constitute an infinite source of innovation.

But KDD attracted the interest not only of scientists and corporate experts, but also of politicians. Ahead of the 2012 elections, the Obama data mining team is here and hiring! Rayid Ghani, chief scientist at Obama for America, highlighted the important role of predictive analytics and optimisation problems in the battle for an electorate that traditionally decides winners by only small margins. It remains to be seen whether science will beat Tea Party-style propaganda and maximise positive votes in a bumpy and complex socio-political landscape. The political world was also (quietly) represented by government data scientists and secret service analysts seeking to catch up with the state of the art in data mining and knowledge discovery, a vital survival requirement in a world overflowing with data and subsequent leaks...

The full proceedings of KDD 2011 can be found here.


Socio-spatial properties of online social networks

Posted by Salvatore Scellato

Some social scientists have suggested that the advent of fast long-distance travel and cheap online communication tools might have caused the "death of distance": as described by Frances Cairncross, the world appears to shrink as individuals connect and interact with each other regardless of the geographic distances that separate them. Unfortunately, the lack of reliable geographic data about large-scale social networks has hampered research on this specific problem.

However, the growing popularity of location-based services such as Foursquare and Gowalla has recently unlocked large-scale access to where people live and who their friends are, making it possible to understand how distance and friendship ties relate to each other.

In a recent paper, which will appear at the upcoming ICWSM 2011 conference, we study the socio-spatial properties arising between users of three large-scale online location-based social networks. We discuss how distance still matters: individuals are much more likely to create social ties with people living nearby than with people further away, even though strong heterogeneities remain across different users.
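One simple way to check whether distance still matters in such data is to bin user pairs by the distance between their home locations and measure, in each bin, the fraction of pairs that are friends. The Python sketch below does this under stated assumptions: home locations are known as (latitude, longitude) pairs, distance is the great-circle (haversine) distance on a spherical Earth, and `homes`, `friendships`, the bin edges and the toy users are all hypothetical. It illustrates the idea and is not the methodology of the paper.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def tie_probability_by_distance(homes, friendships, bin_edges_km):
    """Fraction of user pairs that are friends, for each distance bin.

    homes: dict user -> (lat, lon) home location (assumed known).
    friendships: set of frozenset({u, v}) undirected ties.
    bin_edges_km: increasing bin edges; pairs outside [first, last) are skipped.
    """
    n_bins = len(bin_edges_km) - 1
    pairs = [0] * n_bins  # all user pairs falling in each bin
    ties = [0] * n_bins   # pairs in each bin that are also friends
    for u, v in combinations(sorted(homes), 2):
        d = haversine_km(homes[u], homes[v])
        for i in range(n_bins):
            if bin_edges_km[i] <= d < bin_edges_km[i + 1]:
                pairs[i] += 1
                if frozenset((u, v)) in friendships:
                    ties[i] += 1
                break
    return [t / p if p else 0.0 for t, p in zip(ties, pairs)]


# Hypothetical toy data: two users in London, two in New York.
homes = {
    "ann": (51.50, -0.12), "bob": (51.52, -0.10),
    "cat": (40.71, -74.00), "dan": (40.73, -73.99),
}
friendships = {frozenset(p) for p in [("ann", "bob"), ("cat", "dan"), ("ann", "cat")]}
probs = tie_probability_by_distance(homes, friendships, [0, 100, 20000])
print(probs)  # [1.0, 0.25]: both nearby pairs are friends, 1 of 4 distant pairs is
```

The all-pairs loop is quadratic in the number of users, so with real data one would likely need to sample non-friend pairs rather than enumerate them all.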