syslog
23 Sep 2011

Mobicom. Day 3

Posted by Narseo

3rd and final day... mainly about PHY/MAC layer and theory works

The day started with a keynote by Farnam Jahanian (University of Michigan, NSF). Jahanian talked about some of the opportunities behind cloud computing research. In his opinion, cloud computing can enable new solutions in fields such as health care and environmental issues. As an example, it can help to foster a greener and more sustainable world and to predict natural disasters (e.g. the recent Japanese tsunami) with the support of a wider sensor network. His talk concluded with a discussion of some of the challenges facing computer science research in the US (which seem to be endemic in other countries too). He highlighted that, although the market demands more computer science graduates, few students are joining related programs at any level, including high school.

Session 7. MAC/PHY Advances.

No Time to Countdown: Migrating Backoff to the Frequency Domain, Souvik Sen and Romit Roy Choudhury (Duke University, USA); and Srihari Nelakuditi (University of South Carolina, USA)

Conventional WiFi networks perform channel contention in the time domain. This approach wastes a lot of channel time on backoff. Back2F is a new way of performing channel contention in the frequency domain by treating OFDM subcarriers as randomised integers (i.e. instead of picking a random backoff length, each node picks a random subcarrier). This technique requires an additional listening antenna so that WiFi APs can learn the backoff values chosen by nearby access points and decide whether their own value is the smallest among those generated by close-proximity APs. Each AP uses this knowledge individually to schedule transmissions after every round of contention. With a second round of contention, the APs that collided in the first round can compete again, along with a few more APs. The performance evaluation was done in a real environment. The results show that the collision probability decreases considerably with Back2F when two contention rounds are used. Real-time traffic such as Skype experiences a throughput gain, but Back2F is more sensitive to channel fluctuation.
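To get a feel for why a second round helps, here is a toy simulation of the contention scheme described above. It is only a sketch under stated assumptions, not the authors' implementation: the subcarrier count (`NUM_SUBCARRIERS`), the function names, and the simplification that only first-round winners enter the second round (the paper also lets a few more APs join) are made up for illustration.

```python
import random

NUM_SUBCARRIERS = 48  # assumed number of subcarriers usable as backoff values


def contention_round(aps, rng):
    """Each AP picks a random subcarrier; the APs on the lowest index win."""
    picks = {ap: rng.randrange(NUM_SUBCARRIERS) for ap in aps}
    lowest = min(picks.values())
    return [ap for ap in aps if picks[ap] == lowest]


def contend(aps, rounds, rng):
    """Run up to `rounds` rounds; more than one survivor means a collision."""
    winners = list(aps)
    for _ in range(rounds):
        winners = contention_round(winners, rng)
        if len(winners) == 1:
            break
    return winners


def collision_rate(num_aps, rounds, trials=5000, seed=1):
    """Fraction of trials in which several APs would transmit at once."""
    rng = random.Random(seed)
    aps = list(range(num_aps))
    collisions = sum(len(contend(aps, rounds, rng)) > 1 for _ in range(trials))
    return collisions / trials
```

Comparing `collision_rate(10, 1)` with `collision_rate(10, 2)` should show the drop in collisions that the second round buys in this toy model.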

Harnessing Frequency Diversity in Multicarrier Wireless Networks, Apurv Bhartia, Yi-Chao Chen, Swati Rallapalli, and Lili Qiu (University of Texas at Austin, USA)

Wireless multicarrier communication systems spread data over multiple subcarriers, but the SNR varies from subcarrier to subcarrier. In this presentation, the authors propose a joint integration of three solutions to reduce the side effects:

  1. Map symbols to subcarriers according to their importance.
  2. Effectively recover partially corrupted FEC groups and facilitate FEC decoding.
  3. MAC-layer FEC to offer different degrees of protection to the symbols according to their error rates at the PHY layer.

Their simulation and testbed results show that combining all three techniques can increase throughput by 1.6x to 6.6x.

Beamforming on Mobile Devices: A first Study, Hang Yu, Lin Zhong, Ashutosh Sabharwal, David Kao (Rice University, USA)

Wireless links present two invariants: spectrum is scarce while hardware is cheap. The fundamental waste in cellular base stations comes from the antenna design. Lin Zhong proposed passive directional antennas to minimize this issue. They used directional antennas to generate a very narrow beam with a larger spatial coverage. They showed that this solution is practical despite the small form factor of a smartphone's antenna, resistant to node rotation (only 2-3 dB lost compared to a static node), and does not hurt the battery life of the handsets, especially in the uplink, as the antenna's beam is narrower. This technique allows calculating the optimal number of antennas for efficiency. The system was evaluated both indoors and outdoors in stationary and mobile scenarios. The results show that client power consumption can be reduced substantially, and that it drops as the number of antennas increases.

Session 8. Physical Layer

FlexCast: Graceful Wireless Video Streaming, S T Aditya and Sachin Katti (Stanford University, USA)

This is a scheme to adapt video streaming to wireless communications. Mobile video traffic is growing exponentially and users' experience is very poor because of channel conditions. MPEG-4 estimates the quality over long timescales, but channel conditions change rapidly, which hurts video quality. Current video codecs are not equipped to handle such variations since they exhibit all-or-nothing behavior. The authors propose making video quality proportional to the instantaneous wireless quality, so a receiver can reconstruct a video encoded at a constant bit rate by taking into account information about the instantaneous network quality.

A Cross-Layer Design for Scalable Mobile Video, Szymon Jakubczak and Dina Katabi (Massachusetts Institute of Technology, USA)

One of the best papers in Mobicom'11. Mobile video is limited by the bandwidth available in cellular networks and by a lack of robustness to changing channel conditions. As a result, video quality must be adapted to the channel conditions of different receivers. They propose a cross-layer design for video that addresses both limitations. In their opinion, the problem is that compression and error protection convert real-valued pixels to bits and, as a consequence, destroy the numerical properties of the original pixels. In analog TV this was not a problem, since there is a linear relationship between the transmitted values and the pixels, so a small perturbation in the channel was transformed into a small perturbation in the pixel value (however, this was not efficient, as it did not compress data).

SoftCast is as efficient as digital TV whilst also compressing data linearly (note that current compression schemes are not linear, which is why the numerical properties are lost). SoftCast transforms the video into the frequency domain with a 3D DCT. In the frequency domain, most temporal and spatial frequencies are zero, so the compression sends only the non-zero frequencies. As it is a linear transform, the output preserves the same numerical properties. They ended the presentation with a demo that showed the real gains of SoftCast over MPEG-4 when the SNR of the channel drops.
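The point about linearity can be illustrated with a few lines of Python. This is a hedged sketch, not SoftCast itself: it uses a 1-D orthonormal DCT as a stand-in for the 3D DCT, and the pixel row and noise values are invented. Because the transform is linear and orthonormal, noise added to the transmitted coefficients comes out as pixel error with the same energy, so a small channel perturbation stays a small pixel perturbation.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * acc)
    return out


def idct(c):
    """Inverse of `dct` (the transpose of the orthonormal DCT-II matrix)."""
    n = len(c)
    out = []
    for i in range(n):
        acc = c[0] * math.sqrt(1 / n)
        for k in range(1, n):
            acc += c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out


def energy(v):
    """Sum of squares of a vector."""
    return sum(a * a for a in v)


pixels = [100.0, 102.0, 104.0, 103.0, 101.0, 100.0, 99.0, 98.0]  # made-up pixel row
noise = [0.5, -0.3, 0.2, 0.0, -0.1, 0.4, -0.2, 0.1]  # made-up channel noise

received = [c + e for c, e in zip(dct(pixels), noise)]  # noisy coefficients
decoded = idct(received)
pixel_error = [d - p for d, p in zip(decoded, pixels)]
```

Here `energy(pixel_error)` matches `energy(noise)` up to floating-point rounding, which is the graceful degradation that a bit-oriented codec does not give.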

Practical, Real-time Full Duplex Wireless, Mayank Jain, Jung Il Choi, Tae Min Kim, Dinesh Bharadia, Kannan Srinivasan, Philip Levis and Sachin Katti (Stanford University, USA); Prasun Sinha (Ohio State University, USA); and Siddharth Seth (Stanford University, USA)

This paper presents a full duplex radio design using signal inversion (based on a balanced/unbalanced (Balun) transformer) and adaptive cancellation. The state of the art in RF full-duplex solutions is based on techniques such as antenna cancellation, which have several limitations (e.g. manual tuning, channel dependence). This new design supports wideband and high-power systems without imposing any limitation on bandwidth or power. The authors also presented a full duplex medium access control (MAC) design, and they evaluated the system using a testbed of 5 prototype full duplex nodes. The results look promising so... now it's time to re-design the protocol stack!

Session 9. Theory

Understanding Stateful vs Stateless Communication Strategies for Ad hoc Networks, Victoria Manfredi and Mark Crovella (Boston University, USA); and Jim Kurose (University of Massachusetts Amherst, USA)

There are many communication strategies, depending on the network properties. This paper explores adaptive forwarding strategies that decide when, and what kind of, state a communication strategy should use, based on network unpredictability and network connectivity. Three network properties (connectivity, unpredictability, and resource contention) determine when state is useful. Data state is information about data packets; it is valuable when the network is not well connected, whilst control state is preferred when the network is well connected. Their analytic results (based on simulations on Haggle and DieselNet traces) show that routing is the right strategy for control state, DTN forwarding for data state (e.g. Haggle Cambridge traces), and packet forwarding for networks where both data and control state are useful simultaneously (e.g. Haggle Infocom traces).

Optimal Gateway Selection in Multi-domain Wireless Networks: A Potential Game Perspective, Yang Song, H. Y. Wong, and Kang-Won Lee (IBM Research, USA)

This paper tries to leverage a coalition of networks with multiple domains and heterogeneous groups. They consider a coalition network where multiple groups are interconnected via wireless links. Gateway nodes are designated by each domain to achieve network-wide interoperability. The challenge is minimising the intra-domain cost together with the total backbone cost. They use a game-theoretic approach to solve this problem and to analyse the equilibrium inefficiency. They consider that this solution can also be used in other applications such as power control, channel allocation, spectrum sharing or even content distribution.

Fundamental Relationship between Node Density and Delay in Wireless Ad Hoc Networks with Unreliable Links, Shizhen Zhao, Luoyi Fu, and Xinbing Wang (Shanghai JiaoTong University, China); and Qian Zhang (Hong Kong University of Science and Technology, China)

Maths, percolation theory ... quite complex to put into words

22 Sep 2011

Mobicom. Day 2

Posted by Kiran Rachuri

Day 2 of MobiCom 2011 started with my talk on SociableSense. Fourteen papers were presented over four sessions, including two best papers.

SESSION: Applications

SociableSense: Exploring the Trade-offs of Adaptive Sampling and Computation Offloading for Social Sensing, Kiran K. Rachuri, Cecilia Mascolo, Mirco Musolesi, and Peter J. Rentfrow (University of Cambridge, United Kingdom)

Our work. Details at:

http://www.syslog.cl.cam.ac.uk/2011/07/15/efficient-social-sensing-based-on-smart-phones/

Overlapping Communities in Dynamic Networks: Their Detection and how they can help Mobile Applications, Nam P. Nguyen, Thang N. Dinh, Sindhura Tokala, and My T. Thai (University of Florida, USA)

A better understanding of mobile networks in terms of overlapping communities, underlying structure, and organisation helps in developing efficient applications such as routing in MANETs, worm containment, and sensor reprogramming in WSNs. Detecting network communities is therefore important; however, the networks are large and dynamic, and their communities overlap. Can community detection be performed in a quick and efficient way?

They propose a two phase limited input dependent framework to address this. Phase 1: basic communities detection (basic communities are dense parts of the networks). Phase 2: update network communities when changes are introduced, i.e., handle: adding a node/edge, and removing a node/edge.  The evaluation is based on MIT reality mining data.  They evaluate the proposed scheme with respect to two applications: routing in MANETs and worm containment.

Detecting Driver Phone Use Leveraging Car Speakers, Jie Yang and Simon Sidhom (Stevens Institute of Technology, USA); Gayathri Chandrasekaran and Tam Vu (Rutgers University, USA); Hongbo Liu (Stevens Institute of Technology, USA); Nicolae Cecan (Rutgers University, USA); Yingying Chen (Stevens Institute of Technology, USA); Marco Gruteser and Richard P. Martin (Rutgers University, USA)

(Joint Best Paper Award)

80% of people talk on a cell phone while driving. The consequences of this can be dangerous (18% of accidents). They claim that hands-free devices do not help because of the cognitive load on the driver. Several mobile apps on the market try to solve this (ZoomSafer, iZUP, CellSafety). Recent measures:

- Hard blocking: jammers, blocking calls, etc.

- Soft interaction: delay calls, route to voice mail, automatic reply

Current apps that actively prevent cell phone use in vehicles only detect whether the phone is in a vehicle, through GPS, handover, signal strength, speedometer, etc. None of them can tell whether the phone is being used by the driver or a passenger. The authors use an acoustic ranging approach to solve this problem. They identify the position of the cell phone using the car speakers and the phone, with the speakers emitting different sounds at different times. The cell phone microphone covers a wider frequency range than human hearing, so the beeps are placed at frequencies outside the audible range. Evaluation shows that the accuracy of detection is over 90%.

I Am the Antenna: Accurate Outdoor AP Location Using Smartphones, Zengbin Zhang, Xia Zhou, Weile Zhang, Yuanyang Zhang, Gang Wang, Ben Y. Zhao, and Haitao Zheng (University of California at Santa Barbara, USA)

The density of APs in the environment is very high. How to find the location of an AP?  Conventional AP location methods:

- Directional antenna: Fast, very accurate but expensive

- Signal map: Simple but time consuming

- RSS gradient: Low measurement overhead but low accuracy

Their solution is based on the effect that the user's orientation relative to an AP has on RSS. The body of the user can affect the SNR (they observed a difference of around 13 dB). They also tested the generality of the effect with multiple phones, protocols, users, and environments, and the RSS profiles all followed the same trend.

Evaluation is on a campus, with three scenarios: (1) simple line of sight (no blocks), (2) complex line of sight (vehicles etc.), and (3) non line of sight (line of sight is completely blocked). Metric: absolute angular error, i.e. |detected direction - actual direction|. Results: error < 30 degrees for 80% of cases in simple LOS (line of sight); error < 65 degrees for 80% of cases in non-LOS.
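The angular error metric above is easy to get wrong near 0/360 degrees, so here is a small sketch. The wrap-around handling and the helper names are my own assumptions; the talk only gave the metric as the difference between detected and actual direction.

```python
def angular_error(detected_deg, actual_deg):
    """Absolute angular error in degrees, folded into the range [0, 180]."""
    diff = abs(detected_deg - actual_deg) % 360
    return min(diff, 360 - diff)


def fraction_within(errors, threshold_deg):
    """Share of measurements whose error is below the threshold."""
    return sum(e < threshold_deg for e in errors) / len(errors)
```

For example, a detected direction of 350 degrees against an actual direction of 10 degrees is an error of 20 degrees, not 340.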

SESSION: Cellular Networks

Traffic-Driven Power Saving in Operational 3G Networks,  Chunyi Peng, Suk-Bok Lee, Songwu Lu, and Haiyun Luo (University of California at Los Angeles, USA)

Transmission power of base stations (BSs) increases linearly with the traffic load. The cooling power stays constant and is comparable to the transmission power. As a result, a lot of energy is consumed even at zero traffic. Existing solutions do not address practical issues and rely on theoretical analysis. In this work, they propose a traffic-driven approach that exploits traffic dynamics to turn off under-utilised BSs for system-wide energy efficiency. They claim that traffic at a base station is quite predictable. There's a lot of potential to save energy in quiet hours but also in peak hours. Their solution also tries to be compatible with the current 3G standard and deployments. Issues addressed: (1) how to satisfy location-dependent coverage and capacity constraints; (2) how to estimate traffic load.

Solution: estimate the traffic envelope via profiling and leverage near-term stability. The set of BSs active in idle hours should be a subset of those active in peak hours, and each BS should be switched no more than once per day, because frequent on/off switching is undesirable: it takes several minutes. The active set must also provide location-dependent capacity. Their estimation is a moving average with 24 daily intervals. Switching should be based on traffic characteristics.
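As a rough illustration of the profiling idea, the sketch below averages the load per hour of day over the last few days and then activates base stations in a fixed priority order, which makes the idle-hour set a subset of the peak-hour set by construction. Everything here is an assumption for illustration (the window length, the names, the fixed ordering); in particular it ignores the location-dependent coverage constraint the authors handle.

```python
def hourly_profile(history, window_days=7):
    """history: list of days, each a list of 24 hourly loads.

    Returns the per-hour moving average over the last `window_days` days.
    """
    recent = history[-window_days:]
    return [sum(day[h] for day in recent) / len(recent) for h in range(24)]


def active_set(predicted_load, stations):
    """stations: list of (name, capacity) in a fixed priority order.

    Activates stations in that order until the capacity covers the load,
    always keeping at least one station on.
    """
    active, capacity = [], 0.0
    for name, cap in stations:
        if active and capacity >= predicted_load:
            break
        active.append(name)
        capacity += cap
    return active
```

Because stations are always taken from the front of the same list, a lower predicted load can only remove stations from the end of the active set, never swap one for another.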

MOTA: Engineering an Operator Agnostic Mobile Service, Supratim Deb, Kanthi Nagaraj, and Vikram Srinivasan (Bell Labs Research, India)

Cellular coverage varies with location. Users may not be happy with a single service provider, and there is a case for users choosing services from multiple providers. Dual-SIM phones are already popular in Asia. Users choose services based on the providers' prices. Goal of this work: give users the ability to join the network of their choice at will, based on location, pricing, and applications.

Solution: let the operator be changed from the user side. They consider several options. Option 1: a centralised approach makes the decisions, but operators are unlikely to share network planning information. Option 2: users rely on signal strength from different base stations, but this is insufficient and can result in poor user experience.

They propose MOTA, in which a service aggregator is introduced: a new intermediary between users and operators that is responsible for maintaining customer relationships and handles all control plane operations that cannot be handled by a single operator. They also use a utility function that incorporates fairness. Evaluation is based on data from one of the largest cellular operators in India.

Anonymization of Location Data Does Not Work: A Large-Scale Measurement Study, Hui Zang and Jean Bolot (Sprint Applied Research, USA)

Call Detail Records (CDRs) keep a lot of information about users' phone calls and can be linked to a location. They can be used for marketing, security, LBS, and mobility modelling; however, privacy might be breached if such data is released. The traditional approach to protecting users' privacy is anonymisation; however, this work shows that it does not work. A CDR contains: mobile id, time of call, call duration, start cell id, start sector id, end sector id, call direction, caller id. If mobile id and caller id are anonymised, can we still identify the user? It has been shown that with gender, zip code, and birth date, 87% of the US population can be identified.

Their dataset consists of more than 30 billion call records made by 25 million cell phone users across the USA. Their approach is to infer top N locations for each user and correlate this with publicly available information such as census data. They show that the top 1 location does not yield small anonymity sets, but top 2 and 3 locations do at the sector or cell-level granularity. They also provide possible solutions based on spatial and time domain approaches for publishing location data without compromising on privacy.
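The top-N idea can be sketched on toy data. This is a minimal illustration, not the authors' pipeline: the record format `(user_id, cell_id)`, the function names, and the choice to treat the top-N locations as a tuple ordered by frequency are all assumptions. The anonymity set of a user is the group of users sharing the same top-N locations; the smaller it is, the easier the user is to single out.

```python
from collections import Counter, defaultdict


def top_locations(records, n):
    """records: iterable of (user_id, cell_id) pairs.

    Returns {user: tuple of the user's n most frequent cells}.
    """
    counts = defaultdict(Counter)
    for user, cell in records:
        counts[user][cell] += 1
    return {
        user: tuple(cell for cell, _ in counter.most_common(n))
        for user, counter in counts.items()
    }


def anonymity_set_sizes(records, n):
    """For each user, the number of users sharing the same top-n tuple."""
    tops = top_locations(records, n)
    group_size = Counter(tops.values())
    return {user: group_size[top] for user, top in tops.items()}


# Invented toy records: three users with the same home cell, two workplaces.
records = (
    [("a", "H1")] * 3 + [("a", "W1")] * 2
    + [("b", "H1")] * 3 + [("b", "W2")] * 2
    + [("c", "H1")] * 3 + [("c", "W1")] * 2
)
```

On this toy data the top 1 location leaves all three users indistinguishable, while the top 2 locations already isolate user `b`, which mirrors the trend reported above.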

SESSION: Infrastructureless Networking.

Enhance & Explore: An Adaptive Algorithm to Maximize the Utility of Wireless Networks, Adel Aziz and Julien Herzen (École Polytechnique Fédérale de Lausanne, Switzerland); Ruben Merz (Deutsche Telekom Laboratories, Germany); Seva Shneer (Heriot-Watt University, UK); and Patrick Thiran (École Polytechnique Fédérale de Lausanne, Switzerland)

This work addresses the problem of providing efficiency and fairness in wireless networks. Their approach is based on maximising a utility function. They propose an algorithm called Enhance and Explore that maximises the utility function. The challenges in designing this scheme are: working on top of an existing MAC, avoiding network-wide message passing, and the fact that wireless capacity is unknown a priori.

They consider two scenarios: a WLAN setting (an inter-flow problem: optimally allocate resources) and a multi-hop setting (an intra-flow problem: avoid congestion). They show analytically that the proposed algorithm converges to a point of optimal utility. Evaluation is through experiments in a testbed and simulations in ns-3.

Scoop: Decentralized and Opportunistic Multicasting of Information Streams, Dinan Gunawardena, Thomas Karagiannis, and Alexandre Proutiere (Microsoft Research Europe, UK); Elizeu Santos-Neto (University of British Columbia, Canada); and Milan Vojnovic (Microsoft Research Europe, UK)

This work aims at leveraging mobility for content delivery in networks of devices experiencing intermittent connectivity. Main challenge: routing/relaying strategies. Existing solutions include epidemic routing. Drawbacks of existing work: simplifying assumptions on mobility, and the assumption that inter-contact times are exponentially distributed. This work proposes SCOOP, which:

  • maximizes some global system objective
  • accounts for storage and transmission costs
  • multi-point to multi-point communications
  • decentralized
  • model-free (allows general node mobility)

There is a need for a system that does not depend on a mobility model. They used classic traces: UCSD, Infocom, DieselNet and SF Taxis. They show that two hops are enough to reach a large percentage of nodes. They also show that the delays in paths between a source and a destination are positively correlated. They aim to identify the strategy that optimally exploits mobility, relays, and buffer constraints. However, this is a hard problem. They use a sub-gradient algorithm to solve it efficiently. Evaluation is through numerical experiments. They compared SCOOP with R-OPT, an idealized version of the RAPID algorithm (it assumes full global knowledge). Performance with respect to delivery ratio is very close to R-OPT.

R3: Robust Replication Routing in Wireless Networks with Diverse Connectivity, Xiaozheng Tie and Arun Venkataramani (University of Massachusetts Amherst, USA); and Aruna Balasubramanian (University of Washington)

Wireless routing protocols are designed for specific target environments, such as well-connected meshes or intermittently connected MANETs. The problem with this is that routing protocols are fragile and perform poorly outside their target environment. Wireless networks exhibit spatio-temporal diversity; therefore, a compartmentalized design is not efficient. Can we design a protocol that ensures robust performance across networks?

They propose to use replication routing. They present a model to quantify replication gain. Replication gain depends on the path delay distributions and not just on the expected value. They study the average replication gain with respect to the number of paths using DieselNet-DTN and Haggle traces. They propose R3: a link state protocol that selects replication paths using the proposed model. The scheme also adapts the replication to load.
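Why the distribution matters and not just the mean can be seen with a quick Monte Carlo sketch. This is not the model from the paper: it assumes independent paths, defines the gain as the mean single-path delay divided by the mean delay of the fastest copy, and uses invented delay distributions.

```python
import random


def replication_gain(sample_delay, paths=2, trials=20000, seed=7):
    """Mean single-path delay divided by the mean delay of the fastest copy.

    sample_delay(rng) draws one path delay; paths are assumed independent.
    """
    rng = random.Random(seed)
    single = 0.0
    replicated = 0.0
    for _ in range(trials):
        delays = [sample_delay(rng) for _ in range(paths)]
        single += delays[0]
        replicated += min(delays)
    return single / replicated


def constant_delay(rng):
    """Every path always takes 10 time units."""
    return 10.0


def exponential_delay(rng):
    """Same mean of 10 time units, but highly variable."""
    return rng.expovariate(1 / 10)
```

Both distributions have the same expected delay, yet `replication_gain(constant_delay)` is 1 (a second copy is pure overhead) while `replication_gain(exponential_delay)` comes out close to 2 in this model.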

Evaluation is on both a DieselNet DTN testbed and a mesh testbed. Simulation validation is also performed using the DieselNet deployment. R3 is compared with several protocols. Simulation based on the Haggle trace shows that R3 reduces delay by up to 60% and increases goodput by up to 30% over SWITCH. Simulations on DieselNet-Hybrid show that R3 improves median delay compared to SWITCH by 2.1x.

Flooding-Resilient Broadcast Authentication for VANETs, Hsu-Chun Hsiao, Ahren Studer, Chen Chen, and Adrian Perrig (Carnegie Mellon University, USA); and Fan Bai, Bhargav Bellur, and Aravind Iyer (General Motors Research)

Each vehicle possesses an On Board Unit (OBU) and broadcasts information for safety and convenience. This information has to be secured. The IEEE 1609.2 standard suggests using ECDSA signatures for these messages; however, verification is expensive, taking around 22 ms, which is a problem if many messages arrive in a short time. Can we reduce this verification delay? Core idea of this work: entropy-aware authentication.

They propose two methods: (1) FastAuth exploits the predictability of future messages. It uses hashes instead of ECDSA to verify location updates. The result is 1 µs instead of 22,000 µs in the ideal case. (2) SelAuth performs selective verification before forwarding. They also reduce the communication overhead. Evaluation is based on real vehicle traces (4 traces), each generated by driving a car along a 2 mile path for 2 hours. Results show that signature generation is 20x faster and verification is 50x faster compared to ECDSA.

SESSION: Protocols.

E-MiLi: Energy-Minimizing Idle Listening in Wireless Networks, Xinyu Zhang and Kang G. Shin (University of Michigan-Ann Arbor, USA)

(Joint Best Paper Award)

Wi-Fi is a popular means of wireless Internet connection. However, Wi-Fi is a main energy consumer in mobile devices, 14x higher than GSM on a phone. This is due to the cost of idle listening; moreover, idle listening power is comparable to TX/RX power. Existing solutions are variants of PSM, but is this good enough? No, because of carrier sensing time. To overcome this, they propose E-MiLi, which reduces the power consumption of idle listening by down-clocking the radio in idle listening mode. Down-clocking by 1/4 saves 47.5% of the power. The key challenge is how to decode a packet, given that the receiver's sampling rate should be no less than the sender's clock rate. The solution proposed is to separate detection from decoding. They add a preamble to the 802.11 packet that can be detected at low clock rates.

One issue with this is false triggering: packets intended for one client may trigger all other clients, which wastes energy. The second problem is the energy overhead caused by long preambles. The solution is minimum-cost address sharing, which allows multiple nodes to be assigned the same address; addresses are allocated according to channel usage. There's also a delay caused by clock-rate switching. To reduce this they use opportunistic downclocking. Evaluation covers packet detection (software-radio-based experiments), energy consumption (using Wi-Fi traces), and simulations using ns-2. Results: when SNR is above 8 dB, the miss-detection probability is almost zero. They achieved close to 40% energy saving.

Refactoring Content Overhearing to Improve Wireless Performance, Shan-Hsiang Shen, Aaron Gember, Ashok Anand, and Aditya Akella (University of Wisconsin-Madison, USA)

The main aim is to improve wireless performance by leveraging overheard packets. Several techniques are currently available, but none of them leverages duplicate data. This work takes a content-based overhearing approach and suppresses duplicate data transmission. Ditto was the first work to use a content-based overhearing approach, but it works at the granularity of objects and does not remove sub-packet redundancy. Moreover, it only works for some applications. This work presents REfactor content overhearing:

(1) This scheme puts content overhearing at the network layer, which results in savings across applications. The transport-layer approach (used in Ditto) ties data to an application or object chunk and also requires payload reassembly. The network-layer approach reduces redundancy across all flows.

(2) This scheme identifies sub-packet redundancy, which saves transmission time. Ditto only works on 8-32 KB object chunks, whereas the proposed scheme operates at a finer granularity. This yields savings from redundancy as small as 64 bytes, and makes it possible to benefit from overhearing even a single packet.
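To make the sub-packet idea in the two points above concrete, here is a toy redundancy-elimination sketch. It is not REfactor's actual design: it assumes fixed 64-byte chunks, SHA-1 prefixes as fingerprints, and that the sender's and receiver's caches stay perfectly in sync, whereas a real system has to estimate what each client actually overheard.

```python
import hashlib

CHUNK = 64  # bytes; the smallest redundancy mentioned in the notes


def fingerprint(chunk):
    """Short fingerprint of a chunk (an assumed 8-byte SHA-1 prefix)."""
    return hashlib.sha1(chunk).digest()[:8]


def encode(payload, cache):
    """Replace chunks already present in `cache` with their fingerprints."""
    tokens = []
    for i in range(0, len(payload), CHUNK):
        chunk = payload[i:i + CHUNK]
        key = fingerprint(chunk)
        if key in cache:
            tokens.append(("ref", key))
        else:
            cache[key] = chunk
            tokens.append(("raw", chunk))
    return tokens


def decode(tokens, cache):
    """Rebuild the payload, learning new chunks along the way."""
    parts = []
    for kind, value in tokens:
        if kind == "raw":
            cache[fingerprint(value)] = value
            parts.append(value)
        else:
            parts.append(cache[value])
    return b"".join(parts)
```

A second packet that repeats a 64-byte chunk from an earlier one is then sent as an 8-byte reference plus whatever is new.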

Evaluation through testbed experiments shows a 6 to 20% improvement in goodput. Simulation results also show a 20% improvement in goodput.

Distributed Spectrum Management and Relay Selection in Interference-Limited Cooperative Wireless Networks, Zhangyu Guan (Shandong University, P. R. China); Tommaso Melodia (State University of New York at Buffalo, USA); Dongfeng Yuan (Shandong University, P. R. China); and Dimitris A. Pados (State University of New York at Buffalo, USA)

Emerging multimedia services require high data rates. This work aims to maximize the capacity of wireless networks by leveraging frequency and spatial diversity. Frequency: dynamic spectrum access improves spectral efficiency. Spatial: cooperative communication enhances link connectivity. Problem: maximize the sum utility (capacity, log-capacity) of multiple concurrent traffic sessions by jointly optimizing relay selection (whether to cooperate or not) and direct transmission. The problem is formulated as a mixed-integer non-convex problem, which is NP-hard. They propose a solution based on branch and bound that is able to find a globally optimal solution. A polynomial-time solution is not guaranteed, but in practice it works well. Evaluation is based on simulations. Results show that the proposed schemes converge very fast. The centralized algorithm achieves at least 95% of the global optimum, and the distributed schemes are very close to optimal.

 

10 Sep 2011

The San Diego Trip: An Overview of this year’s SIGKDD Conference

Posted by an346

This year's SIGKDD conference returned after 12 years to San Diego, California to host the meeting of Data Mining and Knowledge Discovery experts from around the world. The elite of heavy-weight data scientists was hosted at the largest hotel on the West Coast and, together with industry experts and government technologists, numbered more than 1100 attendees, a record in the conference's history.

The gathering kicked off with tutorials, with two classics running in parallel: David Blei's topic models and Jure Leskovec's extensive work on social media analytics. Blei offered a refreshing talk that stretched from the very basics of text-based learning to the most up-to-date extensions of his work, with applications in streaming data and the online version of the paradigm, which allows one to scale the model up to huge datasets and so satisfy the requirements of modern data analysis. Leskovec elaborated on a large spectrum of his past work, covering a wide range of topics including the temporal dynamics of news articles, sentiment polarisation analysis in social networks, and information diffusion in graphs by modelling the influence of participating nodes. The first day's menu on the social front was completed with Lada Adamic's presentation on the relationship between structure and content in social networks. Her talk at the Mining and Learning with Graphs Workshop provided an empirical analysis of a variety of online domains, describing how the flow of novel content in those systems was indicative of variations in the patterns of interaction amongst individuals. The day closed with the conference's plenary open session, which featured submission and reviewing highlights and the usual KDD award ceremonies: the latter honoured the decision trees man, Ross Quinlan, who presented a historical overview of his work, and a data mining legion of 25 students from NTU that won this year's KDD Cup on music recommendations.

After the second night of sleep and repeated jetlag-induced wake-ups, Monday rolled in and the conference opened with sessions on user classification and web user modelling. A follow-up in the afternoon, the presentation of the (student) award-winning work on the application of topic models to scientific article recommendation, attracted the interest of many. The conference's dedicated session on online social networks also signalled the Data Mining community's interest in this currently hot domain. It opened with an interesting work on predicting semantic annotations in location-based social networks, in particular the prediction of missing labels for venues that lacked user-generated semantic information. While the machine learning part of the work was sound, its applicability as a real problem was doubted, suggesting the need to identify the essential challenges in a relatively new application area. Nonetheless, the keyword of the day was scalability: two talks focused on an ever-classic machine learning problem, clustering, introduced in the context of the trendy MapReduce model. Alina Ene from the University of Illinois introduced the basics, whereas the Brazilian Robson Cordeiro offered novel insights with a cutting-edge algorithm for clustering huge graphs. The work, driven by the guru Christos Faloutsos, combined the elegance of simplicity with the virtues of effectiveness, showing that for some, size does not matter and petabytes of data can be crunched in minutes. A poster session brought down the curtain on another day. The crowd was not discouraged by the conference organisers' only-one-free-drink offer, and a vibrant set of interactions took place. Some were discussing techniques, some were looking for new datasets, while social cliques were also forming in the corners of the hotel's huge Douglas Pavilion.

Day 3 drove the conference participants to the dark technical depths of the well-established topic of matrix factorisation, which was followed by the user modelling session. Yahoo!'s Bee-Chung Chen gave an intriguing presentation on user reputation in a comment rating environment, followed by the lucid talk of Panayiotis Tsaparas on the selection of a useful subset of reviews for Amazon products that were plagued by tons of reviews. The Boston-based Greek gang of Microsoft Research also showed how Mechanical Turk can be used to assess the effectiveness of review selection in such systems. Poster session number 2 closed the day, and the group's work on link prediction in location-based social networks was up. The three-hour, exhausting but fruitful interaction with location-based enthusiasts, agnostics and doubters was a good opportunity to get the vibe of the community on an up-and-coming hot topic. For application developers and online service providers, the work was an excellent example of how location-based data could be used to drive personalised and geo-temporally aware content to users. For data mining geeks, it presents an unexplored territory where existing techniques could be tested and novel ones devised. At the end of the poster session many of the participants headed out for a taste of downtown San Diego, whereas the relaxing boat trips on the local gulf were also highly popular.

The final day of the conference was marked by Kaggle's visionary entrepreneur Jeremy Howard and a panel of experts in data mining competitions. The panel aimed to analyse the problems that arose during previous competitions and the lessons learned for creating new, successful ones. Howard presented radical views, suggesting that the future of data mining and problem solving would be delivered in the form of competitions. Not only could competitions attract an army of approximately 10 million data analysts around the globe, but their design could promise a sustainable economic model that would bring money to all participants (even non-winners) and would perhaps put a respectable number of PhD careers at stake. His philosophy was driven by the idea that to solve challenging problems effectively, you need to awaken the diverse pool of minds that is out there and can constitute an infinite source of innovation.

But KDD attracted the interest not only of scientists and corporate experts, but also of politicians. Ahead of the 2012 elections, the Obama data mining team is here and hiring! Rayid Ghani, chief scientist at Obama for America, highlighted the important role of predictive analytics and optimisation problems in the battle for an electorate that traditionally decides winners by only small margins. It remains to be seen whether science will beat Tea Party style propaganda and maximise positive votes in a bumpy and complex socio-political landscape. The political world was also (quietly) represented by government data scientists and secret service analysts who were seeking to catch up with the state of the art in data mining and knowledge discovery, a vital survival requirement in a world overflowing with data and subsequent leaks...

The full proceedings of KDD 2011 can be found here.

15Jul/110

Socio-spatial properties of online social networks

Posted by Salvatore Scellato

Some social scientists have suggested that the advent of fast long-distance travel and cheap online communication tools might have caused the "death of distance": as described by Frances Cairncross, the world appears to be shrinking as individuals connect and interact with each other regardless of the geographic distances which separate them. Unfortunately, the lack of reliable geographic data about large-scale social networks has hampered research on this specific problem.

However, the recent growing popularity of location-based services such as Foursquare and Gowalla has unlocked large-scale access to where people live and who their friends are, making it possible to understand how distance and friendship ties relate to each other.

In a recent paper, which will appear at the upcoming ICWSM 2011 conference, we study the socio-spatial properties arising between users of three large-scale online location-based social networks. We discuss how distance still matters: individuals are much more likely to create social ties with people living nearby than with people further away, even though strong heterogeneities still appear across different users.

15Jul/110

Efficient Social Sensing based on Smart Phones

Posted by Kiran Rachuri

Mobile smart phones represent a perfect platform for building systems that capture the behaviour of users in workplaces, as they are ubiquitous, unobtrusive, and sensor-rich devices. However, there are many challenges in building such systems. First, mobile phones are battery powered, and the energy consumption of sensor sampling, data transmission, and resource-intensive local computation is high. Second, mobile phone sensors are inaccurate and not specifically designed for capturing user behaviour. Finally, local and cloud resources should be used efficiently, taking into account the changing resources of the mobile phone.

We address the above technical challenges for supporting social sensing applications in a paper to be presented at the upcoming ACM MobiCom '11 conference.

In the paper we describe the design, implementation, and evaluation of SociableSense, an efficient and adaptive platform based on off-the-shelf mobile phones that supports social applications aiming to provide real-time feedback to users or collect data about their behaviour.

The key components of the system are:

- A sensor sampling component adaptively controls the sampling rate of accelerometer, Bluetooth, and microphone sensors while balancing energy-accuracy-latency trade-offs based on reinforcement learning mechanisms. The learning mechanism adjusts the sampling rate of the sensors based on the context of the user in terms of events observed (interesting or not), i.e., the sensors are sampled at a high rate when there are interesting events observed and at a low rate when there are no events of interest.

- A computation distribution component based on multi-criteria decision theory dynamically decides where to perform computation of tasks by considering the importance given to each of the dimensions: energy consumption, latency, and data sent over the network.  For each classification task that needs to be processed, this scheme evaluates a utility function to decide on how to effectively distribute the subtasks of the classification between the local and the cloud resources.
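The adaptive sampling idea in the first component can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: the update rule below is a simple reward/penalty scheme in the spirit of learning automata, and the class name `AdaptiveSampler`, the learning rate `alpha`, and the probability bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSampler:
    """Hypothetical sampler: tracks the probability of sampling a sensor."""
    probability: float = 0.5      # assumed starting probability of sampling
    alpha: float = 0.2            # assumed learning rate
    min_probability: float = 0.1  # never stop sampling entirely
    max_probability: float = 1.0

    def update(self, interesting: bool) -> float:
        """Raise the sampling probability after an interesting event,
        lower it after an uninteresting one, then clamp to the bounds."""
        if interesting:
            self.probability += self.alpha * (1.0 - self.probability)
        else:
            self.probability -= self.alpha * self.probability
        self.probability = min(self.max_probability,
                               max(self.min_probability, self.probability))
        return self.probability


def run(events, sampler=None):
    """Feed a sequence of observations (True = interesting) to a sampler
    and return the sampling probability after each one."""
    if sampler is None:
        sampler = AdaptiveSampler()
    return [sampler.update(event) for event in events]


if __name__ == "__main__":
    # A burst of interesting events followed by a quiet period.
    for p in run([True, True, True, False, False, False]):
        print(f"{p:.3f}")
```

In this sketch the lower bound keeps the sensor from going silent, so the sampler can still notice when interesting events resume after a long quiet period.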

We show through several micro-benchmark tests that the adaptive sampling scheme adjusts the sampling rate of sensors dynamically based on the user's context and balances energy-accuracy-latency trade-offs. We also evaluate the computation distribution scheme in terms of selecting the best configuration given the importance assigned to each performance dimension, and show that the computation distribution scheme efficiently utilises the local and the cloud resources and balances energy-latency-traffic trade-offs by considering the requirements of the experiment designers.
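To make the computation distribution decision more concrete, here is a minimal Python sketch of a weighted multi-criteria choice between configurations. It is an illustration under stated assumptions, not the utility function used in SociableSense: the function `best_configuration`, the normalisation by the worst observed value, and the example cost figures are all hypothetical.

```python
CRITERIA = ("energy", "latency", "data")


def best_configuration(configs, weights):
    """Return the name of the configuration with the lowest weighted cost.

    configs: maps a configuration name to its estimated cost per criterion.
    weights: maps each criterion to the importance the designer assigns it.
    Each criterion is normalised by its worst (largest) value across the
    configurations, so that criteria in different units are comparable.
    """
    worst = {c: max(cfg[c] for cfg in configs.values()) for c in CRITERIA}

    def cost(cfg):
        return sum(
            weights[c] * (cfg[c] / worst[c] if worst[c] else 0.0)
            for c in CRITERIA
        )

    return min(configs, key=lambda name: cost(configs[name]))


# Hypothetical cost estimates for one classification task (arbitrary units).
EXAMPLE_CONFIGS = {
    "local":  {"energy": 8.0, "latency": 4.0, "data": 0.0},
    "remote": {"energy": 3.0, "latency": 6.0, "data": 10.0},
    "split":  {"energy": 5.0, "latency": 5.0, "data": 4.0},
}

if __name__ == "__main__":
    energy_first = {"energy": 1.0, "latency": 0.0, "data": 0.0}
    data_first = {"energy": 0.0, "latency": 0.0, "data": 1.0}
    print(best_configuration(EXAMPLE_CONFIGS, energy_first))
    print(best_configuration(EXAMPLE_CONFIGS, data_first))
```

Changing the weights changes the selected configuration, which mirrors how the importance assigned to each dimension by the experiment designers steers where the computation runs.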

To further demonstrate the effectiveness of the SociableSense platform, we also conduct a social experiment using an application that determines the sociability of users based on colocation and interaction patterns. The use of the computation distribution scheme leads to approximately 28% more battery life, 6% less latency per task, and 3% less data transmitted over the network per task compared to the model where all the classification tasks are computed remotely.

Kiran K. Rachuri, Cecilia Mascolo, Mirco Musolesi, Peter J. Rentfrow.  SociableSense: Exploring the Trade-offs of Adaptive Sampling and Computation Offloading for Social Sensing. In Proceedings of the 17th ACM International Conference on Mobile Computing and Networking (MobiCom '11), Las Vegas, USA. [PDF]