Filed under Location, Mobile, Networks, Social, Workshop.

I’m at the Open Data Institute with Richard Mortier, Jon Crowcroft, Amir Chaudhry and Hamed Haddadi, live-blogging a day-long workshop about our emerging Human-Data Interaction research initiative.  The room is packed with notable researchers from all over the UK, so this promises to be an exciting day!

Mort opens with an introduction to the format of the workshop and lays out the problem space:

  • Visualisation and sense-making: how do we make sense of such complex, technical systems, and crucially, what is the process of translating that information to users?
  • Transparency and audit: essential to give users a feedback loop
  • Privacy and control
  • Analytics and commerce: there is clearly an ecosystem forming around personal data (cf. the several startups specifically around this)
  • Data to knowledge: what are the new business models around turning data into knowledge?

Attendee introductions

Danielle (I missed her last name; she is from Switzerland): did a PhD on anomaly detection and then worked at Swiss banks on security monitoring.  The key challenges in the area are around the nature of the questions we need to ask of all the data that we have access to.

Steve Brewer: coordinator of the IT as a Utility network, which is funding all this.  Interested in the next steps: both the immediate ones and the overall vision and direction it’s all going in.  Concrete actions emphasised.

Ian Brown: Oxford University and cybersecurity.  Interested broadly in these issues and also has a degree in psychology and behavioural psychology.  The challenge is how to balance the “engineering” (this isn’t fundamental computer science) and understand why there is so little take-up of this.

Amir Chaudhry: working on a toolstack for the Internet of Things.  When we have tweeting toasters, how do we make it all useful to people?  It’s not the raw data, but the insights we present back to people, that make it useful.

Elizabeth Churchill: was at Yahoo, now at eBay Research.  Right now the challenge she’s facing is personalisation, and she is trying to understand why the models they have of people are rubbish (this is why so much advertising isn’t very useful).  The reasoning processes and data quality are important.  We are making assertions about people based on data that was gathered for another purpose, and the ethical and business issues are important.  Has been pushing on: what is the data source, what is the data quality, are these assertions based on data that’s appropriate, and how can we design new algorithms?

Jon Crowcroft: from the Computer Lab in Cambridge, does anything to do with communication systems, and leads the Horizon project with Mort and PAWS, which is working on wifi access in disadvantaged areas.  Also working on Nymote.  Is interested in new human/data rights vs t&cs.  Rights and duties are encoded in terms and conditions (badly); see Lanier’s latest book about this, and see how poor Amazon’s recommendations are.  We’re interested in building a technology where you own your own data but business practices all gel together.  We had a workshop on how the EU is pushing the right to be forgotten, so how can we ensure that data can be removed, including all references?  People say “it’s too difficult”, but this isn’t true: takedowns work, so why is it that only big corporations can afford to take stuff down?  There is also the right to oblivion (“not be there in the first place”): data shouldn’t be a public good, and people should have the right to be in paid avoidance zones.  (See Shockwave Rider by Brunner, 30 years old, loads of great technical ideas; Future Shock is a good read too.)  Can we shift regulatory positions accordingly?

Tomas Diez: runs the Fab Lab in Barcelona.  Cofounder of Smart Citizen, a crowdfunded environmental-sensing platform built on Arduino-based hardware.  It allows people to capture data in their own homes and push it to an online platform, in a ‘conscious’ way.  Go to smartcitizen.me to see the data platform.  Intended to grow into other areas, but currently capturing air pollution, humidity, temperature and other environmental data.  The main thing is that you own your own data.  Much cheaper than the competition too, and crowdfunded.

Martin Dittus (@dekstop): a PhD student at UCL who works on cities.io.  Thousands of people are mapping the planet in incredibly detailed ways, and it works with both commercial and non-commercial efforts.  What makes these systems work, what are the processes to coordinate things, and what are the questions of data quality and how to assert things over it?  Used to be part of the last.fm data team, which is about personal data gathering and the detailed profiling that users themselves put up.

Christos Efstratiou: as of September is a lecturer at the University of Kent (and is still a visiting researcher at Cambridge).  Works on sensing; more broadly, anything to do with people and embedded sensors in the environment.  Privacy is a huge issue and is his key challenge.  This isn’t sensing in the old style like Active Badge: back then, people weren’t aware of the issues, whereas nowadays people are more aware of privacy.  So the challenge is evolving user perceptions and how our system design responds.  Anecdote: at a recent wedding he attended, there was a request from the bride/groom to not post any public photos to Twitter/Facebook.  We’ve lost control over our public personas.

Julie Freeman: an actual resident in this building and the art associate for the ODI space!  Is a PhD student at QMW and echoes many of the interests already mentioned.  Interested in physical manifestations of digital data, and how we can “step away from the screen”.  Broad interests in the transformation of digital data.

Barbara Grimpe: from Oxford, a sociologist in an interdisciplinary EU project on “Governance for Responsible Innovation”.  They have a network notion of responsibility, which takes into account that a lot of data production and data use takes place in networks that possess strong ties between people.  She has started case studies in two areas.  The first is telecare technologies for elderly people (relevant due to the EU Horizon 2020 societal challenge in ageing and wellbeing arising from demographic change in western societies), which brings ethical issues around the use of personal data at scale; there is also the difficulty of informed consent due to dementia.  The other case study is completely different: financial market technology, where the data is highly commercially sensitive, so the question is how transparency can be balanced against financial control and the need for genuine market secrets to facilitate trade.

Wifak Gueddana: a postdoc in the information systems group at LSE.  Did her PhD on open source communities and how open source can work for NGOs and grassroots groups.  Working on a research project on online platforms: how to use computational analytics to collect and process data while dealing with the qualitative and subjective issues.

Hamed Haddadi: lecturer at QMUL, asking how we can compute over mobile data; has worked on advertising and privacy-aware systems.  Linking the temporal nature of the data and understanding how much of your data is exposed (amusing anecdote about wearing shorts).

Muki Haklay: professor of GIScience and leads the Extreme Citizen Science group.  He’s interested in participatory citizen science and cyberscience.  Interested in GeoWeb and mobile GeoWeb technologies and understanding how tools for voluntary participation work (OpenStreetMap, participatory geoweb).  “How can you give control to people that are not technical?”, and how do we build protocols for cultural sensitivity?  On OpenStreetMap: while it’s easy for techies to contribute and feel happy, it might be an issue from a privacy perspective without GPS fuzzing or pseudonyms (you can say “I know where you live” to every OSM user).  (Discussion about most data being rubbish, and the psychology question of whether this depresses people!)

Jenny Harding: from the Ordnance Survey, who control most of the UK’s mapping data; she is a principal scientist in the research team working on how people interact with geography in business, public service and leisure.  Moving beyond just GPS into the connective function of what’s going on in places, and the connections between different objects and places.  What is the purpose of needing all this connected information, and how can the Ordnance Survey better serve its users with usable connected data?  She commissions internal and external research, including PhDs and postdocs.  Challenge for this workshop: how personal data relates to location, and the different levels of granularity at which such relationships can be made.  Beyond GPS, there is address data in (e.g.) supermarket loyalty cards and other data at a postcode level, and different types of geographies all have connections.  Understanding the provenance of data is really important.

Pat Healey: Professor of Human Interaction, head of the cognitive science research group, and works on health issues.  Not here yet, so introduced by Hamed.

Tristan Henderson: lecturer in compsci at St Andrews in Scotland; did his PhD on first-person shooter games and runs a widely used mobile data archive called CRAWDAD.  They archive and share datasets, and so work a lot on the redistribution of data.  His undergrad was economics, so his interest is in behavioural aspects and usability issues too (Tristan has recently joined the HCI group and the ethics committee at St Andrews).  How can we get researchers to engage further with the ethics process and refine the notions of informed consent in electronic terms?  Challenges: what is acceptable HDI, and are we conducting it in an acceptable way (q from Mort: how broad? a: everything)?  And is HDI unique enough that we might need another term?

Laura James: from OKF (not here yet)

Helene Lambrix: visiting LSE, usually at Paris-Dauphine University in France, and hoping to finish her PhD this year!  Interested in corporate reputation and social media.

Neal Lathia: researcher at Cambridge University; did his PhD on online recommender systems.  Noticed a disparity between data services online and the offline world, so started working on recommender systems for urban systems (banging his head against TfL data).  At Cambridge, started working with psychologists and leads EmotionSense (how satisfied are you with your life, combined with ongoing smartphone sensor data); it’s really popular.  Challenges: the language issue around how we present issues and motivate people around using that data (how does using EmotionSense affect their behaviour?).

Panos from Brunel: interested in cybersecurity and intelligence from media data mining and cloud-based work.  The commodification process of digital economy data, and the legal framework surrounding it.  What are the personas for data, to apply frameworks and data mining techniques to it?  Challenges: the regulatory system for using big personal data.

Eva from the University of Nottingham: has just submitted her PhD and is awaiting her viva.  Background is political science and international relations; interested in informed consent and how we sustain consent rather than just secure it as a one-off.  The challenges: the human bit, how we communicate the complexity of systems, and how people can make meaningful decisions.  If we want people to be engaged with the process then we need data to be more social and human.

Ursula Martin: professor at a Russell Group university on Mile End Road.  Is a mathematician researching the production of mathematics, and how it happens in the first place.  Producing maths is a slow, painstaking thing, and she wonders how the rate of production can keep up with our needs.  When interacting with an outfit gathering her data, she’s not just an isolated individual, but is actually part of a large complex system.

Richard Mortier “mort”: from Nottingham and is the dude running this workshop, so has introduced himself already!

Nora Ni Loideain: doing a PhD on European data protection law that requires the mandatory retention of data by telecoms providers.  References recent US events (cf. Snowden); her PhD is on privacy oversight and the safeguards (or lack thereof) in current frameworks.  Challenges: how can we build these safeguards, and what is the nature of informed consent under them?

Erinma Ochu: based in Manchester.  Sometimes an activist, sometimes a writer, sometimes an artist.  Background in neuroscience!  Interested in the social life of data, what happens around it, and how people get around based on it.

Yvonne Rogers: from UCL, director of UCLIC (the Interaction Centre) and also of an Intel-funded institute at UCL where they work on connected data.  Given lots of data from many sources, she is interested in how people can engage with it and how to visualise it.  (Hamed: “Human City Interaction is the next thing!”)

Geetanjali Sampemane: background in Computer Science, works at Google in the Infrastructure Security and Privacy group.  The challenge is how to help people calibrate the benefits and risks to make appropriate tradeoffs.  How can humans make an informed choice, when current choices aren’t really informed?  Giving people buttons and options isn’t the most useful way to approach this; we need a mental model similar to how we judge risks in the physical world.  In the online world, no one understands how dangerous it is to reuse passwords.  Security people have made it a little worse by telling people to use complicated passwords, but brute force isn’t the big problem right now, it’s the connectivity of services.

Cerys Willoughby: Southampton, looking at the usability of notebooks and wondering how to make the interfaces usable without being a technological guru.  (Missed the rest as I was reading the cool comic she projected about her work. Sorry!)

Eiko Yoneki: from Cambridge; she works on digital epidemiology and real-world mobility data collection in Africa (e.g. FluPhone).  She analyses network structure to understand graphs and connectivity.  Also works on CDNs, builds self-adaptive CDNs, and works on graph-specific data-parallel algorithms.

Jonathan Cave: game theorist (Yale, Cambridge, Stanford) who has worked in a lot of government/academia/research areas.  Works on economics and the IoT (the festival of things and the boundaries of humanity is coming up soon in Cambridge on 29th October).  Fascinating anecdote about the price of sick animals.

George Danezis: formerly MSR, now UCL; interested in the technical security aspects of how to glue together distributed mobile devices without leaving our personal data readable.  Has done work on privacy-friendly statistics and how we can process and analyse data as an aggregate set, including in the context of smart metering in Europe.

Breakout sessions

We then had breakout sessions to brainstorm the challenges in this area (lots of post-it notes and arguments), and Amir Chaudhry has summarised the results of the 5 presentations here:

Disambiguating data, for example from the home versus organisations.  This isn’t a new problem, but it becomes more important the more data collection occurs using different sources.  Who are the principals in terms of ownership and provenance?  How do we deal with communities/groups and data control?

Why not try crowdsourcing mechanisms for people to use, so that they can improve the use of sites like Ryanair (who deliberately obfuscate things)?  Changing mindset from a consumer perspective to a producer perspective: i.e. humans make data and perhaps can provide it to others for economic benefit.

We have data and different notions of data quality.  It’s not always the case that the most scientific data is the best; it depends on how it’s used.  We have collectivist notions of data culture: e.g. this conversation right now in the room isn’t just individual, it’s all of us, so we can’t ascribe it to individuals.  Then there are network notions, based on transactions that use a reductionist view to decide what they’re worth.  We can think of these on a continuum, and the research challenge is how well the different ends of that continuum scale in producing data and making value.  An interesting question: if people opt out (right to be forgotten or right to oblivion), then the data that is left is flawed.

We need to examine current assumptions and values around data.  What is the unit of analysis?  We must be clearer on this.  Where and when we look at what data also matters, as well as global aggregation.  Sometimes we also want to look at trajectories and data flows and how these change over time.  How do we interact with this data?  Do users interact with data directly or with something that sits in front of it?  There’s an interaction between data science and the creative process of presenting information in a certain way.  This depends on the ultimate goal being served, for example behavioural change or just increased engagement.

There are big challenges in the integration of groups who want to construct humans: social sciences (think about risk), psychological science (reputation), data sciences (epistemology and ontology), design science (interfaces and interactions).  What is the new meaning of ownership and liability?  E.g. who owns this collection of posters and the ideas that have come out?  [Hamed and Mort clarify that it's all theirs!]  What happens if there are negative consequences as a result of using poor data?

Business models are also important in order to go from studies to practice.  Are there new social structures we could create to help this?  For example, we have venture capital that takes risks, but what about social capital to spread risk to create new businesses, e.g. Kickstarter and the like?

What does informed consent mean?  The current system puts the onus on the user to understand all the contractual conditions before deciding.  Perhaps there’s a social-network method of crowdsourcing opinions on ToS, or providing some kind of health rating?  Perhaps data protection agencies could certify terms, or maybe the EFF or non-profits could provide some kind of rating system (cf. Moody’s etc.)?


Next steps

Ian Brown took notes on our breakout session on business models for privacy:

Collectives/cooperatives sharing data through PDSes (personal data stores)

  • How to incentivise membership? Dividends, social benefit (e.g. medical research)
  • What currently exists where the data controller has a strong incentive not to leak/breach data, e.g. Boots for brand loyalty, Apple/Facebook? So long as customers can switch effectively (portability, erasure)
  • Power-tool sharing at the village level. Hang off existing structures, e.g. local councils.
  • New forms of micro-markets, e.g. physical gatherings? Alternatives to currencies. Kickstarter? Distribution reduces the risk of centralised architectures.
  • What do syndicalist-anarchist models of data management look like?
  • Current uses of data are optimising existing business practices. But what totally new practices could be enabled? Human-facing efficiencies?

What are the types of business models? Startups, personal profit, NGOs, medical research, banks. A balanced investment portfolio.