DATABOX 01000100 01100001 01110100 01100001 01100010 01111000 - PowerPoint Presentation

SLIDE 1

Networks & Operating Systems SRG, Computer Laboratory

On the Edge of Human-Data Interaction with the DATABOX

01000100 01100001 01110100 01100001 01100010 01111000

Richard Mortier

SLIDE 2

“It was just a dumb thing. Then we put a chip in it. Now it's a smart thing.” (http://weputachipinit.tumblr.com/)

“Big Data is visualized in so many ways... all of them blue and with numbers and lens flare.” (http://bigdatapix.tumblr.com/)

Living in a Big Data World

  • Challenges and Opportunities
  • Who’s tracking us, to what end?
  • Personalisation, Internet of Things
  • Digital Footprints
  • Intimate information in large, rich data silos
  • Never forgets or forgives

Key Challenge: How do we enable data subjects to control collection and exploitation of both their data and data about them?

SLIDE 3

Existing Ecosystem: Move Data

[Diagram: your data flowing from you to the processors]

SLIDE 4

A Structural Problem?

  • The Internet is fragmented, distributed systems are difficult
  • Centralising simplifies things
  • With the cloud, we can, so we do!

  • Ease of cloud computing means, by default, we move data to the cloud for processing

https://www.stickermule.com/marketplace/3442-there-is-no-cloud

[Diagram: your data flowing from you to the processors in the cloud]

SLIDE 5

Restructuring the Problem

  • Horizon Digital Economy Research, Nottingham, UK ~2009
  • [Them] Build us a Magic Context Service! [Me] WTF even is that?!
  • No-one could explain, but it definitely involved using personal data
  • I’m a lazy computer scientist so I punted on the hard problems
  • I don’t know what you want when you say you want context
  • But if you give me some program that encodes what you want, I’ll run it for you
  • Dataware — effectively a service-oriented architecture for personal data processing
  • Data Processor writes some code to process the Data Subject’s data
  • Subject provides the platform on which to run that code
  • Processor gets the result
  • Key: Move code to data, not data to code
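The Dataware arrangement above can be illustrated with a minimal Python sketch. It assumes a hypothetical `SubjectPlatform` class and invented step-count records (neither comes from the talk): the processor supplies a function, the subject's platform runs it next to the data, and only the returned value goes back.

```python
from typing import Any, Callable, Dict, List

# A data subject's records, as a list of dictionaries (illustrative only).
Records = List[Dict[str, Any]]


class SubjectPlatform:
    """Hypothetical subject-side platform: it holds the data and runs
    processor-supplied code next to it, releasing only the return value."""

    def __init__(self, records: Records) -> None:
        self._records = records  # never handed to the processor directly

    def run(self, processor_code: Callable[[Records], Any]) -> Any:
        # Give the processor's code a copy so it cannot alter the originals.
        snapshot = [dict(r) for r in self._records]
        return processor_code(snapshot)


def average_daily_steps(records: Records) -> float:
    """Code written by the Data Processor: returns one aggregate value."""
    steps = [r["steps"] for r in records]
    return sum(steps) / len(steps)


platform = SubjectPlatform([
    {"day": "mon", "steps": 4000},
    {"day": "tue", "steps": 6000},
    {"day": "wed", "steps": 8000},
])
result = platform.run(average_daily_steps)  # 6000.0; the raw records stay put
```

Python gives no real isolation between the two parties here; the copy only stops the processor's code mutating the originals. Genuine separation needs sandboxing, which the Databox platform later provides with containers.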

SLIDE 6

Dataware

[Diagram: subjects, sources, processors and results, connected by ❶ request, permission and processing, through to ❺ interactions]

SLIDE 7

Constructing Interaction

  • Many proposed interaction models
  • E.g., pay-per-use
  • Little about how to actually provide for it
  • E.g., Exactly what am I being paid for?
  • Dataware was a technical proposal supporting some forms of interaction
  • Accountable transaction between parties in terms of request, permission, audit

  • But there’s a lot more to consider here…
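One way to read "accountable transaction in terms of request, permission, audit" is as a ledger that records every request alongside the permission state it was judged against. The sketch below assumes a hypothetical `Ledger` class and made-up processor and source names; it is not the Dataware protocol itself.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class AuditEntry:
    processor: str
    source: str
    purpose: str
    granted: bool


@dataclass
class Ledger:
    """Hypothetical accountable-transaction ledger: permissions the subject
    has given, plus an audit trail of every request made against them."""
    permissions: Set[Tuple[str, str]] = field(default_factory=set)
    audit: List[AuditEntry] = field(default_factory=list)

    def permit(self, processor: str, source: str) -> None:
        self.permissions.add((processor, source))

    def revoke(self, processor: str, source: str) -> None:
        self.permissions.discard((processor, source))

    def request(self, processor: str, source: str, purpose: str) -> bool:
        granted = (processor, source) in self.permissions
        # Every request is logged, granted or not, so both parties can audit it.
        self.audit.append(AuditEntry(processor, source, purpose, granted))
        return granted


ledger = Ledger()
ledger.permit("acme-insurance", "smart-meter")
ledger.request("acme-insurance", "smart-meter", "tariff quote")      # True
ledger.request("acme-insurance", "bank-statements", "risk profile")  # False
ledger.revoke("acme-insurance", "smart-meter")
ledger.request("acme-insurance", "smart-meter", "tariff quote")      # False
```

Logging refused requests as well as granted ones is the design choice that makes the trail useful to the subject: it shows what was asked for, not only what was released.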

SLIDE 8

Human-Data Interaction

SLIDE 9

Human-Data Interaction

  • Data is collected
  • Analytics to process data
  • Inferences are drawn
  • Actions taken as a result

SLIDE 10

Lack of Legibility

Visualisation & comprehension

  • We are generally unaware of
  • the many sources of data collected about us,
  • the analyses performed on this data, and
  • the implications of these analyses

https://flic.kr/p/6thmfN

E.g., Computation of credit scores

SLIDE 11

Lack of Agency

Capacity to act

  • We are generally unaware of
  • the means we have to affect data collection,
  • the means we have to affect data analysis,
  • whether they even exist, and whether we know enough to want to employ them

E.g., Use of retail data to profile your propensity to risk for sale to an insurance agency

http://appadvice.com/appnn/2012/04/facebooks-acquisition-of-instagram-just-another-question-mark-for-internet-privacy

SLIDE 12

Lack of Negotiability

Support for dynamics of interaction

  • Even if we know the data collected and analysed about us, and understand how to enact choices over these
  • We’re still trapped by current systems and services
  • Binary accept/reject of terms
  • Cannot subsequently modify or refine our decisions

SLIDE 13

Databox: Dataware v2

Databox moves code to the data, minimising data release and retaining control over processing

  • Mediates access to data, local or remote
  • Control internal and external communications
  • Log all I/O for users to inspect, control
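As a rough illustration of controlling and logging external communication, here is a sketch assuming a hypothetical `Mediator` that sits on an app's outbound path, checks each destination against a user-approved allow-list and logs every attempt. The app name and hosts are invented; the platform described in the following slides does this in its core network rather than in application code.

```python
from typing import List, Set, Tuple


class Mediator:
    """Hypothetical databox-style mediator for an app's outbound traffic:
    checks each destination against user-approved (app, host) pairs and
    logs every attempt so the user can inspect it later."""

    def __init__(self, allowed: Set[Tuple[str, str]]) -> None:
        self.allowed = allowed
        self.log: List[str] = []

    def send(self, app: str, host: str, payload: bytes) -> bool:
        ok = (app, host) in self.allowed
        verdict = "ALLOW" if ok else "BLOCK"
        self.log.append(f"{verdict} {app} -> {host} ({len(payload)} bytes)")
        # A real implementation would forward the payload only when ok is True.
        return ok


m = Mediator({("energy-app", "api.example.org")})
sent = m.send("energy-app", "api.example.org", b"kWh=3.2")        # True
leaked = m.send("energy-app", "tracker.example.net", b"kWh=3.2")  # False
```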

[Diagram: your data held in your databox, between you and the processors]

SLIDE 14

SLIDE 15

Databox: Move Contained Code!

  • Install apps to process data locally
  • Ingest/release data via drivers
  • App manifests describe data they will access,
  • …when made into concrete SLAs on installation

[Diagram: the subject's databox holding data and running apps and drivers, with apps installed from an app store; processors outside]

SLIDE 16

Databox Platform

  • All components are Docker containers
  • Lightweight virtualisation provides platform independence, isolation, and management

  • Four core platform components
  • Container Manager
  • Arbiter
  • Core Network
  • Data store(s)

[Diagram: Arbiter, Container Manager, Core Network, Proxy, CoreUI, AppStore and Dashboard, with a GitHub App and Driver, and the User]

SLIDE 17

Databox Platform

[Diagram: the databox platform (container-manager, arbiter, core-network) hosting data apps and drivers, with app-netif, driver-netif and system-netif virtual interfaces; subject, processors and app store outside]

  • Container Manager manages container lifecycle
  • Arbiter manages access control tokens
  • Persistent storage and 0MQ-based middleware layer via provided data stores
  • Data stores registered in HyperCat catalogue
  • Inter-container communications controlled by core-network interconnecting separate virtual interfaces
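Registration in a HyperCat catalogue can be sketched as building a JSON document of rel/val metadata pairs. The structure below follows our understanding of HyperCat 3.0 (catalogue-level `catalogue-metadata`, per-item `href` and `item-metadata`); the store URL, descriptions and helper names are hypothetical, and Databox's own catalogue likely carries more metadata than shown.

```python
import json


def make_catalogue(description: str) -> dict:
    """Empty HyperCat-style catalogue: metadata is a list of rel/val pairs."""
    return {
        "catalogue-metadata": [
            {"rel": "urn:X-hypercat:rels:isContentType",
             "val": "application/vnd.hypercat.catalogue+json"},
            {"rel": "urn:X-hypercat:rels:hasDescription:en", "val": description},
        ],
        "items": [],
    }


def register_store(catalogue: dict, href: str, description: str) -> None:
    """Add one data store (or one datasource within it) as a catalogue item."""
    catalogue["items"].append({
        "href": href,
        "item-metadata": [
            {"rel": "urn:X-hypercat:rels:hasDescription:en", "val": description},
        ],
    })


cat = make_catalogue("Data stores on this databox")
# Hypothetical store address; real Databox store URLs may look different.
register_store(cat, "tcp://example-store:5555/ts/temperature", "Living-room temperature")
print(json.dumps(cat, indent=2))
```

Because apps discover stores by reading this catalogue, the catalogue itself is also something the Arbiter can scope: an app need only see the items its SLA covers.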

SLIDE 18

Container Lifecycle

  • Apps and drivers come with a Manifest, covering
  • origination metadata,
  • data access and storage requirements,
  • remote access requirements
  • Installation
  • user input realises manifest as a Service Level Agreement,
  • obtains access tokens (macaroons) from the Arbiter,
  • creates a per-app bridge and configures connectivity via Core Network,
  • starts the app/driver’s containers, including a Store
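The installation steps above can be sketched as an ordered flow. Everything here is hypothetical: the `Manifest` fields, the placeholder token strings (stand-ins for Arbiter-minted macaroons) and the bridge name. The point is the ordering, and that the SLA grants only what the manifest requested and the user approved.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set


@dataclass
class Manifest:
    name: str
    datasources: Set[str]   # data the app asks to read
    remote_hosts: Set[str]  # external endpoints it asks to reach


def install(manifest: Manifest, approved: Set[str], steps: List[str]) -> Dict[str, Any]:
    """Hypothetical installation flow, in the order listed on the slide."""
    # 1. User input realises the manifest as an SLA: keep only requested
    #    datasources that the user also approved.
    sla: Dict[str, Any] = {
        "app": manifest.name,
        "datasources": manifest.datasources & approved,
        "remote_hosts": set(manifest.remote_hosts),
    }
    steps.append("sla")
    # 2. One access token per granted datasource (placeholder strings here,
    #    standing in for macaroons minted by the Arbiter).
    sla["tokens"] = {s: f"token:{manifest.name}:{s}" for s in sorted(sla["datasources"])}
    steps.append("tokens")
    # 3. A per-app bridge, which the Core Network would wire up.
    sla["bridge"] = f"br-{manifest.name}"
    steps.append("bridge")
    # 4. Start the app container and its Store.
    steps.append("start")
    return sla


log: List[str] = []
sla = install(Manifest("energy-app", {"meter", "location"}, {"api.example.org"}),
              approved={"meter"}, steps=log)
```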

SLIDE 19

Accessing Data Stores with Zest

  • Originally simple HTTP/REST API
  • Unsuited to high-frequency sensor data
  • Memory footprint unsuited to Raspberry Pi
  • Zest: CoAP over 0MQ
  • REST-like key-value and timeseries retrieval controlled by macaroons
  • Irmin (git-like) backend supporting JSON, text, binary data
  • Encryption via CurveZMQ, integration with HyperCat
  • About half the CPU load and memory footprint of the HTTP solution

  • Audit logging
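The store semantics listed above (key-value and time-series paths, per-path access control, audit logging) can be sketched in a few lines. This is not the Zest API or wire protocol: `SketchStore`, its method names and the plain path-prefix check (standing in for macaroon verification) are assumptions made for illustration.

```python
import bisect
from typing import Dict, List, Tuple


class SketchStore:
    """Hypothetical store with key-value and time-series paths. Each call
    checks the caller's permitted path prefix and is written to an audit log."""

    def __init__(self) -> None:
        self.kv: Dict[str, bytes] = {}
        self.ts: Dict[str, List[Tuple[int, float]]] = {}
        self.audit: List[str] = []

    def _check(self, allowed_prefix: str, path: str, op: str) -> None:
        # Simplistic prefix match: "/ts/" grants every time series.
        ok = path.startswith(allowed_prefix)
        self.audit.append(f"{op} {path} {'ok' if ok else 'denied'}")
        if not ok:
            raise PermissionError(path)

    def write_kv(self, allowed_prefix: str, key: str, value: bytes) -> None:
        self._check(allowed_prefix, f"/kv/{key}", "POST")
        self.kv[key] = value

    def read_kv(self, allowed_prefix: str, key: str) -> bytes:
        self._check(allowed_prefix, f"/kv/{key}", "GET")
        return self.kv[key]

    def append_ts(self, allowed_prefix: str, series: str, t: int, v: float) -> None:
        self._check(allowed_prefix, f"/ts/{series}", "POST")
        # Keep each series sorted by timestamp so range queries are cheap.
        bisect.insort(self.ts.setdefault(series, []), (t, v))

    def since(self, allowed_prefix: str, series: str, t0: int) -> List[Tuple[int, float]]:
        self._check(allowed_prefix, f"/ts/{series}", "GET")
        points = self.ts.get(series, [])
        i = bisect.bisect_left(points, (t0, float("-inf")))
        return points[i:]


store = SketchStore()
for t, v in [(10, 20.5), (5, 19.0), (20, 21.0)]:
    store.append_ts("/ts/", "temperature", t, v)
recent = store.since("/ts/", "temperature", 10)  # [(10, 20.5), (20, 21.0)]
```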

[Diagram: Temperature, Solar generation and Power consumption drivers, each with a store and route, serving a Real-time dashboard app and a Historical analysis app via ZEST (OBSERVE) and ZEST (GET)]

CoAP/TCP: https://tools.ietf.org/html/draft-ietf-core-coap-tcp-tls-09 0MQ: http://api.zeromq.org/

SLIDE 20

Enabling Physical Interactivity

  • Physical devices often easier to reason about
  • Visible; Located; Proximate; Portable
  • Physical access control (“bag of keys”) is widely understood
  • For example,
  • “access to our smart meter data allowed only if a green tag is in my Databox and in my partner’s Databox, or when the green tag is in one Databox and we’re both in the house”

  • Alternatively, physical interactions providing for virtual connectivity
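The smart-meter rule is concrete enough to write as a predicate. The sketch assumes a hypothetical `Databox` record listing which physical tags are currently present, and a set of people detected at home; how tags and presence are sensed is left out.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Databox:
    owner: str
    tags: Set[str] = field(default_factory=set)  # physical tags placed in this box


def smart_meter_access(mine: Databox, partners: Databox, at_home: Set[str]) -> bool:
    """The slide's example rule: allow access to smart-meter data if a green
    tag is in both databoxes, or if one databox holds the green tag and both
    owners are in the house."""
    in_mine = "green" in mine.tags
    in_partners = "green" in partners.tags
    both_home = {mine.owner, partners.owner} <= at_home
    return (in_mine and in_partners) or ((in_mine or in_partners) and both_home)


alice = Databox("alice", {"green"})
bob = Databox("bob")
print(smart_meter_access(alice, bob, at_home={"alice"}))         # False
print(smart_meter_access(alice, bob, at_home={"alice", "bob"}))  # True
```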

SLIDE 21

Democratising App Development

  • Install and connect existing apps
  • Plug together apps and components to customise your apps

[Diagram: datastores, processors and outputs as pluggable components: hue bulbs, mobile sensors, smart plugs; map, reduce, filter, convert; actuate, display, write to store]
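Plugging components together amounts to function composition. A minimal sketch, with hypothetical smart-plug readings and component names chosen to echo the diagram (filter, convert, reduce, actuate):

```python
from functools import reduce
from typing import Any, Callable, List


def pipeline(*stages: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Plug stages together left to right: each consumes the previous output."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)


# Hypothetical components standing in for the boxes on the slide.
def smart_plug_readings() -> List[float]:
    return [500.0, 0.0, 250.0, 1250.0]          # watts, one per plug


def drop_idle(ws: List[float]) -> List[float]:   # "filter"
    return [w for w in ws if w > 1.0]


def to_kw(ws: List[float]) -> List[float]:       # "convert" / "map"
    return [w / 1000 for w in ws]


def total(kws: List[float]) -> float:            # "reduce"
    return sum(kws)


def actuate(kw: float) -> str:                   # output, e.g. dim hue bulbs
    return "dim-lights" if kw > 1.5 else "no-op"


energy_saver = pipeline(drop_idle, to_kw, total, actuate)
print(energy_saver(smart_plug_readings()))  # dim-lights
```

Because every stage has a single input and output, a user-facing tool could offer these as boxes to wire up without writing code.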

SLIDE 22

Rich Visualisations of Rich Data

[Diagram: data (x, y, z) drives transforms of an SVG image's parts: rotate x degrees, scale by y/2, fill with colour z, translate to (i,j)]
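The mapping from data to visual attributes can be sketched as generating one SVG element per data point. The `<rect>` shape, its size and the `svg_part` name are assumptions; only the four mappings come from the slide.

```python
from typing import Tuple


def svg_part(x: float, y: float, z: str, at: Tuple[float, float]) -> str:
    """Render one data point (x, y, z) as an SVG element placed at (i, j).

    Mappings from the slide: rotate x degrees, scale by y/2, fill with
    colour z, translate to (i, j). In an SVG transform list the rightmost
    entry applies to the element first, so the shape is scaled, then
    rotated about the origin, then moved to (i, j).
    """
    i, j = at
    transform = f"translate({i},{j}) rotate({x}) scale({y / 2})"
    return f'<rect width="10" height="10" fill="{z}" transform="{transform}"/>'


parts = [svg_part(45, 3, "steelblue", (100, 50)), svg_part(0, 2, "orange", (20, 20))]
svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">'
       + "".join(parts) + "</svg>")
```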

SLIDE 23

Privacy-Informed Access Control

  • Access control through tokens (macaroons)
  • minted by the Arbiter,
  • verified by a Store,
  • apply to URI paths
  • Exploring generic measures of privacy risk, e.g.,
  • (entropic) surprisal,
  • (statistical) autocorrelation,
  • (similarity) k-anonymity, l-diversity, t-closeness
  • Dynamic determination of risky access
  • Static analysis of overall configuration risk
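Minting and verifying path-scoped tokens can be sketched with the core macaroon construction: a chain of HMAC-SHA256 signatures over an identifier and a list of caveats. The sketch assumes the Store can recompute the chain with a key it shares with the Arbiter, and uses an invented `name = value` caveat syntax; it omits the third-party caveats, key derivation and serialisation that real macaroon libraries provide.

```python
import hashlib
import hmac
from typing import List, NamedTuple


class Macaroon(NamedTuple):
    identifier: str
    caveats: List[str]
    signature: bytes


def _chain(key: bytes, identifier: str, caveats: List[str]) -> bytes:
    # HMAC chain: sig0 = HMAC(key, id), sig_n = HMAC(sig_{n-1}, caveat_n).
    sig = hmac.new(key, identifier.encode(), hashlib.sha256).digest()
    for c in caveats:
        sig = hmac.new(sig, c.encode(), hashlib.sha256).digest()
    return sig


def mint(root_key: bytes, identifier: str, caveats: List[str]) -> Macaroon:
    """Arbiter side: only a holder of root_key can mint."""
    return Macaroon(identifier, list(caveats), _chain(root_key, identifier, caveats))


def attenuate(m: Macaroon, caveat: str) -> Macaroon:
    """Any holder can narrow a macaroon without knowing the root key."""
    sig = hmac.new(m.signature, caveat.encode(), hashlib.sha256).digest()
    return Macaroon(m.identifier, m.caveats + [caveat], sig)


def verify(root_key: bytes, m: Macaroon, path: str, method: str) -> bool:
    """Store side: check the signature chain, then every caveat."""
    expected = _chain(root_key, m.identifier, m.caveats)
    if not hmac.compare_digest(expected, m.signature):
        return False
    context = {"path": path, "method": method}
    for c in m.caveats:
        name, sep, value = c.partition(" = ")
        # Unknown or malformed caveats fail closed.
        if not sep or context.get(name) != value:
            return False
    return True


root = b"arbiter-store-shared-secret"  # hypothetical shared key
tok = mint(root, "app-42", ["path = /ts/temperature"])
read_only = attenuate(tok, "method = GET")
```

Because each caveat is folded into the signature, a holder can add caveats (narrowing access) but cannot remove one without invalidating the token.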

SLIDE 24

Big Data Analytics?

[Figure: Big Data and Big Data Analytics versus Small Data; labels: aggregate, public, private; traditional, centralised, cloud]

SLIDE 25

Big Data Analytics? Small Data Analytics!

[Figure: Big Data and Big Data Analytics versus Small Data and Small Data Analytics; labels: aggregate, public, private; traditional, centralised, cloud; exploratory, decentralised computation, aggregate]

SLIDE 26

Wide-Area Distributed Analytics

Current: centralise data so it can be processed, usually in big datacenters

First attempt: distribute models and then refine locally

[Diagram: Batch learning on Training data at MS; Online learning and Cooperative learning of per-user models (ML, MP) at step i+1 over data d1 to di+1, with Inference at each of users u1, u2, u3]

Goal? Fully distributed inference and learning at scale

[Figure: BSP, SSP and ASP placed on a spectrum from strong consistency, slow iteration rate, fully centralised to weak consistency, fast iteration rate, fully distributed; PSP (pBSP, pSSP) shown against axes of consistency and completeness; data held by the data subject, alongside processors]
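The consistency spectrum can be made concrete as the rule each worker applies before starting its next iteration. This sketch assumes per-worker step counters and treats PSP as sampling: pBSP and pSSP apply the BSP or SSP rule to a random subset of peers rather than to all of them. That reading of PSP is our assumption, and the function and parameter names are hypothetical.

```python
import random
from typing import List, Optional


def can_advance(my_step: int, others: List[int], model: str,
                staleness: int = 2, sample_size: Optional[int] = None,
                rng: Optional[random.Random] = None) -> bool:
    """Return True if a worker that has finished `my_step` iterations may
    start the next one, given the step counts of the other workers.

    BSP : lockstep; wait until no other worker is behind.
    SSP : may run ahead of the slowest worker by at most `staleness` steps.
    ASP : never waits.
    pBSP, pSSP : the BSP / SSP rule applied to a random sample of workers.
    """
    if model == "ASP":
        return True
    if model in ("pBSP", "pSSP"):
        rng = rng or random.Random()
        k = len(others) if sample_size is None else min(sample_size, len(others))
        others = rng.sample(others, k)
        model = model[1:]  # fall through to the deterministic rule
    slowest = min(others, default=my_step)
    if model == "BSP":
        return my_step <= slowest
    if model == "SSP":
        return my_step - slowest <= staleness
    raise ValueError(f"unknown model: {model}")


print(can_advance(5, [3, 4], "SSP"))  # True: 2 steps ahead, within staleness 2
print(can_advance(3, [2, 3], "BSP"))  # False: one worker still on step 2
```

Sampling lets each worker decide from a few peers instead of a global view, which is what makes the probabilistic variants usable when there is no central coordinator.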

SLIDE 27

BBC Box

SLIDE 28

BBC

  • Profiling App processes data from Spotify, iPlayer, Instagram
  • Exports profile to content recommendation system
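A profiling app of this kind can keep raw histories local and export only a coarse summary. The sketch below assumes each service's history has already been reduced to category labels; the services' data formats, the labels and `build_profile` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_profile(histories: Dict[str, List[str]], top_n: int = 2) -> List[str]:
    """Merge per-service histories locally and keep only the most frequent
    categories; only this short list would leave the box."""
    counts: Counter = Counter()
    for items in histories.values():
        counts.update(items)
    return [category for category, _ in counts.most_common(top_n)]


histories = {
    "spotify": ["jazz", "jazz", "folk"],
    "iplayer": ["nature", "jazz", "drama"],
    "instagram": ["nature", "food"],
}
profile = build_profile(histories)  # ['jazz', 'nature']
```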

SLIDE 29

HDI: So Where’s the Interaction?

  • Request and processing occur as if in a black-box
  • Can’t tell where it’s got to, what’s going on
  • Status within the arrangement
  • Requests, permissions and audit logs
  • Mechanisms of coordination within the field of work
  • Order but do not articulate the field of work
  • Real world data sharing is recipient designed
  • Shaped by people with respect to the relationship they have with the parties implicated in the act of sharing

SLIDE 30

Articulation Work

  • Dataware subject is engaged in cooperative work
  • Interdependence between subject, processor, perhaps other subjects
  • E.g., walking down a busy street
  • Activities must thus be meshed together, e.g., Schmidt (1994)
  • maintaining reciprocal awareness of salient activities within a cooperative ensemble

  • directing attention towards current state of cooperative activities
  • assigning tasks to members of the ensemble
  • handing over aspects of the work for others to pick up

SLIDE 31

Data as a Boundary Object

  • Contextual nature – plastic adaptation to need
  • E.g., Credit card receipt
  • Consumer’s proof of payment
  • Bank’s proof of a valid transaction
  • Supermarket’s proof that the bank should pay them
  • Inherently relational and thus social
  • Not so much ‘me’ or ‘you’ as ‘us’
  • Very little is so private that it involves no-one else

SLIDE 32

Interactional Challenges for HDI

User Driven Discovery

  • What is discovered? By whom? Under whose control?
  • Meta-data publication
  • Consumer analytics
  • Empowering subjects: app stores?
  • Discoverability policies
  • Identity mechanisms
  • Permissions, social ratings and exchange
  • App store models supporting discovery of data processors

https://flic.kr/p/4o1wLv

SLIDE 33

Interactional Challenges for HDI

Legibility of Data Sources

  • Visualisation of own data, impact of others’ data
  • Help users make sense of data usage
  • Both present and future public data
  • What you have, what others want
  • What processors would take from data sources
  • Editing of data; control of presentation to processors (recipient design)

  • Support data editing and data presentation

https://flic.kr/p/9AwFd3 https://flic.kr/p/c3jJAY

SLIDE 34

Interactional Challenges for HDI

From My Data to Our Data

  • Delegating and revoking control
  • Transparency/awareness mechanisms
  • Rights management
  • Editing, viewing, sharing
  • Negotiation
  • Group management, negotiated collection and control

  • Group management of data sources

https://flic.kr/p/drV8zY

SLIDE 35

Interactional Challenges for HDI

Salient Dimensions of Collaboration

  • To whom is data passed, for what purpose (transitivity)
  • Real-time articulation of data sharing processes, e.g., current status reports
  • Tracking and treatment
  • Data tracking, e.g., subsequent processing or transfer

https://flic.kr/p/e57ySb

SLIDE 36

Platform Challenges

  • Sharing data
  • Need to support offline data collection from e.g., mobile phones
  • Need a rendezvous and identity service for direct interconnection
  • Shared data
  • No current platform is a good fit to social dynamics of a household!
  • Who and how to manage users, groups?
  • Who gets to be root?

SLIDE 37

Questions?

https://bit.ly/encyclopedia-hdi
http://hdiresearch.org/
https://databoxproject.uk/
https://ocaml.xyz/
https://mort.io/
richard.mortier@cl.cam.ac.uk