GETTING DATA PROTECTION RIGHT
- Prof. dr. Mireille Hildebrandt
Interfacing Law & Technology Vrije Universiteit Brussel Smart Environments, Data Protection & the Rule of Law Radboud University
21/2/17 Hildebrandt SNS seminar Stockholm
■ internet: packet switching & routing, network structure
■ world wide web: hyperlinking
■ search engines, blogs, social media, web portals
■ web platforms [network effects & filter bubbles; reputation & fake news]
■ mobile applications [moving towards IoT, wearables]
■ IoT: cyberphysical infrastructures [connected cars, smart energy grids]
■ cloud computing, fog computing & edge computing
■ creating added value from big data or small data
■ predicting behaviours
■ pre-empting behaviours
■ interplay of backend & frontend of computing systems
■ interfaces enable but they also hide, nudge and force [A/B testing, ‘by design’ paradigms]
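A/B testing can run entirely behind the interface, without users noticing that they are part of an experiment. A minimal sketch, assuming hash-based bucketing (a common technique, not one the slides describe); the function `assign_variant` and the experiment name are hypothetical:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Map a user to an interface variant, deterministically and invisibly to the user."""
    # Hashing experiment + user id gives each user a stable, roughly uniform bucket.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

# Split 1,000 hypothetical users between two versions of a checkout button.
counts = {"A": 0, "B": 0}
for i in range(1000):
    counts[assign_variant(f"user-{i}", "checkout-button")] += 1
print(counts)  # nothing in either interface tells the user which group they are in
```

Because assignment is deterministic, the same person always sees the same variant. That is what makes the nudge measurable, and also what keeps it hidden.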
Big Data Space:
■ accumulation of behavioural and other data
■ mobile and polymorphous data & hypothesis spaces
■ distributed storage [once data has been shared, control becomes a challenge]
■ distributed access [access to data or to the inferences, to training set & algos]
Big Data Space: the envelope of big data space drives human agency, providing convenience & resilience
Weiser’s calm computing, IBM’s autonomic computing:
■ BIG
– volume (but, n=all is nonsense)
– variety (unstructured in the sense of different formats)
– velocity (real time, streaming)
■ OPEN as opposed to proprietary? reuse? repurposing? public-private?
– creating added value is hard work, not evident, no guarantees for return on investment
■ PERSONAL data: IoT will contribute to a further explosion of personal data
– high risk, high gain (think DPIA)? anonymisation will mostly be pseudonymisation!
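Why anonymisation usually turns out to be pseudonymisation can be shown with a keyed hash. A minimal sketch, assuming HMAC-SHA256 as the pseudonymisation technique (one common choice, not one the slides prescribe); the records and field names are hypothetical:

```python
import hashlib
import hmac
import secrets

def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# The key must be stored separately; whoever holds it can re-link the records.
key = secrets.token_bytes(32)

# Hypothetical IoT records from connected cars.
records = [
    {"email": "alice@example.org", "trips": 12},
    {"email": "bob@example.org", "trips": 3},
]
shared = [{"pid": pseudonymise(r["email"], key), "trips": r["trips"]} for r in records]

# Re-identification: recompute pseudonyms for known identifiers and match them.
lookup = {pseudonymise(r["email"], key): r["email"] for r in records}
print(lookup[shared[0]["pid"]])  # alice@example.org
```

The shared data set contains no email addresses, yet it is still personal data: the key holder can reverse the mapping, and the same person receives the same pseudonym in every data set, so records remain linkable.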
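“we say that a machine learns:
– with respect to a particular task T, performance metric P, and type of experience E,
if
– the system reliably improves its performance P at task T, following experience E”
(Tom Mitchell)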
http://www.cs.cmu.edu/~tom/mlbook.html
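Mitchell’s definition can be made concrete with a toy example. A minimal sketch under stated assumptions: the task T (label an integer from 0 to 99 as 1 when it reaches a hidden threshold), the performance measure P (accuracy on all 100 integers) and the sequence of experience E are hypothetical, and the learner simply adopts the smallest positive example seen so far as its threshold:

```python
HIDDEN_THRESHOLD = 63  # hypothetical rule the learner must discover: x >= 63 -> 1

def label(x: int) -> int:
    return int(x >= HIDDEN_THRESHOLD)

def learn(examples):
    """Hypothesis: predict 1 iff x >= smallest positive example seen (100 = predict 0 everywhere)."""
    positives = [x for x, y in examples if y == 1]
    threshold = min(positives) if positives else 100
    return lambda x: int(x >= threshold)

def performance(h, test_set):
    """P: fraction of test inputs labelled correctly."""
    return sum(h(x) == label(x) for x in test_set) / len(test_set)

test_set = range(100)
# E: labelled examples arriving one at a time (negative ones leave this learner unchanged).
experience = [(x, label(x)) for x in [90, 10, 80, 40, 70, 64, 63]]

curve = [performance(learn(experience[:n]), test_set) for n in range(1, len(experience) + 1)]
print(curve)  # [0.73, 0.73, 0.83, 0.83, 0.93, 0.99, 1.0]
```

Performance never drops here because each new positive example can only move the threshold closer to the hidden one. With noisy labels or a different learner, “reliably improves” would not be guaranteed.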
■ supervised (learning from examples – requires labelling, domain expertise)
■ reinforcement (learning by correction – requires prior domain expertise)
■ unsupervised (bottom up, inductive – danger of overfitting)
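The contrast between supervised and unsupervised learning can be shown on the same data. A minimal sketch covering only these two styles (reinforcement learning is not shown), assuming nearest-centroid classification and 1-D k-means as representative techniques; the data, labels and function names are hypothetical:

```python
from statistics import mean

# Hypothetical one-dimensional measurements from two kinds of behaviour.
points = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.8, 5.1]
labels = ["low", "low", "low", "low", "high", "high", "high", "high"]  # human labelling

def fit_centroids(xs, ys):
    """Supervised: one centroid per label supplied by a human."""
    return {lab: mean(x for x, y in zip(xs, ys) if y == lab) for lab in set(ys)}

def predict(centroids, x):
    """Assign the label of the nearest centroid."""
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))

def two_means(xs, iterations=10):
    """Unsupervised: 1-D k-means with k=2; finds two groups but cannot name them."""
    c0, c1 = min(xs), max(xs)  # start from the extremes so neither group starts empty
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)
    return c0, c1

centroids = fit_centroids(points, labels)
print(predict(centroids, 3.5))  # high
print(two_means(points))        # two cluster centres, without labels
```

Both methods end up with similar centres, but only the supervised one can say what a group means, and it can only do so because someone did the labelling.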
http://www.nature.com/news/can-we-open-the-black-box-of-ai-1.20731
Where d = training set; f = ‘target’ input-output relationships; h = hypothesis (the algorithm's guess for f made in response to d); and C = off-training-set ‘loss’ associated with f and h (‘generalization error’)
How well you do is determined by how ‘aligned’ your learning algorithm P(h|d) is with the actual posterior, P(f|d).
Check http://www.no-free-lunch.org
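The claim can be checked by brute force on a toy problem. A minimal sketch, assuming a uniform average over all possible targets f (the setting in which “no learner is better on average” holds); the domain, the training inputs and the three learners are hypothetical:

```python
from fractions import Fraction
from itertools import product

DOMAIN = range(5)   # hypothetical input space {0, 1, 2, 3, 4}
TRAIN_X = [0, 1]    # inputs covered by the training set d
OTS_X = [x for x in DOMAIN if x not in TRAIN_X]  # off-training-set inputs

def majority(d):
    vote = int(2 * sum(y for _, y in d) >= len(d))
    return lambda x: vote

def minority(d):
    vote = int(2 * sum(y for _, y in d) < len(d))
    return lambda x: vote

def parity(d):
    return lambda x: x % 2  # ignores d entirely

def ots_accuracy(learner, f):
    """Off-training-set accuracy of the hypothesis h learned from d, against target f."""
    d = [(x, f[x]) for x in TRAIN_X]
    h = learner(d)
    return Fraction(sum(h(x) == f[x] for x in OTS_X), len(OTS_X))

def average_over_targets(learner):
    """Average OTS accuracy over every boolean target f on DOMAIN, each weighted equally."""
    targets = list(product([0, 1], repeat=len(DOMAIN)))
    return sum(ots_accuracy(learner, f) for f in targets) / len(targets)

for learner in (majority, minority, parity):
    print(learner.__name__, average_over_targets(learner))  # each prints 1/2
```

Every learner, including one that ignores the data, scores exactly 1/2 off the training set once all targets count equally. A learner only does better when its bias, P(h|d), matches the targets that actually occur, P(f|d).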
Summary:
– The bias that is necessary to mine the data will co-determine the results
– This relates to the fact that the data used to train an algorithm is finite
– ‘Reality’, whatever that is, escapes the inherent reduction
– Data is not the same as what it refers to or what it is a trace of
■ NFL theorem
■ training set, domain knowledge, hypothesis space, test set
– accuracy, precision, speed, iteration
■ low hanging fruit
– may be cheap and/or available but not very helpful
■ neither data nor algorithms are objective
– bias in the data, bias of the algos, guess what: bias in the output
■ the more data, the larger the hypothesis space, the more patterns
– spurious correlations, computational artefacts
■ data obesitas: lots of data, but often incorrect, incomplete, irrelevant (low hanging fruit)
– any personal data stored presents security and other risks (need for DPIA, DPbD)
– purpose limitation: select before you collect (and while, and after)
■ pattern obesitas: trained algorithms can see patterns anywhere, added value?
– training set and algorithms necessarily contain bias, this may be problematic (need for DPIA, DPbD)
– purpose limitation: test relevance
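“Select before you collect” can be enforced at the point of ingestion. A minimal sketch, assuming a hypothetical register `PURPOSE_FIELDS` that ties each declared purpose to the fields it justifies; the purpose names, field names and sensor data are invented for illustration:

```python
# Hypothetical register of declared purposes and the fields each one justifies.
PURPOSE_FIELDS = {
    "car_insurance_pricing": {"postcode", "vehicle_type", "claims_history"},
    "tax_assessment": {"income", "deductions"},
}

def collect(raw: dict, purpose: str) -> tuple[dict, set]:
    """Keep only the fields the declared purpose justifies; report what was left out."""
    if purpose not in PURPOSE_FIELDS:
        raise ValueError(f"undeclared purpose: {purpose!r}")
    allowed = PURPOSE_FIELDS[purpose]
    kept = {k: v for k, v in raw.items() if k in allowed}
    dropped = set(raw) - allowed
    return kept, dropped

sensor_dump = {
    "postcode": "1050", "vehicle_type": "hatchback", "claims_history": 0,
    "gps_trace": [(50.82, 4.39)], "music_played": ["jazz"],
}
kept, dropped = collect(sensor_dump, "car_insurance_pricing")
print(kept)
print(sorted(dropped))  # ['gps_trace', 'music_played']
```

The code only enforces the register; deciding which fields a purpose actually justifies, and testing their relevance, remains the substantive work.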
■ agile software development:
– iteration instead of waterfall
– collaboration between domain experts, data scientists, whoever invests
– initial purpose (prediction of behaviour, example: tax office, car insurance)
– granular purposing (testing specific patterns, A/B testing to nudge specific behaviour)
■ lean computing:
– less data = more effective & more efficient
■ methodological integrity:
– make your software testable and contestable: mathematical & empirical software verification
– secure logging, open source
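Secure logging can be made tamper-evident by chaining entries with hashes. A minimal sketch, assuming SHA-256 hash chaining (one common technique, not one the slides specify); the event contents are hypothetical:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append(log: list, event: dict) -> None:
    """Add an entry whose hash covers both the event and the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "event": event, "hash": _digest(prev, event)})

def verify(log: list) -> bool:
    """Recompute the chain; edits, reorderings and deletions (except truncation at the end) are detected."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True

log = []
append(log, {"step": "train", "dataset": "v1"})
append(log, {"step": "deploy", "model": "m1"})
append(log, {"step": "decision", "subject": "pid-42", "outcome": "reject"})
print(verify(log))  # True
log[2]["event"]["outcome"] = "accept"  # tampering after the fact
print(verify(log))  # False
```

This is tamper-evident, not tamper-proof: someone who can rewrite the whole chain, or cut off its end, goes unnoticed unless the latest hash is also stored or signed somewhere outside their control.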
1. intentional concealment
– trade secrets, IP rights, public security
2. we have learned to read and write, not machine learning
– monopoly of the new ‘clerks’, the end of democracy
3. mismatch between mathematical optimization and human semantics
– when it comes to law and justice we cannot settle for ‘computer says no’
– inspired by: Jenna Burrell, ‘How the machine “thinks”: Understanding opacity in machine learning algorithms’, Big Data & Society, January–June 2016, 1–12
■ “To avoid bias and improve transparency, algorithm designers must make data sources and profiles public.”
■ “People should have the right to see their own data, how profiles are derived and have the right to challenge them.”
■ “Some proposed remedies are technical, such as developing new computational techniques that better address and correct discrimination both in training data sets and in the algorithms – a sort of affirmative algorithmic action.”
■ think ‘training sets’: select before you collect
■ think of how to avoid ‘low hanging fruit’
■ think of how to ensure accuracy, relevance, pertinence
■ data minimisation, if done well, should avoid both data and pattern obesitas
– detect productive bias, while also detecting unfair or prohibited bias
– make data sets available for inspection and contestation
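Detecting bias starts with measuring it. A minimal sketch, assuming the data set exposes a group attribute and binary outcomes; the ratio of selection rates is one simple disparity measure among many, and the decisions below are hypothetical. Whether a given gap reflects productive bias or unfair or prohibited bias is a contextual and legal judgement that the number cannot make:

```python
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, selected) pairs -> selection rate per group."""
    totals, selected = defaultdict(int), defaultdict(int)
    for group, was_selected in decisions:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {g: selected[g] / totals[g] for g in totals}

def rate_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 = equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0

# Hypothetical output of a screening model for two groups of ten applicants each.
decisions = [("A", True)] * 8 + [("A", False)] * 2 + [("B", True)] * 4 + [("B", False)] * 6
rates = selection_rates(decisions)
print(rates)              # {'A': 0.8, 'B': 0.4}
print(rate_ratio(rates))  # 0.5
```

Such a check only works if the data sets and outputs are available for inspection in the first place.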
■ think ‘training sets’: select before you collect (and while you collect and after)
■ think of how to avoid ‘low hanging fruit’ (GIGA)
■ think of how to ensure accuracy, relevance, pertinence (depending on purpose)
– purpose specification, if done well, should avoid both data and pattern obesitas
– purpose should direct the development and employment of data-driven applications
– experimentation can be a purpose, but not in itself
■ the choice of algorithms should be informed by the purpose
■ ML and IoT are meant to pre-empt our intent
■ to run smoothly under the radar of everyday life
■ it is all about continuous surreptitious automated decisions
1. the right not to be subject to automated decisions that have a significant impact
2. the right to a notification, an explanation and anticipation if an exception applies
= choice
1. the right not to be subject to automated decisions that have a significant impact, unless
a. necessary for a contract
b. authorised by EU or Member State law
c. explicit consent
– under a and c: right to human intervention, possibility to contest
– prohibition to make such decisions based on sensitive data
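The structure of this right (a prohibition, three exceptions, and extra safeguards for two of them) can be written down as a decision procedure. A simplified sketch that follows the slide’s summary rather than the full legal text; the class, field and exception names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

EXCEPTIONS = {"contract", "law", "explicit_consent"}  # a., b. and c. above

@dataclass
class AutomatedDecision:
    significant_impact: bool
    exception: Optional[str]  # one of EXCEPTIONS, or None
    uses_sensitive_data: bool

def check(decision: AutomatedDecision) -> dict:
    """Simplified encoding of the rule as summarised on the slide."""
    if not decision.significant_impact:
        return {"allowed": True, "safeguards": [], "reason": "outside this rule (other rules still apply)"}
    if decision.exception not in EXCEPTIONS:
        return {"allowed": False, "safeguards": [], "reason": "no exception applies"}
    if decision.uses_sensitive_data:
        # The slide states a prohibition; the GDPR text (Art. 22(4)) allows narrow
        # exceptions that this sketch does not model.
        return {"allowed": False, "safeguards": [], "reason": "based on sensitive data"}
    safeguards = []
    if decision.exception in {"contract", "explicit_consent"}:  # under a and c
        safeguards = ["human intervention", "possibility to contest"]
    return {"allowed": True, "safeguards": safeguards, "reason": f"exception: {decision.exception}"}

print(check(AutomatedDecision(True, "explicit_consent", False)))
print(check(AutomatedDecision(True, None, False)))
```

Writing the rule out this way makes visible that consent and contract do not simply lift the prohibition: they switch on additional safeguards that a compliant system would have to provide.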
2. the right to a notification, an explanation and anticipation if an exception applies
– existence of decisions based on profiling
– meaningful information about the logic involved (= explanation?)
– significance and envisaged consequences of such processing
21/2/17 Hildebrandt SNS seminar Stockholm 43
21/2/17 Hildebrandt SNS seminar Stockholm 44
21/2/17 Hildebrandt SNS seminar Stockholm 45
21/2/17 Hildebrandt SNS seminar Stockholm 46
21/2/17 Hildebrandt SNS seminar Stockholm 47
21/2/17 Hildebrandt SNS seminar Stockholm 48
21/2/17 Hildebrandt SNS seminar Stockholm 49
21/2/17 Hildebrandt SNS seminar Stockholm 50