Responsible data science Information ethics & privacy
Dino Pedreschi
EUI-SoBigData.eu workshop 11 October 2017
Urban Mobility Atlas http://kdd.isti.cnr.it/uma2/
GSM Calls
Profile Map
Temporal Profile
A Sociometer based on mobile phone data for real-time demographics
FLUXES ORIGINATING IN TUSCAN CITIES
EMERGENT CITY STRUCTURE
– Will enter into force on 25 May 2018
– Introduces important novelties
– New obligations
– New rights
Privacy by design big data analytics
– Design analytical processes that implement the privacy-by-design and by-default principles
– Consider privacy at every stage of the business
– Integrate privacy requirements “by design” into the business model
Privacy by Design Methodology in Big Data Analytics
The framework is designed with assumptions about:
– the sensitive data that are the subject of the analysis
– the attack model, i.e., the knowledge and purpose of a malicious party that wants to discover the sensitive data
– the target analytical questions that are to be answered with the data
Design a privacy-preserving framework able to:
– transform the data into an anonymous version with a quantifiable privacy guarantee
– guarantee that the analytical questions can be answered correctly, within a quantifiable approximation that specifies the data utility
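The two requirements above (a quantifiable privacy guarantee and a quantifiable loss of utility) can be made concrete with a small sketch. It is not the framework from the slides: it assumes spatial generalization to a grid as the anonymization step, k-anonymity as the privacy measure, and a single counting query as the analytical question. The point data and all function names are invented for illustration.

```python
from collections import Counter

# Toy (x, y) positions, invented for illustration only.
POINTS = [(0.2, 0.3), (0.8, 0.9), (1.1, 0.4), (1.7, 0.6), (2.3, 0.1), (2.9, 0.8)]


def anonymize(points, cell_size):
    """Spatial generalization: replace each point with the grid cell containing it."""
    return [(int(x // cell_size), int(y // cell_size)) for x, y in points]


def k_anonymity(cells):
    """Privacy guarantee: each record shares its cell with at least k-1 others."""
    return min(Counter(cells).values())


def count_west_of(points, x_max):
    """Analytical question on the raw data: how many records have x < x_max?"""
    return sum(1 for x, _ in points if x < x_max)


def count_west_of_anonymous(cells, cell_size, x_max):
    """Same question on the anonymous data, placing each record at its cell centre."""
    return sum(1 for cx, _ in cells if (cx + 0.5) * cell_size < x_max)


def evaluate(points, cell_size, x_max):
    """Return (k, absolute error of the query) for a given grid resolution."""
    cells = anonymize(points, cell_size)
    exact = count_west_of(points, x_max)
    approx = count_west_of_anonymous(cells, cell_size, x_max)
    return k_anonymity(cells), abs(exact - approx)


if __name__ == "__main__":
    for size in (1.0, 3.0):
        k, error = evaluate(POINTS, size, x_max=1.5)
        print(f"cell_size={size}: k={k}, query error={error}")
```

On this toy data a coarser grid raises k but also raises the query error, which is the trade-off between privacy guarantee and data utility that the framework is meant to quantify.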
Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa) www-kdd.isti.cnr.it
Privacy Risk Assessment Framework
Data dimension: the spatial area in which the analysis is performed.
Background Knowledge (BK) dimension: the temporal window (in weeks) in which the attacker recorded the user activity.
I-RACu: an indicator of the risk of re-identification of the users.
[Figure: risk plotted against BK, measured in # weeks]
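The slides do not give the formula behind the risk indicator, so the following is only a sketch of one common way to measure re-identification risk against background knowledge. It assumes the attacker knows some h locations visited by a target user, and takes the risk as 1 divided by the number of users whose data match that knowledge, in the worst case over all choices of h locations. The trajectories and function name are hypothetical.

```python
from itertools import combinations

# Toy data: the set of locations each user visited (invented for illustration).
TRAJECTORIES = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"A", "D"},
    "u4": {"A", "B", "C"},
}


def reidentification_risk(trajectories, user, h):
    """Worst-case risk for `user` if the attacker knows any h of the user's locations.

    For each possible piece of background knowledge (a combination of h locations),
    the risk is 1 / (number of users consistent with it); the maximum is returned.
    """
    own = sorted(trajectories[user])
    h = min(h, len(own))  # the attacker cannot know more than the user did
    worst = 0.0
    for known in combinations(own, h):
        matches = sum(1 for locs in trajectories.values() if set(known) <= locs)
        worst = max(worst, 1.0 / matches)
    return worst


if __name__ == "__main__":
    for h in (1, 2):
        risks = {u: round(reidentification_risk(TRAJECTORIES, u, h), 2)
                 for u in TRAJECTORIES}
        print(f"h={h}: {risks}")
```

In this sketch h plays the role of the background knowledge dimension; the slides measure it as a temporal window in weeks rather than a number of locations.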
within a privacy-preserving framework
deployment of the service
with a quantified (low) risk of privacy breach
control room.
interactions
– the outcome for everybody depends not only on individual choices but it is conditioned by everybody else's choices.
collaborate and coordinate their actions from the bottom-up is more efficient and resilient
goals with those of the collectivity in the system.
– We humans have a limited perception of ourselves as a social, collective living being
most cases, gets lost – like tears in rain.
power to connect all this personal information into a personal data repository – and make sense of it.
Where am I? Comparison with the community
– to acquire, integrate and make sense of our own data – and to connect with our peers and the surrounding urban community and infrastructure
awareness needed to face our grand challenges
And yet…
new ones, exhibit themselves in the big data ecosystem
analysis must be used responsibly
involvement and education efforts
AI and Big Data
for Alternative Sanctions)
crime recidivism.” The model is a proprietary secret of Northpointe, Inc.
has a strong ethnic bias
– blacks who did not reoffend were classified as high risk twice as often as whites who did not reoffend
– whites who did reoffend were classified as low risk twice as often as blacks who did reoffend.
AI and Big Data
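The two statements above are comparisons of error rates between groups: the false positive rate (people who did not reoffend but were classified as high risk) and the false negative rate (people who did reoffend but were classified as low risk). A minimal sketch of how such rates can be computed per group follows. The records are invented toy data and do not reproduce any real figures; the group labels and function name are hypothetical.

```python
# Toy records: (group, predicted_high_risk, reoffended). Invented for illustration;
# these are NOT real risk-score data.
TOY_RECORDS = (
    [("g1", True, False)] * 2 + [("g1", False, False)] * 2
    + [("g1", False, True)] * 1 + [("g1", True, True)] * 3
    + [("g2", True, False)] * 1 + [("g2", False, False)] * 3
    + [("g2", False, True)] * 2 + [("g2", True, True)] * 2
)


def group_error_rates(records):
    """Return {group: {"fpr": ..., "fnr": ...}} from (group, predicted, actual) records."""
    counts = {}
    for group, predicted_high, reoffended in records:
        c = counts.setdefault(group, {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
        if reoffended:
            c["tp" if predicted_high else "fn"] += 1
        else:
            c["fp" if predicted_high else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


if __name__ == "__main__":
    for group, r in group_error_rates(TOY_RECORDS).items():
        print(group, r)
```

In the toy data the two groups have mirrored error rates, so a classifier can make the same total number of mistakes in each group while distributing the kind of mistake very differently.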
from an accidental bias in the training data.
dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely
AI and Big Data
humans making a decision should explain why
Delft 17 – 19 February 2016
[Figure: panels for Goalkeepers, Forwards (FW), Defenders and Midfielders; labels: goals suff., goal diff]
Human evaluation line Technical features
Machine performance
Human evaluation line Technical features Technical + Contextual features
Machine performance
Social Mining & Big Data Analytics
H2020 - www.sobigdata.eu September 2015 - August 2019