Responsible data science: Information ethics & privacy - Dino Pedreschi - PowerPoint PPT Presentation




SLIDE 1

Responsible data science: Information ethics & privacy

Dino Pedreschi

EUI-SoBigData.eu workshop 11 October 2017

SLIDE 2
SLIDE 3

URBAN MOBILITY ATLAS

SLIDE 4
SLIDE 5

Urban Mobility Atlas http://kdd.isti.cnr.it/uma2/

SLIDE 6

REAL TIME DEMOGRAPHY

SLIDE 7

GSM Calls

Profile Map

Temporal Profile

A Sociometer based on mobile phone data for real time demographics

SLIDE 8

SLIDE 9

San Pietro Square

SLIDE 10

DIVERSITY & WELLBEING

SLIDE 11

Big Data: Diversity and economic development

SLIDE 12

THE POLYCENTRIC CITY

SLIDE 13

FLUXES ORIGINATING IN TUSCAN CITIES

EMERGENT CITY STRUCTURE

SLIDE 14

POLYCENTRIC CITY

SLIDE 15
SLIDE 16

Ethics and Security

SLIDE 17

The GDPR

• Will enter into force on 25 May 2018
• Introduces important novelties
• New obligations
• New rights

SLIDE 18

Privacy by Design

SLIDE 19

Privacy by design for big data analytics

• Design analytical processes that implement the privacy-by-design & by-default principle
• Consider privacy at every stage of the business
• Integrate privacy requirements "by design" into the business model.

SLIDE 20

Privacy by Design Methodology in Big Data Analytics

The framework is designed with assumptions about:
• The sensitive data that are the subject of the analysis
• The attack model, i.e., the knowledge and purpose of a malicious party that wants to discover the sensitive data
• The target analytical questions that are to be answered with the data

Design a privacy-preserving framework able to:
• transform the data into an anonymous version with a quantifiable privacy guarantee
• guarantee that the analytical questions can be answered correctly, within a quantifiable approximation that specifies the data utility
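The anonymization step can be illustrated with a minimal k-anonymity sketch in Python (the records, quasi-identifiers and helper names below are hypothetical, not the actual SoBigData framework): the generalization level is raised until every quasi-identifier group holds at least k records, which quantifies the privacy guarantee, since an attacker can never narrow a record down to fewer than k candidates.

```python
from collections import Counter

def generalize(record, level):
    # Coarsen a (zip_code, age) quasi-identifier pair: each level
    # drops one trailing zip digit and widens the age band by 10 years.
    zip_code, age = record
    band = 10 * (level + 1)
    return (zip_code[: max(0, len(zip_code) - level)],
            (age // band) * band)

def k_anonymize(records, k):
    """Raise the generalization level until every quasi-identifier
    group contains at least k records (k-anonymity)."""
    level = 0
    while True:
        groups = Counter(generalize(r, level) for r in records)
        if min(groups.values()) >= k:
            return [generalize(r, level) for r in records], level
        level += 1

# Hypothetical records: (zip code, age)
records = [("56100", 34), ("56100", 36), ("56121", 35), ("56121", 41)]
anon, level = k_anonymize(records, k=2)
```

The utility side of the trade-off can be quantified in the same way, e.g. as the width of the generalized bands needed to reach the chosen k.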

SLIDE 21

Privacy Risk Assessment

Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa) www-kdd.isti.cnr.it

SLIDE 22

Privacy Risk Assessment Framework

SLIDE 23

Simulation of privacy-harmful inferences

[Chart: re-identification risk as a function of the background knowledge (BK) window, in # weeks]

• Data dimension: the spatial area in which the analysis is performed.
• Background Knowledge dimension: the temporal window (in weeks) in which the attacker recorded the user activity.
• I-RACu: an indicator of the risk of re-identification of the users.
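An indicator in the spirit of the slide's I-RACu can be sketched as follows (a toy Python illustration with hypothetical location traces, not the actual framework): for a background knowledge of h locations, count the users whom some h-subset of their visited locations identifies uniquely.

```python
from itertools import combinations

def reid_risk(trajectories, h):
    """Fraction of users re-identifiable by an attacker who knows
    h locations the target visited (worst case over h-subsets)."""
    at_risk = 0
    for user, traj in trajectories.items():
        for subset in combinations(sorted(set(traj)), h):
            matches = [u for u, t in trajectories.items()
                       if set(subset) <= set(t)]
            if matches == [user]:  # only the target fits the evidence
                at_risk += 1
                break
    return at_risk / len(trajectories)

# Hypothetical location traces
trajs = {
    "u1": ["pisa", "lucca", "florence"],
    "u2": ["pisa", "lucca"],
    "u3": ["pisa", "siena"],
}
```

Here reid_risk(trajs, 1) is 2/3: "florence" alone pins down u1 and "siena" pins down u3, while every location of u2 is shared with u1. The risk never decreases as the attacker's recording window, and hence h, grows.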

SLIDE 24

Privacy-by-design for big data analytics

• All case studies discussed have been designed within a privacy-preserving framework
• taking into account data minimization in the deployment of the service
• transforming raw data into aggregated data with a quantified (low) risk of privacy breach

SLIDE 25

But we need to go further!

• A city cannot be managed centrally, from a control room.
• Our cities are complex networks of interactions
  – the outcome for everybody depends not only on individual choices but is also conditioned by everybody else's choices.

SLIDE 26
SLIDE 27
• A granular capability of citizens to self-organize, collaborate and coordinate their actions from the bottom up is more efficient and resilient
• But it requires aligning individual interests and goals with those of the collectivity in the system.
  – We humans have a limited perception of ourselves as a social, collective living being.

SLIDE 28

TOWARDS A PERSONAL DATA ECOSYSTEM

SLIDE 29
SLIDE 30
• An avalanche of personal information that, in most cases, gets lost – like tears in rain.
• Yet, only each one of us, individually, has the power to connect all this personal information into a personal data repository – and make sense of it.

SLIDE 31

A user-centric ecosystem for personal big data

SLIDE 32

Personal Data Ecosystem

SLIDE 33

Where am I? Comparison with the community

SLIDE 34
• We need a Personal Data Ecosystem
  – to acquire, integrate and make sense of our own data
  – and to connect with our peers and the surrounding urban community and infrastructure
• for the purpose of developing the collective awareness needed to face our grand challenges

SLIDE 35

A smart city is a city of participating, aware citizens

SLIDE 36

Big Data, Big Risks

• Big data is algorithmic, therefore it cannot be biased! And yet…
• All traditional evils of social discrimination, and many new ones, exhibit themselves in the big data ecosystem
• Because of its tremendous power, massive data analysis must be used responsibly
• Technology alone won't do: we also need policy, user involvement and education efforts

SLIDE 37
• By 2018, 50% of business ethics violations will occur through improper use of big data analytics [source: Gartner, 2016]

AI and Big Data

SLIDE 38

The danger of black boxes

• The COMPAS score (Correctional Offender Management Profiling for Alternative Sanctions)
• A 137-question questionnaire and a predictive model for "risk of crime recidivism." The model is a proprietary secret of Northpointe, Inc.
• The data journalists at propublica.org have shown that the model has a strong ethnic bias
  – blacks who did not reoffend were classified as high risk twice as often as whites who did not reoffend
  – whites who did reoffend were classified as low risk twice as often as blacks who did reoffend.
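The disparity ProPublica reported is a gap in error rates between groups. A minimal sketch of that computation (the counts below are invented toy numbers that merely echo the qualitative "twice as often" pattern, not the real COMPAS data):

```python
def error_rates(records):
    """records: (group, predicted_high_risk, reoffended) triples.
    Returns per-group (false positive rate, false negative rate)."""
    rates = {}
    for group in {g for g, _, _ in records}:
        rows = [(p, y) for g, p, y in records if g == group]
        neg = [p for p, y in rows if y == 0]      # did not reoffend
        pos = [p for p, y in rows if y == 1]      # did reoffend
        fpr = sum(neg) / len(neg)                 # wrongly flagged high risk
        fnr = sum(1 - p for p in pos) / len(pos)  # wrongly rated low risk
        rates[group] = (fpr, fnr)
    return rates

# Invented counts echoing the reported pattern (per 100 people):
data = ([("black", 1, 0)] * 40 + [("black", 0, 0)] * 60 +
        [("black", 1, 1)] * 72 + [("black", 0, 1)] * 28 +
        [("white", 1, 0)] * 20 + [("white", 0, 0)] * 80 +
        [("white", 1, 1)] * 52 + [("white", 0, 1)] * 48)
```

With these toy counts the false positive rate is 0.40 for one group versus 0.20 for the other, and the false negative rate 0.28 versus 0.48, while overall accuracy is 0.66 for both: equal accuracy can hide exactly this kind of asymmetry.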


SLIDE 39

The danger of black boxes

• An accurate but untrustworthy classifier may result from an accidental bias in the training data.
• In a task of discriminating wolves from huskies in a dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely on…


SLIDE 40

The danger of black boxes

• An accurate but untrustworthy classifier may result from an accidental bias in the training data.
• In a task of discriminating wolves from huskies in a dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely on… the presence of snow in the background!
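The shortcut can be reproduced with a toy experiment (synthetic snow/label pairs, not the actual image model): a rule that predicts from the background alone looks perfect while the spurious correlation holds, and collapses the moment it breaks.

```python
def snow_classifier(snow):
    # The shortcut a biased model can learn:
    # classify from the background, not the animal.
    return "wolf" if snow else "husky"

def accuracy(data):
    # data: (snow_present, true_label) pairs
    return sum(snow_classifier(s) == label for s, label in data) / len(data)

# Training-like data: snow is a perfect proxy for the wolf label.
biased = [(1, "wolf")] * 50 + [(0, "husky")] * 50
# Deployment data: wolves on grass, huskies in snow.
unbiased = [(0, "wolf")] * 50 + [(1, "husky")] * 50
```

accuracy(biased) is 1.0 while accuracy(unbiased) is 0.0; only an explanation of which feature drives the prediction reveals the problem before deployment.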


SLIDE 41

Deep learning is creating computer systems we don't fully understand


SLIDE 42

Transparent algorithms to build trust

• Systems that recommend a decision to humans should explain why

SLIDE 43

Delp 17 – 19 February 2016

SLIDE 44

Soccer Player Ratings

SLIDE 45

Soccer Player Ratings

How do humans evaluate sports performance?

SLIDE 46
SLIDE 47

[Chart: role-specific rating features for Goalkeepers, Defenders, Midfielders and Forwards (FW), e.g. goals suffered and goal difference]

SLIDE 48

[Chart: machine performance from technical features vs. the human evaluation line]

SLIDE 49

[Chart: machine performance from technical features, and from technical + contextual features, vs. the human evaluation line]
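The "machine performance" line of slides 48-49 can be read as a rating fitted to the human evaluations from the features; a single-feature least-squares sketch (the feature values and ratings below are hypothetical):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical per-match technical score vs. human rating (1-10 scale)
tech = [0.2, 0.4, 0.5, 0.7, 0.9]
human = [5.4, 5.9, 6.1, 6.6, 7.1]
a, b = fit_line(tech, human)
```

Adding contextual features amounts to a multivariate version of the same fit, which is what narrows the gap to the human evaluation line in slide 49.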

SLIDE 50

Social Mining & Big Data Analytics

H2020 - www.sobigdata.eu
September 2015 - August 2019