SLIDE 1

Privacy through Accountability: A Computer Science Perspective

Anupam Datta
Associate Professor, Computer Science, ECE, CyLab
Carnegie Mellon University
February 2014

SLIDE 2

Personal Information is Everywhere

SLIDE 3

Research Challenge

Ensure organizations respect privacy expectations in the collection, use, and disclosure of personal information


Programs and People

SLIDE 4

Web Privacy

Example privacy policies:

 Not use detailed location (full IP address) for advertising
 Not use race for advertising


SLIDE 5

Healthcare Privacy


[Diagram: flows of patient information among Patient, Hospital (Physician, Nurse), Drug Company, and Auditor.]

Example privacy policies:

 Use patient health info only for treatment, payment
 Share patient health info with police if suspect crime

SLIDE 6

A Research Area

 Formalize Privacy Policies

 Precise semantics of privacy concepts (restrictions on personal information flow)

 Enforce Privacy Policies

 Audit and Accountability
 Detect violations
 Blame-assignment
 Adaptive audit resource allocation

Related ideas: Barth et al., Oakland 2006; May et al., CSFW 2006; Weitzner et al., CACM 2008; Lampson 2004

SLIDE 7

Today: Focus on Detection

 Healthcare Privacy

 Play in two acts

 Web Privacy

 Play in two (brief) acts


SLIDE 8

A covered entity may disclose an individual’s protected health information (phi) to law-enforcement officials for the purpose of identifying an individual if the individual made a statement admitting participation in a violent crime that the covered entity believes may have caused serious physical harm to the victim

Example from HIPAA Privacy Rule

 Concepts in privacy policies

 Actions: send(p1, p2, m)
 Roles: inrole(p2, law-enforcement)
 Data attributes: attr_in(prescription, phi)
 Temporal constraints: in-the-past(state(q, m))
 Purposes: purp_in(u, id-criminal)
 Beliefs: believes-crime-caused-serious-harm(p, q, m)

Black-and-white concepts: actions, roles, data attributes, temporal constraints. Grey concepts: purposes, beliefs.

SLIDE 9

Detecting Privacy Violations

[Diagram: a privacy policy is formalized as a computer-readable privacy policy, which an audit checks against the organizational audit log to detect policy violations.]

 Complete formalization of HIPAA Privacy Rule, GLBA
 Automated audit for black-and-white policy concepts
 Oracles to audit for grey policy concepts

[Image: “The Oracle” from The Matrix, captioned “a program designed to investigate the human psyche”.]

SLIDE 10

Policy Auditing over Incomplete Logs

With D. Garg (CMU → MPI-SWS) and L. Jia (CMU)

2011 ACM Conference on Computer and Communications Security

SLIDE 11

Key Challenge for Auditing: Audit Logs are Incomplete

Future: store only past and current events

Example: Timely data breach notification refers to future event

Subjective: no “grey” information

Example: May not record evidence for purposes and beliefs

Spatial: remote logs may be inaccessible

Example: Logs distributed across different departments of a hospital

SLIDE 12

Abstract Model of Incomplete Logs

 Model all incomplete logs uniformly as 3-valued structures
 Define semantics (meanings of formulas) over 3-valued structures

SLIDE 13

reduce: The Iterative Algorithm

reduce(L, φ) = φ'

[Diagram: as the log grows over time, reduce is applied iteratively: φ0 → reduce → φ1 → reduce → φ2.]
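To make the iteration concrete, here is a minimal sketch in Python, assuming a deliberately simplified setting (the real reduce rewrites first-order formulas; here a policy is just a set of atomic obligations, and all names and atoms are hypothetical). Each pass discharges what the current log determines and returns a residual to re-check when more of the log arrives.

```python
# Minimal sketch of iterative auditing over an incomplete log.
# Each atom is True, False, or unknown (absent): a 3-valued view of the log.

def lookup(log, atom):
    """Three-valued lookup: True/False if the log determines the atom,
    None if the log is incomplete on it (future, subjective, or remote)."""
    return log.get(atom)

def reduce_policy(log, obligations):
    """One pass of the iterative algorithm: discharge what the log decides,
    report definite violations, carry the rest forward as a residual."""
    residual, violations = [], []
    for atom in obligations:
        value = lookup(log, atom)
        if value is True:
            continue                  # discharged by the log
        elif value is False:
            violations.append(atom)   # definite violation
        else:
            residual.append(atom)     # unknown: re-check on a later pass
    return residual, violations

# Iterative use as the log extends over time:
phi0 = ["purpose-is-treatment(msg1)", "notify-breach-within-60-days(e1)"]
log_jan = {"purpose-is-treatment(msg1)": True}
phi1, v1 = reduce_policy(log_jan, phi0)   # phi1 keeps the future obligation
log_mar = dict(log_jan, **{"notify-breach-within-60-days(e1)": False})
phi2, v2 = reduce_policy(log_mar, phi1)   # v2 now flags the violation
```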

SLIDE 14

Syntax of Policy Logic

 First-order logic with restricted quantification over infinite domains (challenge for reduce)
 Can express timed temporal properties, “grey” predicates

SLIDE 15

Example from HIPAA Privacy Rule

∀ p1, p2, m, u, q, t.
  (send(p1, p2, m) ∧ inrole(p2, law-enforcement) ∧ tagged(m, q, t, u) ∧ attr_in(t, phi))
  ⊃ purp_in(u, id-criminal)
    ∧ ∃ m’. (state(q, m’) ∧ is-admission-of-crime(m’) ∧ believes-crime-caused-serious-harm(p1, q, m’))

“A covered entity may disclose an individual’s protected health information (phi) to law-enforcement officials for the purpose of identifying an individual if the individual made a statement admitting participation in a violent crime that the covered entity believes may have caused serious physical harm to the victim.”

SLIDE 16

reduce: Formal Definition

c is a formula for which finite satisfying substitutions of x can be computed

General Theorem: If the initial policy passes a syntactic mode check, then finite substitutions can be computed.

Applications: The entire HIPAA and GLBA Privacy Rules pass this check.

SLIDE 17

Example

φ =
∀ p1, p2, m, u, q, t.
  (send(p1, p2, m) ∧ tagged(m, q, t, u) ∧ attr_in(t, phi))
  ⊃ inrole(p2, law-enforcement) ∧ purp_in(u, id-criminal)
    ∧ ∃ m’. (state(q, m’) ∧ is-admission-of-crime(m’) ∧ believes-crime-caused-serious-harm(p1, m’))

Log:
Jan 1, 2011: state(Bob, M1)
Jan 5, 2011: send(UPMC, allegeny-police, M2); tagged(M2, Bob, date-of-treatment, id-bank-robber)

reduce matches the log against φ under the substitution { p1 → UPMC, p2 → allegeny-police, m → M2, q → Bob, u → id-bank-robber, t → date-of-treatment } and { m’ → M1 }. The black-and-white conjuncts evaluate to T on this log, leaving the grey residual

φ' = purp_in(id-bank-robber, id-criminal) ∧ is-admission-of-crime(M1) ∧ believes-crime-caused-serious-harm(UPMC, M1)

SLIDE 18

Implementation and Case Study

 Implementation and evaluation over simulated audit logs for compliance with all 84 disclosure-related clauses of HIPAA Privacy Rule

 Performance:
 Average time for checking compliance of each disclosure of protected health information is 0.12s for a 15MB log

 Mechanical enforcement:
 reduce can automatically check 80% of all the atomic predicates

SLIDE 19

Ongoing Transition Efforts

 Integration of reduce algorithm into Illinois Health Information Exchange prototype
 Joint work with UIUC and Illinois HLN
 Auditing logs for policy compliance

 Ongoing conversations with Symantec Research

SLIDE 20

Related Work

 Distinguishing characteristics
1. General treatment of incompleteness in audit logs
2. Quantification over infinite domains (e.g., messages)
3. First complete formalization of HIPAA Privacy Rule and GLBA

 Nearest neighbors
 Basin et al. 2010 (missing 1, weaker 2, cannot handle 3)
 Lam et al. 2010 (missing 1, weaker 2, cannot handle entire 3)
 Weitzner et al. (missing 1, cannot handle 3)
 Barth et al. 2006 (missing 1, weaker 2, did not do 3)

SLIDE 21

Formalizing and Enforcing Purpose Restrictions

With M. C. Tschantz (CMU → Berkeley) and J. M. Wing (CMU → MSR)

2012 IEEE Symposium on Security & Privacy

SLIDE 22

Goal

 Give a semantics to
 “Not for” purpose restrictions
 “Only for” purpose restrictions
that is parametric in the purpose

 Provide audit algorithm for detecting violations for that semantics

SLIDE 23

Medical Record

[Diagram: a workflow over a medical record, with states “X-ray taken” and “X-ray added”, actions “send record” and “add x-ray”, and outcomes “Diagnosis by specialist” and “No diagnosis by drug company”. Policy: med records used only for diagnosis.]

SLIDE 24

[Diagram: the same workflow with outcomes labeled: “Diagnosis by specialist” achieves the purpose; “No diagnosis by drug company” does not.]

SLIDE 25

[Diagram: the workflow with probabilities: sending the record to the specialist yields a diagnosis with probability 3/4 and fails with probability 1/4. Choice points and the best choice at each are marked.]

SLIDE 26

Planning Thesis: An action is for a purpose iff that action is part of a plan for furthering the purpose

i.e., always makes the best choice for furthering the purpose


SLIDE 27

Auditing

[Diagram: the audit algorithm takes the auditee’s behavior, the purpose restriction, and a decision-making model, and outputs Obeyed, Violated, or Inconclusive.]

SLIDE 28

[Diagram: an MDP solver takes the decision-making model and computes the optimal actions for each state; the auditor checks whether the observed actions are optimal and draws policy implications. Example row: for “Record only for treatment”, actions optimal? No, on the trace [ , send record], hence Violated.]
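A minimal sketch of this check under stated assumptions (the MDP encoding and all names are hypothetical, not the paper's implementation): value iteration yields the set of optimal actions at each state, and the auditor flags any observed action outside that set.

```python
# Hypothetical sketch: audit a purpose restriction by comparing observed
# actions against the optimal actions of an MDP whose reward encodes
# "furthering the purpose" (e.g., reaching a diagnosis).

def optimal_actions(states, actions, transition, reward, gamma=0.9, iters=100):
    """Value iteration; returns, per state, the set of optimal actions.
    transition(s, a) -> {next_state: probability}; actions(s) -> list."""
    value = {s: 0.0 for s in states}
    for _ in range(iters):
        value = {
            s: max(
                sum(p * (reward(s, a, s2) + gamma * value[s2])
                    for s2, p in transition(s, a).items())
                for a in actions(s)
            ) if actions(s) else 0.0
            for s in states
        }
    best = {}
    for s in states:
        q = {a: sum(p * (reward(s, a, s2) + gamma * value[s2])
                    for s2, p in transition(s, a).items())
             for a in actions(s)}
        top = max(q.values(), default=0.0)
        best[s] = {a for a, v in q.items() if abs(v - top) < 1e-9}
    return best

def audit(trace, best):
    """trace: list of (state, action) pairs observed in the log.
    Violated if some action is one no optimal plan would take."""
    return "Violated" if any(a not in best[s] for s, a in trace) else "Consistent"
```

Encoding the record workflow above with positive reward only for reaching “Diagnosis by specialist”, a trace that sends the record somewhere no optimal plan would then audits as Violated.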

SLIDE 29

Summary: A Sense of Purpose

Thesis: An action is for a purpose iff that action is part of a plan for furthering the purpose

i.e., always makes the best choice for furthering the purpose

 Audit algorithm detects policy violations by checking if observed behavior could have been produced by an optimal plan

SLIDE 30

Today: Focus on Detection

 Healthcare Privacy

 Play in two acts

 Web Privacy

 Play in two (brief) acts


SLIDE 31

Bootstrapping Privacy Compliance in a Big Data System

With S. Sen (CMU) and S. Guha, S. Rajamani, J. Tsai, J. M. Wing (MSR)

2014 IEEE Symposium on Security & Privacy

SLIDE 32

Privacy Compliance for Bing

Setting:

 Auditor has access to source code


SLIDE 33

Two Central Challenges

[Diagram: Legal Team (crafts policy) → Privacy Champion (interprets policy) → Developer (writes code) → Audit Team (verifies compliance), connected by meetings.]

1. Ambiguous privacy policy
 Meaning unclear

2. Huge undocumented codebases & datasets
 Connection to policy unclear

SLIDE 34

1. Legalease

 Clean syntax
 Layered allow-deny information flow rules with exceptions

 Precise Semantics
 No ambiguity

 Focus on Usability
 User study of Legalease with Microsoft privacy champions promising

 Example:

DENY Datatype IPAddress
USE FOR PURPOSE Advertising
EXCEPT
  ALLOW Datatype IPAddress:Truncated
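A minimal sketch of how such a layered allow-deny policy could be evaluated (the encoding and helper names are hypothetical; Legalease's semantics is defined over attribute lattices, approximated here by sets of labels): a nested clause is an exception that overrides its parent exactly when it matches.

```python
# Hypothetical encoding of a layered allow-deny policy: each clause carries
# a decision, the attributes it constrains, and nested exception clauses.
# The deepest matching clause wins (exceptions override their parent).

def matches(clause, flow):
    """A clause matches if every attribute it constrains is satisfied
    by the flow (set inclusion stands in for the attribute lattice)."""
    return all(req <= flow.get(attr, set())
               for attr, req in clause["attrs"].items())

def decide(clause, flow):
    if not matches(clause, flow):
        return None
    for exc in clause.get("except", []):
        d = decide(exc, flow)
        if d is not None:
            return d          # a matching exception overrides the parent
    return clause["decision"]

# DENY Datatype IPAddress USE FOR PURPOSE Advertising
#   EXCEPT ALLOW Datatype IPAddress:Truncated
policy = {
    "decision": "DENY",
    "attrs": {"datatype": {"IPAddress"}, "purpose": {"Advertising"}},
    "except": [
        {"decision": "ALLOW",
         "attrs": {"datatype": {"IPAddress", "Truncated"}}},
    ],
}

full_ip = {"datatype": {"IPAddress"}, "purpose": {"Advertising"}}
truncated = {"datatype": {"IPAddress", "Truncated"}, "purpose": {"Advertising"}}
print(decide(policy, full_ip))    # DENY
print(decide(policy, truncated))  # ALLOW
```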

SLIDE 35

2. Grok

[Diagram: a data flow graph over Bing datasets and processes (NewAcct, Login, Check Hijack, GeoIP, Check Fraud, Reporting), with columns such as Name, Age, IPAddress, IDX, Hash, Country, Timestamp.]

 Data Inventory
 Annotate code + data with policy data types
 Source labels propagated via data flow graph (sketched below)

 Different Noisy Sources
 Variable Name Analysis
 Developer Annotations
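A minimal sketch of the propagation step (the graph, seeds, and names are hypothetical): seed a few nodes with policy datatypes, then push labels forward along data-flow edges to a fixed point.

```python
# Hypothetical sketch of label propagation over a data-flow graph:
# seed some nodes with policy datatypes, propagate until fixed point.
from collections import defaultdict

def propagate(edges, seeds):
    """edges: (src, dst) data-flow edges; seeds: node -> set of datatypes.
    Returns a label set for every node reached by the propagation."""
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    labels = {n: set(s) for n, s in seeds.items()}
    work = list(seeds)
    while work:
        n = work.pop()
        for succ in out[n]:
            before = labels.setdefault(succ, set())
            if not labels[n] <= before:   # new labels flow into succ
                before |= labels[n]
                work.append(succ)
    return labels

edges = [("DatasetF", "GeoIP"), ("GeoIP", "DatasetG"),
         ("DatasetG", "Reporting"), ("DatasetH", "Reporting")]
labels = propagate(edges, {"DatasetF": {"IPAddress"}})
# If Reporting serves advertising, IPAddress in labels["Reporting"]
# flags a potential violation of the DENY clause above for review.
```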

SLIDE 36

2. Grok

[Diagram: the IPAddress label flows from its source dataset through GeoIP into the Reporting job.]

 Example Policy Violation: IPAddress is used for reporting (advertising)

SLIDE 37

2. Grok

[Diagram: a Truncate step is inserted between the IPAddress source dataset and the Reporting job, so only truncated IPAddress flows onward.]

 Example Fix: IPAddress is truncated before it is passed to the reporting (advertising) job

SLIDE 38

Bootstrapping Works

 Pick the x% most frequently appearing column names and label them
 Then propagate labels using Grok flow
 Pick the nodes that will label most of the graph (see the sketch below)

~200 annotations label 60% of nodes

A small number of annotations is enough to get off the ground.
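A minimal sketch of the seeding heuristic on a toy inventory (all names hypothetical): rank column names by frequency and annotate the top slice first, since those labels propagate to the largest share of the graph.

```python
# Hypothetical sketch of the bootstrapping heuristic: label the most
# frequent column names first; each label then propagates widely
# through the data-flow graph.
from collections import Counter

def pick_seeds(datasets, fraction=0.01):
    """datasets: dataset -> list of column names. Returns the top
    `fraction` of column names by frequency across all datasets."""
    counts = Counter(col for cols in datasets.values() for col in cols)
    k = max(1, int(len(counts) * fraction))
    return [col for col, _ in counts.most_common(k)]

datasets = {
    "DatasetF": ["IPAddress", "IDX", "Timestamp"],
    "DatasetG": ["IPAddress", "Country", "IDX"],
    "DatasetH": ["IDX", "Hash"],
}
print(pick_seeds(datasets, fraction=0.4))  # e.g., ['IDX', 'IPAddress']
```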

SLIDE 39

Scale

 77,000 jobs run each day
 By 7000 entities
 300 functional groups

 1.1 million unique lines of code
 21% changes on avg, daily

 46 million table schemas
 32 million files

 Manual audit infeasible
 Information flow analysis takes ~30 mins daily

SLIDE 40

A Streamlined Audit Workflow

[Diagram: the pipeline Legal Team (crafts policy) → Privacy Champ (interprets policy) → Developer (writes code) → Audit Team (verifies compliance), now mediated by tools:
- Legalease: a formal policy specification language; the policy is encoded and refined into a Legalease policy.
- Grok: a data inventory with policy datatypes, built by code analysis and developer annotations (annotated code).
- Checker: takes the Legalease policy and annotated code, reports potential violations; developers fix code and update Grok.]

SLIDE 41

Information Flow Experiments

With Michael Carl Tschantz (CMU → UC Berkeley), Amit Datta (CMU), and Jeannette M. Wing (CMU → Microsoft Research)

SLIDE 42

SLIDE 43

Web Tracking

[Diagram: do a user’s inputs (e.g., search terms) flow to the ads shown? Other users, advertisers, and websites are confounding inputs to Google.]

SLIDE 44

Experimental Design

[Diagram: the scientist assigns units to an experimental group (drug) and a control group (placebo).]

SLIDE 45

Information Flow Experiment

[Diagram: Group 1 and Group 2 submit different inputs (Black vs. White); the ad outputs differ (“Arrested?” vs. “Looking for?”).]

SLIDE 46

Google

[Diagram: Google maps each group’s inputs (Black, White) to ad outputs (“Arrested?”, “Looking for?”) across repeated trials.]

SLIDE 47

Information Flow Experiments as Science

Experimental Science     Information Flow
Natural process          System in question
Population of units      Subset of interactions
…                        …
Causation                Information flow

(The correspondence “causation = information flow” is a theorem.)

SLIDE 48

Browser Instances are Not Independent

[Chart: counts per browser instance (17, 13, 13, 13, 12, 11, 10, 10, 8, 7), illustrating correlation across instances.]

SLIDE 49

Our Idea

 Use a non-parametric test
 Does not require a model of Google

 Specifically, a permutation test
 Does not require independence among browser instances
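A minimal sketch of the test on hypothetical data (a simple unblocked permutation test for illustration; the actual experiments permute in a way that respects the experimental design): compare the observed group difference against its distribution under random relabelings of the browser instances.

```python
# Hypothetical sketch of a permutation test: if group assignment made no
# difference (no information flow), random relabelings should often
# produce differences at least as extreme as the observed one.
import random

def permutation_test(group_a, group_b, trials=10_000, seed=0):
    """group_a, group_b: per-browser-instance measurements, e.g. the
    number of car ads shown. Returns a p-value for 'the groups differ'."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(mean(a) - mean(b)) >= observed:
            extreme += 1
    return extreme / trials

# E.g., car-ad counts for instances that visited car sites vs. not:
treatment = [19, 22, 30, 30, 31]
control = [2, 5, 6, 4, 3]
print(permutation_test(treatment, control))  # small p-value: flow detected
```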

SLIDE 50

Visiting Car Websites Impacts Ads

[Chart: per-instance ad counts (2, 5, 6, 19, 22, 30, 30, 31), separating instances that visited car websites from those that did not.]

SLIDE 51

Conclusion

 A rigorous methodology for information flow experiments
 Connection to causality in natural sciences
 Experimental design for causal determination
 Significance testing with non-parametric statistics

 Future work
 Replicate and analyze previous experiments systematically
 Guha et al., Wills and Tatar, Sweeney
 Conduct new large-scale experiments systematically
 Tool support for automating information flow experiments

SLIDE 52

A Research Area

 Formalize Privacy Policies

 Precise semantics of privacy concepts (restrictions on personal information flow)

 Enforce Privacy Policies

 Audit and Accountability
 Detect violations
 Blame-assignment
 Adaptive audit resource allocation

 Application Domains

 Healthcare, Web privacy

SLIDE 53

SLIDE 54

Information Flow Analysis

[Diagram: a taxonomy of information flow analysis. Access to program? Yes → white-box analysis; No → black-box, subdivided by control over inputs (total, partial, none) into testing, experimenting, and monitoring.]

SLIDE 55

Google Exhibits Complex Behavior

[Plot: ad id vs. reload number, showing which ads appear on successive reloads.]

SLIDE 56

Privacy as Contextual Integrity

Context-relative information flow norms
 Example contexts: healthcare, friendship
 Example norms: confidentiality, purpose, reciprocity

[Nissenbaum 2004; Barth-D-Mitchell-Nissenbaum 2006]

SLIDE 57

Norms to Policies

 Example norm: confidentiality expectations in healthcare
 Associated policy: clauses in the HIPAA Privacy Rule
 Does policy reflect norm?
 Is policy respected? (Our focus)

[Diagram: privacy norms → privacy policies]