Privacy through Accountability: A Computer Science Perspective
Anupam Datta
Associate Professor, Computer Science, ECE, CyLab
Carnegie Mellon University
February 2014
Personal Information is Everywhere
Research Challenge
Ensure organizations respect privacy expectations in the collection, use, and disclosure of personal information
Programs and People
Web Privacy
Example privacy policies:
Not use detailed location (full IP address) for advertising
Not use race for advertising
Healthcare Privacy
[Figure: flows of patient information among Hospital (physician, nurse), Drug Company, Patient, and Auditor]
Example privacy policies:
Use patient health info only for treatment and payment
Share patient health info with police if a crime is suspected
A Research Area
Formalize Privacy Policies
Precise semantics of privacy concepts
(restrictions on personal information flow)
Enforce Privacy Policies
Audit and Accountability
Detect violations
Blame assignment
Adaptive audit resource allocation
Related ideas: Barth et al., Oakland 2006; May et al., CSFW 2006; Weitzner et al., CACM 2008; Lampson 2004
Today: Focus on Detection
Healthcare Privacy
Play in two acts
Web Privacy
Play in two (brief) acts
A covered entity may disclose an individual’s protected health information (phi) to law-enforcement officials for the purpose of identifying an individual if the individual made a statement admitting participation in a violent crime that the covered entity believes may have caused serious physical harm to the victim
Example from HIPAA Privacy Rule
Concepts in privacy policies
Actions: send(p1, p2, m)
Roles: inrole(p2, law-enforcement)
Data attributes: attr_in(prescription, phi)
Temporal constraints: in-the-past(state(q, m))
Purposes: purp_in(u, id-criminal)
Beliefs: believes-crime-caused-serious-harm(p, q, m)
Black-and-white concepts vs. grey concepts
Detecting Privacy Violations
Privacy policy formalized as a computer-readable policy; audit the organizational audit log to detect policy violations
Complete formalization of HIPAA Privacy Rule and GLBA
Automated audit for black-and-white policy concepts
Oracles to audit for grey policy concepts
[Slide aside: The Oracle, the Matrix character, “a program designed to investigate the human psyche”]
Policy Auditing over Incomplete Logs
With D. Garg (CMU MPI-SWS) and L. Jia (CMU)
2011 ACM Conference on Computer and Communications Security
Key Challenge for Auditing: Audit Logs are Incomplete
Future: store only past and current events
Example: Timely data breach notification refers to future event
Subjective: no “grey” information
Example: May not record evidence for purposes and beliefs
Spatial: remote logs may be inaccessible
Example: Logs distributed across different departments of a hospital
Abstract Model of Incomplete Logs
Model all incomplete logs uniformly as 3-valued structures
Define semantics (meanings of formulas) over 3-valued structures
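The 3-valued idea can be illustrated with a minimal sketch (ours, not the paper's implementation) of Kleene-style truth values, where Unknown marks atoms the incomplete log cannot yet decide:

```python
# Kleene 3-valued truth values for evaluating policy atoms over an
# incomplete audit log: True, False, or Unknown (illustrative sketch).
from enum import Enum

class TV(Enum):
    T = "true"
    F = "false"
    U = "unknown"   # the log does not record enough to decide

def tv_and(a, b):
    if TV.F in (a, b):
        return TV.F          # one false conjunct settles it
    if TV.U in (a, b):
        return TV.U          # otherwise unknown dominates
    return TV.T

def tv_or(a, b):
    if TV.T in (a, b):
        return TV.T
    if TV.U in (a, b):
        return TV.U
    return TV.F

def tv_not(a):
    return {TV.T: TV.F, TV.F: TV.T, TV.U: TV.U}[a]

# Example: a disclosure happened (T) but its purpose is unrecorded (U),
# so the whole clause is undecided and must be carried forward.
assert tv_and(TV.T, TV.U) == TV.U
```

Note that a clause only becomes a definite violation when enough atoms resolve to make it false, which is exactly what lets auditing proceed on partial logs.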
reduce: The Iterative Algorithm
reduce (L, φ) = φ'
[Figure: as the log grows over time, reduce maps policy φ0 to residual policy φ1, then φ2, …]
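The iterative loop can be sketched as follows; this is our illustration of the idea, with hypothetical names (`reduce`, `notify_clause`) and a toy log format, not the paper's implementation:

```python
def reduce(log, clauses):
    """Split clauses into violations and a residual to re-check later.
    Each clause is a function log -> True (satisfied) / False (violated)
    / None (not yet decidable on this incomplete log)."""
    violations, residual = [], []
    for clause in clauses:
        verdict = clause(log)
        if verdict is False:
            violations.append(clause)
        elif verdict is None:
            residual.append(clause)       # carry forward as phi_{i+1}
    return violations, residual

# Toy clause: a breach noticed at time t must be reported within 2 ticks
# (mirrors the timely-breach-notification example: refers to the future).
def notify_clause(breach_time, deadline=2):
    def clause(log):
        if ("notify", breach_time) in log["events"]:
            return True
        if log["now"] > breach_time + deadline:
            return False                  # deadline passed: violation
        return None                       # a future event may still satisfy it
    return clause

phi = [notify_clause(breach_time=0)]
viol, phi = reduce({"now": 1, "events": []}, phi)   # undecidable yet
assert viol == [] and len(phi) == 1
viol, phi = reduce({"now": 4, "events": []}, phi)   # deadline passed
assert len(viol) == 1 and phi == []
```

Each pass evaluates what the current log can decide and leaves a residual policy for the next, larger log.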
Syntax of Policy Logic
First-order logic with restricted quantification over infinite domains (a challenge for reduce)
Can express timed temporal properties and “grey” predicates
Example from HIPAA Privacy Rule
∀p1, p2, m, u, q, t.
  (send(p1, p2, m) ∧ inrole(p2, law-enforcement) ∧ tagged(m, q, t, u) ∧ attr_in(t, phi))
  ⊃ purp_in(u, id-criminal)
    ∧ ∃m’. state(q, m’)
      ∧ is-admission-of-crime(m’)
      ∧ believes-crime-caused-serious-harm(p1, q, m’)
A covered entity may disclose an individual’s protected health information (phi) to law-enforcement officials for the purpose of identifying an individual if the individual made a statement admitting participation in a violent crime that the covered entity believes may have caused serious physical harm to the victim
reduce: Formal Definition
c is a formula for which finite satisfying substitutions of x can be computed
General Theorem: If the initial policy passes a syntactic mode check, then finite substitutions can be computed
Applications: The entire HIPAA and GLBA Privacy Rules pass this check
φ =
∀p1, p2, m, u, q, t.
  (send(p1, p2, m) ∧ tagged(m, q, t, u) ∧ attr_in(t, phi))
  ⊃ inrole(p2, law-enforcement)
    ∧ purp_in(u, id-criminal)
    ∧ ∃m’. (state(q, m’) ∧ is-admission-of-crime(m’) ∧ believes-crime-caused-serious-harm(p1, m’))
Example
Log:
Jan 1, 2011: state(Bob, M1)
Jan 5, 2011: send(UPMC, allegheny-police, M2); tagged(M2, Bob, date-of-treatment, id-bank-robber)
Matching substitution: { p1 → UPMC, p2 → allegheny-police, m → M2, q → Bob, u → id-bank-robber, t → date-of-treatment }, with { m’ → M1 } for the existential
Residual φ' = purp_in(id-bank-robber, id-criminal) ∧ is-admission-of-crime(M1) ∧ believes-crime-caused-serious-harm(UPMC, M1)
Implementation and Case Study
Implementation and evaluation over simulated audit logs for compliance with all 84 disclosure-related clauses of the HIPAA Privacy Rule
Performance: average time for checking compliance of each disclosure of protected health information is 0.12s for a 15MB log
Mechanical enforcement: reduce can automatically check 80% of all the atomic predicates
Ongoing Transition Efforts
Integration of reduce algorithm into Illinois Health Information Exchange prototype
Joint work with UIUC and Illinois HLN
Auditing logs for policy compliance
Ongoing conversations with Symantec Research
Related Work
Distinguishing characteristics
1. General treatment of incompleteness in audit logs
2. Quantification over infinite domains (e.g., messages)
3. First complete formalization of HIPAA Privacy Rule and GLBA
Nearest neighbors:
Basin et al. 2010 (missing 1, weaker 2, cannot handle 3)
Lam et al. 2010 (missing 1, weaker 2, cannot handle entire 3)
Weitzner et al. (missing 1, cannot handle 3)
Barth et al. 2006 (missing 1, weaker 2, did not do 3)
Formalizing and Enforcing Purpose Restrictions
With M. C. Tschantz (CMU Berkeley) and J. M. Wing (CMU MSR)
2012 IEEE Symposium on Security & Privacy
Goal
Give a semantics to “not for” and “only for” purpose restrictions that is parametric in the purpose
Provide an audit algorithm for detecting violations for that semantics
Medical Record: med records used only for diagnosis
[Figure: workflow of actions (x-ray taken, add x-ray, send record); sending to a specialist can yield a diagnosis, sending to a drug company cannot]
[Figure, annotated: paths that achieve the purpose (diagnosis) vs. paths that do not]
[Figure: the same workflow as an MDP; the specialist yields a diagnosis with probability 3/4 and fails with probability 1/4; at each choice point, the best choice for furthering the purpose is marked]
Planning Thesis: An action is for a purpose iff that action is part of a plan for furthering the purpose
i.e., always makes the best choice for furthering the purpose
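The planning thesis can be made concrete with a tiny sketch: solve a toy MDP by value iteration and flag actions that are not optimal for the purpose. This is our illustration, not the paper's tool; the model, rewards, and names are invented for the example.

```python
def value_iteration(states, actions, T, R, gamma=0.9, iters=100):
    """T[s][a] -> list of (prob, next_state); R[s][a] -> reward."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: (max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                     for a in actions[s])
                 if actions[s] else 0.0)
             for s in states}
    return V

def optimal_actions(s, actions, T, R, V, gamma=0.9, eps=1e-6):
    """Actions at state s whose Q-value matches the best Q-value."""
    q = {a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
         for a in actions[s]}
    best = max(q.values())
    return {a for a, v in q.items() if v >= best - eps}

# Toy model: from 'record', sending to a specialist can further the
# purpose (diagnosis, reward 1); sending to a drug company cannot.
states = ["record", "done"]
actions = {"record": ["send_specialist", "send_drugco"], "done": []}
T = {"record": {"send_specialist": [(1.0, "done")],
                "send_drugco": [(1.0, "done")]},
     "done": {}}
R = {"record": {"send_specialist": 1.0, "send_drugco": 0.0}, "done": {}}

V = value_iteration(states, actions, T, R)
# A logged 'send_drugco' is not part of any optimal plan: flag it.
assert "send_drugco" not in optimal_actions("record", actions, T, R, V)
```

The audit then reduces to checking whether every logged action lies in the optimal set for its state.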
Auditing
[Figure: the audit takes the auditee’s behavior, the purpose restriction, and a decision-making model, and returns Obeyed, Violated, or Inconclusive]
Violated
[Figure: an MDP solver computes the optimal actions for each state; the audit asks whether the logged actions are optimal]
Policy implications
Record only for treatment: No [ , send record]
Summary: A Sense of Purpose
Thesis: An action is for a purpose iff that action is part of a plan for furthering the purpose
i.e., always makes the best choice for furthering the purpose
Audit algorithm detects policy violations by checking if observed behavior could have been produced by an optimal plan
Today: Focus on Detection
Healthcare Privacy
Play in two acts
Web Privacy
Play in two (brief) acts
Bootstrapping Privacy Compliance in a Big Data System
With S. Sen (CMU) and S. Guha, S. Rajamani, J. Tsai, J. M. Wing (MSR)
2014 IEEE Symposium on Security & Privacy
Privacy Compliance for Bing
Setting:
Auditor has access to source code
Two Central Challenges
Legal Team: crafts policy
Privacy Champion: interprets policy
Developer: writes code
Audit Team: verifies compliance
1. Ambiguous privacy policy: meaning unclear
2. Huge undocumented codebases & datasets: connection to policy unclear
1. Legalease
Clean syntax: layered allow-deny information flow rules with exceptions
Precise semantics: no ambiguity
Focus on usability: user study of Legalease with Microsoft privacy champions promising
Example:
DENY Datatype IPAddress
  USE FOR PURPOSE Advertising
  EXCEPT ALLOW Datatype IPAddress:Truncated
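The layered deny-with-exception reading of that rule can be sketched as a toy checker (our illustration, not Microsoft's implementation; `allowed` and the string encodings are hypothetical):

```python
def allowed(datatype, purpose):
    """Toy check of one layered rule:
    DENY Datatype IPAddress USE FOR PURPOSE Advertising
      EXCEPT ALLOW Datatype IPAddress:Truncated."""
    if datatype.startswith("IPAddress") and purpose == "Advertising":
        # inner exception layer overrides the outer DENY
        if datatype == "IPAddress:Truncated":
            return True
        return False
    return True   # the rule is silent about other flows

assert not allowed("IPAddress", "Advertising")          # denied
assert allowed("IPAddress:Truncated", "Advertising")    # exception
assert allowed("IPAddress", "FraudDetection")           # out of scope
```

The key design point is that an inner ALLOW layer carves an exception out of the enclosing DENY, rather than rules being evaluated in a flat first-match order.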
2. Grok
[Figure: data inventory as a data-flow graph of processes (NewAcct, Login, Check Hijack, GeoIP, Check Fraud, Reporting) over datasets with columns such as Name, Age, IPAddress, IDX, Hash, Country, Timestamp]
Annotate code + data with policy datatypes
Source labels propagated via data flow graph
Different noisy sources: variable name analysis, developer annotations
2. Grok
[Figure: IPAddress flows through the data-flow graph into the Reporting job]
Example policy violation: IPAddress is used for reporting (advertising)
2. Grok
[Figure: a Truncate step is inserted, so only truncated IPAddress reaches the Reporting job]
Example fix: IPAddress is truncated before it is passed to the reporting (advertising) job
Bootstrapping Works
Pick the x% most frequently appearing column names and label them
Then propagate labels using the Grok data flow graph
Pick the nodes that will label the most of the graph
~200 annotations label 60% of nodes
A small number of annotations is enough to get off the ground.
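The bootstrapping step above amounts to forward label propagation over the data-flow graph. A minimal sketch (illustrative names and graph, not the Grok implementation):

```python
from collections import deque

def propagate(edges, seed_labels):
    """edges: node -> set of downstream nodes.
    seed_labels: node -> set of policy datatypes (manual annotations).
    Returns every node's inferred label set."""
    labels = {n: set(s) for n, s in seed_labels.items()}
    work = deque(seed_labels)
    while work:                       # worklist until a fixed point
        n = work.popleft()
        for m in edges.get(n, ()):
            downstream = labels.setdefault(m, set())
            added = labels[n] - downstream
            if added:                 # only re-queue on real change
                downstream |= added
                work.append(m)
    return labels

# Toy graph echoing the slides: a labeled source dataset feeds jobs.
edges = {"DatasetF": {"Process5"}, "Process5": {"DatasetJ"},
         "DatasetJ": {"Process6"}}
labels = propagate(edges, {"DatasetF": {"IPAddress"}})
assert "IPAddress" in labels["Process6"]   # label reaches the last job
```

A single seed annotation labels the whole downstream chain, which is why a few hundred annotations can cover most of a large graph.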
Scale
77,000 jobs run each day, by 7,000 entities in 300 functional groups
1.1 million unique lines of code; 21% changes on average, daily
46 million table schemas; 32 million files
Manual audit infeasible; information flow analysis takes ~30 mins daily
A Streamlined Audit Workflow
Legal Team
Crafts Policy
Privacy Champ
Interprets Policy
Developer
Writes Code
Audit Team
Verifies Compliance
Legalease: a formal policy specification language (legal team encodes the policy, privacy champ refines it)
Grok: data inventory with policy datatypes (built from code analysis and developer annotations)
Checker: checks annotated code against the Legalease policy; potential violations lead to code fixes and Grok updates
Information Flow Experiments
With Michael Carl Tschantz (CMU UC Berkeley), Amit Datta (CMU), and Jeannette M. Wing (CMU Microsoft Research)
[Figure: a user’s search terms flow to Google, which serves ads; other users, advertisers, and websites act as confounding inputs]
Web Tracking
Experimental Design
[Figure: a scientist assigns subjects to an experimental group (drug) and a control group (placebo)]
Information Flow Experiment
[Figure: two groups of browser instances differ in one input; output ads are compared, e.g., “Arrested?” vs. “Looking for?” ads for Black- vs. White-identifying inputs]
Information Flow Experiments as Science
Experimental science ↔ information flow:
  natural process ↔ system in question
  population of units ↔ subset of interactions
  …
  causation ↔ information flow (connection established by a theorem)
Browser Instances are Not Independent
[Chart: per-instance ad counts showing correlated behavior across browser instances]
Our Idea
Use a non-parametric test
Does not require model of Google
Specifically, a permutation test
Does not require independence among browser instances
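A permutation test of this kind can be sketched in a few lines; the data and the choice of difference-of-means statistic below are illustrative, not from the experiments:

```python
import random

def permutation_test(exp, ctrl, n_perm=10000, seed=0):
    """One-sided permutation test on |mean(exp) - mean(ctrl)|.
    Makes no parametric assumptions about the system under test."""
    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))
    rng = random.Random(seed)
    observed = stat(exp, ctrl)
    pooled = list(exp) + list(ctrl)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # re-assign group labels at random
        if stat(pooled[:len(exp)], pooled[len(exp):]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # p-value with add-one smoothing

# Toy ad counts per browser instance in the two groups
p = permutation_test([30, 31, 22, 19], [2, 5, 6, 30])
assert 0 < p < 1
```

If group assignment were irrelevant, shuffling the labels would often reproduce a gap as large as the observed one; a small p-value says it rarely does.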
Visiting Car Websites Impacts Ads
[Chart: number of car-related ads per browser instance; instances that visited car websites see more]
Conclusion
A rigorous methodology for information flow experiments
Connection to causality in natural sciences
Experimental design for causal determination
Significance testing with non-parametric statistics
Future work
Replicate and analyze previous experiments systematically (Guha et al., Wills and Tatar, Sweeney)
Conduct new large-scale experiments systematically
Tool support for automating information flow experiments
A Research Area
Formalize Privacy Policies
Precise semantics of privacy concepts
(restrictions on personal information flow)
Enforce Privacy Policies
Audit and Accountability
Detect violations
Blame assignment
Adaptive audit resource allocation
Application Domains
Healthcare, Web privacy
Information Flow Analysis
[Table: analyses classified by access to the program (yes: white box; no: black box) and control over inputs (total: testing; partial: experimenting; none: monitoring)]
Google Exhibits Complex Behavior
[Chart: ad id vs. reload number, showing complex ad-serving behavior]
Privacy as Contextual Integrity
Context-relative information flow norms
Example contexts: healthcare, friendship Example norms: confidentiality, purpose, reciprocity
[Nissenbaum 2004; Barth-D-Mitchell-Nissenbaum 2006]
Norms to Policies
Example norm: confidentiality expectations in healthcare
Associated policy: clauses in the HIPAA Privacy Rule Does policy reflect norm? Is policy respected? (Our focus)