LightDP: Towards Automating Differential Privacy Proofs
Danfeng Zhang, Daniel Kifer
Penn State University
Database w/ Alice’s data vs. database w/o Alice’s data
Alice’s data remain private if the two output distributions $\mu_1$, $\mu_2$ are close
(Pure) Differential Privacy
If, for any adjacent databases and any value $v$, $\mu_1(v)/\mu_2(v) \le e^{\epsilon}$ for some constant $\epsilon$, then the computation is $\epsilon$-private
Privacy cost: $\epsilon$
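To make the ratio condition concrete, here is a minimal Python sketch that computes the smallest privacy cost for a pair of finite output distributions. The two-outcome distributions and the name `privacy_cost` are illustrative assumptions, not taken from the slides; the sketch also assumes both distributions give every outcome nonzero probability.

```python
import math

def privacy_cost(mu1, mu2):
    """Smallest eps with mu1(v)/mu2(v) <= e^eps and mu2(v)/mu1(v) <= e^eps for all v.

    Assumes both distributions assign nonzero probability to every outcome.
    Both directions are checked because adjacency is symmetric.
    """
    return max(abs(math.log(mu1[v] / mu2[v])) for v in mu1)

# Hypothetical output distributions on two adjacent databases.
mu1 = {0: 0.25, 1: 0.75}
mu2 = {0: 0.75, 1: 0.25}

eps = privacy_cost(mu1, mu2)
print(eps)  # the worst-case log-ratio; here the worst ratio is 0.75/0.25 = 3
```

For this hypothetical pair the worst-case ratio is 3, so the computation would be (ln 3)-private under the definition above.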
Motivation
Rigorous methods are needed for differential privacy proofs
DP has seen explosive growth since 2006
- U.S. Census Bureau LEHD OnTheMap tool [Machanavajjhala et al. 2008]
- Google Chrome Browser [Erlingsson et al. 2014]
- Apple’s new data collection efforts [Greenberg 2016]
But this growth has also been accompanied by flawed (paper-and-pencil) proofs
- e.g., ones categorized in [Chen&Machanavajjhala’15, Lyu et al.’16]
Related Work
DP programming platforms (e.g., PINQ, Airavat)
- Use (instead of verify) basic DP mechanisms
- Cannot offer tight bounds for sophisticated algorithms
Methods based on customized logics
- Steep learning curve
- Heavy annotation burden
LightDP offers a better balance between expressiveness and usability
LightDP: Overview
Source Program → Relational, Dependent Type System → Target Program with distinguished variable $v_\epsilon$
Main Theorem: if the source program type checks and $v_\epsilon$ is bounded by constant $\epsilon$ in the target program, then the source program is $\epsilon$-private
Source Language: Syntax
[Figure: grammar of the source language, with annotations marking the random variable and the random expression (e.g., Laplace dist.)]
Source Language: Semantics
Memory: mapping from variables to values
A program maps an initial memory to a distribution over final memories; an adjacent initial memory is likewise mapped to a final memory distribution
Relational Reasoning via Type System
Relational Types
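A relational type $\mathtt{num}_d$ combines a base type (e.g., int, real) with a distance $d$
Example
$\Gamma(x): \mathtt{num}_0$, $\Gamma(y): \mathtt{num}_1$
Related Memories
Memory 1: $x: u$, $y: v$
Memory 2: $x: u$, $y: v + 1$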
Dependent Types
The distance can be a program variable
Example
$\Gamma(x): \mathtt{num}_0$, $\Gamma(y): \mathtt{num}_x$
Related Memories
Memory 1: $x: u$, $y: v$
Memory 2: $x: u$, $y: v + u$
Dependent Types
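The distance can be a non-probabilistic expression
Example
$\Gamma(x): \mathtt{num}_0$, $\Gamma(y): \mathtt{num}_{x \ge 1\,?\,2\,:\,0}$
Related Memories
Memory 1: $x: u$, $y: v$
Memory 2: $x: u$, $y: v + 2$ if $u \ge 1$, $y: v$ if $u < 1$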
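The dependent-type example above can be sketched in Python by treating each distance as a function of the first memory. The representation (a dict of lambdas) and the name `related_memory` are assumptions made for illustration; they are not LightDP's actual data structures.

```python
def related_memory(m1, gamma):
    """Build the second memory: each variable is shifted by its distance,
    where the distance may depend on the values in the first memory."""
    return {var: val + gamma[var](m1) for var, val in m1.items()}

# Typing environment from the example: x has distance 0,
# y has distance (x >= 1 ? 2 : 0).
gamma = {
    "x": lambda m: 0,
    "y": lambda m: 2 if m["x"] >= 1 else 0,
}

m1 = {"x": 1, "y": 5}
m2 = related_memory(m1, gamma)
print(m2)  # {'x': 1, 'y': 7}
```

With x below 1 the same environment leaves y unchanged, which matches the second case of the related memory on the slide.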
Notation: $m_1 \,\Gamma\, m_2$ if $m_1$ and $m_2$ are related by $\Gamma$
Types form an invariant on two related program executions (for the non-probabilistic subset):
If initial memories $m_1 \,\Gamma\, m_2$, then after executing a well-typed program, final memories $m_1' \,\Gamma\, m_2'$
Enforced by a type system
Type System
Expression: e.g., typing rules for $+ \mid -$ and for $< \mid > \mid = \mid \le \mid \ge$ [rules shown as a figure]
Type System
Command: e.g., [rules shown as a figure], with two side conditions:
- Distance must be identical
- Related executions take the same branch
Relating Two Distributions
$\mu_1 \,\Gamma\, \mu_2$ w.r.t. privacy cost $\epsilon$ if $\forall m.\ \mu_1(m)/\mu_2(\Gamma(m)) \le e^{\epsilon}$
Program: $\eta := \mathrm{Lap}\ r$ (Laplace dist. w/ mean 0 and a scale factor $r$)
$\Gamma(\eta) = \mathtt{num}_0$: with no cost
Observation: $\eta$ may have an arbitrary distance, which affects the added cost
Relating Two Distributions
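$\mu_1 \,\Gamma\, \mu_2$ w.r.t. privacy cost $\epsilon$ if $\forall m.\ \mu_1(m)/\mu_2(\Gamma(m)) \le e^{\epsilon}$
Program: $\eta := \mathrm{Lap}\ r$ (Laplace dist. w/ mean 0 and a scale factor $r$)
$\Gamma(\eta) = \mathtt{num}_1$: with cost $1/r$, due to a property of the distribution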
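The cost of relating two Laplace samples at distance 1 can be checked numerically. The following is a minimal sketch, assuming the standard Laplace density with mean 0 and scale r; the grid, the value r = 2, and the helper names are illustrative choices, and a grid check is a sanity test rather than a proof.

```python
import math

def laplace_pdf(x, r):
    """Density of the Laplace distribution with mean 0 and scale r."""
    return math.exp(-abs(x) / r) / (2 * r)

def max_log_ratio(r, distance, grid):
    """Largest log of pdf(x) / pdf(x + distance) over the grid points."""
    return max(
        math.log(laplace_pdf(x, r) / laplace_pdf(x + distance, r)) for x in grid
    )

r = 2.0
grid = [i / 10 for i in range(-100, 101)]

cost_at_distance_1 = max_log_ratio(r, 1.0, grid)  # approaches 1/r
cost_at_distance_0 = max_log_ratio(r, 0.0, grid)  # no cost
print(cost_at_distance_1, cost_at_distance_0)
```

The log-ratio equals (|x + d| - |x|)/r, which is at most |d|/r by the triangle inequality; this is the distribution property that gives cost 1/r at distance 1 and no cost at distance 0.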
Observation: $\eta$ may have an arbitrary distance, which affects the added cost
$\eta$ has a polymorphic type
The typing rule maps the source program to a target program that explicitly tracks the added privacy cost, using a non-deterministic operation
Intuitively, the target program computes the added cost for one sample from distribution $\mu$
In General
Type System: Source program → Target program with distinguished variable $v_\epsilon$
Target Language
The target language adds a non-deterministic operation: set x to an arbitrary value
Verification task in the target language: proving that $v_\epsilon$ is bounded by some constant $\epsilon$ in any execution (of a non-probabilistic program)
A safety property: can be verified using off-the-shelf tools (e.g., Hoare logic, model checking)
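As a rough illustration of the verification task, the sketch below runs a small non-probabilistic program under every combination of a few arbitrary values and checks that the cost variable stays within the bound. The program itself (a loop that charges eps/n on one branch and stops after n such charges), the names, and the candidate values are all hypothetical; exhaustive testing over a finite set is only a stand-in for the Hoare-logic or model-checking proof the slide refers to.

```python
from fractions import Fraction
from itertools import product

def run_target(answers, threshold, arbitrary_values, eps, n):
    """Hypothetical target program: returns the final value of v_eps.

    Each eta is an arbitrary value supplied by the caller, modelling the
    non-deterministic 'set to arbitrary value' operation.
    """
    v_eps = Fraction(0)
    count = 0
    for q, eta in zip(answers, arbitrary_values):
        if count >= n:
            break
        if q + eta >= threshold:
            v_eps += eps / n  # cost is charged only on this branch
            count += 1
    return v_eps

eps, n = Fraction(1), 2
answers = [0, 3, 1, 4]
candidates = [-10, 0, 10]  # a few arbitrary values to try for each eta

worst = max(
    run_target(answers, 2, choice, eps, n)
    for choice in product(candidates, repeat=len(answers))
)
print(worst <= eps)  # True for every explored execution
```

The property being checked is an invariant of the form v_eps = count * eps / n with count <= n, which is the kind of loop invariant a verifier would be asked to establish for all executions.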
Putting Together
The Sparse Vector Method [Dwork and Roth’14]
Source Program [shown as a figure]
- Correctness proof is subtle: incorrect variants categorized in [Chen&Machanavajjhala’15, Lyu et al.’16]
- Formally verified very recently [Barthe et al. 2016] with heavy annotation burden
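For readers unfamiliar with the method, here is a minimal Python sketch of one commonly stated variant of the Sparse Vector method. The noise scales (2/eps for the threshold, 4n/eps for each query), the assumption that each query answer changes by at most 1 between adjacent databases, and all names are assumptions made for illustration; the program on the slide may differ in its details.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def sparse_vector(answers, threshold, n, eps, rng):
    """For each query answer, report whether its noisy value reaches the
    noisy threshold; stop after n positive reports."""
    noisy_threshold = threshold + laplace(2.0 / eps, rng)  # assumed scale
    out, count = [], 0
    for q in answers:
        if count >= n:
            break
        # Fresh noise per query; the threshold noise is drawn only once.
        if q + laplace(4.0 * n / eps, rng) >= noisy_threshold:
            out.append(True)
            count += 1
        else:
            out.append(False)
    return out

rng = random.Random(0)
answers = [0.0, 100.0, 1.0, 120.0, 90.0]
result = sparse_vector(answers, threshold=50.0, n=2, eps=1.0, rng=rng)
print(result)
```

Small changes to a program like this, such as releasing the noisy answers or adjusting the noise scales, are the kind of variation that can invalidate the privacy argument, which is why the proof is described as subtle.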
Required Types
Distance depends on the value of the $i$th query answer ($q[i]$)
Type Inference: types can be inferred by the inference algorithm of LightDP
Target Program
Completing the Proof
Loop Invariant
Postcondition:
Main Theorem: source program type checks + $v_\epsilon$ bounded by constant $\epsilon$ = source program is $\epsilon$-private
More in the Paper
- Type inference algorithm
- Searching for proof with minimum cost w/ MaxSMT
- Formal proof for the main theorem
- More verified examples (with little manual effort)
Summary
Source Program → Relational, Dependent Type System (automated by inference engine) → Target Program with distinguished variable $v_\epsilon$ (a safety property, verified by existing tools)
Decomposing differential privacy into subtasks substantially simplifies language-based proof