LightDP: Towards Automating Differential Privacy Proofs



slide-1
SLIDE 1

LightDP: Towards Automating Differential Privacy Proofs

Danfeng Zhang Daniel Kifer Penn State University

slide-2
SLIDE 2

Database w/ Alice’s data vs. database w/o Alice’s data, with output distributions $\mu_1$ and $\mu_2$

Alice’s data remain private if $\mu_1$ and $\mu_2$ are close

slide-3
SLIDE 3

(Pure) Differential Privacy

If, for any adjacent databases and any value $v$, $\mu_1(v)/\mu_2(v) \le e^{\epsilon}$ for some constant $\epsilon$, then the computation is $\epsilon$-private

Privacy Cost: $\epsilon$
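The ratio bound above can be checked numerically for a concrete mechanism. The following is a minimal sketch, assuming a counting query whose answer differs by 1 between adjacent databases and Laplace noise with scale 1/ε; the function names, the example answers 42 and 41, and the grid of outputs are illustrative assumptions, not taken from the slides.

```python
import math


def laplace_log_pdf(v: float, mean: float, scale: float) -> float:
    """Log-density of a Laplace distribution with the given mean and scale."""
    return -abs(v - mean) / scale - math.log(2.0 * scale)


def worst_case_privacy_loss(answer_1, answer_2, scale, outputs):
    """Largest |log(mu_1(v) / mu_2(v))| over a finite set of outputs v."""
    return max(
        abs(laplace_log_pdf(v, answer_1, scale) - laplace_log_pdf(v, answer_2, scale))
        for v in outputs
    )


# Assumed example: true answers 42 and 41 on adjacent databases, epsilon = 0.5.
epsilon = 0.5
scale = 1.0 / epsilon  # sensitivity 1 divided by epsilon
outputs = [k / 10.0 for k in range(-1000, 1001)]
loss = worst_case_privacy_loss(42.0, 41.0, scale, outputs)

# The log-ratio never exceeds epsilon (up to floating-point rounding).
assert loss <= epsilon + 1e-9
```

A grid check like this only samples outputs; it illustrates the definition but is not a proof, which is the gap the rest of the talk addresses.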

slide-4
SLIDE 4

Motivation


Rigorous methods are needed for differential privacy proofs

DP has seen explosive growth since 2006

  • U.S. Census Bureau LEHD OnTheMap tool [Machanavajjhala et al. 2008]

  • Google Chrome Browser [Erlingsson et al. 2014]
  • Apple’s new data collection efforts [Greenberg 2016]

But this growth has been accompanied by flawed (paper-and-pencil) proofs

  • e.g., ones categorized in [Chen&Machanavajjhala’15, Lyu et al.’16]
slide-5
SLIDE 5

Related Work

DP programming platforms (e.g., PINQ, Airavat)

  • Use (instead of verify) basic DP mechanisms
  • Cannot offer tight bounds for sophisticated algorithms

Methods based on customized logics

  • Steep learning curve
  • Heavy annotation burden


LightDP offers a better balance between expressiveness and usability

slide-6
SLIDE 6

LightDP: Overview

Source Program → Relational, Dependent Type System → Target Program with distinguished variable $v_\epsilon$

Main Theorem: if the source program type checks and $v_\epsilon$ is bounded by a constant $\epsilon$ in the target program, then the source program is $\epsilon$-private

slide-7
SLIDE 7

Source Language: Syntax

Annotations on the syntax: random variable; random expression (e.g., Laplace dist.)

slide-8
SLIDE 8

Source Language: Semantics

Memory: mapping from variables to values

Initial memory → final memory dist.
Adjacent memory → final memory dist.

Relational Reasoning via Type System

slide-9
SLIDE 9

Relational Types

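Example

$\Gamma(x) : \mathrm{num}_0$, $\Gamma(y) : \mathrm{num}_1$

Here num is the base type (e.g., int, real) and the subscript is the distance

Related Memories

  • First memory: $x: u$, $y: v$
  • Second memory: $x: u$, $y: v + 1$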

slide-10
SLIDE 10

Dependent Types

Example (the distance can be a program variable)

$\Gamma(x) : \mathrm{num}_0$, $\Gamma(y) : \mathrm{num}_x$

Related Memories

  • First memory: $x: u$, $y: v$
  • Second memory: $x: u$, $y: v + u$

slide-11
SLIDE 11

Dependent Types

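Example (the distance can be a non-probabilistic expression)

$\Gamma(x) : \mathrm{num}_0$, $\Gamma(y) : \mathrm{num}_{x \ge 1\,?\,2\,:\,0}$

Related Memories

  • First memory: $x: u$, $y: v$
  • Second memory: $x: u$, $y: v + 2$ if $u \ge 1$, and $y: v$ if $u < 1$

Notation

$m_1 \,\Gamma\, m_2$ if $m_1$ and $m_2$ are related by $\Gamma$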
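The relation between two memories can be made concrete with a small checker. This is a minimal sketch, assuming each distance is evaluated in the first memory and that two memories are related when every variable in the second equals its value in the first plus that distance; the names `related` and `gamma` and the dictionary encoding are illustrative assumptions.

```python
from typing import Callable, Dict

Memory = Dict[str, float]
# A (possibly dependent) distance: a function of the first memory.
Distance = Callable[[Memory], float]


def related(m1: Memory, m2: Memory, gamma: Dict[str, Distance]) -> bool:
    """Check that m2 equals m1 shifted by the distances that gamma assigns."""
    if m1.keys() != m2.keys():
        return False
    return all(m2[var] == m1[var] + gamma[var](m1) for var in m1)


# Assumed encoding of the example: x has distance 0, y has distance (x >= 1 ? 2 : 0).
gamma = {
    "x": lambda m: 0,
    "y": lambda m: 2 if m["x"] >= 1 else 0,
}

assert related({"x": 3, "y": 5}, {"x": 3, "y": 7}, gamma)      # x >= 1, so y shifts by 2
assert related({"x": 0, "y": 5}, {"x": 0, "y": 5}, gamma)      # x < 1, so y is unchanged
assert not related({"x": 0, "y": 5}, {"x": 0, "y": 7}, gamma)  # shift of 2 not allowed here
```

Encoding distances as functions of the memory is what makes the types dependent: the same variable can be at distance 2 in one pair of executions and distance 0 in another.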

slide-12
SLIDE 12

Types form an invariant on two related program executions (for the non-probabilistic subset):

If the initial memories satisfy $m_1 \,\Gamma\, m_2$, then after executing a well-typed program, the final memories satisfy $m_1' \,\Gamma\, m_2'$

Enforced by a type system

slide-13
SLIDE 13

Type System

Expression: e.g., arithmetic operators ($+$, $-$) and comparison operators ($<$, $>$, $=$, $\le$, $\ge$)

slide-14
SLIDE 14

Type System

Command: e.g.,

  • Distance must be identical
  • Related executions take same branch

slide-15
SLIDE 15

Relating Two Distributions

$\mu_1 \,\Gamma\, \mu_2$ w.r.t. privacy cost $\epsilon$ if

$\forall m.\ \mu_1(m)/\mu_2(\Gamma(m)) \le e^{\epsilon}$

Program: $\eta := \mathrm{Lap}\ r$ (Laplace dist. w/ mean 0 and a scale factor $r$)

$\Gamma(\eta) = \mathrm{num}_0$: with no cost

slide-16
SLIDE 16

Relating Two Distributions

$\mu_1 \,\Gamma\, \mu_2$ w.r.t. privacy cost $\epsilon$ if

$\forall m.\ \mu_1(m)/\mu_2(\Gamma(m)) \le e^{\epsilon}$

Program: $\eta := \mathrm{Lap}\ r$ (Laplace dist. w/ mean 0 and a scale factor $r$)

$\Gamma(\eta) = \mathrm{num}_1$: with cost $1/r$ due to

  • dist. property

Observation: $\eta$ may have an arbitrary distance, which affects the added cost
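The costs quoted for the Laplace sample follow from a short density calculation. The sketch below assumes the standard Laplace density with mean 0 and scale $r$, and writes $d$ for the distance assigned to $\eta$; the symbol $d$ is introduced here only for illustration.

```latex
\[
\mu(\eta) = \frac{1}{2r}\exp\!\left(-\frac{|\eta|}{r}\right),
\qquad
\frac{\mu(\eta)}{\mu(\eta + d)}
  = \exp\!\left(\frac{|\eta + d| - |\eta|}{r}\right)
  \le \exp\!\left(\frac{|d|}{r}\right)
\]
% The inequality is the triangle inequality |eta + d| - |eta| <= |d|.
% d = 0 gives cost 0; d = 1 gives cost 1/r.
```

Under this reading, a larger distance for $\eta$ buys more flexibility in aligning the two executions at a proportionally higher privacy cost.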
slide-17
SLIDE 17

Observation: $\eta$ may have an arbitrary distance, which affects the added cost

  • $\eta$ has a polymorphic type
  • Source program → target program, which explicitly tracks added privacy cost
  • Non-deterministic operation

Intuitively, target program computes the added cost for one sample from distribution $\mu$

slide-18
SLIDE 18

In General

Source program → Type System → Target program with distinguished variable $v_\epsilon$

slide-19
SLIDE 19

Target Language

Verification task in the target language: proving $v_\epsilon$ is bounded by some constant $\epsilon$ in any execution (in a non-probabilistic program)

A safety property. Can be verified using off-the-shelf tools (e.g., Hoare logic, model checking)

Annotation on the non-deterministic operation: set x to arbitrary value

slide-20
SLIDE 20

Putting Together

The Sparse Vector Method [Dwork and Roth’14]

Source Program

  • Correctness proof is subtle: incorrect variants categorized in [Chen&Machanavajjhala’15, Lyu et al.’16]
  • Formally verified very recently [Barthe et al. 2016] with heavy annotation burden
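For readers without the slide image, the following is a minimal sketch of one common formulation of the Sparse Vector method, not necessarily the exact source program shown in the talk. It assumes queries of sensitivity 1, threshold noise with scale 2/ε, per-query noise with scale 4N/ε where N is the maximum number of above-threshold answers, and hypothetical names throughout.

```python
import random
from typing import List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sparse_vector(
    queries: Sequence[float],
    threshold: float,
    max_true: int,
    epsilon: float,
    rng: random.Random,
) -> List[bool]:
    """Report which noisy query answers exceed a noisy threshold, stopping after max_true hits."""
    noisy_threshold = threshold + laplace(2.0 / epsilon, rng)  # assumed scale 2/epsilon
    out: List[bool] = []
    true_count = 0
    for q in queries:
        if true_count >= max_true:
            break
        eta = laplace(4.0 * max_true / epsilon, rng)  # assumed scale 4N/epsilon
        if q + eta >= noisy_threshold:
            out.append(True)
            true_count += 1
        else:
            out.append(False)
    return out


answers = sparse_vector([0.0] * 50, threshold=10.0, max_true=3, epsilon=1.0, rng=random.Random(0))
assert answers.count(True) <= 3
```

The subtlety the slide refers to lies in details such as which noise is reused, when the loop stops, and what is released; small changes to any of these produce the incorrect variants catalogued in the cited surveys.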

slide-21
SLIDE 21

Required Types

  • Distance depends on the value of the $i$-th query answer ($q[i]$)
  • Type inference: types can be inferred by the inference algorithm of LightDP

slide-22
SLIDE 22

Target Program


slide-23
SLIDE 23

Completing the Proof


Loop Invariant

Postcondition:

Main Theorem: source program type checks + $v_\epsilon$ bounded by constant $\epsilon$ = source program is $\epsilon$-private
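The bound on the distinguished variable can be exercised with a randomized test of a target-style program. This is a sketch under stated assumptions, not the talk's actual target program: it assumes Sparse Vector noise scales 2/ε and 4N/ε, a distance of 1 for the threshold noise, and a distance of 2 or 0 for each query's noise depending on whether the noisy answer passes the threshold. The non-deterministic "set to arbitrary value" operation is approximated by uniform random draws, and all names are hypothetical.

```python
import random
from typing import Sequence


def svt_target_cost(
    queries: Sequence[float],
    threshold: float,
    max_true: int,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Run a non-probabilistic, cost-tracking version of Sparse Vector and return v_eps."""
    v_eps = 0.0

    eta1 = rng.uniform(-100.0, 100.0)   # arbitrary value instead of a Laplace sample
    v_eps += 1.0 * epsilon / 2.0        # cost = distance / scale = 1 / (2 / epsilon)
    noisy_threshold = threshold + eta1

    true_count = 0
    for q in queries:
        if true_count >= max_true:
            break
        eta2 = rng.uniform(-100.0, 100.0)  # arbitrary value instead of a Laplace sample
        above = q + eta2 >= noisy_threshold
        distance = 2.0 if above else 0.0   # assumed dependent distance for this sample
        v_eps += distance * epsilon / (4.0 * max_true)
        if above:
            true_count += 1
    return v_eps


# At most N above-threshold answers, each costing epsilon / (2N), plus epsilon / 2.
rng = random.Random(0)
for _ in range(1000):
    qs = [rng.uniform(-50.0, 50.0) for _ in range(30)]
    assert svt_target_cost(qs, 0.0, max_true=3, epsilon=1.0, rng=rng) <= 1.0 + 1e-9
```

Random testing only samples executions. The loop invariant and postcondition on the slide are what establish the bound for every execution, which is why the task is handed to a program verifier.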

slide-24
SLIDE 24

More in the Paper

  • Type inference algorithm
  • Searching for proof with minimum cost w/ MaxSMT
  • Formal proof for the main theorem
  • More verified examples (with little manual effort)

slide-25
SLIDE 25

Summary

Source Program → Relational, Dependent Type System (automated by inference engine) → Target Program with distinguished variable $v_\epsilon$ (bounding it is a safety property, verified by existing tools)

Decomposing differential privacy into subtasks substantially simplifies language-based proofs