Differential Privacy and Applications Marco Gaboardi Boston - - PowerPoint PPT Presentation



SLIDE 1

Marco Gaboardi

Boston University

Differential Privacy and Applications

SLIDE 2

Recap

SLIDE 3

Fundamental Law of Information Reconstruction

The release of too many overly accurate statistics gives privacy violations.

SLIDE 4

(ε,δ)-Differential Privacy

Definition. Given ε,δ ≥ 0, a probabilistic query Q: Xⁿ → R is (ε,δ)-differentially private iff for all adjacent databases b1, b2 and for every S ⊆ R:

Pr[Q(b1) ∈ S] ≤ exp(ε)·Pr[Q(b2) ∈ S] + δ
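The definition can be checked numerically for a concrete mechanism. Below is a minimal sketch (my own illustration, not from the slides) that verifies the (ε,0)-DP inequality for a Laplace-noised counting query on two adjacent databases, using events of the form S = (−∞, t]; all function and variable names are mine.

```python
import math

def laplace_cdf(x, mu, b):
    """CDF of the Laplace distribution with mean mu and scale b."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)

# Counting query on adjacent databases: the counts differ by at most 1.
eps = 0.5
count_b1, count_b2 = 100, 101   # adjacent: one individual added
scale = 1.0 / eps               # Lap(Δq/ε) with Δq = 1

# Check Pr[Q(b1) ∈ S] ≤ e^ε · Pr[Q(b2) ∈ S] on intervals S = (-∞, t],
# in both directions (a small tolerance absorbs floating-point error).
for t in range(90, 112):
    p1 = laplace_cdf(t, count_b1, scale)
    p2 = laplace_cdf(t, count_b2, scale)
    assert p1 <= math.exp(eps) * p2 + 1e-12
    assert p2 <= math.exp(eps) * p1 + 1e-12
```

The bound is tight for t well below both counts, where the ratio of the two probabilities is exactly e^ε.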

SLIDE 5

Laplace Mechanism

Algorithm 2: Pseudo-code for the Laplace Mechanism

1: function LapMech(D, q, ε)
2:     Y ← Lap(Δq/ε)   (centered at 0)
3:     return q(D) + Y
4: end function
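The pseudocode above can be sketched in Python, sampling the Laplace noise by inverse-CDF transform; the function names are mine, not from the slides.

```python
import math
import random

def lap_noise(scale):
    """Sample from the Laplace distribution with mean 0 and the given scale."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def lap_mech(data, query, sensitivity, eps):
    """Laplace mechanism: release q(D) + Y with Y ~ Lap(Δq/ε)."""
    return query(data) + lap_noise(sensitivity / eps)

# Counting query (sensitivity Δq = 1) over a small database.
db = [0, 1, 1, 0, 1, 1, 1]
noisy_count = lap_mech(db, sum, sensitivity=1.0, eps=0.5)
```

Averaged over many runs, `lap_mech` is unbiased: the noise has mean 0 and standard deviation √2·Δq/ε.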

SLIDE 6

Laplace Mechanism

Theorem (Privacy of the Laplace Mechanism)
The Laplace mechanism is ε-differentially private.

Accuracy Theorem:
Let r = LapMech(D, q, ε). Then for every β ∈ (0,1]:

Pr[ |q(D) − r| ≥ (Δq/ε)·ln(1/β) ] = β

SLIDE 7

Sequential Composition

If Q1 is ε1-DP, Q2 is ε2-DP, …, Qn is εn-DP, then the overall process is (ε1+ε2+…+εn)-DP.
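The composition statement above can be illustrated with a tiny budget accountant (my own sketch, not from the slides): each εi-DP query charges its εi against a global budget, and the total spent bounds the ε of the overall process.

```python
class PrivacyAccountant:
    """Tracks the total ε spent under sequential composition."""

    def __init__(self, budget):
        self.budget = budget
        self.spent = 0.0

    def charge(self, eps):
        """Charge an ε-DP query; refuse once the budget would be exceeded."""
        if self.spent + eps > self.budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += eps

acct = PrivacyAccountant(budget=1.0)
for eps_i in [0.3, 0.3, 0.3]:   # three ε_i-DP queries
    acct.charge(eps_i)
# By sequential composition, the overall process is (0.3+0.3+0.3)-DP = 0.9-DP.
```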

SLIDE 8

Parallel Composition

If Q is ε-DP on D and Q' is ε-DP on D', then the overall process on the disjoint union D ⊎ D' is ε-DP.

SLIDE 9

PINQ - McSherry'09

  • Private LINQ (a library/API for queries in C#)
  • Designed with composition in mind.
  • The first language for differential privacy.
SLIDE 10

An alternative approach:
Fuzz: Compositional Reasoning about Sensitivity (Pierce et al.'10)

  • Based on a semantic model of metric spaces and non-expansive functions,
  • The user specifies the sensitivity of some basic primitives based on the semantic model,
  • The tool implements a type-checker permitting static checking of the sensitivity of a program (based on a calculus for sensitivities derived from linear logic),
  • It requires limited reasoning about probabilities.
SLIDE 11

Verification tools

expert-provided annotations
+ verification tools
+ (semi-)decision procedures (SMT solvers, ITP)

Do we have good semi-decision procedures for (ε,δ)-indistinguishability?

SLIDE 12

Approximate Probabilistic Coupling

An (ε,δ)-coupling ν1 C(ε,δ)(S) ν2 of two probability distributions ν1 over A and ν2 over B with respect to a relation S ⊆ A×B is a pair of probability distributions νL, νR over A×B such that:
1. the left marginal of νL is ν1, and the right marginal of νR is ν2,
2. the support of νL and νR is contained in S,
3. max(max_E νL(E) − exp(ε)·νR(E), max_E νR(E) − exp(ε)·νL(E)) ≤ δ.

[Barthe et al. 12]

SLIDE 13

We will use a simplification

⊢ε,δ c : P ⇒ Q

where P is the precondition (a relation over memories), Q is the postcondition (a relation over memories), c is the program, and ε,δ are the privacy parameters.

SLIDE 14

Approximate Probabilistic Coupling for DP

Q is (ε,δ)-differentially private
 iff Q(D) C(ε,δ)(=) Q(D’)

For D and D’ differing in one individual.

SLIDE 15

Example of coupling

Pre: 0 ≤ k + input1 − input2 ≤ k'

output = input + Lap(1/ε)

Post: [output1 + k = output2], we pay k'·ε

SLIDE 16

Report Noisy Max

Suppose that each one of us can vote for one star, and we want to say which star receives the most votes.

SLIDE 17

Report Noisy Max

Algorithm: we can compute the histogram, add Laplace noise to each score, and then select the maximal noised score. (Histogram from the figure: 0.125, 0.25, 0.375, 0.5 for options (a)-(d).)

We can even add one-sided Laplace noise.
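The algorithm on this slide (histogram, Laplace noise on each score, argmax) can be sketched as follows; this is my illustration, with Lap(2/ε) noise per 1-sensitive score as in the slides' one-sided variant, and the Laplace sample generated as a difference of two exponentials.

```python
import random

def report_noisy_max(scores, eps):
    """Add Lap(2/ε) noise to each 1-sensitive score; report only the argmax.

    A Laplace(scale 2/ε) sample is the difference of two independent
    exponentials with mean 2/ε."""
    noisy = [s + random.expovariate(eps / 2.0) - random.expovariate(eps / 2.0)
             for s in scores]
    return max(range(len(scores)), key=lambda i: noisy[i])

# Histogram of votes for options (a)-(d), as in the slide.
votes = [0.125, 0.25, 0.375, 0.5]
winner = report_noisy_max(votes, eps=1.0)   # index of the noised maximum
```

Only the index is released, never the noisy scores themselves; that is what makes the tight privacy analysis possible.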
SLIDE 18

Report Noisy Max - intuition

q1(D)+noise, q2(D)+noise, q3(D)+noise, …, qk(D)+noise
q1(D')+noise, q2(D')+noise, q3(D')+noise, …, qk(D')+noise

1-sensitive queries, over databases D and D' differing in one individual.

We need to coordinate noises.

We can prove this algorithm ε-differentially private.

SLIDE 19

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;

Instead of the classic Report Noisy Max, we consider a version where we add noise from a one-sided Laplace. Composition doesn't apply, since adding one-sided Laplace is not differentially private.
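The ROSNM pseudocode can be rendered in Python as below; this is my sketch, in which the one-sided Lap+(2/ε) noise is modeled as an exponential variable with mean 2/ε (one standard reading of a one-sided Laplace).

```python
import random

def rosnm(queries, db, eps):
    """Report One-sided Noisy Max: add one-sided Lap+(2/ε) noise to each
    query answer and return the (1-based) index of the noisy maximum."""
    best = 0.0
    max_i = 1
    for i, q in enumerate(queries, start=1):
        cur = q(db) + random.expovariate(eps / 2.0)  # one-sided: always ≥ q(db)
        if cur > best or i == 1:
            max_i, best = i, cur
    return max_i

# Two 1-sensitive counting queries over a toy database.
queries = [lambda d: float(sum(d)), lambda d: float(len(d) - sum(d))]
idx = rosnm(queries, [0, 1, 1, 1], eps=1.0)
```

As the slide notes, the noisy answers themselves are not differentially private; only releasing the argmax is, which is exactly what the coupling proof in the following slides establishes.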

SLIDE 20

Pointwise rule - simplified

If for every s ∈ O we can prove

  { b1 ~1 b2 }  program  { [out1 = s => out2 = s] }  paying ε,

then

  { b1 ~1 b2 }  program  { [out1 = out2] }  paying ε.

SLIDE 21

Report One-sided Noisy Max

[b1 ~1 b2, ∀i. ∀d1 ~1 d2. |q_i(d1) − q_i(d2)| ≤ 1, …]

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;

[max1 = max2] and paid ε

SLIDE 22

Report One-sided Noisy Max

By applying the pointwise rule we get a different post.

[b1 ~1 b2, ∀i. ∀d1 ~1 d2. |q_i(d1) − q_i(d2)| ≤ 1, …]

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;

[max1 = s => max2 = s] and paid ε

Notice that we focus on a single general s.

SLIDE 23

Playing the verification game

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  [b1 ~1 b2, ∀i. ∀d1 ~1 d2. |q_i(d1) − q_i(d2)| ≤ 1, …]
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 24

Playing the verification game

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  [b1 ~1 b2, …]
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 25

We can now proceed by cases

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    [b1 ~1 b2, …]
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 26

And use different properties

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    [b1 ~1 b2, i1 < s => … /\ i1 ≥ s => … /\ i1 = i2]
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 27

Invariant

i1 < s => (max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1)
/\ i1 ≥ s => ((max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s)
/\ i1 = i2

SLIDE 28

Invariant

i1 < s => (max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1)
/\ i1 ≥ s => ((max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s)
/\ i1 = i2

The first conjunct describes the situation before we encounter s.

SLIDE 29

Invariant

i1 < s => (max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1)
/\ i1 ≥ s => ((max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s)
/\ i1 = i2

The second conjunct describes the situation after we encounter s.

SLIDE 30

Invariant

i1 < s => (max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1)
/\ i1 ≥ s => ((max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s)
/\ i1 = i2

When we encounter s we switch from one to the other.

SLIDE 31

Let us consider case by case

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    [i1 < s => max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1]
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 32

Which rule shall we apply?

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    [i1 < s => max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1]
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 33

Laplace+ rule 1

Pre: true

output = input + Lap+(ε)

Post: [output1 − output2 = input1 − input2], we pay 0

SLIDE 34

Let's apply the rule

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    [i1 < s => max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1 /\
     cur1 − cur2 = q_i(b1) − q_i(b2)] paid 0
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 35

And rewrite…

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    [i1 < s => max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1 /\
     |cur1 − cur2| ≤ 1] paid 0
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 36

Proceeding…

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    [i1 < s => max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1 /\
     |cur1 − cur2| ≤ 1] paid 0
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 37

This preserves the invariant

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    [i1 < s => max1 < s /\ max2 < s /\ |best1 − best2| ≤ 1 /\
     |cur1 − cur2| ≤ 1] paid 0
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 38

Let us consider now the second case

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    [i1 ≥ s => (max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s]
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 39

What rule shall we apply now?

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    [i1 ≥ s => (max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s]
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 40

Laplace+ rule 2

Pre: 0 ≤ k + input1 − input2 ≤ k'

output = input + Lap+(1/ε)

Post: [output1 + k = output2], we pay k'·ε

SLIDE 41

Let's apply the rule

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    [(i1 ≥ s => (max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s)
     /\ cur1 + 1 = cur2] paid ε
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 42

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    [(i1 ≥ s => (max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s)
     /\ cur1 + 1 = cur2] paid ε
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

Now we see that either we don't enter the if in the first run, or if we do, we are guaranteed to enter it also in the second run.

SLIDE 43

Continuing…

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    [(i1 ≥ s => (max1 = max2 = s /\ best1 + 1 = best2) \/ max1 ≠ s)
     /\ cur1 + 1 = cur2] paid ε
    i = i + 1;
  }
  return max;
  [max1 = s => max2 = s] and paid ε

SLIDE 44

Report One-sided Noisy Max

ROSNM(q1,…,qk : list data → R, b : list data, ε : R) : nat
  i = 1; best = 0;
  while (i ≤ k) {
    cur = q_i(b) + Lap+(2/ε)
    if (cur > best \/ i = 1) { max = i; best = cur; }
    i = i + 1;
    [invariant] paid ε
  }
  return max;
  [max1 = s => max2 = s] and paid ε

With this we can conclude.

SLIDE 45

EasyCrypt

SLIDE 46

Trade-offs: formal methods for privacy

Dimensions: expressivity (in terms of examples), usability (in terms of required expertise), and granularity of the privacy enforcement.

Tools along these axes: PINQ, Airavat, Fuzz, coupling-based methods.

SLIDE 47

Coupling Strategy

(Figure: a coupling strategy composed from relations R, S, T.)

SLIDE 48

Automated DP proofs using Coupling Strategies and Horn Modulo Coupling (Albarghouthi & Hsu'17)

  • Formalize the idea of coupling strategies, useful to combine different couplings in the same proof,
  • Define a variation of Horn clauses that uses coupling information and that can be solved efficiently using a program synthesis approach.
  • They can prove the sparse vector technique differentially private in a fully automated way! (With a good library of couplings.)

SLIDE 49

Semi-automated DP proofs using Randomness Assignments

  • Permits building more flexible reasoning about correspondences between the programs and the privacy budget,
  • requires few annotations and can be combined with other tools, making it almost automated,
  • the proof of sparse vector only requires 2 lines of annotations,
  • implemented in LightDP (Zhang&Kifer'17)

(Figure: an injective map on the randomness R producing the same output.)

SLIDE 50

Another algorithm

SLIDE 51

Sparse Vector

SparseVector(D, q1,…,qn, T, ε): for each query qi, test qi(D) ≥ T? and output either a noisy answer ai (above threshold) or ⊥ (below threshold).

How can we achieve ε-DP by paying only for the queries above T?
SLIDE 52

A first step: above threshold

Compare each noised query q̂1, q̂2, …, q̂k against a noised threshold t̂ and return the index k of the first query above threshold.

SLIDE 53

Reasoning by Composition

Paying ε/4 for each noised query q̂1, q̂2, …, q̂k, in the worst case the data analysis is (nε/4, 0)-DP.

SLIDE 54

A more advanced analysis

Paying ε/2 for the threshold and ε/2 for the query that is reported, we can show that above threshold is (ε,0)-DP.

It doesn't depend on the number of queries.

SLIDE 55

Above Threshold

AboveT(q1,…,qk : list data → R, b : list data, T : R, ε : R) : int
  i = 1; r = k+1;
  T' = T + Lap(ε/2)
  while (i ≤ k) {
    cur = q_i(b) + Lap(ε/4)
    if (cur ≥ T' /\ r = k+1) r = i;
    i++
  }
  return r;
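A Python sketch of AboveT follows (my rendering). The slides write Lap(ε/2) and Lap(ε/4); here I read those arguments as budget shares and use noise scales 2/ε for the threshold and 4/ε for each 1-sensitive query, which is an assumption about the intended parameterization.

```python
import math
import random

def lap(scale):
    """Sample Lap(scale) by inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def above_threshold(queries, db, threshold, eps):
    """Return the (1-based) index of the first query whose noisy answer
    clears the noisy threshold, or None if no query does.

    The total privacy cost is ε, independent of the number of queries:
    half the budget noises the threshold, half noises the reported answer."""
    t_noisy = threshold + lap(2.0 / eps)
    for i, q in enumerate(queries, start=1):
        if q(db) + lap(4.0 / eps) >= t_noisy:
            return i
    return None

# Toy run: three constant queries against threshold 4 (ε large, noise tiny).
queries = [lambda d: 1.0, lambda d: 5.0, lambda d: 9.0]
first = above_threshold(queries, db=None, threshold=4.0, eps=100.0)
```

With ε this large the noise is negligible, so the call almost surely returns the index of the first query above 4.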

SLIDE 56

The sparse vector case

(Figure: six variants of the Sparse Vector Technique compared side by side — Algorithm 1: the instantiation proposed in the paper; Algorithm 2: SVT in Dwork and Roth 2014 [8]; Algorithm 3: SVT in Roth's 2011 lecture notes [15]; Algorithm 4: SVT in Lee and Clifton 2014 [13]; Algorithm 5: SVT in Stoddard et al. 2014 [18]; Algorithm 6: SVT in Chen et al. 2015 [1]. The variants differ in how the budget is split between the noisy threshold ρ and the per-query noise νi, whether the noisy threshold is resampled after each ⊤ output, whether noisy answers are released, and whether the algorithm aborts after c answers above threshold.)

Min Lyu, Dong Su, Ninghui Li: Understanding the Sparse Vector Technique for Differential Privacy. PVLDB (2017)

SLIDE 57

Other models

SLIDE 58

Differential privacy

So far, we have considered a curator model: a model where there is a trusted centralized party (the curator) that holds the data, adds the noise, and to which we can ask our queries.

SLIDE 59

Multiparty Setting

We now consider a model where the data is distributed among m parties P1,…,Pm. We assume that the data is evenly split among the parties: each party Pi has n/m rows of the dataset. Each party Pi wants to guarantee privacy for its data against an adversary that may control the other parties. We will study protocols to compute statistics over the data.

SLIDE 60

Adversaries

We assume that the adversaries are:

  • passive (honest-but-curious): they follow the specified protocol but try to extract information from what they see,
  • computationally unbounded: we will not restrict the capacity of the adversary,
  • in control of several parties: an adversary can control t ≤ m−1 parties. We will focus on t = m−1.
SLIDE 61

The local model

This is the extremal case where m = n. We can think of this case as the one where each party holds just one data record, and does not trust any other party. This is in some sense the hardest differential privacy guarantee that one can provide.

Can we give non-trivial protocols for this model?

SLIDE 62

Randomized Response [Warner65]

Suppose I ask a yes/no question. Each respondent flips a biased coin: depending on the outcome, they give the true answer or the opposite answer. The value of the bias is what determines the ε.

SLIDE 63

Randomized Response

Algorithm 1: Pseudo-code for Randomized Response

1: function RandomizedResponse(D, q, ε)
2:     for i ← 1 to |D| do
3:         Si ← q(di) with probability e^ε/(1+e^ε); ¬q(di) with probability 1/(1+e^ε)
4:     end for
5:     return (sum S)/|D|
6: end function
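The pseudocode above, sketched in Python together with the debiasing step used in the accuracy analysis (the function names and the toy data are mine):

```python
import math
import random

def randomized_response(bits, eps):
    """Each respondent reports the truth w.p. e^ε/(1+e^ε), else the opposite.
    Returns the average of the randomized reports."""
    p_true = math.exp(eps) / (1.0 + math.exp(eps))
    reports = [b if random.random() < p_true else 1 - b for b in bits]
    return sum(reports) / len(reports)

def debias(r, eps):
    """Unbiased estimate of the true proportion from the RR average r."""
    e = math.exp(eps)
    return (1.0 + e) / (e - 1.0) * (r - 1.0 / (1.0 + e))

# Toy population with true proportion 0.6.
data = [1] * 600 + [0] * 400
est = debias(randomized_response(data, eps=1.0), eps=1.0)
```

The debiasing works because E[r] = 1/(1+e^ε) + q(D)·(e^ε−1)/(1+e^ε), so the affine correction recovers q(D) in expectation.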

SLIDE 64

Randomized Response

Privacy Theorem:
Randomized response is ε-differentially private.

SLIDE 65

Randomized Response

Question: How accurate is the answer that we get from randomized response?

SLIDE 66

Randomized Response

Accuracy Theorem:

Pr_{r←RR(D,q,ε)} [ |((1+e^ε)/(e^ε−1))·(r − 1/(1+e^ε)) − q(D)| ≥ ((1+e^ε)/(e^ε−1))·√(log(2/β)/(2n)) ] ≤ β

The left-hand side of the inequality is the variable measuring the difference between the (debiased) noised answer and the non-noised one. The right-hand side is our α: notice that we express it in terms of β.
SLIDE 67

Additive Chernoff Bound

Theorem (Additive Chernoff Bound). Let X1,…,Xn be i.i.d. random variables such that 0 ≤ Xi ≤ 1 for every 1 ≤ i ≤ n. Let S = (1/n)·∑_{i=1}^n Xi denote their mean and E[S] their expected mean, where E[S] = (1/n)·∑_{i=1}^n E[Xi] by linearity of expectation. Then for every λ we have:

Pr[|S − E[S]| ≥ λ] ≤ 2e^{−2λ²n}
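A quick Monte-Carlo sanity check of the bound (my own illustration): for Bernoulli(1/2) draws, the empirical frequency of deviations of size λ stays below 2e^{−2λ²n}.

```python
import math
import random

# Empirically check Pr[|S − E[S]| ≥ λ] ≤ 2·exp(−2λ²n) for Bernoulli(1/2) draws.
n, lam, trials = 200, 0.1, 5000
random.seed(0)
exceed = 0
for _ in range(trials):
    s = sum(random.random() < 0.5 for _ in range(n)) / n
    if abs(s - 0.5) >= lam:
        exceed += 1

bound = 2.0 * math.exp(-2.0 * lam ** 2 * n)   # the Chernoff bound, ≈ 0.037
assert exceed / trials <= bound
```

The empirical deviation frequency is typically far below the bound, which is not tight at this scale; the bound's value is that it decays exponentially in λ²n.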

SLIDE 68

Randomized Response

Accuracy Theorem:

Pr_{r←RR(D,q,ε)} [ |((1+e^ε)/(e^ε−1))·(r − 1/(1+e^ε)) − q(D)| ≥ ((1+e^ε)/(e^ε−1))·√(log(2/β)/(2n)) ] ≤ β

Intuitive reading: with high probability we have |q(D) − r| ≤ O(√n/(nϵ)).

SLIDE 69

Randomized Response

Privacy Theorem:
Randomized response is ε-differentially private in the local model.

Accuracy for Randomized Response: with high probability we have |q(D) − r| ≤ O(√n/(nϵ)).

SLIDE 70

Randomized Response is optimal in the local model

Theorem 9.3 (randomized response is optimal in the local model [25]). For every nonconstant counting query q : X → {0, 1}, n ∈ N, and (1, 0)-differentially private n-party protocol P for approximating q, there is an input data set x ∈ Xⁿ on which P has error α = Ω(1/√n) with high probability.

SLIDE 71

Randomized Response vs Laplace

Accuracy for Laplace for counting queries: with high probability, |q(D) − r| ≤ O(1/(nϵ)).

Accuracy for Randomized Response for counting queries: with high probability, |q(D) − r| ≤ O(√n/(nϵ)).

SLIDE 72

Local model

(Figure: each party locally computes a noised answer ai from q(di); an aggregator sums the ai to produce the result ∑i ai.)

SLIDE 73

Shuffle model

(Figure: as in the local model, each party computes a noised answer ai; a shuffler randomly permutes a1,…,an before the aggregator sums them. Hiding a lot of details.)

SLIDE 74

Randomized Response vs Shuffled Randomized Response

Accuracy for Randomized Response for counting queries: with high probability, |q(D) − r| ≤ O(√n/(nϵ)).

Accuracy for Shuffled Randomized Response for counting queries: with high probability, |q(D) − r| ≤ O(1/(nϵ)). (Hiding a lot of details.)

SLIDE 75

Summary

  • Fundamental law of information reconstruction
  • DP: quantitative notion of privacy with good properties
  • Non-trivial algorithms
  • Interesting verification methods
  • Different models
SLIDE 76

Questions?