Differential Privacy (Part III)
Approximate (or (ε,δ))-differential privacy
- Generalized definition of differential privacy allowing for a
(supposedly small) additive factor
- Used in a variety of applications
A query mechanism M is (ε, δ)-differentially private if, for any two adjacent databases D and D′ (differing in just one entry) and any C ⊆ range(M): Pr(M(D) ∈ C) ≤ e^ε · Pr(M(D′) ∈ C) + δ
The Gaussian mechanism
The ℓ₂-sensitivity of f : ℕ^|X| → ℝ^k is defined as ∆₂(f) = max ||f(x) − f(y)||₂ over all x, y ∈ ℕ^|X| with ||x − y||₁ = 1
For c² > 2 ln(1.25/δ), the Gaussian mechanism with parameter σ ≥ c·∆₂(f)/ε is (ε, δ)-differentially private
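As a concrete illustration, here is a minimal Python sketch of the Gaussian mechanism with σ calibrated as in the theorem above (the function name and the example query value are illustrative):

```python
import math
import random

def gaussian_mechanism(true_answer, l2_sensitivity, eps, delta):
    """Add N(0, sigma^2) noise with sigma = c * sensitivity / eps,
    using (just above) the smallest c with c^2 > 2 ln(1.25/delta)."""
    c = math.sqrt(2 * math.log(1.25 / delta)) + 1e-9
    sigma = c * l2_sensitivity / eps
    return true_answer + random.gauss(0.0, sigma)

# e.g., a counting query has l2-sensitivity 1
noisy = gaussian_mechanism(true_answer=42.0, l2_sensitivity=1.0,
                           eps=0.5, delta=1e-5)
```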
Sparse Vector Technique
✦ [Hardt-Rothblum, FOCS’10] study the problem of k
adaptively chosen, low-sensitivity queries where
- only a very small number of these queries (say c)
take values above a certain threshold T
- the data analyst is only interested in such queries
- useful to learn correlations, e.g., whether there is
a dependency between smoking and cancer
✦ The data analyst could ask only the significant
queries, but she does not know them in advance!
✦ Goal: answer only the significant queries, pay only
for them, and ignore the others
Histograms and linear queries
✦ A histogram x ∈ ℝN represents a database (or a distribution)
- over a universe U of size |U| = N
- Databases have support of size n, whereas distributions do
not necessarily have a small support
✦ We assume x is normalized so that Σ_{i∈U} xᵢ = 1
✦ Here we focus on linear queries f : ℝ^N → [0, 1]
- can be seen as the inner product ⟨x, f⟩ for f ∈ [0, 1]^N
- counting queries (i.e., how many elements in the database
fulfill a certain predicate) are a special case
✦ Example: U = {1,2,3}, D = [1,2,2,3,1]
- x = (2,2,1), after normalization (2/5, 2/5, 1/5)
- “how many entries ≤ 2” ⇒ f = (1,1,0)
✦ By normalization, linear queries have sensitivity 1/n
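The example above can be reproduced in a few lines of Python (helper names are illustrative):

```python
from collections import Counter

def normalized_histogram(db, universe):
    """Histogram x over the universe, normalized so its entries sum to 1."""
    counts = Counter(db)
    n = len(db)
    return [counts[u] / n for u in universe]

def linear_query(x, f):
    """A linear query is the inner product <x, f> with f in [0,1]^N."""
    return sum(xi * fi for xi, fi in zip(x, f))

universe = [1, 2, 3]
x = normalized_histogram([1, 2, 2, 3, 1], universe)  # (2/5, 2/5, 1/5)
frac = linear_query(x, [1, 1, 0])  # fraction of entries <= 2
```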
SVT: algorithm
✦ Intuition: answer only those queries whose sanitized
result is above the sanitized threshold
We pay only for c queries. We need to sanitize the threshold, otherwise the conditional branch would leak information
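A plaintext sketch of the Sparse idea, written as an AboveThreshold-style loop; the noise scales 2c/ε and 4c/ε follow the standard textbook presentation and are illustrative, not necessarily the slides' exact calibration:

```python
import random

def lap(scale):
    # Laplace noise as the difference of two exponential draws
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def sparse(queries, db, T, c, eps):
    """Answer only queries whose noisy value exceeds a noisy threshold;
    stop after c above-threshold ("significant") answers."""
    answers = []
    t_hat = T + lap(2 * c / eps)          # sanitized threshold
    remaining = c
    for q in queries:
        if remaining == 0:
            break                          # budget for significant queries spent
        val = q(db)
        if val + lap(4 * c / eps) >= t_hat:
            answers.append(val + lap(4 * c / eps))  # report noisy answer
            remaining -= 1
            t_hat = T + lap(2 * c / eps)   # refresh threshold after a hit
        else:
            answers.append(None)           # "below threshold" (⊥)
    return answers
```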
SVT: accuracy
- α captures the distance between the sanitized result
and the real result
- β captures the error probability
We say Sparse is (α, β)-accurate for a sequence of k queries Q1, . . . , Qk, if except with probability at most β, the algorithm does not abort before Qk, and for all ai ∈ R: |ai − Qi(D)| ≤ α and for all ai =⊥: Qi(D) ≤ T + α
SVT: accuracy theorem
- The larger β, the smaller α
- The accuracy loss is logarithmic in the number of
queries
For any sequence of k queries Q₁, . . . , Q_k such that L(T) = |{i : Qᵢ(D) ≥ T − α}| ≤ c, Sparse(D, {Qᵢ}, T, c) is (α, β)-accurate for:
α = 4c (log k + log(2/β)) / (εn)
SVT: privacy theorem
- So, what did we prove in the end?
- You can estimate the actual answers and report only
those in this range:
- We can fish out insignificant queries almost “for free”,
paying only logarithmically for them in terms of accuracy
The Sparse vector algorithm is ✏-differentially private
[Figure: the reporting range above the sanitized threshold, from T to T + α and on to ∞]
SVT: approximate differential privacy
✦ Setting the noise parameter σ = √(32 c ln(1/δ)) / (εn), we get the following theorems:
The Sparse vector algorithm is (✏, )-differentially private
For any sequence of k queries Q₁, . . . , Q_k such that L(T) = |{i : Qᵢ(D) ≥ T − α}| ≤ c, Sparse(D, {Qᵢ}, T, c) is (α, β)-accurate for:
α = √(128 c ln(1/δ)) (log k + log(2/β)) / (εn)
Limitations
✦ Differential privacy is a general-purpose privacy
definition, originally conceived for databases and later applied to a variety of different settings
✦ At the moment, it is considered the state of the art
✦ Still, it is not the holy grail and it is not immune from
concerns, criticisms, and limitations
✦ Typically accompanied by some over-claims
No free lunch in data privacy
✦ Privacy and utility cannot be provided without
making assumptions about how data are generated (no free lunch theorem)
✦ Privacy means hiding the evidence of participation of
an individual in the data generating process
✦ If database rows are not independent, this is
different from removing one row
- Bob’s participation in a social network may cause
new edges between pairs of his friends
✦ If there is group structure, differential privacy may
not work very well...
No free lunch in data privacy (cont’d)
✦ This work disputes three popular over-claims
✦ “DP requires no assumptions on the data”
- database rows must actually be independent,
- otherwise removing one row does not suffice to
remove the individual’s participation
✦ If rows are not independent, deciding how many
entries should be removed and which ones is far from being easy...
No free lunch in data privacy (cont’d)
✦ The attacker knows all entries of
the database except for one, so “the more an attacker knows, the greater the privacy risks”
✦ Thus we should protect against the
strongest attacker
✦ Careful! In DP, the more the
attacker knows, the less noise we actually add
- intuitively, this is due to the fact
that we have less to hide
No free lunch in data privacy (cont’d)
✦ “DP is robust to arbitrary background knowledge”
✦ Actually, DP is robust when certain subsets of the
tuples are known to the attacker
✦ Other types of background knowledge may instead
be harmful
- e.g., previous exact query answers
✦ DP composes well with itself, but not necessarily with
- other privacy definitions or release mechanisms
✦ One can get a new, more generic, DP privacy
guarantee if, after releasing exact query answers, a set of tuples (not just one), called neighbours, is altered in a way that is still consistent with previously answered queries (plausible deniability)
Geo-indistinguishability
- Goal: protect user’s exact location, while allowing
approximate information (typically needed to obtain a certain desired service) to be released
- Idea: protect the user’s location within a radius r with
a level of privacy that depends on r
- corresponds to a generalized version of the well-
known concept of differential privacy.
Pictorially…
- Achieve l-privacy within r
- the provider cannot easily infer the user’s location
within, say, the 7th arrondissement of Paris
- the provider can infer with high probability that the user
is located in Paris instead of, say, London
More formally…
- A mechanism K satisfies ε-geo-indistinguishability if, for any two locations x, x′ and any set Z of reported locations: Pr(K(x) ∈ Z) ≤ e^{ε·d(x,x′)} · Pr(K(x′) ∈ Z)
- Here K(x) denotes the distribution (of locations)
generated by the mechanism K applied to location x
- Achieved through a variant of the Laplace mechanism
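One standard way to realize this Laplace variant is the planar Laplace mechanism: the density is proportional to e^{−εr} in the distance r, so the angle is uniform and the radius has density ∝ r·e^{−εr}, i.e., Gamma(shape 2, scale 1/ε). A minimal sketch, treating coordinates as planar (an approximation for latitude/longitude):

```python
import math
import random

def planar_laplace(x, y, eps):
    """Report a noisy location: uniform angle, radius ~ Gamma(2, 1/eps)
    (equivalently, the sum of two Exp(eps) draws)."""
    theta = random.uniform(0, 2 * math.pi)
    r = random.gammavariate(2, 1 / eps)
    return x + r * math.cos(theta), y + r * math.sin(theta)

# e.g., privacy level eps = 0.1 per unit of distance
noisy_x, noisy_y = planar_laplace(48.8566, 2.3522, eps=0.1)
```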
Browser extension
Malicious aggregators
- So far we focused on malicious analysts…
- …but aggregators can be malicious (or at least
curious) too!
[Figure: Users send inputs x₁, …, x_n to the Aggregator; the Analyst receives f(x₁, …, x_n)]
Existing approaches
- Secure hardware (or trusted server)-based mechanisms
- Fully distributed mechanisms with individual noise
Distributed Differential Privacy
How to compute differentially private queries in a distributed setting (attacker model, cryptographic protocols…)? “What’s the average age
of your self-help group?”
Smart-metering
✦ Fine-grained smart-metering has multiple uses:
- time-of-use billing, providing energy advice, settlement,
forecasting, demand response, and fraud detection
✦ USA: Energy Independence and Security Act of 2007
- American Recovery and Reinvestment Act (2009, $4.5bn)
✦ EU: Directive 2009/72/EC
✦ UK: deployment of 47 million smart meters by 2020
✦ Remote reads: one read every 15-30 min
✦ Manual reads: one read every 3
months to 1 year
Smart-metering: privacy issues
✦ Meter readings are sensitive
- Were you in last night?
- You do like watching TV, don’t you?
- Another ready meal in the microwave?
- Has your boyfriend moved in?
Smart-metering: privacy issues (cont’d)
Privacy-friendly smart metering
✦ Goals:
- precise billing of
consumption while revealing no consumption information to third parties
- privacy-friendly real-
time aggregation
Protocol overview
✦ ri answer from client i ✦ kij key shared between
client i and aggregator j
✦ t label classifying the
kind of reading
✦ wi weight given to i’s
answers
Protocol overview
✦ Geometric distribution,
Geom(α), with α > 1, is the discrete distribution with support ℤ and probability mass function
Pr[X = k] = ((α − 1)/(α + 1)) · α^{−|k|}
✦ Discrete counterpart of
Laplace distribution
Let f : D → ℤ be a function with sensitivity ∆f. Then g = f(X) + Geom(e^{ε/∆f}) is ε-differentially private.
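A plaintext sketch of this geometric mechanism, sampling the two-sided geometric as the difference of two one-sided geometric draws (function names are illustrative):

```python
import math
import random

def geom_noise(alpha):
    """Two-sided geometric noise with pmf ((alpha-1)/(alpha+1)) * alpha^(-|k|),
    alpha > 1, as the difference of two one-sided Geometric(1 - 1/alpha) draws."""
    p = 1 - 1 / alpha
    u1, u2 = 1 - random.random(), 1 - random.random()   # in (0, 1]
    g1 = math.floor(math.log(u1) / math.log(1 - p))     # inverse-CDF sampling
    g2 = math.floor(math.log(u2) / math.log(1 - p))
    return g1 - g2

def geometric_mechanism(true_count, sensitivity, eps):
    """f(X) + Geom(alpha) with alpha = e^(eps/sensitivity) is eps-DP."""
    return true_count + geom_noise(math.exp(eps / sensitivity))
```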
Protocol overview
✦ In terms of utility, the
noise added to the aggregate has mean 0 and variance
✦ P is the number of
aggregators
✦ The protocol guarantees
ε-differential privacy even if all except for one aggregators are dishonest
P · Σ_{k∈ℤ} ((α − 1)/(α + 1)) α^{−|k|} k² = 2Pα/(α − 1)²
The noise increases with the number
of aggregators (each adds noise that
suffices to get ε-differential privacy).
On the other hand, this seems to be necessary to protect from malicious aggregators… we will see a more elegant and precise solution based on SMPC
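The closed form for the per-aggregator variance can be checked numerically by truncating the series (a quick sanity check, not part of the protocol):

```python
import math

def geom_second_moment(alpha, kmax=2000):
    """Truncated sum of k^2 * pmf(k) for the two-sided geometric."""
    c = (alpha - 1) / (alpha + 1)
    return sum(c * alpha ** (-abs(k)) * k * k
               for k in range(-kmax, kmax + 1))

alpha = math.exp(0.5)                       # e.g., eps = 0.5, sensitivity 1
closed_form = 2 * alpha / (alpha - 1) ** 2  # per aggregator; P of them give P times this
```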
Limitations of Existing Approaches
- Privacy vs utility tradeoff
- Lack of generality (and scalability)
- Inefficiency:
significant computational effort on user’s side
- Answer pollution:
single entity can pollute result by excessive noise
PrivaDA: Idea and Design
Secure Multi-Party Computation
✦ Computation parties
- Inputs are shared among
computation parties
- Computation parties jointly
compute differentially private statistics
- Required noise is generated in a
distributed fashion
- No party learns the individual
inputs
Our Contributions (PrivaDA)
- We leverage recent advances on SMPC for arithmetic
operations
- uses SMPC to compose user data
- uses SMPC to jointly compute the sanitization mechanism
- We support three sanitization mechanisms
- Lap, DLap, exponential mechanism; more are possible
- We employ β computation parties
- We employ zero-knowledge proofs
- First publicly available library for efficient arithmetic SMPC
operations in the malicious setting
Properties: strong privacy, optimal utility, efficiency, scalability, generality, malicious setting, no answer pollution
PrivaDA 101: Differentially Private Year of Birth
[Figure: users secret-share their years of birth; the shares sum to the true answer 1978, and with noise the released result is approximate, ≈ 1979]
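The idea of the example can be sketched with plain additive secret sharing (modulus and helper names are illustrative; the real protocol additionally generates noise in a distributed fashion and protects against malicious parties):

```python
import random

MOD = 2 ** 31

def share(secret, num_parties):
    """Additively secret-share an integer: random shares summing to the
    secret mod MOD; any proper subset of shares reveals nothing."""
    shares = [random.randrange(MOD) for _ in range(num_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def reconstruct(all_shares):
    return sum(all_shares) % MOD

# Three users share their birth years; each party sums its shares locally,
# so the aggregate is recovered without revealing any single input
years = [1978, 1980, 1975]
party_totals = [0, 0, 0]
for y in years:
    for i, s in enumerate(share(y, 3)):
        party_totals[i] = (party_totals[i] + s) % MOD
total = reconstruct(party_totals)
```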
SMPC for Distributed Sanitization Mechanisms
- We employ recent SMPC for arithmetic operations
- fixed-point numbers [Catrina & Saxena, FC’10]
- floating point numbers [Aliasgari et al., NDSS’13]
- integers [From & Jakobsen, 2006]
- Key SMPC primitives
- RandInt(k)
- IntAdd, FPAdd, FLAdd, FLMul, FLDiv
- FL2Int, Int2FL, FL2FP, FP2FL
- FLExp, FLLog, FLLT, FLRound
Algorithms for Sanitization Mechanisms
- We provide algorithms for Laplace, Discrete Laplace, and Exponential
- Trick: reduce the problem to random number generation
- Lap(λ) = Exp(1/λ) − Exp(1/λ) with Exp(λ) = −ln 𝒱(0,1] / λ
- DLap(λ) = Geo(1−λ) − Geo(1−λ) with Geo(λ) = ⌊Exp(−ln(1−λ))⌋
- Exp(ε/2) = draw r ∈ 𝒱(0,1] and check
r · Σ_{j=1}^{m} e^{εq(D,aⱼ)} ∈ ( Σ_{k=1}^{j−1} e^{εq(D,aₖ)}, Σ_{k=1}^{j} e^{εq(D,aₖ)} ]

(a) LM
In: d₁, . . . , d_n; λ = ∆f/ε
Out: (Σ_{i=1}^{n} dᵢ) + Lap(λ)
1: d = Σ_{i=1}^{n} dᵢ
2: r_x ← U(0,1]; r_y ← U(0,1]
3: r_z = λ(ln r_x − ln r_y)
4: w = d + r_z
5: return w

(b) DLM
In: d₁, . . . , d_n; λ = e^{−ε/∆f}
Out: (Σ_{i=1}^{n} dᵢ) + DLap(λ)
1: d = Σ_{i=1}^{n} dᵢ
2: r_x ← U(0,1]; r_y ← U(0,1]
3: α = 1/ln(1/λ) = ∆f/ε
4: r_z = ⌊α ln r_x⌋ − ⌊α ln r_y⌋
5: w = d + r_z
6: return w

(c) EM
In: d₁, . . . , d_n; a₁, . . . , a_m; λ = ε/2
Out: winning a_k
1: I₀ = 0
2: for j = 1 to m do
3: z_j = Σ_{i=1}^{n} dᵢ(j)
4: δ_j = e^{λ·z_j}
5: I_j = δ_j + I_{j−1}
6: r ← U(0,1]; r′ = r · I_m
7: k = binary_search(r′, I₀, . . . , I_m)
8: return a_k
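The three reductions can be sketched in plaintext Python (no secret sharing; names and the λ parameterization follow the slide, and the exponential mechanism uses the cumulative-sum selection of algorithm (c)):

```python
import math
import random

def exp_draw(lam):
    """Exp(lam): -ln U(0,1] / lam (inverse-CDF sampling)."""
    return -math.log(1 - random.random()) / lam

def lap_noise(scale):
    """Lap(scale) as Exp(1/scale) - Exp(1/scale)."""
    return exp_draw(1 / scale) - exp_draw(1 / scale)

def dlap_noise(lam):
    """DLap(lam), 0 < lam < 1: difference of two Geometric(1 - lam) draws,
    each sampled as floor(Exp(-ln lam))."""
    geo = lambda: math.floor(exp_draw(-math.log(lam)))
    return geo() - geo()

def exp_mechanism(scores, eps):
    """Pick index j with probability proportional to exp(eps * score / 2),
    i.e., lam = eps/2, via a uniform draw against the cumulative sums."""
    weights = [math.exp(eps * s / 2) for s in scores]
    r = random.random() * sum(weights)   # r' = r * I_m
    acc = 0.0
    for j, w in enumerate(weights):
        acc += w
        if r <= acc:
            return j
    return len(scores) - 1
```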
Protocol for Distributed Laplace Noise
- For β computation parties:
In: shared fixed-point (γ, f) inputs [d₁], . . . , [d_n]; λ = ∆f/ε
Out: w = (Σ_{i=1}^{n} dᵢ) + Lap(λ) in fixed-point form
1: [d] = [d₁]
2: for i = 2 to n do
3: [d] = FPAdd([d], [dᵢ])
4: [r_x] = RandInt(γ + 1); [r_y] = RandInt(γ + 1)
5: ⟨[v_x], [p_x], 0, 0⟩ = FP2FL([r_x], γ, f = γ, ℓ, k)
6: ⟨[v_y], [p_y], 0, 0⟩ = FP2FL([r_y], γ, f = γ, ℓ, k)
7: ⟨[v_{x/y}], [p_{x/y}], 0, 0⟩ = FLDiv(⟨[v_x], [p_x], 0, 0⟩, ⟨[v_y], [p_y], 0, 0⟩)
8: ⟨[v_ln], [p_ln], [z_ln], [s_ln]⟩ = FLLog2(⟨[v_{x/y}], [p_{x/y}], 0, 0⟩)
9: ⟨[v_z], [p_z], [z_z], [s_z]⟩ = FLMul(λ / log₂ e, ⟨[v_ln], [p_ln], [z_ln], [s_ln]⟩)
10: [z] = FL2FP(⟨[v_z], [p_z], [z_z], [s_z]⟩, ℓ, k, γ)
11: [w] = FPAdd([d], [z])
12: return w = Rec([w])
(computes the same quantity as the centralized LM algorithm)
Protocol for Distributed Discrete Laplace Noise
- For β computation parties:
In: shared integer (γ) inputs [d₁], . . . , [d_n]; λ = e^{−ε/∆f}; α = 1/(ln(1/λ) · log₂ e)
Out: integer w = (Σ_{i=1}^{n} dᵢ) + DLap(λ)
1: [d] = [d₁]
2: for i = 2 to n do
3: [d] = IntAdd([d], [dᵢ])
4: [r_x] = RandInt(γ + 1); [r_y] = RandInt(γ + 1)
5: ⟨[v_x], [p_x], 0, 0⟩ = FP2FL([r_x], γ, f = γ, ℓ, k)
6: ⟨[v_y], [p_y], 0, 0⟩ = FP2FL([r_y], γ, f = γ, ℓ, k)
7: ⟨[v_lnx], [p_lnx], [z_lnx], [s_lnx]⟩ = FLLog2(⟨[v_x], [p_x], 0, 0⟩)
8: ⟨[v_lny], [p_lny], [z_lny], [s_lny]⟩ = FLLog2(⟨[v_y], [p_y], 0, 0⟩)
9: ⟨[v_αlnx], [p_αlnx], [z_αlnx], [s_αlnx]⟩ = FLMul(α, ⟨[v_lnx], [p_lnx], [z_lnx], [s_lnx]⟩)
10: ⟨[v_αlny], [p_αlny], [z_αlny], [s_αlny]⟩ = FLMul(α, ⟨[v_lny], [p_lny], [z_lny], [s_lny]⟩)
11: ⟨[v_z1], [p_z1], [z_z1], [s_z1]⟩ = FLRound(⟨[v_αlnx], [p_αlnx], [z_αlnx], [s_αlnx]⟩, 0)
12: ⟨[v_z2], [p_z2], [z_z2], [s_z2]⟩ = FLRound(⟨[v_αlny], [p_αlny], [z_αlny], [s_αlny]⟩, 0)
13: [z₁] = FL2Int(⟨[v_z1], [p_z1], [z_z1], [s_z1]⟩, ℓ, k, γ)
14: [z₂] = FL2Int(⟨[v_z2], [p_z2], [z_z2], [s_z2]⟩, ℓ, k, γ)
15: [w] = IntAdd([d], IntAdd([z₁], [z₂]))
16: return w = Rec([w])
(similar to the centralized DLM algorithm)
Protocol for Distributed Exponential Mechanism
- For β computation parties:
In: [d₁], . . . , [d_n]; the number m of candidates; λ = ε/2
Out: m-bit w, s.t. the smallest i for which w(i) = 1 denotes the winning candidate aᵢ
1: I₀ = ⟨0, 0, 1, 0⟩
2: for j = 1 to m do
3: [z_j] = 0
4: for i = 1 to n do
5: [z_j] = IntAdd([z_j], [dᵢ(j)])
6: ⟨[v_zj], [p_zj], [z_zj], [s_zj]⟩ = Int2FL([z_j], γ, ℓ)
7: ⟨[v_z′j], [p_z′j], [z_z′j], [s_z′j]⟩ = FLMul(λ · log₂ e, ⟨[v_zj], [p_zj], [z_zj], [s_zj]⟩)
8: ⟨[v_j], [p_j], [z_j], [s_j]⟩ = FLExp2(⟨[v_z′j], [p_z′j], [z_z′j], [s_z′j]⟩)
9: ⟨[v_Ij], [p_Ij], [z_Ij], [s_Ij]⟩ = FLAdd(⟨[v_Ij−1], [p_Ij−1], [z_Ij−1], [s_Ij−1]⟩, ⟨[v_j], [p_j], [z_j], [s_j]⟩)
10: [r] = RandInt(γ + 1)
11: ⟨[v_r], [p_r], 0, 0⟩ = FP2FL([r], γ, f = γ, ℓ, k)
12: ⟨[v′_r], [p′_r], [z′_r], [s′_r]⟩ = FLMul(⟨[v_r], [p_r], 0, 0⟩, ⟨[v_Im], [p_Im], [z_Im], [s_Im]⟩)
13: j_min = 1; j_max = m
14: while j_min < j_max do
15: j_M = ⌊(j_min + j_max)/2⌋
16: if FLLT(⟨[v_IjM], [p_IjM], [z_IjM], [s_IjM]⟩, ⟨[v′_r], [p′_r], [z′_r], [s′_r]⟩) then
17: j_min = j_M + 1 else j_max = j_M
18: return w_{j_min}
(similar to the centralized EM algorithm)
Attacker Model and Privacy Guarantees
- We consider two settings:
- honest-but-curious (HbC) computation parties:
- we assume that less than t < β/2 of β parties collude
- malicious computation parties:
- we assume that less than t < β/2 of β parties collude
- we modify our SMPC such that correctness of each computation
step is proved by zero-knowledge proofs
Main results:
✦ The SMPC protocols for LM, DLM, and EM are differentially private in the honest-but-curious setting.
✦ The SMPC protocols for LM, DLM, and EM are differentially private in the malicious setting under the strong RSA and decisional Diffie-Hellman assumptions.
Performance of SMPC Operations (in sec)
Libraries: GMP, Relic, Boost, and OpenSSL Setup: 3.20 GHz (Intel i5) Linux machine with 16 GB RAM, using a 1 Gbps LAN
Type     Protocol   HbC                 Malicious
                    β=3,t=1   β=5,t=2   β=3,t=1   β=5,t=2
Float    FLAdd      0.48      0.76      14.6      29.2
         FLMul      0.22      0.28      3.35      7.54
         FLScMul    0.20      0.28      3.35      7.50
         FLDiv      0.54      0.64      4.58      10.2
         FLLT       0.16      0.23      2.82      6.22
         FLRound    0.64      0.85      11.4      23.4
Convert  FP2FL      0.83      1.21      25.7      50.9
         Int2FL     0.85      1.22      25.7      50.9
         FL2Int     1.35      1.91      26.3      54.3
         FL2FP      1.40      1.96      26.8      55.3
Log      FLLog2     12.0      17.0      274       566
Exp      FLExp2     7.12      9.66      120       265
Performance of LM, DLM and EM
- For β = 3 and t = 1 and number of users n = 100,000
- The HbC setting
- Distributed LM protocol: 15.5 sec
- Distributed DLM protocol: 31.3 sec
- Distributed EM protocol: 42.3 sec
(for number of candidates m = 5)
- The malicious setting
- Distributed LM protocol: 344 sec
Caveats with number representations
- Careful with finite representation of real numbers!
- E.g., the porosity of the floating-point representation breaks the Laplace mechanism
- In the above papers, solutions based on suitable
rounding and truncation mechanisms
- Can be easily integrated in our framework
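A toy illustration of such finite-representation issues (not the actual attack on the Laplace mechanism, just the underlying porosity):

```python
import math

# Floating-point numbers form a discrete grid, so even trivial sums fall
# off the grid, and noise computed as lam * ln(u) from finite-precision
# randomness can only land on a sparse set of output values.
assert 0.1 + 0.2 != 0.3                 # classic representation gap

# ln(u) over a 16-bit uniform grid yields at most 2**16 - 1 distinct
# noise values, however fine the real-valued Laplace density is:
outputs = {math.log(i / 2 ** 16) for i in range(1, 2 ** 16)}
```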
in sec
✦ Operations performed by computation parties ✦ No critical timing restrictions on DDP computations in most real-life scenarios ✦ Users simply forward their shared values to the computation parties (< 1 sec) Demonstrates practicality of PrivaDA (even on computationally limited devices, such as smartphones) Implementation and Performance