SLIDE 1

Differential Privacy (Part III)

SLIDE 2

Approximate (or (ε, δ))-differential privacy

  • Generalized definition of differential privacy allowing for a (supposedly small) additive factor
  • Used in a variety of applications

A query mechanism M is (ε, δ)-differentially private if, for any two adjacent databases D and D′ (differing in just one entry) and any C ⊆ range(M):

Pr(M(D) ∈ C) ≤ e^ε · Pr(M(D′) ∈ C) + δ

SLIDE 3

The Gaussian mechanism

For c² > 2 ln(1.25/δ), the Gaussian mechanism with parameter σ ≥ c·Δ₂(f)/ε is (ε, δ)-differentially private. Here the ℓ₂-sensitivity of f : ℕ^|X| → ℝ^k is defined as Δ₂(f) = max ‖f(x) − f(y)‖₂ over all x, y ∈ ℕ^|X| with ‖x − y‖₁ = 1.
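A minimal Python sketch of this mechanism (function and parameter names are ours, not from the slides; c is taken at its lower bound √(2 ln(1.25/δ))):

import math
import numpy as np

def gaussian_mechanism(f_value, l2_sensitivity, eps, delta):
    # sigma = c * Delta2(f) / eps, with c^2 = 2 ln(1.25/delta) taken at the boundary
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / eps
    noise = np.random.normal(0.0, sigma, size=np.shape(f_value))
    return np.asarray(f_value, dtype=float) + noise

# e.g., a 2-dimensional counting query with l2-sensitivity 1
print(gaussian_mechanism([42.0, 17.0], l2_sensitivity=1.0, eps=0.5, delta=1e-5))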

SLIDE 4

Sparse Vector Technique

✦ [Hardt-Rothblum, FOCS’10] study the problem of k, adaptively chosen, low-sensitivity queries where

  • only a very small number of these queries (say c) take values above a certain threshold T
  • the data analyst is only interested in such queries
  • useful to learn correlations, e.g., whether there is a dependency between smoking and cancer

✦ The data analyst could ask only the significant queries, but she does not know them in advance!

✦ Goal: answer only the significant queries, pay only for them, and ignore the others

SLIDE 5

Histograms and linear queries

✦ A histogram x ∈ ℝ^N represents a database (or a distribution) over a universe U of size |U| = N

  • databases have support of size n, whereas distributions do not necessarily have a small support

✦ We assume x is normalized so that Σ_{i∈U} xi = 1

✦ Here we focus on linear queries f : ℝ^N → [0, 1]

  • can be seen as the inner product ⟨x, f⟩ for f ∈ [0, 1]^N
  • counting queries (i.e., how many elements in the database fulfill a certain predicate) are a special case

✦ Example: U = {1, 2, 3}, D = [1, 2, 2, 3, 1]

  • x = (2, 2, 1), after normalization (2/5, 2/5, 1/5)
  • “how many entries ≤ 2” ⇒ f = (1, 1, 0)

✦ By normalization, linear queries have sensitivity 1/n
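The example can be checked in a few lines of Python (a throwaway sketch, names ours):

import numpy as np

U = [1, 2, 3]                        # universe
D = [1, 2, 2, 3, 1]                  # database, n = 5
x = np.array([D.count(u) for u in U], dtype=float)
x /= x.sum()                         # normalization: histogram entries sum to 1
f = np.array([1.0, 1.0, 0.0])        # predicate "entry <= 2"
print(x, np.dot(x, f))               # [0.4 0.4 0.2] 0.8  (= 4/5 of entries)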

SLIDE 6

SVT: algorithm

✦ Intuition: answer only those queries whose sanitized result is above the sanitized threshold

✦ We pay only for the c above-threshold queries

✦ We need to sanitize the threshold, otherwise the conditional branch would leak information
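A clear-text Python sketch of this intuition (names and noise calibration are ours, following the standard Sparse construction for sensitivity-1/n queries; the slides' own pseudocode did not survive):

import numpy as np

def sparse(queries, database, T, c, eps, n):
    """Report noisy answers only for queries above a noisy threshold; halt after c of them."""
    rng = np.random.default_rng()
    T_hat = T + rng.laplace(scale=2 * c / (eps * n))            # sanitized threshold
    answers, remaining = [], c
    for q in queries:
        val = q(database) + rng.laplace(scale=4 * c / (eps * n))  # sanitized result
        if val >= T_hat:
            answers.append(val)       # significant: report it
            remaining -= 1
            if remaining == 0:
                break                 # privacy budget for c queries spent
        else:
            answers.append(None)      # insignificant: report "bottom"
    return answers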

SLIDE 7

SVT: accuracy

  • α captures the distance between the sanitized result and the real result
  • β captures the error probability

We say Sparse is (α, β)-accurate for a sequence of k queries Q1, . . . , Qk if, except with probability at most β, the algorithm does not abort before Qk, for all reported ai ∈ ℝ: |ai − Qi(D)| ≤ α, and for all ai = ⊥: Qi(D) ≤ T + α

SLIDE 8

SVT: accuracy theorem

  • The larger β, the smaller α
  • The accuracy loss is logarithmic in the number of queries

For any sequence of k queries Q1, . . . , Qk such that L(T) = |{i : Qi(D) ≥ T − α}| ≤ c, Sparse(D, {Qi}, T, c) is (α, β)-accurate for:

α = 4c (log k + log(2/β)) / (εn)
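To get a feel for the bound, a quick numeric check (our illustrative numbers, not from the slides):

import math

k, c, beta, eps, n = 10_000, 20, 0.05, 0.5, 100_000
alpha = 4 * c * (math.log(k) + math.log(2 / beta)) / (eps * n)
print(alpha)   # ~0.021: even 10,000 queries cost only logarithmically in accuracy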

SLIDE 9

SVT: privacy theorem

  • So, what did we prove in the end?
  • You can estimate the actual answers and report only those in this range:

[Figure: number line marking the threshold T and T + α]

  • We can fish out insignificant queries almost “for free”, paying only logarithmically for them in terms of accuracy

The Sparse vector algorithm is ε-differentially private

SLIDE 10

SVT: approximate differential privacy

✦ Setting the noise scale to √(32c ln(1/δ)) / (εn), we get the following theorems:

The Sparse vector algorithm is (ε, δ)-differentially private

For any sequence of k queries Q1, . . . , Qk such that L(T) = |{i : Qi(D) ≥ T − α}| ≤ c, Sparse(D, {Qi}, T, c) is (α, β)-accurate for:

α = √(128c ln(1/δ)) (log k + log(2/β)) / (εn)

SLIDE 11

Limitations

✦ Differential privacy is a general-purpose privacy definition, originally conceived for databases and later applied to a variety of different settings

✦ At the moment, it is considered the state of the art

✦ Still, it is not the holy grail, and it is not immune from concerns, criticisms, and limitations

✦ It is typically accompanied by some over-claims

SLIDE 12

No free lunch in data privacy

✦ Privacy and utility cannot be provided without making assumptions about how data are generated (no free lunch theorem)

✦ Privacy means hiding the evidence of participation of an individual in the data-generating process

✦ If database rows are not independent, this is different from removing one row

  • Bob’s participation in a social network may cause new edges between pairs of his friends

✦ If there is group structure, differential privacy may not work very well...

SLIDE 13

No free lunch in data privacy (cont’d)

✦ This work disputes three popular over-claims

✦ “DP requires no assumptions on the data”

  • database rows must actually be independent,
  • otherwise removing one row does not suffice to remove the individual’s participation

✦ If rows are not independent, deciding how many entries should be removed, and which ones, is far from easy...

SLIDE 14

No free lunch in data privacy (cont’d)

✦ The attacker knows all entries of the database except for one, so “the more an attacker knows, the greater the privacy risks”

✦ Thus we should protect against the strongest attacker

✦ Careful! In DP, the more the attacker knows, the less noise we actually add

  • intuitively, this is due to the fact that we have less to hide

SLIDE 15

No free lunch in data privacy (cont’d)

✦ “DP is robust to arbitrary background knowledge”

✦ Actually, DP is robust when certain subsets of the tuples are known to the attacker

✦ Other types of background knowledge may instead be harmful

  • e.g., previous exact query answers

✦ DP composes well with itself, but not necessarily with other privacy definitions or release mechanisms

✦ One can get a new, more generic, DP privacy guarantee if, after releasing exact query answers, a set of tuples (not just one), called neighbours, is altered in a way that is still consistent with previously answered queries (plausible deniability)

SLIDE 16

Geo-indistinguishability

  • Goal: protect the user’s exact location, while allowing approximate information (typically needed to obtain a certain desired service) to be released
  • Idea: protect the user’s location within a radius r with a level of privacy that depends on r
  • This corresponds to a generalized version of the well-known concept of differential privacy

SLIDE 17

Pictorially…

  • Achieve ℓ-privacy within radius r
  • the provider cannot easily infer the user’s location within, say, the 7th arrondissement of Paris
  • the provider can infer with high probability that the user is located in Paris instead of, say, London

SLIDE 18

More formally…

  • Here K(x) denotes the distribution (of locations) generated by the mechanism K applied to location x
  • Achieved through a variant of the Laplace mechanism
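A sketch of the standard defining inequality from the geo-indistinguishability literature, consistent with the description above (d denotes geographic distance; the exact formula on the original slide is not reproduced here):

Pr(K(x) ∈ S) ≤ e^(ε·d(x,x′)) · Pr(K(x′) ∈ S)   for all locations x, x′ and all S ⊆ range(K)

Within any radius r this yields ℓ-privacy with ℓ = ε·r.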
SLIDE 19

Browser extension

SLIDE 20

Malicious aggregators

  • So far we focused on malicious analysts…
  • …but aggregators can be malicious (or at least curious) too!

[Figure: users send their values x1, …, xn to an aggregator, which releases f(x1, …, xn) to the analyst]

SLIDE 21

Existing approaches

  • Secure hardware (or trusted server)-based mechanisms
  • Fully distributed mechanisms with individual noise
SLIDE 22

Distributed Differential Privacy

How to compute differentially private queries in a distributed setting (attacker model, cryptographic protocols…)?

  • “What’s the average age of your self-help group?”
SLIDE 23

Smart-metering

✦ Fine-grained smart-metering has multiple uses:

  • time-of-use billing, providing energy advice, settlement, forecasting, demand response, and fraud detection

✦ USA: Energy Independence and Security Act of 2007

  • American Recovery and Reinvestment Act (2009, $4.5bn)

✦ EU: Directive 2009/72/EC

✦ UK: deployment of 47 million smart meters by 2020

✦ Remote reads: every 15-30 min

✦ Manual reads: one read every 3 months to 1 year

SLIDE 24

Smart-metering: privacy issues

✦ Meter readings are sensitive

  • Were you in last night?
  • You do like watching TV, don’t you?
  • Another ready meal in the microwave?
  • Has your boyfriend moved in?
SLIDE 25

Smart-metering: privacy issues (cont’d)

SLIDE 26

Privacy-friendly smart metering

✦ Goals:

  • precise billing of consumption while revealing no consumption information to third parties
  • privacy-friendly real-time aggregation

SLIDE 27

Protocol overview

✦ ri: answer from client i

✦ kij: key shared between client i and aggregator j

✦ t: label classifying the kind of reading

✦ wi: weight given to i’s answers

SLIDE 28

Protocol overview

✦ The geometric distribution Geom(α), with α > 1, is the discrete distribution with support ℤ and probability mass function

Pr(X = k) = ((α − 1)/(α + 1)) · α^(−|k|)

✦ It is the discrete counterpart of the Laplace distribution

Let f : D → ℤ be a function with sensitivity Δf. Then g = f(X) + Geom(e^(ε/Δf)) is ε-differentially private.
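A small Python sketch (names ours): a two-sided Geom(α) sample can be obtained as the difference of two one-sided geometric variables with success probability 1 − 1/α, which reproduces exactly the pmf above.

import numpy as np

rng = np.random.default_rng()

def two_sided_geometric(alpha):
    """Pr(X = k) = (alpha-1)/(alpha+1) * alpha**(-|k|), for alpha > 1."""
    p = 1.0 - 1.0 / alpha
    g1 = rng.geometric(p) - 1   # one-sided geometric on {0, 1, 2, ...}
    g2 = rng.geometric(p) - 1
    return g1 - g2              # the difference has the two-sided pmf

def geometric_mechanism(f_value, sensitivity, eps):
    alpha = np.exp(eps / sensitivity)   # alpha = e^(eps/Δf), per the theorem above
    return f_value + two_sided_geometric(alpha)

print(geometric_mechanism(100, sensitivity=1, eps=0.5))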

SLIDE 29

Protocol overview

✦ In terms of utility, the noise added to the aggregate has mean 0 and variance

P · Σ_{k∈ℤ} ((α − 1)/(α + 1)) · α^(−|k|) · k² = 2Pα / (α − 1)²

✦ P is the number of aggregators

✦ The protocol guarantees ε-differential privacy even if all but one of the aggregators are dishonest

The noise increases with the number of aggregators (each adds noise that suffices to get ε-differential privacy on its own). On the other hand, this seems to be necessary to protect against malicious aggregators… we will see a more elegant and precise solution based on SMPC.
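A quick numerical sanity check of the closed form (throwaway code, our numbers):

import numpy as np

alpha, P = 2.0, 5
k = np.arange(-1000, 1001)
pmf = (alpha - 1) / (alpha + 1) * alpha ** (-np.abs(k))
print(P * np.sum(pmf * k**2), 2 * P * alpha / (alpha - 1) ** 2)   # both ~20.0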

SLIDE 30

Limitations of Existing Approaches

  • Privacy vs utility tradeoff
  • Lack of generality (and scalability)
  • Inefficiency: significant computational effort on the user’s side
  • Answer pollution: a single entity can pollute the result with excessive noise
SLIDE 31

PrivaDA: Idea and Design

[Figure: users secret-share their inputs with the computation parties, which run secure multi-party computation]

  • Inputs are shared among computation parties
  • Computation parties jointly compute differentially private statistics
  • Required noise is generated in a distributed fashion
  • No party learns the individual inputs (a toy illustration of the sharing step follows below)
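A plain-Python illustration of additive secret sharing (our names; real systems use authenticated shares and secure channels): each user splits its value into shares modulo a prime, one per computation party, so that no single party learns anything, yet the shares of all users sum to shares of the aggregate.

import random

PRIME = 2**61 - 1   # field size for the toy example

def share(value, num_parties):
    """Split value into additive shares mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

beta = 3                        # number of computation parties
inputs = [34, 27, 51]           # three users' private values
per_party = [0] * beta
for x in inputs:
    for j, s in enumerate(share(x, beta)):
        per_party[j] = (per_party[j] + s) % PRIME   # each party adds locally

print(sum(per_party) % PRIME)   # 112: only the aggregate is reconstructed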

SLIDE 32

Our Contributions (PrivaDA)

  • We leverage recent advances on SMPC for arithmetic operations
  • uses SMPC to compose user data
  • uses SMPC to jointly compute the sanitization mechanism

  • We support three sanitization mechanisms
  • Lap, DLap, exponential mechanism; more are possible

  • We employ β computation parties

  • We employ zero-knowledge proofs

  • First publicly available library for efficient arithmetic SMPC operations in the malicious setting

This yields: strong privacy, optimal utility, efficiency, scalability, generality, security in the malicious setting, no answer pollution

SLIDE 33

PrivaDA 101: Differentially Private Year of Birth

[Figure: a user’s year of birth (1978) is split into random-looking shares; the computation parties add noise and reconstruct only an approximate value (1979)]

SLIDE 34

SMPC for Distributed Sanitization Mechanisms

  • We employ recent SMPC for arithmetic operations
  • fixed-point numbers [Catrina & Saxena, FC’10]
  • floating point numbers [Aliasgari et al., NDSS’13]
  • integers [From & Jakobsen, 2006]

  • Key SMPC primitives
  • RandInt(k)
  • IntAdd, FPAdd, FLAdd, FLMul, FLDiv
  • FL2Int, Int2FL, FL2FP, FP2FL
  • FLExp, FLLog, FLLT, FLRound
SLIDE 35

Algorithms for Sanitization Mechanisms

  • We provide algorithms for Laplace (LM), Discrete Laplace (DLM), and Exponential (EM)
  • Trick: reduce the problem to random number generation
  • Lap(λ) = Exp(1/λ) − Exp(1/λ) with Exp(λ) = −ln U(0,1] / λ
  • DLap(λ) = Geo(1−λ) − Geo(1−λ) with Geo(λ) = ⌊Exp(−ln(1−λ))⌋
  • EM with parameter ε/2: draw r ∈ U(0,1] and return the candidate aj for which

r · Σ_{j=1..m} e^(εq(D,aj)) ∈ ( Σ_{k=1..j−1} e^(εq(D,ak)), Σ_{k=1..j} e^(εq(D,ak)) ]

(a) LM
In: d1, . . . , dn; λ = Δf/ε
Out: (Σ_{i=1..n} di) + Lap(λ)
1: d = Σ_{i=1..n} di
2: rx ← U(0,1]; ry ← U(0,1]
3: rz = λ(ln rx − ln ry)
4: w = d + rz
5: return w

(b) DLM
In: d1, . . . , dn; λ = e^(−ε/Δf)
Out: (Σ_{i=1..n} di) + DLap(λ)
1: d = Σ_{i=1..n} di
2: rx ← U(0,1]; ry ← U(0,1]
3: α = −1/ln λ = Δf/ε
4: rz = ⌊α ln rx⌋ − ⌊α ln ry⌋
5: w = d + rz
6: return w

(c) EM
In: d1, . . . , dn; candidates a1, . . . , am; λ = ε/2
Out: winning ak
1: I0 = 0
2: for j = 1 to m do
3:   zj = Σ_{i=1..n} di(j)
4:   δj = e^(λ·zj)
5:   Ij = δj + Ij−1
6: r ← U(0,1]; r′ = r · Im
7: k = binary_search(r′, I0, . . . , Im)
8: return ak
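The two noise reductions, in clear-text Python (a sketch; in PrivaDA these steps run obliviously on secret-shared values):

import math, random

def lap(lmbda):
    # Lap(λ) = Exp(1/λ) − Exp(1/λ); matches algorithm (a): rz = λ(ln rx − ln ry)
    rx, ry = random.uniform(1e-12, 1.0), random.uniform(1e-12, 1.0)  # stand-in for U(0,1]
    return lmbda * (math.log(rx) - math.log(ry))

def dlap(lmbda):
    # DLap(λ) = Geo(1−λ) − Geo(1−λ); matches algorithm (b)
    alpha = -1.0 / math.log(lmbda)   # = Δf/ε when λ = e^(−ε/Δf)
    rx, ry = random.uniform(1e-12, 1.0), random.uniform(1e-12, 1.0)
    return math.floor(alpha * math.log(rx)) - math.floor(alpha * math.log(ry))

print(lap(2.0), dlap(math.exp(-0.5)))   # λ = Δf/ε = 2, and λ = e^(−ε/Δf) with ε/Δf = 0.5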

SLIDE 36

Protocol for Distributed Laplace Noise

✦ For β computation parties:

In: shared fixed-point (γ, f) inputs [d1], . . . , [dn]; λ = Δf/ε
Out: w = (Σ_{i=1..n} di) + Lap(λ) in fixed-point form
1: [d] = [d1]
2: for i = 2 to n do
3:   [d] = FPAdd([d], [di])
4: [rx] = RandInt(γ + 1); [ry] = RandInt(γ + 1)
5: ⟨[vx], [px], 0, 0⟩ = FP2FL([rx], γ, f = γ, ℓ, k)
6: ⟨[vy], [py], 0, 0⟩ = FP2FL([ry], γ, f = γ, ℓ, k)
7: ⟨[vx/y], [px/y], 0, 0⟩ = FLDiv(⟨[vx], [px], 0, 0⟩, ⟨[vy], [py], 0, 0⟩)
8: ⟨[vln], [pln], [zln], [sln]⟩ = FLLog2(⟨[vx/y], [px/y], 0, 0⟩)
9: ⟨[vz], [pz], [zz], [sz]⟩ = FLMul(λ/log2 e, ⟨[vln], [pln], [zln], [sln]⟩)
   (multiplying by λ/log2 e turns the base-2 logarithm into λ · ln(rx/ry))
10: [z] = FL2FP(⟨[vz], [pz], [zz], [sz]⟩, ℓ, k, γ)
11: [w] = FPAdd([d], [z])
12: return w = Rec([w])

(Compare with the centralized LM algorithm of the previous slide, which the protocol mirrors step by step.)
SLIDE 37

Protocol for Distributed Discrete Laplace Noise

✦ For β computation parties:

In: shared integer (γ) inputs [d1], . . . , [dn]; λ = e^(−ε/Δf); α = −1/(ln λ · log2 e)
Out: integer w = (Σ_{i=1..n} di) + DLap(λ)
1: [d] = [d1]
2: for i = 2 to n do
3:   [d] = IntAdd([d], [di])
4: [rx] = RandInt(γ + 1); [ry] = RandInt(γ + 1)
5: ⟨[vx], [px], 0, 0⟩ = FP2FL([rx], γ, f = γ, ℓ, k)
6: ⟨[vy], [py], 0, 0⟩ = FP2FL([ry], γ, f = γ, ℓ, k)
7: ⟨[vlnx], [plnx], [zlnx], [slnx]⟩ = FLLog2(⟨[vx], [px], 0, 0⟩)
8: ⟨[vlny], [plny], [zlny], [slny]⟩ = FLLog2(⟨[vy], [py], 0, 0⟩)
9: ⟨[vαlnx], [pαlnx], [zαlnx], [sαlnx]⟩ = FLMul(α, ⟨[vlnx], [plnx], [zlnx], [slnx]⟩)
10: ⟨[vαlny], [pαlny], [zαlny], [sαlny]⟩ = FLMul(α, ⟨[vlny], [plny], [zlny], [slny]⟩)
11: ⟨[vz1], [pz1], [zz1], [sz1]⟩ = FLRound(⟨[vαlnx], [pαlnx], [zαlnx], [sαlnx]⟩, 0)
12: ⟨[vz2], [pz2], [zz2], [sz2]⟩ = FLRound(⟨[vαlny], [pαlny], [zαlny], [sαlny]⟩, 0)
13: [z1] = FL2Int(⟨[vz1], [pz1], [zz1], [sz1]⟩, ℓ, k, γ)
14: [z2] = FL2Int(⟨[vz2], [pz2], [zz2], [sz2]⟩, ℓ, k, γ)
15: [w] = IntAdd([d], IntAdd([z1], [z2]))
16: return w = Rec([w])

(SIMILAR: the protocol mirrors the centralized DLM algorithm of slide 35 step by step.)

SLIDE 38

Protocol for Distributed Exponential Mechanism

✦ For β computation parties:

In: [d1], . . . , [dn]; the number m of candidates; λ = ε/2
Out: m-bit w, s.t. the smallest i for which w(i) = 1 denotes the winning candidate ai
1: I0 = ⟨0, 0, 1, 0⟩
2: for j = 1 to m do
3:   [zj] = 0
4:   for i = 1 to n do
5:     [zj] = IntAdd([zj], [di(j)])
6:   ⟨[vzj], [pzj], [zzj], [szj]⟩ = Int2FL([zj], γ, ℓ)
7:   ⟨[vz′j], [pz′j], [zz′j], [sz′j]⟩ = FLMul(λ · log2 e, ⟨[vzj], [pzj], [zzj], [szj]⟩)
8:   ⟨[vj], [pj], [zj], [sj]⟩ = FLExp2(⟨[vz′j], [pz′j], [zz′j], [sz′j]⟩)
     (steps 7-8 compute 2^(λ·zj·log2 e) = e^(λ·zj))
9:   ⟨[vIj], [pIj], [zIj], [sIj]⟩ = FLAdd(⟨[vIj−1], [pIj−1], [zIj−1], [sIj−1]⟩, ⟨[vj], [pj], [zj], [sj]⟩)
10: [r] = RandInt(γ + 1)
11: ⟨[vr], [pr], 0, 0⟩ = FP2FL([r], γ, f = γ, ℓ, k)
12: ⟨[v′r], [p′r], [z′r], [s′r]⟩ = FLMul(⟨[vr], [pr], 0, 0⟩, ⟨[vIm], [pIm], [zIm], [sIm]⟩)
13: jmin = 1; jmax = m
14: while jmin < jmax do
15:   jM = ⌊(jmin + jmax)/2⌋
16:   if FLLT(⟨[vIjM], [pIjM], [zIjM], [sIjM]⟩, ⟨[v′r], [p′r], [z′r], [s′r]⟩) then
17:     jmin = jM + 1 else jmax = jM
18: return wjmin

(SIMILAR: the protocol mirrors the centralized EM algorithm of slide 35, with the binary search implementing the inverse-CDF draw; see the sketch below.)
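In the clear, that binary search amounts to inverse-CDF sampling over the cumulative weights I1, . . . , Im; a compact Python sketch (our names):

import math, random
from bisect import bisect_left

def exponential_mechanism(scores, eps):
    """Return index j with probability proportional to exp((eps/2) * scores[j])."""
    weights = [math.exp(eps / 2 * s) for s in scores]
    cumulative, total = [], 0.0
    for w in weights:
        total += w
        cumulative.append(total)          # I_1, ..., I_m
    r = random.random() * total           # r' = r * I_m
    return bisect_left(cumulative, r)     # smallest j with I_j >= r' (0-based)

print(exponential_mechanism([0.1, 0.7, 0.2], eps=1.0))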

SLIDE 39

Attacker Model and Privacy Guarantees

  • We consider two settings:
  • honest-but-curious (HbC) computation parties:
  • we assume that fewer than t < β/2 of the β parties collude
  • malicious computation parties:
  • we assume that fewer than t < β/2 of the β parties collude
  • we modify our SMPC so that the correctness of each computation step is proved by zero-knowledge proofs

Main results:

✦ The SMPC protocols for LM, DLM, and EM are differentially private in the honest-but-curious setting.

✦ The SMPC protocols for LM, DLM, and EM are differentially private in the malicious setting under the strong RSA and decisional Diffie-Hellman assumptions.

SLIDE 40

Performance of SMPC Operations (in sec)

Libraries: GMP, Relic, Boost, and OpenSSL. Setup: 3.20 GHz (Intel i5) Linux machine with 16 GB RAM, using a 1 Gbps LAN.

Type      Protocol   HbC β=3,t=1   HbC β=5,t=2   Malicious β=3,t=1   Malicious β=5,t=2
Float     FLAdd      0.48          0.76          14.6                29.2
          FLMul      0.22          0.28          3.35                7.54
          FLScMul    0.20          0.28          3.35                7.50
          FLDiv      0.54          0.64          4.58                10.2
          FLLT       0.16          0.23          2.82                6.22
          FLRound    0.64          0.85          11.4                23.4
Convert   FP2FL      0.83          1.21          25.7                50.9
          Int2FL     0.85          1.22          25.7                50.9
          FL2Int     1.35          1.91          26.3                54.3
          FL2FP      1.40          1.96          26.8                55.3
Log       FLLog2     12.0          17.0          274                 566
Exp       FLExp2     7.12          9.66          120                 265

SLIDE 41

Performance of LM, DLM and EM

  • For β = 3, t = 1, and number of users n = 100,000

  • The HbC setting:
  • Distributed LM protocol: 15.5 sec
  • Distributed DLM protocol: 31.3 sec
  • Distributed EM protocol: 42.3 sec (for number of candidates m = 5)

  • The malicious setting:
  • Distributed LM protocol: 344 sec
SLIDE 42

Caveats with number representations

  • Careful with the finite representation of real numbers!
  • E.g., the porosity of the floating-point representation breaks the Laplace mechanism
  • In the above papers, solutions based on suitable rounding and truncation mechanisms are proposed
  • These can be easily integrated into our framework
SLIDE 43

Implementation and Performance

✦ Operations performed by computation parties (times in sec):

Type            Protocol   HbC β=3,t=1   HbC β=5,t=2   Malicious β=3,t=1   Malicious β=5,t=2
Float           FLAdd      0.55          0.99          24                  43
                FLMul      0.27          0.5           10                  18.1
                FLScMul    0.24          0.47          9.9                 18.1
                FLDiv      0.56          0.9           13                  24.7
                FLLT       0.18          0.31          7                   12.9
                FLRound    0.69          1.04          22.7                40.6
Conversion      FP2FL      0.88          1.40          42                  74
                Int2FL     0.88          1.32          42                  74
                FL2Int     1.49          2.19          53                  95
                FL2FP      1.50          2.21          54                  98
Logarithm       FLLog2     13.7          19.5          563                 1001
Exponentiation  FLExp2     8.9           12.1          336                 605

✦ No critical timing restrictions on DDP computations in most real-life scenarios

✦ Users simply forward their shared values to the computation parties (< 1 sec)

This demonstrates the practicality of PrivaDA, even on computationally limited devices such as smartphones.