Building complex DP algorithms using composition



  1. Building complex DP algorithms using composition. Privacy & Fairness in Data Science, CS848, Fall 2019

  2. Outline
     • Recap
       – Laplace Mechanism
     • Composition Theorems
     • Optimizing accuracy of DP algorithms
       – Utilizing Parallel Composition
       – Postprocessing & Inference
       – Strategy Selection
       – Data dependent noise

  3. Differential Privacy [Dwork ICALP 2006]
     For every pair of inputs $D_1$, $D_2$ that differ in one row, and for every output $O$, the adversary should not be able to distinguish between $D_1$ and $D_2$ based on $O$:
     $$\forall \Omega \in \mathrm{range}(A): \quad \ln \frac{\Pr[A(D_1) \in \Omega]}{\Pr[A(D_2) \in \Omega]} \le \varepsilon, \qquad \varepsilon > 0$$

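  4. Laplace mechanism
     [Diagram: the analyst sends an aggregate query $q$ (e.g., COUNT) to the private database $D$ and receives a noisy answer.]
     $$\tilde{q}(D) = q(D) + \mathrm{Lap}\!\left(\frac{S(q)}{\varepsilon}\right)$$
     where $S(q)$ is the sensitivity of $q$.
     [Figure: Laplace noise density, x-axis from -10 to 10.]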
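The Laplace mechanism on this slide can be sketched with the Python standard library alone. The helper names `laplace_noise` and `noisy_count` and the sample rows are illustrative assumptions, not part of the slides.

```python
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(rows, predicate, epsilon, rng):
    """Answer a COUNT query with the Laplace mechanism.

    Adding or removing one row changes a count by at most 1, so S(q) = 1.
    """
    sensitivity = 1.0
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [("M", 210), ("F", 190), ("F", 160), ("M", 180), ("M", 250)]  # hypothetical (sex, weight) rows
answer = noisy_count(rows, lambda r: r[0] == "M", epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives a larger noise scale, which is the privacy/accuracy trade-off the later slides try to improve.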

  5. Outline
     • Recap
       – Laplace Mechanism
     • Composition Theorems
     • Optimizing accuracy of DP algorithms
       – Utilizing Parallel Composition
       – Postprocessing & Inference
       – Strategy Selection
       – Data dependent noise

  6. Sequential Composition
     [Diagram: private database $D$; $M_1$ (with $\varepsilon_1$) outputs $M_1(D)$; $M_2$ (with $\varepsilon_2$) outputs $M_2(D, M_1(D))$; …]
     • If $M_1, M_2, \ldots, M_k$ are algorithms that access a private database $D$ such that each $M_i$ satisfies $\varepsilon_i$-differential privacy, then the combination of their outputs satisfies ε-differential privacy with $\varepsilon = \varepsilon_1 + \cdots + \varepsilon_k$.

  7. Parallel Composition
     [Diagram: the private database is split into $D_1, D_2, \ldots$; $M_1$ (with $\varepsilon_1$) outputs $M_1(D_1)$; $M_2$ (with $\varepsilon_2$) outputs $M_2(D_2)$; …]
     • If $M_1, M_2, \ldots, M_k$ are algorithms that access disjoint databases $D_1, D_2, \ldots, D_k$ such that each $M_i$ satisfies $\varepsilon_i$-differential privacy, then the combination of their outputs satisfies ε-differential privacy with $\varepsilon = \max(\varepsilon_1, \ldots, \varepsilon_k)$.
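The two composition rules amount to simple budget arithmetic. The sketch below uses hypothetical helper names (`sequential_epsilon`, `parallel_epsilon`) and made-up budgets purely to contrast them.

```python
def sequential_epsilon(epsilons):
    """Total privacy cost when every mechanism reads the same database."""
    return sum(epsilons)

def parallel_epsilon(epsilons):
    """Total privacy cost when the mechanisms read disjoint partitions."""
    return max(epsilons)

# Four mechanisms on the same database, each run with budget 0.25.
same_db_cost = sequential_epsilon([0.25, 0.25, 0.25, 0.25])
# Four mechanisms on disjoint partitions, each run with budget 1.0.
disjoint_cost = parallel_epsilon([1.0, 1.0, 1.0, 1.0])
print(same_db_cost, disjoint_cost)  # 1.0 1.0
```

Both setups cost a total of 1.0, but in the disjoint case each mechanism gets four times the budget and therefore adds less noise.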

  8. Postprocessing
     [Diagram: private database $D$; $M$ (with ε) outputs $M(D)$; $A$ outputs $A(M(D))$.]
     • If $M$ is an ε-differentially private algorithm, any additional post-processing $A \circ M$ also satisfies ε-differential privacy.

  9. Transformations & Stability
     [Diagram: private database $D$; transformation $V$ yields the transformed database $V(D)$; $M$ (with ε) outputs $M(V(D))$.]
     The transformation need not satisfy DP.
     • $\sigma_V$: stability of the transformation
       – Maximum number of rows in $V(D)$ that can change due to changing a single row in $D$

  10. Transformations & Stability
      [Diagram: private database $D$; transformation $V$ yields the transformed database $V(D)$; $M$ (with ε) outputs $M(V(D))$.]
      • Executing an ε-differentially private algorithm $M$ on a transformation of a database $V(D)$ satisfies $\varepsilon \cdot \sigma_V$-differential privacy.
      • $\sigma_V$: stability of the transformation
        – Maximum number of rows in $V(D)$ that can change due to changing a single row in $D$

  11. Transformations & Stability
      • $V_1$: For each row, $(x_1, x_2, x_3) \to (x_1, x_2 + x_3)$. Stability = 1
      • $V_2$: Each row in $D$ is a tweet (id, {words}). For each row in $D$, generate $k$ rows with the first $k$ words: $\{(\mathrm{id}, \mathrm{word}_1), \ldots, (\mathrm{id}, \mathrm{word}_k)\}$. Stability = $k$
      • $V_3$: Sample each row with probability $p$. Stability = 1, … but can prove $2p\varepsilon$-differential privacy*
      *Adam Smith, Differential Privacy and Secrecy of the Sample
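To make the stability values concrete, the sketch below applies the first two transformations to a pair of neighboring databases and counts how many output rows differ. The function names and sample data are hypothetical, and the check covers only one neighboring pair, whereas stability is the maximum over all such pairs.

```python
from collections import Counter

def v1(rows):
    """V1: map each row (x1, x2, x3) to (x1, x2 + x3)."""
    return [(x1, x2 + x3) for (x1, x2, x3) in rows]

def v2(tweets, k):
    """V2: map each tweet (id, words) to one row per word, first k words only."""
    return [(tid, word) for (tid, words) in tweets for word in words[:k]]

def rows_changed(a, b):
    """Number of rows in the multiset symmetric difference of a and b."""
    ca, cb = Counter(a), Counter(b)
    return sum(((ca - cb) + (cb - ca)).values())

numbers = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
tweets = [(1, ["a", "b", "c", "d"]), (2, ["e", "f", "g"]), (3, ["h"])]

# Neighboring databases: the second one drops the first row.
v1_change = rows_changed(v1(numbers), v1(numbers[1:]))
v2_change = rows_changed(v2(tweets, 3), v2(tweets[1:], 3))
print(v1_change, v2_change)  # 1 3
```

Removing one tweet removes up to k = 3 output rows, which is why running an ε-DP algorithm on the output of V2 costs kε with respect to the original database.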

  12. Outline
      • Recap
        – Laplace Mechanism
      • Composition Theorems
      • Optimizing accuracy of DP algorithms
        – Utilizing Parallel Composition
        – Postprocessing & Inference
        – Strategy Selection
        – Data dependent noise

  13. Problem
      Sex  Height  Weight
      M    6’2”    210
      F    5’3”    190
      F    5’9”    160
      M    5’3”    180
      M    6’7”    250
      Queries:
      • # Males with BMI < 25
      • # Males
      • # Females with BMI < 25
      • # Females
      • Design an ε-differentially private algorithm that can answer all these questions.
      • What is the total error?

  14. Algorithm 1
      Return:
      • (# Males with BMI < 25) + Lap(4/ε)
      • (# Males) + Lap(4/ε)
      • (# Females with BMI < 25) + Lap(4/ε)
      • (# Females) + Lap(4/ε)

  15. Privacy
      • BMI can be computed by transforming each row (s, h, w) → (s, bmi). This transformation has stability 1.
      • Sensitivity of count = 1, so each query is answered using an ε/4-DP algorithm.
      • By sequential composition, we get ε-DP.

  16. Utility
      Error: $\mathbb{E}\left[(\tilde{q}(D) - q(D))^2\right]$
      Total Error: $2\left(\frac{4}{\varepsilon}\right)^2 \times 4 = \frac{128}{\varepsilon^2}$

  17. Algorithm 2
      Compute:
      • $\tilde{q}_1$ = (# Males with BMI < 25) + Lap(1/ε)
      • $\tilde{q}_2$ = (# Males with BMI > 25) + Lap(1/ε)
      • $\tilde{q}_3$ = (# Females with BMI < 25) + Lap(1/ε)
      • $\tilde{q}_4$ = (# Females with BMI > 25) + Lap(1/ε)
      Return:
      • $\tilde{q}_1$, $\tilde{q}_1 + \tilde{q}_2$, $\tilde{q}_3$, $\tilde{q}_3 + \tilde{q}_4$

  18. Privacy
      • Sensitivity of count = 1, so each query is answered using an ε-DP algorithm.
      • $q_1, q_2, q_3, q_4$ are counts on disjoint portions of the database. Thus, by parallel composition, releasing $\tilde{q}_1, \tilde{q}_2, \tilde{q}_3, \tilde{q}_4$ satisfies ε-DP.
      • By the postprocessing theorem, releasing $\tilde{q}_1$, $\tilde{q}_1 + \tilde{q}_2$, $\tilde{q}_3$, $\tilde{q}_3 + \tilde{q}_4$ also satisfies ε-DP.

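  19. Utility
      Error: $\mathbb{E}\left[(\tilde{q}(D) - q(D))^2\right]$
      Total Error:
      $$\underbrace{2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_1} + \underbrace{2 \cdot 2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_1 + \tilde{q}_2} + \underbrace{2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_3} + \underbrace{2 \cdot 2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_3 + \tilde{q}_4} = \frac{12}{\varepsilon^2}$$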

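  20. Utility
      Tighter privacy analysis gives better accuracy for the same level of privacy.
      Total Error:
      $$\underbrace{2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_1} + \underbrace{2 \cdot 2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_1 + \tilde{q}_2} + \underbrace{2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_3} + \underbrace{2 \cdot 2\left(\tfrac{1}{\varepsilon}\right)^2}_{\tilde{q}_3 + \tilde{q}_4} = \frac{12}{\varepsilon^2}$$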
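The two total-error figures (128/ε² for Algorithm 1 and 12/ε² for Algorithm 2) can be checked empirically. The following is a rough simulation sketch under stated assumptions: the cell counts are made up, the function names are hypothetical, and ε is set to 1, so the estimates should land near 128 and 12.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def true_answers(cells):
    """The four queries, computed from the four disjoint cell counts."""
    m_low, m_high, f_low, f_high = cells
    return [m_low, m_low + m_high, f_low, f_low + f_high]

def algorithm_1(cells, epsilon, rng):
    """Answer the four queries directly, each with budget epsilon / 4."""
    return [t + laplace_noise(4.0 / epsilon, rng) for t in true_answers(cells)]

def algorithm_2(cells, epsilon, rng):
    """Add Lap(1/epsilon) to each disjoint cell, then post-process by summing."""
    n1, n2, n3, n4 = [c + laplace_noise(1.0 / epsilon, rng) for c in cells]
    return [n1, n1 + n2, n3, n3 + n4]

def total_squared_error(algorithm, cells, epsilon, trials, rng):
    """Average over trials of the summed squared error across the four answers."""
    truth = true_answers(cells)
    total = 0.0
    for _ in range(trials):
        answers = algorithm(cells, epsilon, rng)
        total += sum((a - t) ** 2 for a, t in zip(answers, truth))
    return total / trials

rng = random.Random(0)
cells = (0, 3, 1, 1)  # hypothetical counts: males low/high BMI, females low/high BMI
err_1 = total_squared_error(algorithm_1, cells, 1.0, 20000, rng)
err_2 = total_squared_error(algorithm_2, cells, 1.0, 20000, rng)
```

Both algorithms spend the same total ε; the gap comes only from accounting for the disjoint cells with parallel rather than sequential composition.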

  21. Generalized Sensitivity
      • Let $f: \mathcal{D} \to \mathbb{R}^d$ be a function that outputs a vector of $d$ real numbers. The sensitivity of $f$ is given by:
        $$S(f) = \max_{D, D' : |D \,\Delta\, D'| = 1} \|f(D) - f(D')\|_1$$
        where $\|\mathbf{x} - \mathbf{y}\|_1 = \sum_i |x_i - y_i|$.

  22. Generalized Sensitivity
      • $q_1$ = # Males with BMI < 25
      • $q_2$ = # Males with BMI > 25
      • $q$ = # Males
      • Let $f_1$ be a function that answers both $q_1$ and $q_2$
      • Let $f_2$ be a function that answers both $q_1$ and $q$
      • Sensitivity of $f_1$ = 1
      • Sensitivity of $f_2$ = 2
      • An alternate privacy proof for Alg 2 is to show that the generalized sensitivity of $(q_1, q_2, q_3, q_4)$ is 1.
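The difference between the two sensitivities can be seen by removing one row at a time and measuring the L1 change in the output vector. This is a sketch with hypothetical names and rows; it inspects the neighbors of a single database, so in general it only gives a lower bound on the sensitivity, though here it reaches the stated values.

```python
def f1(rows):
    """(# males with BMI < 25, # males with BMI > 25)."""
    males = [bmi for sex, bmi in rows if sex == "M"]
    return (sum(b < 25 for b in males), sum(b > 25 for b in males))

def f2(rows):
    """(# males with BMI < 25, # males)."""
    males = [bmi for sex, bmi in rows if sex == "M"]
    return (sum(b < 25 for b in males), len(males))

def max_l1_change(f, rows):
    """Largest L1 change in f when any single row is removed from rows."""
    full = f(rows)
    worst = 0
    for i in range(len(rows)):
        neighbor = f(rows[:i] + rows[i + 1:])
        worst = max(worst, sum(abs(a - b) for a, b in zip(full, neighbor)))
    return worst

rows = [("M", 22.0), ("M", 31.0), ("F", 24.0)]  # hypothetical (sex, bmi) rows
print(max_l1_change(f1, rows), max_l1_change(f2, rows))  # 1 2
```

A male with BMI below 25 is counted by both components of `f2` but by only one component of `f1`, which is exactly why disjoint queries are cheaper.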

  23. Outline
      • Recap
        – Laplace Mechanism
      • Composition Theorems
      • Optimizing accuracy of DP algorithms
        – Utilizing Parallel Composition
        – Postprocessing & Inference
        – Strategy Selection
        – Data dependent noise

  24. Improving utility of Alg 2
      Compute:
      • $\tilde{q}_1$ = (# Males with BMI < 25) + Lap(1/ε)
      • $\tilde{q}_2$ = (# Males with BMI > 25) + Lap(1/ε)
      Return:
      • $\tilde{q}_1$, $\tilde{q}_1 + \tilde{q}_2$
      We know $q_1 \le q_1 + q_2$, but $P[\tilde{q}_1 > \tilde{q}_1 + \tilde{q}_2] > 0$.

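  25. Constrained Inference
      [Diagram: the data owner holds private data $I$ behind a differentially private interface, and the analyst sits on the other side. $Q(I) = q$ is the true answer, $\tilde{q}$ the noisy answer, and constrained inference turns $\tilde{q}$ into $\bar{q}$. The flow is labeled Step 1, Step 2 and Step 3.]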

  26. Constrained Inference
      • Let $q_1, q_2, \ldots, q_k$ be a set of queries
      • Let $\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k$ be the noisy answers
      • Constraint $C(q_1, q_2, \ldots, q_k) = 1$ holds on true answers (for all typical databases), but need not hold on noisy answers.
      • Goal: Find $\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k$ that are:
        – Close to $\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k$
        – Satisfy the constraint $C(\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k)$

  27. Least Squares Optimization
      $$\min \sum_i (\bar{q}_i - \tilde{q}_i)^2 \quad \text{s.t.} \quad C(\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k)$$

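  28. Geometric Interpretation
      $$\min \sum_i (\bar{q}_i - \tilde{q}_i)^2 \quad \text{s.t.} \quad C(\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k)$$
      [Diagram: the true answers $\mathbf{q} = (q_1, q_2, \ldots, q_k)$ lie in the space of outputs satisfying the constraint; noise moves them to $\tilde{\mathbf{q}} = (\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k)$; projecting $\tilde{\mathbf{q}}$ onto that space gives $\bar{\mathbf{q}} = (\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k)$.]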

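  29. Geometric Interpretation
      $$\min \sum_i (\bar{q}_i - \tilde{q}_i)^2 \quad \text{s.t.} \quad C(\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k)$$
      [Diagram: the true answers $\mathbf{q} = (q_1, q_2, \ldots, q_k)$ lie in the space of outputs satisfying the constraint; noise moves them to $\tilde{\mathbf{q}} = (\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k)$; projecting $\tilde{\mathbf{q}}$ onto that space gives $\bar{\mathbf{q}} = (\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_k)$.]
      Theorem: $\|\mathbf{q} - \bar{\mathbf{q}}\|_2 \le \|\mathbf{q} - \tilde{\mathbf{q}}\|_2$ when the constraints form a convex space.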
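For a single ordering constraint between two answers, the least-squares projection has a closed form, which makes the theorem easy to check numerically. The sketch below is an illustration under assumptions: the true answers are made up, each answer is perturbed independently for simplicity, and `project_ordered_pair` is a hypothetical helper name.

```python
import math
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def project_ordered_pair(a, b):
    """Least-squares projection of (a, b) onto the convex set {(x, y): x <= y}."""
    if a <= b:
        return (a, b)
    mid = (a + b) / 2.0
    return (mid, mid)

def distance(p, q):
    """Euclidean (L2) distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

rng = random.Random(0)
truth = (2.0, 3.0)  # hypothetical true answers that satisfy the constraint
never_worse = True
for _ in range(10000):
    noisy = (truth[0] + laplace_noise(1.0, rng), truth[1] + laplace_noise(1.0, rng))
    projected = project_ordered_pair(*noisy)
    # Small tolerance guards against floating-point rounding.
    if distance(truth, projected) > distance(truth, noisy) + 1e-9:
        never_worse = False
```

Because the projection uses only the noisy answers, it is post-processing and costs no additional privacy budget.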

  30. Ordering Constraint
      Isotonic Regression:
      $$\min \sum_i (\bar{q}_i - \tilde{q}_i)^2 \quad \text{s.t.} \quad \bar{q}_1 \le \bar{q}_2 \le \cdots \le \bar{q}_k$$
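One standard way to solve this problem is the pool-adjacent-violators algorithm. The slides do not say which solver is intended, so treat the following as one possible sketch rather than the method used in the course; the input values are made up.

```python
def isotonic_regression(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every value in a block gets the block mean
    return fitted

noisy = [1.0, 3.0, 2.0, 4.0]  # hypothetical noisy answers that violate the ordering
print(isotonic_regression(noisy))  # [1.0, 2.5, 2.5, 4.0]
```

Out-of-order neighbors are replaced by their average, so the result is ordered and stays as close as possible to the noisy input in squared error.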

  31. Outline
      • Recap
        – Laplace Mechanism
      • Composition Theorems
      • Optimizing accuracy of DP algorithms
        – Utilizing Parallel Composition
        – Postprocessing & Inference
        – Strategy Selection
        – Data dependent noise

  32. Problem
      Sex  Height  Weight
      M    6’2”    210
      F    5’3”    190
      F    5’9”    160
      M    5’3”    180
      M    6’7”    250
      Queries:
      • # people with height in [5’1”, 6’2”]
      • # people with height in [2’0”, 4’0”]
      • # people with height in [3’3”, 7’0”]
      • …
      • Design an ε-differentially private algorithm that can answer all range queries.
      • What is the total error?

  33. Problem
      • Let $\{v_1, \ldots, v_k\}$ be the domain of an attribute
      • Let $\{x_1, \ldots, x_k\}$ be the number of rows with values $v_1, \ldots, v_k$
      • Range Query: $q_{ij} = x_i + x_{i+1} + \cdots + x_j$
      • Goal: Answer all range queries

  34. Strategy 1:
      • Answer all range queries using Laplace mechanism
      • Sensitivity: $O(k^2)$
      • Error per query: $O(k^4/\varepsilon^2)$, which sums to $O(k^6/\varepsilon^2)$ over all $O(k^2)$ range queries
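The sensitivity of the full range-query workload can be computed by brute force for small k: one row affects a single count, and that count appears in every range that covers its position. The function names below are hypothetical, and the total is taken as the sum of Laplace noise variances over all range queries.

```python
def range_workload_sensitivity(k):
    """L1 sensitivity of answering every range query q_ij, 1 <= i <= j <= k.

    Adding or removing one row changes one count x_t by 1, which changes
    every range that contains position t.
    """
    worst = 0
    for t in range(1, k + 1):
        containing = sum(
            1 for i in range(1, k + 1) for j in range(i, k + 1) if i <= t <= j
        )
        worst = max(worst, containing)
    return worst

def strategy_1_total_error(k, epsilon):
    """Sum of Laplace noise variances over all k(k+1)/2 range queries."""
    scale = range_workload_sensitivity(k) / epsilon
    num_queries = k * (k + 1) // 2
    return num_queries * 2.0 * scale ** 2

print(range_workload_sensitivity(4))   # 6
print(strategy_1_total_error(4, 1.0))  # 720.0
```

Position t is covered by t(k - t + 1) ranges, so the sensitivity grows quadratically in k and every query pays for it.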

  35. Strategy 2:
      • Estimate each individual $x_i$ using Laplace mechanism
      • Answer: $\tilde{q}_{ij} = \tilde{x}_i + \tilde{x}_{i+1} + \cdots + \tilde{x}_j$
      • Error in each $\tilde{x}_i$: $O(1/\varepsilon^2)$
      • Error in $\tilde{q}_{1k}$: $O(k/\varepsilon^2)$
      • Total Error: $O(k^3/\varepsilon^2)$
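Strategy 2 can be sketched as a noisy histogram followed by summation. The names and the sample histogram below are hypothetical; the sketch assumes the histogram has sensitivity 1 (one row changes one count by 1), so each count receives Lap(1/ε) noise.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_histogram(counts, epsilon, rng):
    """Add Lap(1/epsilon) noise to every count x_i."""
    return [x + laplace_noise(1.0 / epsilon, rng) for x in counts]

def answer_range(noisy_counts, i, j):
    """Answer q_ij (1-indexed, inclusive) by summing noisy counts."""
    return sum(noisy_counts[i - 1:j])

def strategy_2_total_error(k, epsilon):
    """Sum of noise variances over all ranges.

    A range of length m adds m independent noise terms, so its variance is
    2m / epsilon^2, and there are k - m + 1 ranges of that length.
    """
    per_count = 2.0 / epsilon ** 2
    return sum((k - m + 1) * m * per_count for m in range(1, k + 1))

rng = random.Random(0)
counts = [3, 0, 5, 2]  # hypothetical histogram x_1..x_4
noisy = noisy_histogram(counts, epsilon=1.0, rng=rng)
estimate = answer_range(noisy, 2, 4)  # noisy answer to q_24
print(strategy_2_total_error(4, 1.0))  # 40.0
```

Every range answer is post-processing of the same noisy histogram, so answering all ranges costs ε in total.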
