Post-processing outputs for better utility - CompSci 590.03 (PowerPoint PPT presentation)

  1. Post-processing outputs for better utility. CompSci 590.03, Instructor: Ashwin Machanavajjhala. (Lecture 10, 590.03, Fall '12)

  2. Announcement: the project proposal submission deadline is Fri, Oct 12, noon.

  3. Recap: Differential Privacy. For every pair of inputs D1 and D2 that differ in one value, and for every output O, the adversary should not be able to distinguish between D1 and D2 based on O: log( Pr[A(D1) = O] / Pr[A(D2) = O] ) < ε, for ε > 0.

  4. Recap: Laplacian Distribution. The researcher issues query q; the database returns the true answer plus noise, q(d) + η. Privacy depends on the parameter λ of the noise distribution h(η) ∝ exp(-|η|/λ), the Laplace distribution Lap(λ), with mean 0 and variance 2λ². [Figure: density of Lap(λ) plotted over the range -10 to 10.]

  5. Recap: Laplace Mechanism. Thm: If the sensitivity of the query is S, then λ = S/ε guarantees ε-differential privacy. Sensitivity: the smallest S(q) such that for any d, d' differing in one entry, ||q(d) - q(d')|| ≤ S(q). Histogram query: sensitivity = 2, so the variance/error on each entry is 2·4/ε² = O(1/ε²).
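The mechanism above admits a short sketch (function names are illustrative; the noise scale λ = S/ε follows the theorem):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Add Laplace noise with scale lambda = S / epsilon
    (a minimal sketch of the mechanism described above)."""
    lam = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=lam, size=np.shape(true_answer))
    return true_answer + noise

rng = np.random.default_rng(0)
hist = np.array([20.0, 10.0, 8.0, 5.0])   # histogram counts
# Histogram sensitivity = 2: one record moving changes two buckets by 1 each.
noisy = laplace_mechanism(hist, sensitivity=2.0, epsilon=1.0, rng=rng)
```

For large ε the scale λ shrinks and the noisy answer approaches the true one, which matches the O(1/ε²) variance bound on each entry.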

  6. This class: What is the optimal method to answer a batch of queries?

  7. How to answer a batch of queries? Database of values {x1, x2, ..., xk}. Query set: value of x1, answered as η1 = x1 + δ1; value of x2, answered as η2 = x2 + δ2; value of x1 + x2, answered as η3 = x1 + x2 + δ3. But we know that η1 and η2 should sum up to η3!
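A minimal numeric illustration of the inconsistency (toy values; the privacy-budget split across the three queries is ignored here):

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = 40.0, 25.0
eps = 1.0
# Each of the three queries gets an independent Lap(1/eps) noise draw,
# so the three noisy answers almost surely violate eta1 + eta2 = eta3.
eta1 = x1 + rng.laplace(scale=1.0 / eps)
eta2 = x2 + rng.laplace(scale=1.0 / eps)
eta3 = (x1 + x2) + rng.laplace(scale=1.0 / eps)
gap = abs((eta1 + eta2) - eta3)   # the constraint violation
```

The constrained-inference approach below repairs exactly this gap.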

  8. Two Approaches. Constrained inference: ensure that the returned answers are consistent with each other. Query strategy: answer a different set of strategy queries A, then answer the original queries using A (Universal Histograms, Wavelet Mechanism, Matrix Mechanism).

  9. Two Approaches (outline repeated; first up: constrained inference).

  10. Constrained Inference

  11. Constrained Inference. Let x1 and x2 be the original values. We observe noisy values η1, η2, and η3. We would like to reconstruct the best estimates y1 (for x1), y2 (for x2), and y3 (for x1 + x2) from the noisy values. That is, we want the values of y1, y2, y3 that minimize (y1 - η1)² + (y2 - η2)² + (y3 - η3)², subject to y1 + y2 = y3.
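This least-squares problem has a closed form: substituting y3 = y1 + y2 and setting the gradient to zero shifts each answer by a third of the constraint violation. A sketch, cross-checked against a generic least-squares solve (names are ours):

```python
import numpy as np

def consistent_estimates(eta1, eta2, eta3):
    """Solve  min (y1-eta1)^2 + (y2-eta2)^2 + (y3-eta3)^2  s.t. y1 + y2 = y3.
    Each answer absorbs one third of the observed constraint violation."""
    violation = eta3 - eta1 - eta2
    y1 = eta1 + violation / 3.0
    y2 = eta2 + violation / 3.0
    y3 = eta3 - violation / 3.0
    return y1, y2, y3
```

With η = (8, 5, 16) the violation is 3, giving the consistent answers (9, 6, 15).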

  12. Constrained Inference [Hay et al., VLDB 2010]

  13. Sorted Unattributed Histograms. Examples: counts of diseases (without associating a particular count with the corresponding disease); a degree sequence, i.e. the list of node degrees (without associating a degree with a particular node). Constraint: the values are sorted.

  14. Sorted Unattributed Histograms. True values: 20, 10, 8, 8, 8, 5, 3, 2. Noisy values: 25, 9, 13, 7, 10, 6, 3, 1 (noise from Lap(1/ε)). Note that the noisy values violate the sortedness constraint.
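The slides do not spell out the estimator here; the standard way to enforce sortedness under squared error is isotonic regression via pool-adjacent-violators, sketched below on the slide's noisy values (treat the choice of method as an assumption of this sketch):

```python
def pava_decreasing(vals):
    """Least-squares fit of a non-increasing sequence to `vals`
    (pool-adjacent-violators: merge neighbors that break the order)."""
    blocks = []  # list of (mean, count) blocks, non-increasing by mean
    for v in vals:
        blocks.append((float(v), 1))
        # pool while the newest block exceeds the one before it
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            c = c1 + c2
            blocks.append(((m1 * c1 + m2 * c2) / c, c))
    out = []
    for m, c in blocks:
        out.extend([m] * c)
    return out
```

On the slide's noisy values 25, 9, 13, 7, 10, 6, 3, 1 the out-of-order pairs (9, 13) and (7, 10) are averaged, yielding 25, 11, 11, 8.5, 8.5, 6, 3, 1.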

  15. Sorted Unattributed Histograms

  16. Sorted Unattributed Histograms, notation: n is the number of values in the histogram; d is the number of distinct values; n_i is the number of times the i-th distinct value appears.

  17. Two Approaches (outline repeated; next up: query strategy).

  18. Query Strategy. [Diagram] The private data I is queried with a strategy workload A under differential privacy, producing noisy strategy answers Ã(I); the noisy answers W̃(I) to the original query workload W are then derived from Ã(I).
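The diagram describes a generic linear-reconstruction pattern: perturb the strategy answers A(I), then map them linearly onto the workload (here via a pseudoinverse; the identity strategy and near-noiseless ε are illustrative choices, not the slide's):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.array([3.0, 7.0, 2.0, 5.0])          # private data I

A = np.eye(4)                                # strategy: individual counts
W = np.array([[1, 1, 0, 0],                  # workload: two range queries
              [0, 0, 1, 1]], dtype=float)

eps = 1e6                                    # large eps => tiny noise, for illustration
noisy_A = A @ x + rng.laplace(scale=1.0 / eps, size=4)   # noisy strategy answers
noisy_W = W @ np.linalg.pinv(A) @ noisy_A                # derived workload answers
```

The universal-histogram, wavelet, and matrix mechanisms differ only in which strategy matrix A they choose.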

  19. Range Queries. Given a set of values {x1, x2, ..., xn}, a range query is q(j,k) = xj + ... + xk. Q: Suppose we want to answer all range queries? Strategy 1: answer all range queries directly using the Laplace mechanism (error analysis on the next slide); the error may be reduced using constrained optimization.

  20. Range Queries. Given {x1, x2, ..., xn} and range queries q(j,k) = xj + ... + xk. Strategy 1: answer all range queries using the Laplace mechanism. Sensitivity = O(n²), since each value appears in O(n²) of the range queries, giving O(n⁴/ε²) total error across all range queries. May be reduced using constrained optimization.

  21. Range Queries. Strategy 2: answer the individual xi queries using the Laplace mechanism (sensitivity 1), then answer range queries by summing the noisy xi values. Error O(1/ε²) for each xi; Error(q(1,n)) = O(n/ε²); total error over all range queries: O(n³/ε²).
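Strategy 2 can be sketched directly (names are illustrative; a very large ε keeps the noise negligible so the answers stay near the true sums):

```python
import numpy as np

def noisy_counts(x, epsilon, rng):
    # Strategy 2: one independent Lap(1/epsilon) draw per value (sensitivity 1).
    return x + rng.laplace(scale=1.0 / epsilon, size=len(x))

def range_query(noisy, j, k):
    # q(j, k) = x_j + ... + x_k, answered from noisy values (1-indexed, inclusive).
    return noisy[j - 1:k].sum()

rng = np.random.default_rng(3)
x = np.arange(1.0, 9.0)                          # x1..x8 = 1..8
noisy = noisy_counts(x, epsilon=1e6, rng=rng)
```

Since a length-k range sums k independent noisy values, its variance grows as k · O(1/ε²), which is where the O(n/ε²) error for q(1,n) comes from.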

  22. Universal Histograms for Range Queries [Hay et al., VLDB 2010]. Strategy 3: answer a tree of sufficient statistics using the Laplace mechanism, then answer range queries from the noisy statistics. For 8 values, the binary tree has nodes x1-8; x1234, x5678; x12, x34, x56, x78; x1, x2, ..., x8.

  23. Universal Histograms for Range Queries. Sensitivity of the tree: log n (each xi contributes to one node per level), so each noisy node has variance 2 log²n/ε². q(2,6) = x2 + x3 + x4 + x5 + x6, answered from five leaves, has error 2 × 5 log²n/ε²; answered as x2 + x34 + x56 from the tree, it has error 2 × 3 log²n/ε².

  24. Universal Histograms for Range Queries. Every range query can be answered by summing at most O(log n) noisy node answers, so the maximum error on any range query is O(log³n/ε²), and the total error over all range queries is O(n² log³n/ε²).
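The "at most log n noisy answers" claim comes from covering any range with canonical (dyadic) tree intervals; a sketch for n = 8 (0-indexed; the helper name is ours):

```python
def dyadic_cover(lo, hi, node_lo=0, node_hi=7):
    """Canonical tree intervals (inclusive) covering [lo, hi]:
    recurse down the binary tree, keeping any node fully inside the range."""
    if hi < node_lo or node_hi < lo:
        return []                              # node disjoint from the range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]            # node fully covered: use it whole
    mid = (node_lo + node_hi) // 2
    return (dyadic_cover(lo, hi, node_lo, mid)
            + dyadic_cover(lo, hi, mid + 1, node_hi))
```

For the slide's q(2,6), i.e. the 0-indexed range [1, 5], the cover is [(1,1), (2,3), (4,5)], exactly the three nodes x2 + x34 + x56.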

  25. Universal Histograms & Constrained Inference [Hay et al., VLDB 2010]. The error can be reduced further by enforcing the consistency constraints, e.g. x1234 = x12 + x34 = x1 + x2 + x3 + x4, using a 2-pass algorithm that computes a consistent version of the counts.

  26. Universal Histograms & Constrained Inference [Hay et al., VLDB 2010]. Pass 1 (bottom up): combine each node's own noisy count with the sum of its children's estimates. Pass 2 (top down): adjust each node so that children sum exactly to their parent.
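A sketch of the two passes on a complete binary tree (the bottom-up weights follow the Hay et al. weighted-average form; treat the exact coefficients as an assumption of this sketch). Consistency of the output holds by construction of the top-down step:

```python
def build_tree(leaves):
    """Levels of a complete binary tree of sums: levels[0] is the root."""
    levels = [list(map(float, leaves))]
    while len(levels[0]) > 1:
        prev = levels[0]
        levels.insert(0, [prev[2 * i] + prev[2 * i + 1]
                          for i in range(len(prev) // 2)])
    return levels

def two_pass(tree):
    """Two-pass consistency over noisy per-node counts `tree` (levels of lists)."""
    L = len(tree) - 1                          # index of the leaf level
    # Pass 1 (bottom up): z mixes a node's own noisy count with its
    # children's estimates; weights from the Hay et al. form (assumed here).
    z = [list(level) for level in tree]
    for lvl in range(L - 1, -1, -1):
        h = L - lvl + 1                        # leaves have height 1
        w = (2**h - 2**(h - 1)) / (2**h - 1)   # weight on the node's own count
        for i in range(len(tree[lvl])):
            child_sum = z[lvl + 1][2 * i] + z[lvl + 1][2 * i + 1]
            z[lvl][i] = w * tree[lvl][i] + (1 - w) * child_sum
    # Pass 2 (top down): split the parent's residual equally between children,
    # so children sum exactly to their parent.
    hbar = [list(level) for level in z]
    for lvl in range(1, L + 1):
        for i in range(len(tree[lvl])):
            p = i // 2
            sib_sum = z[lvl][2 * p] + z[lvl][2 * p + 1]
            hbar[lvl][i] = z[lvl][i] + (hbar[lvl - 1][p] - sib_sum) / 2.0
    return hbar
```

An already-consistent (noiseless) tree is a fixed point of both passes, and the output always satisfies parent = sum of children exactly.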

  27. Universal Histograms & Constrained Inference. The resulting consistent counts have lower error than the noisy counts (up to 10 times smaller in some cases), are unbiased estimators, and have the least error among all unbiased estimators.

  28. Next Class. Query strategy, continued: answering a different set of strategy queries A and then answering the original queries using A, via the Wavelet Mechanism and the Matrix Mechanism.
