SLIDE 1

Post-processing outputs for better utility

CompSci 590.03 Instructor: Ashwin Machanavajjhala

1 Lecture 10 : 590.03 Fall 12

SLIDE 2

Announcement

  • Project proposal submission deadline is Fri, Oct 12 noon.

SLIDE 3

Recap: Differential Privacy

For every pair of inputs D1, D2 that differ in one value, and for every output O, the adversary should not be able to distinguish between D1 and D2 based on O:

$$\log \frac{\Pr[A(D_1) = O]}{\Pr[A(D_2) = O]} \le \varepsilon \qquad (\varepsilon > 0)$$
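A quick numerical check of this definition, assuming the Laplace mechanism described on the next slides (noise Lap(λ) with λ = S/ε): for two neighbouring inputs whose true answers differ by the sensitivity S, the log-ratio of the output densities never exceeds ε. The function names and the grid of outputs are illustrative choices, not part of the lecture.

```python
import math

def laplace_log_density(o: float, mu: float, lam: float) -> float:
    """log of the Lap(lam) density centred at mu, evaluated at output o."""
    return -abs(o - mu) / lam - math.log(2 * lam)

def max_log_ratio(q_d1: float, q_d2: float, sensitivity: float, eps: float, outputs) -> float:
    """Largest |log(Pr[A(D1)=O] / Pr[A(D2)=O])| over the given outputs O."""
    lam = sensitivity / eps  # Laplace scale chosen as lambda = S / eps
    return max(
        abs(laplace_log_density(o, q_d1, lam) - laplace_log_density(o, q_d2, lam))
        for o in outputs
    )

# True answers on D1 and D2 differ by the sensitivity S = 1; eps = 0.5
grid = [i / 2 for i in range(-40, 81)]
print(max_log_ratio(10.0, 11.0, 1.0, 0.5, grid))  # equals eps = 0.5 up to rounding
```

The ratio reaches exactly ε for outputs outside the interval between the two true answers, and is smaller in between.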

SLIDE 4

Recap: Laplacian Distribution

[Figure: density of the Laplace distribution Lap(λ).]

[Figure: a researcher sends query q to the database, which returns q(d) + η instead of the true answer q(d), where η is Laplace noise.]

h(η) ∝ exp(−|η| / λ)

Privacy depends on the λ parameter. Mean: 0, Variance: 2λ²
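A minimal sketch of the noise step in the diagram, using only Python's standard library. The standard `random` module has no Laplace sampler, so this assumes the identity that the difference of two independent exponentials with mean λ is distributed as Lap(λ); the function names are hypothetical. The empirical mean and variance should land near 0 and 2λ².

```python
import random

def sample_laplace(lam: float, rng: random.Random) -> float:
    """One draw from Lap(lam), built as the difference of two i.i.d. exponentials with mean lam."""
    return rng.expovariate(1 / lam) - rng.expovariate(1 / lam)

def noisy_answer(true_answer: float, lam: float, rng: random.Random) -> float:
    """What the researcher sees: q(d) + eta with eta ~ Lap(lam)."""
    return true_answer + sample_laplace(lam, rng)

rng = random.Random(0)
lam = 1.5
draws = [sample_laplace(lam, rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # expected to be close to 0 and 2 * lam**2 = 4.5
```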

SLIDE 5

Recap: Laplace Mechanism

Thm: If the sensitivity of the query is S, then adding Lap(λ) noise with the following λ guarantees ε-differential privacy.

λ = S/ε

Sensitivity: the smallest number S(q) such that for any d, d' differing in one entry, ||q(d) − q(d')||₁ ≤ S(q).

Histogram query: Sensitivity = 2

  • Variance / error on each entry = 2 × (2/ε)² = 8/ε² = O(1/ε²)
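A small sketch of the Laplace mechanism applied to a histogram query. It assumes, as on this slide, that neighbouring databases differ by changing the value of one entry, so one bin loses 1 and another gains 1 (L1 distance 2). The bin names, data, and function names are made up for illustration.

```python
import random
from collections import Counter

def laplace(lam, rng):
    """One draw from Lap(lam) as a difference of two exponentials with mean lam."""
    return rng.expovariate(1 / lam) - rng.expovariate(1 / lam)

def noisy_histogram(values, bins, eps, rng):
    """Laplace mechanism for a histogram query with sensitivity 2."""
    counts = Counter(values)
    lam = 2 / eps  # lambda = S / eps with S = 2
    return {b: counts[b] + laplace(lam, rng) for b in bins}

def l1_distance(h1, h2, bins):
    return sum(abs(h1[b] - h2[b]) for b in bins)

bins = ["flu", "cold", "cancer"]
d = ["flu", "flu", "cold"]
d_prime = ["flu", "cancer", "cold"]  # neighbour: one entry changed its value
print(l1_distance(Counter(d), Counter(d_prime), bins))  # 2: one bin loses 1, another gains 1
print(noisy_histogram(d, bins, eps=0.5, rng=random.Random(1)))
```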

SLIDE 6

This class

  • What is the optimal method to answer a batch of queries?

SLIDE 7

How to answer a batch of queries?

  • Database of values {x1, x2, …, xk}
  • Query Set:

– Value of x1: η1 = x1 + δ1
– Value of x2: η2 = x2 + δ2
– Value of x1 + x2: η3 = x1 + x2 + δ3

  • But we know that η1 and η2 should sum up to η3! With independent noise δ1, δ2, δ3, they generally do not.

SLIDE 8

Two Approaches

  • Constrained inference

– Ensure that the returned answers are consistent with each other.

  • Query Strategy

– Answer a different set of strategy queries A
– Answer original queries using A
– Universal Histograms
– Wavelet Mechanism
– Matrix Mechanism

SLIDE 9

Two Approaches

  • Constrained inference

– Ensure that the returned answers are consistent with each other.

  • Query Strategy

– Answer a different set of strategy queries A
– Answer original queries using A
– Universal Histograms
– Wavelet Mechanism
– Matrix Mechanism

SLIDE 10

Constrained Inference

SLIDE 11

Constrained Inference

  • Let x1 and x2 be the original values. We observe noisy values η1, η2 and η3 (for x1, x2 and x1 + x2).
  • We would like to reconstruct the best estimators y1 (for x1), y2 (for x2) and y3 (for x1 + x2) from the noisy values.
  • That is, we want to find the values of y1, y2, y3 such that:

$$\min_{y_1, y_2, y_3} \; (y_1 - \eta_1)^2 + (y_2 - \eta_2)^2 + (y_3 - \eta_3)^2 \quad \text{s.t.} \quad y_1 + y_2 = y_3$$
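Substituting y3 = y1 + y2 and setting the gradient to zero gives the linear system 2y1 + y2 = η1 + η3 and y1 + 2y2 = η2 + η3. Its solution spreads the inconsistency r = η3 − η1 − η2 equally over the three answers: y1 = η1 + r/3, y2 = η2 + r/3, y3 = η3 − r/3. A minimal sketch (the function name is ours):

```python
def consistent_estimates(eta1: float, eta2: float, eta3: float):
    """Least-squares y1, y2, y3 with y1 + y2 = y3, from noisy answers to x1, x2, x1 + x2."""
    # Setting the gradient of (y1-eta1)^2 + (y2-eta2)^2 + (y1+y2-eta3)^2 to zero gives
    #   2*y1 + y2 = eta1 + eta3  and  y1 + 2*y2 = eta2 + eta3,
    # whose solution spreads the inconsistency r equally over the three answers.
    r = eta3 - eta1 - eta2
    y1 = eta1 + r / 3
    y2 = eta2 + r / 3
    y3 = eta3 - r / 3
    return y1, y2, y3

print(consistent_estimates(3.0, 5.0, 11.0))  # (4.0, 6.0, 10.0)
```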

SLIDE 12

Constrained Inference [Hay et al. VLDB 2010]

SLIDE 13

Sorted Unattributed Histograms

  • Counts of diseases

– (without associating a particular count to the corresponding disease)

  • Degree sequence: List of node degrees

– (without associating a degree to a particular node)

  • Constraint: The values are sorted

SLIDE 14

Sorted Unattributed Histograms

True values: 20, 10, 8, 8, 8, 5, 3, 2
Noisy values: 25, 9, 13, 7, 10, 6, 3, 1 (noise from Lap(1/ε))
Proof: ?
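The noisy values above break the sortedness constraint (9 < 13, 7 < 10). Enforcing it can be phrased like the earlier least-squares problem: find the non-increasing sequence closest in squared error to the noisy values. A minimal sketch, assuming the pool-adjacent-violators algorithm (a standard method for this isotonic-regression problem) and non-increasing order as in the example; the function name is hypothetical.

```python
def isotonic_nonincreasing(noisy):
    """Closest non-increasing sequence to `noisy` in squared error (pool adjacent violators)."""
    blocks = []  # each block is [sum of pooled values, number of pooled values]
    for v in noisy:
        blocks.append([float(v), 1])
        # While the previous block's mean is below the newest block's mean the order is
        # violated: pool the two blocks and replace them by their common mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

print(isotonic_nonincreasing([25, 9, 13, 7, 10, 6, 3, 1]))
# [25.0, 11.0, 11.0, 8.5, 8.5, 6.0, 3.0, 1.0]
```

The repaired sequence is sorted and keeps the total of the noisy counts (74).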

SLIDE 15

Sorted Unattributed Histograms

SLIDE 16

Sorted Unattributed Histograms

  • n: number of values in the histogram
  • d: number of distinct values in the histogram
  • ni: number of times the i-th distinct value appears in the histogram.
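To make the three quantities concrete, here they are computed for the true values from the earlier example (20, 10, 8, 8, 8, 5, 3, 2). This is only an illustration; the function name is ours.

```python
from collections import Counter

def histogram_stats(values):
    """Return n, d and the multiplicities n_i (in order of first appearance)."""
    counts = Counter(values)  # distinct value -> number of occurrences
    n = len(values)
    d = len(counts)
    n_i = list(counts.values())
    return n, d, n_i

print(histogram_stats([20, 10, 8, 8, 8, 5, 3, 2]))  # (8, 6, [1, 1, 3, 1, 1, 1])
```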

SLIDE 17

Two Approaches

  • Constrained inference

– Ensure that the returned answers are consistent with each other.

  • Query Strategy

– Answer a different set of strategy queries A
– Answer original queries using A
– Universal Histograms
– Wavelet Mechanism
– Matrix Mechanism

SLIDE 18

Query Strategy

[Figure: query strategy pipeline. Private data I; original query workload W; strategy query workload A. The strategy queries A are answered on I under differential privacy, giving noisy strategy answers Ã(I), from which the noisy workload answers W̃(I) are computed.]
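A minimal sketch of this pipeline with NumPy. It assumes that the workload W and strategy A are given as matrices over a vector x of counts, that neighbouring databases change one count by 1, and that the workload answers are recovered by least squares through the pseudo-inverse of A (the reconstruction used by the Matrix Mechanism; other reconstructions are possible). The function name is hypothetical.

```python
import numpy as np

def answer_workload_via_strategy(W, A, x, eps, rng):
    """Answer workload W by privately answering strategy A, then reconstructing (sketch)."""
    W, A, x = (np.asarray(m, dtype=float) for m in (W, A, x))
    # If one count of x changes by 1, A @ x changes by one column of A, so the
    # L1 sensitivity is the largest column L1 norm.
    sensitivity = np.abs(A).sum(axis=0).max()
    noisy_strategy = A @ x + rng.laplace(0.0, sensitivity / eps, size=A.shape[0])
    # Least-squares estimate of x from the noisy strategy answers (A needs full column rank)
    x_hat = np.linalg.pinv(A) @ noisy_strategy
    return W @ x_hat

# Workload: x1, x2 and x1 + x2; strategy: answer only x1 and x2
W = [[1, 0], [0, 1], [1, 1]]
A = np.eye(2)
print(answer_workload_via_strategy(W, A, [7, 3], eps=1.0, rng=np.random.default_rng(0)))
```

Whatever strategy is used, the returned answers are W applied to a single estimate x̂, so they are automatically consistent. Choosing A = W reproduces the least-squares constrained inference for x1, x2, x1 + x2.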

SLIDE 19

Range Queries

  • Given a set of values {x1, x2, …, xn}
  • Range query: q(j,k) = xj + … + xk

Q: How should we answer all range queries?
Strategy 1: Answer all range queries using the Laplace mechanism

  • O(n²/ε²) total error.
  • May reduce using constrained optimization …

SLIDE 20

Range Queries

  • Given a set of values {x1, x2, …, xn}
  • Range query: q(j,k) = xj + … + xk

Q: How should we answer all range queries?
Strategy 1: Answer all range queries using the Laplace mechanism

  • Sensitivity = O(n²)
  • O(n⁴/ε²) error on each range query, O(n⁶/ε²) total error across all range queries.
  • May reduce using constrained optimization …
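Why the sensitivity is O(n²): x_i appears in every range q(j,k) with j ≤ i ≤ k, that is, in i(n − i + 1) ranges, which peaks near i = n/2 at about n²/4. A small sketch (function names are ours) that computes this count; with λ = S/ε, each Laplace-noised range query then has variance 2(S/ε)² = O(n⁴/ε²).

```python
def range_query_sensitivity(n: int) -> int:
    """Max number of range queries q(j,k), 1 <= j <= k <= n, that contain one fixed x_i."""
    # q(j,k) contains x_i iff j <= i <= k: i choices of j and n - i + 1 choices of k.
    return max(i * (n - i + 1) for i in range(1, n + 1))

def num_range_queries(n: int) -> int:
    """Number of pairs (j, k) with 1 <= j <= k <= n."""
    return n * (n + 1) // 2

for n in (8, 100):
    print(n, num_range_queries(n), range_query_sensitivity(n))
# 8 36 20
# 100 5050 2550
```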

SLIDE 21

Range Queries

  • Given a set of values {x1, x2, …, xn}
  • Range query: q(j,k) = xj + … + xk

Q: How should we answer all range queries?
Strategy 2: Answer all xi queries using the Laplace mechanism. Answer range queries using the noisy xi values.

  • O(1/ε²) error for each xi.
  • Error(q(1,n)) = O(n/ε²)
  • Total error on all range queries: O(n³/ε²)
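The O(n³/ε²) total can be checked exactly. Assuming each xi gets independent Lap(1/ε) noise (sensitivity 1 when one count changes by 1; with sensitivity 2 every variance is 4 times larger, which does not change the order), q(j,k) has variance (k − j + 1) · 2/ε², and summing over all ranges gives n(n+1)(n+2)/(3ε²). A small sketch; the function name is ours.

```python
def strategy2_total_variance(n: int, eps: float) -> float:
    """Sum of variances of all range queries built from independently noised x_i ~ Lap(1/eps)."""
    leaf_var = 2 / eps**2  # variance of Lap(1/eps) is 2 * (1/eps)^2
    # q(j,k) adds k - j + 1 independent noisy values, so its variance is (k - j + 1) * leaf_var
    return sum((k - j + 1) * leaf_var for j in range(1, n + 1) for k in range(j, n + 1))

n, eps = 8, 1.0
print(strategy2_total_variance(n, eps), n * (n + 1) * (n + 2) / (3 * eps**2))  # 240.0 240.0
```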

SLIDE 22

Universal Histograms for Range Queries

Strategy 3: Answer sufficient statistics using the Laplace mechanism. Answer range queries using the noisy sufficient statistics.

[Figure: binary tree of counts over x1, …, x8. Leaves: x1, x2, …, x8; next level: x12, x34, x56, x78; then x1234, x5678; root: x1-8.]

[Hay et al VLDB 2010]
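A minimal sketch of building the noisy tree, assuming n is a power of two, binary fan-out as in the figure, and neighbouring databases that change one leaf count by 1. The sensitivity it uses is the number of levels, log₂ n + 1, which matches the "log n" on the next slide up to the extra level. The function name is hypothetical.

```python
import random

def noisy_tree(x, eps, rng):
    """Build the binary tree of counts over x and add Laplace noise to every node.

    Returns (true_levels, noisy_levels); level 0 holds the leaves x1..xn, the last level the root.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "this sketch assumes len(x) is a power of two"
    levels = [[float(v) for v in x]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * j] + prev[2 * j + 1] for j in range(len(prev) // 2)])
    # Each x_i lies in exactly one node per level, so changing it by 1 moves
    # len(levels) = log2(n) + 1 counts: that is the L1 sensitivity.
    lam = len(levels) / eps
    noisy = [[c + rng.expovariate(1 / lam) - rng.expovariate(1 / lam) for c in level] for level in levels]
    return levels, noisy

true_levels, noisy_levels = noisy_tree([1, 2, 3, 4, 5, 6, 7, 8], eps=1.0, rng=random.Random(0))
print(true_levels[1:])  # [[3.0, 7.0, 11.0, 15.0], [10.0, 26.0], [36.0]]
```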

SLIDE 23

Universal Histograms for Range Queries

  • Sensitivity: log n
  • q(2,6) = x2 + x3 + x4 + x5 + x6 (sum of 5 noisy leaves): Error = 2 × 5 log²n/ε²
  • q(2,6) = x2 + x34 + x56 (sum of 3 noisy tree nodes): Error = 2 × 3 log²n/ε²

SLIDE 24

Universal Histograms for Range Queries

  • Every range query can be answered by summing at most 2 log n (i.e., O(log n)) different noisy answers
  • Maximum error on any range query = O(log³n / ε²)
  • Total error on all range queries = O(n² log³n / ε²)
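The recursion below finds the tree nodes whose noisy counts are summed for a range q(j,k). It is a sketch with 1-indexed positions and n a power of two; the function name is ours. For q(2,6) it recovers x2 + x34 + x56, and scanning every range shows the number of nodes staying within 2 log₂ n. Together with the O(log² n / ε²) variance per node, this gives the O(log³ n / ε²) bound.

```python
import math

def dyadic_cover(j, k, lo, hi):
    """Tree nodes (lo, hi), 1-indexed and inclusive, whose disjoint union is the range [j, k]."""
    if k < lo or hi < j:      # node does not overlap the query range
        return []
    if j <= lo and hi <= k:   # node lies inside the range: use its noisy count
        return [(lo, hi)]
    mid = (lo + hi) // 2      # otherwise recurse into the two children
    return dyadic_cover(j, k, lo, mid) + dyadic_cover(j, k, mid + 1, hi)

n = 8
print(dyadic_cover(2, 6, 1, n))  # [(2, 2), (3, 4), (5, 6)], i.e. x2 + x34 + x56
worst = max(len(dyadic_cover(j, k, 1, n)) for j in range(1, n + 1) for k in range(j, n + 1))
print(worst, 2 * math.log2(n))  # the worst case stays within 2 log n
```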

SLIDE 25

Universal Histograms & Constrained Inference

  • Can further reduce the error by enforcing constraints

x1234 = x12 + x34 = x1 + x2 + x3 + x4

  • 2-pass algorithm to compute a consistent version of the counts


[Hay et al VLDB 2010]

SLIDE 26

Universal Histograms & Constrained Inference

  • Pass 1: (Bottom Up)
  • Pass 2: (Top down)
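The slide names only the two passes. What follows is a minimal sketch of one way to realize them for a binary tree stored level by level, as in a universal histogram. The bottom-up weights (2^h − 2^(h−1))/(2^h − 1) and (2^(h−1) − 1)/(2^h − 1) are our reconstruction of inverse-variance weighting (with h = 2 they give 2/3 and 1/3). The top-down step splits each parent's remaining discrepancy equally between its two children. On the three-node example from the constrained-inference slide this reproduces the least-squares answer. Treat the general weights as an assumption to check against Hay et al. before relying on them; the function name is hypothetical.

```python
def consistent_tree(noisy):
    """Two-pass consistency for a binary tree of noisy counts (sketch).

    noisy[0] holds the leaf counts, noisy[-1] == [root]; len(noisy[0]) is a power of two.
    """
    # Pass 1 (bottom up): blend each node's own noisy count with the sum of its
    # children's estimates; the weights are inverse-variance weights for height h.
    z = [list(noisy[0])]
    for level in range(1, len(noisy)):
        h = level + 1  # leaves have height 1
        w_self = (2**h - 2**(h - 1)) / (2**h - 1)
        w_kids = (2**(h - 1) - 1) / (2**h - 1)
        below = z[-1]
        z.append([w_self * noisy[level][j] + w_kids * (below[2 * j] + below[2 * j + 1])
                  for j in range(len(noisy[level]))])
    # Pass 2 (top down): the root keeps its estimate; every other node gets half of
    # the gap between its parent's final value and the sum of the two siblings' estimates.
    hbar = [None] * len(noisy)
    hbar[-1] = list(z[-1])
    for level in range(len(noisy) - 2, -1, -1):
        row = []
        for j in range(len(noisy[level])):
            p = j // 2
            gap = hbar[level + 1][p] - (z[level][2 * p] + z[level][2 * p + 1])
            row.append(z[level][j] + gap / 2)
        hbar[level] = row
    return hbar

# Three-node tree from the constrained-inference example: eta1 = 3, eta2 = 5, eta3 = 11
print(consistent_tree([[3.0, 5.0], [11.0]]))  # approximately [[4.0, 6.0], [10.0]]
```

By construction every parent in the output equals the sum of its two children, which is exactly the consistency constraint x1234 = x12 + x34.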


[Hay et al VLDB 2010]

SLIDE 27

Universal Histograms & Constrained Inference

  • Resulting consistent counts

– Have lower error than noisy counts (up to 10 times smaller in some cases)
– Are unbiased estimators
– Have the least error amongst all linear unbiased estimators

SLIDE 28

Next Class

  • Constrained inference

– Ensure that the returned answers are consistent with each other.

  • Query Strategy

– Answer a different set of strategy queries A
– Answer original queries using A
– Universal Histograms
– Wavelet Mechanism
– Matrix Mechanism
