Wavelet and Matrix Mechanism

SLIDE 1

Wavelet and Matrix Mechanism

CompSci 590.03 Instructor: Ashwin Machanavajjhala

1 Lecture 11 : 590.03 Fall 12

SLIDE 2

Announcement

  • Project proposal submission deadline is Fri, Oct 12 noon.

SLIDE 3

Recap: Laplace Mechanism

Thm: If the sensitivity of the query is S, then adding Laplace noise with parameter λ guarantees ε-differential privacy when

$\lambda = S/\varepsilon$

Sensitivity: the smallest number $S(q)$ such that, for any d, d' differing in one entry, $\| q(d) - q(d') \|_1 \le S(q)$.

Histogram query: Sensitivity = 2

  • Variance / error on each entry = $2\lambda^2 = 2 \times 4/\varepsilon^2$
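The recap can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the lecture's own code: the names `sample_laplace` and `laplace_mechanism` are hypothetical, the toy histogram is made up, and Laplace noise is drawn as the difference of two exponential samples.

```python
import random

def sample_laplace(scale, rng=random):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_answers, sensitivity, epsilon, rng=random):
    """Add independent Laplace(S / epsilon) noise to every true answer."""
    scale = sensitivity / epsilon  # lambda = S / epsilon
    return [a + sample_laplace(scale, rng) for a in true_answers]

# Histogram query (sensitivity 2, as on the slide) over a made-up histogram.
histogram = [12, 7, 30, 5]
noisy = laplace_mechanism(histogram, sensitivity=2, epsilon=0.5)
```

Each noisy entry has variance $2\lambda^2$ with $\lambda = 2/0.5 = 4$ in this call, so smaller ε means visibly noisier counts.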

SLIDE 4

Laplace Mechanism is Suboptimal

  • Query 1: Number of cancer patients
  • Query 2: Number of cancer patients
  • If you answer both using the Laplace mechanism
      – Sensitivity = 2
      – Error in each answer: $2 \times 4/\varepsilon^2$
      – Average of the two answers gives an error of $4/\varepsilon^2$
  • If you just answer the first and return the same answer for the second
      – Sensitivity = 1
      – Error in the answer: $2/\varepsilon^2$
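The arithmetic behind this comparison can be checked directly. The sketch below assumes independent noise on the two answers and uses a hypothetical helper `laplace_variance`; it only evaluates the variance formula $2\lambda^2$ and does not sample any noise.

```python
def laplace_variance(sensitivity, epsilon):
    """Variance of Laplace noise with parameter lambda = S / epsilon, i.e. 2 * lambda^2."""
    lam = sensitivity / epsilon
    return 2 * lam ** 2

epsilon = 1.0

# Answer both (identical) queries: sensitivity 2, then average the two noisy answers.
per_answer = laplace_variance(2, epsilon)   # 2 * 4 / eps^2
averaged = (per_answer + per_answer) / 4    # Var((Y1 + Y2) / 2) for independent noise

# Answer once and reuse the answer: sensitivity 1.
single = laplace_variance(1, epsilon)       # 2 / eps^2

print(per_answer, averaged, single)
```

Even after averaging, answering the duplicated query twice leaves twice the variance of answering it once and reusing the result.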

SLIDE 5

Outline

  • Constrained inference
      – Ensure that the returned answers are consistent with each other.
  • Query Strategy
      – Answer a different set of strategy queries A
      – Answer original queries using A
      – Universal Histograms
      – Wavelet Mechanism [Xiao et al ICDE 09]
      – Matrix Mechanism [Li et al PODS 10]

SLIDE 6

Note

  • The following solution ideas are useful whenever
      – You want to answer a set of correlated queries.
      – Queries are based on noisy measurements.
      – Each measurement (x1 or x1+x2) has similar variance.

SLIDE 7

Range Queries

  • Given a set of values {v1, v2, …, vn}
  • Let xi = number of tuples with value vi.
  • Range query: q(j,k) = xj + … + xk

Q: Suppose we want to answer all range queries?

SLIDE 8

Range Queries

Q: Suppose we want to answer all range queries?

Strategy 1: Answer all range queries using the Laplace mechanism.

  • Sensitivity = $O(n^2)$
  • $O(n^4/\varepsilon^2)$ error on each range query, so $O(n^6/\varepsilon^2)$ total error across all $O(n^2)$ range queries.
  • May reduce using constrained optimization …

SLIDE 9

Range Queries

Q: Suppose we want to answer all range queries?

Strategy 2: Answer all xi queries using the Laplace mechanism. Answer range queries using the noisy xi values.

  • $O(1/\varepsilon^2)$ error for each xi.
  • Error(q(1,n)) = $O(n/\varepsilon^2)$
  • Total error on all range queries: $O(n^3/\varepsilon^2)$
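The cubic total can be seen by adding up the variance of every range query. The following sketch is an illustration under stated assumptions: each xi gets independent Laplace noise with the histogram sensitivity of 2, and the helper names `range_queries` and `total_variance_identity` are hypothetical.

```python
def range_queries(n):
    """All ranges (j, k) with 1 <= j <= k <= n."""
    return [(j, k) for j in range(1, n + 1) for k in range(j, n + 1)]

def total_variance_identity(n, epsilon, sensitivity=2):
    """Total variance over all range queries when each x_i gets independent Laplace noise."""
    per_count = 2 * (sensitivity / epsilon) ** 2  # variance of one noisy x_i
    # q(j, k) sums k - j + 1 noisy counts, so its variance is (k - j + 1) * per_count.
    return sum((k - j + 1) * per_count for j, k in range_queries(n))

n, epsilon = 8, 1.0
total = total_variance_identity(n, epsilon)
```

The sum of all range lengths is n(n+1)(n+2)/6, which is where the $O(n^3/\varepsilon^2)$ total comes from.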

SLIDE 10

Universal Histograms for Range Queries

Strategy 3: Answer sufficient statistics using the Laplace mechanism. Answer range queries using the noisy sufficient statistics.

[Figure: binary tree of counts over x1 … x8 [Hay et al VLDB 2010]]
  Leaves: x1 x2 x3 x4 x5 x6 x7 x8
  Level 2: x12 x34 x56 x78
  Level 3: x1234 x5678
  Root: x1-8

SLIDE 11

Universal Histograms for Range Queries

  • Sensitivity: log n
  • q(2,6) = x2 + x3 + x4 + x5 + x6, with Error = $2 \times 5 \log^2 n/\varepsilon^2$
  • q(2,6) = x2 + x34 + x56, with Error = $2 \times 3 \log^2 n/\varepsilon^2$

[Figure: the same tree of counts over x1 … x8]
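The decomposition of q(2,6) into x2 + x34 + x56 can be sketched as a recursive cover of the range by tree nodes. This is an illustrative sketch, not code from Hay et al.: the function names are hypothetical, n is assumed to be a power of two, and the per-node variance $2\log^2 n/\varepsilon^2$ is taken from the slide.

```python
import math

def dyadic_cover(lo, hi, node_lo, node_hi):
    """Tree nodes (as inclusive intervals) that exactly tile the range [lo, hi]."""
    if hi < node_lo or node_hi < lo:
        return []                                   # node lies outside the range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]                 # node lies fully inside: use it
    mid = (node_lo + node_hi) // 2                  # partial overlap: recurse on children
    return (dyadic_cover(lo, hi, node_lo, mid)
            + dyadic_cover(lo, hi, mid + 1, node_hi))

def range_query_variance(lo, hi, n, epsilon):
    """Variance of q(lo, hi) when each tree node has variance 2 * log2(n)^2 / eps^2."""
    k = len(dyadic_cover(lo, hi, 1, n))
    return 2 * k * math.log2(n) ** 2 / epsilon ** 2

cover = dyadic_cover(2, 6, 1, 8)   # nodes x2, x34, x56
```

Summing three tree nodes instead of five leaves is what reduces the error from $2 \times 5 \log^2 n/\varepsilon^2$ to $2 \times 3 \log^2 n/\varepsilon^2$ in this example.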

SLIDE 12

Universal Histograms for Range Queries

  • Every range query can be answered by summing at most log n different noisy answers
  • Maximum error on any range query = $O(\log^3 n/\varepsilon^2)$
  • Total error on all range queries = $O(n^2 \log^3 n/\varepsilon^2)$

SLIDE 13

Outline

  • Constrained inference
      – Ensure that the returned answers are consistent with each other.
  • Query Strategy
      – Answer a different set of strategy queries A
      – Answer original queries using A
      – Universal Histograms
      – Wavelet Mechanism
      – Matrix Mechanism

SLIDE 14

Wavelet Mechanism

Step 1: Compute wavelet coefficients C1, C2, C3, …, Cm from the counts x1, x2, x3, x4, x5, …, xn.

Step 2: Add noise to the coefficients: C1+η1, C2+η2, C3+η3, …, Cm+ηm.

Step 3: Reconstruct the original counts as y1, y2, y3, y4, y5, …, yn.
SLIDE 15

Haar Wavelet

SLIDE 16

Haar Wavelet

For an internal node,

  • Let a = average of leaves in left subtree
  • Let b = average of leaves in right subtree
  • The coefficient at the node is $c = (a - b)/2$; the base coefficient $c_0$ is the average of all leaves.

SLIDE 17

Haar Wavelet Reconstruction

Each xi is the sum of the coefficients on its root-to-leaf path, where each coefficient is taken with sign

  • + if xi is in the left subtree of the coefficient
  • - if xi is in the right subtree of the coefficient
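A small sketch of the decomposition and the signed-sum reconstruction follows. It assumes the usual Haar definition from Xiao et al. (coefficient (a − b)/2 at each internal node, plus a base coefficient c0 equal to the average of all leaves, which is always added with a + sign) and an input whose length is a power of two; the function names and the sample counts are illustrative only.

```python
def haar_decompose(x):
    """Return (c0, coeffs), where coeffs maps a node's leaf interval [lo, hi) to its coefficient."""
    n = len(x)
    coeffs = {}

    def build(lo, hi):
        if hi - lo == 1:
            return                                  # a leaf carries no coefficient
        mid = (lo + hi) // 2
        a = sum(x[lo:mid]) / (mid - lo)             # average of the left subtree
        b = sum(x[mid:hi]) / (hi - mid)             # average of the right subtree
        coeffs[(lo, hi)] = (a - b) / 2
        build(lo, mid)
        build(mid, hi)

    build(0, n)
    return sum(x) / n, coeffs

def haar_reconstruct(c0, coeffs, n):
    """Rebuild every x_i as c0 plus the signed coefficients on its root-to-leaf path."""
    out = []
    for i in range(n):
        lo, hi, value = 0, n, c0
        while hi - lo > 1:
            mid = (lo + hi) // 2
            c = coeffs[(lo, hi)]
            if i < mid:
                value, hi = value + c, mid          # leaf in left subtree: add
            else:
                value, lo = value - c, mid          # leaf in right subtree: subtract
        out.append(value)
    return out

x = [9, 3, 6, 2, 8, 4, 5, 7]
c0, coeffs = haar_decompose(x)
rebuilt = haar_reconstruct(c0, coeffs, len(x))
```

This version recomputes subtree sums and so runs in O(n log n); a bottom-up pass over pairwise averages would give the linear-time computation.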
SLIDE 18

Haar Wavelet : Range Queries

Range Query: number of tuples in a range S = [a,b]

  • Let α(c) be the number of values in the left subtree of c that are in S
  • Let β(c) be the number of values in the right subtree of c that are in S

SLIDE 19

Haar Wavelet : Range Queries

  • α(c) – β(c) = 0 when no leaves under c are contained in S
  • α(c) – β(c) = 0 when all leaves under c are contained in S
  • Only need to consider those coefficients with partial overlap with the range.
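Why only α(c) – β(c) matters can be seen by summing the reconstruction formula over the range. The derivation below is a sketch under the usual Haar assumptions: $c_0$ denotes the base coefficient (the average of all leaves), $\mathrm{path}(i)$ the internal nodes above leaf $i$, and $g_i(c)$ is $+1$ if $x_i$ lies in the left subtree of $c$ and $-1$ if it lies in the right subtree. These symbols are introduced here for illustration.

```latex
\begin{align*}
q(S) &= \sum_{i \in S} x_i
      = \sum_{i \in S} \Bigl( c_0 + \sum_{c \in \mathrm{path}(i)} g_i(c)\, c \Bigr) \\
     &= |S|\, c_0 + \sum_{c} \bigl( \alpha(c) - \beta(c) \bigr)\, c
\end{align*}
```

Each coefficient c is added once for every leaf of S in its left subtree and subtracted once for every leaf of S in its right subtree, so it drops out whenever the two counts are equal.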

SLIDE 20

Haar Wavelet

For an internal node,

  • Let a = average of leaves in left subtree
  • Let b = average of leaves in right subtree

SLIDE 21

Adding noise to wavelet coefficients

  • Associate each coefficient with a weight
  • level(c) = height of c in the tree.
  • Generalized sensitivity (ρ)

SLIDE 22

Adding noise to wavelet coefficients

Theorem: Adding noise to a coefficient c from Laplace(λ/W(c)) guarantees (2ρ/λ)-differential privacy. Proof:

SLIDE 23

Generalized Sensitivity of Wavelet Mechanism

Proof:

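  • Any coefficient changes by 1/m, where m is the number of values in its subtree.
  • m = W(c), so the weighted change W(c) · (1/m) of each affected coefficient is 1.
  • Only c0 and the coefficients on one root-to-leaf path change if some xi changes by 1.
  • Hence at most 1 + log n coefficients change, and ρ = 1 + log n.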
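The proof can be checked numerically on a small example. The sketch below assumes the Haar coefficient (a − b)/2, a base coefficient equal to the overall average, weights W(c) equal to the number of leaves under c, and logarithms base 2; the helper names and the sample counts are hypothetical. It also shows how noise from Laplace(λ/W(c)) could be added with λ = 2ρ/ε, which by the theorem above yields ε-differential privacy.

```python
import math
import random

def haar_with_weights(x):
    """Return [(coefficient, weight)], base coefficient first; weight = number of leaves below."""
    n = len(x)
    out = [(sum(x) / n, n)]                         # base coefficient c0 with W = n

    def build(lo, hi):
        if hi - lo == 1:
            return
        mid = (lo + hi) // 2
        a = sum(x[lo:mid]) / (mid - lo)
        b = sum(x[mid:hi]) / (hi - mid)
        out.append(((a - b) / 2, hi - lo))
        build(lo, mid)
        build(mid, hi)

    build(0, n)
    return out

def weighted_change(x, y):
    """Sum over coefficients of W(c) * |c(x) - c(y)|."""
    pairs = zip(haar_with_weights(x), haar_with_weights(y))
    return sum(w * abs(cx - cy) for (cx, w), (cy, _) in pairs)

def noisy_coefficients(x, epsilon, rng=random):
    """Add Laplace(lam / W(c)) noise to each coefficient, with lam = 2 * rho / epsilon."""
    rho = 1 + math.log2(len(x))
    lam = 2 * rho / epsilon
    noisy = []
    for c, w in haar_with_weights(x):
        scale = lam / w
        noisy.append(c + rng.expovariate(1 / scale) - rng.expovariate(1 / scale))
    return noisy

x = [9, 3, 6, 2, 8, 4, 5, 7]
y = [9, 3, 6, 3, 8, 4, 5, 7]                        # one count changed by 1
rho_observed = weighted_change(x, y)                # 1 + log2(8) for this pair
noisy = noisy_coefficients(x, epsilon=1.0)
```

Coefficients over large subtrees get proportionally less noise, which matches the fact that they change less when a single count changes.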

SLIDE 24

Error in answering range queries

  • Range query depends on at most O(log n) coefficients.
  • Error in each coefficient is at most $O(\log^2 n/\varepsilon^2)$
  • Error in a range query is $O(\log^3 n/\varepsilon^2)$

SLIDE 25

Summary of Wavelet Mechanism

  • Query Strategy: use wavelet coefficients
  • Can be computed in linear time
  • Noise in each range query: $O(\log^3 n/\varepsilon^2)$

SLIDE 26

Outline

  • Constrained inference
      – Ensure that the returned answers are consistent with each other.
  • Query Strategy
      – Answer a different set of strategy queries A
      – Answer original queries using A
      – Universal Histograms
      – Wavelet Mechanism
      – Matrix Mechanism

SLIDE 27

Linear Queries

  • A set of linear queries can be represented by a matrix
  • X = [x1, x2, x3, x4] is a vector representing the counts of 4 values
  • H4 X represents the following 7 queries
      – x1+x2+x3+x4
      – x1+x2
      – x3+x4
      – x1
      – x2
      – x3
      – x4
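Written out, H4 has one row per query and one column per count. The sketch below spells the matrix out as nested Python lists and applies it to a made-up count vector; `apply_queries` is a hypothetical helper name.

```python
# One row per query, one column per count x1..x4.
H4 = [
    [1, 1, 1, 1],   # x1+x2+x3+x4
    [1, 1, 0, 0],   # x1+x2
    [0, 0, 1, 1],   # x3+x4
    [1, 0, 0, 0],   # x1
    [0, 1, 0, 0],   # x2
    [0, 0, 1, 0],   # x3
    [0, 0, 0, 1],   # x4
]

def apply_queries(matrix, counts):
    """Evaluate every linear query (row) on the count vector."""
    return [sum(a * b for a, b in zip(row, counts)) for row in matrix]

X = [1, 2, 3, 4]                      # made-up counts
answers = apply_queries(H4, X)
```

For these counts the seven answers are 10, 3, 7, 1, 2, 3, 4, in the order the queries are listed.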

SLIDE 28

Query Matrices

[Figure: three example query matrices: Identity, Binary Index, Haar Wavelet]

SLIDE 29

Sensitivity of a Query Matrix

  • How many queries are affected by a change in a single count?

  • Identity: Sensitivity = 1
  • Binary Index: Sensitivity = 3
  • Haar Wavelet: Sensitivity = 3
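One way to compute these values is as the largest column L1 norm of the query matrix, i.e. the total amount by which the answers move when one count changes by 1. The sketch below assumes n = 4 and the standard forms of the three matrices (the originals were shown as images on the slides, so the entries here are an assumption); `matrix_sensitivity` is a hypothetical name.

```python
def matrix_sensitivity(matrix):
    """Largest column L1 norm: total change in the answers when one count changes by 1."""
    return max(sum(abs(v) for v in column) for column in zip(*matrix))

identity = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
binary_index = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
haar_wavelet = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, 0, 0],
    [0, 0, 1, -1],
]

sensitivities = [matrix_sensitivity(m) for m in (identity, binary_index, haar_wavelet)]
```

With 0/1 entries this is exactly the number of queries a single count participates in, which is the question the slide asks.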

SLIDE 30

Laplace Mechanism

[Equation with annotated terms: Sensitivity; Noise Vector of Laplace(1)]

SLIDE 31

Matrix Mechanism

Original Data → Noisy Representation → Reconstructed Data → Final query answer
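The four stages of this pipeline can be sketched in plain Python. This is an illustration under stated assumptions rather than the implementation from Li et al.: the strategy A is assumed to have full column rank, reconstruction is done by ordinary least squares via the normal equations, sensitivity is taken as the largest column L1 norm of A, and all function names are hypothetical.

```python
import random

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(M, rhs):
    """Solve the square system M z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def sample_laplace(scale):
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def matrix_mechanism(W, A, x, epsilon, noise_fn=sample_laplace):
    """Answer workload W on data x by way of noisy answers to strategy A."""
    sens = max(sum(abs(v) for v in col) for col in zip(*A))   # assumed sensitivity of A
    y = [v + noise_fn(sens / epsilon) for v in matvec(A, x)]  # noisy representation
    At = transpose(A)
    x_hat = solve(matmul(At, A), matvec(At, y))               # least-squares reconstruction
    return matvec(W, x_hat)                                   # final query answers

A = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
W = [[1, 1, 1, 1], [0, 1, 1, 0], [0, 0, 1, 1]]               # three example range queries
x = [1, 2, 3, 4]
answers = matrix_mechanism(W, A, x, epsilon=1.0)
```

Passing `noise_fn=lambda scale: 0.0` turns the noise off and returns the exact workload answers, which is a convenient sanity check on the reconstruction step.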

SLIDE 32

Reconstruction

SLIDE 33

Matrix Mechanism

SLIDE 34

Error analysis

SLIDE 35

Extreme strategies

  • Strategy A = $I_n$
      – Noisily answer each xi
      – Answer queries using noisy counts
      – Good when each query hits a few values.
  • Strategy A = W
      – Add noise to all the query answers
      – Good when sensitivity is small

SLIDE 36

Finding the Optimal Strategy

  • Find A that minimizes $\mathrm{TotalError}_A(W)$
      – Reduces to solving a semi-definite program with rank constraints
      – $O(n^6)$ running time.
  • See paper for approximations and an interesting discussion on geometry.

SLIDE 37

Summary

  • A linear query workload and strategy can be modeled using matrices
  • Previous techniques to find a better strategy to answer a batch of queries are subsumed by the matrix mechanism
  • General mechanism to answer queries.
  • Noise depends on the sensitivity of the strategy and $(A^T A)^{-1}$

SLIDE 38

Next Class

  • Sparse Vector Technique

– Answering a workload of “sparse” queries

SLIDE 39

References

  • X. Xiao, G. Wang, J. Gehrke, “Differential Privacy via Wavelet Transform”, ICDE 2009
  • C. Li, M. Hay, V. Rastogi, G. Miklau, A. McGregor, “Optimizing Linear Queries under Differential Privacy”, PODS 2010
