Differentially-Private Batch Query Answering
Exploiting the Workload vs. Exploiting the Data
Gerome Miklau
University of Massachusetts, Amherst
DIMACS Workshop on Recent Work on Differential Privacy across Computer Science • October 2012
needed: design the workload to include all queries possibly of interest.
differential privacy.
multi-dimensional range queries, marginals, data cubes, etc.
Figure: batch query answering. The analyst sends workload W = {w1, w2, w3} to the server, which holds the database.
Direct approach: apply the Laplace or Gaussian mechanism to the workload itself, returning w1(D) + noise, w2(D) + noise, w3(D) + noise.
Observation-based approach:
➊ Select Observations: choose observation queries A = {a1, a2, a3}.
➋ Apply standard mechanism (Laplace or Gaussian): release a1(D) + noise, a2(D) + noise, a3(D) + noise.
➌ Derive answers to workload queries: return noisy estimates of w1(D), w2(D), w3(D).
Workload | Observations | Citation
Fixed observations:
low-order marginals | Fourier basis queries | [Barak, PODS ’07]
all one-dim range queries | Hierarchical ranges | [Hay, PVLDB ’10]
all (multi-dim) range queries | Haar wavelet queries | [Xiao, ICDE ’10]
2-dim range queries | Quad-tree queries | [Cormode, ICDE ’12]
Optimized observations:
sets of data cubes | sets of data cubes | [Ding, SIGMOD ’11]
set of linear queries | set of linear queries | [Li, PODS ’10] [Li, PVLDB ’12]
set of linear queries | set of linear queries | [Yuan, VLDB ’12]
Figure: data-adaptive batch query answering. The analyst sends workload W = {w1, w2, w3} to the server, which holds the database.
➊ Test dataset: run a test T on the database, obtaining a noisy result T’.
➋ Select Observations: choose observation queries A = {a1, a2, a3}, informed by T’.
➌ Apply standard mechanism (Laplace or Gaussian): release a1(D) + noise, a2(D) + noise, a3(D) + noise.
➍ Derive workload answers: return noisy estimates of w1(D), w2(D), w3(D).
Workload | Observations | Citation
1D range queries | histogram | [Xu, ICDE ’12]
2D range queries | kd-tree queries | [Xiao, SDM ’10]
2D range queries | hybrid kd-tree queries | [Cormode, ICDE ’12]
Marginals | scaled workload queries | [Xiao, SIGMOD ’11]
Linear queries | subset of workload | [Hardt, NIPS ’12]
Relational database:
name | gender | grade
Alice | Female | 91
Bob | Male | 84
Carl | Male | 82
Dave | Male | 97
Edwina | Female | 88
Faith | Female | 78
Ghita | Female | 85
... | ... | ...
Frequency vector over {gender, grade}: one count per (gender, grade) cell, forming x = (x1, x2, ..., xn).
gender | grade | count
Male | 100 | 10
Male | 99 | 13
Male | 98 | 5
Male | 97 | 7
... | ... | ...
Female | 100 | 15
Female | 99 | 21
Female | 98 | 4
Female | 97 | 14
Female | 96 | 9
Frequency vector over {grade}, with grades grouped into ranges:
grade | count | cell
90-100 | 10 | x1
80-90 | 23 | x2
70-80 | 16 | x3
60-70 | 3 | x4
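A minimal Python sketch of turning the relation into a frequency vector over {grade}. It uses only the seven rows listed above, so its counts differ from the illustrative counts (10, 23, 16, 3); the bucket boundaries (half-open ranges, with 100 placed in the top range) and the helper names are assumptions for illustration.

```python
from collections import Counter

# The seven example rows (name, gender, grade) listed above.
ROWS = [
    ("Alice", "Female", 91), ("Bob", "Male", 84), ("Carl", "Male", 82),
    ("Dave", "Male", 97), ("Edwina", "Female", 88), ("Faith", "Female", 78),
    ("Ghita", "Female", 85),
]

# Hypothetical grade ranges for cells x1..x4: [low, high), except that 100 goes in x1.
BUCKETS = [(90, 100), (80, 90), (70, 80), (60, 70)]


def grade_cell(grade):
    """0-based index of the cell whose grade range contains `grade`."""
    for i, (low, high) in enumerate(BUCKETS):
        if low <= grade < high or grade == high == 100:
            return i
    raise ValueError(f"grade {grade} is outside every range")


def frequency_vector(rows):
    """Count the rows falling in each cell; the result is x = (x1, ..., x4)."""
    counts = Counter(grade_cell(grade) for _, _, grade in rows)
    return [counts[i] for i in range(len(BUCKETS))]


x = frequency_vector(ROWS)
print(x)  # [2, 4, 1, 0]
```

Every query in the rest of the talk is then a linear function of this vector rather than of the raw rows.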
excluded dimensions.
Figure: nested query classes: linear counting queries ⊇ predicate counting queries ⊇ k-dim ranges ⊇ 1-dim ranges; marginals also fall within predicate counting queries.
A randomized algorithm A provides (ε,δ)-differential privacy if, for all neighboring databases D and D’, and for any set of outputs S:
Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D’) ∈ S] + δ
(With δ = 0 this is ε-differential privacy.)
query answer.
and x’ will differ in one position, by exactly 1.
Figure: a 4 × 10 query matrix (nonzero entries 1 and -1) multiplies the frequency vector x = (x1, ..., x10) to produce the query answers y1, ..., y4; in the neighboring vector x’, one entry is incremented by 1.
The L1 sensitivity of a query matrix is the maximum L1 norm of its columns.
The L2 sensitivity of a query matrix is the maximum L2 norm of its columns.
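The two sensitivity definitions translate directly into code. Neighbors x and x’ differ by 1 in a single position j, so the answers differ by exactly column j of the matrix, which is why the column norms matter. This NumPy sketch (helper names are hypothetical) builds the all-range-queries workload used in the next example.

```python
import numpy as np


def all_range_queries(n):
    """All n(n+1)/2 range queries over n cells, widest first (w1 = range(x1, xn))."""
    rows = []
    for width in range(n, 0, -1):
        for start in range(n - width + 1):
            row = np.zeros(n)
            row[start:start + width] = 1.0
            rows.append(row)
    return np.array(rows)


def l1_sensitivity(Q):
    """Maximum L1 norm of the columns of query matrix Q."""
    return np.abs(Q).sum(axis=0).max()


def l2_sensitivity(Q):
    """Maximum L2 norm of the columns of query matrix Q."""
    return np.sqrt((Q ** 2).sum(axis=0)).max()


W = all_range_queries(4)
print(l1_sensitivity(W), l2_sensitivity(W))  # L1 = 6, L2 = sqrt(6)
```

For this workload the L1 sensitivity is the largest number of ranges covering one cell (x2 and x3 each lie in 6 of the 10 ranges).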
Example workload W: all range queries over four cells x1, ..., x4.
w1 = range(x1,x4) = x1 + x2 + x3 + x4
w2 = range(x1,x3) = x1 + x2 + x3
w3 = range(x2,x4) = x2 + x3 + x4
w4 = range(x1,x2) = x1 + x2
w5 = range(x2,x3) = x2 + x3
w6 = range(x3,x4) = x3 + x4
w7 = range(x1,x1) = x1
w8 = range(x2,x2) = x2
w9 = range(x3,x3) = x3
w10 = range(x4,x4) = x4
On x = (10, 23, 16, 3), the true answers w1, ..., w10 are 52, 49, 42, 33, 39, 19, 10, 23, 16, 3.
Laplace mechanism on the workload queries: adding independent Laplace noise b1, ..., b10 gives the private output w’1, ..., w’10 = 60.2, 44.6, 38.9, 39.6, 31.1, 21.4, 7.0, 18.1, 22.7, 7.6.
Σ = 55.4: the noisy single-cell answers w’7 + w’8 + w’9 + w’10 sum to 55.4, which disagrees with the noisy total w’1 = 60.2.
 | n = 4 | general n
Sensitivity ||W||1 | 6 | O(n²)
Error per query, 2(||W||1/ε)² | 72/ε² | O(n⁴)/ε²
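A short NumPy sketch of the Laplace mechanism applied directly to this workload, assuming the data x = (10, 23, 16, 3) from the example. It reproduces the true answers, the sensitivity of 6, and the per-query error 2(||W||1/ε)² = 72/ε² (evaluated at ε = 1). Function names are illustrative.

```python
import numpy as np


def all_range_queries(n):
    """All range queries over n cells, in the order w1, ..., wm used above."""
    rows = []
    for width in range(n, 0, -1):
        for start in range(n - width + 1):
            row = np.zeros(n)
            row[start:start + width] = 1.0
            rows.append(row)
    return np.array(rows)


def laplace_on_workload(W, x, eps, rng):
    """Answer each workload query directly: W x + Lap(||W||_1 / eps) noise per query."""
    sensitivity = np.abs(W).sum(axis=0).max()
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps, size=W.shape[0])
    return W @ x + noise


def per_query_error(W, eps):
    """Variance of each noisy answer; Lap(s) has variance 2 s^2."""
    sensitivity = np.abs(W).sum(axis=0).max()
    return 2 * (sensitivity / eps) ** 2


W = all_range_queries(4)
x = np.array([10.0, 23.0, 16.0, 3.0])
print((W @ x).tolist())          # [52.0, 49.0, 42.0, 33.0, 39.0, 19.0, 10.0, 23.0, 16.0, 3.0]
print(per_query_error(W, 1.0))   # 72.0
noisy = laplace_on_workload(W, x, eps=1.0, rng=np.random.default_rng(0))
```

Because every query receives independent noise, nothing forces the noisy answers to agree with one another, which is exactly the inconsistency the Σ = 55.4 example exhibits.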
Observation: use the Laplace mechanism to get noisy estimates for each xi. Queries submitted: x1, x2, x3, x4; adding Laplace noise b1, ..., b4 gives the private output z1, ..., z4.
Derived workload answers:
w’1 = z1 + z2 + z3 + z4
w’2 = z1 + z2 + z3
w’3 = z2 + z3 + z4
w’4 = z1 + z2
w’5 = z2 + z3
w’6 = z3 + z4
w’7, w’8, w’9, w’10 = z1, z2, z3, z4
Error per query ranges from 2/ε² (single cells) up to 8/ε² (w’1, the sum of four noisy cells).
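The identity (noisy counts) strategy in the same style: the identity matrix has sensitivity 1, so each cell gets Lap(1/ε) noise, and each derived answer sums the noisy cells it covers, giving variance 2/ε² times the number of cells. A sketch assuming ε = 1 and the four-cell range workload; the helper names are hypothetical.

```python
import numpy as np


def all_range_queries(n):
    """All range queries over n cells, widest first."""
    rows = []
    for width in range(n, 0, -1):
        for start in range(n - width + 1):
            row = np.zeros(n)
            row[start:start + width] = 1.0
            rows.append(row)
    return np.array(rows)


def identity_strategy(W, x, eps, rng):
    """Noisy counts: z_i = x_i + Lap(1/eps), then derive workload answers as W z."""
    z = x + rng.laplace(loc=0.0, scale=1.0 / eps, size=len(x))
    return W @ z


def identity_error(W, eps):
    """Per-query variance: squared L2 norm of each query row times 2/eps^2."""
    return (W ** 2).sum(axis=1) * 2.0 / eps ** 2


W = all_range_queries(4)
print(identity_error(W, 1.0).tolist())  # [8.0, 6.0, 6.0, 4.0, 4.0, 4.0, 2.0, 2.0, 2.0, 2.0]
answers = identity_strategy(W, np.array([10.0, 23.0, 16.0, 3.0]), 1.0, np.random.default_rng(0))
```

Since every answer is derived from the same z, the output is automatically consistent: w’1 always equals w’7 + w’8 + w’9 + w’10.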
Observation: hierarchical queries H [Hay, PVLDB ’10]: recursively partition the domain, computing sums of each interval. Sensitivity ||H||1 = log n + 1.
Queries submitted: x1 + x2 + x3 + x4, x1 + x2, x3 + x4, x1, x2, x3, x4; adding Laplace noise b1, ..., b7 gives the private output z1, ..., z7.
Derived workload answers w’1, ..., w’10: for example, range(x2,x3) = x2 + x3 can be derived as z5 + z6, as z1 - z4 - z7, or as z2 - z4 + z6.
Least-squares estimate: (6z1 + 3z2 + 3z3 - 9z4 + 12z5 + 12z6 - 9z7)/21
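A NumPy check of the least-squares estimate above for n = 4, with rows of H ordered z1, ..., z7 as listed. All seven outputs carry noise of the same scale (||H||1/ε), so ordinary least squares applies and the estimator's variance is proportional to the squared norm of its coefficients. The variable names are illustrative.

```python
import numpy as np

# Hierarchical observations for n = 4; each row is the query answered by one z.
H = np.array([
    [1, 1, 1, 1],   # z1: x1 + x2 + x3 + x4
    [1, 1, 0, 0],   # z2: x1 + x2
    [0, 0, 1, 1],   # z3: x3 + x4
    [1, 0, 0, 0],   # z4: x1
    [0, 1, 0, 0],   # z5: x2
    [0, 0, 1, 0],   # z6: x3
    [0, 0, 0, 1],   # z7: x4
], dtype=float)

w5 = np.array([0.0, 1.0, 1.0, 0.0])  # range(x2, x3)

# Sensitivity ||H||_1 = log2(4) + 1 = 3: each cell lies in one interval per level.
sensitivity = np.abs(H).sum(axis=0).max()

# Least-squares coefficients on (z1, ..., z7): c = w5 (H^T H)^{-1} H^T = w5 H^+.
c = w5 @ np.linalg.pinv(H)
print(np.round(21 * c, 6))  # coefficients 6, 3, 3, -9, 12, 12, -9 (over 21)

# Relative variance: ||c||^2 = 8/7 ≈ 1.143, versus 2 for the simple estimate z5 + z6.
print(c @ c)
```

The least-squares combination uses all seven noisy outputs and so beats any single derivation such as z5 + z6.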
Figure: mean squared error vs. query width (as a fraction of the domain) for Noisy counts and Hierarchical (2), under ε-differential privacy with ε = 0.1 and n = 1024; small ranges on the left, big ranges on the right.
Observation: Haar wavelet Y [Xiao, ICDE ’10]: use Haar wavelet queries as observations. Sensitivity ||Y||1 = log n + 1.
Queries submitted: x1 + x2 + x3 + x4, x1 + x2 - x3 - x4, x1 - x2, x3 - x4; adding Laplace noise b1, ..., b4 gives the private output z1, ..., z4.
Derived workload answers w’1, ..., w’10: for example, range(x2,x3) = .5z1 + 0z2 - .5z3 + .5z4
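A NumPy check of the wavelet derivation above for n = 4, with rows of Y ordered z1, ..., z4 as listed: the sensitivity is log n + 1 = 3, and because Y is square and invertible, each workload query has exactly one linear combination of the z values. Variable names are illustrative.

```python
import numpy as np

# Haar wavelet observations for n = 4; each row is the query answered by one z.
Y = np.array([
    [1, 1, 1, 1],     # z1
    [1, 1, -1, -1],   # z2
    [1, -1, 0, 0],    # z3
    [0, 0, 1, -1],    # z4
], dtype=float)

sensitivity = np.abs(Y).sum(axis=0).max()   # log2(4) + 1 = 3
w5 = np.array([0.0, 1.0, 1.0, 0.0])         # range(x2, x3)

# Coefficients on (z1, ..., z4) for range(x2, x3): w5 Y^{-1}.
c = w5 @ np.linalg.inv(Y)
print(np.round(c, 6))  # 0.5, 0, -0.5, 0.5

# Reconstructing all cells: x_hat = Y^{-1} z (shown with noise-free z).
x = np.array([10.0, 23.0, 16.0, 3.0])
x_hat = np.linalg.solve(Y, Y @ x)
```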
Figure: mean squared error vs. query width (as a fraction of the domain) for Identity, Hierarchical (2), Wavelet, and Hierarchical (4), under ε-differential privacy with ε = 0.1 and n = 1024.
Noisy counts: very low sensitivity, but large ranges are estimated badly.
Hierarchical and wavelet: low sensitivity, and every range query can be estimated from O(log n) output entries.
Max/Avg error | Noisy counts | Hierarchical | Wavelet
1-dim | O(n/ε²) | O(log³n/ε²) | O(log³n/ε²)
k-dim: O(log^(3k) n/ε²)
Figure: observation structures arranged by level (labels: queries, level, marginals), from more accurate to less accurate.
Non-adaptive observations:
Workload | Observations | Citation
low-order marginals | Fourier basis queries | [Barak, PODS ’07]
all one-dim range queries | Hierarchical ranges | [Hay, PVLDB ’10]
all (multi-dim) range queries | Haar wavelet queries | [Xiao, ICDE ’10]
2-dim range queries | Quad-tree queries | [Cormode, ICDE ’12]
Laplace mechanism in matrix form: L(W, x) = W x + (||W||1/ε) b
W: m×n workload; x: n×1 database; ||W||1: scalar sensitivity; b: m×1 noise, independent samples from Laplace(1).
Matrix mechanism:
➊ (Select Observations) Choose a (full rank) query matrix A.
➋ (Apply Laplace) Use the Laplace mechanism to answer A: z = A x + (||A||1/ε) b.
➌ (Derive answers) Compute the estimate x̂ of x using the answers z: x̂ minimizes ||A x̂ - z||₂², i.e., x̂ = A⁺ z with A⁺ = (AᵀA)⁻¹Aᵀ. The workload answers are W x̂.
Thm: x̂ is unbiased and has the least variance among all linear unbiased estimators.
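A NumPy sketch of steps ➊ to ➌, assuming ε-DP with Laplace noise; `matrix_mechanism` is a hypothetical helper, and `np.linalg.lstsq` computes the least-squares estimate A⁺z for a full-column-rank A. The usage example observes the four-cell range workload through the hierarchical queries.

```python
import numpy as np


def matrix_mechanism(W, A, x, eps, rng):
    """Sketch of the matrix mechanism under eps-DP with Laplace noise.

    Step 1 (select observations) is the caller's choice of A, which must have
    full column rank so that the least-squares estimate is unique.
    """
    if np.linalg.matrix_rank(A) < A.shape[1]:
        raise ValueError("observation matrix A must have full column rank")
    # Step 2: Laplace mechanism on A, noise scaled by the L1 sensitivity ||A||_1.
    sensitivity = np.abs(A).sum(axis=0).max()
    z = A @ x + rng.laplace(loc=0.0, scale=sensitivity / eps, size=A.shape[0])
    # Step 3: least-squares estimate x_hat = argmin ||A x_hat - z||_2^2 = A^+ z.
    x_hat, *_ = np.linalg.lstsq(A, z, rcond=None)
    # Workload answers come from one x_hat, so they are mutually consistent.
    return W @ x_hat


H = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
W = np.array([[1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0],
              [0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
x = np.array([10.0, 23.0, 16.0, 3.0])
answers = matrix_mechanism(W, H, x, eps=0.1, rng=np.random.default_rng(0))
```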
Resulting mechanism: M_A(W, x) = W x + (||A||1/ε) W A⁺ b, with b = Lap(1): the true answer W x plus Laplace(1) noise, scaled by ||A||1 and transformed by WA⁺ (using A⁺A = I).
Compare with the Laplace mechanism: L(W, x) = W x + (||W||1/ε) b.
Observation matrix A | Resulting mechanism
A = W | Never worse than Laplace, sometimes better
A = Identity matrix | A common baseline
A = Haar wavelet | [Xiao, ICDE ’10]
A = tree based | [Hay, PVLDB ’10] [Cormode, ICDE ’12]
A = Fourier basis | [Barak, PODS ’07]
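The derived answers are W x̂ = W A⁺ z = W x + (||A||1/ε) W A⁺ b, where each entry of b ~ Lap(1) has variance 2, so the summed variance over the workload is (2/ε²)·||A||1²·||W A⁺||_F². This sketch's own derivation of that total error, evaluated with NumPy for several choices of A on the four-cell range workload at ε = 1, is below; the helper names are hypothetical.

```python
import numpy as np


def all_range_queries(n):
    """All range queries over n cells, widest first."""
    rows = []
    for width in range(n, 0, -1):
        for start in range(n - width + 1):
            row = np.zeros(n)
            row[start:start + width] = 1.0
            rows.append(row)
    return np.array(rows)


def total_error(W, A, eps=1.0):
    """Sum over workload queries of the variance of W A^+ z under eps-DP Laplace noise."""
    sensitivity = np.abs(A).sum(axis=0).max()
    return 2 * (sensitivity / eps) ** 2 * np.linalg.norm(W @ np.linalg.pinv(A), "fro") ** 2


W = all_range_queries(4)
H = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
Y = np.array([[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]], dtype=float)

# Laplace mechanism on W itself: m queries, each with variance 2 (||W||_1 / eps)^2.
laplace_total = W.shape[0] * 2 * np.abs(W).sum(axis=0).max() ** 2

for name, A in [("identity", np.eye(4)), ("A = W", W), ("hierarchical", H), ("wavelet", Y)]:
    print(f"{name:12s} {total_error(W, A):8.2f}")
```

With n = 4 this yields 40 (identity), 288 (A = W, below the 720 of the plain Laplace mechanism), about 125.1 (hierarchical), and 108 (wavelet). At this tiny domain the identity wins; the advantage of hierarchical and wavelet observations appears at larger n, as in the n = 1024 plots.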
Figure: example observation matrices (entries 1 and √2) compared; annotations: “Equivalent error for all queries” and “Lower error for all queries”.
Privacy | Optimization objective | Problem type | Runtime
ε DP | Given W consisting of data cube queries, choose A consisting of data cube queries to minimize simplified error | set-cover approx | O(n)
ε DP | Given W, choose A to minimize TotalError_A(W) [Li, PODS ’10] | SDP w/ rank constraints | O(n⁸)
(ε,δ) DP | Given W, choose A to minimize TotalError_A(W) [Li, PODS ’10] | SDP | O(n⁸)
ε DP | Given W, choose AB ≈ W to minimize TotalError_A(AB) [Yuan, VLDB ’12] | bi-convex | O(n⁴)
(ε,δ) DP | Given W, choose optimal scaling of eigenvectors | convex | O(n⁴)
Eigenvector scaling: from the workload eigenvectors v1, v2, ..., vn, choose weights c1, c2, ..., cn and observe c1v1, c2v2, ..., cnvn.
Matrix Mechanism under (ε,δ)-Differential Privacy
using wavelet or hierarchical observations. [Xiao, ICDE ’10] [Hay, PVLDB ’10]
workloads for which fixed observation methods were designed; up to 10 times reduction for ad hoc workloads. [Li, PVLDB ’12]
Note 1: comparisons don’t depend on input data or privacy parameters.*
Note 2: ratios based on root mean squared error.
total error of the matrix mechanism is greater than or equal to:
Privacy
Error Lower Bound
ε-DP
(ε,δ)-DP
time.
Costs | Fixed Observations | Optimized Observations
 | O(|A|n) | O(|A|n)
 | O(|W|n) | O(|W|n²); preprocessed: O(|W|n) after pre-computation of WA⁺
No worse than standard mechanisms.
mechanism, with error rates significantly reduced and independent
publishable to analyst, and improves efficiency in some cases.
help much.
between testing data and usable observations.
test observations into query answers.
approximation error.
Figure: cells x1, ..., x10 grouped into histogram buckets x1+x2+x3, x4+x5, and x6+x7+x8+x9+x10.
histogram using the exponential mechanism.
counts.
Workload: 1D range queries. Parameters: k, ε1, ε2 such that ε1 + ε2 = ε.
[Xu, ICDE ’12]
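A simplified sketch of the histogram idea, not the method of [Xu, ICDE ’12] itself: here the bucket boundaries are given by the caller (a hypothetical `starts` list), whereas the cited method chooses them privately with part of the budget. Each bucket gets one Laplace-noised count, which is then spread evenly over its cells; that uniformity assumption is the source of approximation error. The counts in the usage example are made up.

```python
import numpy as np


def bucketed_histogram(x, starts, eps2, rng):
    """Publish one Laplace-noised count per bucket and spread it evenly over the bucket.

    starts: hypothetical 0-based bucket start indices, e.g. [0, 3, 5] for the
    buckets x1+x2+x3, x4+x5 and x6+...+x10. Choosing these boundaries privately
    (the eps1 part of the budget) is not shown here.
    """
    edges = list(starts) + [len(x)]
    x_hat = np.empty(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Buckets are disjoint, so changing one cell by 1 moves one bucket count by 1.
        noisy_count = x[lo:hi].sum() + rng.laplace(loc=0.0, scale=1.0 / eps2)
        # Uniformity assumption inside the bucket: the source of approximation error.
        x_hat[lo:hi] = noisy_count / (hi - lo)
    return x_hat


x = np.array([4.0, 5.0, 6.0, 20.0, 22.0, 1.0, 0.0, 2.0, 1.0, 1.0])  # made-up counts
estimate = bucketed_histogram(x, [0, 3, 5], eps2=0.5, rng=np.random.default_rng(0))
```

Fewer, wider buckets add less noise per cell but more approximation error; the partition choice trades one against the other.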
queries)
regions in K.
squares.
[Xiao, SDM ‘10]
Workload: 2D range queries. Parameters: p1, p2, ε1, ε2 such that ε1 + ε2 = ε.
1. Build hybrid hierarchical structure:
compute median.
2. Use Laplace mechanism to get noisy counts.
3. Derive workload query answers using least squares.
[Cormode, ICDE ’12]
Workload: 2D range queries. Parameters: l, k, ε1, ε2 such that ε1 + ε2 = ε.
budget ε/T
values.
random variable with resulting error.
Workload: marginals. Parameters: T, ε.
[Xiao, SIGMOD ’11]
x_{i-1}. Select an inaccurate query q_i with the exponential mechanism.
[Hardt, NIPS ’12]
Workload: linear queries. Parameters: T, ε1, ε2 such that T(ε1 + ε2) = ε.
squared error O(1/ε²) vs. O(1/ε^(2/3))
[Hardt, NIPS ’12]
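A compact sketch in the spirit of the multiplicative-weights approach of [Hardt, NIPS ’12], assuming 0/1 counting queries and a publicly known record count (a simplification made here). Each of T rounds spends ε1 on selecting an inaccurate query with the exponential mechanism and ε2 on measuring it with the Laplace mechanism, matching T(ε1 + ε2) = ε. Replaying all earlier measurements each round is an implementation choice of this sketch; `mwem` and `all_range_queries` are hypothetical names.

```python
import numpy as np


def all_range_queries(n):
    """All range queries over n cells, widest first."""
    rows = []
    for width in range(n, 0, -1):
        for start in range(n - width + 1):
            row = np.zeros(n)
            row[start:start + width] = 1.0
            rows.append(row)
    return np.array(rows)


def mwem(x, Q, T, eps1, eps2, rng):
    """Return a synthetic histogram that approximately answers the queries in Q.

    x: true histogram; Q: m x n matrix with 0/1 entries. Total budget T*(eps1+eps2).
    The record count x.sum() is treated as public (an assumption of this sketch).
    """
    n_rec = float(x.sum())
    true = Q @ x
    A = np.full(len(x), n_rec / len(x))   # start from the uniform histogram
    measured, iterates = [], []
    for _ in range(T):
        # Exponential mechanism: favor queries the current estimate answers badly.
        # The score |q(A) - q(x)| has sensitivity 1 for 0/1 counting queries.
        logits = eps1 * np.abs(Q @ A - true) / 2
        p = np.exp(logits - logits.max())
        p /= p.sum()
        i = rng.choice(len(Q), p=p)
        # Laplace mechanism: noisy answer to the selected query.
        measured.append((i, true[i] + rng.laplace(loc=0.0, scale=1.0 / eps2)))
        # Multiplicative weights: boost cells in queries the estimate undercounts.
        for j, m in measured:
            A = A * np.exp(Q[j] * (m - Q[j] @ A) / (2 * n_rec))
            A *= n_rec / A.sum()
        iterates.append(A)
    return np.mean(iterates, axis=0)   # average of the per-round estimates


x = np.array([10.0, 23.0, 16.0, 3.0])
estimate = mwem(x, all_range_queries(4), T=5, eps1=0.5, eps2=0.5, rng=np.random.default_rng(0))
```

The released histogram is a subset-of-workload method: only the T selected queries are ever measured, and every other answer is read off the synthetic histogram.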
20-40% compared with fixed workload-aware methods like wavelet
workload-aware quad-tree (on random sets of 2D range queries).
[Cormode, ICDE ’12]
range queries on sparse data, multiplicative weights can reduce error by a factor of 10 over matrix mechanism. [Hardt, NIPS ’12]
by a fixed workload-aware method like wavelet.)
Note: ratios based on root mean squared error.
generally efficient, but spending privacy budget on testing doesn’t always pay off.
What are “real” data and workloads like? What properties of data determine error?
significant error improvements by building on standard Laplace/Gaussian mechanisms, but using alternative observations.
the input data.
cases.
selected carefully.
analyst.
they workload-aware?
workload, dataset, epsilon.
analysis easy, error rates publishable to analyst, and improves efficiency in some cases.
domain size, n.
non-negative least squares
lower bounds for DP.
squares if applied in derivation of matrix mechanism?
[Barak, PODS ‘07]
. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Principles of Database Systems (PODS) 2007.
[McSherry, FOCS ‘07]
F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS 2007.
[McSherry, SIGMOD ’09]
F. D. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. SIGMOD 2009.
[Hay, PVLDB ‘10]
private queries through consistency. PVLDB, 2010.
[Xiao, ICDE ‘10]
International Conference on Data Engineering (ICDE), 2010.
[Li, PODS ‘10]
Queries Under Differential Privacy. Principles of Database Systems (PODS) 2010.
[Xiao, SDM ‘10]
multidimensional partitioning. Secure Data Management (SDM) 2010.
[Xiao, SIGMOD ‘11]
Xiaokui Xiao, Gabriel Bender, Michael Hay, and Johannes Gehrke. iReduct: Differential privacy with reduced relative errors. SIGMOD, 2011.
[Ding, SIGMOD ’11]
sources and consistency. In SIGMOD 2011.
[Cormode, ICDE ’12]
spatial decompositions. International Conference on Data Engineering (ICDE), 2012.
[Xu, ICDE ’12]
In ICDE, 2012.
[Li, PVLDB ‘12]
differential privacy. Proceedings of the VLDB Endowment (PVLDB) 2012.
[Yuan, VLDB ’12]
Optimizing batch queries under differential privacy. VLDB, 2012.
[Hardt, NIPS ’12]
. McSherry. A simple and practical algorithm for differentially private data release. In NIPS, 2012.