Differentially-Private Batch Query Answering: Exploiting the Workload vs. Exploiting the Data. Gerome Miklau, University of Massachusetts, Amherst. DIMACS Workshop on Recent Work on Differential Privacy across Computer Science, October 2012.


slide-1
SLIDE 1

Differentially-Private Batch Query Answering

Exploiting the Workload vs. Exploiting the Data

Gerome Miklau

University of Massachusetts, Amherst

DIMACS Workshop on Recent Work on Differential Privacy across Computer Science • October 2012

slide-5
SLIDE 5

Batch (non-interactive) query answering

  • Goal: release answers to all queries under ε- or (ε, δ)-differential privacy.
  • Given: a fixed set of queries, the "workload". A workload may arise from:
  • a complex data analysis task broken into simpler queries;
  • multiple users, each issuing one or more queries;
  • uncertainty about which query answers will eventually be needed: design the workload to include all queries possibly of interest.
  • Focus: linear counting queries, which include predicate counting queries, spatial queries, multi-dimensional range queries, marginals, data cubes, etc.

slide-13
SLIDE 13

Approach 1: workload-aware mechanisms

[Diagram] The analyst holds a workload W = {w1, w2, w3}; the server holds the database. The server (1) selects observation queries A = {a1, a2, a3}, (2) applies a standard Laplace or Gaussian mechanism to obtain noisy answers ai(D) + noise (the observations A), and (3) derives from them noisy estimates of the workload answers w1(D), w2(D), w3(D).

slide-14
SLIDE 14

Workload-aware mechanisms

| Workload | Observations | Citation | Type |
|---|---|---|---|
| low-order marginals | Fourier basis queries | [Barak, PODS '07] | fixed |
| all one-dim range queries | hierarchical ranges | [Hay, PVLDB '10] | fixed |
| all (multi-dim) range queries | Haar wavelet queries | [Xiao, ICDE '10] | fixed |
| 2-dim range queries | quad-tree queries | [Cormode, ICDE '12] | fixed |
| sets of data cubes | sets of data cubes | [Ding, SIGMOD '11] | optimized |
| set of linear queries | set of linear queries | [Li, PODS '10] [Li, PVLDB '12] | optimized |
| set of linear queries | set of linear queries | [Yuan, VLDB '12] | optimized |

  • Observations selected to match (only) the workload.

slide-23
SLIDE 23

Approach 2: data-aware mechanisms

[Diagram] The analyst holds workload W = {w1, w2, w3}; the server holds the database. The server first issues a test query T against the data and receives a noisy result T'. Using this noisy test of the dataset, it selects observation queries A = {a1, a2, a3}, applies a standard Laplace or Gaussian mechanism to obtain ai(D) + noise, and derives noisy estimates of the workload answers w1(D), w2(D), w3(D).

slide-24
SLIDE 24

Data-aware mechanisms

| Workload | Observations | Citation |
|---|---|---|
| 1D range queries | approx. v-optimal histogram | [Xu, ICDE '12] |
| 2D range queries | kd-tree queries | [Xiao, SDM '10] |
| 2D range queries | hybrid kd-tree queries | [Cormode, ICDE '12] |
| marginals | scaled workload queries | [Xiao, SIGMOD '11] |
| linear queries | subset of workload | [Hardt, NIPS '12] |

  • Observations selected to match properties of the database.
slide-25
SLIDE 25

Outline

  • 1. Preliminaries
  • 2. Approach 1: workload-aware
  • Fixed Observations
  • Optimized Observations
  • 3. Approach 2: data-aware
  • 4. Conclusions
slide-26
SLIDE 26

Frequency representation of the database

Relational database:

| name | gender | grade |
|---|---|---|
| Alice | Female | 91 |
| Bob | Male | 84 |
| Carl | Male | 82 |
| Dave | Male | 97 |
| Edwina | Female | 88 |
| Faith | Female | 78 |
| Ghita | Female | 85 |
| ... | ... | ... |

Frequency vector x over {gender, grade}: one count per (gender, grade) combination; these counts are the entries x1, x2, ..., xn.

| gender | grade | count |
|---|---|---|
| Male | 100 | 10 |
| Male | 99 | 13 |
| Male | 98 | 5 |
| Male | 97 | 7 |
| ... | ... | ... |
| Female | 100 | 15 |
| Female | 99 | 21 |
| Female | 98 | 4 |
| Female | 97 | 14 |
| Female | 96 | 9 |

slide-28
SLIDE 28

Frequency representation of the database

The same relational database can also be represented by a coarser frequency vector x, here over {grade} alone:

| grade | count | entry |
|---|---|---|
| 90-100 | 10 | x1 |
| 80-90 | 23 | x2 |
| 70-80 | 16 | x3 |
| 60-70 | 3 | x4 |
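As a concrete sketch of building such a frequency vector (with a hypothetical grade list standing in for the full table, since only seven rows are shown above), the counts can be computed by bucketing:

```python
# Build the {grade} frequency vector by bucketing grades into the slide's ranges:
# 90-100, 80-90, 70-80, 60-70.
def frequency_vector(grades, edges=(90, 80, 70, 60)):
    x = [0] * len(edges)
    for g in grades:
        for i, lo in enumerate(edges):
            if g >= lo:
                x[i] += 1
                break
    return x

# Hypothetical data: just the seven students visible in the table.
grades = [91, 84, 82, 97, 88, 78, 85]
print(frequency_vector(grades))  # [2, 4, 1, 0]
```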

slide-31
SLIDE 31

Linear counting queries

A linear counting query w computes a linear combination of the frequency vector counts:

w(D) = w1 x1 + w2 x2 + ... + wn xn,  each wi ∈ R

Written as a length-n row vector w = [w1, w2, w3, ..., wn], the query result is wx.

A set of linear counting queries is a matrix W, and the query result is Wx.

[Slide shows an example 0/1 workload matrix W.]
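A minimal sketch in Python of evaluating w(D) = wx and Wx, reusing the frequency vector x = (10, 23, 16, 3) that appears in the range-query example later in the deck:

```python
import numpy as np

# A single linear counting query as a length-n row vector: w(D) = w . x
x = np.array([10, 23, 16, 3])        # frequency vector (from the {grade} example)
w = np.array([0, 1, 1, 0])           # range(x2, x3) = x2 + x3
print(w @ x)                         # 39

# A set of queries stacked into a matrix W; the answer vector is W x.
W = np.array([[1, 1, 1, 1],          # total count
              [1, 1, 0, 0],          # x1 + x2
              [0, 0, 1, 1]])         # x3 + x4
print(W @ x)                         # [52 33 19]
```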

slide-37
SLIDE 37

Queries and workloads

  • 1-dimensional range queries: intervals
  • Marginals / data cube queries / contingency tables: aggregate over excluded dimensions
  • k-dimensional range queries: axis-aligned rectangles
  • Predicate counting queries: only 0 or 1 coefficients
  • Linear counting queries: arbitrary coefficients

[Diagram: nested query classes, from 1-dim ranges and marginals, through k-dim ranges and predicate counting queries, up to linear counting queries.]

slide-38
SLIDE 38

Privacy definitions & mechanisms

  • Differential privacy: a randomized algorithm A provides (ε, δ)-differential privacy if, for all neighboring databases D and D', and for any set of outputs S:

Pr[A(D) ∈ S] ≤ e^ε Pr[A(D') ∈ S] + δ

  • If δ = 0, standard ε-differential privacy: add Laplace(0, b) noise with b = ||q||1/ε.
  • If δ > 0, approximate (ε, δ)-differential privacy: add Gaussian(0, σ) noise with σ = ||q||2 (2 ln(2/δ))^{1/2}/ε.
  • The multi-query Laplace/Gaussian mechanism adds independent noise to each query answer.
  • Exponential mechanism
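A sketch of the multi-query Laplace mechanism as described above (the sensitivity is passed in by the caller; numpy's Laplace sampler supplies the noise):

```python
import numpy as np

def laplace_mechanism(true_answers, sensitivity, epsilon, rng=np.random.default_rng(0)):
    """Add independent Laplace(0, b) noise to each answer, with b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    return true_answers + rng.laplace(loc=0.0, scale=b, size=np.shape(true_answers))

# Example: release two counts with sensitivity 1 under epsilon = 0.5.
print(laplace_mechanism(np.array([52.0, 33.0]), sensitivity=1.0, epsilon=0.5))
```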
slide-43
SLIDE 43

The sensitivity of a query matrix

  • For two neighboring databases D and D', their frequency vectors x and x' will differ in one position, by exactly 1.

[Diagram: Wx = y, with W the query matrix, x the frequency vector, and y = (y1, ..., y4) the answers. Changing one individual changes one entry of x by 1, so each answer yi changes by at most the corresponding entry in that column of W.]

The L1 sensitivity of a query matrix is the maximum L1 norm of its columns; for the example matrix, ||W||1 = 4.

The L2 sensitivity of a query matrix is the maximum L2 norm of its columns.
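Both sensitivities are simple column-norm computations; a sketch, illustrated here on the all-range workload for n = 4 used later in the deck (which has ||W||1 = 6):

```python
import numpy as np

def l1_sensitivity(W):
    """L1 sensitivity of a query matrix: maximum L1 norm over columns."""
    return np.abs(W).sum(axis=0).max()

def l2_sensitivity(W):
    """L2 sensitivity of a query matrix: maximum L2 norm over columns."""
    return np.sqrt((np.asarray(W) ** 2).sum(axis=0)).max()

# All-range workload for n = 4: ten queries, ||W||_1 = 6.
W = np.array([[1,1,1,1],[1,1,1,0],[0,1,1,1],[1,1,0,0],[0,1,1,0],
              [0,0,1,1],[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]])
print(l1_sensitivity(W))  # 6
```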

slide-44
SLIDE 44

Outline

  • 1. Preliminaries
  • 2. Approach 1: workload-aware
  • Fixed Observations
  • Optimized Observations
  • 3. Approach 2: data-aware
  • 4. Conclusions
slide-48
SLIDE 48

Answering all range queries

Goal: answer all range-count queries over x:

AllRange = { w | w = xi + ... + xj for 1 ≤ i ≤ j ≤ n }

For n = 4 the workload W contains ten queries:

w1 = range(x1, x4) = x1 + x2 + x3 + x4
w2 = range(x1, x3) = x1 + x2 + x3
w3 = range(x2, x4) = x2 + x3 + x4
w4 = range(x1, x2) = x1 + x2
w5 = range(x2, x3) = x2 + x3
w6 = range(x3, x4) = x3 + x4
w7, ..., w10 = range(xi, xi) = xi for i = 1, ..., 4

With x = (10, 23, 16, 3), the true answers (w1, ..., w10) are (52, 49, 42, 33, 39, 19, 10, 23, 16, 3).

slide-52
SLIDE 52

Method 1: basic Laplace mechanism

Submit the ten workload queries w1, ..., w10 directly. Since ||W||1 = 6 (x2 and x3 each appear in six queries), each private output w'i receives independent Laplace noise bi scaled by (6/ε).

For n = 4: sensitivity ||W||1 = 6; error per query 2(||W||1/ε)^2 = 72/ε^2.
For general n: sensitivity ||W||1 = O(n^2); error per query 2(||W||1/ε)^2 = O(n^4)/ε^2.

[Slide example: true answers (52, 49, 42, 33, 39, 19, 10, 23, 16, 3); Laplace noise (8.2, -5.4, -3.1, 6.6, -7.9, 2.4, -3.0, -4.9, 6.7, 4.6); private output (60.2, 44.6, 38.9, 39.6, 31.1, 21.4, 7.0, 18.1, 22.7, 7.6); Σ = 55.4.]

slide-56
SLIDE 56

Method 2: noisy frequency counts

Observation: submit the identity queries I (one query per count xi), with ||I||1 = 1. The Laplace mechanism returns noisy estimates zi = xi + bi with noise scaled by (1/ε).

Derived workload answers: each range query is answered by summing the relevant noisy counts, e.g. w'1 = z1 + z2 + z3 + z4, w'5 = z2 + z3, w'10 = z4.

For w = range(xi, xj): Error(w) = 2(j - i + 1)/ε^2, from 2/ε^2 for a single count up to 8/ε^2 for the full range (n = 4).
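A sketch of the error comparison between Method 1 and Method 2 for the n = 4 all-range workload:

```python
# Per-query variance of the two methods for the n = 4 all-range workload.
eps = 1.0

# Method 1: noise each workload query directly with scale ||W||_1/eps = 6/eps;
# every answer has the same variance 2*(6/eps)^2.
var_direct = 2 * (6 / eps) ** 2

# Method 2: noise the four counts (||I||_1 = 1) and sum; range(x_i, x_j)
# accumulates the variance of (j - i + 1) independent Laplace(1/eps) samples.
def var_noisy_counts(i, j, eps):
    return 2 * (j - i + 1) / eps ** 2

print(var_direct)                    # 72.0
print(var_noisy_counts(1, 4, eps))   # 8.0  (the full range)
print(var_noisy_counts(2, 2, eps))   # 2.0  (a single count)
```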

slide-63
SLIDE 63

Method 3: hierarchical observations

Hierarchical queries: recursively partition the domain, computing the sum over each interval [Hay, PVLDB '10]. For n = 4 the observation matrix H contains seven queries:

z1: x1 + x2 + x3 + x4
z2: x1 + x2    z3: x3 + x4
z4: x1   z5: x2   z6: x3   z7: x4

||H||1 = log n + 1 = 3, so the Laplace mechanism answers H with noise scaled by (3/ε), and workload answers are derived from z1, ..., z7.

Possible estimates for the query range(x2, x3) = x2 + x3:
  z5 + z6
  z2 - z4 + z6
  z1 - z4 - z7
Least-squares estimate: (6z1 + 3z2 + 3z3 - 9z4 + 12z5 + 12z6 - 9z7)/21
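The least-squares coefficients quoted above can be checked numerically; a sketch using the ordinary least squares pseudo-inverse of H:

```python
import numpy as np

# Hierarchy H for n = 4: root sum, two interval sums, four leaf counts.
H = np.array([[1, 1, 1, 1],
              [1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)

# OLS pseudo-inverse H+ = (H^T H)^{-1} H^T maps noisy answers z to an estimate of x.
H_pinv = np.linalg.inv(H.T @ H) @ H.T

# Coefficients of the least-squares estimate of range(x2, x3) in terms of z1..z7.
w = np.array([0, 1, 1, 0], dtype=float)
coeffs = w @ H_pinv
print(np.round(coeffs * 21))  # [ 6.  3.  3. -9. 12. 12. -9.]
```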

slide-64
SLIDE 64

Error rates: workload of all range queries (ε-differential privacy, ε = 0.1, n = 1024)

[Plot: mean squared error vs. query width (as a fraction of the domain), comparing Noisy counts and Hierarchical (branching factor 2), from small ranges to big ranges.]

slide-65
SLIDE 65

Method 4: wavelet queries

Wavelet: use the Haar wavelet queries as observations [Xiao, ICDE '10]. For n = 4 the matrix Y is:

z1: x1 + x2 + x3 + x4
z2: x1 + x2 - x3 - x4
z3: x1 - x2
z4: x3 - x4

||Y||1 = log n + 1 = 3, so the Laplace mechanism answers Y with noise scaled by (3/ε), and workload answers are derived from z1, ..., z4.

Estimate for the query range(x2, x3) = x2 + x3:  0.5 z1 + 0 z2 - 0.5 z3 + 0.5 z4
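The wavelet estimate can be checked the same way; a sketch (Y is square for n = 4, so its pseudo-inverse is simply its inverse):

```python
import numpy as np

# Haar wavelet observation matrix Y for n = 4.
Y = np.array([[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, 0, 0],
              [0, 0, 1, -1]], dtype=float)

# Coefficients expressing range(x2, x3) = x2 + x3 in terms of z1..z4.
w = np.array([0, 1, 1, 0], dtype=float)
print(np.round(w @ np.linalg.inv(Y), 6))  # [ 0.5  0.  -0.5  0.5]
```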

slide-67
SLIDE 67

Error: workload of all range queries (ε-differential privacy, ε = 0.1, n = 1024)

[Plot: mean squared error vs. query width (as a fraction of the domain), comparing Identity, Hierarchical (branching 2), Wavelet, and Hierarchical (branching 4).]

slide-68
SLIDE 68

Observations for the workload of all range queries

| Observations | Max/avg error (1-dim) | Notes |
|---|---|---|
| I (noisy counts) | O(n/ε^2) | very low sensitivity, but large ranges estimated badly |
| H (hierarchical) | O(log^3 n/ε^2) | low sensitivity, and every range query can be estimated using no more than ~log n output entries |
| Y (wavelet) | O(log^3 n/ε^2) | same guarantee via the Haar basis |

For k-dimensional ranges the corresponding bound is O(log^{3k} n/ε^2).

slide-69
SLIDE 69

Observations for alternative workloads

  • Workload: sets of 2D range queries. Observations: quad-tree queries, with the privacy budget ε allocated geometrically across levels (making some levels more accurate, others less) [Cormode, ICDE '12].
  • Workload: sets of low-order marginals. Observations: Fourier basis queries [Barak, PODS '07], defined by the recursion H_i = [ H_{i-1}  H_{i-1} ; H_{i-1}  -H_{i-1} ].

slide-72
SLIDE 72

Questions raised

  • Are these observations optimal for the targeted workloads?
  • Which observations should we use for other custom workloads?

The mechanisms below are non-adaptive; the next step is to adapt the observations to the workload.

| Workload | Observations | Citation |
|---|---|---|
| low-order marginals | Fourier basis queries | [Barak, PODS '07] |
| all one-dim range queries | hierarchical ranges | [Hay, PVLDB '10] |
| all (multi-dim) range queries | Haar wavelet queries | [Xiao, ICDE '10] |
| 2-dim range queries | quad-tree queries | [Cormode, ICDE '12] |

slide-73
SLIDE 73

Outline

  • 1. Preliminaries
  • 2. Approach 1: workload-aware
  • Fixed Observations
  • Optimized Observations
  • 3. Approach 2: data-aware
  • 4. Conclusions
slide-76
SLIDE 76

Laplace mechanism (matrix notation)

Laplace(W, x) = Wx + (||W||1/ε) b

where W is the m×n workload matrix, x the n×1 database (frequency) vector, ||W||1 the scalar sensitivity, and b an m×1 noise vector of independent samples from Laplace(1).

Error(w) = 2(||W||1/ε)^2 for each query w.

slide-85
SLIDE 85

The matrix mechanism: justification

➊ (Select observations) Choose a (full rank) query matrix A.
➋ (Apply Laplace) Use the Laplace mechanism to answer A: z = Ax + (||A||1/ε) b.
➌ (Derive answers) Compute an estimate x̂ of x using the answers z:
  • choose the x̂ that minimizes the squared error ||A x̂ - z||2^2;
  • the solution is the ordinary least squares estimator x̂ = A+ z, where A+ = (A^T A)^{-1} A^T.

Thm: x̂ is unbiased and has the least variance among all linear unbiased estimators.

Finally, compute the workload queries using the estimate: W x̂.
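Putting the three steps together; a sketch of the full pipeline, where the strategy matrix A, workload W, and frequency vector x are the running n = 4 examples:

```python
import numpy as np

def matrix_mechanism(W, A, x, eps, rng=np.random.default_rng(0)):
    """Sketch: answer strategy A via Laplace, reconstruct x by OLS, answer workload W."""
    sens = np.abs(A).sum(axis=0).max()             # ||A||_1
    z = A @ x + (sens / eps) * rng.laplace(size=A.shape[0])
    A_pinv = np.linalg.inv(A.T @ A) @ A.T          # A+ (A assumed full rank)
    x_hat = A_pinv @ z                             # unbiased estimate of x
    return W @ x_hat

# Running example: all-range workload (n = 4), hierarchical strategy, x = (10, 23, 16, 3).
W = np.array([[1,1,1,1],[1,1,1,0],[0,1,1,1],[1,1,0,0],[0,1,1,0],
              [0,0,1,1],[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]], float)
A = np.array([[1,1,1,1],[1,1,0,0],[0,0,1,1],
              [1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]], float)
x = np.array([10, 23, 16, 3], float)
print(matrix_mechanism(W, A, x, eps=1.0))   # ten noisy workload answers
```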
slide-91
SLIDE 91

The matrix mechanism

Given a workload W and any full-rank strategy matrix A, the following randomized algorithm is ε-differentially private:

Matrix_A(W, x) = Wx + (||A||1/ε) W A+ b,  b = Lap(1)

Here the mechanism is instantiated with observations A: it adds to the true answer Wx noise scaled by ||A||1 and transformed by W A+. Compare with the Laplace mechanism:

Laplace(W, x) = Wx + (||W||1/ε) b

slide-92
SLIDE 92

Instances of the matrix mechanism

Given a workload W of linear queries:

| Observation matrix A | Resulting mechanism |
|---|---|
| A = W | never worse than Laplace, sometimes better |
| A = identity matrix | a common baseline |
| A = Haar wavelet | [Xiao, ICDE '10] |
| A = tree-based | [Hay, PVLDB '10] [Cormode, ICDE '12] |
| A = Fourier basis | [Barak, PODS '07] |

slide-96
SLIDE 96

Observation matrices equivalent to wavelet

[Figure: the Haar wavelet observation matrix Y (||Y||1 = 3), an alternative matrix Y’ (||Y’||1 = 3, equivalent error for all queries), and a rescaled matrix Y’’ (||Y’’||1 = 2.414, lower error for all queries).]

The Haar wavelet observation matrix Y is dominated by the alternative matrix Y’’.
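For concreteness, the Haar wavelet strategy can be built recursively. A sketch (the helper is my own, assuming n is a power of 2) that reproduces ||Y||1 = 3 for n = 4:

```python
import numpy as np

def haar_matrix(n):
    """Unnormalized Haar wavelet observation matrix (n a power of 2)."""
    if n == 1:
        return np.array([[1.0]])
    H = haar_matrix(n // 2)
    top = np.kron(H, [1.0, 1.0])                    # coarser rows, widened
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-level differences
    return np.vstack([top, bottom])

Y = haar_matrix(4)
col_l1 = np.abs(Y).sum(axis=0).max()   # ||Y||_1
```

For n = 4 every column participates in three nonzero rows, so col_l1 is 3, matching the slide; in general ||Y||1 = 1 + log2(n).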

slide-97
SLIDE 97

Error of the matrix mechanism

Given an observation matrix A and workload W, the error under the mechanism MatrixA is:

For a single query w in W: ErrorA(w) = (2/ε2)(||A||1)2 w(ATA)-1wT

Total error for workload W: TotalErrorA(W) = (2/ε2)(||A||1)2 trace( W(ATA)-1WT )

Error is independent of the input data.
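These error formulas are easy to evaluate numerically. A small numpy sketch (the function name is mine); note that for A = W = I it gives 2n/ε2, the familiar total error of n independent Laplace answers:

```python
import numpy as np

def total_error(W, A, eps):
    """TotalError_A(W) = (2/eps^2) * ||A||_1^2 * trace(W (A^T A)^{-1} W^T)."""
    sens = np.abs(A).sum(axis=0).max()      # ||A||_1
    return (2.0 / eps**2) * sens**2 * np.trace(W @ np.linalg.inv(A.T @ A) @ W.T)
```

Because the expression never touches the data vector x, the error can be published to the analyst before any query is answered.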

slide-98
SLIDE 98

Optimal selection of observations

Objective: given workload W, find the observation matrix A that minimizes the total error.

slide-104
SLIDE 104

Optimal selection of observations

Objective: given workload W, find the observation matrix A that minimizes the total error.

Privacy / optimization objective / problem type / runtime:
  • ε-DP: given W consisting of data cube queries, choose A consisting of data cube queries to minimize a simplified error measure (set-cover approximation, O(n)). [Ding, SIGMOD ’11]
  • ε-DP: given W, choose A to minimize TotalErrorA(W) (SDP with rank constraints, O(n8)). [Li, PODS ‘10]
  • (ε,δ)-DP: given W, choose A to minimize TotalErrorA(W) (SDP, O(n8)). [Li, PODS ‘10]
  • ε-DP: given W, choose AB ≈ W to minimize TotalErrorA(AB) (bi-convex optimization, O(n4)). [Yuan, VLDB ’12]
  • (ε,δ)-DP: given W, choose an optimal scaling of the eigenvectors of W to minimize TotalErrorA(W) (convex optimization, O(n4)). [Li, PVLDB ‘12]

slide-105
SLIDE 105

Approximately optimal selection of observations

Matrix mechanism under (ε,δ)-differential privacy:

  • Given W, choose a set of basis queries v1, v2, ..., vn for the observations (the eigenvectors of W).
  • Compute optimal scalars c1, c2, ..., cn to minimize error.
  • The resulting observation matrix is A with rows c1v1, c2v2, ..., cnvn.
  • Efficiently solvable, and achieves optimal error rates in practice.
  • Algorithm running time: O(n · rank(W)3)
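A hedged sketch of this construction (I interpret the basis as the eigenvectors of WTW so that it is defined for non-square W; the optimal scalars come from the convex program, which is not shown, so uniform scalars are used purely for illustration):

```python
import numpy as np

def eigen_strategy(W, c):
    """Strategy whose rows are c_i * v_i for eigenvectors v_i of W^T W.
    The scalars c would come from the convex optimization (not shown)."""
    _, V = np.linalg.eigh(W.T @ W)   # columns: orthonormal eigenvectors v_i
    return np.diag(c) @ V.T          # row i is c_i * v_i

W = np.triu(np.ones((4, 4)))         # toy workload of 1D range queries
A = eigen_strategy(W, np.ones(4))    # uniform scalars, illustration only
```

As long as every scalar is nonzero, the orthonormal eigenvectors guarantee A is full rank, as the mechanism requires.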
slide-106
SLIDE 106

Representative experimental findings

  • Benefit of fixed observations:
  • Error for W = {all range queries} can be reduced by a factor of 2-4 by using wavelet or hierarchical observations. [Xiao, ICDE ‘10] [Hay, PVLDB ‘10]
  • Benefit of optimized observations:
  • ε-DP: error reduced by 2-3 times compared with fixed observation methods. [Yuan, VLDB ’12]
  • (ε,δ)-DP: error reduced by 2-6 times on range and marginal workloads for which fixed observation methods were designed; up to 10 times reduction for ad hoc workloads. [Li, PVLDB ‘12]

Note 1: comparisons don’t depend on input data or privacy parameters.
Note 2: ratios based on root mean squared error.

slide-107
SLIDE 107

Lower bound on error

  • Given workload W with singular values λ1 ≥ ... ≥ λn, the minimum total error of the matrix mechanism is greater than or equal to the following (and the bound is tight):

Privacy / error lower bound:
  • ε-DP: (2/ε2)(1/n)(λ1 + ... + λn)2
  • (ε,δ)-DP: (2log(2/δ)/ε2)(1/n)(λ1 + ... + λn)2
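The ε-DP bound is straightforward to evaluate with an SVD. A sketch (the function name is mine); for W = I all singular values are 1, so the bound is 2n/ε2, which the identity strategy achieves:

```python
import numpy as np

def svd_lower_bound(W, eps):
    """eps-DP bound: (2/eps^2) * (1/n) * (lambda_1 + ... + lambda_n)^2."""
    n = W.shape[1]
    lam = np.linalg.svd(W, compute_uv=False)   # singular values of W
    return (2.0 / eps**2) * lam.sum() ** 2 / n
```

Comparing a candidate strategy's total error against this quantity shows how far it is from optimal.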

slide-108
SLIDE 108

Runtime complexity

  • Answering W using the Laplace/Gaussian mechanism takes O(|W|n) time.

Costs: fixed observations vs. optimized observations
  • 1. Select observations: none vs. ~O(n4)
  • 2. Apply standard mechanism: O(|A|n) vs. O(|A|n)
  • 3. Derive answers: O(|W|n) vs. O(|W|n2)

  • Because of data-independence, the observation matrix can be preprocessed: given a fixed workload W and observation matrix A, runtime is O(|W|n) after pre-computation of WA+, no worse than standard mechanisms.

slide-109
SLIDE 109

Summary: workload-aware mechanisms

Workload / observations / citation:

Fixed:
  • low-order marginals / Fourier basis queries / [Barak, PODS ‘07]
  • all one-dim range queries / hierarchical ranges / [Hay, PVLDB ‘10]
  • all (multi-dim) range queries / Haar wavelet queries / [Xiao, ICDE ‘10]
  • 2-dim range queries / quad-tree queries / [Cormode, ICDE ’12]

Optimized:
  • sets of data cubes / sets of data cubes / [Ding, SIGMOD ’11]
  • set of linear queries / set of linear queries / [Li, PODS ‘10] [Li, PVLDB ‘12]
  • set of linear queries / low-rank set of linear queries / [Yuan, VLDB ’12]

  • These methods can be seen as a generalization of the Laplace/Gaussian mechanism, with error rates significantly reduced and independent of data.

slide-110
SLIDE 110

Summary: workload-aware mechanisms

  • Benefits
  • Independence of data makes error analysis easy, error rates

publishable to analyst, and improves efficiency in some cases.

  • Limitations
  • Computational dependence on domain size, n.
  • Error dependence on epsilon: 1/ε2
  • For some workloads, there is no set of observations that can

help much.

  • Open questions
  • Alternative derivation methods: e.g. non-negative least squares
  • Relationship with “universal” error lower bounds for DP

slide-111
SLIDE 111

Outline

  • 1. Preliminaries
  • 2. Approach 1: workload-aware
  • Fixed Observations
  • Optimized Observations
  • 3. Approach 2: data-aware
  • 4. Conclusions
slide-113
SLIDE 113

(Recall) Approach 2: data-aware mechanisms

[Diagram: the analyst submits workload W = {w1, w2, w3}. The server selects observations A = {a1, a2, a3}, applies a standard Laplace or Gaussian mechanism to the database D to obtain a1(D) + noise, a2(D) + noise, a3(D) + noise, tests the dataset (noisy result T’ for test T), and derives noisy estimates of w1(D), w2(D), w3(D) for the analyst.]

Steps: select observations → apply standard mechanism → test dataset → derive workload answers.

slide-115
SLIDE 115

A basic intuition

  • Detect when additional observations won’t help much.
  • Challenges:
  • Balance the privacy budget between testing data and usable observations.
  • When possible, incorporate test observations into query answers.
  • Perturbation error vs. approximation error.

[Figure: counts x1 ... x10 grouped into near-uniform regions, e.g. x1+x2+x3, x4+x5, x6+x7+x8+x9+x10.]

slide-116
SLIDE 116
Data-aware histogram [Xu, ICDE ’12]

Workload: 1D range queries. Parameters: k, ε1, ε2 s.t. ε1 + ε2 = ε.

  • 1. Compute a private estimate of the k-bin, variance-optimal histogram using the exponential mechanism (budget ε1).
  • 2. Use the Laplace mechanism to get bin counts and all individual counts (budget ε2).
  • 3. Derive answers to workload queries using least squares.

slide-117
SLIDE 117

Techniques for spatial queries

  • Spatial queries are 2-dimensional counting queries (typically range queries).
  • kd-tree: a data-aware hierarchical space-partitioning data structure.
slide-122
SLIDE 122

Data-aware kd-tree (1) [Xiao, SDM ‘10]

Workload: 2D range queries. Parameters: p1, p2, ε1, ε2 s.t. ε1 + ε2 = ε (here ε1 = ε2 = ε/2).

  • 1. Use the Laplace mechanism to get noisy counts x’ (budget ε1).
  • 2. Build kd-tree K from x’, but stop splitting if:
  • the sum of counts in the current region is too small (p1), or
  • the counts in the current region are close to uniform (p2).
  • 3. Use the Laplace mechanism to get noisy counts K’ for all regions in K (budget ε2).
  • 4. Compute workload answers from K’ using least squares.
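The final least-squares derivation step recurs across these mechanisms. A toy numpy sketch with hypothetical observations (individual counts plus their total, names my own):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([3.0, 1.0, 4.0, 1.0])            # true (private) counts
A = np.vstack([np.eye(4), np.ones((1, 4))])   # observations: counts + total
y = A @ x + rng.laplace(scale=0.1, size=5)    # noisy answers to observations
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None) # consistent estimate of x
W = np.triu(np.ones((4, 4)))                  # toy range-query workload
answers = W @ x_hat                           # derived workload answers
```

Least squares reconciles the redundant noisy observations into a single consistent estimate, which typically reduces variance compared with using any one observation alone.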

slide-123
SLIDE 123

Data-aware kd-tree (2) [Cormode, ICDE ’12]

Workload: 2D range queries. Parameters: l, k, ε1, ε2 s.t. ε1 + ε2 = ε (here ε1 = 0.3ε, ε2 = 0.7ε).

  • 1. Build a hybrid hierarchical structure:
  • l levels of kd-tree, using the exponential mechanism to compute medians (budget ε1);
  • remaining (k-l) levels: uniform quad-tree.
  • 2. Use the Laplace mechanism to get noisy counts (budget ε2).
  • 3. Derive workload query answers using least squares.

slide-124
SLIDE 124

Optimizing for relative error [Xiao, SIGMOD ’11]

Workload: marginals. Parameters: T, ε.

  • 1. Answer all workload queries using the Laplace mechanism with budget ε/T.
  • 2. Repeat T-1 times: refine the query answers by resampling queries with small values.
  • Final query answers have the same privacy cost as a single Laplace random variable, with the resulting error.

slide-125
SLIDE 125

Multiplicative weights [Hardt, NIPS ’12]

Workload: linear queries. Parameters: T, ε1, ε2 s.t. T(ε1 + ε2) = ε (here ε1 = ε2 = ε/2T per round).

  • Begin with a uniform estimate x0 of database x.
  • For i = 1...T:
  • Evaluate all workload queries using the current estimate xi-1; select an inaccurate query qi with the exponential mechanism (budget ε1).
  • Laplace mechanism: get a noisy estimate mi of qi (budget ε2).
  • Update xi-1 → xi using mi via multiplicative weights.
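A sketch of one multiplicative-weights update (the normalization convention, keeping total mass fixed, is my own assumption; the function name is mine):

```python
import numpy as np

def mw_update(x_est, q, m):
    """One multiplicative-weights step: shift mass toward the bins of query q
    when the noisy answer m exceeds the current estimate, away otherwise."""
    err = m - q @ x_est                         # how far off the estimate is
    total = x_est.sum()
    x_new = x_est * np.exp(q * err / (2.0 * total))
    return x_new * (total / x_new.sum())        # renormalize to fixed total
```

Because only one noisy query answer is consumed per round, the update concentrates the privacy budget on the queries where the current estimate is worst.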

slide-126
SLIDE 126

Multiplicative weights

  • Provably better dependence on ε than workload-aware techniques: squared error O(1/ε2/3) vs. O(1/ε2).
  • Observations customized to workload.
  • Very good accuracy for sparse datasets.
  • Output satisfies non-negativity constraints.
  • Must compute all workload queries T times.

[Hardt, NIPS ’12]

slide-127
SLIDE 127

Representative experimental findings

  • Building a data-aware histogram reduces error on range queries by 20-40% compared with fixed workload-aware methods like wavelet or tree-based. [Xu, ICDE ’12]
  • Neither of the data-aware kd-trees consistently outperforms the workload-aware quad-tree (on random sets of 2D range queries). [Cormode, ICDE ’12]
  • For reasonable privacy parameters and small workloads of random range queries on sparse data, multiplicative weights can reduce error by a factor of 10 over the matrix mechanism. [Hardt, NIPS ’12]
  • (But for other datasets, it can be outperformed by a factor of 10 by a fixed workload-aware method like wavelet.)

Note: ratios based on root mean squared error.

slide-128
SLIDE 128

Data-aware mechanisms

Workload / observations / citation:
  • 1D range queries / approx. v-optimal histogram / [Xu, ICDE ’12]
  • 2D range queries / kd-tree queries / [Xiao, SDM ‘10]
  • 2D range queries / hybrid kd-tree queries / [Cormode, ICDE ’12]
  • marginals / scaled workload queries / [Xiao, SIGMOD ’11]
  • linear queries / subset of workload / [Hardt, NIPS ’12]

  • Observations are selected to match properties of the database; generally efficient, but spending privacy budget on testing doesn’t always pay off.

slide-129
SLIDE 129

Summary: data-aware mechanisms

  • Benefits:
  • Lower error than Approach 1 in some cases.
  • Limitations:
  • Parameters for algorithms must be selected carefully.
  • Public error rates not available to analyst.
  • Techniques are data-aware, but are they workload-aware?
  • Open questions:
  • Evaluation highly dependent on workload, dataset, epsilon.

What are “real” data and workloads like? What properties of data determine error?

slide-130
SLIDE 130

Outline

  • 1. Preliminaries
  • 2. Approach 1: workload-aware
  • Fixed Observations
  • Optimized Observations
  • 3. Approach 2: data-aware
  • 4. Conclusions
slide-131
SLIDE 131

Outline

  • 1. Preliminaries
  • 2. Approach 1: workload-aware
  • Fixed Observations
  • Optimized Observations
  • 3. Approach 2: data-aware
  • 4. Conclusions
slide-132
SLIDE 132

Summary and conclusions

  • Two approaches to batch query answering, each of which provides significant error improvements by building on standard Laplace/Gaussian mechanisms with alternative observations.
  • Workload-aware methods ignore the input data, and choose observations solely by analyzing the workload.
  • Data-aware methods carefully (i.e. privately) exploit properties of the input data.
  • Both approaches are efficient for modestly sized domains.
slide-133
SLIDE 133
Workload-aware
  • Benefits:
  • Independence of data makes error analysis easy, error rates publishable to the analyst, and improves efficiency in some cases.
  • Limitations:
  • Computational dependence on domain size, n.
  • Error dependence on epsilon: 1/ε2.
  • For some workloads, there is no set of observations that can help much.
  • Open questions:
  • Alternative derivation methods: e.g. non-negative least squares.
  • Relationship with “universal” error lower bounds for DP.

Data-aware
  • Benefits:
  • Lower error than Approach 1 in some cases.
  • Limitations:
  • Parameters for algorithms must be selected carefully.
  • Public error rates not available to the analyst.
  • Techniques are data-aware, but are they workload-aware?
  • Open questions:
  • Evaluation highly dependent on workload, dataset, epsilon. What are “real” data and workloads like? What properties of data determine error?

slide-134
SLIDE 134

Open issues

  • What makes one workload “harder” to answer than another?
  • What makes one database “harder” to support accurately?
  • Can we avoid the computational dependence on the domain size n?
  • How do we analyze the error resulting from non-negative least squares, if applied in the derivation step of the matrix mechanism?

  • Methods for more expressive queries.
slide-135
SLIDE 135

References

[Barak, PODS ‘07] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In Principles of Database Systems (PODS), 2007.

[McSherry, FOCS ‘07] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.

[McSherry, SIGMOD ’09] F. D. McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In SIGMOD, 2009.

[Hay, PVLDB ‘10] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private queries through consistency. PVLDB, 2010.

[Xiao, ICDE ‘10] X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transforms. In International Conference on Data Engineering (ICDE), 2010.

[Li, PODS ‘10] C. Li, M. Hay, V. Rastogi, G. Miklau, and A. McGregor. Optimizing linear counting queries under differential privacy. In Principles of Database Systems (PODS), 2010.

[Xiao, SDM ‘10] Y. Xiao, L. Xiong, and C. Yuan. Differentially private data release through multidimensional partitioning. In Secure Data Management (SDM), 2010.

[Xiao, SIGMOD ’11] X. Xiao, G. Bender, M. Hay, and J. Gehrke. iReduct: Differential privacy with reduced relative errors. In SIGMOD, 2011.

[Ding, SIGMOD ’11] B. Ding, M. Winslett, J. Han, and Z. Li. Differentially private data cubes: optimizing noise sources and consistency. In SIGMOD, 2011.

[Cormode, ICDE ’12] G. Cormode, M. Procopiuc, D. Srivastava, E. Shen, and T. Yu. Differentially private spatial decompositions. In International Conference on Data Engineering (ICDE), 2012.

slide-136
SLIDE 136

References (cont’d)

[Xu, ICDE ’12] J. Xu, Z. Zhang, X. Xiao, Y. Yang, and G. Yu. Differentially private histogram publication. In ICDE, 2012.

[Li, PVLDB ‘12] C. Li and G. Miklau. An adaptive mechanism for accurate query answering under differential privacy. Proceedings of the VLDB Endowment (PVLDB), 2012.

[Yuan, VLDB ’12] G. Yuan, Z. Zhang, M. Winslett, X. Xiao, Y. Yang, and Z. Hao. Low-rank mechanism: optimizing batch queries under differential privacy. In VLDB, 2012.

[Hardt, NIPS ’12] M. Hardt, K. Ligett, and F. McSherry. A simple and practical algorithm for differentially private data release. In NIPS, 2012.