 
Smooth Sensitivity and Sampling
CompSci 590.03
Instructor: Ashwin Machanavajjhala
Lecture 7 : 590.03 Fall 12

Project Topics
• 2-3 minute presentations about each project topic.
• 1-2 minutes of questions about each presentation.

Recap: Differential Privacy
For every pair of inputs \(D_1, D_2\) that differ in one value, and for every output \(O\), the adversary should not be able to distinguish between \(D_1\) and \(D_2\) based on \(O\):
\[ \log \frac{\Pr[A(D_1) = O]}{\Pr[A(D_2) = O]} < \varepsilon \quad (\varepsilon > 0) \]

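Recap: Laplacian Distribution
The researcher sends a query \(q\) to the database, which returns \(q(d) + \eta\), where \(q(d)\) is the true answer and \(\eta\) is random noise. Privacy depends on the \(\lambda\) parameter.
Laplace distribution \(\mathrm{Lap}(\lambda)\): \(h(\eta) \propto \exp(-|\eta| / \lambda)\), with mean 0 and variance \(2\lambda^2\).
[Figure: density of the Laplace distribution, plotted for \(\eta\) from -10 to 10.]
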
Recap: Laplace Mechanism [Dwork et al., TCC 2006]
Thm: If the sensitivity of the query is \(S\), then adding noise \(\eta \sim \mathrm{Lap}(\lambda)\) with \(\lambda = S/\varepsilon\) guarantees \(\varepsilon\)-differential privacy.
Sensitivity: smallest number \(S(q)\) s.t. for any \(d, d'\) differing in one entry,
\[ \|q(d) - q(d')\| \le S(q) \]

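A minimal sketch of the Laplace mechanism recapped above, using only the Python standard library. The names `sample_laplace` and `laplace_mechanism` are illustrative, not from the lecture. The sampler relies on the fact that the difference of two independent exponential variables with mean \(\lambda\) is distributed as \(\mathrm{Lap}(\lambda)\).

```python
import random


def sample_laplace(scale, rng):
    """Draw one sample from Lap(scale); a scale of 0 means no noise."""
    if scale == 0:
        return 0.0
    # Difference of two i.i.d. exponentials with mean `scale` is Laplace(scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return q(d) + eta with eta ~ Lap(S / epsilon)."""
    lam = sensitivity / epsilon
    return true_answer + sample_laplace(lam, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical count query with sensitivity 1 and epsilon = 0.5.
    print(laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5, rng=rng))
```

A smaller \(\varepsilon\) or a larger sensitivity both increase \(\lambda\), and with it the noise.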
Sensitivity of Median function
• Consider a dataset containing salaries of individuals
– Salary can be anywhere between $200 and $200,000
• Researcher wants to compute the median salary.
• What is the sensitivity?

Queries with Large Sensitivity
• Median, MAX, MIN …
• Let \(\{x_1, \ldots, x_{10}\}\) be numbers in \([0, \Lambda]\) (assume the \(x_i\) are sorted).
• \(q_{med}(x_1, \ldots, x_{10}) = x_5\)
Sensitivity of \(q_{med}\) = \(\Lambda\)
– \(d_1 = \{0, 0, 0, 0, 0, \Lambda, \Lambda, \Lambda, \Lambda, \Lambda\}\), so \(q_{med}(d_1) = 0\)
– \(d_2 = \{0, 0, 0, 0, \Lambda, \Lambda, \Lambda, \Lambda, \Lambda, \Lambda\}\), so \(q_{med}(d_2) = \Lambda\)

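The worst-case pair above can be checked directly. This is a small sketch in which `LAM = 1000` is an arbitrary stand-in for \(\Lambda\) and `q_med` follows the slide's convention of returning the 5th smallest of 10 values.

```python
def q_med(values):
    """The slide's median query: the 5th smallest of 10 values."""
    return sorted(values)[4]


LAM = 1000  # illustrative value for the upper bound Lambda

d1 = [0] * 5 + [LAM] * 5
d2 = [0] * 4 + [LAM] * 6

# Number of positions where the two databases differ.
diff_positions = sum(a != b for a, b in zip(d1, d2))
# Change in the query answer caused by that single differing entry.
gap = abs(q_med(d1) - q_med(d2))

if __name__ == "__main__":
    print(diff_positions, gap)
```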
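Minimum Spanning Tree
• Graph G = (V, E)
• Each edge has weight between 0 and \(\Lambda\)
• What is the global sensitivity of the cost of the minimum spanning tree?
• Consider the complete graph on four vertices with all edge weights = \(\Lambda\). Cost of MST = \(3\Lambda\)
• Suppose one edge's weight is changed to 0. Cost of MST = \(2\Lambda\), so the cost changes by \(\Lambda\).
[Figure: the complete graph with all edge weights \(\Lambda\), before and after one edge weight is changed to 0.]
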
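The MST example can be reproduced with a short Kruskal implementation. This is a sketch under the assumption that the figure shows the complete graph on four vertices, which is what a cost of \(3\Lambda\) implies; `LAM = 10` and the choice of edge `(0, 1)` are illustrative.

```python
from itertools import combinations


def mst_cost(n, weights):
    """Kruskal's algorithm; `weights` maps (u, v) with u < v to an edge weight."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    cost = 0
    # Scan edges from lightest to heaviest, keeping those that join components.
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            cost += w
    return cost


LAM = 10  # illustrative value for Lambda
g1 = {e: LAM for e in combinations(range(4), 2)}  # all weights Lambda
g2 = dict(g1)
g2[(0, 1)] = 0  # change a single edge weight to 0

if __name__ == "__main__":
    print(mst_cost(4, g1), mst_cost(4, g2))
```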
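k-means Clustering
• Input: set of points \(x_1, x_2, \ldots, x_n\) from \(\mathbb{R}^d\)
• Output: a set of k cluster centers \(c_1, c_2, \ldots, c_k\) such that the k-means cost, the sum of squared distances from each point to its nearest center, is minimized:
\[ \sum_{i=1}^{n} \min_{1 \le j \le k} \|x_i - c_j\|^2 \]
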
Global Sensitivity of Clustering
[Figure]

Queries with Large Sensitivity
However, for most inputs \(q_{med}\) is not very sensitive.
Let \(d = (x_1, \ldots, x_{10})\), and let \(d'\) differ from \(d\) in \(k = 1\) entry, whose value may move anywhere in \([0, \Lambda]\). Then
\[ x_4 \le q_{med}(d') \le x_6 \]
Sensitivity of \(q_{med}\) at \(d\) = \(\max(x_5 - x_4,\; x_6 - x_5) \ll \Lambda\)

Local Sensitivity of q at d – \(LS_q(d)\) [Nissim et al., STOC 2007]
Smallest number s.t. for any \(d'\) differing in one entry from \(d\),
\[ \|q(d) - q(d')\| \le LS_q(d) \]
Sensitivity = Global sensitivity: \(S(q) = \max_d LS_q(d)\)
Can we add noise proportional to local sensitivity?

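For the median query, the local sensitivity can be computed from the neighbours of the median in sorted order. The sketch below assumes the slide's convention that the median is the m-th smallest value (m = 5 of 10) and pads the sorted data with the domain bounds 0 and \(\Lambda\); the function name and the parameter `lam` are illustrative.

```python
def local_sensitivity_median(values, lam, m=5):
    """Local sensitivity of the m-th smallest value (1-indexed) on [0, lam]."""
    # Pad so that padded[i] is x_i, with x_0 = 0 and x_{n+1} = lam.
    padded = [0] + sorted(values) + [lam]
    # Changing one entry can push the median down to x_{m-1} or up to x_{m+1}.
    return max(padded[m] - padded[m - 1], padded[m + 1] - padded[m])


if __name__ == "__main__":
    print(local_sensitivity_median(range(1, 11), lam=1000))
```

On evenly spaced data the local sensitivity is the spacing, far below \(\Lambda\).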
Noise proportional to Local Sensitivity
• \(d_1 = \{0, 0, 0, 0, 0, 0, \Lambda, \Lambda, \Lambda, \Lambda\}\)
• \(d_2 = \{0, 0, 0, 0, 0, \Lambda, \Lambda, \Lambda, \Lambda, \Lambda\}\)
\(d_1\) and \(d_2\) differ in one value.

Noise proportional to Local Sensitivity
• \(d_1 = \{0, 0, 0, 0, 0, 0, \Lambda, \Lambda, \Lambda, \Lambda\}\): \(q_{med}(d_1) = 0\), \(LS_{q_{med}}(d_1) = 0\) ⇒ noise sampled from \(\mathrm{Lap}(0)\)
• \(d_2 = \{0, 0, 0, 0, 0, \Lambda, \Lambda, \Lambda, \Lambda, \Lambda\}\): \(q_{med}(d_2) = 0\), \(LS_{q_{med}}(d_2) = \Lambda\) ⇒ noise sampled from \(\mathrm{Lap}(\Lambda/\varepsilon)\)
\(\Pr[\text{answer} > 0 \mid d_2] > 0\) and \(\Pr[\text{answer} > 0 \mid d_1] = 0\) implies
\[ \frac{\Pr[\text{answer} > 0 \mid d_2]}{\Pr[\text{answer} > 0 \mid d_1]} = \infty \]

Local Sensitivity
\(LS_{q_{med}}(d_1) = 0\) and \(LS_{q_{med}}(d_2) = \Lambda\) implies \(S(LS_q(\cdot)) \ge \Lambda\)
\(LS_{q_{med}}(d)\) has very high sensitivity.
Adding noise proportional to local sensitivity does not guarantee differential privacy.

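The failure can be seen empirically. The following sketch implements the flawed mechanism that scales Laplace noise by local sensitivity, purely to illustrate the problem; `naive_mechanism`, `LAM`, and `EPS` are illustrative names and values, not part of the lecture.

```python
import random


def q_med(values):
    """The slide's median query: the 5th smallest of 10 values."""
    return sorted(values)[4]


def local_sensitivity(values, lam):
    """max(x_5 - x_4, x_6 - x_5) on sorted data padded with 0 and lam."""
    padded = [0] + sorted(values) + [lam]
    return max(padded[5] - padded[4], padded[6] - padded[5])


def naive_mechanism(values, lam, epsilon, rng):
    """NOT private: Laplace noise scaled by local sensitivity."""
    scale = local_sensitivity(values, lam) / epsilon
    if scale == 0:
        return float(q_med(values))  # Lap(0) adds no noise at all
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return q_med(values) + noise


LAM, EPS = 1000, 0.5
d1 = [0] * 6 + [LAM] * 4
d2 = [0] * 5 + [LAM] * 5
rng = random.Random(1)
out1 = [naive_mechanism(d1, LAM, EPS, rng) for _ in range(1000)]
out2 = [naive_mechanism(d2, LAM, EPS, rng) for _ in range(1000)]

if __name__ == "__main__":
    print(sum(o > 0 for o in out1), sum(o > 0 for o in out2))
```

Any positive answer reveals that the input was not \(d_1\), which is exactly the distinguishing event differential privacy is meant to rule out.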
Sensitivity
[Figure: local sensitivity, global sensitivity, and smooth sensitivity compared across databases D1 to D6.]

Smooth Sensitivity [Nissim et al., STOC 2007]
\(S_q(\cdot)\) is a \(\beta\)-smooth upper bound on the local sensitivity if
• for all \(d\): \(S_q(d) \ge LS_q(d)\)
• for all \(d, d'\) differing in one entry: \(S_q(d) \le \exp(\beta)\, S_q(d')\)
The smallest such upper bound is called the \(\beta\)-smooth sensitivity:
\[ S^*_q(d) = \max_{d'} \big( LS_q(d') \exp(-m\beta) \big) \]
where \(d\) and \(d'\) differ in \(m\) entries.

Smooth sensitivity of \(q_{med}\)
Let \(d = (x_1, \ldots, x_{10})\), sorted, and let \(d'\) differ from \(d\) in \(k\) entries. (The figure shows \(k = 3\), with three entries replaced by 0 or by \(\Lambda\).)
• \(x_{5-k} \le q_{med}(d') \le x_{5+k}\)
• \(LS(d') = \max(x_{med+1} - x_{med},\; x_{med} - x_{med-1})\)
\[ S^*_{q_{med}}(d) = \max_k \Big( \exp(-k\beta) \cdot \max_{5-k \le med \le 5+k} \max(x_{med+1} - x_{med},\; x_{med} - x_{med-1}) \Big) \]

Smooth sensitivity of \(q_{med}\)
For instance, \(\Lambda = 1000\), \(\beta = 2\), and \(d = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)\):
\[ S^*_{q_{med}}(d) = \max\Big( \max_{0 \le k \le 4} \exp(-\beta k) \cdot 1,\; \max_{5 \le k \le 10} \exp(-\beta k) \cdot \Lambda \Big) = 1 \]

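A sketch of how the smooth sensitivity of the median might be computed. It follows the closed form from Nissim et al. (STOC 2007), in which the worst local sensitivity reachable by changing k entries is the largest gap spanning k + 1 sorted positions around the median, rather than the adjacent-gap form written on the slide; both give 1 on the example above. The function name, the padding convention (\(x_i = 0\) for \(i < 1\), \(x_i = \Lambda\) for \(i > n\)), and the default `m=5` are assumptions for illustration.

```python
import math


def smooth_sensitivity_median(values, lam, beta, m=5):
    """beta-smooth sensitivity of the m-th smallest value (1-indexed) on [0, lam]."""
    x = sorted(values)
    n = len(x)

    def at(i):
        # 1-indexed access with padding by the domain bounds.
        if i < 1:
            return 0
        if i > n:
            return lam
        return x[i - 1]

    best = 0.0
    for k in range(n + 1):
        # Largest local sensitivity among databases at distance k from d.
        a_k = max(at(m + t) - at(m + t - k - 1) for t in range(k + 2))
        best = max(best, math.exp(-beta * k) * a_k)
    return best


if __name__ == "__main__":
    print(smooth_sensitivity_median(range(1, 11), lam=1000, beta=2))
```

With \(\beta = 0\) the discount disappears and the result is the global sensitivity \(\Lambda\); larger \(\beta\) pulls the value toward the local sensitivity.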
Calibrating noise to smooth sensitivity
Theorem
• If h is an \((\alpha, \beta)\)-admissible distribution
• If \(S_q\) is a \(\beta\)-smooth upper bound on the local sensitivity of query q
• Then adding noise drawn from h and scaled by \(S_q(D)/\alpha\) guarantees
\[ P[f(D) \in O] \le e^{\varepsilon} P[f(D') \in O] + \delta \]
for all \(D, D'\) that differ in one entry, and for all outputs \(O\).

Calibrating Noise for Smooth Sensitivity
\[ A(d) = q(d) + Z \cdot \frac{S^*_q(d)}{\alpha} \]
• Z sampled from \(h(z) \propto 1/(1 + |z|^{\gamma})\), \(\gamma > 1\)
• \(\alpha = \varepsilon/(4\gamma)\)
• \(S^*\) is the \((\varepsilon/\gamma)\)-smooth sensitivity
Then \(P[f(D) \in O] \le e^{\varepsilon} P[f(D') \in O]\).

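A minimal sketch of this mechanism for the special case \(\gamma = 2\), where \(h(z) \propto 1/(1 + z^2)\) is the standard Cauchy distribution and can be sampled by inverting its CDF. With \(\gamma = 2\) the slide's parameters become \(\alpha = \varepsilon/8\) and \(\beta = \varepsilon/2\). The function names are illustrative, and the caller is assumed to supply a smooth sensitivity already computed with \(\beta = \varepsilon/2\).

```python
import math
import random
import statistics

GAMMA = 2  # fixed so that h is the standard Cauchy distribution


def sample_cauchy(rng):
    """Standard Cauchy sample: density proportional to 1 / (1 + z^2)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def noise_scale(smooth_sens, epsilon):
    """S* / alpha with alpha = epsilon / (4 * gamma)."""
    alpha = epsilon / (4 * GAMMA)
    return smooth_sens / alpha


def smooth_sensitivity_mechanism(true_answer, smooth_sens, epsilon, rng):
    """Return q(d) + Z * S*(d) / alpha; S* must use beta = epsilon / gamma."""
    return true_answer + sample_cauchy(rng) * noise_scale(smooth_sens, epsilon)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical median of 5.0 with smooth sensitivity 1.0 and epsilon = 1.
    print(smooth_sensitivity_mechanism(5.0, 1.0, 1.0, rng))
```

The Cauchy distribution has heavy tails, so occasional very large perturbations are expected; that is the price of the pure \(\varepsilon\) guarantee in this construction.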
Calibrating Noise for Smooth Sensitivity
• Laplace and normally distributed noise can also be used.
• They guarantee \((\varepsilon, \delta)\)-differential privacy.

Summary of Smooth Sensitivity
• Many functions have large global sensitivity.
• Local sensitivity captures the sensitivity of the current instance.
– Local sensitivity is very sensitive.
– Adding noise proportional to local sensitivity causes privacy breaches.
• Smooth sensitivity
– Not sensitive.
– Much smaller than global sensitivity.

Computing the (Smooth) Sensitivity
• No known automatic method to compute (smooth) sensitivity.
• For some complex functions it is hard to analyze even the sensitivity of the function.

Sample and Aggregate Framework
[Figure: the original data is sampled without replacement into blocks, the original function is applied to each block, and a new aggregation function combines the results.]

Example: Statistical Analysis [Smith STOC’11]
• Let T be some statistical point estimator on data (assumed to be drawn i.i.d. from some distribution).
• Suppose T takes values from \([-\Lambda/2, \Lambda/2]\), so its sensitivity = \(\Lambda\).
Solution:
• Divide data X into K parts
• Compute T on each of the K parts: \(z_1, z_2, \ldots, z_K\)
• Compute \((z_1 + z_2 + \cdots + z_K)/K\)

Example: Statistical Analysis [Smith STOC’11]
Solution:
• Divide data X into K parts
• Compute T on each of the K parts: \(z_1, z_2, \ldots, z_K\)
• Compute: \(\mathrm{Ave}_{K,T} = (z_1 + z_2 + \cdots + z_K)/K\)
Utility Theorem:

Example: Statistical Analysis [Smith STOC’11]
Privacy: \(\mathrm{Ave}_{K,T}\) is a deterministic algorithm, so on its own it does not guarantee differential privacy. (Add noise calibrated to the sensitivity of the average.)

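A sketch of the averaging scheme with noise added. It assumes the K parts are disjoint, so changing one record changes a single \(z_i\) by at most \(\Lambda\) and the average by at most \(\Lambda/K\), and it uses Laplace noise with scale \(\Lambda/(K\varepsilon)\). The function names, the clamping of estimates to \([-\Lambda/2, \Lambda/2]\), and the random partitioning are illustrative choices, not details from the slides.

```python
import random


def block_estimates(data, estimator, num_blocks, lam, rng):
    """Split data into disjoint blocks and apply a clamped estimator to each."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    half = lam / 2
    # Clamp each estimate to [-lam/2, lam/2] so the bound on T is enforced.
    return [min(half, max(-half, estimator(b))) for b in blocks]


def private_average(data, estimator, num_blocks, lam, epsilon, rng):
    """Average of block estimates plus Lap(lam / (K * epsilon)) noise."""
    z = block_estimates(data, estimator, num_blocks, lam, rng)
    scale = lam / (num_blocks * epsilon)
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(z) / num_blocks + noise


def mean(block):
    return sum(block) / len(block)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [i - 49.5 for i in range(100)]  # toy data centred at 0
    print(private_average(data, mean, num_blocks=10, lam=200, epsilon=1.0, rng=rng))
```

Increasing K lowers the noise scale but leaves fewer records per block, so each \(z_i\) becomes a noisier estimate.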
Widened Winsorized Mean
• \(\alpha\)-Winsorized mean: \(W(z_1, z_2, \ldots, z_K)\), with the \(z_i\) sorted
– Round up the \(\alpha K\) smallest values to \(z_{\alpha K}\)
– Round down the \(\alpha K\) largest values to \(z_{(1-\alpha)K}\)
– Compute the mean on the new set of values.
• If the statistician knows \(a = z_{(1-\alpha)K}\) and \(b = z_{\alpha K}\)
– Sensitivity = \(|a - b|/K\), so Laplace noise with scale \(|a - b|/(K\varepsilon)\) suffices.
• If not known, a and b can be estimated using the exponential mechanism.

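A sketch of the Winsorized mean itself, without the private estimation of a and b. The indexing convention is an assumption: with `j = int(alpha * K)`, the `j` smallest values are raised to the next order statistic and the `j` largest are lowered symmetrically. The function name and the sample data are illustrative.

```python
def winsorized_mean(z, alpha):
    """Clamp the alpha*K smallest and largest values, then average."""
    s = sorted(z)
    k = len(s)
    j = int(alpha * k)  # number of values clamped at each end
    low, high = s[j], s[k - 1 - j]
    clamped = [min(high, max(low, v)) for v in s]
    return sum(clamped) / k


if __name__ == "__main__":
    # Two outliers at -100 and 100 are pulled in to 1 and 8.
    print(winsorized_mean([-100, 1, 2, 3, 4, 5, 6, 7, 8, 100], alpha=0.1))
```

Because every value is confined to the interval between the two cut points, changing one \(z_i\) moves the mean by at most the interval width divided by K.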
Summary
• Local sensitivity can be much smaller than global sensitivity.
• But local sensitivity may itself be a very sensitive function.
• Need to use a smooth upper bound on local sensitivity.
• The Sample and Aggregate framework helps apply differential privacy when computing sensitivity is hard.

Next Class
• Optimizing noise when a workload of queries is known.

References
C. Dwork, F. McSherry, K. Nissim, A. Smith, "Calibrating noise to sensitivity in private data analysis", TCC 2006
K. Nissim, S. Raskhodnikova, A. Smith, "Smooth sensitivity and sampling in private data analysis", STOC 2007
A. Smith, "Privacy-preserving statistical estimation with optimal convergence rates", STOC 2011