
Locally Differentially Private Frequency Estimation Exploiting Consistency
Tianhao Wang, Purdue University
Joint work with Milan Lopuhaä-Zwakenberg, Zitao Li, Boris Skoric, Ninghui Li


  1. Locally Differentially Private Frequency Estimation Exploiting Consistency
Tianhao Wang, Purdue University
Joint work with Milan Lopuhaä-Zwakenberg, Zitao Li, Boris Skoric, Ninghui Li

  2. Privacy in Practice
• Local differential privacy is deployed:
  • In the Google Chrome browser, to collect browsing statistics
  • In Apple iOS and macOS, to collect typing statistics
  • In Microsoft Windows, to collect telemetry data over time
  • At Alibaba, we built a system to collect user transaction info
• Different algorithms have been proposed.
• They work for different tasks and different settings.
• They are all based on Randomized Response.

  3. Randomized Response
• Survey technique for private questions
• Survey people: "Do you have disease X?"
• Each person:
  • Flips a secret coin
  • Answers truthfully if heads (w.p. 0.5)
  • Answers randomly if tails (w.p. 0.5): replies "yes"/"no" each w.p. 0.5
• Pr[disease → yes] = Pr[disease → yes ∧ head] + Pr[disease → yes ∧ tail] = 0.5×1 + 0.5×0.5 = 0.75
• Similarly: Pr[disease → no] = 0.25, Pr[no disease → yes] = 0.25, Pr[no disease → no] = 0.75
S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. JASA, 1965.
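The coin-flip mechanism above can be enumerated exactly. The following sketch (my own illustration, not from the deck) computes Pr[answer | truth] for the fair-coin version and reproduces the four probabilities on the slide:

```python
def response_prob(truth: bool, answer: bool) -> float:
    """Pr[answer | truth] for Warner's randomized response with a fair coin:
    answer truthfully w.p. 0.5, otherwise answer "yes"/"no" w.p. 0.5 each."""
    truthful = 1.0 if answer == truth else 0.0  # heads: report the truth
    return 0.5 * truthful + 0.5 * 0.5           # tails: uniform random answer

print(response_prob(True, True))    # 0.75  (disease -> yes)
print(response_prob(False, True))   # 0.25  (no disease -> yes)
```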

  4. Randomized Response
• Recall: Pr[disease → yes] = 0.75, Pr[disease → no] = 0.25, Pr[no disease → yes] = 0.25, Pr[no disease → no] = 0.75
• To estimate the distribution: if n_yes out of n people have the disease, we expect E[c_yes] = 0.75·n_yes + 0.25·(n − n_yes) "yes" answers
• Inverting the above equation: ñ_yes = (c_yes − 0.25·n) / 0.5
• It is an unbiased estimate of the number of patients: E[ñ_yes] = (E[c_yes] − 0.25·n) / 0.5 = n_yes
• Similarly for the "no" answers
• An algorithm A is ε-LDP if and only if, for any v and v′ and any valid output y, Pr[A(v) = y] / Pr[A(v′) = y] ≤ e^ε
• Enumerating the possibilities of v and v′ (disease or no disease) and y (yes or no), binary randomized response is ln 3-LDP
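Both the inversion and the ln 3 privacy claim can be checked numerically. A small sketch (the symbols n, n_yes, c_yes follow the slide; the concrete numbers are my own example):

```python
import math

def estimate_n_yes(c_yes: float, n: int) -> float:
    """Invert E[c_yes] = 0.75*n_yes + 0.25*(n - n_yes)."""
    return (c_yes - 0.25 * n) / 0.5

# Unbiasedness: plugging the expected count back in recovers n_yes exactly.
n, n_yes = 1000, 400
expected_c_yes = 0.75 * n_yes + 0.25 * (n - n_yes)  # 450.0
print(estimate_n_yes(expected_c_yes, n))            # 400.0

# Privacy: the worst-case output-probability ratio is 0.75 / 0.25 = 3,
# so binary randomized response is ln(3)-LDP.
print(math.log(0.75 / 0.25))                        # = ln 3, about 1.0986
```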

  5. Local Differential Privacy (LDP)
• Each user holds a private value v inside their trust boundary and reports y = A(v); the aggregator takes reports from all users and outputs an estimation f̃(v) for any value v
• A is ε-LDP iff for any v and v′, and any valid output y, Pr[A(v) = y] / Pr[A(v′) = y] ≤ e^ε
• Estimation is done independently for each value v, so the result is not consistent:
  • Some estimates may be negative
  • The sum may not be n (the original number of users)
• In this work, we explore 10 different methods that improve the accuracy of LDP by enforcing consistency
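To make the inconsistency concrete, here is a hedged sketch: the estimator form (c_v/n − q)/(p − q) is the standard per-value frequency estimator, but the support counts below are invented for illustration.

```python
def estimate_freq(support_count: int, n: int, p: float, q: float) -> float:
    """Unbiased frequency estimate: (c_v/n - q) / (p - q), where p is the
    probability a report supports the user's true value and q the
    probability it supports any other value."""
    return (support_count / n - q) / (p - q)

# Hypothetical noisy support counts for a 3-value domain, n = 1000:
n, p, q = 1000, 0.75, 0.25
est = [estimate_freq(c, n, p, q) for c in (700, 300, 230)]
print(est)   # about [0.9, 0.1, -0.04]: one estimate is negative, sum is 0.96
```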

  6. Making Estimations Consistent
Consistency means: 1) the estimated frequency of each value is non-negative; 2) the sum of the estimated frequencies is 1.

Group               | Method   | Description                                                  | Non-neg | Sum to 1 | Complexity
Baselines           | Base     | Use existing estimation                                      | No      | No       | N/A
                    | Base-Pos | Convert negative est. to 0                                   | Yes     | No       | O(d)
                    | Post-Pos | Convert negative query result to 0                           | Yes     | No       | N/A
                    | Base-Cut | Convert est. below threshold θ to 0                          | Yes     | No       | O(d)
Normalization-based | Norm     | Add δ to est.                                                | No      | Yes      | O(d)
                    | Norm-Mul | Convert negative est. to 0, then multiply γ to positive est. | Yes     | Yes      | O(d)
                    | Norm-Cut | Convert negative and small positive est. below ϑ to 0        | Yes     | Almost   | O(d)
                    | Norm-Sub | Convert negative est. to 0 while adding δ to positive est.   | Yes     | Yes      | O(d)
MLE-based           | MLE-Apx  | Convert negative est. to 0, then add δ to positive est.      | Yes     | Yes      | O(d)
Needs more prior    | Power    | Fit a power-law dist., then minimize expected squared error  | Yes     | No       | O(d log d)
                    | PowerNS  | Apply Norm-Sub after Power                                   | Yes     | Yes      | O(d log d)

(d denotes the domain size.)
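As one concrete instance of the normalization-based methods in the table, here is a minimal sketch of Norm-Mul (my own illustration; the function name and signature are not from the paper): zero out negative estimates, then rescale the positives by a common factor γ so they sum to 1.

```python
def norm_mul(est, total=1.0):
    """Norm-Mul sketch: clip negative estimates to 0, then multiply the
    positive estimates by a common factor gamma so they sum to `total`."""
    clipped = [max(e, 0.0) for e in est]
    s = sum(clipped)
    gamma = total / s if s > 0 else 0.0
    return [e * gamma for e in clipped]

print(norm_mul([0.9, 0.14, -0.04]))   # negative zeroed, positives rescaled to sum 1
```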

  7. Post-Processing: Toy Example
[Bar charts: estimated vs. true ratio (%) over occupations; the raw estimates include negative values and sum to 106%]
• Constraint 1: estimation is non-negative
  • Base-Pos: convert negative estimates to 0
• Constraint 2: the sum of the estimations is known
  • Norm-Sub: additively normalize the result
  • Norm-Sub is the solution to Constrained Least Squares (CLS) and approximate Maximum Likelihood Estimation (MLE)
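A minimal sketch of Norm-Sub as described above (clip negatives to 0, shift the remaining positives by a common δ so they sum to 1, and iterate until nothing is negative); the code structure and names are my own:

```python
def norm_sub(est, total=1.0):
    """Norm-Sub sketch: set negative estimates to 0, then add a common
    delta to the remaining positive estimates so they sum to `total`;
    repeat until every estimate is non-negative."""
    est = [float(e) for e in est]
    while True:
        pos = {i for i, e in enumerate(est) if e > 0}
        if not pos:
            return [max(e, 0.0) for e in est]
        delta = (total - sum(est[i] for i in pos)) / len(pos)
        est = [est[i] + delta if i in pos else 0.0 for i in range(len(est))]
        if min(est) >= 0.0:
            return est

print(norm_sub([0.5, 0.4, 0.3, -0.2]))   # sums to 1.0, all non-negative
```

Shifting by delta can push a small positive estimate below zero, which is why the loop repeats until the result satisfies both constraints.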

  8. Analysis of the Estimation in LDP
• Estimation function: ñ_yes = (c_yes − 0.25·n) / 0.5; more generally, ñ_v = (c_v − n·q) / (p − q), where p is the probability of A(v) supporting v (disease → yes) and q is the probability of A(v′) supporting v for v′ ≠ v (no disease → yes)
• Noise comes from c_v, which is the sum of two Binomials: Bin(n_v, p) + Bin(n − n_v, q) ≈ Bin(n, p′) for p′ = (n_v/n)·p + ((n − n_v)/n)·q
• When n is large, noise ≈ N(p′·n, n·p′·(1 − p′))
• Takeaway: the noise of the LDP estimation approximately follows a Gaussian distribution; this makes the analysis easier (Norm-Sub is the solution to MLE)
J. Jia and N. Gong. Calibrate: Frequency estimation and heavy hitter identification with local differential privacy via incorporating prior knowledge. INFOCOM 2019.
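The two-Binomial structure of c_v can be checked by simulation. This sketch uses hypothetical parameters (n, n_v, p, q are my own choices) and compares the empirical mean of Bin(n_v, p) + Bin(n − n_v, q) against the n·p′ predicted by the approximation above:

```python
import random
import statistics

# Sketch with hypothetical parameters: the noisy count c_v is
# Bin(n_v, p) + Bin(n - n_v, q), approximated by Bin(n, p') with
# p' = (n_v/n)*p + ((n - n_v)/n)*q, hence ~ N(n*p', n*p'*(1 - p')) for large n.
random.seed(0)
n, n_v, p, q = 2000, 400, 0.75, 0.25

def binom(m: int, pr: float) -> int:
    """Draw one Binomial(m, pr) sample as a sum of Bernoulli trials."""
    return sum(random.random() < pr for _ in range(m))

samples = [binom(n_v, p) + binom(n - n_v, q) for _ in range(200)]
p_prime = (n_v / n) * p + ((n - n_v) / n) * q   # 0.35
print(statistics.mean(samples), n * p_prime)    # sample mean is close to 700
```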

  9. Empirical Understanding
• Setup: 1 million reports following a Zipf distribution (s = 1.5) with 1024 values; 5000 runs (each dot is the mean)
[Figure: estimated frequency vs. value]
• Base-Pos (convert negative to 0): systematic positive bias on infrequent values
• Norm-Sub (additively normalize the result): systematic negative bias on frequent values
• Bias is a bad thing. Should we stop post-processing? No, because it prevents impossible events. But how does it affect the utility?

  10. Empirical Understanding
• Same setup: 1 million reports following a Zipf distribution (s = 1.5) with 1024 values; 5000 runs (each dot is the variance)
[Figure: variance vs. estimated frequency, for Base-Pos and Norm-Sub]
• Variance is smaller for infrequent values
• Takeaway message:
  • Utility is composed of bias and variance
  • Post-processing introduces bias but reduces variance
  • Different methods achieve different bias-variance tradeoffs
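The bias-variance effect is easy to demonstrate in isolation. This sketch (my own, with an arbitrary noise scale) models a value whose true frequency is 0: the raw LDP estimate is unbiased Gaussian noise around 0, and clipping negatives to 0 (Base-Pos) introduces a positive bias while shrinking the variance.

```python
import random
import statistics

# Sketch: raw estimates of a true-zero frequency are unbiased noise;
# Base-Pos clipping biases them upward but reduces their variance.
random.seed(1)
raw = [random.gauss(0.0, 0.01) for _ in range(100_000)]   # unbiased estimates of 0
clipped = [max(x, 0.0) for x in raw]                      # Base-Pos

print(statistics.mean(raw))       # close to 0 (unbiased)
print(statistics.mean(clipped))   # about sigma/sqrt(2*pi) = 0.004: positive bias
print(statistics.variance(clipped) < statistics.variance(raw))  # True
```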

  11. Comparison of Different Methods
[Figure: mean squared error vs. privacy level (more privacy to the right)]
• Norm-Mul: multiplicatively normalize the result
• Norm-Sub > Base-Pos > Base > Norm-Mul
• Exploiting a constraint may or may not be helpful

  12. Comparison of Different Methods
• MSE of estimating a subset of values (set-value queries): uniformly sample ρ% of the elements from the domain
[Figure: mean squared error vs. ρ]
• Normalization-based methods work better
• MSE is symmetric around ρ = 50 if the estimates sum up to 1

  13. Summary
(Methods table as on slide 6.)
• LDP noise approximately follows a Gaussian
• Norm-Sub is the solution to (approximate) MLE
• Exploiting priors is helpful
• Different methods work for different tasks
