When Does Randomization Fail to Protect Privacy? Wenliang (Kevin) Du, Department of EECS, Syracuse University

Random Perturbation: Agrawal and Srikant's SIGMOD paper. $Y = X + R$, where $X$ is the original data, $R$ is the random noise, and $Y$ is the disguised data.

Random Perturbation: Most security analysis methods based on randomization treat each attribute separately. Is that enough? Does the relationship among data affect privacy?

As we all know … We cannot perturb the same number several times. If we do, the original data can be estimated. Let $t$ be the original data, disguised as $t + R_1, t + R_2, \ldots, t + R_m$. Let $Z = \frac{1}{m}\sum_{i=1}^{m}(t + R_i)$. For independent, zero-mean noise $R_i$: Mean: $E(Z) = t$. Variance: $\mathrm{Var}(Z) = \mathrm{Var}(R)/m$.
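The averaging argument above can be checked with a small simulation. This is only a sketch under assumed conditions: the noise is taken to be Gaussian with zero mean, and the names `average_disguised`, `t`, and `noise_std` are illustrative choices, not part of the original slides.

```python
import random
import statistics

def average_disguised(t, m, noise_std, seed=0):
    """Average m independent perturbations t + R_i of the same value t."""
    rng = random.Random(seed)
    disguised = [t + rng.gauss(0.0, noise_std) for _ in range(m)]
    return statistics.fmean(disguised)

t = 42.0          # hypothetical original value
noise_std = 10.0  # hypothetical noise standard deviation (assumed Gaussian)

for m in (1, 100, 10_000):
    z = average_disguised(t, m, noise_std)
    # The standard deviation of Z is noise_std / sqrt(m), per Var(Z) = Var(R) / m.
    print(f"m={m:>6}  Z={z:8.3f}  std of Z={noise_std / m ** 0.5:.3f}")
```

As m grows, the printed Z should settle near t, which is why perturbing the same value repeatedly leaks it.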

This looks familiar … This is the data set $(x, x, x, x, x, x, x, x)$. Random perturbation: $(x + r_1, x + r_2, \ldots, x + r_m)$. We know this is NOT safe. Observation: the data set is highly correlated.

Let's Generalize! Data set: $(x_1, x_2, x_3, \ldots, x_m)$. If the correlation among the data attributes is high, can we use that to improve our estimate of the original data from the disguised data?

Introduction; A heuristic approach toward privacy analysis; Principal Component Analysis (PCA); PCA-based data reconstruction; Experiment results; Conclusion and future work

Privacy Quantification: A Heuristic Approach. Our goal: to find a best-effort algorithm that reconstructs the original data, based on the available information. Definition: $PM = \frac{1}{n \cdot m} \sum_{i=1}^{n} \sum_{j=1}^{m} L(D^*_{i,j}, D_{i,j})$

How to use the correlation? High correlation → data redundancy. Data redundancy → compression. Our goal is lossy compression: we do want to lose information, but we don't want to lose too much data; we do want to lose the added noise.

PCA Introduction: The main use of PCA is to reduce the dimensionality while retaining as much information as possible. 1st PC: contains the greatest amount of variation. 2nd PC: contains the next largest amount of variation.

[Figure: Original Data]

[Figure: After Dimension Reduction]

For the Original Data: The attributes are correlated. If we remove 50% of the dimensions, the actual information loss might be less than 10%.

For the Random Noises: They are not correlated. Their variance is evenly distributed in every direction. If we remove 50% of the dimensions, the actual noise loss should be 50%.
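The contrast between correlated data and uncorrelated noise can be illustrated numerically. The sketch below is an assumption-laden toy, not the authors' experiment: the data are synthetic (10 attributes driven by 2 hidden factors), the noise is Gaussian, and `retained_fraction` is a hypothetical helper that measures how much of the total sum of squares survives projection onto the top principal directions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 5000, 10, 5   # records, attributes, components kept (all hypothetical)

# Correlated "original" data: every attribute is a mix of 2 hidden factors.
factors = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, m))
X = factors @ mixing + 0.1 * rng.normal(size=(n, m))

# Independent noise with the same variance in every attribute.
R = rng.normal(scale=1.0, size=(n, m))

def retained_fraction(data, Q_hat):
    """Share of the total sum of squares kept after projecting onto Q_hat's columns."""
    centered = data - data.mean(axis=0)
    projected = centered @ Q_hat @ Q_hat.T
    return (projected ** 2).sum() / (centered ** 2).sum()

# Principal directions of the original data (eigenvectors of its covariance).
eigvals, Q = np.linalg.eigh(np.cov(X, rowvar=False))
Q_hat = Q[:, np.argsort(eigvals)[::-1][:p]]   # top-p eigenvectors

data_kept = retained_fraction(X, Q_hat)
noise_kept = retained_fraction(R, Q_hat)
print(f"data variance kept after dropping half the dimensions:  {data_kept:.3f}")
print(f"noise variance kept after dropping half the dimensions: {noise_kept:.3f}")
```

With these settings the data should keep nearly all of its variance while the noise keeps roughly half, matching the slide's intuition.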

Data Reconstruction: Applying PCA. Find the principal components: $C = Q \Lambda Q^T$. Set $\hat{Q}$ to be the first $p$ columns of $Q$. Reconstruct the data: $\hat{X} = Y \hat{Q} \hat{Q}^T = (X + R) \hat{Q} \hat{Q}^T = X \hat{Q} \hat{Q}^T + R \hat{Q} \hat{Q}^T$
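A minimal sketch of the reconstruction step follows, assuming zero-mean data and a covariance matrix C supplied by the caller. For illustration only, C is computed here from the true X, which a real attacker would not have; the function name `pca_reconstruct`, the synthetic data, and all sizes are hypothetical.

```python
import numpy as np

def pca_reconstruct(Y, C, p):
    """Return Y @ Q_hat @ Q_hat.T, with Q_hat the top-p eigenvectors of C."""
    eigvals, Q = np.linalg.eigh(C)       # C = Q diag(eigvals) Q^T
    order = np.argsort(eigvals)[::-1]    # largest eigenvalue first
    Q_hat = Q[:, order[:p]]
    return Y @ Q_hat @ Q_hat.T

rng = np.random.default_rng(1)
n, m, p, sigma = 2000, 20, 2, 1.0        # hypothetical sizes and noise level

# Zero-mean, highly correlated original data driven by p hidden factors.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, m))
R = rng.normal(scale=sigma, size=(n, m))
Y = X + R                                # disguised data

C = np.cov(X, rowvar=False)              # illustration only: uses the true X
X_hat = pca_reconstruct(Y, C, p)

mse_disguised = np.mean((Y - X) ** 2)
mse_reconstructed = np.mean((X_hat - X) ** 2)
print(f"MSE of Y     vs X: {mse_disguised:.3f}")
print(f"MSE of X_hat vs X: {mse_reconstructed:.3f}")
```

In this toy setup the reconstruction error should fall well below the error of the disguised data, because projecting onto p of m directions discards most of the noise while keeping the low-rank signal.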

Random Noise $R$: How does $R \hat{Q} \hat{Q}^T$ affect accuracy? Theorem: $\mathrm{Var}(R \hat{Q} \hat{Q}^T) = \frac{p}{m} \mathrm{Var}(R)$
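One way to see why the theorem holds is the short derivation below. It is a sketch under stated assumptions, not the authors' proof: the noise entries are taken to be independent with zero mean and common variance $\sigma^2$, $\hat{Q}$ is treated as fixed (independent of $R$) with orthonormal columns, and the variance of a matrix is read as the average per-attribute variance of one row $r$.

```latex
\begin{align*}
\operatorname{Cov}(r) &= \sigma^2 I_m
  && \text{($r$: one row of $R$)} \\
\operatorname{Cov}\!\left(r \hat{Q}\hat{Q}^T\right)
  &= \hat{Q}\hat{Q}^T \left(\sigma^2 I_m\right) \hat{Q}\hat{Q}^T
   = \sigma^2 \hat{Q}\hat{Q}^T
  && \text{(since $\hat{Q}^T\hat{Q} = I_p$)} \\
\frac{1}{m}\operatorname{tr}\!\left(\sigma^2 \hat{Q}\hat{Q}^T\right)
  &= \frac{\sigma^2}{m}\operatorname{tr}\!\left(\hat{Q}^T\hat{Q}\right)
   = \frac{p}{m}\,\sigma^2
   = \frac{p}{m}\operatorname{Var}(R)
\end{align*}
```

Keeping fewer components (smaller $p$) therefore removes proportionally more of the noise, independent of which directions are kept.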

How to Conduct PCA on Disguised Data? Estimating the covariance matrix: $\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(X_i + R_i, X_j + R_j) = \begin{cases} \mathrm{Cov}(X_i, X_j) + \sigma^2, & \text{for } i = j \\ \mathrm{Cov}(X_i, X_j), & \text{for } i \neq j \end{cases}$
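The covariance relation above suggests estimating the covariance of the original data by subtracting the noise variance from the diagonal of the disguised data's covariance. The sketch below assumes the noise is independent of the data and that its standard deviation `sigma` is known to the analyst; the function name `estimate_original_covariance` and the synthetic data are hypothetical.

```python
import numpy as np

def estimate_original_covariance(Y, sigma):
    """Estimate Cov(X) from disguised data Y = X + R, given the noise std sigma."""
    C_y = np.cov(Y, rowvar=False)
    # Noise only inflates the diagonal: Cov(Y_i, Y_i) = Cov(X_i, X_i) + sigma^2.
    return C_y - sigma ** 2 * np.eye(Y.shape[1])

rng = np.random.default_rng(2)
n, m, sigma = 20000, 5, 2.0              # hypothetical sizes and noise level

X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m))   # correlated original data
Y = X + rng.normal(scale=sigma, size=(n, m))            # disguised data

C_true = np.cov(X, rowvar=False)
C_est = estimate_original_covariance(Y, sigma)

max_err = np.abs(C_est - C_true).max()
naive_err = np.abs(np.cov(Y, rowvar=False) - C_true).max()
print(f"largest entry error, corrected estimate: {max_err:.3f}")
print(f"largest entry error, raw Cov(Y):         {naive_err:.3f}")
```

The corrected matrix can then stand in for C when computing the principal components from disguised data alone.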

Experiment 1: Increasing the Number of Attributes [Figure panels: Uniform Distribution, Normal Distribution]

Experiment 2: Increasing the Number of Principal Components [Figure panels: Uniform Distribution, Normal Distribution]

Experiment 3: Increasing the Standard Deviation of Noises [Figure panels: Normal Distribution, Uniform Distribution]

Conclusions: Privacy analysis based on individual attributes is not sufficient; correlation can disclose information. PCA can filter out some randomness from a highly correlated data set. When does randomization fail? When the data correlation is high. Can it be cured?

Future Work: How can the randomization be improved to reduce information disclosure? By making the random noises correlated? How can PCA be combined with univariate data reconstruction?
