Modeling Data Correlations in Private Data Mining with Markov Model and Markov Networks
Modeling Data Correlations in Private Data Mining with Markov Model and Markov Networks Yang Cao Emory University 2017.11.15 Outline Data Mining with Di ff erential Privacy (DP) Scenario: Spatiotemporal Data Mining using DP Markov


SLIDE 1

Modeling Data Correlations in Private Data Mining with Markov Model and Markov Networks

Yang Cao, Emory University, 2017.11.15

SLIDE 2

Outline

  • Data Mining with Differential Privacy (DP)
  • Scenario: Spatiotemporal Data Mining using DP
  • Markov Chain for temporal correlations
  • Gaussian Markov Random Field for user-user correlations
  • Summary and open problems
SLIDE 3

Outline

  • Data Mining with Differential Privacy
  • Scenario: Spatiotemporal Data Mining using DP
  • Markov Chain for temporal correlations
  • Gaussian Markov Random Field for user-user correlations
  • Summary and open problems
SLIDE 4

Data Mining

[Diagram: a company or institute mines a sensitive database and publishes results to the public; an attacker can attack the published output.]

SLIDE 5

Privacy-Preserving Data Mining (PPDM)

[Diagram: instead of the sensitive data, the institute releases noisy data, so the adversary's attack fails.]

How? ε-Differential Privacy!

SLIDE 6

SLIDE 7

SLIDE 8

What is Differential Privacy

  • Privacy: the right to be forgotten.
  • DP: the output of an algorithm should NOT be significantly affected by any individual's data.

[Illustration: neighboring databases D and D' differ in one user's tuple, yet the outputs M(Q(D)) and M(Q(D')) are approximately the same.]

  • Formally, M satisfies ε-DP if, for all neighboring databases D, D' and every output r:

$$\log \frac{\Pr(M(Q(D)) = r)}{\Pr(M(Q(D')) = r)} \le \varepsilon$$

  • e.g., Laplace mechanism: add Lap(1/ε) noise to Q(D)
  • Sequential composition: e.g., running M twice gives 2ε-DP

ε ⬆, privacy ⬇. e.g., 2ε-DP permits more privacy loss than ε-DP.
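Below is a minimal sketch of the Laplace mechanism (a hedged illustration, assuming a numeric query of sensitivity 1, matching the Lap(1/ε) calibration on this slide):

```python
import numpy as np

def laplace_mechanism(query_answer, epsilon, sensitivity=1.0):
    """Release an epsilon-DP answer by adding Laplace(sensitivity/epsilon) noise."""
    return query_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# e.g., a count query Q(D) = 42 released under epsilon = 0.5
noisy_answer = laplace_mechanism(42, epsilon=0.5)
```

Releasing two such answers about the same data consumes the budget twice, giving 2ε-DP by sequential composition.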

SLIDE 9

An Open Problem: DP on Correlated Data

  • When data are independent:

[Illustration: D and D' differ in one user's tuple, and M(Q(D)) ≈ M(Q(D'))] ⇒ ε-DP

  • When data are correlated (e.g., u1 and u3 are always the same):

[Illustration: changing u1's tuple also changes u3's tuple, so M(Q(D)) and M(Q(D')) may differ by more than the intended bound] ⇒ ?-DP

  • The "guarantee" of DP on correlated data is still controversial [*][**].

[*] Differential Privacy as a Causal Property, https://arxiv.org/abs/1710.05899
[**] https://github.com/frankmcsherry/blog/blob/master/posts/2016-08-29.md

SLIDE 10

Quantifying DP on Correlated Data

  • A few recent papers [Cao17][Yang15][Song17] use a quantification approach to achieve ε-DP (protecting each user's private data value).

Traditional approach (if the attacker knows correlations, ε-DP may not hold):

sensitive data → Laplace mechanism Lap(1/ε) → ε-DP data

Quantification approach (protects against attackers with knowledge of correlations):

sensitive data → model data correlations → attacker inference → Laplace mechanism Lap(1/ε') → ε-DP data

[Cao17]: Markov Chain; [Yang15]: Gaussian Markov Random Field (GMRF); [Song17]: Bayesian Network
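As a sketch of the calibration step in the quantification approach, assume a hypothetical oracle loss_under(eps) (e.g., the TPL/SPL quantifications described later in this talk) that returns the worst-case privacy loss against a correlation-aware attacker when Lap(1/eps) noise is used; the pipeline then searches for the largest ε' whose quantified loss stays within the target ε:

```python
def calibrate_epsilon(target_eps, loss_under, tol=1e-6):
    """Binary-search the per-release parameter eps' such that the quantified
    privacy loss against a correlation-aware attacker is at most target_eps.
    Assumes loss_under(eps) is monotonically increasing in eps and
    loss_under(eps) >= eps (correlations can only increase the loss)."""
    lo, hi = 0.0, target_eps
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if loss_under(mid) <= target_eps:
            lo = mid   # mid is feasible; try a larger eps'
        else:
            hi = mid
    return lo

# usage: eps_prime = calibrate_epsilon(1.0, loss_under=my_tpl_bound)
```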

SLIDE 11

Outline

  • Data Mining with Differential Privacy
  • Scenario: Spatiotemporal Data Mining using DP
  • Markov Chain for temporal correlations
  • Gaussian Markov Random Field for user-user correlations
  • Summary and open problems
SLIDE 12

Spatiotemporal Data Mining with DP

(a) Location Data (sensitive data):

user   t=1    t=2    t=3    …
u1     loc3   loc1   loc1   …
u2     loc2   loc4   loc5   …
u3     loc2   loc4   loc5   …
u4     loc4   loc5   loc3   …

(b) True Counts: a count query on each snapshot D1, D2, D3, … gives the number of users at each of loc1-loc5 at each timestamp.

(c) Private Counts: Lap(1/ε) noise is added to each true count, and each noisy release r1, r2, r3, … satisfies ε-DP.
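A small sketch of the release pipeline on this slide, adding Lap(1/ε) noise to each per-location count at one timestamp (following the slide's Lap(1/ε) calibration; the snapshot values are taken from table (a)):

```python
import numpy as np

LOCATIONS = ["loc1", "loc2", "loc3", "loc4", "loc5"]

def private_counts(snapshot, epsilon):
    """True per-location counts plus Lap(1/epsilon) noise per cell."""
    true_counts = {loc: sum(1 for v in snapshot.values() if v == loc)
                   for loc in LOCATIONS}
    return {loc: c + np.random.laplace(scale=1.0 / epsilon)
            for loc, c in true_counts.items()}

# r1: the noisy release for the t=1 snapshot D1 from table (a)
r1 = private_counts({"u1": "loc3", "u2": "loc2", "u3": "loc2", "u4": "loc4"},
                    epsilon=1.0)
```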

SLIDE 13

What types of data correlations?

(a) Location Data:

user   7:00   8:00   9:00   …
u1     loc3   loc1   loc1   …
u2     loc2   loc1   loc1   …
u3     loc2   loc4   loc5   …
u4     loc4   loc5   loc3   …

(b) Road Network and (c) Social Ties [figures; e.g., u2 and u1 are a couple, u1 and u3 are colleagues]

  • temporal correlation: within a single user's trajectory (across the snapshots D1, D2, D3, …)
  • spatial correlation: user-user (e.g., induced by the road network and social ties)

SLIDE 14

Outline

  • Data Mining with Differential Privacy
  • Scenario: Spatiotemporal Data Mining using DP
  • Markov Chain for temporal correlations
  • Gaussian Markov Random Field for user-user correlations
  • Summary and open problems
  • what is MC
  • how can (attacker) learn MC from data
  • how can (attacker) infer private data using MC
SLIDE 15

What is Markov Chain

Transition Matrix (row: state at time t; column: state at time t+1):

        loc1   loc2   loc3
loc1    0.2    0.1    0.7
loc2    0.1    0.2    0.7
loc3    0.3    0.4    0.3

Raw Trajectories:

user   7:00   8:00   9:00   …
u1     loc1   loc3   loc2   …
u2     loc2   loc2   loc2   …
u3     loc3   loc1   loc1   …
u4     loc1   loc2   loc2   …

  • A Markov chain is a stochastic process with the Markov property.
  • First-order Markov property: the state at time t depends only on the state at time t-1:

$$\Pr(x_t \mid x_{t-1}) = \Pr(x_t \mid x_{t-1}, \ldots, x_1) \quad \forall t > 0$$

  • Time-homogeneous: the transition matrix is the same after each step:

$$\Pr(x_{t+1} \mid x_t) = \Pr(x_{t+2} \mid x_{t+1})$$
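As a quick illustration, such a chain can be sampled directly from its transition matrix (a minimal sketch using the matrix in the table above):

```python
import numpy as np

STATES = ["loc1", "loc2", "loc3"]
P = np.array([[0.2, 0.1, 0.7],
              [0.1, 0.2, 0.7],
              [0.3, 0.4, 0.3]])  # transition matrix from the table above

def simulate(start, steps, rng=np.random.default_rng(0)):
    """Sample a trajectory from the first-order, time-homogeneous chain."""
    path, state = [start], STATES.index(start)
    for _ in range(steps):
        state = rng.choice(len(STATES), p=P[state])  # next state given current
        path.append(STATES[state])
    return path

traj = simulate("loc1", steps=3)
```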

SLIDE 16

How can (attacker) learn MC

  • If the attacker knows part of a user's trajectory, he can directly learn the transition matrix by maximum likelihood estimation, as sketched below.
  • If the attacker knows the road network, he may learn the MC using a Google-like model [*].

[*] E. Crisostomi, S. Kirkland, and R. Shorten, “A Google-like model of road network dynamics and its application to regulation and control,” International Journal of Control, vol. 84, no. 3, pp. 633–651, Mar. 2011.
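A minimal sketch of the first bullet: the maximum-likelihood estimate of a first-order, time-homogeneous transition matrix is just normalized transition counts (the trajectories below are the four rows of the Raw Trajectories table):

```python
import numpy as np

STATES = ["loc1", "loc2", "loc3"]
IDX = {s: i for i, s in enumerate(STATES)}

def mle_transition_matrix(trajectories):
    """MLE of the transition matrix: P[i, j] = count(i -> j) / count(i -> *)."""
    counts = np.zeros((len(STATES), len(STATES)))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[IDX[a], IDX[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # rows with no observed transitions stay all-zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

P_hat = mle_transition_matrix([
    ["loc1", "loc3", "loc2"],
    ["loc2", "loc2", "loc2"],
    ["loc3", "loc1", "loc1"],
    ["loc1", "loc2", "loc2"],
])
```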

SLIDE 17

How can (attacker) infer private data using MC
[Model Attacker → Define TPL → Find structure of TPL]

  • Model temporal correlations using a Markov chain, e.g., user i: loc1 → loc3 → loc2 → …

(a) Backward temporal correlation $P_i^B$: $\Pr(l_i^{t-1} \mid l_i^t)$ (row: location at time t; column: location at time t-1):

        loc1   loc2   loc3
loc1    0.1    0.2    0.7
loc2    1      0      0
loc3    0.3    0.3    0.4

(b) Forward temporal correlation $P_i^F$: $\Pr(l_i^t \mid l_i^{t-1})$ (row: location at time t-1; column: location at time t):

        loc1   loc2   loc3
loc1    0.2    0.3    0.5
loc2    0.1    0.1    0.8
loc3    0.6    0.2    0.2
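One way an attacker can obtain the backward matrix from the forward one is Bayes' rule, Pr(l^{t-1} | l^t) ∝ Pr(l^t | l^{t-1}) Pr(l^{t-1}). A sketch under an assumed uniform prior over locations (the slide's example matrices are illustrative and need not satisfy this relation exactly):

```python
import numpy as np

# Forward matrix P_F[i, j] = Pr(l^t = j | l^{t-1} = i), from table (b)
P_F = np.array([[0.2, 0.3, 0.5],
                [0.1, 0.1, 0.8],
                [0.6, 0.2, 0.2]])

def backward_matrix(P_F, marginal):
    """P_B[j, i] = Pr(l^{t-1} = i | l^t = j), obtained by Bayes' rule:
    normalize the joint Pr(l^{t-1} = i, l^t = j) over each current state j."""
    joint = marginal[:, None] * P_F      # joint[i, j] = Pr(l^{t-1}=i, l^t=j)
    return (joint / joint.sum(axis=0)).T # column-normalize, then transpose

pi = np.array([1/3, 1/3, 1/3])           # assumed uniform prior over locations
P_B = backward_matrix(P_F, pi)           # each row of P_B sums to 1
```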

SLIDE 18

How can (attacker) infer private data using MC
[Model Attacker → Define TPL → Find structure of TPL]

  • DP can protect against an attacker $A_i(D^K)$ who knows all tuples except the victim's.

[Illustration: the database D at t=1 (u1: loc3, u2: loc2, u3: loc2, u4: loc4); the attacker knows every tuple $D^K$ except the victim's value $l_i$.]

  • What if the attacker also knows temporal correlations? Three attackers: (i) $A_i^T(D^K, P_i^B, \emptyset)$; (ii) $A_i^T(D^K, \emptyset, P_i^F)$; (iii) $A_i^T(D^K, P_i^B, P_i^F)$.

SLIDE 19

How can (attacker) infer private data using MC
[Model Attacker → Define TPL → Find structure of TPL]

  • Recall the definition of DP: if $PL_0(M) \le \varepsilon$, then $M$ satisfies ε-DP.
  • Definition of TPL: the Temporal Privacy Loss replaces $PL_0$ with the privacy loss w.r.t. an attacker $A_i^T$ who also knows the temporal correlations $P_i^B$, $P_i^F$.

SLIDE 20

How can (attacker) infer private data using MC
[Model Attacker → Define TPL → Find structure of TPL]

  • If there is no temporal correlation:

$$\text{Eqn}(2) = \underbrace{\log \frac{\Pr(r_1 \mid l_i^t, D_K^t)}{\Pr(r_1 \mid {l_i^t}', D_K^t)}}_{0} + \cdots + \underbrace{\log \frac{\Pr(r_t \mid l_i^t, D_K^t)}{\Pr(r_t \mid {l_i^t}', D_K^t)}}_{PL_0} + \cdots + \underbrace{\log \frac{\Pr(r_T \mid l_i^t, D_K^t)}{\Pr(r_T \mid {l_i^t}', D_K^t)}}_{0}$$

  • Definition of TPL: only the term for the release at time t is nonzero, so TPL = $PL_0$.

SLIDE 21

How can (attacker) infer private data using MC
[Model Attacker → Define TPL → Find structure of TPL]

  • If there are temporal correlations:

$$\text{Eqn}(2) = \underbrace{\log \frac{\Pr(r_1 \mid l_i^t, D_K^t)}{\Pr(r_1 \mid {l_i^t}', D_K^t)}}_{?} + \cdots + \underbrace{\log \frac{\Pr(r_t \mid l_i^t, D_K^t)}{\Pr(r_t \mid {l_i^t}', D_K^t)}}_{PL_0} + \cdots + \underbrace{\log \frac{\Pr(r_T \mid l_i^t, D_K^t)}{\Pr(r_T \mid {l_i^t}', D_K^t)}}_{?}$$

  • Definition of TPL: TPL = ? The terms for the other releases no longer vanish, and Eqn(2) is hard to quantify directly.

SLIDE 22

How can (attacker) infer private data using MC
[Model Attacker → Define TPL → Find structure of TPL]

  • Structure of TPL: the releases r1, …, rt-1, rt, rt+1, …, rT split into a backward part and a forward part:
  • (i) $A_i^T(D^K, P_i^B, \emptyset)$: Backward Privacy Loss (BPL), caused by r1, …, rt.
  • (ii) $A_i^T(D^K, \emptyset, P_i^F)$: Forward Privacy Loss (FPL), caused by rt, …, rT.
  • (iii) $A_i^T(D^K, P_i^B, P_i^F)$: both BPL and FPL.

SLIDE 23

How can (attacker) infer private data using MC
[Model Attacker → Define TPL → Find structure of TPL]

  • Analyze BPL: the backward privacy loss is expressed as a function of the backward temporal correlations $P_i^B$ (Eqn (6)). How can we calculate it?

SLIDE 24

How can (attacker) infer private data using MC
[Model Attacker → Define TPL → Find structure of TPL]

  • Analyze FPL: symmetrically, the forward privacy loss is expressed as a function of the forward temporal correlations $P_i^F$. How can we calculate it?

SLIDE 25

Calculating BPL & FPL

  • We convert the problem of BPL/FPL calculation to finding an
  • ptimal solution of a linear-fractional programming problem.
  • This problem can be solved by simplex algorithm in O(2n).
  • We designed a O(n2) algorithm for quantifying BPL/FPL.

Privacy Quantification Upper bound
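For intuition on the linear-fractional programming step, here is a generic solver sketch using the Charnes-Cooper transformation to an LP. This is the textbook reduction, not the paper's O(n^2) algorithm; all inputs are hypothetical numpy arrays:

```python
import numpy as np
from scipy.optimize import linprog

def solve_lfp(c, alpha, d, beta, A, b):
    """Maximize (c@x + alpha) / (d@x + beta) s.t. A@x <= b, x >= 0,
    assuming d@x + beta > 0 on the feasible set, via the Charnes-Cooper
    substitution y = t*x, which turns the problem into an LP."""
    n = len(c)
    obj = -np.concatenate([c, [alpha]])            # linprog minimizes
    A_ub = np.hstack([A, -b.reshape(-1, 1)])       # A@y - b*t <= 0
    b_ub = np.zeros(A.shape[0])
    A_eq = np.concatenate([d, [beta]]).reshape(1, -1)  # d@y + beta*t = 1
    b_eq = np.array([1.0])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    y, t = res.x[:n], res.x[n]
    return y / t                                   # recover the optimal x

# tiny usage example: maximize (x1 + 1) / (x2 + 2) s.t. x1 + x2 <= 1, x >= 0
x_opt = solve_lfp(c=np.array([1.0, 0.0]), alpha=1.0,
                  d=np.array([0.0, 1.0]), beta=2.0,
                  A=np.array([[1.0, 1.0]]), b=np.array([1.0]))
```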

SLIDE 26

Calculating BPL & FPL

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

t=1 2 3 4 5 6 7 8 9 10

0.10 0.18 0.25 0.30 0.35 0.39 0.42 0.45 0.48 0.50

(i) Strong temporal corr. (ii) Moderate temporal corr. (iii) No temporal corr.

Privacy Loss Time

  • Example of BPL under different temporal corr.

Privacy Quantification Upper bound

SLIDE 27

Calculating BPL & FPL

20 40 60 80 100 t 0.2 0.4 0.6 0.8 BPL

q = 0.8; d = 0.1; ε = 0.23

20 40 60 80 100 t 0.5 1.0 1.5 2.0 2.5 3.0 3.5 BPL

q = 0.8; d = 0; ε = 0.23

(d) (c) (b) (a)

Pi=( )

1 1

Pi=( )

0.8 0.2 1

Pi=( )

0.8 0.2 1

Pi=( )

0.8 0.1 0.2 0.9

B B B B

20 40 60 80 100 t 5 10 15 20 BPL

q = 1; d = 0; ε = 0.23

20 40 60 80 100 t 0.2 0.4 0.6 0.8 1.0 1.2 BPL

q = 0.8; d = 0; ε = 0.15

q=0.8, d=0.1, ε=0.23 q=1, d=0, ε=0.23 q=0.8, d=0, ε=0.23 q=0.8, d=0, ε=0.15

case 4 case 3 case 2 case 1

Privacy Quantification Upper bound

Refer to Theorem 5 in our paper

Privacy Loss

time

SLIDE 28

Outline

  • Data Mining with Differential Privacy
  • Scenario: Spatiotemporal Data Mining using DP
  • Markov Chain for temporal correlations
  • Gaussian Markov Random Field for user-user correlations
  • Summary and open problems
  • what is GMRF
  • how can (attacker) learn GMRF from data
  • how can (attacker) infer private data using GMRF
SLIDE 29

What is GMRF

  • A Gaussian Markov Random Field is a probabilistic graphical model in which each node is a Gaussian variable and the (undirected) edges indicate the dependencies between variables.
  • Data correlation = (joint) distribution over data
  • We choose the Gaussian Markov random field:
    • Rich representation
    • Easy to construct
    • Unified form (Gaussian)
    • Easy to compute
SLIDE 30

How can (attacker) learn GMRF

  • If the attacker knows user-location frequencies, he can estimate the GMRF from them.
  • If the attacker knows the user-user social network, we can construct a GMRF from the weighted undirected graph, taking the precision matrix to be the graph's Laplacian matrix:

$$p(x) \propto \exp\left(-\frac{1}{2}\, x^\top \Sigma^{-1} x\right), \qquad \Sigma^{-1} = \text{Laplacian matrix of the graph}$$

[Example: a weighted social graph over users A, B, C, D and its Laplacian matrix (diagonal degrees 5, 5, 5, 9), used as the precision matrix.]
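A sketch of the second bullet: build the Laplacian of a weighted social graph and use it as the GMRF precision matrix. The edge weights below are hypothetical, and since the Laplacian is singular, a small ridge term is added so the matrix is a valid (positive definite) precision matrix:

```python
import numpy as np

# Hypothetical weighted social graph over users A-D: (user_a, user_b, weight)
edges = [("A", "B", 2.0), ("A", "D", 3.0), ("B", "C", 1.0), ("C", "D", 5.0)]
nodes = ["A", "B", "C", "D"]
idx = {n: i for i, n in enumerate(nodes)}

def gmrf_precision(nodes, edges, ridge=1e-3):
    """Precision matrix = graph Laplacian L = D - W, plus a small ridge
    term so it is positive definite."""
    W = np.zeros((len(nodes), len(nodes)))
    for a, b, w in edges:
        W[idx[a], idx[b]] = W[idx[b], idx[a]] = w
    L = np.diag(W.sum(axis=1)) - W
    return L + ridge * np.eye(len(nodes))

Q = gmrf_precision(nodes, edges)
# GMRF density up to normalization: p(x) ∝ exp(-0.5 * x @ Q @ x)
```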

SLIDE 31

How can (attacker) infer private data using GMRF
[Model Attacker → Define SPL]

  • Each user can define their privacy level as R(G, δ), where G is a subgraph of the GMRF and δ is the percentage of the data known by adversaries.

[Illustration: two examples over nodes x1, …, x9, with known, unknown, and victim nodes marked: (a) R1(G1, 1/3); (b) R'1(G'1, 0.5).]

SLIDE 32

How can (attacker) infer private data using GMRF
[Model Attacker → Define SPL]

  • Bayesian inference: "what is the probability of returning r if the victim is at loc_i?"
  • On the GMRF: "what is the difference in probability between 'the victim is at loc_i' and 'the victim is at loc_i''?"
  • Marginalizing out D_u (D_u: unknown, to be inferred; D_k: known to the adversary) combines the Laplace mechanism with the data correlation (GMRF):

$$\Pr(r \mid loc_i) = \sum_{D_u} \Pr(r \mid D)\,\Pr(D_u \mid loc_i, D_k)$$

  • Definition of SPL:

$$\mathrm{SPL} = \log \frac{\Pr(r \mid loc_i)}{\Pr(r \mid loc_i')} = \log \frac{\sum_{D_u} \Pr(r \mid D)\,\Pr(D_u \mid loc_i, D_k)}{\sum_{D_u} \Pr(r \mid D')\,\Pr(D_u \mid loc_i', D_k)}$$
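A toy numeric instance of this marginalization (all probabilities below are hypothetical): the release r is the count of users at loc1 plus Lap(1/ε) noise, and the other user's location D_u is correlated with the victim's through a GMRF-like prior.

```python
import numpy as np

def lap_pdf(x, scale):
    """Density of the Laplace(0, scale) distribution."""
    return np.exp(-abs(x) / scale) / (2 * scale)

eps = 1.0
# Pr(D_u = loc1 | victim's location), a hypothetical correlation prior
pr_u2_at_loc1 = {"loc1": 0.9, "loc2": 0.2}

def pr_r_given_victim(r, victim_loc):
    """Pr(r | loc_i) = sum over D_u of Pr(r | D) * Pr(D_u | loc_i, D_k)."""
    p = pr_u2_at_loc1[victim_loc]
    count_if_u2_loc1 = (victim_loc == "loc1") + 1  # victim + other user
    count_if_u2_loc2 = (victim_loc == "loc1") + 0  # victim only
    return (p * lap_pdf(r - count_if_u2_loc1, 1 / eps)
            + (1 - p) * lap_pdf(r - count_if_u2_loc2, 1 / eps))

r = 1.7  # an observed noisy release
spl = np.log(pr_r_given_victim(r, "loc1") / pr_r_given_victim(r, "loc2"))
```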

SLIDE 33

Impact of Pi

  • s: parameter of Laplacian smoothing. The smaller s is, the higher the attacker's prior knowledge about Pi.

[Plots: TPL over time for s = 0.0, 0.005, 0.05, 1.0: (a) TPL for ε = 0.1; (b) TPL for ε = 1.]

SLIDE 34

Impact of Gi and m

  • p: density of the graph.
  • SPL vs. p: privacy loss on sparser graphs tends to be higher.
  • SPL vs. |Gi|: privacy loss for larger |Gi| tends to be higher.

[Plots of average SPL: (a) SPL vs. p on G0.8_50, Brightkite_50, G0.2_50, for m = 10, 20, 30, 40; (b) SPL vs. |Gi| on G0.8, G0.5, G0.2, for |Gi| = 20, 30, 40, 50 (m = 10).]

SLIDE 35

Outline

  • Data Mining with Differential Privacy
  • Scenario: Spatiotemporal Data Mining using DP
  • Markov Chain for temporal correlations
  • Gaussian Markov Random Field for user-user correlations
  • Summary and open problems
SLIDE 36

Summary

  • When mining private data using DP, we need to model data correlations.
  • For spatiotemporal data, we naturally model temporal correlations using a Markov chain and user-user correlations using a GMRF, as the attacker's knowledge.
  • Based on inference/computation over the MC and the GMRF, we can calibrate the privacy parameter for DP:

sensitive data → model data correlations → attacker inference → Laplace mechanism Lap(1/ε') → ε-DP data

SLIDE 37

Open Problems

  • How do we model spatiotemporal correlations as a whole, or find an appropriate combination of MC and GMRF?
  • Data correlations may degrade the utility of the mined private data (we need to add more noise for the same level of privacy): how can the data miner take advantage of data correlations to improve data utility?