Data Anonymization that Leads to the Most Accurate Estimates of Statistical Characteristics

Gang Xiang¹ and Vladik Kreinovich²

¹Applied Biomathematics, 100 North Country Rd., Setauket, NY 11733, USA, gxiang@sigmaxi.net

²Department of Computer Science, University of Texas at El Paso, El Paso, TX 79968, USA, vladik@utep.edu


1. Need to Preserve Privacy

  • One of the main objectives of engineering is to help people:
    – civil engineering designs houses in which we live and roads along which we travel,
    – electrical engineering designs appliances – and electric networks that help use these appliances.
  • To better serve customers, it is important to know as much as possible about the potential customers.
  • Customers are reluctant to share information, since this information can potentially be used against them.
  • For example, age can be used by companies to (unlawfully) discriminate against older job applicants.
  • It is thus important to preserve privacy when storing customer data.


2. How to Preserve Privacy: k-Anonymity and ℓ-Diversity

  • To maintain privacy, we divide the space of all possible combinations of values (x_1, ..., x_n) into boxes.
  • For each record, instead of storing the actual values x_i, we only store the label of the box containing x.
  • To avoid further loss of privacy, it is important to make sure that the location within a box does not identify a person.
  • This is usually achieved by requiring that, for some fixed k, each box contains at least k records.
  • It is also not good if all records within a box have the same value of the i-th quantity x_i.
  • It is thus required that, for some ℓ, each box contains at least ℓ different values of each x_i.
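These two requirements are easy to state as a check on a concrete partition. Below is a minimal sketch (my own illustration, not part of the original slides; function and parameter names are hypothetical) that verifies k-anonymity and ℓ-diversity for a given assignment of records to boxes, with "different values" read simply as distinct values:

```python
from collections import defaultdict

def satisfies_k_anonymity_and_l_diversity(records, box_of, k, ell):
    """Check the two requirements from this slide.

    records : list of tuples (x_1, ..., x_n), the original data
    box_of  : function mapping a record to the label of its box
    k, ell  : the anonymity and diversity parameters
    """
    boxes = defaultdict(list)
    for x in records:
        boxes[box_of(x)].append(x)

    n = len(records[0])
    for contents in boxes.values():
        # k-anonymity: every box must contain at least k records
        if len(contents) < k:
            return False
        # l-diversity: each quantity must take at least ell distinct values
        for i in range(n):
            if len({x[i] for x in contents}) < ell:
                return False
    return True
```

In the setting of the later slides, box_of would round each coordinate to the grid of boxes selected around the data points.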


3. Statistical Data Processing

  • Our main objective is to predict the desired characteristic x_{i_0}.
  • In most cases, the dependence is linear, so we must find c_q s.t. x_{i_0} ≈ c_0 + Σ_{q=1}^{m} c_q · x_{i_q}.
  • The Least Squares Approach leads to:
    Σ_{r=1}^{m} c_r · C_{i_q i_r} = C_{i_0 i_q};   c_0 = E_{i_0} − Σ_{q=1}^{m} c_q · E_{i_q}.
  • We also want to know which quantities are correlated, i.e., we want to estimate ρ_{ij} = C_{ij}/(σ_i · σ_j).
  • In all these tasks, we need to estimate averages E_i, variances V_i = σ_i², covariances C_{ij}, and correlations ρ_{ij}.
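As a small illustration of how these normal equations could be used (a sketch, not from the slides; it assumes NumPy and that the covariances C_ij and means E_i have already been estimated):

```python
import numpy as np

def linear_predictor(C, E, i0, predictors):
    """Solve the least-squares system from this slide:
        sum_r c_r * C[i_q, i_r] = C[i0, i_q],   c_0 = E[i0] - sum_q c_q * E[i_q].

    C          : n x n array of estimated covariances C_ij
    E          : length-n array of estimated means E_i
    i0         : index of the characteristic x_{i0} to predict
    predictors : indices i_1, ..., i_m of the input quantities
    """
    A = C[np.ix_(predictors, predictors)]   # matrix of C_{i_q i_r}
    b = C[i0, predictors]                   # right-hand sides C_{i0 i_q}
    c = np.linalg.solve(A, b)               # coefficients c_1, ..., c_m
    c0 = E[i0] - c @ E[predictors]          # intercept c_0
    return c0, c
```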


4. Statistical Characteristics: Reminder

  • The means are usually estimated as follows:
    E_i = (1/N) · Σ_{p=1}^{N} x_i^{(p)},   E_j = (1/N) · Σ_{p=1}^{N} x_j^{(p)}.
  • The covariance is usually estimated as:
    C_ij = (1/N) · Σ_{p=1}^{N} (x_i^{(p)} − E_i) · (x_j^{(p)} − E_j).
  • The variance is usually estimated as:
    V_i = (1/N) · Σ_{p=1}^{N} (x_i^{(p)} − E_i)².
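These estimators translate directly into code; here is a minimal NumPy sketch (mine, not from the slides), using the same 1/N normalization as the formulas above:

```python
import numpy as np

def sample_characteristics(X):
    """Estimate E_i, V_i, C_ij and rho_ij from an N x n data array X,
    where X[p, i] is the value x_i^(p) of the i-th quantity in record p."""
    N = X.shape[0]
    E = X.mean(axis=0)                 # E_i = (1/N) * sum_p x_i^(p)
    D = X - E                          # deviations x_i^(p) - E_i
    C = (D.T @ D) / N                  # C_ij = (1/N) * sum_p (x_i^(p)-E_i)(x_j^(p)-E_j)
    V = np.diag(C)                     # V_i = C_ii
    sigma = np.sqrt(V)
    rho = C / np.outer(sigma, sigma)   # rho_ij = C_ij / (sigma_i * sigma_j)
    return E, V, C, rho
```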


5. In Statistical Data Processing, Privacy Leads to Uncertainty

  • To maintain privacy, we replace each numerical value x_i^{(p)} with the corresponding interval.
  • Different values from these intervals lead to different values of the resulting statistical characteristics.
  • Hence, for each characteristic, we get a whole interval of possible values.
  • If this interval is too wide, the resulting range is useless: e.g., [−1, 1] for correlation.
  • It is therefore desirable to select:
    – among all possible subdivisions into boxes which preserve k-anonymity (and ℓ-diversity),
    – the one which leads to the narrowest interval for the desired statistical characteristic.
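For the mean, this interval is easy to compute exactly, since the mean is increasing in each value; the small sketch below (my illustration, not from the slides) shows it. For the variance, covariance, and correlation, computing the exact range over interval data is much harder (in general NP-hard; see the Kreinovich et al. book in the bibliography), which is why the following slides work with a linearized estimate.

```python
def mean_range(intervals):
    """Interval of possible values of E = (1/N) * sum_p x^(p) when each
    value x^(p) is only known to lie in [lo_p, hi_p]."""
    N = len(intervals)
    lo = sum(a for a, _ in intervals) / N
    hi = sum(b for _, b in intervals) / N
    return lo, hi

# example: three anonymized values
print(mean_range([(0.0, 2.0), (1.0, 3.0), (4.0, 6.0)]))  # (1.666..., 3.666...)
```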


6. Estimating Accuracy Caused by Privacy-Based Subdivision into Boxes: Case of k-Anonymity

  • To minimize uncertainty, we select the smallest boxes.
  • Hence, each box B should have exactly k records.
  • For intervals [x̃_i − Δ_i, x̃_i + Δ_i] (with midpoints x̃_i), instead of C(x_1^{(1)}, ..., x_n^{(N)}) we get
    C(x̃_1^{(1)} + Δx_1^{(1)}, ..., x̃_n^{(N)} + Δx_n^{(N)}), where |Δx_i^{(p)}| ≤ Δ_i.
  • When we have many records, boxes are small, so we can use a linear approximation:
    C = C̃ + Σ_{p=1}^{N} Σ_{i=1}^{n} (∂C/∂x_i) · Δx_i^{(p)}.
  • The range of this linear expression is [C̃ − Δ, C̃ + Δ], where
    Δ = Σ_{p=1}^{N} Σ_{i=1}^{n} |∂C/∂x_i| · Δ_i^{(p)} = k · Σ_B Σ_{i=1}^{n} |∂C/∂x_i| · Δ_i.
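In code, the linearized bound Δ is just a weighted sum of half-widths; a minimal sketch (mine; it assumes the absolute partial derivatives and per-record half-widths are already available):

```python
import numpy as np

def linearized_uncertainty(grad_abs, half_widths):
    """Delta = sum_p sum_i |dC/dx_i at record p| * Delta_i^(p).

    grad_abs    : N x n array of |partial C / partial x_i| at each record
    half_widths : N x n array; half_widths[p, i] is the half-width Delta_i
                  of the box containing record p (constant within a box)
    """
    return float(np.sum(grad_abs * half_widths))
```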

7. Expressions for the Corresponding Partial Derivatives

  • The estimate for the accuracy Δ is described in terms of the partial derivatives ∂C/∂x_i of the statistical characteristic C.
  • For the mean E_i, the derivative is equal to ∂E_i/∂x_i = 1/N.
  • For the variance V_i, we have ∂V_i/∂x_i = 2 · (x_i − E_i)/N.
  • Therefore, for σ_i = √V_i, we get ∂σ_i/∂x_i = (x_i − E_i)/(N · σ_i).
  • For the covariance C_ij, we have ∂C_ij/∂x_i = (x_j − E_j)/N.
  • For the correlation ρ_ij, we have:
    ∂ρ_ij/∂x_i = (1/N) · ((x_j − E_j) − (C_ij/σ_i²) · (x_i − E_i)) / (σ_i · σ_j).
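For instance, the correlation case can be coded directly from these expressions; a sketch (my own, with hypothetical names) returning |∂ρ_ij/∂x_i| for every record:

```python
import numpy as np

def abs_grad_correlation(X, i, j):
    """|partial rho_ij / partial x_i^(p)| for every record p, per this slide."""
    N = X.shape[0]
    di = X[:, i] - X[:, i].mean()
    dj = X[:, j] - X[:, j].mean()
    Cij = np.mean(di * dj)
    si, sj = np.sqrt(np.mean(di ** 2)), np.sqrt(np.mean(dj ** 2))
    # d rho_ij / d x_i = (1/N) * ((x_j - E_j) - (C_ij / s_i^2)*(x_i - E_i)) / (s_i * s_j)
    return np.abs((dj - (Cij / si ** 2) * di) / (N * si * sj))
```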


8. Towards Optimal Subdivision into Boxes

  • The overall expression for Δ is a sum of terms corresponding to different points.
  • So, to minimize Δ, we must, for each point, minimize the corresponding term Σ_{i=1}^{n} a_i · Δ_i, where a_i = |∂C/∂x_i|.
  • The only constraint on the values Δ_i is that the corresponding box should contain exactly k different points.
  • The number of points can be obtained by multiplying the data density ρ(x) by the box volume Π_{i=1}^{n} (2Δ_i).
  • The data density can be estimated based on the data.
  • So, we minimize Σ_{i=1}^{n} a_i · Δ_i under the constraint ρ(x) · 2ⁿ · Π_{i=1}^{n} Δ_i = k.


9. First Result: (Asymptotically) Optimal Subdivision into Boxes (Case of k-Anonymity)

  • Method: the Lagrange multiplier technique leads to Δ_i = c(x)/a_i, where a_i = |∂C/∂x_i|.
  • From the constraint, we get c(x) = (1/2) · ((k/ρ(x)) · Π_{j=1}^{n} a_j)^{1/n}.
  • Conclusion: around each point x, we need to select the box with half-widths
    Δ_i = (1/(2·a_i)) · ((k/ρ(x)) · Π_{j=1}^{n} a_j)^{1/n}.
  • The resulting accuracy: Δ = n · Σ_x c(x), where the sum is taken over all N data points x.
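These formulas translate into a few lines of code; a sketch (mine) of the per-point box, assuming all a_i > 0 and an available density estimate ρ(x):

```python
import numpy as np

def k_anonymity_half_widths(a, density, k):
    """Asymptotically optimal half-widths around one point x (k-anonymity only):
        c(x)    = 0.5 * ((k / rho(x)) * prod_j a_j) ** (1/n)
        Delta_i = c(x) / a_i
    a : length-n array of a_i = |dC/dx_i| at x; density : estimated rho(x)."""
    n = len(a)
    c = 0.5 * ((k / density) * np.prod(a)) ** (1.0 / n)
    return c / a, c
```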


10. We Need to Dismiss Rare Points

  • In many practical situations, we have rare points, for which the smallest box containing k of them is huge.
  • This big-size box will contribute a large amount of uncertainty to Δ; so we should dismiss such rare points.
  • If we select a subset S ⊂ {1, 2, ..., N} of the set of N original points, then:
    – the privacy-related uncertainty reduces to n · Σ_{x∈S} c(x),
    – but the statistical accuracy becomes A/√#(S).
  • Minimizing n · Σ_{x∈S} c(x) + A/√#(S) leads to selecting all x with c(x) ≤ c_0, where c_0 minimizes the sum
    n · Σ_{x: c(x) ≤ c_0} c(x) + A/√#{x : c(x) ≤ c_0}.
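A small sketch of how the cutoff c_0 could be chosen in practice (my illustration; in particular, the 1/√#(S) form of the sampling-accuracy term is my reading of the slide, and A is a problem-dependent constant):

```python
import math

def choose_cutoff(c_values, A, n):
    """Keep the points x with c(x) <= c_0, where c_0 minimizes
        n * sum_{x: c(x) <= c_0} c(x) + A / sqrt(#{x: c(x) <= c_0}).
    Only the sorted values of c(x) need to be tried as candidate cutoffs."""
    best_cost, best_c0 = float("inf"), None
    running_sum = 0.0
    for m, c in enumerate(sorted(c_values), start=1):   # keep the m cheapest points
        running_sum += c
        cost = n * running_sum + A / math.sqrt(m)
        if cost < best_cost:
            best_cost, best_c0 = cost, c
    return best_c0
```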


11. Examples

  • For estimating the mean E_i, we have a_i = const and thus c(x) = const · (1/ρ(x))^{1/n}.
  • In this case, c(x) is a decreasing function of the density.
  • So dismissing points with c(x) > c_0 is equivalent to dismissing all the points with ρ(x) < ρ_0 (for some ρ_0).
  • For computing the covariance C_ij, the relevant derivatives a_i and a_j are proportional to x_j − E_j and x_i − E_i.
  • So, the upper threshold c_0 on c(x) is equivalent to a lower threshold on the ratio ρ(x)/(|x_i − E_i| · |x_j − E_j|).
  • Thus, we can also use points x with small ρ(x) – if x_i or x_j is close to the corresponding mean.
  • Using extra points x improves accuracy.

12. How to Also Take into Account ℓ-Diversity

  • Up to now, we only took into account the k-anonymity requirement.
  • We also need to take into account that within each box, for each variable x_i, there are ≥ ℓ different values of x_i.
  • To formalize this requirement, we first need to describe what "different" means.
  • Usually, for each variable i, different means that |x_i − x'_i| ≥ ε_i for some threshold ε_i.
  • Thus, ℓ different values means that 2Δ_i ≥ ℓ · ε_i.
  • Problem: find Δ_i s.t. Σ_{i=1}^{n} a_i · Δ_i → min under the constraints
    Π_{i=1}^{n} Δ_i ≥ k/(2ⁿ · ρ(x)) and 2Δ_i ≥ ℓ · ε_i for all i.


13. Main Result: Optimal Subdivision into Boxes

  • Around each point x, we first compute the values Δ_i = (1/(2·a_i)) · ((k/ρ(x)) · Π_{j=1}^{n} a_j)^{1/n}, where a_i = |∂C/∂x_i|.
  • If 2Δ_i ≥ ℓ · ε_i for all i, we select these Δ_i.
  • Otherwise, we sort the quantities by a_i · ε_i:
    a_1 · ε_1 ≥ a_2 · ε_2 ≥ ... ≥ a_n · ε_n.
  • Then, for each t from 1 to n, we compute
    c_t = (1/2) · ( k · Π_{i=t+1}^{n} a_i / (ρ(x) · ℓ^t · Π_{i=1}^{t} ε_i) )^{1/(n−t)}.


14. Main Result (cont'd)

  • For each t, if 2·c_t/ℓ ≥ a_{t+1} · ε_{t+1}, we compute
    Δ(t) = (1/2) · ℓ · Σ_{i=1}^{t} a_i · ε_i + (n − t) · c_t.
  • We select the t for which Δ(t) → min, and take Δ_i = (1/2) · ℓ · ε_i for i ≤ t and Δ_i = c_t/a_i for i > t.
  • Comment: the computation time of this algorithm is quadratic in n.
  • This is OK, since the number n of different characteristics is usually reasonably small.
  • What is important is that the algorithm is still linear-time in terms of the number of records N.
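Putting Slides 13 and 14 together, the per-point computation can be sketched as follows (my own transcription; variable names are mine, all a_i are assumed positive, and the degenerate case t = n, where every coordinate is pinned at the diversity bound, is left out):

```python
import numpy as np

def box_half_widths(a, eps, density, k, ell):
    """Half-widths Delta_i around one point x under k-anonymity and l-diversity.

    a       : array of a_i = |dC/dx_i| at x
    eps     : array of the "different value" thresholds eps_i
    density : estimated data density rho(x)
    k, ell  : anonymity and diversity parameters
    """
    n = len(a)
    # Step 1: k-anonymity-only solution (Slide 9)
    delta = 0.5 * ((k / density) * np.prod(a)) ** (1.0 / n) / a
    if np.all(2 * delta >= ell * eps):
        return delta

    # Step 2: sort coordinates so that a_1*eps_1 >= ... >= a_n*eps_n
    order = np.argsort(-(a * eps))
    a_s, eps_s = a[order], eps[order]

    best_val, best_delta = np.inf, None
    for t in range(1, n):            # first t sorted coordinates pinned at the bound
        c_t = 0.5 * (k * np.prod(a_s[t:]) /
                     (density * ell ** t * np.prod(eps_s[:t]))) ** (1.0 / (n - t))
        if 2 * c_t / ell < a_s[t] * eps_s[t]:   # free coordinates must stay feasible
            continue
        val = 0.5 * ell * np.sum(a_s[:t] * eps_s[:t]) + (n - t) * c_t
        if val < best_val:
            d = np.empty(n)
            d[:t] = 0.5 * ell * eps_s[:t]       # pinned: Delta_i = (1/2)*ell*eps_i
            d[t:] = c_t / a_s[t:]               # free:   Delta_i = c_t / a_i
            best_val, best_delta = val, d

    if best_delta is None:
        raise ValueError("no feasible t < n; handle the all-pinned case separately")

    delta = np.empty(n)
    delta[order] = best_delta                   # undo the sorting
    return delta
```

The loop over t, with O(n) work per iteration, matches the quadratic-in-n cost noted above; the whole anonymization remains linear in the number of records N, since this computation is done once per point.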


15. From an Asymptotically Optimal Anonymiza- tion to an Optimal One

  • Often, in practice, we have a huge amount of data.
  • In such cases, the corresponding boxes containing k records are small.
  • In this case, the approximate expression for uncertainty is almost equal to the exact one.
  • So, when we minimize the approximate expression, we, in effect, minimize the actual uncertainty as well.
  • However, in many practical situations, the amount of data is not as huge and thus, the boxes are not as small.
  • In such situations, our asymptotically optimal partition provides only an approximate optimum.
  • In such situations, it is desirable to try to find the actual optimum.


16. Need for Computational Intelligence Techniques

  • When we only took linear terms into account, we were able to get an almost explicit analytical solution.
  • Once we take quadratic terms into account, the optimization problem becomes NP-hard.
  • In practice, we can solve some NP-hard problems – if we use additional expert knowledge.
  • In other words, we need to use computational intelligence techniques.
  • Thus, to get from asymptotically optimal to optimal partitions, we need to use computational intelligence.


17. Which Computational Intelligence Techniques Can We Use?

  • The three main classes of computational intelligence techniques are:
    – fuzzy logic techniques, that enable us to formalize imprecise ("fuzzy") expert knowledge;
    – neural network techniques, that enable us to learn new techniques and new ideas;
    – techniques of evolutionary computation, which enable us to optimize.
  • Since our main objective is optimization, a natural idea is to use evolutionary computation techniques.
  • Also, to capture expert knowledge, it is reasonable to use fuzzy techniques.
  • In our future work, we plan to use computational intelligence techniques.


18. Acknowledgments

  • This work was supported in part:
    – by the National Science Foundation grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence) and DUE-0926721,
    – by Grant 1 T36 GM078000-01 and grant "Balancing disclosure risk with inferential power: software for intervalized data" from the National Institutes of Health, and
    – by a grant on F-transforms from the Office of Naval Research.
  • The authors are thankful to Scott Ferson, Lev Ginzburg, and Luc Longpré for valuable discussions.


19. Bibliography on Anonymization

  • G. Ghinita, P. Karras, P. Kalnis, and N. Mamoulis, "A Framework for Efficient Data Anonymization under Privacy and Accuracy Constraints", ACM Transactions on Database Systems, 2009, Vol. 34, No. 2, Article 9.
  • L. Sweeney, "k-anonymity: a model for protecting privacy", International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002, Vol. 10, No. 5, pp. 557–570.

20. Bibliography on Statistics and Optimization

  • V. Kreinovich, A. Lakeyev, J. Rohn, and P. Kahl, Computational Complexity and Feasibility of Data Processing and Interval Computations, Kluwer, Dordrecht, 1997.
  • H. T. Nguyen, V. Kreinovich, B. Wu, and G. Xiang, Computing Statistics under Interval and Fuzzy Uncertainty, Springer Verlag, 2012.
  • P. M. Pardalos, Complexity in Numerical Optimization, World Scientific, Singapore, 1993.
  • D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, Chapman & Hall/CRC, Boca Raton, Florida, 2007.


21. Bibliography on Computational Intelligence

  • A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing, Springer Verlag, Berlin, Heidelberg, 2010.
  • A. P. Engelbrecht, Computational Intelligence: An Introduction, Wiley, Chichester, England, UK, 2007.
  • G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic, Prentice Hall, Upper Saddle River, New Jersey, 1995.
  • H. T. Nguyen and E. A. Walker, First Course in Fuzzy Logic, CRC Press, Boca Raton, Florida, 2006.
  • L. Rutkowski, Computational Intelligence: Methods and Techniques, Springer Verlag, Berlin, Heidelberg, 2010.
  • L. A. Zadeh, "Fuzzy sets", Information and Control, 1965, Vol. 8, pp. 338–353.