CS573 Data Privacy and Security: Data Anonymization (cont.) - Li Xiong - PowerPoint PPT Presentation



SLIDE 1

CS573 Data Privacy and Security
Data Anonymization (cont.)

Li Xiong
Department of Mathematics and Computer Science, Emory University

SLIDE 2

Today

  • Anonymization notions and approaches (cont.)
    – l-diversity
    – t-closeness
  • Takeaways
SLIDE 3

Zipcode  Age   Disease
476**    2*    Heart Disease
476**    2*    Heart Disease
476**    2*    Heart Disease
4790*    ≥40   Flu
4790*    ≥40   Heart Disease
4790*    ≥40   Cancer
476**    3*    Heart Disease
476**    3*    Cancer
476**    3*    Cancer

A 3-anonymous patient table

Bob:  Zipcode 47678, Age 27
Carl: Zipcode 47673, Age 36

Attacks on k-Anonymity

  • k-Anonymity protects against identity disclosure but does not provide sufficient protection against attribute disclosure
  • k-Anonymity does not provide privacy if:
    – Homogeneity attack: sensitive values in a quasi-identifier group (equivalence class) lack diversity
    – Background knowledge attack: the attacker has background knowledge about the target
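The homogeneity attack is mechanical enough to script. The sketch below (hypothetical helper name, written for illustration and not part of the lecture) groups the 3-anonymous table above by quasi-identifier and flags equivalence classes whose sensitive attribute takes only a single value:

```python
from collections import defaultdict

# 3-anonymous patient table from the slide: (zipcode, age, disease)
records = [
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("4790*", ">=40", "Flu"),
    ("4790*", ">=40", "Heart Disease"),
    ("4790*", ">=40", "Cancer"),
    ("476**", "3*", "Heart Disease"),
    ("476**", "3*", "Cancer"),
    ("476**", "3*", "Cancer"),
]

def homogeneous_classes(records):
    """Return quasi-identifier groups whose sensitive value is unique."""
    groups = defaultdict(list)
    for zipcode, age, disease in records:
        groups[(zipcode, age)].append(disease)
    return {qi: vals[0] for qi, vals in groups.items() if len(set(vals)) == 1}

# Bob (zipcode 47678, age 27) matches the ("476**", "2*") class:
# every record there has Heart Disease, so his diagnosis is disclosed.
print(homogeneous_classes(records))  # {('476**', '2*'): 'Heart Disease'}
```

Any group this function returns lets an attacker who can place a target in it learn the sensitive value with certainty, regardless of k.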

SLIDE 4


Another Attempt: l-Diversity

Race         Zipcode  Disease
Caucas       787XX    Flu
Caucas       787XX    Shingles
Caucas       787XX    Acne
Caucas       787XX    Flu
Caucas       787XX    Acne
Caucas       787XX    Flu
Asian/AfrAm  78XXX    Flu
Asian/AfrAm  78XXX    Flu
Asian/AfrAm  78XXX    Acne
Asian/AfrAm  78XXX    Shingles
Asian/AfrAm  78XXX    Acne
Asian/AfrAm  78XXX    Flu

  • Protect against attribute disclosure
  • Sensitive attributes must be “diverse” within each quasi-identifier equivalence class
  • l-diverse equivalence class: at least l “well-represented” values for the sensitive attribute
  • l-diverse table: every equivalence class of the table has l-diversity

[Machanavajjhala et al. ICDE ‘06]
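Distinct l-diversity, the simplest reading of “well-represented”, can be checked directly. This is an illustrative sketch, not code from the paper:

```python
from collections import defaultdict

def is_distinct_l_diverse(records, qi_index, sa_index, l):
    """True iff every equivalence class (records grouped by the
    quasi-identifier attributes) has at least l distinct sensitive values."""
    groups = defaultdict(set)
    for rec in records:
        qi = tuple(rec[i] for i in qi_index)
        groups[qi].add(rec[sa_index])
    return all(len(values) >= l for values in groups.values())

# Table from the slide: (race, zipcode, disease); QI = (race, zipcode)
table = [
    ("Caucas", "787XX", "Flu"), ("Caucas", "787XX", "Shingles"),
    ("Caucas", "787XX", "Acne"), ("Caucas", "787XX", "Flu"),
    ("Caucas", "787XX", "Acne"), ("Caucas", "787XX", "Flu"),
    ("Asian/AfrAm", "78XXX", "Flu"), ("Asian/AfrAm", "78XXX", "Flu"),
    ("Asian/AfrAm", "78XXX", "Acne"), ("Asian/AfrAm", "78XXX", "Shingles"),
    ("Asian/AfrAm", "78XXX", "Acne"), ("Asian/AfrAm", "78XXX", "Flu"),
]
print(is_distinct_l_diverse(table, (0, 1), 2, 3))  # True: 3 diseases per class
print(is_distinct_l_diverse(table, (0, 1), 2, 4))  # False
```

Note the paper also defines stronger instantiations (entropy l-diversity, recursive (c,l)-diversity); distinct l-diversity is only the baseline.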

SLIDE 5

Neither Necessary, Nor Sufficient

Original dataset (99% have HIV-):
HIV-  HIV-  HIV-  HIV-  HIV-  HIV+  HIV-  HIV-  HIV-  HIV-  HIV-  HIV-

Anonymization A:
Q1: HIV+  HIV-  HIV+  HIV-  HIV+  HIV-
Q2: HIV-  HIV-  HIV-  HIV-  HIV-  HIV-

Anonymization B:
Q1: HIV-  HIV-  HIV-  HIV+  HIV-  HIV-
Q2: HIV-  HIV-  HIV-  HIV-  HIV-  Flu

50% HIV- → the quasi-identifier group is “diverse”, yet this leaks a ton of information (Anonymization A)
99% HIV- → the quasi-identifier group is not “diverse”, yet the anonymized database does not leak anything (Anonymization B)


SLIDE 6

Limitations of l-Diversity

  • Example: sensitive attribute is HIV+ (1%) or HIV- (99%)
    – Very different degrees of sensitivity!

  • l-diversity can be unnecessary
    – 2-diversity is unnecessary for an equivalence class that contains only HIV- records
  • l-diversity can be difficult to achieve
    – Suppose there are 10000 records in total
    – To have distinct 2-diversity, every equivalence class must contain at least one HIV+ record, so there can be at most 10000 × 1% = 100 equivalence classes


SLIDE 7

Skewness Attack

  • Example: sensitive attribute is HIV+ (1%) or HIV- (99%)
  • Consider an equivalence class that contains an equal number of HIV+ and HIV- records
    – Diverse, but potentially violates privacy!
  • l-diversity does not differentiate:
    – Equivalence class 1: 49 HIV+ and 1 HIV-
    – Equivalence class 2: 1 HIV+ and 49 HIV-


l-diversity does not consider overall distribution of sensitive values!
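A quick calculation makes the skew concrete (an illustrative sketch): both classes below satisfy distinct 2-diversity, yet the attacker's inferred probability that a member is HIV+ differs wildly.

```python
def hiv_pos_rate(equivalence_class):
    """Attacker's inferred probability that a class member is HIV+."""
    return equivalence_class.count("HIV+") / len(equivalence_class)

class1 = ["HIV+"] * 49 + ["HIV-"] * 1   # 2-diverse: both values present
class2 = ["HIV+"] * 1 + ["HIV-"] * 49   # also 2-diverse

# l-diversity treats the two classes identically, but the risk differs:
print(hiv_pos_rate(class1))  # 0.98 -- near-certain disclosure
print(hiv_pos_rate(class2))  # 0.02 -- close to the 1% population baseline
```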

SLIDE 8

Sensitive Attribute Disclosure: Similarity Attack

Bob: Zipcode 47678, Age 27

Zipcode  Age   Salary  Disease
476**    2*    20K     Gastric Ulcer
476**    2*    30K     Gastritis
476**    2*    40K     Stomach Cancer
4790*    ≥40   50K     Gastritis
4790*    ≥40   100K    Flu
4790*    ≥40   70K     Bronchitis
476**    3*    60K     Bronchitis
476**    3*    80K     Pneumonia
476**    3*    90K     Stomach Cancer

A 3-diverse patient table

Conclusion:
1. Bob's salary is in [20K, 40K], which is relatively low
2. Bob has some stomach-related disease

l-diversity does not consider semantics of sensitive values!

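The similarity attack on the 3-diverse table above can likewise be scripted (an illustrative sketch): restrict the table to Bob's equivalence class and read off what its sensitive values have in common.

```python
# 3-diverse patient table from the slide: (zipcode, age, salary_k, disease)
table = [
    ("476**", "2*", 20, "Gastric Ulcer"),
    ("476**", "2*", 30, "Gastritis"),
    ("476**", "2*", 40, "Stomach Cancer"),
    ("4790*", ">=40", 50, "Gastritis"),
    ("4790*", ">=40", 100, "Flu"),
    ("4790*", ">=40", 70, "Bronchitis"),
    ("476**", "3*", 60, "Bronchitis"),
    ("476**", "3*", 80, "Pneumonia"),
    ("476**", "3*", 90, "Stomach Cancer"),
]

# Bob (zipcode 47678, age 27) matches the ("476**", "2*") class.
bobs_class = [r for r in table if (r[0], r[1]) == ("476**", "2*")]

salaries = [r[2] for r in bobs_class]
diseases = {r[3] for r in bobs_class}
print(f"Salary in [{min(salaries)}K, {max(salaries)}K]")  # [20K, 40K]
print(diseases)  # three distinct values, yet all stomach-related
```

The class is 3-diverse, but because the three values are semantically close, the attacker still learns a narrow salary range and a disease category.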

SLIDE 9

t-Closeness

Race         Zipcode  Disease
Caucas       787XX    Flu
Caucas       787XX    Shingles
Caucas       787XX    Acne
Caucas       787XX    Flu
Caucas       787XX    Acne
Caucas       787XX    Flu
Asian/AfrAm  78XXX    Flu
Asian/AfrAm  78XXX    Flu
Asian/AfrAm  78XXX    Acne
Asian/AfrAm  78XXX    Shingles
Asian/AfrAm  78XXX    Acne
Asian/AfrAm  78XXX    Flu

[Li et al. ICDE ‘07]

  • Distribution of sensitive attributes within each quasi-identifier group should be “close” to their distribution in the entire original database

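Measuring “close” requires a distance between distributions. Li et al. use the Earth Mover's Distance (EMD); for a categorical attribute with equal ground distance between values it reduces to the variational distance, which this sketch computes (hypothetical helper name, for illustration):

```python
from collections import Counter

def emd_categorical(class_values, table_values):
    """EMD with equal ground distance between categories, which reduces
    to the variational distance (1/2) * sum |P_i - Q_i|."""
    p, q = Counter(class_values), Counter(table_values)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / len(class_values) - q[c] / len(table_values))
                     for c in categories)

table = ["Flu"] * 6 + ["Shingles"] * 2 + ["Acne"] * 4    # overall: 1/2, 1/6, 1/3
cls = ["Flu", "Flu", "Flu", "Shingles", "Acne", "Acne"]  # class:   1/2, 1/6, 1/3

# Distance 0 means the class mirrors the table's distribution (t-close for
# any t); a large distance flags a skewed class.
print(emd_categorical(cls, table))          # 0.0
print(emd_categorical(["Flu"] * 6, table))  # 0.5
```

A table satisfies t-closeness when every equivalence class's distance to the overall distribution is at most t.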

SLIDE 10

k-Anonymous, “t-Close” Dataset

Race         Zipcode  HIV   Disease
Caucas       787XX    HIV+  Flu
Asian/AfrAm  787XX    HIV-  Flu
Asian/AfrAm  787XX    HIV+  Shingles
Caucas       787XX    HIV-  Acne
Caucas       787XX    HIV-  Shingles
Caucas       787XX    HIV-  Acne

This is k-anonymous, l-diverse, and t-close… so secure, right?


SLIDE 11

What Does the Attacker Know?

Race         Zipcode  HIV   Disease
Caucas       787XX    HIV+  Flu
Asian/AfrAm  787XX    HIV-  Flu
Asian/AfrAm  787XX    HIV+  Shingles
Caucas       787XX    HIV-  Acne
Caucas       787XX    HIV-  Shingles
Caucas       787XX    HIV-  Acne

“Bob is Caucasian and I heard he was admitted to hospital with flu…”

SLIDE 12

What Does the Attacker Know?

Race         Zipcode  HIV   Disease
Caucas       787XX    HIV+  Flu
Asian/AfrAm  787XX    HIV-  Flu
Asian/AfrAm  787XX    HIV+  Shingles
Caucas       787XX    HIV-  Acne
Caucas       787XX    HIV-  Shingles
Caucas       787XX    HIV-  Acne

“Bob is Caucasian and I heard he was admitted to hospital… and I know three other Caucasians admitted to hospital with acne or shingles…”

The only remaining Caucasian record is the HIV+ one with flu, so Bob's HIV status is disclosed.

SLIDE 13

Issues with Syntactic Privacy Notions

  • Syntactic
    – Focuses on data transformation, not on what can be learned from the anonymized dataset
    – A “k-anonymous” dataset can still leak sensitive information
  • “Quasi-identifier” fallacy
    – Assumes a priori that the attacker will not know certain information about the target
    – Any attribute can be a potential quasi-identifier (AOL example)
  • Relies on locality
    – Destroys utility of many real-world datasets


SLIDE 14

Some Takeaways

  • “Security requires a particular mindset. Security professionals - at least the good ones - see the world differently. They can't walk into a store without noticing how they might shoplift. They can't vote without trying to figure out how to vote twice. They just can't help it.” – Bruce Schneier (2008)
  • Think about how things may fail instead of how they may work

SLIDE 15

The Adversarial Mindset: Four Key Questions

  1. Security/privacy goal: What policy or good state is meant to be enforced?
  2. Adversarial model: Who is the adversary? What is the adversary's space of possible actions?
  3. Mechanisms: Are the right security mechanisms in place to achieve the security goal given the adversarial model?
  4. Incentives: Will human factors and economics favor or disfavor the security goal?