
CS573 Data Privacy and Security: Data Anonymization (cont.), Li Xiong



  1. CS573 Data Privacy and Security Data Anonymization (cont.) Li Xiong Department of Mathematics and Computer Science Emory University

  2. Today • Cont. Anonymization notions and approaches – l-diversity – t-closeness • Takeaways

  3. Attacks on k-Anonymity
  • k-Anonymity protects against identity disclosure, but it does not provide sufficient protection against attribute disclosure
  • k-Anonymity does not provide privacy if:
  – Homogeneity attack: sensitive values in a quasi-identifier group (equivalence class) lack diversity
  – Background knowledge attack: the attacker has background knowledge about the target

  A 3-anonymous patient table:
  Zipcode  Age    Disease
  476**    2*     Heart Disease
  476**    2*     Heart Disease
  476**    2*     Heart Disease
  4790*    ≥ 40   Flu
  4790*    ≥ 40   Heart Disease
  4790*    ≥ 40   Cancer
  476**    3*     Heart Disease
  476**    3*     Cancer
  476**    3*     Cancer

  Homogeneity attack: Bob (Zipcode 47678, Age 27) falls in the 476**/2* class, where every record has Heart Disease, so Bob's disease is revealed.
  Background knowledge attack: Carl (Zipcode 47673, Age 36) falls in the 476**/3* class; an attacker who knows Carl is unlikely to have heart disease can conclude he has Cancer.
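The homogeneity attack can be detected mechanically. As a minimal sketch (the record list and the helper `homogeneous_groups` are illustrative, not from the lecture's tooling), the following flags quasi-identifier groups whose sensitive values lack diversity:

```python
from collections import defaultdict

# Records from the 3-anonymous table on this slide:
# (generalized zipcode, generalized age, disease)
records = [
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("476**", "2*", "Heart Disease"),
    ("4790*", ">=40", "Flu"),
    ("4790*", ">=40", "Heart Disease"),
    ("4790*", ">=40", "Cancer"),
    ("476**", "3*", "Heart Disease"),
    ("476**", "3*", "Cancer"),
    ("476**", "3*", "Cancer"),
]

def homogeneous_groups(records):
    """Return quasi-identifier groups where all sensitive values are equal."""
    groups = defaultdict(list)
    for zipcode, age, disease in records:
        groups[(zipcode, age)].append(disease)
    return {qi for qi, vals in groups.items() if len(set(vals)) == 1}

print(homogeneous_groups(records))  # {('476**', '2*')}: Bob's group
```

Any group it reports is vulnerable: membership in the group alone reveals the sensitive value.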

  4. Another Attempt: l-Diversity [Machanavajjhala et al. ICDE ‘06]
  • Protects against attribute disclosure
  • Sensitive attributes must be “diverse” within each quasi-identifier equivalence class
  • l-diverse equivalence class: at least l “well-represented” values for the sensitive attribute
  • l-diverse table: every equivalence class of the table has l-diversity

  Race         Zipcode  Disease
  Caucas       787XX    Flu
  Caucas       787XX    Shingles
  Caucas       787XX    Acne
  Caucas       787XX    Flu
  Caucas       787XX    Acne
  Caucas       787XX    Flu
  Asian/AfrAm  78XXX    Flu
  Asian/AfrAm  78XXX    Flu
  Asian/AfrAm  78XXX    Acne
  Asian/AfrAm  78XXX    Shingles
  Asian/AfrAm  78XXX    Acne
  Asian/AfrAm  78XXX    Flu
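“Well-represented” has several instantiations in the paper (distinct, entropy, and recursive l-diversity). A minimal sketch of the simplest one, distinct l-diversity, applied to the table above (the record encoding is illustrative):

```python
from collections import defaultdict

def is_l_diverse(records, l):
    """Distinct l-diversity: every equivalence class must contain at least
    l distinct sensitive values. (Entropy and recursive variants are
    stricter refinements of 'well-represented'.)"""
    groups = defaultdict(set)
    for qi, sensitive in records:
        groups[qi].add(sensitive)
    return all(len(vals) >= l for vals in groups.values())

# Records from the slide's table: (quasi-identifier group, disease)
records = [(("Caucas", "787XX"), d)
           for d in ["Flu", "Shingles", "Acne", "Flu", "Acne", "Flu"]]
records += [(("Asian/AfrAm", "78XXX"), d)
            for d in ["Flu", "Flu", "Acne", "Shingles", "Acne", "Flu"]]

print(is_l_diverse(records, 3))  # True: each class has {Flu, Shingles, Acne}
print(is_l_diverse(records, 4))  # False: only 3 distinct values per class
```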

  5. Neither Necessary, Nor Sufficient
  Original dataset (99% of the records are HIV-) and two anonymizations:

  Original  Anonymization A  Anonymization B
  … HIV-    Q1 HIV+          Q1 HIV-
  … HIV-    Q1 HIV-          Q1 HIV-
  … HIV-    Q1 HIV+          Q1 HIV-
  … HIV-    Q1 HIV-          Q1 HIV+
  … HIV-    Q1 HIV+          Q1 HIV-
  … HIV+    Q1 HIV-          Q1 HIV-
  … HIV-    Q2 HIV-          Q2 HIV-
  … HIV-    Q2 HIV-          Q2 HIV-
  … HIV-    Q2 HIV-          Q2 HIV-
  … HIV-    Q2 HIV-          Q2 HIV-
  … HIV-    Q2 HIV-          Q2 HIV-
  … HIV-    Q2 HIV-          Q2 Flu

  A quasi-identifier group that is 99% HIV- is not “diverse”, yet the anonymized database does not leak anything.
  A quasi-identifier group that is 50% HIV- is “diverse”, yet it leaks a ton of information, since 99% of the overall dataset is HIV-.

  6. Limitations of l-Diversity
  • Example: sensitive attribute is HIV+ (1%) or HIV- (99%)
  – Very different degrees of sensitivity!
  • l-diversity is unnecessary
  – 2-diversity is unnecessary for an equivalence class that contains only HIV- records
  • l-diversity is difficult to achieve
  – Suppose there are 10000 records in total
  – For distinct 2-diversity, every equivalence class needs at least one HIV+ record, so there can be at most 10000 × 1% = 100 equivalence classes
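The counting argument above can be spelled out in a couple of lines (the variable names are illustrative):

```python
n_records = 10_000
hiv_pos_rate = 0.01

# Only 1% of records are HIV+.
hiv_pos = round(n_records * hiv_pos_rate)  # 100 HIV+ records

# Distinct 2-diversity requires at least one HIV+ record in every
# equivalence class, so the HIV+ count bounds the number of classes:
max_classes = hiv_pos
print(max_classes)               # 100
print(n_records // max_classes)  # 100: each class averages >= 100 records
```

Forcing classes of 100+ records means very coarse generalization, which is why the slide calls l-diversity difficult to achieve here.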

  7. Skewness Attack
  • Example: sensitive attribute is HIV+ (1%) or HIV- (99%)
  • Consider an equivalence class that contains an equal number of HIV+ and HIV- records
  – Diverse, but potentially violates privacy!
  • l-diversity does not differentiate:
  – Equivalence class 1: 49 HIV+ and 1 HIV-
  – Equivalence class 2: 1 HIV+ and 49 HIV-
  l-diversity does not consider the overall distribution of sensitive values!
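The two equivalence classes on this slide look identical to distinct l-diversity, but the attacker's inferred probability of HIV+ differs wildly. A minimal sketch:

```python
def positive_rate(group):
    """Attacker's inferred probability of HIV+ for a member of the group."""
    return group.count("HIV+") / len(group)

class1 = ["HIV+"] * 49 + ["HIV-"] * 1   # equivalence class 1 from the slide
class2 = ["HIV+"] * 1 + ["HIV-"] * 49   # equivalence class 2 from the slide

# Both classes contain 2 distinct values, so distinct 2-diversity holds:
assert len(set(class1)) == len(set(class2)) == 2

print(positive_rate(class1))  # 0.98: near-certain attribute disclosure
print(positive_rate(class2))  # 0.02: close to the 1% population baseline
```

Membership in class 1 is effectively a diagnosis, while class 2 is nearly harmless, yet l-diversity scores them the same.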

  8. Sensitive Attribute Disclosure
  A 3-diverse patient table:
  Zipcode  Age    Salary  Disease
  476**    2*     20K     Gastric Ulcer
  476**    2*     30K     Gastritis
  476**    2*     40K     Stomach Cancer
  4790*    ≥ 40   50K     Gastritis
  4790*    ≥ 40   100K    Flu
  4790*    ≥ 40   70K     Bronchitis
  476**    3*     60K     Bronchitis
  476**    3*     80K     Pneumonia
  476**    3*     90K     Stomach Cancer

  Similarity attack: Bob (Zip 47678, Age 27) falls in the 476**/2* class. Conclusions:
  1. Bob’s salary is in [20k, 40k], which is relatively low
  2. Bob has some stomach-related disease
  l-diversity does not consider the semantics of sensitive values!

  9. t-Closeness [Li et al. ICDE ‘07]
  The distribution of sensitive attributes within each quasi-identifier group should be “close” to their distribution in the entire original database.

  Race         Zipcode  Disease
  Caucas       787XX    Flu
  Caucas       787XX    Shingles
  Caucas       787XX    Acne
  Caucas       787XX    Flu
  Caucas       787XX    Acne
  Caucas       787XX    Flu
  Asian/AfrAm  78XXX    Flu
  Asian/AfrAm  78XXX    Flu
  Asian/AfrAm  78XXX    Acne
  Asian/AfrAm  78XXX    Shingles
  Asian/AfrAm  78XXX    Acne
  Asian/AfrAm  78XXX    Flu
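“Close” is measured by a distance between distributions; the t-closeness paper uses the Earth Mover's Distance for ordered attributes, but for a categorical attribute like Disease the simpler total variation (variational) distance illustrates the idea. A minimal sketch over the table above (helper names are illustrative):

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a list of sensitive values."""
    total = len(values)
    return {v: c / total for v, c in Counter(values).items()}

def variational_distance(p, q):
    """Total variation distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

# Whole table vs. the Caucas/787XX equivalence class:
table = ["Flu", "Shingles", "Acne", "Flu", "Acne", "Flu",
         "Flu", "Flu", "Acne", "Shingles", "Acne", "Flu"]
group = ["Flu", "Shingles", "Acne", "Flu", "Acne", "Flu"]

d = variational_distance(distribution(group), distribution(table))
print(d)  # 0.0: the class has exactly the overall disease proportions
```

A table is t-close when this distance is at most t for every equivalence class; here the class mirrors the overall distribution exactly, so the distance is zero.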

  10. k-Anonymous, “t-Close” Dataset
  Race         Zipcode  HIV     Disease
  Caucas       787XX    HIV+    Flu
  Asian/AfrAm  787XX    HIV-    Flu
  Asian/AfrAm  787XX    HIV+    Shingles
  Caucas       787XX    HIV-    Acne
  Caucas       787XX    HIV-    Shingles
  Caucas       787XX    HIV-    Acne

  This is k-anonymous, l-diverse and t-close… so secure, right?

  11. What Does Attacker Know?
  “Bob is Caucasian and I heard he was admitted to hospital with flu…”

  Race         Zipcode  HIV     Disease
  Caucas       787XX    HIV+    Flu
  Asian/AfrAm  787XX    HIV-    Flu
  Asian/AfrAm  787XX    HIV+    Shingles
  Caucas       787XX    HIV-    Acne
  Caucas       787XX    HIV-    Shingles
  Caucas       787XX    HIV-    Acne

  12. What Does Attacker Know?
  “Bob is Caucasian and I heard he was admitted to hospital… And I know three other Caucasians admitted to hospital with Acne or Shingles…”

  Race         Zipcode  HIV     Disease
  Caucas       787XX    HIV+    Flu
  Asian/AfrAm  787XX    HIV-    Flu
  Asian/AfrAm  787XX    HIV+    Shingles
  Caucas       787XX    HIV-    Acne
  Caucas       787XX    HIV-    Shingles
  Caucas       787XX    HIV-    Acne

  The only Caucasian record left for Bob is the flu record, which is HIV+.
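This elimination attack is just candidate filtering over the published table. A minimal sketch using the records from this slide:

```python
# Published records: (race, zipcode, HIV status, disease)
table = [
    ("Caucas", "787XX", "HIV+", "Flu"),
    ("Asian/AfrAm", "787XX", "HIV-", "Flu"),
    ("Asian/AfrAm", "787XX", "HIV+", "Shingles"),
    ("Caucas", "787XX", "HIV-", "Acne"),
    ("Caucas", "787XX", "HIV-", "Shingles"),
    ("Caucas", "787XX", "HIV-", "Acne"),
]

# The attacker knows Bob is Caucasian...
candidates = [r for r in table if r[0] == "Caucas"]

# ...and knows three *other* Caucasians were admitted with Acne or
# Shingles, so those records cannot be Bob's:
candidates = [r for r in candidates if r[3] not in ("Acne", "Shingles")]

print(candidates)  # only the HIV+ Flu record remains: Bob is HIV+
```

Each piece of background knowledge shrinks the candidate set; here it collapses to a single record despite the dataset being k-anonymous, l-diverse, and t-close.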

  13. Issues with Syntactic Privacy Notions
  • Syntactic
  – Focuses on the data transformation, not on what can be learned from the anonymized dataset
  – A “k-anonymous” dataset can leak sensitive information
  • “Quasi-identifier” fallacy
  – Assumes a priori that the attacker will not know certain information about his target
  – Any attribute can be a potential quasi-identifier (AOL example)
  • Relies on locality
  – Destroys utility of many real-world datasets

  14. Some Takeaways
  • “Security requires a particular mindset. Security professionals - at least the good ones - see the world differently. They can't walk into a store without noticing how they might shoplift. They can't vote without trying to figure out how to vote twice. They just can't help it.” – Bruce Schneier (2008)
  • Think about how things may fail instead of how they may work

  15. The Adversarial Mindset: Four Key Questions
  1. Security/privacy goal: What policy or good state is meant to be enforced?
  2. Adversarial model: Who is the adversary? What is the adversary’s space of possible actions?
  3. Mechanisms: Are the right security mechanisms in place to achieve the security goal given the adversarial model?
  4. Incentives: Will human factors and economics favor or disfavor the security goal?
