transparency and disclosure risk in data privacy
Transparency and disclosure risk in data privacy, Vicenç Torra - PowerPoint PPT Presentation


  1. PAIS 2015. Transparency and disclosure risk in data privacy. Vicenç Torra, School of Informatics, University of Skövde, Sweden. March 2015.

  2. Outline
     • Quantitative measures of risk: record linkage
     • Transparency principle: publishing the data processing methods is a good practice in data privacy, similar to the one in cryptography
     • Risk needs to consider the transparency principle
     Vicenç Torra; Transparency data privacy PAIS 2015 1 / 61

  3. Outline
     1. Introduction
        • Masking methods
        • Disclosure risk assessment
     2. Transparency
        • Definition
        • Attacking Rank Swapping
        • Attacking Microaggregation
     3. Worst-case scenario when measuring disclosure risk
     4. Summary

  4. Introduction > Masking methods

  5. Introduction > Masking methods
     Masking methods:
     • Perturbative
     • Non-perturbative
     • Synthetic data generators
     Review:
     • Microaggregation
     • Rank swapping

  6. Introduction > Masking methods: Rank swapping
     • For ordinal/numerical attributes
     • Applied attribute-wise
     Data: $(a_1, \ldots, a_n)$: original data; $p$: percentage of records
     Order $(a_1, \ldots, a_n)$ in increasing order (i.e., $a_i \le a_{i+1}$);
     Mark $a_i$ as unswapped for all $i$;
     for $i = 1$ to $n$ do
       if $a_i$ is unswapped then
         Select $\ell$ randomly and uniformly from the limited range $[i + 1, \min(n, i + p \cdot |X| / 100)]$;
         Swap $a_i$ with $a_\ell$;
     Undo the sorting step;
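The rank swapping procedure on the slide can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. It makes two assumptions the slide leaves implicit: the swap partner is drawn only from positions that are still unswapped, and both positions are marked as swapped afterwards. The function name `rank_swap` and the `rng` parameter are hypothetical.

```python
import random

def rank_swap(values, p, rng=None):
    """Rank-swap one numerical attribute.

    p is the swap window as a percentage of the number of records.
    Assumption: partners are drawn among still-unswapped positions,
    and both swapped positions are marked afterwards.
    """
    rng = rng or random.Random()
    n = len(values)
    # order[r] is the original index of the record holding rank r
    order = sorted(range(n), key=lambda idx: values[idx])
    ranked = [values[idx] for idx in order]
    window = int(p * n / 100)
    swapped = [False] * n
    for i in range(n):
        if swapped[i]:
            continue
        hi = min(n - 1, i + window)
        candidates = [j for j in range(i + 1, hi + 1) if not swapped[j]]
        if not candidates:
            continue  # nothing left to swap with inside the window
        j = rng.choice(candidates)
        ranked[i], ranked[j] = ranked[j], ranked[i]
        swapped[i] = swapped[j] = True
    # Undo the sorting step: put each value back at its record's position
    masked = [None] * n
    for rank, idx in enumerate(order):
        masked[idx] = ranked[rank]
    return masked

# Hypothetical toy attribute; with 10 records and p = 20 the window is 2 ranks
values = [17, 3, 42, 8, 25, 11, 30, 5, 19, 36]
masked = rank_swap(values, p=20, rng=random.Random(0))
```

Because the method only permutes values within the attribute, `sorted(masked)` equals `sorted(values)`, and every record receives a value at most `window` ranks away from its own.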

  7. Introduction > Masking methods: Rank swapping
     • Marginal distributions are not modified.
     • Correlations between the attributes are modified.
     • Good trade-off between information loss and disclosure risk.

  8. Introduction > Microaggregation
     • Case of two attributes microaggregated together

  9. Introduction > Microaggregation: Application
     • $k$: number of records in the cluster
     • Partition of the attributes

     | $v_1$ | $v_2$ | $v_3$ | $v_4$ | $v'_1$ | $v'_2$ | $v'_3$ | $v'_4$ |
     |---|---|---|---|---|---|---|---|
     | 1 | 1 | 1 | 1 | 1.66667 | 2 | 1.33333 | 1.66667 |
     | 2 | 2 | 1 | 2 | 1.66667 | 2 | 1.33333 | 1.66667 |
     | 2 | 3 | 1 | 6 | 1.66667 | 2 | 2.33333 | 5.66667 |
     | 2 | 9 | 1 | 10 | 3 | 7.33333 | 1.66667 | 9.66667 |
     | 3 | 6 | 2 | 2 | 3 | 7.33333 | 1.33333 | 1.66667 |
     | 4 | 1 | 2 | 9 | 4.33333 | 5 | 1.66667 | 9.66667 |
     | 4 | 6 | 2 | 10 | 4.33333 | 5 | 1.66667 | 9.66667 |
     | 4 | 7 | 3 | 2 | 3 | 7.33333 | 2.33333 | 5.66667 |
     | 5 | 8 | 3 | 9 | 4.33333 | 5 | 2.33333 | 5.66667 |
     | 6 | 8 | 4 | 7 | 7.66667 | 8.66667 | 6 | 5 |
     | 8 | 1 | 7 | 2 | 8.66667 | 2.66667 | 6 | 5 |
     | 8 | 9 | 7 | 6 | 7.66667 | 8.66667 | 6 | 5 |
     | 9 | 3 | 8 | 1 | 8.66667 | 2.66667 | 8.66667 | 1.33333 |
     | 9 | 4 | 8 | 2 | 8.66667 | 2.66667 | 8.66667 | 1.33333 |
     | 9 | 9 | 10 | 1 | 7.66667 | 8.66667 | 8.66667 | 1.33333 |
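The masked values in the slide's example are consistent with replacing each record by the mean of its cluster, computed separately for the attribute pairs $(v_1, v_2)$ and $(v_3, v_4)$ with clusters of three records. The sketch below reproduces the $(v'_1, v'_2)$ columns under that reading. The cluster assignment is an assumption recovered from records that share the same masked values; the slide does not say which clustering algorithm produced it, and the function name `microaggregate` is hypothetical.

```python
def microaggregate(records, clusters):
    """Replace every record by the centroid (mean) of its cluster."""
    masked = [None] * len(records)
    for cluster in clusters:
        dim = len(records[cluster[0]])
        centroid = tuple(
            sum(records[i][d] for i in cluster) / len(cluster)
            for d in range(dim)
        )
        for i in cluster:
            masked[i] = centroid
    return masked

# (v1, v2) values of the 15 records in the slide's example
v12 = [(1, 1), (2, 2), (2, 3), (2, 9), (3, 6), (4, 1), (4, 6), (4, 7),
       (5, 8), (6, 8), (8, 1), (8, 9), (9, 3), (9, 4), (9, 9)]

# Assumed partition (0-based record indices, k = 3), inferred from
# which records share the same masked values in the example
clusters = [[0, 1, 2], [3, 4, 7], [5, 6, 8], [9, 11, 14], [10, 12, 13]]

masked_v12 = microaggregate(v12, clusters)
```

Running the same function on the $(v_3, v_4)$ columns with their own partition would give the remaining two masked columns.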

  10. Introduction > Disclosure risk: Disclosure risk assessment

  11. Introduction > Disclosure risk: Disclosure risk assessment
      • Identity disclosure vs. attribute disclosure
        ◦ Attribute disclosure:
          ⋆ Increase knowledge about an attribute of an individual
        ◦ Identity disclosure:
          ⋆ Find/identify an individual in a masked file

  12. Introduction > Disclosure risk: Disclosure risk assessment
      • Identity disclosure vs. attribute disclosure
      • Boolean vs. quantitative measures

  13. Introduction > Disclosure risk: Disclosure risk assessment
      • Identity disclosure vs. attribute disclosure
      • Boolean vs. quantitative measures (minimize information loss vs. multiobjective optimization)

  14. Introduction > Disclosure risk: Disclosure risk assessment
      • Identity disclosure vs. attribute disclosure
      • Boolean vs. quantitative measures (minimize information loss vs. multiobjective optimization)
      Examples:
      • Boolean definitions of risk
        ◦ k-Anonymity (Boolean definition / identity disclosure)
        ◦ Differential privacy (Boolean definition / attribute disclosure)
      • Quantitative measures of risk
        ◦ Re-identification / record linkage (for identity disclosure)
        ◦ Uniqueness (for identity disclosure)
        ◦ Interval disclosure (for attribute disclosure)
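To make the contrast between a Boolean definition and a quantitative measure concrete, the sketch below computes both on the same file: whether the file is k-anonymous (a yes/no answer) and the fraction of records that are unique on their quasi-identifiers (a number). The toy records, attribute names, and function names are all hypothetical, and uniqueness is taken here as sample uniqueness, which is only one of several possible definitions.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in quasi_identifiers) for r in records)

def is_k_anonymous(records, quasi_identifiers, k):
    """Boolean definition: every combination occurs at least k times."""
    return min(group_sizes(records, quasi_identifiers).values()) >= k

def uniqueness(records, quasi_identifiers):
    """Quantitative measure: fraction of records whose combination is unique."""
    sizes = group_sizes(records, quasi_identifiers)
    return sum(1 for c in sizes.values() if c == 1) / len(records)

# Hypothetical toy file; "age" and "zip" play the role of quasi-identifiers
records = [
    {"age": "30-39", "zip": "541**", "disease": "flu"},
    {"age": "30-39", "zip": "541**", "disease": "asthma"},
    {"age": "40-49", "zip": "541**", "disease": "flu"},
    {"age": "40-49", "zip": "541**", "disease": "diabetes"},
    {"age": "50-59", "zip": "542**", "disease": "flu"},
]
qi = ["age", "zip"]

two_anonymous = is_k_anonymous(records, qi, k=2)
unique_fraction = uniqueness(records, qi)
```

In this toy file the last record is alone in its group, so the Boolean test for k = 2 fails while the quantitative measure reports that one record in five is unique.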

  15. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • A scenario for identity disclosure: $X = id \,\|\, X_{nc} \,\|\, X_c$
        ◦ Protection of the attributes
          ⋆ Identifiers. Usually removed or encrypted.
          ⋆ Confidential. $X_c$ are usually not modified: $X'_c = X_c$.
          ⋆ Quasi-identifiers. Apply masking method $\rho$ to these attributes: $X'_{nc} = \rho(X_{nc})$.

  16. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • A scenario for identity disclosure: $X = id \,\|\, X_{nc} \,\|\, X_c$
        ◦ A: file with the protected data set
        ◦ B: file with the data from the intruder (subset of original $X$)
      [Figure: re-identification by record linkage between file A (protected / public; quasi-identifiers $a_1, \ldots, a_n$ and confidential attributes) and file B (intruder; identifiers $i_1, i_2, \ldots$ and quasi-identifiers $a_1, \ldots, a_n$).]

  17. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • A scenario for identity disclosure
        ◦ Re-identification using the common attributes (quasi-identifiers):

  18. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • A scenario for identity disclosure
        ◦ Re-identification using the common attributes (quasi-identifiers): identity disclosure

  19. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • A scenario for identity disclosure
        ◦ Re-identification using the common attributes (quasi-identifiers): identity disclosure
        ◦ Attribute disclosure may be possible

  20. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • A scenario for identity disclosure
        ◦ Re-identification using the common attributes (quasi-identifiers): identity disclosure
        ◦ Attribute disclosure may be possible when re-identification makes it possible to link confidential values to identifiers (in this case, identity disclosure implies attribute disclosure)

  21. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • Flexible scenario for identity disclosure
        ◦ A: protected file, obtained with a masking method
        ◦ B (the intruder's file) is a subset of the original file.

  22. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • Flexible scenario for identity disclosure
        ◦ A: protected file, obtained with a masking method
        ◦ B (the intruder's file) is a subset of the original file.
          → intruder with information on only some individuals

  23. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • Flexible scenario for identity disclosure
        ◦ A: protected file, obtained with a masking method
        ◦ B (the intruder's file) is a subset of the original file.
          → intruder with information on only some individuals
          → intruder with information on only some characteristics

  24. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • Flexible scenario for identity disclosure
        ◦ A: protected file, obtained with a masking method
        ◦ B (the intruder's file) is a subset of the original file.
          → intruder with information on only some individuals
          → intruder with information on only some characteristics
        ◦ But also:
          ⋆ B with a schema different from that of A (different attributes)

  25. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • Re-identification. Risk is estimated as the number of re-identifications that an intruder might obtain.

  26. Introduction > Disclosure risk: Quantitative measures for identity disclosure
      • Re-identification. Risk is estimated as the number of re-identifications that an intruder might obtain.
        ◦ When both files have the same schema: record linkage algorithms.
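When both files share the same schema, one simple way to estimate the number of re-identifications is distance-based record linkage: each intruder record in B is linked to its nearest record in the protected file A, and the links that hit the true counterpart are counted. The sketch below assumes Euclidean distance on numerical quasi-identifiers and assumes that record `i` of B corresponds to record `i` of A, which the data protector knows when evaluating risk. The slides name record linkage only in general terms, so this specific variant, the toy data, and the function name are assumptions.

```python
import math

def count_reidentifications(intruder, protected):
    """Link each intruder record to its nearest protected record.

    Returns the number of links that point to the true counterpart,
    assuming intruder[i] and protected[i] describe the same individual.
    Ties are resolved in favor of the lowest index.
    """
    correct = 0
    for i, b in enumerate(intruder):
        nearest = min(range(len(protected)),
                      key=lambda j: math.dist(b, protected[j]))
        if nearest == i:
            correct += 1
    return correct

# Hypothetical toy files: B holds original values, A the masked ones
B = [(1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
A = [(1.5, 1.2), (8.5, 9.4), (5.2, 4.9)]

links = count_reidentifications(B, A)
risk = links / len(B)  # proportion of intruder records re-identified
```

Dividing the count by the number of intruder records gives a risk value between 0 and 1 that can be compared across masking methods.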
