
Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge


  1. Privacy Skyline: Privacy with Multidimensional Adversarial Knowledge
     Bee-Chung Chen, Kristen LeFevre (University of Wisconsin – Madison)
     Raghu Ramakrishnan (Yahoo! Research)
     Bee-Chung Chen 2007, beechung@cs.wisc.edu

  2. Example: Medical Record Dataset
     • A data owner wants to release data for medical research
     • An adversary wants to discover individuals’ sensitive info

       Name    Age  Gender  Zipcode  Disease
       Ann     20   F       12345    AIDS
       Bob     24   M       12342    Flu
       Cary    23   F       12344    Flu
       Dick    27   M       12343    AIDS
       Ed      35   M       12412    Flu
       Frank   34   M       12433    Cancer
       Gary    31   M       12453    Cancer
       Tom     38   M       12455    AIDS

  3. What If the Adversary Knows …
     Released data (grouped):
       Group 1 (Age 2*, Gender Any, Zipcode 1234*):
         (Ann) 20 F 12345, (Bob) 24 M 12342, (Cary) 23 F 12344, (Dick) 27 M 12343
         Diseases in group: AIDS, Flu, Flu, AIDS
       Group 2 (Age 3*, Gender M, Zipcode 123**):
         (Ed) 35 M 12412, (Frank) 34 M 12433, (Gary) 31 M 12453, (Tom) 38 M 12455
         Diseases in group: Flu, Cancer, Cancer, AIDS
     • Without any additional knowledge, Pr(Tom has AIDS) = 1/4
     • What if the adversary knows “Tom does not have Cancer and Ed has Flu”?
       Pr(Tom has AIDS | above data and above knowledge) = 1
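A minimal sketch (not from the slides) of how the two probabilities above can be checked by brute force: treat each way of assigning the group’s disease values to its members as an equally likely possible world, keep only the worlds consistent with the adversary’s knowledge, and count. The helper name breach_probability and the dictionary encoding are illustrative assumptions.

```python
# Illustrative sketch: brute-force the worked example for Tom's QI-group.
from itertools import permutations

def breach_probability(members, values, target, sensitive, knowledge):
    """Pr(target has sensitive | released group, knowledge), counting equally likely assignments."""
    satisfying = matching = 0
    for perm in set(permutations(values)):      # distinct assignments of the values to the members
        world = dict(zip(members, perm))        # one possible world for this group
        if knowledge(world):                    # keep only worlds consistent with the knowledge
            satisfying += 1
            if world[target] == sensitive:
                matching += 1
    return matching / satisfying

group2 = ["Ed", "Frank", "Gary", "Tom"]
diseases = ["Flu", "Cancer", "Cancer", "AIDS"]

# No additional knowledge: every world is consistent.
print(breach_probability(group2, diseases, "Tom", "AIDS", lambda w: True))   # 0.25

# "Tom does not have Cancer and Ed has Flu"
k = lambda w: w["Tom"] != "Cancer" and w["Ed"] == "Flu"
print(breach_probability(group2, diseases, "Tom", "AIDS", k))                # 1.0
```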

  4. Privacy with Adversarial Knowledge
     • Bayesian privacy definition: A released dataset D* is safe if, for any person t and any
       sensitive value s, Pr(t has s | D*, Adversarial Knowledge) < c
       – This probability is the adversary’s confidence that person t has sensitive value s, after
         he sees the released dataset
       – Equivalent definition: D* is safe if max_{t,s} Pr(t has s | D*, Adversarial Knowledge) < c
         (the maximum breach probability)
       – Prior work following this intuition: [Machanavajjhala et al., 2006; Martin et al., 2007;
         Xiao and Tao, 2006]
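As a rough illustration of the equivalent definition, the safety check is just a maximum over all (person, sensitive value) pairs. This sketch assumes the hypothetical breach_probability helper from the previous block and, for simplicity, knowledge that only concerns members of a single QI-group.

```python
# Illustrative sketch of the maximum-breach-probability safety check.
def is_safe(groups, c, knowledge=lambda w: True):
    """groups: list of (members, released sensitive-value multiset) pairs forming D*."""
    worst = max(
        breach_probability(members, values, t, s, knowledge)
        for members, values in groups
        for t in members
        for s in set(values)
    )
    return worst < c

# The release candidate D* of the running example (two QI-groups).
d_star = [
    (["Ann", "Bob", "Cary", "Dick"], ["AIDS", "Flu", "Flu", "AIDS"]),
    (["Ed", "Frank", "Gary", "Tom"], ["Flu", "Cancer", "Cancer", "AIDS"]),
]
print(is_safe(d_star, c=0.7))   # True: the worst pair has Pr = 0.5, below the threshold 0.7
```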

  5. Questions to be Addressed
     • Bayesian privacy criterion: max Pr(t has s | D*, Adversarial Knowledge) < c
     • How to describe various kinds of adversarial knowledge
       – We provide intuitive knowledge expressions that cover three kinds of common adversarial
         knowledge
     • How to analyze data safety in the presence of various kinds of possible adversarial knowledge
       – We propose a skyline tool for what-if analysis in the “knowledge space”
     • How to efficiently generate a safe dataset to release
       – We develop algorithms (based on a “congregation” property) that are orders of magnitude
         faster than the best known dynamic programming technique [Martin et al., 2007]

  6. Outline
     • Theoretical framework (possible-world semantics)
       – How the privacy breach is defined
     • Three-dimensional knowledge expression
     • Privacy Skyline
     • Efficient and scalable algorithms
     • Experimental results
     • Conclusion and future work

  7. Theoretical Framework
     Original dataset D (Name, Age, Gender, Zipcode, Disease; see slide 2) and release candidate D*:
       Group 1: (Ann) 20 F 12345, (Bob) 24 M 12342, (Cary) 23 F 12344, (Dick) 27 M 12343
                Diseases in group: AIDS, Flu, Flu, AIDS
       Group 2: (Ed) 35 M 12412, (Frank) 34 M 12433, (Gary) 31 M 12453, (Tom) 38 M 12455
                Diseases in group: Flu, Cancer, Cancer, AIDS
     • Each group is called a QI-group
     • Assume each person has only one sensitive value (in the talk); the sensitive attribute can
       be set-valued (in the paper)
     • This abstraction includes generalization-based methods and bucketization

  8. Theoretical Framework: Reconstruction
     • A reconstruction of D* is intuitively a possible original dataset (possible world) that
       would generate D* under the grouping mechanism: the quasi-identifier values are fixed and
       the sensitive values are permuted within each group
     • Example reconstructions of Group 2 of the release candidate D* above:
       – Ed: Flu, Frank: Cancer, Gary: Cancer, Tom: AIDS
       – Ed: AIDS, Frank: Cancer, Gary: Cancer, Tom: Flu
     • Assumption: Without any additional knowledge, every reconstruction is equally likely
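A minimal sketch of the “fix the QI values, permute the sensitive values” semantics, reusing the d_star list from the earlier sketch; the names group_worlds and reconstructions are illustrative, not from the paper.

```python
# Illustrative sketch: enumerate all reconstructions (possible worlds) of D*.
from itertools import permutations, product

def group_worlds(members, values):
    """All distinct assignments of a group's sensitive values to its members."""
    return [dict(zip(members, p)) for p in set(permutations(values))]

def reconstructions(groups):
    """All possible worlds of D*: one assignment chosen independently per QI-group."""
    for combo in product(*(group_worlds(m, v) for m, v in groups)):
        world = {}
        for assignment in combo:
            world.update(assignment)
        yield world

print(sum(1 for _ in reconstructions(d_star)))   # 72 = 6 * 12 equally likely possible worlds
```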

  9. Probability Definition
     • Knowledge expression K: a logic sentence [Martin et al., 2007]
       E.g., K = (Tom[S] ≠ Cancer) ∧ (Ed[S] = Flu)
       Pr(Tom[S] = AIDS | K, D*)
         ≡ (# of reconstructions of D* that satisfy K ∧ (Tom[S] = AIDS))
           / (# of reconstructions of D* that satisfy K)
     • Worst-case disclosure
       – Knowledge expressions may also include variables
         E.g., K = (Tom[S] ≠ x) ∧ (u[S] ≠ y) ∧ (v[S] = s → Tom[S] = s)
       – Maximum breach probability: max Pr(t[S] = s | D*, K), where the maximization is over the
         variables t, u, v, s, x, y, substituting them with constants from the dataset
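The counting definition above can be sketched directly over full reconstructions of D*, assuming the hypothetical reconstructions generator and d_star list from the previous sketches; K is any predicate over a possible world.

```python
# Illustrative sketch of Pr(target[S] = sensitive | K, D*) as a ratio of reconstruction counts.
def pr(target, sensitive, groups, K):
    satisfy_k = satisfy_both = 0
    for world in reconstructions(groups):
        if K(world):
            satisfy_k += 1
            if world[target] == sensitive:
                satisfy_both += 1
    return satisfy_both / satisfy_k

# K = (Tom[S] != Cancer) AND (Ed[S] = Flu)
K = lambda w: w["Tom"] != "Cancer" and w["Ed"] == "Flu"
print(pr("Tom", "AIDS", d_star, K))   # 1.0, matching the example on slide 3
```

The maximum breach probability on the slide would then be the largest such value over all substitutions of the variables by constants from the dataset; enumerating reconstructions this way blows up quickly, which is one motivation for the restricted expression classes discussed next.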

  10. What Kinds of Expressions
      • Privacy criterion: Release candidate D* is safe if max Pr(t[S] = s | D*, K) < c
      • Prior work [Martin et al., 2007]
        – K is a conjunction of m implications
          E.g., K = (u1[S] = x1 → v1[S] = y1) ∧ … ∧ (um[S] = xm → vm[S] = ym)
        – Not intuitive: what is the practical meaning of m implications?
        – Some limitations: some simple knowledge cannot be expressed
      • Complexity for general logic sentences
        – Computing the breach probability is NP-hard
      • Goal: Identify classes of expressions that are
        – Useful (intuitive and cover common adversarial knowledge)
        – Computationally feasible
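As a rough illustration of the implication form used in the prior work, K can be encoded as a list of m pairs, each read as u[S] = x implies v[S] = y, and turned into a predicate for the counting sketch above; the encoding and the pr and d_star names from the earlier sketches are assumptions, not the paper’s notation.

```python
# Illustrative sketch: K as a conjunction of m implications ((u, x), (v, y)).
def implication_conjunction(implications):
    """Build a predicate over a possible world from m implications."""
    def K(world):
        return all(
            not (world[u] == x) or (world[v] == y)   # u[S] = x implies v[S] = y
            for (u, x), (v, y) in implications
        )
    return K

# Example: "if Ed has Flu then Tom has AIDS" and "if Gary has Cancer then Frank has Cancer"
K = implication_conjunction([(("Ed", "Flu"), ("Tom", "AIDS")),
                             (("Gary", "Cancer"), ("Frank", "Cancer"))])
print(pr("Tom", "AIDS", d_star, K))   # 2/7, about 0.286
```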

  11. Outline
      • Theoretical framework
      • Three-dimensional knowledge expression
        – Tradeoff between expressiveness and feasibility
      • Privacy Skyline
      • Efficient and scalable algorithms
      • Experimental results
      • Conclusion and future work
