

1. Enhancing Privacy in Machine Learning. Mathias Humbert, INSA Toulouse/CNRS. Toulouse, January 22, 2019.

2. Enhancing Privacy in Machine Learning. [Figure: data feeding into an ML model.] What ML? What data? What threat?

3. Different Attacks: Linkability. Ability to link at least two records concerning the same individual. If one of the datasets is not anonymized, this leads to re-identification.

4. Different Attacks: Membership Inference. Example: a study focusing on HIV patients. Ability to infer that a certain target is in a specific dataset.

5. Trading Off Privacy. Privacy vs. utility vs. ML efficiency. What ML? What data? What threat? What defense?

6. Different Defense Mechanisms. Privacy vs. utility vs. ML efficiency. Candidate defenses: anonymization, randomization, differential privacy, cryptography.

7. Outline of the Talk (attack - defense - data):
• Temporal linkability - randomization - microRNA expression (values in ℝ^r, r ≈ 10³), USENIX Security'16
• Re-identification - cryptography - DNA methylation (values in [0,1]^m, m ≈ 10⁷), IEEE S&P'17
• Membership inference - other defense - any data, NDSS'19

8. Outline of the Talk (attack - defense - data):
• Temporal linkability - randomization - microRNA expression, USENIX Security'16
• Re-identification - cryptography - DNA methylation, IEEE S&P'17
• Membership inference - other defense - any data, NDSS'19

9. DNA versus MicroRNA.
DNA: contains the blueprint of what a cell potentially can do; is (mostly) fixed over time; can hint at the risk of getting a disease.
miRNA: regulates what a cell really does; its expression changes over time; can tell whether you carry a disease.
Common belief: no privacy threats from miRNAs, because of their temporal variability.

10. Temporal Linkability Attack. Matching two datasets, e.g., a leaked database (incl. names) and a public database (excl. names). Which sample from t1 corresponds to which sample from t2?

11. Data Pre-processing. High dimensionality: 1,189 miRNAs per sample, $r^{t_j}_k$, with possibly correlated and uninteresting components. PCA + whitening provides a smaller dimensionality, uncorrelated components, and unit variance: it condenses the data into a lower-dimensional representation $\bar{r}^{t_j}_k$ with minimal information loss (see the sketch below).
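A minimal sketch of this pre-processing step, assuming scikit-learn; the random placeholder matrix and the choice of 10 retained components are illustrative, not values from the talk:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical expression matrix: 29 samples x 1,189 miRNAs (random placeholder data)
rng = np.random.default_rng(0)
X = rng.normal(size=(29, 1189))

# PCA with whitening: fewer, uncorrelated components with unit variance
pca = PCA(n_components=10, whiten=True)
X_reduced = pca.fit_transform(X)   # shape (29, 10): the condensed profiles
```

Whitening matters here because the downstream matching relies on Euclidean distances; without unit variance, a few high-variance components would dominate the comparison.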

12. Linkability Attack. Which sample from $t_1$ corresponds to which sample from $t_2$? Given the reduced profiles $\{\bar{r}^{t_1}_i\}_{i=1}^n$ and $\{\bar{r}^{t_2}_i\}_{i=1}^n$, the attack finds the matching
$$\sigma^* = \arg\min_{\sigma} \sum_{i=1}^{n} \left\lVert \bar{r}^{t_2}_{\sigma(i)} - \bar{r}^{t_1}_{i} \right\rVert_2 .$$

13. Linkability Attack. Which sample from $t_1$ corresponds to which sample from $t_2$?
$$\sigma^* = \arg\min_{\sigma} \sum_{i=1}^{n} \left\lVert \bar{r}^{t_2}_{\sigma(i)} - \bar{r}^{t_1}_{i} \right\rVert_2$$
Time complexity: O(n³).
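The slides give only the objective and its O(n³) cost without naming a solver; the minimization above is a standard assignment problem, which the Hungarian algorithm solves exactly in O(n³). A minimal sketch assuming SciPy, with random placeholder profiles:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def link_samples(R_t1, R_t2):
    """Match each whitened profile at t1 to one profile at t2 so that the
    total Euclidean distance over the one-to-one assignment is minimal."""
    cost = cdist(R_t1, R_t2, metric="euclidean")      # n x n pairwise distances
    row_ind, col_ind = linear_sum_assignment(cost)    # optimal assignment, O(n^3)
    return col_ind                                    # col_ind[i]: t2 sample linked to t1 sample i

# Illustrative usage with random placeholder profiles
rng = np.random.default_rng(0)
R_t1, R_t2 = rng.normal(size=(29, 10)), rng.normal(size=(29, 10))
print(link_samples(R_t1, R_t2))
```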

14. Athletes Dataset. Participants: 29. Points in time: 2 (before and after exercising). Time period: 1 week. Disease: none. 1,189 miRNAs per sample, taken from blood and plasma.

15. Lung Cancer Dataset. Participants: 26 (huge for a longitudinal study!). Points in time: 8. Time period: 18 months. Disease: lung cancer. 1,189 miRNAs per sample, taken from plasma. [Timeline: one sample before surgery, then samples at 0, 3, 6, 9, 12, 15, and 18 months after surgery.]

16. Linkability Attack – Results. [Figure: attack success rate vs. number of PCA dimensions, with annotated values of 55%, 90%, 29%, and 48%.] Success reaches up to 90% for blood-based samples.

17. Linkability Attack – Results. How does the success change with larger datasets? Success decreases sharply for plasma-based samples, but decreases only linearly for blood-based samples.

18. Outline of the Talk (attack - defense - data):
• Temporal linkability - randomization - microRNA expression, USENIX Security'16
• Re-identification - cryptography - DNA methylation, IEEE S&P'17
• Membership inference - other defense - any data, NDSS'19

19. Defense Mechanisms.
• Hiding non-relevant miRNA expressions, e.g., for making a diagnosis in a hospital, where randomization is not an option. Caution: correlations between miRNAs.
• Randomizing the miRNA expression profiles, e.g., for publishing a dataset used in a study: noise is added in a fully distributed, differentially-private manner, providing epigeno-indistinguishability (inspired by [1]), with noise drawn according to a multivariate Laplacian mechanism (see the sketch after this slide).
[1] Chatzikokolakis et al., Broadening the scope of differential privacy using metrics, PETS, 2013.
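A minimal sketch of the randomization defense, under a simplifying assumption: it adds independent Laplace noise per miRNA, whereas the talk's mechanism draws multivariate Laplacian noise over the whole profile; the sensitivity value and the input profile are placeholders.

```python
import numpy as np

def sanitize_profile(profile, epsilon, sensitivity=1.0, rng=None):
    """Perturb a miRNA expression profile with Laplace noise of scale sensitivity/epsilon.

    Simplification: independent noise per miRNA; the mechanism in the talk uses a
    multivariate Laplacian to provide epigeno-indistinguishability."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=profile.shape)
    return profile + noise

# Illustrative usage; epsilon = 0.025 is one of the trade-off points reported later
noisy = sanitize_profile(np.random.default_rng(0).random(1189), epsilon=0.025)
```

Smaller epsilon means more noise, hence stronger protection against linkability but lower downstream classification accuracy, which is exactly the trade-off explored in the result slides.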

20. Privacy-Utility Trade-Off. Privacy: prevent linkability of samples. Utility: preserve the accuracy of classification as diseased/healthy, usually using a radial (RBF-kernel) SVM classifier (a sketch follows below). [Figure: diseased vs. healthy samples in a two-miRNA feature space.]
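A minimal sketch of how this utility could be measured, assuming scikit-learn; the random data and labels are placeholders standing in for (possibly sanitized) expression profiles:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data: expression profiles (possibly sanitized) and disease labels
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1189))
y = rng.integers(0, 2, size=200)        # 0 = healthy, 1 = diseased

# Radial (RBF-kernel) SVM; utility reported as cross-validated accuracy
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
print(cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean())
```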


22. Privacy-Utility Trade-Off. Privacy: prevent linkability of samples. Utility: preserve the accuracy of classification as diseased/healthy, usually using a radial SVM classifier. Another dataset for exploring utility: 1,000+ participants, 19 diseases, 1 time point.

23. Hiding miRNAs – Results. [Figure: results annotated with <80% and <100 miRNAs.]

24. Hiding miRNAs – Results. [Figure: SVM classification accuracy, annotated at 99.2%.]

25. Hiding miRNAs – Results. [Figure: attacker's success rate.]

26. Hiding miRNAs – Results. [Figure: annotated at 99.2%.]

27. Hiding miRNAs – Results. Trade-off at 7 miRNAs: attack success decreases by 54% (relative to using all miRNAs), while SVM accuracy decreases by only 1% (relative to the maximum of 99.2%).

28. Hiding miRNAs – Results. [Figure: annotated at 92.7%.]

29. Hiding miRNAs – Results. Trade-off at 4 miRNAs: attack success decreases by 80% (relative to using all miRNAs), while accuracy decreases by only 1% (relative to the maximum of 92.7%).

30. Probabilistic Sanitization – Results. [Figure: annotated at 99.2%.]

31. Probabilistic Sanitization – Results. [Figure: annotated at 99.2%.]

32. Probabilistic Sanitization – Results. Suitable balance at ε = 0.025: attack success decreases by 63% (relative to no sanitization), while SVM accuracy decreases by only 0.65% (relative to the maximum of 99.2%).

33. Probabilistic Sanitization – Results. [Figure: annotated at 96.9%.]

34. Probabilistic Sanitization – Results. Trade-off at ε = 0.01: attack success decreases by 70% (relative to no sanitization), while accuracy decreases by only 0.2% (relative to the maximum of 96.9%).

35. Outline of the Talk (attack - defense - data type):
• Temporal linkability - randomization - microRNA expression, USENIX Security'16
• Re-identification - cryptography - DNA methylation, IEEE S&P'17
• Membership inference - other defense - any data, NDSS'19
