Enhancing Privacy in Machine Learning, Mathias Humbert, INSA (PowerPoint PPT presentation)



SLIDE 1

Enhancing Privacy in Machine Learning

Mathias Humbert

INSA Toulouse/CNRS Toulouse, January 22, 2019

SLIDE 2

Mathias Humbert - Enhancing Privacy in Machine Learning

Enhancing Privacy in Machine Learning

[Diagram: data feeding into ML. What threat? What data? What ML?]

SLIDE 3

Different Attacks: Linkability

Ability to link at least two records concerning the same individual

[Diagram: records of Robert, Alice, Marius, and Eve linked across two datasets]

If one dataset is not anonymized → re-identification

SLIDE 4

Different Attacks: Membership Inference


Ability to infer that a certain target is in a specific dataset

Example: a study focusing on HIV patients

SLIDE 5

Trading Off Privacy

[Diagram: ML at the center of a Privacy / Efficiency / Utility trade-off. What defense? What threat? What data? What ML?]

SLIDE 6

Different Defense Mechanisms

  • Anonymization
  • Randomization
  • Differential privacy
  • Cryptography


SLIDE 7

Outline of the Talk

  • Attack - defense - data
  • Temporal linkability - randomization - microRNA expression
  • USENIX Security’16
  • Re-identification - cryptography - DNA methylation
  • IEEE S&P’17
  • Membership inference - other defense - any data
  • NDSS'19

[Annotation: data in [0,1]^m reduced to ℝ^r, with r ≈ 10³ and m ≈ 10⁷]

SLIDE 8

Outline of the Talk

  • Attack - defense - data
  • Temporal linkability - randomization - microRNA expression
  • USENIX Security’16
  • Re-identification - cryptography - DNA methylation
  • IEEE S&P’17
  • Membership inference - other defense - any data
  • NDSS'19
SLIDE 9

DNA versus MicroRNA

DNA:
  • contains the blueprint of what a cell potentially can do,
  • is (mostly) fixed over time,
  • can hint at risks of getting a disease.

miRNA:
  • regulates what a cell really does,
  • expression changes over time,
  • can tell whether you carry a disease.

Common belief: no privacy threats from miRNAs, because of temporal variability

SLIDE 10

Temporal Linkability Attack

  • Matching two datasets
  • E.g., a leaked database (incl. name) and public DB (excl. name)
  • Which sample from t1 corresponds to which sample from t2?


SLIDE 11

Data Pre-processing

  • High dimensionality: 1,189 miRNAs per sample
  • Possibly correlated and uninteresting components
  • PCA + whitening provides
    • Unit variance
    • Smaller dimensionality
    • Uncorrelated components
  • Condenses the data into a smaller set of dimensions with minimal information loss

PCA maps each raw profile r^{tj}_k to a whitened, lower-dimensional profile r̄^{tj}_k.
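The PCA-plus-whitening step above can be sketched in NumPy. This is a hedged illustration on synthetic data, not the talk's actual pipeline; the function name and dimensions are illustrative:

```python
import numpy as np

def pca_whiten(X, n_components):
    """Project X (samples x features) onto the top principal components
    and rescale each component to unit variance (whitening)."""
    Xc = X - X.mean(axis=0)                      # center the data
    # SVD of the centered data: Xc = U S Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # PCA scores are U_k S_k; dividing by S_k and rescaling by sqrt(n-1)
    # leaves each retained component with unit sample variance
    return U[:, :n_components] * np.sqrt(X.shape[0] - 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(29, 1189))                  # 29 samples, 1,189 miRNAs
Z = pca_whiten(X, n_components=10)
print(Z.shape)                                   # (29, 10)
print(Z.var(axis=0, ddof=1).round(6))            # each component has unit variance
```

The retained components are also mutually uncorrelated, matching the three properties listed on the slide.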

SLIDE 12

Linkability Attack

Given the whitened profiles {r̄^{t1}_i}ⁿᵢ₌₁ and {r̄^{t2}_i}ⁿᵢ₌₁, the attacker compares every pair via the squared distance ‖ r̄^{t2}_i − r̄^{t1}_k ‖² and searches for the permutation

σ* = arg min_σ Σⁿᵢ₌₁ ‖ r̄^{t2}_{σ(i)} − r̄^{t1}_i ‖²

Which sample from t1 corresponds to which sample from t2?

SLIDE 13

Linkability Attack

σ* = arg min_σ Σⁿᵢ₌₁ ‖ r̄^{t2}_{σ(i)} − r̄^{t1}_i ‖²

Which sample from t1 corresponds to which sample from t2? Time complexity: O(n³)
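The arg-min over permutations is a minimum-weight bipartite matching, which is what makes the O(n³) complexity achievable. A sketch with `scipy.optimize.linear_sum_assignment` on synthetic whitened profiles (the data and noise level are made up, not the study's):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
n, d = 26, 10
R_t1 = rng.normal(size=(n, d))                   # whitened profiles at t1
true_perm = rng.permutation(n)
# Profiles at t2: same individuals, shuffled, plus small temporal noise
R_t2 = R_t1[true_perm] + 0.1 * rng.normal(size=(n, d))

# Cost matrix: squared Euclidean distance between every (t1, t2) pair
cost = ((R_t1[:, None, :] - R_t2[None, :, :]) ** 2).sum(axis=-1)
rows, cols = linear_sum_assignment(cost)         # O(n^3) optimal assignment
# cols[i] is the t2 sample matched to t1 sample i; the match is correct
# when the matched t2 sample really belongs to individual i
success_rate = np.mean(true_perm[cols] == np.arange(n))
print(success_rate)
```

With small temporal noise relative to the whitened scale, the matching recovers the true permutation; larger noise (as in real plasma samples) degrades it.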
SLIDE 14

Athletes Dataset

Participants: 29
Points in time: 2 (before and after exercising)
Time period: 1 week
Disease: none
1,189 miRNAs per sample, taken from blood and plasma
SLIDE 15

Lung Cancer Dataset

[Timeline: samples before surgery and at 3, 6, 9, 12, 15, and 18 months after surgery]

Participants: 26 (huge for a longitudinal study!)
Points in time: 8
Time period: 18 months
Disease: lung cancer
1,189 miRNAs per sample, taken from plasma
SLIDE 16

Linkability Attack – Results

[Plots: attack success vs. number of PCA dimensions; annotated success rates: 90%, 48%, 55%, 29%]

Success up to 90% for blood-based samples

SLIDE 17

Linkability Attack – Results

How does the success change with larger datasets? Success decreases sharply for plasma-based samples, but decreases linearly for blood-based samples.

SLIDE 18

Outline of the Talk

  • Attack - defense - data
  • Temporal linkability - randomization - microRNA expression
  • USENIX Security’16
  • Re-identification - cryptography - DNA methylation
  • IEEE S&P’17
  • Membership inference - other defense - any data
  • NDSS'19
SLIDE 19

Defense Mechanisms

  • Hiding non-relevant miRNA expressions
    • Sometimes, randomization is not an option, e.g., for making a diagnosis in a hospital
    • Caution: correlations between miRNAs
  • Randomizing the miRNA expression profiles
    • E.g., for publishing a dataset used in a study
    • Adding noise in a fully distributed, differentially-private manner → providing epigeno-indistinguishability (inspired by [1])
    • Noise drawn according to a multivariate Laplacian mechanism

[1] Chatzikokolakis et al. Broadening the scope of differential privacy using metrics, PETS, 2013
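To make the randomization idea concrete, here is a deliberately simplified, per-coordinate Laplace mechanism in NumPy. Note this is an assumption-laden sketch: the talk's actual mechanism is a multivariate Laplacian that accounts for correlations between miRNAs, which this toy version ignores, and the sensitivity value is illustrative:

```python
import numpy as np

def sanitize(profile, epsilon, sensitivity=1.0, rng=None):
    """Simplified per-coordinate Laplace mechanism: adds independent
    noise with scale sensitivity/epsilon to each expression value.
    (The talk's defense uses a *multivariate* Laplacian that models
    correlations between miRNAs; this sketch does not.)"""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon          # smaller epsilon -> more noise
    return profile + rng.laplace(loc=0.0, scale=scale, size=profile.shape)

rng = np.random.default_rng(2)
profile = rng.uniform(size=1189)           # one expression profile in [0,1]^m
noisy = sanitize(profile, epsilon=0.025, rng=rng)
print(noisy.shape)
```

The epsilon values used here (0.025, 0.01) mirror the settings discussed in the results slides: lower epsilon gives stronger privacy at some utility cost.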


SLIDE 20

Privacy-Utility Trade-Off

Privacy: prevent linkability of samples
Utility: preserve accuracy of classification as diseased / healthy, usually using a radial SVM classifier

[Plot: radial SVM decision boundary over miRNA1 and miRNA2, separating diseased from healthy samples]
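The utility measure above, a radial (RBF-kernel) SVM classifying diseased vs. healthy, can be sketched with scikit-learn. The data here is a synthetic stand-in for miRNA features, not the study's datasets:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 400
# Synthetic stand-in for miRNA features: two classes with shifted means
X = rng.normal(size=(n, 20))
y = np.arange(n) % 2
X[y == 1] += 1.5                           # "diseased" class shifted

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # radial SVM, as in the talk
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(round(acc, 3))
```

Re-running this after hiding features or adding Laplace noise to X gives a direct, if toy, measurement of the privacy-utility trade-off the next slides quantify.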


SLIDE 22

Privacy-Utility Trade-Off


Another dataset for exploring utility: 1000+ participants, 19 diseases, 1 time point

SLIDE 23

Hiding miRNAs – Results

[Plot annotations: attack success below 80% when fewer than 100 miRNAs are shared]

SLIDE 24

Hiding miRNAs – Results

[Plot annotation: accuracy 99.2%]

SLIDE 25

Hiding miRNAs – Results

[Plot: attacker's success rate]

SLIDE 26

Hiding miRNAs – Results

[Plot annotation: 99.2%]

SLIDE 27

Hiding miRNAs – Results

Trade-off at 7 miRNAs:
  • Attack success decreased by 54% (relative to using all miRNAs)
  • SVM accuracy decreased by only 1% (relative to the maximum of 99.2%)

SLIDE 28

Hiding miRNAs – Results

[Plot annotation: accuracy 92.7%]

SLIDE 29

Hiding miRNAs – Results

Trade-off at 4 miRNAs:
  • Attack success decreases by 80% (relative to using all miRNAs)
  • Accuracy decreases by only 1% (relative to the maximum of 92.7%)

SLIDE 30

Probabilistic Sanitization – Results

[Plot annotation: accuracy 99.2%]

SLIDE 31

Probabilistic Sanitization – Results

[Plot annotations: accuracy 99.2%]

SLIDE 32

Probabilistic Sanitization – Results

Suitable balance at ε = 0.025:
  • Attack success decreased by 63% (relative to all)
  • SVM accuracy decreased by only 0.65% (relative to the maximum of 99.2%)

SLIDE 33

Probabilistic Sanitization – Results

[Plot annotation: accuracy 96.9%]

SLIDE 34

Probabilistic Sanitization – Results

Trade-off at ε = 0.01:
  • Attack success decreases by 70% (relative to all)
  • Accuracy decreases by only 0.2% (relative to the maximum of 96.9%)

SLIDE 35

Outline of the Talk

  • Attack - defense - data type
  • Temporal linkability - randomization - microRNA expression
  • USENIX Security’16
  • Re-identification - cryptography - DNA methylation
  • IEEE S&P’17
  • Membership inference - other defense - any data
  • NDSS'19
SLIDE 36

DNA Methylation Data and Privacy

  • DNA methylation
    • Very well understood epigenetic mechanism
    • Associated with human health status
      • Hyper-/hypomethylation associated with cancer
      • Smoking mother → child with asthma
    • Sensitive data → privacy must be protected
  • Current privacy practice
    • Public release on databases such as the Gene Expression Omnibus
    • Privacy precautions:
      • Anonymized samples (removal of personal identifiers)
      • Corresponding genomic data not accessible, since the genome can be re-identified using various side channels [2,3]

[2] Gymrek et al., Identifying personal genomes by surname inference, Science, 2013
[3] Humbert et al., De-anonymizing genomic databases using phenotypic traits, PoPETS, 2015

SLIDE 37

Re-identifying DNA Methylation Profiles

  • Experimental results
    • Focusing on 293 methylation regions highly correlated with genotype
    • Between 97.5% and 100% matching accuracy for a genotype database of size greater than 2,500
    • Wrongly matched pairs always rejected by our statistical test

Pr(Gⁱⱼ = gⁱⱼ | Mⁱⱼ) = p(Mⁱⱼ | Gⁱⱼ = gⁱⱼ) Pr(Gⁱⱼ = gⁱⱼ) / Σ_{gⁱⱼ} p(Mⁱⱼ | Gⁱⱼ = gⁱⱼ) Pr(Gⁱⱼ = gⁱⱼ)    (1)
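Equation (1) is a direct Bayes update from an observed methylation value to a posterior over genotypes. A numeric sketch in NumPy; the prior and likelihood values are purely illustrative, not from the paper:

```python
import numpy as np

def genotype_posterior(likelihood, prior):
    """Pr(G = g | M) for each genotype value g, per Equation (1):
    likelihood[g] = p(M | G = g), prior[g] = Pr(G = g)."""
    joint = likelihood * prior                 # numerator of (1), per g
    return joint / joint.sum()                 # normalize over all g

prior = np.array([0.49, 0.42, 0.09])           # Pr(G = 0, 1, 2), illustrative
likelihood = np.array([0.05, 0.30, 0.90])      # p(M | G = g), illustrative
post = genotype_posterior(likelihood, prior)
print(post.round(3))
```

Repeating this per methylation region and multiplying (or summing log-posteriors) across the 293 regions yields the matching score used for re-identification.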

SLIDE 38

Outline of the Talk

  • Attack - defense - data type
  • Temporal linkability - randomization - microRNA expression
  • USENIX Security’16
  • Re-identification - cryptography - DNA methylation
  • IEEE S&P’17
  • Membership inference - other defense - any data
  • NDSS'19
SLIDE 39

Defense Mechanism

  • Private classification of brain tumors [4]
  • Random forest classifier
  • Cryptographic mechanism: homomorphic encryption
  • Secure under the honest-but-curious adversary model
  • The machine-learning provider does not learn the patient’s data
  • The data owner (patient) does not learn the machine-learning model
  • Typical use case: clinical setting, or diagnosis by third-party provider
  • Implementation in C++
  • Classification based on 900 methylation regions
  • 9 tumor subtypes
  • Original random forest model: 1000 trees

[4] Danielsson et al. MethPed: A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes, Clinical Epigenetics, 2015


SLIDE 40

Performance Evaluation

[Plot: accuracy and total time (all votes + plurality) vs. number of trees]
  • Very good accuracy already with < 100 trees
  • 100 trees take less than 2 hours; 1,000 trees take less than 12 hours

SLIDE 41

Outline of the Talk

  • Attack - defense - data type
  • Temporal linkability - randomization - microRNA expression
  • USENIX Security’16
  • Re-identification - cryptography - DNA methylation
  • IEEE S&P’17
  • Membership inference - other defense - any data
  • NDSS'19
SLIDE 42

Membership Inference Against ML Models

[Diagram: a miRNA database feeds an MLaaS system, which returns an output distribution over classes (cat, dog, panda)]

Objective: determine whether a victim v is part of the training dataset, using the output distribution of the MLaaS system and having access to a data sample of v.

SLIDE 43

State-of-the-Art Attack (Shokri et al.)

[Diagram: the target model is trained on the target dataset; multiple shadow models are trained on a local dataset drawn from the same distribution, with ground-truth membership labels; multiple attack models are then trained on the shadow models' outputs to decide "in or not in"]

Shokri et al., Membership Inference Attacks against Machine Learning Models. IEEE S&P, 2017

SLIDE 44

More Realistic Attacks

  • Three new adversary models:
  • 1. A single shadow model, different structure than the target model
  • Less costly attack
SLIDE 45

Performance of the First New Adversary

[Bar charts: precision and recall of Shokri et al. vs. our approach on Adult, CIFAR-10, CIFAR-100, Face, Location, MNIST, News, and Purchase-{2,10,20,50,100}; annotated values: 95/95, 94/95, 88/89, 83/85]

SLIDE 46

More Realistic Attacks

  • Three new adversary models:
  • 1. A single shadow model, different structure than the target model
  • Less costly attack
  • Experimental results similar to Shokri et al.
  • 2. Same as 1. + different distribution than the original training set
  • "Data transferring attack": the shadow model is trained on a different dataset than the target's training set
SLIDE 47

Performance of the Second New Adversary

[Heatmaps: precision and recall of the data transferring attack for every (shadow dataset, target dataset) pair over Adult, CIFAR-10, CIFAR-100, Face, Location, MNIST, News, and Purchase-{2,10,20,50,100}; best values around 95/95 and 89/89]

SLIDE 48

More Realistic Attacks

  • Three new adversary models:
  • 1. A single shadow model, different structure than the target model
  • Less costly attack
  • Experimental results similar to Shokri et al.
  • 2. Same as 1. + different distribution than the original training set
  • "Data transferring attack": the shadow model is trained on a different dataset than the target's training set
  • Results decreasing by a few % only
  • 3. No shadow model, different distribution than the training set
  • No training phase
  • Attack based on the output distribution only
  • Statistics such as max or the entropy can be sufficient
  • Good results for about half of the tested datasets
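The third adversary above thresholds simple statistics of the target model's output distribution, such as the maximum posterior or the entropy. A hedged sketch on synthetic posteriors (the 0.9 threshold and the example distributions are illustrative, not the paper's):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of each output distribution (rows of p)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def predict_member(posteriors, max_threshold=0.9):
    """Shadow-model-free attack sketch: flag a query as a training
    member when the top output probability is very high (overfitted
    models are more confident on their training samples).
    The threshold is illustrative, not taken from the paper."""
    return posteriors.max(axis=-1) >= max_threshold

# Synthetic outputs: confident posteriors for members, flatter for non-members
members = np.array([[0.97, 0.02, 0.01], [0.95, 0.04, 0.01]])
non_members = np.array([[0.40, 0.35, 0.25], [0.55, 0.30, 0.15]])
print(predict_member(members))                   # members flagged
print(predict_member(non_members))               # non-members not flagged
print(entropy(members) < entropy(non_members))   # members have lower entropy
```

No shadow model and no training phase are needed; the attacker only needs the output distribution for the victim's sample.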
SLIDE 49

Outline of the Talk

  • Attack - defense - data type
  • Temporal linkability - randomization - microRNA expression
  • USENIX Security’16
  • Re-identification - cryptography - DNA methylation
  • IEEE S&P’17
  • Membership inference - other defense - any data
  • NDSS'19
SLIDE 50

Defense Mechanisms

  • Main reason for the success of the attack
    • Overfitting of training samples
  • Two defenses
    • Dropout
    • Model stacking

[Bar charts: precision, recall, and accuracy of the inference attack on the ML classifier, for the original model vs. dropout vs. model stacking, across Adult, CIFAR-10, CIFAR-100, Face, Location, MNIST, News, and Purchase-{2,10,20,50,100}]
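Dropout counters overfitting by randomly zeroing hidden activations during training. A minimal, framework-agnostic NumPy illustration of an inverted-dropout mask (a sketch of the mechanism, not the paper's training code):

```python
import numpy as np

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale survivors by 1/(1 - p_drop), so the expected activation
    is unchanged. Applied during training only; disabled at test time."""
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(4)
h = np.ones((1000, 128))                 # a batch of hidden activations
h_drop = dropout(h, p_drop=0.5, rng=rng)

dropped_frac = (h_drop == 0).mean()
print(round(dropped_frac, 3))            # close to 0.5
print(round(h_drop.mean(), 3))           # expectation preserved, close to 1.0
```

Because each training pass sees a different random sub-network, the model memorizes individual training samples less, which is exactly what the membership inference attack exploits.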

SLIDE 51

Conclusion

[Diagram: ML balancing Privacy, Efficiency, and Utility. What ML?]

  • Randomization/DP: applied at training & test time
  • Cryptography: applied at test time
  • Anti-overfitting: attack at test time; defense at training time (by modifying the model, not the data)

contact: mathias.humbert@ar.admin.ch