

SLIDE 1

On the Theory and Practice of Privacy-Preserving Bayesian Data Analysis

James Foulds,* Joseph Geumlek,* Max Welling,+ Kamalika Chaudhuri*

+University of Amsterdam
*University of California, San Diego

SLIDE 2

Overview

  • Bayesian data analysis
  • Privacy-preserving data analysis
  • Privacy-preserving Bayesian data analysis “for free” via posterior sampling (Dimitrakakis et al., 2014; Wang et al., 2015)
  • Limitations: data inefficiency, approximate inference
  • We consider a very simple alternative technique to resolve this

SLIDE 6

Privacy and Machine Learning

  • As individuals and consumers, we benefit from ML systems trained on OUR data

– Internet search
– Recommendations: products, movies, music, news, restaurants, email recipients
– Mobile phones: autocorrect, speech recognition, Siri, …


SLIDE 10

The cost is our privacy

http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/#b228dae34c62 (retrieved 6/16/2016)

SLIDE 11

Privacy and Machine Learning

  • Want the benefits of sharing our data while protecting our privacy

– Have your cake and eat it too!

SLIDE 14

“We believe you should have great features and great privacy. You demand it and we're dedicated to providing it.”

  • Craig Federighi, Apple senior vice president of Software Engineering, June 13 2016, WWDC16

Quote from http://appleinsider.com/articles/16/06/15/inside-ios-10-apple-doubles-down-on-security-with-cutting-edge-differential-privacy (retrieved 6/16/2016)

SLIDE 15

Statistical analysis of sensitive data

[The Wikileaks disclosure] “puts the lives of United States and its partners’ service members and civilians at risk.”

  • Hillary Clinton
SLIDE 16

Bayesian analysis of sensitive data

  • Bayesian inference is widely and successfully used in application domains where privacy is invaluable

– Text analysis (Blei et al., 2003; Goldwater and Griffiths, 2007)
– Personalized recommender systems (Salakhutdinov and Mnih, 2008)
– Medical informatics (Husmeier et al., 2006)
– MOOCs (Piech et al., 2013)

  • Data scientists must balance benefits and potential insights vs privacy concerns (Daries et al., 2014).

SLIDE 17

Anonymization?

Anonymized Netflix data + public IMDB data = identified Netflix data (Narayanan and Shmatikov, 2008)

[Figure: anonymized Netflix records for Alice, Bob, Claire, … linked back to identified profiles]

SLIDE 20

Aggregation?

https://www.buzzfeed.com/nathanwpyle/can-you-spot-all-26-letters-in-this-messy-room-369?utm_term=.gyRdVVvV5#.kkovLL1LE (retrieved 6/16/2016)

SLIDE 21

Hiding in the crowd

  • Only release statistics aggregated over many individuals. Does this ensure privacy?
  • Report average salary in CS dept.
  • Prof. X leaves.
  • Report avg salary again.

– We can identify Prof. X’s salary
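The differencing attack sketched above takes only a few lines; the salary figures here are hypothetical illustration values, not from the slides:

```python
# Differencing attack: two exact aggregate releases reveal an individual's value.
salaries = {"Prof. X": 250_000, "Prof. Y": 150_000, "Prof. Z": 200_000}

n_before = len(salaries)
avg_before = sum(salaries.values()) / n_before

departed = salaries.pop("Prof. X")  # Prof. X leaves the department
avg_after = sum(salaries.values()) / len(salaries)

# Anyone who saw both exact averages can reconstruct the departed salary:
recovered = avg_before * n_before - avg_after * (n_before - 1)
print(recovered)  # 250000.0
```

Exact aggregates compose badly: each new release leaks a little more, which is precisely what the noise-based mechanisms on the following slides are designed to prevent.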

SLIDE 25

Noise / data corruption

  • Release Prof. X’s salary + noise
  • Once we sufficiently obfuscate Prof. X’s salary, it is no longer useful

SLIDE 26

Noise + crowd

  • Release mean salary + noise
  • Need much less noise to protect Prof. X’s salary

SLIDE 27

Solution

  • “Noise + crowds” can provide both individual-level privacy and accurate population-level queries
  • How to quantify privacy loss?

– Answer: differential privacy

SLIDE 28

Differential privacy (Dwork et al., 2006)

  • DP is a promise:

– “If you add your data to the database, you will not be affected much”

[Figure: individuals’ data behind a privacy-preserving interface of randomized algorithms; untrusted users submit queries and receive answers]

SLIDE 29

Differential privacy (Dwork et al., 2006)

  • Consider a randomized algorithm
  • DP guarantees that the likely output of the algorithm is not greatly affected by any one data point
  • In particular, the distribution over the outputs of the algorithm will not change too much

[Figure: running the randomized algorithm on individuals’ data, with and without one added data point, yields similar output distributions]

SLIDE 36

Differential privacy (Dwork et al., 2006)

Ratios of probabilities bounded by e^ε: a randomized algorithm M is ε-differentially private if, for all datasets D, D′ differing in one entry and all output sets S,

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]

SLIDE 37

Properties of differential privacy

  • Immune to post-processing

– Resists attacks using side information, as in the Netflix Prize linkage attack

  • Composition

– If you run multiple DP queries, their epsilons add up.
– Can think of this as a “privacy budget” we spend over all queries

SLIDE 39

Laplace mechanism (Dwork et al., 2006)

  • Adding Laplace noise is sufficient to achieve differential privacy
  • The Laplace distribution is two exponential distributions, back-to-back
  • The noise level depends on a quantity called the L1 sensitivity of the query h:

Δh = max over neighboring datasets D, D′ of ‖h(D) − h(D′)‖₁; releasing h(D) + Lap(Δh / ε) noise per coordinate is ε-differentially private
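A minimal sketch of the Laplace mechanism in Python; the counting query and the ε value are illustrative assumptions, not from the slides:

```python
import numpy as np

def laplace_mechanism(value, l1_sensitivity, epsilon, rng):
    """Release value + Lap(l1_sensitivity / epsilon) noise, the standard
    calibration for epsilon-differential privacy."""
    scale = l1_sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)

# A counting query ("how many records satisfy P?") has L1 sensitivity 1:
# adding or removing one person changes the count by at most 1.
true_count = 42
noisy_count = laplace_mechanism(true_count, l1_sensitivity=1.0, epsilon=0.5, rng=rng)
```

Note that the noise scale depends only on the sensitivity and ε, not on the dataset size, which is why aggregates over large crowds survive the noise so well.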

SLIDE 40

Exponential mechanism (McSherry and Talwar, 2007)

  • Aims to output responses of high utility
  • Given a real-valued utility function u(r, D), the exponential mechanism selects outputs r via

P(r) ∝ exp(ε · u(r, D) / (2Δu))

– The temperature depends on the sensitivity Δu and epsilon
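For a finite response set, the sampling rule above is just a softmax over utilities; in this sketch the utility values, sensitivity, and ε are invented for illustration:

```python
import numpy as np

def exponential_mechanism(utilities, sensitivity, epsilon, rng):
    """Sample index r with probability proportional to
    exp(epsilon * u_r / (2 * sensitivity))."""
    scores = epsilon * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    scores -= scores.max()        # subtract max for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
# Hypothetical utilities for four candidate responses; sensitivity 1.
choice = exponential_mechanism([0.0, 1.0, 5.0, 4.5], sensitivity=1.0, epsilon=2.0, rng=rng)
```

Higher ε sharpens the distribution toward the highest-utility response; ε → 0 flattens it toward uniform, which is the temperature trade-off the slide refers to.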

SLIDE 41

Privacy-preserving Bayesian inference via the exponential mechanism (OPS) (Dimitrakakis et al., 2014; Wang et al., 2015)

  • Privacy cost of drawing a sample from the posterior

– Interpret as the exponential mechanism with the log joint probability as the utility function
– Setting the temperature to 1 gives the privacy we get “for free” from posterior sampling
– For smaller ε, flatten the posterior by increasing the temperature
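For the beta-Bernoulli model used as the running example later in the talk, OPS amounts to sampling from a flattened posterior, which conveniently stays in the Beta family. A sketch; the prior and temperature values are assumptions:

```python
import numpy as np

def ops_sample_beta_bernoulli(data, alpha, beta, temperature, rng):
    """One-posterior-sample (OPS) release of a Bernoulli parameter.

    The posterior Beta(alpha + k, beta + n - k) is raised to the power
    1/temperature; for a Beta density this gives another Beta:
    Beta((a-1)/T + 1, (b-1)/T + 1). Temperature = 1 recovers ordinary
    posterior sampling (the "for free" privacy case)."""
    n, k = len(data), int(np.sum(data))
    a_post, b_post = alpha + k, beta + n - k
    a_flat = (a_post - 1.0) / temperature + 1.0
    b_flat = (b_post - 1.0) / temperature + 1.0
    return rng.beta(a_flat, b_flat)

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=100)   # synthetic stand-in for sensitive bits
theta = ops_sample_beta_bernoulli(data, alpha=1.0, beta=1.0, temperature=2.0, rng=rng)
```

Raising the temperature widens the flattened Beta, trading statistical efficiency for a smaller privacy cost per sample.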

SLIDE 43

Privacy for exponential families

  • Consider an exponential family likelihood with conjugate prior:

p(x | θ) = h(x) exp(θᵀT(x) − A(θ)), with prior p(θ) ∝ exp(θᵀν₀ − n₀A(θ))

  • The posterior is

p(θ | x₁, …, xₙ) ∝ exp(θᵀ(ν₀ + Σᵢ T(xᵢ)) − (n₀ + n)A(θ))

SLIDE 45

Privacy for exponential families: Exponential mechanism

  • Sample from the temperature-adjusted posterior

SLIDE 46

Privacy for exponential families via the Laplace mechanism

  • The posterior interacts with the data only via the aggregate sufficient statistics Σᵢ T(xᵢ)
  • Add Laplace noise to the aggregate sufficient statistics
  • Releases a privatized posterior, not just a sample!
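In the beta-Bernoulli case the aggregate sufficient statistic is the count of 1s, so one Laplace draw privatizes the entire posterior. A sketch of this idea; the ε and prior values are assumptions:

```python
import numpy as np

def private_beta_posterior(data, alpha, beta, epsilon, rng):
    """Privatize the sufficient statistic of a Bernoulli model with the
    Laplace mechanism, then form the posterior from the noisy statistic.

    The count k = sum(x_i) has L1 sensitivity 1 (one person changes it by
    at most 1), so Lap(1/epsilon) noise suffices. The output is the whole
    Beta posterior, not just one sample."""
    n = len(data)
    k = float(np.sum(data))
    k_noisy = k + rng.laplace(scale=1.0 / epsilon)
    k_noisy = min(max(k_noisy, 0.0), float(n))  # project back to [0, n]
    return alpha + k_noisy, beta + n - k_noisy   # privatized Beta parameters

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=1000)           # synthetic sensitive bits
a_post, b_post = private_beta_posterior(data, alpha=1.0, beta=1.0, epsilon=1.0, rng=rng)
posterior_mean = a_post / (a_post + b_post)
```

Because the privatized parameters can be reused arbitrarily often (post-processing is free), any downstream quantity of the posterior, mean, credible intervals, or samples, comes at no further privacy cost.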

SLIDE 47

Summary

  • Worst case over parameters as well as data
  • Example: beta-Bernoulli model

SLIDE 48

Data (in)efficiency in the beta-Bernoulli model

SLIDE 49

Asymptotic relative efficiency

  • ARE = ratio between the variance of an estimator and the optimal variance achieved by the posterior mean, in the limit
  • Exponential mechanism: ARE = 1 + T, for temperature T ≥ 1 (Wang et al., 2015)
  • Our results, under general conditions:

– Laplace mechanism (one sample): ARE = 2
– Laplace mechanism (posterior mean): ARE = 1

SLIDE 50

Assumptions for ARE result

  • Laplace regularity conditions, and the posterior satisfies asymptotic normality as in the Bernstein–von Mises theorem

SLIDE 51

Privacy of approximate sampling

  • Posterior sampling is in general intractable

– The exponential mechanism typically must be approximated.

  • If the approximate sampler is “close” to the true posterior, the privacy cost will be close to that of a true posterior sample (Wang et al., 2015). However, we cannot typically verify MCMC convergence
  • Wang et al. also proposed an approximate sampling scheme via stochastic gradient Langevin dynamics.

SLIDE 52

Privacy of Gibbs sampling: Exponential mechanism

  • We can interpret Gibbs updates as an instance of the exponential mechanism
  • A Gibbs update is therefore itself a differentially private release
  • Since the worst case is computed over a strictly smaller set of outcomes, the privacy cost of a Gibbs update is no more than that of a full posterior sample

SLIDE 55

Privacy of Gibbs sampling: Laplace mechanism

  • If the Gibbs update interacts with the data via an exponential family likelihood, we only need to privatize the sufficient statistics
  • Can do this once at the beginning of the algorithm, and run as many iterations as we’d like!
  • Unlike the exponential mechanism, the sampler does not need to converge to get verifiable privacy guarantees
  • For this to work well, we need the aggregate sufficient statistics to be large relative to the Laplace noise, e.g. multiple observations per latent variable

SLIDE 56

Case study: Wikileaks war logs

  • We investigate the performance of our technique on sensitive military data:

– US military war logs from the wars in Iraq and Afghanistan, disclosed by the Wikileaks organization

  • January 2004 – December 2009
  • Afghanistan: 75,000 log entries
  • Iraq: 390,000 log entries

SLIDE 57

Wikileaks features

  • Coarse-grained label “Type”:

– friendly action, explosive hazard, …

  • Fine-grained label “Category”:

– mine found/cleared, show of force, …

  • Casualties for different factions:

– Friendly/HostNation, Civilian, Enemy (names relative to the US military perspective); binarized to 1 iff > 0 killed/wounded/captured/detained

SLIDE 58

Hidden Markov model for Wikileaks

  • An HMM chain of latent states for each region, with a timestep per month

– Multiple emissions per timestep (all logs in that month)

  • Naïve Bayes multinomial emissions
  • 2 states for Iraq, 3 states for Afghanistan
  • MCMC with a partially collapsed Gibbs sampler
  • Total privacy budget ε = 5 for visualization results; varied from 10⁻¹ to 10 for held-out log-likelihood experiments (10% of timestep/region pairs held out, 10 train/test splits)

SLIDE 59

Held-out log-likelihood: Naïve Bayes (Afghanistan)

SLIDE 60

Held-out log-likelihood: Afghanistan

SLIDE 61

Held-out log-likelihood: Iraq

SLIDE 62

Visualization: Iraq, Laplace Mechanism
State 1: US military “doing well”

[Figure panels: Type, Category, Casualties]

SLIDE 66

Visualization: Iraq, Laplace Mechanism
State 2: US military “doing not so well”

[Figure panels: Type, Category, Casualties]

SLIDE 70

Visualization: Iraq, Laplace Mechanism

SLIDE 71

Visualization: Afghanistan, Exponential Mechanism

[Figure: results from the last 100 samples vs. the last sample only]

SLIDE 72

Conclusions

  • We have proposed a Laplace mechanism approach for privacy-preserving Bayesian inference, as an alternative to the exponential mechanism (OPS) approach
  • An asymptotic relative efficiency theorem shows data efficiency advantages vs the exponential mechanism
  • Privacy-preserving Gibbs sampling via the exponential and Laplace mechanisms
  • We demonstrated the benefits of our approach in a case study: an HMM time-series analysis of sensitive military records disclosed by Wikileaks

SLIDE 73

Future work

  • Other approximate inference algorithms

– In the appendix, we analyze the privacy of Metropolis–Hastings and annealed importance sampling.
– Open problem: make better use of the privacy budget to make these practical
– New preprint on privacy-preserving EM: M. Park, J. R. Foulds, K. Chaudhuri, M. Welling. Practical Privacy for Expectation Maximization. arXiv preprint arXiv:1605.06995 [cs.LG]

  • Practical applications to other sensitive real-world datasets: MOOCs, email data, genetic data, …
  • We have argued that asymptotic efficiency is important in a privacy context.

– Open problem: how large is the class of privacy-preserving algorithms that are asymptotically efficient?

SLIDE 74

Acknowledgements

  • Collaborators: Joseph Geumlek, Max Welling, Kamalika Chaudhuri

SLIDE 75

Thanks for your attention!