Machine Learning and Differential Privacy
Maria-Florina Balcan
04/22/2015
Learning and Privacy
- To do machine learning, we need data.
- What if the data contains sensitive information? E.g., medical data, web search query data, salary data, student grade data.
- Even if the (person running the) learning algorithm can be trusted, perhaps the output of the algorithm reveals sensitive info.
- E.g., using search logs of friends to recommend query completions: typing "Why are _" could suggest a friend's query "Why are my feet so itchy?"
- E.g., SVM or perceptron on medical data: suppose feature j is "has-green-hair" and the learned weight vector w has w_j ≠ 0. If there is only one person in town with green hair, you know they were in the study.
- An approach to address these problems: Differential Privacy.
"The Algorithmic Foundations of Differential Privacy". Cynthia Dwork, Aaron Roth. Foundations and Trends in Theoretical Computer Science, NOW Publishers, 2014.
Differential Privacy
High level idea:
- What we want is a protocol that has a probability distribution over outputs, such that if person i changed their input from x_i to any other allowed x_i′, the relative probabilities of any output do not change by much.
- E.g., want to release an average while preserving privacy.
- This would effectively allow that person to pretend their input was any other value they wanted. Bayes rule:
  Pr[x_i | output] / Pr[x_i′ | output] = (Pr[output | x_i] / Pr[output | x_i′]) · (Pr[x_i] / Pr[x_i′])
  (Posterior ≈ Prior)
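To spell out the "Posterior ≈ Prior" step, here is a short elaboration (mine, not on the slides) using the ε-DP guarantee defined on the next slide:

```latex
% If the protocol guarantees, for every output v,
%   Pr[output = v | x_i] / Pr[output = v | x_i'] <= e^{eps},
% then Bayes' rule bounds how far the posterior can move from the prior:
\[
\frac{\Pr[x_i \mid \text{output}]}{\Pr[x_i' \mid \text{output}]}
  = \frac{\Pr[\text{output} \mid x_i]}{\Pr[\text{output} \mid x_i']}
    \cdot \frac{\Pr[x_i]}{\Pr[x_i']}
  \;\le\; e^{\varepsilon} \cdot \frac{\Pr[x_i]}{\Pr[x_i']}
  \;\approx\; (1+\varepsilon) \cdot \frac{\Pr[x_i]}{\Pr[x_i']} .
\]
```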
Differential Privacy: Definition
- It's a property of a protocol A which you run on some dataset X, producing some output A(X).
- A is ε-differentially private if for any two neighboring datasets S, S′ (differing in just one element, x_i → x_i′):
  for all outcomes v, e^(−ε) ≤ Pr(A(S)=v) / Pr(A(S′)=v) ≤ e^(ε)
  (probability over the randomness in A; for small ε, e^(−ε) ≈ 1−ε and e^(ε) ≈ 1+ε).
- View this as a model of plausible deniability: if your real input is x_i and you'd like to pretend it was x_i′, somebody looking at the output of A can't tell, since for any outcome v, it was nearly just as likely to come from S as from S′.
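As an illustration (not from the slides), here is a minimal sketch that checks the definition numerically for a discrete output space, using a one-person "randomized response" protocol where a bit is reported truthfully with probability e^ε/(1+e^ε):

```python
import math

def satisfies_eps_dp(p_S, p_Sprime, eps, slack=1e-9):
    """Check e^(-eps) <= Pr(A(S)=v)/Pr(A(S')=v) <= e^(eps) for all v.
    p_S, p_Sprime: dicts mapping outcome v -> probability."""
    lo, hi = math.exp(-eps), math.exp(eps)
    for v in set(p_S) | set(p_Sprime):
        a, b = p_S.get(v, 0.0), p_Sprime.get(v, 0.0)
        if a == 0.0 and b == 0.0:
            continue              # outcome impossible under both datasets
        if a == 0.0 or b == 0.0:
            return False          # ratio is 0 or infinite: not eps-DP
        r = a / b
        if r < lo * (1 - slack) or r > hi * (1 + slack):
            return False
    return True

# One-person example: report the true bit with prob p = e^eps/(1+e^eps).
eps = 1.0
p = math.exp(eps) / (1 + math.exp(eps))
dist_if_bit_1 = {1: p, 0: 1 - p}      # S : x_i  = 1
dist_if_bit_0 = {1: 1 - p, 0: p}      # S': x_i' = 0
print(satisfies_eps_dp(dist_if_bit_1, dist_if_bit_0, eps))   # -> True
```

Here the worst-case ratio is exactly e^ε (for v = 1), so the bound is tight: either bit value could plausibly have produced either report.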
Differential Privacy: Methods
- Can we achieve it?
- Sure, just have A(X) always output 0.
- This is perfectly private, but also completely useless.
- Can we achieve it while still providing useful information?
Laplace Mechanism
Say we have n inputs in the range [0, b]. Want to release the average while preserving privacy.
[Figure: distribution of the released value with the real input vs. with a fake input; the two Laplace curves are shifted by b/n.]
- Changing one input can affect the average by ≤ b/n.
- Idea: compute the true answer and add noise from the Laplace distribution p(y) ∝ e^(−|y|εn/b).
- Changing one input then changes the probability of any given answer by a factor of ≤ e^ε.
- The amount of noise added will be ≈ ±b/(εn).
- To get an overall error of ±δ, you need a sample size n = b/(δε).
- If you want to ask k queries, the privacy losses add, so to have ε-differential privacy overall you need n = kb/(δε).
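A minimal sketch of this mechanism in Python (function and parameter names are mine, assuming NumPy):

```python
import numpy as np

def private_average(xs, b, eps, rng=None):
    """Release the average of n values in [0, b] with eps-differential
    privacy via the Laplace mechanism sketched above.

    Sensitivity: one input changes the average by at most b/n, so
    Laplace noise with scale b/(eps*n) suffices; the typical error
    introduced is about +/- b/(eps*n)."""
    rng = rng or np.random.default_rng()
    n = len(xs)
    true_avg = sum(xs) / n
    return true_avg + rng.laplace(loc=0.0, scale=b / (eps * n))
```

E.g., for n = 10,000 salaries scaled to [0, 1] and ε = 0.1, the noise is about ±0.001, so the released average is still useful; to answer k queries with ε privacy overall, one would run this with ε/k per query, matching the n = kb/(δε) bound above.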
Laplace Mechanism
Good features:
- Can run algorithms that only need approximate statistics (since we are just adding small amounts of noise to them).
- E.g., "approximately how much would this split in my decision tree reduce entropy?"
More generally
- Statistical Query model [Kearns93]: the learning algorithm interacts with the data only through statistical queries q(x, l); for each query it receives an estimate of Pr_D[q(x, f(x)) = 1] to within ±δ, computed from the sample S.
  - "What is the error rate of my current rule?"
  - "What is the correlation of x1 with f when x2 = 0?" ...
- Many algorithms (including ID3, Perceptron, SVM, PCA) can be re-written to interface via such statistical estimates.
- Anything learnable via Statistical Queries is learnable differentially privately.
"Practical Privacy: The SuLQ Framework". Blum, Dwork, McSherry, Nissim. PODS 2005.
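To connect the two slides, a hedged sketch (my illustration, not code from the SuLQ paper) of answering one statistical query privately: an empirical fraction over n examples has sensitivity 1/n, so the Laplace mechanism from the previous slide applies directly.

```python
import numpy as np

def private_statistical_query(sample, q, eps, rng=None):
    """Answer the statistical query q (a 0/1 predicate on (x, label))
    with eps-DP. The empirical fraction changes by at most 1/n when
    one example changes, so Laplace noise of scale 1/(eps*n) suffices."""
    rng = rng or np.random.default_rng()
    n = len(sample)
    frac = sum(q(x, y) for (x, y) in sample) / n
    return frac + rng.laplace(loc=0.0, scale=1.0 / (eps * n))

# E.g., the error rate of a current hypothesis h (h and S hypothetical):
# private_statistical_query(S, lambda x, y: int(h(x) != y), eps=0.1)
```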
Laplace Mechanism
Problems:
- If you ask many questions, you need a large dataset to be able to give accurate and private answers to all of them (privacy losses accumulate over the questions asked).
- Also, differential privacy may not be appropriate if multiple examples correspond to the same individual (e.g., search queries, restaurant reviews).
More generally
Problems:
- The more interconnected our data is (A and B are friends because of person C), the trickier it becomes to reason about privacy.
- Lots of current work on definitions and algorithms.