

SLIDE 1

Rigorous Foundations for Statistical Data Privacy

Adam Smith Boston University

CWI, Amsterdam November 15, 2018

SLIDE 2

“Privacy” is changing

  • Data-driven systems guiding decisions in many areas
  • Models increasingly complex

[Diagram: weighing the benefits of data (better diagnoses, lower recidivism, …) against control, transparency, and privacy]

SLIDE 3

Privacy in Statistical Databases

Large collections of personal information

  • census data
  • medical/public health
  • online advertising
  • education


[Diagram: individuals contribute data to an “agency” (trusted curator); researchers send queries and receive answers; releases take the form of summaries, complex models, or synthetic data]

SLIDE 4

Two conflicting goals

  • Utility: release aggregate statistics
  • Privacy: individual information stays hidden

How do we define “privacy”?

  • Studied since the 1960s in:
Ø Statistics
Ø Databases & data mining
Ø Cryptography

  • This talk: Rigorous foundations and analysis



SLIDE 5

[Cartoon: “Relax – it can only see metadata.”]

SLIDE 6

This talk

  • Why is privacy challenging?

Ø Anonymization often fails
Ø Example: membership attacks, in theory and in practice

  • Differential Privacy [DMNS’06]

Ø “Privacy” as stability to small changes
Ø Widely studied and deployed

  • The “frontier” of research on statistical privacy

Ø Three topics

SLIDE 7

First attempt: Remove obvious identifiers

Everything is an identifier

Images: whitehouse.gov, genesandhealth.org, medium.com

[Ganta Kasiviswanathan S ’08]
“AI recognizes blurred faces” [McPherson Shokri Shmatikov ’16]
Name and ethnicity inferred from genomic data [Gymrek McGuire Golan Halperin Erlich ’13]
[Pandurangan ’14]

SLIDE 8

Is the problem granularity?

What if we only release aggregate information? Statistics released together may still encode individual data.

  • Example: average salary before/after a resignation (see the sketch below)
  • More generally:

Too many, “too accurate” statistics reveal individual information
Ø Reconstruction attacks [Dinur Nissim 2003, …, Cohen Nissim 2017]
Ø Membership attacks [next slide]

Cannot release everything everyone would want to know
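To make the salary example above concrete, here is a minimal Python sketch of such a “differencing” attack, with made-up numbers: each released average is harmless on its own, but the pair pins down one person’s salary exactly.

```python
# Hypothetical salaries; Carol is the person who resigns.
salaries = {"alice": 82_000, "bob": 65_000, "carol": 91_000}

avg_before = sum(salaries.values()) / len(salaries)  # released while Carol is present
del salaries["carol"]
avg_after = sum(salaries.values()) / len(salaries)   # released after Carol resigns

# n_before * avg_before - n_after * avg_after recovers Carol's salary exactly:
print(3 * avg_before - 2 * avg_after)  # 91000.0
```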

SLIDE 9

A Few Membership Attacks

  • [Homer et al. 2008]

Exact high-dimensional summaries allow an attacker to test membership in a data set

Ø Caused US NIH to change data sharing practices

  • [Dwork, S, Steinke, Ullman, Vadhan, FOCS ‘15]

Distorted high-dimensional summaries allow an attacker to test membership in a data set

  • [Shokri, Stronati, Song, Shmatikov, Oakland 2017]

Membership inference using ML as a service (from exact answers)


SLIDE 10

Membership Attacks

[Diagram: a binary data matrix with n people as rows and d attributes as columns, sampled from a larger population]

Suppose

  • We have a data set in which membership is sensitive
Ø Participants in a clinical trial
Ø Targeted ad audience
  • Data has many binary attributes for each person
Ø Genome-wide association studies: d = 1,000,000 attributes (“SNPs”), n < 2000

SLIDE 11

Membership Attacks

[Diagram: the data matrix and Alice’s row; released column averages .50 .75 .50 .50 .75 .50 .25 .25 .50; the attacker compares Alice’s data and population statistics with the averages and outputs “in” or “out”]

  • Release exact column averages
  • Attacker succeeds with high probability when there are more attributes than people (d ≫ n)
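A minimal simulation of this attack, under illustrative assumptions of my own (the talk does not give an implementation): the test statistic below is a standard inner-product score in the spirit of Homer et al., correlating Alice’s row with the released averages, both centered at the population frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10_000                            # more attributes than people
pop_freq = rng.uniform(0.2, 0.8, size=d)      # population frequency of each attribute

data = (rng.random((n, d)) < pop_freq).astype(float)   # the sensitive data set
col_avgs = data.mean(axis=0)                           # the released statistics

alice_in = data[0]                                     # someone in the data set
alice_out = (rng.random(d) < pop_freq).astype(float)   # someone from the population

def score(row):
    # Correlate the person's row with the released averages, both centered
    # at the population frequencies; members score systematically higher.
    return float(np.dot(row - pop_freq, col_avgs - pop_freq))

print(score(alice_in), score(alice_out))   # clearly separated when d >> n
```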

SLIDE 12

Membership Attacks

[Diagram: the same setup, but the released column averages .50 .75 .50 .50 .75 .50 .25 .25 .50 are distorted to .4 .7 .6 .5 .8 .4 .2 .3 .6]

  • Release distorted column averages: ±α in each coordinate, no matter how the distortion is performed
  • Attacker succeeds with high probability when there are more attributes than people and α ≪ √d / n

SLIDE 13

Machine Learning as a Service

[Diagram: sensitive DATA (transactions, preferences, online and offline behavior) feed a Training API that produces a Model; a Prediction API takes input from users and apps and returns classifications]
SLIDE 14

Exploiting Trained Models

[Diagram: the same pipeline; queries with inputs from the training set and inputs not from the training set both return classifications; can an attacker recognize the difference?]

SLIDE 15

Exploiting Trained Models

[Diagram: the same pipeline as the previous slide]

Train a model to recognize the difference between training-set and non-training-set inputs, without knowing the specifics of the actual model!
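As a simplified illustration, this sketch uses a plain confidence threshold rather than the shadow models of [Shokri, Stronati, Song, Shmatikov]; the data, model, and threshold are my own stand-ins. The point it demonstrates is the same: an overfit model answers more confidently on its training set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X[:, 0] + rng.normal(scale=2.0, size=400) > 0).astype(int)  # noisy labels
X_train, y_train, X_out = X[:200], y[:200], X[200:]

# An overfit model memorizes its training set...
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# ...so the prediction API's confidence alone hints at membership:
conf_members = model.predict_proba(X_train).max(axis=1)
conf_nonmembers = model.predict_proba(X_out).max(axis=1)
print(conf_members.mean(), conf_nonmembers.mean())  # members look far more confident
```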

SLIDE 16

This talk

  • Why is privacy challenging?

Ø Anonymization often fails
Ø Example: membership attacks, in theory and in practice

  • Differential Privacy [DMNS’06]

Ø “Privacy” as stability to small changes
Ø Widely studied and deployed

  • The “frontier” of research on statistical privacy

Ø Three topics

SLIDE 17
Differential Privacy

  • Several current deployments: Apple, Google, US Census
  • Burgeoning field of research: algorithms; crypto & security; statistics & learning; game theory & economics; databases & programming languages; law & policy

SLIDE 18

Differential Privacy

  • Data set x
Ø Domain D can be numbers, categories, tax forms
Ø Think of x as fixed (not random)
  • A = randomized procedure
Ø A(x) is a random variable
Ø Randomness might come from adding noise, resampling, etc.

[Diagram: A takes the data set x and random bits, and outputs the random variable A(x)]

SLIDE 19
Differential Privacy

  • A thought experiment
Ø Change one person’s data (or add or remove them)
Ø Will the probabilities of outcomes change?

[Diagram: A run on x and on x′, each with its own random bits, yielding A(x) and A(x′)]

For any set of outcomes (e.g., “I get denied health insurance”): about the same probability in both worlds.

SLIDE 20

Differential Privacy

[Diagram: A run with local random coins on neighboring data sets x and x′, yielding A(x) and A(x′)]

x′ is a neighbor of x if they differ in one data point.

Definition: A is ε-differentially private if, for all neighbors x, x′ and all subsets S of outputs,

Pr[A(x) ∈ S] ≤ (1 + ε) · Pr[A(x′) ∈ S]

Neighboring databases induce close distributions on outputs.

ε is a leakage measure. (The standard definition uses the factor e^ε, which is ≈ 1 + ε for small ε.)

SLIDE 21

Randomized Response [Warner 1965]

  • Say we want to release the proportion of diabetics in a data set
Ø Each person’s data is 1 bit: xᵢ = 0 or xᵢ = 1
  • Randomized response: each individual rolls a die
Ø 1, 2, 3 or 4: report the true value xᵢ
Ø 5 or 6: report the opposite value 1 − xᵢ
  • Output is the list of reported values y₁, …, yₙ
Ø Satisfies our definition with ε ≈ 0.7
Ø Can estimate the fraction of xᵢ’s that are 1 when n is large

[Diagram: each individual randomizes locally; A(x) = (y₁, …, yₙ)]
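A minimal Python sketch of randomized response as just described (the die roll becomes a pseudorandom draw with probability 2/3):

```python
import random

def report(true_bit: int) -> int:
    """Die shows 1-4 (prob 2/3): tell the truth; 5 or 6 (prob 1/3): lie."""
    return true_bit if random.random() < 2 / 3 else 1 - true_bit

def estimate_fraction(reports) -> float:
    """E[report] = 1/3 + p/3, so p_hat = 3 * mean(reports) - 1 (unbiased)."""
    return 3 * sum(reports) / len(reports) - 1

# Each person's report satisfies the definition with e^eps = (2/3)/(1/3) = 2,
# i.e. eps = ln 2, which is about 0.7.
true_bits = [1] * 300 + [0] * 700            # true fraction is 0.3
reports = [report(b) for b in true_bits]
print(estimate_fraction(reports))            # close to 0.3 for large n
```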

SLIDE 22

Laplace Mechanism

  • Say we want to release a summary f(x) ∈ ℝᵈ
Ø e.g., proportion of diabetics: xᵢ ∈ {0,1} and f(x) = (1/n) Σᵢ xᵢ
  • Simple approach: add noise to f(x)
Ø How much noise is needed?
Ø Idea: calibrate the noise to some measure of f’s volatility

[Diagram: A(x) = f(x) + noise]

SLIDE 23

Laplace Mechanism

  • Global Sensitivity: GS_f = max over neighbors x, x′ of ‖f(x) − f(x′)‖₁
Ø Example: for the proportion f(x) = (1/n) Σᵢ xᵢ, GS_f = 1/n

[Diagram: neighboring data sets x, x′ map to nearby values f(x), f(x′); A(x) = f(x) + noise]

SLIDE 24

Laplace Mechanism

  • Global Sensitivity: GS_f = max over neighbors x, x′ of ‖f(x) − f(x′)‖₁
Ø Example: for the proportion, GS_f = 1/n
Ø The Laplace distribution Lap(λ) has density h(y) ∝ exp(−|y|/λ)
Ø Changing one point translates the curve (by at most GS_f)

[Diagram: A(x) = f(x) + Lap(GS_f/ε) noise]
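A minimal Python sketch of the Laplace mechanism (function and parameter names are mine): noise in each coordinate is drawn from Lap(GS_f/ε).

```python
import numpy as np

def laplace_mechanism(x, f, sensitivity, eps, rng):
    """Release f(x) + Lap(sensitivity/eps) noise in each coordinate (eps-DP)."""
    value = np.asarray(f(x), dtype=float)
    return value + rng.laplace(scale=sensitivity / eps, size=value.shape)

# Proportion of diabetics: one person's bit changes the mean by at most 1/n,
# so the global sensitivity is 1/n.
x = np.array([0, 1, 1, 0, 1, 0, 0, 0, 1, 1] * 100)    # n = 1000 bits, mean 0.5
rng = np.random.default_rng(0)
print(laplace_mechanism(x, np.mean, sensitivity=1 / len(x), eps=0.5, rng=rng))
```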

SLIDE 25
  • Can release d proportions with noise ≈ √d/(εn) per entry
  • Requires the “approximate” variant of DP

Attacks “match” differential privacy

[Diagram: per-entry error scale: reconstruction attacks succeed when noise ≪ 1/√n; sampling error is ≈ 1/√n; robust membership attacks succeed when noise ≪ √d/n; differential privacy is achievable at noise ≈ √d/n]

SLIDE 26

A rich algorithmic field

  • Noise addition: A(x) = f(x) + noise
  • Exponential sampling: sample y ∼ p(y) ∝ exp(ε · quality(x, y))
  • Local perturbation: each user randomizes their own data x₁, x₂, x₃ before it reaches an untrusted aggregator A
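A minimal sketch of exponential sampling, i.e. the exponential mechanism of [McSherry Talwar]; the factor of 2 in the exponent and the sensitivity-1 assumption come from the mechanism’s standard analysis rather than from the slide.

```python
import numpy as np

def exponential_mechanism(x, candidates, quality, eps, rng):
    """Sample a candidate y with probability proportional to exp(eps*q(x,y)/2)."""
    scores = np.array([quality(x, y) for y in candidates], dtype=float)
    weights = np.exp(eps * (scores - scores.max()) / 2)  # shift for numerical stability
    return candidates[rng.choice(len(candidates), p=weights / weights.sum())]

# Privately choose the most common item; quality = count has sensitivity 1.
x = ["a"] * 60 + ["b"] * 35 + ["c"] * 5
rng = np.random.default_rng(0)
print(exponential_mechanism(x, ["a", "b", "c"], lambda data, y: data.count(y),
                            eps=0.1, rng=rng))          # usually "a"
```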

SLIDE 27

Interpreting Differential Privacy

  • A naïve hope:

Your beliefs about me are the same after you see the output as they were before

  • Impossible:
Ø Suppose you know that I smoke
Ø Clinical study: “smoking and cancer correlated”
Ø You learn something about me, whether or not my data were used
  • Differential privacy implies:

No matter what you know ahead of time, you learn (almost) the same things about me whether or not my data are used

Ø Provably resists attacks mentioned earlier


SLIDE 28

Research on (differential) privacy

  • Definitions

Ø Pinning down “privacy”

  • Algorithms: what can we compute privately?

Ø Fundamental techniques
Ø Specific applications

  • Usable systems
  • Attacks: “Cryptanalysis” for data privacy
  • Protocols: Cryptographic tools for large-scale analysis
  • Implications for other areas

Ø Adaptive data analysis
Ø Law and policy


SLIDE 29

This talk

  • Why is privacy challenging?

Ø Anonymization often fails
Ø Example: membership attacks, in theory and in practice

  • Differential Privacy [DMNS’06]

Ø “Privacy” as stability to small changes
Ø Widely studied and deployed

  • The “frontier” of research on statistical privacy

Ø Three topics

SLIDE 30

Frontier 1: Deep Learning with DP

[Abadi et al 2016, …]

[Diagram: deep learning maps sensitive data to model parameters. Labels: “Revealed now, but should be hidden” and “Thought of as private now, but better to reason as if public”]
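A minimal numpy sketch of the core update in differentially private SGD from [Abadi et al 2016]: clip each per-example gradient, average, and add Gaussian noise. The privacy accountant that tracks ε across iterations is omitted here, and the parameter values below are illustrative.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_mult, lr, rng):
    """One lot: clip each gradient to L2 norm <= clip_norm, average, add noise."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    # Gaussian noise with std sigma*C on the sum, i.e. sigma*C/L on the average.
    noise = rng.normal(scale=noise_mult * clip_norm / len(clipped), size=avg.shape)
    return params - lr * (avg + noise)

rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(32)]   # one lot of per-example gradients
w = dp_sgd_step(np.zeros(5), grads, clip_norm=1.0, noise_mult=1.1, lr=0.1, rng=rng)
print(w)
```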

SLIDE 31

Frontier 2: From Law to Technical Definitions

Two central challenges

  • 1. Given a body of law and regulation, what technical definitions comply with that law?
Ø E.g., what suffices to satisfy GDPR?
  • 2. How should we write laws and regulations so they make sense given evolving technology?
Ø E.g., surveillance ≠ physical wiretaps

  • Technical research must inform these questions

Ø E.g., “personally identifiable information” is meaningless

  • [Nissim et al. 2016] When tradeoffs are inherent, mathematical formulations play an important role
Ø E.g., a formal interpretation of FERPA (a US law) mirrors DP
Ø “Singling out” in GDPR is challenging to make sense of

SLIDE 32

Frontier 3: Privacy and overfitting

  • Problem: in modern data analysis, data are re-used across studies
Ø Choice of what analysis to perform can depend on outcomes of previous analyses
  • Differentially private algorithms help prevent overfitting due to adaptivity (see the sketch after the diagram below)

[Diagram: a sample X is drawn from the population; outcome 1 of the first analysis influences the choice of the second analysis, which produces outcome 2, and so on adaptively]
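A toy illustration of the problem (my own construction, not from the talk): attributes selected because they looked correlated with the outcome on a sample keep looking predictive on that same sample, even though the population has no signal at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 1000
X = rng.choice([-1.0, 1.0], size=(n, d))    # attributes: pure noise
y = rng.choice([-1.0, 1.0], size=n)         # labels: independent of the attributes

# "Study 1": measure every attribute's correlation with the label on the sample.
corr = X.T @ y / n
# Adaptive step: keep only attributes that looked significant in study 1.
keep = np.abs(corr) > 2 / np.sqrt(n)

# "Study 2", on the SAME sample: a vote over the kept attributes looks great...
preds = np.sign(X[:, keep] @ np.sign(corr[keep]))
print((preds == y).mean())   # well above 0.5, yet true accuracy is exactly 0.5
```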
SLIDE 33

This talk

  • Why is privacy challenging?

Ø Anonymization often fails
Ø Example: membership attacks, in theory and in practice

  • Differential Privacy [DMNS’06]

Ø “Privacy” as stability to small changes
Ø Widely studied and deployed

  • The “frontier” of research on statistical privacy

Ø Three topics

SLIDE 34

Beyond privacy

  • Data increasingly used to automate decisions
Ø E.g.: lending, health, education, policing, sentencing
  • Traditional security: controlling intrusion
  • Modern security must include trustworthiness of data-driven algorithmic systems
  • Differential privacy formalizes one piece of modern security
Ø What other areas need such scrutiny?

[Diagram: pieces of modern security: privacy, fairness, accountability, resistance to manipulation]