Explaining Privacy and Fairness Violations in Data-Driven Systems
Matt Fredrikson, Carnegie Mellon University

Joint effort: Emily Black, Gihyuk Ko, Klas Leino, Anupam Datta, Sam Yeom, Piotr Mardziel, Shayak Sen
Data-driven systems are ubiquitous
Web services, credit, law enforcement, healthcare, education, …
Data-driven systems are opaque
[Diagram: user data flows into an online advertising system, which produces decisions]
Opacity and privacy
“…able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a ‘pregnancy prediction’ score. Take a fictional Target shopper who … bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant.”

Opacity and fairness
Image source: Han Huang, Reuters
Inappropriate information use
Both problems can be seen as inappropriate use of protected information:
- Fairness/discrimination
  - Use of race or gender for employment decisions
  - Business necessity exceptions
- Privacy
  - Use of health or political background for marketing
  - Exceptions derive from contextual information norms

This is a type of bug!
Agenda
Methods for dealing with inappropriate information use:
- Detecting when it occurs
- Providing diagnostic information to developers
- Automatic repair, when possible

Remaining talk:
- Formalize “inappropriate information use”
- Show how it applies to classifiers
- Generalize to continuous domains
- Nonlinear continuous models & applications
Explicit use via causal influence [Datta, Sen, Zick, Oakland’16]
Example: credit decisions. A classifier takes age and income as inputs, but uses only income to reach a decision.
Conclusion: measures of association are not informative.
Causal intervention
[Diagram: the credit classifier (uses only income), with age values 21, 28, 44, 63 and incomes ranging from $10K to $100K]
Replace a feature with random values drawn from the population, and examine the resulting distribution over outcomes.
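This intervention can be sketched in a few lines of Python. This is a minimal sketch, not the authors’ implementation: the model, the toy data, and the function name `intervention_influence` are ours.

```python
import random

def intervention_influence(model, rows, feature, trials=1000, seed=0):
    """Estimate a feature's causal influence by replacing it with random
    values drawn from the population and counting how often the model's
    decision changes (a QII-style intervention)."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        row = dict(rng.choice(rows))          # sample an instance
        baseline = model(row)
        row[feature] = rng.choice(rows)[feature]  # intervene on one feature
        flips += model(row) != baseline
    return flips / trials

# A classifier that takes age and income but uses only income:
credit = lambda r: r["income"] > 50
population = [{"age": a, "income": i} for a in (21, 28, 44, 63)
              for i in (10, 20, 90, 100)]

# Intervening on age never changes the decision; intervening on income can.
print(intervention_influence(credit, population, "age"))     # 0.0
print(intervention_influence(credit, population, "income"))  # well above 0
```

Because the classifier ignores age entirely, randomizing age never flips a decision, while randomizing income flips it whenever the draw crosses the threshold.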
Challenge: indirect (proxy) use
[Diagram: a classifier that targets older people, branching on “# years in same job” and “unpaid mortgage?” rather than age]
We need to determine when an information type (e.g., age or income) is inferred and then used.
Proxy use: a closer look
What do we mean by proxy use?
- 1. Explicit use is also proxy use (e.g., a tree that branches directly on “age > 60”)
- 2. “Inferred use” is proxy use (e.g., a tree that branches on “yrs in job > 10” and “unpaid mortgage?”)
  - Inferred values must be influential
  - Associations must be two-sided
One- and two-sided associations
What happens if we allow one-sided association? Consider a model that:
- Uses postal code to determine state (e.g., Pittsburgh vs. Philadelphia to select Ad #1 vs. Ad #2)
- Zip code can predict race
- …but not the other way around

This is a benign use of information that’s associated with a protected information type.
Proxy use: a closer look
What do we mean by proxy use?
- 1. Explicit use is also proxy use
- 2. “Inferred use” is proxy use
  - Inferred values must be influential
  - Associations must be two-sided
- 3. Output association is unnecessary for proxy use (e.g., a model that branches on “women’s college?” before “interested?” makes proxy use of gender even when its accept/reject outcomes are balanced across genders)
Towards a formal definition: axiomatic basis
- (Axiom 1: Explicit use) If random variable Z is an influential input of the model A, then A makes proxy use of Z.
- (Axiom 2: Preprocessing) If a model A makes proxy use of Z, and A'(x) = A(x, f(x)), then A' also makes proxy use of Z.
  - Example: A' infers a protected piece of info given directly to A
- (Axiom 3: Dummy) If A'(x, x') = A(x) for all x and x', then A' has proxy use of Z exactly when A does.
  - Example: a feature never touched by the model
- (Axiom 4: Independence) If Z is independent of the inputs of A, then A does not have proxy use of Z.
  - Example: the model obtains no information about the protected type
Extensional proxy use axioms are inconsistent
Key intuition:
- Preprocessing forces us to preserve proxy use under function composition
- But the rest of the model can cancel out a composed proxy

- Let X, Y, Z be pairwise independent random variables with Y = X ⊕ Z
- Then A(Y, Z) = Y ⊕ Z makes proxy use of Z (explicit use axiom)
- So does A'(Y, Z, X) = Y ⊕ Z (dummy axiom)
- And so does A''(Z, X) = A'(X ⊕ Z, Z, X) (preprocessing axiom)
- But A''(Z, X) = X ⊕ Z ⊕ Z = X, and X, Z are independent…
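The cancellation argument can be checked mechanically. In this minimal sketch (variable names A, A2, A3 are ours, standing for A, A', A''), X and Z are fair independent bits and Y = X ⊕ Z:

```python
from itertools import product

# Fair independent bits X and Z; Y = X xor Z. All three are pairwise
# independent, yet composing the axioms forces "proxy use" of Z in a
# model that provably ignores Z.
A  = lambda y, z: y ^ z             # explicit use of Z
A2 = lambda y, z, x: A(y, z)        # dummy axiom: x is never touched
A3 = lambda z, x: A2(x ^ z, z, x)   # preprocessing: substitute y = x xor z

for z, x in product((0, 1), repeat=2):
    assert A3(z, x) == x            # A'' collapses to X: Z cancels out
print("A'' is the identity on X; it carries no information about Z")
```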
Syntactic relaxation
- We address this with a syntactic definition
- Composition is tied to how the function is represented as a program
- Checking for proxy use requires access to program internals
[Decision tree: “women’s college?” → “interested?” → offer / no offer]
Models as Programs
- Expressions that produce a value
- No loops or other complexities
- But often very large

⟨exp⟩ ::= R | True | False | var | op(⟨exp⟩, …, ⟨exp⟩) | if (⟨exp⟩) then { ⟨exp⟩ } else { ⟨exp⟩ }

Operations:
- arithmetic: +, -, *, etc.
- boolean connectives: or, and, not, etc.
- relations: ==, <, ≤, >, etc.
Modeling Systems | Probabilistic Semantics
Expression semantics: ⟦exp⟧ : Instance → Value
- I is a random variable over dataset instances
- ⟦exp⟧(I) is a random variable V over the expression’s value
- Joint distribution over the input instance (I) and the expression values (Vi) for each subexpression expi: Pr[I, V0, V1, …, V9]
  - marginals: Pr[V4 = True, V0 = Ad1]
  - conditionals: Pr[V4 = True | V0 = Ad1]

[Decision tree with its subexpressions labeled exp0 … exp9]
Program decomposition
Decomposition: Given a program p, a decomposition (p1, X, p2) consists of two programs p1, p2, and a fresh variable X such that replacing X with p1 inside p2 yields p.

[Example: p = the “yrs in job > 10” tree; p1 = its “unpaid mortgage?” subtree; p2 = the tree with X in p1’s place]
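A decomposition is easy to check given the program’s syntax tree. In this sketch the AST representation (nested tuples) and the helper `substitute` are ours; the check is exactly the definition above: substituting p1 for X in p2 must reproduce p.

```python
def substitute(p2, var, p1):
    """Replace every occurrence of variable `var` in program p2 with
    the subprogram p1, recursing through the tuple-encoded AST."""
    if p2 == ("var", var):
        return p1
    if isinstance(p2, tuple):
        return tuple(substitute(child, var, p1) for child in p2)
    return p2

# p: if yrs_in_job > 10 then Y else (if unpaid_mortgage then T else F)
p1 = ("if", ("var", "unpaid_mortgage"), "T", "F")
p2 = ("if", (">", ("var", "yrs_in_job"), 10), "Y", ("var", "X"))
p  = ("if", (">", ("var", "yrs_in_job"), 10), "Y", p1)

assert substitute(p2, "X", p1) == p   # (p1, X, p2) decomposes p
print("(p1, X, p2) is a valid decomposition of p")
```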
Characterizing proxies
Proxy: Given a decomposition (p1, X, p2) and a random variable Z, p1 is a proxy for Z if ⟦p1⟧(I) is associated with Z.

[Example: p1 = the “women’s college?” test is a proxy for “gender = Female”]
Characterizing use
Influential decomposition: A decomposition (p1, X, p2) is influential if X can change the outcome of p2.

[Example: p1 = the “unpaid mortgage?” subtree; its value X changes the outcome of p2 whenever “yrs in job > 10” is false]
Putting it all together
Proxy use: A program p has proxy use of random variable Z if there exists an influential decomposition (p1, X, p2) of p such that p1 is a proxy for Z.

This is close to our intuition from earlier. Formally, it satisfies similar axioms:
- The dummy and independence axioms remain largely unchanged
- Explicit use and preprocessing rely on program decomposition instead of function composition
Quantitative proxy use
A decomposition (p1, X, p2) is an (ε, δ)-proxy use of Z when:
- the association between p1 and Z is ≥ ε, and
- p1’s influence in p2 satisfies ɩ(p1, p2) ≥ δ

A program has (ε, δ)-proxy use of Z when it admits a decomposition that is an (ε, δ)-proxy use of Z.

Quantifying decomposition influence

ɩ(p1, p2) = E_{X,X'}[ ⟦p⟧(X) ≠ ⟦p2⟧(X, ⟦p1⟧(X')) ]

1. Intervene on p1
2. Compare the behavior:
   - with the intervention
   - as the system runs normally
3. Measure the divergence
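The expectation ɩ(p1, p2) can be estimated by Monte Carlo sampling. A minimal sketch, with a toy decomposition of our own (the `influence` helper and the example programs are ours, not the paper’s implementation):

```python
import random

def influence(p, p1, p2, instances, trials=2000, seed=0):
    """Monte Carlo estimate of iota(p1, p2) = E_{x,x'}[ p(x) != p2(x, p1(x')) ]:
    intervene on the subprogram p1 by feeding it an independent draw x'."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        x, x_alt = rng.choice(instances), rng.choice(instances)
        flips += p(x) != p2(x, p1(x_alt))
    return flips / trials

# Toy decomposition: p2 has a hole filled by p1 = "unpaid mortgage?".
p1 = lambda x: x["unpaid_mortgage"]
p2 = lambda x, hole: "Y" if x["yrs_in_job"] > 10 else ("T" if hole else "F")
p = lambda x: p2(x, p1(x))

data = [{"yrs_in_job": y, "unpaid_mortgage": m}
        for y in (2, 5, 12, 20) for m in (True, False)]
print(influence(p, p1, p2, data))  # strictly between 0 and 1: influential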
Algorithmics
- Does the system have an (ε, δ)-proxy use of a protected variable?
  - Basic algorithm: O(S·N²)
    - S – # expressions
    - N – # dataset instances
- How do we remove an (ε, δ)-proxy-use violation?
  - Naive algorithm: replace expi with a constant
    - O(1) – any constant
    - O(N·M) – best constant, M – # possible values
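The best-constant variant of the naive repair can be sketched directly. Everything here is a toy of our own construction (the helper name, the loss, and the data), illustrating the O(N·M) loop over N instances and M candidate values:

```python
def best_constant_repair(p2, values, instances, loss):
    """Naive repair sketch: substitute each candidate constant c for the
    proxy hole in p2 and keep the one minimizing empirical loss --
    the O(N*M) variant from the slide."""
    return min(values,
               key=lambda c: sum(loss(p2(x, c), x) for x in instances))

# Toy setup: the hole used to be filled by the "unpaid mortgage?" proxy.
p2 = lambda x, hole: "Y" if x["yrs_in_job"] > 10 else hole
label = lambda x: "T" if x["unpaid_mortgage"] else "F"
loss = lambda pred, x: pred != label(x)

data = [{"yrs_in_job": 2, "unpaid_mortgage": True}] * 3 \
     + [{"yrs_in_job": 2, "unpaid_mortgage": False}]
print(best_constant_repair(p2, ["T", "F"], data, loss))  # "T" fits 3 of 4 rows
```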
Witnesses
[Example witnesses highlighted in the decision tree: the “zip = z1 or z3” test (exp1, exp2) and the “women’s college?” subtree (exp0)]

Using witnesses:
- Demonstration of a violation in the system
- Localize where scrutiny/human eyeballs need to be applied
- Determine what repair should be applied
Experiments: benchmark datasets (CCS’17)
[Decision-tree witnesses found on benchmark datasets:]
- Proxy for marital status: a subtree over “age ≤ 30”, “gender = female”, and “capital-loss ≤ 1882.5”. Model accuracy: 83.6%; after repair: 81.7%.
- Proxy for wife’s religion: a subtree over “age ≤ 31”, “# children ≤ 3”, and “wife-educ ≤ 3”. Model accuracy: 61.2%; after repair: 52.1%.
Agenda
Methods for dealing with inappropriate information use:
- Detecting when it occurs
- Providing diagnostic information to developers
- Automatic repair, when possible

Remaining talk:
- Formalize “inappropriate information use”
- Show how it applies to classifiers
- Generalize to continuous domains
- Nonlinear continuous models & applications
Proxies in linear regressors [NIPS’18]
Recall our definition of decomposition influence:

ɩ(p1, p2) = E_{X,X'}[ ⟦p⟧(X) ≠ ⟦p2⟧(X, ⟦p1⟧(X')) ]

We generalize to regression by defining:

ɩ(p1, p2) = E_{X,X'}[ (⟦p⟧(X) − ⟦p2⟧(X, ⟦p1⟧(X')))² ]

Y(X) = a1X1 + a2X2 + … + anXn
Proxies in regressors [NIPS’18]

Y(X) = a1X1 + a2X2 + … + anXn

What are the decompositions?
- Just individual terms anXn? Or groups like a1X1 + a2X2?
- What about 0.5·a1X1 + a2X2?

Component: P(X) = γ1a1X1 + γ2a2X2 + … + γnanXn for γ1, …, γn ∈ [0, 1]
Proxies in regressors [NIPS’18]

ɩ(p1, p2) = E_{X,X'}[ (Y(X) − Y(X, P(X')))² ] ∝ Var(P(X))
Asc(Y, Z) ∝ Cov(Y, Z)

Find max_γ ‖Aγ‖ such that |Asc(Aγ, Z)| ≥ ε, where AᵀA = Cov(X, X); relax the objective to cᵀγ (for ‖Aγ‖ ≤ cᵀγ).

Optimize to find proxies!
Agenda
Methods for dealing with inappropriate information use:
- Detecting when it occurs
- Providing diagnostic information to developers
- Automatic repair, when possible

Remaining talk:
- Formalize “inappropriate information use”
- Show how it applies to classifiers
- Generalize to continuous domains
- Nonlinear continuous models & applications
Distributional Influence: proxies in neural nets
Feature extractor: z = h(x); classifier: g(z); network: f(x) = g(h(x))

χ(g, Q) = ∫_𝒴 (∂g/∂y)(y) Q(y) dy, where Q is a distribution of interest over the classifier’s inputs

Axioms:
- Linear agreement
- Distributional marginality
- Distribution linearity
Problems with neural nets: stereotyping
[Images classified as: ping-pong ball (37%), ballplayer (90%), basketball (73%)]
See [Stock & Cisse, 2018] for more examples like this.
Problems with neural nets: bias amplification
In the training data, 33% of “cooking” images have men in them; in the predictions, only 16% of “agent” roles in cooking images are labeled “man”.
Image source: [Zhao et al., EMNLP 2018]
Explaining stereotype predictions
[Heat maps for the “basketball (73%)” prediction: the top 5% and top 25% most influential features]
Intrinsic bias amplification
[Plot: prediction bias as a function of prior class probability and the statistical distance between classes]
Prediction bias from inductive bias
[Plot: hS − h*, the difference between the learned (hS) and optimal (h*) weights (averaged), against the number of “weak” features for h(x) = 1 and the data size]

Larger weights ≈ more influence
Simple fix: kill weak features
[Plot: bias of the resulting classifier vs. the number of most positive- and most negative-influential features kept]
Constraint: don’t increase the empirical loss.
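The pruning step of this fix can be sketched as follows. This is our own illustration, not the authors’ code: in particular, we use weight magnitude as a stand-in for influence, which is an assumption of the sketch.

```python
import numpy as np

def keep_top_features(w, k_pos, k_neg):
    """Sketch of the 'kill weak features' fix: zero out every weight
    except the k_pos most positive- and the k_neg most negative-
    influential ones (weight magnitude stands in for influence here)."""
    w = np.asarray(w, dtype=float)
    pruned = np.zeros_like(w)
    pos = np.argsort(-w)[:k_pos]   # indices of the largest weights
    neg = np.argsort(w)[:k_neg]    # indices of the most negative weights
    pruned[pos] = w[pos]
    pruned[neg] = w[neg]
    return pruned

w = [0.9, 0.05, -0.04, -0.8, 0.03, 0.02]
print(keep_top_features(w, 1, 1))  # keeps only 0.9 and -0.8
```

In practice one would sweep k_pos and k_neg (the two axes of the plot above) and keep the setting that reduces bias without increasing the empirical loss.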
Early results
CelebA dataset, “attractive” prediction task:
- 0.4% data bias towards 1 (= “attractive”)
- 7.7% prediction bias
- 79.6% accuracy

Post-fix:
- 0.2% prediction bias
- 79.9% accuracy
Summary
Methods for dealing with inappropriate information use:
- Detecting when it occurs
- Providing diagnostic information to developers
- Automatic repair, when possible

Progress:
- Formalized “inappropriate information use” as proxy use
- Generalized to continuous domains and neural networks
- Algorithms for detection and diagnosis
- Explanation-based repair methods