
FairTest: Discovering unwarranted associations in data-driven applications



  1. FairTest: Discovering unwarranted associations in data-driven applications. IEEE EuroS&P, April 28th, 2017. Florian Tramèr (1), Vaggelis Atlidakis (2), Roxana Geambasu (2), Daniel Hsu (2), Jean-Pierre Hubaux (3), Mathias Humbert (4), Ari Juels (5), Huang Lin (3). (1) Stanford University, (2) Columbia University, (3) École Polytechnique Fédérale de Lausanne, (4) Saarland University, (5) Cornell Tech

  2. “Unfair” associations + consequences

  3. “Unfair” associations + consequences. These are software bugs: we need to actively test for them and fix them (i.e., debug) in data-driven applications, just as with functionality, performance, and reliability bugs.

  4. Unwarranted Associations Model. [Diagram: user inputs and protected inputs feed into a data-driven application, which produces application outputs.]

  5. Limits of preventative measures. What doesn’t work:
     • Hide protected attributes from the data-driven application.
     • Aim for statistical parity w.r.t. protected classes and service output.
     The foremost challenge is to even detect these unwarranted associations.

  6. A Framework for Unwarranted Associations
     1. Specify relevant data features:
        • Protected variables (e.g., gender, race, …)
        • “Utility”: a function of the algorithm’s output (e.g., price, error rate, …)
        • Explanatory variables (e.g., qualifications)
        • Contextual variables (e.g., location, job, …)
     2. Find statistically significant associations between protected attributes and utility:
        • Condition on explanatory variables
        • Not tied to any particular statistical metric (e.g., odds ratio)
     3. Granular search in semantically meaningful subpopulations:
        • Efficiently list the subgroups with the highest adverse effects
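To make step 2 concrete, here is a minimal sketch of such a test for categorical variables. This is a hypothetical helper, not FairTest's API: it builds the contingency table of a protected attribute against a utility value, runs a chi-squared significance test, and reports Cramér's V as a normalized association strength.

```python
# Minimal sketch of step 2 for categorical variables (hypothetical helper,
# not FairTest's API): measure and significance-test the association between
# a protected attribute S and a utility/output O.
import numpy as np
from scipy.stats import chi2_contingency

def association_test(protected, output):
    """Build the S-vs-O contingency table, then return it together with a
    chi-squared p-value and Cramer's V (association strength in [0, 1])."""
    s_vals, s_idx = np.unique(protected, return_inverse=True)
    o_vals, o_idx = np.unique(output, return_inverse=True)
    table = np.zeros((len(s_vals), len(o_vals)), dtype=int)
    np.add.at(table, (s_idx, o_idx), 1)            # count co-occurrences
    chi2, p_value, _, _ = chi2_contingency(table)
    n = table.sum()
    cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
    return table, p_value, cramers_v

# Example on synthetic data: a weak but statistically significant association.
rng = np.random.default_rng(0)
race = rng.choice(["A", "B"], size=10000)
price = np.where((race == "A") & (rng.random(10000) < 0.08), "high", "low")
print(association_test(race, price)[1:])           # (p_value, cramers_v)
```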

  7. FairTest: a testing suite for data-driven apps
     • Finds context-specific associations between protected variables and application outputs, conditioned on explanatory variables
     • Bug report ranks findings by association strength and affected population size
     [Diagram: user inputs (location, clicks, …) and protected variables (race, gender, …) feed the data-driven application, which produces outputs (prices, tags, …); FairTest combines these with context variables (zip code, job, …) and explanatory variables (qualifications, …) to produce an association bug report for the developer.]

  8. A data-driven approach. The core of FairTest is based on statistical machine learning:
     • Find context-specific associations (on training data)
     • Statistically validate associations (on held-out test data), ideally sampled from the relevant user population
     Statistical machine learning internals:
     • top-down spatial partitioning algorithm
     • confidence intervals for association metrics
     • …

     Report of associations of O=Price on S_i=Income (association metric: normalized mutual information, NMI):

     Global population of size 494,436; p-value=3.34e-10; NMI=[0.0001, 0.0005]

     Price   Income <$50K    Income >=$50K    Total
     High    15301 (6%)      13867 (6%)       29168 (6%)
     Low     234167 (94%)    231101 (94%)     465268 (94%)
     Total   249468 (50%)    244968 (50%)     494436 (100%)

     1. Subpopulation of size 23,532; Context = {State: CA, Race: White}; p-value=2.31e-24; NMI=[0.0051, 0.0203]

     Price   Income <$50K    Income >=$50K    Total
     High    606 (8%)        691 (4%)         1297 (6%)
     Low     7116 (92%)      15119 (96%)      22235 (94%)
     Total   7722 (33%)      15810 (67%)      23532 (100%)

     2. Subpopulation of size 2,198; Context = {State: NY, Race: Black, Gender: Male}; p-value=7.72e-05; NMI=[0.0040, 0.0975]

     Price   Income <$50K    Income >=$50K    Total
     High    52 (4%)         8 (1%)           60 (3%)
     Low     1201 (96%)      937 (99%)        2138 (97%)
     Total   1253 (57%)      945 (43%)        2198 (100%)

     …more entries (sorted by decreasing NMI)…
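The NMI point estimates in this report can be approximated from the contingency tables alone. The sketch below uses one common normalization of mutual information; FairTest's exact definition and its bootstrap confidence intervals are not reproduced here.

```python
# Sketch of the report's association metric (one common NMI normalization;
# FairTest's exact definition and confidence intervals are not reproduced).
import numpy as np

def normalized_mutual_info(table):
    """I(S;O) normalized by min(H(S), H(O)) for a 2-D contingency table."""
    p = table / table.sum()
    ps, po = p.sum(axis=1), p.sum(axis=0)          # marginals of S and O
    nz = p > 0                                      # treat 0*log(0) as 0
    mi = np.sum(p[nz] * np.log(p[nz] / np.outer(ps, po)[nz]))

    def entropy(q):
        q = q[q > 0]
        return -np.sum(q * np.log(q))

    return mi / min(entropy(ps), entropy(po))

# Tables from the report (rows: Price high/low; cols: Income <$50K / >=$50K).
global_table = np.array([[15301, 13867], [234167, 231101]])
ca_table = np.array([[606, 691], [7116, 15119]])   # CA/White subpopulation
print(normalized_mutual_info(global_table))        # tiny: weak global effect
print(normalized_mutual_info(ca_table))            # larger: context-specific effect
```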

  9. Reports for fairness bugs
     • Example: simulation of a location-based pricing scheme
     • Test for disparate impact on low-income populations
     • Low effect over the whole US population
     • High effects in specific subpopulations
     (The slide shows the same report excerpt as slide 8: a tiny global NMI, but much stronger associations in the CA/White and NY/Black/Male subpopulations.)

  10. Association-Guided Decision Trees. Goal: find the most strongly affected user subpopulations.
      [Diagram: a decision tree that splits users into subpopulations with increasingly strong associations between protected variables and application outputs, e.g., first on Occupation (branches A, B, C), then on Age (< 50 vs. ≥ 50), and so on.]

  11. Association-Guided Decision Trees
      • Efficient discovery of contexts with high associations
      • Outperforms previous approaches based on frequent itemset mining
      • Easily interpretable contexts by default
      • Association-metric agnostic:

        Metric                   Use case
        Binary ratio/difference  Binary variables
        Mutual information       Categorical variables
        Pearson correlation      Scalar variables
        Regression               High-dimensional outputs
        Plug in your own!        ???

      • Greedy strategy (some bugs could be missed); a sketch of the search follows below
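Below is a heavily simplified sketch of that greedy search — a reconstruction of the idea, not FairTest's implementation (which grows a full tree partition and validates findings on held-out data). It assumes a pandas DataFrame and any association metric with a two-label-array interface, such as sklearn's normalized_mutual_info_score.

```python
# Greedy, metric-agnostic sketch of association-guided context discovery
# (simplified reconstruction; FairTest itself builds a full decision tree
# and statistically validates findings on held-out test data).
import pandas as pd
from sklearn.metrics import normalized_mutual_info_score

def find_contexts(df, features, protected, output, metric,
                  max_depth=3, min_size=100, ctx=()):
    """Repeatedly restrict to the feature value whose subpopulation has the
    strongest protected-vs-output association; record each context's score."""
    found = [(dict(ctx), len(df), metric(df[protected], df[output]))]
    if max_depth == 0:
        return found
    best = None
    for f in features:
        for v in df[f].unique():
            sub = df[df[f] == v]
            if len(sub) >= min_size:
                score = metric(sub[protected], sub[output])
                if best is None or score > best[0]:
                    best = (score, f, v, sub)
    if best is not None:
        _, f, v, sub = best
        rest = [g for g in features if g != f]
        found += find_contexts(sub, rest, protected, output, metric,
                               max_depth - 1, min_size, ctx + ((f, v),))
    return sorted(found, key=lambda t: -t[2])   # rank by association strength

# Hypothetical usage (column names assumed):
# find_contexts(df, ["state", "race", "gender"], "income", "price",
#               normalized_mutual_info_score)
```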

  12. Example: healthcare application. A predictor of whether a patient will visit the hospital again in the next year (from the winner of the 2012 Heritage Health Prize competition).
      [Diagram: age, gender, # emergencies, … feed a hospital re-admission predictor that outputs whether the patient will be re-admitted.]
      FairTest findings: strong association between age and prediction error rate. The association may translate into quantifiable harms (e.g., if the model is used to adjust insurance premiums).
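To illustrate the scalar-variable case from the metric table above (age vs. error rate), here is a toy test on synthetic data; the data generation merely mimics the shape of the reported finding and has nothing to do with the actual Heritage Health data.

```python
# Toy illustration of the scalar-variable case (synthetic data, NOT the
# Heritage Health dataset): test whether prediction error grows with age.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
age = rng.integers(18, 95, size=5000)
# Simulate a per-patient error indicator whose rate drifts upward with age.
error = (rng.random(5000) < 0.05 + 0.002 * (age - 18)).astype(float)
r, p_value = pearsonr(age, error)                # scalar protected var vs. utility
print(f"correlation={r:.3f}, p={p_value:.2e}")   # positive and significant
```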

  13. Debugging with FairTest
      Are there confounding factors? Do associations disappear after conditioning? ⇒ Adaptive data analysis!
      Example: the healthcare application (again)
      • Estimate prediction confidence (target variance)
      • Does this explain the predictor’s behavior? Yes, partially. [Figure label: “high confidence in prediction”]
      FairTest helps developers understand & evaluate potential association bugs.
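A simplified sketch of what such conditioning can look like (hypothetical helper, not FairTest's adaptive procedure): stratify on the explanatory variable and test the protected-vs-output association within each stratum. If the global p-value is small but the per-stratum ones are not, the explanatory variable accounts for much of the effect.

```python
# Simplified sketch of conditioning on an explanatory variable (hypothetical
# helper): run the association test separately within each stratum of the
# explanatory variable instead of globally.
import pandas as pd
from scipy.stats import chi2_contingency

def conditioned_tests(df, protected, output, explanatory):
    """Per-stratum chi-squared tests: {stratum: (size, p_value)}."""
    results = {}
    for level, stratum in df.groupby(explanatory):
        table = pd.crosstab(stratum[protected], stratum[output])
        if min(table.shape) > 1:                 # need at least a 2x2 table
            _, p_value, _, _ = chi2_contingency(table)
            results[level] = (len(stratum), p_value)
    return results

# Hypothetical usage (column names assumed):
# conditioned_tests(patients, "age_group", "prediction_error",
#                   "prediction_confidence")
```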

  14. Other applications studied using FairTest
      • Image tagger based on ImageNet data
        ⇒ Large output space (~1000 labels)
        ⇒ FairTest automatically switches to regression metrics
        ⇒ The tagger has a higher error rate for pictures of black people
      • Simple movie recommender system
        ⇒ Men are assigned movies with lower ratings than women
        ⇒ Use personal preferences as an explanatory factor
        ⇒ FairTest then finds no significant bias anymore

  15. Closing remarks
      The Unwarranted Associations Framework:
      • Captures a broader set of algorithmic biases than prior work
      • A principled approach for statistically valid investigations
      FairTest:
      • The first end-to-end system for evaluating algorithmic fairness
      Developers need better statistical training and tools to make better statistical decisions and applications.
      http://arxiv.org/abs/1510.02377

  16. Example: Berkeley graduate admissions. Admission into UC Berkeley graduate programs (Bickel, Hammel, and O’Connell, 1975).
      [Diagram: age, gender, GPA, … feed graduate admissions committees, which decide whether to admit the applicant.]
      Bickel et al.’s (and also FairTest’s) findings: gender bias in admissions at the university level, but mostly gone after conditioning on department.
      FairTest helps developers understand & evaluate potential association bugs.
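The conditioning step is easy to reproduce on this example with plain chi-squared tests rather than FairTest itself; the sketch below uses the published six-department aggregate counts (the same numbers as R's UCBAdmissions dataset).

```python
# Simpson's paradox in the Berkeley data: pooled counts show a strong gender
# association, but per-department tests mostly do not. Counts are the
# published six-department aggregates (rows: admitted/rejected; columns:
# men/women), as in R's UCBAdmissions dataset.
import numpy as np
from scipy.stats import chi2_contingency

depts = {
    "A": [[512, 89], [313, 19]],
    "B": [[353, 17], [207, 8]],
    "C": [[120, 202], [205, 391]],
    "D": [[138, 131], [279, 244]],
    "E": [[53, 94], [138, 299]],
    "F": [[22, 24], [351, 317]],
}
pooled = sum(np.array(t) for t in depts.values())
_, p_pooled, _, _ = chi2_contingency(pooled)
print("pooled:", p_pooled)      # tiny p-value: strong apparent gender bias
for name, t in depts.items():
    _, p, _, _ = chi2_contingency(np.array(t))
    # Conditioned on department, the association largely disappears
    # (department A is the exception, and it favors women).
    print(name, p)
```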
