Fairness-Aware Learning for Continuous Attributes and Treatments


  1. Fairness-Aware Learning for Continuous Attributes and Treatments. Jérémie Mary, Criteo AI Lab; Clément Calauzènes, Criteo AI Lab; Noureddine El Karoui, Criteo AI Lab and UC Berkeley. ICML 2019, Long Beach, CA.

  2. Fairness and independence

     Setup: build a prediction Ŷ of a variable Y (e.g. payment default) based on available
     information X (credit card history); the prediction may be biased/unfair w.r.t. a
     sensitive attribute Z (gender). Most fairness work is restricted to binary values
     of Y and Z.

     Generalizations using independence notions:
     - Disparate impact / demographic parity: DI = P(Ŷ = 1 | Z = 1) / P(Ŷ = 1 | Z = 0).
       Demographic parity generalizes to Ŷ ⊥⊥ Z, even when Z is non-binary.
     - Equal opportunity: DEO = P(Ŷ = 1 | Z = 1, Y = 1) − P(Ŷ = 1 | Z = 0, Y = 1).
       Equalized odds (EO) generalizes to Ŷ ⊥⊥ Z | Y, even when Z is non-binary.

     We propose new metrics that also easily generalize to continuous variables.
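For binary Ŷ, Z, Y these two metrics are straightforward to compute from samples; a minimal sketch of the definitions above (the toy data and function names are ours, not the paper's code):

```python
import numpy as np

def disparate_impact(y_hat, z):
    """DI = P(Yhat=1 | Z=1) / P(Yhat=1 | Z=0) for binary 0/1 arrays."""
    return y_hat[z == 1].mean() / y_hat[z == 0].mean()

def deo(y_hat, z, y):
    """DEO = P(Yhat=1 | Z=1, Y=1) - P(Yhat=1 | Z=0, Y=1)."""
    pos = y == 1
    return y_hat[pos & (z == 1)].mean() - y_hat[pos & (z == 0)].mean()

y_hat = np.array([1, 0, 1, 1, 0, 1, 0, 0])
z     = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y     = np.array([1, 1, 0, 1, 1, 1, 0, 0])

print(disparate_impact(y_hat, z))   # 3.0
print(round(deo(y_hat, z, y), 3))   # 0.167
```

DI = 1 and DEO = 0 correspond to a prediction that is fair under the respective criterion; the independence formulations above are what carries over when Z (or Y) is not binary.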

  3. HGR: measuring independence

     Definition (Hirschfeld-Gebelein-Rényi Maximum Correlation Coefficient).
     Given two random variables U ∈ 𝒰 and V ∈ 𝒱,

         hgr(U, V) ≜ sup_{f,g} ρ(f(U), g(V)),                                  (1)

     where ρ is Pearson's correlation and the supremum is over f, g such that
     E[f²(U)] < ∞ and E[g²(V)] < ∞.

     Properties: 0 ≤ HGR(U, V) ≤ 1, and HGR(U, V) = 0 iff U and V are independent.
     If f, g are restricted to linear functions, we recover CCA; this connection is
     exploited in RDC [8] with CCA in an RKHS.
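Two sanity checks on the linear special case: for scalar U, V, restricting f, g to affine maps cannot beat |Pearson's ρ|, since correlation is invariant under affine maps with nonzero slope; and for a jointly Gaussian pair, Gebelein's classical result says the full supremum is attained there too, so HGR(U, V) = |ρ|. A minimal sketch (the construction and sample size are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Jointly Gaussian pair with population correlation 0.6.
u = rng.normal(size=100_000)
v = 0.6 * u + 0.8 * rng.normal(size=100_000)

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

base = abs(pearson(u, v))

# Affine maps f(u) = a*u + c, g(v) = b*v + d leave |rho| unchanged,
# so the "linear HGR" (1-D CCA) is just |Pearson's rho|.
for a, b in [(2.0, -1.0), (-0.5, 3.0)]:
    assert np.isclose(abs(pearson(a * u + 1.0, b * v - 2.0)), base)

print(round(base, 3))  # close to the population value 0.6
```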

  4. Information theory and relaxation

     Theorem (Witsenhausen '75). Suppose U and V are discrete, with joint distribution
     π(u, v) and marginals π_U, π_V. Define the matrix

         Q(u, v) = π(u, v) / √(π_U(u) π_V(v)).

     Then hgr(U, V) = σ₂(Q), where σ₂ is the 2nd largest singular value.

     This extends naturally to continuous variables (replace sums by integrals), and
     gives an upper bound on HGR by the χ²-divergence.
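The discrete statement is easy to check numerically: build Q from a joint pmf and read off the second singular value (a minimal sketch; `hgr_discrete` is our name, not from the paper):

```python
import numpy as np

def hgr_discrete(joint):
    """HGR maximal correlation of discrete U, V via Witsenhausen '75:
    second-largest singular value of Q(u,v) = pi(u,v)/sqrt(pi_U(u) pi_V(v))."""
    joint = np.asarray(joint, dtype=float)
    pu = joint.sum(axis=1)                      # marginal of U
    pv = joint.sum(axis=0)                      # marginal of V
    q = joint / np.sqrt(np.outer(pu, pv))
    s = np.linalg.svd(q, compute_uv=False)      # singular values, descending
    return s[1]                                 # s[0] is always 1 (constant functions)

indep = np.outer([0.3, 0.7], [0.5, 0.5])        # independent -> HGR = 0
dep = np.array([[0.5, 0.0], [0.0, 0.5]])        # U = V -> HGR = 1
print(round(hgr_discrete(indep), 6))  # 0.0
print(round(hgr_discrete(dep), 6))    # 1.0
```

The top singular value of Q is always 1 (it corresponds to the constant functions f, g), which is why σ₂ rather than σ₁ is the relevant quantity.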

  5. Fairness-aware learning: Equalized Odds (EO)

     Given an expected loss L, a function class H and a fairness tolerance ε > 0, solve:

         argmin_{h ∈ H} L(h, X, Y)   subject to   HGR|∞ ≜ ‖ HGR(Ŷ | Y = y, Z | Y = y) ‖∞ ≤ ε.

     Practicals: relax the constraint HGR|∞ ≤ ε to get a tractable penalty. With

         χ²|₁ = ‖ χ²( π̂(ŷ, z | y), π̂(ŷ | y) ⊗ π̂(z | y) ) ‖₁,

     this yields

         argmin_{h ∈ H} L(h, X, Y) + λ χ²|₁.

     Related work: [2], [5], [9], [4], [1], [3], [6], [11], [7, 10].
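A minimal plug-in sketch of the χ²|₁ penalty for discrete Ŷ, Z, Y, using empirical histograms: per stratum Y = y, compare the joint of (Ŷ, Z) with the product of its marginals via the χ²-divergence, then sum over strata. The paper's training-time estimator must be differentiable and mini-batch friendly, so treat this as illustrative only (names and data are ours):

```python
import numpy as np

def chi2_div(joint, eps=1e-12):
    """chi^2 divergence between a joint pmf and the product of its marginals."""
    pu = joint.sum(axis=1, keepdims=True)
    pv = joint.sum(axis=0, keepdims=True)
    prod = pu * pv
    return ((joint - prod) ** 2 / (prod + eps)).sum()

def chi2_l1_penalty(y_hat, z, y):
    """chi^2|_1: sum over strata Y=y of the chi^2 divergence between the
    empirical joint of (Yhat, Z) given Y=y and the product of its marginals."""
    cats_yh, cats_z = np.unique(y_hat), np.unique(z)
    total = 0.0
    for y_val in np.unique(y):
        m = (y == y_val)
        yh, zz = y_hat[m], z[m]
        joint = np.zeros((len(cats_yh), len(cats_z)))
        for i, a in enumerate(cats_yh):
            for j, b in enumerate(cats_z):
                joint[i, j] = np.mean((yh == a) & (zz == b))
        total += chi2_div(joint)
    return total

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
z = rng.integers(0, 2, 5000)
fair = rng.integers(0, 2, 5000)   # prediction independent of z
unfair = z.copy()                 # prediction equal to z
print(chi2_l1_penalty(fair, z, y) < chi2_l1_penalty(unfair, z, y))  # True
```

The penalty is (near) zero when Ŷ ⊥⊥ Z within each stratum of Y, and grows with conditional dependence, which is exactly what the relaxed EO objective adds to the loss.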

  6. Y and Z binary-valued: comparison with previous work

     Reproduce and compare the experiments of Donini et al. '18 [3]. Goal: use our
     proposal with a neural network to train a classifier such that a binary sensitive
     Z does not unfairly influence an outcome Ŷ, i.e. maintain good accuracy while
     achieving a smaller DEO.

     Method      Arrhythmia     COMPAS        Adult     German        Drug
                 ACC    DEO     ACC   DEO     ACC DEO   ACC   DEO     ACC   DEO
     Naïve SVM   75±4   11±3    72±1  14±2    80  9     74±5  12±5    81±2  22±4
     SVM         71±5   10±3    73±1  11±2    79  8     74±3  10±6    81±2  22±3
     FERM        75±5   5±2     96±1  9±2     77  1     73±4  5±3     79±3  10±5
     NN          74±7   19±14   97±0  1±0     84  14    74±4  47±19   79±3  15±16
     NN + χ²     75±6   15±9    96±0  0±0     83  3     73±3  25±14   78±5  0±0

     (all values in %)

     Results comparable to the state of the art; smaller datasets are difficult for
     our proposal (NN effect).

  7. Continuous Case: Criminality Rates

     Dataset: UCI Communities and Crime. 2 sets of experiments, 3 fairness penalties:
     - Linear regression (LR), trained on full batches of data
     - Deep neural nets (DNN), trained with mini-batches (n = 200; Adam as optimizer)
     - Penalties on Ŷ | Z, Y: the baseline L₂, KL|₁, and χ²|₁; the regularization
       parameter λ varies from 2⁻⁴ to 2⁶.

     Figure: equalized-odds trade-offs (Fairness HGR∞ vs. Predictive Error MSE), one
     plot for the linear models and one for the DNNs. (Regression: for KL|₁ and L₂,
     some points fall out of the graph to the right.)

     We find:
     - DNN improves fairness at a lower price than linear models in terms of MSE.
     - It is important that the fairness penalty be compatible with DNNs: χ²|₁ and
       KL|₁ work smoothly with mini-batched stochastic optimization, in contrast with
       the baseline L₂ penalty, which suffers from mini-batching.
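To illustrate penalized regression training in this spirit without reproducing the paper's estimator, here is a toy stand-in akin to the L₂-style baseline: linear regression with a squared-covariance decorrelation penalty λ·cov(Ŷ, Z)², fitted by full-batch gradient descent on synthetic data. Everything here (data, λ values, step size) is our own illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
z = rng.normal(size=n)                                # continuous sensitive attribute
x = np.column_stack([z + 0.5 * rng.normal(size=n),    # feature 1 leaks z
                     rng.normal(size=n)])             # feature 2 independent of z
y = x @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=n)

def fit(lam, steps=2000, lr=0.05):
    """Minimize MSE(Xw, y) + lam * cov(Xw, z)^2 by gradient descent."""
    a = x.T @ z / n - x.mean(axis=0) * z.mean()       # d cov(pred, z) / dw
    w = np.zeros(2)
    for _ in range(steps):
        pred = x @ w
        grad = 2 * x.T @ (pred - y) / n               # MSE gradient
        cov = pred @ z / n - pred.mean() * z.mean()
        grad += 2 * lam * cov * a                     # penalty gradient
        w -= lr * grad
    return w

for lam in (0.0, 10.0):
    pred = x @ fit(lam)
    print(lam, round(abs(np.corrcoef(pred, z)[0, 1]), 2))
# the penalty drives |corr(pred, z)| down (here roughly 0.7 -> 0.1),
# at the cost of shrinking the weight on the z-leaking feature
```

Unlike this covariance penalty, which only suppresses linear dependence, the χ²|₁ and KL|₁ penalties above target full statistical dependence, and (per the slide) remain well behaved under mini-batched stochastic optimization.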
