Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing - PowerPoint PPT Presentation



SLIDE 1

Is There a Trade-Off Between Fairness and Accuracy?
A Perspective Using Mismatched Hypothesis Testing

Sanghamitra Dutta (sanghamd@andrew.cmu.edu), Dennis Wei (dwei@us.ibm.com), Hazar Yueksel (hazar.yueksel@ibm.com), Pin-Yu Chen (pin-yu.chen@ibm.com), Sijia Liu (sijia.liu@ibm.com), Kush Varshney (krvarshn@us.ibm.com)

SLIDE 2

Motivational Example

Construct Space (Y^c, Z^c) β†’ Noisy Mapping β†’ Observed Space (Y, Z)

Y: exam score; Z: data label (0 or 1); a: protected attribute (gender, race, etc.); Y^c: true ability; Z^c: true label

In the construct space there is no trade-off between accuracy and fairness: the Bayes optimal classifier achieves fairness (equal opportunity). The accuracy-fairness trade-off in the observed space arises because the mapping is noisier for one group, making the 0 and 1 labels "less separable."

Setup inspired by [Friedler et al. '16] and [Yeom et al. '18]; definition of equal opportunity from [Hardt et al. '16].

SLIDE 3

Main Contributions

Concept of Separability
Chernoff information: an approximation to the best error exponent in binary hypothesis testing.
  • Explains the trade-off (Theorem 1)
  • Computes fundamental limits
[Plot: accuracy vs. discrimination, showing the trade-off on existing data and the trade-off after data collection.] Accuracy with respect to the observed dataset is a problematic measure of performance.

Ideal Distributions, where accuracy and fairness are in accord
  • Proof of existence, with analytical forms (Theorem 2)
  • Interpretation: plausible distributions in the observed space, or distributions in the construct space

Alleviating the Trade-off in the Real World
Gather knowledge from active data collection, often improving separability.
  • Criterion to alleviate the trade-off (Theorem 3)
  • Computation of the alleviated trade-off
These results also explain why active fairness works.

SLIDE 4

Related Works

  • Characterizing the Accuracy-Fairness Trade-Off: [Menon & Williamson '18] [Garg et al. '19] [Chen et al. '18] [Zhao & Gordon '19]
  • Empirical Datasets for Accuracy Evaluation: [Wick et al. '19] [Sharma et al. '19]
  • Pre-processing Datasets for Fairness: [Calmon et al. '18] [Feldman et al. '15] [Zemel et al. '13]
  • Explainability / Active Fairness: [Varshney et al. '18] [Noriega-Campero et al. '19]

Exponent Analysis with Geometric Interpretability

SLIDE 5

Preliminaries

Construct Space (Y^c, Z^c) β†’ Noisy Mapping β†’ Observed Space (Y, Z), where Z = Z^c and Y = g_{Z,a}(Y^c).

For group a = 0: Y|_{Z=0, a=0} ~ Q_0(y) and Y|_{Z=1, a=0} ~ Q_1(y)
For group a = 1: Y|_{Z=0, a=1} ~ R_0(y) and Y|_{Z=1, a=1} ~ R_1(y)

Likelihood-ratio classifiers: U_0(y) = log(Q_1(y)/Q_0(y)) β‰₯ Ξ½_0 and U_1(y) = log(R_1(y)/R_0(y)) β‰₯ Ξ½_1.

  • Probability of False Negative (FN): Q_{FN,a}(ν_a) = Pr(U_a(y) < ν_a | Z = 1, A = a), the wrongful rejection of a true positive (true Z = 1)
  • Probability of False Positive (FP): Q_{FP,a}(ν_a) = Pr(U_a(y) ≥ ν_a | Z = 0, A = a), the wrongful acceptance of a true negative (true Z = 0)
  • Probability of error: Q_{e,a}(ν) = ρ_0 Q_{FP,a}(ν) + ρ_1 Q_{FN,a}(ν), with prior probabilities ρ_0 = ρ_1 = 1/2

EQUAL OPPORTUNITY β†’ EQUAL probability of FN
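As a concrete sketch of these definitions, the snippet below evaluates Q_FN, Q_FP, and Q_e in closed form for the Gaussian example used later in the deck (Q_0 = N(1,1), Q_1 = N(4,1)); the helper names are mine, not from the talk.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Running example for group a = 0: Q_0 = N(1,1) under Z = 0, Q_1 = N(4,1) under Z = 1.
mu0, mu1, sigma = 1.0, 4.0, 1.0

def error_probs(nu):
    """FN, FP, and overall error probabilities of the test U_0(y) >= nu, where
    U_0(y) = log Q_1(y)/Q_0(y) = (mu1 - mu0)/sigma^2 * (y - (mu0 + mu1)/2),
    so U_0(y) >= nu is equivalent to y >= y_thresh."""
    y_thresh = (mu0 + mu1) / 2 + nu * sigma**2 / (mu1 - mu0)
    q_fn = phi((y_thresh - mu1) / sigma)        # Pr(U_0 < nu | Z = 1): wrongful reject
    q_fp = 1.0 - phi((y_thresh - mu0) / sigma)  # Pr(U_0 >= nu | Z = 0): wrongful accept
    q_e = 0.5 * q_fp + 0.5 * q_fn               # equal priors rho_0 = rho_1 = 1/2
    return q_fn, q_fp, q_e

# Bayes optimal threshold under equal priors: nu = 0, i.e. accept when y >= 2.5.
q_fn, q_fp, q_e = error_probs(0.0)
```

At nu = 0 the threshold sits midway between the means, so the FN and FP probabilities coincide, consistent with the symmetric Bayes optimal operating point.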

SLIDE 6

Quick Background on Chernoff Error Exponents

Q_{FN,a}(Ξ½_a) ≲ e^{βˆ’F_{FN,a}(Ξ½_a)} and Q_{FP,a}(Ξ½_a) ≲ e^{βˆ’F_{FP,a}(Ξ½_a)}

are the Chernoff exponents of the probabilities of FN and FP (larger exponent β†’ lower error). Since Q_{e,a}(Ξ½) = (1/2) Q_{FN,a}(Ξ½) + (1/2) Q_{FP,a}(Ξ½), we define the Chernoff exponent of the overall error probability as

F_{e,a}(Ξ½_a) = min{F_{FP,a}(Ξ½_a), F_{FN,a}(Ξ½_a)}

Lemma: The Chernoff exponent of the error probability of the Bayes optimal classifier between distributions Q_0(y) under Z = 0 and Q_1(y) under Z = 1 is the Chernoff information

D(Q_0, Q_1) = βˆ’log min_{λ∈[0,1]} Ξ£_y Q_0(y)^Ξ» Q_1(y)^{1βˆ’Ξ»}

(Larger exponent β†’ lower error β†’ higher accuracy.)

[Cover & Thomas]
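The Chernoff information in the lemma can be checked numerically; the sketch below (grid search over Ξ», Riemann sum over y, function names assumed) uses the deck's unit-variance Gaussian example, where the closed form is (ΞΌ_1 βˆ’ ΞΌ_0)Β²/8 = 9/8.

```python
import numpy as np

def chernoff_information(p0, p1, ys):
    """D(Q_0, Q_1) = -log min over lambda in [0,1] of the integral of
    p0(y)^lambda * p1(y)^(1-lambda) dy (Riemann sum in y, grid search in lambda)."""
    dy = ys[1] - ys[0]
    lams = np.linspace(0.0, 1.0, 1001)
    vals = [float(np.sum(p0**lam * p1**(1.0 - lam)) * dy) for lam in lams]
    return -np.log(min(vals))

ys = np.linspace(-10.0, 15.0, 20001)
gauss = lambda mu: np.exp(-(ys - mu) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

# Deck example: Q_0 = N(1,1), Q_1 = N(4,1); closed form D = (4 - 1)^2 / 8 = 9/8.
D = chernoff_information(gauss(1.0), gauss(4.0), ys)
```

For equal-variance Gaussians the minimizing Ξ» is 1/2, so the grid search lands at the Bhattacharyya point.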

SLIDE 7

Our Proposition: Concept of Separability

  • Definition of Separability: For a group of people with data distributions Q_0(y) and Q_1(y) under hypotheses Z = 0 and Z = 1, we define their separability as the Chernoff information D(Q_0, Q_1).

Geometric interpretability makes these exponents tractable.

SLIDE 8

Geometric understanding of the results

For group a = 0: Q_0(y) ~ N(1,1) and Q_1(y) ~ N(4,1), with classifier U_0(y) β‰₯ Ξ½_0.

Log-generating functions of U_0 under the two hypotheses:
Ξ›_0(v) = log E[e^{v U_0} | Z = 0, a = 0] = (9/2) v(v βˆ’ 1)
Ξ›_1(v) = log E[e^{v U_0} | Z = 1, a = 0] = (9/2) v(v + 1)

Error exponents as Legendre transforms (a tangent with slope Ξ½_0):
F_{FP,a=0}(Ξ½_0) = sup_{v β‰₯ 0} (v Ξ½_0 βˆ’ Ξ›_0(v))   (EFP)
F_{FN,a=0}(Ξ½_0) = sup_{v ≀ 0} (v Ξ½_0 βˆ’ Ξ›_1(v))   (EFN)
F_{e,a=0}(Ξ½_0) = min{F_{FP,a=0}(Ξ½_0), F_{FN,a=0}(Ξ½_0)}

[Plots: the densities Q_0(y) and Q_1(y) with means ΞΌ = 1 and ΞΌ = 4, and the log-generating functions Ξ›_0(u) and Ξ›_1(u) with the tangent of slope Ξ½_0; axis ticks omitted.]
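The Legendre-transform picture can be reproduced directly; a sketch assuming the Ξ›'s of the example (Q_0 = N(1,1), Q_1 = N(4,1)), computing each sup on a grid:

```python
import numpy as np

# Log-generating functions of U_0 for Q_0 = N(1,1), Q_1 = N(4,1):
# under Z = 0, U_0 ~ N(-9/2, 9); under Z = 1, U_0 ~ N(9/2, 9).
Lam0 = lambda v: 4.5 * v * (v - 1.0)
Lam1 = lambda v: 4.5 * v * (v + 1.0)

_pos = np.linspace(0.0, 5.0, 100001)   # grid for v >= 0
_neg = np.linspace(-5.0, 0.0, 100001)  # grid for v <= 0

def F_FP(nu):
    """Legendre transform sup_{v >= 0} (v*nu - Lam0(v)): tangent of slope nu to Lam0."""
    return float(np.max(_pos * nu - Lam0(_pos)))

def F_FN(nu):
    """Legendre transform sup_{v <= 0} (v*nu - Lam1(v)): tangent of slope nu to Lam1."""
    return float(np.max(_neg * nu - Lam1(_neg)))

def F_e(nu):
    return min(F_FP(nu), F_FN(nu))

# Closed forms here: F_FP(nu) = (nu + 4.5)^2/18 and F_FN(nu) = (nu - 4.5)^2/18,
# which meet at the Bayes optimal threshold nu = 0 with value D(Q_0, Q_1) = 9/8.
```

At nu = 0 the two tangent intercepts coincide, which is exactly the crossing point shown on the Ξ› plot in the following slides.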

SLIDE 9

Geometric understanding of the results

(This build slide repeats the content of Slide 8.)

SLIDE 10

Geometric understanding of the results

(Same setup as Slide 8: for group a = 0, Q_0(y) ~ N(1,1) and Q_1(y) ~ N(4,1), classifier U_0(y) β‰₯ Ξ½_0, with log-generating functions Ξ›_0(v) = (9/2)v(v βˆ’ 1) and Ξ›_1(v) = (9/2)v(v + 1).)

At the Bayes optimal threshold the two error exponents coincide with the Chernoff information, F_FN = F_FP = D(Q_0, Q_1), marked as C(P_0, P_1) on the plot of Ξ›_0(u) and Ξ›_1(u).

SLIDE 11

Geometric understanding of the results

(This build slide repeats the content of Slide 10.)

SLIDE 12

The accuracy-fairness trade-off is due to a difference in separability between the two groups.

Theorem 1 (informal): One of the following holds in the observed space:

  • Unbiased mappings, D(Q_0, Q_1) = D(R_0, R_1): The Bayes optimal classifiers for the two groups also satisfy equal opportunity, i.e., F_{FN,a=0}(ν_0) = F_{FN,a=1}(ν_1).
  • Biased mappings, D(Q_0, Q_1) < D(R_0, R_1): Given two classifiers (one for each group) that satisfy equal opportunity, for at least one of the groups the classifier is not Bayes optimal, i.e., either F_{e,a=0}(ν_0) < D(Q_0, Q_1) or F_{e,a=1}(ν_1) < D(R_0, R_1), or both.
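A quick numeric illustration of the biased-mappings case, using closed-form exponents that follow from the deck's running example (my derivation, not shown on the slide): R_0 = N(0,1) and R_1 = N(4,1) give D(R_0, R_1) = 2 > D(Q_0, Q_1) = 9/8, and equalizing the FN exponents while keeping group a = 1 Bayes optimal drives group a = 0's overall exponent below its separability.

```python
from math import sqrt

# Closed-form exponents for the deck's Gaussian example (unit variance), derived
# from the Legendre transforms of the log-generating functions:
#   group a = 0: Q_0 = N(1,1), Q_1 = N(4,1)  ->  D(Q_0, Q_1) = 3**2 / 8 = 1.125
#   group a = 1: R_0 = N(0,1), R_1 = N(4,1)  ->  D(R_0, R_1) = 4**2 / 8 = 2.0
F_FN0 = lambda nu: (nu - 4.5) ** 2 / 18.0  # FN exponent, group a = 0
F_FP0 = lambda nu: (nu + 4.5) ** 2 / 18.0  # FP exponent, group a = 0
F_FN1 = lambda nu: (nu - 8.0) ** 2 / 32.0  # FN exponent, group a = 1
D_Q, D_R = 9.0 / 8.0, 2.0

# Keep group a = 1 at its Bayes optimal threshold nu_1 = 0: FN exponent = D(R_0, R_1).
target = F_FN1(0.0)

# Equal opportunity: choose nu_0 with F_FN0(nu_0) = target (branch nu_0 <= 4.5).
nu0 = 4.5 - sqrt(18.0 * target)

# Overall exponent for group a = 0 now falls below its separability D(Q_0, Q_1).
F_e0 = min(F_FN0(nu0), F_FP0(nu0))
```

Here nu_0 = -1.5 and F_e0 = 0.5 < 9/8: matching the less separable group's FN exponent to the privileged group's costs overall accuracy, as Theorem 1 predicts.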

SLIDE 13

Geometric understanding of the results

For group a = 0: Q_0(y) ~ N(1,1) and Q_1(y) ~ N(4,1), with classifier U_0(y) β‰₯ Ξ½_0.
For group a = 1: R_0(y) ~ N(0,1) and R_1(y) ~ N(4,1), with classifier U_1(y) β‰₯ Ξ½_1.

[Plots: densities with means ΞΌ = 1, 4 (group a = 0) and ΞΌ = 0, 4 (group a = 1); log-generating functions with the Chernoff informations C(Q_0, Q_1) and C(P_0, P_1) marked.]

For group a = 0, we have F_FP = F_FN = D(Q_0, Q_1).
For group a = 1, we have F_FP = F_FN = D(R_0, R_1).

The Bayes optimal classifiers do not satisfy equal opportunity (unequal F_FN).

SLIDE 14

Geometric understanding of the results

Avoid active harm to the privileged group?

For group a = 0: Q_0(y) ~ N(1,1) and Q_1(y) ~ N(4,1), with classifier U_0(y) β‰₯ Ξ½_0.
For group a = 1: R_0(y) ~ N(0,1) and R_1(y) ~ N(4,1), with classifier U_1(y) β‰₯ Ξ½_1.

[Plots: densities for the two groups and log-generating functions with the tangents shifted to equalize the FN exponents; axis ticks omitted.]

F_{FN,a=0}(Ξ½_0) = F_{FN,a=1}(Ξ½_1): equal opportunity (equal F_FN) is satisfied, but the classifier is sub-optimal for the privileged group a = 1.

SLIDE 15

Geometric understanding of the results

For group a = 0: Q_0(y) ~ N(1,1) and Q_1(y) ~ N(4,1), with classifier U_0(y) β‰₯ Ξ½_0.
For group a = 1: R_0(y) ~ N(0,1) and R_1(y) ~ N(4,1), with classifier U_1(y) β‰₯ Ξ½_1.

[Plots: densities for the two groups and log-generating functions with the tangents shifted to equalize the FN exponents; axis ticks omitted.]

F_{FN,a=0}(Ξ½_0) = F_{FN,a=1}(Ξ½_1): equal opportunity (equal F_FN) is satisfied, but the classifier is sub-optimal for the unprivileged group a = 0. For at least one of the groups, accuracy on the given data is compromised for fairness.

SLIDE 16

Ideal distributions where accuracy and fairness are in accord

Theorem 2 (informal): Fix the Bayes optimal classifier for the privileged group a = 1. Then, for group a = 0, there exist ideal distributions QΜƒ_0(y) and QΜƒ_1(y) such that:

  • Fairness on given data: The Bayes optimal classifier for the new distributions is fair on the given data (in fact, it is the same classifier U_0^*(y) ≥ ν_0^* that was sub-optimal but fair on the given data).
  • Fairness and optimal accuracy on ideal data: On the ideal data, this Bayes optimal classifier also has F_e = D(Q̃_0, Q̃_1) = D(R_0, R_1).

Proof of existence of ideal distributions (with analytical forms).

SLIDE 17

How to go about finding such ideal distributions?

[The analytical forms of the ideal distributions QΜƒ_0(y) and QΜƒ_1(y) are given on the slide,] where UΜƒ_0(y) = log(QΜƒ_1(y)/QΜƒ_0(y)) β‰₯ 0 is the Bayes optimal classifier for the ideal distributions.

SLIDE 18

How to interpret these ideal distributions?

Construct Space (Y^c, Z^c) β†’ Biased Noisy Mapping (Z = Z^c, Y = g_{Z,a}(Y^c)) β†’ Observed Space (Y, Z)
Construct Space (Y^c, Z^c) β†’ Unbiased Mapping (ZΜƒ = Z^c, YΜƒ = gΜƒ_{Z,a}(Y^c)) β†’ (YΜƒ, ZΜƒ)

Plausible distributions in the observed space under unbiased mappings, or candidate distributions in the construct space under identity mappings.

For group a = 0: YΜƒ|_{Z=0, a=0} ~ QΜƒ_0(y) and YΜƒ|_{Z=1, a=0} ~ QΜƒ_1(y)
For group a = 1: YΜƒ|_{Z=0, a=1} ~ R_0(y) and YΜƒ|_{Z=1, a=1} ~ R_1(y)

SLIDE 19

When does active data collection alleviate the accuracy-fairness trade-off in the real world?

Y': a new feature collected for group a = 0, with
(Y, Y')|_{Z=0, a=0} ~ X_0(y, y') and (Y, Y')|_{Z=1, a=0} ~ X_1(y, y')

Theorem 3: The separability D(X_0, X_1) is strictly greater than D(Q_0, Q_1) if and only if the conditional mutual information I(Y'; Z | Y, a = 0) > 0.

Improving separability alleviates the accuracy-fairness trade-off in the real world.

SLIDE 20

Numerical example: exact computation of the trade-off

For group a = 0 (existing data): Q_0(y) ~ N(1,1) and Q_1(y) ~ N(4,1)
For group a = 0 (after data collection): X_0(y, y') ~ N((1,1), I_{2Γ—2}) and X_1(y, y') ~ N((4,2), I_{2Γ—2})

[Plot: accuracy F_{e,a}(Ξ½_a) vs. decrease in fairness |F_{FN,a=0}(Ξ½_0) βˆ’ F_{FN,a=1}(Ξ½_1)|, showing the trade-off curve on existing data, ending at C(P_0, P_1), and the trade-off curve after data collection, ending at C(W_0, W_1). Marked operating points: FAIR but SUBOPTIMAL on existing data; FAIR but SUBOPTIMAL on existing data after data collection; BAYES OPTIMAL but UNFAIR on existing data; BAYES OPTIMAL but UNFAIR on existing data after data collection.]
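The separability gain behind this numerical example can be reproduced with the equal-covariance Gaussian formula D = (1/8) Δμᡀ Ξ£β»ΒΉ Δμ (the optimal Ξ» is 1/2 in this case); the helper below is a sketch with assumed names.

```python
import numpy as np

def chernoff_gaussian(mu0, mu1, cov):
    """Chernoff information between N(mu0, cov) and N(mu1, cov). With equal
    covariances the optimal lambda is 1/2, giving (1/8)(mu1-mu0)' cov^{-1} (mu1-mu0)."""
    d = np.asarray(mu1, float) - np.asarray(mu0, float)
    return float(d @ np.linalg.solve(np.asarray(cov, float), d)) / 8.0

# Existing data for group a = 0: Q_0 = N(1,1), Q_1 = N(4,1).
D_before = chernoff_gaussian([1.0], [4.0], [[1.0]])

# After collecting Y': X_0 = N((1,1), I_2), X_1 = N((4,2), I_2).
D_after = chernoff_gaussian([1.0, 1.0], [4.0, 2.0], np.eye(2))

# Separability rises from 9/8 to 10/8, consistent with Theorem 3: the means of Y'
# differ across Z, so Y' is informative about Z given Y.
```

The strict increase (1.25 > 1.125) is exactly the alleviated trade-off plotted on this slide.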

SLIDE 21

Summary

  • Provides new tools that go beyond explaining the accuracy-fairness trade-off
  • Geometric interpretability enables exact quantification of this trade-off
  • Separability and ideal distributions, and their connection to the construct space
  • The criterion for alleviating the trade-off explains the success of active fairness

Thank You!