FAIRNESS RISK MEASURES
Robert C. Williamson Aditya Menon
LOSS FUNCTIONS - OUTCOME CONTINGENT UTILITIES

▸ Wald's abstraction: a loss function
  ℓ : Y × A → ℝ₊ ∪ {+∞} =: ℝ̄
  where Y is the label space and A the action space
▸ ℓ is an outcome contingent utility: a ↦ ℓ(y, a)
▸ Learning goal: expected risk minimisation
  min_{f ∈ ℱ} 𝔼_{(X, Y) ∼ P} ℓ(Y, f(X))
▸ In practice: empirical risk minimisation
  min_{f ∈ ℱ} 𝔼_{(X, Y) ∼ Pₘ} ℓ(Y, f(X)) = min_{f ∈ ℱ} (1/m) ∑ᵢ₌₁ᵐ ℓ(yᵢ, f(xᵢ))
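The empirical risk minimisation step above can be sketched numerically. The data below are hypothetical stand-ins for the slides' running example: ten data points and five candidate hypotheses F1-F5, each assigning a loss in the 50-200 range to each point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical losses: losses[j, i] = loss of hypothesis F(j+1) on data point i,
# mimicking the slides' setup of 5 hypotheses over 10 data points.
losses = rng.uniform(50, 200, size=(5, 10))

# Empirical risk minimisation: the empirical risk of each hypothesis is its
# average loss (1/m) * sum_i loss(y_i, f(x_i)); pick the minimiser.
empirical_risks = losses.mean(axis=1)
best = int(np.argmin(empirical_risks))
print(f"best hypothesis: F{best + 1}, empirical risk {empirical_risks[best]:.1f}")
```

Note that this selection looks only at the overall average, which is exactly what the following slides revisit once sensitive attributes are made visible.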
MINIMISING EMPIRICAL RISK

[Figure: loss (y-axis, 50-200) of each data point (index 1-10) under candidate hypotheses F1-F5; the winner is the hypothesis with the lowest average loss over the sample.]
MINIMISING EMPIRICAL RISK WITH SENSITIVE ATTRIBUTES VISIBLE

[Figure: the same per-datum losses (50-200, data indices 1-10) for hypotheses F1-F5, now with each point's sensitive attribute shown; the best hypothesis is again the one with the lowest average loss.]
MINIMISING EMPIRICAL RISK REINDEXING PER SENSITIVE ATTRIBUTE

[Figure: the same losses for hypotheses F1-F5, with the data reindexed so that points sharing a sensitive attribute are grouped together (sensitive feature index 1, 2, 3); the best hypothesis is still chosen by its average loss.]
MINIMISING AGGREGATED EMPIRICAL RISK

▸ Standard problem: minimise average risk
▸ Equity problem: also take account of the variation across groups
▸ Fairness problem: a mixture of both

[Figure: aggregated loss per sensitive feature index (1-3) for hypotheses F1-F5.]
MINIMISING AGGREGATED EMPIRICAL RISK AND DEVIATION

▸ Trade off low deviation against a higher average
▸ Let S = {1, 2, 3} be the sensitive feature space
▸ For f ∈ ℱ, let 𝖲_f : S → ℝ be a random variable (taking S as the sample space, with a uniform base measure):
  𝖲_f : S ∋ s ↦ 𝔼_{(X, Y)} [ℓ(Y, f(X)) | 𝖳 = s]
  where 𝖳 denotes an example's sensitive attribute
▸ Standard ERM:  min_{f ∈ ℱ} 𝔼(𝖲_f)
▸ Fairness augmented ERM:  min_{f ∈ ℱ} 𝔼(𝖲_f) + 𝒟(𝖲_f) = min_{f ∈ ℱ} ℛ(𝖲_f)

[Figure: per-group losses (20-80, sensitive feature index 1-3) for hypotheses F5 and F6, illustrating the trade-off.]
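The group-conditional risk 𝖲_f and the fairness-augmented objective can be sketched as follows. The per-example losses and sensitive attributes are hypothetical, and the standard deviation stands in as an illustrative deviation measure 𝒟 (the paper's 𝒟 is more general; this choice is only for the sketch).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-example losses and sensitive attributes in S = {1, 2, 3}.
loss = rng.uniform(20, 80, size=200)
sensitive = rng.integers(1, 4, size=200)  # upper bound is exclusive

# S_f: one conditional mean loss per sensitive group (uniform base measure on S).
S_f = np.array([loss[sensitive == s].mean() for s in (1, 2, 3)])

# Standard ERM objective E(S_f) vs fairness-augmented E(S_f) + D(S_f),
# with D = standard deviation across groups as a stand-in deviation measure.
erm_objective = S_f.mean()
fair_objective = S_f.mean() + S_f.std()
print(f"E(S_f) = {erm_objective:.2f}, E(S_f) + D(S_f) = {fair_objective:.2f}")
```

Since any deviation measure is nonnegative, the fairness-augmented objective can only penalise a hypothesis relative to its average risk, never reward it.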
FAIRNESS RISK MEASURES ARE “REGULAR MEASURES OF RISK”

▸ Instead, one can start with axioms for ℛ; the paper lists and justifies them
▸ Then one shows that such fairness risk measures are “regular measures of risk”
  ▸ (in fact they are “coherent measures of risk”)
▸ Such measures can always be written as
  ℛ(𝖲) = 𝔼(𝖲) + 𝒟(𝖲)
  where ℛ is the fairness risk measure and 𝒟 the deviation measure
▸ Here 𝒟 is a “regular measure of deviation”, i.e. convex, positively homogeneous, zero only when 𝖲 is constant, and lower semicontinuous
EXAMPLE RISK MEASURE AND CORRESPONDING DEVIATION MEASURE

  ℛ_{Q,α}(𝖺) = CVaR_α(𝖺)        𝒟_{Q,α}(𝖺) = CVaR_α(𝖺 − 𝔼(𝖺))

▸ CVaR is the “Conditional Value at Risk”
▸ When 𝖺 is a continuous random variable:
  CVaR_α(𝖺) = 𝔼(𝖺 | 𝖺 ≥ q_α(𝖺))
  where q_α(𝖺) is the α-th quantile of 𝖺
▸ Have CVaR₀(𝖺) = 𝔼(𝖺) and CVaR₁(𝖺) = max(𝖺)
▸ The fairness objective becomes (see paper, eq. (26)):
  min_{f ∈ ℱ, ρ ∈ ℝ} { ρ + (1/(1 − α)) · 𝔼[𝖬(f) − ρ]₊ }
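The objective above is the standard variational form of CVaR: minimising ρ + 𝔼[𝖺 − ρ]₊ / (1 − α) over ρ recovers CVaR_α(𝖺). A small numerical sketch, using a hypothetical sample of exponential losses and a simple grid search over ρ (the paper's actual optimisation is joint over f and ρ, not a grid):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.exponential(scale=1.0, size=20_000)  # hypothetical loss sample
alpha = 0.9

# Direct estimate: mean of the losses at or above the alpha-quantile.
q = np.quantile(a, alpha)
cvar_direct = a[a >= q].mean()

# Variational form: CVaR_alpha(a) = min_rho { rho + E[(a - rho)_+] / (1 - alpha) }.
rhos = np.linspace(0.0, a.max(), 500)
obj = np.array([r + np.maximum(a - r, 0.0).mean() / (1 - alpha) for r in rhos])
cvar_variational = obj.min()

print(f"direct: {cvar_direct:.3f}, variational: {cvar_variational:.3f}")
```

The two estimates agree up to sampling and grid error, and the minimising ρ sits at the α-quantile, which is what makes the joint minimisation over (f, ρ) a practical surrogate for CVaR.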
AN INTERESTING LIMITING CASE - EACH PERSON IS THEIR OWN CATEGORY!

▸ Consistent with the principle that the fundamental moral unit is the individual person
▸ Avoids headaches with group boundaries and multiple group membership
▸ Fairness risk measures extend to this case automatically (trivially)

[Figure: per-person losses (50-200) for individuals P1-P22 under hypotheses F1-F5.]
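In this limiting case 𝖲_f is simply the vector of individual losses, so the same machinery applies unchanged. A sketch with hypothetical per-person losses and CVaR as the example risk measure (the discrete tail-mean estimate below is one common convention among several):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-person losses: 22 individuals, each their own "group",
# as in the slides' P1-P22 example.
person_loss = rng.uniform(50, 200, size=22)

def cvar(a, alpha):
    """Mean of the losses at or above the alpha-quantile (discrete estimate)."""
    q = np.quantile(a, alpha)
    return a[a >= q].mean()

print(f"mean loss: {person_loss.mean():.1f}")
print(f"CVaR_0.8 (focus on the worst-off 20% of individuals): "
      f"{cvar(person_loss, 0.8):.1f}")
```

Raising α shifts the objective's attention from the population average towards the worst-off individuals, with α = 0 recovering the mean and α = 1 the maximum loss.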
CONCLUSION

▸ A new and general approach to fairness in ML problems:
  𝖲_f : S ∋ s ↦ 𝔼_{(X, Y)} [ℓ(Y, f(X)) | 𝖳 = s]    and    min_{f ∈ ℱ} ℛ(𝖲_f)
▸ Fairness depends only upon losses, not predictions
▸ Fairness risk measures are symmetric coherent measures of risk
▸ Close connection to measures of inequality (see appendix)
▸ Computationally tractable; related to the SVM! (see paper / poster for experiments)
Humanising Machine Intelligence
Machine Learning Postdoc position available
hmi.anu.edu.au