SLIDE 1

Average Individual Fairness

Aaron Roth

Based on Joint Work with: Michael Kearns and Saeed Sharifi-Malvajerdi

SLIDE 2

[Figure: Population 1 and Population 2 plotted in SAT Score vs. GPA space.]

SLIDE 3

[Figure: the same two populations in SAT Score vs. GPA space.]

SLIDE 4

[Figure: the same two populations in SAT Score vs. GPA space.]

SLIDE 5

Why was the classifier “unfair”?

Question: Who was harmed?
Possible answer: The qualified applicants mistakenly rejected.
False negative rate: The rate at which harm is done.
Fairness: Equal false negative rates across groups?

Statistical fairness definitions [Chouldechova], [Hardt, Price, Srebro], [Kleinberg, Mullainathan, Raghavan]:

  1. Partition the world into groups (often according to a “protected attribute”).
  2. Pick your favorite statistic of a classifier.
  3. Ask that the statistic be (approximately) equalized across groups.
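The three-step recipe is mechanical enough to state in code. Below is a minimal sketch of steps 2 and 3 for the statistic this slide focuses on, the false negative rate; the function names and the tolerance check are illustrative, not from the talk.

```python
import numpy as np

def group_fn_rates(y_true, y_pred, groups):
    """False negative rate per group: among truly positive ("qualified")
    members of each group, the fraction the classifier rejected."""
    rates = {}
    for g in np.unique(groups):
        qualified = (groups == g) & (y_true == 1)
        rates[g] = float((y_pred[qualified] == 0).mean()) if qualified.any() else 0.0
    return rates

def approximately_equalized(rates, tol=0.05):
    """Step 3: the statistic is (approximately) equalized across groups
    when the largest pairwise gap is at most the tolerance."""
    vals = list(rates.values())
    return max(vals) - min(vals) <= tol
```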

SLIDE 6

But…

  • A classifier equalizes false negative rates. What does it promise you?
  • The “rate” in false negative rate assumes you are a uniformly random member of your population.
  • If you have reason to believe otherwise, it promises you nothing…
SLIDE 7

For example

  • Protected subgroups: “Men”, “Women”, “Blue”, “Green”. Labels are independent of attributes.
  • The following allocation equalizes false negative rates across all four groups.

[Figure: the allocation, drawn over individuals grouped as Blue, Green, Male, Female.]
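One concrete allocation with the stated property, as a numeric sketch (the balanced cells and the “everyone is qualified” labeling are illustrative assumptions): rejecting exactly the Blue Males and the Green Females gives all four marginal groups a false negative rate of 1/2, yet a qualified Blue Male is rejected with certainty.

```python
import numpy as np

# Four equally sized cells; labels independent of attributes (all qualified).
colors  = np.array(["Blue", "Blue", "Green", "Green"])
genders = np.array(["Male", "Female", "Male", "Female"])
y_true  = np.ones(4, dtype=int)

# The allocation: reject exactly Blue-Males and Green-Females.
reject = ((colors == "Blue") & (genders == "Male")) | \
         ((colors == "Green") & (genders == "Female"))
y_pred = np.where(reject, 0, 1)

for attr, vals in [("color", colors), ("gender", genders)]:
    for g in np.unique(vals):
        qualified = (vals == g) & (y_true == 1)
        print(f"{attr}={g}: FN rate = {(y_pred[qualified] == 0).mean()}")
# Every marginal group prints 0.5 -- yet the "rate" promises a
# qualified Blue Male nothing: he is rejected with probability 1.
```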

SLIDE 8

Sometimes individuals are subject to more than one classification task…

SLIDE 9

The Idea

  • Postulate a distribution over problems and a distribution over individuals.
  • Ask for a mapping between problems and classifiers that equalizes false negative rates across every pair of individuals.
  • Redefine “rate”: averaged over the problem distribution.

An individual definition of fairness.

SLIDE 10

A Formalization

  • An unknown distribution $\mathcal{P}$ over individuals $x \in \mathcal{X}$
  • An unknown distribution $\mathcal{Q}$ over problems $f_k : \mathcal{X} \to \{0,1\}$, $f_k \in \mathcal{F}$
  • A hypothesis class $H \subseteq \{0,1\}^{\mathcal{X}}$ (note the $f_k$'s are not necessarily in $H$)
  • Task: find a mapping from problems to hypotheses, $\psi \in (\Delta H)^{\mathcal{F}}$
  • A new “problem” will be represented as a new labelling of the training set.
  • Finding the hypothesis corresponding to a new problem shouldn't require re-solving old problems. (Allows online decision making.)
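The empirical version of this task, reconstructed here from the setup above and the theorem two slides ahead (so the notation matches, though the exact form is indicative rather than verbatim), is a fairness-constrained ERM over the $n$ training individuals and $m$ training problems:

$$
\min_{p \in (\Delta H)^m} \widehat{\mathrm{err}}\big(p;\, \hat{\mathcal{P}}, \hat{\mathcal{Q}}\big)
\quad \text{s.t.} \quad
\big|\, \mathrm{FN}(x_i, p, \hat{\mathcal{Q}}) - \mathrm{FN}(x_{i'}, p, \hat{\mathcal{Q}}) \,\big| \le \alpha
\quad \forall\, i, i' \in \{1, \dots, n\},
$$

with optimal value denoted $\mathrm{OPT}(\alpha;\, \hat{\mathcal{P}}, \hat{\mathcal{Q}})$.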

SLIDE 11

What to Hope For (Computationally)

  • Machine learning is already computationally hard [KSS92, KS08, FGKP09, FGPW14, …], even for simple classes like halfspaces.
  • So we shouldn't hope for an algorithm with worst-case guarantees…
  • But we might hope for an efficient reduction to unconstrained (weighted) learning problems.
  • “Oracle Efficient Algorithms”
  • This design methodology often results in practical algorithms.
SLIDE 12

Computing the Optimal Empirical Solution.

Initialize $\lambda_i^1 = 1/n$ for each $i \in \{1, \dots, n\}$.

For $t = 1$ to $T = O\!\left(\frac{\log n}{\epsilon^2}\right)$:

  • Learner best responds:
    • For each problem $k$, solve the learning problem $h_k^t = A(S_k^t)$ for $S_k^t = \left\{ \left( \lambda_i^t + \tfrac{1}{n},\ x_i,\ f_k(x_i) \right) \right\}_{i=1}^{n}$
    • Set $\gamma^t = \mathbf{1}\!\left[ \sum_{i=1}^{n} \lambda_i^t \ge 0 \right]$
  • Auditor updates weights:
    • Multiply $\lambda_i^t$ by $\exp\!\big( \eta \, (\mathrm{err}(x_i, h^t, \hat{\mathcal{Q}}) - \gamma^t) \big)$ for each expert $i$ and renormalize to get the updated weights $\lambda_i^{t+1}$.

Output the weights $\lambda_i^t$ for each person $i$ and round $t$.
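A minimal executable sketch of these dynamics follows, under loudly-flagged simplifications: the weighted oracle $A$ is stood in for by scikit-learn logistic regression with sample weights (the framework only assumes some cost-sensitive learner), $\mathrm{err}(x_i, h^t, \hat{\mathcal{Q}})$ is taken to be the false-negative indicator averaged over the problems on which $x_i$ is positive, and the dual weights are kept nonnegative and normalized rather than signed. All names (`aif_learn`, `weighted_erm_oracle`, `eta`) are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_erm_oracle(X, y, w):
    """Stand-in for the weighted learning oracle A: return a classifier
    (as a predict function) heuristically minimizing weighted error."""
    if len(np.unique(y)) < 2:                   # degenerate labeling: constant rule
        c = int(y[0])
        return lambda Z: np.full(len(Z), c)
    clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
    return clf.predict

def aif_learn(X, Y, T=50, eta=1.0):
    """Learner/Auditor dynamics over n individuals (rows of X) and
    m problems (columns of the 0/1 label matrix Y)."""
    n, m = Y.shape
    lam = np.full(n, 1.0 / n)                   # lambda_i^1 = 1/n
    lam_history, hypotheses = [], []
    for t in range(T):
        lam_history.append(lam.copy())
        # Learner best responds: weighted ERM on every problem, with
        # example weights lambda_i^t + 1/n as on the slide.
        hs = [weighted_erm_oracle(X, Y[:, k], lam + 1.0 / n) for k in range(m)]
        hypotheses.append(hs)
        # gamma^t = 1[sum_i lambda_i^t >= 0]; with nonnegative normalized
        # duals this is constant and cancels in the renormalization below.
        gamma = 1.0 if lam.sum() >= 0 else 0.0
        # Individual error: false-negative indicator averaged over the
        # problems on which the individual is labeled positive.
        fn, pos = np.zeros(n), np.zeros(n)
        for k, h in enumerate(hs):
            pred = h(X)
            fn  += ((Y[:, k] == 1) & (pred == 0)).astype(float)
            pos += (Y[:, k] == 1)
        err = np.divide(fn, np.maximum(pos, 1))
        # Auditor: exponentiated-gradient-style update, then renormalize.
        lam *= np.exp(eta * (err - gamma))
        lam /= lam.sum()
    return lam_history, hypotheses
```

Note that the output is the whole trajectory of weights $\lambda_i^t$, which is exactly what the next slide consumes to define $\psi$.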

SLIDE 13

Defining $\psi$

  • Parameterized by the sequence of dual variables $\bar{\lambda} = (\lambda^t)_{t=1}^{T}$

$\psi_{\bar{\lambda}}(f)$:

  For $t = 1$ to $T$:
    • Solve the learning problem $h^t = A(S^t)$ for $S^t = \left\{ \left( \lambda_i^t + \tfrac{1}{n},\ x_i,\ f(x_i) \right) \right\}_{i=1}^{n}$

  Output $p_f \in \Delta H$, where $p_f$ is uniform over $\{h^t\}_{t=1}^{T}$.

(Consistent with the ERM solution.)
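Continuing the sketch above (it reuses `weighted_erm_oracle` and NumPy from that block), the mapping $\psi$ is just a replay of the stored dual weights against the new problem's labels; no old problem is re-solved, matching the online requirement from the formalization slide.

```python
def psi(lam_history, X, f_labels, rng=None):
    """Given a new problem, presented as a fresh 0/1 labeling f_labels of
    the n training individuals, replay the stored duals lambda^1..lambda^T
    through the weighted oracle and return the uniform mixture over the
    resulting T hypotheses (a randomized classifier)."""
    n = len(f_labels)
    hs = [weighted_erm_oracle(X, f_labels, lam + 1.0 / n)
          for lam in lam_history]
    rng = rng or np.random.default_rng()
    def randomized_classifier(Z):
        return hs[rng.integers(len(hs))](Z)   # h drawn uniformly from {h^t}
    return randomized_classifier
```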

SLIDE 14

Computing the Optimal Empirical Solution.

Theorem: After $O\!\left( m \cdot \frac{\log n}{\epsilon^2} \right)$ calls to the learning oracle, the algorithm returns a solution $\hat{p} \in (\Delta H)^m$ that achieves empirical error at most $\mathrm{OPT}(\alpha;\, \hat{\mathcal{P}}, \hat{\mathcal{Q}}) + \epsilon$ and satisfies, for every $i, i' \in \{1, \dots, n\}$:

$\left| \mathrm{FN}(x_i, \hat{p}, \hat{\mathcal{Q}}) - \mathrm{FN}(x_{i'}, \hat{p}, \hat{\mathcal{Q}}) \right| \le \alpha + \epsilon$

SLIDE 15

Generalization: Two Directions

[Figure: the training data as an $n \times m$ matrix $S$ with rows $x_1, \dots, x_n$ drawn from $\mathcal{P}$ (giving $\hat{\mathcal{P}}$) and columns $f_1, \dots, f_m$ drawn from $\mathcal{Q}$ (giving $\hat{\mathcal{Q}}$); a new sample $S'$ indicates that generalization is needed in both directions: over individuals and over problems.]

SLIDE 16

Generalization

Theorem: Assuming

  1) $m \ge \mathrm{poly}\!\left( \log n,\ \tfrac{1}{\epsilon},\ \log\tfrac{1}{\delta} \right)$,
  2) $n \ge \mathrm{poly}\!\left( m,\ \mathrm{VCDIM}(H),\ \tfrac{1}{\epsilon},\ \tfrac{1}{\beta},\ \log\tfrac{1}{\delta} \right)$,

the algorithm returns a solution $\psi$ that with probability $1 - \delta$ achieves error at most $\mathrm{OPT}(\alpha;\, \mathcal{P}, \mathcal{Q}) + \epsilon$ and is such that with probability $1 - \beta$ over $x, x' \sim \mathcal{P}$:

$\left| \mathrm{FN}(x, \psi, \mathcal{Q}) - \mathrm{FN}(x', \psi, \mathcal{Q}) \right| \le \alpha + \epsilon$

SLIDE 17

Does it work?

  • It is important to experimentally verify “oracle efficient” algorithms, since it is possible to abuse the model.
  • E.g., using the learning oracle as an arbitrary NP oracle.
  • A brief “sanity check” experiment (see the sketch below):
    • Dataset: Communities and Crime
    • The first 50 features are designated as “problems” (i.e., labels to predict).
    • The remaining features are treated as features for learning.
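A sketch of that setup, plugged into the `aif_learn` sketch from Slide 12. The file name is the standard UCI `communities.data`; binarizing the “problem” columns by a median threshold and dropping non-numeric or incomplete columns are assumptions about reasonable preprocessing, not the paper's exact pipeline.

```python
import numpy as np
import pandas as pd

# Load Communities and Crime (UCI); drop non-numeric and incomplete columns.
df = pd.read_csv("communities.data", header=None, na_values="?")
df = df.dropna(axis=1).select_dtypes(include=[np.number])
data = df.to_numpy(dtype=float)

m = 50
# The first 50 columns become binary "problems" by thresholding at the
# median; the remaining columns are the features used for learning.
Y = (data[:, :m] > np.median(data[:, :m], axis=0)).astype(int)
X = data[:, m:]

lam_history, hypotheses = aif_learn(X, Y, T=20)
```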
SLIDE 18
SLIDE 19

Takeaways

  • We should think carefully about what definitions of “fairness” really promise to individuals.
  • Making promises to individuals is sometimes possible, even without making heroic assumptions.
  • Once we fix a definition, there is often an interesting algorithm design problem.
  • Once we have an algorithm, we have the tools to explore inevitable tradeoffs.

SLIDE 20

Thanks!

Average Individual Fairness: Algorithms, Generalization and Experiments
Michael Kearns, Aaron Roth, Saeed Sharifi-Malvajerdi

Shameless book plug: The Ethical Algorithm, by Michael Kearns and Aaron Roth