Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics

Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics. Debjani Saha, Candice Schumann, Duncan C. McElfresh, John P. Dickerson, Michelle L. Mazurek, Michael Carl Tschantz. 37th International Conference on Machine Learning (ICML)


slide-1
SLIDE 1

Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics

Debjani Saha, Candice Schumann, Duncan C. McElfresh, John P. Dickerson, Michelle L. Mazurek, Michael Carl Tschantz

37th International Conference on Machine Learning (ICML) July 12-18th, 2020

1

slide-2
SLIDE 2

Motivation

2

slide-3
SLIDE 3

Fairness in ML is a growing issue

  • Plenty of current news articles on bias in machine learning
  • Many companies are focusing on bias, fairness, and explainability
      ○ Google What-If Tool
      ○ IBM AI Fairness 360
      ○ NSF Program on Fairness in AI in Collaboration with Amazon
  • Technical solutions are being pursued...

3

Berkeley CS294 slides: Fairness in Machine Learning: CS 294

slide-4
SLIDE 4

Many fairness definitions are developed by ML experts using lots of math...

  • Statistical parity
  • Accuracy/error rates
  • Causality

How is ML fairness defined?

4

slide-5
SLIDE 5

Many fairness definitions are developed by ML experts using lots of math...

  • Statistical parity
  • Accuracy/error rates
  • Causality

… but are largely used by and impact non-ML experts in diverse settings including:

  • Hiring
  • Education
  • Criminal justice

Who ultimately uses ML fairness?

5

slide-6
SLIDE 6

What needs to be done?

6

How can we decide which definitions are appropriate in different real-world settings, if any?

slide-7
SLIDE 7

Our Contribution

7

How can we decide which definitions are appropriate in different real-world settings, if any?
Does the general public understand mathematical definitions of ML fairness and their behavior in real-world settings?

slide-8
SLIDE 8

Why non-experts?

  • Understand how people who will be impacted by ML decisions perceive these fairness definitions

8

slide-9
SLIDE 9

Why non-experts?

  • Understand how people who will be impacted by ML decisions perceive these fairness definitions

  • Importance of considering all stakeholders

9

slide-10
SLIDE 10

Research Questions

Can we develop a metric to measure lay understanding of ML fairness definitions?

10

slide-11
SLIDE 11

Research Questions

Can we develop a metric to measure lay understanding of ML fairness definitions?
Does a non-expert audience comprehend ML fairness definitions and their implications?

11

slide-12
SLIDE 12

Research Questions

Can we develop a metric to measure lay understanding of ML fairness definitions?
Does a non-expert audience comprehend ML fairness definitions and their implications?

  • What factors play a role in comprehension?

12

slide-13
SLIDE 13

Research Questions

Can we develop a metric to measure lay understanding of ML fairness definitions?
Does a non-expert audience comprehend ML fairness definitions and their implications?

  • What factors play a role in comprehension?
  • How are comprehension and sentiment related?

13

slide-14
SLIDE 14

Survey Design

We assess the following ML fairness definitions in our survey:

  • Demographic parity
  • Equal opportunity (FPR, FNR)
  • Equalized odds

14

slide-15
SLIDE 15

Demographic Parity

P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)

15
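
As a rough illustration of demographic parity, the sketch below compares the rate of positive decisions between two groups. It assumes the decision is coded as 1 for a positive outcome (for example, a job offer) and the group attribute A as 0 or 1. The function names and the toy data are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Return the fraction of positive decisions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(decisions, groups):
    """Largest difference in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Toy data, made up for illustration: four people in each group.
decisions = [1, 0, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

rates = selection_rates(decisions, groups)
gap = demographic_parity_gap(decisions, groups)
print(rates, gap)
```

In this toy data group 0 is selected at twice the rate of group 1, so the gap is nonzero and demographic parity does not hold.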

slide-16
SLIDE 16

Equal Opportunity (FPR)

16

P(Ŷ=1 | A=0, Y=0) = P(Ŷ=1 | A=1, Y=0)

slide-17
SLIDE 17

Equal Opportunity (FNR)

17

P(Ŷ=0 | A=0, Y=1) = P(Ŷ=0 | A=1, Y=1)
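
The two equal-opportunity variants above compare an error rate across groups: the false positive rate P(Ŷ=1 | Y=0) and the false negative rate P(Ŷ=0 | Y=1). A minimal sketch of estimating both per group follows, assuming binary true labels Y, binary predictions Ŷ, and a group attribute A. The helper name and the toy data are hypothetical.

```python
def group_error_rates(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}}; a rate is None when it is undefined."""
    counts = {}
    for y, p, g in zip(y_true, y_pred, groups):
        c = counts.setdefault(g, {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
        if y == 0:
            c["fp" if p == 1 else "tn"] += 1
        else:
            c["fn" if p == 0 else "tp"] += 1

    rates = {}
    for g, c in counts.items():
        negatives = c["fp"] + c["tn"]  # samples with Y = 0
        positives = c["fn"] + c["tp"]  # samples with Y = 1
        rates[g] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Toy data, made up for illustration.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

rates = group_error_rates(y_true, y_pred, groups)
print(rates)
```

Here the false negative rates match across the two groups while the false positive rates do not, so the FNR variant holds and the FPR variant is violated.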

slide-18
SLIDE 18

Equalized Odds

18

P(Ŷ=0 | A=0, Y=1) = P(Ŷ=0 | A=1, Y=1)
P(Ŷ=1 | A=0, Y=0) = P(Ŷ=1 | A=1, Y=0)
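
Equalized odds requires both conditions at once. Because P(Ŷ=0 | A, Y=1) = 1 − P(Ŷ=1 | A, Y=1), it is enough to compare P(Ŷ=1 | A, Y=y) across groups for y = 0 and y = 1. The sketch below does this under the same assumed binary coding; the function names, the tolerance, and the toy data are hypothetical.

```python
def positive_rate(y_true, y_pred, groups, group, label):
    """Estimate P(Y_hat=1 | A=group, Y=label); None if no sample matches."""
    preds = [p for y, p, g in zip(y_true, y_pred, groups) if g == group and y == label]
    return sum(preds) / len(preds) if preds else None


def equalized_odds_gaps(y_true, y_pred, groups):
    """Largest between-group gap in P(Y_hat=1 | A, Y=y), for y = 0 and y = 1."""
    gaps = {}
    for label in (0, 1):
        rates = [positive_rate(y_true, y_pred, groups, g, label) for g in sorted(set(groups))]
        rates = [r for r in rates if r is not None]
        gaps[label] = max(rates) - min(rates) if rates else None
    return gaps


def satisfies_equalized_odds(y_true, y_pred, groups, tol=1e-9):
    """True when both gaps are defined and within the (assumed) tolerance."""
    gaps = equalized_odds_gaps(y_true, y_pred, groups)
    return all(gap is not None and gap <= tol for gap in gaps.values())


# Toy data, made up for illustration.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

gaps = equalized_odds_gaps(y_true, y_pred, groups)
ok = satisfies_equalized_odds(y_true, y_pred, groups)
print(gaps, ok)
```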

slide-19
SLIDE 19

Survey Design

Participants are presented with a decision-making scenario, along with a rule to ensure that the decisions are made fairly

19

slide-20
SLIDE 20

Survey Design

Participants are presented with a decision-making scenario, along with a rule to ensure that the decisions are made fairly

“A hiring manager at a new sales company is reviewing 100 new job applications.”

20

slide-21
SLIDE 21

Survey Design

Participants are presented with a decision-making scenario, along with a rule to ensure that the decisions are made fairly

“A hiring manager at a new sales company is reviewing 100 new job applications.”

“The fraction of applicants who receive job offers that are female should equal the fraction of applicants that are female. Similarly, the fraction of applicants who receive job offers that are male should equal the fraction of applicants that are male.”

21

slide-22
SLIDE 22

Survey Design

Participants are presented with a decision-making scenario, along with a rule to ensure that the decisions are made fairly

“A hiring manager at a new sales company is reviewing 100 new job applications.”

“The fraction of applicants who receive job offers that are female should equal the fraction of applicants that are female. Similarly, the fraction of applicants who receive job offers that are male should equal the fraction of applicants that are male.”

22

demographic parity
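
To see why this plain-language rule corresponds to demographic parity, a small worked example may help. The split of the 100 applications (60 female, 40 male) and the number of offers (10) are hypothetical figures chosen for illustration; they are not part of the survey text.

```python
from fractions import Fraction


def offers_under_rule(applicants_by_group, total_offers):
    """Split offers so each group's share of offers equals its share of applicants."""
    total_applicants = sum(applicants_by_group.values())
    return {
        group: Fraction(count, total_applicants) * total_offers
        for group, count in applicants_by_group.items()
    }


applicants = {"female": 60, "male": 40}  # hypothetical split of the 100 applications
offers = offers_under_rule(applicants, total_offers=10)  # hypothetical number of offers

# Selection rate per group: offers received divided by applicants in that group.
selection_rates = {group: offers[group] / applicants[group] for group in applicants}
print(offers, selection_rates)
```

Under these assumed numbers the rule gives 6 offers to female applicants and 4 to male applicants, and both groups end up with the same selection rate of 1/10, which is the demographic parity condition.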

slide-23
SLIDE 23

Survey Design

Survey contains 18 questions:

23

slide-24
SLIDE 24

Survey Design

Survey contains 18 questions:

  • 2 questions concerning participant evaluation of the scenario

24

slide-25
SLIDE 25

Survey Design

Survey contains 18 questions:

  • 2 questions concerning participant evaluation of the scenario
  • 9 comprehension questions about the fairness rule

25

slide-26
SLIDE 26

Survey Design

Survey contains 18 questions:

  • 2 questions concerning participant evaluation of the scenario
  • 9 comprehension questions about the fairness rule
  • 2 self-report questions on participant understanding and use of the rule

26

slide-27
SLIDE 27

Survey Design

Survey contains 18 questions:

  • 2 questions concerning participant evaluation of the scenario
  • 9 comprehension questions about the fairness rule
  • 2 self-report questions on participant understanding and use of the rule
  • 2 self-report questions on participant liking of and agreement with the rule

27

slide-28
SLIDE 28

Survey Design

Survey contains 18 questions:

  • 2 questions concerning participant evaluation of the scenario
  • 9 comprehension questions about the fairness rule
  • 2 self-report questions on participant understanding and use of the rule
  • 2 self-report questions on participant liking of and agreement with the rule
  • 3 free-response questions on comprehension and opinion of the rule

28

slide-30
SLIDE 30

Survey Design

Survey contains 18 questions:

  • 2 questions concerning participant evaluation of the scenario
  • 9 comprehension questions about the fairness rule
  • 2 self-report questions on participant understanding and use of the rule
  • 2 self-report questions on participant liking of and agreement with the rule
  • 3 free-response questions on comprehension and opinion of the rule

30

COMPREHENSION SCORE

slide-31
SLIDE 31

Participant Demographics

349 participants Recruited through a web panel to approximate US distributions on race, age, gender, and education (2017 census)

31

slide-32
SLIDE 32

Research Question 1

Can we develop a metric to measure lay understanding of ML fairness definitions?
Does a non-expert audience comprehend ML fairness definitions and their implications?

  • What factors play a role in comprehension?
  • How are comprehension and sentiment related?

32

slide-33
SLIDE 33

Our metric effectively measures comprehension

We confirm this using two different measures…

33

slide-34
SLIDE 34

“In your own words, explain the rule.”

34

Our metric effectively measures comprehension

slide-39
SLIDE 39

“What did you use to answer the questions?”

39

Our metric effectively measures comprehension

slide-41
SLIDE 41

Our metric effectively measures comprehension

We confirm this using two different measures…

  1. Greater ability to explain the rule is associated with higher comprehension score
  2. Self-reported compliance with the rule is associated with higher comprehension score

41

slide-42
SLIDE 42

Research Question 2a

Can we develop a metric to measure lay understanding of ML fairness definitions?
Does a non-expert audience comprehend ML fairness definitions and their implications?

  • What factors play a role in comprehension?
  • How are comprehension and sentiment related?

42

slide-43
SLIDE 43

Education predicts performance

Higher education is associated with higher comprehension score

43

slide-44
SLIDE 44

Fairness definition predicts performance

Equal opportunity (FNR) was associated with lower comprehension score

44

slide-46
SLIDE 46

Comprehension

Comprehension is best predicted by two factors

  1. Higher education level (Bachelor’s and above) predicts better comprehension
  2. Fairness definition itself can affect comprehension (participants whose survey focused on FNR had lower comprehension)

46

slide-47
SLIDE 47

Research Question 2b

Can we develop a metric to measure lay understanding of ML fairness definitions?
Does a non-expert audience comprehend ML fairness definitions and their implications?

  • What factors play a role in comprehension?
  • How are comprehension and sentiment related?

47

slide-48
SLIDE 48

“To what extent do you agree with the following statement: I like the hiring rule?”

Those who understand the rule dislike it

48

slide-50
SLIDE 50

“To what extent do you agree with the following statement: I like the hiring rule?”

Those who understand the rule dislike it

50

Dislike Like

slide-51
SLIDE 51

“To what extent do you agree with the following statement: I agree with the hiring rule?”

Those who understand the rule disagree with it

51

slide-53
SLIDE 53

“To what extent do you agree with the following statement: I agree with the hiring rule?”

Those who understand the rule disagree with it

53

Disagree Agree

slide-54
SLIDE 54

Sentiment

Negative sentiment (disliking/disagreement) towards the rule is associated with higher comprehension score

54

slide-55
SLIDE 55

Sentiment

Negative sentiment (disliking/disagreement) towards the rule is associated with higher comprehension score

This may suggest that those who understand the rule see its pitfalls

55

slide-56
SLIDE 56

Sentiment

Negative sentiment (disliking/disagreement) towards the rule is associated with higher comprehension score

This may suggest that those who understand the rule see its pitfalls

Lower education level (~70% of the US population) predicts lower comprehension

56

slide-57
SLIDE 57

Sentiment

Negative sentiment (disliking/disagreement) towards the rule is associated with higher comprehension score

This may suggest that those who understand the rule see its pitfalls

Lower education level (~70% of the US population) predicts lower comprehension

Incentivizes companies to obscure their algorithms

57

slide-58
SLIDE 58

Summary

Can we develop a metric to measure lay understanding of ML fairness definitions?
Does a non-expert audience comprehend ML fairness definitions and their implications?

  • What factors play a role in comprehension?
  • How are comprehension and sentiment related?

58

slide-59
SLIDE 59

Summary

Can we develop a metric to measure lay understanding of ML fairness definitions?
Yes

Does a non-expert audience comprehend ML fairness definitions and their implications?

  • What factors play a role in comprehension?
  • How are comprehension and sentiment related?

59

slide-60
SLIDE 60

Summary

Can we develop a metric to measure lay understanding of ML fairness definitions?
Yes

Does a non-expert audience comprehend ML fairness definitions and their implications?
It depends...

  • What factors play a role in comprehension?
  • How are comprehension and sentiment related?

60

slide-61
SLIDE 61

Summary

Can we develop a metric to measure lay understanding of ML fairness definitions?
Yes

Does a non-expert audience comprehend ML fairness definitions and their implications?
It depends...

  • What factors play a role in comprehension?
    Higher education predicts better comprehension
  • How are comprehension and sentiment related?

61

slide-62
SLIDE 62

Summary

Can we develop a metric to measure lay understanding of ML fairness definitions?
Yes

Does a non-expert audience comprehend ML fairness definitions and their implications?
It depends...

  • What factors play a role in comprehension?
    Higher education predicts better comprehension
  • How are comprehension and sentiment related?
    Better comprehension is associated with greater negative sentiment towards the rule

62

slide-63
SLIDE 63

Acknowledgements

Funding for this project was provided by the NSF and Google

63

slide-64
SLIDE 64

Summary

Can we develop a metric to measure lay understanding of ML fairness definitions?
Yes

Does a non-expert audience comprehend ML fairness definitions and their implications?
It depends...

  • What factors play a role in comprehension?
    Higher education predicts better comprehension
  • How are comprehension and sentiment related?
    Better comprehension is associated with greater negative sentiment towards the rule

64

Debjani Saha dsaha@cs.umd.edu

slide-65
SLIDE 65

65

slide-66
SLIDE 66

Participant Demographics

Gender (percent of sample)

                            Study-1   Study-2
  Male                        51.0      40.7
  Female                      48.3      58.2
  Other                         -        0.3
  Prefer not to answer         0.7       0.9

Ethnicity* and education level (percent of sample)

                            Census   Study-1   Study-2
  Ethnicity*
    AI or AN                   0.7       0.7       0.9
    Asian or NH or PI          5.7       1.4       2.3
    Black or AA               12.3      10.2      15.8
    Hispanic or Latinx        18.1      12.2       7.7
    Other                      2.6       2.7       1.4
    White                     60.6      72.8      71.9
  Education Level
    Less than HS              12.1       6.1       6.9
    HS or equivalent          27.7      29.9      24.9
    Some post-secondary       30.8      30.6      24.9
    Bachelor’s and above      29.4      33.3      42.7

* Ethnicity: AI = American Indian, AN = Alaska Native, NH = Native Hawaiian, PI = Pacific Islander, AA = African American

Age, mean (SD)

                            Study-1   Study-2
  Age                       46 (16)   45 (15)

66
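
One way to read the census comparison is as a percentage-point deviation per category. The sketch below does this for the education rows, using the Census and Study-2 figures from the table above. The variable names are hypothetical, and the calculation is only an illustration, not an analysis reported in the talk.

```python
# Percent of sample by education level, copied from the table above.
census = {
    "Less than HS": 12.1,
    "HS or equivalent": 27.7,
    "Some post-secondary": 30.8,
    "Bachelor's and above": 29.4,
}
study_2 = {
    "Less than HS": 6.9,
    "HS or equivalent": 24.9,
    "Some post-secondary": 24.9,
    "Bachelor's and above": 42.7,
}

# Percentage-point difference (sample minus census), rounded to one decimal place.
deviation = {level: round(study_2[level] - census[level], 1) for level in census}

# Category where the sample departs most from the census figure.
largest = max(deviation, key=lambda level: abs(deviation[level]))
print(deviation, largest)
```

On these figures the Study-2 sample over-represents the "Bachelor's and above" category relative to the census column and under-represents the other three.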

slide-67
SLIDE 67

Non-compliance is Associated with Reduced Comprehension

Non-compliant participants tend to report worse understanding of the rule

67

slide-68
SLIDE 68

Non-compliance is Associated with Reduced Comprehension

Non-compliant participants tend to be less able to explain the rule

68

slide-69
SLIDE 69

Non-compliant participants tend to report less negative sentiment (disliking of the rule)

69

Non-compliance is Associated with Less Negative Sentiment