Discrimination in Decision Making: Humans vs. Machines


SLIDE 1

Discrimination in Decision Making: Humans vs. Machines

Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, Krishna P. Gummadi
Max Planck Institute for Software Systems

SLIDE 2

Machine decision making

q Refers to data-driven algorithmic decision making

q By learning over data about past decisions

q To assist or replace human decision making q Increasingly being used in several domains

q Recruiting: Screening job applications q Banking: Credit ratings / loan approvals q Judiciary: Recidivism risk assessments q Journalism: News recommender systems

SLIDE 3

The concept of discrimination

q Discrimination is a special type of unfairness q Well-studied in social sciences

q Political science q Moral philosophy q Economics q Law

q Majority of countries have anti-discrimination laws q Discrimination recognized in several international human rights laws

q But, less-studied from a computational perspective

SLIDE 4

Why a computational perspective?

1. Data mining is increasingly being used to detect discrimination in human decision making
  • Examples: NYPD stop-and-frisk, Airbnb rentals

SLIDE 5

Why a computational perspective?

2. Learning to avoid discrimination in data-driven (algorithmic) decision making
  • Aren’t algorithmic decisions inherently objective?
    • In contrast to subjective human decisions
  • Doesn’t that make them fair & non-discriminatory?
  • Objective decisions can be unfair & discriminatory!

SLIDE 6

Why a computational perspective?

q Learning to avoid discrimination in data-driven

(algorithmic) decision making

q A priori discrimination in biased training data

q Algorithms will objectively learn the biases

q Learning objectives target decision accuracy over all users

q Ignoring outcome disparity for different sub-groups of users

SLIDE 7

Our agenda: Two high-level questions

1. How to detect discrimination in decision making?
  • Independently of who makes the decisions
    • Humans or machines

2. How to avoid discrimination when learning?
  • Can we make algorithmic decisions more fair?
  • If so, algorithms could eliminate biases in human decisions
  • Controlling algorithms may be easier than retraining people

SLIDE 8

This talk

1. How to detect discrimination in decision making?
  • Independently of who makes the decisions
    • Humans or machines

2. How to avoid discrimination when learning?
  • Can we make algorithmic decisions more fair?
  • If so, algorithms could eliminate biases in human decisions
  • Controlling algorithms may be easier than retraining people

SLIDE 9

The concept of discrimination

q A first approximate normative / moralized definition:

wrongfully impose a relative disadvantage on persons based on their membership in some salient social group e.g., race or gender


SLIDE 11

The devil is in the details

q What constitutes a salient social group?

q A question for political and social scientists

q What constitutes relative disadvantage?

q A question for economists and lawyers

q What constitutes a wrongful decision?

q A question for moral-philosophers

q What constitutes based on?

q A question for computer scientists

SLIDE 12

Discrimination: A computational perspective

q Consider binary classification using user attributes

A1 A2 … Am User1

x1,1 x1,2 … x1,m

User2

x2,1 x2,m

User3

x3,1 x3,m

… …

Usern

xn,1 xn,2

xn,m

Decision

Accept Reject Reject … Accept

SLIDE 13

Discrimination: A computational perspective

SA1 NSA2 … NSAm User1

x1,1 x1,2 … x1,m

User2

x2,1 x2,m

User3

x3,1 x3,m

… …

Usern

xn,1 xn,2

xn,m

Decision

Accept Reject Reject … Accept

q Consider binary classification using user attributes q Some attributes are sensitive, others non-sensitive

SLIDE 14

Discrimination: A computational perspective

SA1 NSA2 … NSAm User1

x1,1 x1,2 … x1,m

User2

x2,1 x2,m

User3

x3,1 x3,m

… …

Usern

xn,1 xn,2

xn,m

Decision

Accept Reject Reject … Accept

q Consider binary classification using user attributes q Some attributes are sensitive, others non-sensitive

Decisions should not be based on sensitive attributes!

SLIDE 15

What constitutes “not based on”?

q Most intuitive notion: Ignore sensitive attributes

q Fairness through blindness or veil of ignorance

q When learning, strip sensitive attributes from inputs q Avoids disparate treatment

q Same treatment for users with same non-sensitive attributes

q Irrespective of their sensitive attribute values q Situational testing for discrimination discovery checks for this condition

SLIDE 16

Two problems with the intuitive notion

When users of different sensitive attribute groups have different non-sensitive feature distributions, we risk:

1. Disparate mistreatment
  • Even when the training data is unbiased, sensitive attribute groups might have different misclassification rates

2. Disparate impact
  • When the labels in the training data are biased, sensitive attribute groups might receive beneficial outcomes to different extents
  • Training data bias due to past discrimination

(Both diagnostics are sketched below.)
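A sketch of simple diagnostics for the two problems, assuming arrays y (true labels), y_hat (predictions), and z (group membership in {0, 1}); the function names are mine, not the talk's:

```python
import numpy as np

def group_error_rates(y, y_hat, z):
    """Disparate mistreatment check: misclassification rate per group."""
    return {g: float(np.mean(y_hat[z == g] != y[z == g])) for g in (0, 1)}

def group_acceptance_rates(y_hat, z):
    """Disparate impact check: fraction of Accept (1) decisions per group."""
    return {g: float(np.mean(y_hat[z == g] == 1)) for g in (0, 1)}
```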

SLIDE 17

Background: Two points about learning

1. To learn, we define and optimize a risk (loss) function
  • Over all examples in the training data
  • The risk function captures inaccuracy in prediction
  • So learning is cast as an optimization problem

2. For efficient learning (optimization)
  • We define loss functions so that they are convex

(A minimal example follows.)
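A minimal instance of these two points, assuming logistic loss over synthetic data with labels y in {−1, +1}; everything here is illustrative:

```python
# Learning as minimization of a convex risk function: the average
# logistic loss over the training examples.
import numpy as np
from scipy.optimize import minimize

def logistic_loss(w, X, y):
    # Convex in w; numerically stable form of log(1 + exp(-y * w^T x)).
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = np.sign(X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=500))

w_opt = minimize(logistic_loss, x0=np.zeros(3), args=(X, y)).x
```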

SLIDE 18

Origins of disparate mistreatment

  [Table as on Slide 13: users × (SA1, NSA2, …, NSAm) with Accept/Reject decisions]

SLIDE 19

Origins of disparate mistreatment

q Suppose users are of two types: blue and pink

SA1 NSA2 … NSAm User1

x1,1 x1,2 … x1,m

User2

x2,1 x2,m

User3

x3,1 x3,m

… …

Usern

xn,1 xn,2

xn,m

Decision

Accept Reject Reject … Accept

SLIDE 20

Origins of disparate mistreatment

q Minimizing L(W), does not guarantee L(W) and L

(W) are equally minimized

q Blue users might have a different risk / loss than red users!

SA1 NSA2 … NSAm User1

x1,1 x1,2 … x1,m

User2

x2,1 x2,m

User3

x3,1 x3,m

… …

Usern

xn,1 xn,2

xn,m

Decision

Accept Reject Reject … Accept
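A sketch of the per-group losses, reusing logistic_loss, X, y, and w_opt from the example above; the group labels z are an added assumption, chosen to correlate with a feature so the two groups' distributions differ:

```python
# The overall loss is an example-count-weighted mix of the per-group
# losses, so minimizing it can leave the groups' losses unequal.
z = (X[:, 1] > 0).astype(int)        # 0 = "blue", 1 = "pink" (illustrative)

def group_losses(w, X, y, z):
    return {g: float(logistic_loss(w, X[z == g], y[z == g])) for g in (0, 1)}

print(group_losses(w_opt, X, y, z))  # typically unequal values
```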

SLIDE 21

Origins of disparate mistreatment

q Minimizing L(W), does not guarantee L(W) and L

(W) are equally minimized

q Stripping sensitive attributes does not help!

SA1 NSA2 … NSAm User1

x1,1 x1,2 … x1,m

User2

x2,1 x2,m

User3

x3,1 x3,m

… …

Usern

xn,1 xn,2

xn,m

Decision

Accept Reject Reject … Accept

SLIDE 22

Origins of disparate mistreatment

q Minimizing L(W), does not guarantee L(W) and L

(W) are equally minimized

q To avoid disp. mistreatment, we need L(W) = L(W)

SA1 NSA2 … NSAm User1

x1,1 x1,2 … x1,m

User2

x2,1 x2,m

User3

x3,1 x3,m

… …

Usern

xn,1 xn,2

xn,m

Decision

Accept Reject Reject … Accept

SLIDE 23

Origins of disparate mistreatment

q Minimizing L(W), does not guarantee L(W) and L

(W) are equally minimized

q Put differently, we need:

SA1 NSA2 … NSAm User1

x1,1 x1,2 … x1,m

User2

x2,1 x2,m

User3

x3,1 x3,m

… …

Usern

xn,1 xn,2

xn,m

Decision

Accept Reject Reject … Accept

SLIDE 24

Origins of disparate impact

  [Table as on Slide 13: users × (SA1, NSA2, …, NSAm) with Accept/Reject decisions]

SLIDE 25

Origins of disparate impact

  [Table as on Slide 13, now with biased decision labels: Accept, Reject, Accept, …, Reject]

• Suppose training data has biased labels!

SLIDE 26

Origins of disparate impact

  [Table as on Slide 13, with biased decision labels]

• Suppose training data has biased labels!
• The classifier will learn to make biased decisions
  • Using sensitive attributes (SAs)

SLIDE 27

Origins of disparate impact

  [Table as on Slide 13, with biased decision labels]

• Suppose training data has biased labels!
• Stripping SAs does not fully address the bias

SLIDE 28

Origins of disparate impact

  [Table as on Slide 13, with biased decision labels]

• Suppose training data has biased labels!
• Stripping SAs does not fully address the bias
  • NSAs correlated with SAs will be given more / less weight
  • Learning tries to compensate for the lost SAs (see the sketch below)

SLIDE 29

Analogous to indirect discrimination

q Observed in human decision making q Indirectly discriminate against specific user groups

using their correlated non-sensitive attributes

q E.g., voter-id laws being passed in US states

q Notoriously hard to detect indirect discrimination

q In decision making scenarios without ground truth

SLIDE 30

Detecting indirect discrimination

q Doctrine of disparate impact

q A US law applied in employment & housing practices

q Proportionality tests over decision outcomes

q E.g., in 70’s and 80’s, some US courts applied the 80% rule

for employment practices

q If 50% (P1%) of male applicants get selected at least 40% (P2%) of

female applicants must be selected

q UK uses P1 – P2; EU uses (1-P1) / (1-P2) q Fair proportion thresholds may vary across different domains

SLIDE 31

A controversial detection policy

q Critics: There exist scenarios where disproportional

  • utcomes are justifiable

q Supporters: Provision for business necessity exists

q Though the burden of proof is on employers

q Law is necessary to detect indirect discrimination!

SLIDE 32

Origins of disparate impact

  [Table as on Slide 13, with biased decision labels]

• Suppose training data has biased labels!
• Stripping SAs does not fully address the bias

SLIDE 33

Origins of disparate impact

  [Table as on Slide 13, with biased decision labels]

• Suppose training data has biased labels!
• Stripping SAs does not fully address the bias
• What if we required proportional outcomes?

SLIDE 34

Origins of disparate impact

  [Table as on Slide 13, with biased decision labels]

• Suppose training data has biased labels!
• Stripping SAs does not fully address the bias
• Put differently, we need: P(ŷ = 1 | z = 0) = P(ŷ = 1 | z = 1), i.e., equal acceptance rates across the sensitive attribute groups

SLIDE 35

Summary: 3 notions of discrimination

1. Disparate treatment: intuitive direct discrimination
  • To avoid: do not condition decisions on sensitive attributes, i.e., P(ŷ | x, z) = P(ŷ | x)

2. Disparate impact: indirect discrimination, when training data is biased
  • To avoid: require proportional outcomes, P(ŷ = 1 | z = 0) = P(ŷ = 1 | z = 1)

3. Disparate mistreatment: specific to machine learning
  • To avoid: equalize misclassification rates, P(ŷ ≠ y | z = 0) = P(ŷ ≠ y | z = 1)

SLIDE 36

Learning to avoid discrimination

q Idea: Discrimination notions as constraints on learning q Optimize for accuracy under those constraints

SLIDE 37

A few observations

q No free lunch: Additional constraints lower accuracy

q Tradeoff between accuracy & discrimination avoidance

q Might not need all constraints at the same time

q E.g., drop disp. impact constraint when no bias in data q When avoiding disp. impact / mistreatment, we could

achieve higher accuracy without disp. treatment

q i.e., by using sensitive attributes

SLIDE 38

Key challenge

q How to learn efficiently under these constraints? q Problem: The above formulations are not convex!

q Can’t learn them efficiently

q Need to find a better way to specify the constraints

q So that loss function under constraints remains convex

SLIDE 39

Disparate impact constraints: Intuition

  [Scatter plot: Feature 1 vs. Feature 2, male and female users around a decision boundary]

Limit the differences in the acceptance (or rejection) ratios across members of different sensitive groups.

SLIDE 40

Disparate impact constraints: Intuition

  [Scatter plot: Feature 1 vs. Feature 2, male and female users around a decision boundary]

Limit the differences in the average strength of acceptance and rejection across members of different sensitive groups: a proxy measure for the difference in acceptance ratios.

SLIDE 41

Specifying disparate impact constraints

q Instead of requiring: q Bound covariance between items’ sensitive feature

values and their signed distance from classifier’s decision boundary to less than a threshold
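A sketch of that covariance for a linear boundary, where the signed distance is d_w(x) = wᵀx (up to scaling by ‖w‖); the function name is mine:

```python
import numpy as np

def boundary_covariance(w, X, z):
    """Empirical covariance between z and the signed boundary distance."""
    d = X @ w                              # signed distance, up to ||w||
    return float(np.mean((z - z.mean()) * d))
```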

SLIDE 42

Learning classifiers w/o disparate impact

q Previous formulation: Non-convex, hard-to-learn q New formulation: Convex, easy-to-learn

SLIDE 43

A few observations

q Our formulation can be applied to a variety of

decision boundary classifiers (& loss functions)

q hinge-loss, logistic loss, linear and non-linear SVM

q Works well on test data-sets

q Achieves proportional outcomes with low loss in accuracy

q Can easily change our formulation to optimize for

fairness under accuracy constraints

q Feasible to achieve disp. treatment & impact simultaneously

SLIDE 44

Learning classifiers w/o disparate mistreatment

q Previous formulation: Non-convex, hard-to-learn

SLIDE 45

Learning classifiers w/o disparate mistreatment

q New formulation: Convex-concave, can learn

efficiently using convex-concave programming

All misclassifications False positives False negatives

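A sketch of the three per-group disparate-mistreatment measures named above, assuming arrays y, y_hat in {0, 1} and group labels z; empty groups would yield NaN and are not handled here:

```python
import numpy as np

def mistreatment_rates(y, y_hat, z, g):
    m = (z == g)
    err = float(np.mean(y_hat[m] != y[m]))          # all misclassifications
    fpr = float(np.mean(y_hat[m & (y == 0)] == 1))  # false positive rate
    fnr = float(np.mean(y_hat[m & (y == 1)] == 0))  # false negative rate
    return err, fpr, fnr
```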

SLIDE 47

A few observations

q Our formulation can be applied to a variety of

decision boundary classifiers (& loss functions)

q Can constrain for all misclassifications or for false

positives & only false negatives separately

q Works well on a real-world recidivism risk

estimation data-set

q Addressing a concern raised about COMPASS, a

commercial tool for recidivism risk estimation

SLIDE 48

Summary: Discrimination through a computational lens

q Defined three notions of discrimination

q disparate treatment / impact / mistreatment q They are applicable in different contexts

q Proposed mechanisms for mitigating each of them

q Formulate the notions as constraints on learning q Proposed measures that can be efficiently learned

SLIDE 49

Future work: Beyond binary classifiers

q How to learn

q Non-discriminatory multi-class classification q Non-discriminatory regression q Non-discriminatory set selection q Non-discriminatory ranking

SLIDE 50

Fairness beyond discrimination

q Consider today’s recidivism risk prediction tools

q They use features like personal criminal history, family

criminality, work & social environment

q Is using family criminality for risk prediction fair? q How can we reliably measure a social community’s sense

  • f fairness of using a feature in decision making?

q How can we account for such fairness measures when

making decisions?

SLIDE 51

Beyond fairness: FATE of Algorithmic Decision Making

q Fairness: The focus of this talk q Accountability: Assigning responsibility for decisions

q Helps correct and improve decision making

q Transparency: Tracking the decision making process

q Helps build trust in decision making

q Explainability: Interpreting (making sense of) decisions

q Helps understand decision making

SLIDE 52

Our works

• Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez and Krishna P. Gummadi. Fairness Constraints: A Mechanism for Fair Classification. In FATML, 2015.
• Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez and Krishna P. Gummadi. Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment. In FATML, 2016.
• Miguel Ferreira, Muhammad Bilal Zafar, and Krishna P. Gummadi. The Case for Temporal Transparency: Detecting Policy Change Events in Black-Box Decision Making Systems. In FATML, 2016.
• Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi and Adrian Weller. The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making. In NIPS Symposium on ML and the Law, 2016.

SLIDE 53

Related References

• Dino Pedreshi, Salvatore Ruggieri and Franco Turini. Discrimination-aware Data Mining. In Proc. KDD, 2008.
• Faisal Kamiran and Toon Calders. Classifying Without Discriminating. In Proc. IC4, 2009.
• Faisal Kamiran and Toon Calders. Classification with No Discrimination by Preferential Sampling. In Proc. BENELEARN, 2010.
• Toon Calders and Sicco Verwer. Three Naive Bayes Approaches for Discrimination-Free Classification. In Data Mining and Knowledge Discovery, 2010.
• Indrė Žliobaitė, Faisal Kamiran and Toon Calders. Handling Conditional Discrimination. In Proc. ICDM, 2011.
• Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh and Jun Sakuma. Fairness-aware Classifier with Prejudice Remover Regularizer. In PADM, 2011.
• Binh Thanh Luong, Salvatore Ruggieri and Franco Turini. k-NN as an Implementation of Situation Testing for Discrimination Discovery and Prevention. In Proc. KDD, 2011.

SLIDE 54

Related References

• Faisal Kamiran, Asim Karim and Xiangliang Zhang. Decision Theory for Discrimination-aware Classification. In Proc. ICDM, 2012.
• Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold and Rich Zemel. Fairness Through Awareness. In Proc. ITCS, 2012.
• Sara Hajian and Josep Domingo-Ferrer. A Methodology for Direct and Indirect Discrimination Prevention in Data Mining. In TKDE, 2012.
• Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi and Cynthia Dwork. Learning Fair Representations. In ICML, 2013.
• Andrea Romei and Salvatore Ruggieri. A Multidisciplinary Survey on Discrimination Analysis. In KER, 2014.
• Michael Feldman, Sorelle Friedler, John Moeller, Carlos Scheidegger and Suresh Venkatasubramanian. Certifying and Removing Disparate Impact. In Proc. KDD, 2015.
• Moritz Hardt, Eric Price and Nathan Srebro. Equality of Opportunity in Supervised Learning. In Proc. NIPS, 2016.
• Jon Kleinberg, Sendhil Mullainathan and Manish Raghavan. Inherent Trade-Offs in the Fair Determination of Risk Scores. In FATML, 2016.