SLIDE 1

Understanding the Origins of Bias in Word Embeddings

Marc-Etienne Brunet Colleen Alkalay-Houlihan Ashton Anderson Richard Zemel

SLIDE 2

Introduction

  • Graduate student at U of T (Vector Institute)
  • Work at the intersection of Bias, Explainability, and Natural Language Processing
  • Collaborated with Colleen Alkalay-Houlihan
  • Supervised by Ashton Anderson and Richard Zemel

SLIDE 3

Many Forms of Algorithmic Bias

For example:

  • Facial Recognition
  • Automated Hiring
  • Criminal Risk Assessment
  • Word Embeddings

SLIDE 5

How can we attribute the bias in word embeddings to the individual documents in their training corpora?

SLIDE 6

> Background · Method Overview · Critical Details · Experiments

SLIDE 7

Word Embeddings: Definitions in Vector Space

[Figure: vectors for "cleaner", "cleaning", "leader", "leading" plotted in vector space]

Definitions encode relationships between words

SLIDE 9

Word Embeddings: Definitions in Vector Space

[Figure: the same plot, now also showing "role" and "action"]

Definitions encode relationships between words

SLIDE 10

Problematic Definitions in Vector Space

[Figure: "cleaner" sits near "woman" and "leader" near "man" along a male-female direction]

Definitions encode relationships between words

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai (NeurIPS 2016)

SLIDE 12

Measuring Bias in Word Embeddings

How can we measure bias in word embeddings?

T = cleaner, B = woman, S = leader, A = man

SLIDE 13

Measuring Bias in Word Embeddings

Implicit Association Test (IAT)

T = cleaner, B = woman, S = leader, A = man

SLIDE 16

Measuring Bias in Word Embeddings

Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan (Science 2017)

Word Embedding Association Test (WEAT), building on the Implicit Association Test (IAT):

Association(S, A) ≈ Σ_{s∈S, a∈A} cos(s, a)

T = cleaner, B = woman, S = leader, A = man
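Spelled out, the full WEAT statistic is short. A minimal sketch in Python with NumPy (the word sets and embedding dict are illustrative placeholders, not the paper's exact setup):

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two word vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, emb):
    """Mean cosine with attribute set A minus mean cosine with attribute set B."""
    return (np.mean([cos(emb[w], emb[a]) for a in A])
            - np.mean([cos(emb[w], emb[b]) for b in B]))

def weat_effect_size(S, T, A, B, emb):
    """Standard WEAT effect size over target sets S, T and attribute sets A, B."""
    d = {w: association(w, A, B, emb) for w in S + T}
    spread = np.std(list(d.values()))
    return (np.mean([d[s] for s in S]) - np.mean([d[t] for t in T])) / spread
```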

SLIDE 17

Measuring Bias

Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan (Science 2017)

WEAT on popular corpora matches IAT study results:

Target Words          Attribute Words           IAT effect size   IAT p-val   WEAT effect size   WEAT p-val
Flowers vs. Insects   Pleasant vs. Unpleasant   1.35              1.0E-08     1.5                1.0E-07
Math vs. Arts         Male vs. Female Terms     0.82              1.0E-02     1.06               1.8E-02
...                   ...                       ...               ...         ...                ...

“Semantics derived automatically from language corpora contain human-like biases”

SLIDE 19

Background · > Method Overview · Critical Details · Experiments

SLIDE 20

How can we attribute the bias in word embeddings to the individual documents in their training corpora?

SLIDE 21

From Word2Bias

[Pipeline diagram]
X : Corpus (e.g. Wikipedia)  →  GloVe  →  { wi } = w(X) : Word Embedding  →  WEAT (Male/Career vs. Female/Family)  →  B(w(X)) : Bias Measured

SLIDE 22

Differential Bias

Idea: Consider the differential contribution of each document.

[Diagram: remove document k from the corpus Doc1 … Dock … Docn, retrain, and measure the resulting change in bias, ∆B]

SLIDE 23

Differential Bias

Bias attributed to each document:

Document ID    ∆B
1              -0.0014
2               0.0127
...             ...
k               0.0374
...             ...
n               0.0089
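In principle, differential bias could be computed by brute-force leave-one-out retraining. A sketch (train_embedding and bias_metric are hypothetical stand-ins for GloVe training and the WEAT statistic B(w(X))); this is exactly the expensive path the influence-function approximation later avoids:

```python
def differential_bias(corpus, k, train_embedding, bias_metric):
    """Brute-force differential bias of document k via leave-one-out retraining.

    train_embedding and bias_metric are hypothetical stand-ins for GloVe
    training and the WEAT statistic B(w(X)).
    """
    baseline = bias_metric(train_embedding(corpus))
    without_k = [doc for i, doc in enumerate(corpus) if i != k]
    perturbed = bias_metric(train_embedding(without_k))
    # Sign convention assumed here: positive ∆B means doc k increases bias.
    return baseline - perturbed
```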

SLIDE 24

Differential Bias

Analyse metadata?

Document ID    ∆B        Year   Author
1              -0.0014
2               0.0127
...             ...
k               0.0374    ?      ?
...             ...
n               0.0089

SLIDE 25

Bias Gradient

[Same pipeline diagram as SLIDE 21: X : Corpus (e.g. Wikipedia) → GloVe → { wi } = w(X) : Word Embedding → WEAT → B(w(X)) : Bias Measured. The bias gradient asks how B(w(X)) changes as the corpus X is perturbed.]

SLIDE 27

Background · Method Overview · > Critical Details · Experiments

SLIDE 28

Computing the Components

Fast & Easy: Math, Automatic Differentiation, or two evaluations of B(w).

Slow & Hard: Differentiate through an entire training procedure:

  • Leave-one-out retraining? (time-bound)
  • Backprop? (memory-bound)
  • Approximate using Influence Functions (Koh & Liang, ICML 2017)

SLIDE 31

Influence Functions

Influence functions give us a way to approximate the change in model parameters when the training data is perturbed:

model parameters: θ
perturb training data by ∆X
new model parameters: θ̃ ≈ infl_func(θ, ∆X)
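For reference, the first-order approximation from Koh & Liang (ICML 2017) that infl_func abbreviates, here for upweighting a single training point z by ε (notation ours, not the slide's):

```latex
% Upweighting training point z by epsilon shifts the empirical-risk minimizer:
\hat{\theta}_{\epsilon, z} \approx \hat{\theta} - \epsilon \, H_{\hat{\theta}}^{-1} \nabla_{\theta} L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})
```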

SLIDE 32

Influence Functions

The bottleneck is the inverse Hessian: for GloVe this is a 2VD × 2VD matrix, and 2VD can easily be > 10⁹.

SLIDE 33

Applying Influence Functions to GloVe

[GloVe loss, annotated: the word vectors { wi } are the parameters of interest; the other params ({ uj }, b, c) are treated as constant]

SLIDE 34

Applying Influence Functions to GloVe

With the other parameters held constant, the Hessian becomes block diagonal (V blocks of D × D)! This lets us apply the influence function approximation to one word vector at a time, using the gradient of the pointwise loss.
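A sketch of what block-diagonality buys: V independent D × D solves instead of one 2VD × 2VD solve (hessian_block and grad_diff are hypothetical helpers; the exact sign and scaling follow the influence-function form above):

```python
import numpy as np

def influence_update(W, hessian_block, grad_diff):
    """Approximate new word vectors one at a time via the block-diagonal Hessian.

    W:             (V, D) array of word vectors
    hessian_block: i -> (D, D) Hessian block of the pointwise loss for word i
    grad_diff:     i -> (D,) change in word i's loss gradient under the
                   corpus perturbation
    (Both helpers are hypothetical stand-ins, not the paper's code.)
    """
    W_new = W.copy()
    for i in range(W.shape[0]):            # V independent D x D solves
        W_new[i] -= np.linalg.solve(hessian_block(i), grad_diff(i))
    return W_new
```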

SLIDE 35

Algorithm: Compute Differential Bias

[Algorithm diagram, revealed step by step across slides 35-39: given the WEAT words, approximate ∆B for every document]

SLIDE 40

Background · Method Overview · Critical Details · > Experiments

SLIDE 41

Objectives of Experiments

1. Assess the accuracy of our influence function approximation
2. Identify and analyse the most bias-impacting documents

SLIDE 42

WEAT Corpora

WEAT1: S = Science, T = Arts, A = Male, B = Female
WEAT2: S = Instruments, T = Weapons, A = Pleasant, B = Unpleasant

SLIDE 43

Differential Bias

[Histogram of differential bias (%) per document, log-scale counts: most documents barely move the bias, but a single document (≈ 0.00007% of the corpus) can increase bias by 0.35%!]

SLIDE 47

[Scatter plots: Ground Truth WEAT vs. Approximated WEAT, shown for the baseline bias (no removals) and after removal of bias-increasing docs (0.7% of the corpus)]

SLIDE 55

Document Impact Generalizes

Removal of documents also affects word2vec, and other metrics!

WEAT1 (Science vs. Arts Gender Bias):

            Remove bias-increasing docs   Baseline (no removals)   Remove bias-decreasing docs
GloVe       -1.27                         1.14                     1.7
word2vec     0.11                         1.35                     1.6

SLIDE 56

Limitations & Future Work

  • Consider multiple biases simultaneously
  • Use metrics that depend on more words
  • Consider bias in downstream tasks where embeddings are used
  • Does this carry over to BERT?
SLIDE 57

Recap

  • Bias can be quantified; correlates with known human biases
  • We can identify the documents that most impact bias, and approximate their impact
  • These documents are qualitatively meaningful, and their impact generalizes

SLIDE 58

Thank you!

Poster # 146

mebrunet@cs.toronto.edu

arXiv: 1810.03611

Marc Colleen Ashton Rich

SLIDE 59

References

  • T. Bolukbasi, K.-W. Chang, J. Zou, V. Saligrama, and A. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In 30th Conference on Neural Information Processing Systems (NIPS), 2016.
  • A. Caliskan, J. J. Bryson, and A. Narayanan. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186, 2017.
  • P. W. Koh and P. Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1885–1894, 2017.
SLIDE 60

Measuring Bias

Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan (Science 2017)

“...results raise the possibility that all implicit human biases are reflected in the statistical properties of language.”

SLIDE 61

Impact on Word2Vec

Removal of documents identified by our method:

            Decrease (0.7%)   Baseline   Increase (0.7%)
GloVe       -1.27             1.14       1.7
word2vec     0.11             1.35       1.6

SLIDE 62

Word Embeddings

Compact vector representation (like a dictionary for machines). Learned from LARGE corpora. Used in many NLP tasks:

  • Sentiment Analysis
  • Text summarization
  • Machine Translation

{ “dictionally”: [1.33, -0.48, 0.98, -2.33 … ],
  “dictionary”:  [1.23, -0.52, 1.01, -2.14 … ],
  “dictions”:    [1.04, -0.63, 0.87, -2.23 … ],
  … }

SLIDE 67

Psychology, Bias, and Embeddings

One study examined a dozen well-known human biases: all present. Others examined the geometry of:

  • Class
  • Race
  • Gender

Austin C. Kozlowski, Matt Taddy, James A. Evans (2018)

SLIDE 68

Word Embeddings

What are they?

  • A compact vector representation for words
  • Learned from a very large corpus of text
  • Preserves syntactic and semantic meaning through vector arithmetic (very useful)

Applications:

  • Sentiment analysis
  • Document classification / summarization
  • Translation
  • Temporal semantic trajectories

[Figure: King, Queen, Man, Woman, His, Her, Castle plotted with parallel (King - Man) offset vectors]

“King” - “Man” + “Woman” ≈ “Queen”
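A toy sketch of that arithmetic (hand-picked 2-D vectors purely for illustration; real embeddings are learned and have hundreds of dimensions):

```python
import numpy as np

# Toy illustrative vectors; real embeddings are learned from a corpus.
emb = {
    "king":  np.array([0.9, 0.8]),
    "man":   np.array([0.8, 0.1]),
    "woman": np.array([0.2, 0.1]),
    "queen": np.array([0.3, 0.8]),
}

query = emb["king"] - emb["man"] + emb["woman"]

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Nearest neighbour by cosine similarity recovers the analogy's answer.
best = max(emb, key=lambda w: cos(emb[w], query))
print(best)  # queen
```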

SLIDE 69

A Motivating Example

“She is actually a good leader. He is just pretty.” #NoPlanetB

SLIDE 70

Presumptuous Translation

[Screenshots, shown over several build slides: machine translation picking gendered pronouns for gender-neutral source text]

SLIDE 74

Why does this happen?

SLIDE 76

Word Co-Occurrences

Ratio of he:she co-occurrences:

engineer   nurse   leader   pretty   (all)
6.25       0.550   9.25     3.07     3.53

The New York Times Annotated Corpus (1987-2007, approx. 1B words, context window: 8)
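A sketch of how such a ratio can be computed (simplified: flat token list, symmetric window, no sentence boundaries; nyt_tokens is a hypothetical pre-tokenized corpus):

```python
from collections import Counter

def cooccurrence_ratio(tokens, target, window=8):
    """Count 'he' vs 'she' within +/- window tokens of each occurrence of target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
            counts["he"] += context.count("he")
            counts["she"] += context.count("she")
    return counts["he"] / counts["she"] if counts["she"] else float("inf")

# e.g. cooccurrence_ratio(nyt_tokens, "engineer")  # ≈ 6.25 on the NYT corpus
```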

SLIDE 77

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014.

X : co-occurrence matrix
{ wi } : set of word vectors
{ uj }, b, c : other model parameters
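For reference, the GloVe objective from Pennington et al. (2014), written in the slide's notation (with f the co-occurrence weighting function):

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} u_j + b_i + c_j - \log X_{ij} \right)^{2}
```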

SLIDE 78

Bad Analogies

King : Man :: Queen : Woman Paris : France :: London : England Man : Computer_Programmer :: Woman : Homemaker

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai (NeurIPS 2016)

[Figure: “Man” → “Computer Programmer” and “Woman” → “Homemaker” shown as parallel offsets in embedding space]

SLIDE 79

WEAT

Measures the relative association between four concepts.

Target Word Sets:
S = {physics, chemistry, … } ≈ Science
T = {poetry, literature, … } ≈ Arts

Attribute Word Sets:
A = {he, him, man, … } ≈ Male
B = {she, her, woman} ≈ Female

Effect size is proportional to (dSA - dSB) - (dTA - dTB), where dXY denotes the mean association between word sets X and Y (cf. SLIDE 16).

SLIDE 80

Applying IF to GloVe

[Equations: the IF approximation and the GloVe loss]

Our “datapoints” are NOT documents, but rather the entries of the co-occurrence matrix X. So one document removal, X̃ = X - X(k), perturbs multiple “datapoints”.

SLIDE 81

Applying IF to GloVe

[Annotated equation: two of its components are computed once per WEAT word; the perturbation-dependent component is computed for every perturbation of interest]

SLIDE 82

Influence Functions (IF)

[Annotated equation, with labels: Inverse Hessian; Difference of Gradients (Perturbed vs. Original); δ: set of perturbed data points]
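Reading off those labels, the update plausibly has the following shape (a reconstruction from the slide's annotations, not a verbatim formula from the paper):

```latex
% theta-tilde: parameters after the data points in delta are perturbed
% (inverse Hessian times the difference of gradients, perturbed minus original)
\tilde{\theta} \approx \theta - H_{\theta}^{-1}
\sum_{z \in \delta} \Big( \nabla_{\theta} L(\tilde{z}, \theta) - \nabla_{\theta} L(z, \theta) \Big)
```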