NLP
Gender Bias in Contextualized Word Embeddings
Jieyu Zhao1, Tianlu Wang2, Mark Yatskar3, Ryan Cotterell4, Vicente Ordonez2, Kai-Wei Chang1
1UCLA, 2University of Virginia, 3Allen Institute for AI, 4University of Cambridge
Gender Shades: https://www.youtube.com/watch?v=TWWsW1w-BVo [Buolamwini & Gebru 18]
kwchang.net/talks/sp.html
Goal: mitigate gender bias in the data/model and avoid affecting downstream tasks
http://wordbias.umiacs.umd.edu/
Semantics Only w/ Syntactic Cues
Words in contexts: word2vec assigns one vector per word type; ELMo's vectors depend on the context.
"He taught himself to play the violin." vs. "Do you enjoy the play?"
[Embedding visualization: "play" from context 1 vs. "play" from context 2]
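The static-vs.-contextual contrast above can be illustrated with a toy sketch. Everything here is a made-up stand-in: the vectors are random, and the "contextualizer" (averaging a word with its neighbors) is only a caricature of what ELMo does.

```python
import numpy as np

# Toy sketch (NOT ELMo): a word2vec-style lookup gives one vector per word
# type, while a contextual encoder mixes in the neighbors, so the same word
# gets different vectors in different sentences.
rng = np.random.default_rng(0)
vocab = ["he", "taught", "himself", "to", "play", "the", "violin", ".",
         "do", "you", "enjoy", "?"]
static = {w: rng.normal(size=8) for w in vocab}  # one fixed vector per type

def contextual(tokens, i, window=2):
    # Hypothetical contextualizer: average the word with its window neighbors.
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return np.mean([static[t] for t in tokens[lo:hi]], axis=0)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = "he taught himself to play the violin .".split()
s2 = "do you enjoy the play ?".split()
v1 = contextual(s1, s1.index("play"))
v2 = contextual(s2, s2.index("play"))

print(cos(static["play"], static["play"]))  # 1.0: a static vector cannot differ
print(cos(v1, v2))  # below 1.0: context changes the vector
```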
ELMo in coreference resolution
Training Dataset Bias

Gender               Male pronouns    Female pronouns
Occurrence (×1000)       5,300            1,600
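Counts like the ones above come from a simple corpus scan; a minimal sketch, with a toy two-sentence corpus standing in for ELMo's actual training data:

```python
import re
from collections import Counter

MALE = {"he", "him", "his", "himself"}
FEMALE = {"she", "her", "hers", "herself"}

def pronoun_counts(corpus):
    """Count male vs. female pronoun occurrences in a list of sentences."""
    counts = Counter()
    for sentence in corpus:
        for tok in re.findall(r"[a-z']+", sentence.lower()):
            if tok in MALE:
                counts["male"] += 1
            elif tok in FEMALE:
                counts["female"] += 1
    return counts

# Tiny illustrative corpus, not the real training data.
corpus = ["He taught himself to play the violin.",
          "She thanked her mentor."]
print(pronoun_counts(corpus))  # Counter({'male': 2, 'female': 2})
```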
[Bar chart: pronoun–occupation co-occurrence counts (×1000) for male-biased (M-Biased) and female-biased (F-Biased) occupations, split by male vs. female pronouns]
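Pronoun–occupation co-occurrence can be gathered with a sentence-level scan; a minimal sketch, where the occupation list and the corpus are hypothetical stand-ins:

```python
import re

MALE = {"he", "him", "his"}
FEMALE = {"she", "her"}
# Hypothetical occupation list; the talk buckets occupations as M-/F-biased.
OCCUPATIONS = {"driver", "nurse", "physician", "secretary"}

def cooccurrence(corpus):
    """Per occupation: how many sentences mention it together with
    male vs. female pronoun types, as (male, female) counts."""
    counts = {}
    for sentence in corpus:
        toks = set(re.findall(r"[a-z]+", sentence.lower()))
        for occ in toks & OCCUPATIONS:
            m, f = counts.get(occ, (0, 0))
            counts[occ] = (m + len(toks & MALE), f + len(toks & FEMALE))
    return counts

corpus = ["The driver was late because he was tired.",
          "The nurse said she would help.",
          "The driver thanked the nurse; he smiled."]
print(cooccurrence(corpus))  # {'driver': (2, 0), 'nurse': (1, 1)}
```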
1 Zhao et al., Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods, NAACL 2018
ELMo
[Plot: % of explained variance across principal components]
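The analysis behind such a plot can be sketched with synthetic data. The gender direction, dimensions, and magnitudes below are all assumptions chosen to illustrate the idea: if gendered-pair embedding differences share one direction, a single principal component dominates the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # toy dimension; real ELMo layers are much larger

# Hypothetical stand-ins for contextual embedding differences of gendered
# word pairs: a shared gender direction (varying strength) plus small noise.
gender_dir = rng.normal(size=d)
gender_dir /= np.linalg.norm(gender_dir)
diffs = np.array([rng.normal() * 2.0 * gender_dir + 0.3 * rng.normal(size=d)
                  for _ in range(200)])

# PCA via SVD of the mean-centered data; fraction of variance per component.
centered = diffs - diffs.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(explained[0])  # one component dominates -> a low-dimensional gender subspace
```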
Female context: The driver transported the counselor to the hospital because she was paid.
Male context: The driver transported the counselor to the hospital because he was paid.
Probe: ELMo(occupation) → context gender
[Bar chart: Acc (%), Male Context vs. Female Context]
Gender information from male contexts is propagated 14% more accurately than from female contexts.
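The probing experiment (predicting the gender of the surrounding context from the occupation word's contextual embedding) can be sketched end-to-end with synthetic vectors. Everything here — the dimensions, the noise asymmetry between genders, and the nearest-centroid probe — is an assumption used to illustrate the kind of asymmetry reported, not the talk's actual setup:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
gender_sig = rng.normal(size=d)  # hypothetical trace of context gender

def occ_embedding(context_is_male):
    # Synthetic ELMo(occupation): the context leaves a gendered trace.
    # Toy assumption: female contexts are encoded less consistently (more noise).
    sign, noise = (1.0, 1.0) if context_is_male else (-1.0, 2.0)
    return sign * 0.5 * gender_sig + noise * rng.normal(size=d)

train_m = np.array([occ_embedding(True) for _ in range(100)])
train_f = np.array([occ_embedding(False) for _ in range(100)])

# Nearest-centroid probe: ELMo(occupation) -> context gender.
c_m, c_f = train_m.mean(axis=0), train_f.mean(axis=0)
def predict_male(x):
    return np.linalg.norm(x - c_m) < np.linalg.norm(x - c_f)

acc_m = np.mean([predict_male(occ_embedding(True)) for _ in range(200)])
acc_f = np.mean([not predict_male(occ_embedding(False)) for _ in range(200)])
print(acc_m, acc_f)  # gender of male contexts is recovered more reliably
```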
Semantics Only w/ Syntactic Cues
Stereotypical dataset (WinoBias1): measure bias by the performance gap between the pro-stereotypical (Pro.) and anti-stereotypical (Anti.) datasets.

Type 1 (Semantics Only):
The physician hired the secretary because he was overwhelmed with clients.
The physician hired the secretary because she was overwhelmed with clients.

Type 2 (w/ Syntactic Cues):
The secretary called the physician and told him about a new patient.
The secretary called the physician and told her about a new patient.

1 https://uclanlp.github.io/corefBias
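The bias measure behind these Pro./Anti. splits is a score difference; a minimal helper, where the numbers are hypothetical and not the talk's results:

```python
def bias_gap(pro_f1, anti_f1):
    """Pro-stereotypical minus anti-stereotypical F1: a positive gap means
    the system leans on gender stereotypes instead of real evidence."""
    return pro_f1 - anti_f1

def avg_f1(pro_f1, anti_f1):
    """A debiased system should keep the average high while shrinking the gap."""
    return (pro_f1 + anti_f1) / 2

baseline, debiased = (70.0, 55.0), (64.0, 60.0)  # hypothetical (pro, anti) F1
print(bias_gap(*baseline), avg_f1(*baseline))    # 15.0 62.5
print(bias_gap(*debiased), avg_f1(*debiased))    # 4.0 62.0
```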
Semantics Only
[Bar chart: F1 (%) for GloVe and +ELMo on OntoNotes, Pro., and Anti.; Pro.–Anti. gaps Δ: 29.6 and Δ: 26.6]
Neutralizing Embeddings
[Bar chart: F1 (%), GloVe vs. w/ ELMo]
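One way to neutralize at test time, in the spirit of this slide, is to average a sentence's representation with that of its gender-swapped variant. The encoder below is a deterministic stand-in, not ELMo, and the small swap list ignores case ambiguities (e.g. possessive vs. object "her"):

```python
import numpy as np

SWAP = {"he": "she", "she": "he", "him": "her", "her": "him",
        "himself": "herself", "herself": "himself"}

def gender_swap(tokens):
    # Replace gendered pronouns with their opposite-gender counterparts.
    return [SWAP.get(t.lower(), t) for t in tokens]

def toy_embed(tokens, dim=8):
    # Stand-in encoder: a fixed pseudo-random vector per token (not contextual).
    return np.array([np.random.default_rng(sum(ord(c) for c in t.lower()))
                     .normal(size=dim) for t in tokens])

def neutralize(embed, tokens):
    # Average the representations of the sentence and its gender-swapped twin.
    return (embed(tokens) + embed(gender_swap(tokens))) / 2

v_he = neutralize(toy_embed, "he is a doctor".split())
v_she = neutralize(toy_embed, "she is a doctor".split())
print(np.allclose(v_he, v_she))  # True: both genders map to the same vectors
```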
Data Augmentation
[Bar chart: Semantics Only F1 (%) for OntoNotes, Pro., Anti., and Avg.]
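Gender-swapping data augmentation as on this slide can be sketched in a few lines; the swap list is a small illustrative subset (a full list also needs names, possessives, and case handling):

```python
SWAPS = {"he": "she", "she": "he", "himself": "herself", "herself": "himself",
         "man": "woman", "woman": "man"}

def augment(corpus):
    # Add a gender-swapped copy of every sentence to balance the training data.
    swapped = [" ".join(SWAPS.get(tok, tok) for tok in line.split())
               for line in corpus]
    return corpus + swapped

corpus = ["he taught himself to play the violin"]
print(augment(corpus))
# ['he taught himself to play the violin',
#  'she taught herself to play the violin']
```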
Methods, by pipeline stage (Data → Input Representation → Inference → Output):
Data: data augmentation
Input Representation: debiasing word embeddings
Inference: reducing bias amplification using corpus-level constraints