Natural Language Processing with Deep Learning
Footprint of Societal Biases in NLP
Navid Rekab-Saz
navid.rekabsaz@jku.at
Institute of Computational Perception

Agenda: Motivation · Bias in word embeddings · …
[Figure: the machine learning feedback loop: state of the world → data → model → individuals → action → feedback]
Statistical bias indicates the strength of the assumptions made to define a model. Higher bias means more assumptions and less flexibility, as in linear regression.
Oxford dictionary
Fairness and Machine Learning. Solon Barocas, Moritz Hardt, Arvind Narayanan. 2019. fairmlbook.org
https://www.theguardian.com/technology/2015/jul/01/google-sorry-racist-auto-tag-photo-app
https://www.theverge.com/2017/4/25/15419522/faceapp-hot-filter-racist-apology
https://www.wired.co.uk/article/robot-beauty-contest-beauty-ai
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Source: https://fair-ia.ekstrandom.net/sigir2019-slides.pdf
signal for some groups of users
are they accountable?
Source: https://fair-ia.ekstrandom.net/sigir2019-slides.pdf
E.g., the learned representation of the word nurse may convey that its encoded implicit meaning is about being a woman!
Census Income dataset: predicting whether a person makes over 50K a year
http://www.fairness-measures.org/Pages/Datasets/censusincome.html
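A few lines suffice to see how skewed this prediction target already is across genders. A minimal sketch, assuming the OpenML copy of the dataset (the column names "sex" and "class" are how that copy labels them and may differ in other copies):

```python
from sklearn.datasets import fetch_openml

# Load the UCI Census Income ("Adult") data from OpenML.
data = fetch_openml("adult", version=2, as_frame=True)
df = data.frame

# Fraction of >50K earners per gender group: the skew already present
# in the data is what a naive classifier will learn and reproduce.
rate = (df["class"] == ">50K").groupby(df["sex"]).mean()
print(rate)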
De-Arteaga, Maria, et al. "Bias in bios: A case study of semantic representation bias in a high-stakes setting." Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019.
Big problems need interdisciplinary thinking!
Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems.
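The paper's title comes from an analogy test on pre-trained embeddings. A minimal sketch of that test with gensim; the GoogleNews vectors (a large download) are the embeddings Bolukbasi et al. analyzed, though the exact top answer can vary:

```python
import gensim.downloader as api

model = api.load("word2vec-google-news-300")

# "man is to computer_programmer as woman is to ?"
print(model.most_similar(positive=["woman", "computer_programmer"],
                         negative=["man"], topn=3))
# Bolukbasi et al. report "homemaker" among the top answers, exposing
# the gender stereotypes encoded in the embedding space.
```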
Formal definition of bias
Defining gender as a binary construct (female vs. male) is an unfortunate simplification, as it neglects the wide spectrum of gender identities! Ideally, these formulations should cover all gender identities: LGBT+
Word embeddings $\boldsymbol{w}$ are taken from a pre-trained model (e.g., word2vec or GloVe). $Z$ is a set of attribute-definitional words and $\tilde{Z}$ its opposite set; $Z$ when measuring bias towards female: the set of female-definitional words (e.g., she, woman, her).
Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences.
Rekabsaz N., Henderson J., West R., and Hanbury A. "Measuring Societal Biases in Text Corpora via First-Order Co-occurrence."
$$\mathrm{bias}(t) \;=\; \frac{1}{|Z|}\sum_{z\in Z}\cos(\boldsymbol{w}_z,\boldsymbol{w}_t)\;-\;\frac{1}{|\tilde{Z}|}\sum_{\tilde{z}\in\tilde{Z}}\cos(\boldsymbol{w}_{\tilde{z}},\boldsymbol{w}_t)$$
Associations are measured using a word2vec model trained on a recent Wikipedia corpus
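A minimal sketch of this score in Python with gensim; the slides use a word2vec model trained on Wikipedia, while here small GloVe vectors stand in (smaller download), and the two word lists are illustrative stand-ins for $Z$ and $\tilde{Z}$:

```python
import numpy as np
import gensim.downloader as api

# Pre-trained embeddings; any gensim KeyedVectors model works here.
model = api.load("glove-wiki-gigaword-100")

FEMALE = ["she", "her", "woman", "female"]  # stand-in for Z
MALE = ["he", "his", "man", "male"]         # stand-in for Z~

def bias(target):
    """Mean cosine similarity to the female set minus the male set."""
    sim_f = np.mean([model.similarity(z, target) for z in FEMALE])
    sim_m = np.mean([model.similarity(z, target) for z in MALE])
    return sim_f - sim_m

print(bias("nurse"))     # > 0 expected: associated with female words
print(bias("engineer"))  # < 0 expected: associated with male words
```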
What we know so far …
Subsequent questions: do these biases propagate to downstream applications, e.g. search, content-based recommendation systems, IR, sentiment analysis, etc.?
E.g., the documents retrieved for the query nurse will mostly be about women (biased towards female)
Do Neural Ranking Models Intensify Gender Bias? Rekabsaz N., Schedl M. To appear in the Proceedings of the ACM Conference on Research and Development in Information Retrieval (SIGIR) 2020, https://arxiv.org/abs/2005.00372
i.e., results with a more prominent representation of one gender
The gender bias of an IR model is measured from the gender magnitudes $\delta$ of its retrieved documents, averaged over a set of queries $\mathbb{Q}$:

$$\mathrm{bias}(\mathbb{Q}) \;=\; \frac{1}{|\mathbb{Q}|}\sum_{q\in\mathbb{Q}}\Big(\delta_{Z}\big(E^{(q)}\big)\;-\;\delta_{\tilde{Z}}\big(E^{(q)}\big)\Big)$$

where $\delta_{Z}(E^{(q)})$ aggregates the occurrences of the words in $Z$ over the retrieved documents, and $E_j^{(q)}$ is the document at position $j$ of the list of documents retrieved by an IR model when query $q$ is issued
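A minimal sketch of such a measurement with simple count-based document magnitudes; the word lists, the cutoff, and the unweighted average are illustrative assumptions rather than the exact formulation of the cited paper:

```python
from collections import Counter

FEMALE = {"she", "her", "woman", "women", "female"}  # stand-in for Z
MALE = {"he", "his", "man", "men", "male"}           # stand-in for Z~

def magnitude(doc_tokens, gender_words):
    """Count occurrences of the given gender-definitional words in a document."""
    counts = Counter(doc_tokens)
    return sum(counts[w] for w in gender_words)

def rank_bias(ranked_docs, cutoff=10):
    """Mean (female - male) magnitude over the top-`cutoff` documents."""
    top = ranked_docs[:cutoff]
    return sum(magnitude(d, FEMALE) - magnitude(d, MALE) for d in top) / len(top)

def corpus_bias(results_per_query, cutoff=10):
    """Average rank bias over a query set; > 0 means results lean female."""
    return sum(rank_bias(r, cutoff) for r in results_per_query) / len(results_per_query)

# Usage: documents as token lists, one ranked list per query.
results = [[["the", "nurse", "she", "her"], ["a", "man", "he"]]]
print(corpus_bias(results, cutoff=2))
```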
Protected features (such as gender, race, ethnicity, age) should not be recoverable from learned representations
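One simple instance of this idea is removing a protected-attribute direction from representations by linear projection, in the spirit of hard debiasing (Bolukbasi et al., 2016); the direction estimate below is a deliberately simplistic illustration, and `emb` is a hypothetical word-to-vector lookup:

```python
import numpy as np

def remove_direction(X, direction):
    """Project the rows of X onto the subspace orthogonal to `direction`."""
    d = direction / np.linalg.norm(direction)
    return X - np.outer(X @ d, d)

# Usage sketch: estimate a gender direction from a definitional pair,
# then strip it from a matrix X of representations (one row per item).
# gender_dir = emb["she"] - emb["he"]
# X_debiased = remove_direction(X, gender_dir)
```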
Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
https://www.arxiv-vanity.com/papers/1904.02679/
https://blog.ml.cmu.edu/2020/02/28/inherent-tradeoffs-in-learning-fair-representations/