How Computer Algorithms Expose Our Hidden Biases And How To Fix Them
SLIDE 1

How Computer Algorithms Expose Our Hidden Biases

And How To Fix Them

Victor Zimmermann

  • LXIV. StuTS

Computational Linguistics Department, Heidelberg University

SLIDE 2

The Shitstorm cometh.

SLIDE 3

What happened?

  • lxiv. stuts | white man explains racism

SLIDE 4

The Netflix Artwork Controversy

Why are the tabloids up in arms over Netflix adverts? Welcome to a stereotypical machine learning controversy.

SLIDE 5

The algorithm

“If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual ‘evidence’ for why the title might be good for you.” [Cha+17]

Figure 1: Different artworks for romance and comedy viewers.

SLIDE 6

Twitter outrage

SLIDE 7

Netflix’s Response

“We don’t ask members for their race, gender or ethnicity so we cannot use this information to personalise their individual Netflix experience. The only information we use is a member’s viewing history.” [Iqb18]

SLIDE 8

Nobody expects the Patriarchy.

SLIDE 9

Sources of Bias

There are some obvious reasons for bias in machine learning:

  • Your training data is bad.
  • Your algorithm is bad.
  • You are bad. And you should feel bad.
SLIDE 10

Bad Training Data

SLIDE 11

Human Language

Spoiler: All human language is biased. Bias is not necessarily performance-based. [Tan90][GMS98] Instead it can also be encoded in the orthography, lexicon or grammar of a language.

  • Asymmetrically marked gender (generic masculine, e.g. actor vs actress)
  • Quantity of gendered insults¹ [Sta77]
  • Naming conventions (e.g. Chastity vs. Bob) [Swe13]

¹ Wikipedia lists 22 misogynistic and 5 misandric slurs.

SLIDE 12

Word Embeddings

What are Word Embeddings? Condensed mathematical representations of collocations. [Mik+13]

CHICAGO – Former President Barack Obama campaigned in Chicago and northwest Indiana on Sunday, just days ahead of Tuesday’s midterm elections. Obama spoke Sunday afternoon at a get-out-the-vote rally in Gary, Indiana, supporting Democrat U.S. Sen. Joe Donnelly. The rally ended at about 3 p.m. and then spoke a rally at ...

  Obama → (0.2, 0.6, ...), speaks → (0.1, 0.8, ...), Chicago → (0.3, 0.2, ...), press → (0.0, 0.5, ...), ...

⇒ Now you can do maths with words!?

  King − Man + Woman = Queen
  Berlin − Germany + France = Paris
  Programmer − Man + Woman = Homemaker
  Surgeon − Man + Woman = Nurse
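The analogy arithmetic above can be tried on toy vectors. The 2-d embeddings below are invented for illustration (real models such as word2vec use hundreds of dimensions); the point is only the vector arithmetic plus nearest-neighbour lookup:

```python
import numpy as np

# Hypothetical 2-d embeddings, chosen so the analogy works out exactly.
emb = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = emb[a] - emb[b] + emb[c]
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        # cosine similarity between candidate and target vector
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target) + 1e-9)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("king", "man", "woman"))  # → "queen"
```

Real embedding libraries answer the same query over a vocabulary of hundreds of thousands of words, which is where the Programmer/Homemaker result above falls out of the training data.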

SLIDE 13

Word Embeddings

What are Word Embeddings used for?

  • Similarity Measures [Kus+15]
  • Machine Translation [Zou+13]
  • Sentence Classification [Kim14]
  • Part-of-Speech-Tagging [SZ14][RRZ18]
  • Dependency Parsing [CM14]
  • Semantic Modelling [Fu+14]
  • Coreference Resolution [Lee+17]

Basically the entire field of Computational Linguistics.

SLIDE 14

Mathematical Sledgehammer

What if we just remove gender?

Figure 2: Mind = Blown

SLIDE 15

Mathematical Sledgehammer

  • Take “good” analogies, e.g. man-woman, he-she, king-queen, etc.
  • Extract some average “gender vector” from their embeddings.
  • Subtract this new vector from all other relations.
  • Not applicable to most other kinds of bias.
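The averaging-and-subtracting steps above can be sketched in a few lines. The 3-d vectors are made up for illustration (real embeddings have hundreds of dimensions):

```python
import numpy as np

# Made-up embeddings for two "good" analogy pairs.
pairs = [
    (np.array([0.8, 0.1, 0.3]), np.array([0.2, 0.7, 0.3])),  # he / she
    (np.array([0.9, 0.2, 0.5]), np.array([0.3, 0.8, 0.5])),  # man / woman
]

# Average the pairwise differences into one "gender vector" and normalise it.
g = np.mean([a - b for a, b in pairs], axis=0)
g /= np.linalg.norm(g)

def remove_gender(w):
    """Subtract the gender component: w - (w · g) g."""
    return w - (w @ g) * g

programmer = np.array([0.7, 0.2, 0.9])  # hypothetical embedding
debiased = remove_gender(programmer)
print(debiased @ g)  # ≈ 0: no gender component remains
```

The same projection trick works for any direction you can define via word pairs, which is exactly why it fails for biases that have no such clean defining pairs.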
SLIDE 16

Mathematical Sledgehammer (in beautiful)

Word set W, defining subsets D1, D2, ..., Dn ⊂ W, embedding {w ∈ R^d} for w ∈ W, integer parameter k ≥ 1, with

  µi := Σ_{w ∈ Di} w / |Di|

being the means of the defining subsets. The bias subspace B consists of the first k rows of SVD(C), where

  C := Σ_{i=1}^{n} Σ_{w ∈ Di} (w − µi)ᵀ (w − µi) / |Di|

Words to neutralise N ⊆ W, family of equality sets ε := {E1, E2, ..., Em}, Ei ⊆ W, with re-embedded words w ∈ N defined as

  w := (w − wB) / ‖w − wB‖

For each set E ∈ ε, let

  µ := Σ_{w ∈ E} w / |E|,   ν := µ − µB

and for each w ∈ E,

  w := ν + √(1 − ‖ν‖²) · (wB − µB) / ‖wB − µB‖

(Here wB denotes the projection of w onto the bias subspace B.)
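The neutralise and equalise rules above translate almost line-for-line into NumPy. In this sketch the bias subspace B is a single made-up orthonormal row (k = 1) and the vectors are invented for illustration:

```python
import numpy as np

# Toy one-dimensional bias subspace: one orthonormal row (k = 1).
B = np.array([[1.0, 0.0, 0.0]])

def project(w):
    """w_B: projection of w onto the bias subspace B."""
    return B.T @ (B @ w)

def neutralise(w):
    """w := (w - w_B) / |w - w_B| — remove the bias component, renormalise."""
    v = w - project(w)
    return v / np.linalg.norm(v)

def equalise(E):
    """Re-embed an equality set so its members differ only inside B."""
    mu = np.mean(E, axis=0)
    mu_B = project(mu)
    nu = mu - mu_B
    out = []
    for w in E:
        w_B = project(w)
        out.append(nu + np.sqrt(1 - nu @ nu) * (w_B - mu_B) / np.linalg.norm(w_B - mu_B))
    return out

he, she = np.array([0.6, 0.8, 0.0]), np.array([-0.6, 0.8, 0.0])
he2, she2 = equalise([he, she])
print(np.linalg.norm(he2), np.linalg.norm(she2))  # both exactly 1 after equalising
```

Note that equalising assumes ‖ν‖ ≤ 1, which holds when the embeddings are unit-normalised, as in the definitions above.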

SLIDE 17

Bad Algorithms

SLIDE 18

Google’s Image Recognition Controversy

Google automatically labels pictures according to their content. Problem: Their algorithm is bad.

Source: @jackyalcine on Twitter

SLIDE 19

Google’s Image Recognition Controversy

Their solution:

Source: www.theverge.com (visited on 2018-11-06)

SLIDE 20

No easy solutions.

Not one of these solutions is really good.

  • Total avoidance of problem. [Iqb18]
  • Limited applicability. [Bol+16]
  • Exploitation of false classification. [BGO16]
  • Introduction of even more priors and meta parameters. [Zha+17]
SLIDE 21

Bad People

SLIDE 22

Facebook

Actual Quote from an actual Facebook Employee “We started out of a college dorm. I mean, c’mon, we’re Facebook. We never wanted to deal with this shit.” [Sha16]

SLIDE 23

Facebook

Possible cause of this apathy: (Don’t quote me on this.)

SLIDE 24

Help, my Chatbot joined the KKK!

SLIDE 25

Microsoft Tay

SLIDE 26

Microsoft Tay

SLIDE 27

Microsoft Tay

What can we learn from this?

  • Tay is a chat bot.
  • Tay is down with the kids?
  • Tay learns from Twitter data.
SLIDE 28

Microsoft Tay

The absolutely expected happens...

Source: www.theguardian.com (visited on 2018-11-19)

SLIDE 29

What should you take away from this talk?

  • Just because something uses “machine learning” doesn’t mean it is unbiased.

  • All language is implicitly prejudiced.
  • Training data does make a difference.
  • Diverse staff makes a difference.
  • Testing your system makes a difference.
SLIDE 30

What should you take away from this talk?

Don’t listen to chat bots. They may act human.

SLIDE 31

Appendix

SLIDE 32

Language Classification

Common language identification systems use extensive news corpora for training.

  + Big corpora in most languages.
  + Mostly “unbiased” texts.
  − Written in main dialect.
  − Privileged writing staff.

Problem: African American English is 20% less likely to be classified as English than Standard English. [BO17]
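For intuition, a toy character-trigram language identifier of the kind such systems build on might look like this. The training snippets below are made up; real systems train on large news corpora, which is exactly where the dialect skew described above comes from:

```python
from collections import Counter

def trigrams(text):
    """Character-trigram counts, with padding so word edges are captured."""
    t = f"  {text.lower()}  "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def train(samples):
    """samples: {language: training text} -> per-language trigram profiles."""
    return {lang: trigrams(text) for lang, text in samples.items()}

def classify(text, profiles):
    """Pick the language whose profile shares the most trigram mass with the text."""
    probe = trigrams(text)
    return max(profiles, key=lambda lang: sum((probe & profiles[lang]).values()))

# Tiny made-up "corpora"; a real system would use millions of news sentences.
profiles = train({
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund",
})
print(classify("the dog and the cat", profiles))  # → "en"
```

A classifier like this can only recognise the dialect it was trained on, so text in African American English shares fewer trigrams with the news-trained English profile and is more easily misclassified.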

SLIDE 33

Language Classification

Solution by Blodgett, Green, and O’Connor (2016):

  1. Use US Census data and geolocated tweets to estimate race of user.
  2. Train classifier to identify the “race” of a given tweet, based on high-AA tweets from the first set.

Result:

  • Build new corpus from high-AA tweets.
  • (Find out that “Asian” captures all foreign languages and use that fact for classification.)

SLIDE 34

References

[Ang+16] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks”. In: ProPublica, May 23 (2016).

[BGO16] Su Lin Blodgett, Lisa Green, and Brendan O’Connor. “Demographic dialectal variation in social media: A case study of African-American English”. In: arXiv preprint arXiv:1608.08868 (2016).

SLIDE 35

References

[BO17] Su Lin Blodgett and Brendan O’Connor. “Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English”. In: arXiv preprint arXiv:1707.00061 (2017).

[Bol+16] Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”. In: NIPS (2016), pp. 1–9.

[CBN17] Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. “Semantics derived automatically from language corpora contain human-like biases”. In: Science 356.6334 (2017), pp. 183–186. issn: 10959203. arXiv: 1608.07187.
SLIDE 36

References

[Cha+17] Ashok Chandrashekar, Fernando Amat, Justin Basilico, and Tony Jebara. “Artwork Personalization at Netflix”. In: Netflix Techblog (Dec. 7, 2017). url: https://medium.com/netflix-techblog/artwork-personalization-c589f074ad76 (visited on 11/05/2018).

[CM14] Danqi Chen and Christopher Manning. “A fast and accurate dependency parser using neural networks”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014, pp. 740–750.

[Duh12] Charles Duhigg. “How companies learn your secrets”. In: The New York Times 16 (2012), p. 2012.

SLIDE 37

References

[FD14] Manaal Faruqui and Chris Dyer. “Improving vector space word representations using multilingual correlation”. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 2014, pp. 462–471.

[Fu+14] Ruiji Fu, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. “Learning semantic hierarchies via word embeddings”. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vol. 1. 2014, pp. 1199–1209.

SLIDE 38

References

[Gar+17] Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes”. In: 115.16 (2017). issn: 0027-8424. arXiv: 1711.08412. url: http://arxiv.org/abs/1711.08412.

[GMS98] Anthony G Greenwald, Debbie E McGhee, and Jordan LK Schwartz. “Measuring individual differences in implicit cognition: the implicit association test”. In: Journal of Personality and Social Psychology 74.6 (1998), p. 1464.

[Gre95] Gregory Grefenstette. “Comparing Two Language Identification Schemes”. In: JADT. 1995.

SLIDE 39

References

[Han+15] C Hansen, M Tosik, G Goossen, C Li, L Bayeva, F Berbain, and M Rotaru. “How to get the best word vectors for resume parsing”. In: SNN Adaptive Intelligence/Symposium: Machine Learning. 2015.

[Iqb18] Nosheen Iqbal. “Film fans see red over Netflix ‘targeted’ posters for black viewers”. In: The Guardian (Oct. 20, 2018). url: https://www.theguardian.com/media/2018/oct/20/netflix-film-black-viewers-personalised-marketing-target (visited on 11/05/2018).

SLIDE 40

References

[Kil+17] Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. “Avoiding Discrimination through Causal Reasoning”. In: NIPS (2017), pp. 1–11. issn: 10495258. arXiv: 1706.02744. url: http://arxiv.org/abs/1706.02744.

[Kim14] Yoon Kim. “Convolutional neural networks for sentence classification”. In: arXiv preprint arXiv:1408.5882 (2014).

[Kus+15] Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. “From word embeddings to document distances”. In: International Conference on Machine Learning. 2015, pp. 957–966.
SLIDE 41

References

[Kus+17] Matt J. Kusner, Joshua R. Loftus, Chris Russell, and Ricardo Silva. “Counterfactual Fairness”. In: NIPS (2017). issn: 10495258. arXiv: 1703.06856. url: http://arxiv.org/abs/1703.06856.

[Lee+17] Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. “End-to-end neural coreference resolution”. In: arXiv preprint arXiv:1707.07045 (2017).

SLIDE 42

References

[Meh+17] Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, and Emine Yilmaz. “Auditing Search Engines for Differential Satisfaction Across Demographics”. In: Proceedings of the 26th International Conference on World Wide Web Companion (2017), pp. 626–633. url: https://dl.acm.org/citation.cfm?id=3054197&CFID=966931141&CFTOKEN=80146118.

[Mik+13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. “Distributed representations of words and phrases and their compositionality”. In: Advances in Neural Information Processing Systems. 2013, pp. 3111–3119.

SLIDE 43

References

[Nal+16] Eric Nalisnick, Bhaskar Mitra, Nick Craswell, and Rich Caruana. “Improving Document Ranking with Dual Word Embeddings”. In: Proceedings of the 25th International Conference Companion on World Wide Web. WWW ’16 Companion. Montreal, Quebec, Canada: International World Wide Web Conferences Steering Committee, 2016, pp. 83–84. isbn: 978-1-4503-4144-8. url: https://doi.org/10.1145/2872518.2889361.

[RC11] Karen Ross and Cynthia Carter. “Women and news: A long and winding road”. In: Media, Culture & Society 33.8 (2011), pp. 1148–1165.
SLIDE 44

References

[RRZ18] Ines Rehbein, Josef Ruppenhofer, and Victor Zimmermann. “A harmonised testsuite for POS tagging of German social media data”. In: (2018).

[RVS17] Sebastian Ruder, Ivan Vulic, and Anders Søgaard. “A survey of cross-lingual embedding models”. In: CoRR abs/1706.04902 (2017).

SLIDE 45

References

[Sha16] Aarti Shahani. “From Hate Speech To Fake News: The Content Crisis Facing Mark Zuckerberg”. In: NPR (Nov. 17, 2016). url: https://www.npr.org/sections/alltechconsidered/2016/11/17/495827410/from-hate-speech-to-fake-news-the-content-crisis-facing-mark-zuckerberg?t=1542640881872 (visited on 11/19/2018).

[SL17] Robert Speer and Joanna Lowry-Duda. “ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge”. In: (2017). arXiv: 1704.03560. url: http://arxiv.org/abs/1704.03560.

SLIDE 46

References

[Sta77] Julia Penelope Stanley. “Paradigmatic woman: The prostitute”. In: Papers in Language Variation (1977), pp. 303–321.

[Swe13] Latanya Sweeney. “Discrimination in online ad delivery”. In: Queue 11.3 (2013), p. 10.

[SZ14] Cicero D Santos and Bianca Zadrozny. “Learning character-level representations for part-of-speech tagging”. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014, pp. 1818–1826.

[Tan90] Dali Tan. “Sexism in the Chinese Language”. In: NWSA Journal 2.4 (1990), pp. 635–639.

SLIDE 47

References

[Zha+17] Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. “Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints”. In: (2017). arXiv: 1707.09457. url: http://arxiv.org/abs/1707.09457.

[ZLM18] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. “Mitigating Unwanted Biases with Adversarial Learning”. In: (2018). arXiv: 1801.07593. url: http://arxiv.org/abs/1801.07593.

SLIDE 48

References

[Zou+13] Will Y Zou, Richard Socher, Daniel Cer, and Christopher D Manning. “Bilingual word embeddings for phrase-based machine translation”. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013, pp. 1393–1398.
