 
Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology
ACL 2019
Ran Zmigrod, Sebastian J. Mielke, Hanna Wallach, Ryan Cotterell
University of Cambridge // Johns Hopkins University // Microsoft Research
rz279@cam.ac.uk sjmielke@jhu.edu wallach@microsoft.com rdc42@cam.ac.uk
Twitter: @RanZmigrod – paper and thread pinned! // @sjmielke
Gender bias in NLP systems
Coreference resolution systems are biased (Rudinger et al., 2018; Zhao et al., 2018):
  Even though the doctor reassured the nurse, she was worried.
Both are possible... but systems prefer nurse!
Word embeddings carry biases, too.
This shouldn’t come as a surprise: our data is biased
[Figure: Google n-grams frequency counts for “he is a doctor” vs. “she is a doctor”, 1900 to 2000]
Our focus: stereotypes in language modeling (Lu et al., 2018)
Training data counts are visible as likelihoods under a language model:

               stereotype m              stereotype f
  pronoun m    He is a good doctor.      He is a good nurse.
  pronoun f    She is a good doctor.     She is a good nurse.

The solution: Counterfactual Data Augmentation (Lu et al., 2018)
  For every sentence with she / he, e.g., “She is a nurse.”,
  add that sentence with he / she for training, e.g., “He is a nurse.”
The augmented data should now yield a balanced model!
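The augmentation step on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation of Lu et al. (2018): the swap table `SWAPS` and the helper names `counterfactual` and `augment` are assumptions made for the example, and only he / she are handled.

```python
import re

# Hypothetical swap table for illustration; it covers only he/she.
# A fuller list would also need to disambiguate "her" (him vs. his).
SWAPS = {"he": "she", "she": "he"}

_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)


def counterfactual(sentence):
    """Return the sentence with he/she swapped, or None if nothing changes."""
    def swap(match):
        word = match.group(0)
        replacement = SWAPS[word.lower()]
        # Preserve sentence-initial capitalization.
        return replacement.capitalize() if word[0].isupper() else replacement

    swapped = _PATTERN.sub(swap, sentence)
    return swapped if swapped != sentence else None


def augment(corpus):
    """Keep every original sentence and add its counterfactual, if any."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence)
        if swapped is not None:
            augmented.append(swapped)
    return augmented


corpus = ["She is a nurse.", "He is a good doctor.", "The chair is red."]
augmented = augment(corpus)
```

Sentences without a pronoun pass through unchanged, so the augmented corpus contains each gendered sentence once per pronoun.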
“Agreement” or “what if: German”

               stereotype m                   stereotype f
  pronoun m    Er ist ein guter Arzt.         Er ist ein guter Krankenpfleger.
  pronoun f    Sie ist eine gute Ärztin.      Sie ist eine gute Krankenpflegerin.

(Glosses: He is a good doctor. / He is a good nurse. / She is a good doctor. / She is a good nurse. In German, the article, adjective, and noun all change form with gender.)

So, uh, can we just... change all words’ grammatical gender?
Example: Der Arzt sitzt auf einem Stuhl (The male doctor sits on a chair)
Swap all: Die Ärztin sitzt auf einer Stuhl (The female doctor sits on a... what?)
No, what we need is...
Syntax to the rescue: use dependency parses
Only words “connected” in the dependency parse should change!
Build an MRF over morphological tags along the dependency parse!

Two kinds of factors:
  neural factors (agreement/concord): learned from data
  manual dampening: not learned, boosts tags that stay what they were before intervention

Tags along the parse of “Der gute Arzt sitzt auf einem Stuhl”:

                          Der        gute       Arzt       sitzt       auf   einem      Stuhl
  before intervention     M;SG;NOM   M;SG;NOM   M;SG;NOM   3P;SG;PRS   -     M;SG;DAT   M;SG;DAT
  intervention (M to F)   M;SG;NOM   M;SG;NOM   F;SG;NOM   3P;SG;PRS   -     M;SG;DAT   M;SG;DAT
  after inference         F;SG;NOM   F;SG;NOM   F;SG;NOM   3P;SG;PRS   -     M;SG;DAT   M;SG;DAT
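The intervention on this slide can be imitated with a toy model. The sketch below is not the model from the talk: the weights `AGREE` and `DAMP` are hand-set assumptions standing in for the learned agreement factors and the manual dampening factor, only the gender part of each tag is modeled, and the best assignment is found by brute-force enumeration rather than by proper inference on the tree.

```python
from itertools import product

# Words of "Der gute Arzt sitzt auf einem Stuhl" that carry a gender tag.
words = ["Der", "gute", "Arzt", "einem", "Stuhl"]
original = {w: "M" for w in words}

# Dependency edges along which gender agreement is assumed to hold
# (determiner or adjective attached to its noun).
edges = [("Der", "Arzt"), ("gute", "Arzt"), ("einem", "Stuhl")]

AGREE = 2.0  # assumed reward when two connected words share a gender
DAMP = 1.0   # assumed reward when a word keeps its original gender


def score(tags):
    """Sum of dampening (unary) and agreement (pairwise) factor scores."""
    total = sum(DAMP for w in words if tags[w] == original[w])
    total += sum(AGREE for a, b in edges if tags[a] == tags[b])
    return total


def intervene(word, new_gender):
    """Clamp one word's gender and return the best-scoring full assignment."""
    free = [w for w in words if w != word]
    best = None
    for combo in product("MF", repeat=len(free)):
        tags = dict(zip(free, combo))
        tags[word] = new_gender
        if best is None or score(tags) > score(best):
            best = tags
    return best


result = intervene("Arzt", "F")
```

With these assumed weights, the determiner and adjective attached to “Arzt” flip to F because agreement outweighs dampening, while “einem” and “Stuhl” keep M because nothing connects them to the changed noun.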
Recap: what is a Markov Random Field (Koller and Friedman, 2009)?
Model $p(x, y, z)$ by decomposing into factors ($\phi$)!
[Figure: graph over the variables x, y, z]
Every factor gives a score to certain assignments:
  $\phi(x = 2, y = 1) = 0.42$
  $\phi(y = 1) = 1.3$
  $\phi(z = 1) = -1$
Add up all factors to obtain global score:
  $\mathrm{score}(x = 2, y = 1, z = 4) = \phi(x = 2, y = 1) + \phi(y = 1) + \phi(z = 4)$
Get $p$ by global normalization (easy in trees):
  $p(x = 2, y = 1, z = 4) \propto \exp \mathrm{score}(x = 2, y = 1, z = 4)$
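The recap can be made concrete with a small numeric sketch. The three factor values are the ones quoted on the slide; everything else is an assumption for illustration: the domain `{1, 2, 3, 4}` for each variable, a score of 0 for every assignment the slide does not mention, and normalization by brute-force enumeration instead of the tree-structured inference the slide alludes to.

```python
import math
from itertools import product

DOMAIN = [1, 2, 3, 4]  # assumed domain for x, y, and z

# Factor scores quoted on the slide; unlisted assignments default to 0.0.
pair_xy = {(2, 1): 0.42}
unary_y = {1: 1.3}
unary_z = {1: -1.0}


def score(x, y, z):
    """Global score: the sum of all factor scores for this assignment."""
    return pair_xy.get((x, y), 0.0) + unary_y.get(y, 0.0) + unary_z.get(z, 0.0)


# Global normalization constant, computed by enumerating every assignment.
Z = sum(math.exp(score(x, y, z)) for x, y, z in product(DOMAIN, repeat=3))


def p(x, y, z):
    """Probability of an assignment: exp(score) divided by the normalizer."""
    return math.exp(score(x, y, z)) / Z


total = sum(p(x, y, z) for x, y, z in product(DOMAIN, repeat=3))
```

Enumeration costs time exponential in the number of variables, which is why the tree structure matters in practice: on a tree the same normalizer can be computed efficiently by message passing.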