DIG Seminar: Civil Rephrases of Toxic Texts with Self-Supervised Transformers

DIG Seminar: Civil Rephrases Of Toxic Texts With Self-Supervised Transformers - PowerPoint PPT Presentation

  1. DIG Seminar: Civil Rephrases Of Toxic Texts With Self-Supervised Transformers. Léo Laugier 1, John Pavlopoulos 2,3, Jeffrey Sorensen 4, Lucas Dixon 4, Thomas Bonald 1. 1 Télécom Paris, Institut Polytechnique de Paris; 2 Athens University of Economics & Business; 3 Stockholm University; 4 Google. October 15, 2020

  2. Contents: (1) Introduction: Can we nudge healthier conversations from an unpaired corpus? (2) Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model. (3) Evaluation: How to evaluate with automatic metrics? (4) Results on sentiment transfer and detoxification. (5) Conclusion.

  3. Introduction (1/5): Nudging healthier conversations online

  4. Introduction (2/5): Machine learning systems classify toxic comments online. Figure: from Pavlopoulos et al. [1]

  5. Introduction (3/5): Deep learning is efficient when applied to generative transfer tasks. Figure: Left: CycleGAN [2]; Right: Neural Machine Translation (NMT) (from https://jalammar.github.io/)

  6. Introduction (4/5): Golden annotated pairs are more expensive and difficult to get than monolingual corpora annotated in attribute. Figure: Left: Parallel (paired) corpus for supervised NMT; Right: Non-parallel (unpaired) corpora for self-supervised NMT

  7. Introduction (5/5): Therefore we opted for a self-supervised setting. Figure: Left: Polarised Civil Comments dataset [3]; Right: Yelp Review dataset [4] (for initial experiments and fair comparison purposes)

  8. Contents (recap): (1) Introduction: Can we nudge healthier conversations from an unpaired corpus? (2) Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model. (3) Evaluation: How to evaluate with automatic metrics? (4) Results on sentiment transfer and detoxification. (5) Conclusion.

  9. Method (1/14): Formalizing the problem. There exist two related approaches: (1) Encoder-decoder architectures work well for supervised sequence-to-sequence (seq2seq) tasks (NMT): T5 [5]; (2) Language Models (LMs) are efficient for self-supervised "free" generation: GPT-2 [6] and CTRL [7]. Goal: Let X_T and X_C be the "toxic" and "civil" non-parallel corpora, and let X = X_T ∪ X_C. We aim at learning, in a self-supervised setting, a mapping f_θ such that ∀ (x, a) ∈ X × {"civil", "toxic"}, y = f_θ(x, a) is a text: (1) satisfying a, (2) fluent in English, (3) preserving the meaning of x "as much as possible".

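The mapping f_θ(x, a) above can be pictured as attribute-conditioned generation with a pretrained encoder-decoder such as T5 [5], steered by a control-code prefix in the spirit of CTRL [7]. Below is a minimal sketch, assuming the Hugging Face transformers library, the off-the-shelf t5-small checkpoint, and illustrative "civil:" / "toxic:" prefixes; it is not the exact fine-tuned system presented in the talk.

    # Minimal sketch of attribute-conditioned rewriting f_theta(x, a) with an
    # encoder-decoder LM. Assumptions (not from the slides): Hugging Face
    # `transformers`, the vanilla "t5-small" checkpoint, and "civil:"/"toxic:"
    # control-code prefixes; a real system would fine-tune the model first.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    def rewrite(x: str, a: str) -> str:
        """Condition generation on the target attribute a by prefixing the input."""
        inputs = tokenizer(f"{a}: {x}", return_tensors="pt")
        output_ids = model.generate(**inputs, max_length=64)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    print(rewrite("this idea is complete garbage", "civil"))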

  10. Method (2/14): Encoder-Decoder for supervised seq2seq. The decoder is trained with teacher forcing and used auto-regressively (AR) at inference: ȳ_j = y_j if training, ȳ_j = ŷ_j if inference. At each position j it outputs a distribution over the vocabulary V, given the forced or generated prefix and the input x: P_θ(Ŷ_j | Ŷ_1 = ȳ_1, …, Ŷ_{j−1} = ȳ_{j−1}, X = x) = [ P_θ(Ŷ_j = w_1 | Ŷ_1 = ȳ_1, …, Ŷ_{j−1} = ȳ_{j−1}, X = x), P_θ(Ŷ_j = w_2 | …), …, P_θ(Ŷ_j = w_|V| | …) ].

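The case split ȳ_j = y_j (training) versus ȳ_j = ŷ_j (inference) is the difference between teacher forcing and auto-regressive decoding. The toy sketch below makes that loop explicit; next_token_distribution is a hypothetical stand-in for the decoder's softmax P_θ(Ŷ_j | Ŷ_1 = ȳ_1, …, Ŷ_{j−1} = ȳ_{j−1}, X = x), not code from the presentation.

    # Toy illustration of teacher forcing vs auto-regressive (AR) decoding.
    VOCAB = ["<eos>", "the", "cat", "sat"]

    def next_token_distribution(prefix, x):
        # Dummy distribution that just cycles through the vocabulary;
        # a trained decoder would compute this from x and the prefix.
        probs = [0.1] * len(VOCAB)
        probs[(len(prefix) + 1) % len(VOCAB)] += 1.0 - sum(probs)
        return probs

    def decode(x, gold=None, max_len=5):
        prefix = []
        for j in range(max_len):
            probs = next_token_distribution(prefix, x)   # P_theta(Y_j | prefix, x)
            y_hat = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]
            # Teacher forcing: condition the next step on the gold token y_j;
            # AR inference: condition on the model's own prediction y_hat_j.
            prefix.append(gold[j] if gold is not None and j < len(gold) else y_hat)
            if prefix[-1] == "<eos>":
                break
        return prefix

    print(decode("input text", gold=["the", "cat", "sat", "<eos>"]))  # training regime
    print(decode("input text"))                                       # inference regime
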
  11. Method (3/14): Encoding and decoding is modeled via an attention mechanism (see https://jalammar.github.io/). Figure: Cross-attention heat map for NMT, from Bahdanau et al. [8] (2015). Example sentence: Second Law of Robotics: A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

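The heat map above is the matrix of cross-attention weights: each decoder position produces a query that is matched against the encoder's keys, and the resulting weights mix the encoder's values. A minimal NumPy sketch of scaled dot-product attention (the shapes and random inputs are illustrative only, not taken from the slides):

    # Scaled dot-product cross-attention: decoder queries attend over encoder states.
    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def cross_attention(Q, K, V):
        # Q: (target_len, d); K, V: (source_len, d)
        scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (target_len, source_len)
        weights = softmax(scores, axis=-1)        # rows correspond to the heat-map rows
        return weights @ V, weights

    rng = np.random.default_rng(0)
    enc_states = rng.normal(size=(6, 8))   # 6 source tokens, d = 8
    dec_states = rng.normal(size=(4, 8))   # 4 target tokens
    context, attn = cross_attention(dec_states, enc_states, enc_states)
    print(attn.shape)   # (4, 6): one attention row per generated token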

  12. Method (4/14): Bi-transformers [9] encode the input and decode the hidden states (see https://jalammar.github.io/)

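One compact way to see "encode the input, decode from the hidden states" is PyTorch's built-in Transformer module: the encoder turns the source into a memory tensor, and the decoder cross-attends to that memory while processing the shifted target. The dimensions below are toy values, not the actual configuration used in the talk:

    # Encoder-decoder ("bi-transformer"): the encoder output ("memory") is consumed
    # by the decoder through cross-attention. Dimensions here are toy values.
    import torch

    model = torch.nn.Transformer(d_model=64, nhead=4,
                                 num_encoder_layers=2, num_decoder_layers=2)
    src = torch.rand(10, 1, 64)   # (source_len, batch, d_model): embedded input tokens
    tgt = torch.rand(7, 1, 64)    # (target_len, batch, d_model): embedded shifted targets

    memory = model.encoder(src)        # hidden states of the input sequence
    out = model.decoder(tgt, memory)   # decoder cross-attends to the memory
    print(out.shape)                   # torch.Size([7, 1, 64])
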
  13. Method (5/14): Inference time - where the Natural Language Generation happens (see https://jalammar.github.io/)
