DIG Seminar: Civil Rephrases Of Toxic Texts With Self-Supervised Transformers
Thomas Bonald 1
Léo Laugier 1, John Pavlopoulos 2,3, Jeffrey Sorensen 4, Lucas Dixon 4
1 Télécom Paris, Institut Polytechnique de Paris
2 Athens University of Economics & Business
3 Stockholm University
4 Google
October 15, 2020
Laugier, L. (IP Paris) Presentation October 15, 2020 1 / 42
Contents
1 Introduction: Can we nudge healthier conversations from an unpaired corpus?
2 Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model
3 Evaluation: How to evaluate with automatic metrics?
4 Results on sentiment transfer and detoxification
5 Conclusion
Introduction (1/5): Nudging healthier conversations online
Introduction (2/5): Machine learning systems classify toxic comments online
Figure: from Pavlopoulos et al. [1]
Introduction (3/5): Deep learning is efficient when applied to generative transfer tasks
Figure: Left: CycleGAN [2]. Right: Neural Machine Translation (NMT) (from https://jalammar.github.io/)
Introduction (4/5): Golden annotated pairs are more expensive and difficult to obtain than attribute-annotated monolingual corpora
Figure: Left: parallel (paired) corpus for supervised NMT. Right: non-parallel (unpaired) corpora for self-supervised NMT
Introduction (5/5): Therefore we opted for a self-supervised setting
Figure: Left: polarised Civil Comments dataset [3]. Right: Yelp Review dataset [4] (for initial experiments and fair comparison)
Method (1/14): Formalizing the problem

There exist two related approaches:
1 Encoder-decoder architectures work well for supervised sequence-to-sequence (seq2seq) tasks (NMT): T5 [5]
2 Language Models (LMs) are efficient for self-supervised "free" generation: GPT-2 [6] and CTRL [7]

Goal
Let X_T and X_C be the "toxic" and "civil" non-parallel corpora. Let X = X_T ∪ X_C. We aim at learning, in a self-supervised setting, a mapping f_θ s.t. ∀ (x, a) ∈ X × {"civil", "toxic"}, y = f_θ(x, a) is a text:
1 satisfying a,
2 fluent in English,
3 preserving the meaning of x "as much as possible".
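The conditioning in f_θ(x, a) can be sketched as a control-code prefix in the spirit of CTRL [7]. The prefix format and the stub `model` argument below are illustrative assumptions, not the talk's exact implementation.

```python
# Sketch of attribute-conditional rephrasing f_theta(x, a): the target
# attribute a is prepended to the input x as a control code, and a seq2seq
# model generates the rewrite. `model` is a placeholder here.

def build_input(text: str, attribute: str) -> str:
    """Prepend the target attribute as a control code (CTRL-style)."""
    assert attribute in {"civil", "toxic"}
    return f"{attribute}: {text}"

def f_theta(text: str, attribute: str, model=None) -> str:
    """f_theta(x, a): rewrite `text` so that it satisfies `attribute`.
    Without a model, return the conditioned input (stub)."""
    conditioned = build_input(text, attribute)
    if model is None:
        return conditioned  # a real model would generate y from this input
    return model.generate(conditioned)

print(f_theta("this is a stupid take", "civil"))
```

A real model would be fine-tuned so that the prefix steers generation toward the requested attribute while preserving the rest of the input's meaning.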
Method (2/14): Encoder-Decoder for supervised seq2seq

At step j, the decoder outputs a distribution over the vocabulary V:

P_θ(Ŷ_j = w_1 | Ŷ_1 = ȳ_1, ..., Ŷ_{j-1} = ȳ_{j-1}, X = x)
P_θ(Ŷ_j = w_2 | Ŷ_1 = ȳ_1, ..., Ŷ_{j-1} = ȳ_{j-1}, X = x)
...
P_θ(Ŷ_j = w_{|V|} | Ŷ_1 = ȳ_1, ..., Ŷ_{j-1} = ȳ_{j-1}, X = x)

with ȳ_j = y_j if training (teacher forcing), ȳ_j = ŷ_j if inference: Auto-Regressive (AR) generation at inference.
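The teacher-forcing vs. auto-regressive distinction can be made concrete with a toy greedy decoding loop. Here `step` is a scripted stand-in for a real decoder (an assumption for illustration), returning logits over a three-word vocabulary.

```python
# Toy greedy auto-regressive decoding. At training time the decoder is fed
# the gold token y_j (teacher forcing); at inference it is fed its own
# previous prediction, as in the loop below. `step` is a scripted stand-in
# for a real decoder conditioned on the source x and the generated prefix.

VOCAB = ["<eos>", "hello", "world"]

def step(prefix, x):
    """Dummy decoder step: logits over VOCAB given the generated prefix."""
    script = {0: [0.0, 5.0, 0.0],   # first step: prefer "hello"
              1: [0.0, 0.0, 5.0],   # second step: prefer "world"
              2: [5.0, 0.0, 0.0]}   # third step: prefer "<eos>"
    return script[len(prefix)]

def greedy_decode(x, max_len=10):
    prefix = []
    for _ in range(max_len):
        logits = step(prefix, x)
        j = max(range(len(logits)), key=logits.__getitem__)  # argmax over P(Y_j = w | ...)
        if VOCAB[j] == "<eos>":
            break
        prefix.append(j)  # the prediction is fed back in at the next step
    return [VOCAB[j] for j in prefix]

print(greedy_decode("any source sentence"))  # -> ['hello', 'world']
```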
Method (3/14): Encoding and decoding is modeled via an attention mechanism (see https://jalammar.github.io/)
Figure: Cross-attention heat map for NMT, from Bahdanau et al. [8] (2015)
Second Law of Robotics: "A robot must obey the orders given it by human beings except where such orders would conflict with the First Law."
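The cross-attention behind heat maps like Bahdanau et al.'s reduces, in Transformer models, to scaled dot-product attention. A minimal NumPy sketch (shapes chosen purely for illustration):

```python
import numpy as np

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
# Each row of the weight matrix says how much a target position
# attends to each source position (the heat map in the figure).

def attention(Q, K, V):
    """Return attended outputs and the attention weight matrix."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the source axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 target (decoder) positions, d_k = 4
K = rng.normal(size=(5, 4))   # 5 source (encoder) positions
V = rng.normal(size=(5, 4))
out, w = attention(Q, K, V)
print(out.shape, w.shape)     # (3, 4) (3, 5)
```

Each row of `w` sums to 1, so the output at each target position is a convex combination of the source values.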
Method (4/14): Bi-transformers [9] encode the input and decode the hidden states (see https://jalammar.github.io/)
Method (5/14): Inference time, where the Natural Language Generation happens (see https://jalammar.github.io/)