slide-1
SLIDE 1

DIG Seminar: Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

Léo Laugier1, John Pavlopoulos2,3, Jeffrey Sorensen4, Lucas Dixon4, Thomas Bonald1

1Télécom Paris, Institut Polytechnique de Paris 2Athens University of Economics & Business 3Stockholm University 4Google

October 15, 2020

Laugier, L. (IP Paris) Presentation October 15, 2020 1 / 42

slide-2
SLIDE 2

Contents

1. Introduction: Can we nudge healthier conversations from an unpaired corpus?
2. Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model
3. Evaluation: How to evaluate with automatic metrics?
4. Results on sentiment transfer and detoxification
5. Conclusion

Laugier, L. (IP Paris) Presentation October 15, 2020 2 / 42

slide-3
SLIDE 3

Contents

1. Introduction: Can we nudge healthier conversations from an unpaired corpus?
2. Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model
3. Evaluation: How to evaluate with automatic metrics?
4. Results on sentiment transfer and detoxification
5. Conclusion

Laugier, L. (IP Paris) Presentation October 15, 2020 3 / 42

slide-4
SLIDE 4

Introduction (1/5): Nudging healthier conversations online

Laugier, L. (IP Paris) Presentation October 15, 2020 4 / 42

slide-6
SLIDE 6

Introduction (2/5): Machine learning systems classify toxic comments online

Figure: from Pavlopoulos et al. [1]

Laugier, L. (IP Paris) Presentation October 15, 2020 5 / 42

slide-7
SLIDE 7

Introduction (3/5): Deep learning is efficient when applied to generative transfer tasks

Figure: Left: CycleGAN [2] Right: Neural Machine Translation (NMT) (from https://jalammar.github.io/)

Laugier, L. (IP Paris) Presentation October 15, 2020 6 / 42

slide-9
SLIDE 9

Introduction (4/5): Gold annotated pairs are more expensive and difficult to obtain than monolingual corpora annotated with attributes

Figure: Left: Parallel (paired) corpus for supervised NMT Right: Non-parallel (Unpaired) corpora for self-supervised NMT

Laugier, L. (IP Paris) Presentation October 15, 2020 7 / 42

slide-11
SLIDE 11

Introduction (5/5): Therefore we opted for a self-supervised setting

Figure: Left: Polarised Civil Comments dataset [3] Right: Yelp Review dataset [4] (for initial experiments and fair comparison purposes)

Laugier, L. (IP Paris) Presentation October 15, 2020 8 / 42

slide-13
SLIDE 13

Contents

1. Introduction: Can we nudge healthier conversations from an unpaired corpus?
2. Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model
3. Evaluation: How to evaluate with automatic metrics?
4. Results on sentiment transfer and detoxification
5. Conclusion

Laugier, L. (IP Paris) Presentation October 15, 2020 9 / 42

slide-14
SLIDE 14

Method (1/14): Formalizing the problem

Goal

Let XT and XC be the “toxic” and “civil” non-parallel corpora, and let X = XT ∪ XC. We aim to learn, in a self-supervised setting, a mapping fθ such that for all (x, a) ∈ X × {“civil”, “toxic”}, y = fθ(x, a) is a text:

1. Satisfying a,
2. Fluent in English,
3. Preserving the meaning of x “as much as possible”.

There exist two related approaches

Encoder-decoder architectures work well for supervised sequence-to-sequence (seq2seq) tasks such as NMT: T5 [5] (addresses goals 1, 2 and 3).

Language Models (LMs) are efficient for self-supervised “free” generation: GPT-2 [6] (goal 2) and CTRL [7] (goals 1 and 2).

Laugier, L. (IP Paris) Presentation October 15, 2020 10 / 42

slide-17
SLIDE 17

Method (2/14): Encoder-Decoder for supervised seq2seq

Pθ(Ŷj | Ŷ1 = ȳ1, …, Ŷj−1 = ȳj−1, X = x) =
( Pθ(Ŷj = w1 | Ŷ1 = ȳ1, …, Ŷj−1 = ȳj−1, X = x),
  Pθ(Ŷj = w2 | Ŷ1 = ȳ1, …, Ŷj−1 = ȳj−1, X = x),
  …,
  Pθ(Ŷj = w|V| | Ŷ1 = ȳ1, …, Ŷj−1 = ȳj−1, X = x) )

ȳj = yj if training (teacher forcing); ȳj = ŷj if inference (Auto-Regressive (AR) generation).

Laugier, L. (IP Paris) Presentation October 15, 2020 11 / 42
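To make the teacher-forcing / auto-regressive distinction concrete, here is a minimal Python sketch (not the authors' code); next_token_dist is a stand-in for the decoder distribution Pθ(Ŷj | ·), and the toy vocabulary and canned continuation are illustrative assumptions:

import numpy as np

VOCAB = ["<eos>", "your", "comments", "are", "great"]
CANNED = ["your", "comments", "are", "great", "<eos>"]   # what this toy "model" likes to say

def next_token_dist(prefix, x):
    """Stand-in for P_theta(Y_j | Y_1..Y_{j-1}, X=x): any function returning a
    probability vector over VOCAB. Here it simply favours the next canned token."""
    p = np.full(len(VOCAB), 0.01)
    target = CANNED[min(len(prefix), len(CANNED) - 1)]
    p[VOCAB.index(target)] = 1.0
    return p / p.sum()

def decode(x, gold=None, max_len=10):
    prefix = []
    for j in range(max_len):
        p = next_token_dist(prefix, x)           # distribution over VOCAB at step j
        y_hat = VOCAB[int(np.argmax(p))]         # greedy choice (beam search also possible)
        # Teacher forcing: feed back the gold token y_j during training,
        # the model's own prediction at inference.
        fed_back = gold[j] if gold is not None else y_hat
        prefix.append(fed_back)
        if fed_back == "<eos>":
            break
    return prefix

x = ["you", "write", "stupid", "comments"]
print(decode(x))                                                       # inference: AR generation
print(decode(x, gold=["your", "comments", "are", "great", "<eos>"]))   # training: teacher forcing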

SLIDE 20

Method (3/14): Encoding and decoding is modeled via attention mechanism (see https://jalammar.github.io/)

Figure: Cross-attention heat map for NMT, from Bahdanau et al. [8] (2015)

Second Law of Robotics

A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

Laugier, L. (IP Paris) Presentation October 15, 2020 12 / 42

slide-23
SLIDE 23

Method (4/14): Bi-transformers [9] encode the input and decode the hidden states (see https://jalammar.github.io/)

Laugier, L. (IP Paris) Presentation October 15, 2020 13 / 42

slide-24
SLIDE 24

Method (5/14): Inference time - where the Natural Language Generation happens (see https://jalammar.github.io/)

Laugier, L. (IP Paris) Presentation October 15, 2020 14 / 42

slide-26
SLIDE 26

Method (6/14): Transformers learn relevant features

Figure: As we encode the word “it”, one attention head is focusing most on “the animal”, while another is focusing on “tired”.

Laugier, L. (IP Paris) Presentation October 15, 2020 15 / 42

slide-27
SLIDE 27

Method (7/14): Transformers benefit from scaling their size (hidden size and depth) and pre-training on a massive corpus: T5 [5]

Figure: Transfer learning: Text-to-Text Transfer Transformer (T5) is pre-trained with a self-supervised objective to learn semantic representations, before being fine-tuned on downstream supervised tasks (NMT, sentiment analysis, etc.)

Pre-training dataset: “Colossal Clean Crawled Corpus” (C4), ∼34 billion tokens (∼750 GB) of clean English text scraped from the web. T5 sizes: Small, Base, Large (24 layers, 770 million parameters), 3B, 11B.

Laugier, L. (IP Paris) Presentation October 15, 2020 16 / 42
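As an aside, the text-to-text interface can be tried with the Hugging Face transformers library and the public t5-large checkpoint; this is an assumed illustration, the slides themselves show no code:

from transformers import T5TokenizerFast, T5ForConditionalGeneration

# Load the publicly released pre-trained checkpoint.
tok = T5TokenizerFast.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

# Every task is cast as text-to-text: a task prefix plus the input sentence.
inputs = tok("translate English to German: The house is wonderful.", return_tensors="pt")
out = model.generate(**inputs, max_length=40)
print(tok.decode(out[0], skip_special_tokens=True))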

slide-28
SLIDE 28

Method (8/14): Encoder-Decoder transformers had rarely been trained in a self-supervised setting, but decoders had

Goal

Let XT and XC be the “toxic” and “civil” non-parallel corpora, and let X = XT ∪ XC. We aim to learn, in a self-supervised setting, a mapping fθ such that for all (x, a) ∈ X × {“civil”, “toxic”}, y = fθ(x, a) is a text:

1. Satisfying a,
2. Fluent in English,
3. Preserving the meaning of x “as much as possible”.

There exist two related approaches

Encoder-decoder architectures work well for supervised sequence-to-sequence (seq2seq) tasks such as NMT: T5 [5] (addresses goals 1, 2 and 3).

Language Models (LMs) are efficient for self-supervised “free” generation: GPT-2 [6] (goal 2) and CTRL [7] (goals 1 and 2).

Laugier, L. (IP Paris) Presentation October 15, 2020 17 / 42

slide-30
SLIDE 30

Method (9/14): Introduction to Language Models (LM)

What is a Language Model?

A statistical Language Model is a probability distribution over sequences of words.

Predicting the next word: p(wt | w<t). If w<t = [“the”, “best”, “place”, “to”, “visit”, “in”, “France”, “is”], then
p(“Paris” | w<t) = 0.6, p(“Mont” | w<t) = 0.3, p(“Saclay” | w<t) = ε, p(“have” | w<t) = 0.

Deep learning provides parametric architectures able to learn, in a self-supervised setting, to approximate LMs: p(wt | w<t; θ). They are trained with maximum likelihood on massive corpora like C4.

Generating w≥t from prompt w<t: p(w≥t | w<t; θ) = ∏_{i=t..n} p(wi | w<i; θ)

Laugier, L. (IP Paris) Presentation October 15, 2020 18 / 42
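A toy Python sketch (not from the slides) of the chain-rule factorisation above; the hand-written table stands in for a trained model's conditional distribution:

def p_next(word, prefix):
    """Hypothetical p(word | prefix): a real LM estimates this for any prefix."""
    table = {
        ("France", "is"): {"Paris": 0.6, "Mont": 0.3, "Saclay": 1e-3},
    }
    return table.get(tuple(prefix[-2:]), {}).get(word, 0.0)

def sequence_prob(continuation, prefix):
    """p(w_{>=t} | w_{<t}) = product over i of p(w_i | w_{<i})."""
    prob = 1.0
    for w in continuation:
        prob *= p_next(w, prefix)   # chain rule: multiply the conditionals
        prefix = prefix + [w]
    return prob

prefix = ["the", "best", "place", "to", "visit", "in", "France", "is"]
print(sequence_prob(["Paris"], prefix))   # 0.6
print(sequence_prob(["have"], prefix))    # 0.0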

slide-32
SLIDE 32

Method (10/14): Class-Conditional LMs (CC-LMs)

CTRL: A Conditional Transformer Language Model for Controllable Generation [7]

Generating a sentence sa = w1:n of length n in class a: p(sa; θ) = ∏_{i=1..n} p(wi | w<i, a; θ)

If the “prompt” is w<t = [“Paris”, “is”] and a ∈ {positive, negative}, then:

argmax over wt:t+4 of p(wt:t+4 | w<t, a = positive; θ) = [“such”, “a”, “beautiful”, “city”]
argmax over wt:t+4 of p(wt:t+4 | w<t, a = negative; θ) = [“a”, “very”, “boring”, “town”]

Laugier, L. (IP Paris) Presentation October 15, 2020 19 / 42
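A toy Python sketch (assumed, not CTRL's actual code) of how a class-conditional LM re-ranks continuations of the same prompt depending on the control code a; the hand-written probabilities stand in for a learned model:

def p_next(word, prefix, a):
    """Hypothetical p(word | prefix, a); a real CC-LM learns this from data."""
    if a == "positive":
        table = {"such": 0.5, "a": 0.3, "beautiful": 0.9, "city": 0.9}
    else:  # "negative"
        table = {"a": 0.5, "very": 0.4, "boring": 0.9, "town": 0.9}
    return table.get(word, 0.01)

def seq_prob(words, prompt, a):
    p, prefix = 1.0, list(prompt)
    for w in words:
        p *= p_next(w, prefix, a)   # p(s_a) = product of class-conditional next-word probabilities
        prefix.append(w)
    return p

prompt = ["Paris", "is"]
candidates = [["such", "a", "beautiful", "city"], ["a", "very", "boring", "town"]]
for a in ("positive", "negative"):
    best = max(candidates, key=lambda c: seq_prob(c, prompt, a))
    print(a, "->", " ".join(best))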

slide-33
SLIDE 33

Method (11/14): Our approach combines both ideas

Goal

Let XT and XC be the “toxic” and “civil” non-parallel corpora, and let X = XT ∪ XC. We aim to learn, in a self-supervised setting, a mapping fθ such that for all (x, a) ∈ X × {“civil”, “toxic”}, y = fθ(x, a) is a text:

1. Satisfying a,
2. Fluent in English,
3. Preserving the meaning of x “as much as possible”.

There exist two related approaches

Encoder-decoder architectures work well for supervised sequence-to-sequence (seq2seq) tasks (e.g. NMT): T5 [5] (addresses goals 1, 2 and 3).

Language Models (LMs) are efficient for self-supervised “free” generation: GPT-2 [6] (goal 2) and CTRL [7] (goals 1 and 2).

CAE-T5:

We fine-tuned a pre-trained T5 bi-transformer (fluency, goal 2) with a Conditional (attribute control, goal 1) Auto-Encoder (content preservation, goal 3) objective.

Laugier, L. (IP Paris) Presentation October 15, 2020 20 / 42

slide-35
SLIDE 35

Method (12/14): Training CAE-T5 is fine-tuning T5 with a Conditional denoising Auto-Encoder objective

Training example (alternating batches of toxic and civil examples):

x = [“this”, “is”, “a”, “great”, “article”], of attribute a = α(x) = civil.

The noise function η masks and replaces tokens randomly: η(x) = [“this”, “MASK”, “a”, “the”, “article”] (targets goals 2 and 3).

γ(a, x) prepends to x the control code corresponding to attribute a: γ(α(x), x) = [“civil:”, “this”, “is”, “a”, “great”, “article”] (targets goal 1).

LDAE = E_{x∼X}[− log p(x | η(x), α(x); θ)]

Laugier, L. (IP Paris) Presentation October 15, 2020 21 / 42
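A minimal Python sketch (assumed, not the authors' input pipeline) of the two pre-processing functions η and γ used by the DAE objective; the masking and replacement rates are illustrative:

import random

VOCAB = ["this", "is", "a", "great", "the", "article"]

def eta(tokens, p_mask=0.2, p_replace=0.1):
    """Noise function: randomly mask or replace tokens (rates are illustrative)."""
    noisy = []
    for t in tokens:
        r = random.random()
        if r < p_mask:
            noisy.append("MASK")
        elif r < p_mask + p_replace:
            noisy.append(random.choice(VOCAB))
        else:
            noisy.append(t)
    return noisy

def gamma(attribute, tokens):
    """Prepend the control code of the attribute ('civil' or 'toxic')."""
    return [attribute + ":"] + tokens

x = ["this", "is", "a", "great", "article"]
model_input = gamma("civil", eta(x))   # e.g. ['civil:', 'this', 'MASK', 'a', 'the', 'article']
target = x                             # the denoising target is the clean sentence itself
print(model_input, "->", target)

Training then minimises −log p(target | model_input; θ), i.e. LDAE, over alternating civil and toxic batches.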

slide-36
SLIDE 36

Method (13/14): Attribute transfer at prediction time with trained CAE-T5

Toxic → civil test example:

x = [“you”, “write”, “stupid”, “comments”], of attribute α(x) = toxic. Destination attribute: a = ᾱ(x) = civil.

γ(a, ŷ<0) = [“civil:”]. AR generation: ŷ0 = “your”; ŷ1 = “comments”; ŷ2 = “are”; ŷ3 = “great”.

Laugier, L. (IP Paris) Presentation October 15, 2020 22 / 42
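For illustration, attribute transfer at prediction time could look like the sketch below, written against the Hugging Face transformers API; the checkpoint path is a hypothetical placeholder for the fine-tuned CAE-T5 weights:

from transformers import T5TokenizerFast, T5ForConditionalGeneration

tok = T5TokenizerFast.from_pretrained("t5-large")
# Hypothetical path: the fine-tuned CAE-T5 weights obtained from the training above.
model = T5ForConditionalGeneration.from_pretrained("/path/to/cae-t5-checkpoint")

toxic_comment = "you write stupid comments"
# Destination attribute "civil": prepend its control code, then generate auto-regressively.
inputs = tok("civil: " + toxic_comment, return_tensors="pt")
out = model.generate(**inputs, max_length=32)
print(tok.decode(out[0], skip_special_tokens=True))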

slide-37
SLIDE 37

Method (14/14): During training, we add a Cycle-Consistency objective to enforce content preservation (goal 3)

LCC = E_{x∼X}[− log p(x | f_θ̃(x, ᾱ(x)), α(x); θ)]

Final loss function

L = λDAE LDAE + λCC LCC, a weighted sum of two negative log-likelihoods (equivalently, cross-entropies).

Optimization

θ̂ = argmin_θ L(θ), optimized with Stochastic Gradient Descent on TPUs (∼90,000 steps).

Laugier, L. (IP Paris) Presentation October 15, 2020 23 / 42
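A conceptual Python sketch (assuming PyTorch; not the authors' training script) of how the two terms combine; nll and generate are placeholders for the encoder-decoder's negative log-likelihood and its non-differentiable AR generation:

import torch

def nll(inp, tgt):          # placeholder: a real model returns -log p(tgt | inp; theta)
    return torch.tensor(float(len(tgt)))

def generate(inp):          # placeholder: a real model generates a rephrased token sequence
    return ["your", "comments", "are", "great"]

def gamma(attr, tokens):    # prepend the attribute control code, as on the previous slides
    return [attr + ":"] + tokens

def total_loss(x, attr, other_attr, eta=lambda t: t, lambda_dae=1.0, lambda_cc=1.0):
    # Denoising auto-encoder term: reconstruct x from its corrupted, attribute-tagged copy
    # (eta is the identity here; the real noise function masks / replaces tokens).
    l_dae = nll(gamma(attr, eta(x)), x)
    # Cycle-consistency term: transfer x to the opposite attribute without back-propagating
    # through generation, then require the model to map the result back to x.
    with torch.no_grad():
        pseudo = generate(gamma(other_attr, x))
    l_cc = nll(gamma(attr, pseudo), x)
    return lambda_dae * l_dae + lambda_cc * l_cc

print(total_loss(["you", "write", "stupid", "comments"], attr="toxic", other_attr="civil"))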

slide-38
SLIDE 38

Contents

1. Introduction: Can we nudge healthier conversations from an unpaired corpus?
2. Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model
3. Evaluation: How to evaluate with automatic metrics?
4. Results on sentiment transfer and detoxification
5. Conclusion

Laugier, L. (IP Paris) Presentation October 15, 2020 24 / 42

slide-39
SLIDE 39

Evaluation (1/2): How to evaluate with automatic metrics?

Goal

Let XT and XC be the “toxic” and “civil” non-parallel corpora, and let X = XT ∪ XC. We aim to learn, in a self-supervised setting, a mapping fθ such that for all (x, a) ∈ X × {“civil”, “toxic”}, y = fθ(x, a) is a text:

1. Satisfying a,
2. Fluent in English,
3. Preserving the meaning of x “as much as possible”.

Automatic evaluation systems

1. Accuracy (ACC): pre-trained attribute classifier (BERT [10])
2. Perplexity (PPL): pre-trained language model (GPT-2 [6])
3. Sentence similarity (self-SIM): pre-trained encoder (USE [11])

Laugier, L. (IP Paris) Presentation October 15, 2020 25 / 42
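A sketch (assumed, not the authors' evaluation code) of how the three metrics are aggregated over system outputs; the three scoring functions are placeholders for the pre-trained BERT classifier, GPT-2 language model, and Universal Sentence Encoder named above:

import math

def classify_attribute(text):                   # placeholder for the BERT attribute classifier
    return "civil"

def gpt2_neg_log_likelihood_per_token(text):    # placeholder for GPT-2 scoring
    return 3.2

def sentence_embedding(text):                   # placeholder for USE embeddings
    return [1.0, 0.0]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def evaluate(pairs, target_attribute="civil"):
    """pairs: list of (source_text, generated_text)."""
    acc = sum(classify_attribute(y) == target_attribute for _, y in pairs) / len(pairs)
    # Perplexity here is exp of the per-token negative log-likelihood, averaged over outputs.
    ppl = sum(math.exp(gpt2_neg_log_likelihood_per_token(y)) for _, y in pairs) / len(pairs)
    sim = sum(cosine(sentence_embedding(x), sentence_embedding(y)) for x, y in pairs) / len(pairs)
    return {"ACC": acc, "PPL": ppl, "self-SIM": sim}

print(evaluate([("you write stupid comments", "your comments are great")]))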

slide-41
SLIDE 41

Evaluation (2/2): Human evaluation through crowdworking

Figure: Guidelines provided to annotators on Appen

Laugier, L. (IP Paris) Presentation October 15, 2020 26 / 42

slide-42
SLIDE 42

Contents

1. Introduction: Can we nudge healthier conversations from an unpaired corpus?
2. Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model
3. Evaluation: How to evaluate with automatic metrics?
4. Results on sentiment transfer and detoxification
5. Conclusion

Laugier, L. (IP Paris) Presentation October 15, 2020 27 / 42

slide-43
SLIDE 43

Results (1/4): Yelp positive ↔ negative sentiment transfer, quantitative automatic evaluation

Laugier, L. (IP Paris) Presentation October 15, 2020 28 / 42

slide-44
SLIDE 44

Results (2/4): Yelp positive ↔ negative sentiment transfer, qualitative evaluation

Laugier, L. (IP Paris) Presentation October 15, 2020 29 / 42

slide-45
SLIDE 45

Results (3/4): toxic → civil, quantitative evaluations

Figure: Automatic evaluation of CAE-T5 applied to Civil Comments
Figure: Human evaluation of CAE-T5 applied to Civil Comments

Laugier, L. (IP Paris) Presentation October 15, 2020 30 / 42

slide-46
SLIDE 46

Results (4/4): toxic → civil, qualitative evaluation

Laugier, L. (IP Paris) Presentation October 15, 2020 31 / 42

slide-47
SLIDE 47

Contents

1. Introduction: Can we nudge healthier conversations from an unpaired corpus?
2. Method: We fine-tuned a Denoising Auto-Encoder bi-conditional Language Model
3. Evaluation: How to evaluate with automatic metrics?
4. Results on sentiment transfer and detoxification
5. Conclusion

Laugier, L. (IP Paris) Presentation October 15, 2020 32 / 42

slide-48
SLIDE 48

Conclusion (1/2)

CAE-T5 works well on the Yelp sentiment transfer task. Results are still preliminary for the Civil Comments dataset, probably due to the difficulty of the task in a self-supervised setting, but this is only the second time the task has been addressed.

Human and automatic evaluations are open research topics. CAE-T5 can be applied to other attribute transfer tasks, provided that one has access to two (or more) corpora annotated with attributes.

Currently under review at EACL 2021. Code (TF): https://github.com/LeoLaugier/conditional-auto-encoder-text-to-text-transfer-transformer

Laugier, L. (IP Paris) Presentation October 15, 2020 33 / 42

slide-49
SLIDE 49

Conclusion (2/2): CAE-T5 learnt to transfer toxic → civil

Laugier, L. (IP Paris) Presentation October 15, 2020 34 / 42

slide-50
SLIDE 50

References I

[1] John Pavlopoulos, Prodromos Malakasiotis, and Ion Androutsopoulos. Deeper attention to abusive user content moderation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1125–1135, Copenhagen, Denmark, September 2017. Association for Computational Linguistics.

[2] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.

Laugier, L. (IP Paris) Presentation October 15, 2020 35 / 42

slide-51
SLIDE 51

References II

[3] Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. Nuanced metrics for measuring unintended bias with real data for text classification. CoRR, abs/1903.04561, 2019.

[4] Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. Style transfer from non-parallel text by cross-alignment. In Advances in Neural Information Processing Systems, pages 6830–6841, 2017.

[5] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683, 2019.

Laugier, L. (IP Paris) Presentation October 15, 2020 36 / 42

slide-52
SLIDE 52

References III

[6] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.

[7] Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, and Richard Socher. CTRL: A Conditional Transformer Language Model for Controllable Generation. arXiv preprint arXiv:1909.05858, 2019.

[8] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.

Laugier, L. (IP Paris) Presentation October 15, 2020 37 / 42

slide-53
SLIDE 53

References IV

[9] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. CoRR, abs/1706.03762, 2017.

[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

[11] Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. Universal sentence encoder. arXiv preprint arXiv:1803.11175, 2018.

Laugier, L. (IP Paris) Presentation October 15, 2020 38 / 42

slide-54
SLIDE 54

References V

[12] Yulia Tsvetkov. Towards personalized adaptive NLP: Modeling output spaces in continuous-output language generation. 2019.

[13] Yoon Kim. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, Doha, Qatar, October 2014. Association for Computational Linguistics.

[14] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3104–3112. Curran Associates, Inc., 2014.

Laugier, L. (IP Paris) Presentation October 15, 2020 39 / 42

slide-55
SLIDE 55

References VI

[15] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.

[16] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. Transformer-XL: Attentive language models beyond a fixed-length context. CoRR, abs/1901.02860, 2019.

[17] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. XLNet: Generalized autoregressive pretraining for language understanding. CoRR, abs/1906.08237, 2019.

Laugier, L. (IP Paris) Presentation October 15, 2020 40 / 42

slide-56
SLIDE 56

References VII

[18] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692, 2019.

Laugier, L. (IP Paris) Presentation October 15, 2020 41 / 42

slide-57
SLIDE 57

DIG Seminar: Civil Rephrases Of Toxic Texts With Self-Supervised Transformers

Léo Laugier1, John Pavlopoulos2,3, Jeffrey Sorensen4, Lucas Dixon4, Thomas Bonald1

1Télécom Paris, Institut Polytechnique de Paris 2Athens University of Economics & Business 3Stockholm University 4Google

October 15, 2020

Laugier, L. (IP Paris) Presentation October 15, 2020 42 / 42