Improving Neural Language Modeling via Adversarial Training

Improving Neural Language Modeling via Adversarial Training
Dilin Wang*, Chengyue Gong* (*equal contribution), Qiang Liu
Department of Computer Science, The University of Texas at Austin
Dilin Wang*, Chengyue Gong*, Qiang Liu | Adversarial Softmax | 1 / 8

Neural Language Modeling

Example: "the clouds are in the sky"

A recurrent network with parameters $\theta$ updates the hidden state, and a Softmax layer with word embeddings $w = \{w_\ell\}_{\ell=1}^{|V|}$ predicts the next word:

$$h_t = f_{NN}(x_{t-1}, h_{1:t-1}; \theta)$$

$$p(x_t \mid x_{1:t-1}; \theta, w) = \mathrm{Softmax}(x_t, h_t; w) = \frac{\exp(w_{x_t}^\top h_t)}{\sum_{\ell=1}^{|V|} \exp(w_\ell^\top h_t)}$$

Maximum log-likelihood estimation (MLE):

$$\max_{\theta, w} \sum_t \log p(x_t \mid x_{1:t-1}; \theta, w)$$
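
A minimal pure-Python sketch of the Softmax output layer and the MLE objective, for illustration only. It assumes the hidden states $h_t$ have already been produced by $f_{NN}$ (not modeled here) and stores the embeddings $w_\ell$ as the rows of a list `W`; all function names and toy numbers are hypothetical. It also computes perplexity, the exponential of the average per-token negative log-likelihood.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_prob(W: List[Vector], h: Vector, target: int) -> float:
    """log p(x_t = target | x_{1:t-1}) = w_target^T h - log sum_l exp(w_l^T h)."""
    logits = [dot(w, h) for w in W]
    m = max(logits)  # shift by the max logit for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[target] - log_z


def log_likelihood(W: List[Vector], hs: List[Vector], targets: List[int]) -> float:
    """MLE objective for one sequence: sum_t log p(x_t | x_{1:t-1})."""
    return sum(log_softmax_prob(W, h, x) for h, x in zip(hs, targets))


def perplexity(W: List[Vector], hs: List[Vector], targets: List[int]) -> float:
    """exp of the average negative log-likelihood per token."""
    return math.exp(-log_likelihood(W, hs, targets) / len(targets))


# Toy example: |V| = 3 words with 2-d embeddings; the hidden states are made up
# here and stand in for the output of f_NN.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hs = [[0.5, -0.2], [0.1, 0.9]]
targets = [0, 2]
print(log_likelihood(W, hs, targets), perplexity(W, hs, targets))
```

Training would maximize `log_likelihood` over both the network parameters and `W`; with a zero hidden state every word gets probability $1/|V|$, so the perplexity equals $|V|$.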

Overfitting

[Figure: training and validation perplexity of AWD-LSTM versus training epochs on WT2.]

Existing methods for preventing overfitting:
- Dropout [e.g., Gal & Ghahramani, 2016]
- Optimizer [e.g., Merity et al., 2017]
- Other: weight tying [Press & Wolf, 2016; Inan et al., 2017]; activation regularization [Merity et al., 2017]; etc.

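Adversarial MLE

Idea: inject an adversarial perturbation $\delta_t$ into the embedding vector of the target word in the Softmax layer, and maximize the worst-case log-likelihood:

$$\max_{\theta, w} \sum_t \min_{\|\delta_t\| \le \epsilon} \log \frac{\exp((w_t + \delta_t)^\top h_t)}{\exp((w_t + \delta_t)^\top h_t) + \sum_{j \ne t} \exp(w_j^\top h_t)},$$

where $w_t$ is shorthand for the embedding $w_{x_t}$ of the target word and $j \ne t$ ranges over all other words in the vocabulary.

A closed-form solution: the log-likelihood is increasing in the target logit $(w_t + \delta_t)^\top h_t$, so the inner minimization only has to minimize that logit, and by the Cauchy–Schwarz inequality

$$\delta_t^* = \arg\min_{\|\delta_t\| \le \epsilon} (w_t + \delta_t)^\top h_t = -\epsilon \frac{h_t}{\|h_t\|}.$$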
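
With $\delta_t^* = -\epsilon h_t / \|h_t\|$, only the target logit changes: it drops from $w_t^\top h_t$ to $w_t^\top h_t - \epsilon\|h_t\|$. Below is a minimal pure-Python sketch of that idea (hypothetical helper names, toy numbers), together with a brute-force check that random perturbations of norm $\epsilon$ never lower the log-likelihood more than $\delta_t^*$ does. In a real training loop one would presumably apply the same logit shift inside a batched cross-entropy loss; that is an assumption, not a detail from the slides.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def log_prob_from_logits(logits: List[float], target: int) -> float:
    m = max(logits)  # numerically stable log-sum-exp
    return logits[target] - (m + math.log(sum(math.exp(z - m) for z in logits)))


def clean_log_prob(W: List[Vector], h: Vector, target: int) -> float:
    return log_prob_from_logits([dot(w, h) for w in W], target)


def adversarial_log_prob(W: List[Vector], h: Vector, target: int, eps: float) -> float:
    """Worst case over ||delta_t|| <= eps applied to w_target, via the closed form.

    delta_t* = -eps * h / ||h||, so the target logit becomes w_t^T h - eps * ||h||;
    all other logits are unchanged.
    """
    logits = [dot(w, h) for w in W]
    logits[target] -= eps * norm(h)
    return log_prob_from_logits(logits, target)


def perturbed_log_prob(W: List[Vector], h: Vector, target: int, delta: Vector) -> float:
    """Log-prob after adding an arbitrary perturbation delta to w_target."""
    W_pert = [list(w) for w in W]
    W_pert[target] = [a + d for a, d in zip(W[target], delta)]
    return clean_log_prob(W_pert, h, target)


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [0.6, 0.8]  # ||h|| = 1
eps = 0.5
print(clean_log_prob(W, h, 2), adversarial_log_prob(W, h, 2, eps))

# Sanity check: random perturbations on the eps-sphere never do worse than delta*.
rng = random.Random(0)
for _ in range(1000):
    g = [rng.gauss(0.0, 1.0) for _ in h]
    delta = [eps * x / norm(g) for x in g]
    assert perturbed_log_prob(W, h, 2, delta) >= adversarial_log_prob(W, h, 2, eps) - 1e-12
```

Because the perturbation reduces to a scalar shift of one logit, the adversarial loss costs essentially the same as the clean one: no inner optimization loop is needed.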

Adversarial MLE Promotes Diversity

If $w_i$ dominates all the other words under an $\epsilon$-adversarial perturbation, in the sense that

$$\min_{\|\delta_i\| \le \epsilon} (w_i + \delta_i)^\top h = w_i^\top h - \epsilon \|h\| > w_j^\top h, \quad \forall j \ne i,$$

then

$$\min_{j \ne i} \|w_j - w_i\| > \epsilon,$$

that is, $w_i$ is separated from the embedding vectors of all other words by a distance greater than $\epsilon$. (By Cauchy–Schwarz, $\epsilon\|h\| < (w_i - w_j)^\top h \le \|w_i - w_j\|\,\|h\|$.)
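
The implication can be spot-checked numerically. The sketch below (pure Python, hypothetical function names, random toy embeddings) tests the dominance condition for word 0 and, whenever it holds, asserts the separation bound. It is an empirical sanity check, not a proof.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def dominates_under_attack(W: List[Vector], i: int, h: Vector, eps: float) -> bool:
    """True if w_i's worst-case logit w_i^T h - eps*||h|| still beats every w_j^T h, j != i."""
    worst = dot(W[i], h) - eps * norm(h)
    return all(worst > dot(W[j], h) for j in range(len(W)) if j != i)


def min_separation(W: List[Vector], i: int) -> float:
    """min over j != i of ||w_j - w_i||."""
    return min(norm([a - b for a, b in zip(W[j], W[i])]) for j in range(len(W)) if j != i)


# Random 5-word vocabularies with 3-d embeddings; check the claim for word 0.
rng = random.Random(1)
checked = 0
for _ in range(5000):
    W = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(5)]
    h = [rng.uniform(-2, 2) for _ in range(3)]
    eps = rng.uniform(0.0, 1.0)
    if dominates_under_attack(W, 0, h, eps):
        checked += 1
        assert min_separation(W, 0) > eps
print(f"claim held in all {checked} dominating cases")
```

Read in the other direction: maximizing the adversarial likelihood pushes the correct word's logit to win by a margin of $\epsilon\|h\|$, and any word that achieves that margin must sit farther than $\epsilon$ from every other embedding.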

Improving on Language Modeling

Table: PTB (perplexity; lower is better)
Method                             | Params | Valid | Test
AWD-LSTM (Merity et al., 2017)     | 24M    | 51.60 | 51.10
AWD-LSTM + Ours                    | 24M    | 49.31 | 48.72
AWD-LSTM + MoS (Yang et al., 2017) | 22M    | 48.33 | 47.69
AWD-LSTM + MoS + Ours              | 22M    | 47.15 | 46.52

Table: WT2 (perplexity; lower is better)
Method                             | Params | Valid | Test
AWD-LSTM (Merity et al., 2017)     | 33M    | 46.40 | 44.30
AWD-LSTM + Ours                    | 33M    | 42.48 | 40.71
AWD-LSTM + MoS (Yang et al., 2017) | 35M    | 42.41 | 40.68
AWD-LSTM + MoS + Ours              | 35M    | 40.27 | 38.65

Improving on Machine Translation

Table: WMT2014 En → De (BLEU; higher is better)
Method                                   | BLEU
Transformer Base (Vaswani et al., 2017)  | 27.30
Transformer Base + Ours                  | 28.43
Transformer Big (Vaswani et al., 2017)   | 28.40
Transformer Big + Ours                   | 29.52

Table: IWSLT2014 De → En (BLEU; higher is better)
Method                                   | BLEU
Transformer Small (Vaswani et al., 2017) | 32.47
Transformer Small + Ours                 | 33.61
Transformer Base (Wang et al., 2018)     | 34.43
Transformer Base + Ours                  | 35.18
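
For a quick reading of the language-modeling and translation results, the snippet below recomputes the improvements from the reported numbers: absolute and relative drops in test perplexity on PTB and WT2, and BLEU gains on WMT2014 and IWSLT2014. The values are copied from the tables; only the arithmetic is new, and the dictionary keys are illustrative labels.

```python
# Test perplexities (lower is better) copied from the PTB and WT2 tables: (baseline, + Ours).
ppl = {
    "PTB, AWD-LSTM": (51.10, 48.72),
    "PTB, AWD-LSTM + MoS": (47.69, 46.52),
    "WT2, AWD-LSTM": (44.30, 40.71),
    "WT2, AWD-LSTM + MoS": (40.68, 38.65),
}
# BLEU scores (higher is better) copied from the translation tables: (baseline, + Ours).
bleu = {
    "WMT2014 En-De, Transformer Base": (27.30, 28.43),
    "WMT2014 En-De, Transformer Big": (28.40, 29.52),
    "IWSLT2014 De-En, Transformer Small": (32.47, 33.61),
    "IWSLT2014 De-En, Transformer Base": (34.43, 35.18),
}


def ppl_reduction(base: float, ours: float) -> tuple:
    """Absolute and relative (%) drop in test perplexity, rounded to 2 decimals."""
    return round(base - ours, 2), round(100.0 * (base - ours) / base, 2)


def bleu_gain(base: float, ours: float) -> float:
    """Absolute BLEU improvement, rounded to 2 decimals."""
    return round(ours - base, 2)


for name, (base, ours) in ppl.items():
    print(name, ppl_reduction(base, ours))
for name, (base, ours) in bleu.items():
    print(name, bleu_gain(base, ours))
```

Per these numbers, the test-perplexity drops range from 1.17 (PTB, with MoS) to 3.59 (WT2, without MoS), and the BLEU gains range from 0.75 to 1.14.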

Conclusions

We proposed an adversarial training mechanism for language modeling:
1. A closed-form solution, easy to implement
2. Diversity promotion
3. Strong empirical results

Thank you!
Poster #105, today, 06:30 PM – 09:00 PM @ Pacific Ballroom
