Adversarial Learning for Neural Dialogue Generation

Jiwei Li 1, Will Monroe 1, Tianlin Shi 1, Sébastien Jean 2, Alan Ritter 3, Dan Jurafsky 1
1 Stanford University, 2 New York University, 3 Ohio State University
Some slides/images taken from Ian Goodfellow, Jeremy Kawahara, Andrej Karpathy

Talk Outline
• Generative Adversarial Networks (introduced by Goodfellow et al., 2014)
• Policy gradients and REINFORCE
• GANs for Dialogue Generation (this paper)


Generative Modelling
• Have training examples x ~ p_data(x)
• Want a model that can draw samples: x ~ p_model(x)
• Where p_model ≈ p_data

Why Generative Modelling?
• Conditional generative models
  - Speech synthesis: Text → Speech
  - Machine translation: French → English
    • French: Si mon tonton tond ton tonton, ton tonton sera tondu.
    • English: If my uncle shaves your uncle, your uncle will be shaved.
  - Image → Image segmentation
  - Dialogue systems: Context → Response
• Environment simulator
  - Reinforcement learning
  - Planning
• Leverage unlabeled data

Adversarial Nets Framework
• A game between two players:
  1. Discriminator D
  2. Generator G
• D tries to discriminate between:
  - A sample from the data distribution, and
  - A sample from the generator G
• G tries to “trick” D by generating samples that are hard for D to distinguish from true data.

Adversarial Nets Framework (figure)
• Input noise z is passed through a differentiable function G to give x sampled from the model
• A differentiable function D receives either x sampled from data (D tries to output 1) or x sampled from the model (D tries to output 0)

Deep Convolutional Generative Adversarial Network
• Can be thought of as two separate networks

Generator and Discriminator
Generator G(.): input = random numbers, output = generated image
• Takes a uniform noise vector z (random numbers) and produces a generated image G(z)
Discriminator D(.): input = generated/real image, output = prediction of real image
• Real image, so the goal is D(x) = 1
• Generated image, so the goal is D(G(z)) = 0
Discriminator goal: discriminate between real and generated images
• i.e., D(x) = 1, where x is a real image
• D(G(z)) = 0, where G(z) is a generated image
Generator goal: fool D
• i.e., generate an image G(z) such that D(G(z)) is wrong
• i.e., D(G(z)) = 1
Notes:
1. Conflicting goals
2. Both goals are unsupervised
3. Optimal when D(.) = 0.5 (i.e., D cannot tell the difference between real and generated images) and G has learned the distribution of the training images

Zero-Sum Game
• Minimax objective function:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
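The value function above is an expectation, so it can be estimated from samples. The following is a minimal sketch under stated assumptions: `discriminator` and `generator` are hypothetical one-dimensional stand-ins (a logistic model and a shift of the noise), and the Gaussian data and noise distributions are chosen only for illustration.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def discriminator(x, w=1.0, b=0.0):
    """Toy D: probability that x came from the data (hypothetical logistic model)."""
    return sigmoid(w * x + b)

def generator(z, shift=-2.0):
    """Toy G: maps noise z to a sample (hypothetical shift of the noise)."""
    return z + shift

def value_estimate(real_xs, noise_zs, d=discriminator, g=generator):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(d(x)) for x in real_xs) / len(real_xs)
    fake_term = sum(math.log(1.0 - d(g(z))) for z in noise_zs) / len(noise_zs)
    return real_term + fake_term

rng = random.Random(0)
real_xs = [rng.gauss(2.0, 1.0) for _ in range(1000)]   # assumed data: N(2, 1)
noise_zs = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # assumed noise: N(0, 1)

# A discriminator that separates the two sample sets reasonably well.
v_informed = value_estimate(real_xs, noise_zs)
# A discriminator that always answers 0.5 scores exactly -log 4.
v_blind = value_estimate(real_xs, noise_zs, d=lambda x: 0.5)
print(v_informed, v_blind)
```

In this toy setting the informed discriminator reaches a higher value than the constant 0.5 one, which is the quantity D tries to push up and G tries to push down.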

Interpretation: compute the gradient of the loss function, and then update the parameters to min/max the loss function (gradient descent/ascent)
• maximize: loss function to maximize for the Discriminator, using the gradient w.r.t. the parameters of the Discriminator
• minimize: loss function to minimize for the Generator, using the gradient w.r.t. the parameters of the Generator
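The ascent/descent updates can be sketched on a toy problem. This is an illustrative sketch, not the setup used in the slides: it assumes a logistic discriminator D(x) = sigmoid(w·x + b), a generator G(z) = z + theta, hand-derived gradients, and Gaussian data and noise; all names and values are hypothetical.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def value(w, b, theta, xs, zs):
    """Minibatch estimate of V(D, G) for D(x) = sigmoid(w*x + b), G(z) = z + theta."""
    real = sum(math.log(sigmoid(w * x + b)) for x in xs) / len(xs)
    fake = sum(math.log(1.0 - sigmoid(w * (z + theta) + b)) for z in zs) / len(zs)
    return real + fake

def discriminator_step(w, b, theta, xs, zs, lr):
    """One gradient ASCENT step on V with respect to the discriminator (w, b)."""
    grad_w = grad_b = 0.0
    for x in xs:                       # d/da log sigmoid(a) = 1 - sigmoid(a)
        s = sigmoid(w * x + b)
        grad_w += (1.0 - s) * x / len(xs)
        grad_b += (1.0 - s) / len(xs)
    for z in zs:                       # d/da log(1 - sigmoid(a)) = -sigmoid(a)
        g = z + theta
        s = sigmoid(w * g + b)
        grad_w += -s * g / len(zs)
        grad_b += -s / len(zs)
    return w + lr * grad_w, b + lr * grad_b

def generator_step(w, b, theta, zs, lr):
    """One gradient DESCENT step on V with respect to the generator parameter theta."""
    grad_theta = sum(-sigmoid(w * (z + theta) + b) * w for z in zs) / len(zs)
    return theta - lr * grad_theta

rng = random.Random(0)
xs = [rng.gauss(2.0, 1.0) for _ in range(256)]  # assumed real data: N(2, 1)
zs = [rng.gauss(0.0, 1.0) for _ in range(256)]  # assumed noise: N(0, 1)
w, b, theta = 0.1, 0.0, 0.0

v_start = value(w, b, theta, xs, zs)
w, b = discriminator_step(w, b, theta, xs, zs, lr=0.1)
v_after_d = value(w, b, theta, xs, zs)       # D's step should raise V
theta = generator_step(w, b, theta, zs, lr=0.1)
v_after_g = value(w, b, theta, xs, zs)       # G's step should lower V
print(v_start, v_after_d, v_after_g)
```

On the same minibatch, the discriminator step raises the value and the generator step lowers it, which is the min/max behaviour described above.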

Theoretical Results
• Assuming enough data and model capacity, we have a unique global optimum
• At this optimum, the generator distribution corresponds to the data distribution
• For a fixed generator, the optimal discriminator is:
$$D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_{model}(x)}$$
• So at the optimum, the discriminator outputs 0.5 (it can’t tell if the input is generated by G or from data)
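The optimal-discriminator result can be checked numerically on a small discrete example. This is a sketch under assumptions: the three-outcome distributions are hypothetical, and the second expectation is written over samples x from the generator's distribution p_model rather than over the noise z, which is the same quantity.

```python
import math

def optimal_discriminator(p_data, p_model):
    """D*(x) = p_data(x) / (p_data(x) + p_model(x)) for each outcome x."""
    return {x: p_data[x] / (p_data[x] + p_model[x]) for x in p_data}

def value(d, p_data, p_model):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_model}[log(1 - D(x))]."""
    return sum(p_data[x] * math.log(d[x]) + p_model[x] * math.log(1.0 - d[x])
               for x in p_data)

# Hypothetical distributions over three outcomes.
p_data = {"a": 0.5, "b": 0.3, "c": 0.2}
p_model = {"a": 0.2, "b": 0.3, "c": 0.5}

d_star = optimal_discriminator(p_data, p_model)
d_other = {"a": 0.6, "b": 0.5, "c": 0.4}     # an arbitrary alternative discriminator
v_star = value(d_star, p_data, p_model)
v_other = value(d_other, p_data, p_model)

# When the generator matches the data, D* is 0.5 everywhere and V = -log 4.
d_matched = optimal_discriminator(p_data, p_data)
v_matched = value(d_matched, p_data, p_data)
print(d_star, v_star, v_other, v_matched)
```

For the mismatched generator, D* scores higher than the alternative discriminator; for the matched generator, every output is 0.5.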

Learning Process (figure)

GANs - The Good and the Bad
• The generator is forced to discover features that explain the underlying distribution
• GANs produce sharp images, rather than the blurry ones typical of MLE
• However, the generator can be quite difficult to train
• GANs can suffer from the problem of ‘missing modes’

Talk Outline
• Discussion of Generative Adversarial Networks (introduced by Goodfellow et al., 2014)
• Policy Gradients and REINFORCE
• Discussion of GANs for Dialogue Generation (this paper)

Policy Gradient
• We have a differentiable stochastic policy 𝛒(x; θ)
• We sample an action x from 𝛒(x; θ); the future reward or ‘return’ for action x is r(x)
• We want to maximize the expected return E_{x ~ 𝛒(x; θ)}[r(x)]

Policy Gradient
• We want to maximize the expected return E_{x ~ 𝛒(x; θ)}[r(x)]
• So we’d like to compute the gradient ∇_θ E_{x ~ 𝛒(x; θ)}[r(x)]

REINFORCE
• We know that ∇_θ E_{x ~ 𝛒(x; θ)}[r(x)] is nothing but E_{x ~ 𝛒(x; θ)}[r(x) ∇_θ log 𝛒(x; θ)]
• We can estimate this gradient using samples from one or more episodes; we can do this because the policy itself is differentiable
• This can be seen as a Monte Carlo policy gradient, which is nothing but REINFORCE
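The identity above can be checked on a small example by comparing the sample-based estimate with the exact gradient. This is a minimal sketch assuming a softmax policy over three actions with a fixed reward per action; the parameter values, rewards, and sample count are hypothetical.

```python
import math
import random

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def exact_gradient(theta, rewards):
    """Analytic gradient of E[r(x)] for a softmax policy: p_k * (r_k - E[r])."""
    probs = softmax(theta)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]

def reinforce_gradient(theta, rewards, num_samples, rng):
    """Monte Carlo estimate: average of r(x) * grad_theta log p(x; theta)."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for _ in range(num_samples):
        x = rng.choices(range(len(theta)), weights=probs)[0]
        for k in range(len(theta)):
            # For a softmax policy, d log p(x) / d theta_k = 1[x == k] - p_k.
            score = (1.0 if k == x else 0.0) - probs[k]
            grad[k] += rewards[x] * score / num_samples
    return grad

theta = [0.0, 0.0, 0.0]      # hypothetical policy parameters (uniform policy)
rewards = [1.0, 0.0, 2.0]    # hypothetical reward r(x) for each action
exact = exact_gradient(theta, rewards)
estimate = reinforce_gradient(theta, rewards, num_samples=20000,
                              rng=random.Random(0))
print(exact)
print(estimate)
```

Only samples of x and the gradient of log 𝛒 are needed, so the reward itself never has to be differentiated.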

Estimate gradient of sampling operation
• Sampling operation inside a neural network: this is the policy

Estimate gradient of sampling operation
• We sample an action x from 𝛒(x; θ), which gives us a reward r(x); this could be a supervised loss
• We can now use REINFORCE to estimate the gradient

Talk Outline
• Discussion of Generative Adversarial Networks (introduced by Goodfellow et al., 2014)
• Policy Gradients and REINFORCE
• Discussion of GANs for Dialogue Generation (this paper)

GANs for NLP: Dialogue systems
• Given dialogue history x, want to generate response y
• Generator G
  - Input to G: x
  - Output from G: y
• Discriminator D
  - Input to D: x, y
  - Output from D: probability that (x, y) is from training data


GANs for NLP: Dialogue systems
Challenge:
• Typical seq2seq models for machine translation, dialogue generation, etc. involve sampling from a distribution, so we can’t directly backpropagate from the discriminator to the generator
Workarounds:
• Use an intermediate layer from the generator as input to the discriminator (not very appealing)
• Use reinforcement learning to train the generator (this paper)
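The challenge can be illustrated numerically: once a discrete token has been sampled, small changes to the generator's logits do not change the token, so the discriminator's score has zero sensitivity to them. This is a toy sketch, assuming a three-token vocabulary, a fixed random draw `u`, and a hypothetical table of discriminator scores.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, u):
    """Inverse-CDF sampling: first index whose cumulative probability exceeds u."""
    cumulative = 0.0
    probs = softmax(logits)
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1

# Hypothetical discriminator scores for each of three candidate tokens.
discriminator_score = [0.1, 0.7, 0.3]

logits = [1.0, 2.0, 0.5]   # hypothetical generator outputs
u = 0.5                    # fixed random draw, so that only the logits vary
eps = 1e-4

base = discriminator_score[sample_token(logits, u)]
finite_differences = []
for k in range(len(logits)):
    bumped = list(logits)
    bumped[k] += eps
    bumped_score = discriminator_score[sample_token(bumped, u)]
    finite_differences.append((bumped_score - base) / eps)
print(finite_differences)
```

Every finite difference is zero here, so there is no useful gradient signal passing through the sampled token; this is the gap that the reinforcement learning workaround fills.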

Architecture (figure)
• Generator: reads the dialogue history x = x_1, x_2, ..., x_T and produces the response y = y_1, y_2, ..., y_T, with each y_t sampled from the policy 𝛒
• Discriminator: takes the full dialogue (x, y) and outputs Q+({x, y})

Architecture
Generator:
• Encoder-decoder with attention (think machine translation)
• The last two utterances in x are concatenated and fed as input
Discriminator:
• HRED model
• After feeding {x, y} as input, we get a hidden representation at the dialogue level
• This is transformed to a scalar between 0 and 1 through an MLP
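The last discriminator step, mapping a dialogue-level hidden representation to a scalar between 0 and 1, can be sketched as follows. The slides do not specify the MLP, so everything here is an assumption made for illustration: one tanh hidden layer, a sigmoid output, and made-up weights and vector sizes.

```python
import math

def mlp_score(hidden, w1, b1, w2, b2):
    """Map a dialogue-level hidden vector to a scalar in (0, 1).

    Assumed shape: one tanh hidden layer followed by a sigmoid output unit.
    """
    layer = [math.tanh(sum(w * h for w, h in zip(row, hidden)) + b)
             for row, b in zip(w1, b1)]
    logit = sum(w * a for w, a in zip(w2, layer)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical hidden representation and weights, for illustration only.
hidden = [0.2, -0.4, 0.1]
w1 = [[0.5, -0.3, 0.8],
      [-0.6, 0.1, 0.4]]
b1 = [0.0, 0.1]
w2 = [1.2, -0.7]
b2 = 0.05

score = mlp_score(hidden, w1, b1, w2, b2)
print(score)
```

The sigmoid on the output is what keeps the score strictly between 0 and 1, so it can be read as the probability that (x, y) comes from the training data.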

Training
Discriminator:
• Simple backpropagation with SGD or any other optimizer
Generator:
• REINFORCE: 𝛒 is our policy, Q+({x, y}) is the return (same for each action)
• J(θ) = E_{y ~ 𝛒(y | x; θ)}[Q+({x, y})] is our objective function, which we maximize
• As discussed before, ∇J(θ) ≈ Q+({x, y}) ∇ Σ_t log 𝛒(y_t | x, y_{1:t-1})
• A baseline b({x, y}) is subtracted from Q+ to reduce variance
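The generator update can be written as a surrogate quantity whose gradient is the REINFORCE estimate. This is a minimal sketch with hypothetical helper names and numbers: `token_probs` stands for the generator's probabilities 𝛒(y_t | x, y_{1:t-1}) of the tokens it sampled, `reward` for Q+({x, y}), and `baseline` for b({x, y}).

```python
import math

def reinforce_surrogate(token_probs, reward, baseline=0.0):
    """Return (reward - baseline) * sum_t log p(y_t | x, y_1:t-1).

    Differentiating this with respect to the generator parameters, with the
    reward and baseline held constant, gives the REINFORCE gradient estimate.
    """
    log_likelihood = sum(math.log(p) for p in token_probs)
    return (reward - baseline) * log_likelihood

def per_token_weights(num_tokens, reward, baseline=0.0):
    """Every generated token receives the same weight, reward - baseline."""
    return [reward - baseline] * num_tokens

# Hypothetical values for a three-token sampled response.
token_probs = [0.5, 0.25, 0.5]
reward = 0.1      # discriminator output for the sampled response
baseline = 0.3    # assumed baseline value

surrogate = reinforce_surrogate(token_probs, reward, baseline)
weights = per_token_weights(len(token_probs), reward, baseline)
print(surrogate, weights)
```

Because the reward is below the baseline in this example, the shared weight is negative, so a gradient ascent step lowers the probability of the sampled response.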

Reward for Every Generation Step
• Until now, the same reward is given to each action (that is, to each word token generated by G)
Example:
• History: What’s your name?
• Gold response: I am John
• Machine response: I don’t know
• Discriminator output for machine response: 0.1
• The same reward is given for “I”, “don’t” and “know”
