SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Lantao Yu†, Weinan Zhang†, Jun Wang‡, Yong Yu†
†Shanghai Jiao Tong University, ‡University College London
Attribution: multiple slides taken from
The GAN Zoo: more than 500 species in the zoo, e.g. GAN, ACGAN, BGAN, DCGAN, EBGAN, fGAN, GoGAN, CGAN, ...
https://github.com/hindupuravinash/the-gan-zoo (not updated since 2018.09)
Mihaela Rosca, Balaji Lakshminarayanan, David Warde-Farley, Shakir Mohamed, “Variational Approaches for Auto-Encoding Generative Adversarial Networks”, arXiv, 2017
[Figure: a generator maps a random vector, e.g. (−0.3, 0.1, ..., 0.9), to an image; a conditional generator additionally takes an input such as the text "Girl with red hair".]
Text-to-image: paired data, e.g. the caption "blue eyes, red hair, short hair" paired with its image.
Style transfer: unpaired data between domain x (photos) and domain y (Vincent van Gogh's style).
Generator
It is a neural network (NN), or a function.
[Figure: the generator maps an input vector, e.g. (0.1, −3, ..., 2.4, 0.9), to a high-dimensional output such as an image; changing the input vector changes the image.]
Each dimension of the input vector represents some characteristic, e.g. longer hair, blue hair, open mouth. Demo powered by: http://mattya.github.io/chainer-DCGAN/
Discriminator
It is a neural network (NN), or a function. It takes an image as input and outputs a scalar: a larger value means real, a smaller value means fake (e.g. real images score 1.0, a generated image scores 0.1).
Basic GAN training alternates between the discriminator D and the generator G.
Step 1: Fix generator G, and update discriminator D. Feed D real objects randomly sampled from the database (label 1) and objects generated by G from random vectors (label 0). The discriminator learns to assign high scores to real objects and low scores to generated objects.
Step 2: Fix discriminator D, and update generator G. Treat G followed by D as one large network, fix the discriminator part, and update the generator part by gradient ascent on the discriminator's output. The generator learns to "fool" the discriminator.
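A minimal sketch of this two-step loop in PyTorch. The MLP architectures, sizes, and learning rates are illustrative assumptions, not the slides' actual setup:

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784          # illustrative sizes (e.g. flattened 28x28 images)
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):                    # real: (batch, data_dim) sampled from the database
    batch = real.size(0)
    z = torch.randn(batch, latent_dim)   # randomly sampled input vectors

    # Step 1: fix G, update D -- high scores for real, low scores for generated
    fake = G(z).detach()                 # detach: do not backprop into G here
    loss_D = bce(D(real), torch.ones(batch, 1)) + bce(D(fake), torch.zeros(batch, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Step 2: fix D, update G -- G learns to "fool" D (push D(G(z)) toward 1)
    loss_G = bce(D(G(z)), torch.ones(batch, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```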
[Figure: generated anime faces after 100, 1,000, 2,000, 5,000, 10,000, 20,000, and 50,000 updates. Source of training data: https://zhuanlan.zhihu.com/p/24767059. Source of video: https://www.gwern.net/Faces]
[Ian J. Goodfellow, et al., NIPS, 2014]
Outline:
1. Reinforcement Learning
2. GAN + RL
NLP tasks usually involve sequence generation. How can we use GAN to improve sequence generation?
A chatbot is an encoder-decoder (En-De) model: a human gives an input sentence c, and the chatbot produces a response sentence x. [Li, et al., EMNLP, 2016]
A human gives a reward R(c, x) for the chatbot's response, e.g. for the input "How are you?" the response "Not bad" receives a positive reward (+1) while "I'm John" does not.
Learn to maximize expected reward, e.g. by Policy Gradient.
Sample N dialogues (c^1, x^1), (c^2, x^2), ..., (c^N, x^N) from the current policy θ^t and collect rewards R(c^1, x^1), R(c^2, x^2), ..., R(c^N, x^N). The policy-gradient estimate is
∇R̄_{θ^t} ≈ (1/N) Σ_{i=1}^{N} R(c^i, x^i) ∇ log P_{θ^t}(x^i | c^i)
θ^{t+1} ← θ^t + η ∇R̄_{θ^t}
If R(c^i, x^i) is positive, update θ to increase P_θ(x^i | c^i); if R(c^i, x^i) is negative, update θ to decrease P_θ(x^i | c^i).
Maximum Likelihood vs. Reinforcement Learning (Policy Gradient):

                Maximum Likelihood                         Reinforcement Learning (Policy Gradient)
Training data:  (c^1, x̂^1), ..., (c^N, x̂^N)                (c^1, x^1), ..., (c^N, x^N) (sampled from the policy)
Objective:      (1/N) Σ_{i=1}^{N} log P_θ(x̂^i | c^i)       (1/N) Σ_{i=1}^{N} R(c^i, x^i) log P_θ(x^i | c^i)
Gradient:       (1/N) Σ_{i=1}^{N} ∇ log P_θ(x̂^i | c^i)     (1/N) Σ_{i=1}^{N} R(c^i, x^i) ∇ log P_θ(x^i | c^i)

Maximum likelihood is the special case where every training pair has R(c^i, x̂^i) = 1; in policy gradient each term is weighted by R(c^i, x^i).
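A sketch of this contrast in code: the policy-gradient update is the maximum-likelihood update with each sampled pair weighted by its reward. Here `model.log_prob`, `model.sample_with_log_prob`, and `reward` are assumed interfaces of a hypothetical seq2seq model, not anything defined in the slides:

```python
import torch

def mle_step(model, optimizer, cs, xs_gold):
    # Maximum likelihood: every (c^i, x̂^i) pair has implicit weight 1.
    log_probs = model.log_prob(cs, xs_gold)          # per-example log P_θ(x̂^i | c^i), shape (N,)
    loss = -log_probs.mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()

def policy_gradient_step(model, optimizer, cs):
    # Sample responses from the current policy, then weight by reward.
    xs, log_probs = model.sample_with_log_prob(cs)   # x^i ~ P_θ(x | c^i)
    rewards = torch.tensor([reward(c, x) for c, x in zip(cs, xs)])
    # Ascend (1/N) Σ_i R(c^i, x^i) ∇ log P_θ(x^i | c^i)
    loss = -(rewards * log_probs).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```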
Outline:
1. Reinforcement Learning
2. GAN + RL
Seq2seq (encoder-decoder): input sentence c, output sentence x. Training data: A: "How are you?" B: "I'm good." The training criterion is to maximize the likelihood of "I'm good." given "How are you?". At test time the model may output "Not bad" or "I'm John."; a human judges "Not bad" as the better response, but maximizing likelihood can favor "I'm John."
Conditional GAN for dialogue: replace human evaluation with machine evaluation. The discriminator takes the input sentence c and the response sentence x (e.g. "I am busy.") and outputs a reward R(c, x) for the chatbot (En-De). [Li, et al., EMNLP, 2017]
However, there is an issue when you train your generator.
Three categories of solutions:
1. Gumbel-softmax
2. Continuous Input for Discriminator [Xu, et al., EMNLP, 2017] [Alex Lamb, et al., NIPS, 2016] [Yizhe Zhang, et al., ICML, 2017]
3. Reinforcement Learning [..., 2017] [Tong Che, et al., arXiv, 2017] [Jiaxian Guo, et al., AAAI, 2018] [Kevin Lin, et al., NIPS, 2017] [William Fedus, et al., ICLR, 2018]
[Figure: the generator decodes token by token from <BOS>, producing a distribution over the vocabulary (e.g. A, B) at each step.]
Continuous Input for Discriminator: use the output distribution itself as the input of the discriminator, avoiding the sampling process. The discriminator outputs a scalar, and we can do backpropagation to update the generator's parameters now.
Problem: real sequences are one-hot vectors (e.g. 1, 0, 0), while the generated distributions (e.g. 0.9, 0.1, 0.1) can never be 1-hot, so the discriminator can immediately find the difference. A discriminator with a constraint (e.g. WGAN) can be helpful.
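A sketch of the related continuous-relaxation idea using PyTorch's built-in Gumbel-softmax; the vocabulary size and the example token ids are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

vocab_size = 3                                  # e.g. tokens A, B, <EOS>
logits = torch.randn(5, vocab_size)             # generator outputs, one row per time step

# Soft samples: differentiable, but never exactly one-hot.
soft = F.gumbel_softmax(logits, tau=1.0, hard=False)

# Straight-through variant: one-hot forward pass, soft gradients backward.
hard = F.gumbel_softmax(logits, tau=1.0, hard=True)

real = F.one_hot(torch.tensor([0, 1, 0, 2, 1]), vocab_size).float()
print(soft.sum(dim=-1))   # each row sums to 1, yet differs visibly from the one-hot rows of `real`
```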
Among these three categories (Gumbel-softmax, Continuous Input for Discriminator, Reinforcement Learning), we focus on Reinforcement Learning here.
RL is difficult to train. GAN is difficult to train. Sequence generation GAN (RL + GAN) is therefore doubly difficult to train.
With a sequence-level reward, the chatbot (En-De) gets a single discriminator score for the whole response (e.g. "You is good" scores 0.1): I don't know which part is wrong. Better: give a reward to every generation step (e.g. "You" 0.9, "You is" 0.1, "You is good" 0.1).
Method 1: Monte Carlo (MC) Search [Yu, et al., AAAI, 2017]
Method 2: Discriminator For Partially Decoded Sequences [Li, et al., EMNLP, 2017]
Method 3: Step-wise evaluation [Tuan, Lee, TASLP, 2019] [Xu, et al., EMNLP, 2018] [William Fedus, et al., ICLR, 2018]
SeqGAN problem setting: train a generative model G_θ to produce sequences that mimic the real ones. The true distribution p_true(y_t | Y_{1:t−1}) is only revealed by the given dataset D = {Y_{1:T}}.
The learned model conditions on real prefixes during training but on its own generated prefixes in the inference stage: exposure bias.
Training vs. inference. The model is updated by maximum likelihood, conditioning on the real prefix Y_{1:t−1}:
max_θ (1/|D|) Σ_{Y_{1:T} ∈ D} Σ_t log G_θ(y_t | Y_{1:t−1})
But when generating the next token ŷ_t at inference time, we sample from G_θ(ŷ_t | Ŷ_{1:t−1}), conditioning on the guessed prefix Ŷ_{1:t−1}.
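A minimal sketch of this maximum-likelihood update with teacher forcing, assuming a hypothetical autoregressive model `G` that maps a prefix batch to per-step logits:

```python
import torch
import torch.nn.functional as F

def mle_step(G, optimizer, batch):
    # batch: (B, T) integer token ids of real sequences Y_{1:T} from D.
    inputs, targets = batch[:, :-1], batch[:, 1:]   # condition on the REAL prefix
    logits = G(inputs)                              # assumed shape: (B, T-1, vocab)
    # Minimize -(1/|D|) Σ_{Y} Σ_t log G_θ(y_t | Y_{1:t-1})
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```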
GAN idea: a discriminator learns to distinguish real-world data from the fake model-generated data; when the discriminator can no longer tell them apart, G nicely fits the true underlying data distribution.
[Figure: Real World → Data and Generator → generated samples both feed the Discriminator.]
[Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. In NIPS 2014.]
The discriminator D (e.g. a multilayer perceptron) outputs P(true) for its input, and the generator G is trained with guidance from it:
min_G max_D E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]
Why is this hard for text? Ian Goodfellow: if you output the word "penguin", you "can't change that to 'penguin + .001' on the next step, because there is no such word as 'penguin + .001'. You have to go all the way from 'penguin' to 'ostrich'."
[https://www.reddit.com/r/MachineLearning/comments/40ldq6/generative_adversarial_networks_for_text/]
With continuous outputs, the generator is updated by the gradient
∇_{θ_G} (1/m) Σ_{i=1}^{m} log(1 − D(G(z^(i))))
but this gradient cannot be backpropagated through discrete tokens.
SeqGAN treats generation as RL: the generated prefix Y_{1:t−1} serves as the state, the next token y_t is the action, and the generator G_θ(y_t | Y_{1:t−1}) is the policy. The discriminator's output D_φ(Y_{1:T}) (the estimated probability of being true data) is the reward, given only for the complete sequence (no immediate reward).
The generator's objective is the expected end reward:
J(θ) = E[R_T | s_0, θ] = Σ_{y_1 ∈ Y} G_θ(y_1 | s_0) · Q^{G_θ}_{D_φ}(s_0, y_1)
where Q^{G_θ}_{D_φ}(s, a) is the expected accumulative reward of taking action a in state s and then following G_θ. For the final token,
Q^{G_θ}_{D_φ}(s = Y_{1:T−1}, a = y_T) = D_φ(Y_{1:T})
For intermediate steps, estimate Q with an N-time Monte Carlo search using a roll-out policy G_β:
Q^{G_θ}_{D_φ}(s = Y_{1:t−1}, a = y_t) =
  (1/N) Σ_{n=1}^{N} D_φ(Y^n_{1:T}),  Y^n_{1:T} ∈ MC^{G_β}(Y_{1:t}; N)   for t < T
  D_φ(Y_{1:t})                                                           for t = T
where {Y^1_{1:T}, ..., Y^N_{1:T}} = MC^{G_β}(Y_{1:t}; N) are N complete sequences rolled out from the prefix Y_{1:t}.
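A sketch of this roll-out estimate; the roll-out policy interface `G_beta.rollout` and the discriminator callable `D_phi` are assumed, not the paper's code:

```python
import torch

def q_value(prefix, t, T, D_phi, G_beta, N=16):
    """Estimate Q^{G_θ}_{D_φ}(Y_{1:t-1}, y_t) for the prefix Y_{1:t}."""
    if t == T:                              # finished sequence: use the discriminator directly
        return D_phi(prefix)
    # Roll out N complete sequences Y^n_{1:T} from Y_{1:t} with the roll-out policy G_β
    rollouts = [G_beta.rollout(prefix, steps=T - t) for _ in range(N)]
    return torch.stack([D_phi(y) for y in rollouts]).mean()
```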
The discriminator is trained to distinguish real sequences from generated ones:
min_φ −E_{Y∼p_data}[log D_φ(Y)] − E_{Y∼G_θ}[log(1 − D_φ(Y))]
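A minimal sketch of one discriminator update under this objective, assuming a sequence sampler `G.sample` and a discriminator `D_phi` that outputs probabilities:

```python
import torch
import torch.nn.functional as F

def discriminator_step(D_phi, opt_D, real_seqs, G, T):
    # min_φ −E_{Y∼p_data}[log D_φ(Y)] − E_{Y∼G_θ}[log(1 − D_φ(Y))]
    fake_seqs = G.sample(real_seqs.size(0), T)        # sampled token ids carry no gradient to G
    score_real = D_phi(real_seqs)                     # D_φ outputs probabilities in (0, 1)
    score_fake = D_phi(fake_seqs)
    loss = F.binary_cross_entropy(score_real, torch.ones_like(score_real)) \
         + F.binary_cross_entropy(score_fake, torch.zeros_like(score_fake))
    opt_D.zero_grad(); loss.backward(); opt_D.step()
    return loss.item()
```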
[Richard Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS 1999.]
Following the policy gradient theorem, the gradient of J(θ) is
∇_θ J(θ) = E_{Y_{1:t−1}∼G_θ}[ Σ_{y_t ∈ Y} ∇_θ G_θ(y_t | Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t) ]
≈ (1/T) Σ_{t=1}^{T} Σ_{y_t ∈ Y} ∇_θ G_θ(y_t | Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t)
= (1/T) Σ_{t=1}^{T} Σ_{y_t ∈ Y} G_θ(y_t | Y_{1:t−1}) ∇_θ log G_θ(y_t | Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t)
= (1/T) Σ_{t=1}^{T} E_{y_t∼G_θ(y_t | Y_{1:t−1})}[ ∇_θ log G_θ(y_t | Y_{1:t−1}) · Q^{G_θ}_{D_φ}(Y_{1:t−1}, y_t) ]
and the generator is updated by θ ← θ + α_h ∇_θ J(θ).
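A sketch of one generator update under this gradient, reusing `q_value` from above; `G.sample_with_log_prob` is an assumed interface, not the paper's code:

```python
import torch

def generator_pg_step(G, opt_G, D_phi, G_beta, T, batch_size=8, N=16):
    # One g-step: θ ← θ + α ∇_θ J(θ), with Q estimated by MC roll-outs (q_value above).
    loss = torch.zeros(())
    for _ in range(batch_size):
        # Assumed interface: returns a sampled sequence Y_{1:T} and the per-step
        # log G_θ(y_t | Y_{1:t-1}) with gradients attached.
        seq, step_log_probs = G.sample_with_log_prob(T)
        for t in range(1, T + 1):
            q = q_value(seq[:t], t, T, D_phi, G_beta, N).detach()  # treat Q as a constant
            loss = loss - step_log_probs[t - 1] * q / (T * batch_size)
    opt_G.zero_grad(); loss.backward(); opt_G.step()
    return loss.item()
```

In a full run, the paper's schedule alternates g-steps of this form with d-steps of the discriminator update above, after pretraining G with maximum likelihood.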
[Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.]
[Figure: the generator is an LSTM that emits tokens by softmax sampling, e.g. "Shanghai is incredibly ...".]
[Kim, Y. 2014. Convolutional neural networks for sentence classification. EMNLP 2014.]
Evaluation. Training maximizes max_θ (1/|D|) Σ_{x∈D} log G_θ(x) ≈ E_{x∼p_true(x)}[log G_θ(x)]. For evaluation we instead want E_{x∼G_θ(x)}[log p_true(x)]: the model-generated data is considered as real as possible, i.e. it falls where the true data is with a high mass density. But p_true(x) is unknown for real data, so this is hard or impossible to directly calculate.
Solution: use an oracle model with a certain generalization ability as p_true. The oracle supplies the training data for the generative model and, like a human observer, can accurately evaluate the perceptual quality of the generative model.
NLL_oracle = −E_{Y_{1:T}∼G_θ}[ Σ_{t=1}^{T} log G_oracle(y_t | Y_{1:t−1}) ]
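A sketch of computing NLL_oracle by sampling from G_θ and scoring under the oracle; both `G.sample` and `oracle.log_prob` are assumed interfaces:

```python
import torch

def nll_oracle(G, oracle, num_samples=1000, T=20):
    # NLL_oracle = -E_{Y_{1:T} ~ G_θ} [ Σ_t log G_oracle(y_t | Y_{1:t-1}) ]
    with torch.no_grad():
        seqs = G.sample(num_samples, T)      # (num_samples, T) token ids
        log_p = oracle.log_prob(seqs)        # (num_samples,) per-sequence Σ_t log G_oracle(...)
        return -log_p.mean().item()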
Obama political speech generation, machine vs. human (excerpts as shown):
Machine:
- and most important thing that not on violence throughout the horizon is OTHERS american fire and OTHERS but we need you are a strong source
- remember now i can't afford to start with just the way our european support for the right thing to protect those american story from the world and
- were going to be an student medical education and warm the republicans who like my times if he said is that brought the
Human:
- extraordinary honor that he was the most trusted man in america
- celebrate the journalism that walter practiced a standard of honesty and integrity and responsibility to which so many
- little bit harder to find today
- tribute to the life and times of the man who chronicled our time.
Conclusions:
- SeqGAN effectively trains Generative Adversarial Nets for discrete structured sequence generation via policy gradient.
- The synthetic oracle provides a metric to accurately evaluate the "perceptual quality" of generated sequences.
- Experiments on real-world tasks show convincing results.
Discussion (seminar comments):
- ... training, GAN parameters, g-steps, d-steps, MC tree depth, etc.
- ... estimate, as very few samples are used in each episode. [Jigyasa]
- Very creative idea that provides a nice way to automatically compare how close the generator distribution is to the actual model of the world. [Rajas]
- [Vipul]
- [Siddhant, Saransh, Rajas]
- [Atishya, Vipul, Saransh]
- [Atishya, Jigyasa, Siddhant, Rajas]
- Extensions: 1. K discriminators trained with partial/complete sequences [Keshav] 2. K distinct Ds are expensive; use weight sharing [Atishya] 3. Use an LM for intermediate rewards, a "surprise" value [Rajas/Soumya/Saransh]
- 1. Won't work [Saransh]
Sequence Generation: three categories of solutions.
1. Gumbel-softmax
2. Continuous Input for Discriminator [Xu, et al., EMNLP, 2017] [Alex Lamb, et al., NIPS, 2016] [Yizhe Zhang, et al., ICML, 2017]
3. Reinforcement Learning [..., 2017] [Jiaxian Guo, et al., AAAI, 2018] [Kevin Lin, et al., NIPS, 2017] [William Fedus, et al., ICLR, 2018]