GANs for Discrete Text Generation
Junfu
Oct. 20th, 2018
Show, Tell and Discriminate
Problems in image captioning:
- Generated captions imitate the language structure patterns (phrases, sentences) of the training data
- Captions are templated and generic (different images often receive nearly identical descriptions)
Xihui Liu, et al. Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data. ECCV 2018, CUHK.
The self-retrieval module:
- Acts as a metric and an evaluator of caption discriminativeness, ensuring the quality of generated captions
- Uses unlabeled data to boost captioning performance
Image Captioning Module
- Image $I$; generated caption $C = \{w_1, w_2, \dots, w_T\}$; ground-truth caption $C^* = \{w_1^*, w_2^*, \dots, w_{T'}^*\}$
- Encoder (CNN): $v = E_i(I)$; decoder (LSTM): $C = D_c(v)$
- Pre-train with the cross-entropy (MLE) loss:

$$L_{CE}(\theta) = -\sum_{t=1}^{T'} \log p_\theta(w_t^* \mid v, w_1^*, \dots, w_{t-1}^*)$$
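To make the pre-training objective concrete, here is a minimal PyTorch sketch of one teacher-forcing cross-entropy step. The layer sizes and random inputs are illustrative assumptions, and the toy LSTM stands in for the paper's attention decoder rather than reproducing it.

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim, feat_dim = 10000, 300, 512, 2048

embed = nn.Embedding(vocab_size, embed_dim)
init_proj = nn.Linear(feat_dim, hidden_dim)   # map image feature v to the LSTM state
decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
out_proj = nn.Linear(hidden_dim, vocab_size)
ce = nn.CrossEntropyLoss()

def mle_loss(v, gt_caption):
    """v: (B, feat_dim) image features; gt_caption: (B, T) ground-truth word ids."""
    h0 = torch.tanh(init_proj(v)).unsqueeze(0)   # (1, B, H)
    c0 = torch.zeros_like(h0)
    inputs = embed(gt_caption[:, :-1])           # teacher forcing: w*_1 .. w*_{T-1}
    hidden, _ = decoder(inputs, (h0, c0))        # (B, T-1, H)
    logits = out_proj(hidden)                    # (B, T-1, V)
    targets = gt_caption[:, 1:]                  # predict w*_2 .. w*_T
    return ce(logits.reshape(-1, vocab_size), targets.reshape(-1))

# usage with dummy data
v = torch.randn(4, feat_dim)
cap = torch.randint(0, vocab_size, (4, 12))
loss = mle_loss(v, cap)
loss.backward()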
Self-Retrieval Module
- Image encoder (CNN): $v = E_i(I)$; caption encoder (GRU): $c = E_c(C)$
- Similarity between caption feature $c_i$ and image feature $v_j$: $s(c_i, v_j)$
- Train the retrieval module with a hardest-negative ranking loss:

$$L_{ret}(C_i, \{I_1, I_2, \dots, I_n\}) = \max_{j \ne i} \big[\, m - s(c_i, v_i) + s(c_i, v_j) \,\big]_+$$

where $[x]_+ = \max(x, 0)$.
- The retrieval score then serves as a reward for caption generation:

$$r(C_i) = r_{CIDEr}(C_i) + \alpha \cdot r_{ret}(C_i, \{I_1, \dots, I_n\})$$
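A minimal PyTorch sketch of the hardest-negative ranking loss $L_{ret}$ above (VSE++ style), assuming L2-normalized batch features where caption i matches image i; the margin value 0.2 is an illustrative choice:

import torch

def ranking_loss(c, v, margin=0.2):
    """c: (B, D) caption features, v: (B, D) image features, L2-normalized."""
    sim = c @ v.t()                                # (B, B) similarities s(c_i, v_j)
    pos = sim.diag().unsqueeze(1)                  # matched pairs s(c_i, v_i)
    cost = (margin - pos + sim).clamp(min=0)       # [m - s(c_i,v_i) + s(c_i,v_j)]_+
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost = cost.masked_fill(eye, 0)                # exclude j == i
    return cost.max(dim=1).values.mean()           # hardest negative per caption

# usage with random, normalized features
c = torch.nn.functional.normalize(torch.randn(8, 256), dim=1)
v = torch.nn.functional.normalize(torch.randn(8, 256), dim=1)
print(ranking_loss(c, v).item())

Taking only the hardest negative per caption, rather than summing over all negatives, is what makes the loss focus on the most confusable image in the batch.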
Training proceeds in two stages: Pre-train (cross-entropy and ranking losses), then Adv-train (reward-driven fine-tuning with the rewards below).
Labeled and Unlabeled Data
- Labeled images $\{I_1^l, I_2^l, \dots, I_{n_l}^l\}$ with generated captions $\{C_1^l, C_2^l, \dots, C_{n_l}^l\}$
- Unlabeled images $\{I_1^u, I_2^u, \dots, I_{n_u}^u\}$
- Reward for the caption of a labeled image:

$$r(C_i^l) = r_{CIDEr}(C_i^l) + \alpha \cdot r_{ret}\big(C_i^l, \{I_1^l, \dots, I_{n_l}^l\} \cup \{I_1^u, \dots, I_{n_u}^u\}\big)$$

- Reward for the caption of an unlabeled image (no ground truth, hence no CIDEr term):

$$r(C_i^u) = \alpha \cdot r_{ret}\big(C_i^u, \{I_1^l, \dots, I_{n_l}^l\} \cup \{I_1^u, \dots, I_{n_u}^u\}\big)$$
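These rewards are plugged into a policy-gradient update over sampled captions. A minimal sketch, assuming REINFORCE with a mean-reward baseline (self-critical baselines are a common choice for CIDEr optimization); all tensor shapes and function names here are illustrative:

import torch

alpha = 1.0  # weight of the self-retrieval reward (alpha = 1 in the experiments)

def mixed_reward(cider, ret, labeled):
    """cider, ret: (B,) per-caption rewards; labeled: (B,) boolean mask.
    Labeled images get CIDEr + alpha * retrieval; unlabeled get retrieval only."""
    return torch.where(labeled, cider + alpha * ret, alpha * ret)

def policy_gradient_loss(log_probs, reward, baseline):
    """log_probs: (B,) summed log-probabilities of the sampled captions."""
    advantage = (reward - baseline).detach()   # no gradient through the reward
    return -(advantage * log_probs).mean()

# usage with dummy values
B = 6
log_probs = torch.randn(B, requires_grad=True)
cider = torch.rand(B); ret = torch.rand(B)
labeled = torch.tensor([1, 1, 1, 0, 0, 0], dtype=torch.bool)
r = mixed_reward(cider, ret, labeled)
loss = policy_gradient_loss(log_probs, r, r.mean().expand(B))
loss.backward()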
Moderately Hard Negative Mining
- Unlabeled images $\{I_1^u, I_2^u, \dots, I_{n_u}^u\}$ with features $\{v_1^u, v_2^u, \dots, v_{n_u}^u\}$
- Ground-truth caption $C^* = \{w_1^*, w_2^*, \dots, w_{T'}^*\}$ with feature $c^*$
- Similarities: $\{s(c^*, v_1^u), s(c^*, v_2^u), \dots, s(c^*, v_{n_u}^u)\}$
- Rank the unlabeled images by similarity and sample negatives from the rank interval $[g_{min}, g_{max}]$
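A small sketch of this mining step, assuming L2-normalized features; g_min, g_max, and k are hypothetical hyper-parameters standing in for the slide's rank interval:

import torch

def mine_hard_negatives(c_star, v_unlabeled, g_min, g_max, k):
    """c_star: (D,) ground-truth caption feature; v_unlabeled: (N, D) image features.
    Returns indices of k negatives sampled from rank range [g_min, g_max)."""
    sims = v_unlabeled @ c_star                    # similarities {s(c*, v_j^u)}
    order = torch.argsort(sims, descending=True)   # rank all unlabeled images
    band = order[g_min:g_max]                      # moderately hard negatives only
    pick = band[torch.randperm(len(band))[:k]]     # sample k of them
    return pick

# usage
c_star = torch.nn.functional.normalize(torch.randn(256), dim=0)
v_u = torch.nn.functional.normalize(torch.randn(100, 256), dim=1)
idx = mine_hard_negatives(c_star, v_u, g_min=10, g_max=50, k=5)

Skipping the very top ranks avoids images that are genuinely similar to the ground-truth caption, which would be false negatives rather than hard ones.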
Captioning module pre-training:
- Images and corresponding captions in the labeled dataset
- Share the image encoder with the self-retrieval module
- MLE with cross-entropy loss
Rewards for adversarial fine-tuning:
- Labeled data: CIDEr + self-retrieval reward
- Unlabeled data: self-retrieval reward only
- CIDEr guarantees similarity between the caption and the ground truth
- The self-retrieval reward encourages captions to be discriminative
Implementation details:
- Word embedding: 300-D vectors
- Image encoder: ResNet-101, shared between the captioning and self-retrieval modules; visual features: 2048×7×7, taken before pooling
- Language decoder: attention LSTM; caption encoder (self-retrieval): a single GRU with 1024 hidden units
- $\alpha = 1$; ratio of labeled to unlabeled data: 1 : 1
- Beam search size: 5
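For reference, a generic, framework-agnostic beam-search sketch; the step function, vocabulary size, and token ids below are dummies, not the paper's decoder:

import torch

def beam_search(step_fn, init_state, bos, eos, beam_size=5, max_len=16):
    """step_fn(last_token, state) -> (log_probs, new_state), log_probs: (V,)."""
    beams = [([bos], 0.0, init_state)]       # (tokens, cumulative log-prob, state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            log_probs, new_state = step_fn(tokens[-1], state)
            top_lp, top_ix = log_probs.topk(beam_size)
            for lp, ix in zip(top_lp.tolist(), top_ix.tolist()):
                candidates.append((tokens + [ix], score + lp, new_state))
        candidates.sort(key=lambda b: b[1], reverse=True)
        beams = []
        for cand in candidates[:beam_size]:  # keep the top beam_size hypotheses
            (finished if cand[0][-1] == eos else beams).append(cand)
        if not beams:
            break
    best = max(finished + beams, key=lambda b: b[1])
    return best[0]

# usage with a dummy step function over a 20-word vocabulary
def dummy_step(token, state):
    return torch.log_softmax(torch.randn(20), dim=0), state

print(beam_search(dummy_step, None, bos=0, eos=1, beam_size=5))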
Compared settings:
- Baseline: captioning module trained with CIDEr only (no self-retrieval module)
- SR-FL: proposed method trained on fully-labeled data
- SR-PL: proposed method trained with additional unlabeled data
Caption-to-image retrieval results compared against VSE0 and VSE++ (table omitted).
Diversity metrics:
- Unique captions: captions that appear only once among all generated captions
- Novel captions: captions that have not been seen in the training data
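A small sketch of how these two statistics can be computed; the exact normalization used in the papers is an assumption here:

from collections import Counter

def diversity_stats(generated, training_captions):
    """generated: list of generated caption strings;
    training_captions: set of caption strings seen during training."""
    counts = Counter(generated)
    unique = sum(1 for cap in generated if counts[cap] == 1)
    novel = sum(1 for cap in set(generated) if cap not in training_captions)
    return 100.0 * unique / len(generated), 100.0 * novel / len(set(generated))

# usage
gen = ["a dog runs in a field", "a cat on a couch", "a dog runs in a field"]
train = {"a cat on a couch"}
pct_unique, pct_novel = diversity_stats(gen, train)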
Machine and human captions are quite distinct:
- Word distributions differ
- Vocabulary size (machine captions use a much smaller vocabulary)
- Strong bias toward frequent captions
How to generate human-like captions?
- Multiple captions per image
- Diverse captions
Rakshith Shetty, et al. Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training. ICCV 2017.
Ways to get generated text into the discriminator, and their drawbacks:
- Decoding: generate multiple sentences and pick the one with the highest probability, using greedy search approaches (beam search)
- Sampling with REINFORCE-style gradients: high variance and computationally intensive
- Feeding the softmax distribution directly: the discriminator easily distinguishes the soft distribution from the sharp one-hot reference
The Gumbel distribution $\mathrm{Gumbel}(a, b)$:
- CDF: $F(x) = \exp\!\big(-\exp(-(x - a)/b)\big)$
- PDF: $f(x) = \frac{1}{b}\exp\!\big(-(z + \exp(-z))\big)$ with $z = (x - a)/b$
- Mean: $a + \gamma b$, where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant
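This distribution enters because Shetty et al. make the discrete word-sampling step differentiable with the Gumbel-Softmax trick. A minimal sketch (the temperature tau = 0.5 is an illustrative choice):

import torch

def gumbel_softmax_sample(logits, tau=0.5, eps=1e-20):
    """Gumbel-Softmax relaxation of categorical sampling. Adding Gumbel(0, 1)
    noise g = -log(-log(u)) to the logits and taking the argmax draws an exact
    categorical sample (Gumbel-max trick); replacing the argmax with a
    temperature-tau softmax yields a differentiable, near-one-hot vector."""
    u = torch.rand_like(logits)
    g = -torch.log(-torch.log(u + eps) + eps)   # Gumbel(0, 1) noise
    return torch.softmax((logits + g) / tau, dim=-1)

# usage: a soft sample over a 10-word vocabulary
y = gumbel_softmax_sample(torch.randn(10))      # near one-hot for small tau

Lower temperatures push the output closer to one-hot (sharper, but higher-variance gradients); higher temperatures give smoother, more uniform vectors.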
Evaluation: performance comparison and diversity comparison
- Corpus-level diversity
- Diversity within the set of captions for the corresponding image
Lijun Wu, Yingce Xia, Tie-yan Liu, et al. Adversarial Neural Machine Translation. ACML 2018.