SLIDE 1

Adversarial Reward Learning for Visual Storytelling

Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang

Presented by Maria Fabiano

SLIDE 2

Outline

1. Motivation
2. AREL Model Overview
3. Policy Model
4. Reward Model
5. AREL Objective
6. Data
7. Training and Testing
8. Evaluation
9. Critique

SLIDE 3

Motivation

The authors explore how well a computer can create a story from a set of images. Prior to this paper, little research had been done on visual storytelling. Visual storytelling requires a deeper understanding of images than image captioning: the model must understand more complicated visual scenarios, relate sequential images, and associate implicit concepts in the images (e.g., emotions).

SLIDE 4

Motivation: Problems with Previous Storytelling Approaches

  • RL
    ○ Hand-crafted rewards (e.g., METEOR) are too biased or too sparse to drive the policy search
    ○ Fail to learn implicit semantics (coherence, expressiveness, etc.)
    ○ Require extensive feature and reward engineering
  • GANs
    ○ Prone to unstable or vanishing gradients
  • IRL
    ○ Maximum-margin approaches, probabilistic approaches

SLIDE 5

AREL Model Overview: Adversarial REward Learning

  • Policy model: produces the story sequence from an image sequence
  • Reward model: learns an implicit reward function from human-annotated stories and sampled predictions
  • The two models are trained alternately via SGD

SLIDE 6

Policy Model

Takes an image sequence and sequentially chooses words from the vocabulary to create a story.

  • Images go through a pre-trained CNN
  • An encoder (bidirectional GRUs) extracts high-level features of the images
  • Five decoders (single-layer GRUs with shared weights) create five substories
  • The substories are concatenated into the full story
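To make the architecture concrete, here is a minimal PyTorch sketch of the pipeline described above. All names and dimensions (feat_dim=2048 for ResNet-style features, hid_dim=512) are illustrative assumptions, not the authors' code; the "five decoders with shared weights" are modeled as one GRU module reused for each image.

```python
import torch
import torch.nn as nn

class PolicyModel(nn.Module):
    """Sketch of the described policy: CNN features -> BiGRU encoder
    -> one shared GRU decoder applied to each of the five images."""

    def __init__(self, vocab_size, feat_dim=2048, hid_dim=512):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid_dim, bidirectional=True, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hid_dim)
        # A single decoder with shared weights plays the role of the
        # "five decoders": the same parameters produce every substory.
        self.decoder = nn.GRU(hid_dim + 2 * hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, substory_tokens):
        # img_feats: (batch, 5, feat_dim) from a pre-trained CNN (e.g., ResNet)
        # substory_tokens: (batch, 5, T) word ids, one row per substory
        ctx, _ = self.encoder(img_feats)               # (batch, 5, 2*hid_dim)
        logits = []
        for i in range(img_feats.size(1)):             # one substory per image
            emb = self.embed(substory_tokens[:, i])               # (batch, T, hid_dim)
            c = ctx[:, i : i + 1].expand(-1, emb.size(1), -1)     # image context
            h, _ = self.decoder(torch.cat([emb, c], dim=-1))
            logits.append(self.out(h))                 # (batch, T, vocab)
        # The five substories are concatenated to form the full story.
        return torch.stack(logits, dim=1)              # (batch, 5, T, vocab)
```

Reusing one decoder module is what "shared weights" means in practice: every substory is generated by the same parameters, only conditioned on a different image context.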

SLIDE 7

(Partial) Reward Model

Aims to derive a human-like reward from human-annotated stories and sampled predictions.
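A hedged sketch of what such a partial reward model could look like: a 1-D CNN over the substory's word embeddings, pooled to a scalar reward per substory. The pooling choice (mean) and all dimensions are assumptions, since, as the critique below notes, the paper does not specify the pooling; the visual-context conditioning is also omitted for brevity.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Sketch of a partial reward model: embed a substory, convolve,
    pool, and project to a scalar reward. Mean pooling is an assumption."""

    def __init__(self, vocab_size, emb_dim=300, n_filters=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.score = nn.Linear(n_filters, 1)

    def forward(self, substory_tokens):
        # substory_tokens: (batch, T) word ids of one substory
        emb = self.embed(substory_tokens).transpose(1, 2)  # (batch, emb_dim, T)
        h = torch.relu(self.conv(emb))                     # (batch, n_filters, T)
        pooled = h.mean(dim=2)                             # assumed mean pooling
        return self.score(pooled).squeeze(-1)              # one reward per substory
```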

SLIDE 8

Adversarial Reward Learning: Reward Boltzmann Distribution

We achieve the optimal reward function R* when the Reward-Boltzmann distribution pθ equals the actual data distribution p*.

  • W = story
  • Rθ = reward function
  • Zθ = partition function (a normalizing constant)
  • pθ = approximate data distribution
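Written out, the Reward-Boltzmann distribution implied by the legend above takes the standard Boltzmann form (a transcription from these definitions, not the paper's exact notation):

```latex
p_\theta(W) = \frac{\exp\big(R_\theta(W)\big)}{Z_\theta},
\qquad
Z_\theta = \sum_{W} \exp\big(R_\theta(W)\big)
```

The optimal reward R* is reached exactly when pθ matches the true story distribution p*.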

SLIDE 9

Adversarial Reward Learning

We want the Reward-Boltzmann distribution pθ to get close to the actual data distribution p*.

  • Adversarial objective: a min-max two-player game. Maximize the similarity of pθ with the empirical distribution pe while minimizing the similarity of pθ with the data generated by the policy πβ. Meanwhile, πβ wants to maximize its similarity with pθ.
  • Distribution similarity is measured using KL-divergence (the two objectives are written out below).
  • The objective of the reward model is to distinguish between human-annotated stories and machine-generated stories.
    ○ Minimize KL-divergence with pe and maximize KL-divergence with πβ
  • The objective of the policy is to create stories indistinguishable from human-written stories.
    ○ Minimize KL-divergence with pθ
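The bullets above can be transcribed into standard KL notation as follows (a hedged formalization, assuming "similarity with pθ" means the KL-divergence to pθ):

```latex
% Reward model: stay close to the empirical distribution p_e,
% move away from the policy's distribution \pi_\beta
\min_{\theta}\ \mathrm{KL}\big(p_e \,\|\, p_\theta\big) - \mathrm{KL}\big(\pi_\beta \,\|\, p_\theta\big)

% Policy model: become indistinguishable from p_\theta
\min_{\beta}\ \mathrm{KL}\big(\pi_\beta \,\|\, p_\theta\big)
```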

SLIDE 10

Data

  • VIST dataset of Flickr photos aligned to stories
  • One sample is a story for five images from a photo album
  • The same album is paired with five different stories as references
  • Vocabulary of 9,837 words (a word must appear more than three times in the training set)

SLIDE 11

Training and Testing

1. Create a baseline model, XE-ss (cross-entropy loss with scheduled sampling), with the same architecture as the policy model
   a. Scheduled sampling uses a sampling probability to decide which action to take (see the sketch below)
2. Use XE-ss to initialize the policy model
3. Train with the AREL framework
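To illustrate step 1a: at each decoding step, scheduled sampling flips a (typically decaying) coin between the ground-truth token and the model's own previous prediction. A minimal sketch; the function name and signature are hypothetical:

```python
import random

def next_decoder_input(gold_token, predicted_token, epsilon):
    """Scheduled sampling: with probability epsilon feed the ground-truth
    (gold) token to the decoder; otherwise feed the model's own previous
    prediction. epsilon usually decays as training progresses."""
    return gold_token if random.random() < epsilon else predicted_token
```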

SLIDE 12

Training and Testing

  • Objective of the policy model: maximize similarity with pθ
  • Objective of the reward model: distinguish between human-generated and machine-generated stories
  • Alternate between training the policy and the reward model using SGD (a sketch follows this list)
    ○ N = 50 or 100
  • For testing, the policy uses beam search to create the story
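One plausible reading of the alternating schedule, sketched below: run N SGD steps on the reward model, then N on the policy, and repeat. The per-model updates are passed in as callables, since the exact interleaving and update details are assumptions, not the paper's code.

```python
def train_arel(policy_step, reward_step, batches, rounds, N=50):
    """Hedged sketch of alternating AREL training (N = 50 or 100 per the slide).
    policy_step / reward_step are caller-supplied SGD updates for each model."""
    for _ in range(rounds):
        for _ in range(N):
            reward_step(next(batches))   # reward: human vs. machine stories
        for _ in range(N):
            policy_step(next(batches))   # policy: maximize similarity with p_theta
```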

SLIDE 13

SLIDE 14

Automatic Evaluation

  • AREL achieves SOTA on all automatic metrics except ROUGE; however, the gains are very small, and the scores are very close to those of the baseline model and a vanilla GAN

[Chart: per-metric "Gain" and "Range of new methods"; the plotted gains fall between 0.2 and 2.2 points]

SLIDE 15

Human Evaluation

AREL greatly outperforms all other models in human evaluations:

  • Turing test
  • Relevance
  • Expressiveness
  • Concreteness

[Figure: comparison of Turing test results]

SLIDE 16

Critique: The “Good”

  • AREL: a novel framework of adversarial reward learning for telling stories
  • SOTA on the VIST dataset under automatic metrics
  • Empirically shows that automatic metrics are not great for training or evaluation
  • Comprehensive human evaluation via Mechanical Turk
    ○ Better results on relevance, expressiveness, and concreteness
    ○ Clear description of how the human evaluation was conducted

SLIDE 17

Critique: The “Not so Good”

  • Motivation: an interesting problem to solve, but what are the practical applications?
    ○ Limited to five photos per story
  • XE-ss: not mentioned until the evaluation section, yet it initializes AREL
  • Partial rewards: more discussion and motivation needed for this approach
  • Missing details
    ○ The type of pooling in the reward model is not specified (average? max?)
    ○ Is the pre-trained ResNet fine-tuned?
  • Data bias (gender and event): the model amplifies the largest majority’s influence
  • Small gains on automatic evaluation metrics, and XE-ss performs similarly to AREL; no direct comparison of human evaluation between AREL and previous methods
  • Possible human-evaluation improvements
    ○ Ask annotators why they judged a sentence to be machine-generated or not
    ○ Use rankings instead of pairwise comparisons
  • Decoder shared weights: maybe something specific about an image’s position calls for different weights (e.g., the structure of a narrative: setting, problem, rising action, climax, falling action, resolution)