Adversarial Reward Learning for Visual Storytelling
Maria Fabiano
Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang
Outline
1. Motivation
2. AREL Model Overview
3. Policy Model
4. Reward Model
5. AREL Objective
6. Data
7. Training and Testing
8. Evaluation
9. Critique
The authors want to explore how well a computer can create a story from a set of images. Before this paper, little research had been done on visual storytelling. Visual storytelling requires a deeper understanding of images than image captioning: the model must understand more complicated visual scenarios, relate sequential images, and associate implicit concepts (e.g., emotions) with the images.
Problems with previous storytelling approaches
○ Hand-crafted rewards (e.g., METEOR) are too biased or too sparse to drive the policy search
○ Fail to learn implicit semantics (coherence, expressiveness, etc.)
○ Require extensive feature and reward engineering
○ GAN-based approaches are prone to unstable or vanishing gradients
○ Existing inverse reinforcement learning approaches: maximum margin approaches, probabilistic approaches
AREL: Adversarial REward Learning
○ Policy model: generates a story sequence from an image sequence
○ Reward model: learns a reward function from human-annotated stories and sampled predictions
○ The two models are trained alternately via SGD
Policy Model
○ Takes an image sequence and sequentially chooses words from the vocabulary to create a story
○ A pre-trained CNN (ResNet) extracts features from each image
○ A visual encoder (GRUs) gets high-level features of the images
○ Decoders (GRU, shared weights) create five substories, one per image
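To make the architecture concrete, here is a minimal PyTorch-style sketch of such a policy model. The class and argument names (StoryPolicy, feat_dim, etc.), the layer sizes, and the greedy word-by-word decoding are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StoryPolicy(nn.Module):
    """Illustrative policy: CNN image features -> GRU context encoder ->
    a single weight-shared GRU decoder that writes one substory per image."""

    def __init__(self, vocab_size, feat_dim=2048, hidden_dim=512, embed_dim=512):
        super().__init__()
        self.visual_encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # One GRUCell reused for all five substories = shared decoder weights.
        self.decoder = nn.GRUCell(embed_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feats, max_len=20, bos_idx=1):
        # image_feats: (batch, 5, feat_dim) pre-extracted CNN (e.g. ResNet) features.
        batch, n_imgs, _ = image_feats.shape
        context, _ = self.visual_encoder(image_feats)         # (batch, 5, hidden_dim)
        stories = []
        for i in range(n_imgs):                               # one substory per image
            h = context[:, i, :]                              # start from this image's context
            word = torch.full((batch,), bos_idx, dtype=torch.long,
                              device=image_feats.device)
            substory = []
            for _ in range(max_len):
                step_in = torch.cat([self.word_embed(word), context[:, i, :]], dim=-1)
                h = self.decoder(step_in, h)
                word = self.out(h).argmax(dim=-1)             # greedy pick, for the sketch
                substory.append(word)
            stories.append(torch.stack(substory, dim=1))      # (batch, max_len) word ids
        return stories                                        # list of 5 substories

# Example use with random tensors standing in for ResNet features of 5 images:
policy = StoryPolicy(vocab_size=10000)
substories = policy(torch.randn(2, 5, 2048))
```

A real implementation would sample words (or use beam search at test time) rather than always taking the argmax, and would be trained with the cross-entropy and AREL objectives described below.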
Reward Model
○ Aims to derive a human-like reward from human-annotated stories and sampled predictions
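The slides do not spell out the reward model's internals (the critique below even notes that the pooling type is unspecified). Purely as an illustration of one plausible design, here is a small CNN-over-words reward head; the convolution widths, the max pooling, and the way the image feature is combined are all assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StoryReward(nn.Module):
    """Illustrative reward head: scores a (substory, image) pair with one scalar.
    The architecture details below are assumptions, not the paper's design."""

    def __init__(self, vocab_size, feat_dim=2048, embed_dim=300, n_kernels=128):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # 1-D convolutions over the word sequence with a few window sizes.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, n_kernels, kernel_size=k) for k in (2, 3, 4)])
        self.img_proj = nn.Linear(feat_dim, n_kernels)
        self.score = nn.Linear(4 * n_kernels, 1)   # 3 pooled conv maps + image features

    def forward(self, words, image_feat):
        # words: (batch, seq_len) word ids; image_feat: (batch, feat_dim).
        x = self.word_embed(words).transpose(1, 2)             # (batch, embed_dim, seq_len)
        # Pool each convolution map over time (max pooling is an assumption).
        pooled = [conv(x).relu().amax(dim=-1) for conv in self.convs]
        features = torch.cat(pooled + [self.img_proj(image_feat)], dim=-1)
        return self.score(features).squeeze(-1)                # one reward per substory

# Example use with toy inputs (seq_len must be at least the widest kernel, 4):
reward_model = StoryReward(vocab_size=10000)
r = reward_model(torch.randint(0, 10000, (2, 12)), torch.randn(2, 2048))
```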
We achieve the optimal reward function R* when the Reward-Boltzmann distribution pθ equals the actual data distribution p*:

$$p_\theta(W) = \frac{\exp\left(R_\theta(W)\right)}{Z_\theta}$$

where W is a story, Rθ is the reward function, Zθ is the partition function (a normalizing constant), and pθ is the approximate data distribution.
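As a quick numeric illustration (the three candidate stories and their reward values below are made up), pθ is just a softmax of the rewards, with Zθ as the normalizer:

```python
import math

# Hypothetical rewards R_theta(W) for three candidate stories (illustrative only).
rewards = {"story_a": 2.0, "story_b": 0.5, "story_c": -1.0}

# Partition function Z_theta turns exp(R_theta(W)) into a proper distribution.
z = sum(math.exp(r) for r in rewards.values())
p_theta = {story: math.exp(r) / z for story, r in rewards.items()}

print(p_theta)  # probabilities sum to 1; higher reward -> higher probability
```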
We want the Reward-Boltzmann distribution pθ to get close to the actual data distribution p*.
Maximize the similarity of pθ with the empirical distribution pe while minimizing the similarity of pθ with the data generated by the policy πβ. Meanwhile, πβ wants to maximize its similarity with pθ.
Reward model: learns to distinguish between human-written stories and machine-generated stories.
○ Minimize KL-divergence with pe and maximize KL-divergence with πβ
Policy model: learns to generate stories that resemble human-written stories.
○ Minimize KL-divergence with pθ
(pe is the empirical distribution of the human-annotated stories in the training set)
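Reading the two bullets together, the adversarial objective can be written as a single max-min problem (a sketch assembled from the bullets above; the paper's exact formulation may include extra terms):

```latex
% Sketch of the adversarial objective implied by the bullets above:
% the reward model (theta) pulls p_theta toward the empirical distribution p_e
% and away from the policy pi_beta; the policy (beta) pulls pi_beta toward p_theta.
\max_{\beta} \; \min_{\theta} \;
    \mathrm{KL}\!\left( p_e \,\|\, p_\theta \right)
    \;-\;
    \mathrm{KL}\!\left( \pi_\beta \,\|\, p_\theta \right)
```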
1. Create a baseline model XE-ss (cross-entropy loss with scheduled sampling) with the same architecture as the policy model
a. Scheduled sampling uses a sampling probability to decide, at each decoding step, whether the decoder is fed the ground-truth previous word or its own previous prediction (see the sketch after this list)
2. Use XE-ss to initialize the policy model
3. Train with the AREL framework
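A minimal sketch of the scheduled-sampling decision mentioned in step 1a, with made-up tokens and a fixed sampling probability (the actual schedule is not given in the slides):

```python
import random

def scheduled_sampling_input(gold_prev, model_prev, sampling_prob):
    """Pick the decoder's input for the next step: the model's own previous
    prediction with probability `sampling_prob`, otherwise the gold token."""
    return model_prev if random.random() < sampling_prob else gold_prev

# Toy walk through one training sentence (tokens and probability are made up).
gold = ["<bos>", "the", "dog", "ran", "home"]
model_preds = ["<bos>", "a", "dog", "runs", "home"]   # pretend decoder outputs
sampling_prob = 0.25                                   # usually increased over training

decoder_inputs = [gold[0]]                             # always start from <bos>
for t in range(1, len(gold)):
    decoder_inputs.append(
        scheduled_sampling_input(gold[t - 1], model_preds[t - 1], sampling_prob))
print(decoder_inputs)
```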
○ Policy: trained to maximize similarity with pθ
○ Reward model: trained to distinguish between human-generated and machine-generated stories
○ Alternately update the policy and reward using SGD
○ N = 50 or 100
Testing: use beam search to create the story
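To tie the training steps together, here is a self-contained toy sketch of the alternating update scheme: the reward model is pushed to score human stories above sampled ones, and the policy takes a REINFORCE-style step toward higher learned reward. Every model, size, and hyper-parameter here (the linear "policy", the bag-of-words "reward", the learning rates) is a stand-in for illustration, not the authors' algorithm; the real policy and reward models are the ones described above.

```python
import torch
import torch.nn as nn

vocab, hidden, story_len, batch = 50, 32, 10, 8
policy_head = nn.Linear(hidden, vocab)     # toy "policy": context -> word logits
reward_net = nn.Linear(vocab, 1)           # toy "reward": bag-of-words -> scalar

policy_opt = torch.optim.SGD(policy_head.parameters(), lr=1e-2)
reward_opt = torch.optim.SGD(reward_net.parameters(), lr=1e-2)

def bag_of_words(tokens):
    # Crude featurizer for the toy reward model: counts of each word id.
    counts = torch.zeros(tokens.shape[0], vocab)
    return counts.scatter_add_(1, tokens, torch.ones_like(tokens, dtype=torch.float))

for step in range(200):
    context = torch.randn(batch, hidden)                   # pretend visual context
    human = torch.randint(0, vocab, (batch, story_len))    # pretend human stories

    # Policy samples a "story" (words drawn independently here, for brevity).
    dist = torch.distributions.Categorical(logits=policy_head(context))
    sampled = dist.sample((story_len,)).T                  # (batch, story_len)
    log_prob = dist.log_prob(sampled.T).T.sum(dim=-1)      # (batch,)

    # Reward update: score human stories above machine-generated ones.
    reward_loss = (reward_net(bag_of_words(sampled)).mean()
                   - reward_net(bag_of_words(human)).mean())
    reward_opt.zero_grad()
    reward_loss.backward()
    reward_opt.step()

    # Policy update: REINFORCE step toward stories the learned reward prefers.
    with torch.no_grad():
        r = reward_net(bag_of_words(sampled)).squeeze(-1)  # (batch,)
    policy_loss = -(r * log_prob).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()
```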
Automatic evaluation: the metric gains are very small, and the results are very similar to the baseline model and the vanilla GAN
[Table comparing automatic-metric gains across the range of new methods not recovered from the slides.]
In the human evaluations, AREL greatly outperforms all of the other methods.
The “Good”
○ Comparison of Turing test results
○ Better results on relevance, expressiveness, and concreteness
○ Clear description of how human evaluation was conducted

The “Not so Good”
○ Limited to five photos in a story
○ Type of pooling in the reward model is not specified (average? max?)
○ Fine tuning the pre-trained ResNet?
○ No direct comparison of human evaluation between AREL and previous methods
○ Include a reason why they chose which sentence was machine-generated or not
○ Rankings instead of pairwise comparisons
○ The shared-weight decoders may not capture a story structure that requires different weights (e.g., the structure of a narrative: setting, problem, rising action, climax, falling action, resolution)