Trainable Decoding of Sets of Sequences for Neural Sequence Models
Ashwin Kalyan, Peter Anderson, Stefan Lee, Dhruv Batra
ICML 2019



SLIDE 4

Standard Sequence Prediction Pipeline

[Diagram: an RNN unrolled over time — hidden state h_{t−1} feeds h_t, producing outputs y_t, y_{t+1}]

  • 1. Train the RNN to maximize log likelihood

[Diagram: beam search with beam width B = 2 over candidate words "This", "is", "a", "the", "picture", "shows"]

  • 2. Perform Beam Search to decode the top K
  • 3. Return the best sequence in the top K

> A kitchen with a stove.
> A kitchen with a stove and a sink.
> A kitchen with a stove and a microwave.
> A kitchen with a stove and a refrigerator.
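Steps 2 and 3 of the pipeline can be sketched as a plain beam search over per-step log-probabilities. Here `step_log_probs` and `toy_model` are hypothetical stand-ins for the trained RNN's softmax, not the authors' code:

```python
import math

def beam_search(step_log_probs, B=2, T=4, eos="</s>"):
    """Keep the B highest-scoring prefixes at every time step."""
    beams = [([], 0.0)]  # (prefix, cumulative log-prob)
    for _ in range(T):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:      # finished beams carry over
                candidates.append((prefix, score))
                continue
            for word, lp in step_log_probs(prefix).items():
                candidates.append((prefix + [word], score + lp))
        # keep the top-B of the (up to) |V| x B expansions
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return beams

# Hypothetical toy model: slightly prefers "a", emits eos after 3 words.
def toy_model(prefix):
    if len(prefix) >= 3:
        return {"</s>": math.log(0.9), "a": math.log(0.1)}
    return {"a": math.log(0.5), "the": math.log(0.3), "is": math.log(0.2)}

print(beam_search(toy_model, B=2, T=4))
```

Because every beam greedily chases the same high-probability words, the B returned sequences tend to differ only in a word or two — the failure mode the next slides illustrate.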

SLIDE 5

But… many real-world tasks are multi-modal!

✓ A group of people riding horses.
✓ Kids riding horses with adults help.
✓ A girl poses on her horse in equestrian dress by a small crowd.
✓ Some people stand near some horses in a field.
✓ People are standing around children riding horses in a grassy area.
✓ A small girl is riding a large light brown horse.
✓ A young girl in riding gear mounts a pony in front of a group.
✓ A group of people with a jockey and her horse.
✓ Several people playing with ponies in a park.

How to model more than one correct output?

SLIDE 6

Retool the Standard Sequence Prediction Pipeline

[Diagram: an RNN unrolled over time — hidden state h_{t−1} feeds h_t, producing outputs y_t, y_{t+1}]

  • 1. Train the RNN to maximize log likelihood

[Diagram: beam search with beam width B = 2 over candidate words "This", "is", "a", "the", "picture", "shows"]

  • 2. Perform Beam Search to decode the top K
  • 3. Return the best sequence in the top K

> A kitchen with a stove.
> A kitchen with a stove and a sink.
> A kitchen with a stove and a microwave.
> A kitchen with a stove and a refrigerator.



SLIDE 9

Beam Search outputs are nearly identical!

> A group of people riding horses on a field.
> A group of people riding horses in a field.
> A group of people riding horses down a dirt road.
> A group of people riding horses through a field.
> A group of people riding on the back of horses.
> A group of people riding on the back of a horse.
> A group of people riding on a horse.
> A couple of people riding on the back of horses.
> A couple of people riding on the back of a horse.
> A couple of people riding horses on a field.

Fails to COVER the variation in the output space!
Doesn’t model intra-set interactions!

SLIDE 13

Learning to Decode Sets of Sequences

Select the top-B words at each time step, until the end token is generated or the maximum time step is reached

[Diagram: beam search (B = 2) expanding candidate words ("This", "a", "shows", "is", "picture") at step t]


SLIDE 15

Beam Search as Subset Selection

[Diagram: the incoming beams are EXPANDed into the set of all |V| × B possible expansions, then MERGEd into the outgoing beams]

SLIDE 19

Submodular Maximization for Subset Selection

[Diagram: the incoming beams are EXPANDed into the set of all |V| × B possible expansions; the outgoing beams are selected by submodular maximization]

  • Naturally models coverage, promoting diversity
  • NP-hard!
  • Greedy algorithms with approximation guarantees exist!
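The greedy algorithm alluded to here (the classic 1 − 1/e approximation for monotone submodular maximization) can be sketched as follows; the `coverage` objective and the candidate captions are illustrative toys, not the paper's learned function:

```python
def greedy_max(f, candidates, B):
    """Greedily pick B elements, each maximizing the marginal gain of f."""
    S = []
    for _ in range(B):
        best = max((c for c in candidates if c not in S),
                   key=lambda c: f(S + [c]) - f(S))
        S.append(best)
    return S

# Toy submodular objective: number of distinct words covered by the set.
def coverage(S):
    return len({w for caption in S for w in caption.split()})

candidates = ["a group riding horses",
              "a group riding horses in a field",
              "kids riding ponies with adults"]
print(greedy_max(coverage, candidates, B=2))
```

Note how the second pick is the caption with the largest *marginal* gain, not the second-highest-scoring caption on its own — that is exactly the diversity-promoting behavior beam search lacks.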

SLIDE 20

Learning Submodular Functions

[Diagram: a deep submodular function — every element e ∈ S has a non-negative feature φ(e) ≥ 0; the set feature Σ_{e∈S} φ(e) passes through an MLP with non-negative weights W ≥ 0 and a log(1 + ·) nonlinearity, and f(S) is a weighted sum Σ_i w_i (·) with w_i ≥ 0]

[Bilmes et al., 2017]
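A minimal sketch of the construction in the diagram, relying on the standard fact that a non-negative combination of concave functions of non-negative modular features is monotone submodular; `phi`, `W`, and `w` are hypothetical toy values, not learned parameters:

```python
import numpy as np

def deep_submodular_f(S, phi, W, w):
    """f(S) = sum_i w_i * log(1 + (W @ m(S))_i), m(S) = sum_{e in S} phi(e).
    Monotone submodular as long as phi(e) >= 0, W >= 0 and w >= 0."""
    m = sum((phi[e] for e in S), np.zeros(W.shape[1]))  # modular set feature
    return float(w @ np.log1p(W @ m))                   # concave + nonneg mix

# Hypothetical toy parameters (all non-negative, as the construction requires).
phi = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 2.0])}
W = np.array([[1.0, 0.5], [0.2, 1.0]])
w = np.array([0.3, 0.7])
```

Because the non-negativity constraints alone guarantee submodularity, the weights can be trained freely (e.g. by gradient descent with a projection to W ≥ 0, w ≥ 0) without losing the greedy algorithm's approximation guarantee.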

SLIDE 21

∇BS (diff-BS)

FOR t = 1 to T:
  • 1. Construct the set of all possible extensions, Y_{t−1} × |V|
  FOR k = 1 to K:
    • 2. Compute the marginal gain of each extension
    • 3. Sample an extension proportional to its marginal gain
RETURN: the set of K sequences of length T
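A rough sketch of one time step of the loop above, with a generic `marginal_gain` callable standing in for the learned submodular scorer; the function names and the toy gain are hypothetical, not the paper's implementation:

```python
import random

def diff_bs_step(beams, vocab, marginal_gain, K, rng=random):
    """One diff-BS time step: pick K extensions, sampling each with
    probability proportional to its marginal gain given the set so far."""
    extensions = [b + [w] for b in beams for w in vocab]   # Y_{t-1} x |V|
    chosen = []
    for _ in range(K):
        pool = [e for e in extensions if e not in chosen]
        gains = [marginal_gain(chosen, e) for e in pool]
        # inverse-CDF sampling proportional to marginal gain
        r, acc = rng.uniform(0, sum(gains)), 0.0
        for ext, g in zip(pool, gains):
            acc += g
            if acc >= r:
                chosen.append(ext)
                break
    return chosen

# Hypothetical gain: reward extensions ending in a word not yet used.
def toy_gain(chosen, ext):
    used = {seq[-1] for seq in chosen}
    return 2.0 if ext[-1] not in used else 0.5

rng = random.Random(0)
out = diff_bs_step([["This"], ["A"]], ["is", "cat"], toy_gain, K=2, rng=rng)
print(out)
```

Sampling (rather than argmax-ing) the extensions is what makes the procedure differentiable in expectation, so the scorer can be trained end-to-end.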


SLIDE 32

“Set of Sequences” Level Training

π* = argmax_{π ∈ Π} E_{(Y₁, …, Y_T) ∼ π(·|x)} [ SET-METRIC(Y | x) ]

  • Set-metric?
    • Oracle accuracy, average accuracy
    • Facility Location Accuracy [NEW]
  • Training?
    • Teacher Forcing if multiple annotations are available
    • Imitation Learning if an expert is available
    • REINFORCE to directly optimize for the set-metric
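The flavor of the Facility Location Accuracy above can be illustrated with a facility-location-style coverage score (a sketch, not the paper's exact definition): each reference caption is "served" by its most similar predicted sequence, so a set covering all reference modes beats a redundant one.

```python
def facility_location_score(predictions, references, sim):
    """Average over references of the best similarity achieved by
    any prediction -- rewards sets that COVER the reference modes."""
    return sum(max(sim(r, p) for p in predictions)
               for r in references) / len(references)

# Toy similarity: Jaccard overlap of word sets (a stand-in for e.g. BLEU).
def jaccard(a, b):
    A, B = set(a.split()), set(b.split())
    return len(A & B) / len(A | B)

refs = ["people riding horses", "a girl on a pony"]
diverse = ["people riding horses", "girl on a pony"]
redundant = ["people riding horses", "people riding a horse"]
print(facility_location_score(diverse, refs, jaccard),
      facility_location_score(redundant, refs, jaccard))
```

Unlike oracle accuracy, which only scores the single best prediction, this metric depends on the whole set, which is why it pairs naturally with set-level REINFORCE training.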
SLIDE 33

In Summary

  • Novel perspective: Beam Search as Subset Selection
    • Models intra-set dependencies
    • Can be used with arbitrary set constraints
  • No train-test or loss-evaluation mismatch
  • Outperforms Beam Search and other baselines on captioning
  • Doesn’t scale very well with beam size (some tricks in the paper)


SLIDE 39

Poster: Pacific Ballroom #48, June 13th, 6:30 pm

Paper: http://proceedings.mlr.press/v97/kalyan19a.html
Code: https://github.com/ashwinkalyan/diff-bs