Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing

SLIDE 1

Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing

Daniel Fried and Dan Klein

SLIDE 2

Parsing by Local Decisions

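Sentence: The cat took a nap .

[Figure: constituency tree over the sentence with nodes NP, NP, VP, S]

Action sequence so far: (S (NP The cat ) (VP …

$$L(\theta) = \log p(y \mid x; \theta) = \sum_t \log p(y_t \mid y_{1:t-1}, x; \theta)$$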
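As a minimal sketch of the objective on this slide, the sequence log-likelihood is a sum of per-action log-probabilities. The probability values below are made up for illustration; a real parser would read them off its model, conditioned on the gold action prefix.

```python
import math

def sequence_log_likelihood(step_probs):
    """Sum of log p(y_t | y_{1:t-1}, x; theta) over the actions of one parse.

    step_probs: probability the model assigned to each gold action,
    each conditioned on the gold prefix (hypothetical values here).
    """
    return sum(math.log(p) for p in step_probs)

# Hypothetical probabilities for the actions "(S", "(NP", "The", "cat", ")".
probs = [0.9, 0.8, 0.95, 0.9, 0.7]
log_lik = sequence_log_likelihood(probs)

# The sum of logs equals the log of the product, i.e. log p(y | x; theta).
assert abs(log_lik - math.log(math.prod(probs))) < 1e-12
```

Because each term conditions on the gold prefix, this objective never shows the model a state produced by its own mistakes.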

SLIDE 3

Non-local Consequences

Exposure Bias

True parse $y$: (S (NP The cat …
Prediction $\hat{y}$: (S (VP (NP ??

[Ranzato et al. 2016; Wiseman and Rush 2016]

Loss-Evaluation Mismatch

[Figure: two constituency trees over "The cat took a nap ." (true parse $y$ with nodes NP, NP, VP, S; prediction $\hat{y}$ with nodes VP, NP, VP, S, NP)]

$\Delta(y, \hat{y})$: $-\mathrm{F1}(y, \hat{y})$
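To make the cost concrete, here is a simplified sketch of a negative-F1 cost over labeled spans. It assumes each tree is given as a set of `(label, start, end)` triples with end-exclusive indices (a hypothetical representation); standard evaluation tools apply extra conventions, such as punctuation handling, that are omitted here.

```python
def f1_cost(gold_spans, pred_spans):
    """Return Delta(y, y_hat) = -F1 over labeled spans (label, start, end)."""
    gold, pred = set(gold_spans), set(pred_spans)
    matched = len(gold & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return -2 * precision * recall / (precision + recall)

# Hypothetical spans for "The cat took a nap ." (end-exclusive word indices).
gold = {("S", 0, 6), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
pred = gold | {("NP", 2, 5)}  # the prediction adds one spurious NP

cost = f1_cost(gold, pred)  # precision 4/5, recall 4/4, so F1 = 8/9
```

A single spurious bracket changes the cost of the whole tree, which a per-action log-likelihood does not measure directly.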

SLIDE 4

Dynamic Oracle Training

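Explore at training time. Supervise each state with an expert policy.

[Figure: the true parse $y$ "(S (NP The cat …", a prediction $\hat{y}$ (sample, or greedy) that diverges from it after "(S", and the oracle actions $y^*$ ("(NP", "The", "cat", …) that supervise the states the prediction reaches.]

$$L(\theta) = \sum_t \log p(y_t^* \mid \hat{y}_{1:t-1}, x; \theta)$$

Choose $y_t^*$ to maximize achievable F1 (typically).

  • Exploring at training time (conditioning on $\hat{y}$) addresses exposure bias.
  • Choosing $y_t^*$ to maximize achievable F1 addresses loss mismatch.

[Goldberg & Nivre 2012; Ballesteros et al. 2016; inter alia]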
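The training scheme on this slide can be sketched as a roll-in loop: the model chooses the actions that determine the states, while the oracle supplies the supervised action at each state. The interfaces `policy`, `oracle`, `apply_action`, and `is_final` are hypothetical placeholders, not taken from any of the cited systems.

```python
import math
import random

def dynamic_oracle_loss(policy, oracle, apply_action, is_final, state, rng, sample=True):
    """Accumulate sum_t log p(y*_t | y_hat_{1:t-1}, x) along a model roll-in.

    policy(state) -> dict mapping each action to its probability
    oracle(state) -> the expert action y*_t for this state
    """
    log_lik = 0.0
    while not is_final(state):
        probs = policy(state)
        # Supervise the current (possibly erroneous) state with the oracle action.
        log_lik += math.log(probs[oracle(state)])
        # Explore: follow the model's own action, sampled or greedy.
        if sample:
            actions = list(probs)
            action = rng.choices(actions, weights=[probs[a] for a in actions])[0]
        else:
            action = max(probs, key=probs.get)
        state = apply_action(state, action)
    return log_lik

# Toy usage: states are action tuples of length 3, the policy is uniform
# over two actions, and the oracle always prefers "a".
loss = dynamic_oracle_loss(
    policy=lambda state: {"a": 0.5, "b": 0.5},
    oracle=lambda state: "a",
    apply_action=lambda state, action: state + (action,),
    is_final=lambda state: len(state) == 3,
    state=(),
    rng=random.Random(0),
)
```

The only difference from static-oracle training is which prefix the model conditions on: its own actions rather than the gold ones.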

SLIDE 5

Dynamic Oracles Help!

Expert Policies / Dynamic Oracles (mostly dependency parsing)

Daumé III et al., 2009; Ross et al., 2011; Choi and Palmer, 2011; Goldberg and Nivre, 2012; Chang et al., 2015; Ballesteros et al., 2016; Stern et al., 2017

PTB Constituency Parsing F1

System                                        Static Oracle   Dynamic Oracle
Coavoux and Crabbé, 2016                      88.6            89.0
Cross and Huang, 2016                         91.0            91.3
Fernández-González and Gómez-Rodríguez, 2018  91.5            91.7

SLIDE 6

What if we don't have a dynamic oracle?

Use reinforcement learning

SLIDE 7

Reinforcement Learning Helps! (in other tasks)

  • Auli and Gao, 2014; Ranzato et al., 2016; Shen et al., 2016: machine translation
  • Xu et al., 2016: CCG parsing
  • Wiseman and Rush, 2016: several, including dependency parsing
  • Edunov et al., 2017: machine translation

SLIDE 8

Policy Gradient Training

[Williams, 1992]

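Minimize expected sequence-level cost:

$$R(\theta) = \sum_{\hat{y}} p(\hat{y} \mid x; \theta)\, \Delta(y, \hat{y})$$

$$\nabla R(\theta) = \sum_{\hat{y}} p(\hat{y} \mid x; \theta)\, \Delta(y, \hat{y})\, \nabla \log p(\hat{y} \mid x; \theta)$$

  • Sum over $\hat{y}$ weighted by $p(\hat{y} \mid x; \theta)$: addresses exposure bias (compute by sampling)
  • $\Delta(y, \hat{y})$: addresses loss mismatch (compute F1)
  • $\nabla \log p(\hat{y} \mid x; \theta)$: compute in the same way as for the true tree

[Figure: prediction $\hat{y}$ and true parse $y$, two trees over "The man had an idea." compared by $\Delta(y, \hat{y})$]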

SLIDE 9

Policy Gradient Training

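Input, $x$: The cat took a nap.

k candidates, $\hat{y}$, each contributing $\Delta(y, \hat{y})$ (negative F1) times the gradient for the candidate:

  • $\hat{y}_1$ (tree with nodes NP, NP, VP, S, NP): $-89 \cdot \nabla \log p(\hat{y}_1 \mid x; \theta)$
  • $\hat{y}_2$ (tree with nodes NP, NP, VP, S-INV): $-80 \cdot \nabla \log p(\hat{y}_2 \mid x; \theta)$
  • $\hat{y}_3$ (tree with nodes NP, NP, ADJP, S): $-80 \cdot \nabla \log p(\hat{y}_3 \mid x; \theta)$
  • True tree $y$ (nodes NP, NP, VP, S): $-100 \cdot \nabla \log p(y \mid x; \theta)$

$$\nabla R(\theta) = \sum_{\hat{y}} p(\hat{y} \mid x; \theta)\, \Delta(y, \hat{y})\, \nabla \log p(\hat{y} \mid x; \theta)$$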
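The weighting on this slide can be tried on a toy model. The sketch below assumes, purely for illustration, that the "parser" is a softmax over four logits, one per tree, and sums cost times gradient over the four trees with no baseline or normalization. Only the costs come from the slide.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(probs, i):
    """Gradient of log p_i w.r.t. the softmax logits: onehot(i) - probs."""
    return [(1.0 if j == i else 0.0) - p for j, p in enumerate(probs)]

# Costs Delta(y, y_hat) from the slide: three candidates plus the true tree.
costs = [-89.0, -80.0, -80.0, -100.0]

# Toy assumption: uniform initial distribution over the four trees.
logits = [0.0, 0.0, 0.0, 0.0]
probs = softmax(logits)

# Sum over candidates of cost * gradient of that candidate's log-probability.
grad = [0.0] * len(logits)
for i, cost in enumerate(costs):
    g = grad_log_prob(probs, i)
    grad = [acc + cost * gj for acc, gj in zip(grad, g)]

# One gradient-descent step on the cost shifts mass toward low-cost trees.
step = 0.01
new_probs = softmax([v - step * gv for v, gv in zip(logits, grad)])
```

After the step, the true tree (cost −100) gains the most probability and the two −80 candidates lose probability.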

SLIDE 10

Experiments

SLIDE 11

Setup

Parsers

  • Span-Based [Cross & Huang, 2016]
  • Top-Down [Stern et al., 2017]
  • RNNG [Dyer et al., 2016]
  • In-Order [Liu and Zhang, 2017]

×

Training

  • Static oracle
  • Dynamic oracle
  • Policy gradient

SLIDE 12

English PTB F1

[Bar chart: F1 (axis from 90 to 93) for the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers, each trained with static oracle, policy gradient, and dynamic oracle.]

SLIDE 13

Training Efficiency

PTB learning curves for the Top-Down parser

[Line chart: development F1 (axis from 89 to 92) against training epoch (5 to 45), with one curve each for static oracle, dynamic oracle, and policy gradient.]

SLIDE 14

French Treebank F1

[Bar chart: F1 (axis from 80 to 84) for the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers, each trained with static oracle, policy gradient, and dynamic oracle.]

SLIDE 15

Chinese Penn Treebank v5.1 F1

[Bar chart: F1 (axis from 83 to 88) for the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers, each trained with static oracle, policy gradient, and dynamic oracle.]

SLIDE 16

Conclusions

  • Local decisions can have non-local consequences
      • Loss mismatch
      • Exposure bias
  • How to deal with the issues caused by local decisions?
      • Dynamic oracles: efficient, model specific
      • Policy gradient: slower to train, but general purpose

SLIDE 17

Thank you!

SLIDE 18

For Comparison: A Novel Oracle for RNNG

Current state: (S (NP The man

  1. Close the current constituent if it's a true constituent, or if it could never be a true constituent.
  2. Otherwise, open the outermost unopened true constituent at this position.
  3. Otherwise, shift the next word.

[Figure: successive parser states]
(S (NP The man ) (VP had )
(S (NP The man ) (VP )
(S (NP The man ) (VP
(S (NP The man ) (VP had …
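The three rules can be sketched in code under some simplifying assumptions. This is one reading of the slide, not the authors' implementation. It represents gold constituents as `(label, start, end)` triples with end-exclusive indices, assumes that constituents sharing a start position have distinct labels and distinct ends (no unary chains), and only closes a constituent that already contains a word. The names `oracle_action` and `follow_oracle` are hypothetical.

```python
def oracle_action(open_stack, position, gold_spans):
    """Pick the next action from the three rules.

    open_stack: list of (label, start) for open constituents, innermost last
    position:   number of words shifted so far
    gold_spans: set of (label, start, end) true constituents, end-exclusive
    """
    if open_stack:
        label, start = open_stack[-1]
        nonempty = position > start
        is_true = (label, start, position) in gold_spans
        could_be_true = any(
            (lab, st) == (label, start) and end > position
            for lab, st, end in gold_spans
        )
        # Rule 1: close a completed true constituent, or a hopeless one.
        if nonempty and (is_true or not could_be_true):
            return ("CLOSE",)
    # Rule 2: open the outermost unopened true constituent starting here.
    opened_here = {lab for lab, st in open_stack if st == position}
    candidates = [
        (end, lab) for lab, st, end in gold_spans
        if st == position and lab not in opened_here
    ]
    if candidates:
        _, label = max(candidates)  # largest end = outermost
        return ("OPEN", label)
    # Rule 3: otherwise shift the next word.
    return ("SHIFT",)

def follow_oracle(n_words, gold_spans):
    """Run the oracle from the initial state, applying its own actions."""
    stack, position, actions = [], 0, []
    while stack or position < n_words:
        action = oracle_action(stack, position, gold_spans)
        actions.append(action)
        if action[0] == "OPEN":
            stack.append((action[1], position))
        elif action[0] == "SHIFT":
            position += 1
        else:
            stack.pop()
    return actions

# Hypothetical gold tree for "The man had an idea":
# (S (NP The man) (VP had (NP an idea)))
gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
actions = follow_oracle(5, gold)
```

When started from the initial state, the sketch reproduces the gold action sequence. When called on a state containing a wrongly opened constituent, such as a VP starting at position 0, it closes that constituent as soon as it contains a word and then continues with the remaining true constituents.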