Faculty of Computer Science Theoretical Computer Science, Chair of Foundations of Programming
LEARNING PRUNING POLICIES FOR LINEAR CONTEXT-FREE REWRITING SYSTEMS
INF-PM-FPG
Andy Püschel
Dresden, July 20, 2018

Motivation
Example:
TU Dresden, July 20, 2018 Learning Pruning Policies for Linear Context-free Rewriting Systems slide 2
A linear context-free rewriting system (LCFRS) is a tuple G = (N, Σ, Ξ, P, S) where
– N is a finite ranked set of nonterminals (N = ⋃_{l ∈ ℕ} N_l),
– Σ is a finite set of terminals (Σ ∩ N = ∅),
– Ξ is a set of variables (Ξ ∩ Σ = ∅ and ∀l ∈ ℕ : Ξ ∩ N_l = ∅),
– P is a finite set of rules ρ = φ → ψ where
  – φ = A(α_1, …, α_l) (called left-hand side of ρ) where l ∈ ℕ, A ∈ N_l, α_1, …, α_l ∈ (Σ ∪ Ξ)*, and
  – ψ = B_1(X_1^(1), …, X_{l_1}^(1)) … B_m(X_1^(m), …, X_{l_m}^(m)) (called right-hand side of ρ) where m ∈ ℕ, B_1 ∈ N_{l_1}, …, B_m ∈ N_{l_m}, and X_j^(i) ∈ Ξ for 1 ≤ i ≤ m, 1 ≤ j ≤ l_i,
  and for every X ∈ Ξ occurring in ρ we require that X occurs exactly once in the left-hand side of ρ and exactly once in the right-hand side of ρ, and
– S ∈ N_1 is the start symbol.
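To make the definition concrete, here is a minimal Python sketch of a rule ρ = φ → ψ together with a check of the linearity condition; the class name, field names, and the encoding of a variable X_j^(i) as the pair (i, j) are my own assumptions, not from the slides:

```python
from collections import Counter
from dataclasses import dataclass

# Sketch only: a variable X_j^(i) is encoded as the pair (i, j),
# terminals as plain strings.
@dataclass(frozen=True)
class Rule:
    lhs_nt: str      # A, a nonterminal of fan-out l
    lhs_args: tuple  # (alpha_1, ..., alpha_l), each a tuple over Sigma ∪ Xi
    rhs: tuple       # ((B_1, l_1), ..., (B_m, l_m)) with fan-outs l_i

def is_linear(rule: Rule) -> bool:
    """Every variable occurring in the rule occurs exactly once in the
    left-hand side and exactly once in the right-hand side."""
    lhs_vars = Counter(x for alpha in rule.lhs_args
                       for x in alpha if isinstance(x, tuple))
    rhs_vars = Counter((i + 1, j + 1)
                       for i, (_, fanout) in enumerate(rule.rhs)
                       for j in range(fanout))
    return lhs_vars == rhs_vars and all(n == 1 for n in lhs_vars.values())

# S(X_1^(1) X_1^(2), X_2^(1)) -> VP(X_1^(1), X_2^(1)) VAINF(X_1^(2))
rule = Rule("S", (((1, 1), (2, 1)), ((1, 2),)), (("VP", 2), ("VAINF", 1)))
print(is_linear(rule))  # True
```

With this encoding, the linearity condition reduces to a multiset comparison between the variables of the left-hand side and the variable slots offered by the right-hand side.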
PLCFRS (G, p) with G = (N, Σ, Ξ, P, S), where P contains the terminal rules

  { ADV(Nun) → ε # 1,
    VAFIN(werden) → ε # 0.5,
    VAINF(werden) → ε # 0.25,
    VVINF(werden) → ε # 0.25,
    PPER(sie) → ε # 1,
    VVPP(umworben) → ε # 1,
    $(.) → ε # 1, … }
and the non-terminal rules

  { VP(X_1^(1), X_1^(2)) → ADV(X_1^(1)) VVPP(X_1^(2)) # 0.5,
    S(X_1^(1) X_1^(2) X_1^(3)) → VAFIN(X_1^(1)) PPER(X_1^(2)) VVPP(X_1^(3)) # 0.25,
    S(X_1^(1) X_1^(2), X_2^(1)) → VP(X_1^(1), X_2^(1)) VAINF(X_1^(2)) # 0.25,
    S(X_1^(1) X_1^(2) X_1^(3) X_2^(1)) → VP(X_1^(1), X_2^(1)) VAFIN(X_1^(2)) PPER(X_1^(3)) # 0.5,
    S(X_1^(1) X_2^(1) X_1^(2) X_3^(1)) → S(X_1^(1) X_2^(1), X_3^(1)) PPER(X_1^(2)) # 0.25,
    VROOT(X_1^(1) X_2^(1) X_3^(1) X_4^(1) X_1^(2)) → S(X_1^(1) X_2^(1) X_3^(1) X_4^(1)) $(X_1^(2)) # 1, … }
Initialize vertices
Hyperedges for ADV(Nun) → ε # 1, …
Hyperedge for VP(X_1^(1), X_1^(2)) → ADV(X_1^(1)) VVPP(X_1^(2)) # 0.5
Hyperedge for S(X_1^(1) X_1^(2) X_1^(3)) → VAFIN(X_1^(1)) PPER(X_1^(2)) VVPP(X_1^(3)) # 0.25
Hyperedge for S(X_1^(1) X_1^(2), X_2^(1)) → VP(X_1^(1), X_2^(1)) VAINF(X_1^(2)) # 0.25
Hyperedge for S(X_1^(1) X_1^(2) X_1^(3) X_2^(1)) → VP(X_1^(1), X_2^(1)) VAFIN(X_1^(2)) PPER(X_1^(3)) # 0.5
Hyperedge for S(X_1^(1) X_2^(1) X_1^(2) X_3^(1)) → S(X_1^(1) X_2^(1), X_3^(1)) PPER(X_1^(2)) # 0.25
Hyperedge for VROOT(X_1^(1) X_2^(1) X_3^(1) X_4^(1) X_1^(2)) → S(X_1^(1) X_2^(1) X_3^(1) X_4^(1)) $(X_1^(2)) # 1
Undesired hyperedges
Prune
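The construction above can be sketched in Python: vertices pair a nonterminal with the span(s) it derives, hyperedges carry the rule weight and a pruning bit, and a policy may mark an edge as pruned while it is added. The data types and the toy weight-threshold policy below are illustrative assumptions, not the implementation behind the slides:

```python
from dataclasses import dataclass, field

@dataclass
class Hyperedge:
    head: tuple           # vertex: (nonterminal, spans), e.g. ("ADV", ((0, 1),))
    tail: tuple           # tail vertices (empty for terminal rules)
    weight: float         # rule probability p(rho)
    pruned: bool = False  # pruning bit

@dataclass
class DerivationGraph:
    vertices: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, edge, policy=None):
        """Add a hyperedge; an optional pruning policy may mark it pruned,
        in which case it does not make its head vertex derivable."""
        if policy is not None and policy(edge) == "prune":
            edge.pruned = True
        self.edges.append(edge)
        if not edge.pruned:
            self.vertices.add(edge.head)
        return not edge.pruned

# Sentence: "Nun werden sie umworben ." (token positions 0..4)
g = DerivationGraph()
adv = ("ADV", ((0, 1),))
g.add_edge(Hyperedge(adv, (), 1.0))        # hyperedge for ADV(Nun) -> eps # 1
# a toy policy that prunes every hyperedge with weight below 0.3
low_weight = lambda e: "prune" if e.weight < 0.3 else "keep"
vainf = ("VAINF", ((1, 2),))
kept = g.add_edge(Hyperedge(vainf, (), 0.25), policy=low_weight)
print(kept)  # False: the VAINF reading of "werden" was pruned
```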
– H = (V, E) ∈ H_(G,p)(w): derivation graph produced by PARSE
– c ⊂ Σ* × T_N(Σ): X × Y-corpus
– s: state of the derivation graph
– a ∈ {keep, prune}: action
– τ = s_0 a_0 s_1 a_1 … s_T: trajectory
Pruning policy π: maps a hyperedge and a subsentence w′ to an action in {keep, prune}.
How to evaluate π?
Reward function r : H_(G,p)(w) × T_N(Σ) → ℝ, schematically

  r = accuracy − λ · runtime

where accuracy : T_N(Σ) × T_N(Σ) → ℝ, runtime : H_(G,p)(w) → ℝ, and λ ∈ ℝ is a trade-off factor.

Empirical value of π:

  R(π) = (1 / |c|) · Σ_{(w, ξ)} r(PARSE(G, w, π), ξ) · c(w, ξ)
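The empirical value R(π) can be computed directly from its definition. In this sketch, `parse`, `accuracy` and `runtime` are hypothetical stand-ins for PARSE, accuracy and runtime (the policy and grammar are assumed to be baked into `parse`):

```python
def empirical_value(corpus, parse, accuracy, runtime, lam):
    """R(pi) = 1/|c| * sum over (w, xi) of r(PARSE(G, w, pi), xi) * c(w, xi),
    with r = accuracy - lam * runtime.
    `corpus` maps each pair (w, xi) to its count c(w, xi)."""
    size = sum(corpus.values())  # |c|
    total = 0.0
    for (w, xi), count in corpus.items():
        graph, predicted = parse(w)  # derivation graph and best tree for w
        reward = accuracy(predicted, xi) - lam * runtime(graph)
        total += reward * count
    return total / size

# Toy stand-ins: parse always predicts tree "t1" with a 4-edge graph.
toy_corpus = {("w1", "t1"): 2, ("w2", "t2"): 1}
toy_parse = lambda w: (["e1", "e2", "e3", "e4"], "t1")
toy_accuracy = lambda pred, gold: 1.0 if pred == gold else 0.0
toy_runtime = lambda graph: len(graph)
value = empirical_value(toy_corpus, toy_parse, toy_accuracy, toy_runtime, 0.1)
```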
Trajectory (roll-in):

  s_0 —a_0→ s_1 —a_1→ s_2 — … —a_{T−1}→ s_T
Trajectory s_0 a_0 s_1 a_1 … s_T with an intervention at state s_1:

  roll-in:  s_0 —a_0→ s_1 —a_1→ s_2 — … —a_{T−1}→ s_T
  roll-out: s_1 —ā_1→ s′_2 —ā_2→ … —ā_{T−1}→ s′_T, yielding the reward r_1[ā_1]
Algorithm 1 Locally Optimal Learning to Search (LOLS) algorithm by [VE17] and [Cha+15]
Input: PLCFRS (G, p) with G = (N, Σ, Ξ, P, S), X × Y-corpus c such that X ⊂ Σ* and Y ⊂ T_N(Σ)
Output: pruning policy π

 1: function LOLS((G, p), c)
 2:   π_1 := INITIALIZEPOLICY(…)
 3:   for i := 1 to n do                          ⊲ n: number of iterations
 4:     Q_i := ∅                                  ⊲ Q_i: set of state–reward tuples
 5:     for (w, ξ) ∈ c do                         ⊲ w: sentence
 6:       τ := ROLL-IN((G, p), w, π_i, ξ)         ⊲ τ = s_0 a_0 s_1 a_1 … s_T: trajectory
 7:       for t := 0 to |τ| − 1 do
 8:         for ā_t ∈ {keep, prune} do            ⊲ intervention
 9:           r_t[ā_t] := ROLL-OUT(π_i, s_t, ā_t, ξ)
10:         end for
11:         Q_i := Q_i ∪ {(s_t, r_t)}
12:       end for
13:     end for
14:     π_{i+1} := TRAIN(⋃_{k=1}^{i} Q_k)         ⊲ dataset aggregation
15:   end for
16:   return argmax_{π_j : 1 ≤ j ≤ n} R(π_j)
17: end function
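Algorithm 1 translates almost line by line into Python. In this sketch, ROLL-IN, ROLL-OUT, TRAIN and the empirical value R are passed in as stand-ins whose concrete signatures are my assumptions:

```python
def lols(roll_in, roll_out, train, evaluate, corpus, init_policy, n):
    """Sketch of Algorithm 1 (LOLS) with dataset aggregation.
    roll_in(w, pi, xi) returns the trajectory's states [s_0, ..., s_T];
    roll_out(pi, s_t, a, xi) returns the reward of taking action a in s_t;
    evaluate(pi) plays the role of the empirical value R(pi)."""
    policies = [init_policy]  # pi_1
    datasets = []             # Q_1, ..., Q_n
    for _ in range(n):
        q_i = set()
        for (w, xi) in corpus:
            states = roll_in(w, policies[-1], xi)
            for s_t in states[:-1]:  # one intervention per step t
                r_t = tuple(roll_out(policies[-1], s_t, a, xi)
                            for a in ("keep", "prune"))
                q_i.add((s_t, r_t))
        datasets.append(q_i)
        # train pi_{i+1} on the aggregated dataset Q_1 ∪ ... ∪ Q_i
        policies.append(train(set().union(*datasets)))
    # argmax over pi_1, ..., pi_n of the empirical value
    return max(policies[:-1], key=evaluate)
```

Training on the union of all Q_k, rather than on Q_i alone, is the dataset-aggregation step of line 14.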
Updating the derivation graph after a pruning decision:

1. Change the pruning bit
2. Delete the witness for {1, 2, 3, 4} and S
3. Find a new witness for {1, 2, 3, 4} and S
4. Repeat for affected vertices
5. Done
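A possible realization of these steps (the data layout and the function name are my own assumptions): each vertex stores one non-pruned incoming hyperedge as its witness of derivability, and flipping a pruning bit triggers a worklist-based repair.

```python
def prune_edge(edge, witness, incoming):
    """Change the pruning bit of `edge` and repair the witness map.
    witness: dict vertex -> hyperedge currently proving the vertex derivable;
    incoming: dict vertex -> list of all hyperedges with that head.
    Hyperedges are dicts {"head", "tail", "pruned"}."""
    edge["pruned"] = True                       # 1. change the pruning bit
    worklist = [edge["head"]]
    while worklist:
        v = worklist.pop()
        e = witness.get(v)
        if e is not None and not e["pruned"] and all(u in witness for u in e["tail"]):
            continue                            # current witness still valid
        # 2./3. delete the invalid witness and look for a new one
        new = next((f for f in incoming.get(v, ())
                    if not f["pruned"] and all(u in witness for u in f["tail"])),
                   None)
        if new is not None:
            witness[v] = new
        elif v in witness:
            del witness[v]                      # v is no longer derivable
            # 4. repeat for affected vertices whose witness used v
            worklist.extend(h for h, f in witness.items() if v in f["tail"])

# Example: the S vertex over {1, 2, 3, 4} has two incoming hyperedges.
s = ("S", (1, 2, 3, 4)); root = ("VROOT", (1, 2, 3, 4, 5)); dot = ("$", (5,))
e1 = {"head": s, "tail": (), "pruned": False}
e2 = {"head": s, "tail": (), "pruned": False}
e3 = {"head": dot, "tail": (), "pruned": False}
e4 = {"head": root, "tail": (s, dot), "pruned": False}
incoming = {s: [e1, e2], dot: [e3], root: [e4]}
witness = {s: e1, dot: e3, root: e4}
prune_edge(e1, witness, incoming)
print(witness[s] is e2)  # True: a new witness for {1, 2, 3, 4} and S was found
```

If no replacement witness exists, the deletion propagates: every vertex whose witness consumed v is revisited, exactly as in step 4.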
[Figure: relevant vs. retrieved elements, partitioned into TP, FP, FN, TN]

precision = |TP| / (|TP| + |FP|)        recall = |TP| / (|TP| + |FN|)
Derivation tree by parsing:       S, NP, VP, NP, NP, PP
Derivation tree by gold standard: S, CNP, NP, PP

precision = |TP| / (|TP| + |FP|),  p(ξ) = 3 / (3 + 3) = 0.5
recall    = |TP| / (|TP| + |FN|),  r(ξ) = 3 / (3 + 1) = 0.75
accuracy(ξ, ζ) = (2 · p(ξ, ζ) · r(ξ, ζ)) / (p(ξ, ζ) + r(ξ, ζ))   (F1-measure)
runtime(H) = |E| for H = (V, E)
λ ∈ [0, 1]
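For the example above, precision, recall and the F1-based accuracy can be computed over the multisets of constituent labels. This is a simplification (real labeled scoring compares label–span pairs, not bare labels) and the function name is mine:

```python
from collections import Counter

def prf(predicted, gold):
    """Labeled precision, recall and F1 over multisets of constituents."""
    tp = sum((Counter(predicted) & Counter(gold)).values())  # |TP|
    p = tp / len(predicted)   # |TP| / (|TP| + |FP|)
    r = tp / len(gold)        # |TP| / (|TP| + |FN|)
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

# Example from the slides: constituents of the parsed vs. gold tree
parsed = ["S", "NP", "VP", "NP", "NP", "PP"]
gold = ["S", "CNP", "NP", "PP"]
print(prf(parsed, gold))  # (0.5, 0.75, 0.6)
```

The returned F1 value is what enters the reward as the accuracy term.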
[Figure: accuracy in % (65–90) against trade-off factor λ ∈ [1e−04, 1e+00]]
[Figure: runtime in s (60–140) against trade-off factor λ ∈ [1e−04, 1e+00]]