

SLIDE 1

Faculty of Computer Science Theoretical Computer Science, Chair of Foundations of Programming

LEARNING PRUNING POLICIES FOR LINEAR CONTEXT-FREE REWRITING SYSTEMS

INF-PM-FPG

Andy Püschel

Dresden, July 20, 2018

SLIDE 2

Motivation

Example:

  • Weighted Deductive Parsing for LCFRS
  • Sentence w = Nun werden sie umworben .
  • Parser computes the highest-scoring derivation d̂

TU Dresden, July 20, 2018 Learning Pruning Policies for Linear Context-free Rewriting Systems slide 2

SLIDE 3

Linear Context-free Rewriting System

Definition

A linear context-free rewriting system is a tuple G = (N, Σ, Ξ, P, S) where

  • N is a finite nonempty ℕ-sorted set (nonterminal symbols),
  • Σ is a finite set (terminal symbols) with ∀l ∈ ℕ : Σ ∩ N_l = ∅,
  • Ξ is a finite nonempty set (variable symbols) with Ξ ∩ Σ = ∅ and ∀l ∈ ℕ : Ξ ∩ N_l = ∅,
  • P is a set of production rules of the form ρ = φ → ψ where
    – φ = A(α_1, …, α_l) (called left-hand side of ρ) where l ∈ ℕ, A ∈ N_l, α_1, …, α_l ∈ (Σ ∪ Ξ)*, and
    – ψ = B_1(X^(1)_1, …, X^(1)_{l_1}) … B_m(X^(m)_1, …, X^(m)_{l_m}) (called right-hand side of ρ) where m ∈ ℕ, B_1 ∈ N_{l_1}, …, B_m ∈ N_{l_m}, X^(i)_j ∈ Ξ for 1 ≤ i ≤ m, 1 ≤ j ≤ l_i, and for every X ∈ Ξ occurring in ρ we require that X occurs exactly once in the left-hand side of ρ and exactly once in the right-hand side of ρ, and
  • S ∈ N_1 (initial nonterminal symbol).
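The definition above can be made concrete with a small data structure. The following sketch uses hypothetical names (Rule, is_linear) that are not from the talk; it encodes a rule and checks the linearity condition on variables.

```python
from dataclasses import dataclass

# Hypothetical encoding of an LCFRS production rule: the left-hand side
# A(alpha_1, ..., alpha_l) is a nonterminal with a tuple of argument strings
# over terminals and variables; the right-hand side is a sequence of
# (nonterminal, variable tuple) pairs.
@dataclass(frozen=True)
class Rule:
    lhs_nt: str
    lhs_args: tuple   # tuple of tuples over Sigma and Xi
    rhs: tuple        # tuple of (nonterminal, tuple of variables from Xi)

def is_linear(rule, variables):
    """Linearity: every variable occurring in the rule occurs exactly once
    on the left-hand side and exactly once on the right-hand side."""
    lhs_vars = [x for arg in rule.lhs_args for x in arg if x in variables]
    rhs_vars = [x for _, args in rule.rhs for x in args]
    return (sorted(lhs_vars) == sorted(set(lhs_vars))      # no LHS duplicates
            and sorted(rhs_vars) == sorted(set(rhs_vars))  # no RHS duplicates
            and set(lhs_vars) == set(rhs_vars))            # same variables

# VP(X1, X2) -> ADV(X1) VVPP(X2) from the example grammar:
vp = Rule("VP", (("X1",), ("X2",)), (("ADV", ("X1",)), ("VVPP", ("X2",))))
print(is_linear(vp, {"X1", "X2"}))  # True
```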

SLIDE 4

Example PLCFRS

PLCFRS (G, p) with G = (N, Σ, Ξ, P, S) where

  • N = {VROOT, S, VP, ADV, VAFIN, VAINF, VVINF, PPER, VVPP, $, . . .},
  • Σ = {Nun, werden, sie, umworben, ., . . .} and
  • P = { …,
    ADV(Nun) → ε # 1,
    VAFIN(werden) → ε # 0.5,
    VAINF(werden) → ε # 0.25,
    VVINF(werden) → ε # 0.25,
    PPER(sie) → ε # 1,
    VVPP(umworben) → ε # 1,
    $(.) → ε # 1,
    … }

SLIDE 5

Example PLCFRS

PLCFRS (G, p) with G = (N, Σ, Ξ, P, S) where

  • N = {VROOT, S, VP, ADV, VAFIN, VAINF, VVINF, PPER, VVPP, $, . . .},
  • Σ = {Nun, werden, sie, umworben, ., . . .} and
  • P = { …,
    VP(X^(1)_1, X^(2)_1) → ADV(X^(1)_1) VVPP(X^(2)_1) # 0.5,
    S(X^(1)_1 X^(2)_1 X^(3)_1) → VAFIN(X^(1)_1) PPER(X^(2)_1) VVPP(X^(3)_1) # 0.25,
    S(X^(1)_1 X^(2)_1, X^(1)_2) → VP(X^(1)_1, X^(1)_2) VAINF(X^(2)_1) # 0.25,
    S(X^(1)_1 X^(2)_1 X^(3)_1 X^(1)_2) → VP(X^(1)_1, X^(1)_2) VAFIN(X^(2)_1) PPER(X^(3)_1) # 0.5,
    S(X^(1)_1 X^(1)_2 X^(2)_1 X^(1)_3) → S(X^(1)_1 X^(1)_2, X^(1)_3) PPER(X^(2)_1) # 0.25,
    VROOT(X^(1)_1 X^(1)_2 X^(1)_3 X^(1)_4 X^(2)_1) → S(X^(1)_1 X^(1)_2 X^(1)_3 X^(1)_4) $(X^(2)_1) # 1,
    … }
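In a PLCFRS the weight of a derivation is the product of the probabilities of the rules it uses ("# 0.5" after a rule means probability 0.5). A minimal sketch; the string keys are made-up stand-ins for rules of the example grammar, and the derivation shown covers only the fragment "werden sie umworben .":

```python
from functools import reduce

# Rule probabilities for a fragment of the example grammar (keys are
# illustrative stand-ins, not a real rule encoding).
p = {
    "VAFIN(werden) -> eps": 0.5,
    "PPER(sie) -> eps": 1.0,
    "VVPP(umworben) -> eps": 1.0,
    "$(.) -> eps": 1.0,
    "S -> VAFIN PPER VVPP": 0.25,
    "VROOT -> S $": 1.0,
}

def derivation_weight(rules):
    # product of the probabilities of the rules used in the derivation
    return reduce(lambda acc, rule: acc * p[rule], rules, 1.0)

d = ["VROOT -> S $", "S -> VAFIN PPER VVPP", "VAFIN(werden) -> eps",
     "PPER(sie) -> eps", "VVPP(umworben) -> eps", "$(.) -> eps"]
print(derivation_weight(d))  # 0.125
```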

SLIDE 6

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

Initialize vertices

SLIDE 7

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

hyperedges for ADV(Nun) → ε # 1, …

SLIDE 8

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

hyperedge for VP(X^(1)_1, X^(2)_1) → ADV(X^(1)_1) VVPP(X^(2)_1) # 0.5

slide-9
SLIDE 9

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

hyperedge for S(X^(1)_1 X^(2)_1 X^(3)_1) → VAFIN(X^(1)_1) PPER(X^(2)_1) VVPP(X^(3)_1) # 0.25

slide-10
SLIDE 10

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

hyperedge for S(X^(1)_1 X^(2)_1, X^(1)_2) → VP(X^(1)_1, X^(1)_2) VAINF(X^(2)_1) # 0.25

slide-11
SLIDE 11

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

hyperedge for S(X^(1)_1 X^(2)_1 X^(3)_1 X^(1)_2) → VP(X^(1)_1, X^(1)_2) VAFIN(X^(2)_1) PPER(X^(3)_1) # 0.5

slide-12
SLIDE 12

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

hyperedge for S(X^(1)_1 X^(1)_2 X^(2)_1 X^(1)_3) → S(X^(1)_1 X^(1)_2, X^(1)_3) PPER(X^(2)_1) # 0.25

slide-13
SLIDE 13

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

hyperedge for VROOT(X^(1)_1 X^(1)_2 X^(1)_3 X^(1)_4 X^(2)_1) → S(X^(1)_1 X^(1)_2 X^(1)_3 X^(1)_4) $(X^(2)_1) # 1

slide-14
SLIDE 14

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

Undesired hyperedges

SLIDE 15

PARSE - Weighted Deductive Parsing: Nun werden sie umworben .

Prune
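The parsing walkthrough above builds a weighted hypergraph item by item. Best-first weighted deductive parsing in the spirit of Knuth's algorithm, with a hook where a pruning policy may discard hyperedges, can be sketched as follows; the item and edge encoding, the parse function, and the toy hypergraph (a fragment of the example sentence with made-up spans) are assumptions for this sketch, not the parser from the talk.

```python
import heapq

# Best-first search over a weighted hypergraph: items are finalized in
# order of decreasing weight; a pruning policy can veto hyperedges.
def parse(edges, goal, policy=lambda edge: True):
    best = {}        # item -> weight of the best derivation found
    agenda = []      # max-heap via negated weights
    for edge in edges:
        if not edge["tails"]:                    # axioms: lexical hyperedges
            heapq.heappush(agenda, (-edge["w"], edge["head"]))
    while agenda:
        neg_w, item = heapq.heappop(agenda)
        if item in best:                         # already finalized
            continue
        best[item] = -neg_w
        for edge in edges:                       # fire hyperedges now ready
            if item in edge["tails"] and policy(edge) \
                    and all(t in best for t in edge["tails"]):
                w = edge["w"]
                for t in edge["tails"]:
                    w *= best[t]                 # product of probabilities
                heapq.heappush(agenda, (-w, edge["head"]))
    return best.get(goal)

# Fragment of the example: S over "werden sie umworben" via the rule
# S(...) -> VAFIN PPER VVPP with probability 0.25.
edges = [
    {"head": ("VAFIN", "2-3"), "tails": (), "w": 0.5},
    {"head": ("PPER", "3-4"), "tails": (), "w": 1.0},
    {"head": ("VVPP", "4-5"), "tails": (), "w": 1.0},
    {"head": ("S", "2-5"),
     "tails": (("VAFIN", "2-3"), ("PPER", "3-4"), ("VVPP", "4-5")),
     "w": 0.25},
]
print(parse(edges, ("S", "2-5")))                       # 0.125
print(parse(edges, ("S", "2-5"),                        # pruning the S edge
            policy=lambda e: e["head"][0] != "S"))      # None
```

The second call shows the effect of an (extreme) pruning policy: once the S hyperedge is pruned, the goal item is never derived.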

SLIDE 16

Motivation

  • How to reduce the parse time for a sentence?

SLIDE 17

Motivation

  • How to reduce the parse time for a sentence?
  • What is a good pruning method?

SLIDE 18

Motivation

  • How to reduce the parse time for a sentence?
  • What is a good pruning method?
  • How to train such a pruning method?

SLIDE 19

Overview

  • Motivation
  • Preliminaries
  • LOLS
  • Change Propagation
  • Dynamic Programming
  • Results

SLIDE 20

Preliminaries

H = (V, E) ∈ H_(G,p)(w) : derivation graph from PARSE
c ⊂ Σ* × T_N(Σ) : X × Y-corpus
s : state of the derivation graph
a ∈ {keep, prune} : action
τ = s_0 a_0 s_1 a_1 … s_T : trajectory

SLIDE 21

Preliminaries

pruning policy π :

  • inputs a hyperedge and a subsentence w′
  • outputs a pruning decision a ∈ {keep, prune}

How to evaluate π?

SLIDE 22

Preliminaries

pruning policy π :

  • inputs a hyperedge and a subsentence w′
  • outputs a pruning decision a ∈ {keep, prune}

How to evaluate π?

reward function r : H_(G,p)(w) × T_N(Σ) → ℝ, schematically
  r = accuracy − λ · runtime
where accuracy : T_N(Σ) × T_N(Σ) → ℝ, runtime : H_(G,p)(w) → ℝ, and λ ∈ ℝ is a trade-off factor.

empirical value of π :
  R(π) = (1 / |c|) · Σ_{(w,ξ)∈c} r(PARSE(G, w, π), ξ) · c(w, ξ)
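The reward and the empirical value R(π) can be sketched directly; here the per-sentence accuracy, runtime, and corpus count c(w, ξ) are assumed to be given as plain numbers, which is a simplification for this sketch.

```python
# r = accuracy - lambda * runtime, averaged over the corpus.
def reward(accuracy, runtime, lam):
    return accuracy - lam * runtime

def empirical_value(corpus, lam):
    """corpus: iterable of (accuracy, runtime, count) triples, where count
    plays the role of c(w, xi); divides by |c| = total count."""
    total = sum(reward(acc, rt, lam) * n for acc, rt, n in corpus)
    return total / sum(n for _, _, n in corpus)

corpus = [(0.8, 100.0, 1), (0.9, 120.0, 1)]
print(empirical_value(corpus, lam=0.001))  # ~ 0.74
```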

SLIDE 23

Preliminaries

trajectory: s_0 a_0 s_1 a_1 … s_T

Diagram: states s_1, s_2, …, s_T connected by actions a_1, a_2, …, a_{T−1}; the reward obtained from s_1 under a_1 is r_1[a_1].
SLIDE 24

Preliminaries

trajectory: s_0 a_0 s_1 a_1 … s_T (intervention at state s_1)

Diagram: from s_1 the original roll-out continues through s_2, …, s_T with actions a_1, a_2, …, a_{T−1}, yielding r_1[a_1]; intervening with a′_1 instead continues through s′_2, …, s′_T with actions a′_2, …, a′_{T−1}, yielding r_1[a′_1].

SLIDE 25

LOLS

Locally Optimal Learning to Search

Algorithm 1: Locally Optimal Learning to Search algorithm by [VE17] and [Cha+15]
Input: PLCFRS (G, p) with G = (N, Σ, Ξ, P, S), X × Y-corpus c such that X ⊂ Σ* and Y ⊂ T_N(Σ)
Output: pruning policy π

 1: function LOLS((G, p), c)
 2:   π_1 := INITIALIZEPOLICY(…)
 3:   for i := 1 to n do                              ⊲ n : number of iterations
 4:     Q_i := ∅                                      ⊲ Q_i : set of state-reward tuples
 5:     for (w, ξ) ∈ c do                             ⊲ w : sentence
 6:       τ := ROLL-IN((G, p), w, π_i, ξ)             ⊲ τ = s_0 a_0 s_1 a_1 … s_T : trajectory
 7:       for t := 0 to |τ| − 1 do
 8:         for a′_t ∈ {keep, prune} do               ⊲ intervention
 9:           r_t[a′_t] := ROLL-OUT(π_i, s_t, a′_t, ξ)
10:         end for
11:         Q_i := Q_i ∪ {(s_t, r_t)}
12:       end for
13:     end for
14:     π_{i+1} := TRAIN(⋃_{k=1}^{i} Q_k)             ⊲ dataset aggregation
15:   end for
16:   return argmax_{π_j : 1 ≤ j ≤ n} R(π_j)
17: end function
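The loop of Algorithm 1 can be sketched as an executable function where ROLL-IN, ROLL-OUT and TRAIN are passed in by the caller; the parameter names, the toy corpus, and the toy policies below are assumptions for this demo, not the thesis implementation.

```python
ACTIONS = ("keep", "prune")

def lols(corpus, roll_in, roll_out, train, init_policy, n, value):
    policies = [init_policy]                     # pi_1 := INITIALIZEPOLICY(...)
    aggregated = []                              # union of Q_1 ... Q_i
    for _ in range(n):
        q_i = []                                 # Q_i: state-reward tuples
        for (w, xi) in corpus:
            tau = roll_in(w, policies[-1], xi)   # trajectory of states
            for s_t in tau:                      # intervene at every state
                r_t = {a: roll_out(policies[-1], s_t, a, xi) for a in ACTIONS}
                q_i.append((s_t, r_t))
        aggregated.extend(q_i)                   # dataset aggregation
        policies.append(train(list(aggregated)))
    return max(policies[1:], key=value)          # argmax over R(pi_j)

# Toy instance: one-state trajectories; the trained policy always keeps.
toy = lols(
    corpus=[("w1", None), ("w2", None)],
    roll_in=lambda w, pi, xi: [w],
    roll_out=lambda pi, s, a, xi: 1.0 if a == "keep" else 0.0,
    train=lambda data: (lambda s: "keep"),
    init_policy=lambda s: "prune",
    n=2,
    value=lambda pi: sum(1.0 for s in ("w1", "w2") if pi(s) == "keep"),
)
print(toy("w1"))  # keep
```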

SLIDE 26

Overview

  • Motivation
  • Preliminaries
  • LOLS
  • Change Propagation
  • Results

SLIDE 27

Change Propagation

Change pruning bit

SLIDE 28

Change Propagation

Delete witness for {1, 2, 3, 4} and S

SLIDE 29

Change Propagation

Find new witness for {1, 2, 3, 4} and S

SLIDE 30

Change Propagation

Repeat for affected vertices

SLIDE 31

Change Propagation

Done
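The steps above (change a pruning bit, delete the stale witness, find a new witness, repeat for affected vertices) can be sketched as a worklist algorithm. Scoring a witness by its edge weight alone, the edge encoding, and the assumption of an acyclic hypergraph are simplifications for this demo.

```python
from collections import deque

# Each vertex stores a witness: its best unpruned incoming hyperedge.
def best_witness(v, edges, pruned):
    candidates = [e for e in edges if e["head"] == v and not pruned.get(e["id"])]
    return max(candidates, key=lambda e: e["w"], default=None)

def propagate(witness, edges, pruned, start):
    """After one pruning bit flips, revisit only affected vertices instead
    of recomputing every witness from scratch."""
    queue = deque([start])                  # vertex whose witness was deleted
    while queue:
        v = queue.popleft()
        new = best_witness(v, edges, pruned)
        if new != witness.get(v):           # find new witness for v
            witness[v] = new
            for e in edges:                 # repeat for affected vertices
                if v in e["tails"]:
                    queue.append(e["head"])
    return witness

edges = [
    {"id": "e1", "head": "S", "tails": (), "w": 0.5},
    {"id": "e2", "head": "S", "tails": (), "w": 0.25},
    {"id": "e3", "head": "VROOT", "tails": ("S",), "w": 1.0},
]
witness = {"S": edges[0], "VROOT": edges[2]}    # state before the flip
pruned = {"e1": True}                           # change pruning bit of e1
propagate(witness, edges, pruned, "S")
print(witness["S"]["id"])  # e2
```

VROOT is revisited because S changed, but its best witness is unchanged, so propagation stops there.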

SLIDE 32

Overview

  • Motivation
  • Preliminaries
  • LOLS
  • Change Propagation
  • Results

SLIDE 33

Accuracy Measure

Diagram: the relevant elements partitioned into TP, FP, FN and TN.

precision = |TP| / (|TP| + |FP|)        recall = |TP| / (|TP| + |FN|)
p(ξ) =        r(ξ) =

SLIDE 34

Accuracy Measure

derivation tree by parsing (nodes: S, NP, VP, NP, NP, PP)

derivation tree by gold standard (nodes: S, CNP, NP, PP)

precision = |TP| / (|TP| + |FP|)        recall = |TP| / (|TP| + |FN|)
p(ξ) =        r(ξ) =

SLIDE 37

Accuracy Measure

derivation tree by parsing (nodes: S, NP, VP, NP, NP, PP)

derivation tree by gold standard (nodes: S, CNP, NP, PP)

precision = |TP| / (|TP| + |FP|) : p(ξ) = 3 / (3 + 3) = 0.5
recall = |TP| / (|TP| + |FN|) : r(ξ) = 3 / (3 + 1) = 0.75

SLIDE 38

Setup

accuracy(ξ, ζ) = (2 · p(ξ, ζ) · r(ξ, ζ)) / (p(ξ, ζ) + r(ξ, ζ))  (F1-measure),
runtime(H) = |E| for H = (V, E),
λ ∈ [0, 1]
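A worked check of the measures on the example trees (3 true positives, 3 false positives, 1 false negative, matching p = 0.5 and r = 0.75 from the slides):

```python
# precision, recall and their harmonic mean (F1).
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)

p, r = precision(3, 3), recall(3, 1)
print(p, r, f1(p, r))  # 0.5 0.75 0.6
```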

SLIDE 39

Results

Figure: runtime and accuracy for a given trade-off factor λ. (a) accuracy in % (axis range 65–90) against λ from 1e−04 to 1e+00; (b) runtime in s (axis range 60–140) against the same λ values.

SLIDE 40

References I

Umut A. Acar and Ruy Ley-Wild. “Self-adjusting Computation with Delta ML”. In: Advanced Functional Programming: 6th International School, AFP 2008, Heijen, The Netherlands, May 2008, Revised Lectures. Ed. by Pieter Koopman, Rinus Plasmeijer, and Doaitse Swierstra. Springer Berlin Heidelberg, 2009, pp. 1–38. ISBN: 978-3-642-04652-0. DOI: 10.1007/978-3-642-04652-0_1.

SLIDE 41

References II

Kai-Wei Chang et al. “Learning to Search Better than Your Teacher”. In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15). Ed. by David Blei and Francis Bach. JMLR Workshop and Conference Proceedings, 2015, pp. 2058–2066.

Andreas van Cranenburgh, Remko Scha, and Rens Bod. “Data-Oriented Parsing with discontinuous constituents and function tags”. In: Journal of Language Modelling 4.1 (2016), pp. 57–111. URL: http://dx.doi.org/10.15398/jlm.v4i1.100.

SLIDE 42

References III

Laura Kallmeyer. Parsing Beyond Context-Free Grammars. Springer Publishing Company, Incorporated, 2012. ISBN: 3642264530, 9783642264535.

Laura Kallmeyer and Wolfgang Maier. “Data-driven Parsing with Probabilistic Linear Context-free Rewriting Systems”. In: Proceedings of the 23rd International Conference on Computational Linguistics. COLING ’10. Beijing, China: Association for Computational Linguistics, 2010, pp. 537–545. URL: http://dl.acm.org/citation.cfm?id=1873781.1873842.

SLIDE 43

References IV

Yuki Kato, Hiroyuki Seki, and Tadao Kasami. “Stochastic Multiple Context-free Grammar for RNA Pseudoknot Modeling”. In: Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms. TAGRF ’06. Sydney, Australia: Association for Computational Linguistics, 2006, pp. 57–64. ISBN: 1-932432-85-X. URL: http://dl.acm.org/citation.cfm?id=1654690.1654698.

Mark-Jan Nederhof. “Weighted deductive parsing and Knuth’s algorithm”. In: Computational Linguistics 29.1 (2003), pp. 135–143.

SLIDE 44

References V

David M. W. Powers. “Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation”. In: Journal of Machine Learning Technologies 2.1 (2011), pp. 37–63. ISSN: 2229-3981 & 2229-399X.

Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. “No-Regret Reductions for Imitation Learning and Structured Prediction”. In: CoRR abs/1011.0686 (2010).

Tim Vieira and Jason Eisner. “Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing”. In: Transactions of the Association for Computational Linguistics (TACL) 5 (Feb. 2017).

SLIDE 45

References VI

K. Vijay-Shanker, David J. Weir, and Aravind K. Joshi. “Characterizing Structural Descriptions Produced by Various Grammatical Formalisms”. In: Proceedings of the 25th Annual Meeting on Association for Computational Linguistics. ACL ’87. Stanford, California: Association for Computational Linguistics, 1987, pp. 104–111. DOI: 10.3115/981175.981190. URL: https://doi.org/10.3115/981175.981190.
