Training Deterministic Parsers with Non-Deterministic Oracles


slide-1
SLIDE 1

“Training Deterministic Parsers with Non-Deterministic Oracles”

by Yoav Goldberg and Joakim Nivre, 2013

Seminar talk, Pius Meinert, July 13, 2018

slide-2
SLIDE 2

Training Deterministic Parsers with Non-Deterministic Oracles

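[Figure: dependency tree for "He wrote her a letter" over root, He(1), wrote(2), her(3), a(4), letter(5), with arcs root → wrote (PRD), wrote → He (SBJ), wrote → her (IOBJ), wrote → letter (DOBJ), letter → a (DET).]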

slide-4
SLIDE 4

Transition System

Definition (Transition System) A transition system for dependency parsing is a quadruple S = (C, T, c_s, C_t), where

  • 1. C is a set of configurations,
  • 2. T is a set of transitions, each of which is a (partial) function t : C → C,
  • 3. c_s is an initialization function, mapping a sentence w = w1 w2 ... wn to a configuration c ∈ C,
  • 4. C_t ⊆ C is a set of terminal configurations.
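The quadruple can be mirrored quite directly in code. The following is a minimal Python sketch, assuming that a configuration is a (stack, buffer) pair and that a partial transition returns None where it is undefined. The names TransitionSystem, legal, and the shift-only toy system are illustrative and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple

# Assumption: a configuration is a (stack, buffer) pair of tuples; the set C stays implicit.
Config = Tuple[Tuple[str, ...], Tuple[str, ...]]
# A partial function t : C -> C is modelled as returning None where t is undefined.
Transition = Callable[[Config], Optional[Config]]


@dataclass(frozen=True)
class TransitionSystem:
    transitions: Dict[str, Transition]          # T
    initial: Callable[[Sequence[str]], Config]  # c_s
    is_terminal: Callable[[Config], bool]       # membership test for C_t

    def legal(self, c: Config):
        """Names of the transitions that are defined on configuration c."""
        return [name for name, t in self.transitions.items() if t(c) is not None]


def shift(c: Config) -> Optional[Config]:
    """Move the first buffer item onto the stack; undefined on an empty buffer."""
    stack, buf = c
    return (stack + buf[:1], buf[1:]) if buf else None


# Hypothetical shift-only system, only meant to exercise the interface.
toy = TransitionSystem(
    transitions={"Shift": shift},
    initial=lambda w: (("root",), tuple(w)),
    is_terminal=lambda c: not c[1],
)

c = toy.initial(["He", "wrote"])
while not toy.is_terminal(c):
    c = toy.transitions[toy.legal(c)[0]](c)
```

Keeping transitions as plain functions makes Legal(c) a simple filter over T, which is how the parsing and training loops later in the deck use it.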

slide-5
SLIDE 5

Training Deterministic Parsers with Non-Deterministic Oracles

[Figure: gold dependency tree for "He wrote her a letter".]

c_s applied to w = root He1 wrote2 her3 a4 letter5

slide-6
SLIDE 6

Training Deterministic Parsers with Non-Deterministic Oracles

[Figure: gold dependency tree for "He wrote her a letter".]

c_s(w) = ([root], [He1, wrote2, her3, a4, letter5], {})

slide-7
SLIDE 7

Training Deterministic Parsers with Non-Deterministic Oracles

[Figure: gold dependency tree for "He wrote her a letter".]

([root], [He1, wrote2, her3, a4, letter5])
Shift ⊢ ([root, He1], [wrote2, her3, a4, letter5])

slide-8
SLIDE 8

Training Deterministic Parsers with Non-Deterministic Oracles

[Figure: gold dependency tree for "He wrote her a letter".]

([root, He1], [wrote2, her3, a4, letter5])
Left_SBJ ⊢ ([root], [wrote2, her3, a4, letter5])

slide-9
SLIDE 9

Training Deterministic Parsers with Non-Deterministic Oracles

[Figure: gold dependency tree for "He wrote her a letter".]

([root], [wrote2, her3, a4, letter5])
Right_PRD ⊢ ([root, wrote2], [her3, a4, letter5])

slide-10
SLIDE 10

Training Deterministic Parsers with Non-Deterministic Oracles

[Figure: gold dependency tree for "He wrote her a letter".]

([root, wrote2], [her3, a4, letter5])
Right_IOBJ, Shift, Left_DET, Reduce, Right_DOBJ ⊢ ([root, wrote2, letter5], [ ]) ∈ C_t
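The derivation on the preceding slides can be replayed mechanically. The sketch below assumes the arc-eager transitions with labeled arcs stored as (head, label, dependent) triples. The helper names (Config, shift, left, right, reduce_) are illustrative, and preconditions such as "the stack top has no head yet" are deliberately not checked.

```python
from typing import FrozenSet, NamedTuple, Tuple

Arc = Tuple[str, str, str]  # (head, label, dependent)


class Config(NamedTuple):
    stack: Tuple[str, ...]
    buffer: Tuple[str, ...]
    arcs: FrozenSet[Arc]


def shift(c):
    # Move the buffer front onto the stack.
    return Config(c.stack + c.buffer[:1], c.buffer[1:], c.arcs)


def left(c, label):
    # The stack top becomes a dependent of the buffer front and is popped.
    arc = (c.buffer[0], label, c.stack[-1])
    return Config(c.stack[:-1], c.buffer, c.arcs | {arc})


def right(c, label):
    # The buffer front becomes a dependent of the stack top and is pushed.
    arc = (c.stack[-1], label, c.buffer[0])
    return Config(c.stack + c.buffer[:1], c.buffer[1:], c.arcs | {arc})


def reduce_(c):
    # Pop the stack top without adding an arc.
    return Config(c.stack[:-1], c.buffer, c.arcs)


c = Config(("root",), ("He", "wrote", "her", "a", "letter"), frozenset())
derivation = [
    shift,
    lambda c: left(c, "SBJ"),
    lambda c: right(c, "PRD"),
    lambda c: right(c, "IOBJ"),
    shift,
    lambda c: left(c, "DET"),
    reduce_,
    lambda c: right(c, "DOBJ"),
]
for step in derivation:
    c = step(c)
```

After the last step the buffer is empty and the stack is (root, wrote, letter), matching the terminal configuration shown on the slide.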

slide-11
SLIDE 11

Training Deterministic Parsers with Non-Deterministic Oracles

if c = (σ|i, j|β, A) and (j, i) ∈ T then
    t ← Left
else if c = (σ|i, j|β, A) and (i, j) ∈ T then
    t ← Right
else if c = (σ|i, j|β, A) and ∃k [k < i ∧ [(k, j) ∈ T ∨ (j, k) ∈ T]] then
    t ← Reduce
else
    t ← Shift
return t
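The rule cascade above returns exactly one transition per configuration, which is what makes it a static oracle. A small Python sketch, assuming unlabeled arcs as (head, dependent) pairs, tokens numbered by position with 0 as root, and an arc-eager apply helper that is not part of the slides:

```python
def static_oracle(stack, buf, gold):
    """Return the single transition prescribed for (stack, buf) given gold arcs."""
    if stack and buf:
        i, j = stack[-1], buf[0]
        if (j, i) in gold:
            return "Left"
        if (i, j) in gold:
            return "Right"
        # Some token k < i is linked to j in the gold tree, so i has to go first.
        if any((k, j) in gold or (j, k) in gold for k in range(i)):
            return "Reduce"
    return "Shift"


def apply(stack, buf, arcs, t):
    """Arc-eager transitions on (stack, buffer, arcs); preconditions are not checked."""
    if t == "Shift":
        return stack + [buf[0]], buf[1:], arcs
    if t == "Left":
        return stack[:-1], buf, arcs | {(buf[0], stack[-1])}
    if t == "Right":
        return stack + [buf[0]], buf[1:], arcs | {(stack[-1], buf[0])}
    return stack[:-1], buf, arcs  # Reduce


# Gold tree of "He wrote her a letter": 0=root, 1=He, 2=wrote, 3=her, 4=a, 5=letter.
gold = {(0, 2), (2, 1), (2, 3), (2, 5), (5, 4)}
stack, buf, arcs = [0], [1, 2, 3, 4, 5], set()
sequence = []
while buf:
    t = static_oracle(stack, buf, gold)
    sequence.append(t)
    stack, buf, arcs = apply(stack, buf, arcs, t)
# sequence == ['Shift', 'Left', 'Right', 'Right', 'Shift', 'Left', 'Reduce', 'Right']
```

On the example sentence this reproduces the derivation from the walk-through, and it never proposes the equally valid variant that reduces "her" before shifting "a".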

slide-12
SLIDE 12

Greedy Classifjer-based Parsing

c ← c_s(w)
while c ∉ C_t do
    t_p ← argmax_{t ∈ Legal(c)} w · φ(c, t)
    c ← t_p(c)
return A_c

slide-13
SLIDE 13

Training Deterministic Parsers with Non-Deterministic Oracles

for (w, T) ∈ d do
    c ← c_s(w)
    while c ∉ C_t do
        t_p ← argmax_{t ∈ Legal(c)} w · φ(c, t)
        Correct(c) ← {t | o(t; c, T) = true}
        t_o ← argmax_{t ∈ Correct(c)} w · φ(c, t)
        if t_p ∉ Correct(c) then
            Update(w, φ(c, t_o), φ(c, t_p))
            c ← t_o(c)
        else
            c ← t_p(c)
return w

slide-15
SLIDE 15

Training Deterministic Parsers with Non-Deterministic Oracles

[Figure: gold dependency tree for "He wrote her a letter".]

SH, LA_SBJ, RA_PRD, RA_IOBJ, SH, LA_DET, RE, RA_DOBJ
SH, LA_SBJ, RA_PRD, RA_IOBJ, RE, SH, LA_DET, RA_DOBJ

→ Spurious ambiguity requires a non-deterministic oracle instead of a static oracle.

slide-16
SLIDE 16

... with Non-Deterministic and Complete Oracles

[Figure: dependency tree for "He wrote her a letter" with arcs PRD, SBJ, DOBJ, DET.]

([root], [He1, wrote2, her3, a4, letter5])
SH, LA_SBJ, RA_PRD, SH ⊢ ([root, wrote2, her3], [a4, letter5])

Error propagation can be mitigated by a complete oracle.
Dynamic oracle: non-deterministic + complete.

slide-17
SLIDE 17

... with Non-Deterministic and Complete Oracles

[Figure: dependency tree for "He wrote her a letter" with arcs PRD, SBJ, DOBJ, DET.]

([root, wrote2, her3], [a4, letter5])
SH, LA_DET, SH ⊢ ([root, wrote2, her3, letter5], [ ]) ∈ C_t

→ Error propagation can be mitigated by a complete oracle.
→ Dynamic oracle: non-deterministic + complete.

slide-18
SLIDE 18

Training (Standard)

for (w, T) ∈ d do
    c ← c_s(w)
    while c ∉ C_t do
        t_p ← argmax_{t ∈ Legal(c)} w · φ(c, t)
        Correct(c) ← {t | o(t; c, T) = true}
        t_o ← argmax_{t ∈ Correct(c)} w · φ(c, t)
        if t_p ∉ Correct(c) then
            Update(w, φ(c, t_o), φ(c, t_p))
            c ← t_o(c)
        else
            c ← t_p(c)
return w

slide-19
SLIDE 19

Training with Exploration

for (w, T) ∈ d do
    c ← c_s(w)
    while c ∉ C_t do
        t_p ← argmax_{t ∈ Legal(c)} w · φ(c, t)
        Optimal(c) ← {t | o(t; c, T) = true}
        t_o ← argmax_{t ∈ Optimal(c)} w · φ(c, t)
        if t_p ∉ Optimal(c) then
            Update(w, φ(c, t_o), φ(c, t_p))
            c ← Explore(c, t_o, t_p)
        else
            c ← t_p(c)
return w
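The loop above can be sketched end to end for the arc-eager system. Everything below is a hedged illustration under several assumptions: a perceptron-style Update, a single toy feature per decision, an Explore step that follows the predicted transition with probability p once iteration k has passed (the slides do not define Explore; the names k and p are hypothetical here), projective gold trees, and arc-eager costs read so that dependents of b lost by Right are collected over the whole stack and dependents of s lost by Left over the whole buffer.

```python
import random
from collections import defaultdict

ROOT = 0


def legal(c):
    """Arc-eager transitions whose preconditions hold in c = (stack, buffer, arcs)."""
    stack, buf, arcs = c
    headed = {d for _, d in arcs}
    ts = []
    if buf:
        ts.append("SH")
        if stack and stack[-1] != ROOT and stack[-1] not in headed:
            ts.append("LA")
        if stack:
            ts.append("RA")
    if stack and stack[-1] in headed:
        ts.append("RE")
    return ts


def apply(c, t):
    stack, buf, arcs = c
    if t == "SH":
        return stack + [buf[0]], buf[1:], arcs
    if t == "LA":
        return stack[:-1], buf, arcs | {(buf[0], stack[-1])}
    if t == "RA":
        return stack + [buf[0]], buf[1:], arcs | {(stack[-1], buf[0])}
    return stack[:-1], buf, arcs  # RE


def cost(c, t, gold):
    """Number of gold arcs that become unreachable when t is applied in c."""
    stack, buf, arcs = c
    headed = {d for _, d in arcs}
    s, b, rest = stack[-1], buf[0], buf[1:]
    no_head = [k for k in stack if k not in headed]
    if t == "SH":
        lost = {(k, b) for k in stack} | {(b, k) for k in no_head}
    elif t == "RA":
        lost = {(k, b) for k in stack[:-1] + rest} | {(b, k) for k in no_head}
    elif t == "LA":
        lost = {(k, s) for k in rest} | {(s, k) for k in buf}
    else:  # RE
        lost = {(s, k) for k in buf}
    return len(lost & gold)


def feats(c, t):
    # Toy feature: (stack top, buffer front, stack top has a head, transition).
    stack, buf, arcs = c
    s = stack[-1]
    return [(s, buf[0], any(d == s for _, d in arcs), t)]


def train(data, iterations=3, k=1, p=0.9, rng=random.Random(0)):
    w = defaultdict(float)
    score = lambda c, t: sum(w[f] for f in feats(c, t))
    for it in range(1, iterations + 1):
        for n, gold in data:
            c = ([ROOT], list(range(1, n + 1)), frozenset())
            while c[1]:  # terminal once the buffer is empty
                ts = legal(c)
                t_p = max(ts, key=lambda t: score(c, t))
                optimal = [t for t in ts if cost(c, t, gold) == 0]
                t_o = max(optimal, key=lambda t: score(c, t))
                if t_p not in optimal:
                    for f in feats(c, t_o):
                        w[f] += 1.0
                    for f in feats(c, t_p):
                        w[f] -= 1.0
                    # Explore: sometimes continue from the erroneous prediction.
                    explore = it > k and rng.random() < p
                    c = apply(c, t_p if explore else t_o)
                else:
                    c = apply(c, t_p)
    return w


def parse(n, w):
    c = ([ROOT], list(range(1, n + 1)), frozenset())
    while c[1]:
        c = apply(c, max(legal(c), key=lambda t: sum(w[f] for f in feats(c, t))))
    return c[2]


gold = frozenset({(0, 2), (2, 1), (2, 3), (2, 5), (5, 4)})
weights = train([(5, gold)])
predicted = parse(5, weights)
```

Because the oracle stays well defined after a wrong transition, the loop can keep training from configurations the static oracle would never visit, which is the point of the exploration variant.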

slide-21
SLIDE 21

Optimality / Transition Costs

[Figure: dependency trees for "He wrote her a letter" (arc labels PRD, SBJ, DOBJ, IOBJ, DOBJ, DET).]

C(A, T) = 2

slide-22
SLIDE 22

Optimality / Transition Costs

[Figure: dependency trees for "He wrote her a letter" (arc labels PRD, SBJ, DOBJ, IOBJ, DOBJ, DET).]

([root, wrote2, her3], [a4, letter5])

slide-23
SLIDE 23

Optimality / Transition Costs

[Figure: dependency trees for "He wrote her a letter" (arc labels PRD, SBJ, DOBJ, IOBJ, DET).]

min_{A : c ⇝ A} C(A, T) = 0

slide-24
SLIDE 24

Optimality / Transition Costs

[Figure: dependency trees for "He wrote her a letter" (arc labels PRD, SBJ, DOBJ, IOBJ, DOBJ, DET).]

([root, wrote2, her3], [a4, letter5])
SH, ...

slide-25
SLIDE 25

Optimality / Transition Costs

[Figure: dependency trees for "He wrote her a letter" (arc labels PRD, SBJ, DOBJ, IOBJ, DET).]

C(Shift; c, T) = min_{A : t(c) ⇝ A} C(A, T) − min_{A : c ⇝ A} C(A, T) = 1

slide-26
SLIDE 26

Optimality / Transition Costs

[Figure: dependency trees for "He wrote her a letter" (arc labels PRD, SBJ, DOBJ, IOBJ, DET).]

C(Shift; c, T) = min_{A : t(c) ⇝ A} C(A, T) − min_{A : c ⇝ A} C(A, T) = 1

o_d(c, T) = {t | C(t; c, T) = 0}
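The cost definition can be checked by brute force on a short sentence. The sketch below enumerates every terminal arc set reachable under arc-eager transitions and applies the definition literally. It assumes that C(A, T) counts the gold arcs missing from A and that tokens are numbered with 0 as root. The function names are illustrative, and the configuration used is one chosen for this example (root and "wrote" on the stack, "her" at the buffer front), not necessarily the one drawn on the slide.

```python
from functools import lru_cache

GOLD = frozenset({(0, 2), (2, 1), (2, 3), (2, 5), (5, 4)})  # "He wrote her a letter"


def successors(c):
    """Yield (name, next configuration) for each legal arc-eager transition."""
    stack, buf, arcs = c
    if not buf:
        return
    headed = {d for _, d in arcs}
    b = buf[0]
    yield "SH", (stack + (b,), buf[1:], arcs)
    if stack:
        s = stack[-1]
        yield "RA", (stack + (b,), buf[1:], arcs | {(s, b)})
        if s != 0 and s not in headed:
            yield "LA", (stack[:-1], buf, arcs | {(b, s)})
        if s in headed:
            yield "RE", (stack[:-1], buf, arcs)


@lru_cache(maxsize=None)
def best_loss(c, gold):
    """min over arc sets A reachable from c of C(A, T), with C(A, T) = |T - A|."""
    if not c[1]:  # terminal: empty buffer
        return len(gold - c[2])
    return min(best_loss(nxt, gold) for _, nxt in successors(c))


def transition_cost(t, c, gold):
    """C(t; c, T) computed directly from the definition."""
    return best_loss(dict(successors(c))[t], gold) - best_loss(c, gold)


def dynamic_oracle(c, gold):
    """o_d(c, T): all legal transitions with zero cost."""
    return {t for t, _ in successors(c) if transition_cost(t, c, gold) == 0}


# root and "wrote" on the stack, "her" at the buffer front, SBJ and PRD arcs built.
c = ((0, 2), (3, 4, 5), frozenset({(2, 1), (0, 2)}))
shift_cost = transition_cost("SH", c, GOLD)   # 1: the arc wrote -> her is lost
zero_cost = dynamic_oracle(c, GOLD)           # {'RA'}
```

Exhaustive search is exponential in the sentence length, so it only serves as a reference against which closed-form cost formulas can be compared.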

slide-27
SLIDE 27

Arc Decomposition - Defjnition

Definition (Tree Consistency) A set of arcs A is said to be tree consistent if there exists a projective dependency tree T such that A ⊆ T.

Definition (Arc Decomposition) A transition system is said to be arc decomposable if, for every tree consistent arc set A and configuration c, c ⇝ A is entailed by c ⇝ (h, d) for every arc (h, d) ∈ A.

slide-28
SLIDE 28

Arc Decomposition - Arc-Standard Counterexample

c = ([a, b, c], β)

[Figure: tokens a, b, c with dependency arcs.]

Arc-Standard Transitions
Left[(σ|s1|s0, β, A)] = (σ|s0, β, A ∪ {(s0, s1)})
Right[(σ|s1|s0, β, A)] = (σ|s1, β, A ∪ {(s1, s0)})
Shift[(σ, b|β, A)] = (σ|b, β, A)

slide-29
SLIDE 29

Arc Decomposition - Arc-Standard Counterexample

c = ([a, b, c], β)
Left ⊢ ([a, c], β)

[Figure: tokens a, b, c with dependency arcs.]

Arc-Standard Transitions
Left[(σ|s1|s0, β, A)] = (σ|s0, β, A ∪ {(s0, s1)})
Right[(σ|s1|s0, β, A)] = (σ|s1, β, A ∪ {(s1, s0)})
Shift[(σ, b|β, A)] = (σ|b, β, A)

slide-30
SLIDE 30

Arc Decomposition - Arc-Standard Counterexample

c = ([a, b, c], β)
Right ⊢ ([a, b], β)
Left ⊢ ([b], β)

[Figure: tokens a, b, c with dependency arcs.]

Arc-Standard Transitions
Left[(σ|s1|s0, β, A)] = (σ|s0, β, A ∪ {(s0, s1)})
Right[(σ|s1|s0, β, A)] = (σ|s1, β, A ∪ {(s1, s0)})
Shift[(σ, b|β, A)] = (σ|b, β, A)
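The two derivations above each build one arc: Left from ([a, b, c], β) adds (c, b), while Right followed by Left adds (b, a). The sketch below enumerates all arc-standard derivations to check that these two arcs are individually reachable but never reachable together. Reading the counterexample as the arc set {(c, b), (b, a)} is an inference from the derivations shown, and the one-token buffer ("d",) is an arbitrary choice made for this example.

```python
def reachable_arc_sets(stack, buf, arcs=frozenset()):
    """Arc sets of all terminal configurations reachable by arc-standard transitions."""
    results = set()
    if len(stack) >= 2:
        s1, s0 = stack[-2], stack[-1]
        # Left: s1 becomes a dependent of s0 and is removed.
        results |= reachable_arc_sets(stack[:-2] + (s0,), buf, arcs | {(s0, s1)})
        # Right: s0 becomes a dependent of s1 and is removed.
        results |= reachable_arc_sets(stack[:-1], buf, arcs | {(s1, s0)})
    if buf:
        # Shift: move the buffer front onto the stack.
        results |= reachable_arc_sets(stack + (buf[0],), buf[1:], arcs)
    if not results:
        # No transition applies, so this configuration is terminal.
        results.add(arcs)
    return results


sets = reachable_arc_sets(("a", "b", "c"), ("d",))
cb_reachable = any(("c", "b") in A for A in sets)
ba_reachable = any(("b", "a") in A for A in sets)
both_reachable = any({("c", "b"), ("b", "a")} <= A for A in sets)
```

Since arcs are only ever added, looking at terminal arc sets is enough to decide reachability of an arc or an arc set.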

slide-31
SLIDE 31

Arc Decomposition - Arc-Eager Proof Sketch

Given: an arbitrary configuration c = (σ, β, A) and a tree consistent arc set A′ such that every arc in A′ is individually reachable from c.

To show: c ⇝ A′

B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
B_h = {(h, d) | h ∈ β, d ∈ σ}
B_d = {(h, d) | d ∈ β, h ∈ σ}

slide-32
SLIDE 32

Arc Decomposition - Arc-Eager Proof Sketch

[Figure: tokens 1 to 8, partitioned into stack σ and buffer β.]

B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
B_h = {(h, d) | h ∈ β, d ∈ σ}
B_d = {(h, d) | d ∈ β, h ∈ σ}

slide-37
SLIDE 37

Dynamic Oracle

o_d(c, T) = {t | C(t; c, T) = 0}

C(t; c, T) = min_{A : t(c) ⇝ A} C(A, T) − min_{A : c ⇝ A} C(A, T)

Efficiently compute transition costs:

  • 1. Intersect the set of individually reachable arcs with the goal arc set.
  • 2. This yields the set of individually reachable goal arcs and thus, by arc decomposability, the reachable goal arc set.
  • 3. Check how a given transition affects this set of reachable arcs.

slide-38
SLIDE 38

Transition Systems

✓ Arc-Eager

  • Nivre 2003
  • Goldberg and Nivre 2012

✗ Arc-Standard

  • Nivre 2004
  • Goldberg, Sartorio, and Satta 2014

✓ Hybrid

  • Kuhlmann, Gómez-Rodríguez, and Satta 2011

✓ Easy-First

  • Goldberg and Elhadad 2010


slide-39
SLIDE 39

Conclusion

  • spurious ambiguity
    static → non-deterministic oracle
  • error propagation
    incomplete → complete oracle
  • dynamic oracle (non-deterministic + complete)
    arc decomposability
  • good runtime
  • optimization during training
  • experiments show improved accuracy

slide-40
SLIDE 40

References i

Yoav Goldberg and Michael Elhadad. “An Efficient Algorithm for Easy-First Non-Directional Dependency Parsing”. In: Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 2-4, 2010, Los Angeles, California, USA. 2010, pp. 742–750. url: http://www.aclweb.org/anthology/N10-1115.

slide-41
SLIDE 41

References ii

Yoav Goldberg and Joakim Nivre. “A Dynamic Oracle for Arc-Eager Dependency Parsing”. In: COLING 2012, 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 8-15 December 2012, Mumbai, India. 2012, pp. 959–976. url: http://aclweb.org/anthology/C/C12/C12-1059.pdf.

Yoav Goldberg and Joakim Nivre. “Training Deterministic Parsers with Non-Deterministic Oracles”. In: TACL 1 (2013), pp. 403–414. url: https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/145.

slide-42
SLIDE 42

References iii

Yoav Goldberg, Francesco Sartorio, and Giorgio Satta. “A Tabular Method for Dynamic Oracles in Transition-Based Parsing”. In: TACL 2 (2014), pp. 119–130. url: https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/302.

slide-43
SLIDE 43

References iv

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. “Dynamic Programming Algorithms for Transition-Based Dependency Parsers”. In: The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA. 2011, pp. 673–682. url: http://www.aclweb.org/anthology/P11-1068.

Joakim Nivre. “An Efficient Algorithm for Projective Dependency Parsing”. In: Proceedings of the 8th International Workshop on Parsing Technologies (IWPT). 2003, pp. 149–160.

slide-44
SLIDE 44

References v

Joakim Nivre. “Incrementality in Deterministic Dependency Parsing”. In: Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together. 2004. url: http://www.aclweb.org/anthology/W04-0308.

Joakim Nivre. “Algorithms for Deterministic Incremental Dependency Parsing”. In: Computational Linguistics 34.4 (2008), pp. 513–553. doi: 10.1162/coli.07-056-R1-07-027. url: https://doi.org/10.1162/coli.07-056-R1-07-027.

slide-45
SLIDE 45

Transition Costs - Arc-Eager

C(Shift; (σ, b|β, A), T) = |{(k, b) ∈ T | k ∈ σ} ∪ {(b, k) ∈ T | k ∈ σ ∧ ∀x ∈ V : (x, k) ∉ A}|

C(Right; (σ|s, b|β, A), T) = |{(k, b) ∈ T | k ∈ σ ∪ β} ∪ {(b, k) ∈ T | k ∈ σ ∧ ∀x ∈ V : (x, k) ∉ A}|

C(Left; (σ|s, b|β, A), T) = |{(k, s) ∈ T | k ∈ β} ∪ {(s, k) ∈ T | k ∈ β}|

C(Reduce; (σ|s, β, A), T) = |{(s, k) ∈ T | k ∈ β}|
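A compact Python sketch of these four formulas, assuming unlabeled gold arcs as (head, dependent) pairs, token 0 as root, and a projective gold tree. One reading choice is an assumption rather than something the slide states: dependents of b lost by Right are collected over the whole stack including s, and dependents of s lost by Left over the whole buffer including b, because those arcs also become unreachable. The function name and the legality checks are illustrative.

```python
def arc_eager_costs(stack, buf, arcs, gold):
    """Map each legal arc-eager transition in (stack, buf, arcs) to its cost."""
    costs = {}
    if not buf:
        return costs  # terminal configuration: nothing to score
    headed = {d for _, d in arcs}
    no_head = [k for k in stack if k not in headed]
    b, rest = buf[0], buf[1:]

    # Shift: b can no longer take a head or a headless dependent from the stack.
    costs["Shift"] = len(gold & ({(k, b) for k in stack} | {(b, k) for k in no_head}))
    if stack:
        s, below = stack[-1], stack[:-1]
        # Right adds (s, b): b loses any other head and its headless stack dependents.
        costs["Right"] = len(
            gold & ({(k, b) for k in below + rest} | {(b, k) for k in no_head})
        )
        if s != 0 and s not in headed:
            # Left adds (b, s): s loses heads further right and all buffer dependents.
            costs["Left"] = len(gold & ({(k, s) for k in rest} | {(s, k) for k in buf}))
        if s in headed:
            # Reduce pops s: s loses all dependents still in the buffer.
            costs["Reduce"] = len(gold & {(s, k) for k in buf})
    return costs


# "He wrote her a letter": 0=root, 1=He, 2=wrote, 3=her, 4=a, 5=letter.
GOLD = {(0, 2), (2, 1), (2, 3), (2, 5), (5, 4)}

before_iobj = arc_eager_costs([0, 2], [3, 4, 5], {(2, 1), (0, 2)}, GOLD)
# {'Shift': 1, 'Right': 0, 'Reduce': 2}
after_iobj = arc_eager_costs([0, 2, 3], [4, 5], {(2, 1), (0, 2), (2, 3)}, GOLD)
# {'Shift': 0, 'Right': 1, 'Reduce': 0}
```

In the second configuration both Shift and Reduce have zero cost, which is the spurious ambiguity from the earlier slides expressed as a set of optimal transitions.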

slide-46
SLIDE 46

Transition Costs - Hybrid

C(Shift; (σ|s1|s0, b|β, A), T) = |{(b, k) ∈ T | k ∈ {s0, s1} ∪ σ} ∪ {(k, b) ∈ T | k ∈ {s1} ∪ σ}|

C(Right; (σ|s1|s0, β, A), T) = |{(s0, k) ∈ T | k ∈ β} ∪ {(k, s0) ∈ T | k ∈ β}|

C(Left; (σ|s1|s0, b|β, A), T) = |{(s0, k) ∈ T | k ∈ {b} ∪ β} ∪ {(k, s0) ∈ T | k ∈ {s1} ∪ β}|

slide-47
SLIDE 47

Transition Costs - Easy-First

C(Tr; (λ, A), T) = |{(h′, d) ∈ T | h′ ∈ λ ∧ h′ ≠ h} ∪ {(d, d′) ∈ T | d′ ∈ λ}|

where Tr ∈ {Left^i_lb | 1 < i ≤ |λ|} ∪ {Right^i_lb | 1 ≤ i < |λ|} and (h, d) is the arc added by Tr.