SLIDE 1
Training Deterministic Parsers with Non-Deterministic Oracles
by Yoav Goldberg and Joakim Nivre, 2013
Seminar talk, Pius Meinert, July 13, 2018
SLIDE 2
SLIDE 3
[Figure: dependency tree for "root He1 wrote2 her3 a4 letter5" with arcs root → wrote (PRD), wrote → He (SBJ), wrote → her (IOBJ), wrote → letter (DOBJ), letter → a (DET)]
SLIDE 4
Transition System
Definition (Transition System) A transition system for dependency parsing is a quadruple S = (C, T, cs, Ct), where
1. C is a set of configurations,
2. T is a set of transitions, each of which is a (partial) function t : C → C,
3. cs is an initialization function, mapping a sentence w = w1 w2 ... wn to a configuration c ∈ C,
4. Ct ⊆ C is a set of terminal configurations.
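The quadruple above can be made concrete with a small sketch. The Python below is an illustration only, assuming the arc-eager system used in the example slides that follow and an empty buffer as the terminal condition. The names `Config`, `initial`, `is_terminal`, `has_head`, `shift`, `left`, `right` and `reduce_` are hypothetical and not from the talk. Tokens are represented by their indices, with 0 standing for root.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple

Arc = Tuple[int, int, str]  # (head, dependent, label)

class Config(NamedTuple):
    """A configuration c = (stack, buffer, arcs), i.e. an element of C."""
    stack: Tuple[int, ...]
    buffer: Tuple[int, ...]
    arcs: FrozenSet[Arc]

def initial(n: int) -> Config:
    """cs: map a sentence of n tokens to ([root], [1..n], {})."""
    return Config((0,), tuple(range(1, n + 1)), frozenset())

def is_terminal(c: Config) -> bool:
    """Ct: configurations with an empty buffer (an assumption of this sketch)."""
    return not c.buffer

def has_head(c: Config, node: int) -> bool:
    return any(d == node for _, d, _ in c.arcs)

# Transitions are partial functions: they return None where undefined.
def shift(c: Config) -> Optional[Config]:
    if not c.buffer:
        return None
    return Config(c.stack + c.buffer[:1], c.buffer[1:], c.arcs)

def left(c: Config, label: str) -> Optional[Config]:
    if not c.stack or not c.buffer:
        return None
    s, b = c.stack[-1], c.buffer[0]
    if s == 0 or has_head(c, s):
        return None
    return Config(c.stack[:-1], c.buffer, c.arcs | {(b, s, label)})

def right(c: Config, label: str) -> Optional[Config]:
    if not c.stack or not c.buffer:
        return None
    s, b = c.stack[-1], c.buffer[0]
    return Config(c.stack + (b,), c.buffer[1:], c.arcs | {(s, b, label)})

def reduce_(c: Config) -> Optional[Config]:
    if not c.stack or not has_head(c, c.stack[-1]):
        return None
    return Config(c.stack[:-1], c.buffer, c.arcs)

# Replay of the example derivation for "He wrote her a letter".
c = initial(5)
for step in (shift, lambda c: left(c, "SBJ"), lambda c: right(c, "PRD"),
             lambda c: right(c, "IOBJ"), shift, lambda c: left(c, "DET"),
             reduce_, lambda c: right(c, "DOBJ")):
    c = step(c)
final = c
```

After the replay, `final.stack` is `(0, 2, 5)`, which corresponds to [root, wrote2, letter5], and the buffer is empty.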
SLIDE 5
cs(w), w = root He1 wrote2 her3 a4 letter5
SLIDE 6
cs(w) = ([root], [He1, wrote2, her3, a4, letter5], {})
SLIDE 7
([root], [He1, wrote2, her3, a4, letter5]) ⊢_Shift ([root, He1], [wrote2, her3, a4, letter5])
SLIDE 8
([root, He1], [wrote2, her3, a4, letter5]) ⊢_Left_SBJ ([root], [wrote2, her3, a4, letter5])
SLIDE 9
([root], [wrote2, her3, a4, letter5]) ⊢_Right_PRD ([root, wrote2], [her3, a4, letter5])
SLIDE 10
([root, wrote2], [her3, a4, letter5]) ⊢* ([root, wrote2, letter5], [ ]) ∈ Ct
via Right_IOBJ, Shift, Left_DET, Reduce, Right_DOBJ
SLIDE 11
if c = (σ|i, j|β, A) and (j, i) ∈ T then
    t ← Left
else if c = (σ|i, j|β, A) and (i, j) ∈ T then
    t ← Right
else if c = (σ|i, j|β, A) and ∃k [k < i ∧ ((k, j) ∈ T ∨ (j, k) ∈ T)] then
    t ← Reduce
else
    t ← Shift
return t
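The oracle on this slide can be sketched in Python. This is a minimal illustration, assuming unlabeled arcs, token indices with 0 as root, a stack that is never empty, and the arc-eager transitions from the example. `static_oracle` and `apply` are hypothetical names, not part of the talk.

```python
def static_oracle(stack, buffer, gold):
    """Pick one transition from the gold tree, following the rules above.

    gold is a set of unlabeled (head, dependent) arcs.
    Assumes stack and buffer are both non-empty.
    """
    i, j = stack[-1], buffer[0]
    if (j, i) in gold:
        return "Left"
    if (i, j) in gold:
        return "Right"
    # Some node k < i is linked to j in the gold tree, so i has to go.
    if any((k, j) in gold or (j, k) in gold for k in range(i)):
        return "Reduce"
    return "Shift"

def apply(t, stack, buffer, arcs):
    """Unlabeled arc-eager transitions on (stack, buffer, arcs)."""
    if t == "Shift":
        return stack + [buffer[0]], buffer[1:], arcs
    if t == "Left":
        return stack[:-1], buffer, arcs | {(buffer[0], stack[-1])}
    if t == "Right":
        return stack + [buffer[0]], buffer[1:], arcs | {(stack[-1], buffer[0])}
    return stack[:-1], buffer, arcs  # Reduce

# Gold tree for "root He1 wrote2 her3 a4 letter5".
gold = {(0, 2), (2, 1), (2, 3), (2, 5), (5, 4)}
stack, buffer, arcs, sequence = [0], [1, 2, 3, 4, 5], set(), []
while buffer:
    t = static_oracle(stack, buffer, gold)
    sequence.append(t)
    stack, buffer, arcs = apply(t, stack, buffer, arcs)
```

On the example sentence the loop yields Shift, Left, Right, Right, Shift, Left, Reduce, Right and rebuilds the gold arcs. The oracle returns exactly one transition per configuration, which is what makes it static.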
SLIDE 12
Greedy Classifier-based Parsing
c ← cs(w)
while c ∉ Ct do
    tp ← arg max_{t ∈ Legal(c)} w · φ(c, t)
    c ← tp(c)
return Ac
SLIDE 13
for (w, T) ∈ d do
    c ← cs(w)
    while c ∉ Ct do
        tp ← arg max_{t ∈ Legal(c)} w · φ(c, t)
        Correct(c) ← {t | o(t; c, T) = true}
        to ← arg max_{t ∈ Correct(c)} w · φ(c, t)
        if tp ∉ Correct(c) then
            Update(w, φ(c, to), φ(c, tp))
            c ← to(c)
        else
            c ← tp(c)
return w
SLIDE 14
SH, LA_SBJ, RA_PRD, RA_IOBJ, SH, LA_DET, RE, RA_DOBJ
SH, LA_SBJ, RA_PRD, RA_IOBJ, RE, SH, LA_DET, RA_DOBJ
→ spurious ambiguity requires non-deterministic oracle instead of static oracle
SLIDE 15
SH, LA_SBJ, RA_PRD, RA_IOBJ, SH, LA_DET, RE, RA_DOBJ
SH, LA_SBJ, RA_PRD, RA_IOBJ, RE, SH, LA_DET, RA_DOBJ
→ spurious ambiguity requires non-deterministic oracle instead of static oracle
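The spurious ambiguity can be checked mechanically. The sketch below assumes unlabeled arc-eager transitions and token indices with 0 as root. `run`, `seq_a` and `seq_b` are hypothetical names. It replays the two sequences from this slide and compares the resulting arc sets.

```python
def run(transitions, n):
    """Apply unlabeled arc-eager transitions; return the (head, dependent) arcs."""
    stack, buffer, arcs = [0], list(range(1, n + 1)), set()
    for t in transitions:
        if t == "SH":
            stack.append(buffer.pop(0))
        elif t == "LA":
            arcs.add((buffer[0], stack.pop()))
        elif t == "RA":
            arcs.add((stack[-1], buffer[0]))
            stack.append(buffer.pop(0))
        else:  # "RE"
            stack.pop()
    return arcs

# The two transition sequences shown on the slide, labels omitted.
seq_a = ["SH", "LA", "RA", "RA", "SH", "LA", "RE", "RA"]
seq_b = ["SH", "LA", "RA", "RA", "RE", "SH", "LA", "RA"]

tree_a = run(seq_a, 5)
tree_b = run(seq_b, 5)
```

Both sequences differ only in where Reduce is applied, yet `tree_a` and `tree_b` are the same arc set. A static oracle would accept only one of them as correct.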
SLIDE 16
... with Non-Deterministic and Complete Oracles
[Figure: "root He1 wrote2 her3 a4 letter5" with arc labels PRD, SBJ, DOBJ, DET]
([root], [He1, wrote2, her3, a4, letter5]) ⊢* ([root, wrote2, her3], [a4, letter5])
via SH, LA_SBJ, RA_PRD, SH
→ error propagation can be mitigated by complete oracle
→ dynamic oracle: non-deterministic + complete
SLIDE 17
... with Non-Deterministic and Complete Oracles
[Figure: "root He1 wrote2 her3 a4 letter5" with arc labels PRD, SBJ, DOBJ, DET]
([root, wrote2, her3], [a4, letter5]) ⊢* ([root, wrote2, her3, letter5], [ ]) ∈ Ct
via SH, LA_DET, SH
→ error propagation can be mitigated by complete oracle
→ dynamic oracle: non-deterministic + complete
SLIDE 18
Training (Standard)
for (w, T) ∈ d do
    c ← cs(w)
    while c ∉ Ct do
        tp ← arg max_{t ∈ Legal(c)} w · φ(c, t)
        Correct(c) ← {t | o(t; c, T) = true}
        to ← arg max_{t ∈ Correct(c)} w · φ(c, t)
        if tp ∉ Correct(c) then
            Update(w, φ(c, to), φ(c, tp))
            c ← to(c)
        else
            c ← tp(c)
return w
SLIDE 19
Training with Exploration
for (w, T) ∈ d do
    c ← cs(w)
    while c ∉ Ct do
        tp ← arg max_{t ∈ Legal(c)} w · φ(c, t)
        Optimal(c) ← {t | o(t; c, T) = true}
        to ← arg max_{t ∈ Optimal(c)} w · φ(c, t)
        if tp ∉ Optimal(c) then
            Update(w, φ(c, to), φ(c, tp))
            c ← Explore(c, to, tp)
        else
            c ← tp(c)
return w
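The loop above can be sketched as generic Python. This is an illustration under stated assumptions, not the talk's implementation: the slide does not spell out Update or Explore, so the sketch assumes a plain perceptron update and an exploration policy that follows the wrong prediction with probability `p` after the first `k` epochs. `train`, `ToySystem`, and the `system` interface (`init`, `is_terminal`, `legal`, `apply`, `zero_cost`) are hypothetical names. The toy system is a stand-in for a parser, used only to make the loop runnable.

```python
import random
from collections import defaultdict

def train(data, system, features, epochs=3, k=1, p=0.9, seed=0):
    """Perceptron-style training with exploration (assumed policy, see above)."""
    weights = defaultdict(float)
    rng = random.Random(seed)

    def score(c, t):
        return sum(weights[f] for f in features(c, t))

    for epoch in range(epochs):
        for sentence, gold in data:
            c = system.init(sentence)
            while not system.is_terminal(c):
                t_p = max(system.legal(c), key=lambda t: score(c, t))
                optimal = system.zero_cost(c, gold)
                # sorted() only makes tie-breaking reproducible.
                t_o = max(sorted(optimal), key=lambda t: score(c, t))
                if t_p not in optimal:
                    for f in features(c, t_o):
                        weights[f] += 1.0
                    for f in features(c, t_p):
                        weights[f] -= 1.0
                    # Explore: sometimes continue from the erroneous transition.
                    explore = epoch >= k and rng.random() < p
                    c = system.apply(c, t_p if explore else t_o)
                else:
                    c = system.apply(c, t_p)
    return weights

class ToySystem:
    """Toy stand-in, not a parser: a counter that stops at 3; only "b" is optimal."""
    def init(self, sentence):
        return 0
    def is_terminal(self, c):
        return c == 3
    def legal(self, c):
        return ["a", "b"]
    def apply(self, c, t):
        return c + 1
    def zero_cost(self, c, gold):
        return {"b"}

weights = train([("x", None)], ToySystem(), lambda c, t: [t])
```

With a real transition system, `zero_cost` would be the dynamic oracle, so that training can continue from configurations that are no longer on a path to the gold tree.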
SLIDE 20
Optimality / Transition Costs
[Figure: dependency trees for "root He1 wrote2 her3 a4 letter5", arc labels PRD, SBJ, DOBJ, IOBJ, DOBJ, DET]
C(A, T) = 2
SLIDE 21
Optimality / Transition Costs
[Figure: dependency trees for "root He1 wrote2 her3 a4 letter5", arc labels PRD, SBJ, DOBJ, IOBJ, DOBJ, DET]
C(A, T) = 2
SLIDE 22
Optimality / Transition Costs
[Figure: dependency trees for "root He1 wrote2 her3 a4 letter5", arc labels PRD, SBJ, DOBJ, IOBJ, DOBJ, DET]
([root, wrote2, her3], [a4, letter5])
SLIDE 23
Optimality / Transition Costs
[Figure: dependency trees for "root He1 wrote2 her3 a4 letter5", arc labels PRD, SBJ, DOBJ, DOBJ, IOBJ, IOBJ, DOBJ, DET]
min_{A : c ⇝ A} C(A, T) = 0
SLIDE 24
Optimality / Transition Costs
[Figure: dependency trees for "root He1 wrote2 her3 a4 letter5", arc labels PRD, SBJ, DOBJ, IOBJ, DOBJ, DET]
([root, wrote2, her3], [a4, letter5])
SH, ...
SLIDE 25
Optimality / Transition Costs
[Figure: dependency trees for "root He1 wrote2 her3 a4 letter5", arc labels PRD, SBJ, DOBJ, DOBJ, IOBJ, DOBJ, DET]
C(Shift; c, T) = min_{A : t(c) ⇝ A} C(A, T) − min_{A : c ⇝ A} C(A, T) = 1
SLIDE 26
Optimality / Transition Costs
[Figure: dependency trees for "root He1 wrote2 her3 a4 letter5", arc labels PRD, SBJ, DOBJ, DOBJ, IOBJ, DOBJ, DET]
C(Shift; c, T) = min_{A : t(c) ⇝ A} C(A, T) − min_{A : c ⇝ A} C(A, T) = 1
o_d(c, T) = {t | C(t; c, T) = 0}
SLIDE 27
Arc Decomposition - Definition
Definition (Tree Consistency) A set of arcs A is said to be tree consistent if there exists a projective dependency tree T such that A ⊆ T.
Definition (Arc Decomposition) A transition system is said to be arc decomposable if, for every tree consistent arc set A and configuration c, c ⇝ A is entailed by c ⇝ (h, d) for every arc (h, d) ∈ A.
SLIDE 28
Arc Decomposition - Arc-Standard Counterexample
c = ([a, b, c], β)
[Figure: nodes a, b, c]
Arc-Standard Transitions
Left[(σ|s1|s0, β, A)] = (σ|s0, β, A ∪ {(s0, s1)})
Right[(σ|s1|s0, β, A)] = (σ|s1, β, A ∪ {(s1, s0)})
Shift[(σ, b|β, A)] = (σ|b, β, A)
SLIDE 29
Arc Decomposition - Arc-Standard Counterexample
c = ([a, b, c], β) ⊢_Left ([a, c], β)
[Figure: nodes a, b, c]
Arc-Standard Transitions
Left[(σ|s1|s0, β, A)] = (σ|s0, β, A ∪ {(s0, s1)})
Right[(σ|s1|s0, β, A)] = (σ|s1, β, A ∪ {(s1, s0)})
Shift[(σ, b|β, A)] = (σ|b, β, A)
SLIDE 30
Arc Decomposition - Arc-Standard Counterexample
c = ([a, b, c], β) ⊢_Right ([a, b], β) ⊢_Left ([b], β)
[Figure: nodes a, b, c]
Arc-Standard Transitions
Left[(σ|s1|s0, β, A)] = (σ|s0, β, A ∪ {(s0, s1)})
Right[(σ|s1|s0, β, A)] = (σ|s1, β, A ∪ {(s1, s0)})
Shift[(σ, b|β, A)] = (σ|b, β, A)
SLIDE 31
Arc Decomposition - Arc-Eager Proof Sketch
Given: an arbitrary configuration c = (σ, β, A) and a tree consistent arc set A′ such that all arcs in A′ are individually reachable from c.
To show: c ⇝ A′
B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
Bh = {(h, d) | h ∈ β, d ∈ σ}
Bd = {(h, d) | d ∈ β, h ∈ σ}
SLIDE 32
Arc Decomposition - Arc-Eager Proof Sketch
[Figure: tokens 1 to 8 divided into stack σ and buffer β]
B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
Bh = {(h, d) | h ∈ β, d ∈ σ}
Bd = {(h, d) | d ∈ β, h ∈ σ}
SLIDE 33
Arc Decomposition - Arc-Eager Proof Sketch
[Figure: tokens 1 to 8 divided into stack σ and buffer β]
B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
Bh = {(h, d) | h ∈ β, d ∈ σ}
Bd = {(h, d) | d ∈ β, h ∈ σ}
SLIDE 34
Arc Decomposition - Arc-Eager Proof Sketch
[Figure: tokens 1 to 8 divided into stack σ and buffer β]
B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
Bh = {(h, d) | h ∈ β, d ∈ σ}
Bd = {(h, d) | d ∈ β, h ∈ σ}
SLIDE 35
Arc Decomposition - Arc-Eager Proof Sketch
[Figure: tokens 1 to 8 divided into stack σ and buffer β]
B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
Bh = {(h, d) | h ∈ β, d ∈ σ}
Bd = {(h, d) | d ∈ β, h ∈ σ}
SLIDE 36
Arc Decomposition - Arc-Eager Proof Sketch
[Figure: tokens 1 to 8 divided into stack σ and buffer β]
B̄ = {(h, d) | h, d ∉ β}
B = {(h, d) | h, d ∈ β}
Bh = {(h, d) | h ∈ β, d ∈ σ}
Bd = {(h, d) | d ∈ β, h ∈ σ}
SLIDE 37
Dynamic Oracle
o_d(c, T) = {t | C(t; c, T) = 0}
C(t; c, T) = min_{A : t(c) ⇝ A} C(A, T) − min_{A : c ⇝ A} C(A, T)
Efficiently compute transition costs:
1. Intersect the set of individually reachable arcs with the goal arc set.
2. This yields the set of individually reachable goal arcs and thus, by arc decomposability, the reachable goal arc set.
3. See how a given transition affects this set of reachable arcs.
SLIDE 38
Transition Systems
✓ Arc-Eager
- Nivre 2003
- Goldberg and Nivre 2012
✗ Arc-Standard
- Nivre 2004
- Goldberg, Sartorio, and Satta 2014
✓ Hybrid
- Kuhlmann, Gómez-Rodríguez, and Satta 2011
✓ Easy-First
- Goldberg and Elhadad 2010
SLIDE 39
Conclusion
- spurious ambiguity
  static → non-deterministic oracle
- error propagation
  incomplete → complete oracle
- dynamic oracle (non-deterministic + complete)
  arc decomposability
- good runtime
- optimization during training
- experiments show improved accuracy
SLIDE 40
References i
Yoav Goldberg and Michael Elhadad. “An Efficient Algorithm for Easy-First Non-Directional Dependency Parsing”. In: Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 2-4, 2010, Los Angeles, California, USA. 2010, pp. 742–750. url: http://www.aclweb.org/anthology/N10-1115.
SLIDE 41
References ii
Yoav Goldberg and Joakim Nivre. “A Dynamic Oracle for Arc-Eager Dependency Parsing”. In: COLING 2012, 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 8-15 December 2012, Mumbai, India. 2012, pp. 959–976. url: http://aclweb.org/anthology/C/C12/C12-1059.pdf.
Yoav Goldberg and Joakim Nivre. “Training Deterministic Parsers with Non-Deterministic Oracles”. In: TACL 1 (2013), pp. 403–414. url: https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/145.
SLIDE 42
References iii
Yoav Goldberg, Francesco Sartorio, and Giorgio Satta. “A Tabular Method for Dynamic Oracles in Transition-Based Parsing”. In: TACL 2 (2014), pp. 119–130. url: https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/302.
SLIDE 43
References iv
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. “Dynamic Programming Algorithms for Transition-Based Dependency Parsers”. In: The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA. 2011, pp. 673–682. url: http://www.aclweb.org/anthology/P11-1068.
Joakim Nivre. “An Efficient Algorithm for Projective Dependency Parsing”. In: Proceedings of the 8th International Workshop on Parsing Technologies (IWPT). 2003, pp. 149–160.
SLIDE 44
References v
Joakim Nivre. “Incrementality in Deterministic Dependency Parsing”. In: Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together. 2004. url: http://www.aclweb.org/anthology/W04-0308.
Joakim Nivre. “Algorithms for Deterministic Incremental Dependency Parsing”. In: Computational Linguistics 34.4 (2008), pp. 513–553. doi: 10.1162/coli.07-056-R1-07-027. url: https://doi.org/10.1162/coli.07-056-R1-07-027.
SLIDE 45
Transition Costs - Arc-Eager
C(Shift; (σ, b|β, A), T) = |{(k, b) ∈ T | k ∈ σ} ∪ {(b, k) ∈ T | k ∈ σ ∧ ∀x ∈ V : (x, k) ∉ A}|
C(Right; (σ|s, b|β, A), T) = |{(k, b) ∈ T | k ∈ σ ∪ β} ∪ {(b, k) ∈ T | k ∈ σ ∧ ∀x ∈ V : (x, k) ∉ A}|
C(Left; (σ|s, b|β, A), T) = |{(k, s) ∈ T | k ∈ β} ∪ {(s, k) ∈ T | k ∈ β}|
C(Reduce; (σ|s, β, A), T) = |{(s, k) ∈ T | k ∈ β}|
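The four cost formulas translate almost directly into set operations. The sketch below is an illustration, assuming unlabeled arcs, tuples of token indices with 0 as root, and the usual arc-eager preconditions for legality (no Left on root or on a token that already has a head, Reduce only on a token that has a head). `legal`, `cost` and `dynamic_oracle` are hypothetical names.

```python
def legal(stack, buffer, arcs):
    """Legal arc-eager transitions under the assumed preconditions."""
    headed = {d for _, d in arcs}
    ts = []
    if buffer:
        ts.append("Shift")
        if stack:
            ts.append("Right")
            if stack[-1] != 0 and stack[-1] not in headed:
                ts.append("Left")
    if stack and stack[-1] in headed:
        ts.append("Reduce")
    return ts

def cost(t, stack, buffer, arcs, gold):
    """Number of gold arcs that become unreachable after transition t."""
    headed = {d for _, d in arcs}
    if t == "Shift":
        b = buffer[0]
        lost = {(k, b) for k in stack} | {(b, k) for k in stack if k not in headed}
    elif t == "Right":
        b, sigma, beta = buffer[0], stack[:-1], buffer[1:]
        lost = {(k, b) for k in sigma + beta} | {(b, k) for k in sigma if k not in headed}
    elif t == "Left":
        s, beta = stack[-1], buffer[1:]
        lost = {(k, s) for k in beta} | {(s, k) for k in beta}
    else:  # Reduce
        s = stack[-1]
        lost = {(s, k) for k in buffer}
    return len(lost & gold)

def dynamic_oracle(stack, buffer, arcs, gold):
    """All legal transitions with zero cost."""
    return [t for t in legal(stack, buffer, arcs) if cost(t, stack, buffer, arcs, gold) == 0]

# Gold tree for "root He1 wrote2 her3 a4 letter5".
gold = {(0, 2), (2, 1), (2, 3), (2, 5), (5, 4)}
built = {(0, 2), (2, 1)}
on_path = dynamic_oracle((0, 2), (3, 4, 5), built, gold)
off_path = dynamic_oracle((0, 2, 3), (4, 5), built, gold)
```

At ([root, wrote2], [her3, a4, letter5]) the sketch gives Shift a cost of 1 and Right a cost of 0, so `on_path` contains only Right. At ([root, wrote2, her3], [a4, letter5]), where the arc to her3 can no longer be built, `off_path` contains Shift and Left: the oracle stays defined off the gold path and can return more than one transition.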
SLIDE 46
Transition Costs - Hybrid
C(Shift; (σ|s1|s0, b|β, A), T) = |{(b, k) ∈ T | k ∈ {s0, s1} ∪ σ} ∪ {(k, b) ∈ T | k ∈ {s1} ∪ σ}|
C(Right; (σ|s1|s0, β, A), T) = |{(s0, k) ∈ T | k ∈ β} ∪ {(k, s0) ∈ T | k ∈ β}|
C(Left; (σ|s1|s0, b|β, A), T) = |{(s0, k) ∈ T | k ∈ {b} ∪ β} ∪ {(k, s0) ∈ T | k ∈ {s1} ∪ β}|
SLIDE 47