Fast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set
Tianze Shi* Liang Huang† Lillian Lee*
* Cornell University
† Oregon State University
exponentially-large search space
[Figure: an INPUT sentence and its OUTPUT dependency tree, with arcs labeled nsubj, root, mark, xcomp, det]
Outline: Background, O(n^3) in theory, O(n^6) in practice, Back to O(n^3), Results
[Figure: search graph leading from the initial state to the terminal states]
Goal: $\max_{\text{sequence}} \mathrm{score}(\text{sequence}) = \max_{\text{sequence}} \sum_{\text{transition} \in \text{sequence}} \mathrm{score}(\text{transition})$
Kuhlmann, Gómez-Rodríguez and Satta, 2011) … since transition (sub-)sequences can be reused
Exponential to polynomial
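To make the gap between an exponential search space and a polynomial DP concrete, the sketch below counts complete transition sequences for arc-hybrid (one of the transition systems covered in this talk) and prints that count next to the number of DP items [i, j] with 0 <= i < j <= n + 1. The conventions are assumptions made for illustration: ROOT is the first token in the buffer and is shifted first, reduce↶ is never applied to ROOT, and `count_sequences` is a hypothetical helper name, not part of the authors' code.

```python
from functools import lru_cache


def count_sequences(n: int) -> int:
    """Number of complete arc-hybrid transition sequences for n words plus ROOT."""

    @lru_cache(maxsize=None)
    def count(stack: int, buf: int) -> int:
        # Only the sizes of the stack and the buffer matter for counting.
        if stack == 1 and buf == 0:
            return 1  # terminal state: only ROOT is left on the stack
        total = 0
        if buf > 0:
            total += count(stack + 1, buf - 1)  # shift
        if stack >= 2 and buf > 0:
            total += count(stack - 1, buf)      # reduce-left: head is b0
        if stack >= 2:
            total += count(stack - 1, buf)      # reduce-right: head is s1
        return total

    return count(0, n + 1)


for n in (1, 2, 3, 10, 20):
    n_items = (n + 1) * (n + 2) // 2  # assumed item space: 0 <= i < j <= n + 1
    print(n, count_sequences(n), n_items)
```

The first column of counts grows much faster than the quadratic number of items, which is the reuse of transition sub-sequences that the DP exploits.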
[Table: DP Complexity and # Transitions for Arc-standard, Arc-eager and Arc-hybrid; annotations: "In our paper", "Presentational convenience"]
Search State
[Figure: a search state drawn as a stack (s0, s1, s2, …) and a buffer (b0, b1, …)]
Initial State: empty stack, buffer ROOT She wanted …
Terminal State: stack ROOT, empty buffer
(Yamada and Matsumoto, 2003) (Gómez-Rodríguez et al., 2008) (Kuhlmann et al., 2011)
Transitions
shift: stack [… s0], buffer [b0 …] → stack [… s0 b0], buffer […]
reduce↶: stack [… s0], buffer [b0 …] → stack […], buffer [b0 …], new arc s0 ↶ b0
reduce↷: stack [… s1 s0], buffer […] → stack [… s1], buffer […], new arc s1 ↷ s0
Same as arc-standard: shift and reduce↷
initial: stack [], buffer [ROOT She wanted to eat an apple]
shift: stack [ROOT], buffer [She wanted to eat an apple]
shift: stack [ROOT She], buffer [wanted to eat an apple]
reduce↶: stack [ROOT], buffer [wanted to eat an apple], new arc She ↶ wanted
shift: stack [ROOT wanted], buffer [to eat an apple]
shift: stack [ROOT wanted to], buffer [eat an apple]
reduce↶: stack [ROOT wanted], buffer [eat an apple], new arc to ↶ eat
shift: stack [ROOT wanted eat], buffer [an apple]
shift: stack [ROOT wanted eat an], buffer [apple]
reduce↶: stack [ROOT wanted eat], buffer [apple], new arc an ↶ apple
shift: stack [ROOT wanted eat apple], buffer []
reduce↷: stack [ROOT wanted eat], buffer [], new arc eat ↷ apple
reduce↷: stack [ROOT wanted], buffer [], new arc wanted ↷ eat
reduce↷: stack [ROOT], buffer [] (terminal), new arc ROOT ↷ wanted
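The walkthrough can be replayed in a few lines of code. The following is a minimal sketch of the arc-hybrid transitions, assuming tokens are referred to by position (ROOT = 0) and arcs are stored as (head, dependent) pairs. The function and constant names are illustrative and not taken from the authors' implementation.

```python
from typing import List, Tuple

SHIFT, REDUCE_LEFT, REDUCE_RIGHT = "shift", "reduce_left", "reduce_right"


def run_arc_hybrid(words: List[str], actions: List[str]) -> List[Tuple[int, int]]:
    """Apply arc-hybrid transitions and return the arcs as (head, dependent) pairs."""
    stack: List[int] = []
    buffer = list(range(len(words)))  # index 0 is ROOT, initially in the buffer
    arcs: List[Tuple[int, int]] = []
    for action in actions:
        if action == SHIFT:
            stack.append(buffer.pop(0))
        elif action == REDUCE_LEFT:
            dependent = stack.pop()
            arcs.append((buffer[0], dependent))  # head is b0
        elif action == REDUCE_RIGHT:
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))  # head is the former s1
        else:
            raise ValueError(f"unknown transition: {action}")
    if stack != [0] or buffer:
        raise ValueError("the sequence does not end in a terminal state")
    return arcs


words = ["ROOT", "She", "wanted", "to", "eat", "an", "apple"]
actions = [
    SHIFT, SHIFT, REDUCE_LEFT,                 # She <- wanted
    SHIFT, SHIFT, REDUCE_LEFT,                 # to <- eat
    SHIFT, SHIFT, REDUCE_LEFT,                 # an <- apple
    SHIFT,
    REDUCE_RIGHT, REDUCE_RIGHT, REDUCE_RIGHT,  # eat -> apple, wanted -> eat, ROOT -> wanted
]
arcs = run_arc_hybrid(words, actions)
for head, dependent in arcs:
    print(f"{words[head]} -> {words[dependent]}")
```

Every token is shifted once and every non-ROOT token is reduced once, so a sentence with n words always takes 2n + 1 transitions under these conventions.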
Token indices: ROOT She wanted to eat an apple = 0 1 2 3 4 5 6 (n)
shift: from a state with s0 = i and b0 = j, shift moves j onto the stack, giving s0 = j and b0 = j + 1.
reduce↶: in a state with s0 = k and b0 = j, reduce↶ pops k and adds the arc k ↶ j. The new stack top (the "?" in the figure) is whatever lay beneath k, namely the stack top i at the moment k was shifted. Combining the sub-sequence [i, k] with the sub-sequence [k, j] therefore yields [i, j].
reduce↷: the same combination of [i, k] and [k, j] yields [i, j], this time adding the arc i ↷ k.
* [i, j] in Kuhlmann et al. (2011)'s notation
Deduction rules:
shift: [i, j] ⊢ [j, j + 1]
reduce↶: [i, k], [k, j] ⊢ [i, j], adding k ↶ j
reduce↷: [i, k], [k, j] ⊢ [i, j], adding i ↷ k
Goal: the item covering the entire sentence
stack and in the buffer
[Figure: a state showing only s0 and b0]
Information about s1 is not available, needs extra bookkeeping
[Figure: shift from s0 = i, b0 = j to s0 = j, b0 = j + 1. With features {s0, b0} the shift score is s_sh(i, j); once s1 is a feature it becomes s_sh(?, i, j), where the index of s1 is unknown]
stack and in the buffer
(Huang and Sagae, 2010)
[Figure: stack s0, s1, s2 and buffer b0, b1, with the child features s1.lc, s1.rc, s0.lc, s0.rc]
Feature representation
Zhao et al., 2013)
et al., 2016)
2016; Wiseman and Rush, 2016)
How Many Positional Features Do We Need?
Non-neural (manual engineering); Chen and Manning (2014)
[Figure: stack s0, s1, s2 and buffer b0, b1, b2, with children s1.lc, s1.rc, s0.lc, s0.rc and grandchildren s0.rc0.rc0, s0.lc0.lc0, s1.rc0.rc0, s1.lc0.lc0]
[Figure: feature sets arranged along an axis labeled "More tree-structure information": Non-neural (manual engineering), Chen and Manning (2014), Stack LSTM (Dyer et al., 2016), Bi-LSTM (Kiperwasser and Goldberg (2016), Cross and Huang (2016)); DP regimes: Exponential DP, Slow DP, Fast DP]
and buffer (Dyer et al., 2016)
[Figure: the entire stack (s0, s1, s2, …) and the entire buffer (b0, b1, b2, …)]
(Kiperwasser and Goldberg, 2016; Cross and Huang, 2016)
[Figure: feature sets {s2, s1, s0, b0} and {s1, s0, b0}]
[Figure: network architecture: Word embeddings + POS embeddings, then a Bi-directional LSTM, then a Multi-layer perceptron over the vectors at the feature positions; shown in turn for {s2, s1, s0, b0}, {s1, s0, b0}, {s0, b0} and {b0}]
… experimented with greedy decoding
[Figure: bar chart of UAS on PTB (dev), axis 40 to 100, for {s2, s1, s0, b0} (±0.13), {s1, s0, b0} (±0.05), {s0, b0} (±0.12) and {b0} (±0.23); the mean values are garbled in the source]
Considered in prior work
[Figure: the minimal feature set: s0 on the stack and b0 in the buffer]
[Figure: a reduce↷ step, showing s1, s0 and b0]
enough information to extract features
global decoders!
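One consequence of scoring with only s0 and b0 is that every transition score is indexed by a pair of positions, so all scores can be tabulated up front with a quadratic number of network calls. The sketch below illustrates this in plain Python. It is not the authors' network: random vectors stand in for the Bi-LSTM outputs, the one-hidden-layer perceptron and all sizes are arbitrary, and the names `mlp_score` and `score_tables` are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def mlp_score(x: List[float], w1: Matrix, b1: List[float],
              w2: Matrix, b2: List[float]) -> List[float]:
    """One tanh hidden layer; returns one score per transition."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def score_tables(vectors: Matrix, w1: Matrix, b1: List[float],
                 w2: Matrix, b2: List[float]) -> List[Matrix]:
    """tables[t][i][j] is the score of transition t in a state with s0 = i, b0 = j."""
    m = len(vectors)
    tables = [[[0.0] * m for _ in range(m)] for _ in range(3)]
    for i in range(m):
        for j in range(m):
            # The feature vector is just the concatenation of two token vectors.
            scores = mlp_score(vectors[i] + vectors[j], w1, b1, w2, b2)
            for t in range(3):
                tables[t][i][j] = scores[t]
    return tables


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim, hidden_size, n_tokens = 4, 8, 7          # arbitrary illustrative sizes
vectors = random_matrix(n_tokens + 1, dim)    # stand-in for Bi-LSTM outputs;
                                              # the extra row models an empty buffer
w1, b1 = random_matrix(hidden_size, 2 * dim), [0.0] * hidden_size
w2, b2 = random_matrix(3, hidden_size), [0.0] * 3
tables = score_tables(vectors, w1, b1, w2, b2)
print(len(tables), len(tables[0]), len(tables[0][0]))
```

A decoder that works over items [i, j] can then look scores up in these tables instead of tracking any further stack contents.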
[Figure: feature sets arranged along an axis labeled "More tree-structure information": Non-neural (manual engineering), Chen and Manning (2014), Stack LSTM (Dyer et al., 2016), Bi-LSTM (Kiperwasser and Goldberg (2016), Cross and Huang (2016)), Our work; DP regimes: Exponential DP, Slow DP, Fast DP, Fast(er) DP]
shift: [i, j] ⊢ [j, j + 1]
Score of the sub-sequence, for reduce↶:
$[i,k] : v_1 \quad [k,j] : v_2 \;\vdash\; [i,j] : v_1 + v_2 + \Delta$
$\Delta = s_{\mathrm{sh}}(i,k) + s_{\mathrm{re}\curvearrowleft}(k,j)$
Here [i, j] denotes a transition sub-sequence (marked * in the figures).
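The scored deduction rules translate into a cubic-time decoder. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: ROOT (index 0) is taken to be on the stack already, so the start item is [0, 1] and the goal item is [0, n + 1]; b0 = n + 1 stands for an empty buffer; each score function sees only (s0, b0); and the initial shift of ROOT is left unscored. All names are illustrative. A brute-force search over all transition sequences is included only to check the DP on a tiny random instance.

```python
import random
from typing import Callable, Dict, List, Tuple

Score = Callable[[int, int], float]  # score of a transition in state (s0, b0)


def arc_hybrid_viterbi(n: int, s_sh: Score, s_la: Score, s_ra: Score):
    """Exact O(n^3) decoding; returns (best score, heads) with heads[0] = -1."""
    neg = float("-inf")
    best = [[neg] * (n + 2) for _ in range(n + 2)]
    back: Dict[Tuple[int, int], Tuple[int, str]] = {}
    for i in range(n + 1):
        best[i][i + 1] = 0.0  # item [i, i + 1]: nothing has happened above i yet
    for width in range(2, n + 2):
        for i in range(n + 2 - width):
            j = i + width
            for k in range(i + 1, j):
                # [i, k] and [k, j] are joined by shifting k in state (i, k) ...
                base = best[i][k] + best[k][j] + s_sh(i, k)
                # ... and reducing k in state (k, j).
                options = [(base + s_ra(k, j), "ra")]
                if j <= n:  # reduce-left needs a real head b0 = j
                    options.append((base + s_la(k, j), "la"))
                for value, op in options:
                    if value > best[i][j]:
                        best[i][j], back[i, j] = value, (k, op)
    heads = [-1] * (n + 1)
    todo = [(0, n + 1)]
    while todo:  # follow back-pointers to recover the tree
        i, j = todo.pop()
        if (i, j) in back:
            k, op = back[i, j]
            heads[k] = i if op == "ra" else j
            todo += [(i, k), (k, j)]
    return best[0][n + 1], heads


def brute_force(n: int, s_sh: Score, s_la: Score, s_ra: Score) -> float:
    """Best score by exhaustive search (exponential time, for checking only)."""
    def go(stack: List[int], b0: int) -> float:
        if len(stack) == 1 and b0 == n + 1:
            return 0.0
        s0, out = stack[-1], []
        if b0 <= n:
            out.append(s_sh(s0, b0) + go(stack + [b0], b0 + 1))
            if len(stack) > 1:
                out.append(s_la(s0, b0) + go(stack[:-1], b0))
        if len(stack) > 1:
            out.append(s_ra(s0, b0) + go(stack[:-1], b0))
        return max(out)
    return go([0], 1)


def lookup(table: List[List[float]]) -> Score:
    return lambda s0, b0: table[s0][b0]


random.seed(0)
n = 5
s_sh, s_la, s_ra = (
    lookup([[random.uniform(-1, 1) for _ in range(n + 2)] for _ in range(n + 2)])
    for _ in range(3)
)
best_score, heads = arc_hybrid_viterbi(n, s_sh, s_la, s_ra)
exhaustive = brute_force(n, s_sh, s_la, s_ra)
print(heads, abs(best_score - exhaustive) < 1e-9)
```

The three nested loops over i, j and k give the cubic running time; the table of items itself is only quadratic in size.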
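Training objective: $\max_{\hat{y}} \left[ \mathrm{score}(\hat{y}) + \mathrm{cost}(\hat{y}) \right] - \mathrm{score}(y^{*})$, where $\hat{y}$ ranges over candidate parses and $y^{*}$ is the gold parse.
Cost-augmented reduce↶:
$[i,k] : v_1 \quad [k,j] : v_2 \;\vdash\; [i,j] : v_1 + v_2 + s_{\mathrm{sh}}(i,k) + s_{\mathrm{re}\curvearrowleft}(k,j) + \mathbb{1}(\mathrm{head}(k) \neq j)$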
[Figure: UAS on Chinese CTB (axis 86.0 to 90.5) and English PTB (axis 93.0 to 96.0). Prior systems, grouped as Local and Global: BGDS16, CH16, DBLMS15, KG16a, KG16b, CFHGD16, DM17, KG16a, KBKDS16, WC16. Added in successive builds: Our Global (Our arc-eager DP, Our arc-hybrid DP), Our Local (Our best local), and Ensemble (15 Our all global; 20 KBKDS16; 5 Our arc-eager DP; 5 Our arc-hybrid DP)]
LAS (Shi, Wu, Chen and Cheng, 2017; Zeman et al., 2017):
Ensemble: 75.00
Exact Arc-eager: 74.32
Exact Arc-hybrid: 74.00
Graph-based: 73.75
(arc-standard, arc-hybrid, arc-eager)
Tianze Shi* Liang Huang† Lillian Lee*
https://github.com/tzshi/dp-parser-emnlp17
* Cornell University † Oregon State University