Linear Time Constituency Parsing with RNNs and Dynamic Programming
Juneki Hong 1, Liang Huang 1,2
1 Oregon State University
2 Baidu Research Silicon Valley AI Lab
Span Parsing is SOTA in Constituency Parsing
Cross+Huang 2016 introduced Span
2
Accuracy Speed
Cross + Huang 2016
Stern et al. 2017
Our Work
Kitaev + Klein 2018 Joshi et al. 2018
New at ACL 2018! Also Span Parsing!
3
Model                                        F1
Baseline Chart Parser (Stern et al. 2017a)   91.79
Our Linear Time Parser                       91.97
[Figure: chart parsing]
4
5
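Sentence: ⟨s⟩ You should eat ice cream ⟨/s⟩
Forward states: f_0 f_1 f_2 f_3 f_4 f_5; backward states: b_0 b_1 b_2 b_3 b_4 b_5
Span (i, j) features: $(f_j - f_i,\; b_i - b_j)$
Score of span (i, j) with label X: $s(i, j, X)$
Tree score: $s(t) = \sum_{(i,j,X) \in t} s(i, j, X)$
Cross + Huang 2016; Stern et al. 2017; Wang + Chang 2016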
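A minimal sketch of the span scoring on this slide, assuming the forward and backward states are plain Python lists of floats. The names `span_features`, `tree_score`, and the toy `label_score` are illustrative stand-ins, not the authors' code; the real scorer is a learned network.

```python
def span_features(f, b, i, j):
    """Represent span (i, j) as the concatenation (f_j - f_i, b_i - b_j)."""
    forward = [fj - fi for fi, fj in zip(f[i], f[j])]
    backward = [bi - bj for bi, bj in zip(b[i], b[j])]
    return forward + backward


def tree_score(tree, label_score):
    """s(t): sum of s(i, j, X) over all labeled spans (i, j, X) in the tree."""
    return sum(label_score(i, j, X) for (i, j, X) in tree)


# Toy 1-dimensional states for a 2-word sentence (fenceposts 0..2).
f = [[0.0], [1.0], [3.0]]
b = [[5.0], [2.0], [0.0]]
features = span_features(f, b, 0, 2)  # [f_2 - f_0, b_0 - b_2]


def label_score(i, j, X):
    """Hypothetical stand-in for the learned scorer s(i, j, X)."""
    return float(j - i) if X is not None else 0.0


total = tree_score([(0, 2, "S"), (0, 1, None), (1, 2, "NP")], label_score)
print(features, total)
```

Because the tree score decomposes over spans, any decoder that builds a tree span by span can accumulate it incrementally.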
6
Cross + Huang 2016
Sentence: Eat/VB ice/NN cream/NN after/IN lunch/NN (spans are indexed by fenceposts 0-5)
Gold tree: (S (VP (VB Eat) (NP (NN ice) (NN cream)) (PP (IN after) (NP (NN lunch)))))

Step  Action  Label  Stack
1     Shift   ø      (0, 1)
2     Shift   ø      (0, 1) (1, 2)
3     Shift   ø      (0, 1) (1, 2) (2, 3)
4     Reduce  NP     (0, 1) (1, 3)
5     Reduce  ø      (0, 3)
6     Shift   ø      (0, 3) (3, 4)
7     Shift   NP     (0, 3) (3, 4) (4, 5)
8     Reduce  PP     (0, 3) (3, 5)
9     Reduce  S-VP   (0, 5)
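The stack column of the action table can be reproduced by replaying the structural actions. This is a small sketch with a hypothetical helper `run_actions`; it tracks only the spans on the stack and leaves out the label chosen at each step.

```python
def run_actions(actions):
    """Replay shift/reduce actions; return a copy of the span stack after each step."""
    stack, trace = [], []
    for action in actions:
        if action == "shift":
            # Push the next one-word span (j, j + 1), where j is the current right edge.
            j = stack[-1][1] if stack else 0
            stack.append((j, j + 1))
        elif action == "reduce":
            # Merge the top two adjacent spans (i, k) and (k, j) into (i, j).
            _, j = stack.pop()
            i, _ = stack.pop()
            stack.append((i, j))
        else:
            raise ValueError(f"unknown action: {action}")
        trace.append(list(stack))
    return trace


actions = ["shift", "shift", "shift", "reduce", "reduce",
           "shift", "shift", "reduce", "reduce"]
trace = run_actions(actions)
for step, stack in enumerate(trace, start=1):
    print(step, actions[step - 1], stack)
```

A sentence of n words always takes n shifts and n − 1 reduces, so the replay has 2n − 1 steps (9 for this 5-word example).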
16
O(2^n)
17
becomes
Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)
18
Left Pointers
Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)
19
Left Pointers
Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)
[Figure: edges labeled reduce, reduce]
20
O(2^n)
O(n^3)
21
O(2^n)  O(n^4)
[Figure: graph-structured stack lattice for the 5-word example, built up over steps 1-9. Nodes are top-of-stack spans (i, j), starting from the empty stack ε; edges are shift (sh) and reduce (r) actions. Left pointers and the gold parse are highlighted.]
Gold: Shift (0, 1), Shift (1, 2), Shift (2, 3), Reduce (1, 3), Reduce (0, 3), Shift (3, 4), Shift (4, 5), Reduce (3, 5), Reduce (0, 5)
29
Huang+Sagae 2010
#steps: 2n − 1 = O(n)
#states per step: O(n^2), indexed by the top span (i, j)
#left pointers per state: O(n)
Check out the paper for our new theorem.
Thanks to Dezhong Deng!
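To make the counts above concrete, here is a sketch that enumerates the graph-structured stack lattice for a sentence of n words. It assumes a state is identified by its step and top span (i, j), and that each state stores left pointers to the spans that can sit directly beneath it. `build_lattice` and `EPS` are hypothetical names for this illustration; scores and labels are omitted.

```python
EPS = (0, 0)  # stand-in for the empty stack ε (an assumption of this sketch)


def build_lattice(n):
    """Return lattice[step] = {top span (i, j): set of left-pointer spans (k, i)}."""
    steps = 2 * n - 1
    lattice = [dict() for _ in range(steps + 1)]
    lattice[0][EPS] = set()
    for step in range(steps):
        for (i, j), left in lattice[step].items():
            if j < n:
                # Shift: push (j, j + 1); its left pointer is the current top span.
                lattice[step + 1].setdefault((j, j + 1), set()).add((i, j))
            for below in left:
                if below == EPS:
                    continue  # nothing beneath the top span to merge with
                k = below[0]
                # Building (i, j) took 2 * (j - i) - 1 steps, so the state for
                # `below` lives that many steps earlier.
                below_step = step - (2 * (j - i) - 1)
                # Reduce: merge (k, i) and (i, j) into (k, j), which inherits
                # the left pointers of the state it merged with.
                merged = lattice[step + 1].setdefault((k, j), set())
                merged.update(lattice[below_step][below])
    return lattice


lattice = build_lattice(5)
for step, states in enumerate(lattice):
    print(step, sorted(states))
```

With O(n) steps, at most O(n^2) top spans per step, and O(n) left pointers to try per reduce, the work multiplies out to the O(n^4) shown on the slides.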
34
[Figure: the same lattice, shown first in full and then with states progressively pruned away.]
b states per action step
O(b) left pointers per state
Chiang 2007; Huang+Chiang 2007
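A minimal sketch of the pruning step implied by "b states per action step", assuming each state of a step carries a single model score. The helper `prune` and the toy scores are made up for illustration; the cube pruning of Chiang 2007 and Huang+Chiang 2007 is not shown here.

```python
import heapq


def prune(states, b):
    """Keep the b highest-scoring states of one action step."""
    return dict(heapq.nlargest(b, states.items(), key=lambda item: item[1]))


# Toy scores for the top spans of one step.
step_states = {(3, 4): 1.2, (1, 3): 2.5, (2, 3): 0.4, (0, 3): 1.9}
beam = prune(step_states, b=2)
print(beam)
```

With O(n) steps, b states per step, and O(b) left pointers per state, the total work is O(n b^2), which is linear in sentence length for a fixed beam size.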
42
[Figure: chart parsing]
43
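$\forall t,\; s(t^*) - s(t) \ge \Delta(t, t^*)$
$\hat{t} = \arg\max_t \big[ s(t) + \Delta(t, t^*) \big]$, with loss-augmented score $s(\hat{t}) + \Delta(\hat{t}, t^*)$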
44
$\Delta(t, t^*) = \sum_{(i,j,X) \in t} \mathbb{1}\big(X \ne t^*_{(i,j)}\big)$
45
[Figure: two example spans (i, j), each with gold label $t^*_{(i,j)} = \varnothing$]
46
$\Delta(t, t^*) = \sum_{(i,j,X) \in t} \mathbb{1}\big(X \ne t^*_{(i,j)}\big)$
47
$\Delta(t, t^*) = \sum_{(i,j,X) \in t} \mathbb{1}\big(X \ne t^*_{(i,j)} \lor \mathrm{cross}(i, j, t^*)\big)$
[Figure: span (i, j)]
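The difference between the two losses can be sketched on the "Eat ice cream after lunch" example. This assumes the gold tree is stored as a dict from span to label, with unlabeled spans absent (so their gold label is `None`, standing for Ø), and that cross(i, j, t*) is the usual crossing-brackets test against the labeled gold spans. All function names are illustrative.

```python
def crosses(i, j, gold):
    """True if span (i, j) crosses some labeled gold span (k, m)."""
    return any(i < k < j < m or k < i < m < j for (k, m) in gold)


def baseline_loss(tree, gold):
    """Count predicted spans whose label differs from the gold label."""
    return sum(1 for (i, j, X) in tree if X != gold.get((i, j)))


def cross_span_loss(tree, gold):
    """Also penalize any predicted span that crosses a gold span."""
    return sum(1 for (i, j, X) in tree
               if X != gold.get((i, j)) or crosses(i, j, gold))


gold = {(0, 5): "S-VP", (1, 3): "NP", (3, 5): "PP", (4, 5): "NP"}

# A wrong bracketing: Eat (ice ((cream after) lunch)), with Ø on the bad spans.
predicted = [(0, 1, None), (1, 2, None), (2, 3, None), (3, 4, None),
             (4, 5, "NP"), (2, 4, None), (2, 5, None), (1, 5, None),
             (0, 5, "S-VP")]

print(baseline_loss(predicted, gold), cross_span_loss(predicted, gold))
```

In this toy case the label-only loss sees nothing wrong, because every crossing span is labeled Ø and the gold label of those spans is also Ø, while the cross-span term charges the two spans, (2, 4) and (2, 5), that cut across the gold NP (1, 3).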
48
Huang et al. 2012
Update strategies: early, max-violation, latest, full (standard)
[Figure labels: best in the beam; worst in the beam; correct sequence; falls off the beam; biggest violation; last valid update; invalid update!]
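As a rough illustration of the max-violation strategy, the sketch below picks the step at which the best item in the beam outscores the correct (gold) prefix by the largest amount. It assumes per-step scores are already available as two lists; `max_violation_step` and the numbers are hypothetical, and steps are 0-based here.

```python
def max_violation_step(gold_scores, beam_best_scores):
    """Return (step, violation) where best-in-beam most exceeds the gold prefix."""
    violations = [best - gold
                  for gold, best in zip(gold_scores, beam_best_scores)]
    step = max(range(len(violations)), key=violations.__getitem__)
    return step, violations[step]


# Toy per-step scores of the gold prefix and of the best item in the beam.
gold_scores = [1.0, 1.5, 2.0, 2.2]
beam_best_scores = [1.0, 2.5, 2.4, 2.3]
step, violation = max_violation_step(gold_scores, beam_best_scores)
print(step, violation)
```

Updating at this step rather than at the end of the sequence keeps the update valid even when the correct sequence has already fallen off the beam.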
49
Model                  Note                    F1 (PTB test)
Stern et al. (2017a)   Baseline Chart Parser
                       +our cross-span loss
Our Work               Beam 15
                       Beam 20
50
PTB only, Single Model, End-to-End
Model                  Note                    F1
Durrett + Klein 2015                           91.1
Cross + Huang 2016     Original Span Parser    91.3
Liu + Zhang 2016                               91.7
Dyer et al. 2016       Discriminative          91.7
Stern et al. 2017a     Baseline Chart Parser   91.79
Stern et al. 2017c     Separate Decoding       92.56
Our Work               Beam 20                 91.97

Reranking, Ensemble, Extra Data
Model                  Note                    F1
Vinyals et al. 2015    Ensemble                90.5
Dyer et al. 2016       Generative Reranking    93.3
Choe + Charniak 2016   Reranking               93.8
Fried et al. 2017      Ensemble Reranking      94.25
51
52
[Figure: chart parsing]
53