Forest-Based Search Algorithms
for Parsing and Machine Translation
Liang Huang
University of Pennsylvania
Google Research, March 14th, 2008
Search in NLP is not trivial!
I saw her duck.
Aravind Joshi
...
S NP PRP I VP VBD saw NP PRP$ her NN duck PP IN with NP DT a NN telescope
eat sushi with tuna
with Non-Local Features
(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))
held ... talk
VP3, 6
with ... Sharon
PP1, 3
bigram
(Klein and Manning, 2001; Huang and Chiang, 2005)
0 I 1 saw 2 him 3 with 4 a 5 mirror 6
nodes hyperedges
a hypergraph
derivations (paths or trees)
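A minimal sketch of how such a hypergraph could be stored for the example sentence "I saw him with a mirror". The node labels, spans, and the tiny grammar behind them are assumptions made for illustration; they are not taken from the slides. Each hyperedge connects a tuple of tail nodes to one head node, and a derivation picks one incoming hyperedge per node.

```python
from collections import defaultdict
from functools import lru_cache

# incoming[v] lists the hyperedges entering node v; each hyperedge is stored
# as the tuple of its tail nodes (an empty tuple marks a leaf).
incoming = defaultdict(list)

def add_hyperedge(head, *tails):
    incoming[head].append(tails)

# Leaves (hypothetical labels and spans over "0 I 1 saw 2 him 3 with 4 a 5 mirror 6").
for leaf in ("NP[0,1]", "V[1,2]", "NP[2,3]", "P[3,4]", "NP[4,6]"):
    add_hyperedge(leaf)

add_hyperedge("PP[3,6]", "P[3,4]", "NP[4,6]")
add_hyperedge("NP[2,6]", "NP[2,3]", "PP[3,6]")
add_hyperedge("VP[1,3]", "V[1,2]", "NP[2,3]")
add_hyperedge("VP[1,6]", "V[1,2]", "NP[2,6]")   # PP attached to the object
add_hyperedge("VP[1,6]", "VP[1,3]", "PP[3,6]")  # PP attached to the verb phrase
add_hyperedge("S[0,6]", "NP[0,1]", "VP[1,6]")

@lru_cache(maxsize=None)
def count_derivations(node):
    """Number of derivations (trees) rooted at node."""
    total = 0
    for tails in incoming[node]:
        product = 1
        for u in tails:
            product *= count_derivations(u)
        total += product
    return total

print(count_derivations("S[0,6]"))  # 2
```

The two derivations share every node except the attachment of the PP, which is what makes the packed representation compact.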
A: f(b′, c) ≤ f(b, c)
B: b′ ≤ b
C: c
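update along a hyperedge e from u to v:
$$d(v) \leftarrow d(v) \oplus f_e(d(u))$$
with several tails $u_1, \ldots, u_{|e|}$ (and further incoming hyperedges such as e′ from u′):
$$d(v) \mathrel{\oplus}= f_e(d(u_1), \cdots, d(u_{|e|}))$$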
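The update rule can be sketched as a generic Viterbi pass over a hypergraph. This is an illustrative sketch, assuming the nodes are given in topological order, that ⊕ is `min` (costs), and that each hyperedge carries its own weight function; the node names and costs are made up.

```python
import math

def viterbi(topological_order, incoming, plus=min, zero=math.inf):
    """Compute d(v) for every node with d(v) (+)= f_e(d(u_1), ..., d(u_|e|)).

    incoming[v] is a list of (tails, f_e) pairs; a leaf has one hyperedge
    with no tails whose f_e returns the leaf's own cost.
    """
    d = {}
    for v in topological_order:
        d[v] = zero
        for tails, f_e in incoming.get(v, []):
            d[v] = plus(d[v], f_e(*(d[u] for u in tails)))
    return d

# Hypothetical example: v has two incoming hyperedges, e and e'.
incoming = {
    "a": [((), lambda: 1.0)],
    "b": [((), lambda: 2.0)],
    "v": [
        (("a", "b"), lambda x, y: x + y + 0.5),  # e : 1.0 + 2.0 + 0.5
        (("a",), lambda x: x + 3.0),             # e': 1.0 + 3.0
    ],
}
d = viterbi(["a", "b", "v"], incoming)
print(d["v"])  # 3.5
```

Replacing `plus` and `zero` swaps the semiring, as long as every f_e stays monotonic in each argument.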
S NP PRP I VP VBP eat NP NN sushi PP IN with NP NN tuna
I eat sushi with tuna.
1-best from Charniak parser
VP1, 6
PP1, 3 VP3, 6 PP1, 4 VP4, 6 PP3, 6 VP2, 3
hyperedge
NP1, 2
locally Dijkstra globally Viterbi
(keeping alternative hyperedges)
S1, 9
NP1, 3 VP3, 9 NP1, 5 VP5, 9 PP5, 9 S1, 5
hyperedge
? ?
what’s your 2nd-best?
PP2, 9 NN1, 2 PP6, 9 VB5, 6
Algorithm 1   hyperedge   O(E k log k)       O(k V)
Algorithm 2   node        O(E + V k log k)   O(k V)
Algorithm 3   global      O(E + D k log k)   O(E + k D)
E - hyperedges: O(n^3); V - nodes: O(n^2); D - derivation: O(n)
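The k log k factor for a single hyperedge can be illustrated with a lazy frontier search. This is a sketch of the general idea only, assuming a binary hyperedge whose two tails already hold sorted cost lists and whose weight function is plain addition; the function name and the numbers are hypothetical.

```python
import heapq

def k_best_sums(a, b, k):
    """Return the k smallest a[i] + b[j], given ascending lists a and b.

    Only the frontier of the (i, j) grid is kept in the heap, so each of the
    k pops costs O(log k) instead of enumerating all len(a) * len(b) pairs.
    """
    if not a or not b:
        return []
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    best = []
    while heap and len(best) < k:
        cost, i, j = heapq.heappop(heap)
        best.append(cost)
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return best

print(k_best_sums([1, 3, 8], [1, 2, 4], 4))  # [2, 3, 4, 5]
```

Because addition is monotonic, the popped costs come out in sorted order and the result is exact.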
O( E + D k log k )
Collins 2000
Oracle Parseval score
et al., 2005)
as possible at each node
“decoder” feature representation
(Collins, 2002)
particular configuration occurs in y
instances of the Rule feature:
$f_{100}(y) = f_{S \to NP\ VP\ .}(y) = 1$
$f_{200}(y) = f_{NP \to DT\ NN}(y) = 2$
(TOP (S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN boy)) (PP (IN with) (NP (DT a) (NN telescope)))) (. .)))
(Charniak & Johnson, 2005) (Collins, 2000)
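The Rule counts for this tree can be reproduced with a short sketch. The nested-tuple encoding, the choice to skip lexical (preterminal) rules, and the attachment of the PP under the VP are assumptions made for illustration.

```python
from collections import Counter

# Assumed bracketing of the example tree (PP attached to the VP).
TREE = ("TOP",
        ("S",
         ("NP", ("PRP", "I")),
         ("VP",
          ("VBD", "saw"),
          ("NP", ("DT", "the"), ("NN", "boy")),
          ("PP", ("IN", "with"), ("NP", ("DT", "a"), ("NN", "telescope")))),
         (".", ".")))

def rule_counts(tree, counts=None):
    """Count instances of each Rule feature (one per internal node)."""
    counts = Counter() if counts is None else counts
    label, *children = tree
    # Preterminals have a word (a str) as their only child and are skipped.
    if all(isinstance(child, tuple) for child in children):
        rhs = " ".join(child[0] for child in children)
        counts[f"{label} -> {rhs}"] += 1
        for child in children:
            rule_counts(child, counts)
    return counts

counts = rule_counts(TREE)
print(counts["S -> NP VP ."], counts["NP -> DT NN"])  # 1 2
```

Each count depends only on a node and its immediate children, which is why Rule can be evaluated on a single hyperedge.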
Rule is local
ParentRule is non-local
2 words
WordEdges is local: $f_{400}(y) = f_{NP\ 2\ saw\ with}(y) = 1$
POSEdges is non-local: $f_{800}(y) = f_{NP\ 2\ VBD\ IN}(y) = 1$
local features comprise ~70% of all instances!
become computable at this level
unit instance of ParentRule feature at the TOP node
unit instance of node A
$A_{i,k}$ with children $B_{i,j}$ (spanning $w_i \ldots w_{j-1}$) and $C_{j,k}$ (spanning $w_j \ldots w_{k-1}$)
Forest Reranking
(TOP/saw (S/saw (NP/I (PRP/I I)) (VP/saw (VBD/saw saw) (NP/the (DT/the the) (NN/boy boy)) (PP/with (IN/with with) (NP/a (DT/a a) (NN/telescope telescope)))) (./. .)))
unit instances at VP node
saw - the; saw - with
w·fN( ) = 0.5
        |  1.0                3.0                 8.0
  1.0   |  2.0 + 0.5 = 2.5    4.0 + 5.0 = 9.0     9.0 + 0.5 = 9.5
  1.1   |  2.1 + 0.3 = 2.4    4.1 + 5.4 = 9.5     9.1 + 0.3 = 9.4
  3.5   |  4.5 + 0.6 = 5.1    6.5 + 10.5 = 17.0   11.5 + 0.6 = 12.1
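A sketch of best-first exploration over this grid, using the numbers from the slide. The row and column costs and the non-local combination costs are copied from the grid; the function names and the choice to buffer the popped cells and sort them afterwards are assumptions made for illustration.

```python
import heapq

ROWS = [1.0, 1.1, 3.5]           # 1-best, 2nd-best, 3rd-best of one child
COLS = [1.0, 3.0, 8.0]           # same for the other child
NONLOCAL = [[0.5, 5.0, 0.5],     # extra cost w.f_N for each combination
            [0.3, 5.4, 0.3],
            [0.6, 10.5, 0.6]]

def cell(i, j):
    return round(ROWS[i] + COLS[j] + NONLOCAL[i][j], 6)

def explore(k):
    """Pop k cells best-first from the top-left corner; returns pop order."""
    heap = [(cell(0, 0), 0, 0)]
    seen = {(0, 0)}
    popped = []
    while heap and len(popped) < k:
        cost, i, j = heapq.heappop(heap)
        popped.append(cost)
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(ROWS) and nj < len(COLS) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (cell(ni, nj), ni, nj))
    return popped

print(explore(3))          # [2.5, 2.4, 5.1]
print(sorted(explore(3)))  # [2.4, 2.5, 5.1]
```

The pop order is not sorted because the non-local costs break monotonicity, so the popped cells have to be re-sorted. With k = 5 this sketch returns 9.5 as its fifth item although the grid contains 9.4, a small example of the search errors that this kind of approximate exploration can make.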
VP PP1, 3 VP3, 6 PP1, 4 VP4, 6 PP3, 6 VP2, 3
hyperedge
NP1, 2
97.8 96.8 98.6 97.2
baseline: 1-best Charniak parser                        89.72

features   n or k   pre-comp.      training    F1%
local      50       1.4G / 25h     1 x 0.3h    91.01
all        50       2.4G / 34h     5 x 0.5h    91.43
all        100      5.3G / 77h     5 x 1.3h    91.47
local      k=15                    3 x 1.4h    91.25
all        k=15                    4 x 11h     91.69
type   system                          F1%
D      Collins (2000)                  89.7
       Henderson (2004)                90.1
       Charniak and Johnson (2005)     91.0
       updated (2006)                  91.4
       Petrov and Klein (2008)         88.3
       this work                       91.7
G      Bod (2000)                      90.7
       Petrov and Klein (2007)         90.1
       McClosky et al. (2006)          92.1
best accuracy to date on the Penn Treebank
(Knight and Koehn, 2003)
translation model (TM): competency
language model (LM): fluency
Spanish/English Bilingual Text → Statistical Analysis → translation model (Spanish → Broken English)
English Text → Statistical Analysis → language model (Broken English → English)
Que hambre tengo yo → What hunger have I / Hungry I am so / Have I that hunger / I am so hungry / How hunger have I / ... → I am so hungry
phrase-based TM syntax-based
n-gram LM
Que hambre tengo yo I am so hungry
decoder (LM-integrated)
integrated decoder
packed forest
forest rescorer
as non-local info
(VP (PP yu Shalong) (VP juxing le huitan))
(VP (VP held a meeting) (PP with Sharon))
VP → PP(1) VP(2), VP(2) PP(1)
VP → juxing le huitan, held a meeting
PP → yu Shalong, with Sharon
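The three synchronous rules can be applied in a few lines to show how the source and target strings come from one derivation. The class names and the `swap` flag are assumptions made for this sketch, not notation from the talk.

```python
from dataclasses import dataclass

@dataclass
class Lexical:
    source: str
    target: str

@dataclass
class Binary:
    left: object   # first nonterminal on the source side
    right: object  # second nonterminal on the source side
    swap: bool     # True if the target side reverses the two children

def source_side(node):
    if isinstance(node, Lexical):
        return node.source
    return f"{source_side(node.left)} {source_side(node.right)}"

def target_side(node):
    if isinstance(node, Lexical):
        return node.target
    left, right = target_side(node.left), target_side(node.right)
    return f"{right} {left}" if node.swap else f"{left} {right}"

pp = Lexical("yu Shalong", "with Sharon")            # PP rule
vp = Lexical("juxing le huitan", "held a meeting")   # lexical VP rule
top = Binary(pp, vp, swap=True)                      # VP -> PP(1) VP(2), VP(2) PP(1)

print(source_side(top))  # yu Shalong juxing le huitan
print(target_side(top))  # held a meeting with Sharon
```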
PP1, 3: yu Shalong → with Sharon
VP3, 6: juxing le huitan → held a talk
VP1, 6: held a talk with Sharon
held ... talk
VP3, 6
with ... Sharon
PP1, 3
bigram
held ... Sharon
S1, 6
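A sketch of why the bigram makes the combination non-local: each item has to remember its boundary words so that the language model can score the junction when two items are joined. The item layout, the costs, and the bigram table are all hypothetical values chosen for illustration.

```python
from typing import NamedTuple

class Item(NamedTuple):
    label: str
    first: str   # leftmost target word
    last: str    # rightmost target word
    cost: float

# Hypothetical bigram costs (for example, negative log probabilities).
BIGRAM_COST = {("talk", "with"): 0.5, ("meeting", "with"): 0.25}
UNSEEN_COST = 5.0

def combine(label, left, right):
    """Join two items in target order and add the bigram cost at the junction."""
    lm = BIGRAM_COST.get((left.last, right.first), UNSEEN_COST)
    return Item(label, left.first, right.last, left.cost + right.cost + lm)

vp = Item("VP3,6", "held", "talk", 1.0)
pp = Item("PP1,3", "with", "Sharon", 2.0)
s = combine("S1,6", vp, pp)
print(s)  # Item(label='S1,6', first='held', last='Sharon', cost=3.5)
```

The combination cost depends on which particular items are paired, not just on their own scores, which is the source of the non-monotonicity in the grid of combinations.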
(VP held ... meeting 3,6) (VP held ... talk 3,6) (VP hold ... conference 3,6)
(PP with ... Sharon 1,3) (PP along ... Sharon 1,3) (PP with ... Shalong 1,3)
non-monotonicity due to LM combo costs
bigram (meeting, with)
PP1, 3 VP3, 6 VP1, 6
VP
PP1, 3 VP3, 6 PP1, 4 VP4, 6 NP1, 4 VP4, 6
k-best Algorithm 2, with search errors
hyperedge
[plot: speed ++ vs. quality ++; curves: Algorithm 2, Algorithm 3; annotation: ~100 times faster]
Huang and Chiang Forest Rescoring
[plot: speed ++ vs. quality ++; annotations: 32 times faster, same parameters]
[plot: speed ++ vs. quality ++; annotation: 10 times faster]
synchronous tree-substitution grammars (STSG)
(Galley et al., 2004; Eisner, 2003)
VP VBD was VP-C VP VBN shot PP TO to NP-C NN death PP IN by NP-C DT the NN police
extended to translate a packed-forest instead of a tree
(Mi, Huang, Liu, 2008)
bei
number of test brackets
the max. number of matched brackets?”
t          2  3
f(t)       1  2
⊗
t          4  5
g(t)       4  4
=
t          6  7  8
(f⊗g)(t)   5  6  6
this node matched?
N: t                 7  8  9
   (f⊗g)⇑(1,0)(t)    5  6  6
Y: t                 7  8  9
   (f⊗g)⇑(1,1)(t)    6  7  7
final answer:
u v w
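The tables can be reproduced with a short sketch, assuming that each function maps a number of test brackets t to the maximum number of matched brackets, that ⊗ takes the best sum over all ways of splitting t between the two children, and that ⇑(a, b) shifts t by a and the value by b. The function names are hypothetical.

```python
def otimes(f, g):
    """(f ⊗ g)(t) = max over t1 + t2 = t of f(t1) + g(t2)."""
    out = {}
    for t1, m1 in f.items():
        for t2, m2 in g.items():
            t, m = t1 + t2, m1 + m2
            if t not in out or m > out[t]:
                out[t] = m
    return out

def shift(h, dt, dm):
    """h ⇑ (dt, dm): one more test bracket, matched (dm = 1) or not (dm = 0)."""
    return {t + dt: m + dm for t, m in h.items()}

f = {2: 1, 3: 2}
g = {4: 4, 5: 4}
fg = otimes(f, g)

print(fg)               # {6: 5, 7: 6, 8: 6}
print(shift(fg, 1, 0))  # {7: 5, 8: 6, 9: 6}
print(shift(fg, 1, 1))  # {7: 6, 8: 7, 9: 7}
```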