Features of Statistical Parsers
Mark Johnson Brown Laboratory for Linguistic Information Processing CoNLL 2005
Confessions of a bottom-feeder: Dredging in the Statistical Muck
[Example parse tree: (S (NP (PRP He)) (VP (VBD raised) (NP (DT the) (NN price))) (. .))]
The reranking pipeline: a sentence s goes to an n-best parser, which returns candidate parses Tc(s) = {t1, …, tn}; feature functions map each parse ti to a feature vector f(ti); a linear combination assigns each parse the score w · f(ti); the "best" parse for s is the argmax over t ∈ Tc(s) of w · f(t).
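A minimal sketch of the reranking step (the names `score` and `best_parse`, and the toy features, are illustrative, not from the talk): each candidate's sparse feature vector is scored by the linear model w · f(t), and the highest-scoring candidate wins.

```python
def score(w, f_t):
    """Linear score w . f(t) for one candidate parse (sparse dicts)."""
    return sum(w.get(feat, 0.0) * val for feat, val in f_t.items())

def best_parse(w, candidates):
    """candidates: list of (parse, feature_dict) pairs from the n-best parser."""
    return max(candidates, key=lambda pair: score(w, pair[1]))[0]

# Toy example: two candidate parses with sparse feature vectors.
w = {"VP->V NP": 0.5, "NP->NP PP": -0.2}
candidates = [
    ("parse_A", {"VP->V NP": 1, "NP->NP PP": 1}),  # score 0.3
    ("parse_B", {"VP->V NP": 1}),                  # score 0.5
]
print(best_parse(w, candidates))  # parse_B: avoids the penalized NP->NP PP feature
```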
[Example parse tree: (S (NP rice) (VP grows))]
Training data: D = ((s1, t⋆1), …, (sn, t⋆n)), the sentences paired with their treebank parses. Choose the weights w to maximize Σi log Pw(t⋆i | si).
Joint (generative) estimation: from the treebank T = (t⋆1, …, t⋆n), choose w to maximize Σi log Pw(t⋆i), i.e. make each treebank tree t⋆i as likely as possible.
Conditional (discriminative) estimation: from D = ((t⋆1, s1), …, (t⋆n, sn)), choose w to maximize Σi log Pw(t⋆i | si), where the distribution ranges over the parses T(si) of each sentence si.
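The conditional objective — make the treebank parse as probable as possible relative to the other parses of the same sentence — can be sketched with a log-sum-exp over the candidate list (function names here are my own):

```python
import math

def dot(w, f):
    """Sparse dot product w . f(t)."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def cond_log_prob(w, target, candidates):
    """log P_w(t_target | s) = w.f(t_target) - log sum_{t in T(s)} exp(w.f(t))."""
    scores = [dot(w, f) for f in candidates]
    m = max(scores)  # stabilize the log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z

# Two candidates whose scores differ by 1: P(t_0 | s) = e / (e + 1)
w = {"good": 1.0}
cands = [{"good": 1.0}, {}]
p = math.exp(cond_log_prob(w, 0, cands))
```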
A toy corpus showing why relative-frequency estimates can misrank parses:

100× (VP (V run))
2× (VP (VP (V see) (NP (N people))) (PP (P with) (NP (N telescopes))))   [VP attachment]
1× (VP (V see) (NP (NP (N people)) (PP (P with) (NP (N telescopes)))))   [NP attachment]

Rule         count   rel freq   better vals
VP → V         100   100/105    4/7
VP → V NP        3     3/105    1/7
VP → VP PP       2     2/105    2/7
NP → N           6       6/7    6/7
NP → NP PP       1       1/7    1/7

Under the relative-frequency estimates the VP-attachment parse picks up a factor of 2/105 (for VP → VP PP) while the NP-attachment parse picks up 1/7 (for NP → NP PP), so the rarer NP attachment is preferred; the "better vals" (2/7 vs. 1/7) rank the attachments the way the corpus does.
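The rel-freq column is just maximum-likelihood relative-frequency estimation for the PCFG; a sketch over the toy counts above (rule encoding is my own):

```python
from collections import Counter

# Toy rule counts from the example corpus above.
rule_counts = Counter({
    ("VP", ("V",)): 100,
    ("VP", ("V", "NP")): 3,
    ("VP", ("VP", "PP")): 2,
    ("NP", ("N",)): 6,
    ("NP", ("NP", "PP")): 1,
})

def rel_freq(counts):
    """P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    lhs_totals = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

probs = rel_freq(rule_counts)  # VP rules normalize over 105, NP rules over 7
```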
The treebank parse t⋆i may not appear in the n-best list Tc(si) at all. So define t+i = argmax over t ∈ Tc(si) of Ft⋆i(t), the candidate closest to t⋆i in f-score; train on (t+1, …, t+n) by maximizing Σi log Pw(t+i | si).
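Picking t+i, the n-best parse closest to the treebank parse, needs only a bracket f-score; a sketch treating each parse as a set of labeled spans (this span encoding is my own simplification):

```python
def fscore(cand, gold):
    """Labelled-bracket f-score between two parses, each a set of
    (label, start, end) constituents."""
    if not cand or not gold:
        return 0.0
    hits = len(cand & gold)
    prec, rec = hits / len(cand), hits / len(gold)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def oracle_parse(nbest, gold):
    """t+ = the n-best candidate with maximum f-score against the gold tree."""
    return max(nbest, key=lambda c: fscore(c, gold))

gold = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4)}
nbest = [
    {("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4)},  # one of three brackets right
    {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 3)},  # two of three brackets right
]
```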
Several trees in Tc(si) may be equally close to t⋆i: which one(s) should we declare to be the best parse? Let T+c(t⋆i) ⊆ Tc(si) be the set of such trees, and maximize Σi log Pw(T+c(t⋆i) | Tc(si)).
Data: D = ((t⋆1, s1), …, (t⋆n, sn)), with T+c(t⋆i) = the trees in Tc(si) with maximum f-score relative to t⋆i. Choose w to maximize Σi log Pw(T+c(t⋆i) | Tc(si)), where

Pw(T+c(t⋆) | Tc(s)) = Σ over t ∈ T+c(t⋆) of exp(w · f(t)) / Σ over t ∈ Tc(s) of exp(w · f(t)),

i.e. the probability mass the model puts on T+c(t⋆) relative to all of Tc(si).
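The multiple-winners objective Pw(T+c(t⋆) | Tc(s)) is the share of exponentiated score the model gives the best-available parses; a sketch (names are mine):

```python
import math

def dot(w, f):
    """Sparse dot product w . f(t)."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def prob_best_set(w, best, candidates):
    """P_w(T+ | Tc) = sum_{t in T+} exp(w.f(t)) / sum_{t in Tc} exp(w.f(t)).
    `best` holds the indices of the max-f-score candidates."""
    scores = [dot(w, f) for f in candidates]
    m = max(scores)  # stabilize before exponentiating
    exps = [math.exp(s - m) for s in scores]
    return sum(exps[i] for i in best) / sum(exps)

# Three equally scored candidates, two of them tied for best f-score:
p = prob_best_set({}, [0, 1], [{}, {}, {}])
```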
[Parse tree of "That went … the permissible line for warm and fuzzy feelings.", annotated to show the Heads, Ancestor, Context, and Rule features]
[Parse tree of "A record date hasn't been set.", with heads marked as functional vs. lexical]
[Parse tree of "The rules force executives to report purchases."]
[Parse tree of "The clash is a sign … a new toughness and divisiveness in Japan's … financial circles.", illustrating material left of the head and non-adjacent to it]
[Parse tree of "That went … the permissible line for warm and fuzzy feelings."]
[Parse tree of "They were consulted in advance."]
[Tree fragments / head projections for "That went … the permissible line for warm and fuzzy feelings."]
[Parse tree of "That went … the permissible line for warm and fuzzy feelings.", annotated "> 5 words" and "=1 punctuation"]
[Parse tree of "They were consulted in advance and were surprised at the action taken.", with coordinated VPs whose trees are isomorphic to depth 4]
[Coordination in "They were consulted in advance and were surprised at the action taken.": conjuncts of 4 and 6 words give the CoLenPar feature (2, true)]
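My reading of the CoLenPar feature from this example (the exact definition in the talk may differ): pair the absolute difference in conjunct lengths with whether the final conjunct is the longer one.

```python
def colenpar(first_len, last_len):
    """Sketch of a conjunct-length-parallelism feature:
    (absolute length difference, final conjunct longer?).
    Conjuncts of 4 and 6 words yield (2, True), matching the
    slide's (2, true)."""
    return (abs(first_len - last_len), last_len > first_len)

print(colenpar(4, 6))
```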
[Parse tree of "That went … the permissible line for warm and fuzzy feelings."]
[Parse tree of "That went … the permissible line for warm and fuzzy feelings.", with a constituent of more than 5 words marked]
[Parse tree of "That went … the permissible line for warm and fuzzy feelings."]
[Figure: oracle f-score (0.90–0.98) as a function of n-best beam size (10–50)]
[Figure: fraction of sentences (0.1–0.5) by rank of the best parse in the n-best list (10–50)]
All trees in T+c(t⋆) count as "correct".
[Scatter plot: averaged perceptron feature selection; f-score on sections 20–21 (0.901–0.911) vs. f-score on section 24 (0.892–0.908)]
[Scatter plot: exponential model, tuning regularizer constant c, vs. averaged perceptron with randomized data; f-score on sections 20–21 (0.9045–0.91) vs. f-score on section 24 (0.899–0.907)]
[Scatter plot: averaged perceptron with scaled feature values; f-score on sections 20–21 (0.892–0.912) vs. f-score on section 24 (0.88–0.915)]
[Figure: f-score on section 24 and sections 20–21 (0.90–0.912) as a function of regularizer constant c (2e-07 to 1.4e-06)]
[Two candidate parses of "He will not be shaken by external events, however surprising, alarming … vexing …", differing in the category and attachment of the "however …" phrase (ADVP with an S complement vs. an NP-internal ADJP)]
[Two candidate parses of "Soviet leaders said they would support their Kabul clients by all means necessary … and did.", differing in the VP coordination structure]
[Two candidate parses of "Kia is the most aggressive … the Korean Big Three in … financing.", differing in the internal structure of the superlative NP and the analysis of "financing"]
[Two candidate parses of "Two years ago, the district decided to limit the bikes to fire roads in its 65,000 hilly acres.", differing on "ago" (RB vs. IN) and on "to fire roads" (infinitival VP vs. PP headed by "to")]
[Two candidate parses of "The company also pleased analysts by announcing four new store … planned for fiscal 1990, ending next August.", differing in the attachment of "ending next August"]
[Two candidate parses of "But funds generally are better prepared this time around.", differing on "better prepared" (verbal VP vs. ADJP) and on "around" (particle RP vs. adverb RB)]
[Two candidate parses of "The U.S. said it would fully support the resistance … and didn't.", differing in the VP coordination structure and the attachment of "fully"]
nsentences = 1345 in test corpus. model 1 nfeatures = 670688, corpus f-score = 0.9037 model 2 nfeatures = 670688, corpus f-score = 0.902782 permutation test significance of corpus f-score difference = 0.58234 model 1 better on 214 = 15.9108% sentences model 2 better on 170 = 12.6394% sentences models 1 and 2 tied on 961 = 71% sentences binomial 2-sided significance of sentence-by-sentence comparison = 0.0280806 bootstrap 95% confidence interval for model 1 f-scores = (0.897672 0.9096) bootstrap 95% confidence interval for model 2 f-scores = (0.896832 0.908697)
nsentences = 1345 in test corpus. model 1 nfeatures = 670688, corpus f-score = 0.9037 model 2 nfeatures = 670688, corpus f-score = 0.902357 permutation test significance of corpus f-score difference = 0.22695 model 1 better on 121 = 8.99628% sentences model 2 better on 98 = 7.28625% sentences models 1 and 2 tied on 1126 = 83% sentences binomial 2-sided significance of sentence-by-sentence comparison = 0.136934 bootstrap 95% confidence interval for model 1 f-scores = (0.897672 0.9096) bootstrap 95% confidence interval for model 2 f-scores = (0.896315 0.908321)
nsentences = 1345 in test corpus. model 1 nfeatures = 670688, corpus f-score = 0.9037 model 2 nfeatures = 670688, corpus f-score = 0.902865 permutation test significance of corpus f-score difference = 0.59533 model 1 better on 169 = 12.5651% sentences model 2 better on 150 = 11.1524% sentences models 1 and 2 tied on 1026 = 76% sentences binomial 2-sided significance of sentence-by-sentence comparison = 0.313546 bootstrap 95% confidence interval for model 1 f-scores = (0.897672 0.9096) bootstrap 95% confidence interval for model 2 f-scores = (0.89686 0.908797)
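The significance numbers above come from permutation tests and bootstrap resampling over the test sentences; a sketch of a paired permutation test (applied here to mean per-sentence f-score, a simplification of the corpus-level statistic reported above):

```python
import random

def paired_permutation_test(a, b, trials=10000, seed=0):
    """Two-sided paired permutation test: randomly swap each (a_i, b_i)
    pair and count how often the mean difference is at least as large in
    magnitude as the observed one."""
    rng = random.Random(seed)
    n = len(a)
    observed = abs(sum(x - y for x, y in zip(a, b))) / n
    hits = 0
    for _ in range(trials):
        diff = sum((x - y) if rng.random() < 0.5 else (y - x)
                   for x, y in zip(a, b))
        if abs(diff) / n >= observed - 1e-12:
            hits += 1
    return hits / trials
```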