Statistical NLP
Spring 2010
Lecture 14: PCFGs
Dan Klein – UC Berkeley
Treebank PCFGs [Charniak 96]
- Use PCFGs for broad-coverage parsing
- Can take a grammar right off the trees (doesn't work well)
- Baseline
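Taking a grammar "right off the trees" can be sketched as counting every local rewrite in the treebank and normalizing per left-hand side. The nested-tuple tree encoding, the function names `rules` and `estimate_pcfg`, and the two toy trees are assumptions made for illustration; they are not from the lecture.

```python
from collections import Counter

# Assumed encoding: a tree is (label, child, ...); a word is a plain string.
def rules(tree):
    """Yield one (lhs, rhs) pair per internal node of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def estimate_pcfg(trees):
    """Relative frequency: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for t in trees for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {(lhs, rhs): n / lhs_counts[lhs]
            for (lhs, rhs), n in rule_counts.items()}

# Hypothetical two-tree "treebank".
toy_treebank = [
    ("S", ("NP", ("PRP", "She")), ("VP", ("VBD", "left"))),
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked"))),
]
pcfg = estimate_pcfg(toy_treebank)
```

In this toy example NP rewrites two different ways, once each, so both NP rules receive probability 0.5, while S → NP VP receives 1.0.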
- Passive / complete symbols: NP, NP^S
- Active / incomplete symbols: NP → NP CC •
- Training: sections 02-21
- Development: section 22 (here, first 20 files)
- Test: section 23
[Figure: two plots against Vertical Markov Order (1, 2v, 2, 3v, 3); y-axes run from 73% to 79% and from 5000 to 25000]
[Figure: two plots against Horizontal Markov Order (1, 2v, 2, inf); y-axes run from 70% to 74% and from 3000 to 12000]
Annotation   F1     Size
Base         77.8   7.5K
UNARY        78.3   8.0K

Annotation   F1     Size
Previous     78.3   8.0K
SPLIT-IN     80.3   8.1K
- Syntactic vs. semantic heads
- Headship not in (most) treebanks
- Usually assigned by head rules, e.g.:
  - NP:
    - Take leftmost NP
    - Take rightmost N*
    - Take rightmost JJ
    - Take right child
  - VP:
    - Take leftmost VB*
    - Take leftmost VP
    - Take left child
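The NP and VP rules above can be read as a priority list: try each rule in order and take the first child that matches. The sketch below encodes only those two categories. The table name `HEAD_RULES`, the function `find_head`, and reading `N*` and `VB*` as prefix wildcards are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# (direction, label pattern) pairs in priority order; "*" matches any child,
# so the last entry encodes "take right child" / "take left child".
HEAD_RULES = {
    "NP": [("leftmost", "NP"), ("rightmost", "N*"),
           ("rightmost", "JJ"), ("rightmost", "*")],
    "VP": [("leftmost", "VB*"), ("leftmost", "VP"), ("leftmost", "*")],
}

def find_head(parent, child_labels):
    """Return the index of the head child under the first rule that matches."""
    for direction, pattern in HEAD_RULES[parent]:
        indices = range(len(child_labels))
        if direction == "rightmost":
            indices = reversed(indices)
        for i in indices:
            if fnmatchcase(child_labels[i], pattern):
                return i
    raise ValueError(f"no children to choose a head from for {parent}")
```

For example, `find_head("NP", ["DT", "JJ", "NN"])` selects the noun at index 2, and `find_head("VP", ["MD", "VP"])` falls through to the nested VP at index 1. A prefix wildcard such as `N*` also matches labels like NX, so a real rule table would likely spell the tags out.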
- Choose a head tag and word
- Choose a complement bag
- Generate children (incl. adjuncts)
- Recursively derive children
[Figure: X[h] built from Y[h] and Z[h'], with positions i, h, k, h', j]
- Essentially, run the O(n^5) CKY
- Remember only a few hypotheses for each span <i,j>.
- If we keep K hypotheses at each span, then we do at most O(nK^2) work per span (why?)
- Keeps things more or less cubic
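One way to see the O(nK^2) bound: a span has at most n split points, and each split pairs at most K left hypotheses with at most K right ones. The sketch below counts those pairs while filling one span. The chart layout, the `fill_span` name, and the `combine` callback (a stand-in for the grammar lookup) are hypothetical, and scores are assumed to be log probabilities.

```python
import heapq

def fill_span(chart, i, j, K, combine):
    """Fill chart[i, j] with the K best hypotheses; return pairs examined.

    chart maps (start, end) -> list of (score, item), at most K per span.
    combine(left, right) yields (rule_score, new_item) pairs (hypothetical).
    """
    candidates = []
    pairs = 0
    for k in range(i + 1, j):                         # at most n split points
        for ls, left in chart.get((i, k), []):        # at most K on the left
            for rs, right in chart.get((k, j), []):   # at most K on the right
                pairs += 1
                for rule_score, item in combine(left, right):
                    candidates.append((ls + rs + rule_score, item))
    # Keep only the K highest-scoring hypotheses for this span.
    chart[i, j] = heapq.nlargest(K, candidates, key=lambda c: c[0])
    return pairs

K = 2
chart = {
    (0, 1): [(-1.0, "A"), (-2.0, "B")],
    (1, 3): [(-1.0, "C"), (-2.0, "D")],
    (0, 2): [(-1.0, "E"), (-2.0, "F")],
    (2, 3): [(-1.0, "G"), (-2.0, "H")],
}
pairs = fill_span(chart, 0, 3, K, lambda l, r: [(-1.0, (l, r))])
```

Here span (0, 3) has two split points and K = 2, so 2 · 2 · 2 = 8 pairs are examined. With O(n^2) spans the total is O(n^3 K^2), which is cubic in n for fixed K.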
- This isn't trivial, and there are clever speedups
- Skip any X:[i,j] which had low (say, < 0.0001) posterior
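The posterior test can be sketched with inside and outside scores from an earlier, cheaper pass: the posterior of X over [i,j] is inside × outside divided by the sentence probability. The dictionary layout and the `prune_by_posterior` name are assumptions for illustration, and the toy numbers are made up.

```python
def prune_by_posterior(inside, outside, sentence_prob, threshold=1e-4):
    """Return the (X, i, j) items whose posterior is at least `threshold`.

    posterior(X, i, j) = inside[X, i, j] * outside[X, i, j] / sentence_prob
    """
    keep = set()
    for item, in_score in inside.items():
        posterior = in_score * outside.get(item, 0.0) / sentence_prob
        if posterior >= threshold:
            keep.add(item)
    return keep

# Made-up scores for two competing labels over the same span.
inside = {("NP", 0, 2): 0.02, ("VP", 0, 2): 1e-6}
outside = {("NP", 0, 2): 0.5, ("VP", 0, 2): 0.5}
kept = prune_by_posterior(inside, outside, sentence_prob=0.01)
```

A later, more expensive pass would then consult `kept` and skip any X:[i,j] not in it.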
[Figure: Time (sec) against Length for the Combined Phase, Dependency Phase, and PCFG Phase]
[Figure: Parsing accuracy (F1) against Total Number of grammar symbols]
[Figure: bar chart (scale 5 to 40) over phrasal categories: NP VP PP ADVP S ADJP SBAR QP WHNP PRN NX SINV PRT WHPP SQ CONJP FRAG NAC UCP WHADVP INTJ SBARQ RRC WHADJP X ROOT LST]
[Figure: bar chart (scale 10 to 70) over part-of-speech tags: NNP JJ NNS NN VBN RB VBG VB VBD CD IN VBZ VBP DT NNPS CC JJR JJS : PRP PRP$ MD RBR WP POS PDT WRB -LRB- . EX WP$ WDT -RRB- '' FW RBS TO $ UH , `` SYM RP LS #]
coarse:            … QP NP VP …
refined:
  split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
  split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
  split in eight:  … … … … … … … … … … … … … … … … …
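Moving between levels of this hierarchy needs a map from refined symbols back to coarse ones. A minimal sketch follows, assuming refined symbols are written as the coarse label plus a numeric suffix, as in the listing above. The function names are hypothetical.

```python
import re

def to_coarse(symbol):
    """Map a refined symbol such as 'NP3' to its coarse label 'NP'."""
    return re.sub(r"\d+$", "", symbol)

def allowed_refined(refined_symbols, surviving_coarse):
    """Keep the refined symbols whose coarse projection survived pruning."""
    return [s for s in refined_symbols if to_coarse(s) in surviving_coarse]

split_in_two = ["QP1", "QP2", "NP1", "NP2", "VP1", "VP2"]
still_allowed = allowed_refined(split_in_two, {"NP", "VP"})
```

If a coarse pass ruled out QP over some span, only the NP and VP subcategories would need to be considered there at the next level.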
Parser                                 ≤ 40 words F1   all F1
Charniak & Johnson '05 (generative)    90.1            89.6
Dubey '05                              76.3            -
Chiang et al. '02                      80.0            76.6