Assignment 2: Parsing
PCFG and CKY with C2FP
Chan Young Park
Background: PCFG Recap
S → NP VP
NP → N
NP → NP PP
NP → DT NN
VP → ADV V NP
PP → P NP
N → I
N → students
N → telescope
ADV → recently
V → saw
P → with
DT → a
Rule probabilities on the slide take the values 1.0, 0.5, 0.33, and 0.25.
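A PCFG attaches a probability to every rule so that the rules sharing a left-hand side sum to 1. One common way to obtain such probabilities is the relative-frequency estimate from treebank rule counts. The sketch below illustrates that estimate only; the function name `estimate_pcfg` and the counts in `toy_counts` are hypothetical and are not the counts behind the numbers on the slide.

```python
from collections import Counter, defaultdict

def estimate_pcfg(rule_counts):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]]
            for rule, count in rule_counts.items()}

# Hypothetical counts chosen for illustration only.
toy_counts = Counter({
    ("S", ("NP", "VP")): 2,
    ("NP", ("N",)): 2,
    ("NP", ("NP", "PP")): 1,
    ("NP", ("DT", "NN")): 1,
    ("N", ("I",)): 2,
    ("N", ("students",)): 1,
    ("N", ("telescope",)): 1,
})

probs = estimate_pcfg(toy_counts)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")
```

The probability of a whole tree is then the product of the probabilities of the rules used in it.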
annotateTrees(trainTrees)
Use the given classes (Grammar, SimpleLexicon, UnaryClosure) to build a parser
CKY algorithm (buildChart)
Backtracking (getBestTree)
In Summary, What do you need to implement?
Binarization + Markovization + Parent Annotation
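The three annotations can be illustrated on a small tree. The following is a minimal sketch, not the assignment's reference implementation: trees are plain `(label, children)` tuples, and the helper names, the `^` and `@...->` label formats, the right-branching chain, and the choice to leave pre-terminal tags unannotated are all assumptions made for illustration. The parameter `h` is the horizontal Markov order, that is, how many already generated siblings an intermediate symbol remembers.

```python
def is_preterminal(tree):
    label, children = tree
    return len(children) == 1 and isinstance(children[0], str)

def parent_annotate(tree, parent=None):
    """Append the parent's label to each non-terminal (tags are left as they are)."""
    label, children = tree
    if is_preterminal(tree):
        return tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, [parent_annotate(c, label) for c in children])

def binarize(tree, h=1):
    """Right-branching binarization with horizontal Markov order h."""
    label, children = tree
    if is_preterminal(tree):
        return tree
    kids = [binarize(c, h) for c in children]
    if len(kids) <= 2:
        return (label, kids)

    def build(i):
        # Intermediate node covering kids[i:]; it remembers the last h
        # siblings generated before position i.
        history = [k[0] for k in kids[max(0, i - h):i]]
        inter = "@" + label + "->" + "_".join(history)
        if i == len(kids) - 2:
            return (inter, [kids[i], kids[i + 1]])
        return (inter, [kids[i], build(i + 1)])

    return (label, [kids[0], build(1)])

vp = ("VP", [("ADV", ["recently"]), ("V", ["saw"]),
             ("NP", [("N", ["students"])])])
s = ("S", [("NP", [("N", ["I"])]), vp])

print(binarize(vp, h=1))
print(binarize(parent_annotate(s), h=1))
```

With a smaller `h`, more intermediate symbols collapse into one, which shrinks the grammar at the cost of discarding sibling context.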
Filling in Unary & Binary Charts
Slide credit: Lecture Slides for Stanford Coursera course -- Probabilistic Parsing
○ Use two charts: binary chart and unary chart
○ Binary chart: store the scores of non-terminals after applying binary rules
○ Unary chart: store the scores of non-terminals after applying unary rules
○ Alternate filling the unary and binary charts
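The step from a binary-chart cell to the matching unary-chart cell can be sketched as follows. This is an illustrative stand-in for what a unary closure provides, not the code of the given UnaryClosure class: the function names, the dict-based rule format, and the log-probability scores are assumptions. The closure is reflexive (a symbol reaches itself with score 0), so every entry of the binary cell is carried over to the unary cell.

```python
import math

def unary_closure(unary_rules, symbols):
    """best[(parent, child)] = best log-prob of any unary chain parent =>* child."""
    best = {(a, a): 0.0 for a in symbols}  # empty chain
    for (parent, child), logp in unary_rules.items():
        if logp > best.get((parent, child), -math.inf):
            best[(parent, child)] = logp
    # Floyd-Warshall-style relaxation over intermediate symbols.
    for mid in symbols:
        for a in symbols:
            left = best.get((a, mid))
            if left is None:
                continue
            for b in symbols:
                right = best.get((mid, b))
                if right is not None and left + right > best.get((a, b), -math.inf):
                    best[(a, b)] = left + right
    return best

def apply_unaries(binary_cell, closure):
    """Fill one unary-chart cell from the binary-chart cell of the same span."""
    unary_cell = {}
    for (parent, child), logp in closure.items():
        if child in binary_cell:
            score = binary_cell[child] + logp
            if score > unary_cell.get(parent, -math.inf):
                unary_cell[parent] = score
    return unary_cell

# Hypothetical rules and scores for illustration.
symbols = ["S", "NP", "N"]
unary_rules = {("NP", "N"): math.log(0.5), ("S", "NP"): math.log(0.2)}
closure = unary_closure(unary_rules, symbols)
unary_cell = apply_unaries({"N": math.log(0.25)}, closure)
```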
[Main Stuff] CKY Algorithm
Two orders for filling the chart:
Method 1: based on length, then min
Method 2: based on max, then min
<fill in possible pre-terminals and unary closures of those>
for each max from 2 to n
    for each min from max-2 down to 0
        for each non-terminal C
            for each binary rule C -> C1 C2
                for each mid from min+1 to max-1
                    if unary_chart[min][mid][C1] and unary_chart[mid][max][C2] then
                        binary_chart[min][max][C] = max(binary_chart[min][max][C], score(min, mid, max, C, C1, C2))
        <fill in unary_chart[min][max] based on binary_chart[min][max], but with unary rules>
<fill in possible pre-terminals and unary closures of those>
for each max from 2 to n
    for each min from max-2 down to 0
        for each mid from min+1 to max-1
            for each non-terminal C1 present at [min][mid]
                for each binary rule C -> C1 C2
                    if unary_chart[min][mid][C1] and unary_chart[mid][max][C2] then
                        binary_chart[min][max][C] = max(binary_chart[min][max][C], score(min, mid, max, C, C1, C2))
        <fill in unary_chart[min][max] based on binary_chart[min][max], but with unary rules>
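The loop order based on max and then min, with rules looked up by the left child found in the chart, can be sketched in runnable form. This is a minimal illustration under several assumptions, not the assignment's parser: the grammar is passed as plain dicts of log-probabilities, the chart cells are dicts keyed by span, the unary step is a simple fixed-point loop rather than a precomputed closure, and the toy grammar and its probabilities are hypothetical.

```python
import math
from collections import defaultdict

def cky(words, lexicon, unary_rules, binary_rules):
    """lexicon: {(tag, word): logp}; unary_rules: {(parent, child): logp};
    binary_rules: {(parent, left, right): logp}.
    Returns (unary_chart, binary_chart), each mapping (min, max) -> {symbol: score}."""
    n = len(words)
    binary, unary = defaultdict(dict), defaultdict(dict)

    by_left = defaultdict(list)  # index binary rules by their left child
    for (parent, left, right), logp in binary_rules.items():
        by_left[left].append((parent, right, logp))

    def close_unaries(span):
        # Copy the binary cell, then apply unary rules until nothing improves.
        cell = dict(binary[span])
        changed = True
        while changed:
            changed = False
            for (parent, child), logp in unary_rules.items():
                if child in cell and cell[child] + logp > cell.get(parent, -math.inf):
                    cell[parent] = cell[child] + logp
                    changed = True
        unary[span] = cell

    # Pre-terminals and their unary closures.
    for i, word in enumerate(words):
        for (tag, w), logp in lexicon.items():
            if w == word:
                binary[(i, i + 1)][tag] = logp
        close_unaries((i, i + 1))

    for max_ in range(2, n + 1):
        for min_ in range(max_ - 2, -1, -1):
            cell = binary[(min_, max_)]
            for mid in range(min_ + 1, max_):
                right_cell = unary[(mid, max_)]
                for left, left_score in unary[(min_, mid)].items():
                    for parent, right, logp in by_left[left]:
                        if right in right_cell:
                            score = logp + left_score + right_cell[right]
                            if score > cell.get(parent, -math.inf):  # keep the best
                                cell[parent] = score
            close_unaries((min_, max_))
    return unary, binary

# Hypothetical toy grammar (already binarized).
lexicon = {("N", "I"): math.log(0.5), ("N", "students"): math.log(0.25),
           ("V", "saw"): math.log(1.0)}
unary_rules = {("NP", "N"): math.log(0.5)}
binary_rules = {("S", "NP", "VP"): math.log(1.0), ("VP", "V", "NP"): math.log(1.0)}

unary_chart, binary_chart = cky(["I", "saw", "students"], lexicon, unary_rules, binary_rules)
print(math.exp(unary_chart[(0, 3)]["S"]))
```

Because min runs downward inside each max, both sub-spans of a cell are complete before the cell itself is filled.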
○ Grammar.binaryRulesBy{LeftChild,RightChild,Parent}
○ Lexicon.scoreTagging
○ A fixed list of all non-terminals? How many of them? Is it small enough?
○ A dynamic-size array for non-terminals? Is it fast enough?
○ The stored information should uniquely identify the previous step in the bottom-up process of CKY
○ For the purpose of this assignment, you still need to return a Tree object
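One possible shape for such back-pointers and the recursive reconstruction is sketched below. It is an assumption-laden illustration, not the required design: trees are `(label, children)` tuples instead of the assignment's Tree class, the dicts `back_unary` and `back_binary` and both function names are hypothetical, and each unary back-pointer records a single unary step (a closed unary chain would additionally need to be unrolled).

```python
def build_tree(words, back_unary, back_binary, min_, max_, symbol):
    """Rebuild the best tree for `symbol` over words[min_:max_]."""
    # Unary chart entry: which binary-chart symbol did it come from?
    child = back_unary.get((min_, max_, symbol), symbol)
    if child != symbol:
        return (symbol, [build_binary(words, back_unary, back_binary,
                                      min_, max_, child)])
    return build_binary(words, back_unary, back_binary, min_, max_, symbol)

def build_binary(words, back_unary, back_binary, min_, max_, symbol):
    if max_ - min_ == 1:
        return (symbol, [words[min_]])  # pre-terminal over a single word
    # Binary chart entry: split point and the two children that produced it.
    mid, left, right = back_binary[(min_, max_, symbol)]
    return (symbol, [
        build_tree(words, back_unary, back_binary, min_, mid, left),
        build_tree(words, back_unary, back_binary, mid, max_, right),
    ])

# Hand-written back-pointers for a hypothetical parse of "I saw students".
words = ["I", "saw", "students"]
back_unary = {(0, 1, "NP"): "N", (2, 3, "NP"): "N"}
back_binary = {(0, 3, "S"): (1, "NP", "VP"), (1, 3, "VP"): (2, "V", "NP")}

tree = build_tree(words, back_unary, back_binary, 0, len(words), "S")
print(tree)
```

Storing (mid, left child, right child) per binary entry and the source symbol per unary entry is enough to retrace every step; recomputing the arg-max during backtracking instead trades memory for time.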
buildTree()
buildTree() buildTree()
[Figure: lattice running from START to STOP with the states PER, ORG, LOC, O at each position]
unary chains, label set)
process manually, and test on them
*Threshold will be decided later
instead of Grammar.getUnaryRulesBy
(+ UnaryClosure.getPath to unroll the closed unary rules)
markovization)
Markovization? Any examples showing the benefits/drawbacks?
Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. https://nlp.stanford.edu/~manning/papers/unlexicalized-parsing.pdf