Bootstrapping Statistical Parsers from Small Datasets
Anoop Sarkar
Department of Computing Science
Simon Fraser University
anoop@cs.sfu.ca
http://www.cs.sfu.ca/~anoop
Overview
Task: find the most likely parse for natural language
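[Figure: two candidate parse trees for "a program to promote safety in trucks and minivans", given here as node labels and words in left-to-right order.
Tree 1: NP NP a program VP to VP promote NP NP safety PP in NP trucks and minivans
Tree 2: NP NP a program VP to VP promote NP NP safety PP in trucks and NP minivans]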
[Figure: diagram with labels q1, p1, q2, p2 and F]
[Figure: Self-training results. x-axis: Co-training rounds (10 to 100); y-axis: F Score (71.5 to 76.5); series: LTAG self, Collins-CFG self]
[Figure: Collins-CFG Learning Curve. x-axis: Number of Sentences (100 to 40000); y-axis: F Score (76 to 90); series: Collins-CFG (<= 40wds)]
[Figure: Co-training versus self-training. x-axis: Co-training rounds (10 to 100); y-axis: F Score (74.5 to 78); series: "wsj-500", "self"]
[Figure: The effect of seed size. x-axis: Co-training rounds (10 to 100); y-axis: F Score (74.5 to 80); series: "wsj-1k", "wsj-500"]
[Figure: Cross-genre co-training. x-axis: Co-training rounds (10 to 100); y-axis: F Score (75 to 79); series: "brown-1k-tiny", "brown-1k"]
[Figure: Collins-CFG Learning Curve. x-axis: Number of Sentences (100 to 40000); y-axis: F Score (76 to 90); series: Collins-CFG (<= 40wds)]
[Figure: LTAG Learning Curve. x-axis: Number of Sentences (5000 to 40000); y-axis: F Score (83 to 89); series: LTAG (<= 40wds)]