Introduction to treebanks
Session 1: 7/08/2011
Outline
Types of treebanks
– (Syntactic) Treebank
– PropBank
– Discourse Treebank
The English Penn Treebank
Why do we need treebanks?
Hw1

(Syntactic) Treebank
structure or phrase structure)
Greek, Hebrew, Hindi, Hungarian, Icelandic, Italian, Japanese, Korean, Latin, Norwegian, Polish, Spanish, Turkish, etc.
Phrase structure: (S (NP John/NNP) (VP loves/VBP (NP Mary/NNP)) ./.)
Dependency structure: loves/VBP is the head; John/NNP, Mary/NNP, and ./. are its dependents.
Phrase structure: (S (NP John/NNP) (VP loves/VBP (NP Mary/NNP)) ./.)
Dependency structure: loves/VBP is the head; John/NNP, Mary/NNP, and ./. are its dependents.
“loves” is the predicate. “John” is Arg0. “Mary” is Arg1.
– WSJ: 1 million words, from 1987 to 1989
– Others: Brown Corpus, ATIS, etc.
– 1992: version 1
– 1995: version 2
– 1999: version 3
Input: John loves Mary .
Output: (S (NP John/NNP) (VP loves/VBP (NP Mary/NNP)) ./.)
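A parse tree such as the one for "John loves Mary ." is commonly stored as a bracketed string. The following is a minimal sketch, not the Penn Treebank tooling itself, of reading such a string into nested tuples and listing its labeled constituents as (start, end, label) triples with 1-based word positions. The function names `read_tree` and `labeled_spans` are hypothetical, and the reader assumes well-formed input with no empty elements.

```python
def read_tree(text):
    """Read a bracketed string into (label, children) tuples; leaves are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # word/tag leaf such as John/NNP
                pos += 1
        pos += 1                              # consume the closing bracket
        return (label, children)

    return node()


def _walk(tree, start):
    """Return (spans, next_position) for the subtree beginning at word `start`."""
    label, children = tree
    spans, pos = [], start
    for child in children:
        if isinstance(child, str):
            pos += 1                          # a leaf covers exactly one word
        else:
            child_spans, pos = _walk(child, pos)
            spans.extend(child_spans)
    return [(start, pos - 1, label)] + spans, pos


def labeled_spans(tree):
    """List (start, end, label) for every constituent, with 1-based word positions."""
    return _walk(tree, 1)[0]


tree = read_tree("(S (NP John/NNP) (VP loves/VBP (NP Mary/NNP)) ./.)")
print(labeled_spans(tree))
# [(1, 4, 'S'), (1, 1, 'NP'), (2, 3, 'VP'), (3, 3, 'NP')]
```

Triples in this form are convenient when a parser's output has to be compared against a gold-standard tree.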
Grammar:
S => NP VP
NP => PN
VP => V NP
VP => VP PP
NP => NP PP
PP => P NP

Two parses of “John bought the book in the store”:
(S (NP John/NNP) (VP (VP bought/VBP (NP the book)) (PP in the store)))
(S (NP John/NNP) (VP bought/VBP (NP (NP the book) (PP in the store))))
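The ambiguity can be checked mechanically. The sketch below runs a small CKY-style chart parser over the six rules above and enumerates every S spanning the sentence. Two things are assumptions made for illustration and are not on the slide: the lexicon, and the treatment of "the book" and "the store" as pre-chunked NP tokens (the grammar has no rule for building them from a determiner and a noun).

```python
from collections import defaultdict

# Binary rules from the grammar above: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
# The single unary rule: NP => PN.
UNARY = [("NP", "PN")]
# Assumed lexicon (hypothetical); multiword NPs are treated as single tokens.
LEXICON = {
    "John": ["PN"],
    "bought": ["V"],
    "the book": ["NP"],
    "in": ["P"],
    "the store": ["NP"],
}


def all_parses(tokens):
    """Return every bracketed S analysis covering all tokens."""
    n = len(tokens)
    chart = defaultdict(list)  # (start, end, label) -> bracketed strings
    for i, tok in enumerate(tokens):
        for tag in LEXICON[tok]:
            chart[i, i + 1, tag].append(f"({tag} {tok})")
        for parent, child in UNARY:
            for sub in list(chart[i, i + 1, child]):
                chart[i, i + 1, parent].append(f"({parent} {sub})")
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    for l in chart[i, k, left]:
                        for r in chart[k, j, right]:
                            chart[i, j, parent].append(f"({parent} {l} {r})")
    return chart[0, n, "S"]


parses = all_parses(["John", "bought", "the book", "in", "the store"])
for p in parses:
    print(p)
print(len(parses))  # 2
```

One analysis attaches the PP to the NP "the book" and the other attaches it to the VP, which is the ambiguity that a treebank annotation resolves.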
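Word positions: John = 1, bought = 2, the book = 3,4, in the store = 5,6,7

sys output: (S (NP John/NNP) (VP bought/VBP (NP (NP the book) (PP in the store))))
(1, 7, S) (1, 1, NP) (2, 7, VP) (3, 7, NP) (3, 4, NP) (5, 7, PP) (6, 7, NP)

gold standard: (S (NP John/NNP) (VP (VP bought/VBP (NP the book)) (PP in the store)))
(1, 7, S) (1, 1, NP) (2, 7, VP) (2, 4, VP) (3, 4, NP) (5, 7, PP) (6, 7, NP)

Six of the seven brackets match; only (3, 7, NP) in the sys output and (2, 4, VP) in the gold standard differ.
Prec=6/7, recall=6/7, f-score=6/7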
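The scores follow directly from the two bracket lists. As a minimal sketch (the function name `bracket_scores` is hypothetical, and treating the brackets as a set is a simplification that ignores duplicate brackets), labeled precision, recall, and f-score can be computed like this:

```python
from fractions import Fraction


def bracket_scores(system, gold):
    """Return (precision, recall, f_score) over labeled brackets (start, end, label)."""
    sys_set, gold_set = set(system), set(gold)
    matched = len(sys_set & gold_set)          # brackets found in both analyses
    precision = Fraction(matched, len(sys_set))
    recall = Fraction(matched, len(gold_set))
    if matched == 0:
        return precision, recall, Fraction(0)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


sys_brackets = [(1, 7, "S"), (1, 1, "NP"), (2, 7, "VP"), (3, 7, "NP"),
                (3, 4, "NP"), (5, 7, "PP"), (6, 7, "NP")]
gold_brackets = [(1, 7, "S"), (1, 1, "NP"), (2, 7, "VP"), (2, 4, "VP"),
                 (3, 4, "NP"), (5, 7, "PP"), (6, 7, "NP")]

precision, recall, f_score = bracket_scores(sys_brackets, gold_brackets)
print(precision, recall, f_score)  # 6/7 6/7 6/7
```

Exact fractions are used so the result prints as 6/7 rather than a rounded decimal.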