SI425 : NLP
Set 7 Syntax and Parsing
Syntax
Grammar, or syntax:
The kind of implicit knowledge of your native language that you had mastered by the time you were 3 years old
Not the kind of stuff you were later taught in school
2
Example
3
Example 2
“I saw the man on the hill with a telescope.”
4
Example 3
5
Syntax
Linguists like to argue
transformational syntax, X-bar theory, principles and parameters, government and binding, GPSG, HPSG, LFG, relational grammar, minimalism... and on and on.
6
Syntax
Why should you care?
7
Syntax
8
Word Classes, or Parts of Speech
interjection, pronoun, conjunction, etc.
number, nature, and universality of these
9
POS examples
N    noun         chair, bandwidth, pacing
V    verb         study, debate, munch
ADJ  adjective    purple, tall, ridiculous
ADV  adverb       unfortunately, slowly
P    preposition
PRO  pronoun      I, me, mine
DET  determiner   the, a, that, those
10
POS Tagging
POS tagging assigns a part-of-speech or lexical class marker to each word in a collection.
word   tag
the    DET
koala  N
put    V
the    DET
keys   N
       P
the    DET
table  N
11
POS Tags Vary on Context
He will refuse/V to lead/V.
There is lead/N in the refuse/N.
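The two sentences above can be written as (word, tag) pairs to make the ambiguity explicit. This is only a minimal sketch: the V and N tags for "refuse" and "lead" come from the slide, while the tags on the remaining words (PRO, AUX, TO, EX, P, DET) are assumptions added for illustration.

```python
from collections import defaultdict

# Tags for "refuse" and "lead" follow the slide; every other tag is an
# illustrative assumption.
tagged_sentences = [
    [("He", "PRO"), ("will", "AUX"), ("refuse", "V"), ("to", "TO"), ("lead", "V")],
    [("There", "EX"), ("is", "V"), ("lead", "N"), ("in", "P"),
     ("the", "DET"), ("refuse", "N")],
]

def tags_by_word(sentences):
    """Collect every tag observed for each (lowercased) word form."""
    observed = defaultdict(set)
    for sentence in sentences:
        for word, tag in sentence:
            observed[word.lower()].add(tag)
    return observed

observed = tags_by_word(tagged_sentences)
# Keep only the word forms that were seen with more than one tag.
ambiguous = {word: sorted(tags) for word, tags in observed.items() if len(tags) > 1}
print(ambiguous)  # {'refuse': ['N', 'V'], 'lead': ['N', 'V']}
```

A tagger therefore cannot be a plain word-to-tag lookup; it has to use the context around each word.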
12
Open and Closed Classes
which play a role in grammar)
with respect to new items
13
Closed Class Words
Examples:
14
Open Class Words
15
POS: Choosing a Tagset
Almost all NLPers use these.
16
Penn TreeBank POS Tagset
17
Important! Not 1-to-1 mapping!
POS tag for a particular instance of a word. This can change the entire parse tree.
These examples from Dekang Lin
18
Exercise!
Label each word with its Part of Speech tag!
(look back 2 slides at the POS tag list for help)
19
Word Classes and Constituency
Noun Phrase “the big blue ball”
20
Constituency
to behave in similar ways
21
Constituency
constituent members?
(follows or precedes)?
22
Constituency
English...
can all precede verbs.
23
Exercise!
Try some constituency tests!
1. Is this a Verb phrase or Noun phrase? Why?
2. Is this a Verb phrase or Noun phrase? Why?
3. Can this be used as an adjective? Why?
24
Grammars and Constituency
up with the right set of constituents and the rules that govern how they combine...
correspond to a modern linguistic theory of grammar).
25
Context-Free Grammars
So…we’ll make CFG rules for all valid noun phrases.
26
Definition
27
Context-Free Grammars
number of terminals and non-terminals on the right.
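One way to make the rule shape concrete is to store a grammar in a small data structure and check it. The sketch below assumes the usual four-part view of a CFG (non-terminals, terminals, rules, start symbol). The class name `CFG`, the method `is_well_formed`, and the toy rules are all made up for illustration and are not the grammar from these slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset
    terminals: frozenset
    rules: tuple   # each rule is (lhs, rhs), with rhs a tuple of symbols
    start: str

    def is_well_formed(self):
        """Check the context-free restriction on every rule."""
        if self.start not in self.nonterminals:
            return False
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.rules:
            # Left side: exactly one non-terminal.
            if lhs not in self.nonterminals:
                return False
            # Right side: any number of terminals and non-terminals.
            if any(sym not in symbols for sym in rhs):
                return False
        return True

# Hypothetical toy grammar.
toy = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    terminals=frozenset({"the", "koala", "keys", "put"}),
    rules=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V", "NP")),
        ("Det", ("the",)),
        ("N", ("koala",)),
        ("N", ("keys",)),
        ("V", ("put",)),
    ),
    start="S",
)
print(toy.is_well_formed())  # True
```

A rule with a terminal on its left side, such as ("the", ("Det",)), would make the check fail.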
28
Some NP Rules
29
Example Grammar
30
Generativity
either analysis or synthesis engines
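Read as a synthesis engine, a grammar can simply enumerate the strings it generates. The following is a sketch under stated assumptions: the toy grammar is hypothetical and is kept non-recursive on purpose so that its language is finite.

```python
from itertools import product

# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
# It has no recursion, so it generates a finite set of sentences.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["koala"], ["table"]],
    "V": [["sleeps"], ["sees"]],
}

def generate(symbol):
    """Return every word sequence (as a tuple) derivable from symbol."""
    if symbol not in GRAMMAR:          # a terminal derives only itself
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine.
        expansions = [generate(sym) for sym in rhs]
        for combo in product(*expansions):
            results.append(tuple(word for part in combo for word in part))
    return results

sentences = [" ".join(words) for words in generate("S")]
print(len(sentences))   # 40
print(sentences[0])     # the koala sleeps
```

The output also contains "the koala sleeps the table": a grammar this coarse overgenerates, which is the problem the later slides on subcategorization take up.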
31
Derivations
that accounts for that string
string
the string
32
Parsing
The process of taking a string and a grammar and returning parse tree(s) for that string
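To see why the answer is "parse tree(s)" in the plural, here is a small sketch that returns every tree for the telescope sentence from Example 2. The slide does not name a parsing algorithm; this uses a memoized, CKY-style search over spans. The binary grammar and the lexicon are assumptions chosen for this example.

```python
from functools import lru_cache

# Hypothetical toy grammar in binary form.
BINARY_RULES = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "man": ["N"], "hill": ["N"], "telescope": ["N"],
    "on": ["P"], "with": ["P"],
}

def parse(words, start="S"):
    """Return all parse trees for words as nested tuples (label, children...)."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(label, i, j):
        found = []
        # Lexical case: a single word that the lexicon lists under this label.
        if j - i == 1 and label in LEXICON.get(words[i], []):
            found.append((label, words[i]))
        # Binary case: try every rule for this label and every split point.
        for left, right in BINARY_RULES.get(label, []):
            for k in range(i + 1, j):
                for left_tree in trees(left, i, k):
                    for right_tree in trees(right, k, j):
                        found.append((label, left_tree, right_tree))
        return tuple(found)

    return list(trees(start, 0, len(words)))

sentence = "I saw the man on the hill with a telescope".split()
parses = parse(sentence)
print(len(parses))  # 5
```

Under this toy grammar the sentence gets five trees, one for each way of attaching the two prepositional phrases.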
33
Sentence Types
S → NP VP
S → VP
S → Aux NP VP
S → WH-NP Aux NP VP
34
35
Noun Phrases
NP → Det Nominal
hidden inside this one rule.
36
Noun Phrases
37
Determiners
38
Nominals
modifiers of the head.
39
Agreement
Constraints that hold among various constituents that take part in a rule or set of rules
For example, in English, determiners and the head nouns in NPs have to agree in their number.
This flight
Those flights
*This flights
*Those flight
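The four examples above can be checked mechanically once each word carries a number feature. This is a minimal sketch: the feature tables are hand-written, and the entry for "the" (compatible with both numbers) is an added assumption.

```python
# Number values each word is compatible with. "this", "those", "flight"
# and "flights" come from the slide; "the" is an added assumption.
DET_NUMBER = {"this": {"sg"}, "those": {"pl"}, "the": {"sg", "pl"}}
NOUN_NUMBER = {"flight": {"sg"}, "flights": {"pl"}}

def agrees(det, noun):
    """True if the determiner and the noun share at least one number value."""
    return bool(DET_NUMBER[det.lower()] & NOUN_NUMBER[noun.lower()])

for phrase in ["This flight", "Those flights", "This flights", "Those flight"]:
    det, noun = phrase.split()
    mark = "" if agrees(det, noun) else "*"
    print(mark + phrase)
```

The loop prints the two grammatical phrases unmarked and puts a star on the two that violate agreement.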
40
Verb Phrases
following constituents which we’ll call arguments.
41
Subcategorization
rules.
according to the sets of VP rules that they participate in.
transitive/intransitive.
42
Subcategorization
43
Programming Analogy
parameters
number, position and types.
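The analogy can be taken literally: a verb's subcategorization frames behave like a set of accepted signatures. In this sketch the verbs and their frames are hypothetical examples and are not taken from the slides.

```python
# Hypothetical subcategorization frames. Each verb lists the argument
# sequences it accepts, much like overloaded function signatures.
SUBCAT = {
    "sleep": [()],                           # intransitive: no arguments
    "find": [("NP",)],                       # transitive: one NP
    "give": [("NP", "NP"), ("NP", "PP")],    # two alternative frames
    "say": [("S",)],                         # sentential complement
}

def accepts(verb, args):
    """Check the number, position and types of a verb's arguments."""
    return tuple(args) in SUBCAT.get(verb, [])

print(accepts("find", ["NP"]))         # True
print(accepts("sleep", ["NP"]))        # False: too many arguments
print(accepts("give", ["PP", "NP"]))   # False: right types, wrong positions
```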
44
Subcategorization
formally express these facts
45
Why subcategorization?
arguments that don’t go together
is a valid NP
46
Possible CFG Solution
agreement.
for all the verb/VP classes.
47
CFG Solution for Agreement
constraints explodes the number of rules in our grammar.
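A rough way to see the growth is to copy every rule once per feature value. This is a sketch under stated assumptions: the two base rules form a hypothetical fragment in which all symbols share one number value, and the suffix naming (NP_sg, NP_pl) is made up for illustration.

```python
# Base rules whose symbols must all share one number value
# (a hypothetical fragment; the suffix naming is also an assumption).
BASE_RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Nominal")),
]

def specialize(rules, values):
    """Make one copy of every rule per feature value, e.g. NP_sg and NP_pl."""
    specialized = []
    for lhs, rhs in rules:
        for value in values:
            specialized.append(
                (f"{lhs}_{value}", tuple(f"{sym}_{value}" for sym in rhs))
            )
    return specialized

number_rules = specialize(BASE_RULES, ["sg", "pl"])
for lhs, rhs in number_rules:
    print(lhs, "->", " ".join(rhs))
print(len(BASE_RULES), "rules became", len(number_rules))  # 2 rules became 4
```

Each further feature encoded this way multiplies the rule count again, which is where the explosion comes from.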
48
The Ugly Reality
English.
staying within the CFG framework.
us out of the CFG framework (beyond its formal power)
49
50
What do we do as computer scientists?
implicitly capturing these nasty constraints with probabilities.
51
Treebanks
Corpora in which each sentence has been paired with a parse tree.
necessary.
tagset, and a grammar and instructions for how to deal with particular grammatical constructions.
52
Penn Treebank
The best-known part is the Wall Street Journal section
1987-1989 Wall Street Journal.
53
Create a Treebank Grammar
you’ll have a grammar with decent coverage.
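Reading a grammar off a treebank amounts to collecting every local subtree as a rule. The sketch below assumes trees in bracketed notation. The two trees are made up for illustration (they are not Penn Treebank data), and the helper names `read_tree` and `rules` are hypothetical.

```python
from collections import Counter

def read_tree(text):
    """Parse a bracketed string such as "(S (NP ...) (VP ...))" into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                       # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                       # skip ")"
        return [label] + children

    return node()

def rules(tree):
    """Yield (lhs, rhs) for every local subtree, including lexical rules."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c[0] if isinstance(c, list) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, list):
            yield from rules(child)

# Two made-up trees standing in for a treebank.
TREEBANK = [
    "(S (NP (DT the) (NN koala)) (VP (VBD put) (NP (DT the) (NNS keys))))",
    "(S (NP (DT the) (NN table)) (VP (VBD fell)))",
]

counts = Counter(rule for text in TREEBANK for rule in rules(read_tree(text)))
for (lhs, rhs), n in counts.most_common(3):
    print(n, lhs, "->", " ".join(rhs))
```

The rule counts collected this way are also the raw material for attaching probabilities to rules.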
54
Learned Treebank Grammars
they tend to avoid recursion.
Among them...
55
Lexically Decorated Tree
56
Treebank Uses
a given language
57