CFGs and Intro to Parsing Scott Farrar CLMA, University of - - PowerPoint PPT Presentation


SLIDE 1

Practical Grammar Writing Parsing: Key ideas Approaches to parsing Issues concerning natural language

CFGs and Intro to Parsing

Scott Farrar CLMA, University of Washington farrar@uw.edu January 11, 2010

Scott Farrar CLMA, University of Washington farrar@uw.edu CFGs and Intro to Parsing

SLIDE 2

Today’s lecture

1 Practical Grammar Writing
  • Word classes
  • Clause/Phrase classes
  • Problems with the Treebank
  • Other notes about WSJ
2 Parsing: Key ideas
3 Approaches to parsing
  • Parsing Methods
  • Top-down parsing
  • Bottom-up parsing
4 Issues concerning natural language
  • Ambiguity
  • Recursion
  • Center embedding

SLIDE 3

Word classes and treebanks

Word classes

  • The number of word classes (pre-terminals) depends on the task and on how finely you want to cut the pie. (The tagged Brown corpus has 87 pre-terminal tags; the Penn Treebank uses a 49-item pre-terminal tagset.)
  • There’s no right answer for NLP.
  • The Penn Treebank has primarily been used for developing and testing parsers. A treebank or corpus used for semantic analysis or NLG might look very different.

A tour of the Penn Treebank and associated work:

SLIDE 4

Closed-class words

Definition
Closed class word: a function word in a grammar; there are relatively few of these in a language, though their frequency is very high. In treebank construction, such words can be, for the most part, tagged automatically.

Homework
This should be the easy part of hw1.

SLIDE 5

Closed classes

Tag   Class               Examples
DT    determiner          a(n), the, that, those
MD    modal               do, can, may
PRP   pronoun             she, her, him, he, we
EX    existential there   there are many fish
CD    cardinal number     one, two, three
... (see list in front cover of J&M)

SLIDE 6

Open-class words

Definition
Open class word: a content word in a grammar; there is an open-ended set of these, but their frequencies may be very low (compare home with octogenarian). Such words are harder to tag automatically in treebank construction. Why?

  • Nouns
  • Verbs
  • Adjectives
  • Adverbs

SLIDE 10

Nouns

Recall the grade school definition:

Definition
A noun is a person, place, thing, or idea.

“You shall know a word by the company it keeps.” J. R. Firth (d. 1960)

In other words, syntactic word categories are defined based on their distribution:

Definition
Noun is a class of lexical items that occur after determiners (the, a, ...) or adjectives, and can be subjects of sentences. Nouns often represent a person, place, thing, or idea.
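The distributional definition of a noun can be tried out on a toy sample. The sketch below is only an illustration: the tagged word list is made up (it is not Treebank data), and `noun_candidates` is a hypothetical helper that collects the words found directly after a determiner or an adjective.

```python
# Toy tagged sample; the words and tags are invented for illustration.
TAGGED = [
    ("the", "DT"), ("fisherman", "NN"), ("caught", "VBD"),
    ("a", "DT"), ("big", "JJ"), ("fish", "NN"),
]

def noun_candidates(tagged):
    """Return words that directly follow a determiner (DT) or adjective (JJ).

    Adjectives themselves are skipped, because in 'a big fish' the word
    after the determiner is an adjective, not the noun.
    """
    found = []
    for (_, prev_tag), (word, tag) in zip(tagged, tagged[1:]):
        if prev_tag in ("DT", "JJ") and tag != "JJ":
            found.append(word)
    return found

print(noun_candidates(TAGGED))
```

On a real corpus this context test would over- and under-generate, which is one reason treebank tagsets spell out the distribution of each tag in more detail.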

SLIDE 14

Nouns

NN : a singular common noun, occurring after adjectives and determiners
    the [NN fisherman] caught it
NNS : a plural common noun, occurring alone or after adjectives and determiners
    [NNS fish] swim well
NNP : a proper noun or name, occurring alone in a noun phrase; does not (usually) occur after a determiner
    [NNP Jack] knows
NNPS : a plural proper noun
    the [NNPS Simpsons] know the [NNP Jones]

SLIDE 18

Verbs

Definition
A verb describes states or events.

The forms of English verbs predict where they will occur. Consider these verb labels (based on WSJ corpus):

VBD : a past tense form; occurs alone
    the Earl [VBD ate] a sandwich
VBZ : a third person form; occurs after a singular (pro)noun
    she [VBZ runs] two marathons a year
VBN : a participle form; occurs after was, were, has, had, have, got, get, etc.
    he was [VBN bitten] by a tiger

SLIDE 22

Adjectives

Definition
Adjectives ascribe properties to nouns. They occur before nouns or after verbs in the predicate.

JJ : a simple adjective
    the [JJ metamorphic] rock, the rock is [JJ metamorphic]
JJR : a comparative adjective
    the [JJR bigger] rock
JJS : a superlative adjective
    the [JJS biggest] one

SLIDE 26

Adverbs

Definition
Adverbs modify verbs (and adjectives) to specify time, manner, place, or direction of the event.

RB : an adverb; can occur around the verb phrase or at the beginning/end of the clause (fast, quickly, really, here)
RBR : comparative adverb
    ran [RBR faster] than..., woke up [RBR earlier]
RBS : superlative adverb
    [RBS most] notable, ran [RBS fastest]

SLIDE 27

Other common abbreviations

Symbol   Meaning            Symbol   Meaning
Det      determiner         NP       noun phrase
Noun     noun               VP       verb phrase
Nom      nominal            AP       adjective phrase
Pro      pronoun            PP       prepositional phrase
Aux      auxiliary
Card     cardinal number
Ord      ordinal number
Quant    quantifier

SLIDE 28

Small grammar writing strategy

The task in grammar writing is to choose the best elements for nonterminals.

1 Settle on a tagset for pre-terminals (part-of-speech)
2 Tag data for part of speech
3 Identify larger clause patterns; come up with tags
4 Identify each phrase type; come up with tags
5 Fill in details for each phrase type
6 Identify major clause types
7 Address problematic cases
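As a rough illustration of where these steps lead, a very small grammar can be written down as plain data. The sketch below assumes a made-up lexicon and rule set (not taken from the Treebank); `undefined_symbols` is a hypothetical helper that flags right-hand-side symbols that nothing defines, one simple kind of "problematic case".

```python
# Steps 1-2: a pre-terminal tagset with a tiny lexicon (illustrative only).
LEXICON = {
    "DT": ["the", "a"],
    "NN": ["fisherman", "sandwich"],
    "VBD": ["ate", "caught"],
}

# Steps 3-6: clause and phrase rules, stored as LHS -> list of RHS sequences.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "NN"]],
    "VP": [["VBD", "NP"], ["VBD"]],
}

def undefined_symbols(rules, lexicon):
    """Step 7 helper: list RHS symbols that no rule or lexicon entry defines."""
    defined = set(rules) | set(lexicon)
    used = {sym for expansions in rules.values() for rhs in expansions for sym in rhs}
    return sorted(used - defined)

print(undefined_symbols(RULES, LEXICON))
```

Keeping the lexicon apart from the phrase rules mirrors the split between pre-terminals and the other non-terminals.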

SLIDE 29

PTB phrase types

NP : noun phrase including all constituents that depend on the noun head
VP : verb phrase including all constituents that depend on the verb head
PP : prepositional phrase
ADJP : adjective phrase headed by an adjective
ADVP : adverb phrase headed by an adverb
CONJP : used to mark multi-word conjunctions
QP : quantifier phrase, used inside NPs
. . .

SLIDE 30

PTB Clause types

The number of non-terminals (excluding pre-terminals) is generally small. In the Penn Treebank, there are, for example, 29 basic tags for syntactic constituents, including 5 basic clause types and 21 phrase-level constituents.

S : declaratives, passives, imperatives, questions with declarative order, (embedded) infinitive clauses, gerund clauses
SINV : inverted clauses
SBAR : relative and subordinate clauses
SBARQ : Wh-questions
SQ : Y/N-questions, inside SBARQ
S-CLF : it-cleft clauses
FRAG : stand-alone clauses, phrases without a predicate argument structure

SLIDE 35

Problems with Penn Treebank

As a CFG, why is the Penn Treebank fundamentally flawed?

  • The number of rules is intractably large: 17,500, in order to parse 50,000 sentences.
  • The number of rules seems disproportionate at best.
  • The number of rules seems to grow linearly with the addition of new sentences.

Main point
The rules do not express linguistic generalizations.

SLIDE 36

Rule growth in the Penn Treebank

[Figure: rule growth in the Penn Treebank; plot not preserved in this transcript]

SLIDE 40

Problems with the Treebank

Why is the rule set so large?

  • diversity of language
  • some sort of generative process going on (in the heads of annotators)
  • shallow analysis of sentences by annotators

SLIDE 46

Some Solutions

See the Gaizauskas paper:

  • eliminate low frequency rules
      • 2144 rules account for 95% of the grammar
      • the author used 100 rules to obtain a grammar that accounted for 70% of rule occurrences
  • try to parse the RHS of low frequency rules with higher-frequency ones

Goal
Come up with a tractable, yet expressive grammar for parsing experiments.
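The idea of dropping low frequency rules can be sketched as a coverage calculation: sort rules by how often they occur and ask what share of all rule occurrences the top k account for. The counts below are invented for illustration (they are not the Treebank figures quoted above), and `coverage_of_top_rules` is a hypothetical helper name.

```python
from collections import Counter

def coverage_of_top_rules(rule_counts, k):
    """Fraction of all rule occurrences covered by the k most frequent rules."""
    total = sum(rule_counts.values())
    top = rule_counts.most_common(k)
    return sum(count for _, count in top) / total

# Made-up counts: a few frequent rules and a tail of rare, flat ones.
counts = Counter({
    "NP -> DT NN": 50,
    "S -> NP VP": 30,
    "VP -> VBD NP": 15,
    "NP -> DT JJ JJ NN": 3,
    "VP -> VBD NP PP PP": 2,
})

print(coverage_of_top_rules(counts, 2))
```

Raising k until the coverage reaches a chosen threshold gives one simple cut-off for a smaller grammar; whether the discarded rules can be re-derived from the kept ones is a separate question.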

SLIDE 47

The Penn Treebank parses

What are you thinking about?

(SBARQ (WHNP (WP What))
       (SQ (VBP are)
           (NP (PRP you))
           (VP (VBG thinking)
               (IN about)))
       (PUNC ?))
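A bracketed parse like this one can be read into a nested structure, and the CFG rules it uses can then be read off node by node. The sketch below is a minimal illustration and not a Treebank tool: `read_tree` and `productions` are hypothetical helpers, and the reader assumes well-formed brackets with no empty nodes.

```python
import re

def read_tree(text):
    """Parse a bracketed string into nested lists: [label, child, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1                          # consume "("
        node = [tokens[pos]]              # the node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())      # nested constituent
            else:
                node.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                          # consume ")"
        return node

    return parse()

def productions(tree):
    """Yield 'LHS -> RHS' strings for phrase-level nodes; skip pre-terminals."""
    children = tree[1:]
    if all(isinstance(c, str) for c in children):
        return  # pre-terminal such as (WP What): lexical, not a phrase rule
    yield tree[0] + " -> " + " ".join(c[0] for c in children)
    for child in children:
        yield from productions(child)

TREE = ("(SBARQ (WHNP (WP What)) (SQ (VBP are) (NP (PRP you)) "
        "(VP (VBG thinking) (IN about))) (PUNC ?))")

for rule in productions(read_tree(TREE)):
    print(rule)
```

Counting the distinct strings produced this way over many trees is how a rule inventory for a treebank can be tallied.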

SLIDE 49

Traces in the Penn Treebank

What are you thinking about *T*?

(SBARQ (WHNP (WP What))
       (SQ (VBP are)
           (NP (PRP you))
           (VP (VBG thinking)
               (PP (IN about)
                   (NP *T*))))
       (PUNC ?))

SLIDE 50

Traces in the Penn Treebank

Where did I put the marker?

(SBARQ (WHADVP (WRB Where))
       (SQ (VBD did)
           (NP (PRP I))
           (VP (VB put)
               (NP (DT the) (NN marker))))
       (PUNC ?))

SLIDE 51

Traces in the Penn Treebank

Where did I put the marker *T*?

(SBARQ (WHADVP (WRB Where))
       (SQ (VBD did)
           (NP (PRP I))
           (VP (VB put)
               (NP (DT the) (NN marker))
               (ADVP *T*)))
       (PUNC ?))

SLIDE 53

Parsing

Definition
Parsing is the task of deriving a structural description of natural language utterances. Given a sentence S of natural language and some grammar G, the parsing task is to return a syntactic structure, in the form of a parse-tree T, of S.

Definition
A variant of parsing is recognition: Given a sentence S of natural language and some grammar G, the recognition task is to return true if S is a valid sentence of G (i.e., if a syntactic structure can be found), or false otherwise.
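The difference between the two tasks can be sketched in a few lines. The code below is an assumption-laden toy: the grammar and lexicon are invented, the strategy is a naive top-down expansion with backtracking (which would loop forever on a left-recursive grammar), and `expand`, `expand_sequence`, `parse` and `recognize` are hypothetical names.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "NN"], ["PRP"]],
    "VP": [["VBD", "NP"], ["VBD"]],
}
LEXICON = {"the": "DT", "a": "DT", "earl": "NN", "sandwich": "NN",
           "she": "PRP", "ate": "VBD"}

def expand(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:  # pre-terminal: must match the next word's tag
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand_sequence(symbol, rhs, words, start, [])

def expand_sequence(label, rhs, words, pos, children):
    """Expand the symbols of `rhs` left to right, collecting child trees."""
    if not rhs:
        yield (label, *children), pos
        return
    for child, nxt in expand(rhs[0], words, pos):
        yield from expand_sequence(label, rhs[1:], words, nxt, children + [child])

def parse(words):
    """Parsing: return a tree T for the sentence, or None if there is none."""
    for tree, end in expand("S", words, 0):
        if end == len(words):
            return tree
    return None

def recognize(words):
    """Recognition: only report whether some parse exists."""
    return parse(words) is not None

print(parse("she ate".split()))
print(recognize("sandwich the ate".split()))
```

Here recognition is implemented by calling the parser and discarding the tree; a dedicated recognizer could avoid building trees at all.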

SLIDE 56

Parsing

Why parse?

  • Parsing is used for: grammar checking, speech recognition, deriving a semantic representation (for MT, question-answering, information extraction), and many other NLP tasks.
  • It’s all about getting at the units, or parts (parse comes from Latin pars).
  • Orthographic (or phonological) units will ultimately reveal patterns that map onto the semantic units (according to the grammar). Those patterns, in some sense, are the syntax of the language (recall definition).

SLIDE 57

Parser Demo

There are several parsers available here: /NLP TOOLS/parsers

$ cd ~/dropbox/09-10/571/misc_code/stanford_parser
$ ./parse

SLIDE 59

Parsing as search

The parsing task can be approached as a search problem.

Definition
A search algorithm is one that starts with a problem input and returns a number of solutions based on some method of generating the possible solutions.

slide-64
SLIDE 64

Practical Grammar Writing Parsing: Key ideas Approaches to parsing Issues concerning natural language Parsing Methods Top-down parsing Bottom-up parsing

A quick overview of search

Elements of search

Search can be conceptualized as a tree of partial to complete solutions:

tree search: a strategy that generates a tree of possible solutions.
search node: a data structure holding information about some step in the solution process.
solution node: a search node containing a solution.
search space: the set of all possible solutions (including solution paths) to a search problem.

slide-65
SLIDE 65

Search example

slide-66
SLIDE 66

Searching for a parse

search node: a partial parse tree
  the cat (PP (IN in ((DT the) (NN hat))))
solution node: a complete parse tree
search space: all the paths that lead to a successful parse and all the dead-ends

slide-69
SLIDE 69

Elements of search

How to expand each node? And how do we determine success?

expansion function: a way to build the contents of the next node and expand the search tree.
evaluation function: a function that returns true if the node it is given is a solution node.
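
The two functions above can be plugged into a generic search loop. The following is a minimal sketch, not the lecture's own code: the names tree_search, expand and is_solution are made up for illustration, and the toy problem (strings over a and b) is an assumption chosen only because its search tree is small.

```python
from collections import deque

def tree_search(start, expand, is_solution):
    """Return every solution node reachable from the start node."""
    solutions = []
    agenda = deque([start])            # search nodes waiting to be examined
    while agenda:
        node = agenda.popleft()
        if is_solution(node):          # evaluation function
            solutions.append(node)
        agenda.extend(expand(node))    # expansion function grows the tree
    return solutions

# Hypothetical toy problem: strings of length 3 with exactly one "b".
def expand(s):
    return [s + "a", s + "b"] if len(s) < 3 else []

def is_solution(s):
    return len(s) == 3 and s.count("b") == 1

print(tree_search("", expand, is_solution))   # ['aab', 'aba', 'baa']
```

The search space here is every string of length 0 to 3; most of those nodes are partial solutions or dead ends, which is the situation a parser faces with partial parse trees.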

slide-72
SLIDE 72

Varying the search strategies

Exploring the space

Two ways to explore the search space:

1 Breadth-first search is an uninformed search strategy whereby the search space is explored by visiting all neighboring (sister) nodes first, before going deeper into the tree.

2 Depth-first search is an uninformed search strategy whereby the search space is explored by going deeper and deeper (down a branch of the tree structure) until backtracking is required.
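
The difference between the two strategies comes down to how the agenda of unexplored nodes is managed: a queue gives breadth-first order, a stack gives depth-first order. A small sketch, assuming a hypothetical search tree stored as a dictionary (the node names are invented):

```python
from collections import deque

# Hypothetical search tree: each node maps to its daughters, left to right.
TREE = {
    "S": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": [], "A2": [], "B1": [],
}

def breadth_first(root):
    order, agenda = [], deque([root])
    while agenda:
        node = agenda.popleft()              # first in, first out
        order.append(node)
        agenda.extend(TREE[node])            # sisters are visited before daughters
    return order

def depth_first(root):
    order, agenda = [], [root]
    while agenda:
        node = agenda.pop()                  # last in, first out
        order.append(node)
        agenda.extend(reversed(TREE[node]))  # leftmost daughter ends up on top
    return order

print(breadth_first("S"))   # ['S', 'A', 'B', 'A1', 'A2', 'B1']
print(depth_first("S"))     # ['S', 'A', 'A1', 'A2', 'B', 'B1']
```

In the depth-first version, popping B after A2 is the backtracking step: the A branch is exhausted, so the search returns to the most recent unexplored alternative.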

slide-73
SLIDE 73

Varying the expansion strategy

For NL parsing the choice of expansion function is important:

1 top-down parse tree expansion
2 bottom-up parse tree expansion

slide-75
SLIDE 75

Top-down parsing

Definition
Using a top-down parse tree expansion strategy, start with the root node (e.g. S) and work towards the solution via subgoals, namely solutions for NP, VP, etc. In other words, starting with the root node of the parse tree, progress towards the goal, which is the full parse tree, by progressively expanding the parse tree.

An example of a top-down parser is the recursive descent parser, which tries to build a tree (top-down) by iterating over the rules of the grammar. It backtracks when a predicted terminal fails to match the input.
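
A recursive descent parser can be sketched in a few lines. The version below is an illustration under assumptions, not the implementation used in the course: the toy grammar, the function names and the tuple encoding of trees are all invented, and the grammar deliberately contains no left-recursive rules. Python generators supply the backtracking: when a rule fails to match, the loop simply moves on to the next alternative.

```python
# Hypothetical toy grammar; anything that is not a key is a terminal.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["DT", "NN"], ["DT", "NN", "PP"]],
    "VP": [["VB", "NP"], ["VB"]],
    "PP": [["IN", "NP"]],
    "DT": [["the"]],
    "NN": [["cat"], ["hat"]],
    "VB": [["sat"], ["saw"]],
    "IN": [["in"]],
}

def parse(symbol, words, i):
    """Yield (tree, next_index) for every way symbol can start at words[i]."""
    if symbol not in GRAMMAR:                      # terminal: must match the input
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:                    # predict each rule in turn
        for children, j in expand_rhs(rhs, words, i):
            yield (symbol, *children), j

def expand_rhs(rhs, words, i):
    """Yield (list_of_subtrees, next_index) for a sequence of symbols."""
    if not rhs:
        yield [], i
        return
    for first, j in parse(rhs[0], words, i):
        for rest, k in expand_rhs(rhs[1:], words, j):
            yield [first] + rest, k

def recursive_descent(sentence):
    words = sentence.split()
    # Keep only the trees rooted in S that consume the whole input.
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

for tree in recursive_descent("the cat in the hat sat"):
    print(tree)
```

Note that the parser predicts NP → DT NN first, fails at the VP, and only then tries NP → DT NN PP, re-deriving "the cat" along the way.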

slide-76
SLIDE 76

Top-down parse example

slide-81
SLIDE 81

Pros/cons of top-down strategy

√ Never explores trees that aren’t potential solutions, i.e. ones with the wrong kind of root node.
X But explores trees that do not match the input sentence (predicts input before inspecting input).
X Naive top-down parsers never terminate if G contains left-recursive rules like X → X Y.
X Backtracking may discard valid constituents that have to be re-discovered later (duplication of effort).

Use a top-down strategy when you know what kind of constituent you want to end up with (e.g. NP extraction, named entity extraction). Avoid this strategy if you’re stuck with a highly recursive grammar.
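
One common workaround for the left-recursion problem, not covered on these slides, is to rewrite the grammar: X → X a | b becomes X → b X', X' → a X' | (empty). The sketch below handles direct left recursion only; the function name and the list-of-lists rule format are assumptions made for illustration.

```python
def remove_direct_left_recursion(lhs, alternatives):
    """Rewrite  X -> X a | b  as  X -> b X',  X' -> a X' | (empty)."""
    # Tails of the left-recursive alternatives (the "a" parts).
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == lhs]
    # Alternatives that do not start with the left-hand side (the "b" parts).
    other = [alt for alt in alternatives if not alt or alt[0] != lhs]
    if not recursive:
        return {lhs: alternatives}
    new = lhs + "'"
    return {
        lhs: [alt + [new] for alt in other],
        new: [alt + [new] for alt in recursive] + [[]],   # [] is the empty expansion
    }

print(remove_direct_left_recursion("Nom", [["Nom", "PP"], ["NN"]]))
# {'Nom': [['NN', "Nom'"]], "Nom'": [['PP', "Nom'"], []]}
```

The rewritten grammar accepts the same strings, but the trees it assigns are right-branching rather than left-branching, so the original constituent structure has to be recovered afterwards if it matters.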

slide-83
SLIDE 83

Bottom-up parsing

Definition
Using a bottom-up parse tree expansion strategy, start with the sentence and progress towards the goal, i.e. the full parse tree, by progressively building the parse tree. In other words, try to match the right-hand side of rules to build a partial solution, progressively building structure upwards.

An example is the shift-reduce parser: push input words onto a stack (shift) and, whenever the top of the stack matches the right-hand side of a rule, replace it with the rule's left-hand side (reduce).
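
The shift and reduce steps can be sketched as follows. This is a deliberately naive version under assumptions: the rules, the lexicon and the function names are invented, each word has exactly one tag, and the parser always reduces when it can and never backtracks.

```python
# Hypothetical toy grammar and lexicon.
RULES = [
    ("NP", ("DT", "NN")),
    ("VP", ("VB", "NP")),
    ("S",  ("NP", "VP")),
]
LEXICON = {"the": "DT", "cat": "NN", "hat": "NN", "saw": "VB"}

def label(tree):
    return tree[0]

def shift_reduce(sentence):
    stack, buffer = [], sentence.split()
    while True:
        # Reduce: if the top of the stack matches a right-hand side,
        # replace those items with a single node labelled by the left-hand side.
        for lhs, rhs in RULES:
            n = len(rhs)
            if tuple(label(t) for t in stack[-n:]) == rhs:
                stack[-n:] = [(lhs, *stack[-n:])]
                break
        else:
            # No reduction applied: shift the next word, or stop if none is left.
            if not buffer:
                break
            word = buffer.pop(0)
            stack.append((LEXICON[word], word))
    # Success only if the stack holds one tree rooted in S.
    return stack[0] if len(stack) == 1 and label(stack[0]) == "S" else None

print(shift_reduce("the cat saw the hat"))
print(shift_reduce("the cat saw"))   # None: the stack ends as NP VB
```

Because every structure is built from words actually present in the input, nothing is predicted in advance; the price is that the stack may end up holding fragments that never combine into an S.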

slide-84
SLIDE 84

Bottom-up parse example

slide-90
SLIDE 90

Pros/cons of bottom-up strategy

√ Locally grounded in the input sentence.
√ Recursive rules are not generally a problem.
√ Substructures are only built once.
X Explores many trees that are not rooted with goal nodes. (The shift-reduce algorithm can fail to find any parse.)

Use this type of parser when you’re parsing real-time speech input. Why?

slide-94
SLIDE 94

Difficulties in parsing NL

Ambiguity: more than one solution (more than one structural description)
Recursion: production rules whose RHS contains the LHS symbol (e.g., S → S CONJ S)
Center embedding: structure within structure
  The cat [that sat in the chair under the lamp beside the couch] licked its paws

slide-100
SLIDE 100

Ambiguity in natural language

Ambiguous input poses problems for parsers.

Book that flight.
Time flies like an arrow.
Canadian history teacher
Galileo saw Medici’s wife with a telescope.
I ran with my dog.

slide-103
SLIDE 103

Types of ambiguity

Two types of ambiguity most relevant for parsing:

lexical ambiguity: uncertainty introduced when a word token belongs to more than one part-of-speech category, e.g. house/NN, house/VB; sweet/JJ, sweet/NN.
structural ambiguity: uncertainty introduced by having more than one rule that can describe a given string, so that the string has more than one structure.
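
Lexical ambiguity multiplies quickly, because every ambiguous word doubles (or worse) the number of tag sequences a parser may have to consider. A small sketch, assuming a hypothetical three-word lexicon with Penn-style tags (the tag choices are illustrative, not taken from a treebank):

```python
from itertools import product

# Hypothetical lexicon: each word lists the tags it may carry.
LEXICON = {
    "book":   ["NN", "VB"],
    "that":   ["DT", "IN"],
    "flight": ["NN"],
}

def tag_sequences(sentence):
    """Return every way of assigning one tag to each word."""
    words = sentence.lower().split()
    return [list(zip(words, tags))
            for tags in product(*(LEXICON[w] for w in words))]

for seq in tag_sequences("Book that flight"):
    print(seq)
# 2 x 2 x 1 = 4 candidate tag sequences for a three-word sentence
```

Only one of the four sequences (VB DT NN) leads to the intended imperative reading; the grammar, or a tagger run beforehand, has to rule out the rest.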

slide-107
SLIDE 107

Structural ambiguity

Two types of structural ambiguity:

attachment ambiguity: when a constituent can be attached at more than one place in the parse tree
coordination ambiguity: when different constituents can be formed from a conjunction (and, or)

Parsers that find all possible parses for a given input must be able to disambiguate and choose one candidate.
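
Attachment ambiguity can be made concrete by counting how many trees a grammar assigns to a sentence. The sketch below uses a chart in the style of the CKY algorithm, which these slides have not introduced; the binary grammar, the lexicon and the simplified telescope sentence are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical grammar with binary rules only: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "VB", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("NP", "DT", "NN"),
    ("PP", "IN", "NP"),
]
LEXICON = {
    "galileo": ["NP"], "saw": ["VB"], "the": ["DT"], "a": ["DT"],
    "woman": ["NN"], "telescope": ["NN"], "with": ["IN"],
}

def count_parses(sentence, root="S"):
    words = sentence.lower().split()
    n = len(words)
    chart = defaultdict(int)               # (start, end, label) -> number of trees
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i, i + 1, tag] += 1
    for width in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):      # every split point inside the span
                for parent, left, right in BINARY:
                    chart[i, k, parent] += chart[i, j, left] * chart[j, k, right]
    return chart[0, n, root]

print(count_parses("Galileo saw the woman with a telescope"))   # 2
```

The two trees differ only in where "with a telescope" attaches: to the VP (Galileo used the telescope) or to the NP (the woman had it). Choosing between them needs information the grammar alone does not provide.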

slide-110
SLIDE 110

Recursion in NL

Definition
Recursion: A process in a grammatical derivation whereby a rule is re-applied to itself, resulting in the same pattern being repeated over and over: [Nom X [Nom Y [PP Z]]]

Direct recursion: Nom → Nom PP, S → S CONJ S
  water under the bridge, Bill ran and Jane jogged

Indirect recursion:
  . . . on the thimble in the box on the stool beside the table near the sofa . . .
  NP → DT Nom
  Nom → Nom PP
  PP → Prep NP
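
The indirect recursion above (NP contains Nom, Nom contains PP, PP contains NP) can be mirrored by a function that calls itself once per embedded noun. This is only a sketch: build_np is a made-up name, the determiner is fixed to "the", and the base rule Nom → NN is an assumption needed to stop the recursion.

```python
def build_np(nouns, preps):
    """Generate a noun phrase with one nested PP per remaining noun."""
    head, *rest = nouns
    nom = head                                    # Nom -> NN (assumed base case)
    if rest:
        pp = f"{preps[0]} {build_np(rest, preps[1:])}"   # PP -> Prep NP
        nom = f"{nom} {pp}"                       # Nom -> Nom PP
    return f"the {nom}"                           # NP -> DT Nom

print(build_np(["thimble", "box", "stool"], ["in", "on"]))
# the thimble in the box on the stool
```

Nothing in the rules limits the depth: adding another noun and preposition always yields another grammatical phrase, which is why a parser cannot rely on a fixed bound for nesting.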

slide-116
SLIDE 116

Center embedding

Definition
Center embedding: When a syntactic constituent A is contained/nested within another constituent B and surrounded by other constituents X and Z: [B X [A Y ] Z]

The company failed.
The company the law firm sued failed.
The company [the law firm sued] failed.
The company the law firm the boss hired sued failed.
The company [the law firm [the boss hired] sued] failed.
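
The bracketed examples follow a simple recursive pattern: each clause sits between the subject and the verb of the clause that contains it. A short sketch that generates them, where center_embed is a hypothetical helper and subjects[i] is assumed to be the subject of verbs[i]:

```python
def center_embed(subjects, verbs):
    """Nest each later clause between the subject and verb of the earlier one."""
    if len(subjects) == 1:
        return f"{subjects[0]} {verbs[0]}"
    inner = center_embed(subjects[1:], verbs[1:])
    return f"{subjects[0]} [{inner}] {verbs[0]}"

print(center_embed(["the company"], ["failed"]))
print(center_embed(["the company", "the law firm"], ["failed", "sued"]))
print(center_embed(["the company", "the law firm", "the boss"],
                   ["failed", "sued", "hired"]))
# the company [the law firm [the boss hired] sued] failed
```

The subjects come out in order and the verbs in reverse order, so the first subject has to be held in memory until the very last word. That stack-like dependency is what makes deeper embeddings so hard to process.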

slide-121
SLIDE 121

Garden path sentences

The horse raced past the barn fell.
The horse [ raced past the barn ] fell.
The horse which was raced past the barn fell.
The mayor forced out of office was arrested.

Definition
Garden path sentences are those for which unnecessary structure is built up during the parsing process. The parser is then forced to ‘undo’ the structure already built. These pose particular problems for human sentence processing.
