

SLIDE 1

Probabilistic Context Free Grammars

CMSC 473/673 UMBC

SLIDE 2

Outline

• Recap: MT word alignment
• Structure in Language: Constituency
• (Probabilistic) Context Free Grammars
  • Definitions
  • High-level tasks: Generating and Parsing
  • Some uses for PCFGs
• CKY Algorithm: Parsing with a (P)CFG

SLIDE 3

Machine Translation as a Noisy Channel Model

[Diagram: the noisy channel model. Observed Russian (noisy) text is passed through a translation/decode model and a (clean) language model, then decoded and reranked into text written in (clean) English. Word-cloud illustration: язы́к, language, speak, text, word.]

Slides courtesy Rebecca Knowles

SLIDE 4


Idea: Learn Word-to-Word Translation via Word Alignment

The cat is on the chair. ↔ Le chat est sur la chaise. [shown first without, then with, word-alignment links]

Slides courtesy Rebecca Knowles

SLIDE 5

Assumption: Parallel Texts

Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world, Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people, Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law, Whereas it is essential to promote the development of friendly relations between nations, …

http://www.un.org/en/universal-declaration-human-rights/

Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj. Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli. Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan. Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan.

http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nhn

Slides courtesy Rebecca Knowles

  • Bitext/parallel texts: sentences with their (human provided) translations
  • Sentences are aligned, words are not
  • Commonly used bitext: Europarl (http://www.statmt.org/europarl/)
SLIDE 6

Alignments

If we had word-aligned text, we could easily estimate P(f|e). But we don’t usually have word alignments, and they are expensive to produce by hand… If we had P(f|e) we could produce alignments automatically.

Slides courtesy Rebecca Knowles

SLIDE 7

IBM Model 1 (1993)

f: vector of French words (observed); e: vector of English words (observed); a: vector of alignment indices (unobserved)

Example: "Le chat est sur la chaise verte" aligned to "The cat is on the green chair" with a = (0, 1, 2, 3, 4, 6, 5)

Joint model = lexical translation model × word alignment model; in standard Model 1 notation, p(f, a | e) = ε / (l + 1)^m ∏_j t(f_j | e_{a_j})

t(f_j | e_i): translation probability of the word f_j given the word e_i

For all IBM models, see the original paper (Brown et al., 1993): http://www.aclweb.org/anthology/J93-2003

Slides courtesy Rebecca Knowles

SLIDE 8

Expectation Maximization (EM)

Two-step, iterative algorithm:

  • 0. Assume some value for the parameters and compute other parameter values
  • 1. E-step: count alignments and translations under uncertainty, assuming these parameters
  • 2. M-step: maximize log-likelihood (update parameters), using the uncertain (estimated) counts

[Example: estimated counts for P(le | "the cat") and P(chat | "the cat")]

Slides courtesy Rebecca Knowles
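
To make the E-step/M-step loop concrete, here is a minimal runnable sketch of Model-1-style EM in Python. The toy sentence pairs, iteration count, and variable names are illustrative assumptions, not something from the slides.

from collections import defaultdict

pairs = [(["the", "cat"], ["le", "chat"]),
         (["the", "dog"], ["le", "chien"]),
         (["the", "house"], ["la", "maison"])]

# 0. Assume a value for the parameters: uniform t(f|e)
f_vocab = {f for _, fs in pairs for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):
    count = defaultdict(float)   # expected counts c(f, e)
    total = defaultdict(float)   # expected counts c(e)
    # 1. E-step: fractional counts under the current parameters
    for es, fs in pairs:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / norm     # posterior that f aligns to e
                count[(f, e)] += p
                total[e] += p
    # 2. M-step: re-normalize the uncertain counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(t[("chat", "cat")])  # rises toward 1.0 as EM sharpens the alignments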

SLIDE 9

Outline

• Recap: MT word alignment
• Structure in Language: Constituency
• (Probabilistic) Context Free Grammars
  • Definitions
  • High-level tasks: Generating and Parsing
  • Some uses for PCFGs
• CKY Algorithm: Parsing with a (P)CFG

SLIDE 10

Parts of Speech

Adapted from Luke Zettlemoyer

Open class words:
  • Nouns: milk, cat, cats, UMBC, Baltimore, bread
  • Verbs: run (intransitive), speak (transitive), give (ditransitive)
  • Adjectives: wettest, large, happy, red (subsective); would-be, fake (non-subsective); Kamp & Partee (1995)
  • Adverbs: recently, happily, then, there (location)

Closed class words:
  • Modals, auxiliaries: can, do, may
  • Numbers: one, 1,324
  • Pronouns: I, you
  • Determiners: a, the, every, what
  • Prepositions: in, under, top
  • Conjunctions: and, or, if, because
  • Particles: (set) up, (call) off, so (far), not

SLIDE 11

Constituency

spans of words that act (syntactically) as a group “X phrase” (noun phrase)

SLIDE 12

Constituency

spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be.

noun phrase (NP)

SLIDE 13

Constituency

spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be.

noun phrase (NP) noun phrase (NP)

SLIDE 14

Constituency

spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. Is this house a great place to be?

noun phrase (NP) noun phrase (NP)

SLIDE 15

Constituency

spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. *This is house a great place to be.

noun phrase (NP) noun phrase (NP)

SLIDE 16

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be.

S → NP V NP

SLIDE 17

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be.

S → NP V NP; NP → Det Noun; NP → Noun

SLIDE 18

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be.

S → NP V NP; NP → Det Noun; NP → Noun; NP → Det Adj Noun

SLIDE 19

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. The hill is a great place to be.

S → NP V NP; NP → Det Noun; NP → Noun; NP → Det Adj Noun

SLIDE 20

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. The hill is a great place to be.

S → NP V NP; NP → Det Noun; NP → Noun; NP → Det Adj Noun; NP → NP Prep NP

SLIDE 21

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. This red house near the hill is a great place to be. This red house atop the hill is a great place to be. The hill is a great place to be.

S → NP V NP; NP → Det Noun; NP → Noun; NP → Det Adj Noun; NP → NP PP; PP → P NP

SLIDE 22

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. This red house near the hill is a great place to be. This red house atop the hill is a great place to be. The hill is a great place to be.

S → NP V NP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun

SLIDE 23

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. This red house near the hill is a great place to be. This red house atop the hill is a great place to be. The hill is a great place to be.

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP

SLIDE 24

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. This red house near the hill is a great place to be. This red house atop the hill is a great place to be. The hill is a great place to be.

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP

SLIDE 25

Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group “X phrase” (noun phrase) Baltimore is a great place to be. This house is a great place to be. This red house is a great place to be. This red house on the hill is a great place to be. This red house near the hill is a great place to be. This red house atop the hill is a great place to be. The hill is a great place to be.

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore

SLIDE 26

Outline

• Recap: MT word alignment
• Structure in Language: Constituency
• (Probabilistic) Context Free Grammars
  • Definitions
  • High-level tasks: Generating and Parsing
  • Some uses for PCFGs
• CKY Algorithm: Parsing with a (P)CFG

SLIDE 27

Context Free Grammar

Set of rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore

SLIDE 28

Context Free Grammar

Set of rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

Applications: Learn more in CMSC 331, 431

Theory: Learn more in CMSC 451

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore

SLIDE 29

How Do We Robustly Handle Ambiguities?

SLIDE 30

How Do We Robustly Handle Ambiguities?

Add probabilities (to what?)

SLIDE 31

Probabilistic Context Free Grammar

Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

SLIDE 32

Probabilistic Context Free Grammar

Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

Q: What are the distributions? What must sum to 1?

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

SLIDE 33

Probabilistic Context Free Grammar

Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

1.0 S → NP VP
.4 NP → Det Noun
.3 NP → Noun
.2 NP → Det AdjP
.1 NP → NP PP
1.0 PP → P NP
.34 AdjP → Adj Noun
.26 VP → V NP
.0003 Noun → Baltimore
…

Q: What are the distributions? What must sum to 1?

A: P(X → Y Z | X): for each non-terminal X, the probabilities of all rules rewriting X must sum to 1.
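
To make the answer concrete, here is a small sketch that groups the slide's rules by left-hand side and checks which sum to 1. The rules hidden behind "…" are omitted here, so only S, NP, and PP pass the check.

from collections import defaultdict

rules = [
    (1.0, "S", ("NP", "VP")),
    (0.4, "NP", ("Det", "Noun")),
    (0.3, "NP", ("Noun",)),
    (0.2, "NP", ("Det", "AdjP")),
    (0.1, "NP", ("NP", "PP")),
    (1.0, "PP", ("P", "NP")),
    (0.34, "AdjP", ("Adj", "Noun")),
    (0.26, "VP", ("V", "NP")),
    (0.0003, "Noun", ("Baltimore",)),
]

totals = defaultdict(float)
for p, lhs, rhs in rules:
    totals[lhs] += p

for lhs, total in sorted(totals.items()):
    print(lhs, total, "OK" if abs(total - 1.0) < 1e-9 else "incomplete (rules elided)")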

SLIDE 34

Probabilistic Context Free Grammar

p( [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]] ) =

product of probabilities of individual rules used in the derivation

SLIDE 35

Probabilistic Context Free Grammar

p( [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]] ) =
p(S → NP VP) * …

product of probabilities of individual rules used in the derivation

SLIDE 36

Probabilistic Context Free Grammar

p( [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]] ) =
p(S → NP VP) * p(NP → Noun) * p(Noun → Baltimore) * …

product of probabilities of individual rules used in the derivation

SLIDE 37

Probabilistic Context Free Grammar

p( [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]] ) =
p(S → NP VP) * p(NP → Noun) * p(Noun → Baltimore) * p(VP → Verb NP) * p(Verb → is) * p(NP → a great city)

product of probabilities of individual rules used in the derivation

SLIDE 38

Log Probabilistic Context Free Grammar

lp( [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]] ) =
lp(S → NP VP) + lp(NP → Noun) + lp(Noun → Baltimore) + lp(VP → Verb NP) + lp(Verb → is) + lp(NP → a great city)

sum of log probabilities of individual rules used in the derivation
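
As a worked check, a minimal sketch computing this sum in Python. The S, NP, Noun, and VP values mirror slide 33 (reading VP → Verb NP as that slide's VP → V NP); the values for Verb → is and NP → a great city are made-up placeholders, since the slides do not give them.

import math

rule_logprob = {
    ("S", ("NP", "VP")): math.log(1.0),
    ("NP", ("Noun",)): math.log(0.3),
    ("Noun", ("Baltimore",)): math.log(0.0003),
    ("VP", ("Verb", "NP")): math.log(0.26),
    ("Verb", ("is",)): math.log(0.1),                 # assumed value
    ("NP", ("a", "great", "city")): math.log(0.001),  # assumed value
}

lp = sum(rule_logprob.values())   # sum of log probs of the rules used
print(lp, math.exp(lp))           # log probability and probability of the tree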

SLIDE 39

Estimating PCFGs

Attempt 1:

  • Get access to a treebank (a corpus of syntactically annotated sentences), e.g., the English Penn Treebank
  • Count productions (a count-and-normalize sketch follows below)
  • Smooth these counts
  • This gets ~75 F1
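
A minimal sketch of the count-and-normalize step, assuming trees are represented as nested tuples; the toy treebank and helper names are illustrative, and smoothing is omitted.

from collections import defaultdict

def productions(tree):
    # Yield (lhs, rhs) for every internal node of a nested-tuple tree.
    if isinstance(tree, str):      # a terminal word
        return
    lhs, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (lhs, rhs)
    for c in children:
        yield from productions(c)

def estimate_pcfg(treebank):
    rule_count = defaultdict(float)
    lhs_count = defaultdict(float)
    for tree in treebank:
        for lhs, rhs in productions(tree):
            rule_count[(lhs, rhs)] += 1
            lhs_count[lhs] += 1
    # relative-frequency estimate of P(lhs -> rhs | lhs)
    return {rule: c / lhs_count[rule[0]] for rule, c in rule_count.items()}

toy_treebank = [("S", ("NP", ("Noun", "Baltimore")),
                      ("VP", ("Verb", "is"), ("NP", ("Det", "a"), ("Noun", "city"))))]
print(estimate_pcfg(toy_treebank))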
SLIDE 40

Probabilistic Context Free Grammar (PCFG) Tasks

  • Find the most likely parse (for an observed sequence)
  • Calculate the (log) likelihood of an observed sequence w1, …, wN
  • Learn the grammar parameters

SLIDE 41

Outline

• Recap: MT word alignment
• Structure in Language: Constituency
• (Probabilistic) Context Free Grammars
  • Definitions
  • High-level tasks: Generating and Parsing
  • Some uses for PCFGs
• CKY Algorithm: Parsing with a (P)CFG

SLIDE 42

Context Free Grammar

1. Generate: iteratively create a string (or a tree derivation) using the rewrite rules
2. Parse: assign a tree (if possible) to an input string

SLIDE 43

Generate from a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore

Derivation so far: S

SLIDE 44

Generate from a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore

Derivation so far: S ⇒ NP VP

SLIDE 45

Generate from a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore

Derivation so far: S ⇒ NP VP ⇒ Noun VP

SLIDE 46

Generate from a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore

Derivation so far: S ⇒ NP VP ⇒ Noun VP ⇒ Baltimore VP

SLIDE 47

Generate from a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore

Derivation so far: S ⇒ NP VP ⇒ Noun VP ⇒ Baltimore VP ⇒ Baltimore V NP

SLIDE 48

Generate from a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

Derivation: S ⇒ NP VP ⇒ Noun VP ⇒ Baltimore VP ⇒ Baltimore V NP ⇒ … ⇒ Baltimore is a great city

[Parse tree: [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]]

SLIDE 49

Generate from a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

Derivation: S ⇒ NP VP ⇒ Noun VP ⇒ Baltimore VP ⇒ Baltimore V NP ⇒ … ⇒ Baltimore is a great city

[Parse tree: [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]]
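
A minimal sketch of this generation procedure in Python: start at S and recursively rewrite non-terminals until only words remain. The grammar fragment and the uniform random rule choice are illustrative assumptions.

import random

grammar = {
    "S": [["NP", "VP"]],
    "NP": [["Noun"], ["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
    "Noun": [["Baltimore"], ["city"]],
    "Det": [["a"]],
    "Verb": [["is"]],
}

def generate(symbol="S"):
    if symbol not in grammar:             # terminal: emit the word
        return [symbol]
    rhs = random.choice(grammar[symbol])  # pick one rewrite rule
    return [w for child in rhs for w in generate(child)]

print(" ".join(generate()))  # e.g., "Baltimore is a city"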

SLIDE 50

Assign Structure (Parse) with a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

Input: Baltimore is a great city

[Parse tree: [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]]

SLIDE 51

Assign Structure (Parse) with a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

[S [NP [Noun Baltimore] ] [VP [Verb is] [NP a great city]]]

bracket notation

SLIDE 52

Assign Structure (Parse) with a Context Free Grammar

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

(S (NP (Noun Baltimore)) (VP (V is) (NP a great city)))

S-expression

SLIDE 53

Some CFG Terminology: Derivation/Parse Tree

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

[Parse tree: [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]]

derivation, parse tree

SLIDE 54

Some CFG Terminology: Start Symbol

S → NP VP; NP → Det Noun; NP → Noun; NP → Det AdjP; NP → NP PP; PP → P NP; AdjP → Adj Noun; VP → V NP; Noun → Baltimore; …

[Parse tree: [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]]

start symbol: S

SLIDE 55

Some CFG Terminology: Rewrite Choices

S → NP VP
NP → Det Noun | Noun | Det AdjP | NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore | …

[Parse tree: [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]]

show choices with “|” (vertical bar)

SLIDE 56

Some CFG Terminology: Chomsky Normal Form (CNF)

non-terminal → non-terminal non-terminal
non-terminal → terminal

X → Y Z
X → a

Binary rules can only involve non-terminals; unary rules can only involve terminals.

Restricted: binary and unary rules only, no ternary rules (or above).
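
Grammars with longer rules can be converted to CNF by introducing intermediate non-terminals. A minimal sketch; the "@" naming convention is illustrative, and unary-chain removal is omitted.

def binarize(lhs, rhs):
    # Split an n-ary rule into a chain of binary rules.
    rules = []
    while len(rhs) > 2:
        new_nt = "@" + "-".join(rhs[1:])       # fresh intermediate symbol
        rules.append((lhs, (rhs[0], new_nt)))
        lhs, rhs = new_nt, rhs[1:]
    rules.append((lhs, tuple(rhs)))
    return rules

print(binarize("NP", ("Det", "Adj", "Noun")))
# [('NP', ('Det', '@Adj-Noun')), ('@Adj-Noun', ('Adj', 'Noun'))]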

SLIDE 57

Outline

• Recap: MT word alignment
• Structure in Language: Constituency
• (Probabilistic) Context Free Grammars
  • Definitions
  • High-level tasks: Generating and Parsing
  • Some uses for PCFGs
• CKY Algorithm: Parsing with a (P)CFG

SLIDE 58

What are some benefits to CFGs? Why should you care about syntax?

SLIDE 59

Some Uses of CFGs

  • Clearly disambiguate certain ambiguities
  • Morphological derivations
  • Identify “grammatical” sentences
  • …

SLIDE 60

Clearly Show Ambiguity

I ate the meal with friends

SLIDE 61

Clearly Show Ambiguity

I ate the meal with friends

SLIDE 62

Clearly Show Ambiguity

I ate the meal with friends
I ate the meal with salt

SLIDE 63

Clearly Show Ambiguity

I ate the meal with friends

[Parse tree: [S [NP I] [VP [VP ate [NP the meal]] [PP with friends]]]]

SLIDE 64

Clearly Show Ambiguity

I ate the meal with friends

[Two parse trees: the PP “with friends” attached to the VP ([VP [VP ate the meal] [PP with friends]]) vs. attached to the NP ([NP [NP the meal] [PP with friends]])]

SLIDE 65

Clearly Show Ambiguity

I ate the meal with friends

[Two parse trees: the PP “with friends” attached to the VP ([VP [VP ate the meal] [PP with friends]]) vs. attached to the NP ([NP [NP the meal] [PP with friends]])]

PP Attachment

(a common source of errors, even still today)

SLIDE 66

Clearly Show Ambiguity… But Not Necessarily All Ambiguity

I ate the meal with friends

[Parse tree: the PP attached to the VP]

I ate the meal with gusto.
I ate the meal with a fork.

SLIDE 67

Other Attachment Ambiguity

We invited the students, Chris and Pat.

SLIDE 68

Coordination Ambiguity

old men and women

SLIDE 69

Grammars Aren’t Just for Syntax

overgeneralization

general (A) + -ize → generalize (V)
generalize (V) + -ation → generalization (N)
over- + generalization (N) → overgeneralization (N)

SLIDE 70

Clearly Show Grammaticality (?)

The old man the boats

[Parse: S → NP VP, with [NP The old] [VP man the boats]]

SLIDE 71

Clearly Show Grammaticality (?)

The

  • ld

man the boats

S NP VP S NP NP

SLIDE 72

Clearly Show Grammaticality (?)

The old man the boats

[Two analyses: S → NP VP, with [NP The old] [VP man the boats], vs. S → NP NP, with [NP The old man] [NP the boats]]

Idea: define grammatical sentences as those that can be parsed by a grammar

SLIDE 73

Clearly Show Grammaticality (?)

The old man the boats

[Two analyses: S → NP VP, with [NP The old] [VP man the boats], vs. S → NP NP, with [NP The old man] [NP the boats]]

Idea: define grammatical sentences as those that can be parsed by a grammar
Issue 1: Which grammar?

SLIDE 74

Clearly Show Grammaticality (?)

The old man the boats

[Two analyses: S → NP VP, with [NP The old] [VP man the boats], vs. S → NP NP, with [NP The old man] [NP the boats]]

Idea: define grammatical sentences as those that can be parsed by a grammar
Issue 1: Which grammar?
Issue 2: Discourse demands flexibility
Q: What do you see?
A: [I see] The old man [and] the boats.

SLIDE 75

Outline

• Recap: MT word alignment
• Structure in Language: Constituency
• (Probabilistic) Context Free Grammars
  • Definitions
  • High-level tasks: Generating and Parsing
  • Some uses for PCFGs
• CKY Algorithm: Parsing with a (P)CFG

SLIDE 76

Parsing with a CFG

  • Top-down backtracking (brute force)
  • CKY Algorithm: dynamic programming, bottom-up
  • Earley’s Algorithm: dynamic programming, top-down (not covered due to time)

SLIDE 77

CKY Precondition

Grammar must be in Chomsky Normal Form (CNF):
non-terminal → non-terminal non-terminal
non-terminal → terminal

SLIDE 78

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a

Example from Jason Eisner

Entire grammar. Assume uniform weights.

SLIDE 79

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

SLIDE 80

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

Goal: (S, 0, 7)

SLIDE 81

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

Check 1: What are the non-terminals?

SLIDE 82

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

Check 1: What are the non-terminals?

S NP VP PP N V P Det

Check 2: What are the terminals?

SLIDE 83

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

Check 1: What are the non-terminals?

S NP VP PP N V P Det

Check 2: What are the terminals?

Papa caviar spoon ate with the a

Check 3: What are the pre-terminals?

SLIDE 84

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

Check 1: What are the non-terminals?

S NP VP PP N V P Det

Check 2: What are the terminals?

Papa caviar spoon ate with the a

Check 3: What are the pre-terminals?

N V P Det

Check 4: Is this in CNF?

SLIDE 85

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

Check 1: What are the non-terminals?

S NP VP PP N V P Det

Check 2: What are the terminals?

Papa caviar spoon ate with the a

Check 3: What are the pre-terminals?

N V P Det

Check 4: Is this in CNF?

Yes

SLIDE 86

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

SLIDE 87

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

SLIDE 88

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

SLIDE 89

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

SLIDE 90

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

SLIDE 91

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

SLIDE 92

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a
1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

SLIDE 93

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

[Chart cells for the final parse: (NP, 0, 1), (VP, 1, 7), (S, 0, 7)]

SLIDE 94

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

[Parse chart: rows = span start (0-6), columns = span end (1-7); cells for the final parse: (NP, 0, 1), (VP, 1, 7), (S, 0, 7)]

SLIDE 95

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

[Parse chart: NP entered at cell (0, 1); still to fill: (VP, 1, 7) and (S, 0, 7)]

SLIDE 96

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

[Parse chart: NP at (0, 1), VP at (1, 7); still to fill: (S, 0, 7)]

SLIDE 97

“Papa ate the caviar with a spoon”

S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP; NP → Papa; N → caviar; N → spoon; V → spoon; V → ate; P → with; Det → the; Det → a

1 2 3 4 5 6 7

Example from Jason Eisner

Entire grammar. Assume uniform weights.

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

[Parse chart: NP at (0, 1), VP at (1, 7), S at (0, 7): the parse is complete]

SLIDE 98

CKY Recognizer

Input: a string of N words, and a grammar in CNF
Output: True (with parse) / False
Data structure: an N*N table T
  Rows indicate span start (0 to N-1)
  Columns indicate span end (1 to N)
  T[i][j] lists constituents spanning i → j

SLIDE 99

CKY Recognizer

Input: a string of N words, and a grammar in CNF
Output: True (with parse) / False
Data structure: an N*N table T
  Rows indicate span start (0 to N-1)
  Columns indicate span end (1 to N)
  T[i][j] lists constituents spanning i → j

For Viterbi in HMMs: build the table left-to-right. For CKY on trees:
  • 1. build smallest-to-largest, and
  • 2. left-to-right
SLIDE 100

CKY Recognizer

T = Cell[N][N+1]

SLIDE 101

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}

SLIDE 102

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
}

SLIDE 103

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
  }
}

SLIDE 104

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
    }
  }
}

SLIDE 105

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (non-terminal Y : T[start][mid]) {
        for (non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}

[Diagram: X spans (start, end), built from Y over (start, mid) and Z over (mid, end)]

SLIDE 106

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (non-terminal Y : T[start][mid]) {
        for (non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}

Q: What do we return?

SLIDE 107

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (non-terminal Y : T[start][mid]) {
        for (non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}

Q: What do we return? A: S in T[0][N]

SLIDE 108

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (non-terminal Y : T[start][mid]) {
        for (non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}

Q: How do we get the parse?

SLIDE 109

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (non-terminal Y : T[start][mid]) {
        for (non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}

Q: How do we get the parse? A: Follow backpointers (stored where?)

SLIDE 110

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (non-terminal Y : T[start][mid]) {
        for (non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}

SLIDE 111

CKY Recognizer

T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (rule X → Y Z : G) {
        T[start][end].add(X if Y in T[start][mid] & Z in T[mid][end])
      }
    }
  }
}

SLIDE 112

CKY Recognizer

T = bool[K][N][N+1]
for (j = 1; j ≤ N; ++j) {
  for (non-terminal X in G if X → word_j) {
    T[X][j-1][j] = True
  }
}
for (width = 2; width ≤ N; ++width) {
  for (start = 0; start ≤ N - width; ++start) {
    end = start + width
    for (mid = start+1; mid < end; ++mid) {
      for (rule X → Y Z : G) {
        T[X][start][end] |= T[Y][start][mid] & T[Z][mid][end]
      }
    }
  }
}
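
For reference, a runnable Python sketch of this recognizer, using the "Papa ate the caviar with a spoon" grammar from the earlier slides; representing cells as sets keyed by spans is one possible realization of the pseudocode, not the slides' own code.

from collections import defaultdict

lexical = {("NP", "Papa"), ("N", "caviar"), ("N", "spoon"), ("V", "spoon"),
           ("V", "ate"), ("P", "with"), ("Det", "the"), ("Det", "a")}
binary = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
          ("VP", "V", "NP"), ("VP", "VP", "PP"), ("PP", "P", "NP")]

def cky_recognize(words, root="S"):
    n = len(words)
    T = defaultdict(set)                 # T[(i, j)] = non-terminals over span i..j
    for j in range(1, n + 1):            # width-1 spans: X -> word_j
        for X, w in lexical:
            if w == words[j - 1]:
                T[(j - 1, j)].add(X)
    for width in range(2, n + 1):        # build smallest-to-largest...
        for start in range(0, n - width + 1):   # ...and left-to-right
            end = start + width
            for mid in range(start + 1, end):
                for X, Y, Z in binary:   # rule X -> Y Z
                    if Y in T[(start, mid)] and Z in T[(mid, end)]:
                        T[(start, end)].add(X)
    return root in T[(0, n)]

print(cky_recognize("Papa ate the caviar with a spoon".split()))  # True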

SLIDE 113

Another PCFG Task: Likelihood of the Observed Words

p(S + w1 w2 w3 … wN) p(w1 w2 w3 … wN)

likelihood of word sequence w1w2…wN

p( )

S

w1 w2 w3 w4

p( )

S

w1 w2 w3 w4

p( )

S

w1 w2 w3 w4

likelihood of word sequence w1w2…wN based on starting at S

“syntactic language model”

SLIDE 114

CKY is Versatile: PCFG Tasks

Task | PCFG algorithm name | HMM analog
Find any parse | CKY recognizer | none
Find the most likely parse (for an observed sequence) | weighted CKY (Viterbi) | Viterbi
Calculate the (log) likelihood of an observed sequence w1, …, wN | Inside algorithm | Forward algorithm
Learn the grammar parameters | Inside-outside algorithm (EM) | Forward-backward / Baum-Welch (EM)

SLIDE 115

CKY Algorithms

Algorithm | Weights | ⊕ | ⊗ | ⓪ | ①
Recognizer | Boolean (True/False) | or | and | False | True
Viterbi | [0, 1] | max | * | 0 | 1
Inside | [0, 1] | + | * | 0 | 1

Outside? Not really (“Semiring Parsing,” Goodman, 1998). But there is a connection between inside-outside and backprop! (“Inside-Outside and Forward-Backward Algorithms are Just Backprop,” Eisner, 2016)

Adapted from Jason Eisner
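
The table's point can be made executable: recognizer, Viterbi, and inside CKY are the same loop with different (⊕, ⊗, ⓪, ①). A minimal sketch under assumed rule weights; only the semiring arguments change between calls.

from operator import add, mul

def cky_semiring(words, lexical, binary, plus, times, zero, one, root="S"):
    n = len(words)
    T = {}
    get = lambda X, i, j: T.get((X, i, j), zero)
    for j in range(1, n + 1):                      # width-1 spans
        for (X, w), wt in lexical.items():
            if w == words[j - 1]:
                T[(X, j - 1, j)] = plus(get(X, j - 1, j), times(one, wt))
    for width in range(2, n + 1):                  # smallest-to-largest
        for i in range(0, n - width + 1):
            k = i + width
            for mid in range(i + 1, k):
                for (X, Y, Z), wt in binary.items():   # rule X -> Y Z with weight wt
                    score = times(wt, times(get(Y, i, mid), get(Z, mid, k)))
                    T[(X, i, k)] = plus(get(X, i, k), score)
    return get(root, 0, n)

# Illustrative (assumed) weights, not the slides' numbers:
lex = {("NP", "Papa"): 1.0, ("V", "ate"): 0.5, ("Det", "the"): 0.5, ("N", "caviar"): 0.5}
rules = {("S", "NP", "VP"): 1.0, ("NP", "Det", "N"): 0.25, ("VP", "V", "NP"): 0.5}
sent = "Papa ate the caviar".split()
print(cky_semiring(sent, lex, rules, max, mul, 0.0, 1.0))  # Viterbi: best-parse score
print(cky_semiring(sent, lex, rules, add, mul, 0.0, 1.0))  # inside: total likelihood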

SLIDE 116

Outline

• Recap: MT word alignment
• Structure in Language: Constituency
• (Probabilistic) Context Free Grammars
  • Definitions
  • High-level tasks: Generating and Parsing
  • Some uses for PCFGs
• CKY Algorithm: Parsing with a (P)CFG