Probabilistic Context Free Grammars
CMSC 473/673 UMBC
Outline
Recap: MT word alignment Structure in Language: Constituency (Probabilistic) Context Free Grammars
Definitions High-level tasks: Generating and Parsing Some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Machine Translation as a Noisy Channel Model
(Diagram: the noisy-channel view of MT. A message "written in (clean) English" passes through a noisy channel and is observed as Russian (noisy) text; a translation/decode model decodes candidate English, which is reranked by a (clean) English language model. Example Russian word: язы́к "language".)
Slides courtesy Rebecca Knowles
Idea: Learn Word-to-Word Translation via Word Alignment
(Word-alignment visualization: each English word linked to its French counterpart.)
The cat is on the chair.
Le chat est sur la chaise.
Slides courtesy Rebecca Knowles
Assumption: Parallel Texts
Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world, Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people, Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law, Whereas it is essential to promote the development of friendly relations between nations, …
http://www.un.org/en/universal-declaration-human-rights/

(The same passage in Nahuatl:) Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj. Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli. Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan. Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan. …

http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nhn
Slides courtesy Rebecca Knowles
Alignments
If we had word-aligned text, we could easily estimate P(f|e). But we don’t usually have word alignments, and they are expensive to produce by hand… If we had P(f|e) we could produce alignments automatically.
Slides courtesy Rebecca Knowles
A joint model over translations and word alignments, but the alignments are unobserved.
IBM Model 1 (1993)
f: vector of French words
e: vector of English words
a: vector of alignment indices
(visualization of the alignment:)
Le chat est sur la chaise verte
The cat is on the green chair
a = (0, 1, 2, 3, 4, 6, 5)
Slides courtesy Rebecca Knowles
Lexical Translation Model + Word Alignment Model
For all IBM models, see the original paper (Brown et al., 1993): http://www.aclweb.org/anthology/J93-2003
t(f_j | e_i): the translation probability of the word f_j given the word e_i
Expectation Maximization (EM)
Two-step, iterative algorithm:
E-step: estimate alignment uncertainty (expected counts), assuming the current parameter values
M-step: re-estimate the parameters (and compute other parameter values), using these uncertain, estimated counts
e.g., P(le | "the cat") and P(chat | "the cat")
Slides courtesy Rebecca Knowles
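To make the two EM steps concrete, here is a minimal runnable sketch of EM for IBM Model 1 in Python. The toy corpus, the variable names, and the iteration count are illustrative assumptions, not from the slides.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (English, French); assumed data, not the slides'.
corpus = [
    (["the", "cat"], ["le", "chat"]),
    (["the", "dog"], ["le", "chien"]),
]

# Initialize t(f|e) uniformly over the French vocabulary.
f_vocab = {f for _, fs in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))  # t[(f, e)] plays the role of t(f_j | e_i)

for _ in range(20):                          # EM iterations
    count = defaultdict(float)               # expected count(f, e)
    total = defaultdict(float)               # expected count(e)
    # E-step: alignment uncertainty under the current parameters.
    for es, fs in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / norm         # P(f aligns to e | sentence pair)
                count[(f, e)] += p
                total[e] += p
    # M-step: re-estimate the parameters from the uncertain counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("chat", "cat")], 3))  # climbs toward 1.0 as EM iterates
```

Because "le"/"the" co-occur in both sentence pairs while "chat"/"cat" co-occur in only one, the expected counts gradually pull probability mass onto the correct word pairs.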
Outline
Recap: MT word alignment Structure in Language: Constituency (Probabilistic) Context Free Grammars
Definitions High-level tasks: Generating and Parsing Some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Parts of Speech
Adapted from Luke Zettlemoyer
Open class words:
Nouns: milk, cat, cats, UMBC, Baltimore, bread
Verbs: run (intransitive), speak (transitive), give (ditransitive)
Adjectives: wettest, large, happy, red (subsective); fake, would-be (non-subsective); see Kamp & Partee (1995)
Adverbs: recently, happily, then, there (location)

Closed class words:
Modals, auxiliaries: can, do, may
Pronouns: I, you
Numbers: 1,324
Determiners: a, the, every, what
Prepositions: in, under, atop
Conjunctions: and, if, because
Particles: (set) up, so (far), not (call)
Constituency
constituent: spans of words that act (syntactically) as a group; an "X phrase" (e.g., noun phrase)
Baltimore is a great place to be.
This house is a great place to be.
This red house is a great place to be.
This red house on the hill is a great place to be.
(the subject span in each sentence is a noun phrase (NP))
Is this house a great place to be? (the NP moves as a unit)
*This is house a great place to be. (splitting the NP up is ungrammatical)
Constituents Help Form Grammars
constituent: spans of words that act (syntactically) as a group; an "X phrase" (e.g., noun phrase)
Baltimore is a great place to be.
This house is a great place to be.
This red house is a great place to be.
This red house on the hill is a great place to be.
This red house near the hill is a great place to be.
This red house atop the hill is a great place to be.
The hill is a great place to be.

Building up the grammar one rule at a time:
S → NP V NP
NP → Det Noun
NP → Noun
NP → Det Adj Noun
NP → NP Prep NP, later generalized to: NP → NP PP, PP → P NP
NP → Det Adj Noun, later generalized to: NP → Det AdjP, AdjP → Adj Noun
S → NP V NP, later generalized to: S → NP VP, VP → V NP
Noun → Baltimore
Outline
Recap: MT word alignment Structure in Language: Constituency (Probabilistic) Context Free Grammars
Definitions High-level tasks: Generating and Parsing Some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Context Free Grammar
Set of rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
Applications: Learn more in CMSC 331, 431
Theory: Learn more in CMSC 451
How Do We Robustly Handle Ambiguities?
Add probabilities (to what?)
Probabilistic Context Free Grammar
Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals Terminals: the words in the language (the lexicon), e.g., Baltimore Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun (Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun
S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
…
Q: What are the distributions? What must sum to 1?
1.0   S → NP VP
.4    NP → Det Noun
.3    NP → Noun
.2    NP → Det AdjP
.1    NP → NP PP
1.0   PP → P NP
.34   AdjP → Adj Noun
.26   VP → V NP
.0003 Noun → Baltimore
…
A: P(X → Y Z | X): for each non-terminal X, the probabilities of all rules rewriting X must sum to 1.
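As a sanity check, here is a small Python sketch of that constraint, with the probabilities copied from the slide (only the non-terminals whose rules are fully shown):

```python
# Rule probabilities conditioned on the left-hand side: P(X -> rhs | X).
pcfg = {
    "S":  {("NP", "VP"): 1.0},
    "NP": {("Det", "Noun"): 0.4, ("Noun",): 0.3,
           ("Det", "AdjP"): 0.2, ("NP", "PP"): 0.1},
    "PP": {("P", "NP"): 1.0},
}
for lhs, rewrites in pcfg.items():
    # One probability distribution per non-terminal X.
    assert abs(sum(rewrites.values()) - 1.0) < 1e-9, lhs
```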
Probabilistic Context Free Grammar
Parse tree for "Baltimore is a great city":
[S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]
product of probabilities of individual rules used in the derivation
Probabilistic Context Free Grammar
p(tree) = p(S → NP VP)
          * p(NP → Noun) * p(Noun → Baltimore)
          * p(VP → Verb NP) * p(Verb → is) * p(NP → a great city)
product of probabilities of individual rules used in the derivation
Log Probabilistic Context Free Grammar
log p(tree) = log p(S → NP VP)
              + log p(NP → Noun) + log p(Noun → Baltimore)
              + log p(VP → Verb NP) + log p(Verb → is) + log p(NP → a great city)
sum of log probabilities of individual rules used in the derivation
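A small sketch of both scoring rules; the probabilities are illustrative except .0003 for Noun → Baltimore, which comes from the earlier slide. Multiplying probabilities and summing log probabilities rank trees identically, but log space avoids numerical underflow on large trees.

```python
import math

# (rule, probability) pairs for the derivation above; probabilities are
# made up except Noun -> Baltimore.
derivation = [
    ("S -> NP VP",        1.0),
    ("NP -> Noun",        0.3),
    ("Noun -> Baltimore", 0.0003),
    ("VP -> Verb NP",     0.26),
    # ... the remaining rules of the tree would be listed here
]
prob = math.prod(p for _, p in derivation)          # product of probabilities
logprob = sum(math.log(p) for _, p in derivation)   # sum of log probabilities
assert abs(math.log(prob) - logprob) < 1e-9
```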
Estimating PCFGs
Attempt 1: maximum likelihood (relative frequency) estimates from a treebank (a corpus of syntactically annotated sentences), e.g., the English Penn Treebank
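A sketch of those counts as maximum likelihood estimates: divide each rule's count by the count of its left-hand side. The four-rule "treebank" here is a hypothetical stand-in for rule uses read off real Penn Treebank trees.

```python
from collections import Counter

# Hypothetical (lhs, rhs) rule uses read off a tiny treebank.
rule_uses = [("S", ("NP", "VP")), ("NP", ("Noun",)),
             ("NP", ("Det", "Noun")), ("NP", ("Noun",))]

rule_counts = Counter(rule_uses)
lhs_counts = Counter(lhs for lhs, _ in rule_uses)
P = {(lhs, rhs): c / lhs_counts[lhs] for (lhs, rhs), c in rule_counts.items()}

print(P[("NP", ("Noun",))])  # count(NP -> Noun) / count(NP -> anything) = 2/3
```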
Probabilistic Context Free Grammar (PCFG) Tasks
Find the most likely parse (for an observed sequence)
Calculate the (log) likelihood of an observed sequence w1, …, wN
Learn the grammar parameters
Outline
Recap: MT word alignment Structure in Language: Constituency (Probabilistic) Context Free Grammars
Definitions High-level tasks: Generating and Parsing Some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Context Free Grammar
1. Generate: iteratively create a string (or tree/derivation) using the rewrite rules
2. Parse: assign a tree (if possible) to an input string
Generate from a Context Free Grammar
S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
…

One derivation, rewriting the leftmost non-terminal at each step:
S
⇒ NP VP
⇒ Noun VP
⇒ Baltimore VP
⇒ Baltimore Verb NP
⇒ Baltimore is a great city
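A minimal sketch of this generation procedure in Python: sample a rewrite for the leftmost non-terminal until only words remain. The grammar and its probabilities are a toy fragment, not the slides' full grammar.

```python
import random

pcfg = {  # non-terminal -> list of (right-hand side, probability)
    "S":    [(("NP", "VP"), 1.0)],
    "NP":   [(("Noun",), 0.6), (("Det", "Noun"), 0.4)],
    "VP":   [(("Verb", "NP"), 1.0)],
    "Noun": [(("Baltimore",), 0.5), (("city",), 0.5)],
    "Det":  [(("a",), 1.0)],
    "Verb": [(("is",), 1.0)],
}

def generate(symbol="S"):
    if symbol not in pcfg:                   # terminal: emit the word
        return [symbol]
    rhss, probs = zip(*pcfg[symbol])
    rhs = random.choices(rhss, probs)[0]     # sample one rewrite rule
    return [word for child in rhs for word in generate(child)]

print(" ".join(generate()))  # e.g. "Baltimore is a city"
```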
Assign Structure (Parse) with a Context Free Grammar
Input: Baltimore is a great city
[S [NP [Noun Baltimore] ] [VP [Verb is] [NP a great city]]]
bracket notation
Assign Structure (Parse) with a Context Free Grammar
(S (NP (Noun Baltimore)) (VP (V is) (NP a great city)))
S-expression
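A small sketch of how a parse might be stored and printed in S-expression form; the nested-tuple representation is my assumption, not the slides'.

```python
tree = ("S",
        ("NP", ("Noun", "Baltimore")),
        ("VP", ("V", "is"), ("NP", "a great city")))

def sexpr(t):
    if isinstance(t, str):        # a leaf: just the word(s)
        return t
    label, *children = t
    return "(" + label + " " + " ".join(sexpr(c) for c in children) + ")"

print(sexpr(tree))
# (S (NP (Noun Baltimore)) (VP (V is) (NP a great city)))
```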
Some CFG Terminology: Derivation/Parse Tree
derivation, parse tree: the tree of rule applications that rewrites S down to the words, e.g.
[S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]
Some CFG Terminology: Start Symbol
start symbol: the designated non-terminal that every derivation begins from (here, S)
Some CFG Terminology: Rewrite Choices
S → NP VP
NP → Det Noun | Noun | Det AdjP | NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore | …
show choices with "|" (vertical bar)
Some CFG Terminology: Chomsky Normal Form (CNF)
non-terminal → non-terminal non-terminal
non-terminal → terminal
X → Y Z        X → a
binary rules can only involve non-terminals; unary rules can only involve terminals
restricted to binary and unary rules only; no ternary rules (or above)
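A quick sketch of checking this condition, assuming rules are (lhs, rhs-tuple) pairs; the helper name and representation are mine, not the slides'.

```python
def is_cnf(rules, nonterminals):
    """True iff every rule is X -> Y Z (non-terminals) or X -> a (a terminal)."""
    for lhs, rhs in rules:
        binary = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        unary = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or unary):
            return False
    return True

rules = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("Papa",))]
print(is_cnf(rules, {"S", "NP", "VP", "PP", "N", "V", "P", "Det"}))  # True
```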
Outline
Recap: MT word alignment Structure in Language: Constituency (Probabilistic) Context Free Grammars
Definitions High-level tasks: Generating and Parsing Some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
What are some benefits to CFGs? Why should you care about syntax?
Some Uses of CFGs
Clearly disambiguate certain ambiguities
Morphological derivations
Identify "grammatical" sentences
…
Clearly Show Ambiguity
I ate the meal with friends
I ate the meal with salt
Two parses for the same string:
S → NP VP with VP → VP PP (the PP attaches to the verb phrase)
S → NP VP with NP → NP PP (the PP attaches to the noun phrase)
PP Attachment
(a common source of errors, even still today)
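In bracket notation, the two readings are [S [NP I] [VP [VP ate [NP the meal]] [PP with friends]]] (the PP modifies the eating: I ate in the company of friends) versus [S [NP I] [VP ate [NP [NP the meal] [PP with friends]]]] (the PP modifies the meal, the odd reading here).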
Clearly Show Ambiguity… But Not Necessarily All Ambiguity
I ate the meal with friends
I ate the meal with gusto
I ate the meal with a fork
(the same VP-attachment parse, S → NP VP with VP → VP PP, fits all three, yet the meanings differ)
Other Attachment Ambiguity
We invited the students, Chris and Pat.
Coordination Ambiguity
e.g., "men and women"
Grammars Aren’t Just for Syntax
general (A) → generalize (V) → generalization (N)
derivational morphology as rewrite rules, e.g.:
V → A -ize
N → V -ation
Clearly Show Grammaticality (?)
The old man the boats
S → NP VP ✓: [NP The old] [VP man the boats] ("man" used as a verb)
S → NP NP ✗: no such rule, so *[NP The old man] [NP the boats] fails
Idea: define grammatical sentences as those that can be parsed by a grammar
Issue 1: Which grammar?
Issue 2: Discourse demands flexibility
Q: What do you see? A: [I see] The old man [and] the boats.
Outline
Recap: MT word alignment Structure in Language: Constituency (Probabilistic) Context Free Grammars
Definitions High-level tasks: Generating and Parsing Some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Parsing with a CFG
Top-down backtracking (brute force)
CKY Algorithm: dynamic programming, bottom-up
Earley's Algorithm: dynamic programming, top-down (not covered due to time)
CKY Precondition
Grammar must be in Chomsky Normal Form (CNF):
non-terminal → non-terminal non-terminal
non-terminal → terminal

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP
NP → Papa
N → caviar
N → spoon
V → spoon
V → ate
P → with
Det → the
Det → a
Example from Jason Eisner
This is the entire grammar; assume uniform weights.
"Papa ate the caviar with a spoon"
(words numbered 1–7; span boundaries run 0–7)
Goal: (S, 0, 7)
Check 1: What are the non-terminals?
S, NP, VP, PP, N, V, P, Det
Check 2: What are the terminals?
Papa, caviar, spoon, ate, with, the, a
Check 3: What are the pre-terminals?
N, V, P, Det
Check 4: Is this in CNF?
Yes
First: Let's find all NPs
(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon
Second: Let's find all VPs
(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar
Third: Let's find all Ss
(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar
Entering these into the parse chart (rows = span start, 0–6; columns = span end, 1–7):
(NP, 0, 1) fills cell [0][1]
(VP, 1, 7) fills cell [1][7]
(S, 0, 7) fills cell [0][7]
CKY Recognizer
Input: a string of N words; a grammar in CNF
Output: True (with parse) / False
Data structure: N*N table T
rows indicate span start (0 to N-1)
columns indicate span end (1 to N)
T[i][j] lists the constituents spanning i → j
CKY Recognizer
For Viterbi in HMMs we built the table left-to-right; for CKY over trees, we build it bottom-up, from narrower spans to wider ones.
CKY Recognizer
T = Cell[N][N+1]
CKY Recognizer
T = Cell[N][N+1]
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
CKY Recognizer
T = Cell[N][N+1]
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for(width = 2; width ≤ N; ++width) {
}
CKY Recognizer
T = Cell[N][N+1]
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
  }
}
CKY Recognizer
T = Cell[N][N+1]
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
    for(mid = start+1; mid < end; ++mid) {
    }
  }
}
CKY Recognizer
T = Cell[N][N+1]
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
    for(mid = start+1; mid < end; ++mid) {
      for(non-terminal Y : T[start][mid]) {
        for(non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}
(the rule X → Y Z combines a Y spanning start–mid with a Z spanning mid–end)
CKY Recognizer
Q: What do we return? A: S in T[0][N]
Q: How do we get the parse? A: Follow backpointers (stored where?)
CKY Recognizer
T = Cell[N][N+1]
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
    for(mid = start+1; mid < end; ++mid) {
      for(rule X → Y Z : G) {
        T[start][end].add(X if Y in T[start][mid] & Z in T[mid][end])
      }
    }
  }
}
CKY Recognizer
T = bool[K][N][N+1]
for(j = 1; j ≤ N; ++j) {
  for(non-terminal X in G if X → word_j) {
    T[X][j-1][j] = True
  }
}
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
    for(mid = start+1; mid < end; ++mid) {
      for(rule X → Y Z : G) {
        T[X][start][end] |= T[Y][start][mid] & T[Z][mid][end]
      }
    }
  }
}
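For concreteness, here is a runnable Python version of the recognizer, using the "Papa ate the caviar with a spoon" grammar from earlier; a dictionary of sets stands in for the boolean array.

```python
from collections import defaultdict

lexical = {"Papa": {"NP"}, "caviar": {"N"}, "spoon": {"N", "V"},
           "ate": {"V"}, "with": {"P"}, "the": {"Det"}, "a": {"Det"}}
binary = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
          ("VP", "V", "NP"), ("VP", "VP", "PP"), ("PP", "P", "NP")]

def cky_recognize(words):
    n = len(words)
    T = defaultdict(set)                       # T[(start, end)] = labels over the span
    for j, w in enumerate(words, start=1):
        T[(j - 1, j)] = set(lexical.get(w, ()))   # width-1 spans from the lexicon
    for width in range(2, n + 1):              # widen spans bottom-up
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):  # split point
                for X, Y, Z in binary:
                    if Y in T[(start, mid)] and Z in T[(mid, end)]:
                        T[(start, end)].add(X)
    return "S" in T[(0, n)]

print(cky_recognize("Papa ate the caviar with a spoon".split()))  # True
```

Storing, for each added X, the (mid, Y, Z) that produced it gives the backpointers needed to recover an actual parse.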
Another PCFG Task: Likelihood of the Observed Words
p(w1 w2 w3 … wN): the likelihood of the word sequence w1 w2 … wN
= p(S ⇒* w1 w2 w3 … wN): the total probability, starting at S, of every parse tree whose leaves are w1 w2 … wN
a "syntactic language model"
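A hedged sketch of that computation, the inside algorithm (named in the table on the next slide): the CKY loops are unchanged, but cells accumulate summed probabilities instead of booleans. The toy PCFG and its probabilities are invented for illustration, with each left-hand side's probabilities summing to 1.

```python
from collections import defaultdict

lexical = {"Papa": [("NP", 0.5)], "ate": [("V", 1.0)],
           "the": [("Det", 1.0)], "caviar": [("N", 1.0)]}
binary = [("S", "NP", "VP", 1.0), ("NP", "Det", "N", 0.5),
          ("VP", "V", "NP", 1.0)]

def inside(words):
    n = len(words)
    beta = defaultdict(float)      # beta[(X, i, j)] = p(X derives words i..j)
    for j, w in enumerate(words, start=1):
        for X, p in lexical.get(w, ()):
            beta[(X, j - 1, j)] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for mid in range(i + 1, j):
                for X, Y, Z, p in binary:
                    # sum (not max) over all ways of building X over i..j
                    beta[(X, i, j)] += p * beta[(Y, i, mid)] * beta[(Z, mid, j)]
    return beta[("S", 0, n)]

print(inside("Papa ate the caviar".split()))  # 0.25 under this toy grammar
```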
CKY is Versatile: PCFG Tasks
Task → PCFG algorithm name (HMM analog)
Find any parse → CKY recognizer (none)
Find the most likely parse (for an observed sequence) → weighted CKY, Viterbi (Viterbi)
Calculate the (log) likelihood of an observed sequence → Inside algorithm (Forward algorithm)
Learn the grammar parameters → Inside-outside algorithm, EM (Forward-backward / Baum-Welch, EM)
CKY Algorithms
Algorithm | Weights | ⊕ | ⊗ | ⓪ | ①
Recognizer | Boolean (True/False) | or | and | False | True
Viterbi | [0,1] | max | * | 0 | 1
Inside | [0,1] | + | * | 0 | 1
Outside? Not really ("Semiring Parsing," Goodman, 1998). But there is a connection between inside-outside and backprop! ("Inside-Outside and Forward-Backward Algorithms are Just Backprop," Eisner, 2016)
Adapted from Jason Eisner
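Concretely, in the inside sketch a few slides back, only the accumulation encodes the semiring: keep ⊗ = * but replace the += (⊕ = +) with max and you get Viterbi CKY; run it on booleans with or/and and you get the recognizer. The chart loops themselves never change.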
Outline
Recap: MT word alignment Structure in Language: Constituency (Probabilistic) Context Free Grammars
Definitions High-level tasks: Generating and Parsing Some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG