{Probabilistic|Stochastic} Context-Free Grammars (PCFGs)
Example: "The velocity of the seismic waves rises to . . ."
[Parse tree: S → NP_sg VP_sg; NP_sg → DT NN PP ("The velocity" plus PP → IN NP_pl, "of the seismic waves"); VP_sg → "rises to . . ."]

PCFGs
A PCFG G consists of:
A set of terminals, {w^k}, k = 1, . . . , V
A set of nonterminals, {N^i}, i = 1, . . . , n
A designated start symbol, N^1
A set of rules, {N^i → ζ^j} (where ζ^j is a sequence of terminals and nonterminals)
A corresponding set of probabilities on rules such that, for all i:
∑_j P(N^i → ζ^j) = 1
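To make the definition concrete, here is a minimal Python sketch (not from the slides) of a PCFG stored as a dictionary, with a check of the constraint that the rule probabilities for each nonterminal sum to one. The probabilities are those of the "astronomers saw stars with ears" example used later; the NP → telescopes rule is an assumption added so that the NP probabilities sum to one.

    # A PCFG as a dictionary: left-hand side -> list of (right-hand side, probability).
    # Probabilities follow the "astronomers saw stars with ears" example used later;
    # the NP -> telescopes rule is assumed here so that NP's probabilities sum to one.
    grammar = {
        "S":  [(("NP", "VP"), 1.0)],
        "PP": [(("P", "NP"), 1.0)],
        "VP": [(("V", "NP"), 0.7), (("VP", "PP"), 0.3)],
        "NP": [(("NP", "PP"), 0.4), (("astronomers",), 0.1), (("ears",), 0.18),
               (("saw",), 0.04), (("stars",), 0.18), (("telescopes",), 0.1)],
        "P":  [(("with",), 1.0)],
        "V":  [(("saw",), 1.0)],
    }

    # The defining constraint: for every nonterminal N^i, sum_j P(N^i -> zeta_j) = 1.
    for lhs, rules in grammar.items():
        total = sum(prob for _, prob in rules)
        assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}, not 1"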
Notation:
N^i_ab: nonterminal N^i dominates w_a · · · w_b
N^j ⇒* ζ: repeated derivation from N^j gives ζ
Two parses of "astronomers saw stars with ears" (each node is annotated with the probability of the rule expanding it):

t1: [S 1.0 [NP 0.1 astronomers] [VP 0.7 [V 1.0 saw] [NP 0.4 [NP 0.18 stars] [PP 1.0 [P 1.0 with] [NP 0.18 ears]]]]]

t2: [S 1.0 [NP 0.1 astronomers] [VP 0.3 [VP 0.7 [V 1.0 saw] [NP 0.18 stars]] [PP 1.0 [P 1.0 with] [NP 0.18 ears]]]]
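Multiplying the rule probabilities annotated at the nodes of each tree gives the tree probabilities, and their sum is the probability of the sentence:

P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
P(w_15) = P(t1) + P(t2) = 0.0015876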
The model makes three independence assumptions:
Place invariance: P(N^j_k(k+c) → ζ) is the same for all k
Context-free: P(N^j_kl → ζ | words outside w_k . . . w_l) = P(N^j_kl → ζ)
Ancestor-free: P(N^j_kl → ζ | ancestor nodes of N^j_kl) = P(N^j_kl → ζ)
Let the upper left index in ^iN^j be an arbitrary identifying index for a particular token of a nonterminal. Then, for the tree [^1S [^2NP the man] [^3VP snores]]:

P(tree) = P(^1S_13 → ^2NP_12 ^3VP_33, ^2NP_12 → the_1 man_2, ^3VP_33 → snores_3)
        = . . .
        = P(S → NP VP) P(NP → the man) P(VP → snores)
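A minimal Python sketch of this fact (the bracketed tree encoding and the rule probabilities below are hypothetical, chosen only for illustration): the probability of a tree is the product of the probabilities of the rules it uses.

    def tree_prob(tree, rule_prob):
        """Probability of a parse tree = product over the rules it uses.

        A tree is (label, child, child, ...) for internal nodes and a plain
        string for words; rule_prob maps (lhs, rhs-tuple) -> probability.
        """
        if isinstance(tree, str):          # a word contributes no rule of its own
            return 1.0
        lhs, children = tree[0], tree[1:]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        p = rule_prob[(lhs, rhs)]
        for child in children:
            p *= tree_prob(child, rule_prob)
        return p

    # The "the man snores" tree from the slide, with hypothetical rule
    # probabilities (the slide leaves them unspecified):
    rules = {("S", ("NP", "VP")): 1.0,
             ("NP", ("the", "man")): 0.01,
             ("VP", ("snores",)): 0.05}
    t = ("S", ("NP", "the", "man"), ("VP", "snores"))
    print(tree_prob(t, rules))   # 1.0 * 0.01 * 0.05 = 0.0005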
Some features of PCFGs:
Partial solution for grammar ambiguity: a PCFG gives some idea of the plausibility of different parses.
But not a very good idea, as not lexicalized.
Better for grammar induction (Gold 1967).
Gives a probabilistic language model for English.
In practice, a PCFG is a worse language model for English than an n-gram model.
Can hope to combine the strengths of a PCFG and a trigram model.
A PCFG encodes certain biases, e.g., that smaller trees are normally more probable.
Improper (inconsistent) distributions. Consider the grammar:
S → rhubarb   P = 1/3
S → S S       P = 2/3
Then:
P(rhubarb) = 1/3
P(rhubarb rhubarb) = 2/3 × 1/3 × 1/3 = 2/27
P(rhubarb rhubarb rhubarb) = 2 × (2/3)² × (1/3)³ = 8/243 (summing over the two possible trees)
P(L) = 1/3 + 2/27 + 8/243 + . . . = 1/2
This is an improper (inconsistent) distribution: the probabilities of all finite strings sum to less than one.
Not a problem if you estimate from a parsed treebank (Chi and Geman 1998).
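To see where the 1/2 comes from: the total probability p that S yields a finite string satisfies p = 1/3 + (2/3)p², whose smallest solution is p = 1/2, so half of the probability mass leaks into infinite derivations. A quick numeric sketch in Python:

    # Fixed point for the total mass of finite strings under
    # S -> rhubarb (1/3) | S S (2/3): p = 1/3 + (2/3) * p**2.
    # Iterating from 0 converges to the least solution, 1/2.
    p = 0.0
    for _ in range(200):
        p = 1/3 + (2/3) * p * p
    print(p)   # ~0.5: the grammar is improper (inconsistent)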
Questions for PCFGs:
What is the probability of a sentence according to a grammar: P(w_1m|G)?
What is the most likely parse for a sentence: arg max_t P(t|w_1m, G)?
Learning algorithm: find G such that P(w_1m|G) is maximized.
Inside and outside probabilities:
Outside probability: α_j(p, q) = P(w_1(p−1), N^j_pq, w_(q+1)m | G)
Inside probability: β_j(p, q) = P(w_pq | N^j_pq, G)
Probability of a string in terms of inside probabilities: P(w_1m|G) = P(w_1m | N^1_1m, G) = β_1(1, m)
Base case: β_j(k, k) = P(w_k | N^j_kk, G) = P(N^j → w_k)
Induction: We want to find β_j(p, q) for p < q. Since this is the inductive step for a Chomsky Normal Form grammar, the first rule must be of the form N^j → N^r N^s, so we can proceed by dividing the string in two at each possible point d and summing the results. [Configuration: N^j expands into N^r, dominating w_p . . . w_d, and N^s, dominating w_(d+1) . . . w_q.] These inside probabilities can be calculated bottom up.
For all j, summing over all nonterminals r, s and split points d:

β_j(p, q) = P(w_pq | N^j_pq, G)
  = ∑_{r,s} ∑_{d=p}^{q−1} P(w_pd, N^r_pd, w_(d+1)q, N^s_(d+1)q | N^j_pq, G)
  = ∑_{r,s} ∑_{d=p}^{q−1} P(N^r_pd, N^s_(d+1)q | N^j_pq, G) P(w_pd | N^j_pq, N^r_pd, N^s_(d+1)q, G) P(w_(d+1)q | N^j_pq, N^r_pd, N^s_(d+1)q, w_pd, G)
  = ∑_{r,s} ∑_{d=p}^{q−1} P(N^r_pd, N^s_(d+1)q | N^j_pq, G) P(w_pd | N^r_pd, G) P(w_(d+1)q | N^s_(d+1)q, G)
  = ∑_{r,s} ∑_{d=p}^{q−1} P(N^j → N^r N^s) β_r(p, d) β_s(d + 1, q)
Inside probabilities for "astronomers saw stars with ears" (the chart lists the non-zero β_j(p, q), with p the start and q the end position):

Row p = 1: β_NP(1,1) = 0.1;  β_S(1,3) = 0.0126;  β_S(1,5) = 0.0015876
Row p = 2: β_NP(2,2) = 0.04, β_V(2,2) = 1.0;  β_VP(2,3) = 0.126;  β_VP(2,5) = 0.015876
Row p = 3: β_NP(3,3) = 0.18;  β_NP(3,5) = 0.01296
Row p = 4: β_P(4,4) = 1.0;  β_PP(4,5) = 0.18
Row p = 5: β_NP(5,5) = 0.18
(words: astronomers_1 saw_2 stars_3 with_4 ears_5)
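A Python sketch of the bottom-up inside computation that reproduces the chart above (the grammar is the one from the earlier sketch, split into binary and lexical rules as Chomsky Normal Form requires; β_S(1, 5) = 0.0015876 as in the table):

    from collections import defaultdict

    # Binary rules P(N^j -> N^r N^s) and lexical rules P(N^j -> w).
    binary = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0,
              ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
              ("NP", "NP", "PP"): 0.4}
    lexical = {("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18,
               ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
               ("NP", "telescopes"): 0.1,          # assumed, as before
               ("P", "with"): 1.0, ("V", "saw"): 1.0}

    def inside(words):
        """beta[(j, p, q)] = P(w_p ... w_q | N^j_pq, G); positions are 1-based."""
        m = len(words)
        beta = defaultdict(float)
        # Base case: beta_j(k, k) = P(N^j -> w_k)
        for k, w in enumerate(words, start=1):
            for (j, word), prob in lexical.items():
                if word == w:
                    beta[(j, k, k)] += prob
        # Induction: for wider spans, sum over rules N^j -> N^r N^s and splits d
        for span in range(2, m + 1):
            for p in range(1, m - span + 2):
                q = p + span - 1
                for (j, r, s), prob in binary.items():
                    for d in range(p, q):
                        beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
        return beta

    sentence = "astronomers saw stars with ears".split()
    beta = inside(sentence)
    print(beta[("S", 1, 5)])    # 0.0015876, matching beta_S(1, 5) in the chart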
Outside probabilities
Probability of a string: for any k, 1 ≤ k ≤ m,
P(w_1m|G) = ∑_j P(w_1(k−1), w_k, w_(k+1)m, N^j_kk | G)
          = ∑_j P(w_1(k−1), N^j_kk, w_(k+1)m | G) × P(w_k | w_1(k−1), N^j_kk, w_(k+1)m, G)
          = ∑_j α_j(k, k) P(N^j → w_k)
Inductive (DP) calculation: one calculates the outside probabilities top down (after determining the inside probabilities).
Base case: α_1(1, m) = 1; α_j(1, m) = 0 for j ≠ 1.
Inductive case: a node N^j_pq is either the left or the right branch of its parent node; we sum over both possibilities, calculating with outside and inside probabilities.
[Left-branch configuration: under the root N^1, a parent N^f_pe has left child N^j_pq dominating w_p · · · w_q and right child N^g_(q+1)e dominating w_(q+1) · · · w_e, with w_1 · · · w_(p−1) and w_(e+1) · · · w_m outside.]
[Right-branch configuration: under the root N^1, a parent N^f_eq has left child N^g_e(p−1) dominating w_e · · · w_(p−1) and right child N^j_pq dominating w_p · · · w_q, with w_1 · · · w_(e−1) and w_(q+1) · · · w_m outside.]
α_j(p, q) = [∑_{f,g} ∑_{e=q+1}^{m} P(w_1(p−1), w_(q+1)m, N^f_pe, N^j_pq, N^g_(q+1)e)]
          + [∑_{f,g} ∑_{e=1}^{p−1} P(w_1(p−1), w_(q+1)m, N^f_eq, N^g_e(p−1), N^j_pq)]
          = [∑_{f,g} ∑_{e=q+1}^{m} P(w_1(p−1), w_(e+1)m, N^f_pe) P(N^j_pq, N^g_(q+1)e | N^f_pe) × P(w_(q+1)e | N^g_(q+1)e)]
          + [∑_{f,g} ∑_{e=1}^{p−1} P(w_1(e−1), w_(q+1)m, N^f_eq) × P(N^g_e(p−1), N^j_pq | N^f_eq) P(w_e(p−1) | N^g_e(p−1))]
          = [∑_{f,g} ∑_{e=q+1}^{m} α_f(p, e) P(N^f → N^j N^g) β_g(q + 1, e)]
          + [∑_{f,g} ∑_{e=1}^{p−1} α_f(e, q) P(N^f → N^g N^j) β_g(e, p − 1)]
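Continuing the sketch, the outside probabilities can be computed top down by transcribing the recursion above (this reuses binary, lexical, beta, sentence, and defaultdict from the inside-probability sketch; variable names are mine, not the slides'):

    def outside(words, beta):
        """alpha[(j, p, q)] = P(w_1(p-1), N^j_pq, w_(q+1)m | G); positions 1-based."""
        m = len(words)
        nonterminals = {j for (j, _, _) in binary} | {j for (j, _) in lexical}
        alpha = defaultdict(float)
        alpha[("S", 1, m)] = 1.0          # base case: alpha_1(1, m) = 1, 0 for j != 1
        for span in range(m - 1, 0, -1):  # top down: wider spans before narrower ones
            for p in range(1, m - span + 2):
                q = p + span - 1
                for j in nonterminals:
                    total = 0.0
                    # N^j_pq is the left child of some rule N^f -> N^j N^g
                    for (f, left, g), prob in binary.items():
                        if left == j:
                            for e in range(q + 1, m + 1):
                                total += alpha[(f, p, e)] * prob * beta[(g, q + 1, e)]
                    # N^j_pq is the right child of some rule N^f -> N^g N^j
                    for (f, g, right), prob in binary.items():
                        if right == j:
                            for e in range(1, p):
                                total += alpha[(f, e, q)] * prob * beta[(g, e, p - 1)]
                    alpha[(j, p, q)] = total
        return alpha

    alpha = outside(sentence, beta)
    # Sanity check against the "probability of a string" slide:
    # for any k, sum_j alpha_j(k, k) * P(N^j -> w_k) = P(w_1m | G).
    k = 3                                  # the word "stars"
    nonterms = {j for (j, _, _) in binary} | {j for (j, _) in lexical}
    print(sum(alpha[(j, k, k)] * lexical.get((j, sentence[k - 1]), 0.0)
              for j in nonterms))          # 0.0015876 again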
Combining inside and outside probabilities (the probability of the whole string together with a node N^j spanning positions p through q):
α_j(p, q) β_j(p, q) = P(w_1(p−1), N^j_pq, w_(q+1)m | G) P(w_pq | N^j_pq, G) = P(w_1m, N^j_pq | G)
Training a PCFG
We construct an EM training algorithm, as for HMMs. We would like to calculate how often each rule is used:
P̂(N^j → ζ) = C(N^j → ζ) / ∑_γ C(N^j → γ)
Have data ⇒ count; else work iteratively from expectations of the current model. Consider:
α_j(p, q) β_j(p, q) = P(N^1 ⇒* w_1m, N^j ⇒* w_pq | G)
                    = P(N^1 ⇒* w_1m | G) P(N^j ⇒* w_pq | N^1 ⇒* w_1m, G)

We have already solved how to calculate P(N^1 ⇒* w_1m); let us call this probability π. Then:

P(N^j ⇒* w_pq | N^1 ⇒* w_1m, G) = α_j(p, q) β_j(p, q) / π

and

E(N^j is used in the derivation) = ∑_{p=1}^{m} ∑_{q=p}^{m} α_j(p, q) β_j(p, q) / π
In the case where we are not dealing with a preterminal, we substitute the inductive definition of β; for all r, s and p < q:

P(N^j → N^r N^s ⇒* w_pq | N^1 ⇒* w_1m, G) = ∑_{d=p}^{q−1} α_j(p, q) P(N^j → N^r N^s) β_r(p, d) β_s(d + 1, q) / π

Therefore the expectation is:

E(N^j → N^r N^s, N^j used) = ∑_{p=1}^{m−1} ∑_{q=p+1}^{m} ∑_{d=p}^{q−1} α_j(p, q) P(N^j → N^r N^s) β_r(p, d) β_s(d + 1, q) / π

Now for the maximization step, we want:

P(N^j → N^r N^s) = E(N^j → N^r N^s, N^j used) / E(N^j used)
Therefore, the reestimation formula P̂(N^j → N^r N^s) is the quotient:

P̂(N^j → N^r N^s) = [∑_{p=1}^{m−1} ∑_{q=p+1}^{m} ∑_{d=p}^{q−1} α_j(p, q) P(N^j → N^r N^s) β_r(p, d) β_s(d + 1, q)] / [∑_{p=1}^{m} ∑_{q=p}^{m} α_j(p, q) β_j(p, q)]

Similarly,

E(N^j → w^k | N^1 ⇒* w_1m, G) = ∑_{h=1}^{m} α_j(h, h) P(N^j → w_h, w_h = w^k) / π

Therefore,

P̂(N^j → w^k) = [∑_{h=1}^{m} α_j(h, h) P(N^j → w_h, w_h = w^k)] / [∑_{p=1}^{m} ∑_{q=p}^{m} α_j(p, q) β_j(p, q)]

Inside-Outside algorithm: repeat this process until the estimated probability change is small.
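A sketch of one reestimation step for the binary rules, transcribing the expectation formulas above directly (again reusing inside, outside, binary, sentence, and defaultdict from the earlier sketches; the lexical rules would be reestimated analogously):

    def expected_counts(words):
        """E-step for one sentence: expected uses of each binary rule and of each LHS."""
        m = len(words)
        beta = inside(words)
        alpha = outside(words, beta)
        pi = beta[("S", 1, m)]                      # P(w_1m | G)
        num = defaultdict(float)                    # E(N^j -> N^r N^s used)
        den = defaultdict(float)                    # E(N^j used)
        for (j, r, s), prob in binary.items():
            for p in range(1, m):
                for q in range(p + 1, m + 1):
                    for d in range(p, q):
                        num[(j, r, s)] += (alpha[(j, p, q)] * prob *
                                           beta[(r, p, d)] * beta[(s, d + 1, q)]) / pi
        for j in {lhs for (lhs, _, _) in binary}:
            for p in range(1, m + 1):
                for q in range(p, m + 1):
                    den[j] += alpha[(j, p, q)] * beta[(j, p, q)] / pi
        return num, den

    # M-step for a single sentence: P-hat(N^j -> N^r N^s) = num / den
    num, den = expected_counts(sentence)
    new_binary = {rule: num[rule] / den[rule[0]]
                  for rule in binary if den[rule[0]] > 0}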
Multiple training instances: if we have training sentences W = (W_1, . . . , W_ω), with W_i = (w_1, . . . , w_{m_i}), and we let u and v be the common subterms from before:

u_i(p, q, j, r, s) = [∑_{d=p}^{q−1} α_j(p, q) P(N^j → N^r N^s) β_r(p, d) β_s(d + 1, q)] / P(N^1 ⇒* W_i | G)

v_i(p, q, j) = α_j(p, q) β_j(p, q) / P(N^1 ⇒* W_i | G)

Assuming the observations are independent, we can sum contributions:

P̂(N^j → N^r N^s) = [∑_{i=1}^{ω} ∑_{p=1}^{m_i−1} ∑_{q=p+1}^{m_i} u_i(p, q, j, r, s)] / [∑_{i=1}^{ω} ∑_{p=1}^{m_i} ∑_{q=p}^{m_i} v_i(p, q, j)]

and

P̂(N^j → w^k) = [∑_{i=1}^{ω} ∑_{h=1}^{m_i} α_j(h, h) P(N^j → w_h, w_h = w^k) / P(N^1 ⇒* W_i | G)] / [∑_{i=1}^{ω} ∑_{p=1}^{m_i} ∑_{q=p}^{m_i} v_i(p, q, j)]
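Continuing the sketch for several sentences: accumulate the per-sentence expected counts (the u_i and v_i sums above) and normalize once at the end. The two-sentence corpus below is hypothetical.

    # Hypothetical training corpus; both sentences are generable by the toy grammar.
    corpus = [s.split() for s in ("astronomers saw stars with ears",
                                  "astronomers saw ears")]

    total_num, total_den = defaultdict(float), defaultdict(float)
    for words in corpus:
        num, den = expected_counts(words)           # per-sentence E-step
        for rule, count in num.items():
            total_num[rule] += count                # sum_i sum_{p,q} u_i(p, q, j, r, s)
        for j, count in den.items():
            total_den[j] += count                   # sum_i sum_{p,q} v_i(p, q, j)

    new_binary = {rule: total_num[rule] / total_den[rule[0]]
                  for rule in binary if total_den[rule[0]] > 0}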
Problems with the Inside-Outside algorithm
Extremely slow: each iteration is O(m³n³), where m = ∑_{i=1}^{ω} m_i and n is the number of nonterminals in the grammar.
Local maxima are a serious problem. Possible remedies: simulated annealing? Restrict rules by initializing some parameters to zero? Or HMM initialization? Reallocate nonterminals away from "greedy" terminals?
Many more nonterminals seem to be needed than are theoretically necessary to get good grammar learning (about a threefold increase?). This compounds the first problem.
There is no guarantee that the nonterminals that the algorithm learns will have any satisfactory resemblance to the kinds of nonterminals normally motivated in linguistic analysis (NP, VP, etc.).