SLIDE 1
Probabilistic Context-Free Grammars
Based on Foundations of Statistical NLP by C. Manning & H. Schütze, ch. 11, MIT Press, 2002
SLIDE 2
A Sample PCFG
S  → NP VP     1.0        NP → NP PP        0.4
PP → P NP      1.0        NP → astronomers  0.1
VP → V NP      0.7        NP → ears         0.18
VP → VP PP     0.3        NP → saw          0.04
P  → with      1.0        NP → stars        0.18
V  → saw       1.0        NP → telescopes   0.1
SLIDE 3
The Chomsky Normal Form of CFGs
CNF CFG: every non-terminal expands into either exactly two non-terminals (N → X Y) or a single terminal (N → w).
Proposition: Any CFG can be converted into a “weakly equivalent” CNF CFG.
Definition: Two grammars are weakly equivalent if they generate the same language. They are strongly equivalent if they also assign the same structures to strings.
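For example (an illustrative conversion, not one given on the slides): a ternary rule such as VP → V NP PP can be binarized by introducing a fresh non-terminal X, replacing it with VP → V X and X → NP PP; repeating this for every long right-hand side yields a weakly equivalent CNF grammar.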
SLIDE 4
Cocke-Younger-Kasami (CYK) Parsing Algorithm
- Works on CNF CFGs
- First, add the lexical edges
- Then:
for w = 2 to N                  % scan left to right,
                                % combining edges to form edges of width w
    for i = 0 to N − w
        for k = 0 to w − 2      % split point is i + k + 1
            if (A → B C and B → α ∈ chart[i, i + k + 1] and C → β ∈ chart[i + k + 1, i + w])
                add A → B C to chart[i, i + w]
- Finally, if S ∈ chart[0, N], return the corresponding parse (see the sketch below)
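As a concrete illustration, here is a minimal Python sketch of the recognizer; the dictionary-based grammar encoding and the toy rules in the usage lines are assumptions made for the example, not something given on the slides.

# Minimal CYK recognizer for a CNF grammar (illustrative sketch).
# binary_rules: maps (B, C) -> set of parents A with a rule A -> B C
# lexical_rules: maps word w -> set of parents A with a rule A -> w
def cyk_recognize(words, binary_rules, lexical_rules, start="S"):
    n = len(words)
    # chart[i][j] holds the non-terminals that span words[i:j] (0-based boundaries)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                      # lexical edges
        chart[i][i + 1] = set(lexical_rules.get(w, ()))
    for width in range(2, n + 1):                      # widths 2..n
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):                  # split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= binary_rules.get((B, C), set())
    return start in chart[0][n]

# Usage with a hypothetical toy grammar:
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
lexical = {"astronomers": {"NP"}, "saw": {"V"}, "stars": {"NP"}}
print(cyk_recognize("astronomers saw stars".split(), binary, lexical))   # True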
SLIDE 5
Example: CYK with Chart Representation
[Chart diagram for “astronomers saw stars with ears”: lexical edges NP, V, NP, P, NP over the five words, then wider edges (PP, NP, VP, S) built by combining them, ending with S spanning the whole sentence.]
SLIDE 6
Chart Representation as a Matrix
[The same chart drawn as a matrix over word positions 1-5 for “astronomers saw stars with ears”: each cell lists the non-terminals (NP, V, P, PP, VP, S) found for the corresponding span, with S for the full sentence.]
SLIDE 7
Assumptions of the PCFG Model
- ∀i: Σ_j P(Ni → νj | Ni) = 1
- Place invariance:
the probability of a subtree does not depend on where in the string the words it dominates are
- Context-free:
the probability of a subtree does not depend on words not dominated by the subtree
- Ancestor-free:
the probability of a subtree does not depend on nodes outside of the subtree
SLIDE 8
Calculating the Probability of a Sentence
So, the probability of a sentence is
P(w1m) = Σ_t P(w1m, t) = Σ_{t: yield(t) = w1m} P(t)
where t is a parse tree of the sentence. To calculate the probability of a tree, multiply the probabilities of all the rules it uses.
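For example, with the sample PCFG from Slide 2 and the sentence “astronomers saw stars with ears”, the parse in which the PP attaches to the object NP uses the rules S → NP VP, NP → astronomers, VP → V NP, V → saw, NP → NP PP, NP → stars, PP → P NP, P → with, NP → ears, so
P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072.
The other parse, with the PP attached to the VP, has P(t2) = 0.0006804, so P(astronomers saw stars with ears) = 0.0009072 + 0.0006804 = 0.0015876.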
SLIDE 9
Inside and Outside Probabilities
Outside (α): the total probability of beginning in N1 and generating Nj_pq and the words outside positions p and q.
Inside (β): the total probability of generating the words from p to q, given that we start at non-terminal Nj.
[Diagram: the root N1 spans w1 … wm; Nj spans wp … wq; β covers the words inside that span, α the words outside it.]
αj(p, q) = P(w1(p−1), Nj_pq, w(q+1)m)
βj(p, q) = P(wpq | Nj_pq)
SLIDE 10
Computing Inside Probabilities
Base case: βj(k, k) = P(Nj → wk|Nj)
[Diagram: Nj_pq expands as Nr_pd Ns_(d+1)q, covering wp … wd and wd+1 … wq.]
Induction step:
βj(p, q) = P(wpq | Nj_pq) = Σ_{r,s} Σ_{d=p..q−1} PG(Nj → Nr Ns | Nj) βr(p, d) βs(d + 1, q)
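As an illustration, here is a minimal Python sketch of this inside computation; the dictionary-based grammar encoding (binary_probs, lexical_probs) is an assumption made for the example, not a representation defined on the slides.

from collections import defaultdict

def inside_probs(words, binary_probs, lexical_probs):
    # binary_probs: (A, B, C) -> P(A -> B C | A);  lexical_probs: (A, w) -> P(A -> w | A)
    m = len(words)
    beta = defaultdict(float)                    # beta[(A, p, q)], 1-based inclusive spans
    for k, w in enumerate(words, start=1):       # base case: beta_j(k, k) = P(Nj -> w_k)
        for (A, word), prob in lexical_probs.items():
            if word == w:
                beta[(A, k, k)] = prob
    for width in range(2, m + 1):                # induction: build wider spans from narrower ones
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (A, B, C), prob in binary_probs.items():
                for d in range(p, q):            # split point
                    beta[(A, p, q)] += prob * beta[(B, p, d)] * beta[(C, d + 1, q)]
    return beta

# The sample PCFG from Slide 2, in the assumed encoding:
binary = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0, ("VP", "V", "NP"): 0.7,
          ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4}
lexical = {("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18, ("NP", "saw"): 0.04,
           ("NP", "stars"): 0.18, ("NP", "telescopes"): 0.1, ("P", "with"): 1.0, ("V", "saw"): 1.0}
beta = inside_probs("astronomers saw stars with ears".split(), binary, lexical)
print(beta[("S", 1, 5)])                         # 0.0015876, matching the chart on Slide 12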
SLIDE 11
Computing Inside Probabilities — Induction
βj(p, q) = P(wpq | Nj_pq)
= Σ_{r,s} Σ_{d=p..q−1} P(wpd, Nr_pd, w(d+1)q, Ns_(d+1)q | Nj_pq)
= Σ_{r,s} Σ_{d=p..q−1} P(Nr_pd, Ns_(d+1)q | Nj_pq) P(wpd | Nj_pq, Nr_pd, Ns_(d+1)q) P(w(d+1)q | Nj_pq, Nr_pd, Ns_(d+1)q, wpd)
= Σ_{r,s} Σ_{d=p..q−1} P(Nr_pd, Ns_(d+1)q | Nj_pq) P(wpd | Nr_pd) P(w(d+1)q | Ns_(d+1)q)
= Σ_{r,s} Σ_{d=p..q−1} PG(Nj → Nr Ns | Nj) βr(p, d) βs(d + 1, q)
SLIDE 12
Computing Inside Probabilities
q =      1: astronomers   2: saw                 3: stars       4: with     5: ears
p = 1    βNP = 0.1                               βS = 0.0126                βS = 0.0015876
p = 2                     βNP = 0.04, βV = 1.0   βVP = 0.126                βVP = 0.015876
p = 3                                            βNP = 0.18                 βNP = 0.01296
p = 4                                                           βP = 1.0    βPP = 0.18
p = 5                                                                       βNP = 0.18
(cell (p, q) holds the inside probabilities βj(p, q) for the span wp … wq; empty cells are 0)
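As a check on two of these cells, using the rule probabilities from Slide 2:
βVP(2, 5) = P(VP → V NP) βV(2, 2) βNP(3, 5) + P(VP → VP PP) βVP(2, 3) βPP(4, 5) = 0.7 · 1.0 · 0.01296 + 0.3 · 0.126 · 0.18 = 0.015876
βS(1, 5) = P(S → NP VP) βNP(1, 1) βVP(2, 5) = 1.0 · 0.1 · 0.015876 = 0.0015876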
SLIDE 13
Computing Outside Probabilities
Base case: α1(1, m) = 1, and αj(1, m) = 0 for j ≠ 1
[Diagrams of the two cases in the induction: Nj_pq as the left child of a parent Nf spanning (p, e) with right sibling Ng spanning (q+1, e), and Nj_pq as the right child of a parent Nf spanning (e, q) with left sibling Ng spanning (e, p−1).]
Induction step:
αj(p, q) = Σ_{f,g} Σ_{e=q+1..m} αf(p, e) PG(Nf → Nj Ng | Nf) βg(q + 1, e)
         + Σ_{f,g} Σ_{e=1..p−1} αf(e, q) PG(Nf → Ng Nj | Nf) βg(e, p − 1)
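A matching Python sketch of the outside computation, reusing inside_probs and the assumed grammar encoding from the earlier sketch:

from collections import defaultdict

def outside_probs(words, binary_probs, beta, start="S"):
    # beta: inside probabilities from inside_probs(); returns alpha[(A, p, q)]
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start, 1, m)] = 1.0                   # base case: alpha_1(1, m) = 1
    for width in range(m - 1, 0, -1):            # parents are wider spans, so work downwards in width
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (A, B, C), prob in binary_probs.items():
                for e in range(q + 1, m + 1):    # Nj = B: parent A spans (p, e), sibling C spans (q+1, e)
                    alpha[(B, p, q)] += alpha[(A, p, e)] * prob * beta[(C, q + 1, e)]
                for e in range(1, p):            # Nj = C: parent A spans (e, q), sibling B spans (e, p-1)
                    alpha[(C, p, q)] += alpha[(A, e, q)] * prob * beta[(B, e, p - 1)]
    return alpha

For the example sentence this gives αNP(1, 1) = 0.015876, and αNP(1, 1) · βNP(1, 1) = 0.0015876, the sentence probability, since an NP spanning word 1 occurs in every parse; this product is exactly what Slide 16 exploits.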
SLIDE 14
Computing Outside Probabilities — Induction
αj(p, q)
= Σ_{f,g} Σ_{e=q+1..m} P(w1(p−1), w(q+1)m, Nf_pe, Nj_pq, Ng_(q+1)e)
  + Σ_{f,g} Σ_{e=1..p−1} P(w1(p−1), w(q+1)m, Nf_eq, Ng_e(p−1), Nj_pq)
= Σ_{f,g} Σ_{e=q+1..m} P(w1(p−1), w(e+1)m, Nf_pe) P(Nj_pq, Ng_(q+1)e | Nf_pe) P(w(q+1)e | Ng_(q+1)e)
  + Σ_{f,g} Σ_{e=1..p−1} P(w1(e−1), w(q+1)m, Nf_eq) P(Ng_e(p−1), Nj_pq | Nf_eq) P(we(p−1) | Ng_e(p−1))
= Σ_{f,g} Σ_{e=q+1..m} αf(p, e) PG(Nf → Nj Ng | Nf) βg(q + 1, e)
  + Σ_{f,g} Σ_{e=1..p−1} αf(e, q) PG(Nf → Ng Nj | Nf) βg(e, p − 1)
SLIDE 15
Finding the Most Likely Parse: The Viterbi Algorithm
Base case: δi(p, p) = P(Ni → wp | Ni)
Induction step:
δi(p, q) = max_{1≤j,k≤n; p≤r<q} PG(Ni → Nj Nk | Ni) δj(p, r) δk(r + 1, q)
ψi(p, q) = argmax_{(j,k,r)} PG(Ni → Nj Nk | Ni) δj(p, r) δk(r + 1, q)
Termination: PG(t̂) = δ1(1, m)
Path readout (by backtracing): if a node X̂ = Ni_pq is in the Viterbi parse and ψi(p, q) = (j, k, r), then left(X̂) = Nj_pr and right(X̂) = Nk_(r+1)q.
(N1_1m is the root node of the Viterbi parse.)
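A Python sketch of this Viterbi search over the same assumed grammar encoding (illustrative, not the slides' own code):

from collections import defaultdict

def viterbi_parse(words, binary_probs, lexical_probs, start="S"):
    # Returns (probability of the best parse, best parse as a nested tuple), or (0.0, None).
    m = len(words)
    delta = defaultdict(float)   # delta[(A, p, q)]: best probability of A spanning p..q
    psi = {}                     # psi[(A, p, q)]: (B, C, r) back-pointer for that cell
    for k, w in enumerate(words, start=1):                 # base case
        for (A, word), prob in lexical_probs.items():
            if word == w:
                delta[(A, k, k)] = prob
    for width in range(2, m + 1):                          # induction
        for p in range(1, m - width + 2):
            q = p + width - 1
            for (A, B, C), prob in binary_probs.items():
                for r in range(p, q):
                    score = prob * delta[(B, p, r)] * delta[(C, r + 1, q)]
                    if score > delta[(A, p, q)]:
                        delta[(A, p, q)] = score
                        psi[(A, p, q)] = (B, C, r)
    def backtrace(A, p, q):                                # path readout
        if p == q:
            return (A, words[p - 1])
        B, C, r = psi[(A, p, q)]
        return (A, backtrace(B, p, r), backtrace(C, r + 1, q))
    best = delta[(start, 1, m)]
    return (best, backtrace(start, 1, m)) if best > 0 else (0.0, None)

With the Slide 2 grammar, viterbi_parse("astronomers saw stars with ears".split(), binary, lexical) returns probability 0.0009072 and the parse in which the PP attaches to the object NP.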
SLIDE 16
Learning PCFGs: The Inside-Outside (EM) Algorithm
Combining inside and outside probabilities:
αj(p, q) βj(p, q) = PG(N1 ⇒∗ w1m, Nj ⇒∗ wpq) = PG(N1 ⇒∗ w1m) PG(Nj ⇒∗ wpq | N1 ⇒∗ w1m)
Denoting π = PG(N1 ⇒∗ w1m), it follows that
PG(Nj ⇒∗ wpq | N1 ⇒∗ w1m) = (1/π) αj(p, q) βj(p, q)
PG(Nj → Nr Ns ⇒∗ wpq | N1 ⇒∗ w1m) = (1/π) Σ_{d=p..q−1} αj(p, q) PG(Nj → Nr Ns | Nj) βr(p, d) βs(d + 1, q)
PG(Nj → wk | N1 ⇒∗ w1m, wk = wh) = (1/π) αj(h, h) P(wk = wh) βj(h, h)
SLIDE 17
The Inside-Outside Algorithm: E-step
Assume that we have a set of sentences W = {W1, …, Wω}.
fi(p, q, j, r, s) = (1/πi) Σ_{d=p..q−1} αj,i(p, q) PG(Nj → Nr Ns | Nj) βr,i(p, d) βs,i(d + 1, q)
gi(h, j, k) = (1/πi) αj,i(h, h) P(wk = wh) βj,i(h, h)
hi(p, q, j) = (1/πi) αj,i(p, q) βj,i(p, q),   with πi = PG(N1 ⇒∗ Wi)

P̂G(Nj → Nr Ns) = Σ_{i=1..ω} Σ_{p=1..mi−1} Σ_{q=p+1..mi} fi(p, q, j, r, s)
P̂G(Nj → wk) = Σ_{i=1..ω} Σ_{h=1..mi} gi(h, j, k)
P̂G(Nj) = Σ_{i=1..ω} Σ_{p=1..mi} Σ_{q=p..mi} hi(p, q, j)
SLIDE 18
The Inside-Outside Algorithm: M-step
PG′(Nj → Nr Ns | Nj) = P̂G(Nj → Nr Ns) / P̂G(Nj)
  = [ Σ_{i=1..ω} Σ_{p=1..mi−1} Σ_{q=p+1..mi} fi(p, q, j, r, s) ] / [ Σ_{i=1..ω} Σ_{p=1..mi} Σ_{q=p..mi} hi(p, q, j) ]

PG′(Nj → wk | Nj) = P̂G(Nj → wk) / P̂G(Nj)
  = [ Σ_{i=1..ω} Σ_{h=1..mi} gi(h, j, k) ] / [ Σ_{i=1..ω} Σ_{p=1..mi} Σ_{q=p..mi} hi(p, q, j) ]
P(W|G′) ≥ P(W|G) (Baum-Welch)
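Putting the two steps together, here is a minimal Python sketch of one re-estimation pass, built on the inside_probs and outside_probs sketches above (the function names and grammar encoding remain assumptions made for illustration):

from collections import defaultdict

def inside_outside_step(corpus, binary_probs, lexical_probs, start="S"):
    # One EM pass over a list of tokenized sentences; returns re-estimated rule probabilities.
    nonterminals = {A for (A, _, _) in binary_probs} | {A for (A, _) in lexical_probs}
    num_bin = defaultdict(float)    # expected binary-rule counts (the f_i totals)
    num_lex = defaultdict(float)    # expected lexical-rule counts (the g_i totals)
    denom = defaultdict(float)      # expected non-terminal counts (the h_i totals)
    for words in corpus:
        m = len(words)
        beta = inside_probs(words, binary_probs, lexical_probs)
        alpha = outside_probs(words, binary_probs, beta, start)
        pi = beta[(start, 1, m)]
        if pi == 0.0:
            continue                # skip sentences the current grammar cannot generate
        for p in range(1, m + 1):
            for q in range(p, m + 1):
                for A in nonterminals:                        # h_i(p, q, j)
                    denom[A] += alpha[(A, p, q)] * beta[(A, p, q)] / pi
                for (A, B, C), prob in binary_probs.items():  # f_i(p, q, j, r, s)
                    for d in range(p, q):
                        num_bin[(A, B, C)] += (alpha[(A, p, q)] * prob *
                                               beta[(B, p, d)] * beta[(C, d + 1, q)]) / pi
        for h, w in enumerate(words, start=1):                # g_i(h, j, k)
            for (A, word), prob in lexical_probs.items():
                if word == w:
                    num_lex[(A, word)] += alpha[(A, h, h)] * prob / pi
    # M-step: renormalize the expected counts into new rule probabilities
    new_binary = {r: (num_bin[r] / denom[r[0]] if denom[r[0]] > 0 else p0)
                  for r, p0 in binary_probs.items()}
    new_lexical = {r: (num_lex[r] / denom[r[0]] if denom[r[0]] > 0 else p0)
                   for r, p0 in lexical_probs.items()}
    return new_binary, new_lexical

Iterating this pass until the corpus likelihood stops improving gives the EM training loop; as noted above, each pass satisfies P(W | G′) ≥ P(W | G).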
SLIDE 19
Problems with the Inside-Outside Algorithm
- 1. It is much slower than linear models like HMMs:
For each sentence of length m, the training is O(m³n³), where n is the number of nonterminals in G.
- 2. The algorithm is very sensitive to the initialization:
[Charniak, 1993] reports finding a different local maximum in each of 300 trials of a PCFG on artificial data! Proposed solutions: [Lari & Young, 1990]
- 3. Experiments suggest that satisfactory PCFG learning requires many more nonterminals (about 3 times as many) than are theoretically needed to describe the language.
SLIDE 20
“Problems” with the Learned PCFGs (continued)
- 4. There is no guarantee that the learned nonterminals will bear any resemblance to the linguistically motivated nonterminals we would use to write the grammar by hand...
- 5. Even if the grammar is initialized with such nonterminals, the training process may completely change the meaning of those nonterminals.
- 6. Thus, while grammar induction from unannotated corpora