
Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
Shay Cohen, Noah Smith
Carnegie Mellon University
July 14, 2010

Outline
- Hardness results for unsupervised learning of PCFGs
- Background and problem definition
- Main hardness result
- Extensions
- Open problems
- Conclusion


Viterbi EM
Let $p(x, z \mid \theta)$ be some parametrized statistical model. Viterbi EM identifies $\theta$ and $z$ given $x$.
Let $x_1, \ldots, x_n$ be the observed data.
Algorithm (Viterbi EM)
1. Start with some $\theta$.
2. E-step: set $z_i \leftarrow \arg\max_{z_i} p(x_i, z_i \mid \theta)$ for each $i$.
3. M-step: set $\theta \leftarrow \arg\max_{\theta} \prod_{i=1}^{n} p(x_i, z_i \mid \theta)$, where the product is the likelihood.
4. Go to step 2 unless converged.
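The four steps above can be sketched on a toy model. This is only an illustration under stated assumptions: the model is a two-coin mixture rather than a PCFG (each $x_i$ is a pair of head and tail counts, $z_i \in \{0, 1\}$ names the coin), and the helper names `xlogp`, `log_joint`, `m_step` and `viterbi_em` are hypothetical, not from the slides.

```python
import math

def xlogp(count, p):
    """count * log(p), with the convention 0 * log(0) = 0."""
    if count == 0:
        return 0.0
    return count * math.log(p) if p > 0 else -math.inf

def log_joint(x, z, theta):
    """log p(x, z | theta) for the toy two-coin mixture (an assumption, not a PCFG)."""
    heads, tails = x
    pi, q = theta  # pi[k]: prior of coin k, q[k]: its probability of heads
    return xlogp(1, pi[z]) + xlogp(heads, q[z]) + xlogp(tails, 1 - q[z])

def m_step(data, z):
    """Exact maximizer of the product over theta, given hard assignments z."""
    pi, q = [], []
    for k in (0, 1):
        members = [x for x, zi in zip(data, z) if zi == k]
        heads = sum(h for h, _ in members)
        flips = sum(h + t for h, t in members)
        pi.append(len(members) / len(data))
        q.append(heads / flips if flips else 0.5)  # any value is optimal if unused
    return pi, q

def viterbi_em(data, theta, max_iters=100):
    z, history = None, []
    for _ in range(max_iters):
        # E-step: best latent value for each observation under the current theta
        new_z = [max((0, 1), key=lambda k: log_joint(x, k, theta)) for x in data]
        if new_z == z:  # converged: assignments no longer change
            break
        z = new_z
        theta = m_step(data, z)  # M-step
        history.append(sum(log_joint(x, zi, theta) for x, zi in zip(data, z)))
    return theta, z, history

data = [(9, 1), (8, 2), (2, 8), (1, 9), (7, 3)]
theta, z, history = viterbi_em(data, ([0.5, 0.5], [0.6, 0.4]))
```

The result depends on the starting `theta`, which is the local-maximum behaviour the later slides are concerned with.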

Viterbi EM
A simple and useful algorithm. Recent examples include:
- Machine translation (Brown et al., 2003)
- Language acquisition (Goldwater and Johnson, 2005)
- Coreference resolution (Choi and Cardie, 2007)
- Question answering (Wang et al., 2007)
- Grammar induction (Spitkovsky et al., 2010)
We focus on Viterbi EM for PCFGs: $z_i$ is a parse tree, $x_i$ is a sentence, and $\theta$ are the rule probabilities.


Viterbi training
Viterbi EM is coordinate ascent, and it greedily tries to find:
$$\langle \theta, z_1, \ldots, z_n \rangle = \arg\max_{\theta, z_1, \ldots, z_n} \prod_{i=1}^{n} p(x_i, z_i \mid \theta)$$
We call this maximization problem “Viterbi training”. Viterbi EM finds a local maximum for Viterbi training.
Main question: can we hope to optimize this objective function and find the global maximum? Computational complexity answers this kind of question.
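The coordinate-ascent claim can be made explicit with two inequalities, one per step. The symbol $L$ for the objective and the iteration superscript $(t)$ are notation assumed here for illustration; they do not appear in the slides.

```latex
% Assumed notation: L is the Viterbi training objective, and z stands for
% (z_1, ..., z_n); the superscript (t) indexes iterations of Viterbi EM.
\[
L(\theta, z) = \prod_{i=1}^{n} p(x_i, z_i \mid \theta)
\]
% E-step: with \theta fixed, each nonnegative factor is maximized separately
% over z_i, so the product cannot decrease.
\[
L(\theta^{(t)}, z^{(t+1)}) \ge L(\theta^{(t)}, z^{(t)})
\]
% M-step: with z fixed, \theta is chosen to maximize the product.
\[
L(\theta^{(t+1)}, z^{(t+1)}) \ge L(\theta^{(t)}, z^{(t+1)})
\]
```

Chaining the two shows the objective never decreases from one iteration to the next, which guarantees only a local maximum, not a global one.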

Hardness of a problem
We usually show that a problem A is hard by showing that another hard problem B can be solved if we could solve A.
The type of problem we usually do this for is a “decision problem” (the answer is 0 or 1).
“Hardness” in this paper means NP hardness: an efficient algorithm for the problem would let us efficiently solve every problem in the class NP.
We efficiently (in polynomial time) convert every input $x$ of B to an input $x'$ of A such that $A(x') = 1 \iff B(x) = 1$.

Optimization problem → decision problem
Viterbi training optimizes an objective function. To convert it to a decision problem we define:
Problem (Viterbi Train)
Input: $G$, a context-free grammar; $x_1, \ldots, x_n$, sentences; $\alpha \in [0, 1]$.
Output: 1 if there are $\theta$ and derivation trees $z_1, \ldots, z_n$ such that
$$\prod_{i=1}^{n} p(x_i, z_i \mid \theta) \ge \alpha$$
and 0 otherwise.
Note that knowing how to optimize the likelihood means we can solve this decision problem.
Viterbi Train is in NP (witness: parse trees and parameters).

3-SAT
We show that Viterbi Train is NP hard by showing that there is a reduction from 3-SAT (an NP hard problem) to Viterbi Train.
Problem (3-SAT)
Input: A formula $\phi = \bigwedge_{i=1}^{m} (a_i \lor b_i \lor c_i)$ in conjunctive normal form, such that each clause has 3 literals.
Output: 1 if there is a satisfying assignment for $\phi$ and 0 otherwise.
For example, if we have the formula $\phi = (a \lor b \lor c) \land (\lnot a \lor b \lor c)$, then a satisfying assignment is $a = 0$, $b = 0$, $c = 1$.
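The 3-SAT problem statement can be checked on small inputs by brute force. The sketch below assumes a hypothetical encoding in which a literal is a `(variable, negated)` pair and a formula is a list of clauses; the exhaustive search takes exponential time and is only meant to make the definition concrete.

```python
from itertools import product

def satisfies(formula, assignment):
    """True if every clause has at least one literal made true by the assignment."""
    return all(
        any(assignment[var] != negated for var, negated in clause)
        for clause in formula
    )

def brute_force_sat(formula):
    """Return a satisfying assignment, or None if the formula is unsatisfiable."""
    variables = sorted({var for clause in formula for var, _ in clause})
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfies(formula, assignment):
            return assignment
    return None

# The example formula (a or b or c) and (not a or b or c)
phi = [
    [("a", False), ("b", False), ("c", False)],
    [("a", True), ("b", False), ("c", False)],
]
example = {"a": False, "b": False, "c": True}  # a = 0, b = 0, c = 1
found = brute_force_sat(phi)
```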

3-SAT and reductions
We map every instance of 3-SAT (a formula $\phi$) to a grammar $G$ and a string $x$ such that
$$\max_{z, \theta} p(x, z \mid \theta) = 1$$
if and only if there is a satisfying assignment for the formula.
The maximizing $z$ and $\theta$ will contain a description of the assignment.
Since 3-SAT is NP hard, Viterbi Train is NP hard.

The reduction (an example)
Let $\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
We create the following context-free grammar:
Terminal symbols: $\Sigma = \{0, 1\}$
Assignment rules, created for the variables $a, b, c, d$:
$V_a \to 0$, $V_a \to 1$, $V_{\lnot a} \to 0$, $V_{\lnot a} \to 1$
$V_b \to 0$, $V_b \to 1$, $V_{\lnot b} \to 0$, $V_{\lnot b} \to 1$
$V_c \to 0$, $V_c \to 1$, $V_{\lnot c} \to 0$, $V_{\lnot c} \to 1$
$V_d \to 0$, $V_d \to 1$, $V_{\lnot d} \to 0$, $V_{\lnot d} \to 1$

The reduction (an example)
$\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
We have so far: $V_{\bullet} \to 0 \mid 1$ and $V_{\lnot\bullet} \to 0 \mid 1$ (assignment rules).
Consistency rules, created for the variables $a, b, c, d$:
$U_{a,1} \to V_a\, V_{\lnot a}$, $U_{a,0} \to V_{\lnot a}\, V_a$
$U_{b,1} \to V_b\, V_{\lnot b}$, $U_{b,0} \to V_{\lnot b}\, V_b$
$U_{c,1} \to V_c\, V_{\lnot c}$, $U_{c,0} \to V_{\lnot c}\, V_c$
$U_{d,1} \to V_d\, V_{\lnot d}$, $U_{d,0} \to V_{\lnot d}\, V_d$

The reduction (an example)
$\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
We have so far: assignment rules, and $U_{\bullet,1} \to V_{\bullet}\, V_{\lnot\bullet}$ and $U_{\bullet,0} \to V_{\lnot\bullet}\, V_{\bullet}$ (consistency rules).
Clause rules, created for the clauses $C_1$, $C_2$ and $C_3$:
$S_1 \to C_1$
$S_2 \to S_1\, C_2$
$S_3 \to S_2\, C_3$
$S \to S_3$
$S$ is the start symbol of the grammar.

The reduction (an example)
$\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
We have so far: assignment rules, consistency rules and clause rules.
Satisfaction rules for $C_1$, as an example (one rule for each of the seven value combinations that satisfy the clause):
$C_1 \to U_{a,1}\, U_{b,1}\, U_{c,1}$
$C_1 \to U_{a,0}\, U_{b,1}\, U_{c,1}$
$C_1 \to U_{a,1}\, U_{b,0}\, U_{c,1}$
$C_1 \to U_{a,1}\, U_{b,1}\, U_{c,0}$
$C_1 \to U_{a,0}\, U_{b,0}\, U_{c,1}$
$C_1 \to U_{a,1}\, U_{b,0}\, U_{c,0}$
$C_1 \to U_{a,0}\, U_{b,0}\, U_{c,0}$

The reduction (an example)
$\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
We have so far: assignment rules, consistency rules, clause rules and satisfaction rules. That is the complete grammar!
We need to decide on the string to parse, $x$.
Set $x = 101010\ 101010\ 101010$, one block of 101010 for each of $C_1$, $C_2$, $C_3$.
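The four rule families and the string $x$ can be generated mechanically from a formula. The sketch below is an illustration, not the authors' code: the function name `build_reduction`, the string encoding of nonterminals (for instance `"U_a,1"` for $U_{a,1}$) and the `(variable, negated)` literal format are all assumptions, and it further assumes that each clause mentions three distinct variables and that the string for $m$ clauses is 101010 repeated $m$ times, following the pattern of the example.

```python
from itertools import product

def build_reduction(formula):
    """formula: list of clauses, each a list of three (variable, negated) literals.
    Returns (rules, x), where each rule is (left-hand side, tuple of right-hand symbols)."""
    variables = sorted({v for clause in formula for v, _ in clause})
    rules = []
    for v in variables:
        # Assignment rules: V_v -> 0 | 1 and V_not_v -> 0 | 1
        for nt in (f"V_{v}", f"V_not_{v}"):
            rules += [(nt, ("0",)), (nt, ("1",))]
        # Consistency rules
        rules.append((f"U_{v},1", (f"V_{v}", f"V_not_{v}")))
        rules.append((f"U_{v},0", (f"V_not_{v}", f"V_{v}")))
    for j, clause in enumerate(formula, start=1):
        # Clause rules: S_1 -> C_1, then S_j -> S_{j-1} C_j
        rhs = (f"C_{j}",) if j == 1 else (f"S_{j - 1}", f"C_{j}")
        rules.append((f"S_{j}", rhs))
        # Satisfaction rules: one per value combination that satisfies the clause
        for values in product((0, 1), repeat=3):
            if any(val == (0 if neg else 1) for (_, neg), val in zip(clause, values)):
                symbols = tuple(f"U_{v},{val}" for (v, _), val in zip(clause, values))
                rules.append((f"C_{j}", symbols))
    rules.append(("S", (f"S_{len(formula)}",)))  # start symbol
    return rules, "101010" * len(formula)

# The example formula from the slides
phi = [
    [("a", False), ("b", True), ("c", False)],   # C_1 = (a or not b or c)
    [("a", True), ("b", False), ("c", False)],   # C_2 = (not a or b or c)
    [("d", False), ("c", True), ("a", False)],   # C_3 = (d or not c or a)
]
rules, x = build_reduction(phi)
```

For this formula the only value combination left out of the $C_1$ rules is $a = 0$, $b = 1$, $c = 0$, the single combination that falsifies the clause.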

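The reduction (an example)
$\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
$x = 101010\ 101010\ 101010$, one block of 101010 for each of $C_1$, $C_2$, $C_3$.
We can use a parse for $x$ to extract an assignment for the variables.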

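Extracting an assignment
$\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
Example parse fragment for the last block of $x$:
$S_3$
    (rest of tree)
    $C_3$
        $U_{d,0}$: $V_{\lnot d} \to 1$, $V_d \to 0$
        $U_{c,0}$: $V_{\lnot c} \to 1$, $V_c \to 0$
        $U_{a,1}$: $V_a \to 1$, $V_{\lnot a} \to 0$
If we use the rule $V_a \to 0$, set the variable $a$ to 0.
If we use the rule $V_a \to 1$, set the variable $a$ to 1.
Same for the other variables.
Note that we use $V_a \to \bullet$ and $V_{\lnot a} \to \bullet$ together.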
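The extraction step can be sketched as a small function. It assumes, hypothetically, that a parse has already been summarized as a list of `(variable, terminal)` pairs, one for each use of a rule $V_{variable} \to terminal$; the function name and this input format are not from the slides.

```python
def extract_assignment(used_rules):
    """used_rules: (variable, terminal) pairs, one per use of V_variable -> terminal.
    Returns the assignment, or None if a variable receives two different values."""
    assignment = {}
    for variable, terminal in used_rules:
        if assignment.setdefault(variable, terminal) != terminal:
            return None  # both V_variable -> 0 and V_variable -> 1 were used
    return assignment

# Rules used under C_3 in the fragment: V_d -> 0, V_c -> 0, V_a -> 1
fragment = extract_assignment([("d", 0), ("c", 0), ("a", 1)])
conflict = extract_assignment([("a", 0), ("a", 1)])
```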


Consistent assignments
$\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
But! What if we use both $V_a \to 0$ and $V_a \to 1$?
Lemma: Let $\theta$ be weights for the grammar we constructed. If the (multiplicative) weight of the Viterbi parse of $x = 101010\ 101010\ 101010$ is 1, then the assignment extracted from the parse tree is consistent.
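One way to see why the lemma should hold is sketched below. It assumes that $\theta$ is a set of PCFG rule probabilities, so every $\theta_r$ lies in $[0, 1]$ and the probabilities of rules sharing a left-hand side sum to 1; the rule-count notation $c_r(z)$ is introduced here for illustration.

```latex
% Assumption: \theta_r \in [0, 1] for every rule r, and the weights of rules
% sharing a left-hand side sum to 1. c_r(z) counts the uses of r in the parse z.
\[
p(x, z \mid \theta) = \prod_{r} \theta_r^{\,c_r(z)} = 1
\iff \theta_r = 1 \text{ for every rule } r \text{ with } c_r(z) > 0
\]
% If a parse used both V_a -> 0 and V_a -> 1, its weight would contain the
% product of their probabilities, which is bounded away from 1:
\[
\theta_{V_a \to 0} + \theta_{V_a \to 1} = 1
\;\Longrightarrow\;
\theta_{V_a \to 0}\,\theta_{V_a \to 1} \le \tfrac{1}{4} < 1
\]
```

So a parse of weight 1 can use at most one of the two rules for each variable, which is exactly consistency of the extracted assignment.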

Finding a satisfying assignment
$\phi = (a \lor \lnot b \lor c) \land (\lnot a \lor b \lor c) \land (d \lor \lnot c \lor a)$, with clauses $C_1$, $C_2$, $C_3$ in that order.
Lemma: There exists $\theta$ such that the weight of the Viterbi parse of $x = 101010\ 101010\ 101010$ is 1 if and only if $\phi$ is satisfiable. The satisfying assignment is the one extracted from the parse tree with weight 1.

NP hardness result
Problem (Viterbi Train)
Input: $G$, a context-free grammar; $x_1, \ldots, x_n$, sentences; $\alpha \in [0, 1]$.
Output: 1 if there are $\theta$ and derivation trees $z_1, \ldots, z_n$ such that
$$\prod_{i=1}^{n} p(x_i, z_i \mid \theta) \ge \alpha$$
and 0 otherwise.
Corollary: Viterbi Train is NP hard.
In fact, we have NP completeness (Viterbi Train is in NP).

Approximate solutions
Reminder: Viterbi Train tries to maximize
$$\max_{\theta, z_1, \ldots, z_n} \prod_{i=1}^{n} p(x_i, z_i \mid \theta)$$
We know it is hard to find the exact maximum. Can we hope to approximate the maximal solution?

Approximate solutions
The question we ask is: “Is there a $\rho \in (0, 1]$ such that there is an efficient algorithm which returns $z'_1, \ldots, z'_n$ and $\theta'$ such that
$$\prod_{i=1}^{n} p(x_i, z'_i \mid \theta') \ge \rho \max_{\theta, z_1, \ldots, z_n} \prod_{i=1}^{n} p(x_i, z_i \mid \theta)$$
for any input sentences $x_1, \ldots, x_n$ and a grammar $G$?”
