Parameter Estimation and Lexicalization for PCFGs

Informatics 2A: Lecture 20
Mirella Lapata
School of Informatics, University of Edinburgh
04 November 2011

Reading: J&M 2nd edition, ch. 14.2–14.6.1; NLTK Book, Chapter 8, final section on Weighted Grammar

Outline

1 Standard PCFGs
    Parameter Estimation
    Problem 1: Assuming Independence
    Problem 2: Ignoring Lexical Information
2 Lexicalized PCFGs
    Lexicalization
    Head Lexicalization
    The Collins Parser

Parameter Estimation

In a PCFG every rule is associated with a probability. But where do these rule probabilities come from? Use a large parsed corpus such as the Penn Treebank:

    ( (S (NP-SBJ (DT That) (JJ cold) (, ,)
                 (JJ empty) (NN sky) )
         (VP (VBD was)
             (ADJP-PRD (JJ full)
                       (PP (IN of)
                           (NP (NN fire)
                               (CC and)
                               (NN light) ))))
         (. .) ))

Obtain grammar rules by reading them off the trees, e.g.:

    S → NP-SBJ VP
    VP → VBD ADJP-PRD
    PP → IN NP
    NP → NN CC NN

The probability of a rule is the number of times LHS → RHS occurs in the corpus divided by the number of times LHS occurs:

$$P(\alpha \to \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \to \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)} = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}$$
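Reading rules off a treebank and taking relative frequencies can be sketched in plain Python. This is a minimal sketch assuming a single tree in the bracketed Penn Treebank format shown above; the helper names (`tokenize`, `parse_tree`, `read_rules`, `estimate`) are hypothetical and are not NLTK's API. Reading the tree literally yields S → NP-SBJ VP . (the sentence-final period is a child of S), and lexical rules such as JJ → cold are counted as well.

```python
from collections import Counter
from fractions import Fraction

# One tree in the bracketed Penn Treebank format shown above.
TREE = """( (S (NP-SBJ (DT That) (JJ cold) (, ,) (JJ empty) (NN sky) )
  (VP (VBD was)
      (ADJP-PRD (JJ full)
                (PP (IN of)
                    (NP (NN fire) (CC and) (NN light) ))))
  (. .) ))"""


def tokenize(text):
    """Split a bracketed tree into '(', ')' and label/word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_tree(tokens):
    """Build a nested (label, children) tuple; words are plain strings."""
    opening = tokens.pop(0)
    if opening != "(":
        raise ValueError(f"expected '(' but found {opening!r}")
    # Treebank files wrap each tree in an unlabelled bracket: ( (S ...) )
    label = "ROOT" if tokens[0] == "(" else tokens.pop(0)
    children = []
    while tokens[0] != ")":
        children.append(parse_tree(tokens) if tokens[0] == "(" else tokens.pop(0))
    tokens.pop(0)  # consume the closing ')'
    return (label, children)


def read_rules(tree, out=None):
    """Collect one (LHS, RHS) rule per node, lexical rules included."""
    if out is None:
        out = []
    label, children = tree
    if label != "ROOT":  # the wrapper bracket is not a grammar rule
        rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
        out.append((label, rhs))
    for child in children:
        if isinstance(child, tuple):
            read_rules(child, out)
    return out


def estimate(rules):
    """Relative frequency: Count(alpha -> beta) / sum over gamma of Count(alpha -> gamma)."""
    rule_counts = Counter(rules)
    lhs_counts = Counter(lhs for lhs, _ in rules)
    return {r: Fraction(n, lhs_counts[r[0]]) for r, n in rule_counts.items()}


probs = estimate(read_rules(parse_tree(tokenize(TREE))))

if __name__ == "__main__":
    for (lhs, rhs), p in sorted(probs.items()):
        print(f"{lhs} -> {' '.join(rhs)}: {p}")
```

With a single tree most rules get probability 1; JJ and NN each rewrite to three different words, so each of those lexical rules gets 1/3.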

Corpus of parsed sentences:

    S1: [S [NP grass] [VP grows]]
    S2: [S [NP grass] [VP grows] [AP slowly]]
    S3: [S [NP grass] [VP grows] [AP fast]]
    S4: [S [NP bananas] [VP grow]]

Compute PCFG probabilities:

    r    Rule            α    P(r | α)
    r1   S → NP VP       S    2/4
    r2   S → NP VP AP    S    2/4
    r3   NP → grass      NP   3/4
    r4   NP → bananas    NP   1/4
    r5   VP → grows      VP   3/4
    r6   VP → grow       VP   1/4
    r7   AP → fast       AP   1/2
    r8   AP → slowly     AP   1/2

With these parameters (rule probabilities), we can now compute the probabilities of the four sentences S1–S4:

    P(S1) = P(r1 | S) P(r3 | NP) P(r5 | VP)            = 2/4 · 3/4 · 3/4       = 0.28125
    P(S2) = P(r2 | S) P(r3 | NP) P(r5 | VP) P(r8 | AP) = 2/4 · 3/4 · 3/4 · 1/2 = 0.140625
    P(S3) = P(r2 | S) P(r3 | NP) P(r5 | VP) P(r7 | AP) = 2/4 · 3/4 · 3/4 · 1/2 = 0.140625
    P(S4) = P(r1 | S) P(r4 | NP) P(r6 | VP)            = 2/4 · 1/4 · 1/4       = 0.03125

What if we don't have a treebank but we do have a (non-probabilistic) parser?

1 Take a CFG and set all rules to have equal probability.
2 Parse the corpus with the CFG.
3 Adjust the probabilities (re-estimate them from the expected rule counts in these parses).
4 Repeat steps 2 and 3 until the probabilities converge.

This is the Inside-Outside algorithm (Baker, 1979), a type of Expectation Maximisation algorithm. It can also be used to induce a grammar, but only with limited success.

Problems with Standard PCFGs

While standard PCFGs are useful for a number of applications, they can produce the wrong result when used to choose the correct parse for an ambiguous sentence.

How can that be?

1 The independence of the rules in a PCFG.
2 They ignore lexical information until the very end of the analysis, when word classes are rewritten to word tokens.

How can this lead to the wrong choice among possible parses?
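The toy estimates and sentence probabilities for S1–S4 can be checked mechanically. This is a minimal sketch, assuming each parsed sentence is represented simply as the list of (LHS, RHS) rules its tree uses; exact fractions avoid floating-point rounding. It covers only the supervised (treebank) case, not Inside-Outside.

```python
from collections import Counter
from fractions import Fraction
from math import prod

# Each parsed sentence as the list of (LHS, RHS) rules its tree uses.
CORPUS = {
    "S1": [("S", "NP VP"), ("NP", "grass"), ("VP", "grows")],
    "S2": [("S", "NP VP AP"), ("NP", "grass"), ("VP", "grows"), ("AP", "slowly")],
    "S3": [("S", "NP VP AP"), ("NP", "grass"), ("VP", "grows"), ("AP", "fast")],
    "S4": [("S", "NP VP"), ("NP", "bananas"), ("VP", "grow")],
}


def estimate(corpus):
    """Maximum-likelihood rule probabilities: Count(LHS -> RHS) / Count(LHS)."""
    rule_counts = Counter(rule for rules in corpus.values() for rule in rules)
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: Fraction(n, lhs_counts[rule[0]]) for rule, n in rule_counts.items()}


def tree_prob(rules, probs):
    """A PCFG tree's probability is the product of its rule probabilities."""
    return prod(probs[rule] for rule in rules)


probs = estimate(CORPUS)
sentence_probs = {name: tree_prob(rules, probs) for name, rules in CORPUS.items()}

if __name__ == "__main__":
    for name, p in sentence_probs.items():
        print(name, p, float(p))
```

The four corpus sentences together receive only 19/32 = 0.59375 of the probability mass; the rest goes to sentences the grammar generates but the corpus does not contain, such as "bananas grow fast".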

Problem 1: Assuming Independence

By definition, a CFG assumes that the expansion of non-terminals is completely independent. It doesn't matter:

    where a non-terminal is in the analysis;
    what else is (or isn't) in the analysis.

The same assumption holds for standard PCFGs. The probability of a rule is the same, no matter:

    where it is applied in the analysis;
    what else is (or isn't) in the analysis.

But this assumption is too simple!

    S → NP VP        NP → PRO
    VP → VBD NP      NP → DT NOM

The above rules assign the same probability to both of these trees, because they use the same rewrite rules, and probability calculations do not depend on where rules are used:

    (a) [S [NP ...] [VP [VBD wrote] [NP [PRO them]]]]
    (b) [S [NP [PRO They]] [VP [VBD wrote] [NP ...]]]

(In each tree, the NP shown as "..." is expanded by NP → DT NOM.)

But in a speech corpus, 91% of 31,021 subject NPs are pronouns:

(1) a. She's able to take her baby to work with her.
    b. My wife worked until we had a family.

while only 34% of 7,489 object NPs are pronouns:

(2) a. Some laws absolutely prohibit it.
    b. It wasn't clear how NL and Mr. Simmons would respond if Georgia Gulf spurns them again.

So the probability of NP → PRO should depend on where in the analysis it applies (e.g., subject or object position).

Problem 2: Ignoring Lexical Information

    S → NP VP                 NN → queen | bin
    NP → NNS | NN             NNS → workers | sacks | cars
    NP → DT NN                VBD → dumped | repaired
    NP → NP PP                DT → a | the
    VP → VBD NP | VBD NP PP   P → into | of
    PP → P NP

Consider the sentences:

(3) a. Workers dumped sacks into a bin.
    b. Workers repaired cars of the queen.

Because rules for rewriting non-terminals ignore word tokens until the very end, let's consider these simply as strings of POS tags:

(4) a. NNS VBD NNS P DT NN
    b. NNS VBD NNS P DT NN
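The independence problem can be made concrete by scoring the two trees from the slide (pronoun object, "... wrote them", versus pronoun subject, "They wrote ...") first with one pooled estimate of P(NP → PRO) and then with estimates conditioned on position. This is a simplified sketch, not a full PCFG: it scores only the structural rules, assumes NP has just the two expansions NP → PRO and NP → DT NOM, and sets S → NP VP and VP → VBD NP to probability 1 since both trees use them once. The rates are the speech-corpus percentages quoted above; the position-conditioned model is a hypothetical illustration of the fix the slide motivates, not a method from this lecture.

```python
from math import prod

# Speech-corpus figures quoted above: share of pronoun NPs by position.
SUBJ_NPS, SUBJ_PRO_RATE = 31021, 0.91
OBJ_NPS, OBJ_PRO_RATE = 7489, 0.34

# Structural rules of the two trees, each NP tagged with its position.
TREE_A = [("S", "NP VP", None), ("NP", "DT NOM", "subj"),
          ("VP", "VBD NP", None), ("NP", "PRO", "obj")]    # ... wrote them
TREE_B = [("S", "NP VP", None), ("NP", "PRO", "subj"),
          ("VP", "VBD NP", None), ("NP", "DT NOM", "obj")]  # They wrote ...

# Standard PCFG: one pooled estimate of P(NP -> PRO), whatever the position
# (approximate, since the quoted percentages are rounded).
POOLED_PRO = (SUBJ_NPS * SUBJ_PRO_RATE + OBJ_NPS * OBJ_PRO_RATE) / (SUBJ_NPS + OBJ_NPS)


def np_rule_prob(rhs, p_pro):
    # Simplification: NP has only the two expansions NP -> PRO and NP -> DT NOM.
    return p_pro if rhs == "PRO" else 1.0 - p_pro


def pcfg_rule_prob(lhs, rhs, pos):
    # Position is ignored; non-NP rules are set to 1 (they are shared by both trees).
    return np_rule_prob(rhs, POOLED_PRO) if lhs == "NP" else 1.0


def positional_rule_prob(lhs, rhs, pos):
    # Hypothetical fix: condition NP expansions on subject vs object position.
    if lhs != "NP":
        return 1.0
    return np_rule_prob(rhs, SUBJ_PRO_RATE if pos == "subj" else OBJ_PRO_RATE)


def tree_prob(tree, rule_prob):
    """Product of rule probabilities over the tree's rule uses."""
    return prod(rule_prob(lhs, rhs, pos) for lhs, rhs, pos in tree)


if __name__ == "__main__":
    for name, model in [("PCFG", pcfg_rule_prob), ("positional", positional_rule_prob)]:
        print(name, tree_prob(TREE_A, model), tree_prob(TREE_B, model))
```

Under the pooled estimate both trees get the same score; conditioning on position strongly prefers the tree with the pronoun in subject position.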

The two candidate analyses for these tag strings:

    (a) PP attached to the object NP:
        [S [NP [NNS]] [VP [VBD] [NP [NP [NNS]] [PP [P] [NP [DT] [NN]]]]]]
    (b) PP attached to the VP:
        [S [NP [NNS]] [VP [VBD] [NP [NNS]] [PP [P] [NP [DT] [NN]]]]]

Which do we want for "Workers dumped sacks into a bin"? Which for "Workers repaired cars of the queen"?

The most appropriate analysis depends, in part, on the actual words.

Lexicalized PCFGs

A PCFG can be lexicalised by associating a word and part-of-speech tag with every non-terminal in the grammar. It is head-lexicalised if the word is the head of the constituent described by the non-terminal.

Each non-terminal has a head that determines the syntactic properties of the phrase (e.g., which other phrases it can combine with).

Example:

    Noun Phrase (NP): Noun
    Adjective Phrase (AP): Adjective
    Verb Phrase (VP): Verb
    Prepositional Phrase (PP): Preposition

Lexicalization

We can lexicalize a PCFG by annotating each non-terminal with its head word, starting with the terminals, replacing

    VP → VBD NP PP
    VP → VBD NP
    NP → DT NN
    NP → NNS
    PP → P NP

with rules of the form

    VP(dumped) → VBD(dumped) NP(sacks) PP(into)
    VP(repaired) → VBD(repaired) NP(cars) PP(of)
    VP(dumped) → VBD(dumped) NP(sacks)
    VP(repaired) → VBD(repaired) NP(cars)
    NP(queen) → DT(the) NN(queen)
    PP(into) → P(into) NP(bin)

Example:

    [TOP [S [NP [NNS workers]]
            [VP [VBD dumped]
                [NP [NNS sacks]]
                [PP [P into]
                    [NP [DT a] [NN bin]]]]]]
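Head lexicalization can be sketched as a bottom-up pass that picks a head child for each phrase and copies its head word upward. This is a minimal sketch with hypothetical head rules: NP, VP, PP and AP follow the head table above (noun, verb, preposition, adjective), while the choices that S is headed by its VP, TOP by its S, and NP → NP PP by the inner NP are assumptions made for this example. The tag VBD is used for the verbs, matching the unlexicalized rules being replaced.

```python
# Hypothetical head rules: for each phrase, child labels to look for, in priority order.
HEAD_RULES = {
    "TOP": ["S"],               # assumption
    "S": ["VP"],                # assumption: the verb phrase heads the sentence
    "VP": ["VBD"],              # verb
    "NP": ["NN", "NNS", "NP"],  # noun; for NP -> NP PP, the inner NP (assumption)
    "PP": ["P"],                # preposition
    "AP": ["JJ"],               # adjective
}


def head_index(label, child_labels):
    """Return the position of the head child, following HEAD_RULES."""
    for candidate in HEAD_RULES[label]:
        if candidate in child_labels:
            return child_labels.index(candidate)
    raise ValueError(f"no head for {label} -> {' '.join(child_labels)}")


def lexicalize(tree, rules):
    """Return the head word of `tree`, appending its lexicalized rules to `rules`.

    A tree is (label, word) for a preterminal or (label, [subtrees]) for a phrase.
    """
    label, children = tree
    if isinstance(children, str):  # preterminal: the word is its own head
        return children
    heads = [lexicalize(child, rules) for child in children]
    labels = [child[0] for child in children]
    head = heads[head_index(label, labels)]
    rhs = " ".join(f"{lab}({word})" for lab, word in zip(labels, heads))
    rules.append(f"{label}({head}) -> {rhs}")
    return head


# (3a) with the PP attached to the VP; (3b) with the PP attached to the object NP.
DUMPED = ("TOP", [("S", [
    ("NP", [("NNS", "workers")]),
    ("VP", [("VBD", "dumped"), ("NP", [("NNS", "sacks")]),
            ("PP", [("P", "into"), ("NP", [("DT", "a"), ("NN", "bin")])])])])])
REPAIRED = ("TOP", [("S", [
    ("NP", [("NNS", "workers")]),
    ("VP", [("VBD", "repaired"),
            ("NP", [("NP", [("NNS", "cars")]),
                    ("PP", [("P", "of"), ("NP", [("DT", "the"), ("NN", "queen")])])])])])])

if __name__ == "__main__":
    for tree in (DUMPED, REPAIRED):
        collected = []
        lexicalize(tree, collected)
        print("\n".join(collected), end="\n\n")
```

The lexicalized rules now differ between the two sentences: the VP of (3a) becomes VP(dumped) -> VBD(dumped) NP(sacks) PP(into), while the NP-attachment analysis of (3b) yields NP(cars) -> NP(cars) PP(of), so rule probabilities can reflect the words involved.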
