

SLIDE 1

Statistical Constituency Parsing

Dealing with Ambiguity

◮ Consider possible parses but weighted by probability
◮ Return the likeliest parse
◮ Return the likeliest parse along with a probability

Munindar P. Singh (NCSU) Natural Language Processing Fall 2020 149

SLIDE 2

Statistical Constituency Parsing

PCFG: Probabilistic Context-Free Grammar

◮ Components of a PCFG: G = (N, Σ, R, S)
  ◮ Σ, an alphabet or set of terminal symbols
  ◮ N, a set of nonterminal symbols, N ∩ Σ = ∅
  ◮ S ∈ N, a start symbol (distinguished nonterminal)
  ◮ R, a set of rules or productions of the form A → β [p]
    ◮ A ∈ N is a single nonterminal and β ∈ (Σ ∪ N)* is a finite string of terminals and nonterminals
    ◮ p = P(A → β | A) is the probability of expanding A to β
◮ Properness: for each nonterminal A, ∑β P(A → β | A) = 1
◮ Consistency:
  ◮ The probability of a sentence is nonzero if and only if it is in the language
  ◮ The sum of the probabilities of the sentences in the language is 1
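The properness condition can be checked mechanically. A minimal sketch in Python (names are illustrative), using the jar grammar that appears later in these slides:

```python
from fractions import Fraction

# Rules grouped by left-hand side; probabilities as exact fractions.
# Grammar taken from the "Simple PCFG" example later in these slides.
RULES = {
    "Nominal": {("Nominal", "Noun"): Fraction(2, 3), ("Noun",): Fraction(1, 3)},
    "Noun": {("jar",): Fraction(1)},
}

def is_proper(rules):
    # Properness: for every nonterminal A, the probabilities of all
    # rules A -> beta must sum to exactly 1.
    return all(sum(expansions.values()) == 1 for expansions in rules.values())

print(is_proper(RULES))  # True
```

Note that properness is a local check on each nonterminal; consistency (the next slides) is a global property of the whole grammar and does not follow from properness alone.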

SLIDE 3

Statistical Constituency Parsing

Languages from Grammars

◮ Simple CFG: Nominal is the start symbol
  Nominal → Nominal Noun
  Nominal → Noun
  Noun → jar
◮ Simpler CFG: Nominal is the start symbol
  Nominal → Nominal Noun
  Noun → jar
◮ Simple PCFG: Nominal is the start symbol
  Nominal → Nominal Noun [2/3]
  Nominal → Noun [1/3]
  Noun → jar [1]

SLIDE 4

Statistical Constituency Parsing

Consistent PCFG

Probability of the language is 1

◮ Consider the same simple PCFG as before
  Nominal → Nominal Noun [2/3]
  Nominal → Noun [1/3]
  Noun → jar [1]
◮ Write out all parse trees for jarᵏ
◮ The probability of jarᵏ is the sum of the probabilities of its parse trees
◮ Sum up the probabilities for the entire language
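Because this grammar is left-linear, jarᵏ has exactly one parse tree, with probability (2/3)ᵏ⁻¹ · (1/3). A short Python sketch (function name is illustrative) that sums the resulting geometric series numerically:

```python
from fractions import Fraction

def p_jar_k(k):
    # Unique parse of jar^k: k-1 applications of Nominal -> Nominal Noun [2/3],
    # one of Nominal -> Noun [1/3], and Noun -> jar [1] applied k times.
    return Fraction(2, 3) ** (k - 1) * Fraction(1, 3)

# Partial sums of the geometric series approach 1: the PCFG is consistent.
partial = sum(p_jar_k(k) for k in range(1, 101))
print(float(partial))  # very close to 1
```

The exact sum is ∑ₖ (2/3)ᵏ⁻¹ · (1/3) = (1/3) · 1/(1 − 2/3) = 1, matching the consistency condition.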

SLIDE 5

Statistical Constituency Parsing

Inconsistent PCFG

Probability of generating the language is not 1

◮ Consider a modified PCFG: Nominal is the start symbol
  Nominal → Nominal Nominal [2/3]
  Nominal → jar [1/3]
◮ Write out all parse trees for jarᵏ
◮ The probability of jarᵏ is the sum of the probabilities of its parse trees
◮ Sum up the probabilities for the entire language
The argument gets cumbersome

SLIDE 6

Statistical Constituency Parsing

PCFG: Markovian Argument

◮ Consider how a derivation proceeds
  ◮ One production can increase the count of nonterminals by one
  ◮ One production can decrease the count of nonterminals by one
  ◮ We start with one nonterminal (the start symbol)
  ◮ Any derivation that ends in zero nonterminals yields a string in the language
◮ L(n+1) (left move): the probability of starting from n+1 nonterminals and arriving at a state with n nonterminals
  ◮ The probability of generating a string in this language is L(1)
  ◮ L(0) is never used and could be left undefined or set to zero
◮ PCFGs respect the Markov assumption: a nonterminal is expanded with the same probabilities regardless of history
◮ Therefore, L(n+1) is a constant, L

SLIDE 7

Statistical Constituency Parsing

Inconsistent PCFG: Markovian Derivation

◮ Probability of stepping right is q and of stepping left is 1 − q
◮ L (the probability of eventually moving one step left) equals
  ◮ stepping one left immediately, plus
  ◮ stepping one right followed by two paths that each move one step left:
    L = (1 − q) + qL²
◮ Solve qL² − L + (1 − q) = 0
◮ L = (1 ± √(1 − 4q(1 − q))) / (2q)
◮ 1 − 4q(1 − q) = (2q − 1)²
◮ Therefore, L has two solutions, of which the minimum is appropriate
  ◮ Trivial solution: L = (1 − (1 − 2q)) / (2q) = 1
  ◮ Left-right odds: L = (1 − (2q − 1)) / (2q) = (1 − q)/q
◮ For our example, q = 2/3, so L = min(1, (1/3)/(2/3)) = 1/2 ≠ 1, indicating inconsistency
◮ If we reverse the probabilities, then L = min(1, 2) = 1
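The same L can be found numerically: iterating L ← (1 − q) + qL² from L = 0 converges to the smaller fixed point, which is the probability of eventually moving one step left. A sketch (function name is illustrative):

```python
def left_move_prob(q, iters=200):
    # Iterate L = (1 - q) + q * L^2 starting from below; the iteration
    # converges to the smallest nonnegative fixed point.
    L = 0.0
    for _ in range(iters):
        L = (1 - q) + q * L * L
    return L

print(left_move_prob(2 / 3))  # ~0.5: less than 1, so the grammar is inconsistent
print(left_move_prob(1 / 3))  # ~1.0: the reversed grammar is consistent
```

Starting the iteration at 0 matters: it guarantees convergence to the minimum solution, mirroring the slide's choice of the smaller root.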

SLIDE 8

Statistical Constituency Parsing

Probability of a Parse Tree

◮ Tree T obtained from sentence W, i.e., T yields W
  P(T, W) = P(T) P(W | T)
  P(T, W) = P(T), since P(W | T) = 1
◮ Obtaining T via n expansions Aᵢ → βᵢ, where S = A₁ is the start symbol:
  P(T, W) = ∏ᵢ₌₁ⁿ P(βᵢ | Aᵢ)
◮ Best tree for W:
  T̂(W) = argmax_{T yields W} P(T | W) = argmax_{T yields W} P(T, W) / P(W)
◮ Since P(T, W) = P(T) and P(W) is constant (W being fixed):
  T̂(W) = argmax_{T yields W} P(T)
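The product over expansions can be computed by a simple recursion over the tree. A sketch using the jar PCFG from the earlier slides (the tuple encoding of trees is illustrative):

```python
from fractions import Fraction

# Rule probabilities for the jar PCFG from the earlier slides.
RULES = {
    ("Nominal", ("Nominal", "Noun")): Fraction(2, 3),
    ("Nominal", ("Noun",)): Fraction(1, 3),
    ("Noun", ("jar",)): Fraction(1),
}

def tree_prob(tree):
    # A tree is (label, child, ...); a child is a subtree or a terminal string.
    if isinstance(tree, str):
        return Fraction(1)  # terminals contribute no rule probability
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULES[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p

# The single parse of "jar jar": P = 2/3 * 1/3 * 1 * 1 = 2/9
t = ("Nominal", ("Nominal", ("Noun", "jar")), ("Noun", "jar"))
print(tree_prob(t))  # 2/9
```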

SLIDE 9

Statistical Constituency Parsing

Probabilistic CKY Parsing

◮ Like CKY, as discussed earlier, except that
  ◮ Each cell contains not a set of nonterminals but a probability distribution over nonterminals
◮ Specifying probabilities for Chomsky Normal Form
  ◮ Consider each transformation used in the normalization
  ◮ Supply the probabilities below
    ◮ Replace A → α B γ [p] and B → β [q] by A → α β γ [?]
    ◮ Replace A → B C γ [p] by A → B X [?] and X → C γ [?]
◮ Store a probability distribution over nonterminals in each cell
◮ Return the likeliest parse
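A minimal probabilistic CKY sketch for a CNF PCFG. The rule tables and names are illustrative, and the unit production of the jar grammar is assumed to have been folded into the lexicon during normalization:

```python
def pcky(words, binary, lexical):
    # best[(i, j)][A] = max probability of nonterminal A spanning words[i:j]
    n = len(words)
    best = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
    # Base case: fill length-1 spans from the lexical rules.
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            best[(i, i + 1)][A] = max(best[(i, i + 1)].get(A, 0.0), p)
    # Combine adjacent spans via binary rules, keeping the max probability.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for B, pb in best[(i, k)].items():
                    for C, pc in best[(k, j)].items():
                        for A, p in binary.get((B, C), []):
                            cand = p * pb * pc
                            if cand > best[(i, j)].get(A, 0.0):
                                best[(i, j)][A] = cand
    return best[(0, n)]

# jar grammar in CNF: the unit rule Nominal -> Noun [1/3] is folded
# into the lexicon as Nominal -> jar [1/3].
lexical = {"jar": [("Noun", 1.0), ("Nominal", 1 / 3)]}
binary = {("Nominal", "Noun"): [("Nominal", 2 / 3)]}
chart = pcky(["jar", "jar"], binary, lexical)  # Nominal with probability 2/9
```

Extending this to return the likeliest parse itself only requires storing a backpointer (k, B, C) alongside each maximum.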

SLIDE 10

Statistical Constituency Parsing

Learning PCFG Probabilities

◮ Simplest estimator: Assume a treebank
  ◮ Estimate the probability of A → β as
    P(A → β | A) = Count(A → β) / ∑γ Count(A → γ) = Count(A → β) / Count(A)
◮ Without a treebank but with a corpus
  ◮ Assume a traditional parser
  ◮ Initialize all rule probabilities as equal
  ◮ Iteratively
    ◮ Parse each sentence in the corpus
    ◮ Credit each rule A → βᵢ by the counts weighted by the probabilities of the rules leading to that nonterminal, A
    ◮ Revise the probability estimates
  ◮ More properly described as an expectation maximization algorithm
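The treebank estimator above is plain relative-frequency counting. A sketch (tree encoding and names are illustrative) over a tiny hand-made "treebank":

```python
from collections import Counter
from fractions import Fraction

def estimate_rule_probs(treebank):
    # Maximum likelihood: P(A -> beta | A) = Count(A -> beta) / Count(A)
    rule_counts, lhs_counts = Counter(), Counter()

    def visit(tree):
        if isinstance(tree, str):
            return  # terminal
        label, *children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for child in children:
            visit(child)

    for tree in treebank:
        visit(tree)
    return {rule: Fraction(n, lhs_counts[rule[0]]) for rule, n in rule_counts.items()}

# Two parses over the jar grammar as a toy treebank.
trees = [
    ("Nominal", ("Noun", "jar")),
    ("Nominal", ("Nominal", ("Noun", "jar")), ("Noun", "jar")),
]
probs = estimate_rule_probs(trees)
# Nominal expands 3 times: twice to Noun, once to Nominal Noun.
```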

SLIDE 11

Statistical Constituency Parsing

Shortcomings of PCFGs

PCFGs break ties between rules in a fixed manner

◮ Naïve context-free assumption regarding probabilities
  ◮ NP → Pronoun is much likelier for a subject NP than for an object NP
  ◮ PCFGs (and CFGs) disregard the path on which the NP was produced
◮ Lack of lexical dependence
  ◮ VP → VBD NP NP is likelier for a ditransitive verb
  ◮ Consider prepositional phrase attachment
    ◮ Either: prefer PP attached to VP (“dumped sacks into a bin”)
      VP → VBD NP PP
    ◮ Or: prefer PP attached to NP (“caught tons of herring”)
      VP → VBD NP
      NP → NP PP
◮ Coordination ambiguities: each parse gets the same probability because all parses use the same rules

SLIDE 12

Statistical Constituency Parsing

Split Nonterminals to Refine a PCFG

◮ Split nonterminals for syntactic roles, e.g., NPsubject versus NPobject
  ◮ Then learn different probabilities for their productions
◮ Capture part of the path by a parent annotation
  ◮ Annotate only the phrasal nonterminals (NPˆS versus NPˆVP), e.g., for “I need a flight”:
    (S (NPˆS (Pronoun I)) (VPˆS (Verb need) (NPˆVP (Determiner a) (Noun flight))))
◮ Likewise, split preterminals, i.e., nonterminals that yield terminals
  ◮ Adverbs depend on where they occur: RBˆAdvP (also, now), RBˆVP (not), RBˆNP (only, just)
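Parent annotation is a mechanical tree transform. A minimal sketch (tree encoding is illustrative) that annotates every nonterminal; the slide's variant would restrict the relabeling to phrasal categories:

```python
def parent_annotate(tree, parent=None):
    # Append ^Parent to each nonterminal label; terminals (strings) pass through.
    # The parent passed down is the original, unannotated label.
    if isinstance(tree, str):
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}" if parent is not None else label
    return (new_label, *(parent_annotate(c, label) for c in children))

t = ("S", ("NP", ("Pronoun", "I")),
          ("VP", ("Verb", "need"),
                 ("NP", ("Determiner", "a"), ("Noun", "flight"))))
annotated = parent_annotate(t)
```

After the transform, the two NPs carry distinct labels (NP^S versus NP^VP), so the estimator can learn different expansion probabilities for them.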

SLIDE 13

Statistical Constituency Parsing

Example of Preterminals with Sentential Complements

Klein and Manning: Left parse is wrong

Incorrect parse (if as a preposition):
  (VPˆS (TO to) (VPˆVP (VB see) (PPˆVP (IN if) (NPˆPP (NN advertising) (NNS works)))))
Correct parse (if as a complementizer):
  (VPˆS (TOˆVP to) (VPˆVP (VBˆVP see) (SBARˆVP (INˆSBAR if) (SˆSBAR (NPˆS (NNˆNP advertising)) (VPˆS (VBZˆVP works))))))
IN includes prepositions, complementizers (that), and subordinating conjunctions (if, as)

SLIDE 14

Statistical Constituency Parsing

Lexicalized Parse Tree

Variant of previous such tree with parts of speech inserted

(TOP (S(dumped, VBD)
       (NP(workers, NNS) (NNS(workers, NNS) workers))
       (VP(dumped, VBD)
          (VBD(dumped, VBD) dumped)
          (NP(sacks, NNS) (NNS(sacks, NNS) sacks))
          (PP(into, P)
             (P(into, P) into)
             (NP(bin, NN) (DT(a, DT) a) (NN(bin, NN) bin))))))

TOP → S(dumped, VBD)
S(dumped, VBD) → NP(workers, NNS) VP(dumped, VBD)
VP(dumped, VBD) → VBD(dumped, VBD) NP(sacks, NNS) PP(into, P)
. . .
VBD(dumped, VBD) → dumped
. . .

SLIDE 15

Statistical Constituency Parsing

Estimating the Probabilities

◮ In general, we estimate the probability of A → β as
  P(A → β | A) = Count(A → β) / ∑γ Count(A → γ) = Count(A → β) / Count(A)
◮ But the new productions are highly specific
◮ Collins Model 1 makes independence assumptions
  ◮ Treat β as β₁ … βH … βn: βH is the head and β₁ = βn = STOP
  ◮ Generate the head
  ◮ Generate its premodifiers until getting to STOP
  ◮ Generate its postmodifiers until getting to STOP
  ◮ Apply Naïve Bayes:
    P(A → β) = P(A → βH) × P(β₁ … βH−1 | βH) × P(βH+1 … βn | βH)
             ≈ P(A → βH) × ∏_{k=1}^{H−1} P(βk | βH) × ∏_{k=H+1}^{n} P(βk | βH)
◮ Estimate each probability from smaller amounts of data

SLIDE 16

Statistical Constituency Parsing

Labeled Recall and Precision to Evaluate Parsers

◮ Like recall and precision but
  ◮ Based on counting correct constituents identified
  ◮ Correctness with respect to a ground-truth reference parse tree
◮ Recall: how many of the correct constituents are discovered
◮ Precision: how many of the discovered constituents are correct
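With constituents represented as labeled spans, both metrics reduce to set intersection. A minimal sketch (names and the example spans are illustrative):

```python
def labeled_pr(reference, hypothesis):
    # Constituents as (label, start, end) spans; a hypothesis constituent is
    # correct only if the same labeled span occurs in the reference parse.
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    precision = correct / len(hyp)  # discovered constituents that are correct
    recall = correct / len(ref)     # correct constituents that are discovered
    return precision, recall

ref = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
hyp = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
p, r = labeled_pr(ref, hyp)
print(p, r)  # 0.75 0.75
```

Here the last constituent matches the reference span but not its label, so it counts against both precision and recall.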

SLIDE 17

Statistical Constituency Parsing

Cross Brackets

A metric specific to comparing parse trees

◮ A measure of error
◮ The number of constituents for which
  ◮ The reference parse has a bracketing ((A B) C)
  ◮ The hypothesis parse has a bracketing (A (B C))
◮ On the Wall Street Journal treebank, modern parsers yield
  ◮ Recall 90%
  ◮ Precision 90%
  ◮ Cross-bracketing 1%
◮ Extended metrics exist for comparing parsers using different grammars

SLIDE 18

Statistical Constituency Parsing

Human Parsing

Psycholinguistics

◮ Studies of human processing ease
  ◮ Delay in reading
  ◮ Eye gaze fixation (dwell) time
◮ Garden-path sentences
  ◮ The prefix (initial portion) is ambiguous
  ◮ That is, temporarily ambiguous while reading
  ◮ The more highly preferred parse of the prefix does not lead to a parse of the entire sentence

SLIDE 19

Statistical Constituency Parsing

The Horse Raced Past the Barn Fell: Problematic

A complete sentence followed by an extra verb
The first part gets a likely parse that offers no clear attachment for the final verb

(S (NP (Det The) (N horse))
   (VP (V raced)
       (PP (P past) (NP (Det the) (N barn)))))
? (V fell)

SLIDE 20

Statistical Constituency Parsing

The Horse Raced Past the Barn Fell: Correct

Raced is part of a reduced relative clause modifying “The horse”

(S (NP (NP (Det The) (N horse))
       (VP (V raced)
           (PP (P past) (NP (Det the) (N barn)))))
   (VP (V fell)))
