
Ambiguity Resolution: Statistical Method, Prof. Ahmed Rafea, Ch. 7



1. Ambiguity Resolution: Statistical Method
Prof. Ahmed Rafea
Ch. 7

2. Outline
• Estimating Probability
• Part of Speech Tagging
• Obtaining Lexical Probability
• Probabilistic Context-Free Grammars
• Best-First Parsing

3. Estimating Probability
• Example: suppose a corpus contains 1,273,000 words, and we find 1,000 uses of the word flies, 400 in the N sense and 600 in the V sense. Then we can estimate the following probabilities:
– Prob(flies) = 1000/1,273,000 = .0008
– Prob(flies & V) = 600/1,273,000 = .0005
– Prob(V|flies) = .0005/.0008 = .625
• This is called the maximum likelihood estimator (MLE).
• NL applications often face sparse data, meaning some words end up with a probability of 0. To solve this problem we can add a small amount, say .5, to every count. This is called the expected likelihood estimator (ELE).
• If a word w occurred 0 times across 40 classes (L1, …, L40), then using ELE Prob(Li|w) = 0.5/(0.5*40) = .025, whereas with MLE this probability cannot be estimated at all. If w appears 5 times, once as a verb and 4 times as a noun, then MLE gives Prob(N|w) = .8 while ELE gives 4.5/25 = .18 (see the sketch below).
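As a rough illustration (not part of the original slides), here is a minimal Python sketch of the two estimators; the counts are the ones from the example above.

```python
def mle(class_count, word_count):
    """Maximum likelihood estimate: raw relative frequency Prob(class | word)."""
    if word_count == 0:
        raise ValueError("MLE is undefined for a word that was never observed")
    return class_count / word_count

def ele(class_count, word_count, num_classes, delta=0.5):
    """Expected likelihood estimate: add delta to every class count before normalizing."""
    return (class_count + delta) / (word_count + delta * num_classes)

# flies: 600 V readings out of 1,000 occurrences
print(mle(600, 1000))   # 0.6 (the slide's .625 comes from rounding the intermediate values)
# w seen 5 times, 4 of them as N, with 40 possible classes
print(ele(4, 5, 40))    # 0.18
# a word never seen at all: ELE still gives a usable estimate, MLE does not
print(ele(0, 0, 40))    # 0.025
```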

4. Part of Speech Tagging (1)
• A simple algorithm is to estimate the category of each word using the probabilities obtained from the training corpus, as indicated above.
• To improve reliability, local context may be used as follows:
– Prob(c1, …, cT | w1, …, wT): estimating this directly would need far too much data, so it is not possible.
– By Bayes' rule this equals Prob(c1, …, cT) * Prob(w1, …, wT | c1, …, cT) / Prob(w1, …, wT).
– The denominator does not affect which tag sequence wins, so it is enough to maximize Prob(c1, …, cT) * Prob(w1, …, wT | c1, …, cT).
– Approximating Prob(c1, …, cT) by the product of the bigram probabilities, and Prob(w1, …, wT | c1, …, cT) by the product of the probabilities that each word occurs in the indicated part of speech, gives Π i=1,T Prob(ci|ci-1) * Prob(wi|ci) (see the sketch below).
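A minimal sketch of this product, assuming small transition and lexical probability tables; the specific values are the ones used in the Markov chain and HMM example later in the deck, and anything not listed is treated as probability 0.

```python
# Assumed probability tables for illustration (PHI is the dummy start state).
TRANS = {("PHI", "N"): 0.29, ("N", "V"): 0.43, ("V", "ART"): 0.65, ("ART", "N"): 1.0}
LEX = {("flies", "N"): 0.025, ("like", "V"): 0.1,
       ("a", "ART"): 0.36, ("flower", "N"): 0.063}

def sequence_prob(words, tags):
    """Pi i=1,T Prob(ci|ci-1) * Prob(wi|ci), starting from the dummy state PHI."""
    prob, prev = 1.0, "PHI"
    for w, c in zip(words, tags):
        prob *= TRANS.get((prev, c), 0.0) * LEX.get((w, c), 0.0)
        prev = c
    return prob

print(sequence_prob(["flies", "like", "a", "flower"], ["N", "V", "ART", "N"]))
# ~4.6e-06, matching the worked example on the HMM slide
```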

5. Part of Speech Tagging (2)
• Given all these probability estimates, how might you find the sequence of categories that has the highest probability of generating a specific sentence?
• The brute-force method would generate N^T possible sequences, where N is the number of categories and T is the number of words (see the sketch below).
• We can use a Markov chain, a special form of probabilistic finite state machine, to compute the bigram probability Π i=1,T Prob(ci|ci-1).
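To make the N^T blow-up concrete, here is a small brute-force tagger (my own sketch, not the chapter's algorithm) that simply scores every possible tag sequence; the probability tables are the assumed values used elsewhere in this deck.

```python
from itertools import product

TRANS = {("PHI", "N"): 0.29, ("PHI", "ART"): 0.71, ("ART", "N"): 1.0,
         ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
         ("V", "ART"): 0.65, ("V", "N"): 0.35}
LEX = {("flies", "N"): 0.025, ("like", "V"): 0.1,
       ("a", "ART"): 0.36, ("flower", "N"): 0.063}
TAGS = ["N", "V", "ART", "P"]

def sequence_prob(words, tags):
    prob, prev = 1.0, "PHI"
    for w, c in zip(words, tags):
        prob *= TRANS.get((prev, c), 0.0) * LEX.get((w, c), 0.0)
        prev = c
    return prob

def brute_force_tag(words):
    # Enumerates all N**T tag sequences: fine for 4 words, hopeless for long sentences.
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: sequence_prob(words, tags))

words = ["flies", "like", "a", "flower"]
print(brute_force_tag(words))   # ('N', 'V', 'ART', 'N') with these assumed tables
```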

6. Markov Chain
[Figure: a Markov chain over the states Φ, ART, N, V, P capturing the bigram probabilities, with transitions such as Prob(N|Φ) = .29, Prob(ART|Φ) = .71, Prob(N|ART) = 1, Prob(V|N) = .43, Prob(N|N) = .13, Prob(P|N) = .44, Prob(ART|V) = .65, Prob(N|V) = .35]

7. What is an HMM?
• A graphical model
• Circles indicate states
• Arrows indicate probabilistic dependencies between states

8. What is an HMM?
• Green circles are hidden states
• Each hidden state depends only on the previous state

9. Example
[Figure: the Markov chain from slide 6 with the observed words Flies, like, a, flower attached to their hidden states, using lexical probabilities such as Prob(flies|N) = .025, Prob(like|V) = .1, Prob(a|ART) = .36, Prob(flower|N) = .063]
• Purple nodes are observed states
• Each observed state depends only on its corresponding hidden state
• Example: Flies like a flower, tagged N V ART N:
– Prob(w1, …, wT, c1, …, cT) ≈ Π i=1,T Prob(ci|ci-1) * Prob(wi|ci) = (.29 * .43 * .65 * 1) * (.025 * .1 * .36 * .063) = 0.081 * 0.0000567 = 0.0000045927

10. Viterbi Algorithm
[Figure: Viterbi trellis for the sentence "Flies like a flower", giving for each word position the highest probability of any tag sequence ending in each category (V, N, P, ART); for example, the entry for flies/N is Prob(N|Φ) * Prob(flies|N) = .29 * .025 = .00725]
A runnable sketch of the algorithm follows below.
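This is a minimal Viterbi sketch (my own illustration, not the slide's code), reusing the assumed transition and lexical tables from earlier; entries missing from the tables are treated as probability 0.

```python
TRANS = {("PHI", "N"): 0.29, ("PHI", "ART"): 0.71, ("ART", "N"): 1.0,
         ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
         ("V", "ART"): 0.65, ("V", "N"): 0.35}
LEX = {("flies", "N"): 0.025, ("like", "V"): 0.1,
       ("a", "ART"): 0.36, ("flower", "N"): 0.063}
TAGS = ["N", "V", "ART", "P"]

def viterbi(words):
    """Return the most probable tag sequence and its probability."""
    # best[t] = (probability, backpointer) of the best path ending in tag t
    best = {t: (TRANS.get(("PHI", t), 0.0) * LEX.get((words[0], t), 0.0), None)
            for t in TAGS}
    history = [best]
    for w in words[1:]:
        nxt = {}
        for t in TAGS:
            prob, prev = max(
                ((best[p][0] * TRANS.get((p, t), 0.0) * LEX.get((w, t), 0.0), p)
                 for p in TAGS),
                key=lambda x: x[0])
            nxt[t] = (prob, prev)
        best = nxt
        history.append(best)
    # trace back from the best final state
    last = max(TAGS, key=lambda t: best[t][0])
    best_prob = best[last][0]
    tags = [last]
    for step in reversed(history[1:]):
        last = step[last][1]
        tags.append(last)
    tags.reverse()
    return tags, best_prob

print(viterbi(["flies", "like", "a", "flower"]))
# (['N', 'V', 'ART', 'N'], ~4.6e-06) under these assumed tables
```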

11. Obtaining Lexical Probability
• Context-independent probability of w: Prob(Lj | w) = count(Lj & w) / Σ i=1,N count(Li & w)
• This estimate is not reliable because it does not take context into account.
• Example of taking context into account, for the sentence The flies like flowers:
– Prob(flies/N | The flies) = Prob(flies/N & The flies) / Prob(The flies)
– Prob(flies/N & The flies) = Prob(the|ART) * Prob(flies|N) * Prob(ART|Φ) * Prob(N|ART) + Prob(the|N) * Prob(flies|N) * Prob(N|Φ) * Prob(N|N) + Prob(the|P) * Prob(flies|N) * Prob(P|Φ) * Prob(N|P)
– Prob(The flies) = Prob(flies/N & The flies) + Prob(flies/V & The flies)
(see page 206 for numeric values)

12. Forward Probability
• α i(t) = Prob(wt/Li, w1, …, wt)
• e.g. with the sentence The flies like flowers, α 2(3) would be the sum of the values computed for all sequences ending in V (the 2nd category) at position 3, given the input The flies like.
• Using conditional probability (see the sketch below):
– Prob(wt/Li | w1, …, wt) = Prob(wt/Li, w1, …, wt) / Prob(w1, …, wt) = α i(t) / Σ j=1,N α j(t)
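Here is a minimal forward-probability sketch under the same assumed tables used earlier (repeated so the block runs on its own); unlike Viterbi, each step sums over the previous categories instead of maximizing.

```python
TRANS = {("PHI", "N"): 0.29, ("PHI", "ART"): 0.71, ("ART", "N"): 1.0,
         ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
         ("V", "ART"): 0.65, ("V", "N"): 0.35}
LEX = {("flies", "N"): 0.025, ("like", "V"): 0.1,
       ("a", "ART"): 0.36, ("flower", "N"): 0.063}
TAGS = ["N", "V", "ART", "P"]

def forward(words):
    """alphas[t][L] = Prob(word t is in category L and words w1..wt are produced)."""
    alphas = [{t: TRANS.get(("PHI", t), 0.0) * LEX.get((words[0], t), 0.0) for t in TAGS}]
    for w in words[1:]:
        prev = alphas[-1]
        alphas.append({t: LEX.get((w, t), 0.0) *
                          sum(prev[p] * TRANS.get((p, t), 0.0) for p in TAGS)
                       for t in TAGS})
    return alphas

alphas = forward(["flies", "like", "a", "flower"])
# Context-dependent lexical probability at position 2 (the word "like"):
# Prob(like/V | flies like) = alpha_V(2) / sum_j alpha_j(2)
pos2 = alphas[1]
print(pos2["V"] / sum(pos2.values()))   # 1.0 here, since only V is possible for "like"
```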

13. Backward Probability
• β i(t) is the probability of producing the sequence wt, …, wT beginning from state wt/Li.
• A better method of estimating the lexical probability for word wt is to consider the entire sentence (see the sketch below):
– Prob(wt/Li) = (α i(t) * β i(t)) / Σ j=1,N (α j(t) * β j(t))
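A minimal forward-backward sketch under the same assumed tables (repeated so the block runs on its own). One hedge: the code uses the common convention in which β covers only the words after position t, so that α·β gives the probability of the whole sentence with wt in category Li, which is what the ratio above needs.

```python
TRANS = {("PHI", "N"): 0.29, ("PHI", "ART"): 0.71, ("ART", "N"): 1.0,
         ("N", "V"): 0.43, ("N", "N"): 0.13, ("N", "P"): 0.44,
         ("V", "ART"): 0.65, ("V", "N"): 0.35}
LEX = {("flies", "N"): 0.025, ("like", "V"): 0.1,
       ("a", "ART"): 0.36, ("flower", "N"): 0.063}
TAGS = ["N", "V", "ART", "P"]

def forward(words):
    alphas = [{t: TRANS.get(("PHI", t), 0.0) * LEX.get((words[0], t), 0.0) for t in TAGS}]
    for w in words[1:]:
        prev = alphas[-1]
        alphas.append({t: LEX.get((w, t), 0.0) *
                          sum(prev[p] * TRANS.get((p, t), 0.0) for p in TAGS)
                       for t in TAGS})
    return alphas

def backward(words):
    betas = [{t: 1.0 for t in TAGS}]        # beta at the last position
    for w in reversed(words[1:]):           # w is the word at the following position
        nxt = betas[0]
        betas.insert(0, {p: sum(TRANS.get((p, t), 0.0) * LEX.get((w, t), 0.0) * nxt[t]
                                for t in TAGS)
                         for p in TAGS})
    return betas

def lexical_probs(words):
    """Prob(wt/Li) = alpha_i(t) * beta_i(t) / sum_j alpha_j(t) * beta_j(t)."""
    alphas, betas = forward(words), backward(words)
    result = []
    for a, b in zip(alphas, betas):
        weights = {t: a[t] * b[t] for t in TAGS}
        total = sum(weights.values()) or 1.0   # avoid division by zero
        result.append({t: round(weights[t] / total, 3) for t in TAGS})
    return result

print(lexical_probs(["flies", "like", "a", "flower"]))
```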

14. Probabilistic Context-Free Grammars
• Prob(Rj | C) = Count(# times Rj used) / Σ i=1,m Count(# times Ri used)
– where the grammar contains m rules R1, …, Rm with left-hand side C
• Parsing is finding the most likely parse tree that could have generated the sentence.
• An independence assumption must be made about rule use, e.g. the NP rule probabilities are the same whether the NP is a subject, the object of a verb, or the object of a preposition.
• The inside probability is the probability that a constituent C generates a sequence of words wi, wi+1, …, wj (written wi,j): Prob(wi,j | C).
• Example: the inside probability of the NP a flower (using Rule 6 and Rule 8 in Grammar 7.17, page 209) is computed as follows (see the sketch below):
– Prob(a flower | NP) = Prob(R8|NP) * Prob(a|ART) * Prob(flower|N) + Prob(R6|NP) * Prob(a|N) * Prob(flower|N)
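A small sketch of this inside probability. The rule probabilities come from the PCFG table on the next slide (Rule 6: NP → N N with .09, Rule 8: NP → ART N with .55); the lexical probabilities are assumed values, and Prob(a|N) in particular is a made-up placeholder.

```python
RULE_PROB = {"R6": 0.09, "R8": 0.55}   # both rules have left-hand side NP
LEX = {("a", "ART"): 0.36, ("flower", "N"): 0.063, ("a", "N"): 0.001}

def inside_a_flower():
    # Sum over every way NP can derive the two words "a flower".
    via_r8 = RULE_PROB["R8"] * LEX[("a", "ART")] * LEX[("flower", "N")]
    via_r6 = RULE_PROB["R6"] * LEX[("a", "N")] * LEX[("flower", "N")]
    return via_r8 + via_r6

print(inside_a_flower())   # ~0.0125 with these assumed lexical values
```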

15. Example of a PCFG
Rule                 Count of LHS   Count of Rule   Probability
1. S → NP VP             300            300            1
2. VP → V                300            116            .386
3. VP → V NP             300            118            .393
4. VP → V NP PP          300             66            .22
5. NP → NP PP           1023            241            .24
6. NP → N N             1023             92            .09
7. NP → N               1023            141            .14
8. NP → ART N           1023            558            .55
9. PP → P NP             307            307            1

16. Example of PCFG Parse Trees
[Figure: three parse trees for the sentence "a flower wilted", one for each way of analyzing the NP, with each node annotated with its rule, lexical, and constituent probabilities]

17. Best First Parsing
• Best-first parsing leads to a significant improvement in efficiency.
• One implementation problem: if a multiplicative method is used to combine scores, the scores of constituents fall quickly as they grow, and the search degenerates into something like breadth-first search. Some algorithms therefore use a different function to compute a constituent's score, such as (see the sketch below):
– Score(C) = Min(Score(C → C1, …, Cn), Score(C1), …, Score(Cn))
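Here is a small sketch of a best-first agenda using that Min combination; it is my own illustration of the scoring idea, not the chapter's parser. The rule scores reuse the PCFG probabilities from slide 15, and the lexical scores (including one for wilted as a V) are assumed values.

```python
import heapq

# Rules as (LHS, RHS, rule score); the scores are PCFG probabilities from slide 15.
RULES = [("NP", ("ART", "N"), 0.55), ("VP", ("V",), 0.386), ("S", ("NP", "VP"), 1.0)]

def combine(rule_score, child_scores):
    # Min combination: a constituent is only as good as its weakest part,
    # so scores do not shrink steadily as constituents get bigger.
    return min([rule_score] + list(child_scores))

def best_first(lexical):
    """lexical: list of (category, start, end, score) for the words."""
    agenda = [(-s, (cat, i, j, s)) for cat, i, j, s in lexical]
    heapq.heapify(agenda)
    chart = []
    while agenda:
        _, item = heapq.heappop(agenda)      # highest-scoring constituent first
        chart.append(item)
        cat, i, j, score = item
        for lhs, rhs, rscore in RULES:
            if len(rhs) == 1 and rhs[0] == cat:
                s = combine(rscore, [score])
                heapq.heappush(agenda, (-s, (lhs, i, j, s)))
            elif len(rhs) == 2:
                for ocat, oi, oj, oscore in chart:
                    if rhs == (ocat, cat) and oj == i:   # new item is the right child
                        s = combine(rscore, [oscore, score])
                        heapq.heappush(agenda, (-s, (lhs, oi, j, s)))
                    if rhs == (cat, ocat) and j == oi:   # new item is the left child
                        s = combine(rscore, [score, oscore])
                        heapq.heappush(agenda, (-s, (lhs, i, oj, s)))
    return chart

# "a flower wilted", with assumed lexical scores for ART, N and V
for constituent in best_first([("ART", 0, 1, 0.36), ("N", 1, 2, 0.063), ("V", 2, 3, 0.4)]):
    print(constituent)
```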
