6.864 (Fall 2007) The EM Algorithm, Part I
An Experiment/Some Intuition
- I have three coins in my pocket,
Coin 0 has probability λ of heads; Coin 1 has probability p1 of heads; Coin 2 has probability p2 of heads
- For each trial I do the following:
First I toss Coin 0
If Coin 0 turns up heads, I toss Coin 1 three times
If Coin 0 turns up tails, I toss Coin 2 three times
I don't tell you whether Coin 0 came up heads or tails, or whether Coin 1 or Coin 2 was tossed three times, but I do tell you how many heads/tails are seen at each trial
- You see the following sequence:
HHH, TTT, HHH, TTT, HHH
What would you estimate as the values for λ, p1 and p2?
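One way to build intuition is to simulate the process. The sketch below is not part of the original notes and the parameter values are made up; it emphasises that the observer only ever sees the three tosses from each trial, never the outcome of Coin 0.

```python
import random

def run_trial(lam, p1, p2):
    """One trial: toss Coin 0, then toss Coin 1 or Coin 2 three times."""
    coin0_is_heads = random.random() < lam
    p = p1 if coin0_is_heads else p2
    tosses = "".join("H" if random.random() < p else "T" for _ in range(3))
    # Only the tosses are reported; coin0_is_heads stays hidden.
    return tosses

random.seed(0)
# Illustrative (made-up) parameter values.
print([run_trial(lam=0.5, p1=0.99, p2=0.01) for _ in range(5)])
```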
Overview
- Maximum-Likelihood Estimation
- Models with hidden variables
- The EM algorithm for a simple example (3 coins)
- The general form of the EM algorithm
- Hidden Markov models
Maximum Likelihood Estimation
- We have data points x1, x2, . . . xn drawn from some set X
- We have a parameter vector Θ
- We have a parameter space Ω
- We have a distribution P(x | Θ) for any Θ ∈ Ω, such that
Σx∈X P(x | Θ) = 1 and P(x | Θ) ≥ 0 for all x
- We assume that our data points x1, x2, . . . xn are drawn
at random (independently, identically distributed) from a distribution P(x | Θ∗) for some Θ∗ ∈ Ω
Log-Likelihood
- We have data points x1, x2, . . . xn drawn from some set X
- We have a parameter vector Θ, and a parameter space Ω
- We have a distribution P(x | Θ) for any Θ ∈ Ω
- The likelihood is
Likelihood(Θ) = P(x1, x2, . . . xn | Θ) = ∏i=1…n P(xi | Θ)

L(Θ) = log Likelihood(Θ) = Σi=1…n log P(xi | Θ)
A First Example: Coin Tossing
- X = {H,T}. Our data points x1, x2, . . . xn are a sequence of
heads and tails, e.g. HHTTHHHTHH
- Parameter vector Θ is a single parameter, i.e., the probability of the coin coming up heads
- Parameter space Ω = [0, 1]
- Distribution P(x | Θ) is defined as
P(x | Θ) = Θ       if x = H
P(x | Θ) = 1 − Θ   if x = T
Maximum Likelihood Estimation
- Given a sample x1, x2, . . . xn, choose
ΘML = argmaxΘ∈Ω L(Θ) = argmaxΘ∈Ω Σi log P(xi | Θ)
- For example, take the coin example:
say x1 . . . xn has Count(H) heads, and (n − Count(H)) tails

⇒ L(Θ) = log [Θ^Count(H) × (1 − Θ)^(n−Count(H))]
        = Count(H) log Θ + (n − Count(H)) log(1 − Θ)

ΘML = Count(H) / n
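As a quick check (a sketch added here, not from the original slides), a few lines of Python confirm that Count(H)/n maximizes L(Θ) for the example sequence from the earlier slide, by comparing the closed form against a grid search:

```python
import math

data = "HHTTHHHTHH"          # example sequence from the earlier slide
count_h, n = data.count("H"), len(data)

def log_likelihood(theta):
    return count_h * math.log(theta) + (n - count_h) * math.log(1 - theta)

# Closed-form maximum-likelihood estimate.
theta_ml = count_h / n                                   # 0.7
# A grid search agrees with the closed form (up to grid resolution).
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=log_likelihood)               # 0.7
print(theta_ml, theta_grid)
```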
A Second Example: Probabilistic Context-Free Grammars
- X is the set of all parse trees generated by the underlying
context-free grammar. Our sample is n trees T1 . . . Tn such that each Ti ∈ X.
- R is the set of rules in the context free grammar
N is the set of non-terminals in the grammar
- Θr for r ∈ R is the parameter for rule r
- Let R(α) ⊂ R be the rules of the form α → β for some β
- The parameter space Ω is the set of Θ ∈ [0, 1]^|R| such that for all α ∈ N,

Σr∈R(α) Θr = 1
P(T | Θ) = ∏r∈R Θr^Count(T,r)

where Count(T, r) is the number of times rule r is seen in the tree T

⇒ log P(T | Θ) = Σr∈R Count(T, r) log Θr
Maximum Likelihood Estimation for PCFGs
log P(T | Θ) = Σr∈R Count(T, r) log Θr

where Count(T, r) is the number of times rule r is seen in the tree T

L(Θ) = Σi log P(Ti | Θ) = Σi Σr∈R Count(Ti, r) log Θr

- Solving ΘML = argmaxΘ∈Ω L(Θ) gives

Θr = Σi Count(Ti, r) / Σi Σs∈R(α) Count(Ti, s)

where r is of the form α → β for some β
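In code, these maximum-likelihood estimates are just relative frequencies among rules that share a left-hand side. A minimal sketch (the grammar and the rule counts are invented for illustration):

```python
from collections import defaultdict

# Hypothetical rule counts summed over the training trees, i.e. Σi Count(Ti, r).
rule_counts = {
    ("S", ("NP", "VP")): 10,
    ("NP", ("DT", "NN")): 12,
    ("NP", ("NN",)): 4,
    ("VP", ("VB", "NP")): 10,
}

# Total count for each left-hand-side non-terminal alpha.
lhs_totals = defaultdict(float)
for (lhs, rhs), c in rule_counts.items():
    lhs_totals[lhs] += c

# Theta_r = Σi Count(Ti, r) / Σi Σ_{s in R(alpha)} Count(Ti, s)
theta = {(lhs, rhs): c / lhs_totals[lhs] for (lhs, rhs), c in rule_counts.items()}
print(theta[("NP", ("DT", "NN"))])   # 12 / 16 = 0.75
```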
Multinomial Distributions
- X is a finite set, e.g., X = {dog, cat, the, saw}
- Our sample x1, x2, . . . xn is drawn from X
e.g., x1, x2, x3 = dog, the, saw
- The parameter Θ is a vector in R^m where m = |X|
e.g., Θ1 = P(dog), Θ2 = P(cat), Θ3 = P(the), Θ4 = P(saw)
Ω = {Θ : Σi=1…m Θi = 1 and ∀i, Θi ≥ 0}
- If our sample is x1, x2, x3 = dog, the, saw, then
L(Θ) = log P(x1, x2, x3 = dog, the, saw) = log Θ1+log Θ3+log Θ4
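For concreteness, here is a small sketch of this log-likelihood (the parameter values are invented for illustration), together with the maximum-likelihood estimate, which for a multinomial is again just relative frequency:

```python
import math

sample = ["dog", "the", "saw"]

# An arbitrary Θ in Ω: entries sum to 1 and are non-negative (made-up values).
theta = {"dog": 0.1, "cat": 0.2, "the": 0.5, "saw": 0.2}

# L(Θ) = log Θ_dog + log Θ_the + log Θ_saw
L = sum(math.log(theta[x]) for x in sample)
print(L)

# Maximum-likelihood estimate: relative frequencies in the sample.
theta_ml = {w: sample.count(w) / len(sample) for w in theta}
print(theta_ml)   # dog, the, saw each get 1/3; cat gets 0
```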
Overview
- Maximum-Likelihood Estimation
- Models with hidden variables
- The EM algorithm for a simple example (3 coins)
- The general form of the EM algorithm
- Hidden Markov models
Models with Hidden Variables
- Now say we have two sets X and Y, and a joint distribution
P(x, y | Θ)
- If we had fully observed data, (xi, yi) pairs, then
L(Θ) = Σi log P(xi, yi | Θ)
- If we have partially observed data, xi examples, then
L(Θ) = Σi log P(xi | Θ) = Σi log Σy∈Y P(xi, y | Θ)
- The EM (Expectation Maximization) algorithm is a method
for finding ΘML = argmaxΘ Σi log Σy∈Y P(xi, y | Θ)
Overview
- Maximum-Likelihood Estimation
- Models with hidden variables
- The EM algorithm for a simple example (3 coins)
- The general form of the EM algorithm
- Hidden Markov models
The Three Coins Example
- e.g., in the three coins example:
Y = {H, T}
X = {HHH, TTT, HTT, THH, HHT, TTH, HTH, THT}
Θ = {λ, p1, p2}

P(x, y | Θ) = P(y | Θ) P(x | y, Θ)

where

P(y | Θ) = λ       if y = H
P(y | Θ) = 1 − λ   if y = T

P(x | y, Θ) = p1^h (1 − p1)^t   if y = H
P(x | y, Θ) = p2^h (1 − p2)^t   if y = T

where h = number of heads in x, t = number of tails in x
The Three Coins Example
- Various probabilities can be calculated, for example:

P(x = THT, y = H | Θ) = λ p1 (1 − p1)^2

P(x = THT, y = T | Θ) = (1 − λ) p2 (1 − p2)^2

P(x = THT | Θ) = P(x = THT, y = H | Θ) + P(x = THT, y = T | Θ)
               = λ p1 (1 − p1)^2 + (1 − λ) p2 (1 − p2)^2

P(y = H | x = THT, Θ) = P(x = THT, y = H | Θ) / P(x = THT | Θ)
                      = λ p1 (1 − p1)^2 / [λ p1 (1 − p1)^2 + (1 − λ) p2 (1 − p2)^2]
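These quantities are straightforward to compute directly from the model definition. The following sketch (added here, not from the original slides) reproduces the P(y = H | x = THT, Θ) calculation for arbitrary parameter values:

```python
def joint(x, y, lam, p1, p2):
    """P(x, y | Θ) = P(y | Θ) P(x | y, Θ) for the three-coin model."""
    h, t = x.count("H"), x.count("T")
    if y == "H":
        return lam * (p1 ** h) * ((1 - p1) ** t)
    return (1 - lam) * (p2 ** h) * ((1 - p2) ** t)

def posterior_heads(x, lam, p1, p2):
    """P(y = H | x, Θ) by Bayes' rule: joint over marginal."""
    num = joint(x, "H", lam, p1, p2)
    return num / (num + joint(x, "T", lam, p1, p2))

# The example from above, with illustrative parameter values.
print(posterior_heads("THT", lam=0.3, p1=0.3, p2=0.6))   # ≈ 0.396
```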
The Three Coins Example
- Fully observed data might look like:
(HHH, H), (TTT, T), (HHH, H), (TTT, T), (HHH, H)
- In this case maximum likelihood estimates are:
λ = 3/5     p1 = 9/9     p2 = 0/6

(3 of the 5 trials used Coin 1; Coin 1 produced 9 heads in its 9 tosses; Coin 2 produced 0 heads in its 6 tosses)
The Three Coins Example
- Partially observed data might look like:
HHH, TTT, HHH, TTT, HHH
- How do we find the maximum likelihood parameters?
The Three Coins Example
- Partially observed data might look like:
HHH, TTT, HHH, TTT, HHH
- If current parameters are λ, p1, p2

P(y = H | x = HHH) = P(HHH, H) / [P(HHH, H) + P(HHH, T)]
                   = λ p1^3 / [λ p1^3 + (1 − λ) p2^3]

P(y = H | x = TTT) = P(TTT, H) / [P(TTT, H) + P(TTT, T)]
                   = λ (1 − p1)^3 / [λ (1 − p1)^3 + (1 − λ) (1 − p2)^3]
The Three Coins Example
- If current parameters are λ, p1, p2

P(y = H | x = HHH) = λ p1^3 / [λ p1^3 + (1 − λ) p2^3]

P(y = H | x = TTT) = λ (1 − p1)^3 / [λ (1 − p1)^3 + (1 − λ) (1 − p2)^3]

- If λ = 0.3, p1 = 0.3, p2 = 0.6:

P(y = H | x = HHH) = 0.0508     P(y = H | x = TTT) = 0.6967
The Three Coins Example
- After filling in hidden variables for each example,
partially observed data might look like:

(HHH, H)   P(y = H | HHH) = 0.0508
(HHH, T)   P(y = T | HHH) = 0.9492
(TTT, H)   P(y = H | TTT) = 0.6967
(TTT, T)   P(y = T | TTT) = 0.3033
(HHH, H)   P(y = H | HHH) = 0.0508
(HHH, T)   P(y = T | HHH) = 0.9492
(TTT, H)   P(y = H | TTT) = 0.6967
(TTT, T)   P(y = T | TTT) = 0.3033
(HHH, H)   P(y = H | HHH) = 0.0508
(HHH, T)   P(y = T | HHH) = 0.9492
The Three Coins Example
(HHH, H)   P(y = H | HHH) = 0.0508
(HHH, T)   P(y = T | HHH) = 0.9492
(TTT, H)   P(y = H | TTT) = 0.6967
(TTT, T)   P(y = T | TTT) = 0.3033
. . .

λ = (3 × 0.0508 + 2 × 0.6967) / 5 = 0.3092

p1 = (3 × 3 × 0.0508 + 0 × 2 × 0.6967) / (3 × 3 × 0.0508 + 3 × 2 × 0.6967) = 0.0987

p2 = (3 × 3 × 0.9492 + 0 × 2 × 0.3033) / (3 × 3 × 0.9492 + 3 × 2 × 0.3033) = 0.8244
The Three Coins Example: Summary
- Begin with parameters λ = 0.3, p1 = 0.3, p2 = 0.6
- Fill in hidden variables, using
P(y = H | x = HHH) = 0.0508 P(y = H | x = TTT) = 0.6967
- Re-estimate parameters to be λ = 0.3092, p1 = 0.0987, p2 = 0.8244
Iteration    λ        p1       p2       ˜p1      ˜p2      ˜p3      ˜p4
0            0.3000   0.3000   0.6000   0.0508   0.6967   0.0508   0.6967
1            0.3738   0.0680   0.7578   0.0004   0.9714   0.0004   0.9714
2            0.4859   0.0004   0.9722   0.0000   1.0000   0.0000   1.0000
3            0.5000   0.0000   1.0000   0.0000   1.0000   0.0000   1.0000

The coin example for y = {HHH, TTT, HHH, TTT}. The solution that EM reaches is intuitively correct: the coin-tosser has two coins, one which always shows up heads, the other which always shows tails, and is picking between them with equal probability (λ = 0.5). The posterior probabilities ˜pi show that we are certain that coin 1 (tail-biased) generated y2 and y4, whereas coin 2 generated y1 and y3.
Iteration    λ        p1       p2       ˜p1      ˜p2      ˜p3      ˜p4      ˜p5
0            0.3000   0.3000   0.6000   0.0508   0.6967   0.0508   0.6967   0.0508
1            0.3092   0.0987   0.8244   0.0008   0.9837   0.0008   0.9837   0.0008
2            0.3940   0.0012   0.9893   0.0000   1.0000   0.0000   1.0000   0.0000
3            0.4000   0.0000   1.0000   0.0000   1.0000   0.0000   1.0000   0.0000

The coin example for {HHH, TTT, HHH, TTT, HHH}. λ is now 0.4, indicating that the coin-tosser has probability 0.4 of selecting the tail-biased coin.

Iteration    λ        p1       p2       ˜p1      ˜p2      ˜p3      ˜p4
0            0.3000   0.3000   0.6000   0.1579   0.6967   0.0508   0.6967
1            0.4005   0.0974   0.6300   0.0375   0.9065   0.0025   0.9065
2            0.4632   0.0148   0.7635   0.0014   0.9842   0.0000   0.9842
3            0.4924   0.0005   0.8205   0.0000   0.9941   0.0000   0.9941
4            0.4970   0.0000   0.8284   0.0000   0.9949   0.0000   0.9949

The coin example for y = {HHT, TTT, HHH, TTT}. EM selects a tails-only coin, and a coin which is heavily heads-biased (p2 = 0.8284). It's certain that y1 and y3 were generated by coin 2, as they contain heads. y2 and y4 could have been generated by either coin, but coin 1 is far more likely.

Iteration    λ        p1       p2       ˜p1      ˜p2      ˜p3      ˜p4
0            0.3000   0.7000   0.7000   0.3000   0.3000   0.3000   0.3000
1            0.3000   0.5000   0.5000   0.3000   0.3000   0.3000   0.3000
2            0.3000   0.5000   0.5000   0.3000   0.3000   0.3000   0.3000
3            0.3000   0.5000   0.5000   0.3000   0.3000   0.3000   0.3000
4            0.3000   0.5000   0.5000   0.3000   0.3000   0.3000   0.3000
5            0.3000   0.5000   0.5000   0.3000   0.3000   0.3000   0.3000
6            0.3000   0.5000   0.5000   0.3000   0.3000   0.3000   0.3000

The coin example for y = {HHH, TTT, HHH, TTT}, with p1 and p2 initialised to the same value. EM is stuck at a saddle point.

Iteration    λ        p1       p2       ˜p1      ˜p2      ˜p3      ˜p4
0            0.3000   0.7001   0.7000   0.3001   0.2998   0.3001   0.2998
1            0.2999   0.5003   0.4999   0.3004   0.2995   0.3004   0.2995
2            0.2999   0.5008   0.4997   0.3013   0.2986   0.3013   0.2986
3            0.2999   0.5023   0.4990   0.3040   0.2959   0.3040   0.2959
4            0.3000   0.5068   0.4971   0.3122   0.2879   0.3122   0.2879
5            0.3000   0.5202   0.4913   0.3373   0.2645   0.3373   0.2645
6            0.3009   0.5605   0.4740   0.4157   0.2007   0.4157   0.2007
7            0.3082   0.6744   0.4223   0.6447   0.0739   0.6447   0.0739
8            0.3593   0.8972   0.2773   0.9500   0.0016   0.9500   0.0016
9            0.4758   0.9983   0.0477   0.9999   0.0000   0.9999   0.0000
10           0.4999   1.0000   0.0001   1.0000   0.0000   1.0000   0.0000
11           0.5000   1.0000   0.0000   1.0000   0.0000   1.0000   0.0000

The coin example for y = {HHH, TTT, HHH, TTT}. If we initialise p1 and p2 to be a small amount away from the saddle point p1 = p2, the algorithm diverges from the saddle point and eventually reaches the global maximum.
Iteration    λ        p1       p2       ˜p1      ˜p2      ˜p3      ˜p4
0            0.3000   0.6999   0.7000   0.2999   0.3002   0.2999   0.3002
1            0.3001   0.4998   0.5001   0.2996   0.3005   0.2996   0.3005
2            0.3001   0.4993   0.5003   0.2987   0.3014   0.2987   0.3014
3            0.3001   0.4978   0.5010   0.2960   0.3041   0.2960   0.3041
4            0.3001   0.4933   0.5029   0.2880   0.3123   0.2880   0.3123
5            0.3002   0.4798   0.5087   0.2646   0.3374   0.2646   0.3374
6            0.3010   0.4396   0.5260   0.2008   0.4158   0.2008   0.4158
7            0.3083   0.3257   0.5777   0.0739   0.6448   0.0739   0.6448
8            0.3594   0.1029   0.7228   0.0016   0.9500   0.0016   0.9500
9            0.4758   0.0017   0.9523   0.0000   0.9999   0.0000   0.9999
10           0.4999   0.0000   0.9999   0.0000   1.0000   0.0000   1.0000
11           0.5000   0.0000   1.0000   0.0000   1.0000   0.0000   1.0000

The coin example for y = {HHH, TTT, HHH, TTT}. If we initialise p1 and p2 to be a small amount away from the saddle point p1 = p2, the algorithm diverges from the saddle point and eventually reaches the global maximum.
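The iteration tables above can be reproduced (up to rounding) by a short EM loop for the three-coin model. This is a sketch written for these notes rather than code from the lecture; the call at the bottom runs the {HHH, TTT, HHH, TTT, HHH} example and prints the λ, p1, p2 columns.

```python
def em_three_coins(data, lam, p1, p2, iterations=5):
    """EM for the three-coin model; data is a list of strings such as 'HHH'."""
    for it in range(iterations):
        # E-step: posterior probability that Coin 1 was used for each example.
        post = []
        for x in data:
            h, t = x.count("H"), x.count("T")
            a = lam * (p1 ** h) * ((1 - p1) ** t)        # P(x, y = H | Θ)
            b = (1 - lam) * (p2 ** h) * ((1 - p2) ** t)  # P(x, y = T | Θ)
            post.append(a / (a + b))
        # M-step: re-estimate the parameters from expected counts.
        lam = sum(post) / len(data)
        p1 = (sum(q * x.count("H") for q, x in zip(post, data))
              / sum(q * len(x) for q, x in zip(post, data)))
        p2 = (sum((1 - q) * x.count("H") for q, x in zip(post, data))
              / sum((1 - q) * len(x) for q, x in zip(post, data)))
        print(it + 1, round(lam, 4), round(p1, 4), round(p2, 4))
    return lam, p1, p2

# The {HHH, TTT, HHH, TTT, HHH} example with the starting point used above.
em_three_coins(["HHH", "TTT", "HHH", "TTT", "HHH"], lam=0.3, p1=0.3, p2=0.6)
```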
Overview
- Maximum-Likelihood Estimation
- Models with hidden variables
- The EM algorithm for a simple example (3 coins)
- The general form of the EM algorithm
- Hidden Markov models
The EM Algorithm
- Θt is the parameter vector at the t’th iteration
- Choose Θ0 (at random, or using various heuristics)
- Iterative procedure is defined as
Θt = argmaxΘ Q(Θ, Θt−1)

where

Q(Θ, Θt−1) = Σi Σy∈Y P(y | xi, Θt−1) log P(xi, y | Θ)
The EM Algorithm
- Iterative procedure is defined as Θt = argmaxΘ Q(Θ, Θt−1), where

Q(Θ, Θt−1) = Σi Σy∈Y P(y | xi, Θt−1) log P(xi, y | Θ)

– Intuition: fill in hidden variables y according to P(y | xi, Θ)

– EM is guaranteed to converge to a local maximum, or saddle-point, of the likelihood function

– In general, if argmaxΘ Σi log P(xi, yi | Θ) has a simple (analytic) solution, then argmaxΘ Σi Σy∈Y P(y | xi, Θ) log P(xi, y | Θ) also has a simple (analytic) solution.
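Schematically, each iteration can be organised as below. This is only a skeleton, not code from the lecture; the `posterior` and `argmax_q` functions are hypothetical, model-specific callbacks, and the three-coin loop shown earlier is one concrete instance of the same structure.

```python
def em(data, theta, iterations, posterior, argmax_q):
    """Generic EM skeleton (schematic only).

    posterior(x, theta)      -> dict mapping each hidden value y to P(y | x, theta)
    argmax_q(weighted_pairs) -> the theta maximizing sum of w * log P(x, y | theta)
    Both callbacks are model-specific and assumed to be supplied by the caller.
    """
    for _ in range(iterations):
        # E-step: compute P(y | x_i, theta^{t-1}) for every example and hidden value.
        weighted_pairs = [(x, y, w)
                          for x in data
                          for y, w in posterior(x, theta).items()]
        # M-step: theta^t = argmax_theta Q(theta, theta^{t-1}).
        theta = argmax_q(weighted_pairs)
    return theta
```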
Overview
- Maximum-Likelihood Estimation
- Models with hidden variables
- The EM algorithm for a simple example (3 coins)
- The general form of the EM algorithm
- Hidden Markov models
The Structure of Hidden Markov Models
- Have N states, states 1 . . . N
- Without loss of generality, take N to be the final or stop state
- Have an alphabet K. For example K = {a, b}
- Parameter πi for i = 1 . . . N is probability of starting in state i
- Parameter ai,j for i = 1 . . . (N − 1), and j = 1 . . . N is
probability of state j following state i
- Parameter bi(o) for i = 1 . . . (N −1), and o ∈ K is probability
of state i emitting symbol o
An Example
- Take N = 3 states. States are {1, 2, 3}. Final state is state 3.
- Alphabet K = {the, dog}.
- Distribution over initial state is π1 = 1.0, π2 = 0, π3 = 0.
- Parameters ai,j are
j=1 j=2 j=3
i=1 0.5 0.5
i=2 0.5 0.5
- Parameters bi(o) are
       o=the   o=dog
i=1    0.9     0.1
i=2    0.1     0.9
A Generative Process
- Pick the start state s1 to be state i for i = 1 . . . N with
probability πi.
- Set t = 1
- Repeat while current state st is not the stop state (N):
– Emit a symbol ot ∈ K with probability bst(ot)
– Pick the next state st+1 as state j with probability ast,j
– t = t + 1
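A sketch of this generative process, using the example parameters from the previous slide. One caveat: the transition table's column alignment did not survive transcription, so the sketch assumes a1,2 = a1,3 = 0.5 and a2,1 = a2,3 = 0.5 (an assumption, not something stated in the original).

```python
import random

# Example HMM: states {1, 2, 3}, state 3 is the final/stop state, K = {the, dog}.
N = 3
pi = {1: 1.0, 2: 0.0, 3: 0.0}
# Assumed placement of the 0.5 entries (the original table's columns are ambiguous here).
a = {1: {1: 0.0, 2: 0.5, 3: 0.5},
     2: {1: 0.5, 2: 0.0, 3: 0.5}}
b = {1: {"the": 0.9, "dog": 0.1},
     2: {"the": 0.1, "dog": 0.9}}

def sample(dist):
    """Draw a key from a {value: probability} dictionary."""
    r, total = random.random(), 0.0
    for k, p in dist.items():
        total += p
        if r < total:
            return k
    return k   # guard against floating-point rounding

def generate():
    """Run the generative process once; return the state and output sequences."""
    states, outputs = [], []
    s = sample(pi)
    while s != N:                      # stop when we reach the final state
        states.append(s)
        outputs.append(sample(b[s]))   # emit a symbol from state s
        s = sample(a[s])               # pick the next state
    return states, outputs

random.seed(1)
print(generate())
```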
Probabilities Over Sequences
- An output sequence is a sequence of observations o1 . . . oT
where each oi ∈ K, e.g. the dog the dog dog the
- A state sequence is a sequence of states s1 . . . sT where each
si ∈ {1 . . . N}, e.g. 1 2 1 2 2 1
- HMM defines a probability for each state/output sequence pair

e.g. the/1 dog/2 the/1 dog/2 the/2 dog/1 has probability

π1 b1(the) a1,2 b2(dog) a2,1 b1(the) a1,2 b2(dog) a2,2 b2(the) a2,1 b1(dog) a1,3

(the final factor a1,3 is the transition from the last state into the stop state 3)

Formally:

P(s1 . . . sT , o1 . . . oT ) = πs1 × ∏i=2…T P(si | si−1) × ∏i=1…T P(oi | si)
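The probability of a pair is then a single product over the start probability, the transitions (including, as in the worked example above, the final transition into the stop state) and the emissions. A sketch, reusing the same assumed transition entries as in the earlier HMM sketch:

```python
def pair_probability(states, outputs, pi, a, b, stop=3):
    """P(s1 . . . sT, o1 . . . oT): start probability, transitions, emissions,
    plus the final transition into the stop state (the a1,3 factor in the
    worked example above)."""
    p = pi[states[0]]
    for prev, nxt in zip(states, states[1:]):
        p *= a[prev][nxt]
    for s, o in zip(states, outputs):
        p *= b[s][o]
    return p * a[states[-1]][stop]

# Same example HMM as before (transition entries are assumed, see the earlier caveat).
pi = {1: 1.0, 2: 0.0, 3: 0.0}
a = {1: {1: 0.0, 2: 0.5, 3: 0.5}, 2: {1: 0.5, 2: 0.0, 3: 0.5}}
b = {1: {"the": 0.9, "dog": 0.1}, 2: {"the": 0.1, "dog": 0.9}}

print(pair_probability([1, 2], ["the", "dog"], pi, a, b))
# = π1 · b1(the) · a1,2 · b2(dog) · a2,3 = 1.0 × 0.9 × 0.5 × 0.9 × 0.5 = 0.2025
```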
A Hidden Variable Problem
- We have an HMM with N = 3, K = {e, f, g, h}
- We see the following output sequences in training data
e g e h f h f g
- How would you choose the parameter values for πi, ai,j, and
bi(o)?
Another Hidden Variable Problem
- We have an HMM with N = 3, K = {e, f, g, h}
- We see the following output sequences in training data
e g h e h f h g f g g e h
- How would you choose the parameter values for πi, ai,j, and
bi(o)?