Hidden Markov Models II Machine Learning 10-601B Seyoung - PowerPoint PPT Presentation

Hidden Markov Models II
Machine Learning 10-601B
Seyoung Kim
Many of these slides are derived from Tom Mitchell, Ziv Bar-Joseph. Thanks!

A Hidden Markov Model
• The joint probability of (Q, O) is defined as
$$P(Q, O) = p(q_1)\, p(o_1 \mid q_1) \prod_{t=2}^{T} p(q_t \mid q_{t-1})\, p(o_t \mid q_t)$$
  where $p(q_1)$ is the initial probability, $p(q_t \mid q_{t-1})$ is the transition probability, and $p(o_t \mid q_t)$ is the emission probability.
[Figure: chain of hidden states q_1, q_2, q_3, …, q_T, each emitting the corresponding observation o_1, o_2, o_3, …, o_T]
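The factorization above can be evaluated directly for a given state path and observation sequence. The following is a minimal sketch, assuming a two-state model with dictionary-based parameters; the names `pi`, `A`, `B`, `joint_log_prob` and all numeric values are hypothetical and not taken from the slides.

```python
import math

def joint_log_prob(states, obs, pi, A, B):
    """log P(Q, O) = log p(q_1) + log p(o_1|q_1)
                     + sum_{t>=2} [log p(q_t|q_{t-1}) + log p(o_t|q_t)]."""
    logp = math.log(pi[states[0]]) + math.log(B[states[0]][obs[0]])
    for t in range(1, len(states)):
        logp += math.log(A[states[t - 1]][states[t]])  # transition term
        logp += math.log(B[states[t]][obs[t]])         # emission term
    return logp

# Hypothetical parameters, chosen only for illustration.
pi = {"A": 0.5, "B": 0.5}
A = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
B = {"A": {1: 0.5, 2: 0.5}, "B": {1: 0.25, 2: 0.75}}

lp = joint_log_prob("AAB", [1, 2, 2], pi, A, B)
print(math.exp(lp))  # 0.5 * 0.5 * (0.9 * 0.5) * (0.1 * 0.75)
```

Working in log space avoids numerical underflow when T is large.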

Learning HMMs
• Until now we assumed that the emission and transition probabilities are known
• This is usually not the case
  - How is "AI" pronounced by different individuals?
  - What is the probability of hearing "class" after "AI"?

Learning HMM When Hidden States are Observed
• Assume both hidden and observed states are observed
  – Data: $((O^1, Q^1), \ldots, (O^K, Q^K))$ for K sequences, where $O^k = (o_1^k, \ldots, o_T^k)$ and $Q^k = (q_1^k, \ldots, q_T^k)$
• MLE for learning!
$$\arg\max \log p\big((O^1, Q^1), \ldots, (O^K, Q^K)\big)$$
$$= \arg\max \log \prod_{k} p(q_1^k)\, p(o_1^k \mid q_1^k) \prod_{t=2}^{T} p(q_t^k \mid q_{t-1}^k)\, p(o_t^k \mid q_t^k)$$

Learning HMM When Hidden States are Observed
• MLE for HMM
$$\log p\big((O^1, Q^1), \ldots, (O^K, Q^K)\big) = \log \prod_{k} p(q_1^k)\, p(o_1^k \mid q_1^k) \prod_{t=2}^{T} p(q_t^k \mid q_{t-1}^k)\, p(o_t^k \mid q_t^k)$$
$$= \sum_{k} \log p(q_1^k) + \sum_{k} \log p(o_1^k \mid q_1^k) + \sum_{k} \sum_{t=2}^{T} \log p(o_t^k \mid q_t^k) + \sum_{k} \sum_{t=2}^{T} \log p(q_t^k \mid q_{t-1}^k)$$
  The first sum involves only initial probabilities, the second and third sums involve only emission probabilities, and the fourth sum involves only transition probabilities.
• Differentiate w.r.t. each parameter, set the derivative to 0, and solve! Closed-form solution
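As a worked instance of "differentiate and solve", the initial-probability term can be maximized on its own. The sketch below assumes the notation $\pi_s = p(q_1 = s)$ and $N_s$ for the number of training sequences whose first state is $s$; these symbols and the Lagrange multiplier $\lambda$ (needed because the probabilities must sum to 1) are introduced here for illustration and do not appear on the slides.

```latex
\begin{align*}
\mathcal{L}(\pi, \lambda) &= \sum_{s} N_s \log \pi_s + \lambda \Big(1 - \sum_{s} \pi_s\Big) \\
\frac{\partial \mathcal{L}}{\partial \pi_s} &= \frac{N_s}{\pi_s} - \lambda = 0
  \quad\Longrightarrow\quad \pi_s = \frac{N_s}{\lambda} \\
\sum_{s} \pi_s = 1 &\quad\Longrightarrow\quad \lambda = \sum_{s} N_s = K
  \quad\Longrightarrow\quad \pi_s = \frac{N_s}{K}
\end{align*}
```

The same argument, applied row by row, yields normalized counts for the transition and emission probabilities.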

Example
• Assume the model below
• We also observe the following sequences:
    1,2,2,5,6,5,1
    1,3,2,5,6,5,2
    3,2,1,3,6,5,4
• How can we determine the initial, transition and emission probabilities?
[Figure: two-state model with states A and B]

Initial probabilities
Q: assume we can observe the following sets of states:
    AAABBAA   1,2,2,5,6,5,1
    AABBBBB   1,3,2,5,6,5,2
    BAABBAB   3,2,1,3,6,5,4
   how can we learn the initial probabilities?
A: Maximum likelihood estimation
   Find the initial probabilities π such that
$$\pi^* = \arg\max_{\pi} \log \prod_{k} p(q_1^k)\, p(o_1^k \mid q_1^k) \prod_{t=2}^{T} p(q_t^k \mid q_{t-1}^k)\, p(o_t^k \mid q_t^k)$$
$$\pi^* = \arg\max_{\pi} \log \prod_{k} p(q_1^k)$$
   Here k ranges over the sequences available for training.
$$\pi_A = \frac{\#A}{\#A + \#B}$$
   where #A and #B count the training sequences whose first state is A or B, respectively.
[Figure: two-state model with states A and B]
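The count-based estimate can be checked on the three labeled sequences from the slide. This is a minimal sketch; the function name `estimate_initial` is hypothetical.

```python
from collections import Counter

# State sequences from the slide.
state_seqs = ["AAABBAA", "AABBBBB", "BAABBAB"]

def estimate_initial(state_seqs):
    """MLE of initial probabilities: the fraction of sequences starting in each state."""
    first = Counter(seq[0] for seq in state_seqs)
    total = sum(first.values())
    return {s: c / total for s, c in first.items()}

pi_hat = estimate_initial(state_seqs)
print(pi_hat)  # two of the three sequences start in A
```

With only three sequences the estimate is coarse; a state that never appears first would get probability 0.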

Transition probabilities
Q: assume we can observe the set of states:
    AAABBAA   1,2,2,5,6,5,1
    AABBBBB   1,3,2,5,6,5,2
    BAABBAB   3,2,1,3,6,5,4
   how can we learn the transition probabilities?
   Remember that we defined $a_{i,j} = p(q_t = s_j \mid q_{t-1} = s_i)$
A: Maximum likelihood estimation
   Find a transition matrix a such that
$$a^* = \arg\max_{a} \log \prod_{k} p(q_1^k)\, p(o_1^k \mid q_1^k) \prod_{t=2}^{T} p(q_t^k \mid q_{t-1}^k)\, p(o_t^k \mid q_t^k)$$
$$a^* = \arg\max_{a} \log \prod_{k} \prod_{t=2}^{T} p(q_t^k \mid q_{t-1}^k)$$
$$a_{A,B} = \frac{\#AB}{\#AB + \#AA}$$
   Moving window of size 2 over each state sequence -> #AA, #AB, #BA, #BB
[Figure: two-state model with states A and B]
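The size-2 moving window can be sketched as follows, using the three state sequences from the slide. The function name `estimate_transitions` is hypothetical.

```python
from collections import Counter

# State sequences from the slide.
state_seqs = ["AAABBAA", "AABBBBB", "BAABBAB"]

def estimate_transitions(state_seqs):
    """Count adjacent state pairs, then normalize by the count of the source state."""
    pair_counts = Counter()
    for seq in state_seqs:
        # zip(seq, seq[1:]) slides a window of size 2 over the sequence.
        pair_counts.update(zip(seq, seq[1:]))
    from_counts = Counter()
    for (prev, _), c in pair_counts.items():
        from_counts[prev] += c
    a_hat = {(i, j): c / from_counts[i] for (i, j), c in pair_counts.items()}
    return a_hat, pair_counts

a_hat, pair_counts = estimate_transitions(state_seqs)
print(pair_counts)          # #AA = 5, #AB = 4, #BA = 3, #BB = 6
print(a_hat[("A", "B")])    # 4 / (4 + 5)
```

Pairs are counted within each sequence only, so no transition is counted across the boundary between two training sequences.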

Emission probabilities
Q: assume we can observe the set of states:
    AAABBAA   1,2,2,5,6,5,1
    AABBBBB   1,3,2,5,6,5,2
    BAABBAB   3,2,1,3,6,5,4
   how can we learn the emission probabilities?
   Remember that we defined $b_i(o_t) = P(o_t \mid s_i)$
A: Maximum likelihood estimation
   Find an emission matrix b such that
$$b^* = \arg\max_{b} \log \prod_{k} p(q_1^k)\, p(o_1^k \mid q_1^k) \prod_{t=2}^{T} p(q_t^k \mid q_{t-1}^k)\, p(o_t^k \mid q_t^k)$$
$$b^* = \arg\max_{b} \log \prod_{k} p(o_1^k \mid q_1^k) \prod_{t=2}^{T} p(o_t^k \mid q_t^k)$$
$$b_A(5) = \frac{\#A5}{\#A1 + \#A2 + \cdots + \#A6} = \frac{\#A5}{\#A}$$
   where #A5 counts the positions at which state A emits the symbol 5.
[Figure: two-state model with states A and B]
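The emission estimate can be computed by pairing each state with the symbol observed at the same position. This is a minimal sketch on the slide's data; the function name `estimate_emissions` is hypothetical.

```python
from collections import Counter

# Labeled training data from the slide.
state_seqs = ["AAABBAA", "AABBBBB", "BAABBAB"]
obs_seqs = [
    [1, 2, 2, 5, 6, 5, 1],
    [1, 3, 2, 5, 6, 5, 2],
    [3, 2, 1, 3, 6, 5, 4],
]

def estimate_emissions(state_seqs, obs_seqs):
    """b_i(x) = (# times state i emits x) / (# times state i occurs)."""
    pair_counts = Counter()
    state_counts = Counter()
    for states, obs in zip(state_seqs, obs_seqs):
        for s, x in zip(states, obs):
            pair_counts[(s, x)] += 1
            state_counts[s] += 1
    b_hat = {(s, x): c / state_counts[s] for (s, x), c in pair_counts.items()}
    return b_hat, state_counts

b_hat, state_counts = estimate_emissions(state_seqs, obs_seqs)
print(state_counts)        # #A = 10, #B = 11
print(b_hat[("A", 5)])     # #A5 / #A = 2 / 10
```

Symbols a state never emits in the training data are simply absent from `b_hat`, which corresponds to an estimated probability of 0.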

Learning HMMs
• In most cases we do not know what states generated each of the outputs (hidden states are unobserved)
  – … but had we known, it would be very easy to determine an emission and transition model!
  – On the other hand, if we had such a model we could determine the set of states using the inference methods we discussed

Expectation Maximization (EM)
• Appropriate for problems with 'missing values' for the variables.
• For example, in HMMs we usually do not observe the states
• Write down the complete-data log likelihood and maximize its expectation
$$\arg\max E\big[\log p\big((O^1, Q^1), \ldots, (O^K, Q^K)\big)\big]$$
$$= \arg\max E\Big[\log \prod_{k} p(q_1^k)\, p(o_1^k \mid q_1^k) \prod_{t=2}^{T} p(q_t^k \mid q_{t-1}^k)\, p(o_t^k \mid q_t^k)\Big]$$
  where the expectation is taken with respect to p(Q | O, parameters), evaluated at the current parameter estimates

Expectation Maximization (EM): Quick reminder
• Two steps
  – E step: Fill in the missing variables with the expected values
  – M step: Regular maximum likelihood estimation (MLE) using the values computed in the E step and the values of the other variables
• Guaranteed to converge (though only to a local maximum of the likelihood).
[Figure: cycle in which the E step maps parameters to expected values for (missing) variables, and the M step maps those expected values back to parameters]

E Step
• In our example, with complete data, we needed
  – #A, #B to estimate initial probabilities and emission probabilities
  – #AA, #AB, #BA, #BB to estimate transition probabilities
• When hidden states are not observed, we need "expected counts" in E step
[Figure: two-state model with states A and B]
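One common way to obtain these expected counts is the forward-backward recursion, which is not shown in this section. The sketch below is therefore an assumption about how the E step could be carried out, not the slides' own derivation. All parameter values and the names `forward_backward` and `expected_counts` are hypothetical. No rescaling is applied, so it suits short sequences only.

```python
def forward_backward(obs, pi, A, B):
    """Return gamma[t][i] = p(q_t=i | O) and xi[t][i][j] = p(q_t=i, q_{t+1}=j | O)."""
    states = list(pi)
    T = len(obs)
    # Forward pass: alpha[t][i] = p(o_1..o_t, q_t = i)
    alpha = [{i: pi[i] * B[i][obs[0]] for i in states}]
    for t in range(1, T):
        alpha.append({j: B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in states)
                      for j in states})
    # Backward pass: beta[t][i] = p(o_{t+1}..o_T | q_t = i)
    beta = [None] * T
    beta[T - 1] = {i: 1.0 for i in states}
    for t in range(T - 2, -1, -1):
        beta[t] = {i: sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in states)
                   for i in states}
    likelihood = sum(alpha[T - 1][i] for i in states)
    gamma = [{i: alpha[t][i] * beta[t][i] / likelihood for i in states}
             for t in range(T)]
    xi = [{i: {j: alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
               for j in states} for i in states}
          for t in range(T - 1)]
    return gamma, xi

def expected_counts(obs, pi, A, B):
    """Expected versions of the first-state counts, #A/#B, and #AA, #AB, #BA, #BB."""
    gamma, xi = forward_backward(obs, pi, A, B)
    states = list(pi)
    exp_first = dict(gamma[0])
    exp_state = {i: sum(g[i] for g in gamma) for i in states}
    exp_trans = {(i, j): sum(x[i][j] for x in xi) for i in states for j in states}
    return exp_first, exp_state, exp_trans

# Hypothetical current parameter estimates for a two-state model over symbols 1..6.
pi = {"A": 0.5, "B": 0.5}
A = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
B = {"A": {k: 1 / 6 for k in range(1, 7)},
     "B": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.3, 6: 0.3}}
obs = [1, 2, 2, 5, 6, 5, 1]

exp_first, exp_state, exp_trans = expected_counts(obs, pi, A, B)
print(exp_state)  # fractional counts that sum to len(obs)
print(exp_trans)  # fractional counts that sum to len(obs) - 1
```

In the M step these fractional counts would take the place of the integer counts in the closed-form estimates, for example dividing the expected #AB by the sum of the expected #AA and #AB.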
