School of Computer Science

Learning Partially Observed GM: the Expectation-Maximization algorithm

Probabilistic Graphical Models (10-708)

Lecture 11, Oct 22, 2007

Eric Xing

[Figure: a signaling/gene-regulation network with nodes Receptor A, Receptor B, Kinase C, Kinase D, Kinase E, TF F, Gene G, Gene H, labeled X1 through X8]

Reading: J-Chap. 10,11; KF-Chap. 17


Partially observed GMs

Speech recognition

[Figure: hidden Markov model for speech recognition, with hidden states Y1, Y2, Y3, …, YT emitting observations X1, X2, X3, …, XT]


Partially observed GM

Biological Evolution

[Figure: evolutionary tree with an unobserved ancestor ("?"), substitution processes Qh and Qm acting over T years, and observed descendant nucleotides A, G / A, C]


Mixture models

A density model p(x) may be multi-modal. We may be able to model it as a mixture of uni-modal distributions (e.g., Gaussians).

Each mode may correspond to a different sub-population (e.g., male and female).
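As a quick illustration (not from the slides), here is a minimal numpy sketch that draws samples from a two-component 1-D Gaussian mixture; all parameter values are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative two-component 1-D Gaussian mixture (e.g., heights of two sub-populations).
    # All parameter values below are hypothetical.
    pi = np.array([0.5, 0.5])        # mixture proportions
    mu = np.array([178.0, 165.0])    # component means
    sigma = np.array([7.0, 6.0])     # component standard deviations

    # Ancestral sampling: first draw the latent component z, then x given z.
    z = rng.choice(2, size=1000, p=pi)
    x = rng.normal(mu[z], sigma[z])  # samples from the multi-modal marginal p(x)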


Unobserved Variables

A variable can be unobserved (latent) because:

  • it is an imaginary quantity meant to provide some simplified and abstractive view of the data generation process
    • e.g., speech recognition models, mixture models …
  • it is a real-world object and/or phenomenon, but difficult or impossible to measure
    • e.g., the temperature of a star, causes of a disease, evolutionary ancestors …
  • it is a real-world object and/or phenomenon, but sometimes wasn't measured, because of faulty sensors, etc.

Discrete latent variables can be used to partition/cluster data into sub-groups.

Continuous latent variables (factors) can be used for dimensionality reduction (factor analysis, etc.).


Gaussian Mixture Models (GMMs)

Consider a mixture of K Gaussian components:

p(x_n \mid \mu, \Sigma) = \sum_k \pi_k \, N(x_n \mid \mu_k, \Sigma_k)

where \pi_k is the mixture proportion and N(x_n \mid \mu_k, \Sigma_k) is the mixture component.

This model can be used for unsupervised clustering.

  • This model (fit by AutoClass) has been used to discover new kinds of stars in astronomical data, etc.


Gaussian Mixture Models (GMMs)

Consider a mixture of K Gaussian components:

  • Z is a latent class indicator vector:
  • X is a conditional Gaussian variable with a class-specific mean/covariance
  • The likelihood of a sample:

p(z_n) = \mathrm{multi}(z_n : \pi) = \prod_k (\pi_k)^{z_n^k}

p(x_n \mid z_n^k = 1, \mu, \Sigma) = \frac{1}{(2\pi)^{m/2} |\Sigma_k|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) \right\}

p(x_n \mid \mu, \Sigma) = \sum_{z_n} p(z_n \mid \pi) \, p(x_n \mid z_n, \mu, \Sigma) = \sum_k \pi_k \, N(x_n \mid \mu_k, \Sigma_k)

where \pi_k is the mixture proportion and N(x_n \mid \mu_k, \Sigma_k) is the mixture component.

[Graphical model: Z → X]
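To make the likelihood above concrete, here is a minimal numpy/scipy sketch (the function name is illustrative, not from the slides) that evaluates the per-sample log-likelihood log p(x_n | µ, Σ) = log Σ_k π_k N(x_n | µ_k, Σ_k).

    import numpy as np
    from scipy.stats import multivariate_normal

    def gmm_loglik(X, pi, mus, Sigmas):
        # Per-sample log-likelihood: log p(x_n) = log sum_k pi_k N(x_n | mu_k, Sigma_k).
        # X: (N, d); pi: (K,); mus: K mean vectors; Sigmas: K covariance matrices.
        dens = np.stack([pi_k * multivariate_normal(mu_k, Sigma_k).pdf(X)
                         for pi_k, mu_k, Sigma_k in zip(pi, mus, Sigmas)], axis=1)  # (N, K)
        return np.log(dens.sum(axis=1))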


Why is Learning Harder?

In fully observed iid settings, the log likelihood decomposes into a sum of local terms (at least for directed models):

\ell_c(\theta; D) = \log p(x, z \mid \theta) = \log p(z \mid \theta_z) + \log p(x \mid z, \theta_x)

With latent variables, all the parameters become coupled together via marginalization:

\ell(\theta; D) = \log \sum_z p(x, z \mid \theta) = \log \sum_z p(z \mid \theta_z) \, p(x \mid z, \theta_x)


Toward the EM algorithm

[Graphical model: plate over n = 1..N with latent z_n pointing to observed x_n]

Recall MLE for completely observed data.

Data log-likelihood:

\ell(\theta; D) = \log \prod_n p(z_n, x_n) = \log \prod_n p(z_n \mid \pi) \, p(x_n \mid z_n, \mu, \sigma)
               = \sum_n \log \prod_k \pi_k^{z_n^k} + \sum_n \log \prod_k N(x_n ; \mu_k, \sigma)^{z_n^k}
               = \sum_n \sum_k z_n^k \log \pi_k - \sum_n \sum_k z_n^k \frac{(x_n - \mu_k)^2}{2\sigma^2} + C

MLE:

\hat{\pi}_{k,\mathrm{MLE}} = \arg\max_\pi \ell(\theta; D)
\hat{\mu}_{k,\mathrm{MLE}} = \arg\max_\mu \ell(\theta; D) \;\Rightarrow\; \hat{\mu}_{k,\mathrm{MLE}} = \frac{\sum_n z_n^k x_n}{\sum_n z_n^k}
\hat{\sigma}_{k,\mathrm{MLE}} = \arg\max_\sigma \ell(\theta; D)

What if we do not know z_n?
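When z_n is observed, the MLE above reduces to per-component counts and averages. A minimal numpy sketch of that fully observed case (function and variable names are illustrative, not from the slides):

    import numpy as np

    def gmm_mle_complete(X, Z):
        # Fully observed case: Z is an (N, K) one-hot matrix of the class indicators z_n^k.
        Nk = Z.sum(axis=0)                # per-component counts: sum_n z_n^k
        pi_hat = Nk / Z.shape[0]          # \hat{pi}_k = N_k / N
        mu_hat = (Z.T @ X) / Nk[:, None]  # \hat{mu}_k = sum_n z_n^k x_n / sum_n z_n^k
        return pi_hat, mu_hat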


Recall: K-means

In each iteration, hard-assign every point to the closest centroid, then recompute each centroid as the mean of its assigned points:

z_n^{(t)} = \arg\min_k \, (x_n - \mu_k^{(t)})^T \Sigma_k^{-1} (x_n - \mu_k^{(t)})

\mu_k^{(t+1)} = \frac{\sum_n \delta(z_n^{(t)}, k) \, x_n}{\sum_n \delta(z_n^{(t)}, k)}
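A minimal numpy sketch of one such iteration, assuming Σ_k = I so the criterion reduces to squared Euclidean distance (function name illustrative):

    import numpy as np

    def kmeans_step(X, mu):
        # Hard assignment: index of the closest centroid for each point (Sigma_k = I assumed).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # (N, K) squared distances
        z = d2.argmin(axis=1)
        # Update: each centroid becomes the mean of the points assigned to it.
        mu_new = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(mu.shape[0])])
        return z, mu_new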


Expectation-Maximization

Start:

  • "Guess" the centroid µk and coveriance Σk of each of the K clusters

Loop: alternate the E-step and M-step described on the following slides.


Example: Gaussian mixture model

  • A mixture of K Gaussians:
  • Z is a latent class indicator vector
  • X is a conditional Gaussian variable with class-specific mean/covariance
  • The likelihood of a sample:
  • The expected complete log likelihood

[Graphical model: plate over n = 1..N with latent Z_n pointing to observed X_n]

p(z_n) = \mathrm{multi}(z_n : \pi) = \prod_k (\pi_k)^{z_n^k}

p(x_n \mid z_n^k = 1, \mu, \Sigma) = \frac{1}{(2\pi)^{m/2} |\Sigma_k|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) \right\}

p(x_n \mid \mu, \Sigma) = \sum_{z_n} p(z_n \mid \pi) \, p(x_n \mid z_n, \mu, \Sigma) = \sum_k \pi_k \, N(x_n \mid \mu_k, \Sigma_k)

\langle \ell_c(\theta; x, z) \rangle = \sum_n \langle \log p(z_n \mid \pi) \rangle_{p(z \mid x)} + \sum_n \langle \log p(x_n \mid z_n, \mu, \Sigma) \rangle_{p(z \mid x)}
  = \sum_n \sum_k \langle z_n^k \rangle \log \pi_k - \tfrac{1}{2} \sum_n \sum_k \langle z_n^k \rangle \left( (x_n - \mu_k)^T \Sigma_k^{-1} (x_n - \mu_k) + \log |\Sigma_k| + C \right)


E-step

We maximize \langle \ell_c(\theta) \rangle iteratively using the following procedure:

─ Expectation step: compute the expected value of the sufficient statistics of the hidden variables (i.e., z) given the current estimate of the parameters (i.e., π and µ):

\tau_n^{k(t)} = \langle z_n^k \rangle_{q^{(t)}} = p(z_n^k = 1 \mid x_n, \mu^{(t)}, \Sigma^{(t)}) = \frac{\pi_k^{(t)} \, N(x_n \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_i \pi_i^{(t)} \, N(x_n \mid \mu_i^{(t)}, \Sigma_i^{(t)})}

Here we are essentially doing inference.


M-step

We maximize \langle \ell_c(\theta) \rangle iteratively using the following procedure:

─ Maximization step: compute the parameters under the current results of the expected values of the hidden variables.

  • This is isomorphic to MLE except that the variables that are hidden are replaced by their expectations (in general they will be replaced by their corresponding "sufficient statistics").

\pi_k^* = \arg\max \langle \ell_c(\theta) \rangle, \quad \frac{\partial}{\partial \pi_k} \langle \ell_c(\theta) \rangle = 0 \;\; \forall k, \;\; \text{s.t.} \;\; \sum_k \pi_k = 1
  \;\Rightarrow\; \pi_k^{(t+1)} = \frac{\sum_n \langle z_n^k \rangle_{q^{(t)}}}{N} = \frac{\sum_n \tau_n^{k(t)}}{N} = \frac{\langle n_k \rangle}{N}

\mu_k^* = \arg\max \langle \ell_c(\theta) \rangle \;\Rightarrow\; \mu_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} x_n}{\sum_n \tau_n^{k(t)}}

\Sigma_k^* = \arg\max \langle \ell_c(\theta) \rangle \;\Rightarrow\; \Sigma_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} (x_n - \mu_k^{(t+1)})(x_n - \mu_k^{(t+1)})^T}{\sum_n \tau_n^{k(t)}}

Fact: \frac{\partial \log |A|}{\partial A} = A^{-T} \quad \text{and} \quad \frac{\partial \, x^T A x}{\partial A} = x x^T
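Putting the E-step and M-step together, here is a minimal numpy/scipy sketch of one EM iteration for a Gaussian mixture (function and variable names are illustrative, not from the slides):

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step_gmm(X, pi, mus, Sigmas):
        N, K = X.shape[0], len(pi)
        # E-step: responsibilities tau[n, k] = p(z_n^k = 1 | x_n, theta^(t)).
        tau = np.stack([pi[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
                        for k in range(K)], axis=1)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: MLE with each hidden z_n^k replaced by its expectation tau[n, k].
        Nk = tau.sum(axis=0)                      # expected per-component counts <n_k>
        pi_new = Nk / N
        mus_new = (tau.T @ X) / Nk[:, None]
        Sigmas_new = []
        for k in range(K):
            Xc = X - mus_new[k]
            Sigmas_new.append((tau[:, k, None] * Xc).T @ Xc / Nk[k])
        return pi_new, mus_new, Sigmas_new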


Compare: K-means and EM

  • K-means
    • In the K-means "E-step" we do hard assignment:

      z_n^{(t)} = \arg\min_k \, (x_n - \mu_k^{(t)})^T \Sigma_k^{-1} (x_n - \mu_k^{(t)})

    • In the K-means "M-step" we update the means as the weighted sum of the data, but now the weights are 0 or 1:

      \mu_k^{(t+1)} = \frac{\sum_n \delta(z_n^{(t)}, k) \, x_n}{\sum_n \delta(z_n^{(t)}, k)}

  • EM
    • E-step:

      \tau_n^{k(t)} = \langle z_n^k \rangle_{q^{(t)}} = p(z_n^k = 1 \mid x_n, \mu^{(t)}, \Sigma^{(t)}) = \frac{\pi_k^{(t)} \, N(x_n \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_i \pi_i^{(t)} \, N(x_n \mid \mu_i^{(t)}, \Sigma_i^{(t)})}

    • M-step:

      \mu_k^{(t+1)} = \frac{\sum_n \tau_n^{k(t)} x_n}{\sum_n \tau_n^{k(t)}}

The EM algorithm for mixtures of Gaussians is like a "soft version" of the K-means algorithm.

Theory underlying EM

What are we doing? Recall that according to MLE, we intend to learn the model parameters that maximize the likelihood of the data.

But we do not observe z, so computing

\ell(\theta; D) = \log \sum_z p(x, z \mid \theta) = \log \sum_z p(z \mid \theta_z) \, p(x \mid z, \theta_x)

is difficult!

What shall we do?


Complete & Incomplete Log Likelihoods

  • Complete log likelihood

Let X denote the observable variable(s), and Z denote the latent variable(s). If Z could be observed, then

\ell_c(\theta; x, z) \;\overset{\text{def}}{=}\; \log p(x, z \mid \theta)

  • Usually, optimizing ℓ_c() given both z and x is straightforward (c.f. MLE for fully observed models).
  • Recall that in this case the objective, e.g., for MLE, decomposes into a sum of factors, and the parameter for each factor can be estimated separately.
  • But given that Z is not observed, ℓ_c() is a random quantity and cannot be maximized directly.

Incomplete log likelihood

With z unobserved, our objective becomes the log of a marginal probability:

\ell(\theta; x) \;\overset{\text{def}}{=}\; \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta)

  • This objective won't decouple.


Expected Complete Log Likelihood

For any distribution q(z), define the expected complete log likelihood:

\langle \ell_c(\theta; x, z) \rangle_q \;\overset{\text{def}}{=}\; \sum_z q(z \mid x, \theta) \log p(x, z \mid \theta)

  • A deterministic function of θ
  • Linear in ℓ_c(), so it inherits its factorizability

Does maximizing this surrogate yield a maximizer of the likelihood? By Jensen's inequality:

\ell(\theta; x) = \log p(x \mid \theta) = \log \sum_z p(x, z \mid \theta) = \log \sum_z q(z \mid x) \frac{p(x, z \mid \theta)}{q(z \mid x)} \;\ge\; \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)}

\Rightarrow\; \ell(\theta; x) \;\ge\; \langle \ell_c(\theta; x, z) \rangle_q + H_q


Lower Bounds and Free Energy

For fixed data x, define a functional called the free energy:

F(q, \theta) \;\overset{\text{def}}{=}\; \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} \;\le\; \ell(\theta; x)

The EM algorithm is coordinate-ascent on F:

  • E-step:  q^{t+1} = \arg\max_q F(q, \theta^t)
  • M-step:  \theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta^t)


E-step: maximization of expected ℓ_c w.r.t. q

Claim:

q^{t+1} = \arg\max_q F(q, \theta^t) = p(z \mid x, \theta^t)

  • This is the posterior distribution over the latent variables given the data and the parameters. Often we need this at test time anyway (e.g., to perform classification).

Proof (easy): this setting attains the bound \ell(\theta; x) \ge F(q, \theta):

F(p(z \mid x, \theta^t), \theta^t) = \sum_z p(z \mid x, \theta^t) \log \frac{p(x, z \mid \theta^t)}{p(z \mid x, \theta^t)} = \sum_z p(z \mid x, \theta^t) \log p(x \mid \theta^t) = \log p(x \mid \theta^t) = \ell(\theta^t; x)

We can also show this result using variational calculus, or from the fact that

\ell(\theta; x) - F(q, \theta) = \mathrm{KL}\!\left( q \,\|\, p(z \mid x, \theta) \right)
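A small numerical check of these claims (a toy, made-up joint over a discrete latent z, not from the slides): the free energy never exceeds the log-likelihood, and equals it exactly when q is the posterior.

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up joint p(x, z | theta) for one fixed x and a discrete latent z with 4 values.
    p_xz = rng.uniform(0.01, 0.2, size=4)
    loglik = np.log(p_xz.sum())                       # l(theta; x) = log sum_z p(x, z | theta)

    def free_energy(q):
        # F(q, theta) = sum_z q(z) log [ p(x, z | theta) / q(z) ]
        return np.sum(q * (np.log(p_xz) - np.log(q)))

    q_rand = rng.uniform(0.1, 1.0, size=4); q_rand /= q_rand.sum()
    q_post = p_xz / p_xz.sum()                        # posterior p(z | x, theta)

    assert free_energy(q_rand) <= loglik + 1e-12      # lower bound for any q
    assert np.isclose(free_energy(q_post), loglik)    # bound is tight at q = posterior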


E-step ≡ plug in posterior expectation of latent variables

Without loss of generality, assume that p(x, z | θ) is a generalized exponential family distribution:

p(x, z \mid \theta) = \frac{1}{Z(\theta)} \, h(x, z) \exp\left\{ \sum_i \theta_i f_i(x, z) \right\}

  • Special case: if p(X | Z) are GLIMs, then  f_i(x, z) = \eta_i^T(z) \, \xi_i(x)

The expected complete log likelihood under q^{t+1} = p(z \mid x, \theta^t) is

\langle \ell_c(\theta; x, z) \rangle_{q^{t+1}} = \sum_z q(z \mid x, \theta^t) \log p(x, z \mid \theta) = \sum_i \theta_i \langle f_i(x, z) \rangle_{q(z \mid x, \theta^t)} - A(\theta)

and when p is a GLIM,

\langle \ell_c(\theta; x, z) \rangle_{q^{t+1}} = \sum_i \theta_i \langle \eta_i(z) \rangle_{q(z \mid x, \theta^t)} \, \xi_i(x) - A(\theta)


M-step: maximization of expected ℓ_c w.r.t. θ

Note that the free energy breaks into two terms:

F(q, \theta) = \sum_z q(z \mid x) \log \frac{p(x, z \mid \theta)}{q(z \mid x)} = \sum_z q(z \mid x) \log p(x, z \mid \theta) - \sum_z q(z \mid x) \log q(z \mid x) = \langle \ell_c(\theta; x, z) \rangle_q + H_q

  • The first term is the expected complete log likelihood (energy) and the second term, which does not depend on θ, is the entropy.

Thus, in the M-step, maximizing with respect to θ for fixed q, we only need to consider the first term:

\theta^{t+1} = \arg\max_\theta \langle \ell_c(\theta; x, z) \rangle_{q^{t+1}} = \arg\max_\theta \sum_z q^{t+1}(z \mid x) \log p(x, z \mid \theta)

  • Under the optimal q^{t+1}, this is equivalent to solving a standard MLE of the fully observed model p(x, z | θ), with the sufficient statistics involving z replaced by their expectations w.r.t. p(z | x, θ).


Example: HMM

  • Supervised learning: estimation when the "right answer" is known
    • Examples:
      GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the CpG islands
      GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls
  • Unsupervised learning: estimation when the "right answer" is unknown
    • Examples:
      GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
      GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice
  • QUESTION: Update the parameters θ of the model to maximize P(x | θ) -- maximum likelihood (ML) estimation


Hidden Markov Model:

from static to dynamic mixture models

[Figure: dynamic mixture (HMM) with hidden states Y1, Y2, Y3, …, YT emitting X1, X2, X3, …, XT, vs. a static mixture with a single Y1 → X1 replicated over N samples. The sequence: speech signal, sequence of rolls; the underlying source: phonemes, dice.]


The Baum-Welch algorithm

The complete log likelihood:

\ell_c(\theta; x, y) = \log p(x, y) = \log \prod_n \left( p(y_{n,1}) \prod_{t=2}^{T} p(y_{n,t} \mid y_{n,t-1}) \prod_{t=1}^{T} p(x_{n,t} \mid y_{n,t}) \right)

The expected complete log likelihood:

\langle \ell_c(\theta; x, y) \rangle = \sum_n \sum_i \langle y_{n,1}^i \rangle_{p(y_{n,1} \mid x_n)} \log \pi_i + \sum_n \sum_{t=2}^{T} \sum_{i,j} \langle y_{n,t-1}^i y_{n,t}^j \rangle_{p(y_{n,t-1}, y_{n,t} \mid x_n)} \log a_{i,j} + \sum_n \sum_{t=1}^{T} \sum_{i,k} x_{n,t}^k \langle y_{n,t}^i \rangle_{p(y_{n,t} \mid x_n)} \log b_{i,k}

EM:

  • The E-step:

    \gamma_{n,t}^i = \langle y_{n,t}^i \rangle = p(y_{n,t}^i = 1 \mid x_n)
    \xi_{n,t}^{i,j} = \langle y_{n,t-1}^i \, y_{n,t}^j \rangle = p(y_{n,t-1}^i = 1, \, y_{n,t}^j = 1 \mid x_n)

  • The M-step ("symbolically" identical to MLE):

    \pi_i^{\mathrm{ML}} = \frac{\sum_n \gamma_{n,1}^i}{N}, \qquad
    a_{ij}^{\mathrm{ML}} = \frac{\sum_n \sum_{t=2}^{T} \xi_{n,t}^{i,j}}{\sum_n \sum_{t=1}^{T-1} \gamma_{n,t}^i}, \qquad
    b_{ik}^{\mathrm{ML}} = \frac{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i \, x_{n,t}^k}{\sum_n \sum_{t=1}^{T} \gamma_{n,t}^i}
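Given the posterior marginals γ and ξ from the E-step (computed with forward-backward, not shown here), the M-step is just normalized expected counts. A minimal numpy sketch; the array layout and function name are assumptions for illustration.

    import numpy as np

    def baum_welch_m_step(gamma, xi, X):
        # gamma: (N, T, I) with gamma[n, t, i]      = p(y_{n,t} = i | x_n)
        # xi:    (N, T-1, I, I) with xi[n, t, i, j] = p(y_{n,t} = i, y_{n,t+1} = j | x_n)
        # X:     (N, T, K) one-hot encoding of the observed symbols x_{n,t}
        N = gamma.shape[0]
        pi_new = gamma[:, 0, :].sum(axis=0) / N                                        # initial-state probs
        A_new = xi.sum(axis=(0, 1)) / gamma[:, :-1, :].sum(axis=(0, 1))[:, None]       # transitions a_ij
        B_new = np.einsum('nti,ntk->ik', gamma, X) / gamma.sum(axis=(0, 1))[:, None]   # emissions b_ik
        return pi_new, A_new, B_new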


Unsupervised ML estimation

  • Given x = x1…xN for which the true state path y = y1…yN is unknown:
  • EXPECTATION MAXIMIZATION
    0. Start with our best guess of a model M and parameters θ.
    1. Estimate A_ij, B_ik in the training data. How?

       A_{ij} = \sum_{n,t} \langle y_{n,t-1}^i \, y_{n,t}^j \rangle, \qquad B_{ik} = \sum_{n,t} \langle y_{n,t}^i \rangle \, x_{n,t}^k

    2. Update θ according to A_ij, B_ik.
       • Now a "supervised learning" problem.
    3. Repeat 1 & 2 until convergence.

This is called the Baum-Welch Algorithm. We can get to a provably more (or equally) likely parameter set θ at each iteration.


EM for general BNs

    while not converged
      % E-step
      for each node i
        ESS_i = 0                 % reset expected sufficient statistics
      for each data sample n
        do inference with X_{n,H}
        for each node i
          ESS_i += \langle SS_i(x_{i,n}, x_{\pi_i, n}) \rangle_{p(X_{n,H} \mid x_n)}
      % M-step
      for each node i
        θ_i := MLE(ESS_i)


Summary: EM Algorithm

  • A way of maximizing the likelihood function for latent variable models. Finds the MLE of the parameters when the original (hard) problem can be broken up into two (easy) pieces:
    1. Estimate some "missing" or "unobserved" data from the observed data and current parameters.
    2. Using this "complete" data, find the maximum likelihood parameter estimates.
  • Alternate between filling in the latent variables using the best guess (posterior) and updating the parameters based on this guess:
    • E-step:  q^{t+1} = \arg\max_q F(q, \theta^t)
    • M-step:  \theta^{t+1} = \arg\max_\theta F(q^{t+1}, \theta^t)
  • In the M-step we optimize a lower bound on the likelihood. In the E-step we close the gap, making bound = likelihood.


Conditional mixture model: Mixture of experts

  • We will model p(Y | X) using different experts, each responsible for a different region of the input space.
  • Latent variable Z chooses the expert using a softmax gating function:

    P(z^k = 1 \mid x) = \mathrm{Softmax}(\xi^T x)

  • Each expert can be a linear regression model:

    P(y \mid x, z^k = 1) = N(y;\, \theta_k^T x,\, \sigma_k^2)

  • The posterior expert responsibilities are

    P(z^k = 1 \mid x, y) = \frac{p(z^k = 1 \mid x) \, p_k(y \mid x, \theta_k, \sigma_k^2)}{\sum_j p(z^j = 1 \mid x) \, p_j(y \mid x, \theta_j, \sigma_j^2)}
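A minimal numpy/scipy sketch of the posterior responsibilities above for a single (x, y) pair; the function and argument names are illustrative, not from the slides.

    import numpy as np
    from scipy.stats import norm

    def moe_responsibilities(x, y, Xi, Theta, sigma):
        # Xi, Theta: (K, d) gating and expert regression weights; sigma: (K,) noise std devs.
        logits = Xi @ x
        gate = np.exp(logits - logits.max()); gate /= gate.sum()   # softmax gating p(z^k = 1 | x)
        lik = norm.pdf(y, loc=Theta @ x, scale=sigma)              # expert likelihoods N(y; theta_k^T x, sigma_k^2)
        post = gate * lik
        return post / post.sum()                                   # p(z^k = 1 | x, y)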


EM for conditional mixture model

Model:

P(y \mid x) = \sum_k p(z^k = 1 \mid x, \xi) \, p(y \mid z^k = 1, x, \theta_k, \sigma)

The objective function (expected complete log likelihood):

\langle \ell_c(\theta; x, y, z) \rangle = \sum_n \langle \log p(z_n \mid x_n, \xi) \rangle_{p(z \mid x, y)} + \sum_n \langle \log p(y_n \mid x_n, z_n, \theta, \sigma) \rangle_{p(z \mid x, y)}
  = \sum_n \sum_k \langle z_n^k \rangle \log \mathrm{softmax}_k(\xi_k^T x_n) - \frac{1}{2} \sum_n \sum_k \langle z_n^k \rangle \left( \frac{(y_n - \theta_k^T x_n)^2}{\sigma_k^2} + \log \sigma_k^2 + C \right)

EM:

  • E-step:

    \tau_n^{k(t)} = P(z_n^k = 1 \mid x_n, y_n, \theta^{(t)}) = \frac{p(z^k = 1 \mid x_n) \, N(y_n;\, \theta_k^T x_n, \sigma_k^2)}{\sum_j p(z^j = 1 \mid x_n) \, N(y_n;\, \theta_j^T x_n, \sigma_j^2)}

  • M-step:
    • θ_k: use the normal equation for standard LR, \theta = (X^T X)^{-1} X^T Y, but with the data re-weighted by τ (homework)
    • IRLS and/or weighted IRLS algorithm to update {ξ_k, θ_k, σ_k} based on the data pairs (x_n, y_n), with weights \tau_n^{k(t)} (homework?)
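For the θ_k update, re-weighting the normal equations by the responsibilities gives a weighted least-squares problem. A minimal numpy sketch (function name illustrative):

    import numpy as np

    def expert_m_step(X, y, tau_k):
        # Weighted least squares for expert k: theta_k = (X^T W X)^{-1} X^T W y,
        # where W = diag(tau_{nk}) holds the E-step responsibilities.
        W = np.diag(tau_k)
        return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)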


Hierarchical mixture of experts

  • This is like a soft version of a depth-2 classification/regression tree.
  • P(Y | X, G1, G2) can be modeled as a GLIM, with parameters dependent on the values of G1 and G2 (which specify a "conditional path" to a given leaf in the tree).


Mixture of overlapping experts

By removing the X → Z arc, we can make the partitions independent of the input, thus allowing overlap.

This is a mixture of linear regressors; each subpopulation has a different conditional mean:

P(z^k = 1 \mid x, y) = \frac{p(z^k = 1) \, p_k(y \mid x, \theta_k, \sigma_k^2)}{\sum_j p(z^j = 1) \, p_j(y \mid x, \theta_j, \sigma_j^2)}


Partially Hidden Data

Of course, we can learn when there are missing (hidden) variables on some cases and not on others.

In this case the cost function is:

\ell_c(\theta; D) = \sum_{n \in \text{Complete}} \log p(x_n, y_n \mid \theta) + \sum_{m \in \text{Missing}} \log \sum_{y_m} p(x_m, y_m \mid \theta)

  • Note that the y_m do not have to be the same in each case: the data can have different missing values in each different sample.

Now you can think of this in a new way: in the E-step we estimate the hidden variables on the incomplete cases only.

The M-step optimizes the log likelihood on the complete data plus the expected likelihood on the incomplete data using the E-step.
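A minimal numpy sketch of this mixed cost for a discrete hidden variable y (array layout and function name are assumptions for illustration):

    import numpy as np

    def partial_loglik(log_p_xy, y_obs):
        # log_p_xy: (N, K) with log p(x_n, y = k | theta); y_obs[n] is the observed
        # class index, or -1 when y_n is missing.
        total = 0.0
        for n, yn in enumerate(y_obs):
            if yn >= 0:
                total += log_p_xy[n, yn]                    # complete case: log p(x_n, y_n)
            else:
                total += np.logaddexp.reduce(log_p_xy[n])   # missing case: log sum_y p(x_n, y)
        return total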


EM Variants

Sparse EM:

Do not re-compute exactly the posterior probability on each data point under all models, because it is almost zero for most of them. Instead keep an "active list" which you update every once in a while.

Generalized (Incomplete) EM:

It might be hard to find the ML parameters in the M-step, even given the completed data. We can still make progress by doing an M-step that improves the likelihood a bit (e.g. gradient step). Recall the IRLS step in the mixture of experts model.


A Report Card for EM

Some good things about EM:

  • no learning rate (step-size) parameter
  • automatically enforces parameter constraints
  • very fast for low dimensions
  • each iteration guaranteed to improve likelihood

Some bad things about EM:

  • can get stuck in local optima of the likelihood
  • can be slower than conjugate gradient (especially near convergence)
  • requires expensive inference step
  • is a maximum likelihood/MAP method