Expectation Maximization


1. Expectation Maximization
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

2. Outline
1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary

3. Introduction
Last time we discussed mixture models.
Use of K-means and EM as a way to partition data.
More generally, estimation of latent variables such as class membership.
Today a few other perspectives on EM will be discussed.

4. Outline
1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary

5. An alternative view
Find the ML solution for a model with latent variables.
Have a set of observed variables X.
Have a set of latent variables Z.
Have a set of model parameters θ.
Our criterion function is
ln p(X | θ) = ln Σ_Z p(X, Z | θ)
Unfortunately the sum sits inside the ln expression.

6. An alternative view
If Z were observed or known, the problem would be simpler.
If the complete set {X, Z} were known - great.
X alone is considered an incomplete dataset.
However, we can compute / estimate p(X | Z, θ).
Iteratively we can update Z to be a good estimate of the distribution.
The estimate of Z can be used to update the model parameters.

7. An alternative view
1. Choose an initial value of θ^old.
2. E-step: compute p(Z | X, θ^old).
3. M-step: compute θ^new
   θ^new = argmax_θ Q(θ, θ^old)
   where Q is the expected complete-data log likelihood
   Q(θ, θ^old) = Σ_Z p(Z | X, θ^old) ln p(X, Z | θ)
4. Check for convergence - return to step 2 if not done.
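
Below is a minimal sketch of this four-step loop in Python. The two-component one-dimensional Gaussian mixture used as the concrete model, the function name, and the initialization scheme are illustrative assumptions, not taken from the slides, which state the procedure only in the abstract.

```python
# Minimal EM loop: initialize theta, alternate E-step / M-step, check convergence.
import numpy as np

def em_gaussian_mixture(x, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Choose an initial value of theta_old = (pi, mu, var)
    pi = np.array([0.5, 0.5])
    mu = rng.choice(x, size=2, replace=False)
    var = np.array([x.var(), x.var()])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # 2. E-step: responsibilities p(Z | X, theta_old)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        joint = pi * dens                               # pi_k * N(x_n | mu_k, var_k)
        ll = np.log(joint.sum(axis=1)).sum()            # ln p(X | theta_old)
        resp = joint / joint.sum(axis=1, keepdims=True)
        # 3. M-step: theta_new = argmax_theta Q(theta, theta_old)
        Nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / x.size
        # 4. Check for convergence - stop when the log likelihood stalls
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, var

# Example: recover two clusters from synthetic 1-D data
data = np.concatenate([np.random.default_rng(1).normal(-2, 1, 200),
                       np.random.default_rng(2).normal(3, 0.5, 200)])
print(em_gaussian_mixture(data))
```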

8. Outline
1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary

9. Mixtures of Bernoulli Distributions
What if we had a mixture of discrete random variables?
Consider a Bernoulli example: x is described by D binary variables x_i, each controlled by a mean µ_i, i.e.
p(x | µ) = Π_{i=1}^D µ_i^{x_i} (1 - µ_i)^{1 - x_i}
Then we have
E[x] = µ
cov[x] = diag{µ_i (1 - µ_i)}
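
As a quick illustration of the distribution above, the following sketch evaluates p(x | µ) and the stated mean and covariance for a small D = 3 case; the numeric values of µ and x are arbitrary assumptions.

```python
# Tiny sketch of the D-dimensional Bernoulli above; mu and x are arbitrary choices.
import numpy as np

mu = np.array([0.2, 0.7, 0.9])                 # means mu_i for D = 3 binary variables
x = np.array([0, 1, 1])                        # one binary observation

p_x = np.prod(mu**x * (1 - mu)**(1 - x))       # p(x | mu) = prod_i mu_i^x_i (1 - mu_i)^(1 - x_i)
mean = mu                                      # E[x] = mu
cov = np.diag(mu * (1 - mu))                   # cov[x] = diag{mu_i (1 - mu_i)}
print(p_x, mean, cov, sep="\n")
```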

10. Mixtures of Bernoulli Distributions
A mixture would then be
p(x | µ, π) = Σ_{k=1}^K π_k p(x | µ_k)
and
E[x] = Σ_{k=1}^K π_k µ_k
cov[x] = Σ_{k=1}^K π_k (Σ_k + µ_k µ_k^T) - E[x] E[x]^T,  with Σ_k = diag{µ_ki (1 - µ_ki)}
Our objective function would be
ln p(X | µ, π) = Σ_{n=1}^N ln ( Σ_{k=1}^K π_k p(x_n | µ_k) )
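
A short sketch of evaluating this objective function ln p(X | µ, π); the helper name, toy data, and parameter values are assumptions for illustration.

```python
# Log likelihood of a Bernoulli mixture, evaluated in log space for stability.
import numpy as np
from scipy.special import logsumexp

def bernoulli_mixture_loglik(X, mu, pi):
    """X: (N, D) binary data, mu: (K, D) component means, pi: (K,) mixing weights."""
    # ln p(x_n | mu_k) = sum_i [ x_ni ln mu_ki + (1 - x_ni) ln(1 - mu_ki) ]
    log_px = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T        # shape (N, K)
    # ln p(X | mu, pi) = sum_n ln sum_k pi_k p(x_n | mu_k)
    return logsumexp(np.log(pi) + log_px, axis=1).sum()

X = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]], dtype=float)
mu = np.array([[0.8, 0.2, 0.9], [0.3, 0.6, 0.5]])
pi = np.array([0.6, 0.4])
print(bernoulli_mixture_loglik(X, mu, pi))
```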

11. EM for Bernoulli Mixtures
If we have an unobserved latent variable z:
p(x | z, µ) = Π_{k=1}^K p(x | µ_k)^{z_k}
and the mixture of variables
p(z | π) = Π_{k=1}^K π_k^{z_k}
The objective function is then
ln p(X, Z | µ, π) = Σ_{n=1}^N Σ_{k=1}^K z_nk { ln π_k + Σ_{i=1}^D [ x_ni ln µ_ki + (1 - x_ni) ln(1 - µ_ki) ] }

12. EM for Bernoulli Mixtures
As before we can compute the responsibility γ:
γ(z_nk) = E[z_nk] = π_k p(x_n | µ_k) / Σ_{j=1}^K π_j p(x_n | µ_j)
From this we can derive a structure as seen earlier:
N_k = Σ_{n=1}^N γ(z_nk)
µ_k = x̄_k = (1 / N_k) Σ_{n=1}^N γ(z_nk) x_n
π_k = N_k / N
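
These responsibilities and update equations translate directly into one E/M pass. The sketch below is illustrative only; the function name, the log-space evaluation, and the small eps guard against log(0) are assumptions not stated on the slide.

```python
# One EM pass for a Bernoulli mixture using the responsibilities and updates above.
import numpy as np

def em_step_bernoulli(X, mu, pi, eps=1e-9):
    """X: (N, D) binary data, mu: (K, D) component means, pi: (K,) mixing weights."""
    N = X.shape[0]
    # E-step: gamma(z_nk) = pi_k p(x_n | mu_k) / sum_j pi_j p(x_n | mu_j)
    log_px = X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T
    log_r = np.log(pi) + log_px
    log_r -= log_r.max(axis=1, keepdims=True)          # stabilize before exponentiating
    gamma = np.exp(log_r)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: N_k, mu_k = weighted sample mean x_bar_k, pi_k = N_k / N
    Nk = gamma.sum(axis=0)
    mu_new = (gamma.T @ X) / Nk[:, None]
    pi_new = Nk / N
    return mu_new, pi_new, gamma
```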

13. Small Bernoulli Mixture Example

14. Outline
1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary

15. Bayesian Linear Regression
We have
p(w | t) = N(w | m_N, S_N)
where
m_N = S_N (S_0^{-1} m_0 + β Φ^T t)
S_N^{-1} = S_0^{-1} + β Φ^T Φ
The log likelihood is then
ln p(t, w | α, β) = ln p(t | w, β) + ln p(w | α)
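
A small sketch of computing this posterior. It assumes the common zero-mean isotropic prior S_0 = α^{-1} I with m_0 = 0, and the polynomial design matrix in the example is an arbitrary choice.

```python
# Posterior N(w | m_N, S_N) for Bayesian linear regression with a zero-mean isotropic prior.
import numpy as np

def posterior_w(Phi, t, alpha, beta):
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi    # S_N^{-1} = S_0^{-1} + beta Phi^T Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = S_N @ (beta * Phi.T @ t)                      # m_N = S_N (S_0^{-1} m_0 + beta Phi^T t), m_0 = 0
    return m_N, S_N

# Example with a quadratic polynomial basis
x = np.linspace(-1, 1, 25)
Phi = np.stack([np.ones_like(x), x, x**2], axis=1)
t = 1.0 - 2.0 * x + 0.1 * np.random.default_rng(0).normal(size=x.size)
print(posterior_w(Phi, t, alpha=1e-2, beta=25.0)[0])
```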

16. EM for Bayesian Linear Regression
In the E-step: compute the posterior for w.
In the M-step: compute α and β given w.
We can derive (see book)
α = M / (m_N^T m_N + Tr(S_N))
and a similar expression for β.
For the effective number of parameters γ we likewise get
γ = M - α Tr(S_N)
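
The following sketch runs the full E/M loop for α and β under the same zero-mean isotropic prior. The β re-estimation formula shown is the standard expected-complete-data update and is filled in as an assumption, since the slide only says "a similar expression for β".

```python
# EM for the hyperparameters of Bayesian linear regression (zero-mean isotropic prior).
import numpy as np

def em_alpha_beta(Phi, t, alpha=1.0, beta=1.0, n_iter=200, tol=1e-8):
    N, M = Phi.shape
    for _ in range(n_iter):
        # E-step: posterior p(w | t) = N(w | m_N, S_N) for the current alpha, beta
        S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t
        # M-step: alpha = M / (m_N^T m_N + Tr(S_N)), plus the analogous beta update (assumed form)
        alpha_new = M / (m_N @ m_N + np.trace(S_N))
        resid = t - Phi @ m_N
        beta_new = N / (resid @ resid + np.trace(Phi @ S_N @ Phi.T))
        converged = abs(alpha_new - alpha) < tol and abs(beta_new - beta) < tol
        alpha, beta = alpha_new, beta_new
        if converged:
            break
    return alpha, beta, m_N, S_N
```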

17. Outline
1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary

18. A general version of EM
The general problem we are trying to address:
We have a set of observed variables X.
We have a set of latent variables Z.
We have a model parameter set θ.
Goal: maximize p(X | θ).
Assumption: it is hard to optimize p(X | θ) directly, but easier to optimize p(X, Z | θ).
Let's assume we can define a distribution q(Z).

19. A general version of EM
We are trying to optimize
ln p(X | θ) = ln p(X, Z | θ) - ln p(Z | X, θ)
We can rewrite this as
ln p(X | θ) = L(q, θ) + KL(q || p)
where
L(q, θ) = Σ_Z q(Z) ln [ p(X, Z | θ) / q(Z) ]
KL(q || p) = - Σ_Z q(Z) ln [ p(Z | X, θ) / q(Z) ]
So L(q, θ) is a lower bound on the log likelihood built from the joint distribution, and KL is the Kullback-Leibler divergence between q(Z) and p(Z | X, θ).
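
A tiny numeric check of this decomposition for a discrete latent variable; the joint table p(X = x_obs, Z) and the choice of q(Z) are arbitrary assumptions.

```python
# Verify ln p(X | theta) = L(q, theta) + KL(q || p) for a toy discrete model.
import numpy as np

p_xz = np.array([0.10, 0.30, 0.20])        # p(X = x_obs, Z = k | theta), K = 3 latent states
q = np.array([0.50, 0.25, 0.25])           # an arbitrary distribution q(Z)

p_x = p_xz.sum()                           # p(X = x_obs | theta)
p_z_given_x = p_xz / p_x                   # posterior p(Z | X = x_obs, theta)

L = np.sum(q * np.log(p_xz / q))           # lower bound L(q, theta)
KL = -np.sum(q * np.log(p_z_given_x / q))  # KL(q || p) >= 0

assert np.isclose(np.log(p_x), L + KL)
print(np.log(p_x), L, KL)
```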

20. A general version of EM
We can now formulate the general algorithm:
The E-step maximizes L(q, θ) with respect to q, keeping θ fixed.
The M-step optimizes L(·) with respect to θ.

21. Outline
1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary

22. Summary
Expectation maximization is widely used in robotics and in estimation in general.
Basically iterative generation of a model and optimization of the model.
Particularly useful for estimation with mixture models - optimize the models and the mixture coefficients iteratively rather than in batch.
An important tool to have available for estimation and learning.

23. A useful reference
M. J. Wainwright & M. Jordan, "Graphical Models, Exponential Families and Variational Inference," Foundations and Trends in Machine Learning, Vol. 1, No. 1-2, 2008.
http://www.nowpublishers.com/product.aspx?product=MAL
