

SLIDE 1

Expectation Maximization

Henrik I. Christensen

Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu


SLIDE 2

Outline

1. Introduction
2. Another view of EM
3. Bernoulli Mixtures
4. EM for Bayesian regression
5. EM Algorithm in General
6. Summary


SLIDE 3

Introduction

• Last time we discussed mixture models.
• We used K-means and EM as ways to partition data.
• More generally, EM estimates latent variables such as class membership.
• Today a few other perspectives on EM will be discussed.



SLIDE 5

An alternative view

• Find the ML solution for a model with latent variables.
• We have a set of observed variables X, a set of latent variables Z, and a set of model parameters θ.
• Our criterion function is

$$\ln p(X|\theta) = \ln \sum_Z p(X, Z|\theta)$$

• Unfortunately the sum sits inside the ln expression, so the logarithm cannot be pushed onto the joint distribution and direct maximization is hard.


SLIDE 6

An alternative view

• If Z were observed or known, the problem would be simpler.
• If the complete set {X, Z} were known, estimation would be straightforward; X alone is considered an incomplete dataset.
• However, we can compute/estimate p(X|Z, θ).
• Iteratively, we can update the estimate of the distribution over Z.
• The estimate of Z can then be used to update the model parameters.


SLIDE 7

An alternative view

1. Choose an initial value $\theta^{old}$.
2. E-step: compute $p(Z|X, \theta^{old})$.
3. M-step: compute

$$\theta^{new} = \arg\max_\theta Q(\theta, \theta^{old})$$

where the expected complete-data log likelihood is

$$Q(\theta, \theta^{old}) = \sum_Z p(Z|X, \theta^{old}) \ln p(X, Z|\theta)$$

4. Check for convergence; return to step 2 if not done.
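As a minimal sketch of this loop (mine, not the slides'; `e_step`, `m_step`, and the returned log likelihood are hypothetical placeholders that a concrete model must supply):

```python
import numpy as np

def em(X, theta, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM loop: alternate the E- and M-steps until the
    log likelihood stops improving."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        posterior = e_step(X, theta)       # p(Z | X, theta_old)
        theta, ll = m_step(X, posterior)   # argmax_theta Q(theta, theta_old)
        if ll - prev_ll < tol:             # EM never decreases ln p(X|theta)
            break
        prev_ll = ll
    return theta
```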


SLIDE 9

Mixtures of Bernoulli Distributions

• What if we had a mixture of discrete random variables?
• Consider a Bernoulli example: x is described by D binary variables $x_i$, controlled by the means $\mu_i$, i.e.

$$p(x|\mu) = \prod_{i=1}^{D} \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}$$

• Then we have

$$E[x] = \mu \qquad \mathrm{cov}[x] = \mathrm{diag}\{\mu_i(1 - \mu_i)\}$$
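A small sketch of this density (mine, not from the slides), evaluated in the log domain to avoid underflow when D is large:

```python
import numpy as np

def bernoulli_log_prob(x, mu):
    """ln p(x | mu) for a D-dimensional product of Bernoullis.
    x: binary vector or (N, D) array; mu: mean vector with entries in (0, 1)."""
    return np.sum(x * np.log(mu) + (1 - x) * np.log(1 - mu), axis=-1)
```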

SLIDE 10

Mixtures of Bernoulli Distributions

A mixture of K such components would then be

$$p(x|\mu, \pi) = \sum_{k=1}^{K} \pi_k\, p(x|\mu_k)$$

with

$$E[x] = \sum_{k=1}^{K} \pi_k \mu_k \qquad \mathrm{cov}[x] = \sum_{k=1}^{K} \pi_k \left( \Sigma_k + \mu_k \mu_k^T \right) - E[x]E[x]^T$$

where $\Sigma_k = \mathrm{diag}\{\mu_{ki}(1 - \mu_{ki})\}$. Our objective function would be

$$\ln p(X|\mu, \pi) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k\, p(x_n|\mu_k) \right\}$$
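A sketch of this objective (my code, not the slides'; SciPy's logsumexp keeps the inner sum numerically stable):

```python
import numpy as np
from scipy.special import logsumexp

def mixture_log_likelihood(X, mu, pi):
    """ln p(X | mu, pi) for a Bernoulli mixture.
    X: (N, D) binary array; mu: (K, D) component means; pi: (K,) weights."""
    # ln p(x_n | mu_k) for all n, k -> shape (N, K)
    log_px = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T
    # ln sum_k pi_k p(x_n | mu_k), summed over n
    return np.sum(logsumexp(np.log(pi) + log_px, axis=1))
```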

SLIDE 11

EM for Bernoulli Mixtures

If we introduce an unobserved latent variable z (with 1-of-K coding),

$$p(x|z, \mu) = \prod_{k=1}^{K} p(x|\mu_k)^{z_k}$$

and the mixing distribution

$$p(z|\pi) = \prod_{k=1}^{K} \pi_k^{z_k}$$

The complete-data log likelihood is then

$$\ln p(X, Z|\mu, \pi) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}$$
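For completeness, a sketch of this quantity when Z is known (my helper, following the array conventions above):

```python
import numpy as np

def complete_data_log_likelihood(X, Z, mu, pi):
    """ln p(X, Z | mu, pi), assuming Z is an (N, K) one-hot matrix."""
    log_px = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T  # (N, K)
    return np.sum(Z * (np.log(pi) + log_px))
```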

SLIDE 12

EM for Bernoulli Mixtures

As before we can compute the responsibility γ:

$$\gamma(z_{nk}) = E[z_{nk}] = \frac{\pi_k\, p(x_n|\mu_k)}{\sum_{j=1}^{K} \pi_j\, p(x_n|\mu_j)}$$

From this we can derive a structure as seen earlier:

$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}) \qquad \mu_k = \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \qquad \pi_k = \frac{N_k}{N}$$
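Putting the E- and M-steps together, a minimal sketch of EM for a Bernoulli mixture (my implementation of the updates above, not code from the slides; the small `eps` guards the logs against means hitting exactly 0 or 1):

```python
import numpy as np
from scipy.special import logsumexp

def bernoulli_mixture_em(X, K, n_iter=50, eps=1e-10, seed=0):
    """EM for a mixture of Bernoullis. X: (N, D) binary array.
    Returns mixing weights pi (K,) and component means mu (K, D)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = rng.uniform(0.25, 0.75, size=(K, D))   # init away from 0 and 1
    for _ in range(n_iter):
        # E-step: responsibilities gamma(z_nk), computed in the log domain
        log_px = X @ np.log(mu + eps).T + (1 - X) @ np.log(1 - mu + eps).T
        log_r = np.log(pi) + log_px
        gamma = np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))
        # M-step: N_k, weighted means mu_k, and pi_k = N_k / N
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        pi = Nk / N
    return pi, mu
```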

SLIDE 13

Small Bernoulli Mixture Example

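The slide's example figure did not survive extraction. As a hypothetical stand-in, a small synthetic run of the sketch above, with two made-up prototypes:

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu = np.array([[0.9, 0.9, 0.1, 0.1],    # two hypothetical prototypes
                    [0.1, 0.1, 0.9, 0.9]])
labels = rng.integers(0, 2, size=500)
X = (rng.random((500, 4)) < true_mu[labels]).astype(float)

pi, mu = bernoulli_mixture_em(X, K=2)
print(pi)   # roughly [0.5, 0.5]
print(mu)   # rows approximate the prototypes (up to permutation)
```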


SLIDE 15

Bayesian Linear Regression

We have

$$p(w|t) = N(w|m_N, S_N)$$

where

$$m_N = S_N \left( S_0^{-1} m_0 + \beta \Phi^T t \right) \qquad S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi$$

Treating w as a latent variable, the complete-data log likelihood is

$$\ln p(t, w|\alpha, \beta) = \ln p(t|w, \beta) + \ln p(w|\alpha)$$

SLIDE 16

EM for Bayesian Linear Regression

• In the E-step, compute the posterior for w.
• In the M-step, compute α and β given w.
• We can derive (see book)

$$\alpha = \frac{M}{m_N^T m_N + \mathrm{Tr}(S_N)}$$

and a similar expression for β.
• For the effective number of well-determined parameters we likewise get

$$\gamma = M - \alpha\, \mathrm{Tr}(S_N)$$
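A sketch of this iteration (mine, assuming the common zero-mean isotropic prior $p(w|\alpha) = N(w|0, \alpha^{-1}I)$; the β update is not spelled out on the slide, so the standard EM form is used here as an assumption):

```python
import numpy as np

def bayes_regression_em(Phi, t, n_iter=50, alpha=1.0, beta=1.0):
    """EM for the hyperparameters of Bayesian linear regression.
    Phi: (N, M) design matrix; t: (N,) targets."""
    N, M = Phi.shape
    for _ in range(n_iter):
        # E-step: posterior over w given the current alpha, beta
        S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t
        # M-step: alpha update as on the slide
        alpha = M / (m_N @ m_N + np.trace(S_N))
        # beta update (assumed standard form, not shown on the slide)
        resid = t - Phi @ m_N
        beta = N / (resid @ resid + np.trace(Phi.T @ Phi @ S_N))
    return alpha, beta, m_N, S_N
```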


SLIDE 18

A general version of EM

The general problem we are trying to address:

• We have a set of observed variables X.
• We have a set of latent variables Z.
• We have a model parameter set θ.
• Goal: maximize p(X|θ).

Assumption:

• It is hard to optimize p(X|θ) directly.
• It is easier to optimize p(X, Z|θ).

Let's assume we can define a distribution q(Z) over the latent variables.

SLIDE 19

A general version of EM

We are trying to optimize

$$\ln p(X|\theta) = \ln p(X, Z|\theta) - \ln p(Z|X, \theta)$$

For any distribution q(Z) we can rewrite this as

$$\ln p(X|\theta) = L(q, \theta) + \mathrm{KL}(q\|p)$$

where

$$L(q, \theta) = \sum_Z q(Z) \ln \frac{p(X, Z|\theta)}{q(Z)} \qquad \mathrm{KL}(q\|p) = -\sum_Z q(Z) \ln \frac{p(Z|X, \theta)}{q(Z)}$$

So L(q, θ) is a bound built from the joint distribution, and KL is the Kullback-Leibler divergence between q(Z) and p(Z|X, θ); since KL(q‖p) ≥ 0, L(q, θ) is a lower bound on ln p(X|θ).
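A tiny numeric check of this decomposition (mine, on a hypothetical two-valued Z):

```python
import numpy as np

# Verify ln p(X|theta) = L(q, theta) + KL(q || p) for a discrete Z.
p_xz = np.array([0.3, 0.2])          # p(X, Z=k | theta) at the observed X
p_x = p_xz.sum()                     # p(X | theta) = sum_Z p(X, Z | theta)
p_z_given_x = p_xz / p_x             # posterior p(Z | X, theta)
q = np.array([0.6, 0.4])             # any distribution q(Z)

L = np.sum(q * np.log(p_xz / q))             # lower bound L(q, theta)
KL = -np.sum(q * np.log(p_z_given_x / q))    # KL(q || p(Z | X, theta))
assert np.isclose(np.log(p_x), L + KL)       # decomposition holds exactly
```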

SLIDE 20

A general version of EM

We can now formulate the general algorithm:

• The E-step maximizes L(q, θ) with respect to q(Z) for fixed θ; the maximum is at q(Z) = p(Z|X, θ), where the KL term vanishes.
• The M-step maximizes L(q, θ) with respect to θ for fixed q(Z).



SLIDE 22

Summary

• Expectation maximization is widely used in robotics and in estimation generally.
• It is basically iterative generation of a model and optimization of that model.
• It is particularly useful for estimation with mixture models: optimize the component models and the mixing coefficients iteratively rather than in batch.
• An important tool to have available for estimation and learning.


SLIDE 23

A useful reference

• M. J. Wainwright & M. I. Jordan, "Graphical Models, Exponential Families and Variational Inference," Foundations and Trends in Machine Learning, Vol. 1, No. 1-2, 2008. http://www.nowpublishers.com/product.aspx?product=MAL
