Mixture Models and EM


1. Mixture Models and EM
Henrik I. Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

2. Outline
1. Introduction
2. K-means Clustering
3. Mixtures of Gaussians
4. Summary

3. Introduction
- In many cases the uni-modal assumption of a single normal distribution is a major challenge, e.g. when handling multiple hypotheses or modelling multiple instances (people, ...).
- Mixtures of Gaussians are a way to model richer distributions.
- A mixture of Gaussians can be considered a model with latent variables.
- Expectation Maximization (EM) is a general technique for finding maximum likelihood estimates in models with latent variables.
- Mixture models are widely used for clustering of data.
- K-means is another clustering technique that has similarities to EM.

4. Outline
1. Introduction
2. K-means Clustering
3. Mixtures of Gaussians
4. Summary

5. K-means Clustering
- Consider clustering of data $\{x_1, x_2, \ldots, x_N\}$ into $K$ groups.
- Assume for now that each data point lies in a $D$-dimensional Euclidean space.
- Each cluster is represented by a "centre" estimate $\mu_i$.
- Challenge: how to find an optimal assignment of data to clusters?
- Introduce an indicator variable $r_{ni} \in \{0, 1\}$, known as the 1-of-K coding.

6. K-means: Objective Function
We can then define an objective function (distortion measure)
$$J = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \, \| x_n - \mu_i \|^2$$
Basically the sum of squared distances of the points to their assigned "centres".
Goal: find the assignments $r_{ni}$ and the cluster centres $\mu_i$ that minimize $J$.
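
As a minimal NumPy sketch of this distortion measure, assuming data X of shape (N, D), a one-hot assignment matrix R of shape (N, K), and centres mu of shape (K, D) (names and shapes are illustrative assumptions):

```python
import numpy as np

def distortion(X, R, mu):
    """J = sum_n sum_i r_ni * ||x_n - mu_i||^2."""
    # Squared distance from every point to every centre: (N, K)
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Keep only the distance to the assigned centre and sum up
    return float((R * sq_dist).sum())
```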

7. Iterative Algorithm
1. Choose initial values for $\mu_i$.
2. Minimize $J$ with respect to $r_{ni}$ (keeping $\mu_i$ fixed).
3. Minimize $J$ with respect to $\mu_i$ (keeping $r_{ni}$ fixed).
4. Repeat steps 2-3 until convergence.

8. Algorithm Details
Consider the indicator
$$r_{ni} = \begin{cases} 1 & \text{if } i = \arg\min_j \| x_n - \mu_j \|^2 \\ 0 & \text{otherwise} \end{cases}$$
Setting the derivative of $J$ with respect to $\mu_i$ to zero gives
$$2 \sum_{n=1}^{N} r_{ni} \, (x_n - \mu_i) = 0$$
or
$$\mu_i = \frac{\sum_n r_{ni} \, x_n}{\sum_n r_{ni}}$$
So $\mu_i$ is the mean of the points assigned to the $i$-th cluster, hence the name K-means.
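
A sketch of one pass of these two updates, under the same assumed shapes as above (the helper name is hypothetical):

```python
import numpy as np

def kmeans_step(X, mu):
    """One assignment (r_ni) step followed by one centre re-estimation step."""
    N, K = X.shape[0], mu.shape[0]
    # Assignment: r_ni = 1 for the nearest centre, 0 otherwise
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
    nearest = sq_dist.argmin(axis=1)                                # (N,)
    R = np.zeros((N, K))
    R[np.arange(N), nearest] = 1.0
    # Update: mu_i = sum_n r_ni x_n / sum_n r_ni (empty clusters left unchanged)
    counts = R.sum(axis=0)                                          # (K,)
    new_mu = mu.copy()
    nonempty = counts > 0
    new_mu[nonempty] = (R.T @ X)[nonempty] / counts[nonempty, None]
    return R, new_mu
```

Repeating this step until the assignments stop changing implements the iterative algorithm of the previous slide.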

9. Small Example
[Figure: panels (a)-(i) show successive assignment and update steps of K-means on a two-dimensional data set.]

10. Objective Function
[Figure: the distortion $J$ plotted over the first four update steps (vertical axis roughly 0 to 1000), decreasing with each step.]

11. Considerations
- Iterating over all data points in every update can be a challenge for large data sets.
- "Smart" selection of candidate points for the cluster centres is important; even a uniform random selection can be acceptable.
- Organizing the data in a graph/mesh can be essential for efficient access and handling of the data.

12. Iterative Updating
Sequential updating can be organized with
$$\mu_i^{\text{new}} = \mu_i^{\text{old}} + \eta_t \, (x_n - \mu_i^{\text{old}})$$
where $\eta_t$ is the learning rate, which typically decreases as more points are considered.
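
A sketch of this sequential update for a single point x_n; the 1/t schedule for the learning rate is an assumed example (the slide only states that $\eta_t$ decreases):

```python
import numpy as np

def online_kmeans_update(x_n, mu, counts):
    """Assign one point to its nearest centre and nudge that centre toward it."""
    i = ((mu - x_n) ** 2).sum(axis=1).argmin()   # nearest centre index
    counts[i] += 1
    eta = 1.0 / counts[i]                        # assumed decreasing learning-rate schedule
    mu[i] = mu[i] + eta * (x_n - mu[i])          # mu_new = mu_old + eta * (x_n - mu_old)
    return mu, counts
```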

13. Generalization of K-means
In general the Euclidean norm might not be optimal. The generalized version of the objective/distortion function is
$$J = \sum_{n=1}^{N} \sum_{i=1}^{K} r_{ni} \, \mathcal{D}(x_n, \mu_i)$$
where $\mathcal{D}(\cdot, \cdot)$ is a dissimilarity measure, which can, for example, be chosen to give robust rejection of outliers.
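
To illustrate, a sketch of the generalized distortion with a pluggable dissimilarity; the L1 (Manhattan) distance is used here purely as one example of a choice less sensitive to outliers than the squared Euclidean norm:

```python
import numpy as np

def generalized_distortion(X, R, mu, dissimilarity=None):
    """J = sum_n sum_i r_ni * D(x_n, mu_i) for a pluggable dissimilarity D."""
    if dissimilarity is None:
        # Example choice: L1 distance
        dissimilarity = lambda a, b: np.abs(a - b).sum(axis=-1)
    D = dissimilarity(X[:, None, :], mu[None, :, :])   # (N, K)
    return float((R * D).sum())
```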

14. Example of Clustering: Image Compression
[Figure: original image and reconstructions with K = 2, K = 3, and K = 10 clusters.]

15. Example of Clustering: Image Compression
[Figure: original image and reconstructions with K = 2, K = 3, and K = 10 clusters.]

16. Outline
1. Introduction
2. K-means Clustering
3. Mixtures of Gaussians
4. Summary

17. Mixtures of Gaussians
Recall the original definition of a mixture
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
- Define an indicator variable $z$ with components $z_k \in \{0, 1\}$, where only one of the $K$ dimensions has unit value: $\sum_k z_k = 1$.
- Consider the joint $p(x, z)$ and the conditional $p(x \mid z)$.
- We can then set $p(z_k = 1) = \pi_k$.
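
As an illustrative sketch of this latent-variable view, one can generate data by first sampling the 1-of-K indicator $z$ with $p(z_k = 1) = \pi_k$ and then sampling $x$ from the selected component (function name and argument shapes are assumptions):

```python
import numpy as np

def sample_mixture(pi, mus, Sigmas, n_samples, rng=None):
    """Draw samples from p(x) = sum_k pi_k N(x | mu_k, Sigma_k) via the latent z."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(pi)
    # Sample the component indicator z (1-of-K) for each point
    z = rng.choice(K, size=n_samples, p=pi)
    # Sample x from the chosen Gaussian component
    X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return X, z
```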

18. The Parameterization
For $\{\pi_k\}$ we have $0 \le \pi_k \le 1$ and $\sum_k \pi_k = 1$.
$p(z)$ can be written
$$p(z) = \prod_{k=1}^{K} \pi_k^{z_k}$$
Similarly $p(x \mid z_k = 1) = \mathcal{N}(x \mid \mu_k, \Sigma_k)$, or
$$p(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x \mid \mu_k, \Sigma_k)^{z_k}$$
so that
$$p(x) = \sum_{z} p(z) \, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

19. Mixtures
So why all the extra machinery? We can think of $p(x)$ as the marginal of a joint distribution $p(x, z)$ in which $z$ is a latent variable.
For later reference, introduce $p(z_k = 1 \mid x)$, also denoted $\gamma(z_k)$:
$$\gamma(z_k) = p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
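
A sketch of evaluating these responsibilities for a whole data set, using SciPy's Gaussian density; the shapes X (N, D), pi (K,), mus (K, D), Sigmas (K, D, D) are assumptions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    """gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)."""
    K = len(pi)
    # Numerator terms pi_k N(x_n | mu_k, Sigma_k), stacked into an (N, K) array
    weighted = np.column_stack([
        pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(K)
    ])
    # Normalize each row so the responsibilities sum to one per data point
    return weighted / weighted.sum(axis=1, keepdims=True)
```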

20. Data Example
[Figure: panels (a)-(c), a two-dimensional data set on the unit square.]

21. Maximum Likelihood
Suppose we have a dataset $X = \{x_1, x_2, \ldots, x_N\}$. How can we model it using a mixture model? The log-likelihood is
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}$$
[Graphical model: observed $x_n$ with latent $z_n$ inside a plate over $N$, governed by the parameters $\pi$, $\mu$, $\Sigma$.]
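
A small sketch of this log-likelihood, under the same assumed shapes as the responsibilities above (it is the quantity monitored later for convergence):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, pi, mus, Sigmas):
    """ln p(X | pi, mu, Sigma) = sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)."""
    per_component = np.column_stack([
        pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(len(pi))
    ])                                   # (N, K)
    return float(np.log(per_component.sum(axis=1)).sum())
```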

22. EM for Gaussian Mixtures
Consider the extremum of $\ln p()$ with respect to $\mu_k$:
$$\sum_{n=1}^{N} \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \, \Sigma_k^{-1} (x_n - \mu_k) = 0$$
which gives
$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad \text{where } N_k = \sum_{n} \gamma(z_{nk})$$

23. EM for Gaussian Mixtures
In a similar fashion we can compute the covariance
$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^{T}$$
If we maximize with respect to the mixing coefficients $\pi_k$, we need to optimize $\ln p$ while also respecting the constraint $\sum_k \pi_k = 1$. Using a Lagrange multiplier we maximize
$$\ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$$

24. EM for Gaussian Mixtures
We obtain
$$\sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} + \lambda = 0$$
which leads to the intuitive solution
$$\pi_k = \frac{N_k}{N}$$
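
Collecting the three update equations, a sketch of the M step given a responsibility matrix gamma of shape (N, K) (helper name and shapes are assumptions):

```python
import numpy as np

def m_step(X, gamma):
    """Re-estimate mu_k, Sigma_k and pi_k from the responsibilities."""
    N, D = X.shape
    Nk = gamma.sum(axis=0)                           # effective number of points per component
    mus = (gamma.T @ X) / Nk[:, None]                # mu_k = (1/N_k) sum_n gamma_nk x_n
    Sigmas = np.empty((len(Nk), D, D))
    for k in range(len(Nk)):
        diff = X - mus[k]                            # (N, D)
        # Sigma_k = (1/N_k) sum_n gamma_nk (x_n - mu_k)(x_n - mu_k)^T
        Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    pi = Nk / N                                      # pi_k = N_k / N
    return pi, mus, Sigmas
```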

25. EM for Gaussian Mixtures
1. Select a set of initial values for $\pi$, $\mu$, and $\Sigma$.
2. Perform an initial analysis (expectation).
3. Re-estimate the values (maximize the likelihood).
4. Iterate.

26. The Detailed Version
1. Initialize the parameters.
2. Evaluate (E step)
$$\gamma(z_k) = p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
3. Re-estimate the parameters $\mu_k^{\text{new}}$, $\Sigma_k^{\text{new}}$ and $\pi_k^{\text{new}}$ (M step).
4. Evaluate $\ln p(X \mid \pi, \mu, \Sigma)$ and check for convergence.
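
Putting the steps together, a sketch of the full loop, reusing the hypothetical responsibilities, m_step and log_likelihood helpers sketched above and stopping when the log-likelihood no longer improves by more than a tolerance:

```python
import numpy as np

def em_gmm(X, pi, mus, Sigmas, max_iter=100, tol=1e-6):
    """Run EM until the log-likelihood improvement drops below tol."""
    prev_ll = -np.inf
    for _ in range(max_iter):
        gamma = responsibilities(X, pi, mus, Sigmas)   # E step
        pi, mus, Sigmas = m_step(X, gamma)             # M step
        ll = log_likelihood(X, pi, mus, Sigmas)        # convergence check
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mus, Sigmas, ll
```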

27. Small Example
[Figure: panels (a)-(f) show EM on a two-dimensional data set after L = 1, 2, 5, and 20 iterations.]

28. Outline
1. Introduction
2. K-means Clustering
3. Mixtures of Gaussians
4. Summary
