
Expectation Maximization - Greg Mori, CMPT 419/726 (Bishop PRML Ch. 9)



  1. Expectation Maximization (Greg Mori - CMPT 419/726, Bishop PRML Ch. 9)

  2. Learning Parameters to Probability Distributions
  • We discussed probabilistic models at length
  • In assignment 3 you showed that, given fully observed training data, setting the parameters θ_i of probability distributions is straightforward
  • However, in many settings not all variables are observed (labelled) in the training data: x_i = (x_i, h_i)
  • e.g. Speech recognition: have speech signals, but not phoneme labels
  • e.g. Object recognition: have object labels (car, bicycle), but not part labels (wheel, door, seat)
  • Unobserved variables are called latent variables
  [figs from Fergus et al.]

  3. Outline
  • K-Means
  • Gaussian Mixture Models
  • Expectation-Maximization

  5. Unsupervised Learning
  • We will start with an unsupervised learning (clustering) problem:
  • Given a dataset { x_1, . . . , x_N }, each x_i ∈ R^D, partition the dataset into K clusters
  • Intuitively, a cluster is a group of points which are close together and far from others

  6. Distortion Measure
  • Formally, introduce prototypes (or cluster centers) µ_k ∈ R^D
  • Use binary r_nk: 1 if point n is in cluster k, 0 otherwise (1-of-K coding scheme again)
  • Find { µ_k }, { r_nk } to minimize the distortion measure:
    J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_nk || x_n − µ_k ||^2
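As a concrete illustration (not from the slides), here is a minimal NumPy sketch of the distortion measure; the function name distortion and the array shapes are assumptions made for this example.

```python
import numpy as np

def distortion(X, mu, r):
    """Distortion measure J = sum_n sum_k r_nk ||x_n - mu_k||^2.

    X  : (N, D) data points
    mu : (K, D) cluster centers
    r  : (N, K) binary 1-of-K membership matrix
    """
    # Squared distance from every point x_n to every center mu_k, shape (N, K)
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((r * sq_dists).sum())
```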

  7. Minimizing Distortion Measure
  • Minimizing J directly is hard:
    J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_nk || x_n − µ_k ||^2
  • However, two things are easy:
    • If we know µ_k, minimizing J wrt r_nk
    • If we know r_nk, minimizing J wrt µ_k
  • This suggests an iterative procedure:
    • Start with initial guess for µ_k
    • Iteration of two steps: minimize J wrt r_nk, then minimize J wrt µ_k
    • Rinse and repeat until convergence

  10. Determining Membership Variables
  • Step 1 in an iteration of K-means is to minimize the distortion measure J wrt the cluster membership variables r_nk:
    J = Σ_{n=1}^{N} Σ_{k=1}^{K} r_nk || x_n − µ_k ||^2
  • Terms for different data points x_n are independent; for each data point, set r_nk to minimize
    Σ_{k=1}^{K} r_nk || x_n − µ_k ||^2
  • Simply set r_nk = 1 for the cluster center µ_k with smallest distance
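A minimal NumPy sketch of this assignment step (the helper name assign_clusters and the array shapes are my own, not from the slides):

```python
import numpy as np

def assign_clusters(X, mu):
    """K-means step 1: set r_nk = 1 for the nearest cluster center, 0 otherwise."""
    # (N, K) squared distances from each point x_n to each center mu_k
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    nearest = sq_dists.argmin(axis=1)          # index of the closest center per point
    r = np.zeros_like(sq_dists)
    r[np.arange(X.shape[0]), nearest] = 1.0    # 1-of-K membership matrix
    return r
```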

  13. Determining Cluster Centers
  • Step 2: fix r_nk, minimize J wrt the cluster centers µ_k:
    J = Σ_{k=1}^{K} Σ_{n=1}^{N} r_nk || x_n − µ_k ||^2   (switch order of sums)
  • So we can minimize wrt each µ_k separately
  • Take derivative, set to zero:
    2 Σ_{n=1}^{N} r_nk ( x_n − µ_k ) = 0
    ⇔ µ_k = ( Σ_n r_nk x_n ) / ( Σ_n r_nk )
    i.e. the mean of the data points x_n assigned to cluster k
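And a matching sketch of the update step (again with assumed names; the guard against empty clusters is an added safeguard the slides do not discuss):

```python
import numpy as np

def update_centers(X, r):
    """K-means step 2: set each center mu_k to the mean of the points assigned to cluster k."""
    counts = r.sum(axis=0)                              # number of points per cluster, shape (K,)
    mu = (r.T @ X) / np.maximum(counts, 1.0)[:, None]   # avoid division by zero for empty clusters
    return mu
```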

  15. K-means Algorithm
  • Start with initial guess for µ_k
  • Iteration of two steps:
    • Minimize J wrt r_nk: assign points to nearest cluster center
    • Minimize J wrt µ_k: set cluster center as average of points in cluster
  • Rinse and repeat until convergence (see the sketch below)
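Putting the two steps together, a hedged sketch of the full loop, reusing the hypothetical assign_clusters and update_centers helpers sketched above (initializing with K randomly chosen data points is one common choice, not prescribed by the slides):

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Alternate the two minimization steps until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(X.shape[0], size=K, replace=False)]   # initial guess: K random data points
    r = assign_clusters(X, mu)
    for _ in range(max_iters):
        mu = update_centers(X, r)         # minimize J wrt mu_k
        r_new = assign_clusters(X, mu)    # minimize J wrt r_nk
        if np.array_equal(r_new, r):      # membership unchanged: converged
            break
        r = r_new
    return mu, r
```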

  16.-24. K-means example
  • Figures (a)-(i): the cluster centers and point assignments after each successive assignment and update step on a two-dimensional dataset
  • Next step doesn't change membership – stop

  25. K-means Convergence
  • Repeat steps until no change in cluster assignments
  • For each step, the value of J either goes down, or we stop
  • Finite number of possible assignments of data points to clusters, so we are guaranteed to converge eventually
  • Note it may be a local minimum of J rather than the global minimum to which we converge
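To see this numerically, one could record J after every half-step; under the sketches above it should never increase (the synthetic data here is illustrative only, not the dataset from the slides):

```python
import numpy as np

# Three synthetic 2-D blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in [(-1.0, -1.0), (0.0, 1.0), (1.0, -1.0)]])

mu = X[rng.choice(len(X), size=3, replace=False)]
history = []
for _ in range(20):
    r = assign_clusters(X, mu)
    history.append(distortion(X, mu, r))   # after minimizing wrt r_nk
    mu = update_centers(X, r)
    history.append(distortion(X, mu, r))   # after minimizing wrt mu_k

assert all(a >= b - 1e-9 for a, b in zip(history, history[1:]))  # J is non-increasing
```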

  26. K-means Example: Image Segmentation
  [Figure: original image and K-means segmented versions]
  • K-means clustering on pixel colour values
  • Pixels in a cluster are coloured by the cluster mean
  • Represent each pixel (e.g. 24-bit colour value) by a cluster number (e.g. 4 bits for K = 10): a compressed version
  • This technique is known as vector quantization
  • Represent a vector (in this case RGB, in R^3) as a single discrete value
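A sketch of this vector quantization step, reusing the hypothetical kmeans helper from above (the function name and shapes are assumptions):

```python
import numpy as np

def quantize_image(img, K=10, max_iters=50):
    """Cluster pixel colours with K-means and recolour each pixel by its cluster mean."""
    H, W, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)     # (H*W, 3) RGB vectors
    mu, r = kmeans(pixels, K, max_iters=max_iters)
    labels = r.argmax(axis=1)                     # cluster index per pixel (the compressed representation)
    recoloured = mu[labels].reshape(H, W, 3)      # each pixel replaced by its cluster-mean colour
    return recoloured, labels.reshape(H, W)
```

Storing only the K cluster means plus one small integer per pixel is what gives the compression (e.g. 4 bits per pixel for K = 10 instead of 24).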

  27. Outline
  • K-Means
  • Gaussian Mixture Models
  • Expectation-Maximization

  28. Hard Assignment vs. Soft Assignment
  • In the K-means algorithm, a hard assignment of points to clusters is made
  • However, for points near the decision boundary, this may not be such a good idea
  • Instead, we could think about making a soft assignment of points to clusters

  29. Gaussian Mixture Model
  [Figure: panels (a) and (b), a two-dimensional dataset drawn from three Gaussians]
  • The Gaussian mixture model (or mixture of Gaussians, MoG) models the data as a combination of Gaussians
  • Above shows a dataset generated by drawing samples from three different Gaussians
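The standard mixture density is p(x) = Σ_k π_k N(x | µ_k, Σ_k), where the mixing coefficients π_k are non-negative and sum to one. A minimal evaluation sketch (helper names are my own):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of a multivariate normal N(x | mu, Sigma)."""
    D = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)                # (x - mu)^T Sigma^{-1} (x - mu)
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def mixture_density(x, pi, mus, Sigmas):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(p * gaussian_pdf(x, m, S) for p, m, S in zip(pi, mus, Sigmas))
```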

  30. Generative Model
  [Figure: graphical model with latent node z pointing to observed node x; scatter plot, panel (a)]
  • The mixture of Gaussians is a generative model
  • To generate a datapoint x_n, we first generate a value for a discrete variable z_n ∈ { 1, . . . , K }
  • We then generate a value x_n ∼ N( x | µ_k, Σ_k ) for the corresponding Gaussian
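A sketch of this ancestral sampling procedure, assuming mixing proportions pi give the probabilities p(z_n = k):

```python
import numpy as np

def sample_mog(pi, mus, Sigmas, N, seed=0):
    """Generate N points from a mixture of Gaussians: draw z_n first, then x_n given z_n."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=N, p=pi)                       # z_n ~ Categorical(pi)
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k])    # x_n ~ N(mu_{z_n}, Sigma_{z_n})
                  for k in z])
    return X, z
```

For example, sample_mog([0.3, 0.5, 0.2], mus, Sigmas, N=500) would produce a dataset like the one pictured, given three means and covariance matrices.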
