E-M method for latent variable models

Define the augmented likelihood
$$ L(\theta; R) := \sum_{i=1}^{n} \sum_{j=1}^{k} R_{ij} \ln p_\theta(x_i, y_i = j), $$
with responsibility matrix $R \in \mathcal{R}_{n,k} := \{ R \in [0,1]^{n \times k} : R \mathbf{1}_k = \mathbf{1}_n \}$. Alternate two steps: update $R$ (E-step) and update $\theta$ (M-step).
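For concreteness (this is the standard E-M alternation, consistent with the E-step written out on the later slides), the two steps can be stated as
$$ \text{E-step:}\quad R_{ij} \leftarrow p_\theta(y_i = j \mid x_i), \qquad \text{M-step:}\quad \theta \leftarrow \arg\max_{\theta} L(\theta; R). $$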


Parameter constraints
E-M for GMMs still works if we freeze or constrain some parameters. Examples:
◮ No weights: initialize $\pi = (1/k, \ldots, 1/k)$ and never update it.
◮ Diagonal covariance matrices: update everything as before, except
$$ \Sigma_j := \mathrm{diag}\big((\sigma_j)_1^2, \ldots, (\sigma_j)_d^2\big), \qquad (\sigma_j)_l^2 := \frac{\sum_{i=1}^{n} R_{ij}\,(x_i - \mu_j)_l^2}{n \pi_j}; $$
that is, we use coordinate-wise sample variances weighted by $R$ (see the sketch below).
Why is this a good idea? Computation (a diagonal covariance is trivial to invert and store), sample complexity (only $d$ variance parameters per component instead of $d(d+1)/2$ covariance entries), ...
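As an illustration (not from the slides), here is a minimal NumPy sketch of this M-step with diagonal covariances; the array names and the function `m_step_diagonal` are hypothetical.

```python
import numpy as np

def m_step_diagonal(X, R):
    """M-step for a GMM with diagonal covariances.

    X : (n, d) data matrix; R : (n, k) responsibilities, rows summing to 1.
    Returns weights pi (k,), means mu (k, d), per-coordinate variances var (k, d).
    """
    n, d = X.shape
    Nj = R.sum(axis=0)                      # effective counts, equal to n * pi_j
    pi = Nj / n                             # mixing weights
    mu = (R.T @ X) / Nj[:, None]            # weighted means, shape (k, d)
    diff2 = (X[:, None, :] - mu[None, :, :]) ** 2           # (n, k, d) squared deviations
    var = np.einsum('ik,ikl->kl', R, diff2) / Nj[:, None]   # weighted coordinate-wise variances
    return pi, mu, var
```

Each $\Sigma_j$ is then the diagonal matrix built from the corresponding row of `var`, so density evaluations never form or invert a full $d \times d$ matrix.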

Gaussian Mixture Model with diagonal covariances
[Figure: sequence of plots, one per slide, showing the fitted model on the 2-D data.]

Singularities
E-M with GMMs suffers from singularities: trivial situations where the likelihood goes to $\infty$ but the solution is bad.
◮ Suppose $d = 1$, $k = 2$, $\pi_j = 1/2$, $n = 3$ with $x_1 = -1$, $x_2 = +1$, $x_3 = +3$. Initialize with $\mu_1 = 0$ and $\sigma_1 = 1$, but $\mu_2 = +3 = x_3$ and $\sigma_2 = 1/100$. Then $\sigma_2 \to 0$ and $L \uparrow \infty$: component 2 collapses onto the single point $x_3$, its density there grows like $1/\sigma_2$, and the remaining points stay covered by component 1, so nothing stops the blow-up. A numeric check is sketched below.
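A minimal numeric sketch (illustrative, not from the slides) of this example: the mixture log-likelihood of the three points grows without bound as $\sigma_2$ shrinks.

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.0, 1.0, 3.0])   # the three data points
mu1, sigma1 = 0.0, 1.0           # first component, held fixed
mu2 = 3.0                        # second component centered on x_3

for sigma2 in [1e-2, 1e-4, 1e-8]:
    # mixture density with equal weights pi_1 = pi_2 = 1/2
    mix = 0.5 * norm.pdf(x, mu1, sigma1) + 0.5 * norm.pdf(x, mu2, sigma2)
    print(f"sigma2 = {sigma2:.0e}  log-likelihood = {np.log(mix).sum():.2f}")

# The log-likelihood increases roughly like log(1 / sigma2) as sigma2 -> 0,
# even though the fitted model is useless as a description of the data.
```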

Interpolating between k-means and GMM E-M
Same M-step: fix $\pi = (1/k, \ldots, 1/k)$ and $\Sigma_j = cI$ for a fixed $c > 0$.
Same E-step: define $q_{ij} := \tfrac{1}{2}\|x_i - \mu_j\|^2$; the E-step chooses
$$ R_{ij} := p_\theta(y_i = j \mid x_i) = \frac{p_\theta(y_i = j, x_i)}{p_\theta(x_i)} = \frac{p_\theta(y_i = j, x_i)}{\sum_{l=1}^{k} p_\theta(y_i = l, x_i)} = \frac{\pi_j\, p_{\mu_j, \Sigma_j}(x_i)}{\sum_{l=1}^{k} \pi_l\, p_{\mu_l, \Sigma_l}(x_i)} = \frac{\exp(-q_{ij}/c)}{\sum_{l=1}^{k} \exp(-q_{il}/c)}. $$
Fix $i \in \{1, \ldots, n\}$ and suppose the minimum $q_i := \min_j q_{ij}$ is unique. Then
$$ \lim_{c \downarrow 0} R_{ij} = \lim_{c \downarrow 0} \frac{\exp(-q_{ij}/c)}{\sum_{l=1}^{k} \exp(-q_{il}/c)} = \lim_{c \downarrow 0} \frac{\exp((q_i - q_{ij})/c)}{\sum_{l=1}^{k} \exp((q_i - q_{il})/c)} = \begin{cases} 1 & \text{if } q_{ij} = q_i, \\ 0 & \text{if } q_{ij} \neq q_i. \end{cases} $$
That is, $R$ becomes the hard assignment $A$ as $c \downarrow 0$ (a small numeric sketch follows).
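A minimal sketch (illustrative, not from the slides) of this limit: the responsibilities are a softmax of $-q_{ij}/c$, and shrinking $c$ pushes each row toward a one-hot (hard) assignment. The names used here are hypothetical.

```python
import numpy as np

def responsibilities(X, mu, c):
    """Soft assignments R_ij = exp(-q_ij / c) / sum_l exp(-q_il / c),
    with q_ij = 0.5 * ||x_i - mu_j||^2, computed in a numerically stable way."""
    q = 0.5 * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    logits = -q / c
    logits -= logits.max(axis=1, keepdims=True)    # subtract row max for stability
    R = np.exp(logits)
    return R / R.sum(axis=1, keepdims=True)

X = np.array([[0.0, 0.0], [4.0, 0.1]])
mu = np.array([[0.0, 0.0], [4.0, 0.0]])
for c in [10.0, 1.0, 0.01]:
    print(c, responsibilities(X, mu, c).round(3))

# As c decreases, each row of R approaches a one-hot vector: the hard
# assignment of x_i to its nearest mean, exactly as in k-means.
```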

Interpolating between k-means and GMM E-M (part 2)
We can interpolate algorithmically, meaning we can create algorithms that have elements of both. Here is something like k-means, but with weights and covariances; one possible version is sketched after the figure.
[Figure: sequence of plots, one per slide, showing the resulting fit on the 2-D data.]
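One way to read "k-means but with weights and covariances" (my interpretation, not necessarily the exact algorithm on the slides) is hard-assignment E-M: assign each point to its most responsible component, then re-fit $\pi_j$, $\mu_j$, $\Sigma_j$ from the assigned points. A hedged sketch, with hypothetical names:

```python
import numpy as np
from scipy.stats import multivariate_normal

def hard_em_step(X, pi, mu, Sigma):
    """One iteration of hard-assignment E-M ("k-means with weights and covariances").

    X : (n, d) data; pi : (k,) weights; mu : (k, d) means; Sigma : (k, d, d) covariances.
    E-step: assign each x_i to the component maximizing pi_j * p_{mu_j, Sigma_j}(x_i).
    M-step: re-estimate pi_j, mu_j, Sigma_j from the points assigned to component j.
    """
    n, d = X.shape
    k = len(pi)
    log_post = np.column_stack([
        np.log(pi[j]) + multivariate_normal.logpdf(X, mu[j], Sigma[j]) for j in range(k)
    ])
    assign = log_post.argmax(axis=1)          # hard assignments
    for j in range(k):
        Xj = X[assign == j]
        if len(Xj) == 0:
            continue                          # leave empty components unchanged
        pi[j] = len(Xj) / n
        mu[j] = Xj.mean(axis=0)
        diff = Xj - mu[j]
        Sigma[j] = diff.T @ diff / len(Xj) + 1e-6 * np.eye(d)  # small ridge for stability
    return pi, mu, Sigma, assign
```

Iterating this step is closer to k-means than to full E-M (hard assignments), yet each component keeps a weight and a full covariance, which is the flavor of the algorithm animated above.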
