Accelerating the EM Algorithm for Mixture Density Estimation

Homer Walker, Mathematical Sciences Department, Worcester Polytechnic Institute


  1. Accelerating the EM Algorithm for Mixture Density Estimation

     Homer Walker, Mathematical Sciences Department, Worcester Polytechnic Institute

     Joint work with Josh Plasse (WPI/Imperial College). Research supported in part by DOE Grant DE-SC0004880 and NSF Grant DMS-1337943.

     ICERM Workshop, September 4, 2015

  2. Mixture Densities

     Consider a (finite) mixture density

     $$p(x \mid \Phi) = \sum_{i=1}^{m} \alpha_i \, p_i(x \mid \phi_i).$$

     Problem: Estimate $\Phi = (\alpha_1, \dots, \alpha_m, \phi_1, \dots, \phi_m)$ using an "unlabeled" sample $\{x_k\}_{k=1}^{N}$ on the mixture.

     Maximum-Likelihood Estimate (MLE): Determine $\Phi^* = \arg\max_\Phi L(\Phi)$, where

     $$L(\Phi) \equiv \sum_{k=1}^{N} \log p(x_k \mid \Phi).$$
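     Since everything downstream is driven by these two quantities, a minimal NumPy sketch may help fix ideas: it evaluates $p(x_k \mid \Phi)$ and $L(\Phi)$ for a univariate normal mixture. This is not from the talk; the parameter layout (`alphas`, `mus`, `sigma2s`) is an assumption made for illustration.

```python
import numpy as np

def log_likelihood(x, alphas, mus, sigma2s):
    """L(Phi) = sum_k log p(x_k | Phi) for a univariate normal mixture.

    x: (N,) sample; alphas, mus, sigma2s: (m,) component parameters.
    """
    x = np.asarray(x)[:, None]                      # (N, 1) for broadcasting
    # component densities p_i(x_k | phi_i), shape (N, m)
    dens = np.exp(-(x - mus) ** 2 / (2 * sigma2s)) / np.sqrt(2 * np.pi * sigma2s)
    # mixture density p(x_k | Phi) = sum_i alpha_i p_i(x_k | phi_i)
    return np.sum(np.log(dens @ alphas))
```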

  3. The EM (Expectation-Maximization) Algorithm

     The general formulation and name were given in:

     A. P. Dempster, N. M. Laird, and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statist. Soc. Ser. B (Methodological), 39, pp. 1-38.

     General idea: Determine the next approximate MLE to maximize the expectation of the complete-data log-likelihood function, given the observed incomplete data and the current approximate MLE.

     Marvelous property: The log-likelihood function increases at each iteration.

  4. The EM Algorithm for Mixture Densities

     For a mixture density, an EM iteration is

     $$\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},$$

     $$\phi_i^+ = \arg\max_{\phi_i} \sum_{k=1}^{N} \log p_i(x_k \mid \phi_i) \, \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)}.$$

     For a derivation, convergence analysis, history, etc., see:

     R. A. Redner and H. Walker (1984), Mixture densities, maximum likelihood and the EM algorithm, SIAM Review, 26, pp. 195-239.

  5. Particular Example: Normal (Gaussian) Mixtures

     Assume (multivariate) normal densities. For each $i$, $\phi_i = (\mu_i, \Sigma_i)$ and

     $$p_i(x \mid \phi_i) = \frac{1}{(2\pi)^{n/2} (\det \Sigma_i)^{1/2}} \, e^{-(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)/2}.$$

     EM iteration: For $i = 1, \dots, m$,

     $$\alpha_i^+ = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)},$$

     $$\mu_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \, x_k \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right],$$

     $$\Sigma_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \, (x_k - \mu_i^+)(x_k - \mu_i^+)^T \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i^c \, p_i(x_k \mid \phi_i^c)}{p(x_k \mid \Phi^c)} \right].$$
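     A hedged NumPy/SciPy sketch of one such iteration, with the posterior weights $\alpha_i^c \, p_i(x_k \mid \phi_i^c) / p(x_k \mid \Phi^c)$ computed once and reused in all three updates. The function name `em_step` and the array shapes are assumptions for illustration, not code from the talk.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, alphas, mus, Sigmas):
    """One EM update. X: (N, n); alphas: (m,); mus: (m, n); Sigmas: (m, n, n)."""
    m = len(alphas)
    # posterior weights w[k, i] = alpha_i^c p_i(x_k | phi_i^c) / p(x_k | Phi^c)
    dens = np.column_stack([
        multivariate_normal.pdf(X, mean=mus[i], cov=Sigmas[i]) for i in range(m)
    ])
    w = dens * alphas
    w /= w.sum(axis=1, keepdims=True)
    # the three updates above, in order
    alphas_new = w.mean(axis=0)                      # alpha_i^+
    mus_new = (w.T @ X) / w.sum(axis=0)[:, None]     # mu_i^+
    Sigmas_new = np.empty_like(Sigmas)
    for i in range(m):
        d = X - mus_new[i]
        Sigmas_new[i] = (w[:, i, None] * d).T @ d / w[:, i].sum()   # Sigma_i^+
    return alphas_new, mus_new, Sigmas_new
```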

  6. EM Iterations Demo

     A univariate normal mixture:

     $$p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \, e^{-(x - \mu_i)^2 / (2\sigma_i^2)}, \quad i = 1, \dots, 5.$$

     Sample of 100,000 observations, with
     - $[\alpha_1, \dots, \alpha_5] = [.2, .3, .3, .1, .1]$,
     - $[\mu_1, \dots, \mu_5] = [0, 1, 2, 3, 4]$,
     - $[\sigma_1^2, \dots, \sigma_5^2] = [.2, 2, .5, .1, .1]$.

     EM iterations on the means:

     $$\mu_i^+ = \left[ \sum_{k=1}^{N} \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \, x_k \right] \Bigg/ \left[ \sum_{k=1}^{N} \frac{\alpha_i \, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right].$$

     [Figure: the mixture density on $-3 \le x \le 5$.]

  7. EM Iterations Demo (cont.)

     Same mixture, sample, and means-only EM iteration as on the previous slide. A reproduction sketch follows this slide.

     [Figure: left, the mixture density on $-3 \le x \le 5$; right, log residual norm vs. iteration number over 100 EM iterations.]
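     The demo on these two slides can be reproduced in spirit with a short script: the mixing proportions and variances are held at their true values while EM iterates on the means, per the update above. The seed, starting guess, and iteration count below are assumptions, so the resulting curve will only qualitatively match the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
alphas = np.array([0.2, 0.3, 0.3, 0.1, 0.1])
mus_true = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
sigma2s = np.array([0.2, 2.0, 0.5, 0.1, 0.1])

# sample of 100,000 observations on the mixture
labels = rng.choice(5, size=100_000, p=alphas)
x = rng.normal(mus_true[labels], np.sqrt(sigma2s[labels]))

mus = np.array([-2.0, -1.0, 0.5, 1.5, 2.5])     # arbitrary starting guess
res = []
for _ in range(100):
    # posterior weights alpha_i p_i(x_k | phi_i) / p(x_k | Phi)
    dens = np.exp(-(x[:, None] - mus) ** 2 / (2 * sigma2s)) / np.sqrt(2 * np.pi * sigma2s)
    w = dens * alphas
    w /= w.sum(axis=1, keepdims=True)
    mus_new = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    res.append(np.log10(np.linalg.norm(mus_new - mus)))   # log residual norm
    mus = mus_new
```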

  8. Anderson Acceleration

     Derived from a method of D. G. Anderson (1965), Iterative procedures for nonlinear integral equations, J. Assoc. Comput. Machinery, 12, pp. 547-560.

     Consider a fixed-point iteration $x^+ = g(x)$, $g : \mathbb{R}^n \to \mathbb{R}^n$.

     Anderson Acceleration: Given $x_0$ and $mMax \ge 1$, set $x_1 = g(x_0)$.
     Iterate: for $k = 1, 2, \dots$
     - Set $m_k = \min\{mMax, k\}$.
     - Set $F_k = (f_{k - m_k}, \dots, f_k)$, where $f_i = g(x_i) - x_i$.
     - Solve $\min_{\alpha \in \mathbb{R}^{m_k + 1}} \|F_k \alpha\|_2$ subject to $\sum_{i=0}^{m_k} \alpha_i = 1$.
     - Set $x_{k+1} = \sum_{i=0}^{m_k} \alpha_i \, g(x_{k - m_k + i})$.
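     A compact NumPy sketch of this algorithm (an illustration under my own assumptions, not the implementation used in the talk): the equality-constrained least-squares problem is reduced to an unconstrained one by eliminating $\alpha_0 = 1 - \sum_{i \ge 1} \alpha_i$, a standard reformulation.

```python
import numpy as np

def anderson(g, x0, m_max=3, iters=50, tol=1e-12):
    """Anderson acceleration of the fixed-point iteration x <- g(x)."""
    x0 = np.asarray(x0, dtype=float)
    xs, gs = [x0], [g(x0)]
    x = gs[0]                                    # x_1 = g(x_0)
    for k in range(1, iters):
        xs.append(x)
        gs.append(g(x))
        if np.linalg.norm(gs[-1] - xs[-1]) < tol:
            return gs[-1]
        m_k = min(m_max, k)
        # F_k = (f_{k-m_k}, ..., f_k), f_i = g(x_i) - x_i, as columns
        F = np.column_stack([gs[j] - xs[j] for j in range(k - m_k, k + 1)])
        # min ||F a||_2 s.t. sum(a) = 1: eliminate a_0 = 1 - sum(a_1..a_{m_k}),
        # leaving the ordinary least-squares problem min ||F_0 + dF gamma||
        dF = F[:, 1:] - F[:, [0]]
        gamma, *_ = np.linalg.lstsq(dF, -F[:, 0], rcond=None)
        a = np.concatenate([[1.0 - gamma.sum()], gamma])
        # x_{k+1} = sum_i a_i g(x_{k-m_k+i})
        x = np.column_stack(gs[k - m_k:]) @ a
    return x
```

     The lists above keep every iterate only for clarity; in practice one retains just the last $mMax + 1$ iterates to bound memory.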

  9. EM Iterations Demo (cont.)

     Same univariate five-component mixture, sample, and means-only EM iteration as before.

     [Figure: the mixture density on $-3 \le x \le 5$.]

  10. EM Iterations Demo (cont.)

     Same setup as the previous slide.

     [Figure: left, the mixture density on $-3 \le x \le 5$; right, log residual norm vs. iteration number over 100 iterations.]

  11. EM Convergence and "Separation"

     Redner-Walker (1984): For mixture densities, the convergence of EM is linear and depends on the "separation" of the component populations:

     - "well separated" (fast convergence) if, whenever $i \ne j$,

       $$\frac{p_i(x \mid \phi_i^*)}{p(x \mid \Phi^*)} \cdot \frac{p_j(x \mid \phi_j^*)}{p(x \mid \Phi^*)} \approx 0 \quad \text{for all } x \in \mathbb{R}^n;$$

     - "poorly separated" (slow convergence) if, for some $i \ne j$, this product fails to be $\approx 0$ for all $x \in \mathbb{R}^n$.
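     One way to make this criterion quantitative (an assumption on my part; the slide states it only pointwise) is to average the pairwise products over the sample:

```python
import numpy as np

def overlap(dens, alphas):
    """Sample-average separation matrix.

    dens[k, i] = p_i(x_k | phi_i*); alphas: mixing proportions.
    Entry (i, j) estimates the average over the sample of
    [p_i(x | phi_i*) / p(x | Phi*)] * [p_j(x | phi_j*) / p(x | Phi*)].
    """
    p = dens @ alphas                   # p(x_k | Phi*)
    r = dens / p[:, None]               # r[k, i] = p_i(x_k | phi_i*) / p(x_k | Phi*)
    return (r.T @ r) / len(p)
```

     Small off-diagonal entries then indicate good separation and, per the result above, fast linear convergence.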

  12. Example: EM Convergence and "Separation"

     A univariate normal mixture:

     $$p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \, e^{-(x - \mu_i)^2 / (2\sigma_i^2)}, \quad i = 1, 2, 3.$$

     EM iterations on the means, as before. Sample of 100,000 observations, with
     - $[\alpha_1, \alpha_2, \alpha_3] = [.3, .3, .4]$, $[\sigma_1^2, \sigma_2^2, \sigma_3^2] = [1, 1, 1]$,
     - three mean configurations: $[\mu_1, \mu_2, \mu_3] = [0, 2, 4]$, $[0, 1, 2]$, $[0, .5, 1]$.

     [Figure: log residual norm vs. iteration number for the three mean configurations.]

  13. Example: EM Convergence and "Separation" (cont.)

     Same mixture, sample, and mean configurations as on the previous slide.

     [Figure: log residual norm vs. iteration number for the three mean configurations.]

  14. Experiments with Multivariate Normal Mixtures

     Experiment with Anderson acceleration applied to the EM iteration for normal mixtures, i.e., the $\alpha_i^+$, $\mu_i^+$, and $\Sigma_i^+$ updates of Slide 5, for $i = 1, \dots, m$.

     Assume $m$ is known. Ultimate interest: very large $N$.
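     Structurally, the experiment amounts to viewing the EM update as a fixed-point map $\Phi^+ = G(\Phi)$ on a flattened parameter vector and handing it to an Anderson routine. Below is a sketch under that reading, reusing the hypothetical `em_step` and `anderson` helpers from the earlier sketches; the packing scheme is an assumption, and the talk's actual implementation may differ.

```python
import numpy as np

def make_em_map(X, m, n):
    """Wrap the EM update as a fixed-point map on a flat parameter vector."""
    def g(theta):
        # unpack Phi = (alphas, mus, Sigmas) from the flat vector
        alphas = theta[:m]
        mus = theta[m:m + m * n].reshape(m, n)
        Sigmas = theta[m + m * n:].reshape(m, n, n)
        a, mu, S = em_step(X, alphas, mus, Sigmas)   # em_step: Slide 5 sketch
        return np.concatenate([a, mu.ravel(), S.ravel()])
    return g

# usage sketch: theta = anderson(make_em_map(X, m, n), theta0, m_max=3)
```

     One caveat worth noting: the accelerated combination $\sum_i \alpha_i \, g(\cdot)$ need not keep the iterates feasible; mixing proportions can go negative and covariance iterates can lose positive definiteness, so a practical implementation needs some safeguard.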
