

  1. Dimensionality Reduction Jia-Bin Huang Virginia Tech Spring 2019 ECE-5424G / CS-5824

  2. Administrative • HW 3 due March 27. • HW 4 out tonight

  3. J. Mark Sowers Distinguished Lecture • Michael Jordan • Pehong Chen Distinguished Professor Department of Statistics and Electrical Engineering and Computer Sciences • University of California, Berkeley • 3/28/19 • 7:30 PM, McBryde 100

  4. ECE Faculty Candidate Talk • Siheng Chen • Ph.D. Carnegie Mellon University • Data science with graphs: From social network analysis to autonomous driving • Time: 10:00 AM - 11:00 AM March 28 • Location: 457B Whittemore

  5. Expectation Maximization (EM) Algorithm • Goal: Find θ that maximizes the log-likelihood ∑_j log p(x^(j); θ)
     ∑_j log p(x^(j); θ) = ∑_j log ∑_{z^(j)} p(x^(j), z^(j); θ)
                         = ∑_j log ∑_{z^(j)} Q_j(z^(j)) · [p(x^(j), z^(j); θ) / Q_j(z^(j))]
                         ≥ ∑_j ∑_{z^(j)} Q_j(z^(j)) log [p(x^(j), z^(j); θ) / Q_j(z^(j))]
     Jensen's inequality: f(E[X]) ≥ E[f(X)]

  6. Expectation Maximization (EM) Algorithm • Goal: Find θ that maximizes the log-likelihood ∑_j log p(x^(j); θ)
     ∑_j log p(x^(j); θ) ≥ ∑_j ∑_{z^(j)} Q_j(z^(j)) log [p(x^(j), z^(j); θ) / Q_j(z^(j))]
     - The lower bound holds for any set of distributions Q_j
     - We want a tight lower bound: f(E[X]) = E[f(X)]
     - When will that happen? X = E[X] with probability 1 (X is a constant), i.e.
       p(x^(j), z^(j); θ) / Q_j(z^(j)) = c

  7. How should we choose Q_j(z^(j))?
     • p(x^(j), z^(j); θ) / Q_j(z^(j)) = c
     • Q_j(z^(j)) ∝ p(x^(j), z^(j); θ)
     • ∑_z Q_j(z) = 1 (because it is a distribution)
     • Q_j(z^(j)) = p(x^(j), z^(j); θ) / ∑_z p(x^(j), z; θ) = p(x^(j), z^(j); θ) / p(x^(j); θ) = p(z^(j) | x^(j); θ)

  8. EM algorithm
     Repeat until convergence {
       (E-step) For each j, set Q_j(z^(j)) := p(z^(j) | x^(j); θ)   (probabilistic inference)
       (M-step) Set θ := argmax_θ ∑_j ∑_{z^(j)} Q_j(z^(j)) log [p(x^(j), z^(j); θ) / Q_j(z^(j))]
     }

  9. Expectation Maximization (EM) Algorithm
     Goal: θ̂ = argmax_θ log ∑_z p(x, z | θ)
     The log of a sum is intractable to optimize directly.
     Jensen's inequality for concave functions f(x): f(E[X]) ≥ E[f(X)] (so we maximize the lower bound!)
     See here for proof: www.stanford.edu/class/cs229/notes/cs229-notes8.ps
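To make Jensen's inequality concrete, here is a small numeric check (my own example, not from the slides) with the concave function f = log and X uniform on {1, 4}:

```latex
\log \mathbb{E}[X] = \log\frac{1+4}{2} = \log 2.5 \approx 0.916,
\qquad
\mathbb{E}[\log X] = \frac{\log 1 + \log 4}{2} \approx 0.693 .
```

So log E[X] ≥ E[log X], with equality only when X is constant, which is exactly the condition used on the earlier slides to make the lower bound tight.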

  10. Expectation Maximization (EM) Algorithm
     Goal: θ̂ = argmax_θ log ∑_z p(x, z | θ)
     1. E-step: compute E_{z | x, θ^(t)} [log p(x, z | θ)] = ∑_z p(z | x, θ^(t)) log p(x, z | θ)
     2. M-step: solve θ^(t+1) = argmax_θ ∑_z p(z | x, θ^(t)) log p(x, z | θ)

  11. Expectation Maximization (EM) Algorithm
     Goal: θ̂ = argmax_θ log ∑_z p(x, z | θ)   ← log of expectation of p(x | z)
     1. E-step: compute expectation of log of p(x | z):
        E_{z | x, θ^(t)} [log p(x, z | θ)] = ∑_z p(z | x, θ^(t)) log p(x, z | θ)
     2. M-step: solve θ^(t+1) = argmax_θ ∑_z p(z | x, θ^(t)) log p(x, z | θ)

  12. EM for Mixture of Gaussians – derivation
     p(x_n | π, μ, σ) = ∑_m p(x_n, z_n = m | π, μ, σ) = ∑_m π_m · (1 / √(2π σ_m²)) exp(−(x_n − μ_m)² / (2σ_m²))
     1. E-step: compute E_{z | x, θ^(t)} [log p(x, z | θ)] = ∑_z p(z | x, θ^(t)) log p(x, z | θ)
     2. M-step: solve θ^(t+1) = argmax_θ ∑_z p(z | x, θ^(t)) log p(x, z | θ)

  13. EM for Mixture of Gaussians
     p(x_n | π, μ, σ) = ∑_m π_m · (1 / √(2π σ_m²)) exp(−(x_n − μ_m)² / (2σ_m²))
     1. E-step: γ_nm = p(z_n = m | x_n, μ^(t), σ^(t), π^(t))
     2. M-step:
        μ̂_m^(t+1) = ∑_n γ_nm x_n / ∑_n γ_nm
        (σ̂_m²)^(t+1) = ∑_n γ_nm (x_n − μ̂_m^(t+1))² / ∑_n γ_nm
        π̂_m^(t+1) = (1/N) ∑_n γ_nm
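The update equations above translate almost line-for-line into code. Below is a minimal NumPy sketch for a 1-D mixture; the function and array names (em_gmm_1d, X, mu, sigma2, pi_k, gamma) are mine, not from the lecture, and the initialization is just one reasonable choice.

```python
import numpy as np

def em_gmm_1d(X, M, iters=100, seed=0):
    """EM for a 1-D Gaussian mixture with M components (minimal sketch)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    # Initialization (one of many reasonable choices)
    mu = rng.choice(X, size=M, replace=False).astype(float)
    sigma2 = np.full(M, X.var())
    pi_k = np.full(M, 1.0 / M)

    for _ in range(iters):
        # E-step: responsibilities gamma[n, m] = p(z_n = m | x_n; theta^(t))
        log_prob = (np.log(pi_k)
                    - 0.5 * np.log(2 * np.pi * sigma2)
                    - (X[:, None] - mu) ** 2 / (2 * sigma2))
        log_prob -= log_prob.max(axis=1, keepdims=True)   # numerical stability
        gamma = np.exp(log_prob)
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: closed-form updates from the slide
        Nm = gamma.sum(axis=0)
        mu = (gamma * X[:, None]).sum(axis=0) / Nm
        sigma2 = (gamma * (X[:, None] - mu) ** 2).sum(axis=0) / Nm
        pi_k = Nm / N
    return pi_k, mu, sigma2

# Example: two well-separated clusters
X = np.concatenate([np.random.normal(-2, 0.5, 200), np.random.normal(3, 1.0, 300)])
print(em_gmm_1d(X, M=2))
```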

  14. EM algorithm - derivation http://lasa.epfl.ch/teaching/lectures/ML_Phd/Notes/GP-GMM.pdf

  15. EM algorithm – E-Step

  16. EM algorithm – E-Step

  17. EM algorithm – M-Step

  18. EM algorithm – M-Step: Take derivative with respect to μ_m
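The equations on this slide did not survive the transcript; the standard derivation it refers to, restated here (so treat the exact notation as mine), differentiates the expected complete-data log-likelihood with respect to μ_m and sets it to zero:

```latex
\frac{\partial}{\partial \mu_m} \sum_{n}\sum_{m'} \gamma_{nm'}
\left[ \log \pi_{m'} - \tfrac{1}{2}\log\!\big(2\pi\sigma_{m'}^2\big)
       - \frac{(x_n - \mu_{m'})^2}{2\sigma_{m'}^2} \right]
= \sum_{n} \gamma_{nm}\,\frac{x_n - \mu_m}{\sigma_m^2} = 0
\quad\Rightarrow\quad
\hat{\mu}_m = \frac{\sum_n \gamma_{nm}\, x_n}{\sum_n \gamma_{nm}}
```

which matches the μ update on slide 13.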

  19. EM algorithm – M-Step: Take derivative with respect to σ_m⁻¹
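Likewise for the variance (the slide differentiates with respect to σ_m⁻¹; differentiating with respect to σ_m² directly, as sketched here, gives the same result):

```latex
\frac{\partial}{\partial \sigma_m^2} \sum_n \gamma_{nm}
\left[ -\tfrac{1}{2}\log\!\big(2\pi\sigma_m^2\big) - \frac{(x_n-\mu_m)^2}{2\sigma_m^2} \right]
= \sum_n \gamma_{nm}\left[ -\frac{1}{2\sigma_m^2} + \frac{(x_n-\mu_m)^2}{2\sigma_m^4} \right] = 0
\quad\Rightarrow\quad
\hat{\sigma}_m^2 = \frac{\sum_n \gamma_{nm}\,(x_n-\hat{\mu}_m)^2}{\sum_n \gamma_{nm}}
```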

  20. EM Algorithm for GMM

  21. EM Algorithm • Maximizes a lower bound on the data likelihood at each iteration • Each step increases the data likelihood • Converges to a local maximum • Common tricks for the derivation • Find terms that sum or integrate to 1 • Use a Lagrange multiplier to deal with constraints
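As an illustration of the Lagrange-multiplier trick (my restatement of the standard argument, not copied from the slide): to maximize over the mixture weights subject to ∑_m π_m = 1,

```latex
\mathcal{L}(\pi,\lambda) = \sum_n \sum_m \gamma_{nm}\log\pi_m + \lambda\Big(\sum_m \pi_m - 1\Big),
\qquad
\frac{\partial \mathcal{L}}{\partial \pi_m} = \frac{\sum_n \gamma_{nm}}{\pi_m} + \lambda = 0
\;\Rightarrow\; \pi_m \propto \sum_n \gamma_{nm}
\;\Rightarrow\; \hat{\pi}_m = \frac{1}{N}\sum_n \gamma_{nm} .
```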

  22. Convergence of EM Algorithm

  23. “Hard EM” • Same as EM, except compute z* as the most likely values of the hidden variables • K-means is an example • Advantages • Simpler: can be applied when the full EM updates cannot be derived • Sometimes works better if you want to make hard predictions at the end • But • The resulting pdf parameters are generally not as accurate as those from EM
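For reference, here is a minimal K-means loop written in the hard-EM pattern (hard assignment in the E-step, mean refit in the M-step); the names kmeans, X, centers, z are mine, and the random initialization is only one common choice.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """K-means as 'hard EM': hard-assign each point, then refit the means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Hard E-step: z* = index of the nearest center for each point
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # M-step: each center becomes the mean of its assigned points
        for j in range(k):
            if np.any(z == j):
                centers[j] = X[z == j].mean(axis=0)
    return centers, z

# Example usage with two 2-D clusters
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5])
centers, labels = kmeans(X, k=2)
```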

  24. Dimensionality Reduction • Motivation • Data compression • Data visualization • Principal component analysis • Formulation • Algorithm • Reconstruction • Choosing the number of principal components • Applying PCA

  25. Dimensionality Reduction • Motivation • Principal component analysis • Formulation • Algorithm • Reconstruction • Choosing the number of principal components • Applying PCA

  26. Data Compression • Reduces the required time and storage space • Removing multi-collinearity improves the interpretation of the parameters of the machine learning model.
     x^(1) ∈ ℝ² → z^(1) ∈ ℝ
     x^(2) ∈ ℝ² → z^(2) ∈ ℝ
     ⋮
     x^(m) ∈ ℝ² → z^(m) ∈ ℝ
     (Figure: 2-D points on axes x_1, x_2 projected onto a single direction z_1.)

  27. Data Compression (duplicate of the previous slide's content; figure axes x_1, z_1)

  28. Data Compression • Reduce data from 3D to 2D (in general, e.g. 1000D -> 100D)
     (Figure: 3-D points on axes x_1, x_2, x_3 projected onto a plane with coordinates z_1, z_2.)

  29. Dimensionality Reduction • Motivation • Principal component analysis • Formulation • Algorithm • Reconstruction • Choosing the number of principal components • Applying PCA

  30. Principal Component Analysis Formulation
     (Figure: 2-D data on axes x_1, x_2.)

  31. Principal Component Analysis Formulation
     (Figure: 2-D data on axes x_1, x_2 with candidate projection directions u^(1) and u^(2).)
     • Reduce n-D to k-D: find k directions u^(1), u^(2), ⋯, u^(k) ∈ ℝⁿ onto which to project the data, so as to minimize the projection error

  32. PCA vs. Linear regression
     (Figures: one panel with axes x_1 and y, one with axes x_1 and x_2.)
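The contrast the figure makes can be stated in equations (standard formulations, not shown on the slide; x is assumed mean-normalized): linear regression minimizes vertical errors to a distinguished target y, while PCA minimizes the orthogonal reconstruction error and treats all features symmetrically.

```latex
\text{Linear regression:}\;\; \min_{\theta} \sum_i \big(y^{(i)} - \theta^{\top} x^{(i)}\big)^2
\qquad
\text{PCA (k components):}\;\; \min_{U_k:\,U_k^{\top}U_k = I} \sum_i \big\lVert x^{(i)} - U_k U_k^{\top} x^{(i)} \big\rVert^2
```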

  33. Data pre-processing
     • Training set: x^(1), x^(2), ⋯, x^(m)
     • Preprocessing (feature scaling / mean normalization):
       μ_j = (1/m) ∑_i x_j^(i)
       Replace each x_j^(i) with x_j^(i) − μ_j
     • If different features are on different scales, scale features to have a comparable range of values:
       x_j^(i) ← (x_j^(i) − μ_j) / s_j
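A minimal NumPy version of this preprocessing step (the function name normalize and the choice of s_j as the per-feature standard deviation are mine; the slide leaves the scale s_j unspecified):

```python
import numpy as np

def normalize(X):
    """Mean-normalize and scale each feature of X (shape: m examples x n features)."""
    mu = X.mean(axis=0)              # mu_j = (1/m) * sum_i x_j^(i)
    s = X.std(axis=0)                # one common choice for s_j
    s[s == 0] = 1.0                  # guard against constant features
    X_norm = (X - mu) / s            # x_j^(i) <- (x_j^(i) - mu_j) / s_j
    return X_norm, mu, s
```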

  34. Principal Component Analysis Algorithm
     • Goal: Reduce data from n-dimensions to k-dimensions
     • Step 1: Compute the “covariance matrix”
       Σ = (1/m) ∑_{i=1}^{m} x^(i) (x^(i))^⊤
     • Step 2: Compute the “eigenvectors” of the covariance matrix
       [U, S, V] = svd(Sigma);
       U = [u^(1), u^(2), ⋯, u^(n)] ∈ ℝ^(n×n)
     • Principal components: u^(1), u^(2), ⋯, u^(k) ∈ ℝⁿ
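The two steps map directly onto NumPy (a sketch assuming X_norm is the mean-normalized m x n data matrix from the previous step; np.linalg.svd plays the role of the svd call on the slide):

```python
import numpy as np

def pca_fit(X_norm, k):
    """Return the top-k principal components of mean-normalized data X_norm (m x n)."""
    m = X_norm.shape[0]
    Sigma = (X_norm.T @ X_norm) / m      # Step 1: covariance matrix, n x n
    U, S, Vt = np.linalg.svd(Sigma)      # Step 2: columns of U are the eigenvectors
    return U[:, :k]                      # principal components u^(1), ..., u^(k)
```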

  35. Principal Component Analysis Algorithm
     • Goal: Reduce data from n-dimensions to k-dimensions
     • Principal components: u^(1), u^(2), ⋯, u^(k) ∈ ℝⁿ
     • z^(i) = [u^(1), u^(2), ⋯, u^(k)]^⊤ x^(i) ∈ ℝᵏ
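And the projection step itself, continuing the hypothetical helpers above (normalize, pca_fit):

```python
import numpy as np

def pca_project(X_norm, U_k):
    """Map each row x^(i) in R^n to z^(i) = U_k^T x^(i) in R^k."""
    return X_norm @ U_k

# Hypothetical end-to-end usage
X = np.random.randn(500, 10)             # 500 examples, n = 10 features
X_norm, mu, s = normalize(X)
U_k = pca_fit(X_norm, k=3)
Z = pca_project(X_norm, U_k)             # 500 x 3 compressed representation
```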
