

  1. Information-theoretically Optimal Sparse PCA. Yash Deshpande and Andrea Montanari, Stanford University, July 3rd, 2014

  2-3. Problem Definition: observe Y_λ = √(λ/n) xx^T + Z, where Y_λ and Z are symmetric n×n matrices (Z_ij = Z_ji), x_i ∼ Bernoulli(ε) and Z_ij ∼ Normal(0, 1) independent. Goal: estimate X = xx^T from Y_λ.
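
A minimal numpy sketch of sampling from this observation model (the function name and parameter conventions are ours, not from the talk):

```python
import numpy as np

def sample_instance(n, lam, eps, rng=None):
    """Sample (x, Y) from Y = sqrt(lam/n) * x x^T + Z with symmetric Gaussian noise."""
    rng = np.random.default_rng(rng)
    x = rng.binomial(1, eps, size=n).astype(float)  # x_i ~ Bernoulli(eps)
    G = rng.normal(size=(n, n))
    Z = (G + G.T) / np.sqrt(2)   # Z_ij = Z_ji ~ Normal(0, 1) off the diagonal
    Y = np.sqrt(lam / n) * np.outer(x, x) + Z
    return x, Y
```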

  4-5. An example: gene expression data [Baechler et al. 2003, PNAS]. • Genes × patients matrix • Blue: lupus patients, aqua: healthy controls • Black: a subset of immune-system-specific genes. This motivates a simple probabilistic model.

  6. Related work — Detection and estimation: Y = X + noise. • X ∈ S ⊂ {0, 1}^n, a known set • Goal: hypothesis testing, support recovery • [Donoho, Jin 2004], [Addario-Berry et al. 2010], [Arias-Castro et al. 2011], …

  7. Related work — Machine learning: maximize ⟨v, Y_λ v⟩ subject to ‖v‖₂ ≤ 1, v sparse. • Goal: maximize “variance”, support recovery • [d’Aspremont et al. 2004], [Moghaddam et al. 2005], [Zou et al. 2006], [Amini, Wainwright 2009], [Papailiopoulos et al. 2013], …

  8. Related work — Information theory: minimize ‖Y_λ − vv^T‖_F² + f(v). • Probabilistic model for x, Y_λ • Propose approximate message passing algorithms • [Rangan, Fletcher 2012], [Kabashima et al. 2014]

  9-10. A first try: simple PCA. Y_λ = √(λ/n) xx^T + Z. Estimate x using the scaled principal eigenvector x_1(Y_λ).
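
A sketch of this baseline estimator; the √(nε) scaling is one natural choice, consistent with E‖x‖² = nε (the slide only says “scaled”):

```python
import numpy as np

def pca_estimate(Y, eps):
    """Estimate x by the scaled principal eigenvector x_1(Y)."""
    n = Y.shape[0]
    _, V = np.linalg.eigh(Y)        # eigenvalues ascending, so take the last column
    v = V[:, -1]
    v = v * np.sign(v.sum())        # resolve the sign ambiguity (x is nonnegative)
    return np.sqrt(n * eps) * v     # scale so ||x_hat||^2 matches E||x||^2 = n*eps
```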

  11-14. Limitations of PCA. [figures: limiting spectral density in each case] If λε² > 1: lim_{n→∞} ⟨x_1(Y_λ), x⟩/√(nε) > 0 a.s. If λε² < 1: lim_{n→∞} ⟨x_1(Y_λ), x⟩/√(nε) = 0 a.s. [Knowles, Yin 2011]
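
This phase transition can be checked numerically; a sketch (reusing sample_instance from the problem-definition block, with illustrative parameters) compares the normalized overlap below and above λε² = 1:

```python
import numpy as np

def overlap(n, lam, eps, seed=0):
    """|<x_1(Y), x>| / sqrt(n*eps) for one random instance."""
    x, Y = sample_instance(n, lam, eps, rng=seed)   # from the earlier sketch
    v = np.linalg.eigh(Y)[1][:, -1]                 # principal eigenvector
    return abs(v @ x) / np.sqrt(n * eps)

eps = 0.3
for lam in (0.5 / eps**2, 2.0 / eps**2):            # lam * eps^2 = 0.5 vs. 2.0
    print(f"lam*eps^2 = {lam * eps**2:.1f} -> overlap ~ {overlap(2000, lam, eps):.2f}")
```

At finite n the separation is only approximate, but the overlap should be near zero in the first case and of order one in the second.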

  15-17. Our contributions: • A poly-time algorithm that exploits sparsity • Provably optimal in terms of MSE when ε > ε_c • A “single-letter” characterization of the MMSE

  18-22. Single-letter characterization. Original high-dimensional problem: Y_λ = √(λ/n) xx^T + Z, with M-mmse(λ, n) ≡ (1/n²) E{‖X − E{X | Y_λ}‖_F²}. Scalar problem: Y_λ = √λ X_0 + Z, with S-mmse(λ) ≡ E{(X_0 − E{X_0 | Y_λ})²}. Here X_0 ∼ Bernoulli(ε), Z ∼ Normal(0, 1).
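
For the Bernoulli(ε) prior the scalar posterior mean has a closed form, and S-mmse(λ) can be estimated by Monte Carlo; a sketch (function names and sample sizes are ours):

```python
import numpy as np

def posterior_mean(y, lam, eps):
    """E[X0 | sqrt(lam) X0 + Z = y] for X0 ~ Bernoulli(eps), Z ~ Normal(0, 1)."""
    log_lr = np.sqrt(lam) * y - lam / 2   # log-likelihood ratio of X0 = 1 vs X0 = 0
    return eps / (eps + (1 - eps) * np.exp(-log_lr))

def s_mmse(lam, eps, m=200_000, seed=0):
    """Monte Carlo estimate of S-mmse(lam) = E[(X0 - E[X0 | Y_lam])^2]."""
    rng = np.random.default_rng(seed)
    x0 = rng.binomial(1, eps, size=m).astype(float)
    y = np.sqrt(lam) * x0 + rng.normal(size=m)
    return np.mean((x0 - posterior_mean(y, lam, eps)) ** 2)
```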

  23-24. Main result — Theorem (Deshpande, Montanari 2014). There exists an ε_c < 1 such that the following holds: for every ε > ε_c, lim_{n→∞} M-mmse(λ, n) = ε² − τ_*², where τ_* = ε − S-mmse(λτ_*). Further, there exists a polynomial-time algorithm that achieves this MSE. Here ε_c ≈ 0.05 (the solution to a scalar non-linear equation).
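
The fixed point τ_* can be found by simply iterating the map, as in the state-evolution picture later in the talk; a sketch using s_mmse from the previous block (starting at τ = ε is our choice of initialization):

```python
def tau_star(lam, eps, iters=50):
    """Iterate tau <- eps - S-mmse(lam * tau) to a fixed point tau_*."""
    tau = eps                             # start at the noiseless value
    for _ in range(iters):
        tau = eps - s_mmse(lam * tau, eps)
    return tau

# Predicted limiting matrix MMSE: eps**2 - tau_star(lam, eps)**2
```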

  25-27. Making use of sparsity. The power iteration with A = Y_λ/√n: x^{t+1} = A x^t. Improvement: x^{t+1} = A F_t(x^t), where F_t(x^t) = (f_t(x^t_1), …, f_t(x^t_n))^T, and f_t is chosen to exploit sparsity.
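
A sketch of this nonlinear power iteration; here f is a fixed entrywise denoiser for simplicity, while the talk allows it to vary with t:

```python
import numpy as np

def nonlinear_power_iteration(Y, f, x0, iters=20):
    """x^{t+1} = A F(x^t) with A = Y / sqrt(n) and F applied entrywise."""
    A = Y / np.sqrt(Y.shape[0])
    x = x0.copy()
    for _ in range(iters):
        x = A @ f(x)          # f = identity recovers the plain power method
    return x
```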

  28-29. A heuristic analysis. Expanding the i-th entry of x^{t+1}: x_i^{t+1} = (√λ/n) ⟨x, F_t(x^t)⟩ x_i + (1/√n) Σ_j Z_ij f_t(x_j^t), where the first coefficient is ≈ μ_t and the second term is ≈ Normal(0, τ_t). Thus x^{t+1} ≈_d μ_t x + √τ_t z, where z ∼ Normal(0, I_n).

  30-31. Approximate Message Passing (AMP). This analysis is obviously wrong, but… it is asymptotically exact for the modified iteration: x^{t+1} = A x̂^t − b_t x̂^{t−1}, with x̂^t = F_t(x^t). [Donoho, Maleki, Montanari 2009], [Bayati, Montanari 2011], [Rangan, Fletcher 2012]
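
A sketch of the modified iteration. The slide does not spell out b_t; we use the standard Onsager coefficient b_t = (1/n) Σ_i f_t′(x_i^t) from the AMP literature, again with a fixed denoiser f:

```python
import numpy as np

def amp(Y, f, f_prime, x0, iters=20):
    """x^{t+1} = A f(x^t) - b_t f(x^{t-1}),  b_t = mean of f'(x^t)."""
    n = Y.shape[0]
    A = Y / np.sqrt(n)
    x, xhat_prev = x0.copy(), np.zeros(n)
    for _ in range(iters):
        xhat = f(x)
        b = np.mean(f_prime(x))        # Onsager correction coefficient
        x = A @ xhat - b * xhat_prev
        xhat_prev = xhat
    return f(x)                         # final entrywise estimate of x
```

The correction term is what makes the Gaussian heuristic of the previous slides asymptotically exact.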

  32-36. Asymptotic behavior. [figures: histograms of x_i^t − μ_t x_i under the power method (left) and AMP (right) at t = 2, 4, 8, 12, 16; the power-method residuals spread out as t grows, while the AMP residuals stay centered with shrinking width]

  37. Asymptotic behavior: a lemma. Lemma: Let (f_t) be a sequence of Lipschitz functions. For every fixed t and uniformly random i, (x_i, x_i^t) →_d (X_0, μ_t X_0 + √τ_t Z), almost surely.

  38-39. State evolution. Deterministic recursions: μ_{t+1} = √λ E{X_0 f_t(μ_t X_0 + √τ_t Z)}, τ_{t+1} = E{f_t(μ_t X_0 + √τ_t Z)²}. With the optimal f_t: μ_{t+1} = √λ τ_{t+1} and τ_{t+1} = ε − S-mmse(λτ_t).
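
These recursions can be tracked by Monte Carlo; a sketch for a generic denoiser f (illustrative sample sizes):

```python
import numpy as np

def state_evolution(f, lam, eps, mu0, tau0, iters=20, m=200_000, seed=0):
    """mu_{t+1} = sqrt(lam) E[X0 f(mu_t X0 + sqrt(tau_t) Z)],
       tau_{t+1} = E[f(mu_t X0 + sqrt(tau_t) Z)^2]."""
    rng = np.random.default_rng(seed)
    x0 = rng.binomial(1, eps, size=m).astype(float)
    z = rng.normal(size=m)
    mu, tau = mu0, tau0
    for _ in range(iters):
        fx = f(mu * x0 + np.sqrt(tau) * z)
        mu, tau = np.sqrt(lam) * np.mean(x0 * fx), np.mean(fx ** 2)
    return mu, tau
```

With f taken to be the scalar posterior mean at the current (μ_t, τ_t), this reduces to the τ_{t+1} = ε − S-mmse(λτ_t) recursion above.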

  40-48. State evolution: an illustration. [animation: cobweb plot of the map τ_t ↦ ε − S-mmse(λτ_t) against the diagonal; the iterates τ_1, τ_2, τ_3, … converge to the fixed point τ_*, at which M-mmse(λ) = ε² − τ_*²]

  49-50. Proof sketch: MSE expression. Using the estimator X̂^t = x̂^t(x̂^t)^T: mse(X̂^t, λ) = (1/n²) E{‖x̂^t(x̂^t)^T − xx^T‖_F²} = (1/n²) E{‖x̂^t‖⁴} + (1/n²) E{‖x‖⁴} − (2/n²) E{⟨x̂^t, x⟩²} → ε² − τ_{t+1}². Thus mse_AMP(λ) = lim_{t→∞} lim_{n→∞} mse(X̂^t, λ) = ε² − τ_*².
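
The Frobenius-norm expansion on this slide gives an O(n) way to evaluate the matrix MSE without forming n × n matrices; a sketch:

```python
import numpy as np

def matrix_mse(xhat, x):
    """(1/n^2) ||xhat xhat^T - x x^T||_F^2 via
       ||a a^T - b b^T||_F^2 = ||a||^4 + ||b||^4 - 2 <a, b>^2."""
    n = x.shape[0]
    return (np.dot(xhat, xhat) ** 2 + np.dot(x, x) ** 2
            - 2 * np.dot(xhat, x) ** 2) / n ** 2
```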

  51-53. Proof sketch: I-MMSE identity. M-mmse(λ) ≤ mse_AMP(λ) for every λ, hence (1/4) ∫₀^∞ M-mmse(λ) dλ ≤ (1/4) ∫₀^∞ mse_AMP(λ) dλ, and by the I-MMSE identity the left-hand side equals I(X; Y_∞) − I(X; Y_0).
