1. Spectral Methods for Latent Variable Models
Kaizheng Wang, Department of ORFE, Princeton University. March 20th, 2020.

2. Data Diversity
Unstructured, heterogeneous and incomplete information.
[Image collage; see credits below.]
Credits: https://www.mathworks.com/help/textanalytics/gs/getting-started-with-topic-modeling.html, https://www.alliance-scotland.org.uk/alliance-homepage-holding-people-networking-2017-01-3/, https://medicalxpress.com/news/2015-04-tumor-only-genetic-sequencing-misguide-cancer.html, https://www.nature.com/articles/nature21386/figures/1, https://viterbi-web.usc.edu/soltanol/RSC.pdf, Dzenan Hamzic.

3. Matrix Representations
Object-by-feature (n × d):
• Texts: document-term;
• Genetics: individual-marker;
• Recomm. systems: user-item.
Object-by-object (n × n):
• Networks: adjacency matrices.
Credit (upper right): https://viterbi-web.usc.edu/soltanol/RSC.pdf.

4. Matrix Representations
Common belief: high ambient dimension but low intrinsic dimension.
Low-rank approximation: data = low-rank structure + noise.

5. Matrix Representations
Low-dimensional embedding via latent variables:
$$\underbrace{\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}}_{n \times d \text{ (samples)}} = \underbrace{\begin{pmatrix} f_1 \\ \vdots \\ f_n \end{pmatrix}}_{n \times r \text{ (latent coordinates)}} \underbrace{B}_{r \times d \text{ (latent bases)}} + \underbrace{\begin{pmatrix} E_1 \\ \vdots \\ E_n \end{pmatrix}}_{n \times d \text{ (noises)}}$$
Principal Component Analysis (PCA) — truncated SVD.
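As an illustration, here is a minimal simulation sketch of this factor model (the dimensions and noise level are assumptions for the example, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 500, 100, 3                 # assumed toy dimensions

F = rng.normal(size=(n, r))           # latent coordinates (n x r)
B = rng.normal(size=(r, d))           # latent bases (r x d)
E = 0.1 * rng.normal(size=(n, d))     # noises (n x d)
X = F @ B + E                         # samples (n x d)

# The singular values make the low intrinsic dimension visible:
# r large values, then a flat noise floor.
print(np.linalg.svd(X, compute_uv=False)[: r + 3].round(2))
```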

6. Example: Genes Mirror Geography within Europe
Novembre et al. (2008), Nature. n = 1387 individuals and d = 197146 SNPs.
[Figure 1a: two-dimensional embedding (PC1 vs. PC2) vs. country labels.]

7. Outline
• Distributed PCA and linearization of eigenvectors
• An ℓp theory for spectral methods
• Summary and future directions

  8. Distributed PCA and linearization of eigenvectors

9. Principal Component Analysis
Data: $\{X_i\}_{i=1}^n \subseteq \mathbb{R}^d$ i.i.d., $\mathbb{E} X_i = 0$, $\mathbb{E}(X_i X_i^\top) = \Sigma$.
Goal: estimate the principal subspace spanned by the $K$ leading eigenvectors of $\Sigma$.
PCA: run the SVD of $X = (X_1, \cdots, X_n)^\top$ to get $\hat U = (\hat u_1, \cdots, \hat u_K) \in \mathbb{R}^{d \times K}$, the top-$K$ right singular vectors.
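A minimal sketch of this estimator (the function name is mine; it assumes the rows of X are the mean-zero samples):

```python
import numpy as np

def pca_subspace(X, K):
    # The top-K right singular vectors of the n x d data matrix X
    # are the K leading eigenvectors of the sample covariance
    # X^T X / n, hence they span the estimated principal subspace.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:K].T                   # d x K orthonormal U_hat
```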

10. Distributed PCA
[Diagram: a central server connected to local machines holding Data 1 through Data 5.]

11. Distributed PCA
m local machines in total, each has n samples.
1. PCA in parallel: the ℓ-th machine runs the SVD of $X^{(\ell)} \in \mathbb{R}^{n \times d}$ and sends $\hat U^{(\ell)} \in \mathbb{R}^{d \times K}$ to the central server;
2. Aggregation: the server combines $\{\hat U^{(\ell)}\}_{\ell=1}^m$ into $\hat U \in \mathbb{R}^{d \times K}$.
Related works: McDonald et al. (2009); Zhang et al. (2013); Lee et al. (2015); Battey et al. (2015); Qu et al. (2002); El Karoui and d'Aspremont (2010); Liang et al. (2014).

12-13. Center of Subspaces
How to find $\hat U \in \mathbb{O}^{d \times K}$ that best summarizes $\{\hat U^{(\ell)}\}_{\ell=1}^m$?
• Subspace distance: $\rho(V, W) = \| V V^\top - W W^\top \|_F$.
• Least squares: $\hat U = \operatorname{argmin}_{V \in \mathbb{O}^{d \times K}} \sum_{\ell=1}^m \rho^2(V, \hat U^{(\ell)})$.
• Algorithm: the SVD of $(\hat U^{(1)}, \cdots, \hat U^{(m)}) \in \mathbb{R}^{d \times mK}$ gives $\hat U \in \mathbb{O}^{d \times K}$ (sketched below).
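A minimal sketch of the full split-and-aggregate procedure (names are illustrative; it assumes each machine's rows are its local samples):

```python
import numpy as np

def local_pca(X_l, K):
    # Step 1: each machine keeps its top-K right singular vectors.
    _, _, Vt = np.linalg.svd(X_l, full_matrices=False)
    return Vt[:K].T                   # d x K orthonormal

def aggregate(U_list, K):
    # Step 2: the least-squares center of the subspaces is given by
    # the top-K left singular vectors of the d x mK concatenation
    # (U^(1), ..., U^(m)) -- equivalently, the K leading eigenvectors
    # of (1/m) * sum_l U^(l) U^(l)^T.
    U, _, _ = np.linalg.svd(np.hstack(U_list), full_matrices=False)
    return U[:, :K]

def subspace_dist(V, W):
    # rho(V, W) = || V V^T - W W^T ||_F
    return np.linalg.norm(V @ V.T - W @ W.T, "fro")
```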

14-16. Theoretical Results
Assume that $X_i$ is sub-Gaussian, i.e. $\| \Sigma^{-1/2} X_i \|_{\psi_2} \lesssim 1$. Define the effective rank and condition number as
$$r = \operatorname{Tr}(\Sigma)/\lambda_1, \qquad \kappa = \lambda_1 / (\lambda_K - \lambda_{K+1}).$$
Theorem (FWWZ, AoS 2019). There exists a constant $C$ such that when $n \geq C \kappa^2 \sqrt{Kr}$,
$$\| \hat U \hat U^\top - U U^\top \|_F \;\lesssim\; \underbrace{\kappa^2 \frac{\sqrt{K}\, r}{n}}_{\text{bias}} \;+\; \underbrace{\kappa \sqrt{\frac{K r}{m n}}}_{\text{variance}}.$$
• If $m \lesssim n / (\kappa^2 r)$, distributed PCA is optimal (see the calculation below).
• The condition $n \geq C \kappa^2 \sqrt{Kr}$ cannot be improved.
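Taking the bound above at face value, the first bullet follows from balancing the two terms; a short check (my arithmetic, not from the slides):
$$\kappa^2 \frac{\sqrt{K}\, r}{n} \;\le\; \kappa \sqrt{\frac{Kr}{mn}} \iff \kappa^4 \frac{K r^2}{n^2} \;\le\; \kappa^2 \frac{Kr}{mn} \iff m \;\le\; \frac{n}{\kappa^2 r},$$
so for $m \lesssim n/(\kappa^2 r)$ the bias is dominated by the variance and the error matches the full-sample rate.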

17-18. Analysis of Aggregation
$X^{(\ell)} \in \mathbb{R}^{n \times d} \xrightarrow{\text{SVD}} \hat U^{(\ell)} \in \mathbb{O}^{d \times K}$ for $\ell = 1, \dots, m$; then $\hat U \in \mathbb{O}^{d \times K}$: the eigenvectors of $\frac{1}{m} \sum_{\ell=1}^m \hat U^{(\ell)} \hat U^{(\ell)\top}$.
Averaging reduces variance but retains bias.
• Variance: controlled by Davis-Kahan: $\| \hat U^{(\ell)} \hat U^{(\ell)\top} - U U^\top \|_F \lesssim \| (\hat \Sigma^{(\ell)} - \Sigma) U \|_F / \Delta$ (here $\Delta = \lambda_K - \lambda_{K+1}$ is the eigengap).
• Bias: how large is it?

19-20. Linearization of Eigenvectors
Theorem (FWWZ, AoS 2019).
$$\big\| \hat U^{(\ell)} \hat U^{(\ell)\top} - [\, U U^\top + f(\hat \Sigma^{(\ell)} - \Sigma) \,] \big\|_F \;\lesssim\; \big[ \| (\hat \Sigma^{(\ell)} - \Sigma) U \|_F / \Delta \big]^2,$$
where $f: \mathbb{R}^{d \times d} \to \mathbb{R}^{d \times d}$ is a linear functional determined by $\Sigma$.
More precise than Davis-Kahan: $\| \hat U^{(\ell)} \hat U^{(\ell)\top} - U U^\top \|_F \lesssim \| (\hat \Sigma^{(\ell)} - \Sigma) U \|_F / \Delta$.
PCA has small bias: since $f$ is linear and $\mathbb{E} \hat \Sigma^{(\ell)} = \Sigma$, the first-order term vanishes in expectation, leaving
$$\| \mathbb{E}(\hat U^{(\ell)} \hat U^{(\ell)\top}) - U U^\top \|_F \;\lesssim\; \big[ \| (\hat \Sigma^{(\ell)} - \Sigma) U \|_F / \Delta \big]^2.$$
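For intuition, in the special case of a simple leading eigenvalue (K = 1), classical first-order perturbation theory gives an explicit linear term; this is the standard textbook formula, not the slides' general construction of $f$:
$$\hat u_1 \hat u_1^\top \;\approx\; u_1 u_1^\top + \sum_{j \ge 2} \frac{u_j^\top (\hat \Sigma - \Sigma)\, u_1}{\lambda_1 - \lambda_j} \big( u_j u_1^\top + u_1 u_j^\top \big),$$
and since each summand is linear in $\hat \Sigma - \Sigma$, the first-order error averages out across machines, which is exactly why the bias is second order.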

21. Summary
Theoretical guarantees for distributed PCA:
• Bias and variance of PCA;
• Linearization of eigenvectors, a high-order Davis-Kahan theorem.
Paper (alphabetical order):
• Fan, Wang, Wang and Zhu. Distributed estimation of principal eigenspaces. The Annals of Statistics, 2019.

22. Example: Genes Mirror Geography within Europe
Novembre et al. (2008), Nature. n = 1387 individuals and d = 197146 SNPs.
[Figure 1a: two-dimensional embedding (PC1 vs. PC2) vs. country labels.]

23. A Pipeline for Spectral Methods
1. Similarity matrix construction, e.g. Gram $X X^\top$, adjacency $A$;
2. Spectral decomposition: get $r$ eigen-pairs $\{ \lambda_j, u_j \}_{j=1}^r$;
3. $r$-dim. embedding, e.g. using the rows of $(u_1, u_2, \ldots, u_r)$;
4. Downstream tasks, e.g. visualization.
Extensions: {robust, probabilistic, sparse, nonnegative} PCA.
Pearson (1901), Hotelling (1933), Schölkopf (1997), Tipping and Bishop (1999), Shi and Malik (2000), Ng et al. (2002), Belkin and Niyogi (2003), Von Luxburg (2007).
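A minimal end-to-end sketch of these four steps with the Gram matrix as the similarity (the function name and dimensions are illustrative):

```python
import numpy as np

def spectral_pipeline(X, r):
    # 1. Similarity matrix construction, e.g. the Gram matrix.
    S = X @ X.T
    # 2. Spectral decomposition: r leading eigen-pairs of S.
    vals, vecs = np.linalg.eigh(S)    # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:r]
    lam, U = vals[top], vecs[:, top]
    # 3. r-dim embedding: row i of U is the embedding of object i.
    return lam, U

# 4. Downstream tasks: feed the rows of U to clustering,
#    visualization, etc.
```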

24. An ℓp theory for spectral methods
• Network analysis and Wigner-type matrices
• Mixture model and Wishart-type matrices

25-26. Community Detection and SBM
Community detection in networks. [Network illustration; credit: Yuxin Chen.]
Stochastic Block Model (Holland et al., 1983): a symmetric adjacency matrix $A \in \{0,1\}^{n \times n}$ with two communities $J, J^c$, $|J| = |J^c| = n/2$, and
$$P(A_{ij} = 1) = \begin{cases} p, & \text{if } i, j \in J \text{ or } i, j \in J^c, \\ q, & \text{if } i \in J, j \in J^c \text{ or } i \in J^c, j \in J. \end{cases}$$
McSherry (2001), Coja-Oghlan (2006), Rohe et al. (2011), Mossel et al. (2013), Massoulié (2014), Lelarge et al. (2015), Chin et al. (2015), Abbe et al. (2016), Zhang and Zhou (2016).

27-28. Community Detection and SBM
$$\mathbb{E} A = \begin{pmatrix} p\, \mathbf{1}_{J,J} & q\, \mathbf{1}_{J,J^c} \\ q\, \mathbf{1}_{J^c,J} & p\, \mathbf{1}_{J^c,J^c} \end{pmatrix} = \frac{p+q}{2}\, \mathbf{1} \mathbf{1}^\top + \frac{p-q}{2}\, (\mathbf{1}_J - \mathbf{1}_{J^c})(\mathbf{1}_J - \mathbf{1}_{J^c})^\top.$$
The 2nd eigenvector $\bar u = \frac{1}{\sqrt{n}} (\mathbf{1}_J - \mathbf{1}_{J^c})$ of $\mathbb{E} A$ reveals $(J, J^c)$. Write $A = \mathbb{E} A + (A - \mathbb{E} A)$: signal plus noise.
Spectral method: take the 2nd eigenvector $u$ of $A$ and cluster by $\operatorname{sgn}(u)$.
To recover $(J, J^c)$, we need $u \approx \bar u$ in a uniform (entrywise) way. Classical ℓ₂ bounds (Davis and Kahan, 1970) are too loose!
Credit: Yuxin Chen.
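A minimal simulation sketch of this spectral method (the parameters n, p, q are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 1000, 0.05, 0.01            # assumed toy parameters
z = np.repeat([1, -1], n // 2)        # ground truth: J then J^c

# Symmetric adjacency: P(A_ij = 1) = p within, q across communities.
P = np.where(np.outer(z, z) > 0, p, q)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper + upper.T).astype(float)

# Spectral method: sign of the 2nd eigenvector of A.
vals, vecs = np.linalg.eigh(A)        # eigenvalues in ascending order
u = vecs[:, -2]                       # eigenvector of the 2nd largest
labels = np.sign(u)
acc = max(np.mean(labels == z), np.mean(labels == -z))
print(f"recovered {acc:.1%} of community labels")
```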

29. Optimality of Spectral Method
Theorem (AFWZ, AoS 2020+). Let $a \neq b$ and
$$P(A_{ij} = 1) = \begin{cases} \frac{a \log n}{n}, & \text{if } i, j \in J \text{ or } i, j \in J^c, \\ \frac{b \log n}{n}, & \text{if } i \in J, j \in J^c \text{ or } i \in J^c, j \in J. \end{cases}$$
• Exact recovery w.h.p. when $(\sqrt{a} - \sqrt{b})^2 > 2$;
• Error rate $n^{-(\sqrt{a} - \sqrt{b})^2 / 2}$ when $(\sqrt{a} - \sqrt{b})^2 \leq 2$;
• Optimality (Abbe et al., 2016; Zhang and Zhou, 2016).
For example, $a = 9$, $b = 1$ gives $(\sqrt{a} - \sqrt{b})^2 = 4 > 2$, so the spectral method exactly recovers the communities w.h.p.
