
Singular Value Decomposition for High-dimensional Tensor Data (Anru Zhang) - PowerPoint PPT Presentation



  1. Singular Value Decomposition for High-dimensional Tensor Data
  Anru Zhang, Department of Statistics, University of Wisconsin-Madison

  2. Introduction
  • Tensors are arrays with multiple directions.
  • Tensors of order three or higher are called high-order tensors:
    A ∈ R^{p_1 × ··· × p_d}, A = (A_{i_1 ··· i_d}), 1 ≤ i_k ≤ p_k, k = 1, ..., d.

  3. Introduction: Importance of High-Order Methods
  More High-Order Data Are Emerging
  • Brain imaging
  • Microbiome studies
  • Matrix-valued time series

  4. Introduction: Importance of High-Order Methods
  High Order Enables Solutions for Harder Problems: High-order Interaction Pursuit
  • Model (Hao, Z., Cheng, 2018):
    y_i = β_0 + Σ_j X_{ij} β_j + Σ_{j,k} γ_{jk} X_{ij} X_{ik} + Σ_{j,k,l} η_{jkl} X_{ij} X_{ik} X_{il} + ε_i, i = 1, ..., n,
    where the three sums collect the main effects, the pairwise interactions, and the triple-wise interactions.
  • Rewrite as y_i = ⟨B, X_i⟩ + ε_i, as sketched below.
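To make the rewrite concrete, here is a minimal numpy sketch (not the exact construction in Hao, Z., Cheng, 2018; the augmented-covariate trick and all dimensions are illustrative assumptions): the intercept, main effects, and interactions are absorbed into a single coefficient tensor B acting on the augmented covariate x̃ = (1, x), so each response is one tensor inner product.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 8                                     # made-up numbers of covariates / samples

# Illustrative coefficient tensor B on the augmented covariate (1, x):
# its entries carry the intercept, main effects, pairwise and triple-wise interactions.
B = rng.normal(size=(p + 1, p + 1, p + 1))

X = rng.normal(size=(n, p))
X_aug = np.hstack([np.ones((n, 1)), X])         # x_tilde_i = (1, x_i)

# y_i = <B, x_tilde_i (x) x_tilde_i (x) x_tilde_i> + eps_i
eps = 0.1 * rng.normal(size=n)
y = np.einsum('jkl,ij,ik,il->i', B, X_aug, X_aug, X_aug) + eps
print(y.shape)                                  # (n,)
```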

  5. Introduction: Importance of High-Order Methods
  High Order Enables Solutions for Harder Problems: Estimation of Mixture Models
  • A mixture model incorporates subpopulations in an overall population.
  • Examples:
    ◮ Gaussian mixture models (Lindsay & Basak, 1993; Hsu & Kakade, 2013)
    ◮ Topic modeling (Arora et al., 2013)
    ◮ Hidden Markov processes (Anandkumar, Hsu, & Kakade, 2012)
    ◮ Independent component analysis (Miettinen et al., 2015)
    ◮ Additive index models (Balasubramanian, Fan & Yang, 2018)
    ◮ Mixture regression models (De Veaux, 1989; Jordan & Jacobs, 1994)
    ◮ ...
  • Method of Moments (MoM), as sketched below:
    ◮ First moment → vector;
    ◮ Second moment → matrix;
    ◮ High-order moments → high-order tensors.
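Below is a small numpy sketch of the moment hierarchy the MoM bullets refer to (toy data and dimensions are made up): from an n × p data matrix, the empirical first, second, and third moments are a vector, a matrix, and an order-3 tensor. MoM estimators for specific mixture models then work with particular combinations of these moments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
x = rng.normal(size=(n, p))                     # toy data; rows are observations

m1 = x.mean(axis=0)                             # first moment: a vector in R^p
m2 = np.einsum('ij,ik->jk', x, x) / n           # second moment: a p x p matrix
m3 = np.einsum('ij,ik,il->jkl', x, x, x) / n    # third moment: a p x p x p tensor

print(m1.shape, m2.shape, m3.shape)             # (4,) (4, 4) (4, 4, 4)
```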

  6. Introduction: Importance of High-Order Methods
  High Order is ...
  • High order is more charming!
  • High order is harder! Tensor problems are far more than extensions of matrix problems:
    ◮ More structures
    ◮ High dimensionality
    ◮ Computational difficulty
    ◮ Many concepts are not well defined or are NP-hard to compute

  7. Introduction: Importance of High-Order Methods
  High Order Casts New Problems and Challenges
  • Tensor Completion
  • Tensor SVD
  • Tensor Regression
  • Biclustering/Triclustering
  • ...

  8. Introduction: Importance of High-Order Methods
  In this talk, we focus on tensor SVD.

  9. Part I: Tensor SVD: Statistical and Computational Limits

  10. Tensor SVD: SVD and PCA
  • Singular value decomposition (SVD) is one of the most important tools in multivariate analysis.
  • Goal: find the underlying low-rank structure in the data matrix.
  • Closely related to principal component analysis (PCA): find the one or multiple directions that explain most of the variance.
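As a quick reminder of the matrix case, a minimal numpy sketch (dimensions and noise level are made up): a rank-r truncated SVD of a noisy data matrix recovers the underlying low-rank structure.

```python
import numpy as np

rng = np.random.default_rng(2)
p1, p2, r = 50, 40, 3

# Low-rank signal plus noise
X = rng.normal(size=(p1, r)) @ rng.normal(size=(r, p2))
Y = X + 0.1 * rng.normal(size=(p1, p2))

# Rank-r truncated SVD of the data matrix keeps the top-r singular triplets
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
X_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]

print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))   # small relative error
```

For the PCA connection: if the columns of Y are first centered, the leading right singular vectors of the centered matrix are exactly the principal directions.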

  11. Tensor SVD
  • We propose a general framework for tensor SVD: Y = X + Z, where
    ◮ Y ∈ R^{p_1 × p_2 × p_3} is the observation;
    ◮ Z is the noise of small amplitude;
    ◮ X is a low-rank tensor.
  • We wish to recover the high-dimensional low-rank structure X.
    → Unfortunately, there is no uniform definition of tensor rank.

  12. Tensor SVD: Tensor Rank Has No Uniform Definition
  • Canonical polyadic (CP) rank:
    r_cp = min{ r : X = Σ_{i=1}^{r} λ_i · u_i ∘ v_i ∘ w_i }.
  • Tucker rank: write X = S ×_1 U_1 ×_2 U_2 ×_3 U_3 with S ∈ R^{r_1 × r_2 × r_3}, U_k ∈ R^{p_k × r_k};
    the smallest possible (r_1, r_2, r_3) is the Tucker rank of X (illustrated in the sketch below).
  • See Kolda and Bader (2009) for a comprehensive survey.
  Picture source: Guoxu Zhou's website, http://www.bsp.brain.riken.jp/~zhougx/tensor.html
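A minimal numpy sketch of the Tucker form (dimensions and ranks are made up): build X = S ×_1 U_1 ×_2 U_2 ×_3 U_3 from a small core and orthonormal factors, then verify that the ranks of the three matricizations recover (r_1, r_2, r_3).

```python
import numpy as np

def mode_product(T, M, k):
    """Mode-k product T x_k M, where M has shape (new_dim, T.shape[k])."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, k)), 0, k)

rng = np.random.default_rng(3)
p, r = (10, 12, 14), (2, 3, 4)

S = rng.normal(size=r)                                                   # core tensor
U = [np.linalg.qr(rng.normal(size=(p[k], r[k])))[0] for k in range(3)]   # orthonormal factors

X = mode_product(mode_product(mode_product(S, U[0], 0), U[1], 1), U[2], 2)
print(X.shape)                                                           # (10, 12, 14)

# Tucker rank: ranks of the three matricizations M_k(X)
print(tuple(np.linalg.matrix_rank(np.moveaxis(X, k, 0).reshape(p[k], -1))
            for k in range(3)))                                          # (2, 3, 4)
```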

  13. Tensor SVD: Model
  • Observations: Y ∈ R^{p_1 × p_2 × p_3},
    Y = X + Z = S ×_1 U_1 ×_2 U_2 ×_3 U_3 + Z,
    where Z has iid N(0, σ^2) entries, S ∈ R^{r_1 × r_2 × r_3}, and U_k ∈ O_{p_k, r_k}.
  • Goal: estimate U_1, U_2, U_3, and the original tensor X.
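A minimal simulation of this observation model (σ, dimensions, and ranks are made-up values): orthonormal loadings, a random core, and iid Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(4)
p, r, sigma = (30, 30, 30), (3, 3, 3), 0.5     # made-up dimensions and noise level

# Orthonormal loadings U_k in O_{p_k, r_k} and a random core S
U1, U2, U3 = (np.linalg.qr(rng.normal(size=(p[k], r[k])))[0] for k in range(3))
S = rng.normal(size=r)

# X = S x_1 U1 x_2 U2 x_3 U3, written as one einsum
X = np.einsum('ijk,ai,bj,ck->abc', S, U1, U2, U3)

# Observation: Y = X + Z with iid N(0, sigma^2) entries
Y = X + sigma * rng.normal(size=p)
print(Y.shape)                                  # (30, 30, 30)
```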

  14. Tensor SVD: Straightforward Idea 1: Higher-Order SVD (HOSVD)
  • Since U_k spans the column space of M_k(X), let
    Û_k = SVD_{r_k}(M_k(Y)), k = 1, 2, 3,
    i.e. the leading r_k left singular vectors of the mode-k matricization, whose columns are the mode-k fibers.
  Note: SVD_r(·) denotes the first r left singular vectors of a given matrix.
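A minimal numpy sketch of the HOSVD estimate on simulated data (the unfold/svd_r helpers and all dimensions are illustrative, not the authors' code): each Û_k is the matrix of leading r_k left singular vectors of the mode-k matricization of Y.

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization M_k(T): rows indexed by mode k, columns by the rest."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def svd_r(M, r):
    """Leading r left singular vectors of M (the SVD_r(.) on this slide)."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

# Same toy model as above, regenerated so this sketch runs on its own
rng = np.random.default_rng(4)
p, r, sigma = (30, 30, 30), (3, 3, 3), 0.5
U_true = [np.linalg.qr(rng.normal(size=(p[k], r[k])))[0] for k in range(3)]
X = np.einsum('ijk,ai,bj,ck->abc', rng.normal(size=r), *U_true)
Y = X + sigma * rng.normal(size=p)

# HOSVD: mode-wise truncated SVD of the unfolded observation
U_hat = [svd_r(unfold(Y, k), r[k]) for k in range(3)]
print([u.shape for u in U_hat])                 # [(30, 3), (30, 3), (30, 3)]
```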

  15. Tensor SVD: Straightforward Idea 1: Higher-Order SVD (HOSVD)
  (De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl., 2000a)
  • Advantage: easy to implement and analyze.
  • Disadvantage: performs sub-optimally. Reason: simply unfolding the tensor fails to utilize the tensor structure!

  16. Tensor SVD: Straightforward Idea 2: Maximum Likelihood Estimator
  • Maximum-likelihood estimator:
    (Û_1^mle, Û_2^mle, Û_3^mle, Ŝ^mle) = argmin_{U_1, U_2, U_3, S} ‖Y − S ×_1 U_1 ×_2 U_2 ×_3 U_3‖_F^2.
  • Equivalently, Û_1^mle, Û_2^mle, Û_3^mle can be calculated via
    max ‖Y ×_1 V_1^⊤ ×_2 V_2^⊤ ×_3 V_3^⊤‖_F^2 subject to V_1 ∈ O_{p_1, r_1}, V_2 ∈ O_{p_2, r_2}, V_3 ∈ O_{p_3, r_3}.
  • Advantage: achieves statistical optimality (will be shown later).
  • Disadvantage:
    ◮ Non-convex, computationally intractable.
    ◮ NP-hard to approximate even when r = 1 (Hillar and Lim, 2013).

  17. Tensor SVD: Phase Transition in Tensor SVD
  • The difficulty is driven by the signal-to-noise ratio (SNR):
    λ = min_{k=1,2,3} σ_{r_k}(M_k(X)) = least non-zero singular value among M_k(X), k = 1, 2, 3;
    σ = SD(Z) = noise level.
  • Suppose p_1 ≍ p_2 ≍ p_3 ≍ p. Three phases:
    λ/σ ≥ C p^{3/4} (strong SNR case),
    λ/σ < c p^{1/2} (weak SNR case),
    p^{1/2} ≪ λ/σ ≪ p^{3/4} (moderate SNR case).
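A small numpy sketch of the SNR quantity on a simulated low-rank tensor (dimensions, ranks, and σ are made up): compute λ = min_k σ_{r_k}(M_k(X)) and compare λ/σ with the p^{1/2} and p^{3/4} thresholds.

```python
import numpy as np

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def signal_strength(X, ranks):
    """lambda = min_k sigma_{r_k}(M_k(X)), the least relevant singular value."""
    return min(np.linalg.svd(unfold(X, k), compute_uv=False)[ranks[k] - 1]
               for k in range(3))

rng = np.random.default_rng(7)
p, r, sigma = (30, 30, 30), (3, 3, 3), 0.5      # made-up sizes and noise level
U = [np.linalg.qr(rng.normal(size=(p[k], r[k])))[0] for k in range(3)]
X = np.einsum('ijk,ai,bj,ck->abc', rng.normal(size=r), *U)

snr = signal_strength(X, r) / sigma
print(snr, p[0] ** 0.5, p[0] ** 0.75)           # compare lambda/sigma with p^(1/2), p^(3/4)
```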

  18. Tensor SVD: Strong SNR Case: Methodology
  • When λ/σ ≥ C p^{3/4}, apply higher-order orthogonal iteration (HOOI).
    (De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl., 2000b)
  • (Step 1. Spectral initialization)
    Û_k^{(0)} = SVD_{r_k}(M_k(Y)), k = 1, 2, 3.
  • (Step 2. Power iterations) Repeat: let t = t + 1 and calculate
    Û_1^{(t)} = SVD_{r_1}( M_1( Y ×_2 (Û_2^{(t−1)})^⊤ ×_3 (Û_3^{(t−1)})^⊤ ) ),
    Û_2^{(t)} = SVD_{r_2}( M_2( Y ×_1 (Û_1^{(t)})^⊤ ×_3 (Û_3^{(t−1)})^⊤ ) ),
    Û_3^{(t)} = SVD_{r_3}( M_3( Y ×_1 (Û_1^{(t)})^⊤ ×_2 (Û_2^{(t)})^⊤ ) ),
    until t = t_max or convergence. A code sketch follows below.
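A minimal, self-contained numpy sketch of HOOI as described on this slide (a fixed number of iterations stands in for a convergence check; all dimensions and the noise level are made up, and this is not the authors' implementation):

```python
import numpy as np

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def svd_r(M, r):
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def hooi(Y, ranks, t_max=20):
    """Higher-order orthogonal iteration: HOSVD initialization + power iterations."""
    r1, r2, r3 = ranks
    # Step 1: spectral initialization (HOSVD)
    U1, U2, U3 = (svd_r(unfold(Y, k), ranks[k]) for k in range(3))
    # Step 2: power iterations (project the other two modes, then truncated SVD)
    for _ in range(t_max):
        U1 = svd_r(unfold(np.einsum('abc,bj,ck->ajk', Y, U2, U3), 0), r1)
        U2 = svd_r(unfold(np.einsum('abc,ai,ck->ibk', Y, U1, U3), 1), r2)
        U3 = svd_r(unfold(np.einsum('abc,ai,bj->ijc', Y, U1, U2), 2), r3)
    # Low-rank reconstruction: project Y onto the estimated subspaces
    core = np.einsum('abc,ai,bj,ck->ijk', Y, U1, U2, U3)
    X_hat = np.einsum('ijk,ai,bj,ck->abc', core, U1, U2, U3)
    return (U1, U2, U3), X_hat

# Toy data: Tucker low-rank signal plus Gaussian noise (made-up sizes)
rng = np.random.default_rng(5)
p, r, sigma = (30, 30, 30), (3, 3, 3), 0.5
U_true = [np.linalg.qr(rng.normal(size=(p[k], r[k])))[0] for k in range(3)]
X = np.einsum('ijk,ai,bj,ck->abc', rng.normal(size=r), *U_true)
Y = X + sigma * rng.normal(size=p)

(U1, U2, U3), X_hat = hooi(Y, r)
print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))   # relative recovery error
```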

  19. Tensor SVD: Strong SNR Case: Interpretation
  1. Spectral initialization provides a "warm start."
  2. Power iteration refines the initializations. Given Û_1^{(t−1)}, Û_2^{(t−1)}, Û_3^{(t−1)}, denoise Y via
     Y ×_2 (Û_2^{(t−1)})^⊤ ×_3 (Û_3^{(t−1)})^⊤.
     ◮ The mode-1 singular subspace is preserved;
     ◮ the noise can be greatly reduced.
     Thus, we update
     Û_1^{(t)} = SVD_{r_1}( M_1( Y ×_2 (Û_2^{(t−1)})^⊤ ×_3 (Û_3^{(t−1)})^⊤ ) ).

  20. Tensor SVD: Strong SNR Case
  Higher-order orthogonal iteration (HOOI)
  (De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl., 2000b)

  21. Tensor SVD: Strong SNR Case: Theoretical Analysis
  Theorem (Upper Bound). Suppose λ/σ > C p^{3/4} and other regularity conditions hold. Then after at most O(log(p/λ) ∨ 1) iterations,
  • (Recovery of U_1, U_2, U_3)
    E min_{O ∈ O_{r_k}} ‖Û_k − U_k O‖_F ≤ C √(p_k r_k) / (λ/σ), k = 1, 2, 3;
  • (Recovery of X)
    sup_{X ∈ F_{p,r}(λ)} E ‖X̂ − X‖_F^2 ≤ C (p_1 r_1 + p_2 r_2 + p_3 r_3) σ^2,
    sup_{X ∈ F_{p,r}(λ)} E ‖X̂ − X‖_F^2 / ‖X‖_F^2 ≤ C (p_1 + p_2 + p_3) σ^2 / λ^2.
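The loss min_{O ∈ O_{r_k}} ‖Û_k − U_k O‖_F appearing in these bounds can be evaluated by orthogonal Procrustes alignment: if Û_k^⊤ U_k = W Σ V^⊤ is an SVD, the minimizing rotation is O = W V^⊤ (applied here to Û_k, with the roles of the arguments as in the sketch). A small numpy sketch with toy dimensions:

```python
import numpy as np

def subspace_error(U_hat, U):
    """min over orthogonal O of ||U_hat @ O - U||_F, via orthogonal Procrustes."""
    W, _, Vt = np.linalg.svd(U_hat.T @ U)
    O = W @ Vt                          # optimal orthogonal alignment
    return np.linalg.norm(U_hat @ O - U)

# Toy check on a perturbed orthonormal basis (dimensions are made up)
rng = np.random.default_rng(6)
p, r = 30, 3
U = np.linalg.qr(rng.normal(size=(p, r)))[0]
U_hat = np.linalg.qr(U + 0.1 * rng.normal(size=(p, r)))[0]
print(subspace_error(U_hat, U))         # small when U_hat is close to U (up to rotation)
```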

  22. Tensor SVD: Strong SNR Case: Lower Bound
  Define the following class of low-rank tensors with signal strength λ:
    F_{p,r}(λ) = { X ∈ R^{p_1 × p_2 × p_3} : rank(X) = (r_1, r_2, r_3), σ_{r_k}(M_k(X)) ≥ λ }.
  Theorem (Lower Bound).
  • (Recovery of U_1, U_2, U_3)
    inf_{Ũ_k} sup_{X ∈ F_{p,r}(λ)} E min_{O ∈ O_{r_k}} ‖Ũ_k − U_k O‖_F ≥ c √(p_k r_k) / (λ/σ), k = 1, 2, 3.
  • (Recovery of X)
    inf_{X̂} sup_{X ∈ F_{p,r}(λ)} E ‖X̂ − X‖_F^2 ≥ c (p_1 r_1 + p_2 r_2 + p_3 r_3) σ^2,
    inf_{X̂} sup_{X ∈ F_{p,r}(λ)} E ‖X̂ − X‖_F^2 / ‖X‖_F^2 ≥ c (p_1 + p_2 + p_3) σ^2 / λ^2.
