SLIDE 1

Singular Value Decomposition for High-dimensional Tensor Data

Anru Zhang

Department of Statistics, University of Wisconsin-Madison

SLIDE 2

Introduction

  • Tensors are arrays with multiple directions (modes).
  • Tensors of order three or higher are called high-order tensors.

$$\mathbf{A} \in \mathbb{R}^{p_1 \times \cdots \times p_d}, \qquad \mathbf{A} = (A_{i_1 \cdots i_d}), \quad 1 \le i_k \le p_k, \ k = 1, \ldots, d.$$

SLIDE 3

More High-Order Data Are Emerging

  • Brain imaging
  • Microbiome studies
  • Matrix-valued time series

SLIDE 4

High Order Enables Solutions for Harder Problems

High-order Interaction Pursuits

  • Model (Hao, Zhang, and Cheng, 2018):

$$y_i = \beta_0 + \underbrace{\sum_{j} \beta_j X_{ij}}_{\text{main effects}} + \underbrace{\sum_{j,k} \gamma_{jk} X_{ij} X_{ik}}_{\text{pairwise interactions}} + \underbrace{\sum_{j,k,l} \eta_{jkl} X_{ij} X_{ik} X_{il}}_{\text{triple-wise interactions}} + \varepsilon_i, \quad i = 1, \ldots, n.$$

  • Collecting all coefficients into a tensor $\mathbf{B}$, the model can be rewritten as

$$y_i = \langle \mathbf{B}, \mathbf{X}_i \rangle + \varepsilon_i.$$
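
To make the rewriting concrete, here is a minimal numpy sketch (all sizes and the coefficient tensor are made up for illustration) checking that the triple-wise term equals a tensor inner product against the rank-1 tensor x ∘ x ∘ x:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 6                                  # number of covariates (hypothetical)
x = rng.normal(size=p)                 # covariates of one observation
eta = rng.normal(size=(p, p, p))       # triple-wise coefficient tensor

# Triple-wise interaction term written as an explicit triple sum ...
direct = sum(eta[j, k, l] * x[j] * x[k] * x[l]
             for j in range(p) for k in range(p) for l in range(p))

# ... equals the tensor inner product <eta, x o x o x>.
X_outer = np.einsum('j,k,l->jkl', x, x, x)   # rank-1 tensor x o x o x
inner = float(np.sum(eta * X_outer))

assert np.isclose(direct, inner)
```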

SLIDE 5

High Order Enables Solutions for Harder Problems

Estimation of Mixture Models

  • A mixture model incorporates subpopulations in an overall population.
  • Examples:
    ◮ Gaussian mixture models (Lindsay & Basak, 1993; Hsu & Kakade, 2013)
    ◮ Topic modeling (Arora et al., 2013)
    ◮ Hidden Markov processes (Anandkumar, Hsu, & Kakade, 2012)
    ◮ Independent component analysis (Miettinen et al., 2015)
    ◮ Additive index models (Balasubramanian, Fan & Yang, 2018)
    ◮ Mixture regression models (De Veaux, 1989; Jordan & Jacobs, 1994)
    ◮ ...
  • Method of Moments (MoM); see the sketch below:
    ◮ first moment → vector;
    ◮ second moment → matrix;
    ◮ high-order moments → high-order tensors.
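
A minimal numpy sketch of that last correspondence (sample size and dimension are arbitrary): the empirical third moment of a data set is already an order-3 tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.normal(size=(n, p))                    # n observations in R^p

m1 = X.mean(axis=0)                            # first moment:  vector, (p,)
m2 = np.einsum('ij,ik->jk', X, X) / n          # second moment: matrix, (p, p)
m3 = np.einsum('ij,ik,il->jkl', X, X, X) / n   # third moment:  tensor, (p, p, p)

print(m1.shape, m2.shape, m3.shape)            # (4,) (4, 4) (4, 4, 4)
```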

SLIDE 6

High Order is ...

  • High order is more charming!
  • High order is harder!

Tensor problems are far more than extensions of matrix problems:
  ◮ more structures;
  ◮ high dimensionality;
  ◮ computational difficulty;
  ◮ many concepts not well defined or NP-hard to compute.

SLIDE 7

High Order Casts New Problems and Challenges

  • Tensor Completion
  • Tensor SVD
  • Tensor Regression
  • Biclustering/Triclustering
  • ...

SLIDE 8

In this talk, we focus on tensor SVD.

SLIDE 9

Part I: Tensor SVD: Statistical and Computational Limits

SLIDE 10

SVD and PCA

  • Singular value decomposition (SVD) is one of the most important tools in multivariate analysis.
  • Goal: find the underlying low-rank structure in the data matrix.
  • Closely related to principal component analysis (PCA): find the one/multiple directions that explain most of the variance.
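
A minimal numpy sketch of this matrix task (all sizes hypothetical): a rank-r signal buried in noise is recovered by truncating the SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, r = 100, 80, 3

# Rank-r signal matrix plus Gaussian noise.
X = rng.normal(size=(p1, r)) @ rng.normal(size=(r, p2))
Y = X + 0.1 * rng.normal(size=(p1, p2))

# Truncated SVD: keep only the top-r singular triplets.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
X_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))  # small relative error
```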

SLIDE 11

Tensor SVD

  • We propose a general framework for tensor SVD:

$$\mathbf{Y} = \mathbf{X} + \mathbf{Z},$$

    where
    ◮ $\mathbf{Y} \in \mathbb{R}^{p_1 \times p_2 \times p_3}$ is the observation;
    ◮ $\mathbf{Z}$ is the noise of small amplitude;
    ◮ $\mathbf{X}$ is a low-rank tensor.

  • We wish to recover the high-dimensional low-rank structure $\mathbf{X}$.

→ Unfortunately, there is no uniform definition of tensor rank.

SLIDE 12

Tensor Rank Has No Uniform Definition

  • Canonical polyadic (CP) rank:

$$r_{\mathrm{cp}} = \min r \quad \text{s.t.} \quad \mathbf{X} = \sum_{i=1}^{r} \lambda_i \cdot u_i \circ v_i \circ w_i.$$

  • Tucker rank (sketched in code below):

$$\mathbf{X} = \mathbf{S} \times_1 U_1 \times_2 U_2 \times_3 U_3, \qquad \mathbf{S} \in \mathbb{R}^{r_1 \times r_2 \times r_3}, \ U_k \in \mathbb{R}^{p_k \times r_k}.$$

    The smallest possible $(r_1, r_2, r_3)$ is the Tucker rank of $\mathbf{X}$.

  • See Kolda and Bader (2009) for a comprehensive survey.

Picture source: Guoxu Zhou's website, http://www.bsp.brain.riken.jp/ zhougx/tensor.html
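
A minimal numpy sketch of the Tucker format (shapes hypothetical; `unfold` and `mode_product` are small helpers written here, not library calls): build X = S ×₁ U₁ ×₂ U₂ ×₃ U₃ and check that the rank of each mode-k unfolding matches r_k.

```python
import numpy as np

def mode_product(T, U, k):
    """Mode-k product T x_k U: multiply U into the k-th mode of T."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(T, k, 0), axes=(1, 0)), 0, k)

def unfold(T, k):
    """Mode-k matricization M_k(T), of shape (p_k, product of other dims)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

rng = np.random.default_rng(0)
p, r = (10, 12, 14), (2, 3, 4)
S = rng.normal(size=r)                                   # core tensor
Us = [rng.normal(size=(p[k], r[k])) for k in range(3)]   # loadings

X = S
for k in range(3):
    X = mode_product(X, Us[k], k)        # S x_1 U1 x_2 U2 x_3 U3

print([np.linalg.matrix_rank(unfold(X, k)) for k in range(3)])   # [2, 3, 4]
```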

SLIDE 13

Model

  • Observations: $\mathbf{Y} \in \mathbb{R}^{p_1 \times p_2 \times p_3}$,

$$\mathbf{Y} = \mathbf{X} + \mathbf{Z} = \mathbf{S} \times_1 U_1 \times_2 U_2 \times_3 U_3 + \mathbf{Z}, \qquad Z_{i_1 i_2 i_3} \overset{iid}{\sim} N(0, \sigma^2), \quad U_k \in \mathbb{O}_{p_k, r_k}, \ \mathbf{S} \in \mathbb{R}^{r_1 \times r_2 \times r_3}.$$

  • Goal: estimate $U_1, U_2, U_3$, and the original tensor $\mathbf{X}$.
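
A minimal sketch of drawing data from this model (sizes and noise level are hypothetical); the orthonormal loadings come from QR factors of Gaussian matrices, and the mode products are written as one einsum contraction so the block is self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, sigma = (30, 30, 30), (3, 3, 3), 0.5

# Orthonormal loadings U_k in O_{p_k, r_k} (Q factor of a Gaussian matrix).
U1, U2, U3 = (np.linalg.qr(rng.normal(size=(p[k], r[k])))[0] for k in range(3))
S = rng.normal(size=r)                       # core tensor

# X = S x_1 U1 x_2 U2 x_3 U3, written as a single contraction.
X = np.einsum('abc,ia,jb,kc->ijk', S, U1, U2, U3)
Y = X + sigma * rng.normal(size=p)           # observed noisy tensor
```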

SLIDE 14

Straightforward Idea 1: Higher-Order SVD (HOSVD)

  • Since $U_k$ spans the column space of $\mathcal{M}_k(\mathbf{X})$, let

$$\hat{U}_k = \mathrm{SVD}_{r_k}\big( \mathcal{M}_k(\mathbf{Y}) \big), \quad k = 1, 2, 3,$$

    i.e., the leading $r_k$ singular vectors of the matrix of all mode-k fibers (see the sketch below).

Note: $\mathrm{SVD}_r(\cdot)$ denotes the first $r$ left singular vectors of a given matrix.
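
A minimal numpy sketch of HOSVD under the model above (function names are mine; `unfold` is as before):

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization M_k(T)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def svdr(M, r):
    """SVD_r(M): the leading r left singular vectors of M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def hosvd(Y, ranks):
    """HOSVD: one truncated SVD of each unfolding of Y."""
    return [svdr(unfold(Y, k), r) for k, r in enumerate(ranks)]

# Example with Y from the generation sketch above:
# U1_hat, U2_hat, U3_hat = hosvd(Y, (3, 3, 3))
```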

SLIDE 15

Straightforward Idea 1: Higher-Order SVD (HOSVD)

(De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl., 2000a)

  • Advantage: easy to implement and analyze.
  • Disadvantage: performs sub-optimally.

Reason: simply unfolding the tensor fails to utilize the tensor structure!

SLIDE 16

Straightforward Idea 2: Maximum Likelihood Estimator

  • Maximum-likelihood estimator:

$$\big( \hat{U}_1^{\mathrm{mle}}, \hat{U}_2^{\mathrm{mle}}, \hat{U}_3^{\mathrm{mle}}, \hat{\mathbf{S}}^{\mathrm{mle}} \big) = \mathop{\mathrm{arg\,min}}_{U_1, U_2, U_3, \mathbf{S}} \big\| \mathbf{Y} - \mathbf{S} \times_1 U_1 \times_2 U_2 \times_3 U_3 \big\|_{\mathrm{F}}^2.$$

  • Equivalently, $\hat{U}_1^{\mathrm{mle}}, \hat{U}_2^{\mathrm{mle}}, \hat{U}_3^{\mathrm{mle}}$ can be calculated via

$$\max_{V_1, V_2, V_3} \big\| \mathbf{Y} \times_1 V_1^\top \times_2 V_2^\top \times_3 V_3^\top \big\|_{\mathrm{F}}^2 \quad \text{subject to} \quad V_k \in \mathbb{O}_{p_k, r_k}, \ k = 1, 2, 3.$$

  • Advantage: achieves statistical optimality (shown later).
  • Disadvantage:
    ◮ non-convex and computationally intractable;
    ◮ NP-hard to approximate even for r = 1 (Hillar and Lim, 2013).

SLIDE 17

Phase Transition in Tensor SVD

  • The difficulty is driven by the signal-to-noise ratio (SNR), computed in the sketch below:

$$\lambda = \min_{k=1,2,3} \sigma_{r_k}\big( \mathcal{M}_k(\mathbf{X}) \big) = \text{least non-zero singular value over the unfoldings } \mathcal{M}_k(\mathbf{X}), \qquad \sigma = \mathrm{SD}(Z) = \text{noise level}.$$

  • Suppose $p_1 \asymp p_2 \asymp p_3 \asymp p$. Three phases:

$$\lambda/\sigma \ge C p^{3/4} \ \text{(strong SNR)}, \qquad \lambda/\sigma < c p^{1/2} \ \text{(weak SNR)}, \qquad p^{1/2} \ll \lambda/\sigma \ll p^{3/4} \ \text{(moderate SNR)}.$$
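
A minimal sketch of computing λ/σ for a given low-rank tensor (continuing the hypothetical shapes used earlier; `unfold` as before):

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization M_k(T)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def snr(X, ranks, sigma):
    """lambda / sigma with lambda = min_k sigma_{r_k}(M_k(X))."""
    lam = min(np.linalg.svd(unfold(X, k), compute_uv=False)[r - 1]
              for k, r in enumerate(ranks))
    return lam / sigma

# With X, sigma from the generation sketch (p = 30, ranks (3, 3, 3)):
# compare snr(X, (3, 3, 3), sigma) against p**0.5 and p**0.75.
```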

SLIDE 18

Strong SNR Case: Methodology

  • When $\lambda/\sigma \ge C p^{3/4}$, apply higher-order orthogonal iteration (HOOI), implemented in the sketch below.

(De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl., 2000b)

  • (Step 1: spectral initialization)

$$\hat{U}_k^{(0)} = \mathrm{SVD}_{r_k}\big( \mathcal{M}_k(\mathbf{Y}) \big), \quad k = 1, 2, 3.$$

  • (Step 2: power iterations) Repeat: let $t = t + 1$ and calculate

$$\hat{U}_1^{(t)} = \mathrm{SVD}_{r_1}\Big( \mathcal{M}_1\big( \mathbf{Y} \times_2 (\hat{U}_2^{(t-1)})^\top \times_3 (\hat{U}_3^{(t-1)})^\top \big) \Big),$$
$$\hat{U}_2^{(t)} = \mathrm{SVD}_{r_2}\Big( \mathcal{M}_2\big( \mathbf{Y} \times_1 (\hat{U}_1^{(t)})^\top \times_3 (\hat{U}_3^{(t-1)})^\top \big) \Big),$$
$$\hat{U}_3^{(t)} = \mathrm{SVD}_{r_3}\Big( \mathcal{M}_3\big( \mathbf{Y} \times_1 (\hat{U}_1^{(t)})^\top \times_2 (\hat{U}_2^{(t)})^\top \big) \Big),$$

    until $t = t_{\max}$ or convergence.
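
A minimal numpy implementation of the two steps (the helper functions repeat the earlier sketches so this block runs on its own; the convergence test compares projection matrices, which is invariant to the rotation ambiguity in the loadings):

```python
import numpy as np

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_product(T, U, k):
    return np.moveaxis(np.tensordot(U, np.moveaxis(T, k, 0), axes=(1, 0)), 0, k)

def svdr(M, r):
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def hooi(Y, ranks, t_max=30, tol=1e-8):
    """Higher-order orthogonal iteration: spectral init + power iterations."""
    Us = [svdr(unfold(Y, k), r) for k, r in enumerate(ranks)]    # Step 1
    for _ in range(t_max):                                       # Step 2
        Us_old = [U.copy() for U in Us]
        for k in range(3):
            G = Y
            for j in range(3):
                if j != k:
                    G = mode_product(G, Us[j].T, j)   # project the other modes
            Us[k] = svdr(unfold(G, k), ranks[k])
        if all(np.linalg.norm(U @ U.T - V @ V.T) < tol
               for U, V in zip(Us, Us_old)):
            break                                     # subspaces converged
    return Us
```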

SLIDE 19

Interpretation

  • Step 1 (spectral initialization) provides a "warm start."
  • Step 2 (power iteration) refines the initializations.

    Given $\hat{U}_1^{(t-1)}, \hat{U}_2^{(t-1)}, \hat{U}_3^{(t-1)}$, denoise $\mathbf{Y}$ via

$$\mathbf{Y} \times_2 (\hat{U}_2^{(t-1)})^\top \times_3 (\hat{U}_3^{(t-1)})^\top.$$

    ◮ The mode-1 singular subspace is preserved;
    ◮ the noise is greatly reduced.

    Thus, we update

$$\hat{U}_1^{(t)} = \mathrm{SVD}_{r_1}\Big( \mathcal{M}_1\big( \mathbf{Y} \times_2 (\hat{U}_2^{(t-1)})^\top \times_3 (\hat{U}_3^{(t-1)})^\top \big) \Big).$$

SLIDE 20

Higher-Order Orthogonal Iteration (HOOI)

(De Lathauwer, De Moor, and Vandewalle, SIAM J. Matrix Anal. Appl., 2000b)

SLIDE 21

Strong SNR Case: Theoretical Analysis

Theorem (Upper Bound). Suppose $\lambda/\sigma > C p^{3/4}$ and other regularity conditions hold. Then after at most $O(\log(p/\lambda) \vee 1)$ iterations:

  • (Recovery of $U_1, U_2, U_3$)

$$\mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \hat{U}_k - U_k O \big\|_{\mathrm{F}} \le \frac{C \sqrt{p_k r_k}}{\lambda/\sigma}, \quad k = 1, 2, 3;$$

  • (Recovery of $\mathbf{X}$)

$$\sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \big\| \hat{\mathbf{X}} - \mathbf{X} \big\|_{\mathrm{F}}^2 \le C (p_1 r_1 + p_2 r_2 + p_3 r_3) \sigma^2,$$

$$\sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \frac{\| \hat{\mathbf{X}} - \mathbf{X} \|_{\mathrm{F}}^2}{\| \mathbf{X} \|_{\mathrm{F}}^2} \le \frac{C (p_1 + p_2 + p_3) \sigma^2}{\lambda^2}.$$

SLIDE 22

Strong SNR Case: Lower Bound

Define the following class of low-rank tensors with signal strength $\lambda$:

$$\mathcal{F}_{p,r}(\lambda) = \Big\{ \mathbf{X} \in \mathbb{R}^{p_1 \times p_2 \times p_3} : \mathrm{rank}(\mathbf{X}) = (r_1, r_2, r_3), \ \sigma_{r_k}\big( \mathcal{M}_k(\mathbf{X}) \big) \ge \lambda \Big\}.$$

Theorem (Lower Bound).

(Recovery of $U_1, U_2, U_3$)

$$\inf_{\tilde{U}_k} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \tilde{U}_k - U_k O \big\|_{\mathrm{F}} \ge \frac{c \sqrt{p_k r_k}}{\lambda/\sigma}, \quad k = 1, 2, 3.$$

(Recovery of $\mathbf{X}$)

$$\inf_{\hat{\mathbf{X}}} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \big\| \hat{\mathbf{X}} - \mathbf{X} \big\|_{\mathrm{F}}^2 \ge c (p_1 r_1 + p_2 r_2 + p_3 r_3) \sigma^2,$$

$$\inf_{\hat{\mathbf{X}}} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \frac{\| \hat{\mathbf{X}} - \mathbf{X} \|_{\mathrm{F}}^2}{\| \mathbf{X} \|_{\mathrm{F}}^2} \ge \frac{c (p_1 + p_2 + p_3) \sigma^2}{\lambda^2}.$$

SLIDE 23

HOSVD vs. HOOI

$$\mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \hat{U}_k^{\mathrm{HOSVD}} - U_k O \big\|_{\mathrm{F}} \asymp \frac{\sqrt{p_k r_k}}{\lambda/\sigma} + \frac{\sqrt{p_1 p_2 p_3 r_k}}{(\lambda/\sigma)^2}; \qquad \mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \hat{U}_k^{\mathrm{HOOI}} - U_k O \big\|_{\mathrm{F}} \asymp \frac{\sqrt{p_k r_k}}{\lambda/\sigma}.$$

  • When $\lambda/\sigma \le c p$, HOOI significantly improves upon HOSVD.
  • The analysis for rank-r tensor SVD is more difficult than for both rank-1 tensor SVD and rank-r matrix SVD:
    ◮ many concepts (e.g., singular values) are not well defined for tensors.

SLIDE 24

Weak SNR Case

In the weak SNR case $\lambda/\sigma < c p^{1/2}$, neither $U_1, U_2, U_3$ nor $\mathbf{X}$ can be stably estimated in general.

Theorem.

(Recovery of $U_1, U_2, U_3$)

$$\inf_{\hat{U}_k} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \min_{O \in \mathbb{O}_{r_k}} r_k^{-1/2} \big\| \hat{U}_k - U_k O \big\|_{\mathrm{F}} \ge c, \quad k = 1, 2, 3.$$

(Recovery of $\mathbf{X}$)

$$\inf_{\hat{\mathbf{X}}} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \frac{\| \hat{\mathbf{X}} - \mathbf{X} \|_{\mathrm{F}}^2}{\| \mathbf{X} \|_{\mathrm{F}}^2} \ge c.$$

SLIDE 25

Moderate SNR Case

  • Recall that the SNR $\lambda/\sigma$ measures the problem difficulty:

$$\lambda = \min_{k=1,2,3} \sigma_{r_k}\big( \mathcal{M}_k(\mathbf{X}) \big), \qquad \sigma = \mathrm{SD}(Z).$$

  • In the moderate SNR case $C p^{1/2} \le \lambda/\sigma \le c p^{3/4}$, there exists a gap between computational and statistical optimality.

SLIDE 26

Moderate SNR Case: Statistical Optimality

  • First, the MLE achieves statistical optimality.

Theorem (Performance of the MLE). When $\lambda/\sigma \ge C p^{1/2}$:

  ◮ (Recovery of $U_1, U_2, U_3$)

$$\sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \hat{U}_k^{\mathrm{mle}} - U_k O \big\|_{\mathrm{F}} \le \frac{C \sqrt{p_k r_k}}{\lambda/\sigma}, \quad k = 1, 2, 3;$$

  ◮ (Recovery of $\mathbf{X}$)

$$\sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \big\| \hat{\mathbf{X}}^{\mathrm{mle}} - \mathbf{X} \big\|_{\mathrm{F}}^2 \le C (p_1 r_1 + p_2 r_2 + p_3 r_3) \sigma^2,$$

$$\sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \frac{\| \hat{\mathbf{X}}^{\mathrm{mle}} - \mathbf{X} \|_{\mathrm{F}}^2}{\| \mathbf{X} \|_{\mathrm{F}}^2} \le \frac{C (p_1 + p_2 + p_3) \sigma^2}{\lambda^2}.$$

  • However, the MLE is computationally intractable.

SLIDE 27

Simulation Analysis

  • Consider random settings: $\lambda = p^{\alpha}$, $\alpha \in [0.4, 0.9]$, $\sigma = 1$.

[Figure: estimation error $\ell_\infty(\hat{U})$ versus $\alpha$ for p = 50, 80, 100, comparing warm-start and spectral-start initializations.]

  • Two phase transitions:
    ◮ the computationally inefficient method performs well starting at $\lambda/\sigma \approx p^{1/2}$;
    ◮ the computationally efficient HOOI performs well starting at $\lambda/\sigma \approx p^{3/4}$.

SLIDE 28

Moderate SNR Case: Computational Optimality

Moreover, the following theorem shows the computational hardness for polynomial-time algorithms under moderate SNR.

Theorem. Assume the hypergraphic planted clique conjecture holds and $\lambda/\sigma = O\big( p^{3(1-\tau)/4} \big)$ for any $\tau > 0$. Then for any polynomial-time algorithm $(\hat{U}_1, \hat{U}_2, \hat{U}_3, \hat{\mathbf{X}})$:

(Recovery of $U_1, U_2, U_3$)

$$\liminf_{p \to \infty} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \big\| \sin \Theta\big( \hat{U}_k^{(p)}, U_k \big) \big\|^2 \ge c_1, \quad k = 1, 2, 3;$$

(Recovery of $\mathbf{X}$)

$$\liminf_{p \to \infty} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(\lambda)} \mathbb{E} \frac{\| \hat{\mathbf{X}}^{(p)} - \mathbf{X} \|_{\mathrm{F}}^2}{\| \mathbf{X} \|_{\mathrm{F}}^2} \ge c_1.$$

SLIDE 29

Remarks

  • The analysis relies on the hypergraphic planted clique detection assumption.
  • The result shows the hardness of tensor SVD in the moderate SNR case.
  • More recently, Ben Arous, Mei, Montanari, and Nica (2017) analyzed the landscape of the rank-1 spiked tensor model:
    ◮ the MLE has exponentially many critical points.

SLIDE 30

Summary

Tensor SVD exhibits three phases:

  • (Strong SNR) $\lambda/\sigma \ge C p^{3/4}$
    → there is an efficient algorithm to estimate $U_1, U_2, U_3$, and $\mathbf{X}$.
  • (Weak SNR) $\lambda/\sigma < c p^{1/2}$
    → no algorithm can stably recover $U_1, U_2, U_3$, or $\mathbf{X}$.
  • (Moderate SNR) $p^{1/2} \ll \lambda/\sigma \ll p^{3/4}$
    ◮ the non-convex MLE stably recovers $U_1, U_2, U_3$, and $\mathbf{X}$;
    ◮ possibly no polynomial-time algorithm performs stably.

SLIDE 31

Further Generalization to Order-d Tensors

  • The results can be generalized to order-d tensors.
  • Three phases:
    ◮ (Strong SNR) $\lambda/\sigma \ge C p^{d/4}$ → an efficient algorithm exists.
    ◮ (Weak SNR) $\lambda/\sigma < c p^{1/2}$ → no algorithm can recover stably.
    ◮ (Moderate SNR) $p^{1/2} \ll \lambda/\sigma \ll p^{d/4}$
      ⋆ an inefficient algorithm exists;
      ⋆ possibly no polynomial-time algorithm performs stably.
  • Remark:
    ◮ d = 2 (matrix SVD): the gap between computation and statistics closes.
    ◮ d ≥ 3: tensor SVD comes with not only statistical but also computational challenges.

SLIDE 32

Part II: Sparse Tensor SVD

SLIDE 33

Limitations of the Tensor SVD Model

  • Higher-order orthogonal iteration (HOOI) is both efficient and minimax-optimal:

$$\inf_{\tilde{U}_k} \sup_{\mathbf{X}} \mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \tilde{U}_k - U_k O \big\|_{\mathrm{F}} \asymp \frac{\sqrt{p_k r_k}}{\lambda/\sigma}.$$

  • Still, the problem is not completely solved by HOOI!
  • Pitfalls:
    1. The SNR requirement $\lambda/\sigma \ge p^{d/4}$
       → is necessary without further conditions;
       → may be too stringent for high-dimensional data.
    2. HOOI is suboptimal when the tensor data satisfy structural assumptions.
       → Sparsity commonly appears in high-dimensional applications.

SLIDE 34

Sparsity may occur in only part of the modes (directions).

  • Motivating example: an electroencephalogram (EEG) dataset records brain electrical activity as Subject × Electrode × Time.
    1. Data are likely to be dense along the subject mode;
    2. data along the electrode mode may be sparse;
    3. data along the time mode are possibly sparse after transformation.

Figure: illustration of electroencephalography (source: Wikipedia).

SLIDE 35

Sparse Tensor SVD Model

$$\mathbf{Y} = \mathbf{X} + \mathbf{Z} = \mathbf{S} \times_1 U_1 \times_2 \cdots \times_d U_d + \mathbf{Z},$$

  • $\mathbf{Y} \in \mathbb{R}^{p_1 \times \cdots \times p_d}$ is the observation;
  • $\mathbf{Z}$ is the noise of small amplitude;
  • $\mathbf{X}$ is the sparse low-rank tensor;
  • loadings: $U_k \in \mathbb{R}^{p_k \times r_k}$.

A subset of modes $J_s \subseteq [d]$ satisfies row-wise sparsity (illustrated in the sketch below):

$$\| U_k \|_0 = \sum_{i=1}^{p_k} 1\big\{ U_{k,[i,:]} \ne 0 \big\} \le s_k, \quad s_k \ll p_k, \ k \in J_s; \qquad s_k = p_k, \ k \notin J_s.$$
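
A minimal sketch of this row-wise sparsity pattern (sizes hypothetical): a loading matrix whose nonzero rows are confined to a small support set.

```python
import numpy as np

rng = np.random.default_rng(0)
p_k, r_k, s_k = 50, 3, 10

# Row-sparse loading: only s_k of the p_k rows are nonzero.
U_k = np.zeros((p_k, r_k))
support = rng.choice(p_k, size=s_k, replace=False)
U_k[support] = rng.normal(size=(s_k, r_k))

# ||U_k||_0 in the slide's notation: the number of nonzero rows.
row_l0 = np.count_nonzero(np.linalg.norm(U_k, axis=1))
assert row_l0 <= s_k
```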

SLIDE 36

A Specific Setting of the Sparse Tensor SVD Model

$$\mathbf{Y} = \mathbf{X} + \mathbf{Z} = \mathbf{S} \times_1 U_1 \times_2 U_2 \times_3 U_3 + \mathbf{Z}, \qquad Z_{i_1 i_2 i_3} \overset{iid}{\sim} N(0, \sigma^2), \quad \mathbf{S} \in \mathbb{R}^{r \times r \times r}, \quad J_s = \{1, 3\},$$

$$U_k \in \mathbb{O}_{p,r}, \qquad \| U_1 \|_0 \le s, \quad \| U_3 \|_0 \le s, \quad \| U_2 \|_0 \le p.$$

  • Goal: estimate $U_1, U_2, U_3$, and $\mathbf{X}$.

SLIDE 37

Straightforward Ideas

  • Penalized MLE:

$$\min_{U_1, U_2, U_3, \mathbf{S}} \big\| \mathbf{Y} - \mathbf{S} \times_1 U_1 \times_2 U_2 \times_3 U_3 \big\|_{\mathrm{F}}^2 + \lambda \| U_1 \|_1 + \lambda \| U_3 \|_1$$

    → computationally difficult.

  • Higher-order orthogonal iteration (HOOI) and higher-order SVD (HOSVD):
    → ignore the sparsity patterns.

  • S-HOOI and S-HOSVD: in each update of HOOI or HOSVD, apply a matrix sparse SVD (Lee, Shen, Huang, and Marron, 2010; Yang, Ma, and Buja, 2014, 2016):
    → ignore the tensor structure.

SLIDE 38

Methodology

Step 1: Initialization (sketched in code below)

  • (Support initialization) Select the index sets

$$\hat{I}_k^{(0)} = \Big\{ i_k : \big\| \mathbf{Y}_{[\cdots i_k \cdots]} \big\|_2^2 \ge \lambda_1 \ \text{or} \ \big\| \mathbf{Y}_{[\cdots i_k \cdots]} \big\|_\infty \ge \lambda_2 \Big\}, \quad k = 1, 3,$$

    where $\lambda_1 = \sigma^2 \big( p^2 + 2\sqrt{p^2 \log p} + 2 \log p \big)$ and $\lambda_2 = 2\sigma \sqrt{\log(p^2)}$.

  • (Singular subspace initialization) Construct the screened tensor

$$\tilde{\mathbf{Y}}_{[i_1, i_2, i_3]} = \begin{cases} \mathbf{Y}_{[i_1, i_2, i_3]}, & i_1 \in \hat{I}_1^{(0)}, \ i_3 \in \hat{I}_3^{(0)}, \\ 0, & \text{otherwise}, \end{cases}$$

    and initialize

$$\hat{U}_k^{(0)} = \mathrm{SVD}_r\big( \mathcal{M}_k(\tilde{\mathbf{Y}}) \big), \quad k = 1, 2, 3.$$
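
A minimal sketch of the support initialization, assuming a cubic tensor (p1 = p2 = p3 = p) as in the specific setting above (function names are mine; `unfold` as before):

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization M_k(T); row i collects the i-th mode-k slice."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def support_init(Y, sigma, sparse_modes=(0, 2)):
    """Screen mode-k slices by their squared l2 norm and sup norm."""
    p = Y.shape[0]
    lam1 = sigma**2 * (p**2 + 2 * np.sqrt(p**2 * np.log(p)) + 2 * np.log(p))
    lam2 = 2 * sigma * np.sqrt(np.log(p**2))
    hat_I = {}
    for k in sparse_modes:
        Mk = unfold(Y, k)                     # each row is one mode-k slice
        keep = ((np.sum(Mk**2, axis=1) >= lam1)
                | (np.max(np.abs(Mk), axis=1) >= lam2))
        hat_I[k] = np.flatnonzero(keep)
    return hat_I

def screened_tensor(Y, hat_I):
    """Zero out all slices outside the selected supports (modes 1 and 3)."""
    Y_tilde = np.zeros_like(Y)
    ix = np.ix_(hat_I[0], np.arange(Y.shape[1]), hat_I[2])
    Y_tilde[ix] = Y[ix]
    return Y_tilde
```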

SLIDE 39

Methodology: Initialization

  • $\hat{U}_1^{(0)}, \hat{U}_2^{(0)}, \hat{U}_3^{(0)}$ provide convenient initial estimates of $U_1, U_2, U_3$.

SLIDE 40

Methodology: Iterative Updates

Step 2: Alternating Updates

  • For $t = 0, 1, \ldots$, perform alternating updates:

$$\hat{U}_1^{(t)} \to \hat{U}_1^{(t+1)} \ \text{using} \ \mathbf{Y}, \hat{U}_2^{(t)}, \hat{U}_3^{(t)};$$
$$\hat{U}_2^{(t)} \to \hat{U}_2^{(t+1)} \ \text{using} \ \mathbf{Y}, \hat{U}_1^{(t+1)}, \hat{U}_3^{(t)};$$
$$\hat{U}_3^{(t)} \to \hat{U}_3^{(t+1)} \ \text{using} \ \mathbf{Y}, \hat{U}_1^{(t+1)}, \hat{U}_2^{(t+1)}.$$

  • Two scenarios: non-sparse modes $k \notin J_s$ and sparse modes $k \in J_s$.

SLIDE 41

Step 2(a): Update for Non-sparse Modes

  • When $k \notin J_s$, for example $k = 2$, calculate

$$A_2^{(t)} = \mathcal{M}_2\big( \mathbf{Y} \times_1 (\hat{U}_1^{(t+1)})^\top \times_3 (\hat{U}_3^{(t)})^\top \big) \in \mathbb{R}^{p \times r^2}, \qquad \hat{U}_2^{(t+1)} = \mathrm{SVD}_r\big( A_2^{(t)} \big) \in \mathbb{O}_{p,r}.$$

  • This update is the same as in HOOI.

SLIDE 42

Step 2(b): Update for Sparse Modes: Double Projection and Thresholding

  • When $k \in J_s$, for example $k = 1$, proceed as follows (see the sketch below):

    (i) (First projection) $A_1^{(t)} = \mathcal{M}_1\big( \mathbf{Y} \times_2 (\hat{U}_2^{(t)})^\top \times_3 (\hat{U}_3^{(t)})^\top \big)$.
    (ii) (First thresholding) $B_{1,[i,:]}^{(t)} = A_{1,[i,:]}^{(t)} \, 1\big\{ \| A_{1,[i,:]}^{(t)} \|_2^2 \ge \eta \big\}$.
    (iii) (Second projection) $\bar{A}_1^{(t)} = B_1^{(t)} \hat{V}_1^{(t)}$, where $\hat{V}_1^{(t)}$ collects the leading $r$ right singular vectors of $B_1^{(t)}$.
    (iv) (Second thresholding) $\bar{B}_{1,[i,:]}^{(t)} = \bar{A}_{1,[i,:]}^{(t)} \, 1\big\{ \| \bar{A}_{1,[i,:]}^{(t)} \|_2^2 \ge \bar{\eta} \big\}$.
    (v) (Orthogonalization) Apply the QR decomposition to $\bar{B}_1^{(t)}$ and assign the Q part to $\hat{U}_1^{(t+1)}$.
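
A minimal sketch of steps (i)-(v) for mode 1, assuming a cubic tensor and treating the thresholds `eta` and `eta_bar` as tuning parameters supplied by the caller (`unfold`/`mode_product` as before):

```python
import numpy as np

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_product(T, U, k):
    return np.moveaxis(np.tensordot(U, np.moveaxis(T, k, 0), axes=(1, 0)), 0, k)

def sparse_mode_update(Y, U2, U3, r, eta, eta_bar):
    """Step 2(b) for mode 1: double projection and row-wise thresholding."""
    # (i) First projection: compress modes 2 and 3, then unfold along mode 1.
    A1 = unfold(mode_product(mode_product(Y, U2.T, 1), U3.T, 2), 0)
    # (ii) First thresholding: keep rows with large squared l2 norm.
    B1 = A1 * (np.sum(A1**2, axis=1, keepdims=True) >= eta)
    # (iii) Second projection onto the leading r right singular vectors of B1.
    V1 = np.linalg.svd(B1, full_matrices=False)[2][:r].T
    A1_bar = B1 @ V1
    # (iv) Second thresholding on the projected rows.
    B1_bar = A1_bar * (np.sum(A1_bar**2, axis=1, keepdims=True) >= eta_bar)
    # (v) Orthogonalization: the Q part of the QR decomposition.
    return np.linalg.qr(B1_bar)[0]
```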

SLIDE 43

Methodology: Iterative Updates

SLIDE 44

Methodology: Final Estimation

Step 3: Final Estimation

  • Break from the iterative loop once
    1. the maximum number of iterations is reached, or
    2. convergence occurs.
  • Obtain $\hat{U}_1, \hat{U}_2, \hat{U}_3$.
  • Estimate $\mathbf{X}$ by

$$\hat{\mathbf{X}} = \mathbf{Y} \times_1 P_{\hat{U}_1} \times_2 P_{\hat{U}_2} \times_3 P_{\hat{U}_3}.$$
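
A minimal sketch of this final step (`mode_product` as before), where $P_U = U U^\top$ is the orthogonal projector onto the column space of $U$:

```python
import numpy as np

def mode_product(T, U, k):
    return np.moveaxis(np.tensordot(U, np.moveaxis(T, k, 0), axes=(1, 0)), 0, k)

def final_estimate(Y, U_hats):
    """X_hat = Y x_1 P_U1 x_2 P_U2 x_3 P_U3 with P_U = U @ U.T."""
    X_hat = Y
    for k, U in enumerate(U_hats):
        X_hat = mode_product(X_hat, U @ U.T, k)   # project mode k onto col(U)
    return X_hat
```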

SLIDE 45

Remarks

Sparse Tensor Alternating Thresholding SVD (STAT-SVD)

  • Why so complicated, especially in Step 2(b)?
    ◮ In each step, we need to truncate after an appropriate projection.
    ◮ Double projection and thresholding ensure better statistical accuracy.
    ◮ Analogy: tumor surgery.

SLIDE 46

Theoretical Analysis

Assume

$$\lambda_k = \sigma_{\min}\big( \mathcal{M}_k(\mathbf{X}) \big) \ge C\sigma \sqrt{ \Big( \prod_k s_k \Big) \log p \ \vee\ \max_k s_k r_k \ \vee\ \frac{r_1 \cdots r_d}{\min_k r_k} }. \tag{1}$$

Theorem (Upper Bound). Under (1), after at most a logarithmic number of iterations, STAT-SVD yields, with high probability,

$$\big\| \hat{\mathbf{X}} - \mathbf{X} \big\|_{\mathrm{F}}^2 \le C\sigma^2 \Big( r_1 \cdots r_d + \sum_k s_k r_k + \sum_{k \in J_s} s_k \log p_k \Big),$$

$$\min_{O \in \mathbb{O}_{r_k}} \big\| \hat{U}_k - U_k O \big\|_{\mathrm{F}} \le \begin{cases} C\sigma \big( \sqrt{s_k r_k} + \sqrt{s_k \log p_k} \big) / \lambda_k, & k \in J_s, \\ C\sigma \sqrt{s_k r_k} / \lambda_k, & k \notin J_s. \end{cases}$$

SLIDE 47

Remark

Error bound:

$$\big\| \hat{\mathbf{X}} - \mathbf{X} \big\|_{\mathrm{F}}^2 \le C\sigma^2 \Big( r_1 \cdots r_d + \sum_k s_k r_k + \sum_{k \in J_s} s_k \log p_k \Big),$$

where
  • $\sigma^2 r_1 \cdots r_d$: the complexity of estimating the core tensor;
  • $\sigma^2 s_k r_k$: the complexity of estimating the values of the loadings;
  • $\sigma^2 s_k \log p_k$: the complexity of estimating the support of the loadings
    → present only for the sparse modes $k \in J_s$.

SNR assumption:

$$\lambda/\sigma \ge C \sqrt{ \Big( \prod_k s_k \Big) \log p \ \vee\ \max_k s_k r_k \ \vee\ \frac{r_1 \cdots r_d}{\min_k r_k} }.$$

  • $p$ appears only in logarithms.

SLIDE 48

Theoretical Analysis

We define the following class of sparse and low-rank tensors:

$$\mathcal{F}_{p,r}(s, \lambda) = \Big\{ \mathbf{X} \in \mathbb{R}^{p_1 \times \cdots \times p_d} : \mathrm{rank}(\mathbf{X}) \le (r_1, \ldots, r_d); \ \sigma_{r_k}\big( \mathcal{M}_k(\mathbf{X}) \big) \ge \lambda_k; \ \| U_k \|_0 \le s_k \Big\}.$$

Theorem (Lower Bound). Suppose $p_k \ge s_k \ge r_k$ and $r_{-k} \ge 4 r_k$. Then

$$\inf_{\hat{\mathbf{X}}} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(s,\lambda)} \mathbb{E} \big\| \hat{\mathbf{X}} - \mathbf{X} \big\|_{\mathrm{F}}^2 \ge c\sigma^2 \Big( r_1 \cdots r_d + \sum_k s_k r_k + \sum_{k \in J_s} s_k \log p_k \Big),$$

$$\inf_{\hat{U}_k} \sup_{\mathbf{X} \in \mathcal{F}_{p,r}(s,\lambda)} \mathbb{E} \min_{O \in \mathbb{O}_{r_k}} \big\| \hat{U}_k - U_k O \big\|_{\mathrm{F}} \ge \begin{cases} c\sigma \big( \sqrt{s_k r_k} + \sqrt{s_k \log(p_k/s_k)} \big) / \lambda_k, & k \in J_s; \\ c\sigma \sqrt{s_k r_k} / \lambda_k, & k \notin J_s. \end{cases}$$

SLIDE 49

Simulation Study

  • p = 50, s = 10, r = 5.
  • STAT-SVD outperforms HOOI, HOSVD, S-HOOI, and S-HOSVD.

SLIDE 50

Simulation Study 2

  • p = 50, r = 5, J_s = {1, 2}, s_1 = s_2 = 10, s_3 = 50.
  • Mode 3 is non-sparse, but STAT-SVD still outperforms the other methods.
    → The three modes of a tensor act as a whole: exploiting sparsity in two modes also helps the non-sparse mode.

SLIDE 51

Simulation Study 3

  • r = 5, s = 10, and p grows.
  • We record the running time of each method.
  • STAT-SVD is fast.

SLIDE 52

Summary

  • We propose a general framework for sparse tensor SVD and an efficient algorithm, STAT-SVD.
  • STAT-SVD achieves
    ◮ the optimal rate of convergence;
    ◮ good numerical performance.
  • Applications: longitudinal data, EEG data, molecular tomography, ...
  • Further questions:
    ◮ All results are based on a strong-SNR assumption.
      → What if the SNR is not strong?
      → Is there a phase-transition effect in the sparse tensor SVD model?

SLIDE 53

References

  • Zhang, A. and Xia, D. (2018). Tensor SVD: Statistical and Computational Limits. IEEE Transactions on Information Theory, to appear.
  • Zhang, A. and Han, R. (2018). Optimal Denoising and Singular Value Decomposition for Sparse High-dimensional High-order Data. Journal of the American Statistical Association, to appear.
  • Cai, T. and Zhang, A. (2018). Rate-Optimal Perturbation Bounds for Singular Subspaces with Applications to High-Dimensional Statistics. Annals of Statistics, to appear.