1. Low Rank Approximation, Lecture 7
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch

2. Tensor Train (TT) decomposition

A tensor $\mathcal{X}$ is in TT decomposition if it can be written as

$$\mathcal{X}(i_1, \dots, i_d) = \sum_{k_1=1}^{r_1} \cdots \sum_{k_{d-1}=1}^{r_{d-1}} U_1(1, i_1, k_1)\, U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1).$$

◮ The smallest possible tuple $(r_1, \dots, r_{d-1})$ is called the TT rank of $\mathcal{X}$.
◮ $U_\mu \in \mathbb{R}^{r_{\mu-1} \times n_\mu \times r_\mu}$ (formally set $r_0 = r_d = 1$) are called the TT cores for $\mu = 1, \dots, d$.
◮ If the TT ranks are not large ⇝ high compression ratio as $d$ grows.
◮ The TT decomposition is multilinear with respect to the cores.
◮ The TT decomposition connects to
  ◮ matrix products ⇝ Matrix Product States (MPS) in physics (see [Grasedyck/Kressner/Tobler'2013] for references)
  ◮ simultaneous matrix factorizations ⇝ SVD-based compression
  ◮ contractions and tensor network diagrams ⇝ design of efficient contraction-based algorithms
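As a numerical illustration (not part of the slides), the defining sum can be checked directly with NumPy; the mode sizes, ranks, and seed below are arbitrary choices for the example.

```python
# Sketch (not from the slides): evaluate one entry of a TT tensor directly
# from the defining sum and compare against a full contraction of the cores.
import numpy as np

rng = np.random.default_rng(0)
n = [3, 4, 2]                 # mode sizes n_1, n_2, n_3 (d = 3)
r = [1, 2, 3, 1]              # TT ranks, with r_0 = r_3 = 1
# TT cores U_mu of shape (r_{mu-1}, n_mu, r_mu)
cores = [rng.standard_normal((r[m], n[m], r[m + 1])) for m in range(3)]

def tt_entry_sum(cores, idx):
    """X(i1, i2, i3) = sum_{k1, k2} U1(1, i1, k1) U2(k1, i2, k2) U3(k2, i3, 1)."""
    i1, i2, i3 = idx
    total = 0.0
    for k1 in range(cores[0].shape[2]):
        for k2 in range(cores[1].shape[2]):
            total += (cores[0][0, i1, k1]
                      * cores[1][k1, i2, k2]
                      * cores[2][k2, i3, 0])
    return total

# Full tensor via one contraction (O(n1 n2 n3) storage -- only for checking,
# this defeats the purpose of the TT format).
X = np.einsum('aib,bjc,ckd->ijk', *cores)
```

Storing the cores costs only $\sum_\mu r_{\mu-1} n_\mu r_\mu$ numbers, which is the source of the compression noted above.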

3. TT decomposition and matrix products

$$\mathcal{X}(i_1, \dots, i_d) = \sum_{k_1=1}^{r_1} \cdots \sum_{k_{d-1}=1}^{r_{d-1}} U_1(1, i_1, k_1)\, U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1).$$

Let $U_\mu(i_\mu)$ be the $i_\mu$th slice of the $\mu$th core: $U_\mu(i_\mu) := U_\mu(:, i_\mu, :) \in \mathbb{R}^{r_{\mu-1} \times r_\mu}$. Then

$$\mathcal{X}(i_1, i_2, \dots, i_d) = U_1(i_1)\, U_2(i_2) \cdots U_d(i_d).$$

Remark: The error analysis of matrix multiplication [Higham'2002] shows that the TT decomposition may suffer from numerical instabilities if

$$\|U_1(i_1)\|_2\, \|U_2(i_2)\|_2 \cdots \|U_d(i_d)\|_2 \gg |\mathcal{X}(i_1, i_2, \dots, i_d)|.$$

See [Bachmayr/Kazeev: arXiv:1802.09062] for more details.
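The slice-product formula can be verified in the same spirit; this is a sketch with made-up sizes, not code from the lecture.

```python
# Sketch: X(i1, ..., id) as a chain of small matrix products of core slices
# U_mu(i_mu) = U_mu[:, i_mu, :]. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = [3, 4, 2, 3]
r = [1, 2, 3, 2, 1]           # r_0 = r_4 = 1
cores = [rng.standard_normal((r[m], n[m], r[m + 1])) for m in range(4)]

def tt_entry(cores, idx):
    """Product (1 x r_1)(r_1 x r_2)...(r_{d-1} x 1) -> scalar."""
    M = np.eye(1)
    for U, i in zip(cores, idx):
        M = M @ U[:, i, :]    # contract over the shared rank index
    return M[0, 0]

# Full tensor for comparison (checking only).
X = np.einsum('aib,bjc,ckd,dle->ijkl', *cores)
```

Each entry costs $O(d\, r^2)$ operations for uniform ranks $r$, instead of the $O(r^{d-1})$ of the naive nested sum.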

4. TT decomposition and matrix factorizations

$$\mathcal{X}(i_1, \dots, i_d) = \sum_{k_1, k_2, \dots, k_{d-1}} U_1(1, i_1, k_1)\, U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1).$$

For any $1 \le \mu \le d-1$, group the first $\mu$ factors and the last $d-\mu$ factors together:

$$\mathcal{X}(i_1, \dots, i_\mu, i_{\mu+1}, \dots, i_d) = \sum_{k_\mu=1}^{r_\mu} \Big( \sum_{k_1, \dots, k_{\mu-1}} U_1(1, i_1, k_1) \cdots U_\mu(k_{\mu-1}, i_\mu, k_\mu) \Big) \cdot \Big( \sum_{k_{\mu+1}, \dots, k_{d-1}} U_{\mu+1}(k_\mu, i_{\mu+1}, k_{\mu+1}) \cdots U_d(k_{d-1}, i_d, 1) \Big)$$

This can be interpreted as a matrix-matrix product of two (large) matrices!

5. TT decomposition and matrix factorizations

The $\mu$th unfolding of $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$ is obtained by arranging the entries in a matrix

$$X^{<\mu>} \in \mathbb{R}^{(n_1 n_2 \cdots n_\mu) \times (n_{\mu+1} \cdots n_d)},$$

where the corresponding index map is given by

$$\iota : \mathbb{R}^{n_1 \times \cdots \times n_d} \to \mathbb{R}^{n_1 \cdots n_\mu} \times \mathbb{R}^{n_{\mu+1} \cdots n_d}, \qquad \iota(i_1, \dots, i_d) = (i_{\mathrm{row}}, i_{\mathrm{col}}),$$

$$i_{\mathrm{row}} = 1 + \sum_{\nu=1}^{\mu} (i_\nu - 1) \prod_{\tau=1}^{\nu-1} n_\tau, \qquad i_{\mathrm{col}} = 1 + \sum_{\nu=\mu+1}^{d} (i_\nu - 1) \prod_{\tau=\mu+1}^{\nu-1} n_\tau.$$
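In NumPy the index map above corresponds to a column-major (Fortran-order) reshape, since the first index varies fastest. A sketch with illustrative sizes (indices 0-based here, unlike the 1-based slide notation):

```python
# Sketch: the mu-th unfolding X^<mu> as a Fortran-order reshape, matching the
# index map on the slide (shifted to 0-based indexing).
import numpy as np

rng = np.random.default_rng(2)
n = (3, 4, 2, 5)
X = rng.standard_normal(n)

def unfolding(X, mu):
    """X^<mu> of shape (n_1 ... n_mu) x (n_{mu+1} ... n_d)."""
    return X.reshape(int(np.prod(X.shape[:mu])), -1, order='F')

# 0-based index map for mu = 2:
#   i_row = i_1 + i_2 * n_1,   i_col = i_3 + i_4 * n_3
i = (2, 1, 0, 3)
i_row = i[0] + i[1] * n[0]
i_col = i[2] + i[3] * n[2]
```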

6. TT decomposition and matrix factorizations

Define interface matrices $X_{\le \mu} \in \mathbb{R}^{n_1 n_2 \cdots n_\mu \times r_\mu}$ and $X_{\ge \mu+1} \in \mathbb{R}^{r_\mu \times n_{\mu+1} n_{\mu+2} \cdots n_d}$ as

$$X_{\le \mu}(i_{\mathrm{row}}, j) = \sum_{k_1, \dots, k_{\mu-1}} U_1(1, i_1, k_1) \cdots U_{\mu-1}(k_{\mu-2}, i_{\mu-1}, k_{\mu-1})\, U_\mu(k_{\mu-1}, i_\mu, j)$$

$$X_{\ge \mu+1}(j, i_{\mathrm{col}}) = \sum_{k_{\mu+1}, \dots, k_{d-1}} U_{\mu+1}(j, i_{\mu+1}, k_{\mu+1})\, U_{\mu+2}(k_{\mu+1}, i_{\mu+2}, k_{\mu+2}) \cdots U_d(k_{d-1}, i_d, 1)$$

Matrix factorizations:

$$X^{<\mu>} = X_{\le \mu}\, X_{\ge \mu+1}, \qquad \mu = 1, \dots, d-1.$$
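These interface matrices can be built by sweeping over the cores from the left and from the right; the following sketch (illustrative sizes, column-major conventions as in the unfolding) checks the factorization $X^{<\mu>} = X_{\le\mu} X_{\ge\mu+1}$ numerically.

```python
# Sketch: construct X_{<=mu} and X_{>=mu+1} by contracting cores and verify
# X^<mu> = X_{<=mu} X_{>=mu+1} for one value of mu.
import numpy as np

rng = np.random.default_rng(3)
n = [3, 4, 2, 3]
r = [1, 2, 3, 2, 1]
cores = [rng.standard_normal((r[m], n[m], r[m + 1])) for m in range(4)]
X = np.einsum('aib,bjc,ckd,dle->ijkl', *cores)

def left_interface(cores, mu):
    """X_{<=mu}, shape (n_1 ... n_mu) x r_mu, row index column-major."""
    M = cores[0][0]                               # n_1 x r_1
    for U in cores[1:mu]:
        T = np.einsum('ak,kjc->ajc', M, U)        # attach the next core
        M = T.reshape(-1, U.shape[2], order='F')
    return M

def right_interface(cores, mu):
    """X_{>=mu}, shape r_{mu-1} x (n_mu ... n_d), column index column-major."""
    M = cores[-1][:, :, 0]                        # r_{d-1} x n_d
    for U in reversed(cores[mu - 1:-1]):
        T = np.einsum('jik,kc->jic', U, M)        # attach the previous core
        M = T.reshape(U.shape[0], -1, order='F')
    return M

mu = 2
X_mu = X.reshape(n[0] * n[1], -1, order='F')      # mu-th unfolding
```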

7. TT decomposition and matrix factorizations

Important: These matrix factorizations are nested!

$$X_{\le \mu} = (I_{n_\mu} \otimes X_{\le \mu-1})\, U^L_\mu \qquad \text{and} \qquad X_{\ge \mu} = U^R_\mu\, (X_{\ge \mu+1} \otimes I_{n_\mu}),$$

where $U^L_\mu = U^{<2>}_\mu$ and $U^R_\mu = U^{(1)}_\mu = U^{<1>}_\mu$.

The relations

$$X_{\le 1} = U^L_1 \qquad \text{and} \qquad X_{\le \mu} = (I_{n_\mu} \otimes X_{\le \mu-1})\, U^L_\mu, \quad \mu = 2, \dots, d,$$

fully characterize the TT decomposition:

$$\mathrm{vec}(\mathcal{X}) = X_{\le d} = (I \otimes X_{\le d-1})\, U^L_d = (I \otimes I \otimes X_{\le d-2})(I \otimes U^L_{d-1})\, U^L_d = \cdots = (I \otimes \cdots \otimes I \otimes U^L_1) \cdots (I \otimes U^L_{d-1})\, U^L_d$$

EFY: Perform an analogous calculation for $X_{\ge \mu}$, that is, resolve the recursion $X_{\ge \mu} = U^R_\mu (X_{\ge \mu+1} \otimes I_{n_\mu})$.
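The recursion for $X_{\le\mu}$ can be resolved numerically down to $\mathrm{vec}(\mathcal{X})$; the following sketch (illustrative sizes, column-major conventions) assumes $U^L_\mu$ is the core reshaped to $(r_{\mu-1} n_\mu) \times r_\mu$ in Fortran order.

```python
# Sketch: resolve X_{<=1} = U^L_1, X_{<=mu} = (I_{n_mu} kron X_{<=mu-1}) U^L_mu
# until X_{<=d} = vec(X) (column-major vectorization).
import numpy as np

rng = np.random.default_rng(4)
n = [3, 4, 2]
r = [1, 2, 3, 1]
cores = [rng.standard_normal((r[m], n[m], r[m + 1])) for m in range(3)]
X = np.einsum('aib,bjc,ckd->ijk', *cores)

# left unfoldings U^L_mu = U^<2>_mu of the cores
UL = [U.reshape(U.shape[0] * U.shape[1], U.shape[2], order='F') for U in cores]

X_le = UL[0]                                   # X_{<=1} = U^L_1
for mu in range(1, 3):
    X_le = np.kron(np.eye(n[mu]), X_le) @ UL[mu]

vecX = X.reshape(-1, order='F')                # column-major vec(X)
```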

8. TT decomposition and matrix factorizations

Lemma. The TT rank of a tensor is given by $\big(\mathrm{rank}\, X^{<1>}, \dots, \mathrm{rank}\, X^{<d-1>}\big)$.

Proof. Because of the connection to matrix factorizations, the TT rank cannot be smaller than $\big(\mathrm{rank}\, X^{<1>}, \dots, \mathrm{rank}\, X^{<d-1>}\big)$. We need to exclude that it can be larger. For this purpose, we construct a TT decomposition with

$$(r_1, \dots, r_{d-1}) := \big(\mathrm{rank}\, X^{<1>}, \dots, \mathrm{rank}\, X^{<d-1>}\big).$$

Step 1: Factorize

$$X^{<1>} = U_1 \tilde{X}^{<1>}, \qquad U_1 \in \mathbb{R}^{n_1 \times r_1}, \quad \tilde{X}^{<1>} \in \mathbb{R}^{r_1 \times n_2 \cdots n_d},$$

and hence

$$\tilde{X}^{<1>} = U_1^\dagger X^{<1>}, \qquad U_1^\dagger = (U_1^T U_1)^{-1} U_1^T.$$

In terms of tensors: $\mathcal{X} = U_1 \circ_1 \tilde{\mathcal{X}}$. $\mathcal{U}_1 \equiv U_1$ is the first TT core (and $X_{\ge 2} := \tilde{X}^{<1>}$).

9. Relation for the second unfolding via Kronecker products:

$$X^{<2>} = (I_{n_2} \otimes U_1)\, \tilde{X}^{<2>}.$$

Together with the full column rank of $U_1$, this implies $\mathrm{rank}(\tilde{X}^{<2>}) = \mathrm{rank}(X^{<2>}) = r_2$.

Step 2: Factorize

$$\tilde{X}^{<2>} = U^L_2 \hat{X}^{<1>}, \qquad U^L_2 \in \mathbb{R}^{r_1 n_2 \times r_2}, \quad \hat{X}^{<1>} \in \mathbb{R}^{r_2 \times n_3 \cdots n_d}.$$

This gives the second TT core $U_2 \in \mathbb{R}^{r_1 \times n_2 \times r_2}$ and $X_{\ge 3} := \hat{X}^{<1>}$.

Relation for the third unfolding via Kronecker products:

$$X^{<3>} = (I_{n_3} \otimes I_{n_2} \otimes U_1)\, \tilde{X}^{<3>} = (I_{n_3} \otimes I_{n_2} \otimes U_1)(I_{n_3} \otimes U^L_2)\, \hat{X}^{<2>}.$$

Together with the full column ranks of $U_1$ and $U^L_2$, this implies $\mathrm{rank}(\hat{X}^{<2>}) = \mathrm{rank}(X^{<3>}) = r_3$.

Continuing in this manner gives cores $U_\mu \in \mathbb{R}^{r_{\mu-1} \times n_\mu \times r_\mu}$ such that

$$\mathrm{vec}(\mathcal{X}) = (I \otimes \cdots \otimes I \otimes U_1) \cdots (I \otimes U^L_{d-1})\, \mathrm{vec}(U_d).$$

This defines a TT decomposition.
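The Lemma can be illustrated numerically: for a tensor built from random cores, the ranks of the unfoldings generically coincide with the chosen TT ranks. A sketch with illustrative sizes (the rank equality holds almost surely for Gaussian cores, provided each $r_\mu$ does not exceed the unfolding dimensions):

```python
# Sketch: check rank X^<mu> = r_mu for a random TT tensor with ranks
# (r_1, r_2, r_3) = (2, 3, 2).
import numpy as np

rng = np.random.default_rng(5)
n = [3, 4, 2, 3]
r = [1, 2, 3, 2, 1]
cores = [rng.standard_normal((r[m], n[m], r[m + 1])) for m in range(4)]
X = np.einsum('aib,bjc,ckd,dle->ijkl', *cores)

ranks = []
for mu in range(1, 4):
    X_mu = X.reshape(int(np.prod(n[:mu])), -1, order='F')
    ranks.append(int(np.linalg.matrix_rank(X_mu)))
```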

10. Truncation in TT format

The proof of the Lemma can be turned into a practical algorithm (TT-SVD by [Oseledets'2011]) for approximating a given tensor $\mathcal{X}$ in TT format:

Input: $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$, target TT rank $(r_1, \dots, r_{d-1})$.
Output: TT cores $U_\mu \in \mathbb{R}^{r_{\mu-1} \times n_\mu \times r_\mu}$ that define a TT decomposition approximating $\mathcal{X}$.

1: Set $r_0 = r_d = 1$ (and formally add a leading singleton dimension to $\mathcal{X} \in \mathbb{R}^{1 \times n_1 \times \cdots \times n_d}$).
2: for $\mu = 1, \dots, d-1$ do
3:   Reshape $\mathcal{X}$ into $X^{<2>} \in \mathbb{R}^{r_{\mu-1} n_\mu \times n_{\mu+1} \cdots n_d}$.
4:   Compute a rank-$r_\mu$ approximation $X^{<2>} \approx U \Sigma V^T$ (e.g., via SVD).
5:   Reshape $U$ into $U_\mu \in \mathbb{R}^{r_{\mu-1} \times n_\mu \times r_\mu}$.
6:   Update $\mathcal{X}$ via $X^{<2>} \leftarrow U^T X^{<2>} = \Sigma V^T$.
7: end for
8: Set $U_d = \mathcal{X}$.
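A compact sketch of this algorithm (column-major reshapes throughout, illustrative sizes; not the lecturer's reference code). It is applied to a tensor whose exact TT ranks match the targets, so the reconstruction should be exact up to rounding:

```python
# Sketch of TT-SVD: sequential reshape + truncated SVD, following the
# pseudocode on the slide.
import numpy as np

def tt_svd(X, ranks):
    """Return TT cores U_mu approximating X with target TT rank `ranks`."""
    n, d = X.shape, X.ndim
    r = [1] + list(ranks) + [1]
    cores = []
    C = X.reshape(-1, order='F')
    for mu in range(d - 1):
        C = C.reshape(r[mu] * n[mu], -1, order='F')        # current X^<2>
        U, s, Vt = np.linalg.svd(C, full_matrices=False)   # line 4: SVD
        U, s, Vt = U[:, :r[mu + 1]], s[:r[mu + 1]], Vt[:r[mu + 1]]
        cores.append(U.reshape(r[mu], n[mu], r[mu + 1], order='F'))
        C = s[:, None] * Vt                                # line 6: Sigma V^T
    cores.append(C.reshape(r[d - 1], n[d - 1], 1, order='F'))
    return cores

rng = np.random.default_rng(6)
exact = [rng.standard_normal(s)
         for s in [(1, 3, 2), (2, 4, 3), (3, 2, 2), (2, 3, 1)]]
X = np.einsum('aib,bjc,ckd,dle->ijkl', *exact)   # TT ranks (2, 3, 2)
cores = tt_svd(X, (2, 3, 2))
X_hat = np.einsum('aib,bjc,ckd,dle->ijkl', *cores)
```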

11. Truncation in TT format

Theorem. Let $\mathcal{X}_{\mathrm{SVD}}$ denote the tensor in TT decomposition obtained from TT-SVD. Then

$$\|\mathcal{X} - \mathcal{X}_{\mathrm{SVD}}\| \le \sqrt{\varepsilon_1^2 + \cdots + \varepsilon_{d-1}^2},$$

where

$$\varepsilon_\mu^2 = \|X^{<\mu>} - T_{r_\mu}(X^{<\mu>})\|_F^2 = \sigma_{r_\mu+1}(X^{<\mu>})^2 + \cdots.$$

Proof. After $\mu$ steps of the algorithm we have the following situation:
◮ Core tensors $U_1, \dots, U_\mu$ have been computed, defining $X_{\le \mu}$.
◮ The remaining tensor has size $r_\mu \times n_{\mu+1} \times \cdots \times n_d$. Reshape the remaining tensor into a matrix $Y_{\ge \mu+1} \in \mathbb{R}^{r_\mu \times n_{\mu+1} \cdots n_d}$.

We will prove the relations

$$X_{\le \mu}^T X_{\le \mu} = I, \qquad Y_{\ge \mu+1} = X_{\le \mu}^T X^{<\mu>}, \qquad \|X^{<\mu>} - X_{\le \mu} Y_{\ge \mu+1}\|_F \le \sqrt{\varepsilon_1^2 + \cdots + \varepsilon_\mu^2} \quad (1)$$

for $\mu = 1, \dots, d-1$ by induction. For $\mu = d-1$, this shows the theorem.
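The bound can be checked numerically on a random (full-rank) tensor; the TT-SVD sketch is repeated inline to keep the example self-contained, and all sizes, ranks, and seeds are illustrative.

```python
# Sketch: verify ||X - X_SVD||_F <= sqrt(eps_1^2 + ... + eps_{d-1}^2), where
# eps_mu^2 is the sum of the squared discarded singular values of X^<mu>.
import numpy as np

rng = np.random.default_rng(7)
n = (4, 5, 3, 4)
r = [1, 2, 3, 2, 1]
X = rng.standard_normal(n)

# eps_mu^2 from the unfoldings of the original tensor
eps2 = []
for mu in range(1, 4):
    s = np.linalg.svd(X.reshape(int(np.prod(n[:mu])), -1, order='F'),
                      compute_uv=False)
    eps2.append(float(np.sum(s[r[mu]:] ** 2)))

# TT-SVD (column-major reshapes, same sketch as before)
cores, C = [], X.reshape(-1, order='F')
for mu in range(3):
    C = C.reshape(r[mu] * n[mu], -1, order='F')
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    cores.append(U[:, :r[mu + 1]].reshape(r[mu], n[mu], r[mu + 1], order='F'))
    C = s[:r[mu + 1], None] * Vt[:r[mu + 1]]
cores.append(C.reshape(r[3], n[3], 1, order='F'))

X_svd = np.einsum('aib,bjc,ckd,dle->ijkl', *cores)
err = float(np.linalg.norm(X - X_svd))
bound = float(np.sqrt(sum(eps2)))
```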

12. Line 3 in the $\mu$th step of the algorithm proceeds by reshaping the remaining tensor from step $\mu-1$ (corresponding to $Y_{\ge \mu}$) into an array $Y^{<2>} \in \mathbb{R}^{r_{\mu-1} n_\mu \times n_{\mu+1} \cdots n_d}$. By the induction assumption,

$$Y_{\ge \mu} = X_{\le \mu-1}^T X^{<\mu-1>} \quad \Rightarrow \quad Y^{<2>} = (I_{n_\mu} \otimes X_{\le \mu-1}^T)\, X^{<\mu>}.$$

The matrix $U^L_\mu \equiv U$ computed in Line 4 contains left singular vectors of $Y^{<2>}$. In particular, $(U^L_\mu)^T U^L_\mu = I$. Together with the induction assumption and the relation $X_{\le \mu} = (I_{n_\mu} \otimes X_{\le \mu-1})\, U^L_\mu$, this implies

$$X_{\le \mu}^T X_{\le \mu} = I \qquad \text{and} \qquad Y_{\ge \mu+1} = X_{\le \mu}^T X^{<\mu>}.$$

Moreover,

$$\|(I - U^L_\mu (U^L_\mu)^T)\, Y^{<2>}\|_F = \|Y^{<2>} - T_{r_\mu}(Y^{<2>})\|_F \le \|X^{<\mu>} - T_{r_\mu}(X^{<\mu>})\|_F = \varepsilon_\mu.$$

13. Finally, we obtain:

$$\|X^{<\mu>} - X_{\le \mu} X_{\le \mu}^T X^{<\mu>}\|_F^2 = \|X^{<\mu>} - (I \otimes X_{\le \mu-1})\, U^L_\mu (U^L_\mu)^T (I \otimes X_{\le \mu-1})^T X^{<\mu>}\|_F^2$$

$$= \|X^{<\mu>} - (I \otimes X_{\le \mu-1})(I \otimes X_{\le \mu-1})^T X^{<\mu>}\|_F^2 + \|(I \otimes X_{\le \mu-1})(I \otimes X_{\le \mu-1})^T X^{<\mu>} - (I \otimes X_{\le \mu-1})\, U^L_\mu (U^L_\mu)^T (I \otimes X_{\le \mu-1})^T X^{<\mu>}\|_F^2$$

$$= \|X^{<\mu-1>} - X_{\le \mu-1} X_{\le \mu-1}^T X^{<\mu-1>}\|_F^2 + \|(I - U^L_\mu (U^L_\mu)^T)\, Y^{<2>}\|_F^2 \le \varepsilon_1^2 + \cdots + \varepsilon_{\mu-1}^2 + \varepsilon_\mu^2$$

(The second equality is a Pythagorean splitting into the component in the range of $I \otimes X_{\le \mu-1}$ and its orthogonal complement.) This completes the proof.
