  1. Low Rank Approximation, Lecture 8
     Daniel Kressner
     Chair for Numerical Algorithms and HPC, Institute of Mathematics, EPFL
     daniel.kressner@epfl.ch

  2. Linear operators in TT decomposition
     Consider a linear operator L : R^{n_1 × ··· × n_d} → R^{n_1 × ··· × n_d} and the corresponding
     matrix representation L((i_1, ..., i_d), (j_1, ..., j_d)). L is a TT operator if it takes the form

       L((i_1, ..., i_d), (j_1, ..., j_d))
         = Σ_{k_1=1}^{R_1} ··· Σ_{k_{d-1}=1}^{R_{d-1}} A_1(1, i_1, j_1, k_1) A_2(k_1, i_2, j_2, k_2) ··· A_d(k_{d-1}, i_d, j_d, 1)

     with R_{μ-1} × n_μ × n_μ × R_μ tensors A_μ.
     Operator TT rank of L = smallest possible values of (R_1, ..., R_{d-1}).
     Matrix Product Operator (MPO):

       L((i_1, ..., i_d), (j_1, ..., j_d)) = A_1(i_1, j_1) A_2(i_2, j_2) ··· A_d(i_d, j_d),

     with A_μ(i_μ, j_μ) = A_μ(:, i_μ, j_μ, :).

  3. Linear operators in TT decomposition
     Alternative view: Reinterpret the matrix representation of L as a tensor ten(L) ∈ R^{n_1^2 × ··· × n_d^2}
     by merging the indices (i_μ, j_μ), applying the TT format, and separating the indices (i_μ, j_μ) again.
     In practice: perfect shuffle permutation of modes.
     [Tensor network diagram for the operator TT decomposition.]
     EFY. Show that the identity matrix has operator TT ranks 1.

  4. Linear operators in TT decomposition
     Example: A discrete Laplace-like operator takes the form

       L : X ↦ L_1 ∘_1 X + ··· + L_d ∘_d X

     for matrices L_μ ∈ R^{n_μ × n_μ}. Matrix representation:

       L = I_{n_d} ⊗ ··· ⊗ I_{n_2} ⊗ L_1 + ··· + L_d ⊗ I_{n_{d-1}} ⊗ ··· ⊗ I_{n_1}

     TT representation with cores

       A_1(i_1, j_1) = [ L_1(i_1, j_1)   I_{n_1}(i_1, j_1) ]

       A_μ(i_μ, j_μ) = [ I_{n_μ}(i_μ, j_μ)   0
                         L_μ(i_μ, j_μ)       I_{n_μ}(i_μ, j_μ) ]

       A_d(i_d, j_d) = [ I_{n_d}(i_d, j_d)
                         L_d(i_d, j_d) ]
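
     A minimal NumPy sketch of this construction (the function names, the column-major ordering of the
     merged indices, and the full-matrix check are our own choices; mpo_to_matrix is for testing on
     tiny examples only):

import numpy as np

def laplace_like_mpo(Ls):
    # Operator-TT (MPO) cores of L(X) = L_1 o_1 X + ... + L_d o_d X.
    # Core A_mu has shape (R_{mu-1}, n_mu, n_mu, R_mu) with ranks (1, 2, ..., 2, 1).
    d = len(Ls)
    cores = []
    for mu, Lmu in enumerate(Ls):
        n = Lmu.shape[0]
        I = np.eye(n)
        if mu == 0:
            A = np.zeros((1, n, n, 2))
            A[0, :, :, 0], A[0, :, :, 1] = Lmu, I
        elif mu == d - 1:
            A = np.zeros((2, n, n, 1))
            A[0, :, :, 0], A[1, :, :, 0] = I, Lmu
        else:
            A = np.zeros((2, n, n, 2))
            A[0, :, :, 0] = I
            A[1, :, :, 0], A[1, :, :, 1] = Lmu, I
        cores.append(A)
    return cores

def mpo_to_matrix(cores):
    # Contract the cores into the full matrix (exponentially large; testing only).
    # Multi-indices are merged in column-major order, i.e. row = i_1 + n_1*i_2 + ...
    M = cores[0]
    for A in cores[1:]:
        a, m, _, _ = M.shape
        _, n, _, c = A.shape
        M = np.einsum('aijb,bklc->akiljc', M, A).reshape(a, m * n, m * n, c)
    return M[0, :, :, 0]

# check against the Kronecker-sum representation for small sizes
rng = np.random.default_rng(0)
Ls = [rng.standard_normal((n, n)) for n in (2, 3, 4)]
full = mpo_to_matrix(laplace_like_mpo(Ls))
ref = np.zeros_like(full)
for mu, Lmu in enumerate(Ls):
    before = int(np.prod([L.shape[0] for L in Ls[:mu]]))
    after = int(np.prod([L.shape[0] for L in Ls[mu + 1:]]))
    ref += np.kron(np.eye(after), np.kron(Lmu, np.eye(before)))
print(np.allclose(full, ref))  # True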

  5. Multiplication with linear operators in TT decomposition
     Consider the matrix-vector product x̃ = L x or, equivalently, X̃ = L(X).
     Assume that L and X are in TT decomposition. Then

       X̃(i_1, ..., i_d)
         = Σ_{j_1,...,j_d} L((i_1, ..., i_d), (j_1, ..., j_d)) X(j_1, ..., j_d)
         = Σ_{j_1,...,j_d} A_1(i_1, j_1) ··· A_d(i_d, j_d) · U_1(j_1) ··· U_d(j_d)
         = Σ_{j_1,...,j_d} (A_1(i_1, j_1) ⊗ U_1(j_1)) ··· (A_d(i_d, j_d) ⊗ U_d(j_d))
         = ( Σ_{j_1} A_1(i_1, j_1) ⊗ U_1(j_1) ) ··· ( Σ_{j_d} A_d(i_d, j_d) ⊗ U_d(j_d) )
         =: Ũ_1(i_1) ··· Ũ_d(i_d),

     that is, the TT ranks of X̃ are bounded by the TT ranks of X multiplied with the TT operator
     ranks of L.
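
     The identity Ũ_μ(i_μ) = Σ_{j_μ} A_μ(i_μ, j_μ) ⊗ U_μ(j_μ) translates directly into a core-by-core
     contraction. A small sketch, assuming cores stored as in the previous snippet (names and shape
     conventions are ours, not a fixed library API):

import numpy as np

def apply_mpo_to_tt(op_cores, tt_cores):
    # op_cores[mu]: (R_{mu-1}, n_mu, n_mu, R_mu)  -- slices A_mu(i_mu, j_mu)
    # tt_cores[mu]: (r_{mu-1}, n_mu, r_mu)        -- slices U_mu(j_mu)
    # New core slices: sum_j A_mu(i, j) (x) U_mu(j); the TT ranks of the result
    # are (at most) the products R_mu * r_mu.
    out = []
    for A, U in zip(op_cores, tt_cores):
        R0, n, _, R1 = A.shape
        r0, _, r1 = U.shape
        W = np.einsum('aijb,cjd->acibd', A, U).reshape(R0 * r0, n, R1 * r1)
        out.append(W)
    return out

# Example: a random rank-2 MPO applied to a random rank-3 TT tensor of order 3.
rng = np.random.default_rng(0)
dims, R, r = (4, 5, 6), 2, 3
op = [rng.standard_normal((1 if mu == 0 else R, n, n, 1 if mu == 2 else R))
      for mu, n in enumerate(dims)]
tt = [rng.standard_normal((1 if mu == 0 else r, n, 1 if mu == 2 else r))
      for mu, n in enumerate(dims)]
print([W.shape for W in apply_mpo_to_tt(op, tt)])
# [(1, 4, 6), (6, 5, 6), (6, 6, 1)]  -- operator and tensor TT ranks multiply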

  6. Multiplication with linear operators in TT decomposition
     [Illustration by tensor network diagrams.]
     ◮ Carrying out the contractions requires O(d n r^4) operations for tensors of order d.

  7. TT decomposition – Summary of operations
     Easy:
     ◮ (partial) contractions
     ◮ multiplication with TT operators
     ◮ recompression/truncation
     Medium:
     ◮ entrywise products
     Hard:
     ◮ almost everything else
     Software:
     ◮ TT-Toolbox (Matlab, Python), TTeMPS (Matlab), ...

  8. Alternating least-squares / linear scheme
     General setting: Solve the optimization problem

       min_X f(X),

     where X is a (large) matrix or tensor and f is "simple" (e.g., convex).
     Constrain X to M_r, the set of rank-r matrices or tensors, and aim at solving

       min_{X ∈ M_r} f(X).

     Set X = i(U_1, U_2, ..., U_d) (e.g., X = U_1 U_2^T). Low-rank formats are multilinear
     ⇒ there is hope that optimizing for each component is simple:

       min_{U_μ} f(i(U_1, U_2, ..., U_d)).

  9. Alternating least-squares / linear scheme
     Set f(U_1, ..., U_d) := f(i(U_1, ..., U_d)).
     ALS:
     1: while not converged do
     2:   U_1 ← arg min_{U_1} f(i(U_1, U_2, ..., U_d))
     3:   U_2 ← arg min_{U_2} f(i(U_1, U_2, ..., U_d))
     4:   ...
     5:   U_d ← arg min_{U_d} f(i(U_1, U_2, ..., U_d))
     6: end while
     Examples:
     ◮ ALS for fitting the CP decomposition
     ◮ Subspace iteration
     Closely related: Block Gauss-Seidel, Block Coordinate Descent.
     Difficulties:
     ◮ Representation (U_1, U_2, ..., U_d) is often non-unique; parameters may become unbounded.
     ◮ M_r is not closed.
     ◮ Convergence (analysis)

  10. Subspace iteration and ALS
      Given A ∈ R^{m×n}, consider the computation of a best rank-r approximation:

        min_{U ∈ R^{m×r}, V ∈ R^{n×r}} f(U, V),   f(U, V) := ‖A − U V^T‖_F^2

      ◮ The representation U V^T is unique with respect to each of U, V individually if U, V have rank r.
      ◮ f is convex wrt U and V individually.

        f(U + H, V) = f(U, V) + ⟨∇_U f(U, V), H⟩ + O(‖H‖_2^2),   ∇_U f(U, V) = −2 (A V − U V^T V).

      Hence,

        0 = ∇_U f(U, V)  ⇔  U = A V (V^T V)^{-1}.

      For stability it is advisable to choose V such that it has orthonormal columns.
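
      A quick numerical check of the gradient formula and the resulting stationarity condition
      (sizes and random data are arbitrary, chosen only for illustration):

import numpy as np

rng = np.random.default_rng(1)
m, n, r = 8, 6, 3
A = rng.standard_normal((m, n))
U = rng.standard_normal((m, r))
V = rng.standard_normal((n, r))

f = lambda U: np.linalg.norm(A - U @ V.T, 'fro') ** 2
G = -2 * (A @ V - U @ (V.T @ V))           # claimed gradient of f wrt U

# directional derivative along a random H, compared with <G, H>
H = rng.standard_normal((m, r))
eps = 1e-6
fd = (f(U + eps * H) - f(U - eps * H)) / (2 * eps)
print(np.isclose(fd, np.sum(G * H)))        # True

# stationarity: U = A V (V^T V)^{-1} makes the gradient vanish
Ustar = A @ V @ np.linalg.inv(V.T @ V)
print(np.allclose(-2 * (A @ V - Ustar @ (V.T @ V)), 0))   # True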

  11. Subspace iteration and ALS
      ALS for low-rank matrix approximation:
      1: while not converged do
      2:   Compute economy-size QR factorization V = QR and update V ← Q.
      3:   U ← A V
      4:   Compute economy-size QR factorization U = QR and update U ← Q.
      5:   V ← A^T U
      6: end while
      Returns an approximation A ≈ U V^T. This is the subspace iteration from Lecture 1!
      EFY. Develop an ALS method for solving the weighted low-rank approximation problem

        min_{U,V} ‖W_L (A − U V^T) W_R‖_F

      with square and invertible matrices W_L, W_R.
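
      A minimal NumPy sketch of this ALS/subspace iteration, with a fixed number of sweeps in place
      of a convergence test (function name and test matrix are our own choices):

import numpy as np

def als_lowrank(A, r, iters=50):
    # ALS for min_{U,V} ||A - U V^T||_F, following the algorithm above.
    m, n = A.shape
    rng = np.random.default_rng(0)
    V = rng.standard_normal((n, r))
    for _ in range(iters):
        V, _ = np.linalg.qr(V)       # economy-size QR, V <- Q
        U = A @ V                    # optimal U for orthonormal V (= A V (V^T V)^{-1})
        U, _ = np.linalg.qr(U)       # economy-size QR, U <- Q
        V = A.T @ U                  # optimal V for orthonormal U
    return U, V                      # A ~= U V^T

# sanity check against the best rank-r error from the SVD
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 8)) @ rng.standard_normal((8, 50)) \
    + 1e-3 * rng.standard_normal((60, 50))
U, V = als_lowrank(A, r=8)
s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - U @ V.T), np.sqrt(np.sum(s[8:] ** 2)))  # nearly equal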

  12. Linear matrix equations
      For a linear operator L : R^{m×n} → R^{m×n}, consider the linear system

        L(X) = C,   C, X ∈ R^{m×n}.

      Examples:¹
      ◮ Sylvester matrix equation:

          A X + X B = C,   A ∈ R^{m×m}, B ∈ R^{n×n}, C, X ∈ R^{m×n}.

        Applications: discretized 2D Laplace on a rectangle, stability analysis, optimal control,
        model reduction of linear control systems (a small solver example follows after this list).
        Special case Lyapunov equation: m = n, A = B^T, C symmetric (and often negative semi-definite).
      ◮ Stochastic Galerkin methods in uncertainty quantification.
      ◮ Stochastic control.
      ¹ See [V. Simoncini, Computational methods for linear matrix equations, SIAM Rev., 58 (2016),
        pp. 377–441] for details and references.
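
      For reference, dense Sylvester equations of moderate size can be solved directly, e.g. with
      SciPy's Bartels-Stewart-based routine (sizes and data below are purely illustrative):

import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)
m, n = 5, 4
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, n))

X = solve_sylvester(A, B, C)           # dense solver for A X + X B = C
print(np.allclose(A @ X + X @ B, C))   # True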

  13. Linear matrix equations
      Using the matrix M_L representing L in canonical bases, we can rewrite L(X) = C as the linear system

        M_L vec(X) = vec(C).

      Assumption: M_L has low Kronecker rank:

        M_L = B_1 ⊗ A_1 + ··· + B_R ⊗ A_R,   R ≪ m, n.

      Equivalently,

        L(X) = A_1 X B_1^T + ··· + A_R X B_R^T.

      EFY. Develop a variant of ACA (from Lecture 3) that aims at approximating a given sparse matrix A
      by a matrix of low Kronecker rank for given m, n.
      EFY. Show that if m = n, M_L is symmetric and has Kronecker rank R, one can find symmetric matrices
      A_1, ..., A_R, B_1, ..., B_R such that L(X) = A_1 X B_1 + ··· + A_R X B_R. Is it always possible to
      choose all A_k, B_k positive semi-definite if M_L is positive definite?
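
      The equivalence rests on the identity vec(A X B^T) = (B ⊗ A) vec(X) for the column-major vec
      operator, which can be checked numerically (sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

vec = lambda M: M.reshape(-1, order='F')     # column-major vec
lhs = vec(A @ X @ B.T)
rhs = np.kron(B, A) @ vec(X)
print(np.allclose(lhs, rhs))                 # True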

  14. Linear matrix equations
      Two ways of turning L(X) = C into an optimization problem:
      1. If M_L is symmetric positive definite:

           min_X (1/2) ⟨L(X), X⟩ − ⟨X, C⟩.

      2. General L:

           min_X ‖L(X) − C‖_F^2.

      Will focus on spd M_L in the following.

  15. Linear matrix equations
      A low-rank approximation of the solution of L(X) = C is obtained by solving min_{U,V} f(U, V) for

        f(U, V) = (1/2) ⟨L(U V^T), U V^T⟩ − ⟨U V^T, C⟩.

      Let L have Kronecker rank R (written in the symmetric form L(X) = A_1 X B_1 + ··· + A_R X B_R
      from the EFY above). Then

        ⟨L(U V^T), U V^T⟩ = Σ_{k=1}^R ⟨A_k U V^T B_k, U V^T⟩ = Σ_{k=1}^R ⟨A_k U V^T B_k V, U⟩.

      This shows that arg min_U f(U, V) is the solution of the linear matrix equation

        A_1 U (V^T B_1 V) + ··· + A_R U (V^T B_R V) = C V.

      EFY. Show that this linear matrix equation always has a unique solution under the assumption
      that L is symmetric positive definite.
      ◮ For R = 2, can be reduced to r linear systems of size n × n.
      ◮ For R > 2, need to solve an rn × rn system.

  16. Linear matrix equations
      ALS for linear matrix equations:
      1: while not converged do
      2:   Compute economy-size QR factorization V = QR and update V ← Q.
      3:   Solve A_1 U (V^T B_1 V) + ··· + A_R U (V^T B_R V) = C V for U.
      4:   Compute economy-size QR factorization U = QR and update U ← Q.
      5:   Solve (U^T A_1 U) V^T B_1 + ··· + (U^T A_R U) V^T B_R = U^T C for V.
      6: end while
      Returns an approximation X ≈ U V^T.
      For R = 2, there are better alternatives: ADI, Krylov subspace methods, ... [Simoncini'2016].
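
      A sketch of this ALS in NumPy, assuming symmetric A_k, B_k (the spd setting of the previous
      slides); the reduced equations in steps 3 and 5 are solved naively as vectorized rn × rn
      systems, and the Lyapunov-type test problem at the end is our own choice:

import numpy as np

def als_matrix_equation(As, Bs, C, r, iters=30):
    # ALS for L(X) = sum_k A_k X B_k = C with X ~= U V^T of rank r.
    m, n = C.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    for _ in range(iters):
        V, _ = np.linalg.qr(V)                          # V <- Q
        # sum_k A_k U (V^T B_k V) = C V  <=>  [sum_k (V^T B_k V)^T (x) A_k] vec(U) = vec(C V)
        M = sum(np.kron((V.T @ Bk @ V).T, Ak) for Ak, Bk in zip(As, Bs))
        U = np.linalg.solve(M, (C @ V).reshape(-1, order='F')).reshape(m, r, order='F')
        U, _ = np.linalg.qr(U)                          # U <- Q
        # sum_k (U^T A_k U) V^T B_k = U^T C  <=>  [sum_k (U^T A_k U) (x) B_k^T] vec(V) = vec(C^T U)
        M = sum(np.kron(U.T @ Ak @ U, Bk.T) for Ak, Bk in zip(As, Bs))
        V = np.linalg.solve(M, (C.T @ U).reshape(-1, order='F')).reshape(n, r, order='F')
    return U, V

# Lyapunov-type test: A X + X A = c c^T with spd A; the exact solution is well approximated by low rank.
rng = np.random.default_rng(1)
n = 30
A = np.diag(np.arange(1.0, n + 1))
I = np.eye(n)
c = rng.standard_normal((n, 1))
C = c @ c.T
U, V = als_matrix_equation([A, I], [I, A], C, r=6)
X = np.linalg.solve(np.kron(I, A) + np.kron(A, I),
                    C.reshape(-1, order='F')).reshape(n, n, order='F')
print(np.linalg.norm(U @ V.T - X) / np.linalg.norm(X))   # small relative error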

  17. 2D eigenvalue problem
      ◮ −Δu(x) + V(x) u(x) = λ u(x) in Ω = [0, 1] × [0, 1] with Dirichlet b.c. and Hénon-Heiles potential V
      ◮ Regular discretization
      ◮ Reshaped ground state into a matrix
      [Figure: ground state (left) and its singular values (right), plotted on a logarithmic scale
      from 10^0 down to 10^{-20} over indices 0 to 300.]
      Excellent rank-10 approximation possible.
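
      A rough reproduction of this kind of experiment; the grid size and the precise form and scaling
      of the Hénon-Heiles-type potential below are assumptions, so the numbers will not match the
      slide exactly, but a similarly rapid singular value decay should be visible:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 64
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2   # 1D Dirichlet Laplacian
I = sp.identity(n)

X, Y = np.meshgrid(x, x, indexing='ij')
Vpot = 0.5 * (X**2 + Y**2) + (X**2 * Y - Y**3 / 3.0)               # assumed potential

# Hamiltonian -Laplace + V acting on vec(u), column-major ordering
H = sp.kron(I, T) + sp.kron(T, I) + sp.diags(Vpot.reshape(-1, order='F'))

w, u = eigsh(H, k=1, which='SA')                 # ground state
G = u[:, 0].reshape(n, n, order='F')             # reshape ground state into a matrix

s = np.linalg.svd(G, compute_uv=False)
print(s[:12] / s[0])                             # rapid singular value decay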

  18. Rayleigh quotients wrt low-rank matrices
      Consider a symmetric n^2 × n^2 matrix A. Then

        λ_min(A) = min_{x ≠ 0} ⟨x, A x⟩ / ⟨x, x⟩.

      We now ...
      ◮ reshape the vector x into an n × n matrix X;
      ◮ reinterpret A x as a linear operator A : X ↦ A(X).

  19. Rayleigh quotients wrt low-rank matrices
      Consider a symmetric n^2 × n^2 matrix A. Then

        λ_min(A) = min_{X ≠ 0} ⟨X, A(X)⟩ / ⟨X, X⟩

      with the matrix inner product ⟨·, ·⟩.
      We now ...
      ◮ restrict X to low-rank matrices.

  20. Rayleigh quotients wrt low-rank matrices
      Consider a symmetric n^2 × n^2 matrix A. Then

        λ_min(A) ≈ min_{X = U V^T ≠ 0} ⟨X, A(X)⟩ / ⟨X, X⟩.

      ◮ The approximation error is governed by the low-rank approximability of X.
      ◮ Solved by Riemannian optimization techniques or ALS.
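
      A sketch of ALS for this constrained Rayleigh quotient: if one factor is kept orthonormal,
      optimizing the other factor is a symmetric eigenvalue problem of size nr × nr (the
      projected-operator construction via einsum and the test problem are our own choices):

import numpy as np

def als_rayleigh(Amat, n, r, iters=20):
    # ALS for min <X, A(X)>/<X, X> over X = U V^T of rank r, where Amat is the
    # symmetric n^2 x n^2 matrix and vec uses column-major ordering.
    A4 = Amat.reshape(n, n, n, n, order='F')   # A4[i1,i2,j1,j2] = Amat[i1+n*i2, j1+n*j2]
    rng = np.random.default_rng(0)
    U = np.linalg.qr(rng.standard_normal((n, r)))[0]
    V = np.linalg.qr(rng.standard_normal((n, r)))[0]
    for _ in range(iters):
        # optimize U for fixed orthonormal V: smallest eigenpair of the projected operator
        H = np.einsum('bk,abcd,dl->akcl', V, A4, V).reshape(n * r, n * r)
        w, Z = np.linalg.eigh((H + H.T) / 2)
        U = np.linalg.qr(Z[:, 0].reshape(n, r))[0]
        # optimize V for fixed orthonormal U (roles of the two modes exchanged)
        H = np.einsum('ak,abcd,cl->bkdl', U, A4, U).reshape(n * r, n * r)
        w, Z = np.linalg.eigh((H + H.T) / 2)
        V = np.linalg.qr(Z[:, 0].reshape(n, r))[0]
    return w[0], U, V          # w[0]: attained Rayleigh quotient, >= lambda_min(Amat)

# test on a 2D Laplace-type matrix, whose reshaped ground state has rank 1
n = 20
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Amat = np.kron(np.eye(n), T) + np.kron(T, np.eye(n))
val, U, V = als_rayleigh(Amat, n, r=2)
print(val, np.linalg.eigvalsh(Amat)[0])          # nearly equal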
