  1. Computing the Best Rank-(r1, r2, r3) Approximation of a Tensor
     Lars Eldén, Department of Mathematics, Linköping University, Sweden
     Joint work with Berkant Savas
     Harrachov 2007

  2. Best rank-k approximation of a matrix. Assume $X_k^T X_k = I$ and $Y_k^T Y_k = I$:
     $$\min_{X_k, Y_k, S_k} \|A - X_k S_k Y_k^T\|_F =: \min_{X_k, Y_k, S_k} \|A - (X_k, Y_k) \cdot S_k\|_F$$
     (Almost) equivalent problem:
     $$\max_{X_k, Y_k} \|X_k^T A Y_k\|_F = \max_{X_k, Y_k} \|A \cdot (X_k, Y_k)\|_F$$
     Solution by the SVD: $X_k S_k Y_k^T = U_k \Sigma_k V_k^T = (U_k, V_k) \cdot \Sigma_k$ (the Eckart-Young property).
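A minimal NumPy sketch of this construction (illustrative only; the function name is mine): the best rank-k approximation is read off from the truncated SVD.

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A in the Frobenius norm,
    obtained by truncating the SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # X_k = U_k, S_k = Sigma_k, Y_k = V_k in the slide's notation
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.randn(8, 6)
A2 = best_rank_k(A, 2)
print(np.linalg.matrix_rank(A2))  # 2
```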

  3. Sketch of "proof": determine $u_1$ and $v_1$ (the case $k = 1$), then put $u_1$ and $v_1$ into orthogonal matrices $(u_1 \; U)$ and $(v_1 \; V)$:
     $$(u_1 \; U)^T A \, (v_1 \; V) = \begin{pmatrix} \sigma_1 & 0 \\ 0 & B \end{pmatrix}$$
     Optimality $\Rightarrow$ zeros $\Rightarrow$ deflation: continue with $B$. Orthogonality of the vectors comes automatically. The number of degrees of freedom in $U_k$ and $V_k$ is equal to the number of zeros produced.

  4. Best rank-(k, k, k) approximation of a tensor. Assume $X^T X = Y^T Y = Z^T Z = I_k$:
     $$\min_{X, Y, Z, \mathcal{S}} \|\mathcal{A} - (X, Y, Z) \cdot \mathcal{S}\|_F \iff \max_{X, Y, Z} \|\mathcal{A} \cdot (X, Y, Z)\|_F$$
     Why is this problem much more complicated? There are not enough degrees of freedom in $X, Y, Z$ to zero the many ($O(k^3) + O(kn^2)$) elements in $\mathcal{A}$. Hence deflation is impossible in general, and the orthogonality constraints must be enforced.

  5. Talk outline:
     • Some basic tensor concepts (for simplicity: only tensors of order 3)
     • The best rank-(r1, r2, r3) approximation problem
     • Optimization on the Grassmann manifold
     • Newton-Grassmann for solving the best rank-(r1, r2, r3) approximation problem
     • Numerical examples
     • Ongoing work

  6. "Old and New" research area:
     • Tensor methods have been used since the 1960s in psychometrics and chemometrics, but only recently in the numerical community.
     • The available mathematical theory deals very little with computational aspects; many fundamental mathematical problems are open.
     • Applications in signal processing and various areas of data mining.

  7. Two aspects of the SVD. The singular value decomposition of $X \in \mathbb{R}^{m \times n}$:
     $$X = U \Sigma V^T, \qquad U \in \mathbb{R}^{m \times m}, \; \Sigma \in \mathbb{R}^{m \times n}, \; V \in \mathbb{R}^{n \times n}$$
     The singular value expansion, a sum of rank-1 matrices:
     $$X = \sum_{i=1}^{n} \sigma_i u_i v_i^T$$
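A quick numerical check of the expansion (illustrative NumPy, not part of the slides): summing the rank-1 terms $\sigma_i u_i v_i^T$ recovers $X$.

```python
import numpy as np

X = np.random.randn(5, 4)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Singular value expansion: X equals the sum of rank-1 terms sigma_i u_i v_i^T
X_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(X, X_rebuilt))  # True
```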

  8. Two approaches to tensor decomposition. The Tucker model:
     $$\mathcal{A} = (U^{(1)}, U^{(2)}, U^{(3)}) \cdot \mathcal{S}$$
     (figure: the core tensor $\mathcal{S}$ multiplied by $U^{(1)}$, $U^{(2)}$, $U^{(3)}$ in the three modes)
     • Tucker 1966, numerous papers in psychometrics and chemometrics
     • De Lathauwer, De Moor, Vandewalle, SIMAX 2000: notation, theory

  9. Expansion in rank-1 terms:
     $$\mathcal{A} = x_1 \otimes y_1 \otimes z_1 + x_2 \otimes y_2 \otimes z_2 + \cdots$$
     • Parafac/Candecomp/Kruskal: Harshman 1970, Carroll and Chang 1970
     • Numerous papers in psychometrics and chemometrics
     • Kolda, SIMAX 2001; Zhang and Golub, SIMAX 2001; De Silva and Lim 2006

  10. Parafac/... model: low rank approximation
      $$\mathcal{A} \approx (X, Y, Z) \cdot \mathcal{S}$$
      The core tensor $\mathcal{S}$ is zero except along the superdiagonal.

  11. Parafac/... model: low rank approximation (continued). Again, the core tensor is zero except along the superdiagonal. Why is it difficult to obtain this form? Because we do not have enough degrees of freedom to zero the tensor elements: $O(k^2)$ and $O(k^3)$.

  12. The Parafac approximation problem may be ill-posed!
      Theorem 1. There are tensors $\mathcal{A}$ for which the problem
      $$\min_{x_i, y_i, z_i} \|\mathcal{A} - x_1 \otimes y_1 \otimes z_1 - x_2 \otimes y_2 \otimes z_2\|_F$$
      does not have a solution. The set of tensors for which the approximation problem does not have a solution has positive volume. The problem is ill-posed (in exact arithmetic)!
      A well-posed problem (in floating point) near to an ill-posed one is ill-conditioned, which implies unstable computations.
      Still: there are applications (e.g. in chemistry) where the Parafac model corresponds closely to the process that generates the tensor data. (See De Silva and Lim (2006), Bini (1986).)
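For concreteness, here is the classical example behind such theorems, in the spirit of De Silva and Lim (2006); the slide itself does not spell it out, so this is an editorial sketch:

```latex
% Rank-2 tensors can converge to a rank-3 tensor, so the infimum
% over rank-2 tensors need not be attained. For linearly
% independent pairs (x_1, x_2), (y_1, y_2), (z_1, z_2):
\[
  \mathcal{A} \;=\; x_2 \otimes y_1 \otimes z_1
              \;+\; x_1 \otimes y_2 \otimes z_1
              \;+\; x_1 \otimes y_1 \otimes z_2
  \qquad (\text{rank } 3),
\]
\[
  \mathcal{A}_n \;=\; n \Big( x_1 + \tfrac{1}{n} x_2 \Big) \otimes
                      \Big( y_1 + \tfrac{1}{n} y_2 \Big) \otimes
                      \Big( z_1 + \tfrac{1}{n} z_2 \Big)
                \;-\; n\, x_1 \otimes y_1 \otimes z_1
  \qquad (\text{rank } 2),
\]
\[
  \mathcal{A}_n \longrightarrow \mathcal{A} \quad (n \to \infty),
  \qquad \text{so } \inf_{\operatorname{rank} \mathcal{B} \le 2}
  \|\mathcal{A} - \mathcal{B}\|_F = 0 \text{ is not attained.}
\]
```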

  13. Mode-1 multiplication of a tensor by a matrix (Lim's notation).
      Contravariant multiplication:
      $$\mathbb{R}^{n \times n \times n} \ni \mathcal{B} = (W)_{\{1\}} \cdot \mathcal{A}, \qquad b_{ijk} = \sum_{\nu=1}^{n} w_{i\nu}\, a_{\nu jk}$$
      All column vectors in the 3-tensor are multiplied by the matrix $W$.
      Covariant multiplication:
      $$\mathbb{R}^{n \times n \times n} \ni \mathcal{B} = \mathcal{A} \cdot (W)_{\{1\}}, \qquad b_{ijk} = \sum_{\nu=1}^{n} a_{\nu jk}\, w_{\nu i}$$
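A NumPy sketch of the two mode-1 products (illustrative; all variable names are mine), each a single einsum:

```python
import numpy as np

n = 4
A = np.random.randn(n, n, n)
W = np.random.randn(n, n)

# Contravariant: b_ijk = sum_nu w_{i nu} a_{nu jk}
B_contra = np.einsum('iv,vjk->ijk', W, A)

# Covariant: b_ijk = sum_nu a_{nu jk} w_{nu i}
B_co = np.einsum('vjk,vi->ijk', A, W)

# The covariant product with W is the contravariant product with W^T
print(np.allclose(B_co, np.einsum('iv,vjk->ijk', W.T, A)))  # True
```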

  14. Matrix-tensor multiplication performed in all modes in the same expression:
      $$(X, Y, Z) \cdot \mathcal{A} = \mathcal{A} \cdot (X^T, Y^T, Z^T)$$
      Standard matrix multiplication of three matrices: $X A Y^T = (X, Y) \cdot A$.
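A sketch of the all-modes product and of the matrix special case (illustrative NumPy; the function name is mine):

```python
import numpy as np

def mul_all_modes(X, Y, Z, A):
    """(X, Y, Z) . A : apply X, Y, Z contravariantly in modes 1, 2, 3."""
    return np.einsum('ia,jb,kc,abc->ijk', X, Y, Z, A)

n = 5
A = np.random.randn(n, n, n)
X, Y, Z = (np.random.randn(n, n) for _ in range(3))
B = mul_all_modes(X, Y, Z, A)  # same shape as A here

# Matrix special case of the same notation: X M Y^T = (X, Y) . M
M = np.random.randn(n, n)
print(np.allclose(X @ M @ Y.T, np.einsum('ia,jb,ab->ij', X, Y, M)))  # True
```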

  15. Inner product, orthogonality, and norm. The inner product (a contraction to $\mathbb{R}$):
      $$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i,j,k} a_{ijk}\, b_{ijk}$$
      The Frobenius norm of a tensor is $\|\mathcal{A}\| = \langle \mathcal{A}, \mathcal{A} \rangle^{1/2}$.
      Matrix case: $\langle A, B \rangle = \mathrm{tr}(A^T B)$.
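In code both quantities are one-liners (illustrative NumPy):

```python
import numpy as np

A = np.random.randn(3, 3, 3)
B = np.random.randn(3, 3, 3)

inner = np.einsum('ijk,ijk->', A, B)          # <A, B> = sum a_ijk b_ijk
norm = np.sqrt(np.einsum('ijk,ijk->', A, A))  # ||A|| = <A, A>^(1/2)
print(np.isclose(norm, np.linalg.norm(A)))    # True: Frobenius norm
```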

  16. Tensor SVD (HOSVD), De Lathauwer et al. (2000):
      $$\mathcal{A} = (U^{(1)}, U^{(2)}, U^{(3)}) \cdot \mathcal{S}$$
      The "mass" of $\mathcal{S}$ is concentrated around the $(1, 1, 1)$ corner. Not optimal: it does not solve
      $$\min_{\mathrm{rank}(\mathcal{B}) = (r_1, r_2, r_3)} \|\mathcal{A} - \mathcal{B}\|$$
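A compact sketch of the HOSVD construction, as I understand the De Lathauwer et al. recipe (illustrative NumPy; names are mine): each factor collects the left singular vectors of the corresponding unfolding, and the core is the product with their transposes.

```python
import numpy as np

def hosvd(A):
    """Higher-order SVD of a 3-tensor: A = (U1, U2, U3) . S."""
    Us = []
    for mode in range(3):
        # Unfold A along `mode` and take its left singular vectors
        Amat = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
        U, _, _ = np.linalg.svd(Amat, full_matrices=False)
        Us.append(U)
    U1, U2, U3 = Us
    # Core tensor: S = (U1^T, U2^T, U3^T) . A
    S = np.einsum('ai,bj,ck,abc->ijk', U1, U2, U3, A)
    return U1, U2, U3, S

A = np.random.randn(4, 5, 6)
U1, U2, U3, S = hosvd(A)
A_back = np.einsum('ia,jb,kc,abc->ijk', U1, U2, U3, S)
print(np.allclose(A, A_back))  # True: exact reconstruction
```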

  17. Best rank-(r1, r2, r3) approximation:
      $$\mathcal{A} \approx (X, Y, Z) \cdot \mathcal{S}, \qquad X^T X = I, \; Y^T Y = I, \; Z^T Z = I$$
      $$\min_{X, Y, Z, \mathcal{S}} \|\mathcal{A} - (X, Y, Z) \cdot \mathcal{S}\|$$
      The problem is over-parameterized!

  18. Best approximation: $\min_{\mathrm{rank}(\mathcal{B}) = (r_1, r_2, r_3)} \|\mathcal{A} - \mathcal{B}\|$ is equivalent to
      $$\max_{X, Y, Z} \Phi(X, Y, Z) = \frac{1}{2} \|\mathcal{A} \cdot (X, Y, Z)\|^2 = \frac{1}{2} \sum_{j,k,l} \mathcal{A}_{jkl}^2, \qquad \mathcal{A}_{jkl} = \sum_{\lambda, \mu, \nu} a_{\lambda\mu\nu}\, x_{\lambda j}\, y_{\mu k}\, z_{\nu l},$$
      subject to $X^T X = I_{r_1}$, $Y^T Y = I_{r_2}$, $Z^T Z = I_{r_3}$.
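$\Phi$ is cheap to evaluate once the core is formed; a minimal sketch (illustrative NumPy, names are mine):

```python
import numpy as np

def Phi(A, X, Y, Z):
    """Phi(X, Y, Z) = 0.5 * || A . (X, Y, Z) ||^2."""
    # Core tensor: A_jkl = sum_{lam,mu,nu} a_{lam mu nu} x_{lam j} y_{mu k} z_{nu l}
    F = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)
    return 0.5 * np.sum(F ** 2)

n, r = 6, 2
A = np.random.randn(n, n, n)
# Orthonormal blocks via QR, satisfying X^T X = I etc.
X, _ = np.linalg.qr(np.random.randn(n, r))
Y, _ = np.linalg.qr(np.random.randn(n, r))
Z, _ = np.linalg.qr(np.random.randn(n, r))
print(Phi(A, X, Y, Z))
```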

  19. Grassmann optimization. The Frobenius norm is invariant under orthogonal transformations:
      $$\Phi(X, Y, Z) = \Phi(XU, YV, ZW) = \frac{1}{2} \|\mathcal{A} \cdot (XU, YV, ZW)\|^2$$
      for orthogonal $U \in \mathbb{R}^{r_1 \times r_1}$, $V \in \mathbb{R}^{r_2 \times r_2}$, and $W \in \mathbb{R}^{r_3 \times r_3}$. Therefore maximize $\Phi$ over equivalence classes $[X] = \{XU \mid U \text{ orthogonal}\}$, i.e. over a product of Grassmann manifolds:
      $$\mathrm{Gr}^3 = \mathrm{Gr}(J, r_1) \times \mathrm{Gr}(K, r_2) \times \mathrm{Gr}(L, r_3)$$
      $$\max_{(X, Y, Z) \in \mathrm{Gr}^3} \Phi(X, Y, Z) = \max_{(X, Y, Z) \in \mathrm{Gr}^3} \frac{1}{2} \langle \mathcal{A} \cdot (X, Y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle$$

  20. Newton's method on one Grassmann manifold: Taylor expansion plus linear algebra on the tangent space at $X$ (the tangent space at $X$ is the set of all matrices $Z$ satisfying $Z^T X = 0$):
      $$G(X(t)) \approx G(X(0)) + \langle \Delta, \nabla G \rangle + \frac{1}{2} \langle \Delta, H(\Delta) \rangle$$
      Grassmann gradient:
      $$\nabla G = \Pi_X G_x, \qquad \Pi_X = I - XX^T, \qquad (G_x)_{jk} = \frac{\partial G}{\partial x_{jk}}$$
      The Newton equation for determining $\Delta$:
      $$\Pi_X \langle G_{xx}, \Delta \rangle_{1:2} - \Delta \langle X, G_x \rangle_1 = -\nabla G, \qquad (G_{xx})_{jklm} = \frac{\partial^2 G}{\partial X_{jk}\, \partial X_{lm}}$$

  21. Newton-Grassmann algorithm on $\mathrm{Gr}^3$ (here: local coordinates).
      Given the tensor $\mathcal{A}$ and starting points $(X_0, Y_0, Z_0) \in \mathrm{Gr}^3$:
      repeat
        compute the Grassmann gradient $\nabla\Phi$
        compute the Grassmann Hessian $H$
        matricize $H$ and vectorize $\nabla\Phi$
        solve for $D = (D_x, D_y, D_z)$ from the Newton equation
        take a geodesic step along the direction $D$, giving new iterates $(X, Y, Z)$
      until $\|\nabla\Phi\| / \Phi < \mathrm{TOL}$
      Implementation using the Tensor Toolbox and object-oriented Grassmann classes in Matlab.
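The Newton-Grassmann step needs the Hessian machinery developed on the following slides. As a point of reference only, here is a sketch of the simpler alternating scheme, higher-order orthogonal iteration (HOOI), for the same maximization problem. This is not the authors' algorithm and not their Matlab/Tensor Toolbox code; all names and tolerances are mine.

```python
import numpy as np

def hooi(A, r1, r2, r3, iters=50, tol=1e-10):
    """Higher-order orthogonal iteration (HOOI): alternating
    maximization of Phi(X,Y,Z) = 0.5 * ||A.(X,Y,Z)||^2.
    A common baseline, NOT the Newton-Grassmann method of the talk."""
    def leading(M, r):
        # Dominant r left singular vectors of a matrix
        U, _, _ = np.linalg.svd(M, full_matrices=False)
        return U[:, :r]

    # Initialize with truncated HOSVD factors
    X = leading(A.reshape(A.shape[0], -1), r1)
    Y = leading(np.moveaxis(A, 1, 0).reshape(A.shape[1], -1), r2)
    Z = leading(np.moveaxis(A, 2, 0).reshape(A.shape[2], -1), r3)

    prev = -np.inf
    for _ in range(iters):
        # With two factors fixed, the optimal third factor spans the
        # dominant left subspace of the partially contracted unfolding
        X = leading(np.einsum('abc,bk,cl->akl', A, Y, Z).reshape(A.shape[0], -1), r1)
        Y = leading(np.einsum('abc,aj,cl->bjl', A, X, Z).reshape(A.shape[1], -1), r2)
        Z = leading(np.einsum('abc,aj,bk->cjk', A, X, Y).reshape(A.shape[2], -1), r3)
        phi = 0.5 * np.sum(np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z) ** 2)
        if phi - prev < tol * max(phi, 1.0):
            break
        prev = phi
    return X, Y, Z

A = np.random.randn(10, 10, 10)
X, Y, Z = hooi(A, 2, 2, 2)
```

HOOI converges linearly at best, which is the motivation for the Newton method on the manifold sketched above.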

  22. Newton's method on $\mathrm{Gr}^3$. Differentiate $\Phi(X, Y, Z)$ along a geodesic curve $(X(t), Y(t), Z(t))$ in the direction $(\Delta_x, \Delta_y, \Delta_z)$:
      $$\frac{\partial x_{st}}{\partial t} = (\Delta_x)_{st}, \qquad \left( \frac{dX(t)}{dt}, \frac{dY(t)}{dt}, \frac{dZ(t)}{dt} \right) = (\Delta_x, \Delta_y, \Delta_z)$$
      Since $\mathcal{A} \cdot (X, Y, Z)$ is linear in $X$, $Y$, $Z$ separately:
      $$\frac{d}{dt} \big( \mathcal{A} \cdot (X, Y, Z) \big) = \mathcal{A} \cdot (\Delta_x, Y, Z) + \mathcal{A} \cdot (X, \Delta_y, Z) + \mathcal{A} \cdot (X, Y, \Delta_z)$$

  23. First derivative:
      $$\frac{d\Phi}{dt} = \frac{1}{2} \frac{d}{dt} \langle \mathcal{A} \cdot (X, Y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle = \langle \mathcal{A} \cdot (\Delta_x, Y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle + \langle \mathcal{A} \cdot (X, \Delta_y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle + \langle \mathcal{A} \cdot (X, Y, \Delta_z), \mathcal{A} \cdot (X, Y, Z) \rangle$$
      We want to write $\langle \mathcal{A} \cdot (\Delta_x, Y, Z), \mathcal{A} \cdot (X, Y, Z) \rangle$ in the form $\langle \Delta_x, \Phi_x \rangle$. Define the tensor $\mathcal{F} = \mathcal{A} \cdot (X, Y, Z)$ and write
      $$\langle \mathcal{A} \cdot (\Delta_x, Y, Z), \mathcal{F} \rangle =: \langle K_x(\Delta_x), \mathcal{F} \rangle = \langle \Delta_x, K_x^* \mathcal{F} \rangle,$$
      with the linear operator $\Delta_x \mapsto K_x(\Delta_x) = \mathcal{A} \cdot (\Delta_x, Y, Z)$.

  24. Adjoint operator. The linear operator $\Delta_x \mapsto K_x(\Delta_x) = \mathcal{A} \cdot (\Delta_x, Y, Z)$ has adjoint
      $$\langle K_x(\Delta_x), \mathcal{F} \rangle = \langle \Delta_x, K_x^* \mathcal{F} \rangle = \langle \Delta_x, \langle \mathcal{A} \cdot (I, Y, Z), \mathcal{F} \rangle_{-1} \rangle,$$
      where the partial contraction is defined by
      $$\langle \mathcal{B}, \mathcal{C} \rangle_{-1} (i_1, i_2) = \sum_{\mu, \nu} b_{i_1 \mu \nu}\, c_{i_2 \mu \nu}$$
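The partial contraction, and with it $K_x^* \mathcal{F} = \Phi_x$, is a single einsum in NumPy (an illustrative sketch; variable names are mine, and the final projection uses $\Pi_X = I - XX^T$ from the Newton slide to obtain the Grassmann gradient):

```python
import numpy as np

def partial_contract(B, C):
    """<B, C>_{-1}(i1, i2) = sum_{mu,nu} B[i1, mu, nu] * C[i2, mu, nu]."""
    return np.einsum('imn,jmn->ij', B, C)

def grassmann_grad_x(A, X, Y, Z):
    """Phi_x = K_x^* F = <A.(I, Y, Z), F>_{-1}, projected onto the
    tangent space at X; returns an n x r1 matrix."""
    F = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)  # F = A.(X, Y, Z)
    C = np.einsum('abc,bk,cl->akl', A, Y, Z)        # A.(I, Y, Z)
    Phi_x = partial_contract(C, F)
    return Phi_x - X @ (X.T @ Phi_x)                # Pi_X * Phi_x
```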
