SLIDE 1

Computing the Best Rank-(r1, r2, r3) Approximation of a Tensor

Lars Eldén
Department of Mathematics, Linköping University, Sweden

Joint work with Berkant Savas

Harrachov 2007

SLIDE 2

Best rank−k approximation of a matrix

Assume $X_k^T X_k = I$ and $Y_k^T Y_k = I$.

$$\min_{X_k, Y_k, S_k} \|A - X_k S_k Y_k^T\|_F =: \min_{X_k, Y_k, S_k} \|A - (X_k, Y_k) \cdot S_k\|_F$$

(Almost) equivalent problem:

$$\max_{X_k, Y_k} \|X_k^T A Y_k\|_F = \max_{X_k, Y_k} \|A \cdot (X_k, Y_k)\|_F$$

Solution by SVD:

$$X_k S_k Y_k^T = U_k \Sigma_k V_k^T = (U_k, V_k) \cdot \Sigma_k$$

Eckart-Young property
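As a concrete illustration (not from the slides; a minimal NumPy sketch with my own function and variable names), the best rank-k approximation is obtained from the truncated SVD:

```python
# Best rank-k approximation of a matrix via the truncated SVD (Eckart-Young).
# Illustrative sketch only; assumes k <= min(m, n).
import numpy as np

def best_rank_k(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Xk = U[:, :k]                 # leading left singular vectors
    Sk = np.diag(s[:k])           # leading singular values
    Yk = Vt[:k, :].T              # leading right singular vectors
    return Xk, Sk, Yk             # A ~ Xk @ Sk @ Yk.T

A = np.random.randn(8, 6)
Xk, Sk, Yk = best_rank_k(A, 2)
print(np.linalg.norm(A - Xk @ Sk @ Yk.T, 'fro'))   # Frobenius-norm error
```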

SLIDE 3

Sketch of “proof”:

  • Determine u1 and v1 (k = 1)
  • Put u1 and v1 in orthogonal matrices (u1 U) and (v1 V)
  • $(u_1\ U)^T A\, (v_1\ V) = \begin{pmatrix} \sigma_1 & 0 \\ 0 & B \end{pmatrix}$ (optimality ⟹ the off-diagonal blocks are zero; a quick numerical check is sketched below)
  • ⟹ deflation: continue with B
  • Orthogonality of the vectors comes automatically
  • The number of degrees of freedom in Uk and Vk is equal to the number of zeros produced.
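A quick numerical check of this deflation structure, as mentioned in the list above (my own NumPy snippet, not part of the talk):

```python
# With the leading singular vectors embedded in full orthogonal matrices,
# the transformed matrix has zeros in the first row and column apart from sigma_1.
import numpy as np

A = np.random.randn(5, 4)
U, s, Vt = np.linalg.svd(A)      # full orthogonal U (5x5) and V (4x4)
M = U.T @ A @ Vt.T               # equals Sigma: first row/column zero except sigma_1
print(np.round(M, 10))
```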

SLIDE 4

Best rank−(k, k, k) approximation of a tensor

Assume $X^T X = Y^T Y = Z^T Z = I_k$.

$$\min_{X, Y, Z, S} \|A - (X, Y, Z) \cdot S\|_F \iff \max_{X, Y, Z} \|A \cdot (X, Y, Z)\|_F$$

Why is this problem much more complicated? Not enough degrees of freedom in X, Y, Z to zero the many ($O(k^3) + O(kn^2)$) elements in A
⇓
Deflation is impossible in general
⇓
Orthogonality constraints must be enforced

SLIDE 5

Talk outline

  • Some basic tensor concepts (For simplicity: only tensors of order 3)
  • Best rank-(r1, r2, r3) approximation problem
  • Optimization on the Grassmann manifold
  • Newton-Grassmann for solving the best rank-(r1, r2, r3) approximation problem

  • Numerical examples
  • Ongoing work

SLIDE 6

“Old and New” Research Area

  • Tensor methods have been used since the 1960's in psychometrics and chemometrics! Only recently in the numerical community.
  • Available mathematical theory deals very little with computational aspects. Many fundamental mathematical problems are open!
  • Applications in signal processing and various areas of data mining.

SLIDE 7

Two aspects of SVD

Singular Value Decomposition: $\mathbb{R}^{m \times n} \ni X = U \Sigma V^T$, where U is m × m, Σ is m × n and V is n × n.

Singular value expansion: sum of rank-1 matrices:

$$X = \sum_{i=1}^{n} \sigma_i u_i v_i^T$$

SLIDE 8

Two approaches to tensor decomposition

Tucker Model

$$A = (U^{(1)}, U^{(2)}, U^{(3)}) \cdot S$$
  • Tucker 1966, numerous papers in psychometrics and chemometrics
  • De Lathauwer, De Moor, Vandewalle, SIMAX 2000: notation, theory.

SLIDE 9

Expansion in rank-1 terms

$$A = x_1 \otimes y_1 \otimes z_1 + x_2 \otimes y_2 \otimes z_2 + \cdots$$

  • Parafac/Candecomp/Kruskal: Harshman, Carroll, Chang 1970
  • Numerous papers in psychometrics and chemometrics
  • Kolda, SIMAX 2001; Zhang, Golub, SIMAX 2001; De Silva and Lim 2006

SLIDE 10

Parafac/... model: low rank approximation


  • The core tensor is zero except along the superdiagonal.

SLIDE 11

Parafac/... model: low rank approximation


  • The core tensor is zero except along the superdiagonal.

Why is it difficult to obtain this? Because we do not have enough degrees of freedom to zero the tensor elements: $O(k^2)$ and $O(k^3)$.

SLIDE 12

The Parafac approximation problem may be ill-posed!¹

Theorem 1. There are tensors A for which the problem

$$\min_{x_i, y_i, z_i} \|A - x_1 \otimes y_1 \otimes z_1 - x_2 \otimes y_2 \otimes z_2\|_F$$

does not have a solution. The set of tensors for which the approximation problem does not have a solution has positive volume.

The problem is ill-posed! (in exact arithmetic)

A well-posed problem (in floating point) near to an ill-posed one is ill-conditioned ⟹ unstable computations.

Still: there are applications (e.g. in chemistry) where the Parafac model corresponds closely to the process that generates the tensor data.

¹ See De Silva and Lim (2006), Bini (1986)

SLIDE 13

Mode-1 multiplication of a tensor by a matrix²

Contravariant multiplication:

$$\mathbb{R}^{n \times n \times n} \ni B = (W)_{\{1\}} \cdot A, \qquad B(i, j, k) = \sum_{\nu=1}^{n} w_{i\nu}\, a_{\nu jk}.$$

All column vectors in the 3-tensor are multiplied by the matrix W.

Covariant multiplication:

$$\mathbb{R}^{n \times n \times n} \ni B = A \cdot (W)_{\{1\}}, \qquad B(i, j, k) = \sum_{\nu=1}^{n} a_{\nu jk}\, w_{\nu i}.$$

2Lim’s notation – Harrachov 2007 –

SLIDE 14

Matrix-tensor multiplication performed in all modes in the same expression:

$$(X, Y, Z) \cdot A = A \cdot (X^T, Y^T, Z^T)$$

Standard matrix multiplication of three matrices: $X A Y^T = (X, Y) \cdot A$.
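These products are short einsum expressions in NumPy; the following sketch is mine (the talk's own implementation uses Matlab and the TensorToolbox, which is not shown here):

```python
# Mode multiplications of a 3-tensor, written with einsum.
import numpy as np

def mode1(W, A):
    """Contravariant mode-1 product (W)_{1} . A: B(i,j,k) = sum_v W[i,v] A[v,j,k]."""
    return np.einsum('iv,vjk->ijk', W, A)

def multi_mode(X, Y, Z, A):
    """(X, Y, Z) . A: multiply mode 1 by X, mode 2 by Y and mode 3 by Z."""
    return np.einsum('abc,ia,jb,kc->ijk', A, X, Y, Z)

A = np.random.randn(4, 5, 6)
X, Y, Z = np.random.randn(3, 4), np.random.randn(3, 5), np.random.randn(3, 6)
print(multi_mode(X, Y, Z, A).shape)   # (3, 3, 3)
```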

SLIDE 15

Inner product, orthogonality and norm

Inner product (contraction: $\mathbb{R}^{n \times n \times n} \to \mathbb{R}$):

$$\langle A, B \rangle = \sum_{i,j,k} a_{ijk} b_{ijk}$$

The Frobenius norm of a tensor is $\|A\| = \langle A, A \rangle^{1/2}$.

Matrix case: $\langle A, B \rangle = \mathrm{tr}(A^T B)$.
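A tiny NumPy illustration of these definitions (mine, for reference only):

```python
# Tensor inner product, Frobenius norm, and the matrix special case.
import numpy as np

A, B = np.random.randn(3, 4, 5), np.random.randn(3, 4, 5)
inner = np.sum(A * B)             # <A, B> = sum_{ijk} a_ijk b_ijk
norm  = np.sqrt(np.sum(A * A))    # ||A|| = <A, A>^(1/2)

X, Y = np.random.randn(4, 4), np.random.randn(4, 4)
print(np.isclose(np.sum(X * Y), np.trace(X.T @ Y)))   # matrix case: <X, Y> = tr(X^T Y)
```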

SLIDE 16

Tensor SVD (HOSVD)³: $A = (U^{(1)}, U^{(2)}, U^{(3)}) \cdot S$


The "mass" of S is concentrated around the (1, 1, 1) corner.

Not optimal: does not solve $\min_{\mathrm{rank}(B)=(r_1, r_2, r_3)} \|A - B\|$.

³ De Lathauwer et al. (2000)
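For reference, a compact NumPy sketch of the (untruncated) HOSVD; the function name and layout are mine, not the notation of De Lathauwer et al.:

```python
# HOSVD of a 3-tensor: A = (U1, U2, U3) . S with orthogonal Ui and core S.
import numpy as np

def hosvd(A):
    U = []
    for mode in range(3):
        Amat = np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)  # mode-n unfolding
        U.append(np.linalg.svd(Amat, full_matrices=False)[0])      # left singular vectors
    # Core tensor S = A . (U1, U2, U3): contract each mode of A with the matching Ui
    S = np.einsum('abc,ai,bj,ck->ijk', A, *U)
    return U, S

A = np.random.randn(4, 5, 6)
U, S = hosvd(A)
A_rec = np.einsum('ijk,ai,bj,ck->abc', S, *U)
print(np.linalg.norm(A - A_rec))   # ~ machine precision
```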

SLIDE 17

Best rank-(r1, r2, r3) Approximation

Best rank-(r1, r2, r3) approximation:

$$\min_{X, Y, Z, S} \|A - (X, Y, Z) \cdot S\|, \qquad X^T X = I, \quad Y^T Y = I, \quad Z^T Z = I$$

The problem is over-parameterized!

SLIDE 18

Best approximation: $\min_{\mathrm{rank}(B)=(r_1, r_2, r_3)} \|A - B\|$

Equivalent to

$$\max_{X, Y, Z} \Phi(X, Y, Z) = \frac{1}{2} \|A \cdot (X, Y, Z)\|^2 = \frac{1}{2} \sum_{j,k,l} \mathcal{A}_{jkl}^2, \qquad \mathcal{A}_{jkl} = \sum_{\lambda,\mu,\nu} a_{\lambda\mu\nu}\, x_{\lambda j}\, y_{\mu k}\, z_{\nu l},$$

subject to $X^T X = I_{r_1}$, $Y^T Y = I_{r_2}$, $Z^T Z = I_{r_3}$.
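The objective Φ is straightforward to evaluate directly; a hedged NumPy sketch (my own names, assuming X, Y, Z have orthonormal columns):

```python
# Phi(X, Y, Z) = 0.5 * || A . (X, Y, Z) ||^2
import numpy as np

def phi(A, X, Y, Z):
    F = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)   # F = A . (X, Y, Z)
    return 0.5 * np.sum(F**2)

A = np.random.randn(6, 7, 8)
X, _ = np.linalg.qr(np.random.randn(6, 2))   # X^T X = I_2, similarly Y, Z
Y, _ = np.linalg.qr(np.random.randn(7, 2))
Z, _ = np.linalg.qr(np.random.randn(8, 2))
print(phi(A, X, Y, Z))
```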

SLIDE 19

Grassmann Optimization

The Frobenius norm is invariant under orthogonal transformations:

$$\Phi(X, Y, Z) = \Phi(XU, YV, ZW) = \frac{1}{2} \|A \cdot (XU, YV, ZW)\|^2$$

for orthogonal $U \in \mathbb{R}^{r_1 \times r_1}$, $V \in \mathbb{R}^{r_2 \times r_2}$, and $W \in \mathbb{R}^{r_3 \times r_3}$.

Maximize Φ over equivalence classes $[X] = \{XU \mid U \text{ orthogonal}\}$.

Product of Grassmann manifolds: $\mathrm{Gr}^3 = \mathrm{Gr}(J, r_1) \times \mathrm{Gr}(K, r_2) \times \mathrm{Gr}(L, r_3)$

$$\max_{(X,Y,Z) \in \mathrm{Gr}^3} \Phi(X, Y, Z) = \max_{(X,Y,Z) \in \mathrm{Gr}^3} \frac{1}{2} \langle A \cdot (X, Y, Z),\, A \cdot (X, Y, Z) \rangle$$

SLIDE 20

Newton’s Method on one Grassmann Manifold

Taylor expansion + linear algebra on the tangent space⁴ at X:

$$G(X(t)) \approx G(X(0)) + \langle \Delta, \nabla G \rangle + \frac{1}{2} \langle \Delta, H(\Delta) \rangle$$

Grassmann gradient:

$$\nabla G = \Pi_X G_x, \qquad (G_x)_{jk} = \frac{\partial G}{\partial x_{jk}}, \qquad \Pi_X = I - XX^T$$

The Newton equation for determining Δ:

$$\Pi_X \langle G_{xx}, \Delta \rangle_{1:2} - \Delta \langle X, G_x \rangle_{1} = -\nabla G, \qquad (G_{xx})_{jklm} = \frac{\partial^2 G}{\partial X_{jk}\, \partial X_{lm}}.$$

⁴ Tangent space at X: all matrices Z satisfying $Z^T X = 0$.

SLIDE 21

Newton-Grassmann Algorithm on Gr3

Here: local coordinates

Given tensor A and starting points (X0, Y0, Z0) ∈ Gr3
repeat
    compute the Grassmann gradient ∇Φ
    compute the Grassmann Hessian H
    matricize H and vectorize ∇Φ
    solve D = (Dx, Dy, Dz) from the Newton equation
    take a geodesic step along the direction D, giving new iterates (X, Y, Z)
until ‖∇Φ‖/Φ < TOL

Implementation using the TensorToolbox and object-oriented Grassmann classes in Matlab.

SLIDE 22

Newton’s method on Gr3

Differentiate Φ(X, Y, Z) along a geodesic curve (X(t), Y(t), Z(t)) in the direction (Δx, Δy, Δz):

$$\frac{\partial x_{st}}{\partial t} = (\Delta_x)_{st}, \qquad \left( \frac{dX(t)}{dt}, \frac{dY(t)}{dt}, \frac{dZ(t)}{dt} \right) = (\Delta_x, \Delta_y, \Delta_z).$$

Since A · (X, Y, Z) is linear in X, Y, Z separately:

$$\frac{d\big(A \cdot (X, Y, Z)\big)}{dt} = A \cdot (\Delta_x, Y, Z) + A \cdot (X, \Delta_y, Z) + A \cdot (X, Y, \Delta_z).$$

SLIDE 23

First Derivative

$$\frac{d\Phi}{dt} = \frac{1}{2} \frac{d}{dt} \langle A \cdot (X, Y, Z), A \cdot (X, Y, Z) \rangle = \langle A \cdot (\Delta_x, Y, Z), A \cdot (X, Y, Z) \rangle + \langle A \cdot (X, \Delta_y, Z), A \cdot (X, Y, Z) \rangle + \langle A \cdot (X, Y, \Delta_z), A \cdot (X, Y, Z) \rangle.$$

We want to write $\langle A \cdot (\Delta_x, Y, Z), A \cdot (X, Y, Z) \rangle$ in the form $\langle \Delta_x, \Phi_x \rangle$.

Define the tensor $F = A \cdot (X, Y, Z)$ and write

$$\langle A \cdot (\Delta_x, Y, Z), F \rangle =: \langle K_x(\Delta_x), F \rangle = \langle \Delta_x, K_x^* F \rangle,$$

Linear operator: $\Delta_x \mapsto K_x(\Delta_x) = A \cdot (\Delta_x, Y, Z)$

SLIDE 24

Adjoint Operator

Linear operator: $\Delta_x \mapsto K_x(\Delta_x) = A \cdot (\Delta_x, Y, Z)$, with adjoint

$$\langle K_x(\Delta_x), F \rangle = \langle \Delta_x, K_x^* F \rangle = \langle \Delta_x, \langle A \cdot (I, Y, Z), F \rangle_{-1} \rangle,$$

where the partial contraction is defined by

$$\langle B, C \rangle_{-1}(i_1, i_2) = \sum_{\mu,\nu} b_{i_1 \mu \nu}\, c_{i_2 \mu \nu}$$
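The partial contraction is a single einsum in NumPy; a minimal sketch (the function name `pcontract1` is mine):

```python
# <B, C>_{-1}: contract over all modes except the first of each 3-tensor.
import numpy as np

def pcontract1(B, C):
    """(<B, C>_{-1})(i1, i2) = sum_{mu, nu} B[i1, mu, nu] * C[i2, mu, nu]."""
    return np.einsum('imn,jmn->ij', B, C)
```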

SLIDE 25

Grassmann Gradient

X-part: multiply by $\Pi_X = I - XX^T$:

$$\Pi_X \Phi_x = \Pi_X \langle A \cdot (I, Y, Z), F \rangle_{-1} = \langle A \cdot (I, Y, Z), A \cdot (X, Y, Z) \rangle_{-1} - XX^T \langle A \cdot (I, Y, Z), F \rangle_{-1} = \langle A \cdot (I, Y, Z), A \cdot (I, Y, Z) \rangle_{-1} X - X \langle F, F \rangle_{-1}.$$

Complete gradient (recall $F = A \cdot (X, Y, Z)$):

$$\begin{pmatrix} \Pi_X \Phi_x \\ \Pi_Y \Phi_y \\ \Pi_Z \Phi_z \end{pmatrix} = \begin{pmatrix} \langle A \cdot (I, Y, Z), A \cdot (I, Y, Z) \rangle_{-1} X - X \langle F, F \rangle_{-1} \\ \langle A \cdot (X, I, Z), A \cdot (X, I, Z) \rangle_{-2} Y - Y \langle F, F \rangle_{-2} \\ \langle A \cdot (X, Y, I), A \cdot (X, Y, I) \rangle_{-3} Z - Z \langle F, F \rangle_{-3} \end{pmatrix}.$$
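A NumPy sketch (my own notation and names) of the X-part of the Grassmann gradient, following the formula above:

```python
# X-part of the Grassmann gradient: Pi_X Phi_x, with Phi_x = <A.(I,Y,Z), F>_{-1}.
import numpy as np

def grassmann_grad_x(A, X, Y, Z):
    F  = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)   # F  = A . (X, Y, Z)
    Bx = np.einsum('abc,bk,cl->akl', A, Y, Z)         # Bx = A . (I, Y, Z)
    Phix = np.einsum('akl,jkl->aj', Bx, F)            # <Bx, F>_{-1}, shape J x r1
    return Phix - X @ (X.T @ Phix)                    # project: Pi_X = I - X X^T
```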

SLIDE 26

Second Derivative

$$\frac{d^2\Phi}{dt^2} = \langle A \cdot (\Delta_x, Y, Z), A \cdot (\Delta_x, Y, Z) \rangle + \langle A \cdot (\Delta_x, \Delta_y, Z), A \cdot (X, Y, Z) \rangle + \langle A \cdot (\Delta_x, Y, Z), A \cdot (X, \Delta_y, Z) \rangle + \langle A \cdot (\Delta_x, Y, \Delta_z), A \cdot (X, Y, Z) \rangle + \langle A \cdot (\Delta_x, Y, Z), A \cdot (X, Y, \Delta_z) \rangle + \cdots,$$

plus 10 analogous terms.

First term:

$$\langle A \cdot (\Delta_x, Y, Z), A \cdot (\Delta_x, Y, Z) \rangle = \langle \Delta_x, \langle A \cdot (I, Y, Z), A \cdot (\Delta_x, Y, Z) \rangle_{-1} \rangle = \langle \Delta_x, \langle A \cdot (I, Y, Z), A \cdot (I, Y, Z) \rangle_{-1} \Delta_x \rangle.$$

SLIDE 27

“xx” Part of Grassmann Hessian

Sylvester operator:

$$H_{xx}(\Delta_x) = \Pi_X \left( \langle A \cdot (I, Y, Z), A \cdot (I, Y, Z) \rangle_{-1} \Delta_x - \Delta_x \langle F, F \rangle_{-1} \right),$$

SLIDE 28

“xy” Part of Grassmann Hessian

Second term:

$$\langle A \cdot (\Delta_x, \Delta_y, Z), A \cdot (X, Y, Z) \rangle = \langle \Delta_x, \langle A \cdot (I, \Delta_y, Z), A \cdot (X, Y, Z) \rangle_{-1} \rangle = \langle \Delta_x, \langle F^1_{xy}, \Delta_y \rangle_{2,4;1:2} \rangle,$$

where $F^1_{xy}$ is the 4-tensor

$$F^1_{xy} = \langle A \cdot (I, I, Z), A \cdot (X, Y, Z) \rangle_{-(1,2)} = \langle A \cdot (I, I, Z), A \cdot (X, Y, Z) \rangle_{3},$$

and

$$\big(\langle B, \Delta \rangle_{2,4;1:2}\big)_{ik} = \sum_{\mu,\nu} b_{i\mu k\nu}\, \Delta_{\mu\nu}$$

SLIDE 29

Grassmann Hessian

$$H(\Delta) = \begin{pmatrix} H_{xx}(\Delta_x) + H_{xy}(\Delta_y) + H_{xz}(\Delta_z) \\ H_{yx}(\Delta_x) + H_{yy}(\Delta_y) + H_{yz}(\Delta_z) \\ H_{zx}(\Delta_x) + H_{zy}(\Delta_y) + H_{zz}(\Delta_z) \end{pmatrix},$$

"Diagonal part":

$$H_{xx}(\Delta_x) = \Pi_X \left( \langle B_x, B_x \rangle_{-1} \Delta_x - \Delta_x \langle F, F \rangle_{-1} \right), \qquad B_x = A \cdot (I, Y, Z),$$
$$H_{yy}(\Delta_y) = \Pi_Y \left( \langle B_y, B_y \rangle_{-2} \Delta_y - \Delta_y \langle F, F \rangle_{-2} \right), \qquad B_y = A \cdot (X, I, Z),$$
$$H_{zz}(\Delta_z) = \Pi_Z \left( \langle B_z, B_z \rangle_{-3} \Delta_z - \Delta_z \langle F, F \rangle_{-3} \right), \qquad B_z = A \cdot (X, Y, I).$$
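A sketch (mine) of the diagonal block H_xx applied to a tangent matrix Δx, reading the projection Π_X as acting on the whole Sylvester expression:

```python
# H_xx(Dx) = Pi_X( <Bx, Bx>_{-1} Dx - Dx <F, F>_{-1} ), Bx = A.(I,Y,Z), F = A.(X,Y,Z)
import numpy as np

def hess_xx(A, X, Y, Z, Dx):
    F  = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)
    Bx = np.einsum('abc,bk,cl->akl', A, Y, Z)
    BxBx = np.einsum('akl,bkl->ab', Bx, Bx)           # <Bx, Bx>_{-1}, J x J
    FF   = np.einsum('jkl,mkl->jm', F, F)             # <F, F>_{-1},  r1 x r1
    H = BxBx @ Dx - Dx @ FF                           # Sylvester operator
    return H - X @ (X.T @ H)                          # apply Pi_X = I - X X^T
```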

SLIDE 30

Grassmann Hessian, "upper triangular part":

$$H_{xy}(\Delta_y) = \Pi_X \left( \langle \langle C_{xy}, F \rangle_{-(1,2)}, \Delta_y \rangle_{2,4;1:2} + \langle \langle B_x, B_y \rangle_{-(1,2)}, \Delta_y \rangle_{4,2;1:2} \right),$$
$$H_{xz}(\Delta_z) = \Pi_X \left( \langle \langle C_{xz}, F \rangle_{-(1,3)}, \Delta_z \rangle_{2,4;1:2} + \langle \langle B_x, B_z \rangle_{-(1,3)}, \Delta_z \rangle_{4,2;1:2} \right),$$
$$H_{yz}(\Delta_z) = \Pi_Y \left( \langle \langle C_{yz}, F \rangle_{-(2,3)}, \Delta_z \rangle_{2,4;1:2} + \langle \langle B_y, B_z \rangle_{-(2,3)}, \Delta_z \rangle_{4,2;1:2} \right),$$

where we have also introduced $C_{xy} = A \cdot (I, I, Z)$, $C_{xz} = A \cdot (I, Y, I)$ and $C_{yz} = A \cdot (X, I, I)$.

Linear operator: the fourth-order tensor $\langle C_{xy}, F \rangle_{-(1,2)}$ acting on a matrix and giving a matrix: $\langle \langle C_{xy}, F \rangle_{-(1,2)}, \Delta_y \rangle_{2,4;1:2}$

SLIDE 31

Local Coordinates

The Hessian is singular in Euclidean space, but non-singular on the tangent space.

(Δx, Δy, Δz), to be determined, live on the tangent space:

$$\Delta_x^T X = 0, \qquad \Delta_y^T Y = 0, \qquad \Delta_z^T Z = 0$$

X⊥ is determined so that (X, X⊥) is a (square) orthogonal matrix:

$$\Delta_x = X_\perp D_x, \quad D_x \in \mathbb{R}^{(J - r_1) \times r_1}, \qquad \Delta_y = Y_\perp D_y, \quad D_y \in \mathbb{R}^{(K - r_2) \times r_2}, \qquad \Delta_z = Z_\perp D_z, \quad D_z \in \mathbb{R}^{(L - r_3) \times r_3}.$$
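A small NumPy illustration (mine) of this parameterization: build X⊥ from a full QR factorization; Δx = X⊥ Dx then automatically satisfies Δxᵀ X = 0.

```python
# Local coordinates on the Grassmann manifold: Delta_x = X_perp @ Dx
import numpy as np

J, r1 = 6, 2
X, _ = np.linalg.qr(np.random.randn(J, r1))      # X with orthonormal columns
Q, _ = np.linalg.qr(X, mode='complete')          # full J x J orthogonal matrix
X_perp = Q[:, r1:]                               # (X, X_perp) is orthogonal
Dx = np.random.randn(J - r1, r1)
Delta_x = X_perp @ Dx
print(np.allclose(Delta_x.T @ X, 0))             # tangent-space condition
```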

SLIDE 32

Grassmann Hessian in local coordinates

$$H(D) = \begin{pmatrix} H_{xx}(D_x) + H_{xy}(D_y) + H_{xz}(D_z) \\ H_{yx}(D_x) + H_{yy}(D_y) + H_{yz}(D_z) \\ H_{zx}(D_x) + H_{zy}(D_y) + H_{zz}(D_z) \end{pmatrix}$$

where the diagonal operators are

$$H_{xx}(D_x) = \langle B_x, B_x \rangle_{-1} D_x - D_x \langle F, F \rangle_{-1}, \qquad B_x = A \cdot (X_\perp, Y, Z),$$
$$H_{yy}(D_y) = \langle B_y, B_y \rangle_{-2} D_y - D_y \langle F, F \rangle_{-2}, \qquad B_y = A \cdot (X, Y_\perp, Z),$$
$$H_{zz}(D_z) = \langle B_z, B_z \rangle_{-3} D_z - D_z \langle F, F \rangle_{-3}, \qquad B_z = A \cdot (X, Y, Z_\perp).$$

SLIDE 33

Grassmann Hessian, “upper triangular” operators

$$H_{xy}(D_y) = \langle \langle C_{xy}, F \rangle_{-(1,2)}, D_y \rangle_{2,4;1:2} + \langle \langle B_x, B_y \rangle_{-(1,2)}, D_y \rangle_{4,2;1:2},$$
$$H_{xz}(D_z) = \langle \langle C_{xz}, F \rangle_{-(1,3)}, D_z \rangle_{2,4;1:2} + \langle \langle B_x, B_z \rangle_{-(1,3)}, D_z \rangle_{4,2;1:2},$$
$$H_{yz}(D_z) = \langle \langle C_{yz}, F \rangle_{-(2,3)}, D_z \rangle_{2,4;1:2} + \langle \langle B_y, B_z \rangle_{-(2,3)}, D_z \rangle_{4,2;1:2},$$

where $C_{xy} = A \cdot (X_\perp, Y_\perp, Z)$, $C_{xz} = A \cdot (X_\perp, Y, Z_\perp)$ and $C_{yz} = A \cdot (X, Y_\perp, Z_\perp)$.

SLIDE 34

Illustration of Hessian

SLIDE 35

Numerical Examples. Test 1

Simulate a "signal tensor" with low rank plus normally distributed noise. Two 20 × 20 × 20 tensors:

$$A_1 = B_1 + \rho E_1, \quad \mathrm{rank}(B_1) = (10, 10, 10), \qquad A_2 = B_2 + \rho E_2, \quad \mathrm{rank}(B_2) = (15, 15, 15),$$

where the $E_i$ are noise tensors and ρ = 0.1.

A rank-(5, 5, 5) approximation was computed. Initial approximation: random tensor; 10 HOOI iterations were performed before the Newton method was started.

HOOI: "alternating least squares" approach (De Lathauwer).
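A minimal HOOI sketch together with Test-1-style data (my own simplified NumPy version, not the Matlab/TensorToolbox implementation used for the experiments):

```python
# HOOI ("alternating least squares") for the best rank-(r1, r2, r3) problem.
import numpy as np

def hooi(A, ranks, iters=10):
    (J, K, L), (r1, r2, r3) = A.shape, ranks
    # HOSVD-style starting values: leading left singular vectors of the unfoldings
    X = np.linalg.svd(A.reshape(J, -1), full_matrices=False)[0][:, :r1]
    Y = np.linalg.svd(np.moveaxis(A, 1, 0).reshape(K, -1), full_matrices=False)[0][:, :r2]
    Z = np.linalg.svd(np.moveaxis(A, 2, 0).reshape(L, -1), full_matrices=False)[0][:, :r3]
    for _ in range(iters):
        X = np.linalg.svd(np.einsum('abc,bk,cl->akl', A, Y, Z).reshape(J, -1),
                          full_matrices=False)[0][:, :r1]
        Y = np.linalg.svd(np.einsum('abc,aj,cl->bjl', A, X, Z).reshape(K, -1),
                          full_matrices=False)[0][:, :r2]
        Z = np.linalg.svd(np.einsum('abc,aj,bk->cjk', A, X, Y).reshape(L, -1),
                          full_matrices=False)[0][:, :r3]
    return X, Y, Z

# Test-1-style data: 20x20x20 signal of rank (10, 10, 10) plus noise, rho = 0.1
rng = np.random.default_rng(0)
C = rng.standard_normal((10, 10, 10))
U1, U2, U3 = (np.linalg.qr(rng.standard_normal((20, 10)))[0] for _ in range(3))
B1 = np.einsum('ijk,ai,bj,ck->abc', C, U1, U2, U3)
A1 = B1 + 0.1 * rng.standard_normal((20, 20, 20))
X, Y, Z = hooi(A1, (5, 5, 5), iters=10)
```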

SLIDE 36

Convergence history for Test 1

[Figure: convergence history for Test 1; relative norm of the gradient (log scale) vs. iteration number, comparing NEWTON and HOOI.]

SLIDE 37

Test 2: Random 20 × 20 × 20 Tensor

[Figure: convergence history for Test 2; relative norm of the gradient (log scale) vs. iteration number, comparing NEWTON and HOOI.]

Initialization: HOSVD and 20 HOOI iterations

SLIDE 38

Ongoing work

Matrix case:

$$\min_{\mathrm{rank}(B)=k} \|A - B\|_F = \|A - U_1 \Sigma_1 V_1^T\|_F$$

Put

$$U = \begin{pmatrix} U_1 & U_\perp \end{pmatrix}, \qquad V = \begin{pmatrix} V_1 & V_\perp \end{pmatrix}.$$

Then

$$U^T A\, V = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & C \end{pmatrix}$$
  • How much can be generalized to tensors?

SLIDE 39

Tensor “SVD”?

$A \cdot (X, Y, Z)$, $A \cdot (X_\perp, Y, Z)$, $A \cdot (X, Y_\perp, Z)$, $A \cdot (X, Y, Z_\perp)$

All slices orthogonal:

$$\langle A \cdot (X, Y, Z), A \cdot (X_\perp, Y, Z) \rangle_{-1} = 0.$$

SLIDE 40

Conclusions

  • To exhibit structure: matricize as late as possible
  • Tensor framework without extensive index wrestling
  • Partial contractions play the role of adjoints
  • Newton-Grassmann ⟹ unconstrained optimization. Quadratic convergence
  • Generalization to higher order tensors is straightforward
  • Present work: investigation of theoretical properties and implementation of other methods (Quasi-Newton: Savas & Lim; trust-region: Ishteva (Louvain))
