1. The decomposition of a third-order tensor in R block-terms of rank-(L,L,1): Model, Algorithms, Uniqueness, Estimation of R and L

Dimitri Nion & Lieven De Lathauwer
K.U. Leuven, Kortrijk campus, Belgium
E-mails: Dimitri.Nion@kuleuven-kortrijk.be, Lieven.DeLathauwer@kuleuven-kortrijk.be

TRICAP 2009, Núria, Spain, June 14th-19th, 2009

2. Introduction

Tensor decompositions are powerful multi-linear algebra tools that generalize matrix decompositions.

Motivation: an increasing number of applications involve the manipulation of multi-way data, rather than 2-way data.

Key research axes:
- development of new models/decompositions;
- development of algorithms to compute the decompositions;
- uniqueness of tensor decompositions;
- use of these tools in new applications, or in existing applications where the multi-way nature of the data was ignored until now;
- tensor decompositions under constraints (e.g. imposing non-negativity or specific algebraic structures).

3. From matrix SVD to tensor HOSVD

Matrix SVD: $Y = U D V^H = d_{11}\, u_1 v_1^H + \dots + d_{RR}\, u_R v_R^H$.

Tensor HOSVD (third-order case): $y_{ijk} = \sum_{l=1}^{L} \sum_{m=1}^{M} \sum_{n=1}^{N} u_{il}\, v_{jm}\, w_{kn}\, h_{lmn}$, i.e. $\mathcal{Y} = \mathcal{H} \times_1 U \times_2 V \times_3 W$.

- One unitary matrix ($U$, $V$, $W$) per mode.
- $\mathcal{H}$ is the representation of $\mathcal{Y}$ in the reduced spaces.
- We may have $L \neq M \neq N$.
- $\mathcal{H}$ is not diagonal (difference with the matrix SVD).
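
To make the n-mode products concrete, here is a minimal NumPy sketch of a truncated third-order HOSVD; the helper names (`unfold`, `mode_product`, `hosvd`) are mine, and computing each factor from the SVD of the corresponding unfolding is the standard construction, not code from the talk:

```python
import numpy as np

def unfold(Y, mode):
    # Mode-n unfolding: put mode `mode` first, flatten the remaining modes.
    return np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)

def mode_product(Y, M, mode):
    # n-mode product Y x_n M: contract mode `mode` of Y with the columns of M.
    return np.moveaxis(np.tensordot(M, Y, axes=(1, mode)), 0, mode)

def hosvd(Y, ranks):
    # One orthonormal factor per mode from the SVD of each unfolding;
    # the core H = Y x_1 U^H x_2 V^H x_3 W^H is not diagonal in general.
    U = [np.linalg.svd(unfold(Y, n), full_matrices=False)[0][:, :ranks[n]]
         for n in range(3)]
    H = Y
    for n in range(3):
        H = mode_product(H, U[n].conj().T, n)
    return U, H
```

For an $I \times J \times K$ array `Y`, `U, H = hosvd(Y, (L, M, N))` returns the three orthonormal factors and the (generally non-diagonal) core.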

4. From matrix SVD to PARAFAC

Matrix SVD: $Y = U D V^H = d_{11}\, u_1 v_1^H + \dots + d_{RR}\, u_R v_R^H$.

PARAFAC decomposition: the core $\mathcal{H}$ is diagonal ($h_{ijk} = 1$ if $i = j = k$, else $h_{ijk} = 0$), which gives a sum of R rank-1 tensors:
$\mathcal{Y} = a_1 \circ b_1 \circ c_1 + \dots + a_R \circ b_R \circ c_R$

Equivalently, $\mathcal{Y}$ is the set of K matrices of the form $Y(:,:,k) = A\, \mathrm{diag}(C(k,:))\, B^T$.
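
A quick numerical check of the equivalence between the rank-1 sum and the slice-wise form (the sizes and random factors below are arbitrary):

```python
import numpy as np

I, J, K, R = 6, 5, 4, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# Sum of R rank-1 terms: y_ijk = sum_r a_ir * b_jr * c_kr
Y = np.einsum('ir,jr,kr->ijk', A, B, C)

# Equivalent slice-wise form: Y(:,:,k) = A diag(C(k,:)) B^T
k = 2
assert np.allclose(Y[:, :, k], A @ np.diag(C[k, :]) @ B.T)
```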

5. From PARAFAC/HOSVD to Block Component Decompositions (BCD) [De Lathauwer and Nion]

BCD in rank-$(L_r, L_r, 1)$ terms: $\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^T) \circ c_r$, with $A_r$ of size $I \times L_r$ and $B_r$ of size $J \times L_r$.

BCD in rank-$(L_r, M_r, \cdot)$ terms: $\mathcal{Y} = \sum_{r=1}^{R} \mathcal{C}_r \times_1 A_r \times_2 B_r$, with cores $\mathcal{C}_r$ of size $L_r \times M_r \times K$.

BCD in rank-$(L_r, M_r, N_r)$ terms: $\mathcal{Y} = \sum_{r=1}^{R} \mathcal{C}_r \times_1 A_r \times_2 B_r \times_3 C_r$, with cores $\mathcal{C}_r$ of size $L_r \times M_r \times N_r$.
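
For the rank-$(L_r, L_r, 1)$ case, the tensor can be assembled directly from the block-partitioned factors. A minimal sketch, assuming all $L_r$ equal a common $L$ (the helper name `bcd_lll1` is mine):

```python
import numpy as np

def bcd_lll1(A, B, C, L):
    # Y = sum_r (A_r B_r^T) o c_r, with A_r = A[:, r*L:(r+1)*L], etc.
    R = C.shape[1]
    AB = np.stack([A[:, r*L:(r+1)*L] @ B[:, r*L:(r+1)*L].T for r in range(R)])
    return np.einsum('rij,kr->ijk', AB, C)

I, J, K, R, L = 8, 7, 5, 3, 2
rng = np.random.default_rng(0)
Y = bcd_lll1(rng.standard_normal((I, R * L)),
             rng.standard_normal((J, R * L)),
             rng.standard_normal((K, R)), L)
```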

6. Content of this talk

BCD-$(L_r, L_r, 1)$: $\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^T) \circ c_r$

- Model ambiguities
- Algorithms
- Uniqueness
- Estimation of the parameters $L_r$ ($r = 1, \dots, R$) and $R$
- An application in telecommunications

7. BCD-$(L_r, L_r, 1)$: Model ambiguities

For any non-singular $L_r \times L_r$ matrices $F_r$:
$\mathcal{Y} = \sum_{r=1}^{R} (A_r B_r^T) \circ c_r = \sum_{r=1}^{R} \big( (A_r F_r)(F_r^{-1} B_r^T) \big) \circ c_r$

Unknown matrices: $A = [A_1, \dots, A_R]$ ($I \times \sum_r L_r$), $B = [B_1, \dots, B_R]$ ($J \times \sum_r L_r$) and $C = [c_1, \dots, c_R]$ ($K \times R$).

The BCD-$(L_r, L_r, 1)$ is said to be essentially unique if the only ambiguities are:
- arbitrary permutation of the R blocks in $A$ and $B$ and of the R columns of $C$;
- each block of $A$ and $B$ post-multiplied by an arbitrary non-singular matrix, and each column of $C$ arbitrarily scaled.
That is, $A$ and $B$ are estimated up to multiplication by a block-wise permuted block-diagonal matrix, and $C$ up to a permuted diagonal matrix.
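
These ambiguities are easy to verify numerically: post-multiplying each block $A_r$ by a non-singular $F_r$ and each $B_r$ by $F_r^{-T}$, and scaling each $c_r$ (with the inverse scaling absorbed into $A_r$), leaves the tensor unchanged. A sketch, assuming the `bcd_lll1` helper from the previous block:

```python
import numpy as np
# assumes bcd_lll1() from the previous sketch is in scope

I, J, K, R, L = 8, 7, 5, 3, 2
rng = np.random.default_rng(1)
A = rng.standard_normal((I, R * L))
B = rng.standard_normal((J, R * L))
C = rng.standard_normal((K, R))

A2, B2, C2 = A.copy(), B.copy(), C.copy()
for r in range(R):
    F = rng.standard_normal((L, L))   # non-singular with probability one
    alpha = 0.5 + rng.random()        # arbitrary non-zero column scaling
    A2[:, r*L:(r+1)*L] = A[:, r*L:(r+1)*L] @ F / alpha
    B2[:, r*L:(r+1)*L] = B[:, r*L:(r+1)*L] @ np.linalg.inv(F).T
    C2[:, r] = alpha * C[:, r]

assert np.allclose(bcd_lll1(A, B, C, L), bcd_lll1(A2, B2, C2, L))
```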

8. BCD-$(L_r, L_r, 1)$: Algorithms

Usual approach: estimate $A$, $B$ and $C$ by minimization of
$\Phi = \big\| \mathcal{Y} - \sum_{r=1}^{R} (A_r B_r^T) \circ c_r \big\|_F^2$
The model is fitted for a given choice of the parameters $\{L_r, R\}$.

Exploit the algebraic structure of the matrix unfoldings of $\mathcal{Y}$: $Y_{K \times JI}$, $Y_{J \times IK}$ and $Y_{I \times KJ}$, built by stacking the slices of $\mathcal{Y}$ along each of the three modes.
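
One consistent set of conventions for the three unfoldings in NumPy (the exact slice orderings vary across papers; all that matters is that they match the Khatri-Rao structure of the $Z$ matrices on the next slide):

```python
def unfoldings(Y):
    # Y is I x J x K; return the K x JI, J x IK and I x KJ unfoldings.
    I, J, K = Y.shape
    Y1 = Y.transpose(2, 0, 1).reshape(K, I * J)   # row k = Y(:,:,k) flattened
    Y2 = Y.transpose(1, 2, 0).reshape(J, K * I)
    Y3 = Y.transpose(0, 2, 1).reshape(I, K * J)
    return Y1, Y2, Y3
```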

9. BCD-$(L_r, L_r, 1)$: ALS Algorithm

The unfoldings factor as
$Y_{K \times JI} = C \cdot Z_1(B, A)$, with $\Phi_1 = \| Y_{K \times JI} - C \cdot Z_1(B, A) \|_F^2$,
$Y_{J \times IK} = B \cdot Z_2(A, C)$, with $\Phi_2 = \| Y_{J \times IK} - B \cdot Z_2(A, C) \|_F^2$,
$Y_{I \times KJ} = A \cdot Z_3(C, B)$, with $\Phi_3 = \| Y_{I \times KJ} - A \cdot Z_3(C, B) \|_F^2$.
$Z_1$, $Z_2$ and $Z_3$ are built from 2 matrices only and have a block-wise Khatri-Rao product structure.

ALS algorithm:
Initialisation: $\hat{B}^{(0)}$, $\hat{A}^{(0)}$, $k = 1$
while $\Phi(k-1) - \Phi(k) > \varepsilon$ (e.g. $\varepsilon = 10^{-6}$):
  $\hat{C}^{(k)} = Y_{K \times JI} \cdot [Z_1(\hat{B}^{(k-1)}, \hat{A}^{(k-1)})]^{\dagger}$
  $\hat{B}^{(k)} = Y_{J \times IK} \cdot [Z_2(\hat{A}^{(k-1)}, \hat{C}^{(k)})]^{\dagger}$
  $\hat{A}^{(k)} = Y_{I \times KJ} \cdot [Z_3(\hat{C}^{(k)}, \hat{B}^{(k)})]^{\dagger}$
  $k \leftarrow k + 1$
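
A compact NumPy sketch of this ALS for the equal-$L$ case. The explicit pseudo-inverses, random initialization and stopping rule are simplifications: a production implementation would use least-squares solves and exploit the block-wise Khatri-Rao structure of $Z_1$, $Z_2$, $Z_3$. It assumes the `unfoldings` helper from the previous sketch:

```python
import numpy as np
# assumes unfoldings() from the previous sketch is in scope

def khatri_rao(U, V):
    # Column-wise Kronecker product: column p is kron(U[:, p], V[:, p]).
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def als_bcd_lll1(Y, R, L, max_iter=1000, eps=1e-6, seed=0):
    I, J, K = Y.shape
    Y1, Y2, Y3 = unfoldings(Y)
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, R * L))
    B = rng.standard_normal((J, R * L))
    phi_prev = np.inf
    for _ in range(max_iter):
        # Z_1(B, A): row r is vec(A_r B_r^T); update C by least squares.
        Z1 = np.stack([(A[:, r*L:(r+1)*L] @ B[:, r*L:(r+1)*L].T).ravel()
                       for r in range(R)])
        C = Y1 @ np.linalg.pinv(Z1)
        Ce = np.repeat(C, L, axis=1)          # expand c_r over its L columns
        B = Y2 @ np.linalg.pinv(khatri_rao(Ce, A).T)
        A = Y3 @ np.linalg.pinv(khatri_rao(Ce, B).T)
        phi = np.linalg.norm(Y3 - A @ khatri_rao(Ce, B).T) ** 2
        if phi_prev - phi < eps:              # stop when the fit stalls
            break
        phi_prev = phi
    return A, B, C
```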

10. ALS algorithm: problem of swamps

Observation: ALS is fast in many problems, but sometimes a long swamp is encountered before convergence (in the example shown, 27000 iterations).

Long swamps typically occur when:
- the loading matrices of the decomposition (i.e. the objective matrices) are ill-conditioned;
- the updated matrices become ill-conditioned (impact of the initialization);
- one of the R tensor components of the decomposition has a much higher norm than the R-1 others (e.g. the « near-far » effect in telecommunications).

11. Improvement 1 of ALS: Line Search

Purpose: reduce the length of swamps.
Principle: at each iteration, interpolate $A$, $B$ and $C$ from their estimates at the 2 previous iterations, and feed the interpolated matrices to the ALS update.

1. Line search along the search directions:
$A^{new} = A^{(k-2)} + \rho\, (A^{(k-1)} - A^{(k-2)})$
$B^{new} = B^{(k-2)} + \rho\, (B^{(k-1)} - B^{(k-2)})$
$C^{new} = C^{(k-2)} + \rho\, (C^{(k-1)} - C^{(k-2)})$
The choice of $\rho$ is crucial; $\rho = 1$ annihilates the LS step (i.e. we get standard ALS).

2. Then the ALS update:
$\hat{C}^{(k)} = Y_{K \times JI} \cdot [Z_1(B^{new}, A^{new})]^{\dagger}$
$\hat{B}^{(k)} = Y_{J \times IK} \cdot [Z_2(A^{new}, \hat{C}^{(k)})]^{\dagger}$
$\hat{A}^{(k)} = Y_{I \times KJ} \cdot [Z_3(\hat{C}^{(k)}, \hat{B}^{(k)})]^{\dagger}$
$k \leftarrow k + 1$
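
The interpolation itself is the same one-liner for each factor; a minimal sketch (the function name is mine; in practice it is applied to the stored estimates of $A$, $B$ and $C$ from the two previous iterations before running the ALS update above):

```python
def line_search(X_prev2, X_prev1, rho):
    # X_new = X^(k-2) + rho * (X^(k-1) - X^(k-2)); rho = 1 recovers plain ALS.
    return X_prev2 + rho * (X_prev1 - X_prev2)
```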

12. Improvement 1 of ALS: Line Search (choice of $\rho$)

- [Harshman, 1970] « LSH »: choose $\rho = 1.25$.
- [Bro, 1997] « LSB »: choose $\rho = k^{1/3}$ and validate the LS step only if it decreases the fit.
- [Rajih, Comon, 2005] « Enhanced Line Search (ELS) »: for REAL tensors, $\Phi(A^{new}, B^{new}, C^{new}) = \Phi(\rho)$ is a 6th-order polynomial in $\rho$; the optimal $\rho$ is the root that minimizes $\Phi(\rho)$.
- [Nion, De Lathauwer, 2006] « Enhanced Line Search with Complex Step (ELSCS) »: for complex tensors, look for an optimal $\rho = m\, e^{i\theta}$. We have $\Phi(A^{new}, B^{new}, C^{new}) = \Phi(m, \theta)$, and $m$ and $\theta$ are updated alternately:
  - update of $m$: solve $\partial \Phi(m, \theta) / \partial m = 0$ for fixed $\theta$, a 5th-order polynomial in $m$;
  - update of $\theta$: solve $\partial \Phi(m, \theta) / \partial \theta = 0$ for fixed $m$, a 6th-order polynomial in $t = \tan(\theta/2)$.
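
For real tensors, $\Phi(\rho)$ is a degree-6 polynomial, so 7 evaluations determine it exactly. Rather than deriving the coefficients analytically as in the ELS papers, the sketch below recovers them by polynomial interpolation and then minimizes over the real critical points; `phi` is a hypothetical closure that rebuilds the interpolated factors for a given $\rho$ and returns the squared fit error:

```python
import numpy as np

def els_rho(phi):
    # Phi(rho) is a 6th-order polynomial for real tensors: 7 samples pin it down.
    rho_s = np.linspace(-1.0, 4.0, 7)
    coeffs = np.polyfit(rho_s, [phi(r) for r in rho_s], 6)
    crit = np.roots(np.polyder(coeffs))          # candidate optimal steps
    crit = crit[np.abs(crit.imag) < 1e-10].real  # keep the real roots
    cand = np.append(crit, 1.0)                  # rho = 1 falls back to ALS
    return cand[np.argmin(np.polyval(coeffs, cand))]
```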

13. Improvement 1 of ALS: Line Search (results)

[Figure: convergence curves on an « easy » problem (about 2000 ALS iterations) and a « difficult » problem (about 27000 ALS iterations).]

ELS yields a large reduction of the number of iterations at a very low additional complexity w.r.t. standard ALS.

14. Improvement 2 of ALS: Dimensionality reduction

Step 1: HOSVD of $\mathcal{Y}$ (compression).
Step 2: BCD of the small core tensor, i.e. the model is fitted in the compressed space.
Step 3: come back to the original space + a few refinement iterations in the original space.

Compression gives a large reduction of the cost per iteration, since the model is fitted in the compressed space.
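
A sketch of the three steps, assuming the `hosvd` and `als_bcd_lll1` helpers from the earlier sketches; capping the truncated multilinear rank at $(RL, RL, R)$ is my reading of the model structure (the BCD-$(L,L,1)$ has mode-3 rank at most $R$), and the final refinement pass is only indicated by a comment:

```python
# assumes hosvd() and als_bcd_lll1() from the earlier sketches are in scope

def compressed_bcd(Y, R, L):
    I, J, K = Y.shape
    # Step 1: compression by truncated HOSVD.
    U, H = hosvd(Y, (min(I, R * L), min(J, R * L), min(K, R)))
    # Step 2: fit the BCD on the small core tensor.
    Ah, Bh, Ch = als_bcd_lll1(H, R, L)
    # Step 3: decompress; a few ALS refinement iterations on Y would follow.
    return U[0] @ Ah, U[1] @ Bh, U[2] @ Ch
```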

15. Improvement 3 of ALS: Good initialization

[Figure: comparison of ALS and ALS+ELS with three random initializations.]

Instead of using random initializations, could we use the observed tensor itself? Yes: for the BCD-(L,L,1), if $A$ and $B$ are full column rank (so $I$ and $J$ have to be large enough), there is an easy way to find a good initialization, in the same spirit as the Direct Trilinear Decomposition (DTLD) used to initialize PARAFAC (not detailed in this talk).
