SLIDE 1

Swamp Reducing Technique for Tensor Decomposition Carmeliza Navasca

Department of Mathematics Clarkson University

Potsdam, New York cnavasca@clarkson.edu http://people.clarkson.edu/∼cnavasca

joint work with Lieven De Lathauwer, KU Leuven, Belgium, and Stefan Kindermann, Johannes Kepler Universität, Linz, Austria

AIP 2009, Vienna 23 July 2009

SLIDE 2

Swamps

[Figure: Adirondacks swamps]

  • Swamps are artifacts of the ALS algorithm.
  • Swamps describe the slow convergence in ALS, "as if dragging its feet in mud."

[Figure: ALS swamps]

SLIDE 3

Tensors

Why do tensors?

  • Data are typically represented in tables, i.e. two-way arrays (matrices).
  • Data are now more complex and intricately linked.
  • Data analysis on multi-dimensional arrays is multi-way analysis (multilinear algebra).
  • Multi-dimensional arrays are higher-order tensors.
  • The order of a tensor refers to the dimension of the index set: a matrix is a second-order tensor, a vector is a first-order tensor.
  • The examples here are third-order tensors: very easy to visualize.

SLIDE 4

Tensor is a Vital Tool in Signal Processing

  • Blind multiuser separation-equalization-detection for DS-CDMA [Sidiropoulos, De Lathauwer, Nion, . . . ].
  • A DS-CDMA system uses an allocation technique that allows several users to be active over the total bandwidth at the same time.
  • With tensors, all signal information and output parameters are recovered simultaneously for the different users from the observed data.
  • The tensor is called the diversity data-cube, with entries

$$t_{knp} = \sum_{r=1}^{R} a(k, r)\, b_r(p)\, c_r(n),$$

where $T \in \mathbb{C}^{K \times N \times P}$, $a(k, r)$ is the fading/gain between user $r$ and antenna element $k$, $b_r(p)$ is the $p$th chip of the spreading code of user $r$, and $c_r(n)$ is the $n$th symbol transmitted by user $r$. $T$ contains the observations arranged in terms of the spatial, temporal, and spreading diversities.

SLIDE 5

Other Tensor Applications

  • Blind source separation, blind deconvolution
  • Blind multichannel system identification
  • Scientific computing: reducing computational complexity, separated representations [Beylkin, Hackbusch, Khoromskij, Mohlenkamp, . . . ]
  • Genomic signal processing [Alter, . . . ]
  • Data mining [Bader, Berry, Kolda, . . . ]
  • Computer vision [Vasilescu, Terzopoulos, . . . ]
  • See the survey paper of Bader and Kolda and the references therein.

SLIDE 6

Tensor Products

Definition. The Kronecker product of matrices $A \in \mathbb{R}^{N \times K}$ and $B \in \mathbb{R}^{M \times J}$ is defined as the matrix in $\mathbb{R}^{NM \times KJ}$

$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots \\ a_{21}B & a_{22}B & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}.$$

Definition. The column-wise Khatri-Rao product of $A \in \mathbb{R}^{I \times R}$ and $B \in \mathbb{R}^{J \times R}$ is defined as the matrix in $\mathbb{R}^{IJ \times R}$

$$A \odot_c B = [\, a_1 \otimes b_1 \;\; a_2 \otimes b_2 \;\; \cdots \;\; a_R \otimes b_R \,]$$

when $A = [\, a_1 \; a_2 \; \cdots \; a_R \,]$ and $B = [\, b_1 \; b_2 \; \cdots \; b_R \,]$.

SLIDE 7

More Tensor Products

Kronecker product of matrices $A$ and $B$:

$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots \\ a_{21}B & a_{22}B & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$$

Khatri-Rao product of $A$ and $B$ (partition-wise):

$$A \odot B = [\, A_1 \otimes B_1 \;\; A_2 \otimes B_2 \;\; \cdots \,] \quad \text{when } A = [\, A_1 \; A_2 \; \cdots \; A_R \,] \text{ and } B = [\, B_1 \; \cdots \; B_R \,]$$

Column-wise Khatri-Rao product:

$$A \odot_c B = [\, a_1 \otimes b_1 \;\; a_2 \otimes b_2 \;\; \cdots \,] \quad \text{when } A = [\, a_1 \; a_2 \; \cdots \; a_R \,] \text{ and } B = [\, b_1 \; b_2 \; \cdots \; b_R \,]$$
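These products are straightforward to check numerically. Below is a minimal NumPy sketch (the helper `khatri_rao_col` and the test matrices are illustrative, not from the talk):

```python
import numpy as np

def khatri_rao_col(A, B):
    """Column-wise Khatri-Rao product: column r is kron(a_r, b_r)."""
    I, R = A.shape
    J, R2 = B.shape
    assert R == R2, "A and B must have the same number of columns"
    # Outer product of matching columns, then flatten each (i, j) pair.
    return np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)

A = np.arange(6.0).reshape(3, 2)
B = np.arange(8.0).reshape(4, 2)

K = np.kron(A, B)            # Kronecker product: (3*4) x (2*2)
KR = khatri_rao_col(A, B)    # column-wise Khatri-Rao: (3*4) x 2

# Each Khatri-Rao column is the Kronecker product of the matching columns.
assert np.allclose(KR[:, 1], np.kron(A[:, 1], B[:, 1]))
```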

SLIDE 8

Tensor Multiplication

  • Tensor × Matrix: $T \bullet_n M = \widetilde{T}$, a tensor of the same order with mode $n$ transformed by $M$.
  • Tensor × Vector: $T \bullet_n v = \widetilde{T}$, a tensor of one order lower (mode $n$ is contracted away).

SLIDE 9

Tucker mode-n product

Given a tensor $T \in \mathbb{C}^{I \times J \times K}$ and the matrices $U \in \mathbb{C}^{\hat{I} \times I}$, $V \in \mathbb{C}^{\hat{J} \times J}$ and $W \in \mathbb{C}^{\hat{K} \times K}$, the Tucker mode-$n$ products are the following:

$$(T \bullet_1 U)_{\hat{i},j,k} = \sum_{i=1}^{I} t_{ijk}\, u_{\hat{i}i} \qquad \text{(mode-1 product)}$$

$$(T \bullet_2 V)_{i,\hat{j},k} = \sum_{j=1}^{J} t_{ijk}\, v_{\hat{j}j} \qquad \text{(mode-2 product)}$$

$$(T \bullet_3 W)_{i,j,\hat{k}} = \sum_{k=1}^{K} t_{ijk}\, w_{\hat{k}k} \qquad \text{(mode-3 product)}$$
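In code, each mode-$n$ product is a single tensor contraction. A minimal NumPy sketch with illustrative shapes (names are mine):

```python
import numpy as np

T = np.random.randn(3, 4, 5)   # T with I, J, K = 3, 4, 5
U = np.random.randn(6, 3)      # U: Ihat x I
V = np.random.randn(7, 4)      # V: Jhat x J
W = np.random.randn(2, 5)      # W: Khat x K

# Each contraction mirrors the sums above: the matrix absorbs one index.
T1 = np.einsum('ai,ijk->ajk', U, T)   # mode-1: sum_i t_ijk * u_{a i}
T2 = np.einsum('bj,ijk->ibk', V, T)   # mode-2: sum_j t_ijk * v_{b j}
T3 = np.einsum('ck,ijk->ijc', W, T)   # mode-3: sum_k t_ijk * w_{c k}

assert T1.shape == (6, 4, 5) and T2.shape == (3, 7, 5) and T3.shape == (3, 4, 2)
```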

SLIDE 10

Tensor Rank

Definition (Mode-n vector). Given a tensor $T \in \mathbb{C}^{I \times J \times K}$, there are three types of mode vectors, namely mode-1, mode-2, and mode-3. There are $J \cdot K$ mode-1 vectors of length $I$, obtained by fixing the indices $(j, k)$ while varying $i$. Similarly, the mode-2 vectors (mode-3 vectors) are of length $J$ ($K$), obtained from the tensor by varying $j$ ($k$) with $(k, i)$ ($(i, j)$) fixed.

Definition (Mode-n rank). The mode-$n$ rank of a tensor $T$ is the dimension of the subspace spanned by the mode-$n$ vectors.

Definition (rank-(L, M, N)). A third-order tensor is rank-$(L, M, N)$ if its mode-1 rank is $L$, its mode-2 rank is $M$ and its mode-3 rank is $N$.
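The mode-$n$ rank can be computed as the matrix rank of the unfolding whose columns are the mode-$n$ vectors. A small NumPy sketch (the helper `mode_n_rank` is mine):

```python
import numpy as np

def mode_n_rank(T, n):
    """Mode-n rank: rank of the matrix whose columns are the mode-n vectors."""
    # Put mode n first, then flatten the remaining modes into columns.
    M = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
    return np.linalg.matrix_rank(M)

# A rank-1 tensor a o b o c has mode-n rank 1 in every mode.
a, b, c = np.random.randn(4), np.random.randn(5), np.random.randn(6)
T = np.einsum('i,j,k->ijk', a, b, c)
print([mode_n_rank(T, n) for n in range(3)])   # [1, 1, 1], i.e. rank-(1, 1, 1)
```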

SLIDE 11

Fibers and Tensor Rank

[Figure: mode-1, mode-2, and mode-3 vectors (fibers) of a third-order tensor]

The mode-$n$ vector definition is as on the previous slide.

SLIDE 12

Tensor Decomposition I: PARAFAC/CANDECOMP

Sum of rank-1 tensors [Harshman 1970; Carroll & Chang 1970]:

$$T = \sum_{r=1}^{R} \lambda_r\, a_r \circ b_r \circ c_r$$

This generalizes the expansion of a matrix into rank-1 terms:

$$M = \sum_{r=1}^{R} \lambda_r\, a_r \circ b_r$$
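A rank-$R$ PARAFAC tensor is easy to synthesize from its factor matrices, which is useful for the numerical experiments later. A minimal NumPy sketch (shapes illustrative):

```python
import numpy as np

# T = sum_r lambda_r a_r o b_r o c_r, where a_r, b_r, c_r are the columns
# of A, B, C; einsum accumulates all R rank-1 outer products at once.
I, J, K, R = 4, 5, 6, 3
A, B, C = np.random.randn(I, R), np.random.randn(J, R), np.random.randn(K, R)
lam = np.ones(R)

T = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)   # shape (I, J, K)
```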

SLIDE 13

Tensor Decomposition II: Tucker or HO-SVD

Tucker decomposition [Tucker 1966; De Lathauwer 1997]:

$$A = S \bullet_1 U \bullet_2 V \bullet_3 W$$

Generalization of the SVD:

$$A = S \bullet_1 U \bullet_2 V = U S V^T$$
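A Tucker tensor is assembled the same way, by contracting a core against one matrix per mode. A minimal NumPy sketch (shapes illustrative):

```python
import numpy as np

# A = S .1 U .2 V .3 W: the core S is contracted with one matrix per mode.
L, M, N = 2, 3, 2
S = np.random.randn(L, M, N)      # core tensor
U = np.random.randn(5, L)
V = np.random.randn(6, M)
W = np.random.randn(4, N)

A = np.einsum('lmn,il,jm,kn->ijk', S, U, V, W)   # shape (5, 6, 4)
```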

SLIDE 14

Tensor Decomposition III: BTD of rank-(Lr, Lr, 1)

Sums of Tucker tensors [De Lathauwer, 2008]:

$$T = \sum_{r=1}^{R} E_r \circ c_r = \sum_{r=1}^{R} (A_r \cdot B_r^T) \circ c_r$$

$$T = \sum_{r=1}^{R} D_r \bullet_1 A_r \bullet_2 B_r \bullet_3 c_r$$
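A BTD tensor of rank-$(L_r, L_r, 1)$ can be synthesized term by term. A minimal NumPy sketch with all $L_r = L$ (an assumption for brevity; names are mine):

```python
import numpy as np

# T = sum_r (A_r B_r^T) o c_r: each term pairs a rank-L matrix E_r = A_r B_r^T
# with one column c_r of C.
I, J, K, R, L = 4, 4, 6, 3, 2
A_blocks = [np.random.randn(I, L) for _ in range(R)]
B_blocks = [np.random.randn(J, L) for _ in range(R)]
C = np.random.randn(K, R)                      # columns c_r

T = sum(np.einsum('ij,k->ijk', A_blocks[r] @ B_blocks[r].T, C[:, r])
        for r in range(R))
```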

SLIDE 15

Matricization: Tensors to Matrices

PARAFAC framework: three standard matricizations, built from the left-right, front-back and top-bottom slices:

$$T_{JK \times I} = \begin{pmatrix} T_{1(K \times I)} \\ T_{2(K \times I)} \\ \vdots \\ T_{J(K \times I)} \end{pmatrix}, \qquad T_{KI \times J} = \begin{pmatrix} T_{1(I \times J)} \\ T_{2(I \times J)} \\ \vdots \\ T_{K(I \times J)} \end{pmatrix}, \qquad T_{IJ \times K} = \begin{pmatrix} T_{1(J \times K)} \\ T_{2(J \times K)} \\ \vdots \\ T_{I(J \times K)} \end{pmatrix}$$

Re-expressed through the Khatri-Rao product:

$$T_{JK \times I} = (B \odot_c C) A', \qquad T_{KI \times J} = (C \odot_c A) B', \qquad T_{IJ \times K} = (A \odot_c B) C'$$
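The Khatri-Rao identity pins down the unfolding convention. The following NumPy sketch verifies $T_{JK \times I} = (B \odot_c C) A'$ under the ordering I am assuming here, where the row index runs over $(j, k)$:

```python
import numpy as np

I, J, K, R = 3, 4, 5, 2
A, B, C = np.random.randn(I, R), np.random.randn(J, R), np.random.randn(K, R)
T = np.einsum('ir,jr,kr->ijk', A, B, C)        # PARAFAC tensor

# Unfolding T_{JK x I}: row index runs over (j, k), column index over i.
T_JKxI = T.transpose(1, 2, 0).reshape(J * K, I)

# Column-wise Khatri-Rao product of B and C, same row ordering.
BC = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)

assert np.allclose(T_JKxI, BC @ A.T)           # T_{JK x I} = (B (.)_c C) A'
```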

SLIDE 16

Matricization: Tensors to Matrices

[Figure: illustration of the three matricizations]
SLIDE 17

Matricization: Tensors to Matrices

BTD in rank-$(L_r, L_r, 1)$: three standard slices

$$T_{j(K \times I)} = C \cdot \mathrm{diag}\{(B_1)_j, (B_2)_j, \ldots, (B_R)_j\} \cdot A', \quad j = 1, \ldots, J$$

$$T_{k(I \times J)} = A \cdot \mathrm{diag}\{c_{k,1} \cdot \mathrm{diag}(\mathbf{1}_{L_1}),\; c_{k,2} \cdot \mathrm{diag}(\mathbf{1}_{L_2}), \ldots, c_{k,R} \cdot \mathrm{diag}(\mathbf{1}_{L_R})\} \cdot B', \quad k = 1, \ldots, K$$

$$T_{i(J \times K)} = B \cdot \mathrm{diag}\{(A_1)'_i, (A_2)'_i, \ldots, (A_R)'_i\} \cdot C', \quad i = 1, \ldots, I,$$

where $\mathrm{diag}\{V_1, V_2, \ldots, V_n\}$ is the block-diagonal matrix with blocks $V_i$.

Re-expressed through the Kronecker and Khatri-Rao products:

$$T_{JK \times I} = [B \odot C] A', \qquad T_{KI \times J} = [C \odot A] B'$$

$$T_{IJ \times K} = [\, (A_1 \odot_c B_1)\mathbf{1}_{L_1} \; \cdots \; (A_R \odot_c B_R)\mathbf{1}_{L_R} \,] C',$$

where $\mathbf{1}_{L_r}$ is the vector of 1's of length $L_r$.

SLIDE 18

L-S and Regularization Methods for Tensors

Problem formulation: recover the best tensor $T$ from the noisy tensor $\widehat{T}$.

Standard minimization: let the residual tensor be $R = \widehat{T} - T$. Then

$$\min \|R\|_F^2 = \min_{T} \big\| \widehat{T} - T \big\|_F^2 \iff \min_{A,B,C} \Big\| \widehat{T} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Big\|_F^2 \quad \text{(PARAFAC)}$$

$$\phantom{\min \|R\|_F^2 = \min_{T} \big\| \widehat{T} - T \big\|_F^2} \iff \min_{A,B,C} \Big\| \widehat{T} - \sum_{r=1}^{R} (A_r B_r^T) \circ c_r \Big\|_F^2 \quad \text{(BTD)}$$

Frobenius norm:

$$\|A\|_F^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} |a_{ijk}|^2$$

SLIDE 19

Numerical Method: Alternating Least-Squares

A more tractable approach: alternate over three linear least-squares problems,

$$\min_{A} \big\| T_{JK \times I} - Q\, A^T \big\|_F^2, \qquad \min_{B} \big\| T_{KI \times J} - R\, B^T \big\|_F^2, \qquad \min_{C} \big\| T_{IJ \times K} - S\, C^T \big\|_F^2$$

  • PARAFAC: $Q = B \odot_c C$, $R = C \odot_c A$, and $S = A \odot_c B$
  • BTD: $Q = B \odot C$, $R = C \odot A$, and $S = [\, (A_1 \odot_c B_1)\mathbf{1}_{L_1} \; \cdots \; (A_R \odot_c B_R)\mathbf{1}_{L_R} \,]$
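For the PARAFAC case, one sweep of plain ALS is just three calls to a linear least-squares solver. A minimal NumPy sketch under the unfolding convention used above (the helpers `unfold` and `kr` are mine):

```python
import numpy as np

def unfold(T, order):
    """Matricize T; e.g. order=(1, 2, 0) gives T_{JK x I}."""
    s = [T.shape[m] for m in order]
    return T.transpose(order).reshape(s[0] * s[1], s[2])

def kr(X, Y):
    """Column-wise Khatri-Rao product."""
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])

def als_sweep(T, A, B, C):
    """One pass of plain ALS for PARAFAC: three least-squares solves."""
    A = np.linalg.lstsq(kr(B, C), unfold(T, (1, 2, 0)), rcond=None)[0].T
    B = np.linalg.lstsq(kr(C, A), unfold(T, (2, 0, 1)), rcond=None)[0].T
    C = np.linalg.lstsq(kr(A, B), unfold(T, (0, 1, 2)), rcond=None)[0].T
    return A, B, C
```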

SLIDE 20

Numerical Method: ALS with Iterated Tikhonov Reg

$$\min_{A} \big\| T_{JK \times I} - Q^i A^T \big\|_F^2 + \alpha_i \big\| A^T - (A^i)^T \big\|_F^2$$

$$\min_{B} \big\| T_{KI \times J} - R^{i+1} B^T \big\|_F^2 + \alpha_i \big\| B^T - (B^i)^T \big\|_F^2$$

$$\min_{C} \big\| T_{IJ \times K} - S^{i+1} C^T \big\|_F^2 + \alpha_i \big\| C^T - (C^i)^T \big\|_F^2$$

  • PARAFAC: $Q^i = B^{i-1} \odot_c C^{i-1}$, $R^i = C^{i-1} \odot_c A^i$, and $S^i = A^i \odot_c B^i$
  • BTD: $Q^i = B^{i-1} \odot C^{i-1}$, $R^i = C^{i-1} \odot A^i$, and $S^i = [\, (A^i_1 \odot_c B^i_1)\mathbf{1}_{L_1} \; \cdots \; (A^i_R \odot_c B^i_R)\mathbf{1}_{L_R} \,]$
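Each regularized subproblem is still an ordinary least-squares problem after stacking. A NumPy sketch of one such update (the function name is mine):

```python
import numpy as np

def tikhonov_update(T_unf, Q, X_prev, alpha):
    """Solve min_X ||T_unf - Q X^T||_F^2 + alpha ||X^T - X_prev^T||_F^2.

    Stacking rewrites the penalized problem as one least-squares system:
        [Q; sqrt(alpha) I] X^T = [T_unf; sqrt(alpha) X_prev^T].
    """
    R = Q.shape[1]
    Q_aug = np.vstack([Q, np.sqrt(alpha) * np.eye(R)])
    rhs = np.vstack([T_unf, np.sqrt(alpha) * X_prev.T])
    return np.linalg.lstsq(Q_aug, rhs, rcond=None)[0].T
```

The stacked block $\sqrt{\alpha_i}\, I$ keeps each subproblem well-conditioned even when the Khatri-Rao matrix is nearly rank-deficient, which is exactly the regime where swamps occur.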

SLIDE 21

Numerical Example I: PARAFAC

Simple PARAFAC model:

$$A = \begin{pmatrix} 1 & \cos\theta \\ \sin\theta & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 3 & \sqrt{2} \\ \cos\theta & \sin\theta \\ 1 & \sin\theta \end{pmatrix}, \qquad C = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

Stopping criterion: $\|T - T_{\text{est}}\|_F^2 < 1 \times 10^{-5}$

θ        ALS-ITR    ALS
π/60     41         683
π/90     69         12007
π/120    311        12026

SLIDE 22

Numerical Example I: PARAFAC

[Figure: Residual vs. Iterations (log scale), Itd Tikhonov Reg vs. ALS]

683 iterations (ALS), 41 iterations (ALS-ITR)

SLIDE 23

Numerical Example II: BTD

BTD model: $T \in \mathbb{R}^{4 \times 4 \times 6}$ of rank-$(2, 2, 1)$, where $A_i, B_i \in \mathbb{R}^{4 \times 2}$, $i = 1, 2, 3$, and $C \in \mathbb{R}^{6 \times 3}$. The blocks $A_1$, $A_2$, $A_3$ have entries built from $1$, $\cos\theta$, and $\sin\theta$; $B = A$ and $C$ is randomly generated.

Stopping criterion: $\|T - T_{\text{est}}\|_F^2 < 1 \times 10^{-7}$

θ        ALS-ITR    ALS
π/60     2500       68630

SLIDE 24

Numerical Example II: BTD

[Figure: Residual vs. Iterations (log scale), It Tikhonov vs. ALS]

68630 iterations (ALS), 2500 iterations (ALS-ITR)

SLIDE 25

Conclusion

Algorithm for Regularized ALS

Given $I_{\max}$, $\widehat{T}$, $c$, $\epsilon$:
  $A^0 = \mathrm{randn}(I, R)$, $B^0 = \mathrm{randn}(J, R)$ and $C^0 = \mathrm{randn}(K, R)$; $\alpha_0 = 1$
  for $i = 1, 2, 3, \ldots, I_{\max}$
    $A^i \leftarrow \arg\min_A \|\widehat{T}_{JK \times I} - Q^{i-1} A^T\|_F^2 + \alpha_i \|A^T - (A^{i-1})^T\|_F^2$
    $R^i = [\, (c^{i-1})_1 \otimes (A^i)_1 \; \cdots \; (c^{i-1})_R \otimes (A^i)_R \,]$
    $B^i \leftarrow \arg\min_B \|\widehat{T}_{KI \times J} - R^i B^T\|_F^2 + \alpha_i \|B^T - (B^{i-1})^T\|_F^2$
    $S^i = [\, ((A^i)_1 \odot_c (B^i)_1)\mathbf{1}_{L_1} \; \cdots \; ((A^i)_R \odot_c (B^i)_R)\mathbf{1}_{L_R} \,]$
    $C^i \leftarrow \arg\min_C \|\widehat{T}_{IJ \times K} - S^i C^T\|_F^2 + \alpha_i \|C^T - (C^{i-1})^T\|_F^2$
    $Q^i = [\, (B^i)_1 \otimes (c^i)_1 \; \cdots \; (B^i)_R \otimes (c^i)_R \,]$
    $T^i = \mathrm{create\_tensor}(A^i, B^i, C^i)$
    if $\|\widehat{T} - T^i\| < \epsilon$ then $i = I_{\max}$
    $\alpha_i = c\, \alpha_{i-1}$
  end
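For concreteness, a compact NumPy sketch of the PARAFAC variant of this loop (helper names are mine; the BTD variant would swap in the block-structured $R^i$ and $S^i$ above):

```python
import numpy as np

def unfold(T, order):
    s = [T.shape[m] for m in order]
    return T.transpose(order).reshape(s[0] * s[1], s[2])

def kr(X, Y):
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])

def reg_solve(T_unf, Q, X_prev, alpha):
    # min_X ||T_unf - Q X^T||_F^2 + alpha ||X^T - X_prev^T||_F^2, stacked.
    Q_aug = np.vstack([Q, np.sqrt(alpha) * np.eye(Q.shape[1])])
    rhs = np.vstack([T_unf, np.sqrt(alpha) * X_prev.T])
    return np.linalg.lstsq(Q_aug, rhs, rcond=None)[0].T

def regularized_als(T_hat, R, i_max=10000, c=0.9, eps=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    I, J, K = T_hat.shape
    A, B, C = (rng.standard_normal((d, R)) for d in (I, J, K))
    alpha = 1.0                                     # alpha_0 = 1
    for _ in range(i_max):
        A = reg_solve(unfold(T_hat, (1, 2, 0)), kr(B, C), A, alpha)
        B = reg_solve(unfold(T_hat, (2, 0, 1)), kr(C, A), B, alpha)
        C = reg_solve(unfold(T_hat, (0, 1, 2)), kr(A, B), C, alpha)
        T_i = np.einsum('ir,jr,kr->ijk', A, B, C)   # create_tensor step
        if np.linalg.norm(T_hat - T_i) < eps:       # stopping test
            break
        alpha *= c                                  # alpha_i = c * alpha_{i-1}
    return A, B, C
```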

  • Dramatically accelerates the ALS method.
  • Easy to implement.
  • Works for component factors with collinear columns.
  • Can approximate decompositions with highly ill-conditioned factors.
