SLIDE 1

Krylov methods for tensors I

Lars Eldén and Berkant Savas

Department of Mathematics Linköping University, Sweden

NSF Workshop, February 2009

SLIDE 2

Outline

1. Introduction
2. Tensor concepts: matrix-tensor multiplication, inner product and norm, contractions
3. Best approximation: Grassmann optimization, numerical examples
4. Sparse tensors: Krylov methods
5. Conclusions

SLIDE 3

Tech Report

Download the tech report from http://www.mai.liu.se/~besav/files/tensorKrylov.pdf

SLIDE 4

Tensor Decomposition: Tucker Model

A = (U^(1), U^(2), U^(3)) · S, i.e., the core tensor S multiplied by U^(1), U^(2), U^(3) in modes 1, 2, and 3.

Tucker 1964; numerous papers in psychometrics and chemometrics. De Lathauwer et al., SIMAX 2000: notation, theory. The matrices U^(i) are usually orthogonal.

This talk: Tucker model for 3-tensors only! Generalization is straightforward.
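As a concrete illustration (not part of the original slides), here is a small numpy sketch of the Tucker model: a core S multiplied by three factor matrices with orthonormal columns. All sizes and variable names are invented for the example.

```python
import numpy as np

# Tucker model for a 3-tensor: A = (U1, U2, U3) . S, i.e. the core S is
# multiplied by a factor matrix in each mode.
J, K, L = 6, 5, 4            # tensor dimensions (toy values)
r1, r2, r3 = 3, 2, 2         # core dimensions

S = np.random.randn(r1, r2, r3)
U1 = np.linalg.qr(np.random.randn(J, r1))[0]   # orthonormal columns
U2 = np.linalg.qr(np.random.randn(K, r2))[0]
U3 = np.linalg.qr(np.random.randn(L, r3))[0]

# A(i,j,k) = sum_{a,b,c} U1(i,a) U2(j,b) U3(k,c) S(a,b,c)
A = np.einsum('ia,jb,kc,abc->ijk', U1, U2, U3, S)
print(A.shape)               # (6, 5, 4)
```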

SLIDE 5

Mode-i Multiplication of a Tensor by a Matrix

Assume that dimensions are such that all operations are well-defined. Mostly 3-tensors. Lim's notation (no standard notation yet).

Mode-1 multiplication:

B = (X)_1 · A,   B(i,j,k) = Σ_{ν=1}^{n} x_{iν} a_{νjk}

All column vectors are multiplied by the matrix X.

Multiplication in all modes at the same time:

B = (X, Y, Z) · A,   B(i,j,k) = Σ_{ν,µ,λ} x_{iν} y_{jµ} z_{kλ} a_{νµλ}

For convenience we write B = (X^T, Y^T, Z^T) · A = A · (X, Y, Z).
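These multiplications translate directly into einsum calls; the sketch below (ours, with invented dimensions) shows the mode-1 product, the all-modes product, and the transposed convention A · (X, Y, Z).

```python
import numpy as np

A = np.random.randn(4, 5, 6)
X, Y, Z = np.random.randn(7, 4), np.random.randn(8, 5), np.random.randn(9, 6)

# Mode-1 product: B(i,j,k) = sum_nu x(i,nu) a(nu,j,k)
B1 = np.einsum('in,njk->ijk', X, A)

# All modes at once: B(i,j,k) = sum_{nu,mu,lam} x(i,nu) y(j,mu) z(k,lam) a(nu,mu,lam)
B = np.einsum('in,jm,kl,nml->ijk', X, Y, Z, A)

# The convention A . (X, Y, Z) = (X^T, Y^T, Z^T) . A, written with the transposes:
C = np.einsum('ni,mj,lk,nml->ijk', X.T, Y.T, Z.T, A)
np.testing.assert_allclose(C, B)   # the two ways of writing it agree
```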

SLIDE 6

Inner Product and Norm

Inner product (contraction: R^{n×n×n} → R):

⟨A, B⟩ = Σ_{i,j,k} a_{ijk} b_{ijk}

The Frobenius norm:

‖A‖ = ⟨A, A⟩^{1/2}

Matrix case: ⟨A, B⟩ = tr(A^T B)

SLIDE 7

Partial Contractions

C = ⟨A, B⟩_1,       c_{jklm} = Σ_λ a_{λjk} b_{λlm}        (4-tensor)
D = ⟨A, B⟩_{1:2},   d_{jk} = Σ_{λ,µ} a_{λµj} b_{λµk}       (2-tensor)
e = ⟨A, B⟩ = ⟨A, B⟩_{1:3},   e = Σ_{λ,µ,ν} a_{λµν} b_{λµν}  (scalar)

Notation (3-tensor): ⟨A, B⟩_{1:2} = ⟨A, B⟩_{−3}
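In numpy these contractions are one-liners; the sketch below (ours) mirrors the ⟨A, B⟩_1, ⟨A, B⟩_{1:2}, and ⟨A, B⟩ definitions, together with the inner product and Frobenius norm from the previous slide.

```python
import numpy as np

A = np.random.randn(3, 4, 5)
B = np.random.randn(3, 4, 5)

inner = np.einsum('ijk,ijk->', A, B)            # <A, B> = <A, B>_{1:3}, a scalar
fro   = np.sqrt(np.einsum('ijk,ijk->', A, A))   # ||A|| = <A, A>^{1/2}

C = np.einsum('ajk,alm->jklm', A, B)   # <A, B>_1:    contract mode 1, a 4-tensor
D = np.einsum('abj,abk->jk', A, B)     # <A, B>_{1:2} = <A, B>_{-3}, a matrix
```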

SLIDE 8

Best Rank-(r1, r2, r3) Approximation

[Diagram: A approximated by the core S multiplied by X, Y, and Z in the three modes.]

Best rank-(r1, r2, r3) approximation:

min_{X,Y,Z,S} ‖A − (X, Y, Z) · S‖,   X^T X = I, Y^T Y = I, Z^T Z = I

The problem is over-parameterized!

SLIDE 9

Best Approximation

min_{rank(B)=(r1,r2,r3)} ‖A − B‖

is equivalent to

max_{X,Y,Z} Φ(X, Y, Z) = (1/2) ‖A · (X, Y, Z)‖^2 = (1/2) Σ_{j,k,l} ( Σ_{λ,µ,ν} a_{λµν} x_{λj} y_{µk} z_{νl} )^2,

subject to X^T X = I_{r1}, Y^T Y = I_{r2}, Z^T Z = I_{r3}.
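A quick numerical check of this equivalence (ours, not on the slide): for fixed orthonormal X, Y, Z the best core is S = A · (X, Y, Z), and then ‖A − (X, Y, Z) · S‖^2 = ‖A‖^2 − 2 Φ(X, Y, Z).

```python
import numpy as np

A = np.random.randn(10, 12, 14)
r1, r2, r3 = 3, 4, 5
X = np.linalg.qr(np.random.randn(10, r1))[0]
Y = np.linalg.qr(np.random.randn(12, r2))[0]
Z = np.linalg.qr(np.random.randn(14, r3))[0]

S = np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z)     # S = A . (X, Y, Z)
phi = 0.5 * np.sum(S**2)                           # Phi(X, Y, Z)

B = np.einsum('pj,qk,rl,jkl->pqr', X, Y, Z, S)     # B = (X, Y, Z) . S
print(np.isclose(np.sum((A - B)**2), np.sum(A**2) - 2 * phi))   # True
```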

SLIDE 10

Grassmann Optimization

The Frobenius norm is invariant under orthogonal transformations:

Φ(X, Y, Z) = Φ(XU, YV, ZW) = (1/2) ‖A · (XU, YV, ZW)‖^2

for orthogonal U ∈ R^{r1×r1}, V ∈ R^{r2×r2}, and W ∈ R^{r3×r3}.

Maximize Φ over equivalence classes [X] = {XU | U orthogonal}. Product of manifolds: Gr^3 = Gr(J, r1) × Gr(K, r2) × Gr(L, r3).

max_{(X,Y,Z)∈Gr^3} Φ(X, Y, Z) = max_{(X,Y,Z)∈Gr^3} (1/2) ⟨A · (X, Y, Z), A · (X, Y, Z)⟩
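The invariance is easy to verify numerically; in this small check (ours), Φ is unchanged when X, Y, Z are rotated by random orthogonal matrices.

```python
import numpy as np

def phi(A, X, Y, Z):
    # Phi(X, Y, Z) = 0.5 * || A . (X, Y, Z) ||^2
    return 0.5 * np.sum(np.einsum('abc,aj,bk,cl->jkl', A, X, Y, Z) ** 2)

A = np.random.randn(8, 9, 10)
X = np.linalg.qr(np.random.randn(8, 3))[0]
Y = np.linalg.qr(np.random.randn(9, 3))[0]
Z = np.linalg.qr(np.random.randn(10, 3))[0]

U = np.linalg.qr(np.random.randn(3, 3))[0]   # random orthogonal rotations
V = np.linalg.qr(np.random.randn(3, 3))[0]
W = np.linalg.qr(np.random.randn(3, 3))[0]

print(np.isclose(phi(A, X, Y, Z), phi(A, X @ U, Y @ V, Z @ W)))   # True
```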

SLIDE 11

Methods for Best Approximation

Grassmann-based
1. Newton (LE, B. Savas)
2. Trust region/Newton (Ishteva, De Lathauwer et al.)
3. BFGS quasi-Newton (Savas, Lim)
4. Limited memory BFGS (Savas, Lim)

Alternating
1. HOOI (Kroonenberg, De Lathauwer)

SLIDE 12

Numerical Example I

[Plot: relative norm of the gradient (log scale, roughly 10^-15 to 10^-5) vs. iteration number, for BFGS, L-BFGS, HOOI, and NG.]

A random tensor A ∈ R^{20×20×20} with entries from N(0, 1), approximated with a rank-(5, 5, 5) tensor.

SLIDE 13

Numerical Example II

[Plot: relative norm of the gradient (log scale, roughly 10^-15 to 10^-5) vs. iteration number, for BFGS, L-BFGS, and HOOI.]

A random tensor A ∈ R^{100×100×100} with entries from N(0, 1), approximated with a rank-(5, 10, 20) tensor.

SLIDE 14

Sparse Tensors in Information Sciences

In the information sciences the tensors are often sparse:

Term-document-author analysis (Dunlavy et al.)

Graphs, web link analysis (Kolda et al., PARAFAC model):
a_{ijk} = 1 if page i points to page j using term k, and 0 otherwise.
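Such a link tensor is naturally stored in coordinate form and used only as an operator. The toy sketch below (ours; indices invented) applies the contraction u = A · (v, w)_{2,3} without ever forming a dense array, which is all the Krylov recursions later in the talk need.

```python
import numpy as np

# a(i, j, k) = 1 when page i points to page j using term k (toy data)
rows  = np.array([0, 0, 1, 2])    # source page i
cols  = np.array([1, 2, 2, 0])    # target page j
terms = np.array([0, 1, 0, 2])    # term k
vals  = np.ones(4)
npages, nterms = 3, 3

def contract_23(v, w):
    # u(i) = sum_{j,k} a(i,j,k) v(j) w(k), using only the stored nonzeros
    u = np.zeros(npages)
    np.add.at(u, rows, vals * v[cols] * w[terms])
    return u

u = contract_23(np.random.randn(npages), np.random.randn(nterms))
```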

SLIDE 15

Web page with links

SLIDE 16

Sparse Matrices

Krylov methods give low-rank approximations:

A V_k = U_k H_k,   A ≈ U_k H_k V_k^T.

The matrix is only used as an operator: u = A v.

SLIDE 17

Sparse Tensors

Can we generalize Krylov methods to tensors and obtain low-rank approximations?

[Diagram: A approximated by the core S multiplied by X, Y, and Z in the three modes.]

SLIDE 18

Golub-Kahan Bidiagonalization for Rectangular Matrix

β_1 u_1 = b,  v_0 = 0
for i = 1 : k
    α_i v_i = A^T u_i − β_i v_{i−1}
    β_{i+1} u_{i+1} = A v_i − α_i u_i
end

The coefficients α_i and β_i are chosen to normalize the vectors.
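A plain numpy version of the recursion (our sketch; no reorthogonalization or breakdown handling) looks as follows. A is touched only through A @ v and A.T @ u, so a sparse or implicit operator works just as well.

```python
import numpy as np

def golub_kahan(A, b, k):
    """k steps of Golub-Kahan bidiagonalization started from b."""
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k + 1)
    beta[0] = np.linalg.norm(b)
    U[:, 0] = b / beta[0]
    v_prev = np.zeros(n)
    for i in range(k):
        r = A.T @ U[:, i] - beta[i] * v_prev     # alpha_i v_i = A^T u_i - beta_i v_{i-1}
        alpha[i] = np.linalg.norm(r)
        V[:, i] = r / alpha[i]
        p = A @ V[:, i] - alpha[i] * U[:, i]     # beta_{i+1} u_{i+1} = A v_i - alpha_i u_i
        beta[i + 1] = np.linalg.norm(p)
        U[:, i + 1] = p / beta[i + 1]
        v_prev = V[:, i]
    return U, V, alpha, beta
```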

SLIDE 19

Golub-Kahan Bidiagonalization for Rectangular Matrix

β_1 u_1 = b,  v_0 = 0
for i = 1 : k
    α_i v_i = A^T u_i − β_i v_{i−1}        [ α_i v_i = A · (u_i)_1 − β_i v_{i−1} ]
    β_{i+1} u_{i+1} = A v_i − α_i u_i      [ β_{i+1} u_{i+1} = A · (v_i)_2 − α_i u_i ]
end

The coefficients α_i and β_i are chosen to normalize the vectors. The bracketed forms restate the same recursion in the tensor (multilinear) notation.

SLIDE 20

Krylov Method for Tensor Approximation

Arnoldi style (i.e., including Gram-Schmidt orthogonalization). Let u_1 and v_1 be given.

h_{111} w_1 = A · (u_1, v_1)_{1,2}
for ν = 2 : m
    h_u = A · (U_{ν−1}, v_{ν−1}, w_{ν−1})
    h_{ν,ν−1,ν−1} u_ν = A · (v_{ν−1}, w_{ν−1})_{2,3} − U_{ν−1} h_u
    h_v = A · (u_ν, V_{ν−1}, w_{ν−1})
    h_{ν,ν,ν−1} v_ν = A · (u_ν, w_{ν−1})_{1,3} − V_{ν−1} h_v
    h_w = A · (u_ν, v_ν, W_{ν−1})
    h_{ν,ν,ν} w_ν = A · (u_ν, v_ν)_{1,2} − W_{ν−1} h_w
end

Approximate A ≈ (U_m, V_m, W_m) · H, where H = (U_m^T, V_m^T, W_m^T) · A.
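A dense-numpy sketch of this recursion (ours; a real implementation would use the tensor only as a sparse operator and guard against breakdown). It uses the fact that h_u = A · (U_{ν−1}, v_{ν−1}, w_{ν−1}) equals U_{ν−1}^T (A · (v_{ν−1}, w_{ν−1})_{2,3}), so each update is an ordinary Gram-Schmidt step.

```python
import numpy as np

def minimal_krylov(A, u1, v1, m):
    """m steps of the minimal tensor Krylov recursion; u1, v1 are unit vectors."""
    U, V = u1[:, None], v1[:, None]
    w = np.einsum('ijk,i,j->k', A, u1, v1)               # h_111 w_1 = A . (u1, v1)_{1,2}
    W = (w / np.linalg.norm(w))[:, None]
    for _ in range(m - 1):
        r = np.einsum('ijk,j,k->i', A, V[:, -1], W[:, -1])   # A . (v, w)_{2,3}
        r -= U @ (U.T @ r)                                # subtract U_{nu-1} h_u
        U = np.hstack([U, (r / np.linalg.norm(r))[:, None]])

        r = np.einsum('ijk,i,k->j', A, U[:, -1], W[:, -1])   # A . (u, w)_{1,3}
        r -= V @ (V.T @ r)
        V = np.hstack([V, (r / np.linalg.norm(r))[:, None]])

        r = np.einsum('ijk,i,j->k', A, U[:, -1], V[:, -1])   # A . (u, v)_{1,2}
        r -= W @ (W.T @ r)
        W = np.hstack([W, (r / np.linalg.norm(r))[:, None]])

    H = np.einsum('ijk,ia,jb,kc->abc', A, U, V, W)        # H = A . (U, V, W)
    return U, V, W, H
```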

SLIDE 21

Krylov Method

Arnoldi style (i.e., including Gram-Schmidt orthogonalization). Let u_1 and v_1 be given.

h_{111} w_1 = A · (u_1, v_1)_{1,2}
for ν = 2 : m
    h_u = A · (U_{ν−1}, v_{ν−1}, w_{ν−1})
    h_{ν,ν−1,ν−1} u_ν = A · (v_{ν−1}, w_{ν−1})_{2,3} − U_{ν−1} h_u
    h_v = A · (u_ν, V_{ν−1}, w_{ν−1})
    h_{ν,ν,ν−1} v_ν = A · (u_ν, w_{ν−1})_{1,3} − V_{ν−1} h_v
    h_w = A · (u_ν, v_ν, W_{ν−1})
    h_{ν,ν,ν} w_ν = A · (u_ν, v_ν)_{1,2} − W_{ν−1} h_w
end

SLIDE 22

Gram-Schmidt, closer look

for ν = 2 : m
    h_u = A · (U_{ν−1}, v_{ν−1}, w_{ν−1})
    h_{ν,ν−1,ν−1} u_ν = A · (v_{ν−1}, w_{ν−1})_{2,3} − U_{ν−1} h_u
    . . .
end

The algebra is straightforward: h_u is a vector; the u-vectors live in the first mode, U_{ν−1} = (u_1, u_2, . . . , u_{ν−1}). Multiply by U_{ν−1} in the first mode:

h_{ν,ν−1,ν−1} U_{ν−1}^T u_ν = A · (U_{ν−1}, v_{ν−1}, w_{ν−1}) − h_u = 0

SLIDE 23

Minimal Krylov Method

Let u_1 and v_1 be given.

h_{111} w_1 = A · (u_1, v_1)_{1,2}
for ν = 2 : m
    h_u = A · (U_{ν−1}, v_{ν−1}, w_{ν−1})
    h_{ν,ν−1,ν−1} u_ν = A · (v_{ν−1}, w_{ν−1})_{2,3} − U_{ν−1} h_u
    h_v = A · (u_ν, V_{ν−1}, w_{ν−1})
    h_{ν,ν,ν−1} v_ν = A · (u_ν, w_{ν−1})_{1,3} − V_{ν−1} h_v
    h_w = A · (u_ν, v_ν, W_{ν−1})
    h_{ν,ν,ν} w_ν = A · (u_ν, v_ν)_{1,2} − W_{ν−1} h_w
end

Richer combinatorial structure: let µ ≤ ν − 1 and λ ≤ ν − 1:

h_u = A · (U_{ν−1}, v_µ, w_λ)
h u_ν = A · (v_µ, w_λ)_{2,3} − U_{ν−1} h_u

SLIDE 24

Maximal Krylov Method

u_1 and v_1 are given; all possible combinations are formed:

1. {u_1} × {v_1} → w_1
2. {v_1} × {w_1} → u_2
3. {u_1, u_2} × {w_1} → {v_2, v_3}
4. {u_1, u_2} × {v_1, v_2, v_3} → {w_1, w_2, . . . , w_6}
5. {v_1, v_2, v_3} × {w_1, w_2, . . . , w_6} → {u_2, u_3, . . . , u_19}

SLIDE 25

Maximal Krylov Method I

Maximal Krylov recursion:

ĥ w_1 = A · (u_1, v_1)_{1,2}
ν = µ = λ = 1
while ν ≤ ν_max and µ ≤ µ_max and λ ≤ λ_max do
    % ---- u-loop ---- %
    for all (µ̄, λ̄) such that µ̄ ≤ µ and λ̄ ≤ λ do
        if the pair (µ̄, λ̄) has not been used before then
            ν = ν + 1
            h_u = A · (U_{ν−1}, v_{µ̄}, w_{λ̄})
            h_{ν,µ̄,λ̄} u_ν = A · (v_{µ̄}, w_{λ̄})_{2,3} − U_{ν−1} h_u
            H(:, µ̄, λ̄) = [h_u; h_{ν,µ̄,λ̄}]    % mode 1
        end if
    end for

SLIDE 26

Maximal Krylov Method II

    % ---- v-loop ---- %   ANALOGOUS
    % ---- w-loop ---- %   ANALOGOUS
end while

SLIDE 27

Krylov factorization for maximal recursion

Theorem (Tensor Krylov factorizations)

After a complete u-loop:   A · (V_k, W_l)_{2,3} = (U_j)_1 · H_{jkl}.   (1)

After a complete v-loop:   A · (U_j, W_l)_{1,3} = (V_m)_2 · H_{jml}.   (2)

After a complete w-loop:   A · (U_j, V_m)_{1,2} = (W_n)_3 · H_{jmn}.   (3)

SLIDE 28

Krylov-Schur Method for Refining Eigenvalues

Arnoldi factorization:

A V = V H + v c^T,   H = X S X^T,   (4)

where S is triangular. Put U = V X ⇒ Krylov-Schur factorization:

A U = U S + u b^T.   (5)

Partition:

A (U_1  U_2) = (U_1  U_2) [ S_11  S_12 ;  0  S_22 ] + u (b_1^T  b_2^T).   (6)

Discard the part that contains unwanted eigenvalues:

A U_1 = U_1 S_11 + u b_1^T.   (7)

Restart!
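For the matrix case, the restart step in (4)-(7) can be sketched as follows (ours; it assumes the Schur form has already been reordered so that the wanted eigenvalues occupy the leading block, which scipy's schur can do via its sort argument).

```python
import numpy as np
from scipy.linalg import schur

def krylov_schur_restart(V, H, v, c, p):
    """One restart: A V = V H + v c^T  ->  A U1 = U1 S11 + v b1^T, keeping p columns."""
    S, X = schur(H)              # H = X S X^T, S (quasi-)triangular, X orthogonal
    U = V @ X                    # A U = U S + v (X^T c)^T
    b = X.T @ c
    return U[:, :p], S[:p, :p], v, b[:p]
```

In the tensor setting (next slide) the Schur decomposition of the small H is replaced by a best low-rank approximation or HOSVD of the core tensor.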

SLIDE 29

Krylov-Schur-like Method for Best Approximation

1. Compute a Krylov factorization: A · (V, W)_{2,3} = (U)_1 · H.
2. Compute a best low-rank approximation (or HOSVD) of H, make a change of bases, and truncate the factorization.
3. Restart!

SLIDE 30

Krylov subspaces of contracted tensor products

Let A ∈ R^{m×n×l}, u ∈ R^m, v ∈ R^n, w ∈ R^l, and consider

K_k(⟨A, A⟩_{−1}, u),   ⟨A, A⟩_{−1} = A^(1) (A^(1))^T ∈ R^{m×m}
K_k(⟨A, A⟩_{−2}, v),   ⟨A, A⟩_{−2} = A^(2) (A^(2))^T ∈ R^{n×n}
K_k(⟨A, A⟩_{−3}, w),   ⟨A, A⟩_{−3} = A^(3) (A^(3))^T ∈ R^{l×l}

These are symmetric matrices, so apply the Lanczos recurrence!

Recall: for a matrix A ∈ R^{m×n} we have ⟨A, A⟩_{−1} = A A^T ∈ R^{m×m} and ⟨A, A⟩_{−2} = A^T A ∈ R^{n×n}.
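Each of these Gram matrices can be applied as an operator without forming the unfolding explicitly; a minimal Lanczos sketch (ours, without reorthogonalization or breakdown handling) for ⟨A, A⟩_{−1} is shown below.

```python
import numpy as np

def gram1_matvec(A, u):
    # (<A, A>_{-1} u)(i) = sum_{l,j,k} a(i,j,k) a(l,j,k) u(l) = A^(1) (A^(1))^T u
    return np.einsum('ijk,ljk,l->i', A, A, u)

def lanczos(matvec, q1, k):
    """k steps of the Lanczos recurrence for a symmetric operator."""
    n = q1.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k)
    q, q_prev = q1 / np.linalg.norm(q1), np.zeros(n)
    for i in range(k):
        Q[:, i] = q
        r = matvec(q)
        alpha[i] = q @ r
        r = r - alpha[i] * q - (beta[i - 1] * q_prev if i > 0 else 0.0)
        beta[i] = np.linalg.norm(r)
        q_prev, q = q, r / beta[i]
    return Q, alpha, beta

A = np.random.randn(6, 7, 8)
Q, a, b = lanczos(lambda u: gram1_matvec(A, u), np.random.randn(6), 4)
```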

SLIDE 31

Applications: are these methods useful?

Handwritten digit classification on the US Postal Service database: 16 × 16 pixel gray scale images.

    Digit        0     1    2    3    4    5    6    7    8    9
    Training  1194  1005  731  658  652  556  664  645  542  644
    Test       359   264  198  166  200  160  170  147  166  177

7291 training digits, 2007 test digits.

SLIDE 32

Comparison: relative error

[Diagram: D approximated by (U_p, V_q)_{1,2} · F.]

Relative error = ‖D − D_{p,q,10}‖ / ‖D‖ in %:

    p × q                 20 × 40   30 × 60   40 × 80
    THOSVD                  29.6      24.7      21.1
    ⟨D, D⟩_{−i}             33.9      28.8      24.9
    K_min(D, u, v, w)       39.5      32.8      27.4
    K_max(D, u, v, w)       35.5      31.5      28.4

SLIDE 33

Error rate

[Plot: classification error rate vs. number of basis vectors (2-16), with a 40 × 80 core (99.33 %); curves for Truncated HOSVD, Krylov1, KrylovMax, and KrylovCTP.]

SLIDE 34

Restarted Krylov approach

D ≈ (U_20, V_40)_{1,2} · F.

[Plots: relative error vs. number of Krylov-Schur type iterations, for the minimal Krylov-Schur procedure and for the Krylov procedure, each compared with the truncated HOSVD error.]

SLIDE 35

Consistency of subspaces

Minimal Krylov procedure:

            U(2)   U(3)   U(4)   U(5)
    U(1)    0.09   0.11   0.10   0.10
    U(2)           0.10   0.11   0.10
    U(3)                  0.10   0.10
    U(4)                         0.12

            V(2)   V(3)   V(4)   V(5)
    V(1)    0.64   0.70   0.65   0.68
    V(2)           0.72   0.71   0.74
    V(3)                  0.68   0.68
    V(4)                         0.63

Maximal Krylov procedure:

            U(2)   U(3)   U(4)   U(5)
    U(1)    0.17   0.17   0.18   0.18
    U(2)           0.16   0.15   0.18
    U(3)                  0.20   0.19
    U(4)                         0.20

            V(2)   V(3)   V(4)   V(5)
    V(1)    0.11   0.10   0.11   0.10
    V(2)           0.12   0.09   0.11
    V(3)                  0.11   0.11
    V(4)                         0.11

SLIDE 36

Text-graph analysis

[Diagram: the sparse tensor A, with one term mode and two graph (page) modes.]

Query: q, starting vector for the Krylov recursion.

Low-rank approximation: A ≈ (Q, V, W) · S.

Find relevant documents:

(q^T)_1 · A ≈ (q^T Q, V, W) · S = (e_1^T, V, W) · S

Collapses the term mode: the result is a graph matrix.
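A small sketch of the collapse (ours; random orthonormal factors stand in for the Krylov bases, and in the application q would be the Krylov starting vector, i.e. the first column of Q):

```python
import numpy as np

A = np.random.randn(30, 20, 20)             # terms x pages x pages (toy sizes)
Q = np.linalg.qr(np.random.randn(30, 5))[0]
V = np.linalg.qr(np.random.randn(20, 5))[0]
W = np.linalg.qr(np.random.randn(20, 5))[0]
S = np.einsum('ijk,ia,jb,kc->abc', A, Q, V, W)   # S = A . (Q, V, W)

q = Q[:, 0]                                 # query = first basis vector, so q^T Q = e1^T
M_query  = np.einsum('i,ijk->jk', q, A)     # (q^T)_1 . A : term mode collapsed
M_approx = V @ S[0] @ W.T                   # (e1^T, V, W) . S, an ordinary graph matrix
```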

SLIDE 37

Conclusions I

Tensor methods/algorithms without index-wrestling
- Indices hidden using matrix-inspired notation and object-oriented software
- Generalization to higher order tensors is straightforward
- Partial contractions play the role of adjoints

Grassmann optimization (for the Tucker model)
- Needed because tensors cannot be deflated like matrices
- Unconstrained optimization
- Newton: quadratic convergence

Sparse tensors: Krylov methods
- Suitable for tensors in operator form, sparse tensors, and tensors whose dimensions vary rapidly (new data)
- What are the convergence properties?

SLIDE 38

Conclusions II

Open questions:
- How to construct a method that gives an intermediate between the minimal and maximal sequence?
- Which variant is more economical in terms of the number of tensor-vector operations, taking into account the convergence rate?
- If one of the modes is of small dimension, so that a complete basis is quickly computed, how can one modify the recursion so that the already computed information is used as efficiently as possible?
- ...

Many fundamental mathematical and algorithmic problems remain. Numerous new applications in the information sciences. Tensor algorithms and computations can be (easily) managed if we define the right abstractions!
