Numerical methods for the best low multilinear rank approximation of higher-order tensors


SLIDE 1

Numerical methods for the best low multilinear rank approximation of higher-order tensors

  • M. Ishteva (1)
  • P.-A. Absil (1)
  • S. Van Huffel (2)
  • L. De Lathauwer (2,3)

(1) Department of Mathematical Engineering, Université catholique de Louvain
(2) Department of Electrical Engineering ESAT/SCD, K.U.Leuven
(3) Group Science, Engineering and Technology, K.U.Leuven Campus Kortrijk

CESAME seminar, February 9, 2010

SLIDE 2

Outline

1. Introduction
2. Some applications: Epileptic seizure onset localization; Parameter estimation
3. Algorithms: Geometric Newton algorithm; Trust-region based algorithm; Conjugate gradient based algorithm
4. Local minima


SLIDE 4

Scalars

R: e.g. 4, 67, 3.23, −45.8, 2√2, −14/8, −339/7534, −4.397534

SLIDE 5

Vectors

R^n: example vectors shown with entries 4, 15, −45.3, 12.345, 1.3, 1.7, 2.1, 1.4, 0.2, 65, −15, and −23.44.

SLIDE 6

Matrices

R^{m×n}: e.g. the 5×5 matrix

    1.3  0.6  1.1  2.0  1.9
    1.7  1.0  1.3  2.8  2.6
    2.1  2.1  1.6  3.7  2.7
    1.4  1.4  1.4  2.2  2.0
    0.2  1.3  0.9  1.9  1.7

SLIDE 7

Tensors

R^{m×n×p}

SLIDE 8

Ranks

Rank-(R1, R2, R3) vs. rank-R

  • Rank-1 tensor: the outer product of three nonzero vectors.
  • Rank: R = min r such that A = Σ_{i=1}^{r} {rank-1 tensor}_i.
  • Mode-n rank rank_n(A): the rank of the mode-n unfolding A_(n).

In general R1, R2, R3 need not be equal, and R1, R2, R3 ≤ R.
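A minimal NumPy sketch of how the mode-n ranks can be computed from the unfoldings (illustrative only; the helper names and the column ordering of the unfolding are choices of this sketch, not from the talk):

```python
import numpy as np

def unfold(A, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def multilinear_rank(A, tol=1e-10):
    """Return (rank_1, rank_2, rank_3) of a third-order array A."""
    return tuple(np.linalg.matrix_rank(unfold(A, n), tol=tol) for n in range(A.ndim))

# Example: a rank-1 tensor (outer product of three vectors) has multilinear rank (1, 1, 1).
u, v, w = np.random.randn(4), np.random.randn(5), np.random.randn(6)
A = np.einsum('i,j,k->ijk', u, v, w)
print(multilinear_rank(A))   # (1, 1, 1)
```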

SLIDE 9

Decompositions

  • Singular value decomposition (SVD)
  • Higher-order SVD (HOSVD)
  • PARAFAC/CANDECOMP

SLIDE 10

Main problem

Truncated SVD → best rank-R approximation of a matrix.
Truncated HOSVD → good, but not best, rank-(R1, R2, R3) approximation of a tensor.

Best rank-(R1, R2, R3) approximation of a tensor. Given a third-order tensor A ∈ R^{I1×I2×I3}:

    minimize    ‖A − Â‖²    over Â ∈ R^{I1×I2×I3}
    subject to  rank_1(Â) ≤ R1,  rank_2(Â) ≤ R2,  rank_3(Â) ≤ R3.
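A compact NumPy sketch of the truncated HOSVD that provides such a good-but-not-best approximation (illustrative, not the authors' code; `unfold` and `truncated_hosvd` are hypothetical helper names):

```python
import numpy as np

def unfold(A, mode):
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def truncated_hosvd(A, ranks):
    """Truncated HOSVD of a third-order array A to multilinear rank `ranks`."""
    factors = []
    for n, R in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        Un, _, _ = np.linalg.svd(unfold(A, n), full_matrices=False)
        factors.append(Un[:, :R])
    U, V, W = factors
    core = np.einsum('ijk,ia,jb,kc->abc', A, U, V, W)       # S = A x1 U^T x2 V^T x3 W^T
    A_hat = np.einsum('abc,ia,jb,kc->ijk', core, U, V, W)    # A_hat = S x1 U x2 V x3 W
    return A_hat, (U, V, W), core
```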

SLIDE 11

Reformulation of the main problem

Given a third-order tensor A ∈ R^{I1×I2×I3}:

    maximize  ‖A •1 Uᵀ •2 Vᵀ •3 Wᵀ‖² = ‖Uᵀ A_(1) (V ⊗ W)‖²
    over      U ∈ St(R1, I1),  V ∈ St(R2, I2),  W ∈ St(R3, I3).

Then Â = A •1 UUᵀ •2 VVᵀ •3 WWᵀ.

Notation: St(R, I) = {X ∈ R^{I×R} : XᵀX = I}.
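A small NumPy check of this reformulation: the core norm ‖A •1 Uᵀ •2 Vᵀ •3 Wᵀ‖² equals ‖Uᵀ A_(1)(V ⊗ W)‖². This is a sketch under the unfolding convention assumed below; with a different convention the Kronecker factors may need to be swapped:

```python
import numpy as np

def unfold(A, mode):
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

I, R = (6, 5, 4), (3, 2, 2)
A = np.random.randn(*I)
U = np.linalg.qr(np.random.randn(I[0], R[0]))[0]
V = np.linalg.qr(np.random.randn(I[1], R[1]))[0]
W = np.linalg.qr(np.random.randn(I[2], R[2]))[0]

core = np.einsum('ijk,ia,jb,kc->abc', A, U, V, W)              # A x1 U^T x2 V^T x3 W^T
g1 = np.linalg.norm(core) ** 2
g2 = np.linalg.norm(U.T @ unfold(A, 0) @ np.kron(V, W)) ** 2    # ||U^T A_(1) (V kron W)||^2
print(np.isclose(g1, g2))   # True
```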

SLIDE 12

Higher-order orthogonal iteration (HOOI)

Initial values: HOSVD. To maximize ‖Uᵀ A_(1) (V ⊗ W)‖²  (∗), iterate until convergence:

  • Optimize (∗) over U: the columns of U span the left dominant subspace of A_(1)(V ⊗ W).
  • Optimize (∗) over V (analogously).
  • Optimize (∗) over W (analogously).

→ Linear convergence
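A minimal HOOI sketch in NumPy, reconstructed from the scheme above (not the authors' implementation; `unfold`, `dominant_subspace`, and `hooi` are names chosen for this sketch, and the unfolding convention is an assumption):

```python
import numpy as np

def unfold(A, mode):
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def dominant_subspace(M, R):
    """Orthonormal basis of the left dominant R-dimensional subspace of M."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :R]

def hooi(A, ranks, n_iter=50):
    # Initialization by truncated HOSVD.
    factors = [dominant_subspace(unfold(A, n), R) for n, R in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(3):
            # Project A in the two other modes, then update factor n.
            B = A
            for m in range(3):
                if m != n:
                    B = np.tensordot(factors[m].T, B, axes=(1, m))
                    B = np.moveaxis(B, 0, m)
            factors[n] = dominant_subspace(unfold(B, n), ranks[n])
    U, V, W = factors
    core = np.einsum('ijk,ia,jb,kc->abc', A, U, V, W)
    return core, U, V, W
```

Each update is one matrix SVD, which keeps the per-iteration cost low, but the overall scheme converges only linearly.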

SLIDE 13

Goal and motivation

Goal: conceptually faster algorithms (matrix-based algorithms → tensor-based algorithms).

  • Optimization w.r.t. numerical accuracy & computational efficiency.

Applications of the best rank-(R1, R2, R3) approximation: chemometrics, biomedical signal processing, telecommunications, etc. Dimensionality reduction; signal subspace estimation.

SLIDE 14

Outline

1. Introduction
2. Some applications: Epileptic seizure onset localization; Parameter estimation
3. Algorithms: Geometric Newton algorithm; Trust-region based algorithm; Conjugate gradient based algorithm
4. Local minima


SLIDE 16

Electrodes

SLIDE 17

Electroencephalogram

SLIDE 18

Epileptic seizure onset localization: results of PARAFAC with 2 atoms

[Figure: temporal, frequency, and spatial components of the two PARAFAC atoms]

  • Temporal component: temporal atom of seizure activity and temporal atom of eye-blink activity (Time, sec).
  • Frequency component: frequency distribution of seizure activity and of eye-blink activity (Freq, Hz).
  • Spatial component: red atom related to the seizure; blue atom related to the eye blinks.

SLIDE 19

Dimensionality reduction

A → rank-(R1, R2, R3) approximation → rank-R approximation

SLIDE 20

Outline

1. Introduction
2. Some applications: Epileptic seizure onset localization; Parameter estimation
3. Algorithms: Geometric Newton algorithm; Trust-region based algorithm; Conjugate gradient based algorithm
4. Local minima

SLIDE 21

Problem formulation: Multi-channel case

[Figure: samples x^(q)_n, n = 0, ..., N−1, of each channel q = 1, ..., Q (ch 1, ch 2, ch 3, ...)]

Model:

    x^(q)_n = Σ_{k=1}^{K} c^(q)_k z_k^n + e^(q)_n
            = Σ_{k=1}^{K} a^(q)_k exp{j φ^(q)_k} exp{(−α_k + 2jπ ν_k) t_n} + e^(q)_n

Given: samples x^(q)_n, n = 0, ..., N−1, q = 1, ..., Q.
Find: estimates of the complex amplitudes c^(q)_k and the poles z_k.
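A short NumPy sketch that generates synthetic data according to this model (the parameter values are made up for illustration, and t_n = n is assumed):

```python
import numpy as np

N, Q, K = 128, 4, 2                     # samples, channels, number of poles
t = np.arange(N)                        # t_n = n (unit sampling assumed)
alpha = np.array([0.01, 0.03])          # damping factors
nu = np.array([0.10, 0.22])             # normalized frequencies
z = np.exp(-alpha + 2j * np.pi * nu)    # poles z_k
c = np.random.randn(Q, K) + 1j * np.random.randn(Q, K)     # complex amplitudes c_k^(q)
e = 0.01 * (np.random.randn(Q, N) + 1j * np.random.randn(Q, N))

# x^(q)_n = sum_k c_k^(q) z_k^n + e^(q)_n, arranged as a (Q, N) array
x = c @ (z[:, None] ** t[None, :]) + e
```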

SLIDE 22

HO-HTLSstack algorithm

  • Compute the best rank-(K, K, R3) approximation (R3 ≤ min(K, Q)):  Ĥ = S •1 U •2 V •3 W.
  • Compute the eigenvalues λ_k of Z, where U↑ ≈ U↓ Z.
  • Estimate the poles: ẑ_k = λ_k.
  • Estimate the complex amplitudes: from ẑ_k and the samples x^(q)_n, obtain ĉ^(q)_k.
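A hedged NumPy sketch of the last three steps, assuming the mode-1 factor U of the rank-(K, K, R3) approximation is already available (the construction of the Hankel-structured tensor and its approximation is omitted; `estimate_poles` and `estimate_amplitudes` are hypothetical helpers):

```python
import numpy as np

def estimate_poles(U):
    """Shift invariance: U_up ~ U_down * Z, so Z = pinv(U_down) U_up; eigenvalues = poles."""
    Z = np.linalg.lstsq(U[:-1, :], U[1:, :], rcond=None)[0]
    return np.linalg.eigvals(Z)

def estimate_amplitudes(z_hat, x):
    """Solve the Vandermonde least-squares system for the c_k^(q); x has shape (Q, N)."""
    N = x.shape[1]
    V = z_hat[None, :] ** np.arange(N)[:, None]       # N x K Vandermonde matrix
    return np.linalg.lstsq(V, x.T, rcond=None)[0].T   # amplitudes, shape (Q, K)
```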

SLIDE 23

Observations

The (Hankel-structured) tensor H admits the decomposition

    H = Σ_{k=1}^{K}  (1, z_k, z_k², ..., z_k^L)ᵀ ∘ (1, z_k, z_k², ..., z_k^M)ᵀ ∘ (c_k^(1), c_k^(2), c_k^(3), ..., c_k^(Q))ᵀ

(∘ denotes the outer product), i.e., Vandermonde factors built from the poles z_k in the first two modes and the amplitude vectors in the third mode.

If the matrix of the amplitudes is ill-conditioned, then reduce the mode-3 rank.

SLIDE 24

Outline

1. Introduction
2. Some applications: Epileptic seizure onset localization; Parameter estimation
3. Algorithms: Geometric Newton algorithm; Trust-region based algorithm; Conjugate gradient based algorithm
4. Local minima


SLIDE 26

Reformulation

g = ‖Uᵀ A_(1) (V ⊗ W)‖²,   X = (U, V, W),   R1(X) = Uᵀ A_(1) (V ⊗ W).

New goal: F(X) ≡ (F1(X), F2(X), F3(X)) = 0, with

    F1(X) = U R1(X) R1(X)ᵀ − A_(1)(V ⊗ W) R1(X)ᵀ

(F2 and F3 are defined analogously in the other two modes).

Invariance property: F(X) = 0 ⇔ F(XQ) = 0, where XQ = (UQ1, VQ2, WQ3) and the Qi are orthogonal
⇒ the zeros of F are not isolated ⇒ convergence issues.
→ Work on R^{I1×R1}/O_{R1} × R^{I2×R2}/O_{R2} × R^{I3×R3}/O_{R3}.

SLIDE 27

Summary: Geometric Newton algorithm

Initial estimates X0: HOSVD + HOOI. Iterate until convergence (k = 0, 1, 2, ...):

  • Solve the linear system of equations (with MATLAB's GMRES):

        P^h_U D(P^h_U F1)(Xk)[∆k] = −P^h_U F1(Xk)
        P^h_V D(P^h_V F2)(Xk)[∆k] = −P^h_V F2(Xk)
        P^h_W D(P^h_W F3)(Xk)[∆k] = −P^h_W F3(Xk)

  • Set Xk+1 = Xk + ∆k, with ∆k ∈ H_U × H_V × H_W.

Here:

  • D(P^h_X F)(X)[∆] is the derivative of P^h_X F(X) along ∆.
  • P^h_Y Z = Z − Y skew((YᵀY)⁻¹ YᵀZ), where skew(B) = (B − Bᵀ)/2.
  • Quadratic convergence in a neighborhood of the nondegenerate zeros of P^h_X F(X).
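The projection P^h_Y above translates directly into a few lines of NumPy; a sketch (not the authors' code):

```python
import numpy as np

def skew(B):
    """Skew-symmetric part: (B - B^T) / 2."""
    return (B - B.T) / 2

def proj_h(Y, Z):
    """P^h_Y Z = Z - Y skew((Y^T Y)^{-1} Y^T Z)."""
    return Z - Y @ skew(np.linalg.solve(Y.T @ Y, Y.T @ Z))
```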

SLIDE 28

Outline

1. Introduction
2. Some applications: Epileptic seizure onset localization; Parameter estimation
3. Algorithms: Geometric Newton algorithm; Trust-region based algorithm; Conjugate gradient based algorithm
4. Local minima

SLIDE 29

Trust-region scheme and the manifold

Trust-region subproblem: obtain and minimize a quadratic model of the cost function around the current iterate; evaluate; accept/reject the step; update the radius.

Invariance property:
    g(U, V, W) = ‖Uᵀ A_(1) (V ⊗ W)‖²,
    g(U, V, W) = g(UQ^(1), VQ^(2), WQ^(3)).

Manifold M:
    M = St(R1, I1)/O_{R1} × St(R2, I2)/O_{R2} × St(R3, I3)/O_{R3}.

SLIDE 30

Retraction

[Figure: manifold M, tangent space T_X M at X, tangent vector ξ, retracted point R_X(ξ)]

g : M → R,   R : TM → M,   ĝ_X : T_X M → R : ξ ↦ g(R_X(ξ)).

Retraction: R_(U,V,W)(Z_U, Z_V, Z_W) = (qf(U + Z_U), qf(V + Z_V), qf(W + Z_W)),
where qf(·) denotes the Q factor of the thin QR decomposition.
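A NumPy sketch of this retraction, assuming qf(·) is the Q factor of the thin QR decomposition with the sign convention that makes it unique:

```python
import numpy as np

def qf(M):
    """Q factor of the thin QR decomposition, with the diagonal of R made positive."""
    Q, R = np.linalg.qr(M)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def retract(X, Z):
    """R_(U,V,W)(Z_U, Z_V, Z_W) = (qf(U+Z_U), qf(V+Z_V), qf(W+Z_W))."""
    return tuple(qf(Xi + Zi) for Xi, Zi in zip(X, Z))
```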

SLIDE 31

Summary: Trust-region algorithm

Initial values X0 = (U0, V0, W0): HOSVD. Iterate until convergence:

  • Build the quadratic model
        m_Xk(Z) = −g(Xk) + ⟨−grad g(Xk), Z⟩ + ½ ⟨−Hess g(Xk)[Z], Z⟩.
  • Solve (tCG):  min_{Z ∈ T_Xk M}  m_Xk(Z)   subject to  ⟨Z, Z⟩ ≤ ∆k².
  • Compute the ratio
        ρk = (−g(Xk) + g(R_Xk(Zk))) / (m_Xk(0_Xk) − m_Xk(Zk)).
  • Accept/reject:  Xk+1 = R_Xk(Zk) if ρk is large;  Xk otherwise.
  • Update the radius:  ∆k+1 = ¼ ∆k if ρk is small;  2∆k if ρk is large;  ∆k otherwise.
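A generic sketch of the accept/reject and radius-update logic above; the numeric thresholds standing in for "small" and "large" are assumptions of this sketch (the slide does not specify them):

```python
def tr_update(X, Z, rho, Delta, retract, rho_accept=0.1, rho_small=0.25, rho_large=0.75):
    """One trust-region bookkeeping step: accept/reject the candidate and update the radius."""
    X_next = retract(X, Z) if rho > rho_accept else X   # accept the step only if rho is large enough
    if rho < rho_small:
        Delta_next = Delta / 4                           # poor model fit: shrink the radius
    elif rho > rho_large:
        Delta_next = 2 * Delta                           # very good fit: enlarge the radius
    else:
        Delta_next = Delta
    return X_next, Delta_next
```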
SLIDE 32

Outline

1. Introduction
2. Some applications: Epileptic seizure onset localization; Parameter estimation
3. Algorithms: Geometric Newton algorithm; Trust-region based algorithm; Conjugate gradient based algorithm
4. Local minima

SLIDE 33

CG and vector transport

Linear conjugate gradient method
[Figure: iterates 1–5 of the linear CG method]

Manifold: M = St(R1, I1)/O_{R1} × St(R2, I2)/O_{R2} × St(R3, I3)/O_{R3}.

Vector transport: T_{η_X} ξ_X = P^h_{X + η̄_X} ξ̄_X.

[Figure: manifold M, tangent space T_X M, vectors η and ξ at X, the point R_X(η), and the transported vector T_η ξ]

SLIDE 34

Summary: CG algorithm

Initial iterate: X0 = (U0, V0, W0) ∈ M; set η0 = −grad g(X0). Iterate until convergence (k = 0, 1, 2, ...):

  • Compute a step size αk (e.g., Armijo) and set Xk+1 = R_Xk(αk ηk).
  • Compute βk+1, e.g., with the Fletcher–Reeves formula
        βk+1 = ⟨grad g(Xk+1), grad g(Xk+1)⟩ / ⟨grad g(Xk), grad g(Xk)⟩.
  • Set ηk+1 = −grad g(Xk+1) + βk+1 T_{αk ηk}(ηk).
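A structural sketch of one such iteration, written against hypothetical callables `grad`, `retract`, `transport`, and `inner`, and assuming the iterates and tangent vectors support scalar multiplication and addition (e.g., stacked NumPy arrays); it mirrors the update rules above rather than implementing the full solver:

```python
def cg_step(X, eta, alpha, grad, retract, transport, inner):
    """One Riemannian CG iteration with a Fletcher-Reeves beta."""
    X_next = retract(X, alpha * eta)                      # X_{k+1} = R_{X_k}(alpha_k * eta_k)
    g_old, g_new = grad(X), grad(X_next)
    beta = inner(g_new, g_new) / inner(g_old, g_old)      # Fletcher-Reeves formula
    eta_next = -g_new + beta * transport(X, alpha * eta, eta)
    return X_next, eta_next
```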

SLIDE 35

Summary: all algorithms

HOOI:
  + simple
  + cost per iteration O(I³R + IR⁴ + R⁶): cheap
  – linear convergence
  → Use if precision/time are noncritical.

Newton-based:
  – cost per iteration O(I³R³): expensive
  – convergence issues
  + quadratic convergence
  → Use if a good starting point is available.

TR-based:
  +/– cost per iteration ≤ O(I³R³)
  + superlinear (up to quadratic) convergence
  → Precise. Fast if the imposed multilinear rank is small.

CG-based:
  – large number of iterations
  + cost per iteration ∼ O(I³R): cheap
  → Best performance for easy problems. Competitive on random (randn) tensors.

SLIDE 36

Outline

1. Introduction
2. Some applications: Epileptic seizure onset localization; Parameter estimation
3. Algorithms: Geometric Newton algorithm; Trust-region based algorithm; Conjugate gradient based algorithm
4. Local minima

SLIDE 37

Example 1

Â = T/‖T‖ + 0.2 · E/‖E‖,   A = Â/‖Â‖

[Figure: cost function f over 100 runs; TR vs. HOOI]

Setup: A ∈ R^{10×10×10}; (R1, R2, R3) = (2, 2, 2); T exactly rank-(2, 2, 2); E: noise with randn elements; U0, V0, W0: column-wise orthonormal.
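A NumPy sketch of how such a test tensor can be constructed (illustrative; the random seed and the way T is generated, as a random core with orthonormal factors, are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
R1, R2, R3, I = 2, 2, 2, 10
core = rng.standard_normal((R1, R2, R3))
U0 = np.linalg.qr(rng.standard_normal((I, R1)))[0]
V0 = np.linalg.qr(rng.standard_normal((I, R2)))[0]
W0 = np.linalg.qr(rng.standard_normal((I, R3)))[0]
T = np.einsum('abc,ia,jb,kc->ijk', core, U0, V0, W0)   # multilinear rank (2, 2, 2) (generically)
E = rng.standard_normal((I, I, I))                     # noise with randn elements

A_hat = T / np.linalg.norm(T) + 0.2 * E / np.linalg.norm(E)
A = A_hat / np.linalg.norm(A_hat)
```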

SLIDE 38

Example 1

Â = T/‖T‖ + 2 · E/‖E‖,   A = Â/‖Â‖

[Figure: cost function f over 100 runs; TR vs. HOOI]

Setup: A ∈ R^{10×10×10}; (R1, R2, R3) = (2, 2, 2); T exactly rank-(2, 2, 2); E: noise with randn elements; U0, V0, W0: column-wise orthonormal.

SLIDE 39

Example 1

Â = T/‖T‖ + 4 · E/‖E‖,   A = Â/‖Â‖

[Figure: cost function f over 100 runs; TR vs. HOOI]

Setup: A ∈ R^{10×10×10}; (R1, R2, R3) = (2, 2, 2); T exactly rank-(2, 2, 2); E: noise with randn elements; U0, V0, W0: column-wise orthonormal.

SLIDE 40

Example 2

[Figures: cost function f over the runs, for a rank-(7, 8, 9) and a rank-(2, 2, 2) approximation; curves: TR, HOOI, HOSVD+TR, HOSVD+HOOI]

Setup: randn A ∈ R^{10×10×10}; initial values: column-wise orthonormal U0, V0, and W0.

SLIDE 41

Summary: local minima

Essential differences with the matrix case:

  • Similar values of f at the local minima → OK for compression applications.
  • Different subspaces → consequences for subspace-based applications, including dimensionality reduction for PARAFAC.
  • The truncated HOSVD does not necessarily give the global minimum → compute enough local minima and choose the best one.
  • The global minimum is not always the best choice → no meaningful solution?

SLIDE 42

Conclusions

  • Parameter estimation: if the matrix of the amplitudes is ill-conditioned, use the tensor algorithm with reduced mode-3 rank.
  • Best low multilinear rank approximation: we have developed three new algorithms and examined their advantages and disadvantages.
  • Local minima: use the best low multilinear rank approximation only if it "makes sense".

SLIDE 43

Key references I

Background

  • L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value
  • decomposition. SIAM J. Matrix Anal. Appl., 21(4):1253–1278, 2000.
  • L. De Lathauwer, B. De Moor, and J. Vandewalle. On the best rank-1 and

rank-(R1, R2, . . . , RN) approximation of higher-order tensors. SIAM J. Matrix

  • Anal. Appl., 21(4):1324–1342, 2000.

P .-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix

  • Manifolds. Princeton University Press, Princeton, NJ, 2008.

P .-A. Absil, C. G. Baker, and K. A. Gallivan. Trust-region methods on Riemannian

  • manifolds. Found. Comput. Math., 7(3):303–330, 2007.

Our algorithms and local minima

  • M. Ishteva, L. De Lathauwer, P

.-A. Absil, and S. Van Huffel. Differential-geometric Newton method for the best rank-(R1, R2, R3) approximation of tensors. Numerical Algorithms, 51(2):179–194, 2009.

  • M. Ishteva, L. De Lathauwer, P

.-A. Absil, and S. Van Huffel. Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region

  • scheme. Tech. Report 09-142, ESAT-SISTA, K.U.Leuven, Belgium, 2009.
SLIDE 44

Key references II

  • M. Ishteva, P.-A. Absil, S. Van Huffel, and L. De Lathauwer. Best low multilinear rank approximation with conjugate gradients. Tech. Report 09-246, ESAT-SISTA, K.U.Leuven, Belgium, 2009.
  • M. Ishteva, P.-A. Absil, S. Van Huffel, and L. De Lathauwer. Local minima of the best low multilinear rank approximation of higher-order tensors. Tech. Report 09-247, ESAT-SISTA, K.U.Leuven, Belgium, 2009.

Related papers

  • L. Eldén and B. Savas. A Newton–Grassmann method for computing the best multi-linear rank-(r1, r2, r3) approximation of a tensor. SIAM J. Matrix Anal. Appl., 31(2):248–271, 2009.
  • B. Savas and L.-H. Lim. Best multilinear rank approximation of tensors with quasi-Newton methods on Grassmannians. Tech. Report LITH-MAT-R-2008-01-SE, Dept. of Mathematics, Linköping University, 2008.
  • B. Savas and L. Eldén. Krylov subspace methods for tensor computations. Tech. Report LITH-MAT-R-2009-02-SE, Dept. of Mathematics, Linköping University, 2009.

SLIDE 45

Thank you for your attention!