SLIDE 1

Low Rank Approximation Lecture 8

Daniel Kressner, Chair for Numerical Algorithms and HPC, Institute of Mathematics, EPFL, daniel.kressner@epfl.ch

SLIDE 2

Linear operators in TT decomposition

Consider a linear operator $\mathcal{L}: \mathbb{R}^{n_1 \times \cdots \times n_d} \to \mathbb{R}^{n_1 \times \cdots \times n_d}$ and the corresponding matrix representation $L((i_1, \ldots, i_d), (j_1, \ldots, j_d))$. $\mathcal{L}$ is a TT operator if it takes the form

$$L((i_1, \ldots, i_d), (j_1, \ldots, j_d)) = \sum_{k_1=1}^{R_1} \cdots \sum_{k_{d-1}=1}^{R_{d-1}} A_1(1, i_1, j_1, k_1)\, A_2(k_1, i_2, j_2, k_2) \cdots A_d(k_{d-1}, i_d, j_d, 1)$$

with $R_{\mu-1} \times n_\mu \times n_\mu \times R_\mu$ tensors $A_\mu$. The operator TT ranks of $\mathcal{L}$ are the smallest possible values of $(R_1, \ldots, R_{d-1})$.

Matrix Product Operator (MPO):

$$L((i_1, \ldots, i_d), (j_1, \ldots, j_d)) = A_1(i_1, j_1)\, A_2(i_2, j_2) \cdots A_d(i_d, j_d), \qquad \text{with } A_\mu(i_\mu, j_\mu) = A_\mu(:, i_\mu, j_\mu, :).$$
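To make the MPO convention concrete, here is a minimal NumPy sketch (not from the slides); the function name and the core layout (R_prev, n, n, R_next) are assumptions for illustration.

```python
import numpy as np

def mpo_entry(cores, idx_i, idx_j):
    """Evaluate L((i_1,...,i_d), (j_1,...,j_d)) for an operator in MPO form.
    Each core is a 4D array of shape (R_prev, n, n, R_next), R_0 = R_d = 1."""
    result = np.ones((1, 1))
    for A, i, j in zip(cores, idx_i, idx_j):
        result = result @ A[:, i, j, :]  # A_mu(i_mu, j_mu) = A_mu(:, i_mu, j_mu, :)
    return result[0, 0]

# Example: the identity operator, with every core I_n reshaped to (1, n, n, 1),
# so that all operator TT ranks equal 1 (cf. the EFY on the next slide).
n, d = 3, 4
cores = [np.eye(n).reshape(1, n, n, 1)] * d
print(mpo_entry(cores, (0, 1, 2, 0), (0, 1, 2, 0)))  # 1.0
print(mpo_entry(cores, (0, 1, 2, 0), (0, 1, 1, 0)))  # 0.0
```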

SLIDE 3

Linear operators in TT decomposition

Alternative view: reinterpret the matrix representation of $\mathcal{L}$ as a tensor $\operatorname{ten}(L) \in \mathbb{R}^{n_1^2 \times \cdots \times n_d^2}$ by merging the indices $(i_\mu, j_\mu)$, applying the TT format, and separating the indices $(i_\mu, j_\mu)$ again. In practice: perfect shuffle permutation of modes.

Tensor network diagram for operator TT decomposition: [diagram]

EFY. Show that the identity matrix has operator TT ranks 1.

SLIDE 4

Linear operators in TT decomposition

Example: a discrete Laplace-like operator takes the form

$$\mathcal{L}: X \mapsto L_1 \circ_1 X + \cdots + L_d \circ_d X$$

for matrices $L_\mu \in \mathbb{R}^{n_\mu \times n_\mu}$. Matrix representation:

$$L = I_{n_d} \otimes \cdots \otimes I_{n_2} \otimes L_1 + \cdots + L_d \otimes I_{n_{d-1}} \otimes \cdots \otimes I_{n_1}.$$

TT representation with cores

$$A_1(i_1, j_1) = \begin{bmatrix} L_1(i_1, j_1) & I_{n_1}(i_1, j_1) \end{bmatrix}, \qquad
A_\mu(i_\mu, j_\mu) = \begin{bmatrix} I_{n_\mu}(i_\mu, j_\mu) & 0 \\ L_\mu(i_\mu, j_\mu) & I_{n_\mu}(i_\mu, j_\mu) \end{bmatrix}, \qquad
A_d(i_d, j_d) = \begin{bmatrix} I_{n_d}(i_d, j_d) \\ L_d(i_d, j_d) \end{bmatrix}.$$
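As an illustration (not from the slides), a NumPy sketch assembling these cores; the function name and the (R_prev, n, n, R_next) layout are assumptions. Multiplying out $A_1(i_1,j_1) \cdots A_d(i_d,j_d)$ reproduces $\sum_\mu L_\mu(i_\mu,j_\mu) \prod_{\nu \neq \mu} I(i_\nu,j_\nu)$, i.e., exactly the Kronecker sum above, which can be checked for small $d$.

```python
import numpy as np

def laplace_mpo_cores(Ls):
    """MPO cores of L = I x...x L_1 + ... + L_d x...x I with the
    rank pattern 1, 2, ..., 2, 1 from the slide. Ls[mu] is n_mu x n_mu;
    cores have shape (R_prev, n, n, R_next)."""
    d = len(Ls)
    cores = []
    for mu, L in enumerate(Ls):
        n = L.shape[0]
        I = np.eye(n)
        if mu == 0:            # A_1(i,j) = [ L_1(i,j)  I(i,j) ]
            A = np.stack([L, I], axis=-1)[np.newaxis]       # (1, n, n, 2)
        elif mu == d - 1:      # A_d(i,j) = [ I(i,j) ; L_d(i,j) ]
            A = np.stack([I, L], axis=0)[..., np.newaxis]   # (2, n, n, 1)
        else:                  # A_mu(i,j) = [[ I(i,j), 0 ], [ L_mu(i,j), I(i,j) ]]
            A = np.zeros((2, n, n, 2))
            A[0, :, :, 0], A[1, :, :, 0], A[1, :, :, 1] = I, L, I
        cores.append(A)
    return cores
```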
SLIDE 5

Multiplication with linear operators in TT decomposition

Consider the matrix-vector product $\tilde{x} = Lx$ or, equivalently, $\tilde{X} = \mathcal{L}(X)$. Assume that $\mathcal{L}$ and $X$ are given in TT decomposition (with cores $A_\mu$ and $U_\mu$, respectively). Then

$$\tilde{X}(i_1, \ldots, i_d) = \sum_{j_1, \ldots, j_d} L((i_1, \ldots, i_d), (j_1, \ldots, j_d))\, X(j_1, \ldots, j_d)$$
$$= \sum_{j_1, \ldots, j_d} A_1(i_1, j_1) \cdots A_d(i_d, j_d) \cdot U_1(j_1) \cdots U_d(j_d)$$
$$= \sum_{j_1, \ldots, j_d} \big(A_1(i_1, j_1) \otimes U_1(j_1)\big) \cdots \big(A_d(i_d, j_d) \otimes U_d(j_d)\big)$$
$$= \Big(\sum_{j_1} A_1(i_1, j_1) \otimes U_1(j_1)\Big) \cdots \Big(\sum_{j_d} A_d(i_d, j_d) \otimes U_d(j_d)\Big) =: \tilde{U}_1(i_1) \cdots \tilde{U}_d(i_d),$$

that is, the TT ranks of $\tilde{X}$ are bounded by the TT ranks of $X$ multiplied with the operator TT ranks of $\mathcal{L}$.
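A minimal sketch of this operator-times-tensor product (helper name and array layouts assumed): each new core is obtained by contracting over $j_\mu$ and taking the Kronecker product of the rank indices, so the ranks multiply.

```python
import numpy as np

def apply_tt_operator(op_cores, tt_cores):
    """Apply an MPO to a tensor in TT format: the new cores are
    U~_mu(i) = sum_j A_mu(i,j) (x) U_mu(j). op_cores[mu] has shape
    (R_prev, n, n, R_next), tt_cores[mu] has shape (r_prev, n, r_next);
    the result has TT ranks R_mu * r_mu."""
    new_cores = []
    for A, U in zip(op_cores, tt_cores):
        R0, n, _, R1 = A.shape
        r0, _, r1 = U.shape
        # T[k0, l0, i, k1, l1] = sum_j A[k0, i, j, k1] * U[l0, j, l1]
        T = np.einsum('aijb,cjd->acibd', A, U)
        new_cores.append(T.reshape(R0 * r0, n, R1 * r1))
    return new_cores
```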

SLIDE 6

Multiplication with linear operators in TT decomposition

Illustration by tensor network diagrams: [diagram]

◮ Carrying out the contractions requires $O(dnr^4)$ operations for tensors of order $d$.

SLIDE 7

TT decomposition – Summary of operations

Easy:

◮ (partial) contractions
◮ multiplication with TT operators
◮ recompression/truncation

Medium:

◮ entrywise products

Hard:

◮ almost everything else

Software:

◮ TT Toolbox (Matlab, Python), TTeMPS (Matlab), ...

SLIDE 8

Alternating least-squares / linear scheme

General setting: solve the optimization problem

$$\min_X f(X),$$

where $X$ is a (large) matrix or tensor and $f$ is "simple" (e.g., convex). Constrain $X$ to $\mathcal{M}_r$, the set of rank-$r$ matrices or tensors, and aim at solving

$$\min_{X \in \mathcal{M}_r} f(X).$$

Set $X = i(U_1, U_2, \ldots, U_d)$ (e.g., $X = U_1 U_2^T$). Low-rank formats are multilinear, so there is hope that optimizing for each component is simple:

$$\min_{U_\mu} f(i(U_1, U_2, \ldots, U_d)).$$

SLIDE 9

Alternating least-squares / linear scheme

Set $f(U_1, \ldots, U_d) := f(i(U_1, \ldots, U_d))$. ALS:

1: while not converged do
2:   $U_1 \leftarrow \arg\min_{U_1} f(i(U_1, U_2, \ldots, U_d))$
3:   $U_2 \leftarrow \arg\min_{U_2} f(i(U_1, U_2, \ldots, U_d))$
4:   ...
5:   $U_d \leftarrow \arg\min_{U_d} f(i(U_1, U_2, \ldots, U_d))$
6: end while

Examples:

◮ ALS for fitting the CP decomposition
◮ Subspace iteration

Closely related: block Gauss-Seidel, block coordinate descent.

Difficulties:

◮ The representation $(U_1, U_2, \ldots, U_d)$ is often non-unique; parameters may become unbounded.
◮ $\mathcal{M}_r$ is not closed.
◮ Convergence (analysis)

SLIDE 10

Subspace iteration and ALS

Given $A \in \mathbb{R}^{m \times n}$, consider the computation of a best rank-$r$ approximation:

$$\min_{U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{n \times r}} f(U, V), \qquad f(U, V) := \|A - UV^T\|_F^2.$$

◮ For fixed $V$ (resp. $U$) of rank $r$, the minimizing $U$ (resp. $V$) is unique.
◮ $f$ is convex wrt $U$ and $V$ individually.

$$\langle \nabla_U f(U, V), H \rangle = f(U + H, V) - f(U, V) + O(\|H\|_F^2) = -2\langle AV - UV^TV, H \rangle.$$

Hence,

$$0 = \nabla_U f(U, V) = -2(AV - UV^TV) \iff U = AV(V^TV)^{-1}.$$

For stability it is advisable to choose $V$ such that it has orthonormal columns.
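In code one would solve the small $r \times r$ normal equations rather than form the inverse; a minimal sketch (function name assumed):

```python
import numpy as np

def best_U(A, V):
    """Minimizer of ||A - U V^T||_F for fixed V of rank r:
    U = A V (V^T V)^{-1}; for orthonormal V this is simply A V."""
    return np.linalg.solve(V.T @ V, (A @ V).T).T
```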

SLIDE 11

Subspace iteration and ALS

ALS for low-rank matrix approximation:

1: while not converged do
2:   Compute economy-size QR factorization $V = QR$ and update $V \leftarrow Q$.
3:   $U \leftarrow AV$
4:   Compute economy-size QR factorization $U = QR$ and update $U \leftarrow Q$.
5:   $V \leftarrow A^T U$
6: end while

Returns an approximation $A \approx UV^T$. This is the subspace iteration from Lecture 1!
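A runnable sketch of this loop (fixed iteration count instead of a convergence test; names are assumptions):

```python
import numpy as np

def als_low_rank(A, r, n_iter=50, seed=0):
    """ALS for min ||A - U V^T||_F, i.e., subspace iteration on A."""
    m, n = A.shape
    V = np.random.default_rng(seed).standard_normal((n, r))
    for _ in range(n_iter):
        V = np.linalg.qr(V)[0]   # economy-size QR, V <- Q
        U = A @ V                # arg min over U for orthonormal V
        U = np.linalg.qr(U)[0]   # economy-size QR, U <- Q
        V = A.T @ U              # arg min over V for orthonormal U
    return U, V                  # A ~ U V^T
```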

EFY. Develop an ALS method for solving the weighted low-rank approximation problem $\min_{U,V} \|W_L(A - UV^T)W_R\|_F$ with square and invertible matrices $W_L$, $W_R$.

SLIDE 12

Linear matrix equations

For a linear operator $\mathcal{L}: \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$, consider the linear system $\mathcal{L}(X) = C$ with $C, X \in \mathbb{R}^{m \times n}$. Examples:¹

◮ Sylvester matrix equation:
$$AX + XB = C, \qquad A \in \mathbb{R}^{m \times m},\ B \in \mathbb{R}^{n \times n},\ C, X \in \mathbb{R}^{m \times n}.$$
Applications: discretized 2D Laplace on a rectangle, stability analysis, optimal control, model reduction of linear control systems. Special case Lyapunov equation: $m = n$, $A = B^T$, $C$ symmetric (and often negative semi-definite).
◮ Stochastic Galerkin methods in uncertainty quantification.
◮ Stochastic control.

¹See [V. Simoncini, Computational methods for linear matrix equations, SIAM Rev., 58 (2016), pp. 377–441] for details and references.

SLIDE 13

Linear matrix equations

Using the matrix $M_L$ representing $\mathcal{L}$ in canonical bases, we can rewrite $\mathcal{L}(X) = C$ as the linear system $M_L \operatorname{vec}(X) = \operatorname{vec}(C)$.

Assumption: $M_L$ has low Kronecker rank:

$$M_L = B_1 \otimes A_1 + \cdots + B_R \otimes A_R, \qquad R \ll m, n.$$

Equivalently, $\mathcal{L}(X) = A_1 X B_1^T + \cdots + A_R X B_R^T$.

EFY. Develop a variant of ACA (from Lecture 3) that aims at approximating a given sparse matrix $A$ by a matrix of low Kronecker rank for given $m$, $n$.

EFY. Show that if $m = n$, $M_L$ is symmetric and has Kronecker rank $R$, one can find symmetric matrices $A_1, \ldots, A_R$, $B_1, \ldots, B_R$ such that $\mathcal{L}(X) = A_1 X B_1 + \cdots + A_R X B_R$. Is it always possible to choose all $A_k$, $B_k$ positive semi-definite if $M_L$ is positive definite?
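Aside: the equivalence between the Kronecker-rank form of $M_L$ and the operator form $\mathcal{L}(X) = \sum_k A_k X B_k^T$ is easy to check numerically, since $\operatorname{vec}(AXB^T) = (B \otimes A)\operatorname{vec}(X)$ for the column-major vec. A minimal sketch (names assumed):

```python
import numpy as np

def apply_L(As, Bs, X):
    """L(X) = sum_k A_k X B_k^T for M_L = sum_k B_k (x) A_k."""
    return sum(A @ X @ B.T for A, B in zip(As, Bs))

rng = np.random.default_rng(1)
As = [rng.standard_normal((4, 4)) for _ in range(3)]
Bs = [rng.standard_normal((5, 5)) for _ in range(3)]
X = rng.standard_normal((4, 5))
vec = lambda M: M.flatten(order='F')             # column-major vec
ML = sum(np.kron(B, A) for A, B in zip(As, Bs))  # Kronecker rank 3
assert np.allclose(ML @ vec(X), vec(apply_L(As, Bs, X)))
```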

SLIDE 14

Linear matrix equations

Two ways of turning $\mathcal{L}(X) = C$ into an optimization problem:

1. If $M_L$ is symmetric positive definite:
$$\min_X \tfrac{1}{2}\langle \mathcal{L}(X), X \rangle - \langle X, C \rangle.$$
2. General $\mathcal{L}$:
$$\min_X \|\mathcal{L}(X) - C\|_F^2.$$

We will focus on spd $M_L$ in the following.

SLIDE 15

Linear matrix equations

A low-rank approximation of $\mathcal{L}(X) = C$ is obtained by solving

$$\min_{U,V} f(U, V), \qquad f(U, V) = \tfrac{1}{2}\langle \mathcal{L}(UV^T), UV^T \rangle - \langle UV^T, C \rangle.$$

Let $\mathcal{L}$ have Kronecker rank $R$. Then

$$\langle \mathcal{L}(UV^T), UV^T \rangle = \sum_{k=1}^{R} \langle A_k UV^T B_k, UV^T \rangle = \sum_{k=1}^{R} \langle A_k U V^T B_k V, U \rangle.$$

This shows that $\arg\min_U f(U, V)$ is the solution of the linear matrix equation

$$A_1 U (V^T B_1 V) + \cdots + A_R U (V^T B_R V) = CV.$$

EFY. Show that this linear matrix equation always has a unique solution under the assumption that L is symmetric positive definite.

◮ For $R = 2$, the update can be reduced to $r$ linear systems of size $n \times n$.
◮ For $R > 2$, one needs to solve an $rn \times rn$ system.
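For illustration, a dense sketch of the $rn \times rn$ solve via vectorization, using $\operatorname{vec}(A_k U S_k) = (S_k^T \otimes A_k)\operatorname{vec}(U)$; the function name is assumed, and this is only sensible for small sizes.

```python
import numpy as np

def solve_for_U(As, Bs, C, V):
    """Solve sum_k A_k U (V^T B_k V) = C V for U via vectorization."""
    m, r = As[0].shape[0], V.shape[1]
    K = sum(np.kron((V.T @ B @ V).T, A) for A, B in zip(As, Bs))
    u = np.linalg.solve(K, (C @ V).flatten(order='F'))
    return u.reshape(m, r, order='F')
```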

SLIDE 16

Linear matrix equations

ALS for linear matrix equations:

1: while not converged do
2:   Compute economy-size QR factorization $V = QR$ and update $V \leftarrow Q$.
3:   Solve $A_1 U (V^T B_1 V) + \cdots + A_R U (V^T B_R V) = CV$ for $U$.
4:   Compute economy-size QR factorization $U = QR$ and update $U \leftarrow Q$.
5:   Solve $(U^T A_1 U) V^T B_1 + \cdots + (U^T A_R U) V^T B_R = U^T C$ for $V$.
6: end while

Returns an approximation $X \approx UV^T$. For $R = 2$, there are better alternatives: ADI, Krylov subspace methods, ... [Simoncini'2016].
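A compact sketch of the full sweep under the slide's assumptions (dense reduced solves, symmetric $B_k$; the name and the fixed sweep count are assumptions):

```python
import numpy as np

def als_matrix_equation(As, Bs, C, r, n_sweeps=20, seed=0):
    """ALS for L(X) = sum_k A_k X B_k = C with X ~ U V^T."""
    m, n = C.shape
    vec = lambda M: M.flatten(order='F')
    V = np.random.default_rng(seed).standard_normal((n, r))
    for _ in range(n_sweeps):
        V = np.linalg.qr(V)[0]
        K = sum(np.kron((V.T @ B @ V).T, A) for A, B in zip(As, Bs))
        U = np.linalg.solve(K, vec(C @ V)).reshape(m, r, order='F')
        U = np.linalg.qr(U)[0]
        K = sum(np.kron(B.T, U.T @ A @ U) for A, B in zip(As, Bs))
        V = np.linalg.solve(K, vec(U.T @ C)).reshape(r, n, order='F').T
    return U, V
```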

SLIDE 17

2D eigenvalue problem

◮ $-\Delta u(x) + V(x)u(x) = \lambda u(x)$ in $\Omega = [0, 1] \times [0, 1]$ with Dirichlet b.c. and Hénon-Heiles potential $V$
◮ Regular discretization
◮ Ground state reshaped into a matrix

[Figure: ground state (left) and its singular values (right)]

Excellent rank-10 approximation possible.

SLIDE 18

Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix $A$. Then

$$\lambda_{\min}(A) = \min_{x \neq 0} \frac{\langle x, Ax \rangle}{\langle x, x \rangle}.$$

We now...

◮ reshape the vector $x$ into an $n \times n$ matrix $X$;
◮ reinterpret $Ax$ as a linear operator $\mathcal{A}: X \mapsto \mathcal{A}(X)$.

SLIDE 19

Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix $A$. Then

$$\lambda_{\min}(A) = \min_{X \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}$$

with the matrix inner product $\langle \cdot, \cdot \rangle$. We now...

◮ restrict $X$ to low-rank matrices.

SLIDE 20

Rayleigh quotients wrt low-rank matrices

Consider a symmetric $n^2 \times n^2$ matrix $A$. Then

$$\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$

◮ The approximation error is governed by the low-rank approximability of $X$.
◮ Solved by Riemannian optimization techniques or ALS.

SLIDE 21

ALS for eigenvalue problem

ALS for solving $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \langle X, \mathcal{A}(X) \rangle / \langle X, X \rangle$. Initially:

◮ fix the target rank $r$;
◮ choose $U \in \mathbb{R}^{n \times r}$, $V \in \mathbb{R}^{n \times r}$ randomly, such that $V$ has orthonormal columns.

$\tilde{\lambda} - \lambda = 6 \times 10^3$, residual $= 3 \times 10^3$

SLIDE 22

ALS for eigenvalue problem

ALS for solving $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \langle X, \mathcal{A}(X) \rangle / \langle X, X \rangle$. Fix $V$, optimize for $U$:

$$\langle X, \mathcal{A}(X) \rangle = \operatorname{vec}(UV^T)^T A \operatorname{vec}(UV^T) = \operatorname{vec}(U)^T (V \otimes I)^T A (V \otimes I) \operatorname{vec}(U).$$

Compute the smallest eigenvalue of the reduced ($rn \times rn$) matrix $(V \otimes I)^T A (V \otimes I)$.

Note: the computation of the reduced matrix benefits from the Kronecker structure of $A$.
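A dense sketch of this half-step, forming $V \otimes I$ explicitly (which a real implementation would avoid by exploiting the Kronecker structure of $A$; the function name is assumed):

```python
import numpy as np

def optimize_U(A, V):
    """Fix V (orthonormal columns), minimize the Rayleigh quotient over U:
    smallest eigenpair of the reduced matrix (V (x) I)^T A (V (x) I)."""
    n, r = V.shape
    P = np.kron(V, np.eye(n))            # vec(U V^T) = (V (x) I) vec(U)
    w, Z = np.linalg.eigh(P.T @ A @ P)   # reduced (rn x rn) eigenproblem
    return Z[:, 0].reshape(n, r, order='F'), w[0]
```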

SLIDE 23

ALS for eigenvalue problem

ALS for solving $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \langle X, \mathcal{A}(X) \rangle / \langle X, X \rangle$. Fix $V$, optimize for $U$:

$\tilde{\lambda} - \lambda = 2 \times 10^3$, residual $= 2 \times 10^3$

SLIDE 24

ALS for eigenvalue problem

ALS for solving $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \langle X, \mathcal{A}(X) \rangle / \langle X, X \rangle$. Orthonormalize $U$, fix $U$, optimize for $V$:

$$\langle X, \mathcal{A}(X) \rangle = \operatorname{vec}(UV^T)^T A \operatorname{vec}(UV^T) = \operatorname{vec}(V^T)^T (I \otimes U)^T A (I \otimes U) \operatorname{vec}(V^T).$$

Compute the smallest eigenvalue of the reduced ($rn \times rn$) matrix $(I \otimes U)^T A (I \otimes U)$.

Note: the computation of the reduced matrix benefits from the Kronecker structure of $A$.

SLIDE 25

ALS for eigenvalue problem

ALS for solving $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \langle X, \mathcal{A}(X) \rangle / \langle X, X \rangle$. Orthonormalize $U$, fix $U$, optimize for $V$:

$\tilde{\lambda} - \lambda = 1.5 \times 10^{-7}$, residual $= 7.7 \times 10^{-3}$

SLIDE 26

ALS

ALS for solving $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \langle X, \mathcal{A}(X) \rangle / \langle X, X \rangle$. Orthonormalize $V$, fix $V$, optimize for $U$:

$\tilde{\lambda} - \lambda = 1 \times 10^{-12}$, residual $= 6 \times 10^{-7}$

SLIDE 27

ALS for eigenvalue problem

ALS for solving $\lambda_{\min}(A) \approx \min_{X = UV^T \neq 0} \langle X, \mathcal{A}(X) \rangle / \langle X, X \rangle$. Orthonormalize $U$, fix $U$, optimize for $V$:

$\tilde{\lambda} - \lambda = 7.6 \times 10^{-13}$, residual $= 7.2 \times 10^{-8}$

SLIDE 28

Extension of ALS to TT

Recall the interface matrices

$$X_{\leq \mu} \in \mathbb{R}^{n_1 n_2 \cdots n_\mu \times r_\mu}, \qquad X_{\geq \mu+1} \in \mathbb{R}^{n_{\mu+1} n_{\mu+2} \cdots n_d \times r_\mu},$$

yielding the factorizations $X^{\langle\mu\rangle} = X_{\leq \mu} X_{\geq \mu+1}^T$, $\mu = 1, \ldots, d-1$. Combined with the recursion

$$X_{\geq \mu}^T = U_\mu^R \big(X_{\geq \mu+1}^T \otimes I_{n_\mu}\big),$$

this yields

$$X^{\langle\mu-1\rangle} = X_{\leq \mu-1}\, U_\mu^R \big(X_{\geq \mu+1}^T \otimes I_{n_\mu}\big), \qquad \mu = 1, \ldots, d.$$

Hence,

$$\operatorname{vec}(X) = \big(X_{\geq \mu+1} \otimes I_{n_\mu} \otimes X_{\leq \mu-1}\big) \operatorname{vec}(U_\mu).$$

This formula allows us to pull out the $\mu$th core!
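These identities are easy to check numerically. Below is a sketch (assumed helper names; cores of shape (r_prev, n, r_next), column-major vec) that builds the interface matrices and verifies the vec formula; the same matrices also give the reduced matrix $A_\mu$ on the next slide.

```python
import numpy as np

def left_interface(cores):
    """X_{<=mu} for mu = len(cores): accumulate cores left to right."""
    M = np.ones((1, 1))
    for U in cores:
        M = np.einsum('ak,kib->aib', M, U).reshape(-1, U.shape[2], order='F')
    return M

def right_interface(cores):
    """X_{>=mu} for the given trailing cores: accumulate right to left."""
    M = np.ones((1, 1))
    for U in reversed(cores):
        M = np.einsum('kib,ab->iak', U, M).reshape(-1, U.shape[0], order='F')
    return M

rng = np.random.default_rng(0)
shapes = [(1, 3, 2), (2, 4, 3), (3, 2, 2), (2, 3, 1)]
cores = [rng.standard_normal(s) for s in shapes]
mu = 2                                # 0-based index of the core to pull out
L = left_interface(cores[:mu])        # X_{<= mu-1}
R = right_interface(cores[mu + 1:])   # X_{>= mu+1}
P = np.kron(R, np.kron(np.eye(shapes[mu][1]), L))
vecX = left_interface(cores)[:, 0]    # vec(X), column-major
assert np.allclose(vecX, P @ cores[mu].flatten(order='F'))
```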

SLIDE 29

Extension of ALS to TT

A TT decomposition is called $\mu$-orthogonal if

$$(U_\nu^L)^T U_\nu^L = I_{r_\nu}, \qquad X_{\leq \nu}^T X_{\leq \nu} = I_{r_\nu} \qquad \text{for } \nu = 1, \ldots, \mu - 1,$$

and

$$U_\nu^R (U_\nu^R)^T = I_{r_{\nu-1}}, \qquad X_{\geq \nu}^T X_{\geq \nu} = I_{r_{\nu-1}} \qquad \text{for } \nu = \mu + 1, \ldots, d.$$

This implies that $X_{\geq \mu+1} \otimes I_{n_\mu} \otimes X_{\leq \mu-1}$ has orthonormal columns!

Consider the eigenvalue problem

$$\lambda_{\min}(A) = \min_{X \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle}.$$

Optimizing for the $\mu$th core:

$$\min_{U_\mu \neq 0} \frac{\langle X, \mathcal{A}(X) \rangle}{\langle X, X \rangle} = \min_{U_\mu \neq 0} \frac{\langle \operatorname{vec}(U_\mu), A_\mu \operatorname{vec}(U_\mu) \rangle}{\langle \operatorname{vec}(U_\mu), \operatorname{vec}(U_\mu) \rangle}$$

with the $r_{\mu-1} n_\mu r_\mu \times r_{\mu-1} n_\mu r_\mu$ matrix

$$A_\mu = \big(X_{\geq \mu+1} \otimes I_{n_\mu} \otimes X_{\leq \mu-1}\big)^T A \big(X_{\geq \mu+1} \otimes I_{n_\mu} \otimes X_{\leq \mu-1}\big).$$

SLIDE 30

Extension of ALS to TT

◮ $U_\mu$ is obtained as an eigenvector belonging to the smallest eigenvalue of $A_\mu$.
◮ Computation of $A_\mu$ for large $d$ is only feasible if $\mathcal{A}$ has low operator TT ranks (and is in operator TT decomposition).
◮ One microstep of ALS optimizes $U_\mu$ and prepares for the next core by adjusting the orthogonalization.
◮ One sweep of ALS consists of processing the cores twice: once from left to right and once from right to left.

SLIDE 31

Extension of ALS to TT

Input: $X$ in right-orthogonal TT decomposition.

1: for $\mu = 1, 2, \ldots, d-1$ do
2:   Compute $A_\mu$ and replace core $U_\mu$ by an eigenvector belonging to the smallest eigenvalue of $A_\mu$.
3:   Compute QR decomposition $U_\mu^L = QR$.
4:   Set $U_\mu^L \leftarrow Q$.
5:   Update $U_{\mu+1} \leftarrow R \circ_1 U_{\mu+1}$.
6: end for
7: for $\mu = d, d-1, \ldots, 2$ do
8:   Compute $A_\mu$ and replace core $U_\mu$ by an eigenvector belonging to the smallest eigenvalue of $A_\mu$.
9:   Compute QR decomposition $(U_\mu^R)^T = QR$.
10:  Set $U_\mu^R \leftarrow Q^T$.
11:  Update $U_{\mu-1} \leftarrow R \circ_3 U_{\mu-1}$.
12: end for
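For concreteness, a sketch of the orthogonalization microstep of the left-to-right loop (steps 3-5), with cores stored as NumPy arrays of shape (r_prev, n, r_next); the storage layout and the function name are assumptions:

```python
import numpy as np

def move_orthogonality_right(cores, mu):
    """QR of the left unfolding of core mu; keep Q as the new
    left-orthogonal core and absorb R into core mu+1 (R o_1 U_{mu+1})."""
    r0, n, r1 = cores[mu].shape
    Q, R = np.linalg.qr(cores[mu].reshape(r0 * n, r1, order='F'))
    cores[mu] = Q.reshape(r0, n, -1, order='F')
    cores[mu + 1] = np.einsum('ab,bic->aic', R, cores[mu + 1])
    return cores
```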

SLIDE 32

Extension of ALS to TT

Remarks:

◮ The "small" matrix $A_\mu$ quickly gets large as the TT ranks increase ⇒ need to use iterative methods (e.g., Lanczos, LOBPCG), possibly combined with preconditioning [Kressner/Tobler'2011], for solving the reduced eigenvalue problems.
◮ In ALS, the TT ranks of $X$ need to be chosen a priori. Adaptive rank choice by merging neighbouring cores, optimizing for the merged core, and splitting the optimized merged core again ⇒ DMRG, modified ALS. Cheaper: AMEn [White'2005, Dolgov/Savostyanov'2013].
◮ The principles of ALS easily extend to other optimization problems, e.g., linear systems [Holtz/Rohwedder/Schneider'2012].

SLIDE 33

Numerical Experiments - Sine potential, d = 10

[Figure: ALS convergence, showing err_lambda, res, and nr_iter vs execution time [s]]

Size $= 128^{10} \approx 10^{21}$. Maximal TT rank 40. See [Kressner/Steinlechner/Uschmajew'2014] for details.

SLIDE 34

Numerical Experiments - Hénon-Heiles potential, d = 20

[Figure: ALS convergence, showing err_lambda, res, and nr_iter vs execution time [s]]

Size $= 128^{20} \approx 10^{42}$. Maximal TT rank 40.

SLIDE 35

Numerical Experiments - $1/\xi^2$ potential, d = 20

[Figure: ALS convergence, showing err_lambda, res, and nr_iter vs execution time [s]]

Size $= 128^{20} \approx 10^{42}$. Maximal TT rank 30.
