Low Rank Approximation, Lecture 7. Daniel Kressner, Chair for Numerical Algorithms and HPC, Institute of Mathematics, EPFL. (PowerPoint PPT Presentation)



SLIDE 1

Low Rank Approximation Lecture 7

Daniel Kressner, Chair for Numerical Algorithms and HPC, Institute of Mathematics, EPFL. daniel.kressner@epfl.ch

SLIDE 2

Tensor Train (TT) decomposition

A tensor X is in TT decomposition if it can be written as

X(i1, . . . , id) = Σ_{k1=1}^{r1} · · · Σ_{kd−1=1}^{rd−1} U1(1, i1, k1) U2(k1, i2, k2) · · · Ud(kd−1, id, 1).

◮ The smallest possible tuple (r1, . . . , rd−1) is called the TT rank of X.
◮ Uµ ∈ R^{rµ−1×nµ×rµ} (formally set r0 = rd = 1) are called the TT cores for µ = 1, . . . , d.
◮ If the TT ranks are not large: high compression ratio as d grows.
◮ The TT decomposition is multilinear with respect to the cores.
◮ The TT decomposition connects to
  ◮ matrix products: Matrix Product States (MPS) in physics (see [Grasedyck/Kressner/Tobler’2013] for references),
  ◮ simultaneous matrix factorizations: SVD-based compression,
  ◮ contractions and tensor network diagrams: design of efficient contraction-based algorithms.
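To make the compression claim concrete, a quick storage count with illustrative sizes (the numbers are mine, not from the slides):

```python
# Number of stored entries: a TT decomposition keeps d cores of size
# r_{mu-1} x n_mu x r_mu (with r0 = rd = 1), i.e. O(d n r^2) entries,
# while the full tensor has n^d entries.
n, r, d = 10, 5, 8                      # illustrative sizes
tt_storage = n * r + (d - 2) * r * n * r + r * n
full_storage = n ** d
print(tt_storage, full_storage)         # 1600 100000000
```

So even for these modest sizes the TT format stores 1,600 numbers instead of 10^8, and the gap widens exponentially in d.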

SLIDE 3

TT decomposition and matrix products

X(i1, . . . , id) = Σ_{k1=1}^{r1} · · · Σ_{kd−1=1}^{rd−1} U1(1, i1, k1) U2(k1, i2, k2) · · · Ud(kd−1, id, 1).

Let Uµ(iµ) be the iµth slice of the µth core: Uµ(iµ) := Uµ(:, iµ, :) ∈ R^{rµ−1×rµ}. Then

X(i1, i2, . . . , id) = U1(i1) U2(i2) · · · Ud(id).

Remark: The error analysis of matrix multiplication [Higham’2002] shows that evaluating the TT decomposition may suffer from numerical instabilities if

‖U1(i1)‖₂ ‖U2(i2)‖₂ · · · ‖Ud(id)‖₂ ≫ |X(i1, i2, . . . , id)|.

See [Bachmayr/Kazeev: arXiv:1802.09062] for more details.
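The slice-product formula translates directly into code; a minimal numpy sketch (random cores, helper name tt_entry is mine), checked against an explicit full contraction for small sizes:

```python
import itertools

import numpy as np

# Entry evaluation X(i1,...,id) = U1(i1) U2(i2) ... Ud(id), where
# U_mu(i) = U_mu(:, i, :) is the i-th slice of the mu-th core.
rng = np.random.default_rng(0)
dims = (3, 4, 3)
ranks = (1, 2, 2, 1)                     # formally r0 = rd = 1
cores = [rng.standard_normal((ranks[m], dims[m], ranks[m + 1]))
         for m in range(len(dims))]

def tt_entry(cores, idx):
    """Multiply the 1 x r1, r1 x r2, ..., r_{d-1} x 1 slices together."""
    M = cores[0][:, idx[0], :]
    for core, i in zip(cores[1:], idx[1:]):
        M = M @ core[:, i, :]
    return M.item()                      # the final product is 1 x 1

# Check against the full tensor obtained by explicit summation over k1, k2.
X = np.einsum('aib,bjc,ckd->ijk', *cores)
for idx in itertools.product(*(range(m) for m in dims)):
    assert np.isclose(X[idx], tt_entry(cores, idx))
```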

SLIDE 4

TT decomposition and matrix factorizations

X(i1, . . . , id) = Σ_{k1,k2,...,kd−1} U1(1, i1, k1) U2(k1, i2, k2) · · · Ud(kd−1, id, 1).

For any 1 ≤ µ ≤ d − 1, group the first µ factors and the last d − µ factors together:

X(i1, . . . , iµ, iµ+1, . . . , id) = Σ_{kµ=1}^{rµ} [ Σ_{k1,...,kµ−1} U1(1, i1, k1) · · · Uµ(kµ−1, iµ, kµ) ] · [ Σ_{kµ+1,...,kd−1} Uµ+1(kµ, iµ+1, kµ+1) · · · Ud(kd−1, id, 1) ].

This can be interpreted as a matrix-matrix product of two (large) matrices!

SLIDE 5

TT decomposition and matrix factorizations

The µth unfolding of X ∈ R^{n1×n2×···×nd} is obtained by arranging the entries in a matrix X^<µ> ∈ R^{(n1n2···nµ)×(nµ+1···nd)}, where the corresponding index map is given by

ι : R^{n1×···×nd} → R^{n1···nµ} × R^{nµ+1···nd},  ι(i1, . . . , id) = (irow, icol),

irow = 1 + Σ_{ν=1}^{µ} (iν − 1) Π_{τ=1}^{ν−1} nτ,  icol = 1 + Σ_{ν=µ+1}^{d} (iν − 1) Π_{τ=µ+1}^{ν−1} nτ.
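In numpy terms this index map is exactly a column-major (Fortran-order) reshape; a small sketch (helper name unfolding is mine, 0-based indices so the "+1" and "−1" offsets of the formula drop out):

```python
import numpy as np

# The mu-th unfolding as a column-major reshape: the row index runs over
# (i1, ..., imu) with i1 fastest, the column index over (imu+1, ..., id)
# with imu+1 fastest.
rng = np.random.default_rng(0)
n = (2, 3, 4, 5)
X = rng.standard_normal(n)

def unfolding(X, mu):
    rows = int(np.prod(X.shape[:mu]))
    return X.reshape(rows, -1, order='F')

mu = 2
Xmu = unfolding(X, mu)                   # shape (2*3, 4*5) = (6, 20)

# Spot-check the index map for a 0-based multi-index:
i = (1, 2, 0, 3)
irow = i[0] + i[1] * n[0]                # 0-based version of the formula
icol = i[2] + i[3] * n[2]
assert Xmu[irow, icol] == X[i]
```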

SLIDE 6

TT decomposition and matrix factorizations

Define interface matrices X≤µ ∈ R^{n1n2···nµ×rµ} and X≥µ+1 ∈ R^{rµ×nµ+1nµ+2···nd} as

X≤µ(irow, j) = Σ_{k1,...,kµ−1} U1(1, i1, k1) · · · Uµ−1(kµ−2, iµ−1, kµ−1) Uµ(kµ−1, iµ, j),
X≥µ+1(j, icol) = Σ_{kµ+1,...,kd−1} Uµ+1(j, iµ+1, kµ+1) Uµ+2(kµ+1, iµ+2, kµ+2) · · · Ud(kd−1, id, 1).

This yields the matrix factorizations

X^<µ> = X≤µ X≥µ+1,  µ = 1, . . . , d − 1.

SLIDE 7

TT decomposition and matrix factorizations

Important: These matrix factorizations are nested!

X≤µ = (I_{nµ} ⊗ X≤µ−1) U^L_µ  and  X≥µ = U^R_µ (X≥µ+1 ⊗ I_{nµ}),

where U^L_µ = U^<2>_µ and U^R_µ = U^(1)_µ = U^<1>_µ. The relations X≤1 = U^L_1 and

X≤µ = (I_{nµ} ⊗ X≤µ−1) U^L_µ,  µ = 2, . . . , d,

fully characterize the TT decomposition:

vec(X) = X≤d = (I ⊗ X≤d−1) U^L_d
       = (I ⊗ I ⊗ X≤d−2)(I ⊗ U^L_{d−1}) U^L_d
       = · · ·
       = (I ⊗ · · · ⊗ I ⊗ U^L_1) · · · (I ⊗ U^L_{d−1}) U^L_d.

EFY. Perform an analogous calculation for X≥µ, that is, resolve the recursion X≥µ = U^R_µ (X≥µ+1 ⊗ I_{nµ}).

SLIDE 8

TT decomposition and matrix factorizations

Lemma

The TT rank of a tensor is given by (rank X^<1>, . . . , rank X^<d−1>).

Proof. Because of the connection to matrix factorizations, the TT rank cannot be smaller than (rank X^<1>, . . . , rank X^<d−1>). We need to exclude that it can be larger. For this purpose, we construct a TT decomposition with (r1, . . . , rd−1) := (rank X^<1>, . . . , rank X^<d−1>).

Step 1: Factorize X^<1> = U1 X̃^<1> with U1 ∈ R^{n1×r1} and X̃^<1> ∈ R^{r1×n2···nd}, and hence

X̃^<1> = U1^† X^<1>,  U1^† = (U1^T U1)^{−1} U1^T.

In terms of tensors: X = U1 ◦1 X̃. The matrix U1, viewed as a 1 × n1 × r1 tensor, is the first TT core U1 (and X≥2 := X̃^<1>).

SLIDE 9

Relation for the second unfolding via Kronecker product: X^<2> = (I_{n2} ⊗ U1) X̃^<2>. Together with the full column rank of U1, this implies rank(X̃^<2>) = rank(X^<2>) = r2.

Step 2: Factorize X̃^<2> = U^L_2 X̂^<1> with U^L_2 ∈ R^{r1n2×r2} and X̂^<1> ∈ R^{r2×n3···nd}. U^L_2 gives the second TT core U2 ∈ R^{r1×n2×r2} (and X≥3 := X̂^<1>).

Relation for the third unfolding via Kronecker product:

X^<3> = (I_{n3} ⊗ I_{n2} ⊗ U1) X̃^<3> = (I_{n3} ⊗ I_{n2} ⊗ U1)(I_{n3} ⊗ U^L_2) X̂^<2>.

Together with the full column ranks of U1 and U^L_2, this implies rank(X̂^<2>) = rank(X^<3>) = r3. Continuing in this manner gives cores Uµ ∈ R^{rµ−1×nµ×rµ} such that

vec(X) = (I ⊗ · · · ⊗ I ⊗ U1) · · · (I ⊗ U^L_{d−1}) vec(Ud).

This defines a TT decomposition.

SLIDE 10

Truncation in TT format

The proof of the Lemma can be turned into a practical algorithm (TT-SVD by [Oseledets’2011]) for approximating a given tensor X in TT format:

Input: X ∈ R^{n1×···×nd}, target TT rank (r1, . . . , rd−1).
Output: TT cores Uµ ∈ R^{rµ−1×nµ×rµ} that define a TT decomposition approximating X.

1: Set r0 = rd = 1 (and formally add a leading singleton dimension, X ∈ R^{1×n1×···×nd}).
2: for µ = 1, . . . , d − 1 do
3:   Reshape X into X^<2> ∈ R^{rµ−1nµ×nµ+1···nd}.
4:   Compute a rank-rµ approximation X^<2> ≈ UΣV^T (e.g., via SVD).
5:   Reshape U into Uµ ∈ R^{rµ−1×nµ×rµ}.
6:   Update X via X^<2> ← U^T X^<2> = ΣV^T.
7: end for
8: Set Ud = X.
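The listing translates almost line by line into numpy; a minimal sketch (the function name tt_svd and interface are mine, Fortran-order reshapes match the column-major unfolding convention above, and the target ranks are assumed not to exceed the matrix dimensions):

```python
import numpy as np

def tt_svd(X, ranks):
    """TT-SVD sketch: X is an ndarray of shape (n1, ..., nd); ranks is the
    target TT rank (r1, ..., r_{d-1}), assumed feasible."""
    d, n = X.ndim, X.shape
    ranks = (1,) + tuple(ranks) + (1,)           # r0 = rd = 1
    cores = []
    C = X.reshape(1, *n)                         # leading singleton dimension
    for mu in range(d - 1):
        # Line 3: reshape into the (r_{mu-1} n_mu) x (n_{mu+1} ... n_d) matrix
        C = C.reshape(ranks[mu] * n[mu], -1, order='F')
        # Line 4: truncated SVD
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = ranks[mu + 1]
        # Line 5: reshape the r leading left singular vectors into the core
        cores.append(U[:, :r].reshape(ranks[mu], n[mu], r, order='F'))
        # Line 6: pass Sigma V^T on to the next step
        C = s[:r, None] * Vt[:r]
    cores.append(C.reshape(ranks[d - 1], n[d - 1], 1, order='F'))
    return cores
```

With full target ranks the decomposition is exact; truncated ranks give the quasi-optimal approximation quantified on the next slides.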

SLIDE 11

Truncation in TT format

Theorem

Let X_SVD denote the tensor in TT decomposition obtained from TT-SVD. Then

‖X − X_SVD‖ ≤ √(ε1² + · · · + ε_{d−1}²),

where εµ² = ‖X^<µ> − T_{rµ}(X^<µ>)‖²_F = σ_{rµ+1}(X^<µ>)² + σ_{rµ+2}(X^<µ>)² + · · · .

Proof. After µ steps of the algorithm we have the following situation:
◮ The cores U1, . . . , Uµ have been computed, defining X≤µ.
◮ The remaining tensor has size rµ × nµ+1 × · · · × nd.

Reshape the remaining tensor into a matrix Y≥µ+1 ∈ R^{rµ×nµ+1···nd}. We will prove the relations

X^T≤µ X≤µ = I,  Y≥µ+1 = X^T≤µ X^<µ>,  and  ‖X^<µ> − X≤µ Y≥µ+1‖_F ≤ √(ε1² + · · · + εµ²)   (1)

for µ = 1, . . . , d − 1 by induction. For µ = d − 1, this shows the theorem.

SLIDE 12

Line 3 in the µth step of the algorithm proceeds by reshaping the remaining tensor from step µ − 1 (corresponding to Y≥µ) into a matrix Y^<2> ∈ R^{rµ−1nµ×nµ+1···nd}. By the induction assumption,

Y≥µ = X^T≤µ−1 X^<µ−1>  ⇒  Y^<2> = (I_{nµ} ⊗ X^T≤µ−1) X^<µ>.

The matrix U^L_µ ≡ U computed in Line 4 contains the leading left singular vectors of Y^<2>. In particular, (U^L_µ)^T U^L_µ = I. Together with the induction assumption and the relation

X≤µ = (I_{nµ} ⊗ X≤µ−1) U^L_µ,

this implies X^T≤µ X≤µ = I and Y≥µ+1 = X^T≤µ X^<µ>. Moreover,

‖(I − U^L_µ (U^L_µ)^T) Y^<2>‖_F = ‖Y^<2> − T_{rµ}(Y^<2>)‖_F ≤ ‖X^<µ> − T_{rµ}(X^<µ>)‖_F = εµ.

SLIDE 13

Finally, we obtain:

‖X^<µ> − X≤µ X^T≤µ X^<µ>‖²_F
= ‖X^<µ> − (I ⊗ X≤µ−1) U^L_µ (U^L_µ)^T (I ⊗ X≤µ−1)^T X^<µ>‖²_F
= ‖X^<µ> − (I ⊗ X≤µ−1)(I ⊗ X≤µ−1)^T X^<µ>‖²_F
  + ‖(I ⊗ X≤µ−1)(I ⊗ X≤µ−1)^T X^<µ> − (I ⊗ X≤µ−1) U^L_µ (U^L_µ)^T (I ⊗ X≤µ−1)^T X^<µ>‖²_F
= ‖X^<µ−1> − X≤µ−1 X^T≤µ−1 X^<µ−1>‖²_F + ‖(I − U^L_µ (U^L_µ)^T) Y^<2>‖²_F
≤ ε1² + · · · + εµ−1² + εµ².

This completes the proof.

SLIDE 14

Truncation in TT format

Consequence of the theorem:

Corollary

Let X_best denote the best approximation of X with TT rank (r1, . . . , rd−1). Then ‖X − X_SVD‖ ≤ √(d − 1) ‖X − X_best‖.

Proof. Since X^<µ>_best has rank rµ, we have

‖X − X_best‖ ≥ max_{µ=1,...,d−1} ‖X^<µ> − X^<µ>_best‖_F ≥ max_{µ=1,...,d−1} εµ ≥ (1/√(d − 1)) √(ε1² + · · · + ε_{d−1}²) ≥ (1/√(d − 1)) ‖X − X_SVD‖.

EFY. Develop a variant of TT-SVD which, for a prescribed accuracy ε > 0, determines the TT ranks adaptively such that ‖X − X_SVD‖ ≤ ε.

SLIDE 15

Orthogonal TT decompositions

The TT decomposition constructed by TT-SVD satisfies the orthogonality relations

(U^L_µ)^T U^L_µ = I_{rµ},  X^T≤µ X≤µ = I_{rµ}  for µ = 1, . . . , d − 1.

Such a TT decomposition is called left-orthogonal. A TT decomposition is called right-orthogonal if

U^R_µ (U^R_µ)^T = I_{rµ−1},  X≥µ X^T≥µ = I_{rµ−1}  for µ = 2, . . . , d.

Benefits of orthogonal decompositions:

◮ Numerical stability (see, e.g., Section 3.7 of [Higham’2002] on the stability of matrix products).
◮ Acceleration of operations.

SLIDE 16

Orthogonal TT decompositions

Given a TT decomposition with cores U1, . . . , Ud, one can always compute a left-/right-orthogonal TT decomposition of the same tensor.

Algorithm to obtain a left-orthogonal TT decomposition:
for µ = 1, 2, . . . , d − 1 do
  Compute the QR decomposition U^L_µ = QR.
  Set U^L_µ ← Q.
  Update Uµ+1 ← R ◦1 Uµ+1.
end for

Algorithm to obtain a right-orthogonal TT decomposition:
for µ = d, d − 1, . . . , 2 do
  Compute the QR decomposition (U^R_µ)^T = QR.
  Set U^R_µ ← Q^T.
  Update Uµ−1 ← R ◦3 Uµ−1.
end for
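The left-orthogonalization sweep can be sketched as follows (a sketch under my naming, operating in place on a list of cores; U^L_µ is realized as a Fortran-order reshape, and R ◦1 multiplies R into the first mode of the next core):

```python
import numpy as np

def left_orthogonalize(cores):
    """QR sweep making (U^L_mu)^T U^L_mu = I for mu = 1,...,d-1."""
    for mu in range(len(cores) - 1):
        r0, n, r1 = cores[mu].shape
        # U^L_mu = Q R with U^L_mu the (r0 n) x r1 unfolding of the core
        Q, R = np.linalg.qr(cores[mu].reshape(r0 * n, r1, order='F'))
        rnew = Q.shape[1]
        cores[mu] = Q.reshape(r0, n, rnew, order='F')
        # R o_1 U_{mu+1}: multiply R into the first mode of the next core
        cores[mu + 1] = np.tensordot(R, cores[mu + 1], axes=([1], [0]))
    return cores
```

Since only Q replaces the unfolding and R is absorbed into the neighboring core, the represented tensor is unchanged.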

SLIDE 17

Recompression of TT decomposition

Task: Approximate a TT decomposition with TT cores U1, . . . , Ud having relatively large TT ranks (R1, . . . , Rd−1) by a TT decomposition with TT cores V1, . . . , Vd having smaller TT ranks (r1, . . . , rd−1).

TT-SVD starts with the SVD of X^<1> = U^L_1 X≥2.

Observation: If the TT decomposition is right-orthogonal, then X≥2 has orthonormal rows and we only need to compute the SVD of U^L_1!

Compute a rank-r1 approximation (e.g., via truncated SVD): U^L_1 ≈ V^L_1 Σ W^T. This defines the first TT core V1 of the compressed tensor. Move the rest to the second core: U2 ← (ΣW^T) ◦1 U2. X≥3 remains the same and thus still has orthonormal rows!

SLIDE 18

Recompression of TT decomposition

Input: X ∈ R^{n1×···×nd} in TT decomposition with TT cores Uµ ∈ R^{Rµ−1×nµ×Rµ}; smaller target TT rank (r1, . . . , rd−1).
Output: TT cores Vµ ∈ R^{rµ−1×nµ×rµ} that define a TT decomposition approximating X.

1: Set r0 = rd = 1.
2: Right-orthogonalize the TT cores Uµ.
3: for µ = 1, . . . , d − 1 do
4:   Compute a rank-rµ approximation U^L_µ ≈ VΣW^T (e.g., via SVD).
5:   Reshape V into Vµ ∈ R^{rµ−1×nµ×rµ}.
6:   Update Uµ+1 ← (ΣW^T) ◦1 Uµ+1.
7: end for
8: Set Vd = Ud.

◮ Complexity: O(dnR³).
◮ Mathematical equivalence to full TT-SVD: the error bounds carry over.
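Both sweeps together give a sketch of the full recompression procedure (often called TT rounding); the helper names are mine, and the right-orthogonalization mirrors the QR sweep above:

```python
import numpy as np

def right_orthogonalize(cores):
    """QR sweep from the right; preserves the represented tensor."""
    for mu in range(len(cores) - 1, 0, -1):
        r0, n, r1 = cores[mu].shape
        # (U^R_mu)^T = Q R with U^R_mu the r0 x (n r1) unfolding
        Q, R = np.linalg.qr(cores[mu].reshape(r0, n * r1, order='F').T)
        rnew = Q.shape[1]
        cores[mu] = Q.T.reshape(rnew, n, r1, order='F')
        # Absorb R into the third mode of the previous core
        cores[mu - 1] = np.tensordot(cores[mu - 1], R.T, axes=([2], [0]))
    return cores

def tt_round(cores, ranks):
    """Right-orthogonalize, then truncate in a left-to-right SVD sweep."""
    cores = right_orthogonalize([c.copy() for c in cores])
    for mu in range(len(cores) - 1):
        r0, n, r1 = cores[mu].shape
        U, s, Vt = np.linalg.svd(cores[mu].reshape(r0 * n, r1, order='F'),
                                 full_matrices=False)
        r = min(ranks[mu], s.size)
        cores[mu] = U[:, :r].reshape(r0, n, r, order='F')
        # Sigma W^T o_1 U_{mu+1}
        cores[mu + 1] = np.tensordot(s[:r, None] * Vt[:r], cores[mu + 1],
                                     axes=([1], [0]))
    return cores
```

Because each SVD now acts only on a single unfolded core rather than a full unfolding, the cost stays at O(dnR³) as stated above.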

SLIDE 19

Tensor network diagrams

◮ Introduced by Roger Penrose.
◮ Heavily used in quantum mechanics (spin networks).
◮ Useful to gain intuition and to guide the design of algorithms.
◮ This is the matrix product C = AB:

Cij = Σ_{k=1}^{r} Aik Bkj

SLIDE 20

Tensor of order 3 in Tucker decomposition

Xijk = Σ_{ℓ1=1}^{r1} Σ_{ℓ2=1}^{r2} Σ_{ℓ3=1}^{r3} C_{ℓ1ℓ2ℓ3} U_{iℓ1} V_{jℓ2} W_{kℓ3}

◮ r1 × r2 × r3 core tensor C
◮ n1 × r1 matrix U spans the first mode
◮ n2 × r2 matrix V spans the second mode
◮ n3 × r3 matrix W spans the third mode

SLIDE 21

Tensor of order 6 in TT decomposition

◮ X implicitly represented by four r × n × r tensors and two n × r matrices.
◮ More detailed picture: a tensor network diagram with cores U1, . . . , U6, mode sizes n1, . . . , n6, ranks r1, . . . , r5, and the interface matrices X≤3 and X≥5 marked. [diagram]

SLIDE 22

Inner product of two tensors in TT decomposition

◮ Carrying out contractions requires O(dnr 4) instead of O(nd)

  • perations for tensors of order d.
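The contraction this slide describes can be sketched as a left-to-right sweep (helper name tt_inner is mine); contracting each pair of cores in one shot costs O(nr^4) per step for equal ranks r, matching the O(dnr^4) count above, and the full tensors are never formed:

```python
import numpy as np

# Inner product <X, Y> of two tensors in TT decomposition: sweep left to
# right, carrying an r x r' matrix M of partial contractions over the
# modes processed so far.
def tt_inner(cores_x, cores_y):
    M = np.ones((1, 1))
    for U, V in zip(cores_x, cores_y):
        # Contract M with one core of each tensor (sums over a, b, i)
        M = np.einsum('ab,aic,bid->cd', M, U, V)
    return M.item()                      # final M is 1 x 1
```

For example, tt_inner(cores, cores) yields the squared Frobenius norm of a TT tensor without ever leaving the format.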
