Low Rank Approximation Lecture 7
Daniel Kressner
Chair for Numerical Algorithms and HPC
Institute of Mathematics, EPFL
daniel.kressner@epfl.ch
Tensor Train (TT) decomposition
A tensor X ∈ R^{n_1 × ··· × n_d} is in TT decomposition if it can be written as

X(i_1, \dots, i_d) = \sum_{k_1=1}^{r_1} \cdots \sum_{k_{d-1}=1}^{r_{d-1}} U_1(1, i_1, k_1) U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1).
◮ The smallest possible tuple (r_1, \dots, r_{d-1}) is called the TT rank of X.
◮ U_µ ∈ R^{r_{µ-1} × n_µ × r_µ} for µ = 1, \dots, d (formally set r_0 = r_d = 1) are called TT cores.
◮ If the TT ranks are not too large, the compression ratio becomes high as d grows.
◮ The TT decomposition is multilinear with respect to the cores.
◮ The TT decomposition connects to
  ◮ matrix products: Matrix Product States (MPS) in physics (see [Grasedyck/Kressner/Tobler'2013] for references);
  ◮ simultaneous matrix factorizations: SVD-based compression;
  ◮ contractions and tensor network diagrams: design of efficient contraction-based algorithms.
Recall

X(i_1, \dots, i_d) = \sum_{k_1=1}^{r_1} \cdots \sum_{k_{d-1}=1}^{r_{d-1}} U_1(1, i_1, k_1) U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1).

Let U_µ(i_µ) be the i_µth slice of the µth core: U_µ(i_µ) := U_µ(:, i_µ, :) ∈ R^{r_{µ-1} × r_µ}. Then

X(i_1, i_2, \dots, i_d) = U_1(i_1) U_2(i_2) \cdots U_d(i_d).

Remark: The error analysis of matrix multiplication [Higham'2002] shows that the TT decomposition may suffer from numerical instabilities if

‖U_1(i_1)‖_2 ‖U_2(i_2)‖_2 \cdots ‖U_d(i_d)‖_2 ≫ |X(i_1, i_2, \dots, i_d)|.

See [Bachmayr/Kazeev: arXiv:1802.09062] for more details.
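The slice formula maps directly to code. Below is a minimal NumPy sketch (the function name `tt_entry` and the random-core setup are ours, for illustration) that evaluates a single entry of a TT tensor by multiplying core slices:

```python
import numpy as np

def tt_entry(cores, index):
    """Evaluate X(i_1, ..., i_d) = U_1(i_1) U_2(i_2) ... U_d(i_d)
    for TT cores of shape (r_{mu-1}, n_mu, r_mu) with r_0 = r_d = 1."""
    v = np.ones((1, 1))
    for core, i in zip(cores, index):
        v = v @ core[:, i, :]  # multiply by the i-th slice of this core
    return v[0, 0]

# Illustrative example: a random 4th-order TT tensor
rng = np.random.default_rng(0)
n, r = 5, 3
cores = [rng.standard_normal((1, n, r)),
         rng.standard_normal((r, n, r)),
         rng.standard_normal((r, n, r)),
         rng.standard_normal((r, n, 1))]
print(tt_entry(cores, (0, 1, 2, 3)))
```

Each evaluation costs O(d r^2) operations; the long product of matrices is also exactly the structure behind the instability remark above.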
Starting from

X(i_1, \dots, i_d) = \sum_{k_1=1}^{r_1} \cdots \sum_{k_{d-1}=1}^{r_{d-1}} U_1(1, i_1, k_1) U_2(k_1, i_2, k_2) \cdots U_d(k_{d-1}, i_d, 1),

for any 1 ≤ µ ≤ d − 1 group the first µ factors and the last d − µ factors together:

X(i_1, \dots, i_µ, i_{µ+1}, \dots, i_d) = \sum_{k_µ=1}^{r_µ} \Big[ \sum_{k_1, \dots, k_{µ-1}} U_1(1, i_1, k_1) \cdots U_µ(k_{µ-1}, i_µ, k_µ) \Big] \Big[ \sum_{k_{µ+1}, \dots, k_{d-1}} U_{µ+1}(k_µ, i_{µ+1}, k_{µ+1}) \cdots U_d(k_{d-1}, i_d, 1) \Big].

Viewing (i_1, \dots, i_µ) as a row index and (i_{µ+1}, \dots, i_d) as a column index, the two bracketed factors are matrices!
The µth unfolding of X ∈ R^{n_1 × n_2 × ··· × n_d} is obtained by arranging the entries in a matrix X^{<µ>} ∈ R^{(n_1 n_2 ··· n_µ) × (n_{µ+1} ··· n_d)}, where the corresponding index map ι(i_1, \dots, i_d) = (i_row, i_col) is given by

i_row = 1 + \sum_{ν=1}^{µ} (i_ν − 1) \prod_{τ=1}^{ν−1} n_τ, \qquad i_col = 1 + \sum_{ν=µ+1}^{d} (i_ν − 1) \prod_{τ=µ+1}^{ν−1} n_τ.
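In column-major (Fortran) ordering this index map is exactly a reshape. A minimal NumPy sketch (the helper name `unfolding` is ours):

```python
import numpy as np

def unfolding(X, mu):
    """mu-th unfolding X^{<mu>} of size (n_1*...*n_mu) x (n_{mu+1}*...*n_d).

    Fortran (column-major) ordering matches the colexicographic
    index map defined above."""
    return np.reshape(X, (int(np.prod(X.shape[:mu])), -1), order="F")

X = np.arange(2 * 3 * 4).reshape(2, 3, 4)
print(unfolding(X, 2).shape)  # (6, 4)
```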
Define interface matrices X_{≤µ} ∈ R^{n_1 n_2 ··· n_µ × r_µ} and X_{≥µ+1} ∈ R^{r_µ × n_{µ+1} n_{µ+2} ··· n_d} as

X_{≤µ}(i_row, j) = \sum_{k_1, \dots, k_{µ−1}} U_1(1, i_1, k_1) \cdots U_{µ−1}(k_{µ−2}, i_{µ−1}, k_{µ−1}) U_µ(k_{µ−1}, i_µ, j),
X_{≥µ+1}(j, i_col) = \sum_{k_{µ+1}, \dots, k_{d−1}} U_{µ+1}(j, i_{µ+1}, k_{µ+1}) U_{µ+2}(k_{µ+1}, i_{µ+2}, k_{µ+2}) \cdots U_d(k_{d−1}, i_d, 1).

This yields the matrix factorizations X^{<µ>} = X_{≤µ} X_{≥µ+1} for µ = 1, \dots, d − 1.
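The interface matrices can be built core by core, which also lets us verify the factorization numerically. A hedged NumPy sketch (helper names ours; Fortran-order reshapes match the index maps above):

```python
import numpy as np

def interface_left(cores, mu):
    """X_{<=mu} in R^{(n1...n_mu) x r_mu}, built core by core."""
    M = cores[0][0]                                # (n1, r1)
    for core in cores[1:mu]:
        M = np.einsum("ak,kbj->abj", M, core)      # attach the next core
        M = M.reshape(-1, M.shape[-1], order="F")  # merge the row indices
    return M

def interface_right(cores, mu):
    """X_{>=mu+1} in R^{r_mu x (n_{mu+1}...n_d)}, built core by core."""
    M = cores[-1][:, :, 0]                         # (r_{d-1}, n_d)
    for core in reversed(cores[mu:-1]):
        M = np.einsum("jbk,kc->jbc", core, M)
        M = M.reshape(M.shape[0], -1, order="F")   # merge the column indices
    return M

# Check X^{<mu>} = X_{<=mu} X_{>=mu+1} on a random TT tensor
rng = np.random.default_rng(0)
n, r, d = 4, 3, 5
shapes = [(1, n, r)] + [(r, n, r)] * (d - 2) + [(r, n, 1)]
cores = [rng.standard_normal(s) for s in shapes]

X = cores[0][0]
for core in cores[1:]:
    X = np.tensordot(X, core, axes=([-1], [0]))    # full contraction
X = X[..., 0]                                      # drop r_d = 1

mu = 2
X_mu = X.reshape(n**mu, -1, order="F")             # mu-th unfolding
err = np.linalg.norm(X_mu - interface_left(cores, mu) @ interface_right(cores, mu))
print(err)  # zero up to rounding
```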
Important: These matrix factorizations are nested!

X_{≤µ} = (I_{n_µ} ⊗ X_{≤µ−1}) U^L_µ  and  X^T_{≥µ} = U^R_µ (X^T_{≥µ+1} ⊗ I_{n_µ}),

where U^L_µ = U^{<2>}_µ and U^R_µ = U^{(1)}_µ = U^{<1>}_µ. The relations X_{≤1} = U^L_1 and

X_{≤µ} = (I_{n_µ} ⊗ X_{≤µ−1}) U^L_µ,  µ = 2, \dots, d,

fully characterize the TT decomposition:

vec(X) = X_{≤d} = (I ⊗ X_{≤d−1}) U^L_d
       = (I ⊗ I ⊗ X_{≤d−2})(I ⊗ U^L_{d−1}) U^L_d
       = \cdots = (I ⊗ \cdots ⊗ I ⊗ U^L_1) \cdots (I ⊗ U^L_{d−1}) U^L_d.

EFY. Perform an analogous calculation for X_{≥µ}, that is, resolve the recursion X^T_{≥µ} = U^R_µ (X^T_{≥µ+1} ⊗ I_{n_µ}).
Lemma. The TT rank of a tensor X is given by (rank(X^{<1>}), \dots, rank(X^{<d−1>})).

Proof. Because of the factorizations X^{<µ>} = X_{≤µ} X_{≥µ+1}, the µth entry of the TT rank cannot be smaller than rank(X^{<µ>}). We need to exclude that it can be larger. For this purpose, we construct a TT decomposition with (r_1, \dots, r_{d−1}) := (rank(X^{<1>}), \dots, rank(X^{<d−1>})).

Step 1: Factorize X^{<1>} = U_1 X̃^{<1>} with U_1 ∈ R^{n_1 × r_1} of full column rank and X̃^{<1>} ∈ R^{r_1 × n_2···n_d}, and hence X̃^{<1>} = U_1^† X^{<1>} with U_1^† = (U_1^T U_1)^{−1} U_1^T.

In terms of tensors: X = U_1 ∘_1 X̃. The matrix U_1, reshaped to 1 × n_1 × r_1, is the first TT core (and X_{≥2} := X̃^{<1>}).
Relation for the second unfolding via a Kronecker product: X^{<2>} = (I_{n_2} ⊗ U_1) X̃^{<2>}. Together with the full column rank of U_1, this implies rank(X̃^{<2>}) = rank(X^{<2>}) = r_2.

Step 2: Factorize X̃^{<2>} = U^L_2 X̂^{<1>} with U^L_2 ∈ R^{r_1 n_2 × r_2} and X̂^{<1>} ∈ R^{r_2 × n_3···n_d}. U^L_2 gives the second TT core U_2 ∈ R^{r_1 × n_2 × r_2} (and X_{≥3} := X̂^{<1>}).

Relation for the third unfolding via Kronecker products:

X^{<3>} = (I_{n_3} ⊗ I_{n_2} ⊗ U_1) X̃^{<3>} = (I_{n_3} ⊗ I_{n_2} ⊗ U_1)(I_{n_3} ⊗ U^L_2) X̂^{<2>}.

Together with the full column ranks of U_1 and U^L_2, this implies rank(X̂^{<2>}) = rank(X^{<3>}) = r_3. Continuing in this manner gives cores U_µ ∈ R^{r_{µ−1} × n_µ × r_µ} such that

vec(X) = (I ⊗ \cdots ⊗ I ⊗ U_1) \cdots (I ⊗ U^L_{d−1}) vec(U_d).

This defines a TT decomposition. ∎
The proof of the lemma can be turned into a practical algorithm (TT-SVD by [Oseledets'2011]) for approximating a given tensor X in TT format:

Input: X ∈ R^{n_1×···×n_d}, target TT rank (r_1, \dots, r_{d−1}).
Output: TT cores U_µ ∈ R^{r_{µ−1}×n_µ×r_µ} that define a TT decomposition approximating X.

1: Set r_0 = r_d = 1 (and formally add a leading singleton dimension to X ∈ R^{1×n_1×···×n_d}).
2: for µ = 1, \dots, d − 1 do
3:   Reshape X into X^{<2>} ∈ R^{r_{µ−1} n_µ × n_{µ+1}···n_d}.
4:   Compute a rank-r_µ approximation X^{<2>} ≈ UΣV^T (e.g., via SVD).
5:   Reshape U into U_µ ∈ R^{r_{µ−1}×n_µ×r_µ}.
6:   Update X via X^{<2>} ← U^T X^{<2>} = ΣV^T.
7: end for
8: Set U_d = X.
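A minimal NumPy sketch of TT-SVD (the function name `tt_svd` is ours; Fortran-order reshapes realize the unfoldings used above):

```python
import numpy as np

def tt_svd(X, ranks):
    """TT-SVD sketch: approximate X by TT cores with target TT rank
    (r_1, ..., r_{d-1}); Fortran-order reshapes match the
    colexicographic unfoldings of the lecture."""
    n = X.shape
    d = len(n)
    r = [1] + list(ranks) + [1]
    cores = []
    C = X.reshape(1, -1, order="F")        # leading singleton dimension
    for mu in range(d - 1):
        C = C.reshape(r[mu] * n[mu], -1, order="F")
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        U, s, Vt = U[:, :r[mu + 1]], s[:r[mu + 1]], Vt[:r[mu + 1], :]
        cores.append(U.reshape(r[mu], n[mu], r[mu + 1], order="F"))
        C = s[:, None] * Vt                # Sigma V^T is the new remainder
    cores.append(C.reshape(r[d - 1], n[d - 1], 1, order="F"))
    return cores

# Usage: full target ranks reproduce a 4x4x4x4 tensor exactly
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4, 4, 4))
cores = tt_svd(X, (4, 16, 4))
Xr = np.einsum("ai,ibj,jck,kd->abcd",
               cores[0][0], cores[1], cores[2], cores[3][:, :, 0])
print(np.linalg.norm(X - Xr))  # ~ machine precision
```

With smaller target ranks the result is only an approximation; its error is quantified by the theorem below.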
Theorem. Let X_SVD denote the tensor in TT decomposition obtained from TT-SVD. Then

‖X − X_SVD‖ ≤ \sqrt{ε_1^2 + \cdots + ε_{d−1}^2},

where ε_µ^2 = ‖X^{<µ>} − T_{r_µ}(X^{<µ>})‖_F^2 = σ_{r_µ+1}(X^{<µ>})^2 + σ_{r_µ+2}(X^{<µ>})^2 + \cdots and T_{r_µ} denotes the best rank-r_µ approximation (truncated SVD).

Proof. After step µ of the algorithm:
◮ The cores U_1, \dots, U_µ have been computed, defining X_{≤µ}.
◮ The remaining tensor has size r_µ × n_{µ+1} × \cdots × n_d.

Reshape the remaining tensor into the matrix Y_{≥µ+1} ∈ R^{r_µ × n_{µ+1}···n_d}. We will prove the relations

X^T_{≤µ} X_{≤µ} = I, \quad Y_{≥µ+1} = X^T_{≤µ} X^{<µ>}, \quad ‖X^{<µ>} − X_{≤µ} Y_{≥µ+1}‖_F ≤ \sqrt{ε_1^2 + \cdots + ε_µ^2}   (1)

for µ = 1, \dots, d − 1 by induction. For µ = d − 1, this shows the theorem.
Line 3 in the µth step of the algorithm proceeds by reshaping the remaining tensor from step µ − 1 (corresponding to Y_{≥µ}) into the matrix Y^{<2>} ∈ R^{r_{µ−1} n_µ × n_{µ+1}···n_d}. The induction assumption

Y_{≥µ} = X^T_{≤µ−1} X^{<µ−1>}  implies  Y^{<2>} = (I_{n_µ} ⊗ X^T_{≤µ−1}) X^{<µ>}.

The matrix U^L_µ ≡ U computed in Line 4 contains left singular vectors, hence (U^L_µ)^T U^L_µ = I. Together with the induction assumption and the relation X_{≤µ} = (I_{n_µ} ⊗ X_{≤µ−1}) U^L_µ, this implies X^T_{≤µ} X_{≤µ} = I and Y_{≥µ+1} = X^T_{≤µ} X^{<µ>}. Moreover,

‖(I − U^L_µ (U^L_µ)^T) Y^{<2>}‖_F = ‖Y^{<2>} − T_{r_µ}(Y^{<2>})‖_F ≤ ‖X^{<µ>} − T_{r_µ}(X^{<µ>})‖_F = ε_µ,

where the inequality holds because multiplication by (I_{n_µ} ⊗ X^T_{≤µ−1}), a matrix with orthonormal rows, does not increase any singular value.
Finally, we obtain:

‖X^{<µ>} − X_{≤µ} X^T_{≤µ} X^{<µ>}‖_F^2
  = ‖X^{<µ>} − (I ⊗ X_{≤µ−1}) U^L_µ (U^L_µ)^T (I ⊗ X_{≤µ−1})^T X^{<µ>}‖_F^2
  = ‖X^{<µ>} − (I ⊗ X_{≤µ−1})(I ⊗ X_{≤µ−1})^T X^{<µ>}‖_F^2
    + ‖(I ⊗ X_{≤µ−1})(I ⊗ X_{≤µ−1})^T X^{<µ>} − (I ⊗ X_{≤µ−1}) U^L_µ (U^L_µ)^T (I ⊗ X_{≤µ−1})^T X^{<µ>}‖_F^2
  = ‖X^{<µ−1>} − X_{≤µ−1} X^T_{≤µ−1} X^{<µ−1>}‖_F^2 + ‖(I − U^L_µ (U^L_µ)^T) Y^{<2>}‖_F^2
  ≤ ε_1^2 + \cdots + ε_{µ−1}^2 + ε_µ^2,

where the second equality uses the Pythagorean theorem: the first term lies in the orthogonal complement of the range of I ⊗ X_{≤µ−1}, the second term in its range. This completes the proof. ∎
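A quick numerical sanity check of the theorem, reusing the `tt_svd` sketch from above (helper names ours): compare the actual TT-SVD error with the bound assembled from the discarded singular values of the unfoldings.

```python
import numpy as np

# assumes tt_svd from the sketch above is in scope
rng = np.random.default_rng(2)
n, d = 4, 4
X = rng.standard_normal((n,) * d)
ranks = (2, 3, 2)

cores = tt_svd(X, ranks)
Xr = np.einsum("ai,ibj,jck,kd->abcd",
               cores[0][0], cores[1], cores[2], cores[3][:, :, 0])
err = np.linalg.norm(X - Xr)

# eps_mu^2 = sum of squared discarded singular values of X^{<mu>}
bound_sq = 0.0
for mu, r in enumerate(ranks, start=1):
    s = np.linalg.svd(X.reshape(n**mu, -1, order="F"), compute_uv=False)
    bound_sq += np.sum(s[r:] ** 2)

print(err, "<=", np.sqrt(bound_sq))  # the bound of the theorem holds
```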
Consequence of the theorem: Let X_best denote the best approximation of X with TT rank (r_1, \dots, r_{d−1}). Then

‖X − X_SVD‖ ≤ \sqrt{d − 1} ‖X − X_best‖.

Since X^{<µ>}_best has rank r_µ, we have

‖X − X_best‖ ≥ \max_{µ=1,\dots,d−1} ‖X^{<µ>} − X^{<µ>}_best‖_F ≥ \max_{µ=1,\dots,d−1} ε_µ ≥ \frac{1}{\sqrt{d−1}} \sqrt{ε_1^2 + \cdots + ε_{d−1}^2} ≥ \frac{1}{\sqrt{d−1}} ‖X − X_SVD‖.

EFY. Develop a variant of TT-SVD which, for a prescribed accuracy ε > 0, determines the TT ranks adaptively such that ‖X − X_SVD‖ ≤ ε.
The TT decomposition constructed by TT-SVD satisfies the orthogonality relations

(U^L_µ)^T U^L_µ = I_{r_µ}, \quad X^T_{≤µ} X_{≤µ} = I_{r_µ}

for µ = 1, \dots, d − 1. Such a TT decomposition is called left-orthogonal. A TT decomposition is called right-orthogonal if

U^R_µ (U^R_µ)^T = I_{r_{µ−1}}, \quad X_{≥µ} X^T_{≥µ} = I_{r_{µ−1}}

for µ = 2, \dots, d.

Benefits of orthogonal decompositions:
◮ Numerical stability (see, e.g., Section 3.7 of [Higham'2002] on the stability of matrix products).
◮ Acceleration of operations.
Given a TT decomposition with cores U_1, \dots, U_d, one can always compute a left-/right-orthogonal TT decomposition of the same tensor.

Algorithm to obtain a left-orthogonal TT decomposition:
for µ = 1, 2, \dots, d − 1 do
  Compute the QR decomposition U^L_µ = QR.
  Set U^L_µ ← Q.
  Update U_{µ+1} ← R ∘_1 U_{µ+1}.
end for

Algorithm to obtain a right-orthogonal TT decomposition:
for µ = d, d − 1, \dots, 2 do
  Compute the QR decomposition (U^R_µ)^T = QR.
  Set U^R_µ ← Q^T.
  Update U_{µ−1} ← R ∘_3 U_{µ−1}.
end for
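Both sweeps in NumPy, as a hedged sketch (function names ours; the updates R ∘_1 U and R ∘_3 U become einsum contractions over the first and third core mode):

```python
import numpy as np

def left_orthogonalize(cores):
    """Left-orthogonalize TT cores via a left-to-right QR sweep."""
    for mu in range(len(cores) - 1):
        r0, n, r1 = cores[mu].shape
        Q, R = np.linalg.qr(cores[mu].reshape(r0 * n, r1, order="F"))
        cores[mu] = Q.reshape(r0, n, Q.shape[1], order="F")
        # absorb R into the next core: R o_1 U_{mu+1}
        cores[mu + 1] = np.einsum("ij,jnk->ink", R, cores[mu + 1])
    return cores

def right_orthogonalize(cores):
    """Right-orthogonalize TT cores via a right-to-left QR sweep."""
    for mu in range(len(cores) - 1, 0, -1):
        r0, n, r1 = cores[mu].shape
        Q, R = np.linalg.qr(cores[mu].reshape(r0, n * r1, order="F").T)
        cores[mu] = Q.T.reshape(Q.shape[1], n, r1, order="F")
        # absorb R^T into the third mode of the previous core
        cores[mu - 1] = np.einsum("inj,kj->ink", cores[mu - 1], R)
    return cores
```

After `left_orthogonalize`, each reshaped core satisfies (U^L_µ)^T U^L_µ = I, which can be checked directly.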
Task: Approximate a TT decomposition with TT cores U_1, \dots, U_d having relatively large TT ranks (R_1, \dots, R_{d−1}) by a TT decomposition with TT cores V_1, \dots, V_d having smaller TT ranks (r_1, \dots, r_{d−1}).

TT-SVD starts with the SVD of X^{<1>} = U^L_1 X_{≥2}.

Observation: If X is right-orthogonal then X_{≥2} has orthonormal rows, and we only need to compute an SVD of U^L_1!

Compute a rank-r_1 approximation (e.g., via truncated SVD): U^L_1 ≈ V^L_1 ΣW^T. This defines the first TT core V_1 of the compressed tensor. Move the rest to the second core: U_2 ← ΣW^T ∘_1 U_2. X_{≥3} remains the same and thus still has orthonormal rows!
The resulting algorithm (TT rounding [Oseledets'2011]):

Input: X ∈ R^{n_1×···×n_d} in TT decomposition with TT cores U_µ ∈ R^{R_{µ−1}×n_µ×R_µ}, smaller target TT rank (r_1, \dots, r_{d−1}).
Output: TT cores V_µ ∈ R^{r_{µ−1}×n_µ×r_µ} that define a TT decomposition approximating X.

1: Set r_0 = r_d = 1.
2: Right-orthogonalize the TT cores U_µ.
3: for µ = 1, \dots, d − 1 do
4:   Compute a rank-r_µ approximation U^L_µ ≈ VΣW^T (e.g., via SVD).
5:   Reshape V into V_µ ∈ R^{r_{µ−1}×n_µ×r_µ}.
6:   Update U_{µ+1} ← ΣW^T ∘_1 U_{µ+1}.
7: end for
8: Set V_d = U_d.

◮ Complexity: O(dnR^3).
◮ Mathematically equivalent to full TT-SVD; the error bounds carry over.
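A sketch of the rounding procedure, reusing `right_orthogonalize` from the sketch above (the function name `tt_round` is ours):

```python
import numpy as np

# assumes right_orthogonalize from the sketch above is in scope
def tt_round(cores, ranks):
    """TT rounding sketch: one right-orthogonalization sweep followed by
    a left-to-right truncation sweep down to (r_1, ..., r_{d-1})."""
    cores = right_orthogonalize([c.copy() for c in cores])
    r = [1] + list(ranks) + [1]
    for mu in range(len(cores) - 1):
        r0, n, R1 = cores[mu].shape
        U, s, Wt = np.linalg.svd(cores[mu].reshape(r0 * n, R1, order="F"),
                                 full_matrices=False)
        k = min(r[mu + 1], len(s))
        cores[mu] = U[:, :k].reshape(r0, n, k, order="F")
        # move Sigma W^T into the next core (first-mode product)
        SW = s[:k, None] * Wt[:k]
        cores[mu + 1] = np.einsum("ij,jnk->ink", SW, cores[mu + 1])
    return cores
```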
Tensor network diagrams:
◮ Introduced by Roger Penrose.
◮ Heavily used in quantum mechanics (spin networks).
◮ Useful to gain intuition and to guide the design of algorithms.
◮ This is the matrix product C = AB:

C_{ij} = \sum_{k=1}^{r} A_{ik} B_{kj}.
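In code, diagrammatic contractions are conveniently written in einsum notation, where each contracted diagram edge becomes a repeated index; a small illustration (ours, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 5))

# The diagram edge k corresponds to the summed (contracted) index
C = np.einsum("ik,kj->ij", A, B)
print(np.allclose(C, A @ B))  # True
```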
This is the Tucker decomposition of a third-order tensor:

X_{ijk} = \sum_{ℓ_1=1}^{r_1} \sum_{ℓ_2=1}^{r_2} \sum_{ℓ_3=1}^{r_3} C_{ℓ_1 ℓ_2 ℓ_3} U_{i ℓ_1} V_{j ℓ_2} W_{k ℓ_3},

where
◮ C is the r_1 × r_2 × r_3 core tensor,
◮ the n_1 × r_1 matrix U spans the first mode,
◮ the n_2 × r_2 matrix V spans the second mode,
◮ the n_3 × r_3 matrix W spans the third mode.
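This contraction is again a one-line einsum; a small illustrative sketch (the dimensions are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, n3, r1, r2, r3 = 5, 6, 7, 2, 3, 4
C = rng.standard_normal((r1, r2, r3))
U = rng.standard_normal((n1, r1))
V = rng.standard_normal((n2, r2))
W = rng.standard_normal((n3, r3))

# Contract the core with one factor matrix per mode
X = np.einsum("abc,ia,jb,kc->ijk", C, U, V, W)
print(X.shape)  # (5, 6, 7)
```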
This is the TT decomposition of a sixth-order tensor:
◮ X is implicitly represented by four r × n × r tensors and two n × r matrices.
◮ More detailed picture:

[Diagram: a chain of cores U_1, \dots, U_6 with free indices n_1, \dots, n_6 and connecting rank indices r_1, \dots, r_5.]
◮ Carrying out contractions requires O(dnr^4) operations instead of O(n^d).
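As an illustration of such contraction-based algorithms, here is a hedged sketch of the inner product ⟨X, Y⟩ of two tensors in TT format, contracted core by core; this particular contraction costs only O(dnr^3), while more general network contractions (e.g., applying an operator in TT format) lead to the O(dnr^4) count above.

```python
import numpy as np

def tt_inner(cores_x, cores_y):
    """Inner product <X, Y> of two TT tensors, contracted core by core.

    M accumulates the contraction over all modes processed so far;
    each step costs O(n r^3) for ranks r, so the sweep is O(d n r^3)."""
    M = np.ones((1, 1))
    for Ux, Uy in zip(cores_x, cores_y):
        M = np.einsum("ab,anc->bnc", M, Ux)   # attach core of X
        M = np.einsum("bnc,bnd->cd", M, Uy)   # contract with core of Y
    return M[0, 0]

rng = np.random.default_rng(5)
n, r, d = 4, 3, 5
shapes = [(1, n, r)] + [(r, n, r)] * (d - 2) + [(r, n, 1)]
cores = [rng.standard_normal(s) for s in shapes]
print(tt_inner(cores, cores))  # squared Frobenius norm of X
```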