Low Rank Approximation Lecture 10
Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch
Time-dependent low-rank approximation
Goal: Approximate a given matrix A(t) at every time t ∈ [0, T] by a rank-r matrix. Motivation for taking the time-dependence into account:
◮ If σr(t) > σr+1(t) for all t ∈ [0, T], then Tr(A(t)) inherits the smoothness of A with respect to t [Baumann/Helmke'2003; Mehrmann/Rath'1993].
◮ The low-rank approximation at ti may yield valuable information for the low-rank approximation at a nearby ti+1.
◮ Allows us to work with the tangent space (linear) instead of the manifold (nonlinear). Much easier to impose additional structure.
More general setting: Approximate the solution of the ODE Ȧ(t) = F(A(t)), A(0) = A0, by a trajectory X(t) in Mr. Main application for low-rank tensors: discretized time-dependent high-dimensional PDEs.
Recall the definition of tangent vectors: if X : [0, T] → R^{m×n} is a smooth curve in Mr, then Ẋ(t) ∈ TX(t)Mr for all t ∈ [0, T].
Dynamical low-rank approximation: Given A : [0, T] → R^{m×n}, for every t construct X(t) ∈ Mr satisfying
Ẋ(t) ∈ TX(t)Mr such that ‖Ẋ(t) − Ȧ(t)‖F = min!
◮ The differential equation needs to be supplemented by an initial value, say X(0) = Tr(A(0)), which is in Mr unless A(0) has rank less than r.
◮ Hope: X stays close to A if A admits a good rank-r approximation for all t.
◮ For efficiency, aim at finding an equivalent system of ≈ dim Mr differential equations.
Counterexample to this hope: A(t) = diag(1, 10⁻⁵ · 10^{t−1}, 10⁻⁵ · 10^{1−t}), t ∈ [0, 10]. For r = 2, dynamical low-rank approximation yields X(t) = diag(1, 0, 10⁻⁵ · 10^{1−t}). Then ‖A(T) − X(T)‖F = 10⁴ but ‖A(T) − Tr(A(T))‖F = 10⁻¹⁴. Dynamical low-rank approximation is "blind" to the evolution of the initially tiny entry 10⁻⁵ · 10^{t−1}. One can only hope for a good approximation on short time intervals and if σr(t) > σr+1(t).
Ẋ(t) ∈ TX(t)Mr such that ‖Ẋ(t) − Ȧ(t)‖F = min!
is equivalent to the differential equation
Ẋ(t) = PX(t)(Ȧ(t)),   (1)
where PX(t) is the orthogonal projection onto TX(t)Mr. We will now omit the dependence on time. Given X = USVᵀ with S ∈ R^{r×r} (not necessarily diagonal), U ∈ St(m, r) and V ∈ St(n, r), we have
PX(Ȧ) = PU Ȧ PV + P⊥U Ȧ PV + PU Ȧ P⊥V = U Ṡ Vᵀ + U̇ S Vᵀ + U S V̇ᵀ,
where PU = UUᵀ, P⊥U = I − UUᵀ (and analogously for V), and
Ṡ = Uᵀ Ȧ V,
U̇ = P⊥U Ȧ V S⁻¹,
V̇ = P⊥V Ȧᵀ U S⁻ᵀ.
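The projection formula is easy to sanity-check numerically. A minimal sketch (my own code and naming, not from the lecture) that implements PX via the expression ZVVᵀ − UUᵀZVVᵀ + UUᵀZ and verifies that it is idempotent and agrees with the factored form U Ṡ Vᵀ + U̇ S Vᵀ + U S V̇ᵀ:

```python
import numpy as np

def project_tangent(U, V, Z):
    """Orthogonal projection of Z onto the tangent space of the rank-r
    manifold at X = U S V^T: P_X(Z) = Z VV^T - UU^T Z VV^T + UU^T Z."""
    ZV = Z @ V
    return ZV @ V.T - U @ (U.T @ ZV) @ V.T + U @ (U.T @ Z)

# Random point on the manifold and a random "velocity" Z playing dA/dt
rng = np.random.default_rng(0)
m, n, r = 7, 6, 3
U, _ = np.linalg.qr(rng.standard_normal((m, r)))   # U in St(m, r)
V, _ = np.linalg.qr(rng.standard_normal((n, r)))   # V in St(n, r)
S = np.diag([3.0, 2.0, 1.0])
Z = rng.standard_normal((m, n))

P = project_tangent(U, V, Z)
# Factor derivatives from the slide
dS = U.T @ Z @ V
dU = (Z @ V - U @ (U.T @ Z @ V)) @ np.linalg.inv(S)
dV = (Z.T @ U - V @ (V.T @ Z.T @ U)) @ np.linalg.inv(S.T)

assert np.allclose(project_tangent(U, V, P), P)                   # idempotent
assert np.allclose(P, U @ dS @ V.T + dU @ S @ V.T + U @ S @ dV.T)  # factored form
```

Note that S⁻¹ is needed only to split the projection into factor derivatives; the projection itself is well defined for singular S.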
Dynamical low-rank approximation is equivalent to the system of differential equations
Ṡ = Uᵀ Ȧ V,
U̇ = P⊥U Ȧ V S⁻¹,
V̇ = P⊥V Ȧᵀ U S⁻ᵀ
with initial values S(0) = S0 ∈ R^{r×r}, U(0) = U0 ∈ St(m, r), V(0) = V0 ∈ St(n, r). Remarks:
◮ U̇ ∈ TU St(m, r) and thus U stays in St(m, r); V̇ ∈ TV St(n, r) and thus V stays in St(n, r). This can be preserved numerically using geometric integration [Hairer/Lubich/Wanner'2006] or by combining standard integrators with retractions.
◮ Step size control should aim at controlling the error for USVᵀ and not for the individual factors.
◮ The presence of S⁻¹ makes the differential equations very stiff for ill-conditioned S (σr close to zero); this results in small step sizes for explicit integrators.
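To illustrate these remarks, here is a sketch (my own naming, not code from the lecture) of one explicit Euler step for the factored system, with QR-based retractions so that U and V stay orthonormal. S⁻¹ appears exactly as in the equations above, which is what makes explicit time stepping problematic when σr is small:

```python
import numpy as np

def dlra_euler_step(U, S, V, dA, h):
    """One explicit Euler step for S' = U^T A' V, U' = (I-UU^T) A' V S^{-1},
    V' = (I-VV^T) A'^T U S^{-T}, where dA is A'(t) at the current time.
    QR factorizations retract U, V back onto the Stiefel manifolds."""
    Sinv = np.linalg.inv(S)          # source of stiffness when sigma_r ~ 0
    dS = U.T @ dA @ V
    dU = (dA @ V - U @ (U.T @ dA @ V)) @ Sinv
    dV = (dA.T @ U - V @ (V.T @ dA.T @ U)) @ Sinv.T
    Unew, Ru = np.linalg.qr(U + h * dU)
    Vnew, Rv = np.linalg.qr(V + h * dV)
    # Absorb the triangular factors into S so that X = U S V^T is unchanged
    return Unew, Ru @ (S + h * dS) @ Rv.T, Vnew
```

The retraction leaves the product USVᵀ untouched (the triangular QR factors are absorbed into S), so the error control remark above is respected: only the product matters, not the individual factors.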
Error bound from Theorem 5.1 in [Koch/Lubich'2007].
Suppose that for t ∈ [0, T] we have
◮ ‖Ȧ(t)‖F ≤ µ;
◮ σr(A(t)) > σr+1(A(t));
◮ ‖A(t) − Tr(A(t))‖F ≤ ρ/16 with ρ = min_{t∈[0,T]} σr(A(t)).
Then, with X(0) = Tr(A(0)), we have
‖X(t) − Tr(A(t))‖F ≤ 2βe^{βt} ∫₀ᵗ ‖A(s) − Tr(A(s))‖F ds,
where β = 8µ/ρ.
Extension to a dynamical system Ȧ = F(A) with F : R^{m×n} → R^{m×n}. Dynamical low-rank approximation: construct X(t) ∈ Mr satisfying
Ẋ(t) ∈ TX(t)Mr such that ‖Ẋ(t) − F(X(t))‖F = min!
Equivalent to the dynamical system Ẋ(t) = PX(t)(F(X(t))). Equivalent to
Ṡ = Uᵀ F(X(t)) V,
U̇ = P⊥U F(X(t)) V S⁻¹,
V̇ = P⊥V F(X(t))ᵀ U S⁻ᵀ.
EFY. Consider the linear matrix differential equation Ȧ = LA + AR for L(t) ∈ R^{m×m} and R(t) ∈ R^{n×n}. Show that A(t) ∈ Mr for all t > 0 if A(0) ∈ Mr. Write down the differential equations for U, S, V.
EFY. Consider the nonlinear matrix differential equation Ȧ = F(A) := LA + AR + A ◦ A, where ◦ denotes the elementwise product. Develop an efficient method for evaluating F(A(t)) for A(t) ∈ Mr when r ≪ m, n.
For the error analysis, assume for Y(t) := Tr(A(t)) that
◮ ‖F(Y(t))‖F ≤ µ and ‖F(X(t))‖F ≤ µ for t ∈ [0, T];
◮ ⟨F(Z1) − F(Z2), Z1 − Z2⟩ ≤ λ‖Z1 − Z2‖F² for all Z1, Z2 ∈ Mr and some fixed λ ∈ R;
◮ ‖F(Y(t)) − F(A(t))‖F ≤ L‖Y(t) − A(t)‖F for t ∈ [0, T].
Theorem 6.1 in [Koch/Lubich'2007]:
Under the assumptions above, suppose that for t ∈ [0, T] we have
◮ σr(A(t)) > σr+1(A(t));
◮ ‖A(t) − Tr(A(t))‖F ≤ ρ/16 with ρ = min_{t∈[0,T]} σr(A(t)).
Then, with X(0) = Tr(A(0)), we have
‖X(t) − Tr(A(t))‖F ≤ (2β + L)e^{(2β+λ)t} ∫₀ᵗ ‖A(s) − Tr(A(s))‖F ds.
Numerical integrators face severe problems as σr → 0. Unfortunately, σr ≈ 0 is a very likely situation, as problems of interest typically feature quick singular value decay. Regularization (= increasing small singular values of S by some ε > 0) introduces additional errors whose effect is poorly understood. The fundamental underlying problem: U and V are ill-conditioned in the presence of small singular values.
Idea by [Lubich/Oseledets'2014]: a splitting integrator that is insensitive to σr → 0.
Recall that Ẋ(t) = PX(t)(Ȧ(t)), with
PX(Z) = PU Z PV + P⊥U Z PV + PU Z P⊥V
= ZVVᵀ − UUᵀZVVᵀ + UUᵀZ
= Z Prange(Xᵀ) − Prange(X) Z Prange(Xᵀ) + Prange(X) Z.
One step of the Lie–Trotter splitting integrator from t0 to t1 = t0 + h applied to this decomposition:
ẊI = Ȧ Prange(XIᵀ) on [t0, t1] with i.v. XI(t0) = X0 ∈ Mr.
ẊII = −Prange(XII) Ȧ Prange(XIIᵀ) on [t0, t1] with i.v. XII(t0) = XI(t1).
ẊIII = Prange(XIII) Ȧ on [t0, t1] with i.v. XIII(t0) = XII(t1).
Approximation returned: X1 := XIII(t1). Standard theory [Hairer/Lubich/Wanner'2006]: the Lie–Trotter splitting integrator has convergence order one.
Consider the first step: the right-hand side Ȧ Prange(XIᵀ) is in TXIMr and hence XI(t) ∈ Mr for t ∈ [t0, t1]. Consider the factorization XI = UI SI VIᵀ. Differentiation gives
∂/∂t(UI SI) VIᵀ + UI SI V̇Iᵀ = ẊI = Ȧ VI VIᵀ.
This is satisfied if ∂/∂t(UI SI) = Ȧ VI and V̇I = 0. In particular, with VI constant, a primitive of Ȧ VI is A VI. In turn, the equation is solved by
UI(t) SI(t) = UI(t0) SI(t0) + (A(t) − A(t0)) VI(t0), VI(t) = VI(t0).
Similar considerations apply to the second and third steps.
The solutions of the three split equations are given by
UI(t) SI(t) = UI(t0) SI(t0) + (A(t) − A(t0)) VI(t0),
SII(t) = SII(t0) − UII(t0)ᵀ (A(t) − A(t0)) VII(t0),
VIII(t) SIII(t)ᵀ = VIII(t0) SIII(t0)ᵀ + (A(t) − A(t0))ᵀ UIII(t0).
The remaining quantities (VI, UII, VII, UIII) stay constant. Practical algorithm: Given X0 = U0S0V0ᵀ and with ∆A := A(t1) − A(t0), compute
◮ the QR factorization UI SI = U0 S0 + ∆A V0;
◮ SII = SI − UIᵀ ∆A V0;
◮ the QR factorization VIII SIIIᵀ = LIII for LIII = V0 SIIᵀ + ∆Aᵀ UI.
This returns X1 = U1S1V1ᵀ = UI SIII VIIIᵀ.
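A one-step sketch of this algorithm in code (my own naming; a sketch rather than the authors' implementation). Note that no inverse of S appears anywhere, in contrast to direct integration of the factored differential equations:

```python
import numpy as np

def ksl_step(U0, S0, V0, dA):
    """One first-order projector-splitting (KSL) step, dA = A(t1) - A(t0).
    Substeps: K-step (update U S), S-step (update S, with a minus sign),
    L-step (update V S^T). QR factorizations keep U, V orthonormal."""
    UI, SI = np.linalg.qr(U0 @ S0 + dA @ V0)             # K-step
    SII = SI - UI.T @ dA @ V0                            # S-step
    VIII, SIIIt = np.linalg.qr(V0 @ SII.T + dA.T @ UI)   # L-step
    return UI, SIIIt.T, VIII                             # X1 = UI SIII VIII^T
```

A quick check: for generic rank-r matrices A(t0) and A(t1), a short calculation shows that the step returns A(t1) exactly, in line with the exactness result for rank-r data.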
Remarks:
◮ The KSL splitting is exact if A(t) has rank at most r (Theorem 4.1 of [Lubich/Oseledets'2014]); it is robust in the presence of a small singular value σr [Kieri/Lubich/Walach'2016].
◮ Symmetrization (KSLSK) leads to second order.
◮ The splitting extends to Ȧ = F(A) by setting ∆A := hF(Y0) (but the exactness result does not hold).
◮ Extensions to the Tucker format [Koch/Lubich'2010, Lubich/Vandereycken/Walach'2018] and the tensor train format [Lubich/Oseledets/Vandereycken'2015].
◮ Applications to 3D nonlinear PDEs [Nonnenmacher/Lubich'2009], matrix differential equations [Mena et al.'2018], the Vlasov–Poisson equation [Einkemmer/Lubich'2018], time-dependent PDEs with uncertainties [Musharbash/Nobile/Zhou'2014], quantum chemistry [Kloss/Burghardt/Lubich'2017] and physics [Haegeman et al.'2016].
◮ [Kieri/Vandereycken'2017]: a conceptually simpler approach combining standard integrators with low-rank truncation.