Hierarchical Tensor Representations
- R. Schneider (TUB Matheon)
Paris 2014
Acknowledgment: DFG Priority Program SPP 1324, "Extraction of essential information from complex data"
Co-workers: T. Rohwedder (HUB), A. Uschmajew (EPFL Lausanne), W.
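H = \sum_{p,q=1}^{d} h_p^q \, a_p^T a_q + \sum_{p,q,r,s=1}^{d} g_{p,q}^{r,s} \, a_r^T a_s^T a_p a_q , \qquad h_p^q , \; g_{p,q}^{r,s} \in \mathbb{R},
where A := \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad S := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},
and discrete annihilation operators \mathbf{a}_p \simeq a_p := S \otimes \cdots \otimes S \otimes A_{(p)} \otimes I \otimes \cdots \otimes I
and creation operators \mathbf{a}_p^{\dagger} \simeq a_p^T := S \otimes \cdots \otimes S \otimes A^T_{(p)} \otimes I \otimes \cdots \otimes I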
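The tensor-product form of the discrete annihilation and creation operators can be checked numerically. The following is a minimal sketch, assuming the standard Jordan-Wigner choice of the 2x2 matrices A and S (the matrix entries did not survive in the slide text); the helper names `annihilation` and `anticommutator` are illustrative only. It verifies the canonical anticommutation relations for d = 3.

```python
import numpy as np
from functools import reduce

# 2x2 building blocks (assumed standard Jordan-Wigner choice)
A = np.array([[0.0, 1.0], [0.0, 0.0]])    # local annihilation matrix
S = np.array([[1.0, 0.0], [0.0, -1.0]])   # sign matrix
I2 = np.eye(2)


def annihilation(p, d):
    """a_p = S x ... x S x A x I x ... x I with A at position p (1-based)."""
    factors = [S] * (p - 1) + [A] + [I2] * (d - p)
    return reduce(np.kron, factors)


def anticommutator(X, Y):
    return X @ Y + Y @ X


d = 3
ops = [annihilation(p, d) for p in range(1, d + 1)]
identity = np.eye(2 ** d)
for p in range(d):
    for q in range(d):
        # {a_p, a_q^T} = delta_pq * I and {a_p, a_q} = 0
        expected = identity if p == q else np.zeros_like(identity)
        assert np.allclose(anticommutator(ops[p], ops[q].T), expected)
        assert np.allclose(anticommutator(ops[p], ops[q]), 0.0)
```

The factors S in front of A are what make operators on different sites anticommute rather than commute.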
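H = \bigotimes_{i=1}^{d} V_i , e.g. H = \bigotimes_{i=1}^{d} \mathbb{R}^{n_i} = \mathbb{R}^{(\prod_{i=1}^{d} n_i)}
U : \times_{i=1}^{d} I_i \to K , \quad x = (x_1, \ldots, x_d) \mapsto U = U[x_1, \ldots, x_d] \in H ,
d = 1: n-tuples (U_x)_{x=1}^{n}, or x \mapsto U[x]; d = 2: matrices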
H = \bigotimes_{i=1}^{d} V_i , e.g. H = \bigotimes_{i=1}^{d} \mathbb{R}^{n} = \mathbb{R}^{(n^d)}
U[x_1, \ldots, x_d] = \sum_{k=1}^{r} \prod_{i=1}^{d} u_i[x_i, k]
[Figure: dimension tree with root {1,2,3,4,5}, interior nodes {1,2,3}, {4,5}, {1,2} and leaves 1, ..., 5]
{U:dim U ≤r,U⊂V}
{U:dim U ≤r,U⊂V} K
2
{V∈U1⊗U2:U1⊂Rn1,U2⊂Rn2 ; dim U1 ≤r}
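\bigotimes_{i=1}^{d} V_i , in the sense
U_i = \operatorname{span}\{ b^i_{k_i} : k_i = 1, \ldots, r_i \} \subset V_i , rank tuple r = (r_1, \ldots, r_d) .
U[x_1, \ldots, x_d] = \sum_{k_1=1}^{r_1} \cdots \sum_{k_d=1}^{r_d} C[k_1, \ldots, k_d] \prod_{i=1}^{d} b^i_{k_i}[x_i] , with core tensor C, basis b^1_{k_1} \otimes \cdots \otimes b^d_{k_d} and (b^i_{k_i})_{k_i=1}^{r_i} (→ Graßmann manifold)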
[Figure: Tucker dimension tree with root {1,2,3,4,5} and leaves 1, 2, 3, 4, 5]
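U[x_1, \ldots, x_d] = \sum_{k_1=1}^{r_1} \cdots \sum_{k_{d-1}=1}^{r_{d-1}} \prod_{i=1}^{d} U_i[k_{i-1}, x_i, k_i] , \quad k_0 = k_d = 1
[Figure: linear dimension tree {1,2,3,4,5} → {1}, {2,3,4,5} → {2}, {3,4,5} → {3}, {4,5} → {4}, {5}; chain of components U_1, ..., U_5 with ranks r_1, ..., r_4 and mode sizes n_1, ..., n_5]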
i
ℓ
rα1
rα2
i
j
HT ≃ tree tensor network states in quantum physics (Cirac, Verstraete, Eisert, ...)
r1
rd−1
uhn (2012)
2 ⊗ · · · V ∗ d
r1
rd−1
uhn (2012)
3 ⊗ · · · V ∗ d
r1
rd−1
uhn (2012)
d
r1
rd−1
R1
Rd−1
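For i = 1, \ldots, d - 1 compute
\tilde{U}_i[k_{i-1}, x_i, \tilde{k}_i] := \sum_{\tilde{k}_{i-1}=1}^{R_{i-1}} V_{i-1}[k_{i-1}, \tilde{k}_{i-1}] \, U_i[\tilde{k}_{i-1}, x_i, \tilde{k}_i] ,
then decompose
\tilde{U}_i[k_{i-1}, x_i, \tilde{k}_i] = \sum_{k_i=1}^{r_i} U_i[k_{i-1}, x_i, k_i] \, V_i[k_i, \tilde{k}_i]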
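The sweep above (absorb the remainder V_{i-1} into the next component, then split off an orthonormal factor) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the implementation behind the slides: it starts from a full array rather than from an existing TT representation with ranks R_i, it uses a plain SVD for the decomposition step, and the names `tt_svd`, `tt_full` and the tolerance `tol` are hypothetical.

```python
import numpy as np


def tt_svd(U, tol=1e-12):
    """Decompose a full tensor into TT cores U_i[k_{i-1}, x_i, k_i] by successive SVDs."""
    dims = U.shape
    cores = []
    r_prev = 1
    rest = U.reshape(r_prev * dims[0], -1)
    for i in range(len(dims) - 1):
        Q, s, Vt = np.linalg.svd(rest, full_matrices=False)
        r = max(1, int(np.sum(s > tol * s[0])))       # numerical rank r_i
        cores.append(Q[:, :r].reshape(r_prev, dims[i], r))
        # carry the remainder V_i = diag(s) Vt into the next component
        rest = (s[:r, None] * Vt[:r]).reshape(r * dims[i + 1], -1)
        r_prev = r
    cores.append(rest.reshape(r_prev, dims[-1], 1))
    return cores


def tt_full(cores):
    """Contract TT cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]


# Example: U[x1, x2, x3, x4] = x1 + x2 + x3 + x4 on a 4-point grid
grid = np.arange(4)
U = sum(np.meshgrid(grid, grid, grid, grid, indexing="ij")).astype(float)
cores = tt_svd(U)
ranks = [c.shape[2] for c in cores[:-1]]
```

In the example, every separation of the variables splits the sum into two terms, so each TT rank should come out as 2.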
r
d
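U[x] = U_1[x_1] \, G_1 G_1^{-1} \, U_2[x_2] \, G_2 G_2^{-1} \cdots U_d[x_d]
\times_{i=1}^{d} X_i = \times_{i=1}^{d} \mathbb{R}^{r_{i-1} \times n_i \times r_i} , \quad G_r := \times_{i=1}^{d-1} G_i = \times_{i=1}^{d-1} GL(\mathbb{R}^{r_i})
U_i(x_i) \mapsto G_{i-1}^{-1} U_i(x_i) G_i , \quad i = 1, \ldots, d , \; U_i \in X_i .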
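The non-uniqueness of the TT components under this group action can be illustrated with a short numerical check. The sketch below is an assumption-laden toy (d = 3, random components, unit upper triangular gauge matrices chosen only because they are guaranteed invertible, and the boundary convention G_0 = G_d = 1); `tt_entry` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ranks = 3, [1, 2, 3, 1]            # r_0, ..., r_d with r_0 = r_d = 1 (d = 3)
cores = [rng.standard_normal((ranks[i], n, ranks[i + 1])) for i in range(3)]


def tt_entry(cores, x):
    """U[x_1, ..., x_d] = U_1[x_1] U_2[x_2] ... U_d[x_d] as a matrix product."""
    M = np.eye(1)
    for core, xi in zip(cores, x):
        M = M @ core[:, xi, :]
    return M[0, 0]


def unit_upper(r):
    """Unit upper triangular matrix: determinant 1, hence invertible."""
    return np.eye(r) + np.triu(rng.standard_normal((r, r)), k=1)


# Gauge matrices G_0 = 1, G_1, G_2, G_3 = 1
G = [np.eye(1)] + [unit_upper(r) for r in ranks[1:-1]] + [np.eye(1)]

# Transformed components G_{i-1}^{-1} U_i(x_i) G_i
gauged = [
    np.einsum("ab,bxc,cd->axd", np.linalg.inv(G[i]), cores[i], G[i + 1])
    for i in range(3)
]
```

Both `cores` and `gauged` represent the same tensor, which is why the parametrisation has to be quotiented by the gauge group.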
i=1 Xi
x1,...,xi
◮ There exists a well-defined rank tuple r := (r_t)_{t \in T},
◮ M_r = \{ U \in H : r_t = \operatorname{rank} U_t , \; t \in T \} is an analytic manifold
i=1 Xi
◮
Hackbusch & Falco ◮
networks, MERA etc.
p − 1 2, (e.g. Nuclear norm p = 1)
i
t,i
p ,
2 4t+1, ⇒ s = 2t
Principal ideas of hierarchical tensors have been invented several times:
(70s)
uhn (HT) (2009)
◮ HT: Hackbusch & Kühn (2009); TT: Oseledets & Tyrtyshnikov (2009)
◮ MPS: AKLT (Affleck, Kennedy, Lieb, Tasaki 1987), Fannes, Nachtergaele & Werner (92); DMRG: S. White (91)
◮ HOSVD: De Lathauwer et al. (2001); HSVD: Vidal (2003), Oseledets (09), Grasedyck (2010), Kühn (2012)
◮ Riemannian optimization: Absil et al. (2008), Lubich, Koch, Conte, Rohwedder,
◮ Oseledets, Khoromskij, Savostyanov, Dolgov, Kazeev, ...
◮ Grasedyck, Ballani, Bachmayr, Dahmen, ...
◮ Falco, Nouy, Ehrlacher, ...
◮ Physics: Cirac, Verstraete, Schollw¨
WARNING: Hillar & Lim (2011): most tensor problems are NP-hard if d ≥ 3, for example the best rank-1 approximation (multiple local minima).
We have fixed our costs so far. But, in order to achieve a desired accuracy, we must enrich our model class (systematically).
[Cancès, Ehrlacher & Lelièvre], [Falco & Nouy] and coworkers, Bachmayr & Dahmen
Espig,Hackbusch, Rohwedder & Schneider (2010)
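J(W) = \| U - W \|^2 , \quad W \in M
AU = B or g(U) = 0, here J(W) := \| AW - B \|_*^2 resp. F(W) := \| g(W) \|_*^2 .
J(W) := \tfrac{1}{2} \langle AW, W \rangle - \langle B, W \rangle
U = \operatorname{argmin} \{ J(W) = \langle AW, W \rangle : \langle W, W \rangle = 1 \} .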
j u u j+1
t∈T,0<k≤rt
Γ γ < C(d) suff. small.
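Łojasiewicz(-Kurdyka) inequality: | J(V)^{\theta} - J(U)^{\theta} | \le \Gamma \| \operatorname{grad} J(V) \| , \quad 0 < \theta \le \tfrac{1}{2} , \quad \| U - V \| \le \delta . The LK inequality is valid on algebraic sets, o-minimal structures etc. [Bolte et al.]. It is a powerful mathematical tool for proving convergence.
\theta = \tfrac{1}{2}: linear convergence \| U_n - U \| \lesssim q^n \| U_1 - U_0 \| , \; q < 1 .
0 < \theta < \tfrac{1}{2}: \| U_n - U \| \lesssim n^{-\theta/(2-\theta)}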
i=1Rni
sensing: Blumensath et al. , matrix recovery : Tanner et al.
Silva & Herrmann (2013)
◮ IHT converges only if the preconditioner is sufficiently good. Convergence is linear.
◮ IHT can easily be combined with enrichment strategies (r ↑), see also Bachmayr/Dahmen.
◮ RGI is fast (it avoids large HSVDs), but converges only to local minimizers.
◮ RGI requires special care at singular points (where s < r).
◮ Good preconditioners can speed up the convergence of RGI.
◮ Subspace accelerations like CG, BFGS, DIIS, Anderson are powerful when used with an appropriate vector transport, i.e. transporting previous tangent vectors to the new tangent space (Pfeffer 2014; Vandereycken, Haegeman et al. (CG)).
Practically
◮ good initial guesses are important
◮ RGI must be combined with enrichment strategies, e.g. greedy techniques, two-site DMRG or AMEn (Savostyanov et al.)
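To make the IHT iteration concrete, here is a minimal sketch for the matrix case d = 2, where the hard thresholding step reduces to a truncated SVD. The setting is an assumption chosen for illustration: entrywise sampling (matrix completion), step size 1 and no preconditioner; the names `truncate` and `iht_completion` are hypothetical. With this step size the residual on the observed entries cannot increase from one iteration to the next, which the recorded history makes visible.

```python
import numpy as np


def truncate(W, r):
    """Hard thresholding H_r: best rank-r approximation via truncated SVD."""
    Q, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (Q[:, :r] * s[:r]) @ Vt[:r]


def iht_completion(M_obs, mask, r, iters=200):
    """Iterate W <- H_r(W + P_Omega(M - W)) and record the observed residual."""
    W = np.zeros_like(M_obs)
    history = []
    for _ in range(iters):
        residual = mask * (M_obs - W)         # only observed entries contribute
        history.append(np.linalg.norm(residual))
        W = truncate(W + residual, r)
    return W, history


rng = np.random.default_rng(1)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))   # rank-2 target
mask = rng.random((30, 30)) < 0.6                                  # about 60% observed
W, history = iht_completion(mask * M, mask, r=2)
```

For tensors of order d > 2 the truncated SVD would be replaced by an HSVD truncation, which is only quasi-optimal, so the monotonicity argument of the matrix case does not carry over directly.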
A = \sum_{i=1}^{d} I \otimes \cdots \otimes I \otimes A_i \otimes I \otimes \cdots \otimes I , \quad A_i : H_0^1(\Omega) \cap H^2(\Omega) \to L^2(\Omega) .
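An operator of this Kronecker-sum form never has to be assembled as one large matrix, since each summand acts on a single mode. The sketch below illustrates this with a finite-difference second-derivative matrix as an assumed discrete stand-in for A_i; the function names are hypothetical and the dense assembly is included only to check the mode-wise application on a tiny example.

```python
import numpy as np
from functools import reduce


def laplacian_1d(n):
    """Second-difference matrix (Dirichlet), an assumed stand-in for A_i."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def kronecker_sum(mats):
    """Dense matrix of sum_i I x ... x A_i x ... x I (small sizes only)."""
    sizes = [m.shape[0] for m in mats]
    total = np.zeros((int(np.prod(sizes)),) * 2)
    for i, Ai in enumerate(mats):
        factors = [np.eye(s) for s in sizes]
        factors[i] = Ai
        total += reduce(np.kron, factors)
    return total


def apply_modewise(mats, U):
    """Apply the same operator to a tensor U without forming the big matrix."""
    out = np.zeros_like(U)
    for i, Ai in enumerate(mats):
        # contract A_i with mode i of U, then move the new axis back to position i
        out += np.moveaxis(np.tensordot(Ai, U, axes=([1], [i])), 0, i)
    return out


mats = [laplacian_1d(n) for n in (3, 4, 2)]
U = np.random.default_rng(2).standard_normal((3, 4, 2))
dense_result = kronecker_sum(mats) @ U.reshape(-1)
modewise_result = apply_modewise(mats, U).reshape(-1)
```

The operator itself has a low-rank structure in the tensor formats (d summands of rank one), which is what makes it tractable in high dimensions.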
V(s)∈Mr
d−1
d−1
d−1
j=0 R2 by Tf := U.
◮ Storage complexity is reduced from N to 2 r^2 \log_2 N (linear in
◮ Allows e.g. extremely fine grid sizes h = o(\epsilon) = 2^{-d} = \tfrac{1}{N} .
j=1 νj2j−1)
e^{2 \pi i \sum_{j=1}^{d} \nu_j 2^{j-1}} = \prod_{j=1}^{d} e^{2 \pi i \nu_j 2^{j-1}} , \quad \nu_j = 0, 1,
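The factorisation of the exponential over the binary digits means that its quantised tensor has all ranks equal to one. A minimal numerical check, assuming N = 2^d equidistant samples and an arbitrarily chosen frequency (both are illustrative choices, as is the helper name `unfolding_ranks`):

```python
import numpy as np

d = 10
N = 2 ** d
x = np.arange(N)
f = np.exp(2j * np.pi * 3 * x / N)       # frequency 3 is an arbitrary choice

# Quantisation: the index x is split into d binary digits, giving a 2 x 2 x ... x 2 tensor
U = f.reshape((2,) * d)


def unfolding_ranks(U, tol=1e-10):
    """Numerical ranks of the unfoldings (first i digits) x (remaining digits)."""
    ranks = []
    for i in range(1, U.ndim):
        s = np.linalg.svd(U.reshape(2 ** i, -1), compute_uv=False)
        ranks.append(int(np.sum(s > tol * s[0])))
    return ranks


exp_ranks = unfolding_ranks(U)
cos_ranks = unfolding_ranks(U.real)      # cos = (e^{ix} + e^{-ix}) / 2, ranks at most 2
```

With ranks bounded by r, storing the d = log_2 N components costs on the order of 2 r^2 log_2 N numbers instead of N.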
[Figure: plots of the test functions x^{-1/4} sin(2/3 x^{3/2}) on ]0,100[, sin(x/4) cos(x^2) on ]10,20[ and sin(1/x) on ]0,1[]
[Figure: two plots for the same three functions, one with horizontal axis d and one showing the ranks r_i over i]
N
i,j=1
FCI = N
FCI =
d−1
◮ µ_i = 1 means ϕ_i is present (occupied) in Ψ[...].
◮ µ_i = 0 means ϕ_i is absent (not occupied) in Ψ[...].
d
FCI = {Ψ : Ψ =
d
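\mathbf{a}_p^{\dagger} \simeq a_p^T := S \otimes \cdots \otimes S \otimes A^T_{(p)} \otimes I \otimes \cdots \otimes I
H : V_{FCI} =: V \to V'
H = \sum_{p,q=1}^{d} h_p^q \, a_p^T a_q + \sum_{p,q,r,s=1}^{d} g_{p,q}^{r,s} \, a_r^T a_s^T a_p a_q .
\sum_{p=1}^{d} \mathbf{a}_p^{\dagger} \mathbf{a}_p \simeq \sum_{p=1}^{d} A_p^T A_p .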
d
j=1 K2
sweep
1st MIS: P_{1,1}^T A P_{1,1} u = P_{1,1}^T b
2nd MIS: P_{2,1}^T A P_{2,1} u = P_{2,1}^T b
d-th MIS: P_{d,1}^T A P_{d,1} u = P_{d,1}^T b
(d+1)-th MIS: P_{d-1,1}^T A P_{d-1,1} u = P_{d-1,1}^T b
(2d-2)-th MIS: P_{2,1}^T A P_{2,1} u = P_{2,1}^T b
[Figures: tensor network diagrams of these projected systems, built from the components U_1, ..., U_d, the operator components A_1, ..., A_d and the right-hand-side components B_1, ..., B_d, with ranks r_i and mode sizes n_i]
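The sweep pattern 1, ..., d, d-1, ..., 2 with one projected linear system per step can be sketched for the simplest case of rank-1 tensors. Everything specific here is an assumption made for illustration: the rank-1 restriction, the explicit (dense) embedding matrix P, the Laplace-like operator A and the right-hand side b; the function names are hypothetical. Since A is symmetric positive definite, each step minimises J(u) = 1/2 <Au, u> - <b, u> over one component, so the recorded energies cannot increase.

```python
import numpy as np
from functools import reduce


def embedding(factors, i):
    """P_i: v -> u_1 x ... x v x ... x u_d (rank-1 case) as an explicit matrix."""
    blocks = [f.reshape(-1, 1) for f in factors]
    blocks[i] = np.eye(factors[i].size)
    return reduce(np.kron, blocks)


def als_sweep_order(d):
    """Components visited in one sweep: 1, ..., d, d-1, ..., 2 (0-based here)."""
    return list(range(d)) + list(range(d - 2, 0, -1))


def als(A, b, factors, sweeps=5):
    energies = []
    for _ in range(sweeps):
        for i in als_sweep_order(len(factors)):
            P = embedding(factors, i)
            # projected system P^T A P v = P^T b for component i
            factors[i] = np.linalg.solve(P.T @ A @ P, P.T @ b)
            u = reduce(np.kron, factors)
            energies.append(0.5 * u @ A @ u - b @ u)
    return factors, energies


n, d = 4, 3
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
I = np.eye(n)
A = sum(reduce(np.kron, [L if j == i else I for j in range(d)]) for i in range(d))
b = np.ones(n ** d)
factors, energies = als(A, b, [np.ones(n) for _ in range(d)])
```

In a TT setting with ranks larger than one, the embedding P would be assembled from the left- and right-orthogonalised components instead of a Kronecker product of vectors, and the small systems would be solved without forming P explicitly.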
f_2(x_1, x_2, x_3, x_4) = \sqrt{ x_1^2 + \left( x_2 x_3 - \frac{1}{x_2 x_4} \right)^2 } , \quad f_3(x_1, x_2, x_3, x_4) = \tan^{-1} \left( \frac{ x_2 x_3 - (x_2 x_4)^{-1} }{ x_1 } \right)
◮ Dimension d = 4, ..., 128 varying
◮ Grid size n = 10
◮ Right-hand side b of rank 1
◮ Solution U has rank 13
joint work with B. Khoromskij, I. Oseledets
V(x_1, \ldots, x_d) = \frac{1}{2} \sum_{k=1}^{d} x_k^2 + \sum_{k=1}^{d-1} \left( x_k^2 x_{k+1} - \frac{1}{3} x_k^3 \right)
Timings and error dependence for the modified heat equation (imaginary time) with a Hénon-Heiles potential: time interval [0, 1], τ = 10^{-2}, the manifold has ranks 10.
Recent joint paper with Legeza, Murg, Nagy, Verstraete (in preparation): dissociation of a diatomic molecule (LiF), first eigenvalues, tree tensor networks (HT)
[Figure: first eigenvalues of LiF; curves GS, 1XS, 2XS, 3XS (all S = 0); vertical axis from about -107.15 to -106.8]
J. M. Claros (Bachelor thesis), M. Pfeffer: TT, d = 4, r = 1, 3; Stojanac: Tucker, d = 3
[Figure: error of completion versus iterations (up to 1000, logarithmic error axis); curves labelled 10%, 20%, 40%]
[Figure: error of residual versus iterations (up to 1000, logarithmic error axis); curves labelled 10%, 20%, 40%]
[Figure: recovery of low-rank tensors of size 10 x 10 x 10, percentage of success versus percentage of measurements; curves for r = (1,1,1), (2,2,2), (3,3,3), (5,5,5), (7,7,7)]
[Figure: recovery of low-rank tensors of size 10 x 10 x 10, percentage of success versus percentage of measurements; curves for r = (1,1,2), (1,5,5), (2,5,7), (3,4,5)]
Koch & Lubich (2009), Holtz/Rohwedder/Schneider (2011a), Uschmajew/Vandereycken (2012), Arnold & Jahnke (2012), Lubich/Rohwedder/Schneider/Vandereycken (2012)
t , Bt resp. WT t , Ut = 0 ∈ Rkt×kt }
t projector to Gt, embedding operator Et = EU t as
t ET t J ′(U) = 0
t E+ t f(U),
tr J ′(U) = 0
t f(U).