role of tensors in numerical mathematics
Lek-Heng Lim
University of Chicago
February 2, 2015
thanks: NSF DMS 1209136, DMS 1057064, AFOSR FA9550-13-1-0133
what is a tensor? a tensor is a multilinear functional
f : V1 × ··· × Vd → ℝ
i.e., for k = 1, ..., d,
f(v1, ..., αuk + βwk, ..., vd) = αf(v1, ..., uk, ..., vd) + βf(v1, ..., wk, ..., vd)
get hypermatrix A = (aj1···jd) ∈ Rn1×···×nd where n1 = dim V1, . . . , nd = dim Vd
[L.-H. Lim, “Tensors and hypermatrices,” Chapter 15, 30 pp., in L. Hogben (Ed.), Handbook of Linear Algebra, 2nd Ed., CRC Press, 2013]
αA + βB = (αaj1···jd + βbj1···jd)
A ⊗ B = (ai1···idbj1···je) ∈ Rn1×···×nd×m1×···×me
⟨A, B⟩_{d:1} = (∑_{k=1}^p a_{i1···i_{d−1}k} b_{kj2···je}) where p = nd = m1
e.g., for A, B ∈ ℝ^{n×n×n}, form AB ∈ ℝ^{n×n×n}
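These two operations are easy to express with numpy; a minimal sketch (the array values are arbitrary examples):

```python
import numpy as np

# outer product of a 2x3 and a 3x2 hypermatrix: a 2x3x3x2 4-tensor
A = np.arange(6.0).reshape(2, 3)
B = np.arange(6.0).reshape(3, 2)
T = np.tensordot(A, B, axes=0)          # T[i,j,k,l] = A[i,j] * B[k,l]
assert T.shape == (2, 3, 3, 2)

# contraction <A, B>_{2:1}: sum over last index of A and first index of B
C = np.tensordot(A, B, axes=(1, 0))     # C[i,l] = sum_k A[i,k] B[k,l]
assert np.allclose(C, A @ B)            # for matrices this is the usual product

# the same contraction spelled out with einsum
C2 = np.einsum('ik,kl->il', A, B)
assert np.allclose(C, C2)
```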
principle of linearity: in almost any natural process, if we make sufficiently small changes, the variation will be linear with respect to the changes
principle of multilinearity: in almost any natural process, if we keep all but one factor constant, the principle of linearity will apply to that changing factor
1 succumbing to popular (but erroneous) nomenclature, we will use the term ‘tensor’ even when referring to a hypermatrix
f(x) ∈ R, ∇f(x) ∈ Rn, ∇2f(x) ∈ Rn×n, ∇3f(x) ∈ Rn×n×n, ∇4f(x) ∈ Rn×n×n×n, . . .
E(x ⊗ x ⊗ · · · ⊗ x) ∈ Rn×n×···×n
log E(exp(i⟨t, x⟩)) ≈ ∑_{|α|≤m} i^{|α|} κα(x) t^α / α!
(κα(x))|α|=1 ∈ Rn, (κα(x))|α|=2 ∈ Rn×n, (κα(x))|α|=3 ∈ Rn×n×n, (κα(x))|α|=4 ∈ Rn×n×n×n, . . .
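A sketch of estimating the third moment tensor E(x ⊗ x ⊗ x) from samples; the Gaussian data below is synthetic, chosen for illustration (for a Gaussian all third moments vanish, so the estimate should be near zero):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((10000, 3))     # 10000 samples of a vector in R^3

# empirical third moment tensor E(x ⊗ x ⊗ x) ∈ R^{3x3x3}
M3 = np.einsum('si,sj,sk->ijk', x, x, x) / x.shape[0]
assert M3.shape == (3, 3, 3)

# symmetric in its indices by construction
assert np.allclose(M3, M3.transpose(1, 0, 2))

# third moments of a centered Gaussian are zero, up to sampling noise
assert float(np.abs(M3).max()) < 0.5
```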
Figure: all Gaussian except 17 and 39; left: principal components; right: principal kurtosis components (thanks! Rasmus)
S ⊆ H1 ⊗ · · · ⊗ Hd
αψ1 ⊗ · · · ⊗ ψd + · · · + βϕ1 ⊗ · · · ⊗ ϕd
H1 ⊗ I ⊗ · · · ⊗ I + I ⊗ H2 ⊗ · · · ⊗ I + · · · + I ⊗ · · · ⊗ I ⊗ Hd Hamiltonian of unified system if systems do not interact
bosons state space of unified system is Sd(H) ⊆ H⊗d fermions state space of unified system is Λd(H) ⊆ H⊗d
Eijk = ei ⊗ ej ⊗ ek where e1, e2, e3 standard basis of C3
e1 = |u⟩ = state of the u-quark, e2 = |d⟩ = state of the d-quark, e3 = |s⟩ = state of the s-quark; Eijk = composite state of the three quark states ei, ej, ek
2(e1 ⊗ e1 ⊗ e2 − e2 ⊗ e1 ⊗ e1) ∈ S(2,1)(C3)
naïve Bayes model: Pr(x, y, z) = ∑_{h∈H} Pr(h) Pr(x | h) Pr(y | h) Pr(z | h)
P = πAρA ⊗ σA ⊗ θA + πCρC ⊗ σC ⊗ θC + πGρG ⊗ σG ⊗ θG + πTρT ⊗ σT ⊗ θT
pijk = πAρAiσAjθAk + πCρCiσCjθCk + πGρGiσGjθGk + πTρTiσTjθTk
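A sketch of the tensor p_ijk above; the distributions π, ρ, σ, θ are hypothetical, drawn at random for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# mixing weights pi over 4 hidden states (A, C, G, T) and, for each state,
# conditional distributions rho, sigma, theta on 4 observed symbols
pi = rng.dirichlet(np.ones(4))
rho, sigma, theta = (rng.dirichlet(np.ones(4), size=4) for _ in range(3))

# p_ijk = sum_h pi_h rho_hi sigma_hj theta_hk: a nonnegative rank-<=4 tensor
P = np.einsum('h,hi,hj,hk->ijk', pi, rho, sigma, theta)
assert P.shape == (4, 4, 4)
assert np.isclose(P.sum(), 1.0)   # a joint probability distribution
assert (P >= 0).all()
```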
identify neural fibers from diffusion MRI data: fODF maxima [Schultz–Seidel]
[T. Schultz, A. Fuster, A. Ghosh, L. Florack, and L.-H. Lim, “Higher-order tensors in diffusion imaging,” pp. 129–161, Visualization and Processing of Tensors and Higher Order Descriptors for Multi-Valued Data, Springer, 2014]
aijkl = aijlk = ajilk = · · · = alkji
A = λ1 v1 ⊗ v1 ⊗ v1 ⊗ v1 + ··· + λr vr ⊗ vr ⊗ vr ⊗ vr with r minimal and λ1 ≥ λ2 ≥ ··· ≥ λr > 0, ‖v1‖ = ··· = ‖vr‖ = 1
cf. eigenvalue decomposition of positive semidefinite matrices; such a decomposition exists
f(x) = ∑_{j1,...,jd=1}^n a_{j1···jd} x_{j1} ··· x_{jd} ∈ ℝ[x1, ..., xn]_d

f(x) = ∑_{i=1}^r (vᵢᵀx)^d  ⇔  A = ∑_{i=1}^r vᵢ^{⊗d}
[L.-H. Lim and T. Schultz, “Moment tensors and high angular resolution diffusion imaging,” preprint, (2015)]
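The polynomial–tensor correspondence f(x) = ⟨A, x^{⊗d}⟩ above can be checked numerically; a sketch with random vᵢ, chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, r = 3, 4, 2
V = rng.standard_normal((r, n))                 # rows v_1, ..., v_r

# A = sum_i v_i^{⊗4}: a symmetric 4-tensor in R^{3x3x3x3}
A = np.einsum('pi,pj,pk,pl->ijkl', V, V, V, V)

# f(x) = <A, x^{⊗4}> should equal sum_i (v_i^T x)^4
x = rng.standard_normal(n)
f_tensor = np.einsum('ijkl,i,j,k,l->', A, x, x, x, x)
f_poly = ((V @ x) ** 4).sum()
assert np.isclose(f_tensor, f_poly)
```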
approximate
[C. J. Hillar and L.-H. Lim, “Most tensor problems are NP-hard,” Journal of the ACM, 60 (2013), no. 6, Art. 45, 39 pp.]
[Figure: graph on vertices 1, 2, 3, 4]
3-colorings encoded as real solutions to bilinear system:
a1c1 − b1d1 − u², b1c1 + a1d1, c1u − a1² + b1², d1u − 2a1b1, a1u − c1² + d1², b1u − 2d1c1,
a2c2 − b2d2 − u², b2c2 + a2d2, c2u − a2² + b2², d2u − 2a2b2, a2u − c2² + d2², b2u − 2d2c2,
a3c3 − b3d3 − u², b3c3 + a3d3, c3u − a3² + b3², d3u − 2a3b3, a3u − c3² + d3², b3u − 2d3c3,
a4c4 − b4d4 − u², b4c4 + a4d4, c4u − a4² + b4², d4u − 2a4b4, a4u − c4² + d4², b4u − 2d4c4,
a1² − b1² + a1a3 − b1b3 + a3² − b3², a1² − b1² + a1a4 − b1b4 + a4² − b4², a1² − b1² + a1a2 − b1b2 + a2² − b2²,
a2² − b2² + a2a3 − b2b3 + a3² − b3², a3² − b3² + a3a4 − b3b4 + a4² − b4², 2a1b1 + a1b2 + a2b1 + 2a2b2,
2a2b2 + a2b3 + a3b2 + 2a3b3, 2a1b1 + a1b3 + a3b1 + 2a3b3, 2a1b1 + a1b4 + a4b1 + 2a4b4, 2a3b3 + a3b4 + a4b3 + 2a4b4,
w1² + w2² + ··· + w17² + w18²
bilinear system: yTAkz = xTBkz = xTCky = 0, k = 1, . . . , n
. . . the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.
convex optimization problems can be solved to ε-accuracy in polynomial time if we have:
(i) self-concordant barrier function (ii) gradient and Hessian computable in polynomial time
[Nesterov–Nemirovskii, 1994]
second-order cone programs, geometric programs, etc
1 − 1/ω(G) = (27/2) max_{(u,w)∈S^{n+m−1}} ( ··· )²
where S^{d−1} = {x ∈ ℝ^d : ‖x‖ = 1}
f self-concordant if ∇³f(x)(h, h, h)² ≤ 4σ ∇²f(x)(h, h)³ for all h ∈ ℝⁿ
∇²f(x)(h, h) = ∑_{i,j=1}^n (∂²f(x)/∂xi∂xj) hi hj,
∇³f(x)(h, h, h) = ∑_{i,j,k=1}^n (∂³f(x)/∂xi∂xj∂xk) hi hj hk
checking whether the condition holds for some f at a point is NP-hard
[L.-H. Lim, “Self-concordance is NP-hard,” preprint, (2013)]
the usual matrix product:
AB = (∑_{j=1}^n aij bjk)_{i,k=1}^n, i.e., the (i, k)-entry of AB is ai1 b1k + ai2 b2k + ··· + ain bnk

the Hadamard product:
A ∘ B = (aij bij)_{i,j=1}^n, i.e., entrywise multiplication
ℂ^{n×n} under the usual multiplication and under the Hadamard multiplication are both rings
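The two products side by side in numpy; a minimal sketch with toy matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

AB = A @ B        # usual matrix product: (AB)_ik = sum_j a_ij b_jk
H = A * B         # Hadamard product:     (A∘B)_ij = a_ij b_ij

assert np.allclose(AB, [[19.0, 22.0], [43.0, 50.0]])
assert np.allclose(H, [[5.0, 12.0], [21.0, 32.0]])

# the Hadamard product is commutative; the usual product is not
assert np.allclose(A * B, B * A)
assert not np.allclose(A @ B, B @ A)
```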
A ∈ V ⊗ V∗ and B ∈ V ⊗ V∗
[A]B ∈ Cn×n and [B]B ∈ Cn×n i.e., matrices are coordinate representations of 2-tensors
AB = C in V ⊗ V∗ then [A]B × [B]B = [C]B in Cn×n
[A]B′ = X[A]BX −1 where X is change-of-basis matrix from B to B′
(XAX −1)(XBX −1) = XABX −1
the action of GLn(C) by conjugation
“physical law takes same mathematical form in all coordinate systems”
hypermatrices? Cn×n×n × Cn×n×n → Cn×n×n, (A, B) → AB
Cm×n×p × Cm×n×p → Cm×n×p, (A, B) → AB
Cn1×···×nd × Cn1×···×nd → Cn1×···×nd, (A, B) → AB
the usual matrix product
AB = ∑_{i,j,k=1}^n aik bkj Eij = ∑_{i,j,k=1}^n ϕik(A) ϕkj(B) Eij
where Eij = ei ejᵀ ∈ ℂ^{n×n} and ϕij : ℂ^{n×n} → ℂ, A ↦ aij
µn := ∑_{i,j,k=1}^n ϕik ⊗ ϕkj ⊗ Eij
so AB = µn(A, B)
µn ∈ (Cn×n)∗ ⊗ (Cn×n)∗ ⊗ Cn×n
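A sketch constructing µn as an n² × n² × n² hypermatrix (indices flattened row-major) and checking that the bilinear evaluation µn(A, B) recovers the matrix product; an illustrative construction, not an efficient one:

```python
import numpy as np

n = 3
mu = np.zeros((n * n, n * n, n * n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            # phi_ik ⊗ phi_kj ⊗ E_ij: reads a_ik and b_kj, deposits in (i, j)
            mu[i * n + k, k * n + j, i * n + j] = 1.0

rng = np.random.default_rng(3)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# bilinear evaluation mu(A, B) gives the usual matrix product
AB = np.einsum('abc,a,b->c', mu, A.ravel(), B.ravel()).reshape(n, n)
assert np.allclose(AB, A @ B)
```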
how many multiplications are required to multiply two n × n matrices?
rank(X) = min{r : X = ∑_{p=1}^r λp up ⊗ vp ⊗ wp}
ω := inf{α : rank(µn) = O(n^α)}
naïve: O(n³)
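Strassen's classical 2 × 2 scheme beats the naïve count: it uses 7 multiplications instead of 8, which is what gives rank(µ2) ≤ 7 and the O(n^{log₂ 7}) algorithm. A sketch:

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications (Strassen, 1969)."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
assert np.allclose(strassen_2x2(A, B), A @ B)
```

Applied recursively to block matrices, the 7-multiplication identity yields the O(n^{log₂ 7}) ≈ O(n^{2.81}) bound.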
µn : ℂ^{n×n} × ℂ^{n×n} → ℂ^{n×n}, (A, B) ↦ AB,
µ̃n : ℂ^{n×n} × ℂ^{n×n} × ℂ^{n×n} → ℂ, (A, B, C) ↦ tr(ABC)
µn isomorphic to µ̃n: µn ∈ (ℂ^{n×n})∗ ⊗ (ℂ^{n×n})∗ ⊗ ℂ^{n×n} ≃ (ℂ^{n×n})∗ ⊗ (ℂ^{n×n})∗ ⊗ (ℂ^{n×n})∗ ∋ µ̃n
[Landsberg–LHL–Ye, 2013] (A, B) ↦ AB ⇔ (A, B, C) ↦ tr(ABC)
tr(ABC) = ∑_{i,j,k=1}^{m,n,p} aij bjk cki
Theorem (first fundamental theorem of invariant theory)
Invariant ring C[V ∗r ⊕ V s]GL(V) is generated by contractions η(i,j) for 1 ≤ i ≤ r and 1 ≤ j ≤ s
g · f(x) = f(g−1x) where g ∈ GL(V), f ∈ C[E], x ∈ E
η(i,j)(f1, . . . , fr, v1, . . . , vs) = fi(vj) where fi ∈ V ∗ and vj ∈ V
[diagram: contractions η(1,2) applied between U, V, and W]
even order: d even, AB = C,
c_{i1···i_{d/2} k1···k_{d/2}} = ∑_{j1,...,j_{d/2}=1}^n a_{i1···i_{d/2} j1···j_{d/2}} b_{j1···j_{d/2} k1···k_{d/2}}
[J. M. Landsberg, L.-H. Lim, and K. Ye, “Can one multiply higher-order tensors?” preprint, (2015)]
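The even-order product above is just an ordinary matrix product after flattening the first and last d/2 indices; a numpy sketch with random order-4 tensors:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 2, 4   # order-4 tensors; contract over the last d/2 indices of A
A = rng.standard_normal((n,) * d)
B = rng.standard_normal((n,) * d)

# c_{i1 i2 k1 k2} = sum_{j1, j2} a_{i1 i2 j1 j2} b_{j1 j2 k1 k2}
C = np.tensordot(A, B, axes=2)
assert C.shape == (n, n, n, n)

# equivalently: flatten to n^2 x n^2 matrices and multiply
C_flat = A.reshape(n * n, n * n) @ B.reshape(n * n, n * n)
assert np.allclose(C, C_flat.reshape((n,) * d))
```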
low-rank matrix completion be generalized to higher-order tensors?
ℓ¹-norm is an underestimator of the ‘ℓ⁰-norm’ on the ℓ∞-norm unit ball: ‖x‖₁ ≤ nnz(x)‖x‖∞ for all x ∈ ℂⁿ [Donoho, Candès, Tao, et al.]
nuclear norm is an underestimator of rank on the spectral norm unit ball: ‖X‖∗ ≤ rank(X)‖X‖σ for all X ∈ ℂ^{m×n} [Candès, Fazel, Recht, et al.]
rank(X) = min{r : X = ∑_{i=1}^r λi ui ⊗ vi ⊗ wi}
‖X‖∗ = inf{∑_{i=1}^r |λi| : X = ∑_{i=1}^r λi ui ⊗ vi ⊗ wi, r ∈ ℕ}
‖T‖σ = sup_{x,y,z≠0} |T(x, y, z)| / (‖x‖‖y‖‖z‖)
‖X‖∗ ≤ rank(X)‖X‖σ?
nuclear norm is an underestimator of rank on the spectral norm unit ball [LHL–Comon, 2013]
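For a rank-1 tensor λ u ⊗ v ⊗ w with unit vectors, both norms equal |λ|, so ‖X‖∗ ≤ rank(X)‖X‖σ holds with equality; a quick numerical check with random unit vectors, chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
# three random unit vectors in R^4 (hypothetical data)
u, v, w = (x / np.linalg.norm(x) for x in rng.standard_normal((3, 4)))
lam = 2.5

# X = lam * u ⊗ v ⊗ w has rank 1, so ||X||_sigma = ||X||_* = lam
X = lam * np.einsum('i,j,k->ijk', u, v, w)

# the spectral norm sup |X(x,y,z)|/(||x|| ||y|| ||z||) is attained at (u, v, w)
spec = abs(np.einsum('ijk,i,j,k->', X, u, v, w))
assert np.isclose(spec, lam)

# nuclear norm <= rank * spectral norm, with equality in the rank-1 case
assert lam <= 1 * spec + 1e-9
```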
rank(µn) ≤ c n^{log₂ 7} for some c > 0, and [Derksen, 2013] ‖µn‖∗ = n³, ‖µn‖σ = 1, so
lim_{n→∞} ‖µn‖∗ / rank(µn) = ∞
[L.-H. Lim and P. Comon, “Blind multilinear identification,” IEEE Transactions on Information Theory, 60 (2014)]
contraction of A ∈ ℝ^{n1×···×nd} and B ∈ ℝ^{m1×···×me} in the pth and qth indices with np = mq is the (d + e − 2)-tensor
⟨A, B⟩_{p:q} := (∑_{k=1}^{np} a_{i1...i_{p−1} k i_{p+1}...i_d} b_{j1...j_{q−1} k j_{q+1}...j_e})
contraction in the (p1, ..., ps)th and (q1, ..., qs)th indices is the (d + e − 2s)-tensor
⟨A, B⟩_{p1,p2,...,ps:q1,q2,...,qs} := (∑_{i_{p1},i_{p2},...,i_{ps}=1}^{n_{p1},n_{p2},...,n_{ps}} a_{i1...id} b_{j1...je})
for Sk ⊆ {1, ..., nk}, k = 1, ..., d, subblock AS1···Sd of A is AS1···Sd := [a_{j1...jd}]_{j1∈S1,...,jd∈Sd} ∈ ℝ^{|S1|×···×|Sd|}
aj1···jd = ajσ(1)···jσ(d)
aijk = aikj = ajik = ajki = akij = akji
(Sym(A))_{i1...id} := (1/d!) ∑_{σ∈𝔖d} a_{iσ(1)...iσ(d)} for ik = 1, ..., n, k = 1, ..., d
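The symmetrization operator Sym can be sketched by averaging over all axis permutations:

```python
import numpy as np
from itertools import permutations

def sym(A):
    """(Sym A)_{i1...id} = (1/d!) sum over permutations s of A_{i_s(1)...i_s(d)}."""
    perms = list(permutations(range(A.ndim)))
    return sum(A.transpose(p) for p in perms) / len(perms)

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3, 3))
S = sym(A)

# S is invariant under any permutation of its indices
assert np.allclose(S, S.transpose(1, 0, 2))
assert np.allclose(S, S.transpose(2, 1, 0))
# symmetrizing twice changes nothing
assert np.allclose(sym(S), S)
```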
A positive semidefinite if ⟨A, x^{⊗d}⟩ ≥ 0 for all x ∈ ℝⁿ; positive definite if > 0 for all x ≠ 0, written A ≻ 0
minimize f(x) subject to hi(x) = 0, i = 1, . . . , m gj(x) ≤ 0, j = 1, . . . , r x ∈ Ω (1)
and Ω is nonempty open
Feas := {x ∈ Ω : hi(x) = 0, i = 1, ..., m; gj(x) ≤ 0, j = 1, ..., r} denotes feasible set of (1)
I(x) := {j : gj(x) = 0} denotes active set at point x
William Karush, circa 1987 Fritz John at NYU, circa 1987 Harold Kuhn and Albert Tucker, 1980
[W. Karush, “Minima of functions of several variables with inequalities as side constraints,” M.Sc. thesis, University of Chicago, Chicago, IL, 1939]
[F. John, “Extremum problems with inequalities as subsidiary conditions,” pp. 187–204, Studies and Essays Presented to R. Courant on His 60th Birthday, Wiley, New York, NY, 1948]
[H. W. Kuhn and A. W. Tucker, “Nonlinear programming,” pp. 481–492, Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, 1951]
[R. W. Cottle, “William Karush and the KKT theorem,” Doc. Math., Extra volume: Optimization stories (2012)]
Theorem (first order KKT conditions)
Suppose x∗ ∈ Feas is a local minimum of (1) and the gradient vectors ∇hi(x∗), i = 1, ..., m, ∇gj(x∗), j ∈ I(x∗) are linearly independent (LICQ). Then there exist v ∈ ℝᵐ and w ∈ ℝʳ₊ such that
∇f(x∗) + ∑_{i=1}^m vi ∇hi(x∗) + ∑_{j=1}^r wj ∇gj(x∗) = 0,  (2)
and wj gj(x∗) = 0, j = 1, ..., r.  (3)
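The stationarity condition (2) and complementary slackness (3) can be verified numerically; a sketch on a hypothetical toy problem, chosen for illustration:

```python
import numpy as np

# toy problem: minimize x^2 + y^2
#   s.t. h(x, y) = x + y - 2 = 0  and  g(x, y) = -x <= 0
# candidate minimum x* = (1, 1); g is inactive there, so w = 0 and v = -2
x_star = np.array([1.0, 1.0])
v, w = -2.0, 0.0

grad_f = 2 * x_star                 # gradient of the objective
grad_h = np.array([1.0, 1.0])       # gradient of the equality constraint
grad_g = np.array([-1.0, 0.0])      # gradient of the inequality constraint

# stationarity (2): grad f + v grad h + w grad g = 0
assert np.allclose(grad_f + v * grad_h + w * grad_g, 0)
# complementary slackness (3): w * g(x*) = 0, with w >= 0
assert np.isclose(w * (-x_star[0]), 0) and w >= 0
```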
Theorem (second order KKT conditions)
Let N := {d ∈ ℝⁿ : ∇hi(x∗)ᵀd = 0, i = 1, ..., m; ∇gj(x∗)ᵀd = 0, j ∈ I(x∗)}.
Then the matrix B := ∇²f(x∗) + ∑_{i=1}^m vi ∇²hi(x∗) + ∑_{j=1}^r wj ∇²gj(x∗)
is positive semidefinite on N, i.e., dᵀBd ≥ 0 for all d ∈ N.  (4)
without loss of generality, assume N = {d ∈ ℝⁿ : d_{κ+1} = ··· = dn = 0} and, in block form,
B = [ A  B ; Bᵀ  · ], A ∈ ℝ^{p×p} and A ≻ 0
define C := ∇³f(x∗) + ∑_{i=1}^m vi ∇³hi(x∗) + ∑_{j=1}^r wj ∇³gj(x∗) ∈ ℝ^{n×n×n}
and D := ∇⁴f(x∗) + ∑_{i=1}^m vi ∇⁴hi(x∗) + ∑_{j=1}^r wj ∇⁴gj(x∗) ∈ ℝ^{n×n×n×n}
font: a scalar, a vector, A matrix, A 3-tensor, A 4-tensor
let R := {1, . . . , p}, S := {p + 1, . . . , κ} and T := {κ + 1, . . . , n}, consider following subblocks:
B ∈ Rp×(n−κ): subblock BRT of B
B ∈ R(n−κ)×(n−κ): subblock BTT of B
C ∈ Rp×(κ−p)×(κ−p): subblock CRSS of C
C ∈ R(κ−p)×(κ−p)×(n−κ): subblock CSST of C
M =
T
let F ∈ ℝ^{(n−κ)×n×n} be the 3-tensor defined by
⟨F, ei⟩_{1:1} = ∇²hi(x∗) for i = 1, ..., m, and ∇²g_{j_{i−m}}(x∗) for i = m + 1, ..., n − κ
FUSS ∈ R(n−κ)×(κ−p)×(κ−p): (U, S, S)-subblock of F
FUSR ∈ R(n−κ)×(κ−p)×p: (U, S, R)-subblock of F
FUST ∈ R(n−κ)×(κ−p)×(n−κ): (U, S, T)-subblock of F
let F′ ∈ ℝ^{p×(κ−p)×(n−κ)} be the 3-tensor defined by F′_{ijk} = F_{kji} for all possible i, j, k
let G ∈ ℝ^{(n−κ)×n×n×n} be the 4-tensor with
⟨G, ei⟩_{1:1} = ∇³hi(x∗) for all i = 1, ..., m, and ∇³g_{j_{i−m}}(x∗) for all i = m + 1, ..., n − κ
GUSSS ∈ R(n−κ)×(κ−p)×(κ−p)×(κ−p) be (U, S, S, S)-subblock of G
W := C − 2⟨F, BM⁻¹⟩_{3:2} − ⟨BM⁻¹, F⟩_{2:1}
H := D + 3⟨M⁻ᵀBM⁻¹, F ⊗ F⟩_{1,2:1,4} − 6⟨M⁻¹, C ⊗ F⟩_{1,2:3,4} + 4⟨BM⁻¹, 3M⁻¹, F ⊗ F⟩_{1,2:3,4} − ⟨BM⁻¹, G⟩_{2:1}
Theorem (Hu–LHL, 2015)
Let x∗ ∈ Feas be a local minimum of (1) and all notations be as above. Then
3 Sym(⟨BM⁻¹, F⟩_{2:1}) − C = 0 and H − 3⟨A⁻¹, W ⊗ W⟩_{1,2:1,4} ⪰ 0.
If x∗ ∈ Feas satisfies (2), (3), and (4), and the strict complementarity condition, i.e., wj > 0 for all j ∈ I(x∗); and if
3 Sym(⟨BM⁻¹, F⟩_{2:1}) − C = 0 and H − 3⟨A⁻¹, W ⊗ W⟩_{1,2:1,4} ≻ 0,
then x∗ is a strict local minimum. No such conditions are possible for orders > 4.
[S. Hu and L.-H. Lim, “Higher-order KKT conditions,” preprint, (2015)]
minimize f(x, y, z) = (1/4)xy³ + (3/2)y²z² − y³z − z⁴
subject to 2x + y − 2 = 0,
x + 2y − 4 ≤ 0
multipliers −10/3 and 14/3
N = {(x, y, z) ∈ ℝ³ : x = y = 0}
for d ∈ N, dᵀBd reduces to (∂²f/∂z²) dz²; it is however zero
solution (Hu–LHL, 2013)
∂³f/∂z³ = −24 ≠ 0 implies that x∗ cannot be a local minimum
f twice continuously differentiable in neighborhood of x∗ ∈ Rn (i) if x∗ is a local minimum of f, then ∇f(x∗) = 0, ∇2f(x∗) 0 (ii) if ∇f(x∗) = 0, ∇2f(x∗) ≻ 0, then x∗ is strict local minimum of f
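Why the second-order test can be silent, and why third and fourth derivatives (i.e., 3- and 4-tensors) then matter, is already visible in one variable; a sketch with exact polynomial derivatives (example functions chosen for illustration):

```python
from math import factorial

def derivatives_at_zero(coeffs):
    """k-th derivative at 0 of sum_j coeffs[j] x^j is k! * coeffs[k]."""
    return [factorial(k) * c for k, c in enumerate(coeffs)]

d_x2 = derivatives_at_zero([0, 0, 1])          # f(x) = x^2
d_x4 = derivatives_at_zero([0, 0, 0, 0, 1])    # f(x) = x^4
d_x3 = derivatives_at_zero([0, 0, 0, 1])       # f(x) = x^3

# x^2: second-order test succeeds (f'' = 2 > 0, strict local minimum)
assert d_x2[2] > 0
# x^4: a minimum, but f'' = f''' = 0 and only f'''' = 24 > 0 detects it
assert d_x4[2] == 0 and d_x4[3] == 0 and d_x4[4] > 0
# x^3: f'' = 0 but f''' = 6 != 0, so 0 is not a local minimum
assert d_x3[2] == 0 and d_x3[3] != 0
```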
A ∈ ℝ^{p×p}: nonsingular block of ∇²f(x∗), A ≻ 0; C ∈ ℝ^{p×(n−p)×(n−p)}: (1, 2, 2)-subblock of ∇³f(x∗)
if x∗ is a local minimum of f, then C = 0 and D − 3⟨A⁻¹, C ⊗ C⟩_{1,2:1,4} ⪰ 0
if C = 0 and D − 3⟨A⁻¹, C ⊗ C⟩_{1,2:1,4} ≻ 0, then x∗ is strict local minimum of f