SLIDE 1

role of tensors in numerical mathematics

Lek-Heng Lim

University of Chicago

February 2, 2015

thanks: NSF DMS 1209136, DMS 1057064, AFOSR FA9550-13-1-0133

SLIDE 2

what is a tensor?

  • a tensor is a multilinear functional

f : V1 × · · · × Vd → R

i.e., for k = 1, . . . , d,

f(v1, . . . , αuk + βwk, . . . , vd) = αf(v1, . . . , uk, . . . , vd) + βf(v1, . . . , wk, . . . , vd)

  • if we give f coordinates, i.e., choose bases on V1, . . . , Vd,

get hypermatrix A = (aj1···jd) ∈ Rn1×···×nd where n1 = dim V1, . . . , nd = dim Vd

  • d-dimensional hypermatrix is d-tensor in coordinates
  • matrix is linear operator/bilinear form/dyad in coordinates
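To make the definition concrete, here is a small numpy sketch (example mine, not from the slides): in coordinates, evaluating a multilinear functional is a contraction of its hypermatrix against one vector per argument.

```python
import numpy as np

# A ∈ R^{2×3×4} represents a multilinear functional f : R^2 × R^3 × R^4 → R
A = np.random.randn(2, 3, 4)

def f(v1, v2, v3):
    # f(v1, v2, v3) = Σ_{j1,j2,j3} a_{j1 j2 j3} (v1)_{j1} (v2)_{j2} (v3)_{j3}
    return np.einsum('ijk,i,j,k->', A, v1, v2, v3)

# multilinearity in the 2nd argument: f(v1, αu + βw, v3) = αf(v1,u,v3) + βf(v1,w,v3)
v1, v3 = np.random.randn(2), np.random.randn(4)
u, w = np.random.randn(3), np.random.randn(3)
alpha, beta = 2.0, -1.5
assert np.isclose(f(v1, alpha*u + beta*w, v3),
                  alpha*f(v1, u, v3) + beta*f(v1, w, v3))
```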
SLIDE 3

matrix notions may be extended

  • rank
  • hyperdeterminant
  • positive definiteness
  • rank retaining decompositions
  • system of multilinear equations
  • multilinear programming
  • multilinear least squares
  • eigenvalues and eigenvectors
  • singular values and singular vectors
  • Gaussian elimination and QR factorization
  • nonnegative tensors and Perron-Frobenius theory
  • spectral, operator, Hölder, nuclear norms

[L.-H. Lim, “Tensors and hypermatrices,” Chapter 15, 30 pp., in L. Hogben (Ed.), Handbook of Linear Algebra, 2nd Ed., CRC Press, 2013]

SLIDE 4

matrix operations may be extended

  • linear combination: A, B ∈ Rn1×···×nd, α, β ∈ R,

αA + βB = (αaj1···jd + βbj1···jd)

  • outer product: A ∈ Rn1×···×nd, B ∈ Rm1×···×me,

A ⊗ B = (ai1···idbj1···je) ∈ Rn1×···×nd×m1×···×me

  • contraction: A ∈ Rn1×···×nd−1×p, B ∈ Rp×m2×···×me,

⟨A, B⟩d:1 = ( ∑_{k=1}^{p} ai1···id−1k bkj2···je ) ∈ Rn1×···×nd−1×m2×···×me

  • all arise from coordinate-free operations on tensors
  • all arise from coordinate-free operations on tensors
  • what about multiplication? e.g. take A, B ∈ Rn×n×n and form AB ∈ Rn×n×n
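The first three operations are one-liners in numpy; a small illustration (code mine, not from the slides):

```python
import numpy as np

A = np.random.randn(2, 3, 4)   # 3-tensor in R^{2×3×4}
B = np.random.randn(4, 5, 6)   # 3-tensor in R^{4×5×6}

# linear combination is entrywise, exactly as for matrices
L = 2.0 * A - 0.5 * A

# outer product: (A ⊗ B)_{ijklmn} = a_{ijk} b_{lmn}, a 6-tensor
outer = np.einsum('ijk,lmn->ijklmn', A, B)
assert outer.shape == (2, 3, 4, 4, 5, 6)

# contraction ⟨A, B⟩_{3:1}: sum over the 3rd index of A and the 1st of B
contr = np.einsum('ijk,klm->ijlm', A, B)
assert contr.shape == (2, 3, 5, 6)
```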

SLIDE 5

tensors are useful

SLIDE 6

why tensors¹

  • d = 0, 1, 2: scalars, vectors, matrices
  • why should you care about d > 2?
  • comes up more often than you might expect

principle of linearity: in almost any natural process, if we make sufficiently small changes, the variation will be linear with respect to the changes

principle of multilinearity: in almost any natural process, if we keep all but one factor constant, the principle of linearity will apply to that changing factor

¹succumbing to popular (but erroneous) nomenclature, will use the term ‘tensor’ even when referring to a hypermatrix

SLIDE 7

derivatives, moments, cumulants

  • higher-order derivatives

f(x) ∈ R, ∇f(x) ∈ Rn, ∇2f(x) ∈ Rn×n, ∇3f(x) ∈ Rn×n×n, ∇4f(x) ∈ Rn×n×n×n, . . .

  • higher-order moments

E(x ⊗ x ⊗ · · · ⊗ x) ∈ Rn×n×···×n

  • higher-order cumulants

log E(exp(i⟨t, x⟩)) ≈ ∑_{|α|=1}^{m} i^{|α|} κα(x) tα/α!

  • mean, variance, skewness, kurtosis

(κα(x))|α|=1 ∈ Rn, (κα(x))|α|=2 ∈ Rn×n, (κα(x))|α|=3 ∈ Rn×n×n, (κα(x))|α|=4 ∈ Rn×n×n×n, . . .
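As a concrete illustration (code mine, not from the slides), empirical moment tensors are short einsum expressions:

```python
import numpy as np

X = np.random.randn(1000, 5)          # 1000 samples of x ∈ R^5
X = X - X.mean(axis=0)                # center the data

M2 = np.einsum('si,sj->ij', X, X) / len(X)         # covariance ∈ R^{5×5}
M3 = np.einsum('si,sj,sk->ijk', X, X, X) / len(X)  # 3rd moment ∈ R^{5×5×5}

# for centered data the 3rd cumulant equals the 3rd moment;
# Gaussian samples should give M3 ≈ 0 up to sampling noise
print(np.abs(M3).max())
```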

SLIDE 8

principal components of cumulants

[figure: two scatter plots over components 1–50; left panel: PCA right singular vectors (Comp 1, Comp 2); right panel: principal kurtosis components (Comp 1, Comp 2)]

Figure: all Gaussian except 17 and 39; left: principal components; right: principal kurtosis components (thanks! Rasmus)

SLIDE 9

quantum mechanics

  • H1, . . . , Hd state spaces, state space of unified system is

S ⊆ H1 ⊗ · · · ⊗ Hd

  • S has factorizable states ψ1 ⊗ · · · ⊗ ψd and mixed states

αψ1 ⊗ · · · ⊗ ψd + · · · + βϕ1 ⊗ · · · ⊗ ϕd

  • Hj : Hj → Hj Hamiltonian of jth system and I identity;

H1 ⊗ I ⊗ · · · ⊗ I + I ⊗ H2 ⊗ · · · ⊗ I + · · · + I ⊗ · · · ⊗ I ⊗ Hd

is Hamiltonian of unified system if systems do not interact

  • indistinguishable systems H1 = · · · = Hd = H:

bosons: state space of unified system is Sd(H) ⊆ H⊗d
fermions: state space of unified system is Λd(H) ⊆ H⊗d

SLIDE 10

quarks

  • V ⊗ V = S2(V) ⊕ Λ2(V)
  • V ⊗ V ⊗ V = S3(V) ⊕ S(2,1)(V) ⊕ S(2,1)(V) ⊕ Λ3(V)
  • special case: V = C3: standard basis of C3×3×3

Eijk = ei ⊗ ej ⊗ ek where e1, e2, e3 standard basis of C3

  • interpretation:

e1 = |u⟩ = state of the u-quark
e2 = |d⟩ = state of the d-quark
e3 = |s⟩ = state of the s-quark
Eijk = composite state of the three quark states ei, ej, ek

  • proton, two u-quarks and one d-quark, has state

2(e1 ⊗ e1 ⊗ e2 − e2 ⊗ e1 ⊗ e1) ∈ S(2,1)(C3)

SLIDE 11

latent variables

  • behind one of the most fundamental graphical models
  • naïve Bayes model:

Pr(x, y, z) = ∑h Pr(h) Pr(x | h) Pr(y | h) Pr(z | h)

[diagram: latent node H with arrows to observed nodes X, Y, Z]

  • e.g. causality inference, phylogenetics
SLIDE 12

example: phylogenetics

  • Markov model for 3-taxon tree [Allman–Rhodes, 2003]
  • probability distribution given by 4 × 4 × 4 table with model

P = πAρA ⊗ σA ⊗ θA + πCρC ⊗ σC ⊗ θC + πGρG ⊗ σG ⊗ θG + πTρT ⊗ σT ⊗ θT

  • for i, j, k ∈ {A, C, G, T}

pijk = πAρAiσAjθAk + πCρCiσCjθCk + πGρGiσGjθGk + πTρTiσTjθTk
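A numerical instance (all numbers hypothetical, code mine) showing that the model yields a 4 × 4 × 4 joint distribution of tensor rank at most 4:

```python
import numpy as np

rng = np.random.default_rng(0)

pi = rng.dirichlet(np.ones(4))        # root distribution π over {A, C, G, T}
rho, sigma, theta = (rng.dirichlet(np.ones(4), size=4) for _ in range(3))
# row x of rho/sigma/theta is the conditional distribution at a leaf given x

# P = Σ_x π_x ρ_x ⊗ σ_x ⊗ θ_x
P = np.einsum('x,xi,xj,xk->ijk', pi, rho, sigma, theta)
assert np.isclose(P.sum(), 1.0)       # a genuine joint distribution
```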

SLIDE 13

neuroimaging

identify neural fibers from diffusion MRI data

[figure: fODF maxima, Schultz–Seidel]

[T. Schultz, A. Fuster, A. Ghosh, L. Florack, and L.-H. Lim, “Higher-order tensors in diffusion imaging,” pp. 129–161, Visualization and Processing of Tensors and Higher Order Descriptors for Multi-Valued Data, Springer, 2014]

SLIDE 14

positive decomposition

  • A = (aijkl) ∈ S4(Rn), i.e.,

aijkl = aijlk = ajilk = · · · = alkji

  • want positive decomposition

A = λ1v1 ⊗ v1 ⊗ v1 ⊗ v1 + · · · + λrvr ⊗ vr ⊗ vr ⊗ vr

with r minimal and λ1 ≥ λ2 ≥ · · · ≥ λr > 0, ‖v1‖ = · · · = ‖vr‖ = 1

  • more generally for any even order d ≥ 4
  • generalization of symmetric eigenvalue decomposition for positive semidefinite matrices
  • not always possible — say A is ‘positively decomposable’ if such a decomposition exists

SLIDE 15

diffusion MRI imaging

  • after preprocessing, MRI signal is f : S² → R
  • f is homogeneous polynomial of even degree d,

f(x) = ∑_{j1,...,jd=1}^{n} aj1···jd xj1 · · · xjd ∈ R[x1, . . . , xn]d

  • coefficients are hypermatrices A = (aj1···jd) ∈ Rn×n×···×n
  • model implies f is sum of powers of linear forms,

f(x) = ∑_{i=1}^{r} (viᵀx)ᵈ

  • equivalently, A positively decomposable,

A = ∑_{i=1}^{r} vi⊗d

  • vi gives direction of ith fiber in a voxel (numerical sketch below)

[L.-H. Lim and T. Schultz, “Moment tensors and high angular resolution diffusion imaging,” preprint, (2015)]
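A minimal numpy sketch of this correspondence (example mine; n, r, and the vi are arbitrary):

```python
import numpy as np

n, r = 3, 2
V = np.random.randn(r, n)
V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit fiber directions v_i

# A = Σ_i v_i ⊗ v_i ⊗ v_i ⊗ v_i, a positively decomposable 4-tensor
A = np.einsum('ri,rj,rk,rl->ijkl', V, V, V, V)

def f(x):
    # the quartic form Σ a_{ijkl} x_i x_j x_k x_l in coordinates
    return np.einsum('ijkl,i,j,k,l->', A, x, x, x, x)

x = np.random.randn(n)
assert np.isclose(f(x), sum((V @ x) ** 4))      # f(x) = Σ_i (v_iᵀx)^4
```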

SLIDE 16

complexity

SLIDE 17

most tensor problems are NP-hard

  • some have no FPTAS
  • some are NP-hard even to approximate
  • some are #P-hard
  • some are VNP-complete
  • some are undecidable
  • isn’t this bad news?

[C. J. Hillar and L.-H. Lim, “Most tensor problems are NP-hard,” Journal of the ACM, 60 (2013), no. 6, Art. 45, 39 pp.]

SLIDE 18

encoding NP-hard problems

[diagram: two graphs on vertices 1, 2, 3, 4]

3-colorings encoded as real solutions to bilinear system:

a1c1 − b1d1 − u², b1c1 + a1d1, c1u − a1² + b1², d1u − 2a1b1, a1u − c1² + d1², b1u − 2d1c1,
a2c2 − b2d2 − u², b2c2 + a2d2, c2u − a2² + b2², d2u − 2a2b2, a2u − c2² + d2², b2u − 2d2c2,
a3c3 − b3d3 − u², b3c3 + a3d3, c3u − a3² + b3², d3u − 2a3b3, a3u − c3² + d3², b3u − 2d3c3,
a4c4 − b4d4 − u², b4c4 + a4d4, c4u − a4² + b4², d4u − 2a4b4, a4u − c4² + d4², b4u − 2d4c4,
a1² − b1² + a1a3 − b1b3 + a3² − b3², a1² − b1² + a1a4 − b1b4 + a4² − b4², a1² − b1² + a1a2 − b1b2 + a2² − b2²,
a2² − b2² + a2a3 − b2b3 + a3² − b3², a3² − b3² + a3a4 − b3b4 + a4² − b4², 2a1b1 + a1b2 + a2b1 + 2a2b2,
2a2b2 + a2b3 + a3b2 + 2a3b3, 2a1b1 + a1b3 + a3b1 + 2a3b3, 2a1b1 + a1b4 + a4b1 + 2a4b4, 2a3b3 + a3b4 + a4b3 + 2a4b4,
w1² + w2² + · · · + w17² + w18²

bilinear system: yᵀAkz = xᵀBkz = xᵀCky = 0, k = 1, . . . , n

SLIDE 19

tensors shed light on foundational questions

SLIDE 20

convex optimization

. . . the great watershed in optimization isn’t between linearity and nonlinearity, but convexity and nonconvexity.

– R. Rockafellar

  • not quite: plenty of NP-hard convex optimization problems
  • however: a convex programming problem can be solved to ε-accuracy in polynomial time if we have

(i) a self-concordant barrier function, and
(ii) gradient and Hessian computable in polynomial time

[Nesterov–Nemirovskii, 1994]

  • examples: linear programs, semidefinite programs, second-order cone programs, geometric programs, etc.

SLIDE 21

self-concordance is NP-hard

  • G = (V, E) undirected graph, n vertices, m edges; then

1 − 1/ω(G) = (27/2) max_{(u,w)∈S^{n+m−1}} ( ∑_{{i,j}∈E} ui uj wij )²

where S^{d−1} = {x ∈ R^d : ‖x‖ = 1}

  • convex f : Ω ⊆ Rn → R is self-concordant at x ∈ Ω if

( ∇3f(x)(h, h, h) )² ≤ 4σ ( ∇2f(x)(h, h) )³ for all h ∈ Rn

where

∇2f(x)(h, h) = ∑_{i,j=1}^{n} (∂²f(x)/∂xi∂xj) hi hj,
∇3f(x)(h, h, h) = ∑_{i,j,k=1}^{n} (∂³f(x)/∂xi∂xj∂xk) hi hj hk

  • clique problem reducible to checking self-concordance of some f at a point, i.e., NP-hard

[L.-H. Lim, “Self-concordance is NP-hard,” preprint, (2013)]

SLIDE 22

multiplying higher-order tensors?

SLIDE 23

two matrix multiplications

  • usual: for A = (aij), B = (bij) ∈ Cn×n,

A × B = ( ∑_{k=1}^{n} aik bkj )_{i,j=1}^{n}

  • Hadamard:

A ◦ B = ( aij bij )_{i,j=1}^{n}

  • usual multiplication vastly more useful than Hadamard multiplication
  • why?
  • nothing to do with ring structure: (Cn×n, +, ×) and (Cn×n, +, ◦) are both rings
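In numpy the two products are one keystroke apart; a tiny demonstration (code mine):

```python
import numpy as np

A = np.random.randn(3, 3)
B = np.random.randn(3, 3)

usual = A @ B      # (AB)_ij = Σ_k a_ik b_kj
hadamard = A * B   # (A ∘ B)_ij = a_ij b_ij

# both operations give rings: e.g. each product is associative
C = np.random.randn(3, 3)
assert np.allclose((A @ B) @ C, A @ (B @ C))
assert np.allclose((A * B) * C, A * (B * C))
```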

SLIDE 24

why?

  • V vector space over C of dimension n
  • take A ∈ V ⊗ V∗ and B ∈ V ⊗ V∗
  • choose a basis ℬ on V; then

[A]ℬ ∈ Cn×n and [B]ℬ ∈ Cn×n

i.e., matrices are coordinate representations of 2-tensors

  • 2-tensors can be naturally multiplied (later)
  • if AB = C in V ⊗ V∗, then [A]ℬ × [B]ℬ = [C]ℬ in Cn×n
  • does not depend on choice of ℬ
  • usual matrix product is product of 2-tensors in coordinates
SLIDE 25

basis independence

  • for another choice of basis ℬ′,

[A]ℬ′ = X[A]ℬX⁻¹

where X is the change-of-basis matrix from ℬ to ℬ′

  • multiplication of 2-tensors is basis independent
  • what this means for usual matrix product (numerical check below):

(XAX⁻¹)(XBX⁻¹) = X(AB)X⁻¹

  • in fancy terms: usual matrix product is equivariant under the action of GLn(C) by conjugation
  • usual matrix product is the unique product with this property
  • e.g. Hadamard product does not have this property
  • why is the usual matrix product ubiquitous?

“physical law takes same mathematical form in all coordinate systems”
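A quick numerical check of the equivariance claim (code mine):

```python
import numpy as np

n = 4
A, B = np.random.randn(n, n), np.random.randn(n, n)
X = np.random.randn(n, n)     # generically invertible
Xinv = np.linalg.inv(X)

# usual product: (XAX⁻¹)(XBX⁻¹) = X(AB)X⁻¹
assert np.allclose((X @ A @ Xinv) @ (X @ B @ Xinv), X @ (A @ B) @ Xinv)

# the Hadamard product fails the same test (generically)
print(np.allclose((X @ A @ Xinv) * (X @ B @ Xinv), X @ (A * B) @ Xinv))
```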

SLIDE 26

main question

  • is there a generalization of the usual matrix product to hypermatrices?

Cn×n×n × Cn×n×n → Cn×n×n, (A, B) → AB

  • more generally, do not require cubical hypermatrices:

Cm×n×p × Cm×n×p → Cm×n×p, (A, B) → AB

  • even more generally, allow arbitrary order d > 2:

Cn1×···×nd × Cn1×···×nd → Cn1×···×nd, (A, B) → AB

  • just one requirement: must be coordinate independent like the usual matrix product

  • to answer this, need better understanding of matrix product
SLIDE 27

matrix multiplication is a 3-tensor

  • for A = (aij), B = (bjk) ∈ Cn×n,

AB = ∑_{i,j,k=1}^{n} aik bkj Eij = ∑_{i,j,k=1}^{n} ϕik(A) ϕkj(B) Eij

where Eij = ei ejᵀ ∈ Cn×n and ϕij : Cn×n → C, A → aij

  • let

µn = ∑_{i,j,k=1}^{n} ϕik ⊗ ϕkj ⊗ Eij

so that AB = µn(A, B)

  • moral: matrix product is a 3-tensor

µn ∈ (Cn×n)∗ ⊗ (Cn×n)∗ ⊗ Cn×n

  • what is the minimal number of arithmetic operations required to multiply two n × n matrices?
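To make µn tangible (sketch mine; the flattening convention is an assumption): store µn as an n² × n² × n² hypermatrix of 0s and 1s, with matrices flattened to vectors in Cn². Contracting it against two flattened matrices reproduces their product.

```python
import numpy as np

n = 2
mu = np.zeros((n * n, n * n, n * n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            # coefficient of a_ik b_kj in the entry (AB)_ij
            mu[i * n + k, k * n + j, i * n + j] = 1.0

A, B = np.random.randn(n, n), np.random.randn(n, n)
AB = np.einsum('abc,a,b->c', mu, A.ravel(), B.ravel()).reshape(n, n)
assert np.allclose(AB, A @ B)   # µ_n(A, B) = AB
```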

SLIDE 28

tensor rank

  • for X ∈ Cℓ×m×n and u ⊗ v ⊗ w := (ui vj wk) ∈ Cℓ×m×n,

rank(X) = min { r : X = ∑_{p=1}^{r} λp up ⊗ vp ⊗ wp }

  • number of arithmetic operations:

ω := inf { α : rank(µn) = O(n^α) }

  • naïve: O(n³)
  • earliest: O(n^{log₂ 7}) [Strassen, 1969] (sketch below)
  • latest: O(n^{2.3727}) [Williams, 2011]
  • exact: O(n^ω)
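Strassen's bound comes from an explicit rank-7 decomposition of µ2; a direct transcription of the standard formulas (code mine):

```python
import numpy as np

def strassen_2x2(A, B):
    # 7 multiplications instead of 8: the source of rank(µ_2) ≤ 7
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A, B = np.random.randn(2, 2), np.random.randn(2, 2)
assert np.allclose(strassen_2x2(A, B), A @ B)
```

Applied recursively to 2 × 2 block matrices, this yields the O(n^{log₂ 7}) algorithm.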
SLIDE 29

alternative formulation

  • what we have: matrix multiplication is bilinear operator

µn : Cn×n × Cn×n → Cn×n, (A, B) → AB

  • consider instead trilinear functional

µ̃n : Cn×n × Cn×n × Cn×n → C, (A, B, C) → tr(ABC)

  • easy to show µn isomorphic to µ̃n:

µn ∈ (Cn×n)∗ ⊗ (Cn×n)∗ ⊗ Cn×n ≃ (Cn×n)∗ ⊗ (Cn×n)∗ ⊗ (Cn×n)∗ ∋ µ̃n

  • moral: product of two matrices = trace of product of three matrices [Landsberg–LHL–Ye, 2013]

(A, B) → AB ⇔ (A, B, C) → tr(ABC)

  • works for rectangular matrices too:

µm,n,p(A, B, C) = tr(ABC) = ∑_{i,j,k=1}^{m,n,p} aij bjk cki

SLIDE 30

invariant theory

Theorem (first fundamental theorem of invariant theory)

Invariant ring C[V ∗r ⊕ V s]GL(V) is generated by contractions η(i,j) for 1 ≤ i ≤ r and 1 ≤ j ≤ s

  • write E = V ∗r ⊕ V s
  • C[E] set of all complex valued polynomials defined on E
  • C[E] is GL(V)-module with action given by

g · f(x) = f(g⁻¹x) where g ∈ GL(V), f ∈ C[E], x ∈ E

  • C[E]GL(V) subring of functions invariant under GL(V)
  • contraction η(i,j) : V ∗r ⊕ V s → C defined by

η(i,j)(f1, . . . , fr, v1, . . . , vs) = fi(vj) where fi ∈ V ∗ and vj ∈ V

SLIDE 31

bottom line

  • usual multiplication of square matrices is unique: in fact µn ∈ C[V ∗3 ⊕ V 3]GL(V) is given by

µn = η(1,3)η(2,1)η(3,2)

  • ditto for rectangular matrices:

µm,n,p = ηU(1,2) ηV(1,2) ηW(1,2)

  • usual matrix multiplication extends only to hypermatrices of even order: for d even, AB = C with

A = [ai1···id/2j1···jd/2] ∈ Cn1×···×nd,
B = [bj1···jd/2k1···kd/2] ∈ Cn1×···×nd,
C = [ci1···id/2k1···kd/2] ∈ Cn1×···×nd,

ci1···id/2k1···kd/2 = ∑_{j1,...,jd/2=1}^{n} ai1···id/2j1···jd/2 bj1···jd/2k1···kd/2

(a short einsum sketch follows the reference below)

[J. M. Landsberg, L.-H. Lim, and K. Ye, “Can one multiply higher-order tensors?” preprint, (2015)]
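The even-order product is a single contraction; a minimal einsum sketch for d = 4 (code mine):

```python
import numpy as np

n = 3
A = np.random.randn(n, n, n, n)
B = np.random.randn(n, n, n, n)

# contract the last d/2 indices of A with the first d/2 indices of B
C = np.einsum('ijkl,klmn->ijmn', A, B)

# d = 2 recovers the usual matrix product
M, N = np.random.randn(n, n), np.random.randn(n, n)
assert np.allclose(np.einsum('ij,jk->ik', M, N), M @ N)
```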

SLIDE 32

compressed sensing for higher-order tensors?

SLIDE 33

another question

  • can ℓ1-norm sparse vector recovery and nuclear norm low-rank matrix completion be generalized to higher-order tensors?
  • key to compressed sensing: ℓ1-norm is largest convex underestimator of ‘ℓ0-norm’ on ℓ∞-norm unit ball,

‖x‖1 ≤ nnz(x)‖x‖∞ for all x ∈ Cn [Donoho, Candès, Tao, et al.]

  • key to matrix completion: nuclear norm is largest convex underestimator of rank on spectral norm unit ball,

‖X‖∗ ≤ rank(X)‖X‖σ for all X ∈ Cm×n [Candès, Fazel, Recht, et al.] (both inequalities checked numerically below)

  • is this true for hypermatrices of order d > 2?
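A quick sanity check of the two inequalities in the vector/matrix setting (code mine):

```python
import numpy as np

x = np.random.randn(8) * (np.random.rand(8) < 0.5)        # a sparse vector
nnz = np.count_nonzero(x)
assert np.abs(x).sum() <= nnz * np.abs(x).max() + 1e-12   # ‖x‖₁ ≤ nnz(x)‖x‖∞

X = np.random.randn(6, 4) @ np.random.randn(4, 6)         # rank ≤ 4
s = np.linalg.svd(X, compute_uv=False)                    # singular values
assert s.sum() <= np.linalg.matrix_rank(X) * s[0] + 1e-9  # ‖X‖∗ ≤ rank(X)‖X‖σ
```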
SLIDE 34

nuclear/spectral norm

  • let X ∈ Cm×n×p
  • rank:

rank(X) = min { r : X = ∑_{i=1}^{r} λi ui ⊗ vi ⊗ wi }

  • nuclear norm [LHL–Comon, 2010], with unit-norm factors ‖ui‖ = ‖vi‖ = ‖wi‖ = 1:

‖X‖∗ = inf { ∑_{i=1}^{r} |λi| : X = ∑_{i=1}^{r} λi ui ⊗ vi ⊗ wi, r ∈ N }

  • spectral norm:

‖X‖σ = sup_{x,y,z≠0} |X(x, y, z)| / (‖x‖‖y‖‖z‖)

  • nuclear norm is dual to spectral norm [LHL–Comon, 2014]
  • is it true that ‖X‖∗ ≤ rank(X)‖X‖σ?

SLIDE 35

no

  • nuclear norm not even an underestimator of rank on spectral norm unit ball [LHL–Comon, 2013]
  • in fact, nuclear norm can be arbitrarily larger than rank on spectral norm unit ball
  • e.g., take µn ∈ Cn²×n²×n²; then [Strassen, 1969]

rank(µn) ≤ c n^{log₂ 7} for some c > 0

and [Derksen, 2013]

‖µn‖∗ = n³, ‖µn‖σ = 1

  • so

lim_{n→∞} ‖µn‖∗ / rank(µn) = ∞

[L.-H. Lim and P. Comon, “Blind multilinear identification,” IEEE Transactions on Information Theory, 60 (2014), no. 2, pp. 1260–1280]
SLIDE 36

higher-order KKT conditions?

SLIDE 37

notations

  • contraction of d-tensor A ∈ Rn1×···×nd and e-tensor B ∈ Rm1×···×me in the pth and qth indices, with np = mq, is the (d + e − 2)-tensor

⟨A, B⟩p:q := ∑_{k=1}^{np} ai1...ip−1kip+1...id bj1...jq−1kjq+1...je

  • if np1 = mq1, . . . , nps = mqs, then contraction of A and B in the (p1, . . . , ps)th and (q1, . . . , qs)th indices is the (d + e − 2s)-tensor

⟨A, B⟩p1,...,ps:q1,...,qs := ∑_{ip1,...,ips=1}^{np1,...,nps} ai1...id bj1...je

  • subblocks: for A ∈ Rn1×···×nd and nonempty Sk ⊆ {1, . . . , nk}, k = 1, . . . , d, the subblock AS1···Sd of A is

AS1···Sd := [aj1...jd]j1∈S1,...,jd∈Sd ∈ R|S1|×···×|Sd|

SLIDE 38

symmetry and positive definiteness

  • d-tensor A ∈ Rn×···×n symmetric if for all σ ∈ Sd,

aj1···jd = ajσ(1)···jσ(d)

  • e.g. (aijk) ∈ Rn×n×n is symmetric if

aijk = aikj = ajik = ajki = akij = akji

  • symmetrization: for d-tensor A ∈ Rn×···×n,

(Sym(A))i1...id := (1/d!) ∑_{σ∈Sd} aiσ(1)...iσ(d)

for ik = 1, . . . , n, k = 1, . . . , d (a direct implementation follows below)

  • positive semidefinite: d-tensor A ∈ Rn×···×n with ⟨A, x⊗d⟩ ≥ 0 for all x ∈ Rn, written A ⪰ 0; positive definite if ⟨A, x⊗d⟩ > 0 for all x ≠ 0, written A ≻ 0
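Symmetrization is a direct average over axis permutations (code mine):

```python
import numpy as np
from itertools import permutations
from math import factorial

def sym(A):
    d = A.ndim
    return sum(np.transpose(A, p) for p in permutations(range(d))) / factorial(d)

A = np.random.randn(4, 4, 4)
S = sym(A)
assert np.allclose(S, S.transpose(1, 0, 2))   # invariant under index swaps
assert np.allclose(sym(S), S)                 # Sym is a projection
```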

SLIDE 39

problem

  • consider nonlinear minimization problem:

minimize f(x)
subject to hi(x) = 0, i = 1, . . . , m,
gj(x) ≤ 0, j = 1, . . . , r,
x ∈ Ω    (1)

  • f, hi, gj are at least four times continuously differentiable and Ω is nonempty open
  • Feas = {x ∈ Ω : hi(x) = 0, i = 1, . . . , m, gj(x) ≤ 0, j = 1, . . . , r} denotes feasible set of (1)
  • for x ∈ Feas, I(x) = {j ∈ {1, . . . , r} : gj(x) = 0} denotes the active set at point x

SLIDE 40

Karush, Kuhn, Tucker, John

[photos: William Karush, circa 1987; Fritz John at NYU, circa 1987; Harold Kuhn and Albert Tucker, 1980]

  • W. Karush, “Minima of functions of several variables with inequalities as side constraints,” M.Sc. thesis, University of Chicago, Chicago, IL, 1939
  • F. John, “Extremum problems with inequalities as subsidiary conditions,” pp. 187–204, Studies and Essays for Courant 60th Birthday, Wiley, New York, NY, 1948
  • H. Kuhn and A. Tucker, “Nonlinear programming,” pp. 481–492, Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, CA, 1951
  • R. Cottle, “William Karush and the KKT theorem,” Doc. Math., Extra volume: Optimization stories (2012), pp. 255–269
SLIDE 41

usual KKT conditions

Theorem (first order KKT conditions)

Suppose x∗ ∈ Feas is a local minimum of (1) and the gradient vectors ∇hi(x∗), i = 1, . . . , m, ∇gj(x∗), j ∈ I(x∗), are linearly independent (LICQ). Then there exist v ∈ Rm and w ∈ Rr₊ such that

∇f(x∗) + ∑_{i=1}^{m} vi∇hi(x∗) + ∑_{j=1}^{r} wj∇gj(x∗) = 0, (2)

and

wjgj(x∗) = 0, j = 1, . . . , r. (3)

SLIDE 42

second order conditions

Theorem (second order KKT conditions)

Let

N := { d ∈ Rn : d ⊥ ∇hi(x∗), i = 1, . . . , m, d ⊥ ∇gj(x∗), j ∈ I(x∗) }.

Then the matrix

B := ∇2f(x∗) + ∑_{i=1}^{m} vi∇2hi(x∗) + ∑_{j=1}^{r} wj∇2gj(x∗)

is positive semidefinite on N, i.e.,

dᵀBd ≥ 0 for all d ∈ N, (4)

or B ⪰N 0 for short.
SLIDE 43

convention

without loss of generality, assume

N = {d ∈ Rn : dκ+1 = · · · = dn = 0}

and

B = [ A 0 B̄ ; 0 0 B̂ ; B̄ᵀ B̂ᵀ B̃ ], A ∈ Rp×p and A ≻ 0

(subblock names as on the next slide); define

C := ∇3f(x∗) + ∑_{i=1}^{m} vi∇3hi(x∗) + ∑_{j=1}^{r} wj∇3gj(x∗) ∈ Rn×n×n

and

D := ∇4f(x∗) + ∑_{i=1}^{m} vi∇4hi(x∗) + ∑_{j=1}^{r} wj∇4gj(x∗) ∈ Rn×n×n×n

font: a scalar, a vector, A matrix, A 3-tensor, A 4-tensor

SLIDE 44

subblocks

let R := {1, . . . , p}, S := {p + 1, . . . , κ}, and T := {κ + 1, . . . , n}; consider following subblocks:

  • A ∈ Rp×p: subblock BRR of B
  • B̄ ∈ Rp×(n−κ): subblock BRT of B
  • B̂ ∈ R(κ−p)×(n−κ): subblock BST of B
  • B̃ ∈ R(n−κ)×(n−κ): subblock BTT of B
  • C̄ ∈ Rp×(κ−p)×(κ−p): subblock CRSS of C
  • Ĉ ∈ R(κ−p)×(κ−p)×(κ−p): subblock CSSS of C
  • C̃ ∈ R(κ−p)×(κ−p)×(n−κ): subblock CSST of C
  • D̂ ∈ R(κ−p)×(κ−p)×(κ−p)×(κ−p): subblock DSSSS of D
SLIDE 45

notations

  • let I(x∗) = {j1, . . . , jτ} and M ∈ R(n−κ)×(n−κ) be

M = [ [∇h1(x∗)]ᵀ, . . . , [∇hm(x∗)]ᵀ, [∇gj1(x∗)]ᵀ, . . . , [∇gjτ(x∗)]ᵀ ]ᵀ

  • let F ∈ R(n−κ)×n×n be 3-tensor defined by

⟨F, ei⟩1:1 = ∇2hi(x∗) for i = 1, . . . , m
⟨F, ei⟩1:1 = ∇2gji−m(x∗) for i = m + 1, . . . , n − κ

  • let U := {1, . . . , n − κ} and

F̂ := FUSS ∈ R(n−κ)×(κ−p)×(κ−p), the (U, S, S)-subblock of F
F′ := FUSR ∈ R(n−κ)×(κ−p)×p, the (U, S, R)-subblock of F
F̃ := FUST ∈ R(n−κ)×(κ−p)×(n−κ), the (U, S, T)-subblock of F

  • let F̄ ∈ Rp×(κ−p)×(n−κ) be 3-tensor defined by

F̄ijk = F′kji for all possible i, j, k

SLIDE 46

more notations

  • let G ∈ R(n−κ)×n×n×n be 4-tensor with

⟨G, ei⟩1:1 = ∇3hi(x∗) for all i = 1, . . . , m
⟨G, ei⟩1:1 = ∇3gji−m(x∗) for all i = m + 1, . . . , n − κ

  • let Ĝ := GUSSS ∈ R(n−κ)×(κ−p)×(κ−p)×(κ−p) be the (U, S, S, S)-subblock of G
  • define 3-tensor W by

W := C − 2⟨F, BM⁻¹⟩3:2 − ⟨BM⁻¹, F⟩2:1

  • define 4-tensor H by

H := D + 3⟨M⁻ᵀBM⁻¹, F ⊗ F⟩1,2:1,4 − 6⟨M⁻¹, C ⊗ F⟩1,2:3,4 + 4⟨BM⁻¹, 3⟨M⁻¹, F ⊗ F⟩1,2:3,4 − G⟩2:1

SLIDE 47

higher-order KKT conditions

Theorem (Hu–LHL, 2015)

Let x∗ ∈ Feas be a local minimum of (1) and all notations be as before. Then we must have

3 Sym(⟨BM⁻¹, F⟩2:1) − C = 0 and H − 3⟨A⁻¹, W ⊗ W⟩1,2:1,4 ⪰ 0.

If x∗ ∈ Feas satisfies (2), (3), and (4), and the strict complementarity condition, i.e., wj > 0 for all j ∈ I(x∗), and if

3 Sym(⟨BM⁻¹, F⟩2:1) − C = 0 and H − 3⟨A⁻¹, W ⊗ W⟩1,2:1,4 ≻ 0,

then x∗ is a strict local minimum. No such conditions are possible for orders > 4.

[S. Hu and L.-H. Lim, “Higher-order KKT conditions,” preprint, (2015)]

SLIDE 48

simple example

  • consider problem

minimize f(x, y, z) = (1/4)xy³ + (3/2)y²z² − y³z − z⁴
subject to 2x + y − 2 = 0
x + 2y − 4 ≤ 0

  • KKT conditions satisfied at x∗ = (0, 2, 1) with Lagrange multipliers −10/3 and 14/3
  • null space of constraint gradients at this point is {x ∈ R³ : x = y = 0}
  • classical second order condition requires us to check ∂²f/∂z² — it is however zero
  • can we decide if x∗ is local minimum or not?

solution (Hu–LHL, 2013)

  • ∂³f/∂z³ = −24 ≠ 0 implies that x∗ cannot be a local minimum (verified symbolically below)
  • consequence of third order KKT condition
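The computation is easy to verify symbolically (code mine, using sympy):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Rational(1, 4)*x*y**3 + sp.Rational(3, 2)*y**2*z**2 - y**3*z - z**4
h, g = 2*x + y - 2, x + 2*y - 4
v, w = sp.Rational(-10, 3), sp.Rational(14, 3)   # Lagrange multipliers

pt = {x: 0, y: 2, z: 1}
# first order KKT: ∇f + v∇h + w∇g = 0 at x* = (0, 2, 1)
for s in (x, y, z):
    assert (sp.diff(f, s) + v*sp.diff(h, s) + w*sp.diff(g, s)).subs(pt) == 0

assert sp.diff(f, z, 2).subs(pt) == 0   # second order test is inconclusive
print(sp.diff(f, z, 3).subs(pt))        # -24 ≠ 0, so x* is not a local minimum
```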
SLIDE 49

unconstrained optimization

f twice continuously differentiable in neighborhood of x∗ ∈ Rn:

(i) if x∗ is a local minimum of f, then ∇f(x∗) = 0 and ∇2f(x∗) ⪰ 0
(ii) if ∇f(x∗) = 0 and ∇2f(x∗) ≻ 0, then x∗ is a strict local minimum of f

SLIDE 50

unconstrained optimization

  • WLOG

∇2f(x∗) = [ A 0 ; 0 0 ] with A ∈ Rp×p and A ≻ 0

  • {1, . . . , n} = {1, . . . , p} ∪ {p + 1, . . . , n} induces subblocks:

Ĉ ∈ R(n−p)×(n−p)×(n−p): (2, 2, 2)-subblock of ∇3f(x∗)
C̄ ∈ Rp×(n−p)×(n−p): (1, 2, 2)-subblock of ∇3f(x∗)
D̂ ∈ R(n−p)×(n−p)×(n−p)×(n−p): (2, 2, 2, 2)-subblock of ∇4f(x∗)

  • if x∗ is a local minimum of f, then

Ĉ = 0 and D̂ − 3⟨A⁻¹, C̄ ⊗ C̄⟩1,2:1,4 ⪰ 0

  • if Ĉ = 0 and D̂ − 3⟨A⁻¹, C̄ ⊗ C̄⟩1,2:1,4 ≻ 0, then x∗ is strict local minimum of f

SLIDE 51

thank you