Extending MOSEK with exponential cones ISMP Bordeaux 2018 - - PowerPoint PPT Presentation

extending mosek with exponential cones
SMART_READER_LITE
LIVE PREVIEW

Extending MOSEK with exponential cones ISMP Bordeaux 2018 - - PowerPoint PPT Presentation

Extending MOSEK with exponential cones ISMP Bordeaux 2018 joachim.dahl@mosek.com www.mosek.com Conic optimization Linear cone problem: c T x minimize subject to Ax = b x K , with K = K 1 K 2 K p a product of proper


slide-1
SLIDE 1

Extending MOSEK with exponential cones

ISMP Bordeaux 2018 joachim.dahl@mosek.com www.mosek.com

slide-2
SLIDE 2

Conic optimization

Linear cone problem: minimize cTx subject to Ax = b x ∈ K, with K = K1 × K2 × · · · × Kp a product of proper cones. Dual: maximize bTy subject to c − ATy = s s ∈ K ∗, with K ∗ = K ∗

1 × K ∗ 2 × · · · × K ∗ p .

slide-3
SLIDE 3

Symmetric cones (supported by MOSEK 8)

  • the nonnegative orthant

K n

l := {x ∈ Rn | xj ≥ 0, j = 1, . . . , n},

  • the quadratic cone

K n

q = {x ∈ Rn | x1 ≥

  • x2

2 + · · · + x2 n

1/2},

  • the rotated quadratic cone

K n

r = {x ∈ Rn | 2x1x2 ≥ x2 3 + . . . x2 n, x1, x2 ≥ 0}.

  • the semidefinite matrix cone

K n

s = {x ∈ Rn(n+1)/2 | zTmat(x)z ≥ 0, ∀z}.

slide-4
SLIDE 4

Nonsymmetric cones (supported by MOSEK 9)

  • the three-dimensional power cone

K α

pow = {x ∈ R3 | xα 1 x(1−α) 2

≥ |x3|, x1, x2 > 0}, for 0 < α < 1.

  • the exponential cone

Kexp = cl{x ∈ R3 | x1 ≥ x2 exp(x3/x2), x2 > 0}.

slide-5
SLIDE 5

Central path for conic problem

Central path for homogenous model parametrized by µ: Axµ − bτµ = µ(Ax − bτ) sµ + ATyµ − cτµ = µ(s + ATy − cτ) cTxµ − bTyµ + κµ = µ(cTx − bTy + κ) sµ = −µF ′(xµ), xµ = −µF ′

∗(sµ),

κµτµ = µ,

  • r equivalently

  A −b −AT c bT −cT     yµ xµ τµ   −   sµ κµ   = µ   rp rd rg   sµ = −µF ′(xµ), xµ = −µF ′

∗(sµ),

κµτµ = µ, rp := Ax−bτ, rd := cτ−ATy−s, rg := κ−cTx+bTy, rc := xTs+τκ.

slide-6
SLIDE 6

Scaling for nonsymmetric cones

Following Tun¸ cel [3] we consider a scaling W TW ≻ 0, v = Wx = W −Ts, ˜ v = W ˜ x = W −T ˜ s where ˜ x := −F ′

∗(s) and ˜

s := −F ′(x). The centrality conditions x = µ˜ x, s = µ˜ s can then be written symmetrically as v = µ˜ v, and we linearize the centrality condition v = µ˜ v as W ∆x + W −T∆s = −v + µ˜ v.

slide-7
SLIDE 7

A centering search-direction

Let µ := xTs + τκ ν + 1 with barrier parameter ν and centering γ.   A −b −AT c bT −cT     ∆yc ∆xc ∆τc   −   ∆sc ∆κc   = (γ − 1)   rp rd rg   W ∆xc + W −T∆sc = γµ˜ v − v, τ∆κc + κ∆τc = γµ − κτ, Constant decrease of residuals and complementarity: Ax+ − bτ + = η · rp, cτ + − ATy+ − s+ = η · rd, bTy+ − cTx+ − κ+ = η · rg, (x+)Ts+ + τ +κ+ = η · rc, where z+ := (z + α∆zc) and η = (1 − α(1 − γ)).

slide-8
SLIDE 8

A higher-order corrector term

Derivatives of sµ = −µF ′(xµ): ˙ sµ + µF ′′(xµ) ˙ xµ = −F ′(xµ), ¨ sµ + µF ′′(xµ)¨ xµ = −2F ′′(xµ) ˙ xµ − µF ′′′(xµ)[ ˙ xµ, ˙ xµ]. Since µ ˙ xµ = −[F ′′(xµ)]−1(F ′(xµ) + ˙ sµ) = xµ − [F ′′(xµ)]−1 ˙ sµ, we have µF ′′′(xµ)[ ˙ xµ, ˙ xµ] = F ′′′(xµ)[ ˙ xµ, xµ]

  • −2F ′′(xµ) ˙

−F ′′′(xµ)[ ˙ xµ, (F ′′(xµ))−1 ˙ sµ] so ¨ sµ + µF ′′(xµ)¨ xµ = F ′′′(xµ)[ ˙ xµ, (F ′′(xµ))−1 ˙ sµ].

slide-9
SLIDE 9

An affine search-direction

Affine search-direction:   A −b −AT c bT −cT     ∆ya ∆xa ∆τa   −   ∆sa ∆κa   = −   rp rd rg   W ∆xa + W −T∆sa = −v, τ∆κa + κ∆τa = −κτ, satisfies (∆xa)T∆sa + ∆τa∆κa = 0. Since ˙ sµ + µF ′′(xµ) ˙ xµ = −F ′(xµ) = sµ, we interpret ∆sa = − ˙ sµ and ∆xa = − ˙ xµ.

slide-10
SLIDE 10

A higher-order corrector term

From ¨ sµ + µF ′′(xµ)¨ xµ = F ′′′(xµ)[ ˙ xµ, (F ′′(xµ))−1 ˙ sµ] we define a corrector direction as W ∆xcor + W −T∆scor = 1 2W −TF ′′′(x)[∆xa, (F ′′(x))−1∆sa]. Note that sT∆xcor + xT∆scor = 1 2xTF ′′′(x)[∆xa, (F ′′(x))−1∆sa] = −(∆xa)T∆sa, condition for constant decrease of complementarity.

slide-11
SLIDE 11

A higher-order corrector term

  • Linear case,

1 2F ′′′(x)[∆xa, (F ′′(x))−1∆sa] = −diag(x)−1diag(∆xa)∆sa,

  • Semidefinite case,

1 2F ′′′(x)[∆xa, (F ′′(x))−1∆sa] = −1 2x−1∆xa∆sa − 1 2∆sa∆xax−1 = −(x−1) ◦ (∆xa∆sa),

  • Second-order cone case,

F ′′′(x)[(F ′′(x))−1u] = − 2 xTQx (uxTQ + QxuT − (xTu)Q) for Q = diag(1, −1, . . . , −1). Then F ′′′(x)[(F ′′(x))−1u]e = −2(x−1 ◦ u).

slide-12
SLIDE 12

Combined centering-corrector direction

A combined centering-corrector direction:   A −b −AT c bT −cT     ∆y ∆x ∆τ   −   ∆s ∆κ   = (γ − 1)   rp rd rg   W ∆x + W −1∆s = γµ˜ v − v + 1 2W −TF ′′′(x)[∆xa, (F ′′(x))−1∆sa], τ∆κ + κ∆τ = γµ − τκ − ∆τa∆κa. All residuals and complementarity decrease by η.

slide-13
SLIDE 13

Computing the scaling matrix

Theorem (Schnabel [2])

Let S, Y ∈ Rn×p have full rank p. Then there exists H ≻ 0 such that HS = Y if and only if Y TS ≻ 0. As a consequence H = Y (Y TS)−1Y T + ZZ T where STZ = 0, rank(Z) = n − p. We have n = 3, p = 2 and S :=

  • x

˜ x

  • ,

Y :=

  • s

˜ s

  • ,

with det(Y TS) = ν2(µ˜ µ − 1) ≥ 0 vanishing only on the central path.

slide-14
SLIDE 14

Computing the scaling matrix

Any scaling with n = 3 satisfies W TW = Y (Y TS)−1Y T + zzT where

  • x

˜ x T z = 0, z = 0. Expanding the BFGS update [2] H+ = H + Y (Y TS)−1Y T − HS(STHS)−1STH, for H ≻ 0 gives the scaling by Tun¸ cel [3] and Myklebust [1], i.e., zzT = H − HS(STHS)−1STH, with H = µF ′′(x).

slide-15
SLIDE 15

A negative result on complexity

Nesterov’s long-step Hessian estimation property holds if F ′′′(x)[u] 0, ∀x ∈ int(K), ∀u ∈ K. We have

F ′′′([1; 1; −1])[u] = u1   −9 6 3 6 −5 −3 3 −3 −2   + u2   6 −5 −3 −5 2 3 −3 3 2   + u3   3 −3 −2 −3 −3 2 −2 2 2   .

Not negative semidefinite for all u ∈ K.

slide-16
SLIDE 16

Comparing MOSEK and ECOS conic solvers

50 100 150 200 100 200 300 400

prob instance iterations

MOSEK (2) MOSEK w/o corr (29) ECOS (41)

Iteration counts for different exponential cone problems. Failures marked with ⋄.

slide-17
SLIDE 17

Comparing MOSEK and ECOS conic solvers

10- 1 100 101 102 103 10- 1 100 101 102 103

time MOSEK time other

MOSEK w/o corr (29) ECOS (41)

Solution time for different exponential cone problems. Failures marked with ⋄.

slide-18
SLIDE 18

Comparing MOSEK and ECOS conic solvers

50 100 150 200 10- 12 10- 10 10- 8 10- 6 10- 4 10- 2 100

feasibility measure

prob instance largest error

MOSEK MOSEK w/o corr ECOS

Feasibility measures for different exponential cone problems.

slide-19
SLIDE 19

Comparing MOSEK conic and MOSEK GP solvers

25 50 75 100 125 100 200 300 400

prob instance iterations

conic (3) GP primal (14) GP dual (2)

Iteration counts for different GPs. Failures marked with ⋄.

slide-20
SLIDE 20

Comparing MOSEK conic and MOSEK GP solvers

10- 1 100 101 102 103 104 10- 1 100 101 102 103 104

time conic time GP

GP primal (12) GP dual (2)

Solution time for different GPs. Failures marked with ⋄.

slide-21
SLIDE 21

Comparing MOSEK conic and MOSEK GP solvers

25 50 75 100 125 10- 12 10- 10 10- 8 10- 6 10- 4 10- 2 100

feasibility measure

prob instance largest error

conic GP primal GP dual

Feasibility measures for different GPs.

slide-22
SLIDE 22

References

[1] T. Myklebust and L. Tun¸ cel. Interior-point algorithms for convex optimization based on primal-dual metrics. Technical report, 2014. [2] R. B. Schnabel. Quasi-newton methods using multiple secant equations. Technical report, Colorado Univ., Boulder, Dept. Comp. Sci., 1983. [3] L. Tun¸ cel. Generalization of primal-dual interior-point methods to convex optimization problems in conic form. Foundations of Computational Mathematics, 1:229–254, 2001.