A primal-dual algorithm for exponential-cone optimization (ICCOPT)




SLIDE 1

A primal-dual algorithm for exponential-cone optimization

ICCOPT Berlin, August 8th, 2019 joachim.dahl@mosek.com www.mosek.com

SLIDE 2

Conic optimization

Linear cone problem:

$$\text{minimize } c^T x \quad \text{subject to } Ax = b,\ x \in K,$$

with $K = K_1 \times K_2 \times \cdots \times K_p$ a product of proper cones.

Dual:

$$\text{maximize } b^T y \quad \text{subject to } c - A^T y = s,\ s \in K^*,$$

with $K^* = K_1^* \times K_2^* \times \cdots \times K_p^*$.

SLIDE 3

Conic optimization

MOSEK 9 supports the following symmetric cones,

  • linear, quadratic and semidefinite cones

and the nonsymmetric cones,

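  • three-dimensional power cone for $0 < \alpha < 1$,

$$K_{\mathrm{pow}}^{\alpha} = \{x \in \mathbb{R}^3 \mid x_1^{\alpha} x_2^{1-\alpha} \ge |x_3|,\ x_1, x_2 > 0\},$$

  • exponential cone

$$K_{\exp} = \mathrm{cl}\{x \in \mathbb{R}^3 \mid x_1 \ge x_2 \exp(x_3/x_2),\ x_2 > 0\}.$$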
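The two nonsymmetric cone definitions can be checked numerically with a small membership test. This is a minimal sketch, not part of MOSEK: the function names `in_exp_cone` and `in_pow_cone` and the tolerance `tol` are assumptions made for illustration. The closure in the exponential cone is handled by also accepting the boundary piece with x2 = 0, x1 ≥ 0, x3 ≤ 0.

```python
import math

def in_exp_cone(x, tol=1e-12):
    """Return True if x = (x1, x2, x3) lies in the closed exponential cone."""
    x1, x2, x3 = x
    if x2 > tol:
        try:
            bound = x2 * math.exp(x3 / x2)
        except OverflowError:
            return False  # exp overflowed, so no finite x1 can dominate it
        return x1 >= bound - tol
    # The closure adds the boundary piece {x1 >= 0, x2 = 0, x3 <= 0}.
    return abs(x2) <= tol and x1 >= -tol and x3 <= tol

def in_pow_cone(x, alpha, tol=1e-12):
    """Return True if x1^alpha * x2^(1 - alpha) >= |x3| with x1, x2 > 0."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    x1, x2, x3 = x
    if x1 <= 0.0 or x2 <= 0.0:
        return False
    return x1 ** alpha * x2 ** (1.0 - alpha) >= abs(x3) - tol

# (e, 1, 1) sits on the boundary x1 = x2 * exp(x3 / x2).
on_boundary = in_exp_cone((math.e, 1.0, 1.0))
outside = in_exp_cone((1.0, 1.0, 1.0))
closure_point = in_exp_cone((1.0, 0.0, -2.0))
pow_point = in_pow_cone((4.0, 1.0, 2.0), 0.5)
```

The tolerance only guards against rounding on boundary points; a solver would use its own feasibility tolerances.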

SLIDE 4

Self-concordant barriers

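Self-concordant barrier for $K_{\exp}$:

$$F(x) = -\log(x_2 \log(x_1/x_2) - x_3) - \log x_1 - \log x_2.$$

Conjugate barrier:

$$F_*(s) = \max\{-\langle x, s\rangle - F(x) : x \in \mathrm{int}(K)\}.$$

Standard properties (for $\tau > 0$ and $k \ge 1$):

$$F^{(k)}(\tau x) = \frac{1}{\tau^k} F^{(k)}(x), \qquad F^{(k+1)}(x)[x] = -k F^{(k)}(x),$$

$$-F'(x) \in \mathrm{int}(K^*), \qquad -F_*'(s) \in \mathrm{int}(K),$$

$$F'(-F_*'(s)) = -s, \qquad F''(-F_*'(s)) = [F_*''(s)]^{-1}.$$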
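The barrier and two of its homogeneity properties can be checked numerically. The sketch below is an illustration under stated assumptions: the names `barrier`, `gradient` and `numeric_gradient` are hypothetical, the gradient is derived by hand from the formula for F, and the barrier parameter is taken as ν = 3, which matches the three logarithmic terms.

```python
import math

NU = 3.0  # assumed barrier parameter of the three-term barrier

def barrier(x):
    """F(x) = -log(x2*log(x1/x2) - x3) - log(x1) - log(x2) on int(K_exp)."""
    x1, x2, x3 = x
    psi = x2 * math.log(x1 / x2) - x3  # positive on the interior of K_exp
    return -math.log(psi) - math.log(x1) - math.log(x2)

def gradient(x):
    """Hand-derived gradient F'(x)."""
    x1, x2, x3 = x
    log_ratio = math.log(x1 / x2)
    psi = x2 * log_ratio - x3
    return [
        -(x2 / x1) / psi - 1.0 / x1,
        -(log_ratio - 1.0) / psi - 1.0 / x2,
        1.0 / psi,
    ]

def numeric_gradient(x, h=1e-6):
    """Central finite differences, used only to cross-check gradient()."""
    g = []
    for i in range(3):
        up = list(x)
        dn = list(x)
        up[i] += h
        dn[i] -= h
        g.append((barrier(up) - barrier(dn)) / (2.0 * h))
    return g

x = [3.0, 1.0, 0.5]  # interior point: log(3) > 0.5
tau = 2.5
g = gradient(x)
g_scaled = gradient([tau * xi for xi in x])

# Logarithmic homogeneity: <F'(x), x> = -nu and F'(tau*x) = F'(x) / tau.
inner = sum(gi * xi for gi, xi in zip(g, x))
scale_gap = max(abs(tau * gs - gi) for gs, gi in zip(g_scaled, g))
fd_gap = max(abs(a - b) for a, b in zip(g, numeric_gradient(x)))
```

Here `inner` should equal -3 up to rounding, and `scale_gap` and `fd_gap` should be close to zero.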

SLIDE 5

Central path for conic problem

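Central path for the homogeneous model, parametrized by $\mu$:

$$A x_\mu - b \tau_\mu = \mu (A x - b \tau),$$

$$s_\mu + A^T y_\mu - c \tau_\mu = \mu (s + A^T y - c \tau),$$

$$c^T x_\mu - b^T y_\mu + \kappa_\mu = \mu (c^T x - b^T y + \kappa),$$

$$s_\mu = -\mu F'(x_\mu), \qquad x_\mu = -\mu F_*'(s_\mu), \qquad \kappa_\mu \tau_\mu = \mu,$$

or equivalently

$$\begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix} \begin{bmatrix} y_\mu \\ x_\mu \\ \tau_\mu \end{bmatrix} - \begin{bmatrix} 0 \\ s_\mu \\ \kappa_\mu \end{bmatrix} = \mu \begin{bmatrix} r_p \\ r_d \\ r_g \end{bmatrix},$$

$$s_\mu = -\mu F'(x_\mu), \qquad x_\mu = -\mu F_*'(s_\mu), \qquad \kappa_\mu \tau_\mu = \mu,$$

where

$$r_p := A x - b \tau, \quad r_d := c \tau - A^T y - s, \quad r_g := b^T y - c^T x - \kappa, \quad r_c := x^T s + \tau \kappa.$$

The sign of $r_g$ is chosen to match the third row of the block system.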
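As a concrete illustration of the residuals of the homogeneous model, the sketch below evaluates them for a tiny linear program in plain Python. The helper names (`matvec`, `rmatvec`, `residuals`) and the data are assumptions made for this example, and r_g follows the sign convention of the third row of the block system, b^T y - c^T x - kappa.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [dot(row, x) for row in A]

def rmatvec(A, y):
    """Compute A^T y without forming the transpose."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def residuals(A, b, c, x, y, s, tau, kappa):
    """Residuals r_p, r_d, r_g and complementarity r_c of the homogeneous model."""
    Ax = matvec(A, x)
    ATy = rmatvec(A, y)
    r_p = [Ax[i] - b[i] * tau for i in range(len(b))]
    r_d = [c[j] * tau - ATy[j] - s[j] for j in range(len(c))]
    r_g = dot(b, y) - dot(c, x) - kappa
    r_c = dot(x, s) + tau * kappa
    return r_p, r_d, r_g, r_c

# Hypothetical data: minimize x1 + 2*x2 + 3*x3 subject to x1 + x2 + x3 = 1, x >= 0.
A = [[1.0, 1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0, 3.0]

# A typical infeasible starting point.
start = residuals(A, b, c, x=[1.0, 1.0, 1.0], y=[0.0], s=[1.0, 1.0, 1.0],
                  tau=1.0, kappa=1.0)

# An optimal primal-dual pair with tau = 1, kappa = 0: every residual vanishes.
optimal = residuals(A, b, c, x=[1.0, 0.0, 0.0], y=[1.0], s=[0.0, 1.0, 2.0],
                    tau=1.0, kappa=0.0)
```

At the starting point the residuals are nonzero, while the optimal pair gives zero for all four quantities, which is the target the interior-point iterations drive towards.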

SLIDE 6

Scaling for nonsymmetric cones

Following Tunçel [5] we consider a scaling $W^T W \succ 0$ with

$$v = W x = W^{-T} s, \qquad \tilde{v} = W \tilde{x} = W^{-T} \tilde{s},$$

where $\tilde{x} := -F_*'(s)$ and $\tilde{s} := -F'(x)$. The centrality conditions $x = \mu \tilde{x}$, $s = \mu \tilde{s}$ can then be written symmetrically as $v = \mu \tilde{v}$, and we linearize the centrality condition $v = \mu \tilde{v}$ as

$$W \Delta x + W^{-T} \Delta s = \mu \tilde{v} - v.$$

SLIDE 7

An affine search-direction

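$$\begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix} \begin{bmatrix} \Delta y_a \\ \Delta x_a \\ \Delta \tau_a \end{bmatrix} - \begin{bmatrix} 0 \\ \Delta s_a \\ \Delta \kappa_a \end{bmatrix} = - \begin{bmatrix} r_p \\ r_d \\ r_g \end{bmatrix},$$

$$\Delta s_a + W^T W \Delta x_a = -s, \qquad \tau \Delta \kappa_a + \kappa \Delta \tau_a = -\kappa \tau,$$

satisfying $(\Delta x_a)^T \Delta s_a + \Delta \tau_a \Delta \kappa_a = 0$.

Let $\alpha_a \in (0, 1]$ denote the largest feasible step in the affine direction. We estimate a centering parameter as

$$\gamma := (1 - \alpha_a) \min\{(1 - \alpha_a)^2, 1/4\}.$$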
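The centering heuristic is a one-line formula; the sketch below evaluates it for a few affine step lengths. The function name `centering_parameter` is an assumption made for illustration.

```python
def centering_parameter(alpha_a):
    """gamma = (1 - alpha_a) * min((1 - alpha_a)^2, 1/4) for alpha_a in (0, 1]."""
    if not 0.0 < alpha_a <= 1.0:
        raise ValueError("alpha_a must lie in (0, 1]")
    return (1.0 - alpha_a) * min((1.0 - alpha_a) ** 2, 0.25)

# A full affine step needs no centering; a short one is capped by the 1/4 term.
gamma_full = centering_parameter(1.0)
gamma_half = centering_parameter(0.5)
gamma_short = centering_parameter(0.1)
```

A long affine step (alpha_a near 1) gives gamma near 0, so the combined direction stays close to the affine one, while the cap at 1/4 limits gamma to at most 0.25 even when the affine step is very short.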

SLIDE 8

A centering search-direction

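Let $\mu = (x^T s + \tau \kappa)/(\nu + 1)$.

$$\begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix} \begin{bmatrix} \Delta y_c \\ \Delta x_c \\ \Delta \tau_c \end{bmatrix} - \begin{bmatrix} 0 \\ \Delta s_c \\ \Delta \kappa_c \end{bmatrix} = (\gamma - 1) \begin{bmatrix} r_p \\ r_d \\ r_g \end{bmatrix},$$

$$W \Delta x_c + W^{-T} \Delta s_c = \gamma \mu \tilde{v} - v, \qquad \tau \Delta \kappa_c + \kappa \Delta \tau_c = \gamma \mu - \kappa \tau.$$

Constant decrease of residuals and complementarity:

$$A x^+ - b \tau^+ = (1 - \alpha(1 - \gamma)) \cdot r_p,$$

$$c \tau^+ - A^T y^+ - s^+ = (1 - \alpha(1 - \gamma)) \cdot r_d,$$

$$b^T y^+ - c^T x^+ - \kappa^+ = (1 - \alpha(1 - \gamma)) \cdot r_g,$$

$$(x^+)^T s^+ + \tau^+ \kappa^+ = (1 - \alpha(1 - \gamma)) \cdot r_c,$$

where $z^+ := z + \alpha \Delta z_c$.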
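The reduction factor for the linear residuals follows directly from the first block row of the Newton system. As a short worked step for the primal residual (the dual and gap residuals are handled the same way):

```latex
\begin{aligned}
A x^+ - b \tau^+
  &= (A x - b \tau) + \alpha \,(A \Delta x_c - b \Delta \tau_c) \\
  &= r_p + \alpha (\gamma - 1)\, r_p \\
  &= \bigl(1 - \alpha (1 - \gamma)\bigr)\, r_p .
\end{aligned}
```

With $\gamma = 0$ and a full step $\alpha = 1$ the residual would vanish in one iteration; a larger $\gamma$ trades residual reduction for centering.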

SLIDE 9

A higher-order corrector term

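Derivatives of $s_\mu = -\mu F'(x_\mu)$:

$$\dot{s}_\mu + \mu F''(x_\mu) \dot{x}_\mu = -F'(x_\mu),$$

$$\ddot{s}_\mu + \mu F''(x_\mu) \ddot{x}_\mu = -2 F''(x_\mu) \dot{x}_\mu - \mu F'''(x_\mu)[\dot{x}_\mu, \dot{x}_\mu].$$

Using $F''(x) x = -F'(x)$ and $F'''(x)[x] = -2 F''(x)$ we obtain

$$\ddot{s}_\mu + \mu F''(x_\mu) \ddot{x}_\mu = F'''(x_\mu)[\dot{x}_\mu, (F''(x_\mu))^{-1} \dot{s}_\mu].$$

We interpret $\dot{s}_\mu \approx -\mu \Delta s_a$ and $\dot{x}_\mu \approx -\mu \Delta x_a$, i.e.,

$$\Delta s_{\mathrm{cor}} + W^T W \Delta x_{\mathrm{cor}} = \tfrac{1}{2} F'''(x)[\Delta x_a, (F''(x))^{-1} \Delta s_a],$$

satisfying

$$x^T \Delta s_{\mathrm{cor}} + s^T \Delta x_{\mathrm{cor}} = -(\Delta x_a)^T \Delta s_a.$$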
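The step from the second-derivative equation to the compact third-derivative form can be spelled out as follows. This is a sketch of the intermediate algebra, writing $x$ for $x_\mu$ and using only the two homogeneity identities quoted above together with the symmetry of $F'''$ in its arguments:

```latex
\begin{aligned}
\dot{s}_\mu &= -F'(x) - \mu F''(x)\dot{x}_\mu
             = F''(x)\,\bigl(x - \mu \dot{x}_\mu\bigr)
  \quad\Longrightarrow\quad
  (F''(x))^{-1}\dot{s}_\mu = x - \mu \dot{x}_\mu, \\[4pt]
F'''(x)\bigl[\dot{x}_\mu, (F''(x))^{-1}\dot{s}_\mu\bigr]
  &= F'''(x)[\dot{x}_\mu, x] - \mu F'''(x)[\dot{x}_\mu, \dot{x}_\mu] \\
  &= -2F''(x)\dot{x}_\mu - \mu F'''(x)[\dot{x}_\mu, \dot{x}_\mu].
\end{aligned}
```

The last line is exactly the right-hand side of the second-derivative equation, which gives the compact form.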

SLIDE 10

Combined centering-corrector direction

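A combined centering-corrector direction:

$$\begin{bmatrix} 0 & A & -b \\ -A^T & 0 & c \\ b^T & -c^T & 0 \end{bmatrix} \begin{bmatrix} \Delta y \\ \Delta x \\ \Delta \tau \end{bmatrix} - \begin{bmatrix} 0 \\ \Delta s \\ \Delta \kappa \end{bmatrix} = (\gamma - 1) \begin{bmatrix} r_p \\ r_d \\ r_g \end{bmatrix},$$

$$W \Delta x + W^{-T} \Delta s = \gamma \mu \tilde{v} - v + \tfrac{1}{2} W^{-T} F'''(x)[\Delta x_a, (F''(x))^{-1} \Delta s_a],$$

$$\tau \Delta \kappa + \kappa \Delta \tau = \gamma \mu - \tau \kappa - \Delta \tau_a \Delta \kappa_a.$$

All residuals and the complementarity decrease by the factor $(1 - \alpha(1 - \gamma))$.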

SLIDE 11

Computing the scaling matrix

Theorem (Schnabel [4])

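Let $S, Y \in \mathbb{R}^{n \times p}$ have full rank $p$. Then there exists $H \succ 0$ such that $HS = Y$ if and only if $Y^T S \succ 0$.

Let

$$S := \begin{pmatrix} x & \tilde{x} \end{pmatrix}, \qquad Y := \begin{pmatrix} s & \tilde{s} \end{pmatrix}$$

both be full rank. As a consequence of the theorem (for $n = 3$),

$$H = Y (Y^T S)^{-1} Y^T + z z^T,$$

where $S^T z = 0$, $z \ne 0$, and

$$\det(Y^T S) = (x^T s) \cdot (\tilde{x}^T \tilde{s}) - \nu^2 > 0,$$

which vanishes as the iterates approach the central path.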

SLIDE 12

Computing the scaling matrix

Expanding the BFGS update [4]

$$\hat{H} = H_0 + Y (Y^T S)^{-1} Y^T - H_0 S (S^T H_0 S)^{-1} S^T H_0,$$

for $H_0 \succ 0$ gives the scaling by Tunçel [5] and Myklebust [2], i.e.,

$$\hat{z} \hat{z}^T = H_0 - H_0 S (S^T H_0 S)^{-1} S^T H_0.$$

We choose $H_0 := \mu F''(x)$. In other words, $W^T W = \hat{H} \approx \mu F''(x)$ and satisfies

$$W^T W x = s, \qquad W^T W \tilde{x} = \tilde{s}.$$
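The two secant conditions can be verified numerically from the BFGS formula. The sketch below is an illustration only: to keep it free of a Newton solve for the conjugate gradient, it swaps in the nonnegative orthant in three dimensions (barrier -sum(log x_i), nu = 3, shadow iterates 1/s and 1/x) as a stand-in for the exponential cone. The helper names (`transpose`, `matmul`, `inv2`, `bfgs_scaling`) and the data are assumptions made for the example.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def bfgs_scaling(x, s, x_t, s_t, H0):
    """H = H0 + Y (Y^T S)^-1 Y^T - H0 S (S^T H0 S)^-1 S^T H0, H0 symmetric."""
    S = transpose([x, x_t])  # n x 2 with columns x and x~
    Y = transpose([s, s_t])  # n x 2 with columns s and s~
    H0S = matmul(H0, S)
    gain = matmul(matmul(Y, inv2(matmul(transpose(Y), S))), transpose(Y))
    loss = matmul(matmul(H0S, inv2(matmul(transpose(S), H0S))), transpose(H0S))
    n = len(x)
    return [[H0[i][j] + gain[i][j] - loss[i][j] for j in range(n)] for i in range(n)]

# Stand-in cone: nonnegative orthant, so x~ = -F*'(s) = 1/s and s~ = -F'(x) = 1/x.
nu = 3
x = [1.0, 2.0, 0.5]
s = [0.8, 0.6, 1.5]
x_t = [1.0 / si for si in s]
s_t = [1.0 / xi for xi in x]
mu = sum(xi * si for xi, si in zip(x, s)) / nu

# H0 = mu * F''(x), which is diagonal for the orthant barrier.
H0 = [[mu / x[i] ** 2 if i == j else 0.0 for j in range(3)] for i in range(3)]
H = bfgs_scaling(x, s, x_t, s_t, H0)

Hx = [sum(H[i][j] * x[j] for j in range(3)) for i in range(3)]
Hx_t = [sum(H[i][j] * x_t[j] for j in range(3)) for i in range(3)]
```

`Hx` should reproduce `s` and `Hx_t` should reproduce `s_t` up to rounding. The same formula applies to the exponential cone once the shadow iterates are available.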

SLIDE 13

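Tunçel's scaling bounds

Let $\mu := (x^T s)/\nu$ and $\tilde{\mu} := (\tilde{x}^T \tilde{s})/\nu$. Tunçel defines

$$T_2(\xi, x, s) := \Bigl\{ H \succ 0 \;\Big|\; Hx = s,\ H\tilde{x} = \tilde{s},\ \frac{\mu}{\xi(\nu(\mu\tilde{\mu} - 1) + 1)} F''(x) \preceq H \preceq \frac{\xi(\nu(\mu\tilde{\mu} - 1) + 1)}{\mu} F''(\tilde{x}) \Bigr\}$$

and shows polynomial convergence for a potential reduction method if

$$\inf_{\xi} T_2(\xi, x, s) \le O(1), \qquad \forall x \in \mathrm{int}(K),\ s \in \mathrm{int}(K^*).$$

For symmetric cones $\xi^\star \le 4/3$.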

SLIDE 14

Bounds for the exponential cone

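Given $s \in \mathrm{int}(K_{\exp}^*)$ and $\mu > 0$. Let $h := (0, 0, \nu\mu/s_3)$ and

$$x_\alpha := h - \alpha(\mu F_*'(s) + h).$$

1. $x_\alpha \in K_{\exp}$ for $\alpha \in [0, \nu/2]$.

2. $\langle x_\alpha, s \rangle / \nu = \mu$.

3. $\langle \mu F'(x_\alpha), F_*'(s) \rangle = \dfrac{\nu - 1}{\alpha} + \dfrac{1}{\nu - (\nu - 1)\alpha}$.

4. $\| x_\alpha \|^2_{-\mu F_*'(s)} = (\alpha^2 - 2\alpha)\nu(\nu - 1) + \nu^2$.

Conjecture (Øbro [3]): For the exponential cone $\xi^\star \approx 1.2532$, i.e.,

$$\xi^\star = \left( \frac{2\nu}{\nu - 1} - \frac{2\sqrt{\nu}}{\sqrt{\nu - 1}} \right)^{-1} \left( \frac{(\nu - 1)^{3/2}}{\sqrt{\nu}} + \frac{1}{\nu - \sqrt{\nu(\nu - 1)}} - \nu + 1 \right)^{-1},$$

attained for $x_{\alpha^\star}$ with $\alpha^\star = \nu(\nu(\nu - 1))^{-1/2}$.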
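The conjectured constant can be evaluated directly. The sketch below reads the expression for ξ⋆ as a product of two inverse factors, which is an assumed reconstruction of the slide's layout; it is supported by the fact that it reproduces the quoted value 1.2532 for ν = 3. The function names `xi_star` and `alpha_star` are hypothetical.

```python
import math

def xi_star(nu):
    """Conjectured optimal bound, read as the product of two inverse factors."""
    first = 2.0 * nu / (nu - 1.0) - 2.0 * math.sqrt(nu) / math.sqrt(nu - 1.0)
    second = ((nu - 1.0) ** 1.5 / math.sqrt(nu)
              + 1.0 / (nu - math.sqrt(nu * (nu - 1.0)))
              - nu + 1.0)
    return 1.0 / (first * second)

def alpha_star(nu):
    """Step along x_alpha at which the conjectured bound is attained."""
    return nu / math.sqrt(nu * (nu - 1.0))

nu = 3.0  # barrier parameter of the exponential-cone barrier
xi = xi_star(nu)
alpha = alpha_star(nu)
```

For ν = 3 this gives ξ⋆ close to 1.2532 and α⋆ = sqrt(3/2), roughly 1.2247, which lies inside the admissible interval [0, ν/2] and is below the 4/3 bound quoted for symmetric cones.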

SLIDE 15

Øbro’s conjecture

[Figure: three-dimensional plot with axes x1, x2, x3.]

Plot of $K_{\exp} \cap \{x : x^T s = \nu\mu\}$, $D(-\mu F_*'(s), 1)$ and $x_{\alpha^\star}$ (red).

SLIDE 16

Implications for the exponential-cone

  • F(x) does not have negative curvature, i.e.,

$$F'''(x)[u] \not\preceq 0, \qquad \forall x \in \mathrm{int}(K_{\exp}),\ \forall u \in K_{\exp}.$$

  • But F″ is still bounded, for another reason.
  • Tunçel's potential-reduction method for exponential cones has polynomial-time complexity.
  • No equivalent proof yet for MOSEK's algorithm, even with optimal scalings.
  • The BFGS scaling appears to be bounded as well, and often coincides with the optimal scaling, leaving more to be proved.

SLIDE 17

Comparing MOSEK and ECOS conic solvers

[Figure: iteration count versus problem index for MOSEK, MOSEK n/c (no corrector) and ECOS.]

Iteration counts for different exponential cone problems, comparing MOSEK (with and without proposed corrector) and ECOS.

SLIDE 18

Comparing MOSEK and ECOS conic solvers

[Figure: solution time in seconds (logarithmic axis) versus problem index for MOSEK, MOSEK n/c (no corrector) and ECOS.]

Solution time for different exponential cone problems, comparing MOSEK (with and without proposed corrector) and ECOS.

SLIDE 19

Conclusions

  • Exponential cone optimization included in MOSEK 9.
  • Works very well in practice, especially with the proposed corrector.
  • Solution time, accuracy and number of iterations are on a par with the symmetric-cone implementation.

  • No proof of polynomial-time complexity yet.
  • More details can be found in [1].

SLIDE 20

References

[1] J. Dahl and E. D. Andersen. A primal-dual interior-point algorithm for nonsymmetric exponential-cone optimization. Technical report, MOSEK ApS, 2019.

[2] T. Myklebust and L. Tunçel. Interior-point algorithms for convex optimization based on primal-dual metrics. Technical report, University of Waterloo, 2014.

[3] M. Øbro. Conic optimization with exponential cones. Master's thesis, Technical University of Denmark, 2019.

[4] R. B. Schnabel. Quasi-Newton methods using multiple secant equations. Technical report, Colorado Univ., Boulder, Dept. Comp. Sci., 1983.

[5] L. Tunçel. Generalization of primal-dual interior-point methods to convex optimization problems in conic form. Foundations of Computational Mathematics, 1:229–254, 2001.
