An Efficient Affine-Scaling Algorithm for Hyperbolic Programming – PowerPoint PPT Presentation



SLIDE 1

An Efficient Affine-Scaling Algorithm for Hyperbolic Programming

Jim Renegar – joint work with Mutiara Sondjaja


SLIDE 2

Euclidean space

A homogeneous polynomial p : E → ℝ is hyperbolic if there is a vector e ∈ E such that for all x ∈ E, the univariate polynomial t ↦ p(x + te) has only real roots.

“ p is hyperbolic in direction e ”

Example: E = S^{n×n} (the symmetric n×n matrices), p(X) = det(X), e = I (identity matrix) – then t ↦ p(X + tI) is the characteristic polynomial of −X.

All roots are real because symmetric matrices have only real eigenvalues.

The hyperbolicity cone Λ++ is the connected component of {x : p(x) ≠ 0} containing e. For the example, Λ++ = S^{n×n}_{++} (the cone of positive-definite matrices)

– the convexity of this particular cone is true of hyperbolicity cones in general . . .
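A quick numerical sanity check of this example – a sketch assuming NumPy, not part of the talk. The roots of t ↦ det(X + tI) are exactly the negatives of the eigenvalues of X, hence all real when X is symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random symmetric matrix X in E = S^{n x n}.
A = rng.standard_normal((n, n))
X = (A + A.T) / 2

# np.poly(-X) gives the coefficients of det(tI + X) = det(X + tI),
# i.e. the characteristic polynomial of -X.
coeffs = np.poly(-X)
roots = np.roots(coeffs)
assert np.max(np.abs(roots.imag)) < 1e-8      # all roots are real
# They agree (as multisets) with the negated eigenvalues of X:
assert np.allclose(np.sort(roots.real), np.sort(-np.linalg.eigvalsh(X)))
```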


SLIDE 3

Güler (1997) introduced hyperbolic programming, motivated largely by the realization that f(x) = − ln p(x) is a self-concordant barrier function – "O(√n) iterations to halve the duality gap" – where n is the degree of p.

A hyperbolic program is an optimization problem of the form

min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+    (HP)

– where Λ+ is the closure of Λ++.

Thm (Gårding, 1959): Λ++ is a convex cone

Güler showed the barrier functions f(x) = − ln p(x) possess many of the nice properties of X ↦ − ln det(X), although hyperbolicity cones in general are not symmetric (i.e., self-scaled).

SLIDE 4

min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+    (HP)

There are natural ways in which to "relax" HP to hyperbolic programs for lower-degree polynomials. For example, to obtain a relaxation of SDP . . .

Fix n, and for 1 ≤ k ≤ n let

σk(λ1, . . . , λn) := Σ_{j1<···<jk} λ_{j1} · · · λ_{jk}

– the elementary symmetric polynomial of degree k.

Then X ↦ σk(λ(X)) is a hyperbolic polynomial of degree k in direction E = I, and its hyperbolicity cone contains S^{n×n}_{++}. These polynomials can be evaluated efficiently via the FFT.

Perhaps relaxing SDPs in this and related ways will allow larger SDPs to be approximately solved efficiently. The relaxations easily generalize to all hyperbolic programs.
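As a small correctness check (a sketch assuming NumPy, not the FFT-based evaluation the slide refers to): σk(λ(X)) can be read off the characteristic polynomial, since det(tI − X) = Σ_k (−1)^k σk(λ(X)) t^{n−k}:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n, k = 5, 3
B = rng.standard_normal((n, n))
X = (B + B.T) / 2
lam = np.linalg.eigvalsh(X)

# sigma_k(lambda(X)) straight from the definition ...
sigma_k = sum(np.prod(lam[list(J)]) for J in combinations(range(n), k))

# ... equals (-1)^k times the k-th coefficient of det(tI - X).
coeffs = np.poly(X)               # highest degree first, monic
assert np.isclose(sigma_k, (-1)**k * coeffs[k])
```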

SLIDE 5

min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+    (HP)

barrier function f(x) = − ln p(x), with gradient g(x) and Hessian H(x) – the Hessian is positive-definite for all x ∈ Λ++.

"local inner product at e ∈ Λ++": ⟨u, v⟩_e := ⟨u, H(e)v⟩
– the induced norm: ‖v‖_e = √⟨v, v⟩_e
– "Dikin ellipsoids": B̄_e(e, r) = {x : ‖x − e‖_e ≤ r}

The gist of the original affine-scaling method due to Dikin is simply: given a strictly feasible point e for HP and an appropriate value r > 0, move from e to the optimal solution e+ for

min ⟨c, x⟩ s.t. Ax = b, x ∈ B̄_e(e, r)

Dikin focused on linear programming and chose r = 1 (giving the largest Dikin ellipsoids contained in R^n_+).

also: Vanderbei, Meketon and Freedman (1986)
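For the LP case the Dikin step has a closed form: with barrier −Σ log x_i, H(e) = diag(1/e_i²), and minimizing ⟨c, x⟩ over {Ax = b, ‖x − e‖_e ≤ r} reduces to one linear solve. A minimal sketch (assuming NumPy; `dikin_step` and the toy LP are our own illustration, not code from the talk):

```python
import numpy as np

def dikin_step(A, b, c, e, r=0.99):
    """One affine-scaling step for  min c.x  s.t.  Ax = b, x >= 0.

    Minimizes c.x over the Dikin ellipsoid {x : Ax = b, ||x - e||_e <= r},
    where ||v||_e = ||v / e|| comes from H(e) = diag(1/e_i^2).
    """
    H_inv = np.diag(e**2)                         # H(e)^{-1}
    # Lagrange conditions give the affine-scaling direction d with Ad = 0:
    y = np.linalg.solve(A @ H_inv @ A.T, A @ H_inv @ c)
    d = H_inv @ (c - A.T @ y)
    return e - r * d / np.linalg.norm(d / e)      # step of e-norm r

# Tiny example:  min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])                          # strictly feasible start
for _ in range(20):
    x = dikin_step(A, b, c, x)
```

Since r < 1, each iterate stays strictly positive; the iterates converge to the optimal vertex (1, 0) of this toy problem.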

SLIDE 6

In the mid-1980s, there was considerable effort trying to prove that Dikin's affine-scaling method runs in polynomial time (perhaps with choice r < 1). The efforts mostly ceased when, in 1986, Shub and Megiddo showed that the "infinitesimal version" of the algorithm can come near all vertices of a Klee–Minty cube.

Nevertheless, several algorithms with spirit similar to Dikin's method have been shown to halve the duality gap in polynomial time:

Monteiro, Adler and Resende 1990 – LP and convex QP
Jansen, Roos and Terlaky 1996 – LP; 1997 – PSD LCP problems
Sturm and Zhang 1996 – SDP
Chua 2007 – symmetric cone programming

– some use ellipsoidal cones rather than ellipsoids; others use "scaling points" and "V-space". These algorithms are primal-dual methods and rely heavily on the cones being self-scaled.

Our framework shares some strong connections to the one developed by Chek Beng Chua, to whom we are indebted.

SLIDE 7

min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+    (HP)   →   min ⟨c, x⟩ s.t. Ax = b, x ∈ K_e(α)    (QP_e(α))

For e ∈ Λ++ and 0 < α < √n, let K_e(α) := {x : ⟨e, x⟩_e ≥ α ‖x‖_e}

– this happens to be the smallest cone containing the Dikin ellipsoid B̄_e(e, √(n − α²)).

Keep in mind that the cone grows in size as α decreases.

Definition: Swath(α) = {e ∈ Λ++ : Ae = b and QP_e(α) has an optimal solution}

Prop: Swath(0) = Central Path

Thus, α can be regarded as a measure of the proximity of points in Swath(α) to the central path.
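The "smallest cone containing the Dikin ellipsoid" claim can be checked numerically in the SDP case at E = I, where ⟨U, V⟩_I = tr(UV), ‖V‖_I is the Frobenius norm, and K_I(α) = {X : tr(X) ≥ α‖X‖_F}. A sketch assuming NumPy (the sampling scheme is our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 5, 0.9
r = np.sqrt(n - alpha**2)       # Dikin-ellipsoid radius from the slide

for _ in range(1000):
    # random point of the ball ||X - I||_F <= r
    G = rng.standard_normal((n, n))
    D = (G + G.T) / 2
    D *= rng.uniform(0, r) / np.linalg.norm(D)    # scale into the ball
    X = np.eye(n) + D
    # every such X should lie in K_I(alpha) = {X : tr X >= alpha ||X||_F}
    assert np.trace(X) >= alpha * np.linalg.norm(X) - 1e-9
```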

SLIDE 8

Let x_e(α) = optimal solution of QP_e(α) (assuming e ∈ Swath(α))

– the main work in computing x_e(α) lies in solving a system of linear equations.

We assume 0 < α < 1, in which case Λ+ ⊆ K_e(α) – thus, QP_e(α) is a relaxation of HP – hence,

optimal value of HP ≥ ⟨c, x_e(α)⟩

Current iterate: e ∈ Λ++. Next iterate will be e′, a convex combination of e and x_e(α):

e′ = (1/(1+t)) (e + t x_e(α))

The choice of t is made through duality . . .
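The containment Λ+ ⊆ K_e(α) for 0 < α < 1 is easy to see in the SDP case at E = I: for PSD X, tr X = Σλ_i ≥ √(Σλ_i²) = ‖X‖_F since the eigenvalues are nonnegative. A quick numerical check, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 6, 0.75              # any 0 < alpha < 1

# SDP case at E = I: Lambda_+ is the PSD cone and
# K_I(alpha) = {X : tr X >= alpha * ||X||_F}.
for _ in range(1000):
    B = rng.standard_normal((n, n))
    X = B @ B.T                  # random PSD matrix
    # tr X = sum(lam) >= sqrt(sum(lam^2)) = ||X||_F  since lam >= 0
    assert np.trace(X) >= alpha * np.linalg.norm(X)
```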


SLIDE 11

max bᵀy s.t. A*y + s = c, s ∈ Λ+*    (HP*)   →   max bᵀy s.t. A*y + s = c, s ∈ K_e(α)*    (QP_e(α)*)

First-order optimality conditions for x_e (the optimal solution of QP_e(α)) yield an optimal solution (y_e, s_e) for QP_e(α)*.

Moreover, (y_e, s_e) is feasible for HP* – because Λ+ ⊆ K_e(α) and hence K_e(α)* ⊆ Λ+*.

primal-dual feasible pair: e for HP, (y_e, s_e) for HP*

duality gap: gap_e := ⟨c, e⟩ − bᵀy_e

SLIDE 12
Current iterate: e ∈ Λ++. Next iterate will be a convex combination of e and x_e:

e(t) = (1/(1+t)) (e + t x_e)

Want t to be large so as to improve the primal objective value, but also want e(t) ∈ Swath(α).

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that:
• e(t) ∈ Λ++
• s_e ∈ int(K_{e(t)}(α)*)

– consequently, e(t) is strictly feasible for QP_{e(t)}(α) and (y_e, s_e) is strictly feasible for QP_{e(t)}(α)* – hence, e(t) ∈ Swath(α).

SLIDE 13

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that

t ≥ (1/2) α / ‖x_e‖_e

– and thus ensure good improvement in the primal objective value if, say, ‖x_e‖_e ≤ √n.

Want t to be large so as to improve the primal objective value, but also want e(t) ∈ Swath(α).

SLIDE 14

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that:

• s_e ∈ K_{e(t)}(β)*, where β = α √((1+α)/2)

– which implies s_e is "deep within" K_{e(t)}(α)* – and hence (y_e, s_e) is "very strongly" feasible for QP_{e(t)}(α)*.

SLIDE 15

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that:

1. There is "good" improvement in the primal objective value if ‖x_e‖_e ≤ √n
2. (y_e, s_e) is "very strongly" feasible for QP_{e(t)}(α)*

Sequence of iterates: e0, e1, e2, . . . – write x_i and (y_i, s_i) rather than x_{e_i} and (y_{e_i}, s_{e_i}). If i > 0, then

1. ‖x_i‖_{e_i} ≤ √n ⇒ ⟨c, e_{i+1}⟩ ≪ ⟨c, e_i⟩
2. (y_{i−1}, s_{i−1}) is "very strongly" feasible for QP*_{e_i}

On the other hand, we show

3. (‖x_i‖_{e_i} ≥ √n) ∧ (2.) ⇒ bᵀy_i ≫ bᵀy_{i−1}

In this manner we establish the Main Theorem . . .

SLIDE 16

Main Thm: Sequence of iterates e0, e1, e2, . . .

• The primal objective value improves monotonically, and so does the dual objective value.
• If i, k ≥ 0, then

gap_{e_i} / gap_{e_{i+k}} ≥ ( 1 + α √((1 − α)/(8n)) )^{k−1}

"We choose t to be the minimizer of a particular quadratic polynomial."

In fact, the theorem holds if one simply chooses t = (1/2) α / ‖x_i‖_{e_i}, but choosing t to be the minimizer can result in steps that are far longer.

So what is the particular quadratic polynomial?

SLIDE 17

K_e(α) := {x : ⟨e, x⟩_e ≥ α ‖x‖_e}

Special Case of SDP: E ∈ S^{n×n}_{++}, X_E optimal for QP_E(α), (y_E, S_E) optimal for QP_E(α)*. Let E(t) = (1/(1+t)) (E + t X_E).

Here is the quadratic polynomial: q(t) := tr( ((E + t X_E) S_E)² )

Prop: (t ≥ 0) ∧ (E(t) ≻ 0) ⇒ min{ β : S_E ∈ K_{E(t)}(β)* } = √(n − 1/q(t))

Prop: The minimizer t̄ of q satisfies t̄ > (1/2) α / ‖X_E‖_E, E(t̄) ≻ 0, and √(n − 1/q(t̄)) ≤ α √((1+α)/2).

Corollary: S_E is "deep within" K_{E(t̄)}(α)* for the minimizer t̄ of q.
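That q really is quadratic in t follows from expanding tr(((E + tX_E)S_E)²) = tr((E S_E)²) + 2t·tr(E S_E X_E S_E) + t²·tr((X_E S_E)²). A numerical check, assuming NumPy and using generic symmetric matrices rather than actual iterates:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

def rand_sym(rng, n):
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2

E, X, S = (rand_sym(rng, n) for _ in range(3))

# q(t) = tr(((E + t X) S)^2) expands to c0 + c1 t + c2 t^2:
ES, XS = E @ S, X @ S
c0 = np.trace(ES @ ES)
c1 = 2 * np.trace(ES @ XS)
c2 = np.trace(XS @ XS)

for t in rng.uniform(-2, 2, size=20):
    M = (E + t * X) @ S
    direct = np.trace(M @ M)
    assert np.isclose(direct, c0 + c1 * t + c2 * t**2)
# when c2 > 0 the minimizer is available in closed form: t_bar = -c1 / (2*c2)
```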

SLIDE 18

Hyperbolic Programming in General: e ∈ Λ++, x_e optimal for QP_e(α), (y_e, s_e) optimal for QP_e(α)*. Let e(t) = (1/(1+t)) (e + t x_e).

As happened for SDP, we would like to have an easily computable function q̃ for which

(t ≥ 0) ∧ (e(t) ∈ Λ++) ⇒ min{ β : s_e ∈ K_{e(t)}(β)* } = √(n − 1/q̃(t))

However, in general the resulting function q̃ need not be a quadratic polynomial, nor do we see any reason that it should necessarily be efficiently computable – in fact, about the most we know is that q̃ is semialgebraic.

But we do know how to obtain a quadratic polynomial q which serves as an appropriate upper bound to the function q̃ – we do this by leveraging our SDP result with the (very deep) Helton–Vinnikov Theorem for hyperbolicity cones.

SLIDE 19

Helton-Vinnikov Theorem: If p : E → R is hyperbolic in direction e and of degree n, and if L is a 3-dimensional subspace of E containing e, then there exists a linear transformation T : L → Sn×n satisfying T(e) = I and p(x) = p(e) det (T(x)) for all x ∈ L .


SLIDE 20

Luckily, the resulting quadratic polynomial always can be efficiently computed:

• First compute the five leading coefficients a_n, a_{n−1}, a_{n−2}, a_{n−3}, a_{n−4} of the univariate polynomial

γ ↦ p(x_e + γe) = Σ_i a_i γ^i

• Then compute

κ1 = a_{n−1}/a_n
κ2 = (a_{n−1}/a_n)² − a_{n−2}/a_n
κ3 = (a_{n−1}/a_n)³ − (3/2)(a_{n−1}/a_n)(a_{n−2}/a_n) + (1/2)(a_{n−3}/a_n)
κ4 = (a_{n−1}/a_n)⁴ − 2(a_{n−1}/a_n)²(a_{n−2}/a_n) + (1/2)(a_{n−2}/a_n)² + (2/3)(a_{n−1}/a_n)(a_{n−3}/a_n) − (1/6)(a_{n−4}/a_n)

• The desired quadratic polynomial is t ↦ at² + bt + c, where

a = κ1² κ2 − 2α² κ1 κ3 + α⁴ κ4
b = 2α⁴ κ3 − 2κ1³
c = (n − α²) κ1²
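The recipe above is a handful of arithmetic operations once the five leading coefficients are in hand. A direct transcription of the slide's formulas as reconstructed here (the function name and the smoke test are our own; Python, no dependencies):

```python
def kappa_quadratic(a_lead, alpha, n):
    """Coefficients (A, B, C) of the quadratic t -> A t^2 + B t + C.

    a_lead = [a_n, a_{n-1}, a_{n-2}, a_{n-3}, a_{n-4}] -- the five leading
    coefficients of gamma -> p(x_e + gamma e).
    """
    an, an1, an2, an3, an4 = a_lead
    r1, r2, r3, r4 = an1/an, an2/an, an3/an, an4/an   # ratios a_{n-k}/a_n
    k1 = r1
    k2 = r1**2 - r2
    k3 = r1**3 - 1.5*r1*r2 + 0.5*r3
    k4 = r1**4 - 2*r1**2*r2 + 0.5*r2**2 + (2/3)*r1*r3 - (1/6)*r4
    A = k1**2*k2 - 2*alpha**2*k1*k3 + alpha**4*k4
    B = 2*alpha**4*k3 - 2*k1**3
    C = (n - alpha**2) * k1**2
    return A, B, C

# smoke test: with a_{n-1} = ... = a_{n-4} = 0, every kappa vanishes
assert kappa_quadratic([1.0, 0, 0, 0, 0], 0.5, 4) == (0.0, 0.0, 0.0)
```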

SLIDE 21

Epilogue: Recently we learned of a paper on linear programming in which quadratic cones relaxing the non-negative orthant are used in devising a polynomial-time algorithm, albeit one with complexity "O(nL) iterations" rather than "O(√n L) iterations":

I.S. Litvinchev, "A circular cone relaxation primal interior point algorithm for LP," Optimization 52 (2003), 529–540.
