An Efficient Affine-Scaling Algorithm for Hyperbolic Programming – PowerPoint PPT Presentation



SLIDE 1

An Efficient Affine-Scaling Algorithm for Hyperbolic Programming

Jim Renegar – joint work with Mutiara Sondjaja


SLIDE 2

Euclidean space

A homogeneous polynomial p : E → ℝ is hyperbolic if there is a vector e ∈ E such that for all x ∈ E, the univariate polynomial t ↦ p(x + te) has only real roots.

“ p is hyperbolic in direction e ”

Example: E = S^{n×n} (the symmetric n×n matrices), p(X) = det(X), e = I (identity matrix) – then t ↦ p(X + tI) is the characteristic polynomial of −X.

All roots are real because symmetric matrices have only real eigenvalues.

The hyperbolicity cone Λ++ is the connected component of {x : p(x) ≠ 0} containing e. For the example, Λ++ = S^{n×n}_{++} (the cone of positive-definite matrices)

– the convexity of this particular cone is true of hyperbolicity cones in general . . .
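A quick numerical sanity check of this example – a sketch assuming NumPy, not part of the talk. The roots of t ↦ det(X + tI) are exactly the negatives of the eigenvalues of X, hence all real when X is symmetric:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random symmetric matrix X in E = S^{n x n}.
A = rng.standard_normal((n, n))
X = (A + A.T) / 2

# np.poly(-X) gives the coefficients of det(tI + X) = det(X + tI),
# i.e. the characteristic polynomial of -X.
coeffs = np.poly(-X)
roots = np.roots(coeffs)
assert np.max(np.abs(roots.imag)) < 1e-8      # all roots are real
# They agree (as multisets) with the negated eigenvalues of X:
assert np.allclose(np.sort(roots.real), np.sort(-np.linalg.eigvalsh(X)))
```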


SLIDE 3

Güler (1997) introduced hyperbolic programming, motivated largely by the realization that f(x) = − ln p(x) is a self-concordant barrier function – "O(√n) iterations to halve the duality gap" – where n is the degree of p.

A hyperbolic program is an optimization problem of the form

min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+    (HP)

– where Λ+ is the closure of Λ++.

Thm (Gårding, 1959): Λ++ is a convex cone

Güler showed the barrier functions f(x) = − ln p(x) possess many of the nice properties of X ↦ − ln det(X), although hyperbolicity cones in general are not symmetric (i.e., self-scaled).

SLIDE 4

min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+    (HP)

There are natural ways in which to "relax" HP to hyperbolic programs for lower-degree polynomials. For example, to obtain a relaxation of SDP . . .

Fix n, and for 1 ≤ k ≤ n let

σk(λ1, . . . , λn) := Σ_{j1<···<jk} λ_{j1} · · · λ_{jk}

– the elementary symmetric polynomial of degree k.

Then X ↦ σk(λ(X)) is a hyperbolic polynomial of degree k in direction E = I, and its hyperbolicity cone contains S^{n×n}_{++}. These polynomials can be evaluated efficiently via the FFT.

Perhaps relaxing SDPs in this and related ways will allow larger SDPs to be approximately solved efficiently. The relaxations easily generalize to all hyperbolic programs.
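As a small correctness check (a sketch assuming NumPy, not the FFT-based evaluation the slide refers to): σk(λ(X)) can be read off the characteristic polynomial, since det(tI − X) = Σ_k (−1)^k σk(λ(X)) t^{n−k}:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n, k = 5, 3
B = rng.standard_normal((n, n))
X = (B + B.T) / 2
lam = np.linalg.eigvalsh(X)

# sigma_k(lambda(X)) straight from the definition ...
sigma_k = sum(np.prod(lam[list(J)]) for J in combinations(range(n), k))

# ... equals (-1)^k times the k-th coefficient of det(tI - X).
coeffs = np.poly(X)               # highest degree first, monic
assert np.isclose(sigma_k, (-1)**k * coeffs[k])
```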

SLIDE 5

min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+    (HP)

barrier function f(x) = − ln p(x), with gradient g(x) and Hessian H(x) – the Hessian is positive-definite for all x ∈ Λ++.

"local inner product at e ∈ Λ++": ⟨u, v⟩_e := ⟨u, H(e)v⟩
– the induced norm: ‖v‖_e = √⟨v, v⟩_e
– "Dikin ellipsoids": B̄_e(e, r) = {x : ‖x − e‖_e ≤ r}

The gist of the original affine-scaling method due to Dikin is simply: given a strictly feasible point e for HP and an appropriate value r > 0, move from e to the optimal solution e+ for

min ⟨c, x⟩ s.t. Ax = b, x ∈ B̄_e(e, r)

Dikin focused on linear programming and chose r = 1 (giving the largest Dikin ellipsoids contained in R^n_+).

also: Vanderbei, Meketon and Freedman (1986)
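For the LP case the Dikin step has a closed form: with barrier −Σ log x_i, H(e) = diag(1/e_i²), and minimizing ⟨c, x⟩ over {Ax = b, ‖x − e‖_e ≤ r} reduces to one linear solve. A minimal sketch (assuming NumPy; `dikin_step` and the toy LP are our own illustration, not code from the talk):

```python
import numpy as np

def dikin_step(A, b, c, e, r=0.99):
    """One affine-scaling step for  min c.x  s.t.  Ax = b, x >= 0.

    Minimizes c.x over the Dikin ellipsoid {x : Ax = b, ||x - e||_e <= r},
    where ||v||_e = ||v / e|| comes from H(e) = diag(1/e_i^2).
    """
    H_inv = np.diag(e**2)                         # H(e)^{-1}
    # Lagrange conditions give the affine-scaling direction d with Ad = 0:
    y = np.linalg.solve(A @ H_inv @ A.T, A @ H_inv @ c)
    d = H_inv @ (c - A.T @ y)
    return e - r * d / np.linalg.norm(d / e)      # step of e-norm r

# Tiny example:  min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])                          # strictly feasible start
for _ in range(20):
    x = dikin_step(A, b, c, x)
```

Since r < 1, each iterate stays strictly positive; the iterates converge to the optimal vertex (1, 0) of this toy problem.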

SLIDE 6

In the mid-1980s, there was considerable effort trying to prove that Dikin's affine-scaling method runs in polynomial time (perhaps with choice r < 1). The efforts mostly ceased when, in 1986, Shub and Megiddo showed that the "infinitesimal version" of the algorithm can come near all vertices of a Klee–Minty cube.

Nevertheless, several algorithms with spirit similar to Dikin's method have been shown to halve the duality gap in polynomial time:

Monteiro, Adler and Resende 1990 – LP and convex QP
Jansen, Roos and Terlaky 1996 – LP; 1997 – PSD LCP problems
Sturm and Zhang 1996 – SDP
Chua 2007 – symmetric cone programming

– some use ellipsoidal cones rather than ellipsoids; others use "scaling points" and "V-space". These algorithms are primal-dual methods and rely heavily on the cones being self-scaled.

Our framework shares some strong connections to the one developed by Chek Beng Chua, to whom we are indebted.

SLIDE 7

min ⟨c, x⟩ s.t. Ax = b, x ∈ Λ+    (HP)   →   min ⟨c, x⟩ s.t. Ax = b, x ∈ K_e(α)    (QP_e(α))

For e ∈ Λ++ and 0 < α < √n, let K_e(α) := {x : ⟨e, x⟩_e ≥ α ‖x‖_e}

– this happens to be the smallest cone containing the Dikin ellipsoid B̄_e(e, √(n − α²)).

Keep in mind that the cone grows in size as α decreases.

Definition: Swath(α) = {e ∈ Λ++ : Ae = b and QP_e(α) has an optimal solution}

Prop: Swath(0) = Central Path

Thus, α can be regarded as a measure of the proximity of points in Swath(α) to the central path.
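The "smallest cone containing the Dikin ellipsoid" claim can be checked numerically in the SDP case at E = I, where ⟨U, V⟩_I = tr(UV), ‖V‖_I is the Frobenius norm, and K_I(α) = {X : tr(X) ≥ α‖X‖_F}. A sketch assuming NumPy (the sampling scheme is our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 5, 0.9
r = np.sqrt(n - alpha**2)       # Dikin-ellipsoid radius from the slide

for _ in range(1000):
    # random point of the ball ||X - I||_F <= r
    G = rng.standard_normal((n, n))
    D = (G + G.T) / 2
    D *= rng.uniform(0, r) / np.linalg.norm(D)    # scale into the ball
    X = np.eye(n) + D
    # every such X should lie in K_I(alpha) = {X : tr X >= alpha ||X||_F}
    assert np.trace(X) >= alpha * np.linalg.norm(X) - 1e-9
```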

SLIDE 8

Let x_e(α) = optimal solution of QP_e(α) (assuming e ∈ Swath(α))

– the main work in computing x_e(α) lies in solving a system of linear equations.

We assume 0 < α < 1, in which case Λ+ ⊆ K_e(α) – thus, QP_e(α) is a relaxation of HP – hence,

optimal value of HP ≥ ⟨c, x_e(α)⟩

Current iterate: e ∈ Λ++. Next iterate will be e′, a convex combination of e and x_e(α):

e′ = (1/(1+t)) (e + t x_e(α))

The choice of t is made through duality . . .
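The containment Λ+ ⊆ K_e(α) for 0 < α < 1 is easy to see in the SDP case at E = I: for PSD X, tr X = Σλ_i ≥ √(Σλ_i²) = ‖X‖_F since the eigenvalues are nonnegative. A quick numerical check, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 6, 0.75              # any 0 < alpha < 1

# SDP case at E = I: Lambda_+ is the PSD cone and
# K_I(alpha) = {X : tr X >= alpha * ||X||_F}.
for _ in range(1000):
    B = rng.standard_normal((n, n))
    X = B @ B.T                  # random PSD matrix
    # tr X = sum(lam) >= sqrt(sum(lam^2)) = ||X||_F  since lam >= 0
    assert np.trace(X) >= alpha * np.linalg.norm(X)
```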


SLIDE 11

max bᵀy s.t. A*y + s = c, s ∈ Λ+*    (HP*)   →   max bᵀy s.t. A*y + s = c, s ∈ K_e(α)*    (QP_e(α)*)

First-order optimality conditions for x_e (the optimal solution of QP_e(α)) yield an optimal solution (y_e, s_e) for QP_e(α)*.

Moreover, (y_e, s_e) is feasible for HP* – because Λ+ ⊆ K_e(α) and hence K_e(α)* ⊆ Λ+*.

primal-dual feasible pair: e for HP, (y_e, s_e) for HP*

duality gap: gap_e := ⟨c, e⟩ − bᵀy_e

SLIDE 12
Current iterate: e ∈ Λ++. Next iterate will be a convex combination of e and x_e:

e(t) = (1/(1+t)) (e + t x_e)

Want t to be large so as to improve the primal objective value, but also want e(t) ∈ Swath(α).

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that:
• e(t) ∈ Λ++
• s_e ∈ int(K_{e(t)}(α)*)

– consequently, e(t) is strictly feasible for QP_{e(t)}(α) and (y_e, s_e) is strictly feasible for QP_{e(t)}(α)* – hence, e(t) ∈ Swath(α).

SLIDE 13

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that

t ≥ (1/2) α / ‖x_e‖_e

– and thus ensure good improvement in the primal objective value if, say, ‖x_e‖_e ≤ √n.

Want t to be large so as to improve the primal objective value, but also want e(t) ∈ Swath(α).

SLIDE 14

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that:

• s_e ∈ K_{e(t)}(β)*, where β = α √((1+α)/2)

– which implies s_e is "deep within" K_{e(t)}(α)* – and hence (y_e, s_e) is "very strongly" feasible for QP_{e(t)}(α)*.

SLIDE 15

We choose t to be the minimizer of a particular quadratic polynomial, and thereby ensure that:

1. There is "good" improvement in the primal objective value if ‖x_e‖_e ≤ √n
2. (y_e, s_e) is "very strongly" feasible for QP_{e(t)}(α)*

Sequence of iterates: e0, e1, e2, . . . – write x_i and (y_i, s_i) rather than x_{e_i} and (y_{e_i}, s_{e_i}). If i > 0, then

1. ‖x_i‖_{e_i} ≤ √n ⇒ ⟨c, e_{i+1}⟩ ≪ ⟨c, e_i⟩
2. (y_{i−1}, s_{i−1}) is "very strongly" feasible for QP*_{e_i}

On the other hand, we show

3. (‖x_i‖_{e_i} ≥ √n) ∧ (2.) ⇒ bᵀy_i ≫ bᵀy_{i−1}

In this manner we establish the Main Theorem . . .

SLIDE 16

Main Thm: Sequence of iterates e0, e1, e2, . . .

• The primal objective value improves monotonically, and so does the dual objective value.
• If i, k ≥ 0, then

gap_{e_i} / gap_{e_{i+k}} ≥ ( 1 + α √((1 − α)/(8n)) )^{k−1}

"We choose t to be the minimizer of a particular quadratic polynomial."

In fact, the theorem holds if one simply chooses t = (1/2) α / ‖x_i‖_{e_i}, but choosing t to be the minimizer can result in steps that are far longer.

So what is the particular quadratic polynomial?

SLIDE 17

K_e(α) := {x : ⟨e, x⟩_e ≥ α ‖x‖_e}

Special Case of SDP: E ∈ S^{n×n}_{++}, X_E optimal for QP_E(α), (y_E, S_E) optimal for QP_E(α)*. Let E(t) = (1/(1+t)) (E + t X_E).

Here is the quadratic polynomial: q(t) := tr( ((E + t X_E) S_E)² )

Prop: (t ≥ 0) ∧ (E(t) ≻ 0) ⇒ min{ β : S_E ∈ K_{E(t)}(β)* } = √(n − 1/q(t))

Prop: The minimizer t̄ of q satisfies t̄ > (1/2) α / ‖X_E‖_E, E(t̄) ≻ 0, and √(n − 1/q(t̄)) ≤ α √((1+α)/2).

Corollary: S_E is "deep within" K_{E(t̄)}(α)* for the minimizer t̄ of q.
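That q really is quadratic in t follows from expanding tr(((E + tX_E)S_E)²) = tr((E S_E)²) + 2t·tr(E S_E X_E S_E) + t²·tr((X_E S_E)²). A numerical check, assuming NumPy and using generic symmetric matrices rather than actual iterates:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

def rand_sym(rng, n):
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2

E, X, S = (rand_sym(rng, n) for _ in range(3))

# q(t) = tr(((E + t X) S)^2) expands to c0 + c1 t + c2 t^2:
ES, XS = E @ S, X @ S
c0 = np.trace(ES @ ES)
c1 = 2 * np.trace(ES @ XS)
c2 = np.trace(XS @ XS)

for t in rng.uniform(-2, 2, size=20):
    M = (E + t * X) @ S
    direct = np.trace(M @ M)
    assert np.isclose(direct, c0 + c1 * t + c2 * t**2)
# when c2 > 0 the minimizer is available in closed form: t_bar = -c1 / (2*c2)
```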

SLIDE 18

Hyperbolic Programming in General: e ∈ Λ++, x_e optimal for QP_e(α), (y_e, s_e) optimal for QP_e(α)*. Let e(t) = (1/(1+t)) (e + t x_e).

As happened for SDP, we would like to have an easily computable function q̃ for which

(t ≥ 0) ∧ (e(t) ∈ Λ++) ⇒ min{ β : s_e ∈ K_{e(t)}(β)* } = √(n − 1/q̃(t))

However, in general the resulting function q̃ need not be a quadratic polynomial, nor do we see any reason that it should necessarily be efficiently computable – in fact, about the most we know is that q̃ is semialgebraic.

But we do know how to obtain a quadratic polynomial q which serves as an appropriate upper bound to the function q̃ – we do this by leveraging our SDP result with the (very deep) Helton–Vinnikov Theorem for hyperbolicity cones.

SLIDE 19

Helton-Vinnikov Theorem: If p : E → R is hyperbolic in direction e and of degree n, and if L is a 3-dimensional subspace of E containing e, then there exists a linear transformation T : L → Sn×n satisfying T(e) = I and p(x) = p(e) det (T(x)) for all x ∈ L .


SLIDE 20

Luckily, the resulting quadratic polynomial always can be efficiently computed:

• First compute the five leading coefficients a_n, a_{n−1}, a_{n−2}, a_{n−3}, a_{n−4} of the univariate polynomial

γ ↦ p(x_e + γe) = Σ_i a_i γ^i

• Then compute

κ1 = a_{n−1}/a_n
κ2 = (a_{n−1}/a_n)² − a_{n−2}/a_n
κ3 = (a_{n−1}/a_n)³ − (3/2)(a_{n−1}/a_n)(a_{n−2}/a_n) + (1/2)(a_{n−3}/a_n)
κ4 = (a_{n−1}/a_n)⁴ − 2(a_{n−1}/a_n)²(a_{n−2}/a_n) + (1/2)(a_{n−2}/a_n)² + (2/3)(a_{n−1}/a_n)(a_{n−3}/a_n) − (1/6)(a_{n−4}/a_n)

• The desired quadratic polynomial is t ↦ at² + bt + c, where

a = κ1² κ2 − 2α² κ1 κ3 + α⁴ κ4
b = 2α⁴ κ3 − 2κ1³
c = (n − α²) κ1²
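The recipe above is a handful of arithmetic operations once the five leading coefficients are in hand. A direct transcription of the slide's formulas as reconstructed here (the function name and the smoke test are our own; Python, no dependencies):

```python
def kappa_quadratic(a_lead, alpha, n):
    """Coefficients (A, B, C) of the quadratic t -> A t^2 + B t + C.

    a_lead = [a_n, a_{n-1}, a_{n-2}, a_{n-3}, a_{n-4}] -- the five leading
    coefficients of gamma -> p(x_e + gamma e).
    """
    an, an1, an2, an3, an4 = a_lead
    r1, r2, r3, r4 = an1/an, an2/an, an3/an, an4/an   # ratios a_{n-k}/a_n
    k1 = r1
    k2 = r1**2 - r2
    k3 = r1**3 - 1.5*r1*r2 + 0.5*r3
    k4 = r1**4 - 2*r1**2*r2 + 0.5*r2**2 + (2/3)*r1*r3 - (1/6)*r4
    A = k1**2*k2 - 2*alpha**2*k1*k3 + alpha**4*k4
    B = 2*alpha**4*k3 - 2*k1**3
    C = (n - alpha**2) * k1**2
    return A, B, C

# smoke test: with a_{n-1} = ... = a_{n-4} = 0, every kappa vanishes
assert kappa_quadratic([1.0, 0, 0, 0, 0], 0.5, 4) == (0.0, 0.0, 0.0)
```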

SLIDE 21

Epilogue: Recently we learned of a paper on linear programming in which quadratic cones relaxing the non-negative orthant are used in devising a polynomial-time algorithm, albeit one with complexity "O(nL) iterations" rather than "O(√n L) iterations":

I.S. Litvinchev, "A circular cone relaxation primal interior point algorithm for LP," Optimization 52 (2003), 529–540.
