SLIDE 1
On the Linear Algebra Employed in the MOSEK Conic Optimizer
Monday, July 13. Erling D. Andersen, www.mosek.com
SLIDE 2 MOSEK summary
- Solves LP, QP, CQP, SDP, general convex, and MIP problems.
- Version 8: work in progress.
SLIDE 3 The conic optimization problem
\[
\begin{array}{ll}
\min & \displaystyle \sum_{j=1}^{n} c_j x_j + \sum_{j=1}^{\bar n} \langle \bar C_j, \bar X_j \rangle \\[1ex]
\text{s.t.} & \displaystyle \sum_{j=1}^{n} a_{ij} x_j + \sum_{j=1}^{\bar n} \langle \bar A_{ij}, \bar X_j \rangle = b_i, \quad i = 1, \ldots, m, \\[1ex]
& x \in K, \quad \bar X_j \succeq 0, \quad j = 1, \ldots, \bar n.
\end{array}
\]
Explanation:
- $x_j$ is a scalar variable.
- $\bar X_j$ is a square matrix variable.
SLIDE 4
- $K$ represents a Cartesian product of conic quadratic constraints, e.g. $x_1 \geq \|x_{2:n}\|$.
- $\bar X_j \succeq 0$ means $\bar X_j = \bar X_j^T$ and $\bar X_j$ is PSD.
- $\bar C_j$ and $\bar A_{ij}$ are required to be symmetric.
- $\langle A, B \rangle := \mathrm{tr}(A^T B)$.
- Dimensions are large.
- Data matrices are typically sparse:
  - $A$ usually has $\leq 10$ nonzeros per column on average.
  - $\bar A_{ij}$ contains few nonzeros and/or is low rank.
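For concreteness, a small worked instance of this inner product (using that $\mathrm{tr}(A^T B) = \sum_{ij} A_{ij} B_{ij}$; the numbers are illustrative):

\[
\left\langle
\begin{bmatrix} 1 & 2 \\ 2 & 0 \end{bmatrix},
\begin{bmatrix} 3 & 1 \\ 1 & 5 \end{bmatrix}
\right\rangle
= 1 \cdot 3 + 2 \cdot 1 + 2 \cdot 1 + 0 \cdot 5 = 7.
\]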
SLIDE 5 The algorithm: Simplified version
- Step 1: Set up the homogeneous and self-dual model.
- Step 2: Choose a starting point.
- Step 3: Compute the Nesterov-Todd search direction.
- Step 4: Take a step.
- Step 5: Stop if the trial solution is good enough.
- Step 6: Go to Step 3.
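A minimal control-flow sketch of these steps; every name below is a hypothetical placeholder, not the MOSEK API.

/* Hypothetical driver for the simplified homogeneous IPM. */
typedef struct Model Model;                /* problem data + iterate */

extern void setup_hsd(Model *m);           /* Step 1 (assumed helper) */
extern void choose_start(Model *m);        /* Step 2 (assumed helper) */
extern void nt_direction(Model *m);        /* Step 3 (assumed helper) */
extern void take_step(Model *m);           /* Step 4 (assumed helper) */
extern int  good_enough(const Model *m);   /* Step 5 (assumed helper) */

void solve(Model *m)
{
    setup_hsd(m);                          /* Step 1 */
    choose_start(m);                       /* Step 2 */
    do {
        nt_direction(m);                   /* Step 3 */
        take_step(m);                      /* Step 4 */
    } while (!good_enough(m));             /* Step 5; otherwise go to Step 3 */
}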
SLIDE 6 Search direction computation
Requires the solution of

\[
\begin{bmatrix}
-(W W^T)^{-1} & 0 & A^T \\
0 & -(\bar W \bar W^T)^{-1} & \bar A^T \\
A & \bar A & 0
\end{bmatrix}
\begin{bmatrix} d_x \\ d_{\bar x} \\ d_y \end{bmatrix}
=
\begin{bmatrix} r_x \\ r_{\bar x} \\ r_y \end{bmatrix}
\]

where
- $W$ and $\bar W$ are nonsingular block diagonal matrices.
- $W W^T$ is a diagonal matrix plus low-rank terms.
SLIDE 7 Reduced Schur system approach
We have

\[
\big( (AW)(AW)^T + (\bar A \bar W)(\bar A \bar W)^T \big) d_y = \cdots
\]

and

\[
d_x = -(W W^T)(r_x - A^T d_y), \qquad
d_{\bar x} = -(\bar W \bar W^T)(r_{\bar x} - \bar A^T d_y).
\]

Cons:
- Dense columns cause issues.
- Numerical stability: bad condition number.
Pros:
- A symmetric positive definite system.
- Cholesky with no pivoting can be used.
- Employed in major commercial solvers.
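A minimal dense sketch of the pro side (all names hypothetical; the real solver works with sparse matrices, fill-reducing orderings, and dense-column handling): since the Schur matrix $M$ is symmetric positive definite, it can be factored as $M = L L^T$ without pivoting and then $M d_y = r$ solved by two triangular solves.

#include <math.h>
#include <stddef.h>

/* M is m x m, row-major, SPD. The factor L overwrites the lower triangle. */
static int cholesky(double *M, size_t m)
{
    for (size_t j = 0; j < m; j++) {
        double d = M[j * m + j];
        for (size_t p = 0; p < j; p++)
            d -= M[j * m + p] * M[j * m + p];
        if (d <= 0.0) return -1;         /* numerically not SPD */
        M[j * m + j] = sqrt(d);
        for (size_t i = j + 1; i < m; i++) {
            double s = M[i * m + j];
            for (size_t p = 0; p < j; p++)
                s -= M[i * m + p] * M[j * m + p];
            M[i * m + j] = s / M[j * m + j];
        }
    }
    return 0;
}

/* Solve L * L^T * dy = r in place: forward, then backward substitution. */
static void chol_solve(const double *L, size_t m, double *r)
{
    for (size_t i = 0; i < m; i++) {     /* L * y = r */
        for (size_t p = 0; p < i; p++)
            r[i] -= L[i * m + p] * r[p];
        r[i] /= L[i * m + i];
    }
    for (size_t i = m; i-- > 0; ) {      /* L^T * dy = y */
        for (size_t p = i + 1; p < m; p++)
            r[i] -= L[p * m + i] * r[p];
        r[i] /= L[i * m + i];
    }
}

In the actual solver this factorization is done sparsely with a fill-reducing ordering, as summarized on the linear algebra slide below.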
SLIDE 8 Computing the Schur matrix
We have $(\bar A \bar W)(\bar A \bar W)^T = \bar A \bar W \bar W^T \bar A^T$.

Assumptions:
- Only one matrix variable; the general case follows easily.
- The NT search direction implies $\bar W = R \otimes R$ and $\bar W^T = R^T \otimes R^T$, where the Kronecker product $\otimes$ is defined as

\[
R \otimes R =
\begin{bmatrix}
R_{11} R & R_{12} R & \cdots \\
R_{21} R & R_{22} R & \cdots \\
\vdots & \vdots & \ddots
\end{bmatrix}.
\]
SLIDE 9
Fact:

\[
e_k^T \bar A \bar W \bar W^T \bar A^T e_l = \mathrm{vec}(\bar A_k)^T \mathrm{vec}(R R^T \bar A_l R R^T).
\]
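One way to see this (a sketch using two standard Kronecker identities, $(R \otimes R)(R^T \otimes R^T) = (R R^T) \otimes (R R^T)$ and $(B \otimes A)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^T)$, together with the fact that row $k$ of $\bar A$ is $\mathrm{vec}(\bar A_k)^T$, cf. the next slide):

\[
e_k^T \bar A \bar W \bar W^T \bar A^T e_l
= \mathrm{vec}(\bar A_k)^T \big( (R R^T) \otimes (R R^T) \big)\, \mathrm{vec}(\bar A_l)
= \mathrm{vec}(\bar A_k)^T \mathrm{vec}(R R^T \bar A_l R R^T).
\]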
SLIDE 10 Exploiting sparsity 1
Compute the lower triangular part of

\[
\bar A \bar W \bar W^T \bar A^T =
\begin{bmatrix} \mathrm{vec}(\bar A_1) & \cdots & \mathrm{vec}(\bar A_m) \end{bmatrix}^T
\begin{bmatrix} \mathrm{vec}(R R^T \bar A_1 R R^T) & \cdots & \mathrm{vec}(R R^T \bar A_m R R^T) \end{bmatrix},
\]

so the $l$th column is computed as

\[
e_k^T \bar A \bar W \bar W^T \bar A^T e_l = \mathrm{vec}(\bar A_k)^T \mathrm{vec}(R R^T \bar A_l R R^T), \quad \text{for } k \geq l.
\]

Avoid computing

\[
e_k^T \bar A \bar W \bar W^T \bar A^T e_l = \mathrm{vec}(\bar A_k)^T \mathrm{vec}(R R^T \bar A_l R R^T) = 0 \quad \text{if } \bar A_k = 0 \text{ or } \bar A_l = 0.
\]
SLIDE 11 Exploiting sparsity 2
Moreover,
- $R$ is a dense square matrix.
- $\bar A_i$ is typically extremely sparse, e.g. $\bar A_i = e_k e_k^T$, as observed by J. Sturm for instance.
- Write $\bar A_i = U_i V_i^T + (U_i V_i^T)^T$; this is always possible because $U_i = \bar A_i / 2$ and $V_i = I$ is a valid choice.
- In practice $U_i$ and $V_i$ are sparse and low rank, i.e. have few columns.
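For instance (an illustrative rank-one choice, not spelled out on the slide), for $\bar A_i = e_k e_k^T$ one can take

\[
U_i = \tfrac{1}{2} e_k, \quad V_i = e_k,
\qquad
U_i V_i^T + (U_i V_i^T)^T = \tfrac{1}{2} e_k e_k^T + \tfrac{1}{2} e_k e_k^T = e_k e_k^T = \bar A_i,
\]

so both factors have a single column and a single nonzero row.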
SLIDE 12
Recall

\[
e_k^T \bar A \bar W \bar W^T \bar A^T e_l = \mathrm{vec}(\bar A_k)^T \mathrm{vec}(R R^T \bar A_l R R^T)
\]

must be computed for all $k \geq l$, and

\[
R R^T \bar A_l R R^T = R R^T (U_l V_l^T + (U_l V_l^T)^T) R R^T = \hat U_l \hat V_l^T + (\hat U_l \hat V_l^T)^T,
\]

where $\hat U_l := R R^T U_l$ and $\hat V_l := R R^T V_l$.
SLIDE 13
- $\hat U_l$ and $\hat V_l$ are dense matrices.
- Sparsity in $U_l$ and $V_l$ is exploited.
- Low rank structure is exploited.
- Is all of $\hat U_l$ and $\hat V_l$ required?
SLIDE 14
Observe that

\[
e_i^T (U_k V_k^T + (U_k V_k^T)^T) = 0 \quad \forall i \notin I_k,
\qquad \text{where } I_k := \{ i \mid U_{k,i:} \neq 0 \vee V_{k,i:} \neq 0 \}.
\]

Now

\[
\begin{aligned}
\mathrm{vec}(\bar A_k)^T \mathrm{vec}(R R^T \bar A_l R R^T)
&= \mathrm{vec}(U_k V_k^T + (U_k V_k^T)^T)^T \mathrm{vec}(\hat U_l \hat V_l^T + (\hat U_l \hat V_l^T)^T) \\
&= \sum_i 2 (U_k e_i)^T (\hat U_l \hat V_l^T + (\hat U_l \hat V_l^T)^T) (V_k e_i).
\end{aligned}
\]

Therefore, only the rows of $\hat U_l$ and $\hat V_l$ corresponding to $I_k$ are needed.
SLIDE 15 Proposed algorithm:
- Compute $I_k$, $\hat U_{k,I_k,:}$, and $\hat V_{k,I_k,:}$.
- Evaluate the entries $\sum_i 2 (U_k e_i)^T (\hat U_l \hat V_l^T + (\hat U_l \hat V_l^T)^T) (V_k e_i)$; see the sketch after this list.
Possible improvements:
- Exploit the special case $U_{k,:j} = \alpha V_{k,:j}$.
- Exploit dense computations, e.g. level 3 BLAS, when possible and worthwhile.
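A minimal sketch of one entry computation under an assumed dense row-major layout (all names and the data layout are hypothetical; MOSEK's actual data structures are not shown on the slides). It uses the rearrangement $\sum_i 2 (U_k e_i)^T (\hat U_l \hat V_l^T + (\hat U_l \hat V_l^T)^T)(V_k e_i) = 2\big(\langle U_k^T \hat U_l, V_k^T \hat V_l \rangle + \langle U_k^T \hat V_l, V_k^T \hat U_l \rangle\big)$ and accumulates the four small products over the rows in $I_k$ only, since all other rows of $U_k$ and $V_k$ vanish.

#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Uk, Vk are n x pk; Uhat, Vhat (= RR^T Ul, RR^T Vl) are n x pl;
   all dense, row-major. idx lists the nidx rows in I_k. */
double schur_entry(const double *Uk, const double *Vk, size_t pk,
                   const double *Uhat, const double *Vhat, size_t pl,
                   const size_t *idx, size_t nidx)
{
    /* Small pk x pl products, accumulated over rows in I_k only:
       P = Uk^T Uhat, Q = Vk^T Vhat, S = Uk^T Vhat, T = Vk^T Uhat. */
    double *P = calloc(4 * pk * pl, sizeof(double));
    if (!P) return NAN;
    double *Q = P + pk * pl, *S = Q + pk * pl, *T = S + pk * pl;
    for (size_t t = 0; t < nidx; t++) {
        size_t r = idx[t];
        for (size_t j = 0; j < pk; j++) {
            double u = Uk[r * pk + j], v = Vk[r * pk + j];
            for (size_t q = 0; q < pl; q++) {
                P[j * pl + q] += u * Uhat[r * pl + q];
                Q[j * pl + q] += v * Vhat[r * pl + q];
                S[j * pl + q] += u * Vhat[r * pl + q];
                T[j * pl + q] += v * Uhat[r * pl + q];
            }
        }
    }
    /* Entry (k,l) = 2 * (<P,Q> + <S,T>). */
    double s = 0.0;
    for (size_t t = 0; t < pk * pl; t++)
        s += P[t] * Q[t] + S[t] * T[t];
    free(P);
    return 2.0 * s;
}

The cost is proportional to |I_k| * pk * pl rather than to the full dimension n, which is the point of restricting to the rows in I_k.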
SLIDE 16 Summary:
- Exploit sparsity as done in SeDuMi by Sturm.
- Also able to exploit low rank structure.
- Not implemented yet!
SLIDE 17 Linear algebra summary
- Sparse matrix operations e.g. multiplications.
- Large sparse matrix factorization e.g. Cholesky.
- Including ordering (AMD, GP).
- Dense column detection and handling.
- Dense sequential level 1,2,3 BLAS operations.
- Inside sparse Cholesky for instance.
- The sequential Intel Math Kernel Library is employed extensively.
- Eigenvalue computations.
- What about the parallelization?
- Modern computers have many cores.
- Typically from 4 to 12.
- Recent customer example had 80.
SLIDE 18 The parallelization challenge on shared memory
- A computer has many cores.
- Parallelization using native threads is cumbersome and error
prone.
- Employ a parallelization framework e.g. Cilk or OpenMP.
Other issues:
- Exploit caches.
- Do not overload the memory bus.
- Parallelism must not be too fine grained, due to threading overhead.
SLIDE 19 Cilk summary:
- Extension to C and C++.
- Has a runtime environment that executes tasks in parallel on a number of workers.
- Handles the load balancing.
- Allows nested/recursive parallelism e.g.
- Parallel dense matrix mul. within parallelized sparse Cholesky.
- Parallel IPM within B&B.
- Is run-to-run deterministic.
- Care must be taken in floating point computations.
- Supported by the Intel C compiler, GCC, and Clang.
SLIDE 20 Example: parallelized dense syrk
The dense level 3 BLAS syrk operation computes $C = A A^T$. Parallelized version using Cilk:

    if C is small
        C = A A^T
    else
        cilk_spawn C21 = A2: A1:^T   (gemm)
        cilk_spawn C11 = A1: A1:^T   (syrk)
        cilk_spawn C22 = A2: A2:^T   (syrk)
        cilk_sync

Usage of recursion is allowed!
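A runnable sketch of this recursion, assuming the Cilk keywords cilk_spawn/cilk_sync (OpenCilk or Cilk Plus) and row-major storage; the plain loops stand in for tuned BLAS kernels, and the cutoff of 64 is an arbitrary, tunable choice.

#include <cilk/cilk.h>
#include <stddef.h>

/* C (m x n) = A (m x k) * B (n x k)^T, plain loops for brevity. */
static void gemm_nt(double *C, size_t ldc, const double *A, size_t lda,
                    const double *B, size_t ldb, size_t m, size_t n, size_t k)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t p = 0; p < k; p++)
                s += A[i * lda + p] * B[j * ldb + p];
            C[i * ldc + j] = s;
        }
}

/* Lower triangle of C (n x n) = A (n x k) * A^T. */
static void syrk_base(double *C, size_t ldc, const double *A, size_t lda,
                      size_t n, size_t k)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j <= i; j++) {
            double s = 0.0;
            for (size_t p = 0; p < k; p++)
                s += A[i * lda + p] * A[j * lda + p];
            C[i * ldc + j] = s;
        }
}

void syrk_par(double *C, size_t ldc, const double *A, size_t lda,
              size_t n, size_t k)
{
    if (n <= 64) {                       /* "C is small": base case */
        syrk_base(C, ldc, A, lda, n, k);
        return;
    }
    size_t n1 = n / 2, n2 = n - n1;
    const double *A1 = A;                /* top n1 rows of A */
    const double *A2 = A + n1 * lda;     /* bottom n2 rows of A */
    /* The three output blocks are disjoint, so they may run in parallel. */
    cilk_spawn gemm_nt(C + n1 * ldc, ldc, A2, lda, A1, lda, n2, n1, k); /* C21 */
    cilk_spawn syrk_par(C, ldc, A1, lda, n1, k);                        /* C11 */
    syrk_par(C + n1 * ldc + n1, ldc, A2, lda, n2, k);                   /* C22 */
    cilk_sync;
}

Because the routine spawns itself, nested parallelism comes for free, which is exactly the property highlighted on the previous slide.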
SLIDE 21 Our experience with Cilk
- Cilk is easy to learn, i.e. only 3 new keywords.
- Nested/recursive parallelism is allowed.
- Useful for both sparse and dense matrix computations.
- Efficient parallelization is nevertheless hard.
SLIDE 22 Summary and conclusions
- I am behind schedule with MOSEK version 8.
- Proposed a new algorithm for computing the Schur matrix in
the semidefinite case.
- Discussed the usage of a task-based parallelization framework, exemplified by Cilk.
- Slides: https://mosek.com/resources/presentations.