Solving Linear Equations in Interior-Point Methods Tongryeol Seol - - PowerPoint PPT Presentation
Solving Linear Equations in Interior-Point Methods Tongryeol Seol - - PowerPoint PPT Presentation
Optimization Seminar Solving Linear Equations in Interior-Point Methods Tongryeol Seol Computing and Software Department, McMaster University March 1, 2004 Outline Linear Equations in IPMs How to Solve Sparse Linear Equations
1 / 21
Outline
- Linear Equations in IPMs
- How to Solve Sparse Linear Equations
- Introduction to McSML
- Conclusion
- Reference
2 / 21
Linear Equations in IPMs for LO and QO
- The key implementation issue of IPMs is the solution of the linear
system of equations arising in the Newton system:
- At every iteration, we solve the above system with different H.
- Solving the equations takes in the average 60~90% of the time of
solving problems by IPMs.
hL Δy A fL = Δx AT –H–1 AHAT Δy = AHfL + hL
augmented system in LO normal equation in LO
hQ Δy A fQ = Δx AT – Q – H–1 A(Q+ H–1)–1AT Δy = A(Q+H–1)–1 fQ + hQ
augmented system in QO normal equation in QO
3 / 21
How to Solve Linear Equations in IPMs
Direct Methods
- Cholesky factorization (LLT) is most popular for normal equation
approach.
– Cholesky factorization is a symmetric variant of LU factorization. – Data structures of L can be determined previously and fixed.
- LDLT factorization is used for augmented system approach.
– D is a block diagonal matrix if 2x2 pivot is applied, otherwise just a diagonal matrix.
Iterative Methods
- Conjugate gradient method is considered as an alternative in IPMs
for network flow optimization.
4 / 21
Normal Equation Approach
- In IPMs for LO, normal equation is popular because AHAT is
positive defnite and H is a diagonal matrix.
- If there is one or more dense columns in A, AHAT loses sparsity.
- normal equation of QO?
AHAT Δy = AH fL + hL
normal equation in LO
A(Q+H–1)–1AT Δy = A(Q+H–1)–1 fQ + hQ
normal equation in QO nonzero pattern of A of fit1p (nz=9,868) nonzero pattern of AAT of fit1p (nz=393,129)
5 / 21
Augmented System Approach
- Augmented system is free from dense columns or Q matrix, but
loses positive definiteness of normal equation.
- There are two approaches to solve augmented system:
Symmetric Block Factorization with 2x2 pivoting
– Theory: Bunch-Parlett (or Bunch-Kaufman) Pivoting Strategy – Implementation: LINPACK, Harwell Library (MA27, MA47), Fourer and Mehrotra (fo1aug)
LDLT Factorization with Regularization
– Theory: Quasidefinite, Proximal Point Algorithm – Implementation: Mészáros, Gondzio
6 / 21
Block Factorization with 2× 2 Pivoting
- We cannot apply the Cholesky factorization to indefinite matrices.
– e.g
- Bunch-Kaufman pivoting strategy uses 2×2 pivot
when 1×1 pivot is not acceptable.
– At first, it tries a partial pivoting search (case 1~3). – If no 1×1 pivot is acceptable, it performs 2×2 pivot. 1 1 λ d σ c σ λ
λ has the largest absolute value in this column σ has the largest absolute value in this column
case 1. |d| ≥ α |λ|, case 2. |dσ| ≥ α |λ|2 , case 3. |c| ≥ α |σ | , case 4. Otherwise, use d as a 1×1 pivot. use d as a 1×1 pivot. use c as a 1×1 pivot. use as a 2×2 pivot. d λ λ c
α ≒ 0.6404
7 / 21
Solving Linear Equations by Regularization
- Quasidefinite matrices are strongly factorizable.
– A symmetric matrix K is called quasidefinite if it has the form where H and F are positive definite and A has full row rank. – Factorizable: There exists D and L such that K = LDLT. – Strongly Factorizable: For any permutation matrix P, PKPT = LDLT exists.
- Regularization can be applied to achieve stability in the linear
algebra kernel.
– Primal-dual regularization: – However, we keep the regularizations small since we don’t want to change greatly the behaviour of IPMs. – H AT A F K = – Q – H–1 AT A 0 M = + – Rp 0 Rd
makes the part more positive definite makes the part more negative definite
8 / 21
Sparse Gaussian Elimination
- First step of Gaussian elimination:
– Repeat Gaussian elimination on C … Result in a Factorization M = LU. – In symmetric cases, M = L (U = DLT ) = LDLT.
- For sparse M:
C v = wT α I v/α 1 C – wT α α vwT M =
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
If v and w are not zero, is not zero when C is zero. C – α vwT Different order of pivoting can reduce the number of fill-ins.
unit lower triangular matrix upper triangular matrix diagonal matrix
9 / 21
Control of Nonzeros (Fill-ins)
- When numerical stability is not an issue,
- rdering,
symbolic factorization, can be performed separately. numerical factorization
- Ordering permutes equations and variables to reduce fill in L.
– Finding Optimal Permutation is NP-complete problem. – Local heuristics (e.g minimum degree ordering) – Global heuristics (e.g nested dissection)
- Symbolic factorization determines locations of nonzeros in L, and
we can set up data structures for storing nonzeros of L previously.
10 / 21
Local Ordering Heuristics
- The Markowitz criterion in Gaussian elimination
- Minimum degree ordering is a symmetric variant of Markowitz
- rdering.
- Graph representation of the sparse pattern of a matrix
- Nodes: on-diagonal elements
- Edges: off-diagonal elements
X X X Ⓧ ■ ■ ■ ■ ■ ■ ■ ■ ■ X ■ ■ ■ X X X X X X Ⓧ ■ ■ ■ ■ ■ ■ ■ X ■ ■ X X
Markowitz count: (ri-1)(cj-1) ri cj
estimated # of elements to be nonzeros
Markowitz count: (ri-1)2 ri Degree: r
② ① ③ ④ ⑤
X X X ① X ③ X ④ X X ⑤ X ② X X
11 / 21
M12 M11 M22 M21
Global Ordering Heuristics
- Nested dissection ordering
– Its roots are in finite-element substructuring. – It makes a matrix doubly-bordered block diagonal form. – It finds a small separator to disconnect a given graph into components of approximately equal size.
M2 M1 M1 M2 M11 M21 M12 M22
9×9 grid 81×81 matrix
12 / 21
Notion of Supernode
- Consecutive columns with identical sparsity structure can often be
found in L.
- Supernode is a group of consecutive columns { j, j+1, …, j+t }
such that
– columns j to j+t have a dense diagonal block, – columns j to j+t have identical sparsity pattern below row j+t
- Benefits of Supernode
– Reduce inefficient indirect addressing, – Take advantage of cache memory, – Allow a compact representation of the sparsity structure of L – We can expect about 30% speed-up by using supernode structures.
13 / 21
Introduction to McSML
- McSML consists of 11 C–MEX files for solution of linear
equations in IPMs.
- ANALYSIS
– Minimum (External) Degree Ordering with Multiple Elimination
- FACTORIZE and SOLVE
– Left-Looking Cholesky/LDLT Factorization – Supernodal Techniques: Compact Row Indices, Loop-unrolling, etc – Iterative Refinement: Conjugate Gradient Method
- MISC.
– We maintain A and AT together. – Normal Equation Builder and Augmented System Builder.
14 / 21
Normal Equation Solver of McSML
AT=sm_trans(A); P=sf_mmd_ne(A); [L,SINFO]=sf_scc_ne(A,AT,P); D=nf_aat(A,AT,H,P,L,SINFO); nf_scc_ne(L,SINFO,D); x=nf_sub_ne(A,H,L,SINFO,D,P,rhs);
It performs minimum degree
- rdering. P has ordering info.
It builds data structures for L and create supernode structures(SINFO). It computes AHAT. It performs numerical factorization. It solves the equation (AHATx=rhs) and refines the solution. We assume that every matrix is sparse and every vector is full.
15 / 21
Augmented System Solver of McSML
AT=sm_trans(A); P=sf_mmd_as(A,AT,H,F); [L,SINFO]=sf_scc_as(A,AT,H,F,P); D=nf_aug(A,AT,H,F,P,L,SINFO,Rp,Rd); nf_scc_as(L,SINFO,D); x=nf_sub_as(A,AT,H,F,L,SINFO,D,P,rhs);
It performs numerical factorization. It solves the equation and refines the solution. It builds + . H AT A F
- Rp 0
0 Rd
16 / 21
Performance of McSML – Accuracy Benchmarking
McSMLne 7.620719E-09 2.026405E-12 1.120950E-04 4.664558E-08 2.108489E-09 1.824042E-13 qap15 3.458121E-07 2.185784E-15 4.624334E-07 3.715834E-15 3.656559E-06 5.607903E-15 cre_d 6.074663E-07 2.963212E-13 2.726310E-07 1.235207E-13 5.722383E-07 2.846440E-13 ken_13 3.723433E-08 1.302376E-14 1.846481E-08 5.002513E-15 3.371902E-08 9.263912E-15 pds_20 4.836873E-09 2.762969E-12 5.972575E-06 3.819078E-09 2.697597E-08 4.186702E-12 qap12 1.229784E-08 4.556325E-15 1.540756E-08 2.779174E-15 2.471212E-08 4.614586E-15 pds_10 3.667600E-04 1.608902E-13 2.755767E-02 1.062017E-11 2.436846E-02 9.793590E-12
- sa_60
5.262877E-07 1.994308E-15 5.997905E-07 1.994308E-15 2.453067E-04 2.446411E-13 cre_b 8.061074E-09 2.118858E-13 6.462990E-09 1.850241E-13 7.965795E-09 2.457402E-13 stocfor3 4.792537E-10 2.895742E-15 4.535023E-10 3.474891E-15 4.963837E-10 2.779912E-15 maros_r7 7.146214E-06 6.693688E-14 7.685944E-06 4.457410E-14 1.264779E-05 3.434812E-14 dfl001
- Sol. Error 2
- Sol. Error 1
- Sol. Error 2
- Sol. Error 1
- Sol. Error 2
- Sol. Error 1
5.873346E-14 1.323670E-14 3.993961E-16 2.874662E-12 1.613404E-14 1.774429E-15 2.700201E-12 3.186449E-14 1.927925E-15 MATLAB(LAPACK-DPOTRF) 5.723799E-05 3.308398E-03 3.061775E-03
- sa_30
2.168695E-10 2.526719E-10 3.405521E-10 pilot87 6.414922E-12 1.293002E-11 1.305830E-11 80bau3b LIPSOL(ORNL Cholesky Solver) Selected Netlib Problems, IBM RS/6000 W/S ||b-AATx||∞ ||b||∞ ||b-AATx||2 Solution Error 1: , Solution Error 2:
17 / 21
Performance of McSML – Speed Benchmarking
2,816.78 140.69 2.98 1,414.06 70.23 9.46 6,299.51 314.91 1.31 qap15 30.21 1.47 0.81 35.37 1.66 2.17 2,934.20 146.63 1.60 cre_d 7.42 0.29 1.62 12.01 0.56 0.81 16.21 0.77 0.81 ken_13 1,326.40 65.92 8.00 708.08 24.90 210.08 16,362.97 818.06 1.77 pds_20 332.36 16.58 0.76 176.12 8.73 1.52 1,043.80 52.17 0.40 qap12 161.49 7.99 1.69 88.78 3.16 25.58 193.81 9.66 0.61 pds_10 32.93 1.51 2.73 125.86 3.65 52.86 100.66 2.46 51.46
- sa_60
41.06 2.01 0.86 39.14 1.87 1.74 4,367.50 218.29 1.70 cre_b 2.81 0.11 0.61 9.15 0.45 0.15 11.25 0.54 0.45 stocfor3 49.52 2.45 0.52 50.40 2.50 0.40 117.15 5.79 1.35 maros_r7 172.01 8.56 0.81 69.15 3.40 1.15 1,086.69 54.32 0.29 dfl001 Total* Numeric Symbolic Total* Numeric Symbolic Total* Numeric Symbolic 12.32 18.67 1.50 0.92 0.27 0.10 McSMLne 1.50 1.02 0.11 28.82 41.73 1.93 9.82 0.73 0.13 MATLAB(LAPACK-DPOTRF) 0.57 39.95 9.95 0.95
- sa_30
0.92 21.22 0.82 2.05 pilot87 0.07 2.29 0.09 0.09 80bau3b LIPSOL(ORNL Cholesky Solver) *Total = Sym. Comp. Time + 20 * Num. Comp. Time (in second) Selected Netlib Problems, IBM RS/6000 W/S
18 / 21
Performance of McSML – N.E. vs. A.S.
Selected Netlib Problems, IBM RS/6000 W/S 42.57 8.25 424.81 82.78 4898.74 762.97 65.91 83.20 84.11 13.72 2.14 3.52 15.52 2.26 Ord.* McSMLas 0.85 0.24 0.92 0.26 1.15 0.34 0.09 0.21 0.24 0.04 0.09 0.20 0.18 0.02
- S. F.
0.28 0.07 0.33 0.09 0.23 0.09 0.04 0.08 0.09 0.02 0.03 0.05 0.07 0.01 AS 117.76 14.49 69.44 7.88 1.83 0.64 0.28 1.64 2.10 0.08 0.94 2.01 7.71 0.03
- N. F.
1.79 (05) 2.62 (37) 3.59 (10) 1.10 (10) 1.03 (03) 0.41 (03) 0.64 (11) 1.30 (14) 1.39 (14) 0.20 (07) 0.12 (04) 0.22 (03) 0.97 (15) 0.03 (03)
- S. R.
4.787706E+00 4.003021E-10 1.481893E-09 5.471392E-10 4.062912E-08 1.095287E-08 6.276160E-10 6.666891E-10 5.002931E-10 3.719834E-09 3.719834E-09 4.098304E-10 2.476533E-07 8.280160E-11 Accuracy 142.14 16.67 65.80 7.84 0.51 0.16 0.19 1.26 1.71 0.06 0.77 2.17 8.54 0.02
- N. F.
0.27 0.07 0.50 0.15 0.21 (2) 0.09 (2) 0.05 0.08 0.09 0.02 0.01 0.04 0.12 0.01
- S. R.
0.99 0.23 0.70 0.19 1.01 0.30 0.06 0.16 0.17 0.04 0.06 0.19 0.19 0.01
- S. F.
7.620719E-09 0.39 1.83 qap15 3.458121E-07 0.09 0.47 cre_d 6.074663E-07 0.05 0.37 ken_13 3.723433E-08 0.45 4.58 pds_20 4.836873E-09 0.10 0.42 qap12 1.229784E-08 0.11 1.01 pds_10 3.667600E-04 0.56 1.35
- sa_60
5.262877E-07 0.10 0.50 cre_b 8.061074E-09 0.03 0.15 stocfor3 4.792537E-10 0.22 0.28 maros_r7 7.146214E-06 0.09 0.51 dfl001 Accuracy NE Ord. 5.723799E-05 2.168695E-10 6.414922E-12 0.52 0.15 0.03 McSMLne 0.22
- sa_30
0.10 pilot87 0.00 80bau3b * Tentative version of minimum degree ordering for augmented system
19 / 21
Performance of McSML – McIPM with McSML
5.89444E-08 7.39163E-09 9.22093E-08 2.85713E-08 1.21127E-09 7.17196E-10 6.31605E-08 1.34671E+00 6.13231E-08 1.81016E-08 1.88798E-11 3.89770E-09 2.17643E-09 1.43719E-10 Duality Gap 9.61793E-08 2.00629E-11 2.59837E-07 1.59447E-11 6.11396E-12 3.60120E-09 1.30478E-05 3.17591E+02 4.65285E-05 1.14794E-10 3.97911E-09 3.08955E-09 3.44373E-09 3.33908E-09 Dual Feas. 2.65988E-09 7.46069E-09 7.69573E-09 2.85714E-08 1.21127E-09 7.28516E-10 9.67817E-12 2.33736E-11 2.96888E-11 1.80975E-08 1.88593E-11 3.89769E-09 2.17643E-09 1.43661E-10 Duality Gap 1.99680E-10 2.02139E-11 1.03668E-08 1.59437E-11 8.48931E-13 4.24861E-09 1.75965E-09 1.15036E-09 6.43470E-10 1.14724E-10 3.97786E-09 2.93585E-09 3.31878E-09 7.34164E-10 Dual Feas. 1.51093E-06 0.98 23* 2.07036E-12 1.07 23 lotfi 3.27941E+03 0.74 3* 8.44625E-09 0.47 15 scagr7 1.69120E-06 0.44 18* 2.28081E-10 0.47 15 stocfor1 5.82778E-05 0.36 13* 8.67057E-09 0.40 12 share2b 7.08901E-09 0.77 17 7.15516E-09 1.02 17 vtp_base 1.44868E-08 0.45 12 1.44868E-08 0.53 12 recipe 1.08636E-10 0.48 13 7.68271E-12 0.51 13 sc205 6.31986E-05 0.55 21* 8.43804E-10 0.46 14 adlittle 1.32147E-09 0.31 12 1.31704E-09 0.38 12 sc105 1.19769E-09 0.23 11 1.06181E-09 0.30 11 sc50a 2.11146E-09 0.24 10 1.95050E-09 0.27 10 sc50b Primal Feas. Time It. Primal Feas. Time It. 0.81 0.41 0.20 1.04876E-09 2.94229E-11 8.93729E-10 11 17 10 McIPM with LIPSOL’s Equation Solver 1.65276E-09 11 0.35 blend 2.96139E-11 17 0.48 kb2 3.75415E-09 10 0.67 afiro McIPM with McSMLne Selected Netlib Problems, IBM RS/6000 W/S * Numerical difficulty happens.
20 / 21
Conclusion and Future Work
- We implemented sparse linear equation solvers for IPMs.
- The McIPMne with most of netlib problems is competitive to
LIPSOL’s linear equation solver but slower with some big
- problems. We need to improve ordering quality.
- The McIPMas is not numerically stable yet. Refinement by PCG
doesn’t work well either. We need to 1check operations in factorization and substitution and 2implement Bunch-Parlett method.
- McIPM with McSML fails due to numerical instability of the last
iterations (H=X-1Z has big range in value).
21 / 21
Selected References
Books
- Duff, I. S., Erisman, A. M. and Reid, J. K. (1989) Direct Methods for Sparse Matrices, Oxford University Press,
New York
- George, A. and Liu, J. W. H. (1981) Computer Solution of Large Sparse Positive Definite Systems, Prentice Hall,
Inc., Englewood Cliffs
Papers
- Gondzio, J. (1993) Implementing Cholesky Factorization for IPMs of LP, Optimization
- Altman, A. and Gondzio, J. (1998) Regularized Symmetric Indefinite Systems in IPMs for Linear and Quadratic
Optimization, Opt. Methods & Soft.
- Mészáros, C. (1996) Fast Cholesky Factorization for IPMs of LP, Comp. & Math. Appl.
- Mészáros, C. (1997) The augmented system variant of IPMs in two-stage stochastic LP computation, EJOR
- Maros, I., and Mészáros, C. (1998) The Role of the Augmented System in IPMs, EJOR
- Vanderbei, R. J. (1995) Symmetric Quasidefinite Matrices, SIAM J. Opt.
- Liu, J. W. H. (1990) The Multifrontal Method for Sparse Matrix Solution: Theory and Practice, SIAM Review
- Fourer, R. and Mehrotra, S. (1993) Solving Symmetric Indefinite Systems in an IPM for LP, Math. Prog.
- Duff, I. S. and Reid, J. K. (1995) Exploiting Zeros on the Diagonal in the Direct Solution of Indefinite Sparse
Symmetric Linear Systems, ACM Tran. Math. Soft.