.it4i.cz
- M. Čermák, V. Hapla, D. Horák, J. Kružík, L. Pospíšil*,
- R. Sojka, A. Vašatová + prof. Z. Dostál
libraries for large scale quadratic programming M. ermk, V. Hapla, - - PowerPoint PPT Presentation
libraries for large scale quadratic programming M. ermk, V. Hapla, D. Hork, J. Kruk, L. Pospil*, R. Sojka, A. Vaatov + prof. Z. Dostl PETSc User Meeting 2016, Vienna, June 28-30, 2016 * Universit dellaSvizzera italiana,
Prague Brno Ostrava Anselm - 94 TFLOPs system
Salomon – 2 PFLOPs system from June/July 2015 2013 2014 2015 Vienna Czechia
§ 2 x 12 per node
§ 128GB per node, 5.3GB per core
§ Intel Xeon Phi 7120P § 52,704 cores
Iargest European installation of MICs ⇒ interested in PETSc on MIC
§ generic QP solution framework § QP problems, transforms, solvers
§ FETI DDM implementation § extends PermonQP
§ based on/extendingPETSc open-source frameworkfor sci. comp. § together solve large contact mechanics problems
Linear solvers (KSP) Preconditioners (PC) Nonlinear solvers (SNES) Matrices (Mat) Vectors (Vec) Index Sets (IS) PermonQP solvers (QPS) PermonQP problems (QP) PermonFLLOP TAO PCBDDC PCPERMON?
PERMON = external PETSc extension much like SLEPc
0.1 0.2 0.3 0.4 0.5
0.5
0.1 0.2 0.3 0.4 0.5 0.6
𝐯 7 8
𝛍 7 8
𝛍 7 8
𝛍 M 7 8
𝛍 M 7 8
𝐯 7 8
𝛍 M 7 8
(nonpenetration -mortar conditions)
F
§ all stiffness matrices singular § no need to detect it § same nullspaces in typical cases § uniform handling
PetscErrorCode QPFetiPrepare(QP qp) { … QPTDualize(qp); QPTHomogenizeEq(qp); QPTEnforceEqByProjector(qp); … } MPI_Comm comm; Mat K, BE, BI; Věc f, cE, cI, x; QP qp; QPCreate(comm, &qp); QPSetOperator(qp, K); QPSetRhs(qp, f); QPSetEq(qp, BE, cE); QPSetIneq(qp, BI, cI); QPFetiPrepare(qp); QPSCreate(comm, &qps); QPSSetQP(qps, qp); QPSSetFromOptions(qps); QPSSolve(qps); QPGetSolutionVector(qp,&x);
Sequence of QP transforms
and corresponding backward reconstructions
QP0
parent child parent child
QP1 QP2
reconstruct. function
injects
reconstruct. function
injects
solver user
§ linear equality § linear inequality § separable convex constraints – next slide
QPC index set QPCBOUND lower bound 𝐲 ≥ 𝒎 QPCELLIPTIC major axis minor axis QPCCONICAL angles QPCQUADRATIC radii QPCBOX lower bound upper bound 𝐦 ≤ 𝐲 ≤ 𝐯
===================== QPChainViewKKT output follows QP Object: 4 MPI processes type: QP #0 in chain, derived by || x|| = 4.75e+00 max( x) = 2.00e-06 = x(132) min( x) = -3.23e-01 = x(483) || b|| = 4.75e-02 max( b) = -4.17e-04 = b(10) min( b) = -2.50e-03 = b(423) ||cE|| = 0.00e-00 max(cE) = 0.00e-00 = cE(0) min(cE) = 0.00e-00 = cE(0) ||cI|| = 1.63e+00 max(cI) = 4.34e-01 = cI(0) min(cI) = 3.01e-01 = cI(9) r = ||A*x - b + BE'*lambda_E + BI'*lambda_I|| = 6.78e-06 rO/||b|| = 1.43e-04 r = ||BE*x-cE|| = 1.58e-05 r/||b|| = 3.32e-04 r = ||max(BI*x-cI,0)|| = 0.00e+00 r/||b|| = 0.00e+00 r = ||min(lambda_I,0)|| = 0.00e+00 r/||b|| = 0.00e+00 r = lambda_I'*(BI*x-cI) = -1.39e-07 r/||b|| = -2.92e-06
type: QP #1 in chain, derived by QPTScale || x|| = 4.75e+00 max( x) = 2.00e-06 = x(132) min( x) = -3.23e-01 = x(483) || b|| = 4.75e-02 max( b) = -4.17e-04 = b(10) min( b) = -2.50e-03 = b(423) ||cE|| = 0.00e-00 max(cE) = 0.00e-00 = cE(0) min(cE) = 0.00e-00 = cE(0) ||cI|| = 1.63e+00 max(cI) = 4.34e-01 = cI(0) min(cI) = 3.01e-01 = cI(9) r = ||A*x - b + BE'*lambda_E + BI'*lambda_I|| = 6.78e-06 rO/||b|| = 1.43e-04 r = ||BE*x-cE|| = 1.58e-05 r/||b|| = 3.32e-04 r = ||max(BI*x-cI,0)|| = 0.00e+00 r/||b|| = 0.00e+00 r = ||min(lambda_I,0)|| = 0.00e+00 r/||b|| = 0.00e+00 r = lambda_I'*(BI*x-cI) = -1.39e-07 r/||b|| = -2.92e-06
type: QP #2 in chain, derived by QPTDualize || x|| = 4.47e-01 max( x) = 1.30e-01 = x(112) min( x) = -1.47e-01 = x(106) || b|| = 2.62e+00 max( b) = 3.90e-01 = b(21) min( b) = -5.61e-01 = b(133) ||cE|| = 5.00e-01 max(cE) = -2.50e-01 = cE(3) min(cE) = -2.50e-01 = cE(0) ||lb|| = inf max(lb) = 0.00e+00 = lb(30) min(lb) = -4.49e+307 = lb(0) r = ||A*x - b + BE'*lambda_E - lambda_lb|| = 5.31e-05 rO/||b|| = 2.03e-05 r = ||BE*x-cE|| = 6.78e-06 r/||b|| = 2.59e-06 r = ||min(x-lb,0)|| = 0.00e+00 r/||b|| = 0.00e+00 r = ||min(lambda_lb,0)|| = 1.67e-05 r/||b|| = 6.37e-06 r = |lambda_lb'*(lb-x)| = 2.47e-06 r/||b|| = 9.42e-07
type: QP #2 in chain, derived by QPTDualize || x|| = 4.47e-01 max( x) = 1.30e-01 = x(112) min( x) = -1.47e-01 = x(106) || b|| = 2.62e+00 max( b) = 3.90e-01 = b(21) min( b) = -5.61e-01 = b(133) ||cE|| = 5.00e-01 max(cE) = -2.50e-01 = cE(3) min(cE) = -2.50e-01 = cE(0) ||lb|| = inf max(lb) = 0.00e+00 = lb(30) min(lb) = -4.49e+307 = lb(0) r = ||A*x - b + BE'*lambda_E - lambda_lb|| = 5.31e-05 rO/||b|| = 2.03e-05 r = ||BE*x-cE|| = 6.78e-06 r/||b|| = 2.59e-06 r = ||min(x-lb,0)|| = 0.00e+00 r/||b|| = 0.00e+00 r = ||min(lambda_lb,0)|| = 1.67e-05 r/||b|| = 6.37e-06 r = |lambda_lb'*(lb-x)| = 2.47e-06 r/||b|| = 9.42e-07
min 7 8 ⁄ 𝐲%𝐁𝐲 − 𝐲% 𝐜 s.t. 𝐂Q𝐲 = 𝐝Q and 𝐲 ≥ 𝒎 𝐁𝐲 − 𝐜 + 𝐂Q
_𝛍Q − 𝛍` = 𝐩
𝐂Q𝐲 = 𝒅Q 𝐲 ≥ 𝒎 𝛍` ≥ 𝐩 𝛍` 𝒎 − 𝐲 = 𝐩
§ crucial QP transform § new QP with smaller dim, better conditioned and simpler constraints
§ "pass-through" solver taking care of equality constraints
§
§ unconstrained – PETSc KSP wrapper (QPSKSP) § bound/box constraints § MPGP, APGD, PBBf, SPG-QP, ... § PETSc TAO wrapper
min 7 8 ⁄ 𝐯%𝐋𝐯 − 𝐯%𝐠 s.t. 𝐂Q𝐯 = 𝐩 and 𝐂R𝐯 ≤ 𝐝R min 7 8 ⁄ 𝛍%𝐁𝛍 − 𝛍% 𝐜 s.t. 𝐃𝛍 = 𝐞 and 𝛍 ≥ 𝒎 min 7 8 ⁄ 𝛍%𝐁𝛍 − 𝛍% 𝐜 s.t. 𝐃𝛍 = 𝐞 and 𝛍 ≥ 𝒎 min 7 8 ⁄ 𝛍%𝐁𝛍 − 𝛍% 𝐜 s.t. 𝛍 ≥ 𝒎
d e 𝐲_𝐁𝐲 − 𝐲_𝐜 (SMALE)
§ using direct/iterative linear system solver § 𝛎 is equality constraint Lagrange multiplier § inequality constrained → slacks, nonlin. eq. → Interior Point Method
e 𝐲_𝐁𝐲 − 𝐲_𝐜 s. t. 𝐂𝐲 = 𝐩
§ using Augmented Lagrangians and inner iterativesolver § easier to extend for inequality constraints § equality and inequality constraints handled completely separately,
§ matrix structure agnostic - both A and B may be even implicit
e 𝐲_𝐁𝐲 − 𝐲_𝐜 + 𝛎_𝐂𝐲
e 𝐲_𝐁𝐲 − 𝐲_𝐜 + 𝛎_𝐂𝐲 + d e 𝜍 𝐂𝐲 8
𝐂𝐲 c 𝐩
d e 𝐲_(𝐁 + 𝜍𝐂_𝐂)𝐲 − 𝐲_𝐜
e 𝐲_(𝐁 + 𝜍𝐂_𝐂)𝐲 − 𝐲_(𝐜 + 𝐂𝛎)
e 𝐲_(𝐁 + 𝜍𝐂_𝐂)𝐲 − 𝐲_𝐜
70p qrst 𝐁 u
Dostál, Z.: Optimal Quadratic Programming Algorithms, Springer Optimization and Its Applications, 2009, ISBN 978-0387848051
𝐲
𝐂𝐲 c 𝐩
d e 𝐲_𝐁𝐲 − 𝐲_𝐜
𝐲{07 = arg min
𝐲 7 8 𝐲_(𝐁 + 𝜍{𝐂_𝐂)𝐲 − 𝐲_(𝐜 + 𝐂𝛎{07)
Dostál, Z.: Optimal Quadratic Programming Algorithms, Springer Optimization and Its Applications, 2009, ISBN 978-0387848051
, 𝜍w > 0 2 𝐁 ,𝜃 > 0 0.1 𝐜
e 𝜍{ 𝐂𝐲{07 8
d e 𝐲_𝐁𝐲 − 𝐲_𝐜 (SMALE)
again solving min
𝐲
d e 𝐲_(𝐁 + 𝜍{𝐂_𝐂)𝐲 − 𝐲_(𝐜 + 𝐂𝛎{07)
BUT with early termination
SMALE or SMALE-M
, 𝜍w > 0 2 𝐁 ,𝜃 > 0 0.1 𝐜
e 𝜍{ 𝐂𝐲{07 8
𝐂𝐲 c 𝐩 𝐲 „ 𝒎
d e 𝐲_𝐁𝐲 − 𝐲_𝐜 (SMALBE)
𝐡ƒ … projected gradient – see next slide now solving min
𝐲 „ 𝒎
d e 𝐲_(𝐁 + 𝜍{𝐂_𝐂)𝐲 − 𝐲_(𝐜 + 𝐂𝛎{07)
but using the same 𝑀 j corresponding to min
𝐂𝐲 c 𝐩
d e 𝐲_𝐁𝐲 − 𝐲_𝐜
SMALBE or SMALBE-M
Dostál, Z.: Optimal Quadratic Programming Algorithms, Springer Optimization and Its Applications, 2009, ISBN 978-0387848051
ƒ , 𝐡
ƒ = ‹
Dostál, Z.: Optimal Quadratic Programming Algorithms, Springer Optimization and Its Applications, 2009, ISBN 978-0387848051
Dostál Z., Pospíšil L.: Minimization of the quadratic function with semidefinite Hessian subject to the bound constraints, Computers and Mathematics with Applications, (2015).
CG-step CG-halfstep + expansion step
(proportioning step)
Dostál Z., Pospíšil L.: Minimization of the quadratic function with semidefinite Hessian subject to the bound constraints, Computers and Mathematics with Applications, (2015).
Inner stopping criterion: 𝐡 l ƒ ≤ min 𝑁 𝐂𝐲{ ,𝜃 ∧ 𝐡 l ƒ < ε Start to increase penalty 𝜍 by factor of 2 each outer iteration once 𝐡 l ƒ < ε is reached
10 125 000 20 250 000 40 500 000 81 000 000 162 000 000 324 000 000 20 40 60 80 100 120 140 160 180 10 20 30 40 50 60 70 80 90 100 216 432 864 1728 3456 6912
# DOFs (log2 scale) solution time [sec] # Hessian mult # subdomains (log2 scale) # Hessian mult time
PermonCube + PermonFLLOP + PermonQP Machine: Salomon, IT4Innovations, CZ 8000 375,000,000
10 125 000 20 250 000 40 500 000 81 000 000 162 000 000 324 000 000 20 40 60 80 100 120 140 160 180 50 100 150 200 250 300 216 432 864 1728 3456 6912
# DOFs (log2 scale) solution time [sec] # Hessian mult # subdomains (log2 scale) # Hessian mult time
x y z d
PermonCube + PermonFLLOP + PermonQP Machine: Salomon, IT4Innovations, CZ 8000 375,000,000
52,728,000 105,456,000 210,912,000 421,824,000 20 40 60 80 100 120 140 160 180 50 100 150 200 250 300 1000 2000 4000 8000
# DOFs (log2 scale) solution time [sec] # Hessian mult # subdomains (log2 scale) # Hessian mult time
52,728,000 105,456,000 210,912,000 421,824,000 20 40 60 80 100 120 140 160 180 50 100 150 200 250 300 1000 2000 4000 8000
# DOFs (log2 scale) solution time [sec] # Hessian mult # subdomains (log2 scale) # Hessian mult time
(older PERMON version)
§ support for non-conforming submeshes – hanging nodes at interfaces
§ support vector machines, data fitting, SQP, Proximal Bundle Method,…
§ improve arithmetical density, accelerators
§ libMesh, FEniCS, Sifel, Elmer, ViennaMesh/Grid tools
§ merge separable convex constraints (QPC) § finish splitting into PermonQPand PermonFLLOP § basic documentation
The EXA2CT European Project (EXascale Algorithms and Advanced Computational Techniques) Účinné metody odhadu životnosti pro
(Efficient methods for life prediction for general multiaxial fatigue) GA15-18274S National Programme of Sustainability (NPU II) project “IT4Innovations excellence in science” LQ1602 IPCC-IT4I
* It is able to prepare 2 espressos in parallel.