Harnessing over a Million CPU Cores to Solve a Single Hard Mixed Integer Programming Problem on a Supercomputer

1. Harnessing over a Million CPU Cores to Solve a Single Hard Mixed Integer Programming Problem on a Supercomputer
Yuji Shinano, Zuse Institute Berlin
06/07/2017, The 1st workshop on parallel constraint reasoning

2. Outline
- Background and Purpose
  - State-of-the-art Mixed Integer Programming (MIP) solvers
  - Parallelization of MIP solvers
- Ubiquity Generator (UG) framework and ParaSCIP
- Computational results for solving previously unsolved MIP instances on supercomputers
- How to harness over a million CPU cores
- Concluding remarks

3. Background and Purpose
MIP (Mixed Integer Linear Programming)
- minimizes or maximizes a linear function
- is subject to linear constraints
- has integer and continuous variables
$\min \{ c^\top x : Ax \le b,\ l \le x \le u,\ x_j \in \mathbb{Z} \text{ for all } j \in I \}$
$A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^m,\ c, l, u \in \mathbb{R}^n,\ I \subseteq \{1, \dots, n\}$
- The most general form of combinatorial optimization problems
- Many applications

5. Background and Purpose
- MIP solvability has been improving

6. Progress in a state-of-the-art MIP solver
- Time limit: 10000 sec.
- Test set: 3741 models
  - 235 discarded due to inconsistent answers
  - 934 discarded because none of the versions could solve them
  - speed-up measured on the >100s bracket: 1205 models

7. Background and Purpose
- Development of a massively parallel MIP solver that
  - can solve instances that cannot be solved by state-of-the-art MIP solvers
  - keeps catching up with the performance improvements of state-of-the-art MIP solvers

8. Parallelization of MIP solvers
- Branch-and-bound looks suitable for parallelization
- MIP solvers: LP-based branch-and-cut algorithm
[Figure: branch-and-bound tree with branching constraints $x_1 \le 0$ / $x_1 \ge 1$, $x_2 \le 0$ / $x_2 \ge 1$, $x_5 \le 0$ / $x_5 \ge 1$; labels: Obj., gap, time]
- Subproblem (sub-MIP): $\min \{ c^\top x : Ax \le b,\ l_i \le x \le u_i,\ x_j \in \mathbb{Z} \text{ for all } j \in I \}$
- Subproblems (sub-MIPs) can be processed independently
- Utilize a large number of processors for solving extremely hard MIP instances (previously unsolved problem instances from MIPLIB)
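The point that each sub-MIP is fully described by its bound vectors $(l_i, u_i)$ can be illustrated with a minimal branch-and-bound sketch. It is not the branch-and-cut algorithm of any real solver: it assumes a 0/1 knapsack problem (a maximization MIP with a single constraint), solves the LP relaxation greedily, and uses hypothetical function names chosen for this illustration.

```python
from fractions import Fraction


def lp_relaxation(values, weights, capacity, lower, upper):
    """Solve the knapsack LP relaxation under bounds lower <= x <= upper.

    Returns (bound, x), or None if the items fixed to 1 exceed the capacity.
    """
    n = len(values)
    x = [Fraction(l) for l in lower]
    remaining = capacity - sum(weights[j] * lower[j] for j in range(n))
    if remaining < 0:
        return None
    # Unfixed variables, best value/weight ratio first (greedy is exact here).
    free = [j for j in range(n) if lower[j] < upper[j]]
    free.sort(key=lambda j: Fraction(values[j], weights[j]), reverse=True)
    for j in free:
        take = min(Fraction(1), Fraction(remaining, weights[j]))
        x[j] = take
        remaining -= take * weights[j]
        if remaining == 0:
            break
    return sum(values[j] * x[j] for j in range(n)), x


def branch_and_bound(values, weights, capacity):
    """Return (best value, best solution, number of sub-MIPs processed)."""
    n = len(values)
    best_value, best_x = 0, [0] * n  # taking nothing is always feasible
    # Each open sub-MIP is just a pair of bound vectors (l_i, u_i).
    open_nodes = [([0] * n, [1] * n)]
    processed = 0
    while open_nodes:
        lower, upper = open_nodes.pop()
        processed += 1
        relaxed = lp_relaxation(values, weights, capacity, lower, upper)
        if relaxed is None:
            continue  # infeasible sub-MIP
        bound, x = relaxed
        if bound <= best_value:
            continue  # pruned: cannot beat the incumbent
        fractional = [j for j in range(n) if x[j].denominator != 1]
        if not fractional:
            best_value, best_x = int(bound), [int(v) for v in x]
            continue
        j = fractional[0]
        down_upper = upper[:]
        down_upper[j] = 0  # branch x_j <= 0
        up_lower = lower[:]
        up_lower[j] = 1  # branch x_j >= 1
        open_nodes.append((lower, down_upper))
        open_nodes.append((up_lower, upper))
    return best_value, best_x, processed


if __name__ == "__main__":
    print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))
```

Because every entry of `open_nodes` carries everything needed to solve it, any entry could in principle be handed to a different processor; only the incumbent value has to be shared.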

9. Performance of state-of-the-art MIP Solvers
- Huge performance difference!
- MIP solver benchmark (1 thread): shifted geometric mean of results taken from the homepage of Hans Mittelmann (23/Mar/2014). Unsolved or failed instances are accounted for with the time limit of 1 hour.
- As of 14/April/2017
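For readers unfamiliar with the metric, the shifted geometric mean of running times $t_1, \dots, t_k$ with shift $s$ is $\exp\big(\tfrac{1}{k}\sum_i \ln(t_i + s)\big) - s$. The sketch below is a minimal illustration; the default shift of 10 seconds is an assumption (the slide does not state the shift used), and the function name is hypothetical. Failed runs are passed as `None` and counted with the 1 hour time limit, as the slide describes.

```python
import math


def shifted_geometric_mean(times, shift=10.0, time_limit=3600.0):
    """Shifted geometric mean of running times in seconds.

    None marks an unsolved or failed run; it is counted as time_limit.
    The shift value is an assumption, not taken from the slide.
    """
    if not times:
        raise ValueError("need at least one measurement")
    capped = [time_limit if t is None else min(t, time_limit) for t in times]
    log_sum = sum(math.log(t + shift) for t in capped)
    return math.exp(log_sum / len(capped)) - shift


if __name__ == "__main__":
    # Two solved instances and one failure, all on a single thread.
    print(shifted_geometric_mean([12.5, 840.0, None]))
```

The shift damps the influence of very easy instances, so a solver cannot look much better merely by shaving fractions of a second off trivial problems.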

10. Solving techniques involved in SCIP

11. UG: Ubiquity Generator Framework
- LoadCoordinator: loads are coordinated by a special process or thread
- Base solver: I/O, presolve
- (UG) Solver: wraps a base solver, using its API to control solving algorithms
- Communications: using MPI or pthreads
- Parallel solver instantiations (external parallelization)
  - ug[SCIP, pthreads] (FiberSCIP): shared memory; runs on a PC
  - ug[SCIP, MPI] (ParaSCIP): distributed memory; runs on PC clusters and supercomputers

12. Dynamic load balancing is needed
- A highly unbalanced tree is generated
- Two types of irregularity can be handled well
  - Irregular number of nodes generated by a sub-MIP: from 1 to 1,297,605
  - Irregular computing time for solving a node: from 0.001 sec to 1.5 h
- Real observation from solving ds in parallel with 4095 solvers

13. GAMS and Condor: M.R.Bussieck and M.C.Ferris (2006)

14. How UG does parallel tree search [Ramp-up (Racing)]
- Subproblem: $\min \{ c^\top x : Ax \le b,\ l_i \le x \le u_i,\ x_j \in \mathbb{Z} \text{ for all } j \in I \}$
[Figure: LoadCoordinator (base solver: I/O, presolve) with lists of waiting subproblems $(l_i, u_i)$ and running subproblems, above Solver 1, Solver 2, Solver 3, Solver 4, ..., Solver n; label: Winner]
- A: original (sub-)problem; A’: presolved (sub-)problem
- The presolved problem is distributed to the Solvers
- All Solvers start solving immediately, trying to generate different search trees

15. How UG does parallel tree search [Ramp-up (Racing)]
[Figure: LoadCoordinator (base solver: I/O, presolve) with waiting and running lists, above Solver 1, Solver 2, Solver 3, Solver 4, ..., Solver n; label: Winner]
- The winner is selected by taking into account the dual bound, the number of nodes, etc.
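The winner selection can be sketched as a simple ranking over the statuses reported by the racing Solvers. The slide only says that the dual bound, the number of nodes, "etc." are taken into account, so the concrete rule below is an assumption made for illustration: for a minimization problem, prefer the highest dual bound and break ties by the larger number of open nodes. `RacingStatus` and `select_winner` are hypothetical names, not part of UG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RacingStatus:
    """Status reported by one Solver at the end of racing (hypothetical)."""
    solver_id: int
    dual_bound: float  # lower bound proven so far (minimization)
    open_nodes: int    # nodes left in this Solver's search tree


def select_winner(statuses):
    """Pick the Solver whose search tree is kept after racing.

    Assumed rule: highest dual bound first, then most open nodes, so that
    the surviving tree has both a strong bound and work to distribute.
    Returns None if no Solver has open nodes left.
    """
    candidates = [s for s in statuses if s.open_nodes > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (s.dual_bound, s.open_nodes))


if __name__ == "__main__":
    reports = [
        RacingStatus(1, 1080.0, 12),
        RacingStatus(2, 1100.0, 3),
        RacingStatus(3, 1100.0, 40),
    ]
    print(select_winner(reports))
```

After the winner is chosen, its open nodes would be the ones distributed to the other Solvers, which is why a Solver with no open nodes is not a useful winner in this sketch.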

16. Dynamic load balancing
[Figure: LoadCoordinator holding p open nodes, above Solver 1, Solver 2, Solver 3, Solver 4, Solver 5, ..., Solver n; global view of tree search]
- Try to keep p open nodes in the LoadCoordinator
- Notification message: best dual bound, number of nodes remaining, number of nodes solved
  - Sent periodically and asynchronously
  - Interval is specified by a parameter
- The LoadCoordinator puts selected Solvers into collecting mode
  - These Solvers are expected to have heavy nodes, i.e. nodes with a large subtree underneath
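The decision the LoadCoordinator makes from the notification messages can be sketched as follows. This is an illustration under stated assumptions, not UG's actual logic: Solvers are put into collecting mode only while the pool holds fewer than p open nodes, and the Solvers chosen are those reporting the smallest (best, for minimization) dual bound among Solvers that still have more than one node left. `Notification`, `choose_collecting_solvers` and `max_collectors` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    """Periodic, asynchronous status message from a Solver (hypothetical)."""
    solver_id: int
    best_dual_bound: float
    nodes_remaining: int
    nodes_solved: int


def choose_collecting_solvers(pool_size, p, notifications, max_collectors=1):
    """Return the ids of Solvers to switch into collecting mode.

    pool_size is the number of open nodes currently held by the
    LoadCoordinator; nothing is requested while it is at least p.
    """
    if pool_size >= p:
        return []
    # A Solver with a single node left has nothing to give away.
    busy = [n for n in notifications if n.nodes_remaining > 1]
    # Assumption: the smallest dual bound marks the heaviest subtree.
    busy.sort(key=lambda n: n.best_dual_bound)
    return [n.solver_id for n in busy[:max_collectors]]


if __name__ == "__main__":
    latest = [
        Notification(1, 1100.0, 50, 10),
        Notification(2, 1085.0, 400, 7),
        Notification(3, 1080.0, 1, 90),
    ]
    print(choose_collecting_solvers(pool_size=0, p=4, notifications=latest))
```

Keeping `max_collectors` small limits how many Solvers change their search strategy at once, at the cost of refilling the pool more slowly.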

17. Dynamic load balancing
[Figure: LoadCoordinator with open-node labels p and p*m, above Solver 1, Solver 2, Solver 3, Solver 4, Solver 5, ..., Solver n. Plot: Objective Function Value (1080 to 1240) against Computing Time (0 to 400 sec.), with series "Incumbents", "Optimal", "GlobalLBs" and "Solver1" to "Solver39"; annotation: Solver which has best dual bound node]
- Collecting mode Solver
  - Changes search strategy to best dual bound first
  - Sends requested number of nodes

18. Why can it handle large scale?
[Figure: LoadCoordinator with p open nodes, above Solver 1, Solver 2, Solver 3, Solver 4, Solver 5, ..., Solver n. Plot: Objective Function Value (1080 to 1240) against Computing Time (0 to 400 sec.), with series "Incumbents", "Optimal", "GlobalLBs" and "Solver1" to "Solver39"; annotation: Solver which has best dual bound node]
- Collecting mode Solver: the number of Solvers at a time is restricted
  - Starts from 1
  - Dynamically switching
  - The number is increased by at most 250 even if run with 80,000 Solvers

19. Layered presolving
- Subproblem: $\min \{ c^\top x : Ax \le b,\ l_i \le x \le u_i,\ x_j \in \mathbb{Z} \text{ for all } j \in I \}$
- A: original (sub-)problem; A’: presolved (sub-)problem
- A’: original (sub-)problem; A’’: presolved (sub-)problem
[Figure: global view of tree search]

20. Checkpointing of UG
[Figure: LoadCoordinator (base solver: I/O, presolve) with waiting and running lists, labels p and n, above Solver 1, Solver 2, Solver 3, Solver 4, ..., Solver n]
- Only essential root nodes of subproblems are saved
- If a subtree has been solved, the checkpoint file contains its computational statistics
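The idea of saving only subproblem root nodes can be sketched with a small save and restore pair. This is a minimal illustration, not UG's checkpoint format: it assumes each root node is stored as its bound vectors plus a dual bound, uses JSON for readability, and all names (`save_checkpoint`, `load_checkpoint`, the dictionary keys) are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, waiting, running, incumbent_value):
    """Write the root nodes of all waiting and running subproblems.

    waiting / running: lists of dicts such as
    {"lower": [...], "upper": [...], "dual_bound": 1080.0}.
    Nodes deeper inside a running Solver's tree are deliberately not saved.
    """
    state = {
        "incumbent_value": incumbent_value,
        "subproblem_roots": waiting + running,
    }
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash never leaves a half file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (subproblem roots, incumbent value) to restart the search."""
    with open(path) as handle:
        state = json.load(handle)
    return state["subproblem_roots"], state["incumbent_value"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        checkpoint = os.path.join(directory, "ug_checkpoint.json")
        save_checkpoint(
            checkpoint,
            waiting=[{"lower": [0, 0], "upper": [1, 0], "dual_bound": 1080.0}],
            running=[{"lower": [0, 1], "upper": [1, 1], "dual_bound": 1095.5}],
            incumbent_value=1120.0,
        )
        print(load_checkpoint(checkpoint))
```

Saving only root nodes keeps the checkpoint small, but work done below a running root is repeated after a restart; that trade-off is inherent in this design rather than specific to the sketch.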