The Potential of Diffusive Load Balancing at Large Scale EuroMPI - PowerPoint PPT Presentation

Center for Information Services and High Performance Computing The Potential of Diffusive Load Balancing at Large Scale EuroMPI 2016, Edinburgh, 27 September 2016 Matthias Lieber, Kerstin Gößner, Wolfgang E. Nagel matthias.lieber@tu-dresden.de

Motivation: Load Balancing Load Balance • A challenge for HPC at large scale • Especially for applications with workload variations Goals of load balancing Repartition application to balance workload • • Reduce comm. costs between partitions (edge cut) • Reduce task migration costs • Fast & scalable decision making Particle density, laser wakefield acceleration simulation with particle-in-cell code PIConGPU Slide 2

Motivation: Diffusive Load Balancing Fully distributed method Cybenko, • Local operations lead to global Dynamic Load Balancing for Distributed Memory convergence Multiprocessors, J. Parallel Distr. Com. 7(2), 1989. Load per node over iterations Watts, Taylor, Practical application is rare IEEE T. Parall. Distr. 9, 1998. Diekmann, Preis, Schlimbach, • Well described since the 1990's Walshaw, Parallel Computing 26(12), 2000. • Only few papers show real use in HPC Schloegel, Karypis, Kumar, SC 2000. Motivation of this work • Performance comparison to other state-of-the-art methods at large scale Slide 3

Contents Motivation • Load Balancing • Diffusive Load Balancing Short Diffusion Intro Concept • • Algorithms Performance Comparison • Benchmark Setup • Other Methods • Results Slide 4

Short Diffusion Intro Concept • Arrange processes/nodes in a graph G , e.g. mesh • Balance virtual load with neighbors for several iterations until global convergence • Result: minimal* load flow between neighbors in G that leads to global balance How to realize the flows? • 2nd step required: task selection • Satisfy flows best possible, keep edge cut and migration low (to reduce communication) * Most methods minimize sum of squares of individual flows between nodes (two-norm) Slide 5

Short Diffusion Intro: Algorithms Original Diffusion Algorithm (Orig Diff) Cybenko, J. Parallel Distr. Com. 7(2), • In each iteration i each node v updates its 1989. i + ∑ i − l v i ) load: i + 1 = l v α vw ( l w l v Estimation: w ∈ N v l v ... load value of node v One iteration N v ... neighbor nodes of node v should take few 10 µs only α vw ... diffusion parameter Second Order Diffusion (SO Diff) Muthukrishnan, Ghosh, Schultz, • Prev. iteration‘s transfer influences current Theory Comput. Sys. 31, 1998. Improved Diffusion (Impr Diff) Hu, Blake, Parallel Computing 25(4), • Update rule is adapted during iterations 1999. based on Laplacian matrix of graph G Dimension Exchange (Dim Exch) Cybenko, 1989. Xu, Monien, Lüling, Lau, • Local load is updated immediately before Conc. Pract. E. 7, 1995. exchanging with next neighbor Slide 6

Contents Motivation • Load Balancing • Diffusive Load Balancing Short Diffusion Intro Concept • • Algorithms Performance Comparison • Benchmark Setup • Other Methods • Results Slide 7

Performance Comparison: Diffusion Benchmark Benchmark setup • 3D task grid, 3D process mesh, 512 tasks per proc • Artificial imbalanced workload data* • Iterations terminate at target imbalance of 0.1% 2D grid example of BOX scenario Red part is overloaded such that imbalance is 11% (i.e. max load / avg load - 1) Simplifications • Time measurement w/o checking termination criterion • Simple task selection algorithm (single pass) * in the paper we also use the particle-in-cell application szenario Slide 8

Performance Comparison: Other Methods Zoltan load balancing library http://www.cs.sandia.gov/Zoltan Boman, Catalyurek, Chevalier, Devine, • MPI-based library implementations The Zoltan and Isorropia Parallel Toolkits for Combinatorial Scientific Computing: Partitioning, Ordering, and • RCB: recursive coordinate bisection Coloring, Scientific Programming, 20(2), 2012. • HSFC: Hilbert space-filling curve Schloegel, Karypis, Kumar, • ParMetis graph partitioning via Zoltan A Unified Algorithm for Load-balancing Adaptive Scientific Simulations, SC 2000. Hierarchical space-filling curve • Own fast and scalable method Lieber, Nagel, S calable High-Quality 1D Partitioning , • Leads to high migration HPCS 2014. Slide 9

Performance Comparison: 1Ki-8Ki weak scaling Max number of Iterations until task mesh edges Median run time of 61 runs on flows lead to Max tasks cut by partition Taurus, Intel Haswell + Infiniband 0.1% imbalance = max ( l v ) avg ( l v ) − 1 sent+received borders among FDR cluster with Intel MPI, (before task among all procs all procs error bars show 25/75 percentiles selection) • Diffusion leads to smallest migration • Diffusion achieves very good edge cut • Diffusion run time ca. 2 ms for 8192 processes, Zoltan much slower Slide 10

Performance: 8Ki-128Ki, without task selection Max / total load transfer Median run time of 19 runs on Iterations until computed by diffusion relative to Juqueen, IBM Blue Gene/Q, flows lead to to avg / total load of procs error bars show 25/75 percentiles 0.1% imbalance • Dimension exchange scales better than second order diffusion • Diffusion takes few ms even on 128k processes* * task selection time does not depend on process count and takes few ms on Juqueen Slide 11

Summary Conclusion Diffusive load balancing is attractive on large scale when overhead (time for decision making, task migration) has to be low, e.g. in case of frequent rebalancing. Future work • Improve task selection • Scalable termination criterion: estimate required iterations or check convergence? • Optimal process graph topology: match the hardware or the application? • Add to Zoltan / Charm++ / application XYZ Slide 12

Thank you very much for your attention Acknowledgments / Funding: Slide 13

The Potential of Diffusive Load Balancing at Large Scale EuroMPI - PowerPoint PPT Presentation

Center for Information Services and High Performance Computing The Potential of Diffusive Load Balancing at Large Scale EuroMPI 2016, Edinburgh, 27 September 2016 Matthias Lieber, Kerstin Gner, Wolfgang E. Nagel

Load Balancing Load Balancing Load balancing: distributing data and/or computations across

Load Balancing with nftables by Laura Garca (Zen Load Balancer Team) Netdev 1.1 Prototype of

Internal Load Balancing in 5 mins Deliver scalable and resilient internal-only services on GCP

Dynamic Load Balancing in Dynamic Load Balancing in Charm+ + Charm+ + Abhinav S Bhatele

Epidemic Algorithm for Load Balancing Harshitha Menon, Laxmikant Kal e 15th April 1 / 25

L O A D B A L A N C I N G I S I M P O S S I B L E LOAD BALANCING IS IMPOSSIBLE Tyler McMullen

Load Balancing in Ceph: Load Balancing With Pseudorandom Placement Esteban Molina-Estolano,

Balancing Gas system information provision 12 June 2018 GRTgaz balancing in a nutshell -> 2

Load Balancing Load Balancing: Example Example Problem Consider 6 jobs whose processing times

Load Balancing and Termination Detection Load balancing used to distribute computations fairly

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

Adventures in Load Balancing at Scale: Successes, Fizzles, and Next Steps Rusty Lusk Mathematics

Relative entropy in diffusive relaxation C. Lattanzio 1 In collaboration with A.E. Tzavaras 2 14th

Global solvability of some double-diffusive convection systems . Mitsuharu O TANI Waseda

Mean field limits for Hawkes processes in a diffusive regime Xavier Erny 1 ocherbach 2 and Dasha

FINANCING LARGE SCALE SOLAR Large Scale Solar Conference - Sydney Gloria Chan Director, Large

Scalable Object Storage with Resource Reservations and Dynamic Load Balancing Alex Aizman

MetaBalancer: An automatic load balancer based on application characteristics Harshitha Menon

Today Load balancing. Balls in Bins. Power of two choices. Cuckoo hashing. n k k n

TomcatCon Tomcat Load-balancing Mark Thomas TM Introduction TM Terminology TM Terminology:

salbnet: A Self-Adapting Load Balancing Network Jrg Jung University of Potsdam Institute for

Load-Balancing Spatially Located Computations using Rectangular Partitions Erdeniz s 1 , 2 ,

Content-distribution networks Str trat ategie egies Divide and conquer Partition

Integrating renewable energies - estimating needs for flexibility, competition of technologies

The Potential of Diffusive Load Balancing at Large Scale EuroMPI - PowerPoint PPT Presentation

Center for Information Services and High Performance Computing The Potential of Diffusive Load Balancing at Large Scale EuroMPI 2016, Edinburgh, 27 September 2016 Matthias Lieber, Kerstin Gner, Wolfgang E. Nagel

Load Balancing Load Balancing Load balancing: distributing data and/or computations across

Load Balancing with nftables by Laura Garca (Zen Load Balancer Team) Netdev 1.1 Prototype of

Internal Load Balancing in 5 mins Deliver scalable and resilient internal-only services on GCP

Dynamic Load Balancing in Dynamic Load Balancing in Charm+ + Charm+ + Abhinav S Bhatele

Epidemic Algorithm for Load Balancing Harshitha Menon, Laxmikant Kal e 15th April 1 / 25

L O A D B A L A N C I N G I S I M P O S S I B L E LOAD BALANCING IS IMPOSSIBLE Tyler McMullen

Load Balancing in Ceph: Load Balancing With Pseudorandom Placement Esteban Molina-Estolano,

Balancing Gas system information provision 12 June 2018 GRTgaz balancing in a nutshell -&gt; 2

Load Balancing Load Balancing: Example Example Problem Consider 6 jobs whose processing times

Load Balancing and Termination Detection Load balancing used to distribute computations fairly

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

Adventures in Load Balancing at Scale: Successes, Fizzles, and Next Steps Rusty Lusk Mathematics

Relative entropy in diffusive relaxation C. Lattanzio 1 In collaboration with A.E. Tzavaras 2 14th

Global solvability of some double-diffusive convection systems . Mitsuharu O TANI Waseda

Mean field limits for Hawkes processes in a diffusive regime Xavier Erny 1 ocherbach 2 and Dasha

FINANCING LARGE SCALE SOLAR Large Scale Solar Conference - Sydney Gloria Chan Director, Large

Scalable Object Storage with Resource Reservations and Dynamic Load Balancing Alex Aizman

MetaBalancer: An automatic load balancer based on application characteristics Harshitha Menon

Today Load balancing. Balls in Bins. Power of two choices. Cuckoo hashing. n k k n

TomcatCon Tomcat Load-balancing Mark Thomas TM Introduction TM Terminology TM Terminology:

salbnet: A Self-Adapting Load Balancing Network Jrg Jung University of Potsdam Institute for

Load-Balancing Spatially Located Computations using Rectangular Partitions Erdeniz s 1 , 2 ,

Content-distribution networks Str trat ategie egies Divide and conquer Partition

Integrating renewable energies - estimating needs for flexibility, competition of technologies

Balancing Gas system information provision 12 June 2018 GRTgaz balancing in a nutshell -> 2