
Load-Balancing Spatially Located Computations using Rectangular Partitions



  1. Load-Balancing Spatially Located Computations using Rectangular Partitions
     Erdeniz Ö. Baş (1, 2), Erik Saule (1), Ümit V. Çatalyürek (1, 3), {erdeniz,esaule,umit}@bmi.osu.edu
     (1) Department of Biomedical Informatics, (2) Department of Computer Science and Engineering, (3) Department of Electrical and Computer Engineering, The Ohio State University
     SIAM Conference on Parallel Processing for Scientific Computing 2012
     Ohio State University, Biomedical Informatics. 2D partitioning. Ümit V. Çatalyürek. HPC Lab, http://bmi.osu.edu/hpc

  2. A load distribution problem (figure: load matrix). In parallel computing, the load can be spatially located, and the computation should be distributed accordingly. Applications: particle-in-cell simulations, sparse matrices, direct volume rendering. Metrics: load balance, communication, stability.

  3. Different kinds of partition: uniform, rectilinear, P × Q-way jagged (th), m-way jagged, hierarchical, spiral. Further slide annotations: (def, heur, th, opt), (heur, opt), (heur, opt).

  4. Load balance of the different partition types on 2304 processors, Particles instance (2050 × 2050), given as load imbalance: uniform 17.5%, rectilinear 15.1%, P × Q-way jagged 2.3%, m-way jagged 2.0%, hierarchical 2.7%.

  5. This talk is about how to generate such partitions, either optimally or heuristically, and about the kinds of guarantees we can obtain.

  6. Outline
     1. Introduction
     2. Preliminaries: Notation; In One Dimension; Simulation Setting
     3. Rectilinear Partitioning: Nicol's Algorithm
     4. Jagged Partitioning: P × Q-way Jagged; m-way Jagged
     5. Hierarchical Bisection: Recursive Bisection; Dynamic Programming
     6. Final thoughts: Summing up

  7. The Rectangular Partitioning Problem
     Definition. Let $A$ be an $n_1 \times n_2$ matrix of non-negative values. The problem is to partition the $[1, 1] \times [n_1, n_2]$ rectangle into a set $S$ of $m$ rectangles. The load of rectangle $r = [x, y] \times [x', y']$ is $L(r) = \sum_{x \le i \le x',\ y \le j \le y'} A[i][j]$. The problem is to minimize $L_{\max} = \max_{r \in S} L(r)$.
     Prefix sum. Algorithms are rarely interested in the value of a particular element but rather in the load of a rectangle. The matrix is therefore given as a 2D prefix sum array $Pr$ such that $Pr[i][j] = \sum_{i' \le i,\ j' \le j} A[i'][j']$. By convention, $Pr[0][j] = Pr[i][0] = 0$. The load of rectangle $r = [x, y] \times [x', y']$ can then be computed in constant time as $L(r) = Pr[x'][y'] - Pr[x-1][y'] - Pr[x'][y-1] + Pr[x-1][y-1]$.
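     A minimal sketch of the prefix sum array and the constant-time rectangle load above, assuming $A$ is held as a plain Python list of lists. The function names prefix_sum and rect_load are hypothetical; indices follow the slide's 1-indexed, inclusive convention, with row and column 0 of Pr used as zero padding.

```python
def prefix_sum(A):
    """Pr[i][j] = sum of A[i'][j'] for i' <= i, j' <= j (1-indexed).

    Row 0 and column 0 stay zero, matching Pr[0][j] = Pr[i][0] = 0.
    A itself is a 0-indexed list of lists.
    """
    n1, n2 = len(A), len(A[0])
    Pr = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            Pr[i][j] = (A[i - 1][j - 1] + Pr[i - 1][j]
                        + Pr[i][j - 1] - Pr[i - 1][j - 1])
    return Pr


def rect_load(Pr, x, y, xp, yp):
    """Load of r = [x, y] x [x', y'] (1-indexed, inclusive) in O(1)."""
    return Pr[xp][yp] - Pr[x - 1][yp] - Pr[xp][y - 1] + Pr[x - 1][y - 1]


A = [[1, 2, 3],
     [4, 5, 6]]
Pr = prefix_sum(A)
print(rect_load(Pr, 1, 1, 2, 3))  # whole matrix: 21
print(rect_load(Pr, 2, 2, 2, 3))  # row 2, columns 2..3: 5 + 6 = 11
```

     Building Pr costs one pass over the matrix; every later rectangle query is four lookups, which is what makes the partitioning algorithms below affordable.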

  8. In One Dimension. Optimal: Nicol's algorithm [Nic94] (improved by [PA04]), based on parametric search. Complexity: $O((m \log \frac{n}{m})^2)$.
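     The slide cites Nicol's parametric search. As an illustrative stand-in only (not Nicol's algorithm), the sketch below solves the same 1D problem exactly for integer loads by bisecting on the bottleneck value $B$ and checking each candidate with a greedy probe that makes every interval as long as possible. The name partition_1d is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_1d(loads, m):
    """Boundaries [0, ..., n] of an optimal split into at most m intervals.

    Bisection on the bottleneck B, greedy probe for each B. Exact when
    the loads are non-negative integers.
    """
    prefix = list(accumulate(loads, initial=0))  # prefix[i] = sum(loads[:i])
    n = len(loads)

    def greedy(B):
        cuts = [0]
        while cuts[-1] < n:
            # furthest end such that the interval load stays <= B
            end = bisect_right(prefix, prefix[cuts[-1]] + B) - 1
            if end == cuts[-1] or len(cuts) > m:
                return None  # one element exceeds B, or more than m parts needed
            cuts.append(end)
        return cuts

    lo, hi = max(loads, default=0), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if greedy(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return greedy(lo)


loads = [1, 2, 3, 4, 5]
print(partition_1d(loads, 2))  # [0, 3, 5]: parts [1, 2, 3] and [4, 5], bottleneck 9
```

     The probe is valid because loads are non-negative, so extending an interval never decreases its load; the number of bisection steps grows with the logarithm of the total load rather than with $m$.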

  9. Simulation Setting. Instance classes: some are inspired by [MS96]. Processors: simulations are performed with different numbers of processors, most square numbers up to 10,000. Metric: the presented metric is the load imbalance, $\frac{L_{\max}}{\sum_{i,j} A[i][j] / m} - 1$.
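     The load imbalance metric compares the heaviest part with a perfectly even share of the total load. A minimal sketch, assuming a partition is given as a list of hypothetical (x, y, x', y') tuples in the 1-indexed, inclusive notation of slide 7; loads are summed directly here for brevity rather than through the prefix sum array.

```python
def load_imbalance(A, rects):
    """L_max / (sum(A) / m) - 1, with m = len(rects).

    A is a 0-indexed list of lists; each rectangle (x, y, xp, yp) covers
    rows x..xp and columns y..yp, 1-indexed and inclusive.
    """
    def load(r):
        x, y, xp, yp = r
        return sum(A[i - 1][j - 1]
                   for i in range(x, xp + 1)
                   for j in range(y, yp + 1))

    total = sum(map(sum, A))
    l_max = max(load(r) for r in rects)
    return l_max / (total / len(rects)) - 1


A = [[1, 2, 3],
     [4, 5, 6]]
print(round(load_imbalance(A, [(1, 1, 1, 3), (2, 1, 2, 3)]), 3))  # split by rows: 0.429
print(round(load_imbalance(A, [(1, 1, 2, 2), (1, 3, 2, 3)]), 3))  # split by columns: 0.143
```

     A value of 0 means every part carries exactly the average load; the 17.5% and 2.0% figures on slide 4 correspond to values of 0.175 and 0.02.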

  10. Outline of the Talk: 3. Rectilinear Partitioning.

  11. Rectilinear Partitioning. Generalities: the problem is NP-hard; approximation algorithms exist but are very slow. RECT-NICOL [Nic94]: an iterative heuristic. At each iteration, the partition in one dimension is refined while the partition in the other dimension is kept fixed. Complexity: $O(n_1 n_2)$ iterations ($\le 10$ in practice); one iteration costs $O(Q (P \log \frac{n_1}{P})^2 + P (Q \log \frac{n_2}{Q})^2)$.
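     A sketch in the spirit of RECT-NICOL, under stated assumptions: it starts from a single column block, alternately computes the optimal row cuts for fixed column cuts and then the optimal column cuts for fixed row cuts, and stops at a fixed point or after a hypothetical cap of 10 iterations. With one dimension fixed, the cost of a row interval is its heaviest block, which is still monotone, so the bisection-plus-greedy probe applies (standing in for Nicol's parametric search). All names are hypothetical.

```python
def prefix2d(A):
    """Zero-padded 2D prefix sums: Pr[i][j] = sum of A[:i][:j]."""
    n1, n2 = len(A), len(A[0])
    Pr = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1):
        for j in range(n2):
            Pr[i + 1][j + 1] = A[i][j] + Pr[i][j + 1] + Pr[i + 1][j] - Pr[i][j]
    return Pr


def block(Pr, r0, r1, c0, c1):
    """Load of rows r0..r1-1 and columns c0..c1-1 (half-open)."""
    return Pr[r1][c1] - Pr[r0][c1] - Pr[r1][c0] + Pr[r0][c0]


def refine(Pr, n, other_cuts, k):
    """Optimal <= k-way cuts along the first axis of Pr, other axis fixed."""
    groups = list(zip(other_cuts, other_cuts[1:]))

    def cost(a, b):  # heaviest block of the interval a..b-1
        return max(block(Pr, a, b, c0, c1) for c0, c1 in groups)

    def greedy(B):
        cuts = [0]
        while cuts[-1] < n:
            a = b = cuts[-1]
            while b < n and cost(a, b + 1) <= B:
                b += 1
            if b == a or len(cuts) > k:
                return None  # a single slice exceeds B, or more than k parts
            cuts.append(b)
        return cuts

    lo, hi = 0, Pr[-1][-1]  # hi = total load is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if greedy(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return greedy(lo)


def rect_nicol_sketch(A, P, Q, max_iters=10):
    n1, n2 = len(A), len(A[0])
    Pr = prefix2d(A)
    PrT = prefix2d([list(col) for col in zip(*A)])  # transpose: columns first
    rows, cols = None, [0, n2]  # assumption: start from one column block
    for _ in range(max_iters):
        new_rows = refine(Pr, n1, cols, P)
        new_cols = refine(PrT, n2, new_rows, Q)
        if (new_rows, new_cols) == (rows, cols):
            break  # fixed point reached
        rows, cols = new_rows, new_cols
    l_max = max(block(Pr, r0, r1, c0, c1)
                for r0, r1 in zip(rows, rows[1:])
                for c0, c1 in zip(cols, cols[1:]))
    return rows, cols, l_max


print(rect_nicol_sketch([[1] * 4 for _ in range(4)], 2, 2))  # ([0, 2, 4], [0, 2, 4], 4)
```

     Working on the transposed prefix sums lets the same refine function handle both dimensions; the linear scan inside the probe keeps the sketch short at the cost of speed.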

  12. Outline of the Talk: 4. Jagged Partitioning.




  16. A P × Q-way Jagged Heuristic: JAG-PQ-HEUR. Sum on each column to generate a 1D problem, and partition it into P parts (stripes). For the first stripe, sum on each row and partition the result into Q parts. Treat all stripes the same way. Complexity: $O((P \log \frac{n_1}{P})^2 + P \times (Q \log \frac{n_2}{Q})^2)$.
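     A minimal sketch of the JAG-PQ-HEUR steps, assuming stripes are groups of consecutive rows (so the first 1D problem holds one total per row) and reusing a bisection-based 1D partitioner as a stand-in for Nicol's algorithm. The names partition_1d and jag_pq_heur are hypothetical; rectangles are returned as half-open (r0, r1, c0, c1) tuples.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_1d(loads, m):
    """Boundaries [0, ..., n] of an optimal split into at most m intervals."""
    prefix = list(accumulate(loads, initial=0))
    n = len(loads)

    def greedy(B):
        cuts = [0]
        while cuts[-1] < n:
            end = bisect_right(prefix, prefix[cuts[-1]] + B) - 1
            if end == cuts[-1] or len(cuts) > m:
                return None
            cuts.append(end)
        return cuts

    lo, hi = max(loads, default=0), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if greedy(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return greedy(lo)


def jag_pq_heur(A, P, Q):
    n2 = len(A[0])
    stripes = partition_1d([sum(row) for row in A], P)  # P stripes of rows
    parts = []
    for r0, r1 in zip(stripes, stripes[1:]):
        # project the stripe onto the columns, then cut it into Q parts
        col_loads = [sum(A[i][j] for i in range(r0, r1)) for j in range(n2)]
        cols = partition_1d(col_loads, Q)
        parts += [(r0, r1, c0, c1) for c0, c1 in zip(cols, cols[1:])]
    return parts


A = [[5, 1, 1, 1],
     [1, 1, 1, 5]]
print(jag_pq_heur(A, 2, 2))
# [(0, 1, 0, 1), (0, 1, 1, 4), (1, 2, 0, 3), (1, 2, 3, 4)]
```

     Each stripe gets its own column cuts (columns 1 and 3 in the example), which is what distinguishes a jagged partition from a rectilinear one.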

  17. An optimal P × Q-way jagged partitioning: JAG-PQ-OPT. A dynamic programming formulation:
     $$L_{\max}(n_1, P) = \min_{1 \le k < n_1} \max\bigl(L_{\max}(k-1, P-1),\ \mathrm{1D}(k, n_1, Q)\bigr)$$
     $$L_{\max}(0, P) = 0, \qquad L_{\max}(n_1, 0) = +\infty \quad \forall n_1 \ge 1$$
     where $\mathrm{1D}(k, n_1, Q)$ is the optimal bottleneck of the Q-way 1D partition of the stripe made of rows $k$ to $n_1$. There are $O(n_1 P)$ $L_{\max}$ functions to evaluate (each in $O(k)$) and $O(n_1^2)$ 1D functions to evaluate (each in $O((Q \log \frac{n_2}{Q})^2)$). Some significant implementation optimizations apply. For a 512 × 512 matrix and 1000 processors, that is 512,000 + 262,144 = 774,144 values; with 64-bit values, that is about 6 MB (774,144 × 8 bytes ≈ 6.2 MB).
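     A direct, unoptimized sketch of the recurrence above, memoized with functools.lru_cache. Assumptions: the 1D subproblem is solved with bisection plus a greedy probe (a stand-in for Nicol's algorithm), and k ranges over 1..i so that the last stripe may be a single row. None of the "significant implementation optimizations" are attempted, and all names are hypothetical.

```python
import math
from bisect import bisect_right
from functools import lru_cache
from itertools import accumulate


def bottleneck_1d(loads, m):
    """Optimal bottleneck of splitting loads into at most m intervals."""
    prefix = list(accumulate(loads, initial=0))
    n = len(loads)

    def feasible(B):
        start, parts = 0, 0
        while start < n:
            end = bisect_right(prefix, prefix[start] + B) - 1
            if end == start:
                return False  # one element alone exceeds B
            start, parts = end, parts + 1
        return parts <= m

    lo, hi = max(loads, default=0), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def jag_pq_opt(A, P, Q):
    """L_max(n1, P); rows are 1-indexed as on the slide."""
    n1, n2 = len(A), len(A[0])

    @lru_cache(maxsize=None)
    def one_d(k, i):
        # stripe = rows k..i, projected onto the columns, split into Q parts
        col_loads = [sum(A[r - 1][j] for r in range(k, i + 1)) for j in range(n2)]
        return bottleneck_1d(col_loads, Q)

    @lru_cache(maxsize=None)
    def L(i, p):
        if i == 0:
            return 0         # L_max(0, P) = 0
        if p == 0:
            return math.inf  # L_max(n1, 0) = +inf for n1 >= 1
        # last stripe is rows k..i (assumption: k ranges over 1..i)
        return min(max(L(k - 1, p - 1), one_d(k, i)) for k in range(1, i + 1))

    return L(n1, P)


print(jag_pq_opt([[5, 1, 1, 1], [1, 1, 1, 5]], 2, 2))  # 5
```

     The memo tables mirror the slide's count: one entry per (i, p) pair for $L_{\max}$ and one per (k, i) pair for the 1D subproblems, which is where the $n_1 P + n_1^2$ storage estimate comes from.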

  18. Performance of P × Q-way jagged partitioning (PIC-MAG, it=30000). [Figure: load imbalance (log scale, 0.001 to 1) versus number of processors (log scale, 10 to 10,000) for RECT-NICOL, JAG-PQ-HEUR and JAG-PQ-OPT.]


  20. m-way jagged partitioning heuristics. JAG-M-HEUR: similar to JAG-PQ-HEUR. Cut the matrix into P stripes using an optimal 1D algorithm, distribute the processors proportionally to the stripes' loads, then compute a 1D partitioning of each stripe independently. JAG-M-HEUR-PROBE: partition all the stripes at once using a partitioning algorithm for multiple 1D arrays [Fre92].
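     A sketch of JAG-M-HEUR under stated assumptions: stripes are groups of rows cut with a bisection-based 1D partitioner, and the proportional distribution of m processors uses a hypothetical rounding rule (one processor per stripe, then each extra processor goes to the stripe with the highest load per processor, kept in a heap). It assumes m is at least the number of stripes; all names are hypothetical.

```python
import heapq
from bisect import bisect_right
from itertools import accumulate


def partition_1d(loads, m):
    """Boundaries [0, ..., n] of an optimal split into at most m intervals."""
    prefix = list(accumulate(loads, initial=0))
    n = len(loads)

    def greedy(B):
        cuts = [0]
        while cuts[-1] < n:
            end = bisect_right(prefix, prefix[cuts[-1]] + B) - 1
            if end == cuts[-1] or len(cuts) > m:
                return None
            cuts.append(end)
        return cuts

    lo, hi = max(loads, default=0), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if greedy(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return greedy(lo)


def allocate(stripe_loads, m):
    """Hypothetical rounding: one processor each, extras to the stripe
    with the highest current load per processor."""
    alloc = [1] * len(stripe_loads)
    heap = [(-load, i) for i, load in enumerate(stripe_loads)]
    heapq.heapify(heap)
    for _ in range(m - len(stripe_loads)):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (-stripe_loads[i] / alloc[i], i))
    return alloc


def jag_m_heur(A, P, m):
    """Half-open (r0, r1, c0, c1) rectangles; assumes m >= P."""
    n2 = len(A[0])
    stripes = partition_1d([sum(row) for row in A], P)
    spans = list(zip(stripes, stripes[1:]))
    loads = [sum(sum(A[i]) for i in range(r0, r1)) for r0, r1 in spans]
    parts = []
    for (r0, r1), q in zip(spans, allocate(loads, m)):
        col_loads = [sum(A[i][j] for i in range(r0, r1)) for j in range(n2)]
        cols = partition_1d(col_loads, q)
        parts += [(r0, r1, c0, c1) for c0, c1 in zip(cols, cols[1:])]
    return parts


print(jag_m_heur([[4, 4, 4, 4], [1, 1, 1, 1]], 2, 5))
# [(0, 1, 0, 1), (0, 1, 1, 2), (0, 1, 2, 3), (0, 1, 3, 4), (1, 2, 0, 4)]
```

     Unlike JAG-PQ-HEUR, the heavy stripe receives four processors and the light one a single processor, so all five parts carry a load of 4.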
