Load-Balancing Scatter Operations for Grid Computing



  1. Load-Balancing Scatter Operations for Grid Computing. Stéphane Genaud†, Arnaud Giersch†, and Frédéric Vivien‡ ({stephane.genaud,arnaud.giersch}@icps.u-strasbg.fr, frederic.vivien@ens-lyon.fr). † ICPS-LSIIT - UMR CNRS 7005, Université Louis Pasteur, Strasbourg, France. ‡ LIP, ENS Lyon, France - INRIA. This research is partially supported by the French Ministry of Research through the ACI-GRID program. Load-Balancing Scatter Operations for Grid Computing – HCW 2003 – p.1

  2. Introduction
     - Motivating example
     - Target application
     - Scatter operation
     - Static load-balancing
     - Exact solution
     - Guaranteed heuristic
     - A case study: solving in rational numbers
     - Processor ordering policy
     - Conclusion

  3. Application
     - Geophysical code: build a global seismic velocity model of the Earth (ray-tracing seismic waves).
     - Embarrassingly parallel application.
     - Original target architecture: parallel computer.
     - Data distribution: MPI_Scatter

         if (rank = ROOT)
             raydata ← read n lines from data file;
         MPI_Scatter(raydata, n/P, ..., rbuff, ..., ROOT, MPI_COMM_WORLD);
         compute_work(rbuff);

  4. Environment
     - proc. 1–8 (SGI O.2000, PC-Linux) and proc. 9–16 (SGI O.3800)
     - Globus + MPICH-G2

     CPU #   Type       CPU (s/ray)   Rating   Bandwidth (s/ray)
     1       PIII/933   0.009288      1        0
     2       PIII/800   0.009365      0.99     1.12 · 10^-5
     3       XP1800     0.004629      2        1.00 · 10^-5
     4       XP1800     0.004885      1.90     1.70 · 10^-5
     5, 6    XP2000     0.003976      2.33     8.15 · 10^-5
     7, 8    R12K/300   0.016156      0.57     2.10 · 10^-5
     9–16    R14K/500   0.009677      0.95     3.53 · 10^-5

  5. Unbalanced program execution
     [Figure: per-processor total time and comm. time (seconds) together with the amount of data (rays), for caseb, pellinore, sekhmet, seven (×2), leda (×8), merlin (×2), and dinadan.]
     - How to load-balance execution with few code rewrites?
     - Replace MPI_Scatter with MPI_Scatterv.
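
Moving from MPI_Scatter to MPI_Scatterv mainly means supplying one count and one displacement per rank instead of a single block size. The sketch below only builds those two arrays from a given distribution; the function name `scatterv_args` and the sample counts are hypothetical, and the actual MPI call is left out.

```python
def scatterv_args(counts):
    """Build the per-rank send counts and displacements for a scatterv call.

    counts[i] is the number of items destined for rank i (assumed layout:
    blocks stored back to back in the root's send buffer).
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    sendcounts = list(counts)
    displs = []
    offset = 0
    for c in sendcounts:
        displs.append(offset)  # where rank i's block starts in the send buffer
        offset += c
    return sendcounts, displs


# Hypothetical uneven distribution of 8 items over 3 ranks.
sendcounts, displs = scatterv_args([3, 1, 4])
print(sendcounts, displs)  # [3, 1, 4] [0, 3, 4]
```

The rest of the program can stay unchanged as long as each rank sizes its receive buffer from its own count.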

  6. Scatter operation
     - A scatter communication followed by a computation phase.
     [Figure: timeline from $t_0$ to $t_1$ for $P_1, \dots, P_4$, showing the idle, receiving, sending, and computing phases of each processor.]
     - Root: $P_4$ (one-port model).
     - Questions: best data distribution? best processor ordering?

  7. Framework
     - Processors: $P_1, \dots, P_p$; data distribution: $n_1, \dots, n_p$.
     - Cost functions: $T_{\mathrm{comm}}(i, x)$ and $T_{\mathrm{comp}}(i, x)$.
     - $P_i$ ends its processing at time:
       $$T_i = \sum_{j=1}^{i} T_{\mathrm{comm}}(j, n_j) + T_{\mathrm{comp}}(i, n_i)$$
     - Overall processing time:
       $$T = \max_{1 \le i \le p} T_i = \max_{1 \le i \le p} \left( \sum_{j=1}^{i} T_{\mathrm{comm}}(j, n_j) + T_{\mathrm{comp}}(i, n_i) \right)$$
     - Which data distribution $n_1, \dots, n_p$ minimizes $T$?
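
As a minimal sketch of this cost model, the function below evaluates $T_i$ and $T$ for a given distribution. Processors are indexed from 0 in list order, and the linear cost functions and numbers in the example are made up for illustration.

```python
def completion_times(counts, t_comm, t_comp):
    """Return [T_1, ..., T_p] when processors are served in list order."""
    times = []
    comm_so_far = 0
    for i, n_i in enumerate(counts):
        comm_so_far += t_comm(i, n_i)  # one-port model: sends are serialized
        times.append(comm_so_far + t_comp(i, n_i))
    return times


def makespan(counts, t_comm, t_comp):
    """Overall processing time T = max_i T_i."""
    return max(completion_times(counts, t_comm, t_comp))


# Hypothetical per-item costs for two processors.
lam = [1, 2]  # communication cost per item
mu = [3, 1]   # computation cost per item
t_comm = lambda i, x: lam[i] * x
t_comp = lambda i, x: mu[i] * x

print(completion_times([2, 3], t_comm, t_comp))  # [8, 11]
print(makespan([2, 3], t_comm, t_comp))          # 11
```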

  8. Exact solution
     $$T_{\mathrm{opt}}(n, P_i, \dots, P_p) = \min_{0 \le n_i \le n} \Big( T_{\mathrm{comm}}(i, n_i) + \max\big( T_{\mathrm{comp}}(i, n_i),\; T_{\mathrm{opt}}(n - n_i, P_{i+1}, \dots, P_p) \big) \Big)$$
     - Dynamic programming algorithm: for $i \leftarrow p - 1$ down to $1$, knowing the optimal solutions for 0 to $n$ data items on $P_{i+1}, \dots, P_p$, and looping over 0 to $n$ data items assigned to $P_i$, compute the optimal solutions for 0 to $n$ data items on $P_i, \dots, P_p$.
     - Algorithmic complexity: $O(p \cdot n^2)$.
     - Assumptions: $T_{\mathrm{comm}}(i, x)$ and $T_{\mathrm{comp}}(i, x)$ are non-negative, and null whenever $x = 0$.
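
The recurrence can be turned into a short dynamic program. What follows is a sketch written from the formula on the slide, not the authors' implementation; the function name, the 0-based processor indices, and the example costs are assumptions.

```python
def optimal_distribution(n, p, t_comm, t_comp):
    """Return (T_opt, counts) for n items on processors 0..p-1, served in order."""
    # time_rest[k]: optimal time for k items on processors i..p-1 (starts at i = p-1,
    # where the last processor has to take every remaining item).
    time_rest = [t_comm(p - 1, k) + t_comp(p - 1, k) for k in range(n + 1)]
    # choice[i][k]: items given to processor i when k items remain for i..p-1.
    choice = [None] * p
    choice[p - 1] = list(range(n + 1))
    for i in range(p - 2, -1, -1):
        time_here = []
        choice[i] = []
        for k in range(n + 1):
            best_t, best_n = None, 0
            for n_i in range(k + 1):
                t = t_comm(i, n_i) + max(t_comp(i, n_i), time_rest[k - n_i])
                if best_t is None or t < best_t:
                    best_t, best_n = t, n_i
            time_here.append(best_t)
            choice[i].append(best_n)
        time_rest = time_here
    # Walk the choice tables forward to recover the distribution.
    counts, remaining = [], n
    for i in range(p):
        counts.append(choice[i][remaining])
        remaining -= counts[-1]
    return time_rest[n], counts


# Hypothetical linear costs for two processors.
lam, mu = [1, 2], [2, 1]
print(optimal_distribution(5, 2,
                           lambda i, x: lam[i] * x,
                           lambda i, x: mu[i] * x))  # (9, [3, 2])
```

The three nested loops over `i`, `k`, and `n_i` give the $O(p \cdot n^2)$ cost quoted on the slide, which is why the exact method becomes slow when $n$ is large.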

  9. Guaranteed heuristic
     - Assumptions: $T_{\mathrm{comm}}(i, x)$ and $T_{\mathrm{comp}}(i, x)$ are affine in $x$, increasing, and non-negative.
     - Linear program: minimize $T$ such that
       $$\begin{cases} \forall i \in [1, p],\; n_i \ge 0, \\ \sum_{i=1}^{p} n_i = n, \\ \forall i \in [1, p],\; T \ge \sum_{j=1}^{i} T_{\mathrm{comm}}(j, n_j) + T_{\mathrm{comp}}(i, n_i). \end{cases}$$
     - The rational solution is rounded to the nearest integers.
     - The resulting solution ($T'$) is guaranteed:
       $$T_{\mathrm{opt}} \le T' \le T_{\mathrm{opt}} + \sum_{j=1}^{p} T_{\mathrm{comm}}(j, 1) + \max_{1 \le i \le p} T_{\mathrm{comp}}(i, 1).$$
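
The slide does not spell out how the rational solution is rounded. One plausible scheme, shown here purely as an assumption, is largest-remainder rounding: take the floor of every share, then hand the leftover items to the shares with the largest fractional parts. This keeps the total equal to $n$ and moves each share by less than one item. The function name is hypothetical and the linear program itself is not solved here.

```python
import math
from fractions import Fraction


def round_preserving_sum(shares):
    """Round non-negative rational shares to integers with the same total."""
    shares = [Fraction(s) for s in shares]
    total = sum(shares, Fraction(0))
    if total.denominator != 1:
        raise ValueError("shares must sum to an integer")
    rounded = [math.floor(s) for s in shares]
    leftover = int(total) - sum(rounded)
    # Give one extra item to the shares with the largest fractional parts.
    by_fraction = sorted(range(len(shares)),
                         key=lambda i: shares[i] - rounded[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        rounded[i] += 1
    return rounded


# Hypothetical rational solution for n = 6 items on three processors.
print(round_preserving_sum([Fraction(7, 3), Fraction(5, 3), Fraction(2)]))  # [2, 2, 2]
```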

  10. Load-balanced program execution
     [Figure: per-processor time (seconds) and amount of data (rays), for caseb, pellinore, sekhmet, seven (×2), leda (×8), merlin (×2), and dinadan.]
     - Processor loads appear well balanced.
     - Exact solution: 15 min.; heuristic: instantaneous.
     - Relative error < $6 \cdot 10^{-6}$.

  11. Divisible-load case
     - $T_{\mathrm{comm}}(i, x) = \lambda_i \cdot x$ and $T_{\mathrm{comp}}(i, x) = \mu_i \cdot x$.
     - What can be said when looking for rational solutions?
     - With
       $$D(P_1, \dots, P_p) = \left( \sum_{i=1}^{p} \frac{1}{\lambda_i + \mu_i} \cdot \prod_{j=1}^{i-1} \frac{\mu_j}{\lambda_j + \mu_j} \right)^{-1},$$
       there exists an optimal rational solution, where each processor receives a non-empty share of data and all processors end at the same date, if and only if
       $$\forall i \in [1, p - 1],\quad \lambda_i \le D(P_{i+1}, \dots, P_p).$$
     - In this case, the optimal solution is
       $$T = n \cdot D(P_1, \dots, P_p); \qquad n_i = \frac{1}{\lambda_i + \mu_i} \cdot \left( \prod_{j=1}^{i-1} \frac{\mu_j}{\lambda_j + \mu_j} \right) \cdot T.$$
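
The closed form is easy to evaluate with exact rational arithmetic. The sketch below computes $D$, checks the condition on $\lambda_i$, and derives $T$ and the shares $n_i$; the function names and the sample values of $\lambda$ and $\mu$ are illustrative assumptions.

```python
from fractions import Fraction


def d_value(lam, mu):
    """D(P_1, ..., P_p) for per-item costs lam (communication) and mu (computation)."""
    total, prod = Fraction(0), Fraction(1)
    for l, m in zip(lam, mu):
        l, m = Fraction(l), Fraction(m)
        total += prod / (l + m)  # 1/(lam_i + mu_i) times the product over j < i
        prod *= m / (l + m)
    return 1 / total


def condition_holds(lam, mu):
    """Check lam_i <= D(P_{i+1}, ..., P_p) for every i except the last."""
    return all(lam[i] <= d_value(lam[i + 1:], mu[i + 1:])
               for i in range(len(lam) - 1))


def rational_solution(n, lam, mu):
    """Return (T, [n_1, ..., n_p]) from the closed form; shares may be fractional."""
    T = n * d_value(lam, mu)
    shares, prod = [], Fraction(1)
    for l, m in zip(lam, mu):
        l, m = Fraction(l), Fraction(m)
        shares.append(prod * T / (l + m))
        prod *= m / (l + m)
    return T, shares


# Hypothetical two-processor instance.
lam, mu = [1, 2], [2, 1]
print(condition_holds(lam, mu))       # True
print(rational_solution(5, lam, mu))  # (Fraction(9, 1), [Fraction(3, 1), Fraction(2, 1)])
```

In this small instance the shares happen to be integers, so no rounding step is needed.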

  12. Processor ordering policy
     - With rational solutions, processors should be ordered by decreasing bandwidth.
     - With integer solutions and $T_{\mathrm{comm}}(i, x)$ and $T_{\mathrm{comp}}(i, x)$ linear in $x$, this ordering policy is guaranteed:
       $$T_{\mathrm{opt}} \le T' \le T_{\mathrm{opt}} + \sum_{j=1}^{p} T_{\mathrm{comm}}(j, 1) + \max_{1 \le i \le p} T_{\mathrm{comp}}(i, 1)$$
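
To illustrate the policy on a toy case, the sketch below sorts processors by increasing per-item communication cost $\lambda$ (that is, decreasing bandwidth) and compares the time per item $D$ for both orderings. It assumes linear costs and that the existence condition of the divisible-load case holds; processor names and costs are hypothetical, and the placement of the root is left aside.

```python
from fractions import Fraction


def order_by_decreasing_bandwidth(procs):
    """procs: list of (name, lam, mu); a smaller lam means a higher bandwidth."""
    return sorted(procs, key=lambda proc: proc[1])


def time_per_item(procs):
    """D(P_1, ..., P_p) for processors served in the given order."""
    total, prod = Fraction(0), Fraction(1)
    for _name, lam, mu in procs:
        lam, mu = Fraction(lam), Fraction(mu)
        total += prod / (lam + mu)
        prod *= mu / (lam + mu)
    return 1 / total


procs = [("B", 2, 2), ("A", 1, 2)]  # same speed, different bandwidths
ordered = order_by_decreasing_bandwidth(procs)
print([name for name, _, _ in ordered])    # ['A', 'B']
print(time_per_item(ordered))              # 2
print(time_per_item(list(reversed(ordered))))  # 12/5
```

Serving the better-connected processor first gives the lower time per item here, in line with the policy stated on the slide.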

  13. Ordering policies
     [Figure, two panels: per-processor time (seconds) and amount of data (rays). Left panel, "Descending bandwidth": caseb, pellinore, sekhmet, seven (×2), leda (×8), merlin (×2), dinadan. Right panel, "Ascending bandwidth": merlin (×2), leda (×8), seven (×2), sekhmet, pellinore, caseb, dinadan.]

  14. Conclusion
     - Studied static load-balancing of scatter operations in heterogeneous environments.
     - Two solutions to compute load-balanced distributions: a general and exact algorithm, and a guaranteed heuristic that is far more efficient for simple cases.
     - A processor ordering policy that is guaranteed for simple cases: processors must be ordered by decreasing bandwidth.
     - Experiments show that replacing MPI_Scatter with MPI_Scatterv and a well-chosen distribution leads to a large performance improvement at low cost.
