SLIDE 1

Load-Balancing Scatter Operations for Grid Computing

Stéphane Genaud†, Arnaud Giersch†, and Frédéric Vivien‡

{stephane.genaud,arnaud.giersch}@icps.u-strasbg.fr
frederic.vivien@ens-lyon.fr

† ICPS-LSIIT - UMR CNRS 7005, Université Louis Pasteur, Strasbourg, France
‡ LIP, ENS Lyon, France - INRIA

This research is partially supported by the French Ministry of Research through the ACI-GRID program.

Load-Balancing Scatter Operations for Grid Computing – HCW 2003 – p.1

SLIDE 2

Introduction

  • Motivating example
  • Target application
  • Scatter operation
  • Static load-balancing
  • Exact solution
  • Guaranteed heuristic
  • A case study: solving in rational numbers
  • Processor ordering policy
  • Conclusion

SLIDE 3

Application

  • Geophysical code: build a global seismic velocity model of the Earth (ray-tracing seismic waves)
  • Embarrassingly parallel application
  • Original target architecture: parallel computer
  • Data distribution: MPI_Scatter

    if (rank = ROOT)
        raydata ← read n lines from data file;
    MPI_Scatter(raydata, n/P, ..., rbuff, ..., ROOT, MPI_COMM_WORLD);
    compute_work(rbuff);

SLIDE 4

Environment

[Figure: test grid running Globus + MPICH-G2, made of two groups of processors: proc. 1–8 (SGI O.2000, PC-Linux) and proc. 9–16 (SGI O.3800)]

CPU #   Type       CPU (s/ray)   Rating   Bandwidth (s/ray)
1       PIII/933   0.009288      1
2       PIII/800   0.009365      0.99     1.12 · 10^-5
3       XP1800     0.004629      2        1.00 · 10^-5
4       XP1800     0.004885      1.90     1.70 · 10^-5
5, 6    XP2000     0.003976      2.33     8.15 · 10^-5
7, 8    R12K/300   0.016156      0.57     2.10 · 10^-5
9–16    R14K/500   0.009677      0.95     3.53 · 10^-5

SLIDE 5

Unbalanced program execution

[Figure: total time and comm. time (seconds) and amount of data (rays) for each of the 16 processors: caseb, pellinore, sekhmet, seven (×2), leda (×8), merlin (×2), dinadan]

How can the execution be load-balanced with few code rewrites?
Replace MPI_Scatter with MPI_Scatterv.
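As a minimal sketch of what this replacement involves: unlike MPI_Scatter, MPI_Scatterv takes a per-process array of send counts and an array of displacements into the send buffer. The Python helper below derives both arrays from a data distribution counted in rays. The name `scatterv_layout` and the example shares are illustrative assumptions, not taken from the slides.

```python
def scatterv_layout(shares):
    """Return (sendcounts, displs) for a distribution given in data items.

    shares[i] is the number of items sent to process i (hypothetical input;
    if one item spans several buffer elements, scale both arrays accordingly).
    """
    sendcounts = list(shares)
    displs = []
    offset = 0
    for count in sendcounts:
        displs.append(offset)  # process i reads from where process i-1 stopped
        offset += count
    return sendcounts, displs


# Example with made-up shares for four processes (10 items in total).
counts, displs = scatterv_layout([5, 3, 0, 2])
print(counts, displs)
```

The two arrays correspond to the `sendcounts` and `displs` arguments of MPI_Scatterv on the root process; the rest of the call stays the same as with MPI_Scatter.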

SLIDE 6

Scatter operation

A scatter communication followed by a computation phase

[Figure: timeline of processors P1 to P4 along a time axis marked t0 and t1; each processor is shown as idle, receiving, sending, or computing]

Root: P4 (one-port model)

Questions:
  • best data distribution?
  • best processor ordering?

SLIDE 7

Framework

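Processors: $P_1, \ldots, P_p$; data distribution: $n_1, \ldots, n_p$
Cost functions: $T_{\mathrm{comm}}(i, x)$ and $T_{\mathrm{comp}}(i, x)$

$P_i$ ends its processing at time:

$$T_i = \sum_{j=1}^{i} T_{\mathrm{comm}}(j, n_j) + T_{\mathrm{comp}}(i, n_i)$$

Overall processing time:

$$T = \max_{1 \le i \le p} T_i = \max_{1 \le i \le p} \left( \sum_{j=1}^{i} T_{\mathrm{comm}}(j, n_j) + T_{\mathrm{comp}}(i, n_i) \right)$$

Which data distribution $n_1, \ldots, n_p$ minimizes $T$?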
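To make the cost model concrete, the following sketch evaluates the finish time of each processor and the overall time T for a given distribution. The cost functions are passed as Python callables. The function names and the two-processor numbers are hypothetical values chosen for illustration only.

```python
from itertools import accumulate


def finish_times(shares, t_comm, t_comp):
    """T_i = sum_{j<=i} Tcomm(j, n_j) + Tcomp(i, n_i), processors taken in order."""
    # Time at which processor i has finished receiving its share
    # (the root sends to one processor at a time).
    received_at = accumulate(t_comm[i](n) for i, n in enumerate(shares))
    return [done + t_comp[i](n)
            for i, (done, n) in enumerate(zip(received_at, shares))]


def makespan(shares, t_comm, t_comp):
    """Overall processing time T = max_i T_i."""
    return max(finish_times(shares, t_comm, t_comp))


# Hypothetical costs: P1 receives at 1 s/item, P2 at 2 s/item; both compute at 3 s/item.
t_comm = [lambda x: 1 * x, lambda x: 2 * x]
t_comp = [lambda x: 3 * x, lambda x: 3 * x]
print(makespan([4, 4], t_comm, t_comp))  # equal split
print(makespan([5, 3], t_comm, t_comp))  # uneven split
```

With these made-up costs the equal split is not the best one, which is the situation the question above refers to.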

SLIDE 8

Exact solution

$$T_{\mathrm{opt}}(n, P_i, \ldots, P_p) = \min_{0 \le n_i \le n} \Big[ T_{\mathrm{comm}}(i, n_i) + \max\big( T_{\mathrm{comp}}(i, n_i),\ T_{\mathrm{opt}}(n - n_i, P_{i+1}, \ldots, P_p) \big) \Big]$$

Dynamic programming algorithm:

    for i ← p − 1 downto 1
        knowing optimal solutions for 0 to n data items onto P_{i+1}, ..., P_p,
        looping from 0 to n data items assigned onto P_i,
        compute optimal solutions for 0 to n data items onto P_i, ..., P_p

Algorithmic complexity: $O(p \cdot n^2)$

Assumptions: Tcomm(i, x) and Tcomp(i, x) are non-negative, and null whenever x = 0
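The recurrence and loop above can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' code: the function name, the 0-based processor indices, and the example costs are assumptions.

```python
def optimal_distribution(n, t_comm, t_comp):
    """Exact integer distribution by dynamic programming, O(p * n^2) cost evaluations.

    t_comm[i](x) and t_comp[i](x) are the cost functions of processor i
    (0-based, in sending order); both are assumed non-negative and 0 at x = 0.
    Returns (optimal time, list of shares).
    """
    p = len(t_comm)
    # best[d]: optimal time for d items on processors i..p-1.
    # Base case: the last processor must take all remaining items.
    best = [t_comm[p - 1](d) + t_comp[p - 1](d) for d in range(n + 1)]
    choice = [list(range(n + 1))]  # share chosen for the last processor
    for i in range(p - 2, -1, -1):
        new_best, new_choice = [], []
        for d in range(n + 1):
            # Try every share k for processor i; the rest goes to i+1..p-1.
            time, k = min(
                (t_comm[i](k) + max(t_comp[i](k), best[d - k]), k)
                for k in range(d + 1)
            )
            new_best.append(time)
            new_choice.append(k)
        best = new_best
        choice.insert(0, new_choice)
    # Walk the recorded choices forward to rebuild the distribution.
    shares, remaining = [], n
    for i in range(p):
        k = choice[i][remaining]
        shares.append(k)
        remaining -= k
    return best[n], shares


# Hypothetical costs: comm 1 and 2 s/item, comp 3 s/item on both processors.
t_comm = [lambda x: 1 * x, lambda x: 2 * x]
t_comp = [lambda x: 3 * x, lambda x: 3 * x]
print(optimal_distribution(8, t_comm, t_comp))
```

The table `best` is recomputed for each processor from the table of the processors that follow it, which is where the p · n² cost comes from.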

SLIDE 9

Guaranteed heuristic

Assumptions: Tcomm(i, x) and Tcomp(i, x) are affine in x, increasing, and non-negative

Linear program:

$$\text{Minimize } T \text{ such that } \begin{cases} \forall i \in [1, p],\ n_i \ge 0, \\ \sum_{i=1}^{p} n_i = n, \\ \forall i \in [1, p],\ T \ge \sum_{j=1}^{i} T_{\mathrm{comm}}(j, n_j) + T_{\mathrm{comp}}(i, n_i). \end{cases}$$

Rational solution rounded to the nearest integer

Solution ($T'$) is guaranteed:

$$T_{\mathrm{opt}} \le T' \le T_{\mathrm{opt}} + \sum_{j=1}^{p} T_{\mathrm{comm}}(j, 1) + \max_{1 \le i \le p} T_{\mathrm{comp}}(i, 1).$$
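The slide does not detail the rounding step. Rounding each share independently can break the constraint that the shares sum to n, so the sketch below shows one possible sum-preserving scheme (largest remainders first). It is an assumption made for illustration, not necessarily the rounding used by the authors, and the function name is hypothetical.

```python
import math
from fractions import Fraction


def round_preserving_sum(shares, n):
    """Round rational shares to integers that still sum to n.

    Assumed scheme (not from the slides): take the floor of every share,
    then hand the leftover items to the shares with the largest fractional parts.
    Each rounded share differs from the rational one by less than 1.
    """
    rounded = [math.floor(x) for x in shares]
    leftover = n - sum(rounded)
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - rounded[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        rounded[i] += 1
    return rounded


# Made-up rational solution for n = 8 items on three processors.
print(round_preserving_sum([Fraction(10, 3), Fraction(8, 3), Fraction(2)], 8))
```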

SLIDE 10

Load-balanced program execution

[Figure: time (seconds) and data (rays) for each of the 16 processors: caseb, pellinore, sekhmet, seven (×2), leda (×8), merlin (×2), dinadan]

Processor loads appear well balanced.

Exact solution: 15 min.; heuristic: instantaneous

Relative error < 6 · 10^-6

SLIDE 11

Divisible-load case

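$T_{\mathrm{comm}}(i, x) = \lambda_i \cdot x$ and $T_{\mathrm{comp}}(i, x) = \mu_i \cdot x$

What can be said when looking for rational solutions? With

$$D(P_1, \ldots, P_p) = \left( \sum_{i=1}^{p} \frac{1}{\lambda_i + \mu_i} \cdot \prod_{j=1}^{i-1} \frac{\mu_j}{\lambda_j + \mu_j} \right)^{-1},$$

there exists an optimal rational solution, where each processor receives a non-empty share of data and all processors end at the same date, if and only if

$$\forall i \in [1, p - 1],\quad \lambda_i \le D(P_{i+1}, \ldots, P_p).$$

In this case, the optimal solution is

$$T = n \cdot D(P_1, \ldots, P_p); \qquad n_i = \frac{1}{\lambda_i + \mu_i} \cdot \left( \prod_{j=1}^{i-1} \frac{\mu_j}{\lambda_j + \mu_j} \right) \cdot T.$$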
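As a worked check of these formulas, the sketch below evaluates D, the condition, and the shares with exact rational arithmetic. The function names and the two-processor values (λ = 1, 2 and µ = 3, 3) are hypothetical, chosen so that the result is easy to verify by hand.

```python
from fractions import Fraction


def d_factor(lam, mu):
    """D(P_1..P_p) for per-item comm costs lam[i] and comp costs mu[i]."""
    total, prefix = Fraction(0), Fraction(1)
    for l, m in zip(lam, mu):
        l, m = Fraction(l), Fraction(m)
        total += prefix / (l + m)   # 1/(lam_i+mu_i) * prod_{j<i} mu_j/(lam_j+mu_j)
        prefix *= m / (l + m)
    return 1 / total


def condition_holds(lam, mu):
    """Check lam_i <= D(P_{i+1}..P_p) for every i in [1, p-1]."""
    return all(lam[i] <= d_factor(lam[i + 1:], mu[i + 1:])
               for i in range(len(lam) - 1))


def divisible_load(n, lam, mu):
    """Return (T, shares) with T = n*D and all processors ending at time T."""
    T = n * d_factor(lam, mu)
    shares, prefix = [], Fraction(1)
    for l, m in zip(lam, mu):
        l, m = Fraction(l), Fraction(m)
        shares.append(prefix / (l + m) * T)
        prefix *= m / (l + m)
    return T, shares


# Made-up example: 8 items, lam = (1, 2), mu = (3, 3).
print(condition_holds([1, 2], [3, 3]), divisible_load(8, [1, 2], [3, 3]))
```

In this example P1 finishes at (1 + 3) · 5 = 20 and P2 at 1 · 5 + (2 + 3) · 3 = 20, so both end at the same date T = 8 · D.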

SLIDE 12

Processor ordering policy

With rational solutions, processors should be ordered by decreasing bandwidth.

With integer solutions and Tcomm(i, x) and Tcomp(i, x) linear in x, this ordering policy is guaranteed:

$$T_{\mathrm{opt}} \le T' \le T_{\mathrm{opt}} + \sum_{j=1}^{p} T_{\mathrm{comm}}(j, 1) + \max_{1 \le i \le p} T_{\mathrm{comp}}(i, 1)$$
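A small numerical check of the rational case, under stated assumptions: costs are linear, Tcomm(i, x) = λi · x and Tcomp(i, x) = µi · x, and the time per data item when all processors finish together is D = (Σ_i 1/(λi+µi) · Π_{j<i} µj/(λj+µj))^-1. A higher bandwidth means a smaller λ. The worker values below are made up, the function name is hypothetical, and the root processor is left out of the sketch.

```python
from fractions import Fraction
from itertools import permutations


def per_item_time(workers):
    """D for workers given as (lam, mu) pairs listed in sending order."""
    total, prefix = Fraction(0), Fraction(1)
    for lam, mu in workers:
        lam, mu = Fraction(lam), Fraction(mu)
        total += prefix / (lam + mu)
        prefix *= mu / (lam + mu)
    return 1 / total


# Hypothetical workers: (lam, mu) = (comm cost per item, comp cost per item).
workers = [(3, 2), (1, 5), (2, 4)]

# Decreasing bandwidth = increasing lam.
by_bandwidth = sorted(workers, key=lambda w: w[0])

# Compare against every possible ordering of the three workers.
for order in permutations(workers):
    print(order, per_item_time(order))
print("decreasing bandwidth:", by_bandwidth, per_item_time(by_bandwidth))
```

On this example the decreasing-bandwidth order gives the smallest D among all six orderings, and the reverse order gives a strictly larger one.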

SLIDE 13

Ordering policies

[Figure, two panels, each plotting time (seconds) and data (rays) per processor:
  • Descending bandwidth: caseb, pellinore, sekhmet, seven (×2), leda (×8), merlin (×2), dinadan
  • Ascending bandwidth: merlin (×2), leda (×8), seven (×2), sekhmet, pellinore, caseb, dinadan]

SLIDE 14

Conclusion

  • Studied static load-balancing of scatter operations in heterogeneous environments.
  • Two solutions to compute load-balanced distributions: a general and exact algorithm, and a guaranteed heuristic that is far more efficient for simple cases.
  • A processor ordering policy that is guaranteed for simple cases: processors must be ordered by decreasing bandwidth.
  • Experiments showing that replacing MPI_Scatter by MPI_Scatterv with clever distributions leads to great performance improvement at low cost.
