SLIDE 1

Introduction Framework Complexity Heuristics Experiments Conclusion

Mapping pipeline skeletons onto heterogeneous platforms

Anne Benoit and Yves Robert

GRAAL team, LIP, École Normale Supérieure de Lyon

May 2007

Anne.Benoit@ens-lyon.fr May 2007 Mapping pipeline skeletons ICCS’07 1/ 26

SLIDE 2

Introduction and motivation

  • Mapping applications onto parallel platforms: a difficult challenge
  • Heterogeneous clusters, fully heterogeneous platforms: even more difficult!
  • Structured programming approach
      • Easier to program (helps avoid deadlocks and process starvation)
      • Range of well-known paradigms (pipeline, farm)
      • Algorithmic skeleton: help for mapping
  • Mapping pipeline skeletons onto heterogeneous platforms

SLIDE 9

Rule of the game

  • Map each pipeline stage on a single processor
  • Goal: minimize execution time
  • Several mapping strategies: One-to-one Mapping, Interval Mapping, General Mapping

[Figure: the pipeline application, a chain of stages $S_1, S_2, \dots, S_k, \dots, S_n$; successive overlays are captioned One-to-one Mapping, Interval Mapping and General Mapping]
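
The three strategy names can be made concrete with a small classifier. This is only a sketch: it assumes the usual reading of the terms (One-to-one: every stage on a distinct processor; Interval: each processor receives a block of consecutive stages; General: no restriction). The function name `classify_mapping` and the list encoding `alloc` are illustrative choices, not taken from the slides.

```python
def classify_mapping(alloc):
    """Return the most restrictive strategy that a mapping satisfies.

    alloc[k] is the (hypothetical) index of the processor running stage k.
    """
    if len(set(alloc)) == len(alloc):
        return "one-to-one"        # every stage has its own processor
    seen = set()
    previous = None
    for proc in alloc:
        if proc != previous:
            if proc in seen:
                return "general"   # a processor is reused after a gap
            seen.add(proc)
            previous = proc
    return "interval"              # each processor owns consecutive stages


print(classify_mapping([0, 1, 2, 3]))
print(classify_mapping([0, 0, 1, 1]))
print(classify_mapping([0, 1, 0, 1]))
```

Under these assumptions every one-to-one mapping is also an interval mapping, and every interval mapping is also a general mapping; the function reports the narrowest class.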

SLIDE 14

Major contributions

  • Theory
      • Formal approach to the problem
      • Problem complexity
      • Integer linear program for exact resolution
  • Practice
      • Heuristics for Interval Mapping on clusters
      • Experiments to compare heuristics and evaluate their absolute performance

SLIDE 16

Outline

  1. Framework
  2. Complexity results
  3. Heuristics
  4. Experiments
  5. Conclusion

SLIDE 17

The application

[Figure: the pipeline $S_1 \to S_2 \to \dots \to S_k \to \dots \to S_n$, with computation costs $w_1, w_2, \dots, w_k, \dots, w_n$ and data sizes $\delta_0, \delta_1, \dots, \delta_{k-1}, \delta_k, \dots, \delta_n$ on the edges]

  • $n$ stages $S_k$, $1 \le k \le n$
  • Stage $S_k$:
      • receives input of size $\delta_{k-1}$ from $S_{k-1}$
      • performs $w_k$ computations
      • outputs data of size $\delta_k$ to $S_{k+1}$

SLIDE 18

The platform

[Figure: processors $P_{in}$, $P_u$, $P_v$ and $P_{out}$ with speeds $s_{in}$, $s_u$, $s_v$, $s_{out}$, connected by links of bandwidths $b_{in,u}$, $b_{u,v}$, $b_{v,out}$]

  • $p$ processors $P_u$, $1 \le u \le p$, fully interconnected
  • $s_u$: speed of processor $P_u$
  • bidirectional link $\mathrm{link}_{u,v}: P_u \to P_v$, with bandwidth $b_{u,v}$
  • one-port model: each processor can either send, receive or compute at any time-step

SLIDE 19

Different platforms

  • Fully Homogeneous: identical processors ($s_u = s$) and links ($b_{u,v} = b$); typical parallel machines
  • Communication Homogeneous: different-speed processors ($s_u \ne s_v$), identical links ($b_{u,v} = b$); networks of workstations, clusters
  • Fully Heterogeneous: fully heterogeneous architectures, $s_u \ne s_v$ and $b_{u,v} \ne b_{u',v'}$; hierarchical platforms, grids
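
As a rough illustration, the three platform classes can be told apart by checking whether all speeds and all bandwidths coincide. The function name `classify_platform`, the list `speeds` and the dictionary `bandwidths` are hypothetical names chosen for this sketch.

```python
def classify_platform(speeds, bandwidths):
    """Classify a platform from its speeds s_u and link bandwidths b_{u,v}.

    speeds     : list of processor speeds
    bandwidths : dict mapping a link (u, v) to its bandwidth
    """
    same_speeds = len(set(speeds)) <= 1
    same_links = len(set(bandwidths.values())) <= 1
    if same_speeds and same_links:
        return "Fully Homogeneous"
    if same_links:
        return "Communication Homogeneous"
    # Assumption of this sketch: any platform with differing links is
    # reported as Fully Heterogeneous, even if all speeds happen to be equal.
    return "Fully Heterogeneous"


print(classify_platform([2, 2], {(0, 1): 10}))
print(classify_platform([1, 4], {(0, 1): 10}))
print(classify_platform([1, 4, 8], {(0, 1): 10, (1, 2): 3, (0, 2): 10}))
```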

SLIDE 20

Mapping problem: Interval Mapping

  • Several consecutive stages onto the same processor
  • Increase computational load, reduce communications
  • Partition of $[1..n]$ into $m$ intervals $I_j = [d_j, e_j]$ (with $d_j \le e_j$ for $1 \le j \le m$, $d_1 = 1$, $d_{j+1} = e_j + 1$ for $1 \le j \le m - 1$ and $e_m = n$)
  • Interval $I_j$ mapped onto processor $P_{\mathrm{alloc}(j)}$
  • Minimize the period:

$$T_{\mathrm{period}} = \max_{1 \le j \le m} \left\{ \frac{\delta_{d_j - 1}}{b_{\mathrm{alloc}(j-1),\mathrm{alloc}(j)}} + \frac{\sum_{i=d_j}^{e_j} w_i}{s_{\mathrm{alloc}(j)}} + \frac{\delta_{e_j}}{b_{\mathrm{alloc}(j),\mathrm{alloc}(j+1)}} \right\}$$
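
The period of a given interval mapping can be evaluated directly from this definition. The sketch below is one way to do it; the function name `period`, the argument layout, and the use of the labels "in" and "out" for the input and output processors of the platform figure are assumptions made for the example.

```python
def period(w, delta, intervals, alloc, speed, bandwidth):
    """Period of an interval mapping.

    w[k-1]          : computation cost w_k of stage S_k (k = 1..n)
    delta[k]        : data size delta_k (k = 0..n)
    intervals       : list of (d_j, e_j), 1-based and inclusive
    alloc[j]        : processor of the j-th interval (0-based j)
    speed[u]        : speed s_u of processor u
    bandwidth(u, v) : bandwidth b_{u,v}; "in" and "out" stand for the
                      input and output processors (hypothetical labels)
    """
    m = len(intervals)
    worst = 0.0
    for j, (d, e) in enumerate(intervals):
        prev_proc = alloc[j - 1] if j > 0 else "in"
        next_proc = alloc[j + 1] if j < m - 1 else "out"
        cost = (delta[d - 1] / bandwidth(prev_proc, alloc[j])   # receive input
                + sum(w[d - 1:e]) / speed[alloc[j]]             # compute
                + delta[e] / bandwidth(alloc[j], next_proc))    # send output
        worst = max(worst, cost)
    return worst


# Three stages, two intervals [1, 2] and [3, 3], all links of bandwidth 10.
print(period(w=[4, 2, 6], delta=[10, 10, 10, 10],
             intervals=[(1, 2), (3, 3)], alloc=[0, 1],
             speed={0: 2, 1: 3}, bandwidth=lambda u, v: 10))  # 5.0
```

In the example the first interval costs 1 + 6/2 + 1 = 5 and the second 1 + 6/3 + 1 = 4, so the period is 5.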

SLIDE 28

Complexity results

                        Fully Hom.     Comm. Hom.
  One-to-one Mapping    polynomial     polynomial
  Interval Mapping      polynomial     NP-complete
  General Mapping       same complexity as Interval

  • Binary search polynomial algorithm for One-to-one Mapping
  • Dynamic programming algorithm for Interval Mapping on Fully Hom. platforms (NP-hard otherwise)
  • General Mapping: same complexity as Interval Mapping
  • All problem instances NP-complete on Fully Heterogeneous platforms
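
To see why Interval Mapping becomes tractable when the platform is Fully Homogeneous, note that the cost of an interval then no longer depends on which processor runs it. The sketch below is one possible dynamic program under that assumption; it is not necessarily the formulation used by the authors, and the names `min_period_fully_hom`, `s` and `b` (common speed and bandwidth) are chosen for the example.

```python
import math
from itertools import accumulate


def min_period_fully_hom(w, delta, p, s, b):
    """Minimal period of an interval mapping on p identical processors."""
    n = len(w)
    prefix = [0] + list(accumulate(w))   # prefix[k] = w_1 + ... + w_k

    def cost(d, e):
        # Period of interval [d, e] (1-based), identical on every processor.
        return delta[d - 1] / b + (prefix[e] - prefix[d - 1]) / s + delta[e] / b

    # best[q][e]: minimal period for stages 1..e using at most q processors.
    best = [[math.inf] * (n + 1) for _ in range(p + 1)]
    for q in range(p + 1):
        best[q][0] = 0.0
    for q in range(1, p + 1):
        for e in range(1, n + 1):
            # The last interval is [d, e]; stages 1..d-1 use at most q-1 processors.
            best[q][e] = min(max(best[q - 1][d - 1], cost(d, e))
                             for d in range(1, e + 1))
    return best[p][n]


print(min_period_fully_hom([4, 2, 6], [10, 10, 10, 10], p=2, s=2, b=10))  # 5.0
```

With two processors the best cut is [1, 2] and [3, 3], each of cost 5; with a single processor the whole pipeline costs 1 + 12/2 + 1 = 8.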

SLIDE 29

One-to-one/Comm. Hom.: binary search algorithm

  • Work with the fastest $n$ processors, numbered $P_1$ to $P_n$, where $s_1 \le s_2 \le \dots \le s_n$
  • Mark all stages $S_1$ to $S_n$ as free
  • For $u = 1$ to $n$:
      • Pick up any free stage $S_k$ such that $\delta_{k-1}/b + w_k/s_u + \delta_k/b \le T_{\mathrm{period}}$
      • Assign $S_k$ to $P_u$, and mark $S_k$ as already assigned
      • If no stage found, return "failure"
  • Proof: exchange argument
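
The greedy test above can be wrapped in a binary search over candidate periods. The sketch below assumes that the search runs over the finite set of values $\delta_{k-1}/b + w_k/s_u + \delta_k/b$, which is one natural choice but is not spelled out on the slide; the names `try_period` and `one_to_one` are illustrative.

```python
def try_period(w, delta, speeds, b, target):
    """Greedy feasibility test; returns {stage: processor} or None."""
    n = len(w)
    # The n fastest processors, visited from slowest to fastest.
    procs = sorted(range(len(speeds)), key=lambda u: speeds[u])[-n:]
    free = set(range(1, n + 1))
    assignment = {}
    for u in procs:
        fit = next((k for k in sorted(free)
                    if delta[k - 1] / b + w[k - 1] / speeds[u] + delta[k] / b <= target),
                   None)
        if fit is None:
            return None            # no free stage fits on this processor
        assignment[fit] = u
        free.remove(fit)
    return assignment


def one_to_one(w, delta, speeds, b):
    """Smallest feasible candidate period and a matching one-to-one mapping."""
    n = len(w)
    if len(speeds) < n:
        raise ValueError("one-to-one mapping needs at least n processors")
    candidates = sorted({delta[k - 1] / b + w[k - 1] / s + delta[k] / b
                         for k in range(1, n + 1) for s in speeds})
    lo, hi, best = 0, len(candidates) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        mapping = try_period(w, delta, speeds, b, candidates[mid])
        if mapping is None:
            lo = mid + 1
        else:
            best = (candidates[mid], mapping)
            hi = mid - 1
    return best


best_period, mapping = one_to_one([4, 2, 6], [10, 10, 10, 10], [1, 2, 3, 6], b=10)
print(best_period, mapping)
```

In this example the three fastest processors have speeds 2, 3 and 6; the search settles on a period of 2 + 4/3, with stage 2 on the speed-2 processor, stage 1 on the speed-3 processor and stage 3 on the speed-6 processor.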

SLIDE 32

Greedy heuristics

Target clusters: Comm. Hom. platforms and Interval Mapping

  • H1a-GR: random, fixed intervals
  • H1b-GRIL: random interval length
  • H2-GSW: biggest $w$. Place the interval with the most computations on the fastest processor
  • H3-GSD: biggest $\delta_{in} + \delta_{out}$. Intervals are sorted by communications ($\delta_{in} + \delta_{out}$), where $in$ is the first stage of the interval and $out - 1$ the last one
  • H4-GP: biggest period on fastest processor. Balancing computation and communication: processors are sorted by decreasing speed $s_u$; for the current processor $u$, choose the interval with the biggest period $(\delta_{in} + \delta_{out})/b + \sum_{i \in \mathrm{Interval}} w_i/s_u$
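
A minimal sketch of the H4-GP selection rule follows. It takes the list of intervals as an input, since the slide does not say how the intervals are cut; it also assumes that $\delta_{in}$ and $\delta_{out}$ are the data entering and leaving an interval $[d, e]$, that is $\delta_{d-1}$ and $\delta_e$ in the notation of the application slide. The name `greedy_period` is hypothetical.

```python
def greedy_period(w, delta, intervals, speeds, b):
    """Assign given intervals (d, e), 1-based inclusive, in the spirit of H4-GP."""
    if len(intervals) > len(speeds):
        raise ValueError("need at least one processor per interval")

    def cost(interval, u):
        d, e = interval
        return (delta[d - 1] + delta[e]) / b + sum(w[d - 1:e]) / speeds[u]

    remaining = list(intervals)
    alloc = {}
    # Visit processors from fastest to slowest.
    for u in sorted(range(len(speeds)), key=lambda u: -speeds[u]):
        if not remaining:
            break
        # Give the current processor the interval that is most costly on it.
        chosen = max(remaining, key=lambda interval: cost(interval, u))
        alloc[chosen] = u
        remaining.remove(chosen)
    achieved = max(cost(interval, u) for interval, u in alloc.items())
    return alloc, achieved


alloc, achieved = greedy_period([4, 2, 9], [10, 10, 10, 10],
                                [(1, 2), (3, 3)], speeds=[2, 3], b=10)
print(alloc, achieved)
```

Here the heavier interval [3, 3] goes to the faster processor (speed 3), the interval [1, 2] to the slower one, and both end up with a period of 5.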

SLIDE 33

Sophisticated heuristics

  • H5-BS121: binary search for One-to-one Mapping. Optimal algorithm for One-to-one Mapping; when $p < n$, the application is cut into fixed intervals of length $L$.
  • H6-SPL: splitting intervals. Processors are sorted by decreasing speed and all stages are given to the first processor. At each step, select the used processor $j$ with the largest period and split its interval (giving a fraction of the stages to $j'$) so as to minimize $\max(\mathrm{period}(j), \mathrm{period}(j'))$; split only if the maximum period is improved.
  • H7a-BSL and H7b-BSC: binary search (longest/closest). Binary search on the period $P$: start with stage $s = 1$ and build intervals $(s, s')$ fitting on processors. For each $u$ and each $s' \ge s$, compute $\mathrm{period}(s..s', u)$ and check whether it is smaller than $P$. H7a maximizes $s'$; H7b chooses the closest period.
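
The description of H7a-BSL leaves several details open, so the following is only a sketch under stated assumptions: a Comm. Hom. platform with common bandwidth `b`, a search interval that starts at the period obtained by placing every stage on the fastest processor, a fixed number of bisection rounds, and ties between processors broken by lowest index. The names `build_intervals` and `binary_search_longest` are hypothetical.

```python
def build_intervals(w, delta, speeds, b, target):
    """Greedily cut the pipeline into intervals of period <= target.

    Returns a list of (first_stage, last_stage, processor) or None.
    """
    n = len(w)
    free = set(range(len(speeds)))
    mapping, s = [], 1
    while s <= n:
        best = None                          # (last_stage, processor)
        for u in sorted(free):
            load = 0.0
            for s_end in range(s, n + 1):
                load += w[s_end - 1]
                cost = delta[s - 1] / b + load / speeds[u] + delta[s_end] / b
                # H7a rule: keep the pair reaching the furthest stage.
                if cost <= target and (best is None or s_end > best[0]):
                    best = (s_end, u)
        if best is None:
            return None                      # nothing fits, or no processor left
        mapping.append((s, best[0], best[1]))
        free.remove(best[1])
        s = best[0] + 1
    return mapping


def binary_search_longest(w, delta, speeds, b, rounds=40):
    """Bisection on the target period, keeping the last feasible mapping."""
    lo = 0.0
    hi = delta[0] / b + sum(w) / max(speeds) + delta[-1] / b
    best = build_intervals(w, delta, speeds, b, hi)
    for _ in range(rounds):
        mid = (lo + hi) / 2
        mapping = build_intervals(w, delta, speeds, b, mid)
        if mapping is None:
            lo = mid
        else:
            best, hi = mapping, mid
    return best


print(binary_search_longest([4, 2, 9], [10, 10, 10, 10], speeds=[2, 3], b=10))
```

On this small instance the search ends with stages 1 and 2 on the speed-2 processor and stage 3 on the speed-3 processor, for a period of 5. Because the construction is a heuristic, feasibility is not guaranteed to be monotone in the target period, so the bisection may miss better mappings on other instances.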

SLIDE 35

Plan of experiments

  • Assess performance of polynomial heuristics
  • Random applications, $n = 1$ to $50$ stages
  • Random platforms, $p = 10$ and $p = 100$ processors
  • $b = 10$ (comm. hom.), processor speed between 1 and 20
  • Relevant parameters: ratios $\delta/b$ and $w/s$
  • Average over 100 similar random application/platform pairs
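
A random application/platform pair in the spirit of this plan could be drawn as follows. The slides do not state the probability distributions, so uniform draws and integer speeds are assumptions of this sketch, as are the name `random_instance` and its default ranges (which mirror the first experiment: constant data size 10, computation values between 1 and 20).

```python
import random


def random_instance(n, p, rng, w_range=(1, 20), delta_range=(10, 10), b=10):
    """Draw one application/platform pair (distributions are assumed)."""
    w = [rng.uniform(*w_range) for _ in range(n)]              # stage computations
    delta = [rng.uniform(*delta_range) for _ in range(n + 1)]  # data sizes delta_0..delta_n
    speeds = [rng.randint(1, 20) for _ in range(p)]            # processor speeds in 1..20
    return w, delta, speeds, b


rng = random.Random(2007)      # fixed seed, so that a run can be repeated
w, delta, speeds, b = random_instance(n=5, p=10, rng=rng)
print(len(w), len(delta), len(speeds), b)  # 5 6 10 10
```

Averaging a heuristic over 100 such pairs, as the plan describes, then amounts to calling `random_instance` 100 times with the same `n` and `p`.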

SLIDE 37

Experiment 1 - balanced comm/comp, hom comm

$\delta_i = 10$, computation time between 1 and 20, 10 processors

[Plot: maximum period versus number of stages ($p = 10$), one curve per heuristic: H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest]

SLIDE 38

Experiment 1 - balanced comm/comp, hom comm

$\delta_i = 10$, computation time between 1 and 20, 100 processors

[Plot: maximum period versus number of stages ($p = 100$), one curve per heuristic: H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest]

SLIDE 39

Experiment 2 - balanced comm/comp, het comm

Communication time between 1 and 100, computation time between 1 and 20

[Plot: maximum period versus number of stages ($p = 10$), one curve per heuristic: H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest]

SLIDE 40

Experiment 2 - balanced comm/comp, het comm

Communication time between 1 and 100, computation time between 1 and 20

[Plot: maximum period versus number of stages ($p = 100$), one curve per heuristic: H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest]

SLIDE 41

Experiment 3 - large computations

Communication time between 1 and 20, computation time between 10 and 1000

[Plot: maximum period versus number of stages ($p = 10$), one curve per heuristic: H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest]

SLIDE 42

Experiment 3 - large computations

Communication time between 1 and 20, computation time between 10 and 1000

[Plot: maximum period versus number of stages ($p = 100$), one curve per heuristic: H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest]

SLIDE 43

Experiment 4 - small computations

Communication time between 1 and 20, computation time between 0.01 and 10

[Plot: maximum period versus number of stages ($p = 10$), one curve per heuristic: H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest]

SLIDE 44

Experiment 4 - small computations

Communication time between 1 and 20, computation time between 0.01 and 10

[Plot: maximum period versus number of stages ($p = 100$), one curve per heuristic: H1a-GreedyRandom, H1b-GreedyRandomIntervalLength, H2-GreedySumW, H3-GreedySumDinDout, H4-GreedyPeriod, H5-BinarySearch1to1, H6-SPLitting, H7a-BinarySearchLongest, H7b-BinarySearchClosest]

SLIDE 45

Summary of experiments

  • The heuristics are much more efficient than random mappings
  • Three dominant heuristics for different cases:
      • Insignificant communications (hom. or small) and many processors: H5-BS121 (One-to-one Mapping)
      • Insignificant communications (hom. or small) and few processors: H7b-BSC (binary search: clever choice where to split)
      • Important communications (het. or big): H6-SPL (splitting choice relevant for any number of processors)

SLIDE 48

Related work

  • Scheduling task graphs on heterogeneous platforms: acyclic task graphs scheduled on different-speed processors [Topcuoglu et al.]. Communication contention: 1-port model [Beaumont et al.].
  • Mapping pipelined computations onto special-purpose architectures: FPGA arrays [Fabiani et al.]. Fault-tolerance for embedded systems [Zhu et al.].
  • Mapping pipelined computations onto clusters and grids: DAG [Taura et al.], DataCutter [Saltz et al.].
  • Mapping skeletons onto clusters and grids: use of stochastic process algebra [Benoit et al.].

SLIDE 49

Conclusion

  • Theoretical side: complexity for different mapping strategies and different platform types
  • Practical side
      • Optimal polynomial algorithm for One-to-one Mapping
      • Design of several heuristics for Interval Mapping on Communication Homogeneous platforms
      • Comparison of their performance
      • Linear program to assess the absolute performance of the heuristics, which turns out to be quite good

SLIDE 50

Future work

  • Short term
      • Heuristics for Fully Heterogeneous platforms
      • Extension to DAG-trees (a DAG which is a tree when un-oriented)
      • Extension to stage replication
      • LP with replication and DAG-trees
  • Longer term
      • Real experiments on heterogeneous clusters, using an already-implemented skeleton library and MPI
      • Comparison of effective performance against theoretical performance
