SLIDE 1

Framework Complexity Practical Conclusion

Mapping pipelined applications with replication to increase throughput and reliability

Anne Benoit¹,², Loris Marchal², Yves Robert¹,², Oliver Sinnen³

  • 1. Institut Universitaire de France
  • 2. LIP, École Normale Supérieure de Lyon, France
  • 3. University of Auckland, New Zealand

SBAC-PAD, Petrópolis, Rio de Janeiro, Brazil, October 27-30, 2010

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 1/ 28

SLIDE 2

Motivations

Mapping pipelined applications onto parallel platforms: practical applications, but a difficult challenge
Both performance (throughput) and reliability objectives: even more difficult!
Use of replication: mapping an application stage onto more than one processor

  • redundant computations: increase reliability
  • round-robin computations (over consecutive data sets): increase throughput
  • bi-criteria problem: need to trade off between the two kinds of replication

SLIDE 8

Main contributions

Theoretical side: assess problem hardness under different mapping rules and platform characteristics
Practical side: heuristics for the most general (NP-complete) case, an exact algorithm based on A*, and experiments to assess heuristic performance

SLIDE 10

Outline of the talk

  1. Framework: Application, Platform, Mapping, Objective
  2. Complexity results: Mono-criterion, Bi-criteria, Approximation results
  3. Practical side: Heuristics, Optimal algorithm using A*, Evaluation results
  4. Conclusion
SLIDE 11

Applicative framework

[Figure: linear pipeline S1 → S2 → … → Si → … → Sn, each stage Si labeled with its computation weight wi]

Pipeline of n stages S1, . . . , Sn
Stage Si performs an amount wi of computations
Communication costs are negligible in comparison with computation costs
SLIDE 12

Target platform

Platform with p processors P1, . . . , Pp, fully interconnected as a (virtual) clique
For 1 ≤ u ≤ p, processor Pu has speed su and failure probability 0 < fu < 1
Failure probability: independent of the duration of the application, which is meant to run for a long time (cycle-stealing scenario)
SpeedHom platform: identical speeds, su = s for 1 ≤ u ≤ p (as opposed to SpeedHet)
FailureHom platform: identical failure probabilities (as opposed to FailureHet)

SLIDE 14

Mapping problem

Interval mapping: consecutive stages are mapped together: partition of [1..n] into m ≤ p intervals Ij
Each interval Ij is mapped onto a set of processors Aj, organized into ℓj teams

  • processors within a team perform redundant computations (replication for reliability)
  • different teams assigned to the same interval execute distinct data sets in a round-robin fashion (replication for performance)

A processor cannot participate in two different teams
ℓ = Σ_{j=1}^{m} ℓj is the total number of teams

SLIDE 20

Example of mapping

[Figure: 5 stages S1–S5 partitioned into 3 intervals I1, I2, I3; 11 processors P1–P11 grouped into 6 teams T1–T6 assigned to the intervals]

n = 5 stages divided into m = 3 intervals
p = 11 processors organized into ℓ = 6 teams: ℓ1 = 3, ℓ2 = 1, ℓ3 = 2

SLIDE 22

Objective functions

Period of the application:

  P = max_{1 ≤ j ≤ m} (Σ_{i ∈ Ij} wi) / (ℓj × min_{Pu ∈ Aj} su)

  • Round-robin distribution: each team computes one data set out of every ℓj, and computation is slowed down by the slowest processor assigned to the interval

Failure probability:

  F = 1 − Π_{1 ≤ k ≤ ℓ} (1 − Π_{Pu ∈ Tk} fu)

  • Computation is successful if at least one processor per team survives
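The two objective functions above translate directly into code. A minimal Python sketch of both formulas (the function names and the representation of intervals and teams as plain lists and dicts are illustrative assumptions, not the authors' implementation):

```python
from math import prod

def period(intervals, teams, w, s):
    """P = max over intervals j of (sum of w_i for i in I_j) divided by
    (number of teams l_j times the speed of the slowest processor in A_j)."""
    P = 0.0
    for stages, interval_teams in zip(intervals, teams):
        work = sum(w[i] for i in stages)
        slowest = min(s[u] for team in interval_teams for u in team)
        P = max(P, work / (len(interval_teams) * slowest))
    return P

def failure_probability(all_teams, f):
    """F = 1 - prod over teams T_k of (1 - prod of f_u for u in T_k):
    the mapping fails unless every team keeps at least one live processor."""
    return 1.0 - prod(1.0 - prod(f[u] for u in team) for team in all_teams)
```

For instance, two singleton teams with failure probability 0.1 each give F = 1 − 0.9² = 0.19, matching the product formula.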

SLIDE 26

The problem

Determine the best interval mapping, over all possible partitions into intervals and processor assignments
Mono-criterion: minimize period or failure probability
Bi-criteria: (i) given a threshold period, minimize failure probability; or (ii) given a threshold failure probability, minimize period

SLIDE 30

Mono-criterion complexity results

Failure probability: easy on any kind of platform: group all stages into a single interval, processed by one single team containing all p processors
Period: one processor per team

  • SpeedHom platforms: one interval processed by p teams
  • SpeedHet platforms: NP-hard in the general case, polynomial if wi = w for 1 ≤ i ≤ n (see previous work [Algorithmica2010])

SLIDE 32

Bi-criteria complexity results

Preliminary result: for SpeedHom platforms, there exists an optimal bi-criteria mapping with one single interval
Proof: starting from an optimal solution with several intervals, merge the intervals; the single interval is processed by all the teams of the optimal solution

  • failure probability remains the same (same teams)
  • new period cannot be greater than the optimal period (SpeedHom platform)

Not true on SpeedHet platforms: example with w1 = s1 = 1 and w2 = s2 = 2, F∗ = 1

  • period 1 with two intervals (S1 on P1 and S2 on P2: w1/s1 = w2/s2 = 1)
  • period 3/2 with one single interval ((w1 + w2) / (2 × min(s1, s2)) = 3/2)

SLIDE 37

SpeedHom-FailureHom platforms

SpeedHom-FailureHom: polynomial-time algorithm
Fixed period P∗:

  • one single interval, with the minimum number of teams ℓmin = ⌈(Σ_{i=1}^{n} wi) / (P∗ × s)⌉
  • greedily assign processors to teams so that teams are balanced
  • algorithm in O(p)

Converse problem: fixed F∗:

  • one single interval...
  • ...but must try all possible numbers of teams 1 ≤ ℓ ≤ p
  • algorithm in O(p log p)
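The fixed-period direction is easy to make concrete. A sketch under the stated assumptions (homogeneous speed s and homogeneous failure probabilities; the function name and the return convention of team sizes are illustrative, not from the paper):

```python
from math import ceil

def min_teams_mapping(w, p, s, P_star):
    """Single interval: l_min = ceil(sum(w_i) / (P* * s)) teams are needed
    to meet period P*. With identical speeds and failure probabilities,
    the best reliability comes from teams as balanced as possible
    (sizes differing by at most one). Returns the list of team sizes,
    or None when P* is infeasible even with one processor per team."""
    l_min = ceil(sum(w) / (P_star * s))
    if l_min > p:  # more teams required than processors available
        return None
    base, extra = divmod(p, l_min)
    return [base + 1] * extra + [base] * (l_min - extra)
```

For example, min_teams_mapping([2, 2], 5, 1.0, 2.0) needs ℓmin = 2 teams and splits the 5 processors into teams of sizes 3 and 2.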

SLIDE 42

With heterogeneous platforms

SpeedHet-FailureHom is NP-hard, because SpeedHet is already NP-hard for period minimization
SpeedHom-FailureHet becomes NP-hard as well: balancing processors within teams is combinatorial; reduction from 3-PARTITION
Intermediate result: the best reliability is always obtained by balancing the failure probabilities of the teams

SLIDE 45

Approximation results

SpeedHom: single interval always optimal
SpeedHet: period minimization problem (NP-hard); the optimal single-interval mapping can be found:

  • sort processors by non-increasing speed
  • for 1 ≤ i ≤ p, compute the period using the i fastest processors
  • time O(p log p)

Theorem: single-interval mapping is an n-approximation algorithm for period minimization; this factor cannot be improved
Proof sketch: start from an optimal solution with m ≤ n intervals, and build a single-interval solution with period P1 ≤ m × Pm
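The O(p log p) search over prefixes can be sketched as follows (one processor per team, as in the mono-criterion case; the function name is an illustrative assumption):

```python
def single_interval_period(w, speeds):
    """Sort speeds non-increasingly; using the i fastest processors as i
    singleton teams, the slowest of them paces the round-robin, so
    P_i = (sum of w) / (i * s_(i)). The optimum is the best prefix."""
    W = sum(w)
    sp = sorted(speeds, reverse=True)
    return min(W / ((i + 1) * s) for i, s in enumerate(sp))
```

With total work 4 and speeds [4, 1], using only the fast processor gives period 1, while using both gives 4 / (2 × 1) = 2, so the prefix of size one wins.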

SLIDE 50

Heuristics

SpeedHet-FailureHet platforms
Minimize F under a fixed upper bound P∗ on the period
Counterpart problem: binary search over P∗
Two heuristics:

  • OneInterval: stages grouped as a single interval (motivated by the complexity results)
  • MultiInterval: solution with multiple intervals (recall that a single interval may be far from optimal)

SLIDE 52

OneInterval

One single interval
Determine the number of teams: try all values of ℓ between 1 and p
For a given ℓ, discard processors that are too slow for the period
Assign processors to teams to minimize the failure probability:

  • from the complexity results: balance reliability across teams
  • NP-hard problem, but an efficient greedy heuristic: sort processors by non-decreasing failure probability and assign the next processor to the team with the highest failure probability

Time complexity: O(p² log p)
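The greedy team-formation step can be sketched like this (teams are represented simply as lists of their members' failure probabilities; the function name is an illustrative assumption):

```python
from math import prod

def greedy_balance(failures, l):
    """Sort processors by non-decreasing failure probability and give each
    one to the team whose current failure probability (product of its
    members' f_u; prod([]) == 1.0 for an empty team) is highest."""
    teams = [[] for _ in range(l)]
    for f in sorted(failures):
        worst = max(teams, key=prod)  # most unreliable team so far
        worst.append(f)
    return teams
```

On failures [0.4, 0.1, 0.3, 0.2] with ℓ = 2, the resulting team failure products (0.1 × 0.4 = 0.04 and 0.2 × 0.3 = 0.06) end up close, which is exactly the balancing the complexity results call for.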

SLIDE 55

Framework Complexity Practical Conclusion Heuristics A* Evaluation

MultiInterval

Step 1: create min(n, p) intervals (one stage per processor, or balance the computational load across intervals)
Step 2: greedily add processors to intervals, to minimize the maximum ratio of interval computation load to accumulated processor speed
Step 3: for each interval, use OneInterval to form teams; use previously unallocated processors (too slow for the period); increase the bound on the period for the interval until a valid allocation is returned
Step 4: if the period bound is not achieved for at least one interval, merge the interval with the largest period with its previous or next interval, until the bound is achieved
Step 5: merge the intervals with the highest failure probability as long as it is beneficial
Note that OneInterval is called each time we tentatively merge two intervals (steps 4 and 5)
Time complexity: O(p³ log p)

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 20/ 28
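Step 2 above, the greedy processor allocation, can be sketched as follows. This is an illustrative reconstruction under stated assumptions (function name and the fastest-first order are mine), not the authors' implementation: each remaining processor goes to the interval whose load-to-speed ratio is currently worst, which keeps the maximum ratio, a proxy for the period, as low as possible.

```python
def add_processors(interval_loads, proc_speeds, initial_speeds):
    """Greedily allocate remaining processors to intervals so as to
    minimize the maximum ratio of interval computation load to
    accumulated processor speed.

    interval_loads: total work of each interval
    initial_speeds: speed already assigned to each interval (nonzero)
    proc_speeds: speeds of the processors still to be allocated
    Returns per-interval lists of allocated processor speeds."""
    speeds = list(initial_speeds)
    alloc = [[] for _ in interval_loads]
    for s in sorted(proc_speeds, reverse=True):   # fastest first
        worst = max(range(len(interval_loads)),
                    key=lambda i: interval_loads[i] / speeds[i])
        speeds[worst] += s
        alloc[worst].append(s)
    return alloc
```

With loads (10, 5), starting speeds (1, 1), and spare processors of speed 4 and 2, the faster processor goes to the heavily loaded first interval and the slower one to the second.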

slide-62
SLIDE 62

Framework Complexity Practical Conclusion Heuristics A* Evaluation

A* algorithm

A*: best-first state-space search algorithm, for small problem instances
Non-linearity of the failure probability rules out integer linear programming
Search space: a state s is a partial solution (i.e., a partial mapping), with an underestimated cost value c(s)
Expand the partial solution with the lowest c(s) value, with a stage or a processor
When a complete mapping is obtained, it is an optimal solution (best-first strategy)

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 21/ 28
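The best-first expansion loop can be sketched generically as below. This is a skeleton under stated assumptions, not the paper's search: `expand`, `underestimate`, and `is_goal` are hypothetical callbacks, and the optimality claim relies on c(s) never overestimating the true cost of completing a partial mapping.

```python
import heapq
import itertools

def a_star(initial, expand, underestimate, is_goal):
    """Best-first (A*) skeleton: keep a priority queue of partial
    solutions ordered by their underestimated cost c(s); always expand
    the cheapest partial solution; the first complete solution popped
    is optimal provided c(s) is an admissible underestimate."""
    tie = itertools.count()                    # break ties without comparing states
    queue = [(underestimate(initial), next(tie), initial)]
    while queue:
        _, _, state = heapq.heappop(queue)
        if is_goal(state):
            return state                       # best-first + admissible c => optimal
        for succ in expand(state):
            heapq.heappush(queue, (underestimate(succ), next(tie), succ))
    return None                                # no valid solution
```

A state here would encode the partial mapping (intervals, teams, allocated processors); the toy usage in the evaluation below just finds a cheapest path to illustrate the control flow.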

slide-67
SLIDE 67

Framework Complexity Practical Conclusion Heuristics A* Evaluation

State tree for two stages on two processors

[Figure: state-space tree for mapping two stages (S1, S2) onto two processors (P1, P2), starting from the empty state. Legend: (P1, P2) — first team for this interval; (P3, P4) — second team for this interval; P5, P6 — processors not selected; edges expand a state with a new stage or a new processor; states may be invalid, goal states for the last interval, or single intervals.]

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 22/ 28

slide-68
SLIDE 68

Framework Complexity Practical Conclusion Heuristics A* Evaluation

Underestimate cost functions

Failure probability F

Partial mapping: adding a team increases the failure probability
Underestimate: add the remaining processors to the existing teams
NP-hard problem: consider the amount of reliability available and distribute it to the existing teams to balance their reliability

Period P

Need to check that the partial solution does not exceed the bound: can be computed exactly
Second underestimate: the optimal period achievable by the remaining processors on the remaining stages
NP-hard problem: assume perfect load balance, so the underestimate (total remaining work over total remaining speed) never exceeds the optimal period:

(Σᵢ wᵢ) / (Σᵤ sᵤ) ≤ P

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 23/ 28
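The perfect-load-balance underestimate reduces to a one-line computation; the sketch below (function name mine) makes the assumption explicit: no mapping of the remaining stages onto the remaining processors can achieve a period below total remaining work divided by total remaining speed.

```python
def period_lower_bound(remaining_weights, remaining_speeds):
    """Admissible underestimate of the period: under perfect load
    balance, the period equals total work over total speed, and any
    real mapping can only do worse (have a larger period)."""
    return sum(remaining_weights) / sum(remaining_speeds)
```

For example, remaining stage weights (4, 6) on remaining processor speeds (1, 4) give a period lower bound of 10 / 5 = 2.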

slide-70
SLIDE 70

Framework Complexity Practical Conclusion Heuristics A* Evaluation

Heuristics vs A*

Randomly generated workload scenarios
Both heuristics close to the optimal solution
OneInterval is better than MultiInterval in a few cases
A* is much slower, but its main limitation is memory

[Plots: failure probability vs. period bound (left) and running time in seconds vs. period bound (right), for MultiInterval, OneInterval, and A*.]

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 24/ 28

slide-71
SLIDE 71

Framework Complexity Practical Conclusion Heuristics A* Evaluation

Performance of heuristics

Distribution of the ratio between the failure probability obtained by a heuristic (OneInterval in red, MultiInterval in blue) and the optimal failure probability (A*); optimal: ratio 1

On average, the heuristics are 20% above optimal
Ratio 10: cases in which the heuristics find no solution (≈ 10%)

[Histograms: frequency (%) of the ratio of the heuristic failure probability to the optimal one, for each heuristic.]

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 25/ 28

slide-72
SLIDE 72

Framework Complexity Practical Conclusion Heuristics A* Evaluation

Larger scenarios

OneInterval better in 61% of the cases
MultiInterval better in 20% of the cases
On average, the failure probability of OneInterval is 2% above that of MultiInterval
Comparison of OneInterval with the optimal single-interval solution (easy to compute with A*): on average 0.05% above optimal, and 5% in the worst case

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 26/ 28

slide-75
SLIDE 75

Framework Complexity Practical Conclusion

Outline of the talk

1. Framework: Application, Platform, Mapping, Objective

2. Complexity results: Mono-criterion, Bi-criteria, Approximation results

3. Practical side: Heuristics, Optimal algorithm using A*, Evaluation results

4. Conclusion

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 27/ 28

slide-76
SLIDE 76

Framework Complexity Practical Conclusion

Conclusion and future work

Exhaustive complexity study

polynomial-time algorithm for SpeedHom-FailureHom platforms
NP-completeness with one level of heterogeneity
approximation results comparing the single-interval solution with any other solution

Practical solution to the problem

efficient heuristics (inspired by the theoretical study) for SpeedHet-FailureHet platforms
A* algorithm with non-trivial underestimate functions
experimental results: very good behaviour of the heuristics

Future work

further approximation results
enhanced multiple-interval heuristics
improved A* techniques

Anne.Benoit@ens-lyon.fr October 28, 2010 Mapping pipelined applications with replication 28/ 28
