SLIDE 1

Introduction Framework Examples Complexity Conclusion

Optimizing Latency and Reliability of Pipeline Workflow Applications

Anne Benoit, Veronika Rehn-Sonigo, Yves Robert

GRAAL team, LIP, École Normale Supérieure de Lyon, France

HCW 2008

Veronika.Sonigo@ens-lyon.fr HCW 2008 Optimizing Latency and Reliability 1/ 27

SLIDE 2

Introduction and motivation

Mapping applications onto parallel platforms: a difficult challenge
Heterogeneous clusters, fully heterogeneous platforms: even more difficult!

Structured programming approach
  Easier to program (deadlocks, process starvation)
  Range of well-known paradigms (pipeline, farm)
  Algorithmic skeleton: help for mapping

Mapping pipeline skeletons onto heterogeneous platforms

SLIDE 5

Multi-criteria scheduling of workflows

Workflow
  Several consecutive data sets enter the application graph.

Multi-criteria?
  Latency: maximal time elapsed between the beginning and the end of the execution of a data set
  Failure: the probability that a processor fails during execution

Bi-criteria!

SLIDE 9

Rule of the game

Map each pipeline stage on a single processor
Goal: minimize latency AND minimize failure probability
Several mapping strategies

[Figure: the pipeline application, stages $S_1$, $S_2$, ..., $S_k$, ..., $S_n$, shown under each mapping strategy in turn]

One-to-one Mapping
Interval Mapping
General Mapping
Interval Mapping with Replication (one interval onto several processors) in order to increase reliability

SLIDE 15

Major Contributions

Definition of bi-criteria mapping
Complexity results
  Mono-criterion problems
  Bi-criteria problems
Optimal algorithms

SLIDE 16

Outline

1 Framework
2 Motivating Examples
3 Complexity Results
    Mono-criterion Problems
    Bi-criteria Problems
4 Conclusion

SLIDE 17

The application

[Figure: pipeline $S_1$, $S_2$, ..., $S_k$, ..., $S_n$ with computation costs $w_1$, $w_2$, ..., $w_k$, ..., $w_n$ and data sizes $\delta_0$, $\delta_1$, ..., $\delta_{k-1}$, $\delta_k$, ..., $\delta_n$ between stages]

$n$ stages $S_k$, $1 \le k \le n$
$S_k$:
  receives input of size $\delta_{k-1}$ from $S_{k-1}$
  performs $w_k$ computations
  outputs data of size $\delta_k$ to $S_{k+1}$
$S_0$ and $S_{n+1}$: virtual stages representing the outside world

SLIDE 18

The platform

[Figure: processors $P_{in}$, $P_u$, $P_v$, $P_{out}$ with speeds $s_{in}$, $s_u$, $s_v$, $s_{out}$ and link bandwidths $b_{in,u}$, $b_{u,v}$, $b_{v,out}$]

$p$ processors $P_u$, $1 \le u \le p$, fully interconnected
$s_u$: speed of processor $P_u$
bidirectional link $\mathrm{link}_{u,v}: P_u \to P_v$, bandwidth $b_{u,v}$
$fp_u$: failure probability of processor $P_u$ (independent of duration, meant to run for a long time)
one-port model: each processor can either send, receive or compute at any time-step

SLIDE 20

Different platforms

Fully Homogeneous – Identical processors ($s_u = s$) and links ($b_{u,v} = b$): typical parallel machines
Failure Homogeneous – Identically reliable processors ($fp_u = fp_v$)
Communication Homogeneous – Different-speed processors ($s_u \ne s_v$), identical links ($b_{u,v} = b$): networks of workstations, clusters
Fully Heterogeneous – Fully heterogeneous architectures, $s_u \ne s_v$ and $b_{u,v} \ne b_{u',v'}$: hierarchical platforms, grids
Failure Heterogeneous – Different failure probabilities ($fp_u \ne fp_v$)
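As a rough illustration, the platform classes can be told apart mechanically from the speeds, bandwidths and failure probabilities. The sketch below is hypothetical: the function names and the input layout (a list of speeds, a dictionary of link bandwidths) are assumptions made for this example and are not part of the talk. Values are compared exactly, which is adequate for hand-written inputs only.

```python
def classify_platform(speeds, bandwidths):
    """Return the platform class for speeds s_u and link bandwidths b_{u,v}.

    speeds: list of s_u; bandwidths: dict mapping (u, v) to b_{u,v}
    (an assumed layout, chosen for this sketch).
    """
    identical_links = len(set(bandwidths.values())) <= 1
    identical_speeds = len(set(speeds)) <= 1
    if identical_links and identical_speeds:
        return "Fully Homogeneous"
    if identical_links:
        return "Communication Homogeneous"
    return "Fully Heterogeneous"


def classify_failures(fps):
    """Return the failure class for the failure probabilities fp_u."""
    if len(set(fps)) <= 1:
        return "Failure Homogeneous"
    return "Failure Heterogeneous"
```

The two classifications are independent axes, so a platform is described by one label from each function.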

SLIDE 21

Mapping problem: Interval Mapping

Partition of $[1..n]$ into $m$ intervals $I_j = [d_j, e_j]$ (with $d_j \le e_j$ for $1 \le j \le m$, $d_1 = 1$, $d_{j+1} = e_j + 1$ for $1 \le j \le m - 1$, and $e_m = n$)
Interval $I_j$ is mapped onto the set of processors $P_{alloc(j)}$

Failure probability:

\[ FP = 1 - \prod_{1 \le j \le m} \Bigl( 1 - \prod_{u \in alloc(j)} fp_u \Bigr) \]

Latency with identical links ($b_{u,v} = b$), where $k_j = |alloc(j)|$ is the number of processors onto which interval $I_j$ is replicated:

\[ L = \sum_{1 \le j \le m} \Bigl( k_j \times \frac{\delta_{d_j - 1}}{b} + \frac{\sum_{i = d_j}^{e_j} w_i}{\min_{u \in alloc(j)} s_u} \Bigr) + \frac{\delta_n}{b} \]

Latency with heterogeneous links (with the convention $alloc(m+1) = \{out\}$):

\[ L = \sum_{u \in alloc(1)} \frac{\delta_0}{b_{in,u}} + \sum_{1 \le j \le m} \max_{u \in alloc(j)} \Bigl\{ \frac{\sum_{i = d_j}^{e_j} w_i}{s_u} + \sum_{v \in alloc(j+1)} \frac{\delta_{e_j}}{b_{u,v}} \Bigr\} \]
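To make the two objective functions concrete, here is a minimal sketch that evaluates FP and the identical-links latency for a given interval mapping. The function names and the argument layout are assumptions for this example: `intervals` holds the pairs (d_j, e_j) with 1-based stage numbers, `alloc[j]` lists the processors of interval j, `w[i - 1]` is w_i, and `delta[i]` is the data size δ_i for i from 0 to n.

```python
from math import prod


def failure_probability(alloc, fp):
    """FP = 1 - prod_j (1 - prod_{u in alloc(j)} fp_u)."""
    return 1 - prod(1 - prod(fp[u] for u in procs) for procs in alloc)


def latency_identical_links(intervals, alloc, w, delta, s, b):
    """Latency of an interval mapping when every link has bandwidth b."""
    total = 0.0
    for (d, e), procs in zip(intervals, alloc):
        k = len(procs)                      # replication count k_j
        total += k * delta[d - 1] / b       # input sent to each of the k_j replicas
        # the slowest replica determines the computation time of the interval
        total += sum(w[d - 1:e]) / min(s[u] for u in procs)
    return total + delta[-1] / b            # final output delta_n
```

For instance, `latency_identical_links([(1, 2), (3, 3)], [[0, 1], [2]], ...)` describes three stages where stages 1 and 2 are replicated on processors 0 and 1, and stage 3 runs on processor 2.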

SLIDE 25

Objective function?

Mono-criterion
  Minimize $L$
  Minimize $FP$

Bi-criteria
  How to define it? Minimize $\alpha \cdot L + \beta \cdot FP$? The two values are not comparable.
  Minimize $L$ for a fixed failure probability
  Minimize $FP$ for a fixed latency

SLIDE 30

Mono-criterion - Interval Mapping

Minimize $L$

Application (both examples): stages $S_1$, $S_2$ with $w_1 = 2$, $w_2 = 2$; all data sizes equal to 100

[Figure: Communication Homogeneous platform with $P_{in}$, $P_1$, $P_2$, $P_{out}$; $s_1 = 1$, $s_2 = 2$; every link has bandwidth 100]

[Figure: Fully Heterogeneous platform with $P_{in}$, $P_1$, $P_2$, $P_{out}$; $s_1 = 1$, $s_2 = 1$; link bandwidths as extracted: 100, 1, 1, 100, 100]

SLIDE 33

Bi-criteria - Interval Mapping

Minimize $FP$ with fixed latency
Communication homogeneous - Failure heterogeneous
Fixed latency: 22

Application: stages $S_1$, $S_2$ with $w_1 = 1$, $w_2 = 100$; input of size 10, data of size 1 between $S_1$ and $S_2$
Platform: a slow, reliable processor ($s = 1$, $fp = 0.1$) and fast, unreliable processors ($s = 100$, $fp = 0.8$)

Candidate mappings
  Whole pipeline on the slow processor: $L = 10 + 101 \gg 22$
  Whole pipeline replicated on two fast processors: $L = 20 + 101/100 < 22$, $FP = 1 - (1 - 0.8^{2}) = 0.64$
  Whole pipeline replicated on three fast processors: $L = 30 + 101/100 > 22$
  $S_1$ on the slow processor, $S_2$ replicated on ten fast processors: $L = 10 + 1/1 + 10 \times 1 + 100/100 = 22$, $FP = 1 - (1 - 0.1) \times (1 - 0.8^{10}) < 0.2$
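The arithmetic of this example can be re-checked with a few lines of Python. This is only a sketch: the bandwidth b = 1 and a zero-size final output are assumptions inferred from the sums on the slide (which show no bandwidth divisor and no output term), and the helper names are made up for this example.

```python
# Values read off the slide; B = 1 and a zero-size final output are assumptions.
B = 1.0
DELTA_0, DELTA_1 = 10.0, 1.0
W_1, W_2 = 1.0, 100.0
SLOW_S, SLOW_FP = 1.0, 0.1
FAST_S, FAST_FP = 100.0, 0.8
LATENCY_BOUND = 22.0


def whole_pipeline(k, s, fp):
    """One interval [S1, S2] replicated on k identical processors."""
    latency = k * DELTA_0 / B + (W_1 + W_2) / s
    return latency, fp ** k  # FP = 1 - (1 - fp^k) = fp^k


def split_pipeline(k_fast):
    """S1 on the slow processor, S2 replicated on k_fast fast processors."""
    latency = DELTA_0 / B + W_1 / SLOW_S + k_fast * DELTA_1 / B + W_2 / FAST_S
    failure = 1 - (1 - SLOW_FP) * (1 - FAST_FP ** k_fast)
    return latency, failure


candidates = {
    "slow x1": whole_pipeline(1, SLOW_S, SLOW_FP),
    "fast x2": whole_pipeline(2, FAST_S, FAST_FP),
    "fast x3": whole_pipeline(3, FAST_S, FAST_FP),
    "slow + fast x10": split_pipeline(10),
}
# Keep the mappings that respect the latency bound, then pick the most reliable.
feasible = {name: lf for name, lf in candidates.items() if lf[0] <= LATENCY_BOUND}
best = min(feasible, key=lambda name: feasible[name][1])

for name, (latency, failure) in candidates.items():
    print(f"{name}: L = {latency:.2f}, FP = {failure:.4f}")
print("best feasible mapping:", best)
```

Among these four candidates, only the two-interval mapping meets the latency bound with a failure probability below 0.2, which is why a single interval is no longer sufficient once failure probabilities differ.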

SLIDE 39

Mono-criterion Problems

Minimize the failure probability?

Theorem 1
Minimizing the failure probability can be done in polynomial time.

Replicate the whole pipeline as a single interval. Use all processors.
True for all platform types.
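With a single interval replicated on every processor, the FP formula collapses to the product of all failure probabilities. A minimal sketch (the function name is an assumption for this example):

```python
from math import prod


def min_failure_probability(fps):
    """FP of the whole pipeline mapped as one interval on all processors."""
    # One interval only, so FP = 1 - (1 - prod_u fp_u) = prod_u fp_u.
    return prod(fps)
```

Since every fp_u is at most 1, adding a processor to the replication set can only keep the product the same or lower it, which is the intuition behind using all processors.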

SLIDE 42

Mono-criterion Problems

Minimize the latency?

Theorem 2
Minimizing the latency can be done in polynomial time on Communication Homogeneous platforms.

Idea:
  Latency is optimized by suppressing all communications.
  Replication increases latency (additional communication).
  Map the whole pipeline on the fastest processor.
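The idea behind Theorem 2 can be sketched as follows: pick the fastest processor and pay only the initial input and the final output communications. The function name and argument layout are assumptions for this example.

```python
def min_latency_identical_links(w, delta_0, delta_n, speeds, b):
    """Map the whole pipeline, unreplicated, on the fastest processor.

    Returns (index of the chosen processor, latency).
    """
    fastest = max(range(len(speeds)), key=speeds.__getitem__)
    latency = delta_0 / b + sum(w) / speeds[fastest] + delta_n / b
    return fastest, latency


# Data of the Communication Homogeneous example from the motivating examples:
# w1 = w2 = 2, all data sizes 100, s1 = 1, s2 = 2, every bandwidth 100.
processor, latency = min_latency_identical_links([2, 2], 100, 100, [1, 2], 100)
print(processor, latency)
```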

SLIDE 45

Mono-criterion Problems

Minimize the latency?
What about Fully Heterogeneous platforms? Remember the example:

[Figure: the Fully Heterogeneous example platform: stages $S_1$, $S_2$ with $w_1 = 2$, $w_2 = 2$ and data sizes 100; processors $P_{in}$, $P_1$, $P_2$, $P_{out}$ with $s_1 = 1$, $s_2 = 1$; link bandwidths as extracted: 100, 1, 1, 100, 100]

Theorem 3
Minimizing the latency is NP-hard on Fully Heterogeneous platforms for one-to-one mappings.

SLIDE 48

Mono-criterion Problems

But ... considering general mappings ...

Theorem 4
Minimizing the latency is polynomial on Fully Heterogeneous platforms for general mappings.

[Figure: layered graph with source $V_{0,in}$, one vertex $V_{i,u}$ per stage $i$ ($1 \le i \le n$) and processor $u$ ($1 \le u \le m$), and sink $V_{n+1,out}$; edges $e_{0,in,u}$ leave the source, edges $e_{i,u,v}$ connect $V_{i,u}$ to $V_{i+1,v}$, and edges $e_{n,u,out}$ enter the sink]

Optimal mapping: shortest path in the graph.
Interval mapping: still an open problem
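Because the graph is layered, its shortest path can be found with a simple dynamic program over the stages. The sketch below is an illustration under stated assumptions, not the authors' implementation: the slide does not spell out the edge weights, so the code assumes that edge $e_{i,u,v}$ costs $w_i / s_u$ plus $\delta_i / b_{u,v}$ when $u \ne v$ (no communication when both stages share a processor), that $e_{0,in,u}$ costs $\delta_0 / b_{in,u}$, and that $e_{n,u,out}$ costs $w_n / s_u + \delta_n / b_{u,out}$. Function and parameter names are hypothetical.

```python
def min_latency_general_mapping(w, delta, speeds, b_in, b, b_out):
    """Shortest path in the layered graph; returns (latency, processor per stage).

    w[i - 1] is w_i, delta[i] is delta_i (i = 0..n), b[u][v] is b_{u,v}.
    """
    n, m = len(w), len(speeds)
    # dist[u]: length of the shortest path from V_{0,in} to V_{i,u}
    dist = [delta[0] / b_in[u] for u in range(m)]
    parents = []  # parents[i - 1][v]: processor of stage i on the best path to V_{i+1,v}
    for i in range(1, n):
        new_dist, parent = [], []
        for v in range(m):
            costs = [
                dist[u] + w[i - 1] / speeds[u]
                + (0.0 if u == v else delta[i] / b[u][v])
                for u in range(m)
            ]
            u_best = min(range(m), key=costs.__getitem__)
            new_dist.append(costs[u_best])
            parent.append(u_best)
        dist = new_dist
        parents.append(parent)
    # last layer: run stage n on u, then send delta_n to P_out
    final = [dist[u] + w[n - 1] / speeds[u] + delta[n] / b_out[u] for u in range(m)]
    last = min(range(m), key=final.__getitem__)
    # walk the parent pointers back from stage n to stage 1
    mapping = [last]
    for parent in reversed(parents):
        mapping.append(parent[mapping[-1]])
    mapping.reverse()
    return final[last], mapping


INF = float("inf")  # placeholder for the unused b[u][u] entries
# One possible assignment of the example's bandwidths 100, 1, 1, 100, 100 to the
# links (an assumption; the extracted text does not say which link gets which value).
result = min_latency_general_mapping(
    w=[2, 2], delta=[100, 100, 100], speeds=[1, 1],
    b_in=[100, 1], b=[[INF, 100], [100, INF]], b_out=[1, 100],
)
print(result)
```

The run time is O(n m^2) for n stages and m processors, since each of the n layers examines every pair of processors once.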

SLIDE 49

Bi-criteria Problems

\[ 1 - (1 - fp^{a+b}) \le 1 - (1 - fp^{a})(1 - fp^{b}) \]

Lemma
On Fully Homogeneous and Communication Homogeneous-Failure Homogeneous platforms, there is a mapping of the pipeline as a single interval which minimizes the failure probability under a fixed latency threshold, and there is a mapping of the pipeline as a single interval which minimizes the latency under a fixed failure probability threshold.
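The inequality on this slide compares one interval replicated on $a + b$ processors (left-hand side) with two intervals replicated on $a$ and $b$ processors (right-hand side), all processors having the same failure probability $fp$. A short derivation, assuming $0 \le fp \le 1$ and $a, b \ge 1$:

```latex
\begin{align*}
1 - (1 - fp^{a})(1 - fp^{b})
  &= fp^{a} + fp^{b} - fp^{a+b} \\
  &\ge fp^{a+b} + fp^{a+b} - fp^{a+b}
     && \text{since } fp^{a} \ge fp^{a+b} \text{ and } fp^{b} \ge fp^{a+b} \\
  &= fp^{a+b} \\
  &= 1 - (1 - fp^{a+b}).
\end{align*}
```

Merging two replicated intervals into one therefore never increases the failure probability when processors are identically reliable.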

SLIDE 52

Fully Homogeneous platforms

Minimize $FP$ for a fixed latency $L$

Algorithm 1
begin
  Find $k$ maximum, such that
  \[ k \times \frac{\delta_0}{b} + \frac{\sum_{1 \le j \le n} w_j}{s} + \frac{\delta_n}{b} \le L \]
  Replicate the whole pipeline as a single interval onto the $k$ (most reliable) processors
end
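Algorithm 1 translates almost line by line into code. The following is a sketch under assumptions: the function name, the argument layout, and the convention of returning `None` when no replication count fits the bound are choices made for this example.

```python
from math import prod


def algorithm1(w, delta_0, delta_n, s, b, fps, latency_bound):
    """Minimize FP under a latency bound on a Fully Homogeneous platform.

    Returns (chosen processor indices, FP), or None if even k = 1 is too slow.
    """
    compute_and_output = sum(w) / s + delta_n / b
    for k in range(len(fps), 0, -1):  # try the largest k first
        if k * delta_0 / b + compute_and_output <= latency_bound:
            # keep the k processors with the smallest failure probabilities
            chosen = sorted(range(len(fps)), key=fps.__getitem__)[:k]
            return chosen, prod(fps[u] for u in chosen)
    return None
```

The loop checks the inequality directly for each k instead of solving it for k, which avoids off-by-one errors from floating-point rounding at the cost of at most p iterations.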

SLIDE 54

Fully Homogeneous platforms

Minimize $L$ for a fixed failure probability $FP$

Algorithm 2
begin
  Find $k$ minimum, such that
  \[ 1 - (1 - fp^{k}) \le FP \]
  Replicate the whole pipeline as a single interval onto the $k$ (most reliable) processors
end
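A corresponding sketch for Algorithm 2, assuming identically reliable processors with failure probability `fp` as in the formula on the slide. The function name, the parameter `p` for the number of available processors, and the `None` return value for an unreachable bound are assumptions made for this example.

```python
def algorithm2(w, delta_0, delta_n, s, b, fp, p, fp_bound):
    """Minimize latency under a failure probability bound.

    Returns (k, latency) for the smallest feasible replication count k,
    or None if even p replicas cannot reach the bound.
    """
    for k in range(1, p + 1):  # try the smallest k first
        if fp ** k <= fp_bound:  # 1 - (1 - fp^k) simplifies to fp^k
            latency = k * delta_0 / b + sum(w) / s + delta_n / b
            return k, latency
    return None
```

The smallest feasible k is the right choice because each additional replica adds one more input communication of cost δ_0 / b to the latency.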

SLIDE 56

Other Platform Configurations

Communication Homogeneous platforms - Failure Homogeneous
  Slightly modified Fully Homogeneous algorithms are optimal.

Communication Homogeneous platforms - Failure Heterogeneous
  The Lemma does not hold anymore. Remember the example. Open problem.

Fully Heterogeneous platforms
  On Fully Heterogeneous platforms, the (decision problems associated with the) bi-criteria optimization problems are NP-hard.

SLIDE 60

Related work

Subhlok and Vondran: latency and throughput optimization on pipeline graphs (homogeneous platforms only)
Benoit et al.: extension of the work of Subhlok and Vondran
Mapping pipelined computations onto clusters and grids: DAG [Taura et al.], DataCutter [Saltz et al.]
Energy-aware mapping of pipelined computations [Melhem et al.], three-criteria optimization
Mapping pipelined computations onto special-purpose architectures: FPGA arrays [Fabiani et al.]. Fault-tolerance for embedded systems [Zhu et al.]
Real-world application: Motion-JPEG

SLIDE 61

Conclusion

Bi-criteria mapping problem: latency and reliability
Pipeline structured workflow applications
Complexity study (Interval Mapping):

                            Hom.     Com. Hom.   Hetero.
Mono-crit.   L              polyn.   polyn.      ?
             FP             polyn.   polyn.      polyn.
Bi-crit.     L - FP hom     polyn.   polyn.      NP-hard
             L - FP het     polyn.   ?           NP-hard

min L, one-to-one mapping: NP-hard
min L, general mapping: polynomial

SLIDE 63

Future work

Theory
  Extension to fork, fork-join and tree workflows
  Multi-criteria: throughput in addition to reliability and latency

Practice
  Design of multi-criteria heuristics
  Comparison of effective performance against theoretical performance
  Real experiments on heterogeneous clusters with different applications, using MPI