SLIDE 1

Introduction Framework Single chunk Multiple chunks Simulations Conclusion

Energy-aware checkpointing of divisible tasks with soft or hard deadlines

Guillaume Aupy1, Anne Benoit1,2, Rami Melhem3, Paul Renaud-Goud1 and Yves Robert1,2,4

  • 1. École Normale Supérieure de Lyon, France

  • 2. Institut Universitaire de France
  • 3. University of Pittsburgh, USA
  • 4. University of Tennessee Knoxville, USA

Anne.Benoit@ens-lyon.fr http://graal.ens-lyon.fr/~abenoit/

International Green Computing Conference 2013 Arlington, USA

Anne.Benoit@ens-lyon.fr IGCC’2013 Energy-aware checkpointing 1/ 25

SLIDE 2

Divisible load scheduling and resilience

  • Divisible load scheduling: divide a computational workload into chunks
      • Arbitrary number of chunks
      • Size of chunks freely chosen by user
  • Goal: minimize makespan, i.e., total execution time
  • Current platforms: increasing frequency of failures
  • Well-established method to deal with failures: checkpointing
  • Take a checkpoint at the end of each chunk and verify the result
  • Re-execution in case of transient failure

SLIDE 4

Energy: a crucial issue

  • IGCC: Green Computing Conference!
  • Real need to reduce energy dissipation in current processors
  • Processor running at speed $s$: power $s^3$ watts
  • Dynamic voltage and frequency scaling (DVFS) techniques
  • Our goal: minimize energy consumption
      • including that of checkpointing and re-execution (if failure)
      • while enforcing a bound on execution time
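
Under the cubic power law, executing W operations at speed s takes W/s time units and therefore consumes s^3 × W/s = W s^2 energy units, so slowing down saves energy but costs time. A minimal Python sketch of that trade-off; the function names and sample values are illustrative assumptions, not taken from the talk:

```python
def exec_time(work: float, speed: float) -> float:
    """Time to execute `work` operations at constant speed `speed`."""
    return work / speed


def dynamic_energy(work: float, speed: float) -> float:
    """Energy under the s^3 power model: power * time = s^3 * (W / s) = W * s^2."""
    return speed ** 3 * exec_time(work, speed)


if __name__ == "__main__":
    W = 100.0  # hypothetical workload size
    for s in (0.5, 1.0, 2.0):
        print(f"s={s}: time={exec_time(W, s):.1f}, energy={dynamic_energy(W, s):.1f}")
```

Halving the speed doubles the execution time and divides the energy by four, which is why a deadline bound is needed to keep the problem meaningful.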

SLIDE 6

Outline

  1. Framework
  2. With a single chunk
  3. With several chunks
  4. Simulation results
  5. Conclusion

SLIDE 7

Framework

  • Execution of a divisible task ($W$ operations)
  • Failures may occur
      • Transient faults
      • Resilience through checkpointing
  • Objective: minimize expected energy consumption $\mathbb{E}(E)$, given a deadline bound $D$
  • Probabilistic nature of failure hits: expectation of energy consumption is natural (average cost over many executions)
  • Deadline bound: two relevant scenarios (soft or hard deadline)

SLIDE 8

Soft vs hard deadline

  • Soft deadline: met in expectation, i.e., $\mathbb{E}(T) \le D$ (average response time)
  • Hard deadline: met in the worst case, i.e., $T_{wc} \le D$

SLIDE 9

Execution time, one single chunk

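  • One single chunk of size $W$
  • Checkpoint overhead: execution time $T_C$
  • Instantaneous failure rate: $\lambda$
  • First execution at speed $s$: $T_{exec} = \frac{W}{s} + T_C$
  • Failure probability: $P_{fail} = \lambda T_{exec} = \lambda\left(\frac{W}{s} + T_C\right)$
  • In case of failure: re-execute at speed $\sigma$: $T_{reexec} = \frac{W}{\sigma} + T_C$
  • And we assume success after re-execution
  • $\mathbb{E}(T) = T_{exec} + P_{fail} T_{reexec} = \left(\frac{W}{s} + T_C\right) + \lambda\left(\frac{W}{s} + T_C\right)\left(\frac{W}{\sigma} + T_C\right)$
  • $T_{wc} = T_{exec} + T_{reexec} = \left(\frac{W}{s} + T_C\right) + \left(\frac{W}{\sigma} + T_C\right)$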
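
The two time formulas of this slide translate directly into code, which makes it easy to check a candidate pair of speeds against a soft or a hard deadline. A minimal sketch; the function names and the sample values (including the deadline D = 15) are illustrative assumptions:

```python
def t_exec(W: float, speed: float, T_C: float) -> float:
    """Time for one execution of the chunk plus its checkpoint: W/speed + T_C."""
    return W / speed + T_C


def p_fail(W: float, s: float, T_C: float, lam: float) -> float:
    """Failure probability of the first execution, as on the slide: lam * T_exec."""
    return lam * t_exec(W, s, T_C)


def expected_time(W: float, s: float, sigma: float, T_C: float, lam: float) -> float:
    """E(T) = T_exec + P_fail * T_reexec (the re-execution is assumed to succeed)."""
    return t_exec(W, s, T_C) + p_fail(W, s, T_C, lam) * t_exec(W, sigma, T_C)


def worst_case_time(W: float, s: float, sigma: float, T_C: float) -> float:
    """T_wc = T_exec + T_reexec (the first execution fails)."""
    return t_exec(W, s, T_C) + t_exec(W, sigma, T_C)


if __name__ == "__main__":
    # Hypothetical instance: the same speeds can meet a soft deadline
    # while missing a hard one.
    W, s, sigma, T_C, lam, D = 10.0, 1.0, 2.0, 1.0, 0.01, 15.0
    print("soft deadline met:", expected_time(W, s, sigma, T_C, lam) <= D)
    print("hard deadline met:", worst_case_time(W, s, sigma, T_C) <= D)
```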

SLIDE 10

Energy consumption, one single chunk

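  • One single chunk of size $W$
  • Checkpoint overhead: energy consumption $E_C$
  • First execution at speed $s$: $\frac{W}{s} \times s^3 + E_C = W s^2 + E_C$
  • Re-execution at speed $\sigma$: $W \sigma^2 + E_C$, with probability $P_{fail}$
  • $P_{fail} = \lambda T_{exec} = \lambda\left(\frac{W}{s} + T_C\right)$
  • $\mathbb{E}(E) = (W s^2 + E_C) + \lambda\left(\frac{W}{s} + T_C\right)\left(W \sigma^2 + E_C\right)$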
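
The expected energy of a single chunk is the energy of the first execution plus the re-execution energy weighted by the failure probability. A minimal sketch of that formula; the function name and the sample values are illustrative assumptions:

```python
def expected_energy(W: float, s: float, sigma: float,
                    T_C: float, E_C: float, lam: float) -> float:
    """E(E) = (W s^2 + E_C) + lam (W/s + T_C) (W sigma^2 + E_C)."""
    first_run = W * s ** 2 + E_C          # always paid
    p_fail = lam * (W / s + T_C)          # failure probability of the first run
    rerun = W * sigma ** 2 + E_C          # paid only after a failure
    return first_run + p_fail * rerun


if __name__ == "__main__":
    # Hypothetical instance: a faster re-execution raises the expected energy
    # only through the (small) failure probability.
    for sigma in (1.0, 2.0, 4.0):
        print(sigma, expected_energy(10.0, 1.0, sigma, 1.0, 5.0, 0.01))
```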

SLIDE 11

Multiple chunks

  • Execution times: sum of execution times for each chunk (worst-case or expected)
  • Expected energy consumption: sum of expected energy for each chunk
  • Coherent failure model: consider two chunks $W_1 + W_2 = W$
      • Probability of failure for first chunk: $P^1_{fail} = \lambda\left(\frac{W_1}{s} + T_C\right)$
      • For second chunk: $P^2_{fail} = \lambda\left(\frac{W_2}{s} + T_C\right)$
      • With a single chunk of size $W$: $P_{fail} = \lambda\left(\frac{W}{s} + T_C\right)$, differs from $P^1_{fail} + P^2_{fail}$ only because of extra checkpoint
  • Trade-off: many small chunks (more $T_C$ to pay, but small re-execution cost) vs few larger chunks (fewer $T_C$, but increased re-execution cost)
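
The coherence claim can be checked in one line of algebra. Assuming both chunks run at the same speed s, as in the formulas of this slide:

```latex
\begin{align*}
P^1_{fail} + P^2_{fail}
  &= \lambda\left(\frac{W_1}{s} + T_C\right) + \lambda\left(\frac{W_2}{s} + T_C\right) \\
  &= \lambda\left(\frac{W_1 + W_2}{s} + 2\,T_C\right) \\
  &= \lambda\left(\frac{W}{s} + T_C\right) + \lambda T_C
   \;=\; P_{fail} + \lambda T_C .
\end{align*}
```

The only difference is the term $\lambda T_C$, the probability of a failure striking during the additional checkpoint.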

SLIDE 12

Optimization problem

Decisions that should be taken before execution:

  • Chunks: how many ($n$)? which sizes ($W_i$ for chunk $i$)?
  • Speeds of each chunk: first run ($s_i$)? re-execution ($\sigma_i$)?

Input: $W$, $T_C$ (checkpointing time), $E_C$ (energy spent for checkpointing), $\lambda$ (instantaneous failure rate), $D$ (deadline)

SLIDE 15

Models

  • Chunks: single chunk of size $W$ vs. multiple chunks ($n$ and $W_i$'s)
  • Speed per chunk: single speed ($s$) vs. multiple speeds ($s$ and $\sigma$)
  • Deadline bound: hard ($T_{wc} \le D$) vs. soft ($\mathbb{E}(T) \le D$)

SLIDE 17

Single chunk and single speed

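  • Consider first that $s = \sigma$ (single speed): need to find optimal speed
  • $\mathbb{E}(E)$ is a function of $s$: $\mathbb{E}(E)(s) = (W s^2 + E_C)\left(1 + \lambda\left(\frac{W}{s} + T_C\right)\right)$
  • Lemma: this function is convex and has a unique minimum $s^\star$ (function of $\lambda$, $W$, $E_C$, $T_C$)

$$s^\star = \frac{\lambda W}{6(1+\lambda T_C)} \left( \frac{-\left(3\sqrt{3}\sqrt{27a^2-4a}-27a+2\right)^{1/3}}{2^{1/3}} - \frac{2^{1/3}}{\left(3\sqrt{3}\sqrt{27a^2-4a}-27a+2\right)^{1/3}} - 1 \right),$$

where $a = \lambda E_C \left(\frac{2(1+\lambda T_C)}{\lambda W}\right)^2$

  • $\mathbb{E}(T)$ and $T_{wc}$: decreasing functions of $s$
  • Minimum speed $s_{exp}$ and $s_{wc}$ required to match deadline $D$ (function of $D$, $W$, $T_C$, and $\lambda$ for $s_{exp}$)
  • → Optimal speed: maximum between $s^\star$ and $s_{exp}$ or $s_{wc}$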
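
The rule "optimal speed = maximum of the unconstrained optimum and the minimum feasible speed" can be sketched numerically. The code below does not use the closed form for s⋆; as an assumption of this sketch, it finds the zero of the derivative of E(E)(s), which reduces to the cubic 2(1 + λT_C)s³ + λW s² − λE_C = 0, by bisection. The minimum speeds are obtained by setting T_wc = D and E(T) = D with σ = s. All function names and sample values are hypothetical, and the sketch assumes λ > 0 and a deadline large enough to be feasible.

```python
import math


def expected_energy(s, W, T_C, E_C, lam):
    """E(E)(s) = (W s^2 + E_C) (1 + lam (W/s + T_C)), single chunk, s = sigma."""
    return (W * s ** 2 + E_C) * (1 + lam * (W / s + T_C))


def unconstrained_speed(W, T_C, E_C, lam, iters=200):
    """Positive root of 2(1 + lam T_C) s^3 + lam W s^2 - lam E_C = 0."""
    def g(s):
        return 2 * (1 + lam * T_C) * s ** 3 + lam * W * s ** 2 - lam * E_C

    lo, hi = 0.0, 1.0
    while g(hi) < 0:            # g is increasing for s > 0 and g(0) <= 0
        hi *= 2
    for _ in range(iters):      # plain bisection on [lo, hi]
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def min_speed_hard(W, T_C, D):
    """Smallest s with T_wc = 2 (W/s + T_C) <= D; requires D > 2 T_C."""
    return 2 * W / (D - 2 * T_C)


def min_speed_soft(W, T_C, lam, D):
    """Smallest s with E(T) = t (1 + lam t) <= D, where t = W/s + T_C."""
    t_max = (math.sqrt(1 + 4 * lam * D) - 1) / (2 * lam)  # root of lam t^2 + t = D
    return W / (t_max - T_C)                              # requires t_max > T_C


def optimal_speed(W, T_C, E_C, lam, D, hard):
    """max(s_star, s_wc) for a hard deadline, max(s_star, s_exp) for a soft one."""
    s_min = min_speed_hard(W, T_C, D) if hard else min_speed_soft(W, T_C, lam, D)
    return max(unconstrained_speed(W, T_C, E_C, lam), s_min)


if __name__ == "__main__":
    W, T_C, E_C, lam, D = 10.0, 1.0, 5.0, 0.01, 30.0   # hypothetical instance
    for hard in (True, False):
        s = optimal_speed(W, T_C, E_C, lam, D, hard)
        print("hard" if hard else "soft", s, expected_energy(s, W, T_C, E_C, lam))
```

Taking the maximum is justified by the convexity lemma: to the right of s⋆ the energy only increases, so when the deadline forces a speed above s⋆ the cheapest feasible choice is the minimum feasible speed itself.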

SLIDE 18

Single chunk and multiple speeds

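  • Consider now that $s \ne \sigma$ (multiple speeds): two unknowns
  • $\mathbb{E}(E)$ is a function of $s$ and $\sigma$: $\mathbb{E}(E)(s, \sigma) = (W s^2 + E_C) + \lambda\left(\frac{W}{s} + T_C\right)\left(W \sigma^2 + E_C\right)$
  • Lemma: energy minimized when deadline tight (both for wc and exp)
  • $\sigma$ expressed as a function of $s$:

$$\sigma_{exp} = \frac{\lambda W}{\dfrac{D}{\frac{W}{s} + T_C} - (1 + \lambda T_C)}, \qquad \sigma_{wc} = \frac{W}{(D - 2T_C)s - W}\, s$$

  • → Minimization of single-variable function, can be solved numerically (no expression of optimal $s$)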
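
One simple way to carry out this single-variable minimization is a grid search over the first-execution speed, with σ deduced from the tight deadline. This is only a sketch under stated assumptions: the talk reports using Maple, not this method; the cap `s_max` on admissible speeds, the grid resolution, and every function name are hypothetical.

```python
def sigma_hard(s, W, T_C, D):
    """Re-execution speed that makes T_wc = D: W s / ((D - 2 T_C) s - W)."""
    return W * s / ((D - 2 * T_C) * s - W)


def sigma_soft(s, W, T_C, lam, D):
    """Re-execution speed that makes E(T) = D."""
    return lam * W / (D / (W / s + T_C) - (1 + lam * T_C))


def expected_energy(s, sigma, W, T_C, E_C, lam):
    """E(E)(s, sigma) = (W s^2 + E_C) + lam (W/s + T_C) (W sigma^2 + E_C)."""
    return (W * s ** 2 + E_C) + lam * (W / s + T_C) * (W * sigma ** 2 + E_C)


def best_speeds(W, T_C, E_C, lam, D, hard, s_max, steps=20000):
    """Grid search over s in (s_low, s_max]; returns (energy, s, sigma) or None."""
    if hard:
        s_low = W / (D - 2 * T_C)                 # sigma_hard is positive only above this
    else:
        s_low = W / (D / (1 + lam * T_C) - T_C)   # sigma_soft is positive only above this
    if s_max <= s_low:
        return None                               # deadline cannot be met within s_max
    best = None
    for k in range(1, steps + 1):
        s = s_low + (s_max - s_low) * k / steps
        sigma = sigma_hard(s, W, T_C, D) if hard else sigma_soft(s, W, T_C, lam, D)
        if sigma <= 0 or sigma > s_max:           # re-execution speed not admissible
            continue
        energy = expected_energy(s, sigma, W, T_C, E_C, lam)
        if best is None or energy < best[0]:
            best = (energy, s, sigma)
    return best


if __name__ == "__main__":
    # Hypothetical instance.
    print(best_speeds(10.0, 1.0, 5.0, 0.01, 30.0, hard=True, s_max=5.0))
    print(best_speeds(10.0, 1.0, 5.0, 0.01, 30.0, hard=False, s_max=5.0))
```

A grid search makes no assumption about the shape of the objective; a root finder or golden-section search would be faster if the function is known to be unimodal on the feasible interval.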

SLIDE 20

General problem with multiple chunks

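  • Divisible task of size $W$
  • Split into $n$ chunks of size $W_i$: $\sum_{i=1}^{n} W_i = W$
  • Chunk $i$ is executed once at speed $s_i$, and re-executed (if necessary) at speed $\sigma_i$
  • Unknowns: $n$, $W_i$, $s_i$, $\sigma_i$

$$\mathbb{E}(E) = \sum_{i=1}^{n} \left(W_i s_i^2 + E_C\right) + \lambda \sum_{i=1}^{n} \left(\frac{W_i}{s_i} + T_C\right)\left(W_i \sigma_i^2 + E_C\right)$$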
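
The general objective is a plain sum over chunks, so it is straightforward to evaluate for any candidate split. A minimal sketch; the function name, argument layout, and sample values are illustrative assumptions:

```python
from typing import Sequence


def expected_energy_chunks(sizes: Sequence[float], speeds: Sequence[float],
                           respeeds: Sequence[float],
                           T_C: float, E_C: float, lam: float) -> float:
    """Sum over chunks of (W_i s_i^2 + E_C) + lam (W_i/s_i + T_C) (W_i sigma_i^2 + E_C)."""
    if not (len(sizes) == len(speeds) == len(respeeds)):
        raise ValueError("sizes, speeds and respeeds must have the same length")
    total = 0.0
    for w, s, sigma in zip(sizes, speeds, respeeds):
        first_run = w * s ** 2 + E_C             # energy of the first execution
        p_fail = lam * (w / s + T_C)             # failure probability of chunk i
        total += first_run + p_fail * (w * sigma ** 2 + E_C)
    return total


if __name__ == "__main__":
    # Hypothetical instance: the same work as one chunk or as two equal chunks.
    print(expected_energy_chunks([10.0], [1.0], [2.0], 1.0, 5.0, 0.01))
    print(expected_energy_chunks([5.0, 5.0], [1.0, 1.0], [2.0, 2.0], 1.0, 5.0, 0.01))
```

With a large checkpoint energy relative to the failure rate, as in this toy instance, splitting costs more than it saves; the balance shifts as λ grows.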

SLIDE 21

Multiple chunks and single speed

  • With a single speed, $\sigma_i = s_i$ for each chunk
  • Theorem: in optimal solution, $n$ equal-sized chunks ($W_i = \frac{W}{n}$), executed at same speed $s_i = s$
      • Proof by contradiction: consider two chunks $W_1$ and $W_2$ executed at speeds $s_1$ and $s_2$, with either $s_1 \ne s_2$, or $s_1 = s_2$ and $W_1 \ne W_2$
      • ⇒ Strictly better solution with two chunks of size $w = (W_1 + W_2)/2$ and same speed $s$
  • Only two unknowns, $s$ and $n$
  • Minimum speed with $n$ chunks:

$$s^\star_{exp}(n) = W\, \frac{1 + 2\lambda T_C + \sqrt{\frac{4\lambda D}{n} + 1}}{2\left(D - n T_C (1 + \lambda T_C)\right)}$$

  • → Minimization of double-variable function, can be solved numerically both for expected and hard deadline
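
For the expected-deadline case, the two-variable minimization can be sketched as a loop over n with the best speed computed for each n. This is an assumption-laden sketch, not the authors' Maple procedure: it assumes the per-chunk energy is convex in s (the single-chunk lemma applied to a chunk of size W/n), so the best speed for a given n is the maximum of the zero-derivative speed and s⋆_exp(n). The bound `n_max` and all function names are hypothetical, and λ > 0 is assumed.

```python
import math


def min_speed_soft(n, W, T_C, lam, D):
    """s*_exp(n) from the slide; math.inf if the n checkpoints alone exceed D."""
    slack = D - n * T_C * (1 + lam * T_C)
    if slack <= 0:
        return math.inf
    return W * (1 + 2 * lam * T_C + math.sqrt(4 * lam * D / n + 1)) / (2 * slack)


def stationary_speed(n, W, T_C, E_C, lam, iters=200):
    """Root of 2(1 + lam T_C) s^3 + lam (W/n) s^2 - lam E_C = 0, by bisection."""
    def g(s):
        return 2 * (1 + lam * T_C) * s ** 3 + lam * (W / n) * s ** 2 - lam * E_C

    lo, hi = 0.0, 1.0
    while g(hi) < 0:
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def energy(n, s, W, T_C, E_C, lam):
    """Expected energy of n equal chunks of size W/n, all run at speed s."""
    w = W / n
    return n * (w * s ** 2 + E_C) * (1 + lam * (w / s + T_C))


def best_n_and_speed(W, T_C, E_C, lam, D, n_max=100):
    """Scan n = 1..n_max; returns (energy, n, s) or None if no n is feasible."""
    best = None
    for n in range(1, n_max + 1):
        s = max(stationary_speed(n, W, T_C, E_C, lam),
                min_speed_soft(n, W, T_C, lam, D))
        if math.isinf(s):
            continue                      # too many checkpoints for this deadline
        e = energy(n, s, W, T_C, E_C, lam)
        if best is None or e < best[0]:
            best = (e, n, s)
    return best


if __name__ == "__main__":
    print(best_n_and_speed(10.0, 1.0, 5.0, 0.01, 30.0))   # hypothetical instance
```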

SLIDE 22

Multiple chunks and multiple speeds

  • Need to find $n$, $W_i$, $s_i$, $\sigma_i$
  • With expected deadline:
      • All re-execution speeds are equal ($\sigma_i = \sigma$) and tight deadline
      • All chunks have same size and are executed at same speed
  • With hard deadline:
      • If $s_i = s$ and $\sigma_i = \sigma$, then all $W_i$'s are equal
      • Conjecture: equal-sized chunks, same first-execution / re-execution speeds
  • $\sigma$ as a function of $s$, bound on $s$ given $n$
  • → Minimization of double-variable function, can be solved numerically

SLIDE 24

Simulation settings

  • Large set of simulations: illustrate differences between models
  • Maple software to solve problems
  • We plot relative energy consumption as a function of $\lambda$
      • The lower the better
      • Given a deadline constraint (hard or expected), normalize with the result of single-chunk single-speed
      • Impact of the constraint: normalize expected deadline with hard deadline
  • Parameters varying within large ranges

SLIDE 25

Comparison with single-chunk single-speed

[Figure: energy E relative to single-chunk single-speed (/SCSS), from 0.25 to 1.00, as a function of lambda (log scale, 1e−06 to 1e+00); models: SCMSed, SCMShd, MCSSed, MCSShd, MCMSed, MCMShd]

  • Results identical for any value of $W/D$
  • For expected deadline, with small $\lambda$ ($< 10^{-2}$), using multiple chunks or multiple speeds does not improve the energy ratio: re-execution term negligible; increasing $\lambda$: improvement with multiple chunks
  • For hard deadline, better to run at high speed during second execution: use multiple speeds; use multiple chunks if frequent failures

SLIDE 26

Expected vs hard deadline constraint

[Figure: energy E, from 0.25 to 1.00, as a function of lambda (log scale, 1e−06 to 1e+00); models: SCSS, SCMS, MCSS, MCMS]

  • Important differences for single speed models, confirming previous conclusions: with hard deadline, use multiple speeds
  • Multiple speeds: no difference for small $\lambda$: re-execution at maximum speed has little impact on expected energy consumption; increasing $\lambda$: more impact of re-execution, and expected deadline may use slower re-execution speed, hence reducing energy consumption

SLIDE 28

Conclusion

  • Energy consumption of a divisible load workload on volatile platforms
  • Soft or hard deadline constraint
  • Theoretical side:
      • Formal models for the problem
      • Expression of solutions as functions to minimize
      • With multiple chunks, use same-size chunks, same speed, and same re-execution speed (conjecture for multiple-speed hard-deadline)
  • Simulations:
      • Single-chunk single-speed is very good for expected deadline
      • Hard deadline and small $\lambda$: use multiple speeds
      • Large values of $\lambda$: use multiple speeds and multiple chunks

SLIDE 29

What we had: What we aim at: Energy-aware checkpointing + frequency scaling
