
Energy-aware checkpointing of divisible tasks with soft or hard deadlines



  1. Energy-aware checkpointing of divisible tasks with soft or hard deadlines. Guillaume Aupy (1), Anne Benoit (1, 2), Rami Melhem (3), Paul Renaud-Goud (1) and Yves Robert (1, 2, 4). Affiliations: (1) École Normale Supérieure de Lyon, France; (2) Institut Universitaire de France; (3) University of Pittsburgh, USA; (4) University of Tennessee Knoxville, USA. Contact: Anne.Benoit@ens-lyon.fr, http://graal.ens-lyon.fr/~abenoit/ International Green Computing Conference 2013 (IGCC'2013), Arlington, USA. Sections: Introduction, Framework, Single chunk, Multiple chunks, Simulations, Conclusion. Slide 1/25.

  2. Divisible load scheduling and resilience. Divisible load scheduling: divide a computational workload into chunks; the number of chunks is arbitrary and the size of each chunk is freely chosen by the user. Goal: minimize the makespan, i.e., the total execution time. Current platforms: increasing frequency of failures. Well-established method to deal with failures: checkpointing. Take a checkpoint at the end of each chunk and verify the result; re-execute the chunk in case of a transient failure.


  4. Energy: a crucial issue. IGCC: Green Computing Conference! Real need to reduce energy dissipation in current processors. A processor running at speed $s$ dissipates a power of $s^3$ watts. Dynamic voltage and frequency scaling (DVFS) techniques. Our goal: minimize energy consumption, including that of checkpointing and re-execution (if failure), while enforcing a bound on execution time.


  6. Outline: 1. Framework; 2. With a single chunk; 3. With several chunks; 4. Simulation results; 5. Conclusion.

  7. Framework. Execution of a divisible task ($W$ operations). Failures may occur: transient faults. Resilience through checkpointing. Objective: minimize the expected energy consumption $\mathbb{E}(E)$, given a deadline bound $D$. Probabilistic nature of failure hits: the expectation of the energy consumption is a natural objective (average cost over many executions). Deadline bound: two relevant scenarios (soft or hard deadline).

  8. Soft vs hard deadline. Soft deadline: met in expectation, i.e., $\mathbb{E}(T) \le D$ (average response time). Hard deadline: met in the worst case, i.e., $T_{wc} \le D$. [Figure: soft (expected) vs hard (worst-case) deadline]

  9. Execution time, one single chunk. One single chunk of size $W$. Checkpoint overhead: execution time $T_C$. Instantaneous failure rate: $\lambda$. First execution at speed $s$: $T_{exec} = \frac{W}{s} + T_C$. Failure probability: $P_{fail} = \lambda T_{exec} = \lambda\left(\frac{W}{s} + T_C\right)$. In case of failure: re-execute at speed $\sigma$: $T_{reexec} = \frac{W}{\sigma} + T_C$, and we assume success after re-execution. Expected time: $\mathbb{E}(T) = T_{exec} + P_{fail}\, T_{reexec} = \left(\frac{W}{s} + T_C\right) + \lambda\left(\frac{W}{s} + T_C\right)\left(\frac{W}{\sigma} + T_C\right)$. Worst-case time: $T_{wc} = T_{exec} + T_{reexec} = \left(\frac{W}{s} + T_C\right) + \left(\frac{W}{\sigma} + T_C\right)$.
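The two time formulas for a single chunk can be evaluated directly. The sketch below is only an illustration: the function names and the numeric values are assumptions, not taken from the slides.

```python
def expected_time(W, s, sigma, T_C, lam):
    """E(T) = T_exec + P_fail * T_reexec for one chunk of size W."""
    t_exec = W / s + T_C          # first execution plus checkpoint
    t_reexec = W / sigma + T_C    # re-execution plus checkpoint
    p_fail = lam * t_exec         # failure probability, as modelled on the slide
    return t_exec + p_fail * t_reexec


def worst_case_time(W, s, sigma, T_C):
    """T_wc = T_exec + T_reexec: the first execution always fails."""
    return (W / s + T_C) + (W / sigma + T_C)


# Illustrative (hypothetical) values: W = 100, s = 2, sigma = 4, T_C = 5, lambda = 0.001
print(expected_time(100, 2, 4, 5, 0.001))
print(worst_case_time(100, 2, 4, 5))
```

The expected time stays close to the failure-free time as long as $\lambda T_{exec}$ is small, whereas the worst case always pays for the full re-execution.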

  10. Energy consumption, one single chunk. One single chunk of size $W$. Checkpoint overhead: energy consumption $E_C$. First execution at speed $s$: $\frac{W}{s} \times s^3 + E_C = W s^2 + E_C$. Re-execution at speed $\sigma$: $W \sigma^2 + E_C$, with probability $P_{fail}$, where $P_{fail} = \lambda T_{exec} = \lambda\left(\frac{W}{s} + T_C\right)$. Expected energy: $\mathbb{E}(E) = (W s^2 + E_C) + \lambda\left(\frac{W}{s} + T_C\right)\left(W \sigma^2 + E_C\right)$.
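As a minimal sketch of the expected-energy formula for one chunk (the function name and the sample values are assumptions made for illustration):

```python
def expected_energy(W, s, sigma, E_C, T_C, lam):
    """E(E) = (W s^2 + E_C) + P_fail * (W sigma^2 + E_C) for one chunk."""
    first_run = W * s**2 + E_C      # time W/s at power s^3, plus checkpoint energy
    rerun = W * sigma**2 + E_C      # same reasoning at speed sigma
    p_fail = lam * (W / s + T_C)    # failure probability of the first execution
    return first_run + p_fail * rerun


# Illustrative (hypothetical) values
print(expected_energy(W=100, s=2, sigma=4, E_C=10, T_C=5, lam=0.001))
```

Note that slowing down the first execution lowers $W s^2$ but raises $P_{fail}$, because the chunk is exposed to failures for longer.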

  11. Multiple chunks. Execution time: sum of the execution times of each chunk (worst-case or expected). Expected energy consumption: sum of the expected energy of each chunk. Coherent failure model: consider two chunks with $W_1 + W_2 = W$. Probability of failure for the first chunk: $P^1_{fail} = \lambda\left(\frac{W_1}{s} + T_C\right)$. For the second chunk: $P^2_{fail} = \lambda\left(\frac{W_2}{s} + T_C\right)$. With a single chunk of size $W$: $P_{fail} = \lambda\left(\frac{W}{s} + T_C\right)$, which differs from $P^1_{fail} + P^2_{fail}$ only because of the extra checkpoint (an additional $\lambda T_C$). Trade-off: many small chunks (more $T_C$ to pay, but small re-execution cost) vs few larger chunks (fewer $T_C$, but increased re-execution cost).
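The coherence of the failure model and the chunking trade-off can be checked numerically. The sketch below rests on two simplifying assumptions that this slide does not make: all chunks have the same size $W/n$, and each chunk is re-executed at the same speed as its first run ($\sigma = s$). Function names and values are hypothetical.

```python
def fail_prob(w, s, T_C, lam):
    """Failure probability of one chunk of size w run at speed s."""
    return lam * (w / s + T_C)


def expected_energy_equal_chunks(W, n, s, E_C, T_C, lam):
    """Sum of per-chunk expected energies for n equal chunks (assumption),
    each re-executed at the same speed s (assumption)."""
    w = W / n
    per_chunk = (w * s**2 + E_C) * (1 + fail_prob(w, s, T_C, lam))
    return n * per_chunk


lam, T_C, s = 0.001, 5, 2
# Splitting W = 100 into 60 + 40 adds exactly one checkpoint's worth of exposure.
extra = fail_prob(60, s, T_C, lam) + fail_prob(40, s, T_C, lam) - fail_prob(100, s, T_C, lam)
print(extra, lam * T_C)

# Expected energy for a few chunk counts, to observe the trade-off.
for n in (1, 2, 4, 8):
    print(n, expected_energy_equal_chunks(100, n, s, E_C=10, T_C=T_C, lam=lam))
```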

  12. Optimization problem. Decisions that should be taken before execution. Chunks: how many ($n$)? Which sizes ($W_i$ for chunk $i$)? Speeds of each chunk: first run ($s_i$)? Re-execution ($\sigma_i$)? Input: $W$, $T_C$ (checkpointing time), $E_C$ (energy spent for checkpointing), $\lambda$ (instantaneous failure rate), $D$ (deadline).
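One possible way to represent the inputs and the decisions in code is sketched below. The class and field names are hypothetical; the per-chunk formulas are the single-chunk expressions, summed over the chunks as the multiple-chunk model prescribes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    W: float      # total number of operations
    T_C: float    # checkpointing time
    E_C: float    # checkpointing energy
    lam: float    # instantaneous failure rate
    D: float      # deadline


@dataclass(frozen=True)
class Chunk:
    size: float   # W_i
    s: float      # speed of the first run
    sigma: float  # speed of the re-execution


def evaluate(inst, chunks):
    """Return expected time, worst-case time and expected energy of a schedule."""
    if not math.isclose(sum(c.size for c in chunks), inst.W):
        raise ValueError("chunk sizes must sum to W")
    exp_time = wc_time = exp_energy = 0.0
    for c in chunks:
        t_exec = c.size / c.s + inst.T_C
        t_reexec = c.size / c.sigma + inst.T_C
        p_fail = inst.lam * t_exec
        exp_time += t_exec + p_fail * t_reexec
        wc_time += t_exec + t_reexec
        exp_energy += (c.size * c.s**2 + inst.E_C) + p_fail * (c.size * c.sigma**2 + inst.E_C)
    return {
        "expected_time": exp_time,
        "worst_case_time": wc_time,
        "expected_energy": exp_energy,
        "meets_soft": exp_time <= inst.D,   # E(T) <= D
        "meets_hard": wc_time <= inst.D,    # T_wc <= D
    }


# Hypothetical instance: one chunk, deadline met in expectation but not in the worst case.
inst = Instance(W=100, T_C=5, E_C=10, lam=0.001, D=60)
print(evaluate(inst, [Chunk(size=100, s=2, sigma=4)]))
```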


  15. Models. Three dimensions, each with two options. Chunks: single chunk of size $W$ vs multiple chunks ($n$ and the $W_i$'s). Speed per chunk: single speed ($s$) vs multiple speeds ($s$ and $\sigma$). Deadline bound: soft ($\mathbb{E}(T) \le D$) vs hard ($T_{wc} \le D$).


  17. Single chunk and single speed. Consider first that $s = \sigma$ (single speed): we need to find the optimal speed. $\mathbb{E}(E)$ is a function of $s$: $\mathbb{E}(E)(s) = (W s^2 + E_C)\left(1 + \lambda\left(\frac{W}{s} + T_C\right)\right)$. Lemma: this function is convex and has a unique minimum $s^\star$ (a function of $\lambda$, $W$, $E_C$, $T_C$). Setting the derivative to zero gives the cubic $2(1+\lambda T_C)\, s^3 + \lambda W s^2 - \lambda E_C = 0$, whose positive root is $s^\star = \frac{\lambda W}{6(1+\lambda T_C)} \left( \frac{-\left(3\sqrt{3}\sqrt{27a^2 - 4a} - 27a + 2\right)^{1/3}}{2^{1/3}} - \frac{2^{1/3}}{\left(3\sqrt{3}\sqrt{27a^2 - 4a} - 27a + 2\right)^{1/3}} - 1 \right)$, where $a = \lambda E_C \frac{\left(2(1+\lambda T_C)\right)^2}{(\lambda W)^3}$. $\mathbb{E}(T)$ and $T_{wc}$ are decreasing functions of $s$, so there is a minimum speed $s_{exp}$ (soft deadline) or $s_{wc}$ (hard deadline) required to match the deadline $D$ ($s_{exp}$ is a function of $D$, $W$, $T_C$ and $\lambda$). → Optimal speed: the maximum of $s^\star$ and $s_{exp}$ (soft deadline) or $s_{wc}$ (hard deadline).
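A minimal numerical sketch of the single-speed rule follows. Instead of the closed form it finds $s^\star$ by bisection on the first-order condition $2(1+\lambda T_C)s^3 + \lambda W s^2 - \lambda E_C = 0$, which is obtained by differentiating $\mathbb{E}(E)(s)$. The expressions used for $s_{wc}$ and $s_{exp}$ are derived here from $T_{wc}$ and $\mathbb{E}(T)$ with $\sigma = s$ and are not quoted from the slides; all function names are hypothetical.

```python
import math


def unconstrained_speed(W, E_C, T_C, lam):
    """Positive root of 2(1 + lam*T_C) s^3 + lam*W s^2 - lam*E_C = 0 (bisection)."""
    A = 1 + lam * T_C

    def g(s):
        return 2 * A * s**3 + lam * W * s**2 - lam * E_C

    lo, hi = 0.0, 1.0
    while g(hi) < 0:          # g is increasing for s > 0, so grow hi until g(hi) >= 0
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def min_speed_hard(W, T_C, D):
    """Smallest s with T_wc = 2 (W/s + T_C) <= D (derived, assumes sigma = s)."""
    if D <= 2 * T_C:
        raise ValueError("hard deadline cannot be met")
    return 2 * W / (D - 2 * T_C)


def min_speed_soft(W, T_C, lam, D):
    """Smallest s with E(T) = x (1 + lam x) <= D, where x = W/s + T_C (derived)."""
    x_max = (-1 + math.sqrt(1 + 4 * lam * D)) / (2 * lam)
    if x_max <= T_C:
        raise ValueError("soft deadline cannot be met")
    return W / (x_max - T_C)


def optimal_single_speed(W, E_C, T_C, lam, D, hard):
    """Maximum of the energy-optimal speed and the deadline-imposed speed."""
    s_min = min_speed_hard(W, T_C, D) if hard else min_speed_soft(W, T_C, lam, D)
    return max(unconstrained_speed(W, E_C, T_C, lam), s_min)


# Hypothetical instance: W = 100, E_C = 200, T_C = 0, lambda = 0.02
print(optimal_single_speed(100, 200, 0, 0.02, D=50, hard=True))
print(optimal_single_speed(100, 200, 0, 0.02, D=50, hard=False))
print(optimal_single_speed(100, 200, 0, 0.02, D=1000, hard=True))
```

When the deadline is loose, the energy-optimal speed wins; when it is tight, the deadline-imposed speed does.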

  18. Single chunk and multiple speeds. Consider now that $s \neq \sigma$ (multiple speeds): two unknowns. $\mathbb{E}(E)$ is a function of $s$ and $\sigma$: $\mathbb{E}(E)(s, \sigma) = (W s^2 + E_C) + \lambda\left(\frac{W}{s} + T_C\right)\left(W \sigma^2 + E_C\right)$. Lemma: energy is minimized when the deadline is tight (both for the worst-case and the expected bound). Hence $\sigma$ can be expressed as a function of $s$: $\sigma_{exp} = \frac{\lambda W}{\frac{D}{\frac{W}{s} + T_C} - (1 + \lambda T_C)}$ and $\sigma_{wc} = \frac{W}{(D - 2 T_C)\, s - W}\, s$. → Minimization of a single-variable function, which can be solved numerically (no closed-form expression of the optimal $s$).
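The single-variable minimization can be sketched with a crude grid search. Following the lemma, $\sigma$ is substituted by $\sigma_{wc}(s)$ or $\sigma_{exp}(s)$ so that the deadline is tight. The log-spaced grid, the search range (`span` times the smallest feasible speed) and the function names are arbitrary assumptions made for illustration; the slides only state that the problem is solved numerically.

```python
def sigma_soft(s, W, T_C, lam, D):
    """Re-execution speed that makes E(T) = D for a first-run speed s."""
    return lam * W / (D / (W / s + T_C) - (1 + lam * T_C))


def sigma_hard(s, W, T_C, D):
    """Re-execution speed that makes T_wc = D for a first-run speed s."""
    return W * s / ((D - 2 * T_C) * s - W)


def expected_energy(s, sigma, W, E_C, T_C, lam):
    return (W * s**2 + E_C) + lam * (W / s + T_C) * (W * sigma**2 + E_C)


def best_speeds(W, E_C, T_C, lam, D, hard, span=10.0, steps=2000):
    """Grid search over s; returns (energy, s, sigma) with the deadline tight."""
    # Below s_lo no finite positive sigma can meet the deadline.
    budget = (D - 2 * T_C) if hard else (D / (1 + lam * T_C) - T_C)
    if budget <= 0:
        raise ValueError("deadline cannot be met")
    s_lo = W / budget
    best = None
    for k in range(1, steps + 1):
        s = s_lo * span ** (k / steps)   # log-spaced speeds in (s_lo, span * s_lo]
        sigma = sigma_hard(s, W, T_C, D) if hard else sigma_soft(s, W, T_C, lam, D)
        energy = expected_energy(s, sigma, W, E_C, T_C, lam)
        if best is None or energy < best[0]:
            best = (energy, s, sigma)
    return best


# Hypothetical instance: W = 100, E_C = 200, T_C = 0, lambda = 0.02, D = 50
print(best_speeds(100, 200, 0, 0.02, 50, hard=True))
print(best_speeds(100, 200, 0, 0.02, 50, hard=False))
```

A grid search makes no assumption about the shape of the function; a bracketing method such as golden-section search would be faster, but only under the additional assumption that the function is unimodal on the feasible range.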
