Rebound: Scalable Checkpointing for Coherent Shared Memory

Rishi Agarwal, Pranav Garg, and Josep Torrellas
Department of Computer Science, University of Illinois at Urbana-Champaign


  1. Rebound: Scalable Checkpointing for Coherent Shared Memory
     Rishi Agarwal, Pranav Garg, and Josep Torrellas
     Department of Computer Science, University of Illinois at Urbana-Champaign
     http://iacoma.cs.uiuc.edu

  2. Checkpointing in Shared-Memory MPs
     [Figure: timeline with labels "save chkpt", "Fault", and "rollback"]
     • HW-based schemes for small CMPs use global checkpointing
       – All processors participate in system-wide checkpoints
     [Figure: P1 to P4 taking a checkpoint together]
     • Global checkpointing is not scalable
       – Synchronization, bursty movement of data, loss in rollback…
     R. Agarwal, P. Garg, J. Torrellas. Rebound: Scalable Checkpointing

  3. Alternative: Coordinated Local Checkpointing
     • Idea: threads coordinate their checkpointing in groups
     • Rationale:
       – Faults propagate only through communication
       – Interleaving between non-communicating threads is irrelevant
     [Figure: P1 to P5 with a global checkpoint and local checkpoints]
     + Scalable: checkpoint and rollback in processor groups
     - Complexity: record inter-thread dependences dynamically

  4. Contributions
     Rebound: first HW-based scheme for scalable, coordinated local checkpointing in coherent shared memory
     • Leverages the directory protocol to track inter-thread dependences
     • Optimizations to boost checkpointing efficiency:
       – Delaying write-back of data to safe memory at checkpoints
       – Supporting multiple checkpoints
       – Optimizing checkpointing at barrier synchronization
     • Avg. performance overhead for 64 processors: 2%
       – Compared to 15% for global checkpointing

  5. Background: In-Memory Checkpointing with ReVive [Prvulovic-02]
     [Figure: checkpointing with P1 to P3, their caches, memory, and the log. Labels: execution, register dump, checkpoint (CHK), dirty cache lines, displacement writebacks (WB), checkpoint writeback, logging, application stalls]

  6. Background: In-Memory Checkpointing with ReVive [Prvulovic-02]
     [Figure: rollback with P1 to P3, their caches, memory, and the log. Labels: fault, old registers restored, caches invalidated, memory lines reverted, checkpoint (CHK)]
     [Labels below the figure: Global, Local Coordinated, Scalable protocol, Broadcast protocol]
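The logging and rollback steps on slides 5 and 6 can be sketched as an undo log. This is only a toy model under stated assumptions, not ReVive's hardware design: the class `LoggedMemory`, its method names, and the dict-based memory are all hypothetical and chosen for illustration.

```python
_ABSENT = object()  # sentinel: the line had no value in memory before the write


class LoggedMemory:
    """Toy undo-log model of in-memory checkpointing (hypothetical names)."""

    def __init__(self):
        self.memory = {}      # address -> value held in safe memory
        self.log = []         # (address, old value) pairs since the last checkpoint
        self.logged = set()   # addresses already logged in this interval

    def write_back(self, addr, value):
        # Save the old value the first time a line is overwritten in this interval.
        if addr not in self.logged:
            self.log.append((addr, self.memory.get(addr, _ABSENT)))
            self.logged.add(addr)
        self.memory[addr] = value

    def checkpoint(self, dirty_lines):
        # Write back every dirty cache line, then start a new interval with an empty log.
        for addr, value in dirty_lines.items():
            self.write_back(addr, value)
        self.log.clear()
        self.logged.clear()

    def rollback(self):
        # Revert memory lines in reverse order using the logged old values.
        for addr, old in reversed(self.log):
            if old is _ABSENT:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.log.clear()
        self.logged.clear()


mem = LoggedMemory()
mem.checkpoint({0x10: 1})   # checkpoint with line 0x10 holding 1
mem.write_back(0x10, 2)     # displacement writebacks after the checkpoint
mem.write_back(0x20, 7)
mem.rollback()              # a fault is detected: revert to the checkpoint
```

After the rollback, memory holds only the state that the checkpoint established. Register restoration and cache invalidation, which the slide also lists, are left out of this sketch.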

  7. Coordinated Local Checkpointing Rules
     [Figure: P1 (producer) writes x and P2 (consumer) reads x; checkpoint and rollback cases]
     • P checkpoints → P's producers checkpoint
     • P rolls back → P's consumers roll back
     • Banatre et al. used coordinated local checkpointing for bus-based machines [Banatre96]

  8. Rebound Fault Model
     [Figure: chip multiprocessor, main memory, and log (in SW)]
     • Any part of the chip can suffer transient or permanent faults
     • A fault can occur even during checkpointing
     • Off-chip memory and logs suffer no fault on their own (e.g. NVM)
     • Fault detection is outside our scope:
       – Fault detection latency has an upper bound of L cycles

  9. Rebound Architecture
     [Figure: chip multiprocessor and main memory. Labels: P+L1, L2, Dep registers (MyProducer, MyConsumer), directory cache, LW-ID]

  10. Rebound Architecture
     [Figure: chip multiprocessor and main memory. Labels: P+L1, L2, Dep registers (MyProducer, MyConsumer), directory cache, LW-ID]
     • Dependence (Dep) registers in the L2 cache controller:
       – MyProducers: bitmap of processors that produced data consumed by the local processor
       – MyConsumers: bitmap of processors that consumed data produced by the local processor

  11. Rebound Architecture
     [Figure: chip multiprocessor and main memory. Labels: P+L1, L2, Dep registers (MyProducer, MyConsumer), directory cache, LW-ID]
     • Dependence (Dep) registers in the L2 cache controller:
       – MyProducers: bitmap of processors that produced data consumed by the local processor
       – MyConsumers: bitmap of processors that consumed data produced by the local processor
     • Processor ID in each directory entry:
       – LW-ID: last writer to the line in the current checkpoint interval

  12. Recording Inter-Thread Dependences
     [Figure: P1 writes. Labels: MyProducers and MyConsumers registers of P1 and P2, Write, LW-ID = P1, line state D, Log, Memory]
     Assume MESI protocol

  13. Recording Inter-Thread Dependences
     [Figure: P2 reads. Labels: MyConsumers ← P2 (at P1), MyProducers ← P1 (at P2), LW-ID = P1, line states S and D, write back, logging, Log, Memory]
     Assume MESI protocol

  14. Recording Inter-Thread Dependences
     [Figure: P1 writes. Labels: MyProducers = P1 (at P2), MyConsumers = P2 (at P1), LW-ID = P1, line states S and D, Log, Memory]
     Assume MESI protocol

  15. Recording Inter-Thread Dependences
     [Figure: P1 checkpoints. Labels: clear Dep registers, writebacks, logging, clear LW-ID, line states S and D, Log, Memory]
     • LW-ID should remain set till the line is checkpointed
     Assume MESI protocol
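The sequence on slides 12 to 15 can be sketched in software as follows. This is a simplified model, not the hardware described in the talk: the names `DepRegisters` and `DependenceTracker` are hypothetical, processors are indexed from 0, only read-after-write dependences are modeled, and a checkpoint is treated as atomic, so LW-ID is cleared at once rather than staying set until each line is written back.

```python
from dataclasses import dataclass


@dataclass
class DepRegisters:
    my_producers: int = 0  # bitmap: bit i set => processor i produced data we consumed
    my_consumers: int = 0  # bitmap: bit i set => processor i consumed data we produced


class DependenceTracker:
    """Toy model of Dep registers plus per-line LW-ID (hypothetical names)."""

    def __init__(self, num_procs):
        self.dep = [DepRegisters() for _ in range(num_procs)]
        self.lw_id = {}  # line address -> last writer in the current interval

    def write(self, proc, addr):
        # The directory entry remembers the last writer of the line.
        self.lw_id[addr] = proc

    def read(self, proc, addr):
        # Reading a line written by another processor records a dependence on both sides.
        writer = self.lw_id.get(addr)
        if writer is not None and writer != proc:
            self.dep[writer].my_consumers |= 1 << proc
            self.dep[proc].my_producers |= 1 << writer

    def checkpoint(self, proc):
        # Clear the processor's Dep registers and the LW-ID of lines it wrote.
        self.dep[proc] = DepRegisters()
        self.lw_id = {a: w for a, w in self.lw_id.items() if w != proc}


tracker = DependenceTracker(num_procs=2)  # index 0 plays P1, index 1 plays P2
tracker.write(0, 0x40)                    # P1 writes the line
tracker.read(1, 0x40)                     # P2 reads it
```

At this point P1's MyConsumers bitmap has P2's bit set and P2's MyProducers bitmap has P1's bit set, matching slide 13. Calling `tracker.checkpoint(0)` then clears P1's registers and the line's LW-ID, as on slide 15.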

  16. Distributed Checkpointing Protocol in SW
     • Interaction Set [Pi]: set of producer processors (transitively) for Pi
       – Built using MyProducers
     [Figure: P1 to P4; P1 initiates a checkpoint. InteractionSet: P1]

  17. Distributed Checkpointing Protocol in SW
     • Interaction Set [Pi]: set of producer processors (transitively) for Pi
       – Built using MyProducers
     [Figure: P1 to P4; P1 initiates a checkpoint, with "Ck?" messages to P2 and P3. InteractionSet: P1, P2, P3]

  18. Distributed Checkpointing Protocol in SW
     • Interaction Set [Pi]: set of producer processors (transitively) for Pi
       – Built using MyProducers
     [Figure: P1 to P4; P1 initiates a checkpoint, with "Ck?" messages to P2, P3, and P4. InteractionSet: P1, P2, P3]

  19. Distributed Checkpointing Protocol in SW
     • Interaction Set [Pi]: set of producer processors (transitively) for Pi
       – Built using MyProducers
     [Figure: P1 to P4; P1 initiates a checkpoint, with "Ck?" messages to P2, P3, and P4. InteractionSet: P1, P2, P3]

  20. Distributed Checkpointing Protocol in SW
     • Interaction Set [Pi]: set of producer processors (transitively) for Pi
       – Built using MyProducers
     [Figure: P1 to P4; P1 initiates a checkpoint, with "Ck?" messages to P2, P3, and P4. InteractionSet: P1, P2, P3]
     • Rollback handled similarly using MyConsumers
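The interaction set on slides 16 to 20 is a transitive closure over MyProducers. The sketch below computes it centrally with a worklist, whereas the talk describes a distributed protocol built on "Ck?" messages, so treat it as an illustration of the set being built rather than of the protocol itself. The function name and the dependence maps are hypothetical; the maps were chosen so that the result matches the InteractionSet shown on the slides.

```python
def interaction_set(initiator, deps):
    """Return the initiator plus every processor reachable through `deps`.

    `deps` maps a processor to the processors it depends on:
    MyProducers for checkpointing, MyConsumers for rollback.
    """
    members = {initiator}
    worklist = [initiator]
    while worklist:
        proc = worklist.pop()
        for other in deps.get(proc, ()):
            if other not in members:
                members.add(other)
                worklist.append(other)
    return members


# Hypothetical dependences: P1 consumed data from P2 and P3; P4 consumed data from P3.
my_producers = {"P1": {"P2", "P3"}, "P4": {"P3"}}
my_consumers = {"P2": {"P1"}, "P3": {"P1", "P4"}}

checkpoint_group = interaction_set("P1", my_producers)  # P1 initiates a checkpoint
rollback_group = interaction_set("P3", my_consumers)    # P3 initiates a rollback
```

Here `checkpoint_group` contains P1, P2, and P3 but not P4, because P4 produced nothing that P1 consumed, directly or transitively. The rollback group for P3 follows MyConsumers instead and contains P3, P1, and P4.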

  21. Optimization 1: Delayed Writebacks
     [Figure: timelines for intervals I1 and I2. Labels: time, sync, stall, checkpoint, WB dirty lines]
     • Checkpointing overhead is dominated by data writebacks
     • Delayed Writeback optimization:
       – Processors synchronize and resume execution
       – Hardware automatically writes back dirty lines in the background
       – Checkpoint is only completed when all delayed data is written back
       – Still need to record inter-thread dependences on delayed data

  22. Delayed Writeback Pros/Cons
     + Significant reduction in checkpoint overhead
     - Additional support:
       – Each processor has two sets of Dep registers
       – Each cache line has a delayed bit
     - Increased vulnerability:
       – A rollback event forces both intervals to roll back

  23. Optimization 2: Multiple Checkpoints
     • Problem: fault detection is not instantaneous
       – A checkpoint is safe only after the max fault-detection latency (L)
     [Figure: timeline. Labels: Ckpt 1, Dep registers 1, Ckpt 2, Dep registers 2, fault, detection latency, rollback]
     • Solution: keep multiple checkpoints
       – On a fault, roll back interacting processors to safe checkpoints
     • No domino effect
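The timing argument on slide 23 can be made concrete with a small helper. It is a sketch under assumptions that are not in the talk: times are cycle counts, `checkpoint_times` is sorted in ascending order, and a checkpoint taken exactly L cycles before detection counts as safe. The function name is hypothetical, and the per-checkpoint Dep registers used to find the interacting processors are not modeled.

```python
import bisect


def latest_safe_checkpoint(checkpoint_times, detect_time, max_latency):
    """Return the most recent checkpoint that predates any possible fault, or None.

    A fault detected at `detect_time` occurred no earlier than
    `detect_time - max_latency`, so only checkpoints taken at or before
    that point are known to be clean.
    """
    earliest_fault = detect_time - max_latency
    index = bisect.bisect_right(checkpoint_times, earliest_fault)
    return checkpoint_times[index - 1] if index else None


# Checkpoints at cycles 100, 400, and 900; fault detected at cycle 1000 with L = 300.
safe = latest_safe_checkpoint([100, 400, 900], detect_time=1000, max_latency=300)
```

In this example the checkpoint at cycle 900 may already contain the fault, so the processor rolls back to the one taken at cycle 400. That is why more than one checkpoint has to be kept.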
