
FluidCheck: A Redundant Threading based Approach for Reliable Execution in Manycore Processors



  1. FluidCheck: A Redundant Threading based Approach for Reliable Execution in Manycore Processors • Rajshekar Kalayappan, Smruti R. Sarangi • Dept. of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India

  2. Soft Errors • Temporary in nature [img src: aviral.lab.asu.edu] • Occur due to particle strikes on the silicon • Sources of particles: ▫ Solar ion flux ▫ Explosion of distant stars ▫ Impurities in the chip

  3. Soft Errors • Rare event ▫ Particles need to strike at the right place, at the right angle, with the right amount of energy • Not rare enough to be ignored ▫ The critical charge required to flip a bit decreases as feature size and operating voltage shrink

  4. Soft Errors • Solutions ▫ Device-level radiation hardening ▪ Two to four generations behind commercial counterparts [Courtland2015] ▫ System-level hardening techniques required ▪ Redundancy: DMR (compare), TMR (vote)

  5. Problem Statement • To efficiently execute a set of applications on a chip multiprocessor (homogeneous SMT-capable cores), while ensuring reliability in the face of soft errors
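The compare and vote steps named on this slide can be illustrated with a minimal sketch. The function names `dmr_check` and `tmr_vote` are hypothetical and chosen only for illustration; the slide gives no implementation. The sketch assumes each redundant copy produces one hashable result value.

```python
from collections import Counter


def dmr_check(result_a, result_b):
    """DMR (hypothetical helper): compare two redundant results.

    A mismatch detects an error, but two copies alone cannot
    tell which of them is the faulty one.
    """
    return result_a == result_b


def tmr_vote(result_a, result_b, result_c):
    """TMR (hypothetical helper): majority vote over three results.

    A single faulty copy is outvoted by the other two.
    """
    value, count = Counter([result_a, result_b, result_c]).most_common(1)[0]
    if count < 2:
        # All three copies disagree, so no majority exists.
        raise ValueError("no majority among the three results")
    return value
```

With DMR a mismatch can only be detected, so recovery needs some other mechanism; with TMR the vote itself masks a single faulty copy.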

  6. Related Work: DIVA [Austin1999] • Meant to provide reliability (leader and checker) • Execution assistance: hints ▫ IP ▫ Branch prediction ▫ Operand values ▫ Result ▫ Example: <0x1234><op1=5><op2=2><res=7> • Cache line forwarding
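A minimal sketch of how a checker might consume the example hint on this slide. Everything beyond the four hint fields is an assumption made for illustration: the `Hint` class, the `PROGRAM` decode table, and the choice of an add at address 0x1234 (picked only because 5 + 2 = 7 matches the example) do not come from the slides.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    """One hint record, mirroring <ip><op1><op2><res> from the slide."""
    ip: int
    op1: int
    op2: int
    res: int


# Hypothetical decode table: instruction address -> operation.
# Assumption: the instruction at 0x1234 is an add.
PROGRAM = {0x1234: operator.add}


def checker_verify(hint, program=PROGRAM):
    """Recompute the instruction from the operand hints and compare
    against the leader's result. Returns False on a mismatch."""
    operation = program[hint.ip]
    return operation(hint.op1, hint.op2) == hint.res
```

For the slide's example, `checker_verify(Hint(0x1234, 5, 2, 7))` accepts the result, while a corrupted result field such as `res=6` is rejected.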

  7. Related Work • [Figure: leader/checker thread pairs L1/C1 to L4/C4] • SRT [Reinhardt2000], AR-SMT [Rotenberg1999] ▫ Saves area ▫ Better throughput per core • CRT [Mukherjee2002] ▫ Improvement over SRT ▫ Circumvents hazards borne out of resource requirement similarity between a leader-checker pair ▫ Better throughput per core

  8. Motivational Example • Without any checking, throughput = 4.84 instructions per cycle • [Figure: SRT schedule of leader (L) and checker (C) threads, rows as drawn: L perlbench, L mcf / C perlbench, C mcf / L gromacs, L cactusADM / C gromacs, C cactusADM]

  9. Motivational Example • Without any checking, throughput = 4.84 instructions per cycle • [Figure: SRT schedule, as on slide 8] • SRT ▫ Throughput = 3.24 ▫ Similarity in resource requirement ▫ High throughput threads together

  10. Motivational Example • Without any checking, throughput = 4.84 instructions per cycle • [Figure: SRT schedule, as on slide 8; CRT schedule, rows as drawn: L perlbench, L mcf / C mcf, C perlbench / L gromacs, L cactusADM / C cactusADM, C gromacs] • SRT ▫ Throughput = 3.24 ▫ Similarity in resource requirement ▫ High throughput threads together

  11. Motivational Example • Without any checking, throughput = 4.84 instructions per cycle • [Figure: SRT and CRT schedules, as on slide 10] • SRT ▫ Throughput = 3.24 ▫ Similarity in resource requirement ▫ High throughput threads together • CRT ▫ Throughput = 3.55 ▫ Similarity is broken ▫ Can we do better?

  12. Motivational Example • Without any checking, throughput = 4.84 instructions per cycle • [Figure: SRT and CRT schedules, as on slide 10; a third schedule, rows as drawn: L perlbench, L mcf / C mcf, C gromacs, C cactusADM / L gromacs, L cactusADM / C perlbench] • SRT ▫ Throughput = 3.24 ▫ Similarity in resource requirement ▫ High throughput threads together • CRT ▫ Throughput = 3.55 ▫ Similarity is broken ▫ Can we do better? • Third schedule ▫ Throughput = 3.76

  13. Motivational Example • Without any checking, throughput = 4.84 instructions per cycle • [Figure: SRT, CRT, and FluidCheck schedules, as on slide 12] • SRT ▫ Throughput = 3.24 ▫ Similarity in resource requirement ▫ High throughput threads together • CRT ▫ Throughput = 3.55 ▫ Similarity is broken ▫ Can we do better? • FluidCheck ▫ Throughput = 3.76 ▫ Schedules based on the applications' behavior ▫ FluidCheck is a superset of schedules; SRT and CRT are instances within FluidCheck

  14. Simplified Illustration of FluidCheck's Working • [Figure: an arbiter and four cores (A to D) running leader threads L1 to L4 and checker threads C1 to C4]

  15. Simplified Illustration of FluidCheck's Working • [Figure: as on slide 14] • HELP: C1 is unable to keep up

  16. Simplified Illustration of FluidCheck's Working • [Figure: as on slide 14] • Checker assignment request at the arbiter; figure label: Core C

  17. Simplified Illustration of FluidCheck's Working • [Figure: as on slide 14, with C1 moved]

  18. Simplified Illustration of FluidCheck's Working • [Figure: as on slide 17] • Periodic reassignment

  19. Simplified Illustration of FluidCheck's Working • [Figure: thread placement after reassignment, with L4 moved]

  20. Challenges to achieving FluidCheck • Reactive phase-based scheduler • Efficient transfer of hints • Efficient forwarding of cache lines from the leader to the checker • Circumventing subtle livelock scenarios

  21. Hardware Architecture

  22. Overview of Redundant Execution

  23. Memory Checkpointing • [Figure: leader and checker pipelines, each with a Ct block and an L1 cache, and the L2]

  24. Memory Checkpointing • [Figure: as on slide 23] • Labels: Hint, Store • Values shown: 11010101 1

  25. Memory Checkpointing • [Figure: as on slide 23] • Values shown: 1 11010101

  26. Memory Checkpointing • [Figure: as on slide 23] • Labels: Ld/St • Values shown: 11010101 1

  27. Memory Checkpointing • [Figure: as on slide 23] • Labels: Ld/St, Miss! • Values shown: 11010101 1

  28. Memory Checkpointing • [Figure: as on slide 23] • Labels: Ld/St, Miss! • Values shown: 11010101 1

  29. Memory Checkpointing • [Figure: as on slide 23] • Labels: Ld/St, Evict! • Values shown: 11010101 1

  30. Memory Checkpointing • [Figure: as on slide 23, with a victim cache added] • Labels: Ld/St, Evict! • Values shown: 1 00001111 0 1101..

  31. Memory Checkpointing • [Figure: leader and checker pipelines, each with a Ct block and an L1 cache, a victim cache, and the L2]

  32. Memory Checkpointing • [Figure: as on slide 31] • Labels: Store • Values shown: 11010101

  33. Memory Checkpointing • [Figure: as on slide 31] • Labels: SYNC • Values shown: 11001101 1 1 11010111 1 1101.. 1 11110101 1 1001..

  34. Memory Checkpointing • [Figure: as on slide 31] • Labels: SYNC • Values shown: 11001101 1 1 11010111 1 1101.. 1 11110101 1 1001..

  35. Memory Checkpointing • [Figure: as on slide 31] • Labels: SYNC • Values shown: 11001101 0 11010111 0 11110101 0

  36. Memory Checkpointing • [Figure: as on slide 31] • Labels: Rollback • Values shown: 11001101 1 1 11010111 1 1101.. 1 11110101 1 1001..

  37. Memory Checkpointing • [Figure: as on slide 31] • Labels: Rollback • No values shown

  38. Forwarding Filters • [Figure: leader pipeline with a Ct block, an L1 cache, and the L2]

  39. Forwarding Filters • [Figure: as on slide 38] • Labels: Ld/St

  40. Forwarding Filters • [Figure: as on slide 38] • Labels: Ld/St, L1 Hit!

  41. Forwarding Filters • [Figure: as on slide 38] • Labels: Ld/St, L1 Hit!, Do Not Forward

  42. Forwarding Filters • [Figure: as on slide 38] • Labels: L1 Miss!

  43. Forwarding Filters • [Figure: as on slide 38, with an RFB added] • Labels: L1 Miss!

  44. Forwarding Filters • [Figure: as on slide 43] • Labels: L1 Miss!, RFB Hit!

  45. Forwarding Filters • [Figure: as on slide 43] • Labels: L1 Miss!, RFB Hit!, Do Not Forward

  46. Forwarding Filters • [Figure: as on slide 43] • Labels: L1 Miss!, RFB Miss!, Do Not Forward

  47. Forwarding Filters • [Figure: as on slide 43, with an LFB added] • Labels: L1 Miss!, RFB Miss!, Do Not Forward • Values shown at the LFB: 11010011 0

  48. Forwarding Filters • [Figure: as on slide 47] • Labels: L1 Miss!, RFB Miss!, Do Not Forward • Values shown at the LFB: 11010011 0

  49. Forwarding Filters • [Figure: as on slide 47] • Labels: L1 Miss!, RFB Miss! • Values shown at the LFB: 1 11010011

  50. Forwarding Filters • [Figure: as on slide 47] • Labels: L1 Miss!, RFB Miss!, Forward • Values shown at the LFB: 1 11010011

  51. Arbiter Logic: I • Activity ▫ IPC ▫ WIPC(x) • Mapping a Single Thread ▫ Select the core with minimum activity that has free SMT slots ▫ If activity is IPC, the scheme is termed minIPC ▫ If activity is WIPC(x), the scheme is termed minWIPC_x
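The single-thread mapping rule on this slide can be sketched as follows. The `Core` record and the `pick_core` function are hypothetical names introduced for illustration. The default of 4 SMT slots is taken from the simulation parameters slide, and the `activity` field stands for whichever metric is in use (IPC for minIPC, WIPC(x) for minWIPC_x), without modelling how it is measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Core:
    """Hypothetical per-core record kept by the arbiter."""
    name: str
    activity: float      # IPC (minIPC) or WIPC(x) (minWIPC_x)
    threads: int         # SMT slots currently occupied
    smt_ways: int = 4    # assumption: 4-way SMT, as in the evaluated setup


def pick_core(cores: List[Core]) -> Optional[Core]:
    """Return the core with minimum activity among cores that still
    have a free SMT slot, or None if every core is full."""
    candidates = [c for c in cores if c.threads < c.smt_ways]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.activity)
```

A core with the lowest activity but no free slot is skipped; the least active core among the remaining ones is chosen.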

  52. Arbiter Logic: II • Mapping a Set of Threads ▫ Scheduling Policies: ▪ Pinned Leaders (SP-PL) ▪ Unpinned Leaders (SP-UL) ▪ Unpinned Leaders All Leaders First (SP-UALF) • SMT Fetch Policy ▫ Full Simultaneous Issue [Tullsen1995] ▫ If n threads on a core have activities $A_1, A_2, \ldots, A_n$, then in each cycle block of size $B$ the $i$-th thread gets $\frac{A_i \times B}{\sum_{k=1}^{n} A_k}$ fetch cycles
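The fetch-cycle share on this slide is a proportional split of a block of B cycles by activity. A minimal sketch, assuming fractional shares are acceptable (the slide does not say how shares are rounded to whole cycles) and using the hypothetical function name `fetch_cycles`:

```python
from typing import List


def fetch_cycles(activities: List[float], block_size: float) -> List[float]:
    """Split a cycle block of size B among threads in proportion to
    their activities: share_i = A_i * B / sum(A_k)."""
    total = sum(activities)
    if total <= 0:
        # The slide does not define this case; refuse rather than guess.
        raise ValueError("total activity must be positive")
    return [a * block_size / total for a in activities]
```

For example, two threads with activities 1.0 and 3.0 sharing a block of 100 cycles receive 25 and 75 fetch cycles respectively.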

  53. Evaluation: Simulation Parameters • 16-core processor, 4-way SMT • Core configuration based on Intel Sandy Bridge and IBM POWER7

      | Parameter           | Value           |
      |---------------------|-----------------|
      | Pipeline width      | 4               |
      | i-cache and d-cache | 32 kB           |
      | Shared L2 cache     | 12 MB           |
      | NoC topology        | 2D torus        |
      | Hint buffer         | 512 entries     |
      | Victim cache        | 32 entries      |
      | RFB and LFB         | 64 entries each |
