FluidCheck: A Redundant Threading based Approach for Reliable Execution in Manycore Processors
Rajshekar Kalayappan, Smruti R. Sarangi
Dept. of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India

Soft Errors
[img src: aviral.lab.asu.edu]
• Temporary nature
• Occur due to particle strikes on the silicon
• Source of particles:
  ▫ Solar ion flux
  ▫ Explosion of distant stars
  ▫ Impurities in the chip

Soft Errors
• Rare event
  ▫ Particles need to strike at the right place, at the right angle, with the right amount of energy
• Not rare enough to be ignored
  ▫ The critical charge required to flip a bit decreases with shrinking feature size and operating voltage

Soft Errors
• Solutions
  ▫ Device-level radiation hardening
    - Two to four generations behind commercial counterparts [Courtland2015]
  ▫ System-level hardening techniques required
    - Redundancy: DMR (compare), TMR (vote)
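The DMR and TMR labels on this slide name two standard redundancy schemes: dual modular redundancy runs a computation twice and compares the results to detect an error, while triple modular redundancy runs it three times and takes a majority vote to mask a single faulty result. A minimal sketch, assuming the redundant results are plain Python values; the function names `dmr_check` and `tmr_vote` are illustrative and do not come from the slides:

```python
from collections import Counter


def dmr_check(a, b):
    """DMR: return True when the two redundant results agree (detection only)."""
    return a == b


def tmr_vote(a, b, c):
    """TMR: return the majority value among three redundant results.

    Raises ValueError when all three disagree, since no majority exists.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among redundant results")
    return value
```

DMR can only signal that something went wrong, so it needs a recovery mechanism such as rollback; TMR can continue with the majority value at the cost of a third execution.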

Problem Statement
• To efficiently execute a set of applications on a chip multiprocessor (homogeneous SMT-capable cores), while ensuring reliability in the face of soft errors

Related Work: DIVA [Austin1999]
[Figure: Leader and Checker]
• Meant to provide reliability
• Execution Assistance:
  ▫ IP
  ▫ Branch Prediction Hints
  ▫ Operand Value Hints
  ▫ Result
  ▫ Example: <0x1234><op1=5><op2=2><res=7>
• Cache line forwarding
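The example hint `<0x1234><op1=5><op2=2><res=7>` carries an instruction pointer, two operand values, and the leader's result. A simplified sketch of how a checker could use such a record is to re-execute the instruction on the hinted operands and compare the outcome. The `Hint` record and the `PROGRAM` table are hypothetical, and treating the instruction at 0x1234 as an addition is an assumption (it is consistent with 5 + 2 = 7, but the slide does not name the operation):

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    ip: int    # instruction pointer of the checked instruction
    op1: int   # first operand value seen by the leader
    op2: int   # second operand value seen by the leader
    res: int   # result the leader produced


# Hypothetical decoded program: maps an instruction pointer to its operation.
# Assumption: the instruction at 0x1234 is an add.
PROGRAM = {0x1234: operator.add}


def check(hint, program=PROGRAM):
    """Re-execute the hinted instruction and compare with the leader's result."""
    recomputed = program[hint.ip](hint.op1, hint.op2)
    return recomputed == hint.res
```

With this table, `check(Hint(0x1234, 5, 2, 7))` accepts the leader's result, while a corrupted result such as `res=6` is flagged as a mismatch.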

Related Work
[Figure: cores running leader threads L1-L4 and checker threads C1-C4]
SRT [Reinhardt2000], AR-SMT [Rotenberg1999]
• Saves area
• Better throughput per core
CRT [Mukherjee2002]
• Improvement over SRT
• Circumvents hazards borne out of resource requirement similarity between a leader-checker pair
• Better throughput per core

Motivational Example
Without any checking, throughput = 4.84 instructions per cycle
(L = leader thread, C = checker thread)

SRT
  Core 1: L perlbench, C perlbench
  Core 2: L mcf, C mcf
  Core 3: L gromacs, C gromacs
  Core 4: L cactusADM, C cactusADM
• Throughput = 3.24
• Similarity in resource requirement
• High throughput threads together

CRT
  Core 1: L perlbench, C mcf
  Core 2: L mcf, C perlbench
  Core 3: L gromacs, C cactusADM
  Core 4: L cactusADM, C gromacs
• Throughput = 3.55
• Similarity is broken
• Can we do better?

FluidCheck
  [Figure: FluidCheck schedule. Thread labels in extraction order: L perlbench, L mcf; C mcf, C gromacs, C cactusADM; L gromacs, L cactusADM; C perlbench. The per-core placement is not recoverable from the text.]
• Throughput = 3.76
• Schedules based on the applications' behavior
• FluidCheck is a superset of schedules; SRT, CRT are instances within FluidCheck
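The throughput figures quoted in this example can be put on a common scale by dividing each by the unchecked throughput of 4.84 instructions per cycle. The short calculation below does only that arithmetic; the names `BASELINE_IPC`, `SCHEMES`, and `retained_fraction` are illustrative:

```python
BASELINE_IPC = 4.84  # throughput without any checking (from the slide)
SCHEMES = {"SRT": 3.24, "CRT": 3.55, "FluidCheck": 3.76}  # from the slide


def retained_fraction(ipc, baseline=BASELINE_IPC):
    """Fraction of the unchecked throughput that a checking scheme retains."""
    return ipc / baseline


for name, ipc in SCHEMES.items():
    print(f"{name}: {retained_fraction(ipc):.1%} of unchecked throughput")
```

This prints 66.9% for SRT, 73.3% for CRT, and 77.7% for FluidCheck. In this one example, the FluidCheck schedule is therefore about 5.9% faster than CRT (3.76 / 3.55) and about 16.0% faster than SRT (3.76 / 3.24).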

Simplified Illustration of FluidCheck's Working
[Figure sequence: an arbiter and four cores (Core A to Core D) running leaders L1-L4 and checkers C1-C4]
• Leaders and checkers run on their assigned cores
• HELP: C1 is unable to keep up
• Checker assignment request to the arbiter; the arbiter selects Core C
• C1 is now placed on Core C
• Periodic reassignment by the arbiter
• After reassignment, threads occupy new cores (L4 changes position in the figure)

Challenges to achieving FluidCheck
• Reactive phase-based scheduler
• Efficient transfer of hints
• Efficient forwarding of cache lines from the leader to the checker
• Circumventing subtle livelock scenarios

Hardware Architecture

Overview of Redundant Execution

Memory Checkpointing
[Figure sequence: leader and checker pipelines (each labeled Ct) with private L1 caches, a hint path from leader to checker, a victim cache beside the leader's L1, and a shared L2. Steps recoverable from the labels:]
• Store: the leader writes a line (11010101) into its L1 and the line's flag bit is set to 1
• Ld/St, Miss!, Evict!: a flagged line evicted from the L1 is held in the victim cache, and the incoming line (00001111) has flag 0
• SYNC: the flag bits of the L1 lines go from 1 to 0 and the victim cache is emptied
• Rollback: the flagged lines in the L1 and the victim cache are discarded
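The Memory Checkpointing slides suggest a cache whose unverified lines carry a flag bit, spill into a victim cache on eviction, and are either committed at SYNC or dropped at Rollback. The toy model below sketches that behavior under several assumptions that are not stated on the slides: SYNC writes the flagged lines to the L2, eviction picks the oldest line, and a full victim cache simply raises an error. The class name `CheckpointedL1` and all of its methods are hypothetical:

```python
class CheckpointedL1:
    """Toy model of an L1 whose unverified lines can be committed or discarded."""

    def __init__(self, capacity=2, victim_capacity=32):
        self.capacity = capacity
        self.victim_capacity = victim_capacity
        self.lines = {}    # addr -> [data, unverified_flag], oldest first
        self.victim = {}   # addr -> data of unverified lines evicted from L1
        self.l2 = {}       # addr -> data, the last verified memory state

    def _make_room(self, addr):
        if addr in self.lines or len(self.lines) < self.capacity:
            return
        old_addr = next(iter(self.lines))            # assumption: evict oldest
        data, unverified = self.lines[old_addr]
        if unverified:                               # must not reach L2 yet
            if len(self.victim) >= self.victim_capacity:
                raise RuntimeError("victim cache full: SYNC needed first")
            self.victim[old_addr] = data
        del self.lines[old_addr]

    def store(self, addr, data):
        self._make_room(addr)
        self.victim.pop(addr, None)                  # newer copy supersedes it
        self.lines[addr] = [data, True]              # flag bit set to 1

    def load(self, addr):
        if addr in self.lines:
            return self.lines[addr][0]
        if addr in self.victim:
            return self.victim[addr]
        data = self.l2.get(addr, 0)                  # miss: fetch verified data
        self._make_room(addr)
        self.lines[addr] = [data, False]
        return data

    def sync(self):
        """Checkpoint: commit unverified lines to L2 and clear their flags."""
        self.l2.update(self.victim)
        self.victim.clear()
        for addr, line in self.lines.items():
            if line[1]:
                self.l2[addr] = line[0]
                line[1] = False

    def rollback(self):
        """Discard every unverified line, returning to the last checkpoint."""
        self.victim.clear()
        self.lines = {a: l for a, l in self.lines.items() if not l[1]}
```

For instance, three stores into a two-line cache push the first line into the victim cache; after `sync()` all three values are in `l2`, and a later store followed by `rollback()` leaves the synced value visible again.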

Forwarding Filters
[Figure sequence: leader pipeline (Ct) with its L1, two filter buffers (RFB and LFB), and the L2. Outcomes shown for a leader Ld/St:]
• L1 hit: do not forward
• L1 miss, RFB hit: do not forward
• L1 miss, RFB miss, LFB entry (11010011) with flag 0: do not forward
• L1 miss, RFB miss, LFB entry (11010011) with flag 1: forward

Arbiter Logic: I
• Activity
  ▫ IPC
  ▫ WIPC(x)
• Mapping a Single Thread
  ▫ Select the core with minimum activity that has free SMT slots
  ▫ If activity is IPC, scheme is termed minIPC
  ▫ If activity is WIPC(x), scheme is termed minWIPC_x
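The single-thread mapping rule (pick the core with minimum activity among those with a free SMT slot) can be sketched as follows. The `Core` record and `select_core` function are hypothetical names; the activity value stands for whichever metric the scheme uses (IPC for minIPC, WIPC(x) for minWIPC_x), and how WIPC(x) is computed is not defined on the slide, so it is taken as a given number here:

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    activity: float      # IPC or WIPC(x), whichever metric the scheme uses
    threads: int         # SMT slots currently occupied
    smt_ways: int = 4    # assumption: 4-way SMT, as in the evaluation setup

    def has_free_slot(self):
        return self.threads < self.smt_ways


def select_core(cores):
    """Return the least-active core that has a free SMT slot, or None."""
    candidates = [c for c in cores if c.has_free_slot()]
    return min(candidates, key=lambda c: c.activity, default=None)
```

For example, among cores A (activity 2.1, full), B (1.5, two threads), C (0.9, full), and D (1.8, one thread), the function returns B: core C has the lowest activity but no free slot.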

Arbiter Logic: II
• Mapping a Set of Threads
  ▫ Scheduling Policies:
    - Pinned Leaders (SP-PL)
    - Unpinned Leaders (SP-UL)
    - Unpinned Leaders All Leaders First (SP-UALF)
• SMT Fetch Policy
  ▫ Full Simultaneous Issue [Tullsen1995]
  ▫ If n threads on a core have activities A_1, A_2, ..., A_n, then in a cycle block of size B the i-th thread gets $\frac{A_i \times B}{\sum_{k=1}^{n} A_k}$ fetch cycles
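The SMT fetch policy gives thread i a share of each block of B fetch cycles proportional to its activity, that is, A_i × B divided by the sum of all A_k on the core. A minimal sketch of that split follows; the function name `fetch_cycles` is illustrative, the equal split for zero total activity is an assumption, and the slide does not say how fractional cycle counts are rounded, so the values are left as floats:

```python
def fetch_cycles(activities, block_size):
    """Split a block of B fetch cycles among threads in proportion to activity."""
    total = sum(activities)
    if total == 0:
        # Assumption: with no measured activity, share the block equally.
        return [block_size / len(activities)] * len(activities)
    return [a * block_size / total for a in activities]
```

With activities 1.0 and 3.0 and a block of 100 cycles, the two threads receive 25 and 75 fetch cycles respectively.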

Evaluation: Simulation Parameters
• 16-core processor, 4-way SMT
• Core configuration based on Intel Sandy Bridge and IBM POWER7

Parameter             Value
Pipeline width        4
i-cache and d-cache   32 kB
Shared L2 cache       12 MB
NoC topology          2D torus
Hint buffer           512 entries
Victim cache          32 entries
RFB and LFB           64 entries each
