Argus: Low-cost, Comprehensive Error-Detection for Simple Cores
Albert Meixner, Michael Bauer, Daniel Sorin
Duke University
Introduction
Hardware error rates are expected to rise as CMOS shrinks
Online error detection techniques can keep errors from propagating to the application
- Dual Modular Redundancy
- Redundant Multithreading
- Checker cores (DIVA)
Existing techniques are overly expensive for small, simple cores
- Simple cores dominate embedded market
- Throughput-oriented CMPs utilize simple cores (e.g., Sun UltraSPARC T1)
Argus Goals and Approach
Goal: Detect both transient and permanent errors in simple cores at low cost
Approach: Decompose program execution into four high-level tasks and check them independently
- Control Flow, Data Flow, Computation, Memory Access
Advantages of high-level decomposition
- Checkers exploit task-specific properties to reduce cost
- Unlike per-component checkers, tasks are abstract and implementation-independent
Task Decomposition
[Figure: a dynamic instruction from the instruction stream (example: r1←r3+r4) decomposed into the four checked tasks]
- Control Flow: correct instruction selected for execution
- Data Flow: correct inputs selected and result data passed correctly
- Computation: operation result computed correctly
- Memory: data transferred correctly from and to memory
From Theory to Practice
[Figure: the four tasks (Control Flow, Data Flow, Computation, Memory) lead via Formal Definition to ideal checkers and via Hardware Design to the Argus-1 checkers (CF Checker, DF Checker, Computation Checker, Memory Checker); diagram labels: CFZ, CFA, DFZ, DFA, CCZ, CCA, MCZ, MCA]
Completeness Proof: “Checkers ensure correct execution”
Equivalence Proof: “Argus-1 checkers are equivalent to ideal checkers”
Limitations
Completeness Proof assumes no interrupts, exceptions, or I/O
- Single fault assumption
Equivalence Proof holds under limiting assumptions
- Equivalence only holds at block boundary
- Perfect checksums (no aliasing)
- Known coverage hole in memory checker
When assumptions are violated, errors can go undetected
Outline
- Introduction
- Basic Argus concept
- Argus-1 checker designs
- Argus-1 implementation and evaluation
- Conclusions
Control Flow Checker
Similar to prior control flow checkers
Assign each basic block an address-independent ID
- ID computed from block contents
Embed IDs of legal successors in each block
- Most blocks have one or two legal successors
- Pick correct ID at runtime
Indirect branch addresses are more challenging
- See paper
[Figure: control flow graph of basic blocks A to E for an example code fragment with labels loop, L1, and L2 and the instructions bnez r1, L1; j L2; bez r2, loop; ret]
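The scheme on this slide can be sketched in a few lines. This is a minimal illustration, not the Argus-1 hardware: SHA-256 stands in for whatever signature function the real checker uses, and the block contents, the CFG, and the names `block_id`, `embed_successor_ids`, and `first_violation` are all hypothetical.

```python
import hashlib

def block_id(instructions):
    """Address-independent ID: a hash over the block's instruction text only."""
    return hashlib.sha256("\n".join(instructions).encode()).hexdigest()

def embed_successor_ids(blocks, cfg):
    """For each block, record the ID of every legal successor, keyed by branch outcome."""
    return {
        name: {outcome: block_id(blocks[succ]) for outcome, succ in cfg[name].items()}
        for name in blocks
    }

def first_violation(trace, blocks, embedded):
    """Return the trace index of the first illegally entered block, or None.

    trace is a list of (block_name, outcome) pairs; outcome says how the block was left.
    """
    for i in range(len(trace) - 1):
        current, outcome = trace[i]
        executed_next = trace[i + 1][0]
        expected = embedded[current].get(outcome)    # pick the embedded ID at runtime
        observed = block_id(blocks[executed_next])   # ID of the block actually executed
        if expected != observed:
            return i + 1
    return None

# Hypothetical blocks and control flow graph (not the ones from the slide's figure).
BLOCKS = {
    "A": ["add r1, r1, r2", "bnez r1, L1"],
    "B": ["sub r3, r3, r1", "j L2"],
    "C": ["mul r3, r3, r1"],
    "D": ["add r2, r2, r3", "bez r2, loop"],
    "E": ["ret"],
}
CFG = {
    "A": {"taken": "C", "fallthrough": "B"},
    "B": {"next": "D"},
    "C": {"next": "D"},
    "D": {"taken": "A", "fallthrough": "E"},
    "E": {},
}
EMBEDDED = embed_successor_ids(BLOCKS, CFG)

good = [("A", "fallthrough"), ("B", "next"), ("D", "fallthrough"), ("E", None)]
bad = [("A", "fallthrough"), ("C", "next"), ("D", "fallthrough"), ("E", None)]
print(first_violation(good, BLOCKS, EMBEDDED))  # None
print(first_violation(bad, BLOCKS, EMBEDDED))   # 1
```

In the `bad` trace the branch in A resolves to fall-through but block C is executed, so the ID observed for C does not match the embedded ID of B.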
Data Flow Checker
Based on “Dynamic Dataflow Verification”
- Presented at PACT 2007
- Compiler computes reference data flow signatures for basic blocks
- Data flow checker tracks actual data flow and compares to reference
Values protected with EDC
Data flow signatures are used as block IDs for control flow checker
[Figure: example basic block (sub r4, r2, r3; mul r5, r2, r6; add r3, r5, r4), its dataflow graph with inputs r2, r3, r6 and outputs r4, r3, r5, and the resulting dataflow signature]
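One way to picture a data flow signature is a per-register history tag that is updated by every instruction and folded into a single value at the end of the block. The sketch below follows that idea for the example block on this slide. The hash function (SHA-256), the register count `NUM_REGS`, and the helper names are assumptions made for illustration; the actual signature computation is the one described in the Dynamic Dataflow Verification work.

```python
import hashlib

NUM_REGS = 8  # hypothetical register-file size for this sketch

def _mix(*parts):
    """Stand-in signature function (assumption: the real checker uses a cheap checksum)."""
    return hashlib.sha256(repr(parts).encode()).hexdigest()

def dataflow_signature(block):
    """block is a list of (opcode, dst, src1, src2) tuples using register numbers."""
    # Every register starts the block with a tag that depends only on its number.
    history = [_mix("init", r) for r in range(NUM_REGS)]
    for opcode, dst, src1, src2 in block:
        # The destination's tag records the operation and the tags of its inputs.
        history[dst] = _mix(opcode, history[src1], history[src2])
    # The block signature summarizes the final tags of all registers.
    return _mix(*history)

# Example block from the slide: sub r4, r2, r3; mul r5, r2, r6; add r3, r5, r4
BLOCK = [("sub", 4, 2, 3), ("mul", 5, 2, 6), ("add", 3, 5, 4)]
REFERENCE = dataflow_signature(BLOCK)  # what the compiler would precompute

def check_block(executed):
    """Compare the data flow observed at runtime against the reference signature."""
    return dataflow_signature(executed) == REFERENCE

# A fault that makes mul read r3 instead of r2 changes the observed signature.
FAULTY = [("sub", 4, 2, 3), ("mul", 5, 3, 6), ("add", 3, 5, 4)]
print(check_block(BLOCK), check_block(FAULTY))
```

Because the signature depends only on register numbers and opcodes, it is independent of code addresses, which is what allows it to double as a block ID.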
Computation Checkers
Not a single monolithic checker, but multiple sub-checkers for different operations
- Large amounts of prior work on computation checking
Operations are checked using redundant hardware
- Exploit that checking computation is often easier than performing it
Multiply checker trades coverage for cost
- Replay modulo 31
- Non-zero probability of missing errors due to aliasing
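The modulo-31 replay relies on the identity $(a \cdot b) \bmod 31 = ((a \bmod 31)(b \bmod 31)) \bmod 31$. A minimal sketch for non-negative operands follows; the digit-folding reduction in `mod31` is a standard trick for moduli of the form 2^5 - 1 and is an assumption here, not a statement about how Argus-1 implements it. The function names are hypothetical.

```python
def mod31(x):
    """Reduce a non-negative integer modulo 31 by summing its 5-bit digits."""
    while x > 31:
        x = (x & 31) + (x >> 5)   # 32 is congruent to 1 modulo 31
    return 0 if x == 31 else x

def multiply_check(a, b, product):
    """Accept the product if its residue matches the product of the operand residues."""
    return mod31(product) == mod31(mod31(a) * mod31(b))

def undetected_single_bit_flips(a, b, width=64):
    """List the bit positions whose flip in the correct product would go unnoticed."""
    product = a * b
    return [k for k in range(width) if multiply_check(a, b, product ^ (1 << k))]

a, b = 1234, 5678
print(multiply_check(a, b, a * b))        # True: correct product passes
print(multiply_check(a, b, a * b + 1))    # False: error detected
print(multiply_check(a, b, a * b + 31))   # True: error aliases, goes undetected
print(undetected_single_bit_flips(a, b))  # []
```

An error is missed only when it changes the product by a multiple of 31. A single flipped bit changes the product by a power of two, which is never a multiple of 31, so the aliasing cases involve multi-bit errors.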
Memory Checker
Data corruption detected using parity
Addressing errors are transformed into data corruption
- Error in cache logic transforms access to address A into access to address B
- No storage overhead, addresses are embedded into data words
Address computation and alignment errors are detected by redundant computation checkers
Stores that don’t update the cache are not detected
- Unlikely error scenario, high-level fixes are expensive
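One way to embed an address into a data word without extra storage is to XOR the two before writing while keeping the parity of the original data. The sketch below assumes that encoding and a single parity bit per word; both are illustrative assumptions, and the class `CheckedMemory`, its `actual_addr` fault-injection parameter, and `MemoryCheckError` are hypothetical names.

```python
class MemoryCheckError(Exception):
    """Raised when a loaded word fails its parity check."""

def parity(x):
    """Even/odd parity of a non-negative integer (1 if the number of set bits is odd)."""
    return bin(x).count("1") & 1

class CheckedMemory:
    def __init__(self):
        self.cells = {}  # address -> (stored word, parity bit)

    def store(self, addr, data):
        # Parity covers the plain data; the stored word has the address XORed in.
        self.cells[addr] = (data ^ addr, parity(data))

    def load(self, addr, actual_addr=None):
        # actual_addr models a fault that redirects the access to another location.
        location = addr if actual_addr is None else actual_addr
        word, stored_parity = self.cells[location]
        data = word ^ addr  # undo the embedding with the address that was requested
        if parity(data) != stored_parity:
            raise MemoryCheckError(f"parity mismatch on load from {addr:#x}")
        return data

mem = CheckedMemory()
mem.store(0x100, 0xDEAD)
mem.store(0x104, 0xBEEF)
print(hex(mem.load(0x100)))            # 0xdead
try:
    mem.load(0x100, actual_addr=0x104)  # access to A turned into access to B
except MemoryCheckError as err:
    print("detected:", err)
```

With a single parity bit, this sketch only catches a misdirected access when the two addresses differ in an odd number of bits. It also shows why a store that never reaches the cache is invisible to the check: the stale word at the right address still decodes with matching parity.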
Argus-1 Core Specs
Based on Verilog model of OpenRISC 1200 core
- 4-stage, single-issue, 32-bit RISC CPU
- Fully functional, open source core from opencores.org
Removed unnecessary features to obtain a minimal core
- TLBs, advanced interrupt controller, debug unit
- Worst case for Argus-1 area overhead
GCC 3.4 used to compile benchmarks
- Patch from opencores.org adds OpenRISC support
Argus-1 Pipeline Overview
[Figure: Argus-1 pipeline diagram; legend: Original, Argus]
Argus-1 Compilation Tool Chain
[Figure: tool chain with stages labeled compile, assemble, link, sign, and pad; legend: Original, Argus]
Embed signatures used for data and control flow checking
To minimize code bloat, signatures are embedded in unused instruction bits
- Blocks with insufficient unused bits padded with NOPs
Signatures are embedded after linking
- Compute data flow signatures for each block
- Determine legal successor blocks
- Embed signatures of legal successors into unused bits
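The padding decision can be sketched as a simple bit budget per block. All numbers below (unused bits per instruction, signature width, free bits in a NOP) are hypothetical, since the transcript does not give the Argus-1 encoding details, and `nops_needed` is an illustrative helper name.

```python
def nops_needed(unused_bits, required_bits, nop_unused_bits):
    """Number of NOPs to append so a block can hold its embedded signatures.

    unused_bits      -- unused bit count of each instruction already in the block
    required_bits    -- bits needed for the signatures of all legal successors
    nop_unused_bits  -- free bits contributed by one padding NOP (assumed value)
    """
    if nop_unused_bits <= 0:
        raise ValueError("a padding NOP must contribute at least one free bit")
    deficit = max(0, required_bits - sum(unused_bits))
    return -(-deficit // nop_unused_bits)  # ceiling division on integers

# Hypothetical blocks: two 5-bit successor signatures need 10 bits in total.
print(nops_needed([3, 0, 5], required_bits=10, nop_unused_bits=16))  # 1
print(nops_needed([8, 8], required_bits=10, nop_unused_bits=16))     # 0
print(nops_needed([0], required_bits=40, nop_unused_bits=16))        # 3
```

Blocks that already have enough spare bits cost nothing. Only the blocks with a deficit grow, which is where the code-size and cycle penalties come from.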
Argus-1 Error Coverage
Coverage results based on error injection experiments
- 5000 test-runs, each with a single fault injected into a different randomly selected gate
- Compare test program run to known correct execution
- Test does not use configuration registers, interrupts, or exception logic
Argus detects 98.0% of transient and 98.8% of permanent errors that affected test program
Most undetected errors due to aliasing in operand parity
Argus-1 Area Overhead
Synthesized with Synopsys Design Compiler using 250nm VTVT standard cell library
Laid out with Cadence Silicon Ensemble
Cache overhead estimated with CACTI
Component               Overhead
Core                    16.6%
8KB, 2-way D-Cache      5.1%
8KB, 2-way I-Cache      0%
Argus-1 (Core+Caches)   10.6%
Argus-1 Performance Overhead
No direct impact from checkers
- Checkers work in parallel with regular execution and never stall the pipeline
- CAD tools showed no increase in cycle time
Only impact is from padding blocks to embed signatures
- One cycle penalty for each embedded NOP
- Increased pressure on instruction cache
Performance results obtained by running MediaBench on the OR1K simulator
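The NOP penalty lends itself to a back-of-the-envelope estimate: each executed padding NOP adds one cycle. The sketch below computes that component only and ignores the instruction cache effect; the execution counts and the function name `nop_overhead` are made up for illustration and are not results from the talk.

```python
def nop_overhead(baseline_cycles, blocks):
    """Fractional slowdown from padding NOPs alone.

    blocks is an iterable of (execution_count, nops_in_block) pairs;
    each executed NOP is charged one extra cycle.
    """
    extra_cycles = sum(count * nops for count, nops in blocks)
    return extra_cycles / baseline_cycles

# Hypothetical profile: three blocks, two of which were padded.
profile = [(100, 0), (20, 1), (5, 2)]
print(f"{nop_overhead(1000, profile):.1%}")  # 3.0%
```

The estimate makes clear that the overhead depends on how often padded blocks execute, not on how many blocks were padded.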
Performance Overhead Graph
Conclusions
Self-checking core can be built using a high-level “divide and conquer” approach
- Correctness of this approach can be shown formally
Individual tasks can be checked using existing checkers with slight alterations
- Result is a self-checking core with very low area and performance overhead
Not a complete solution for self-checking chip, yet
- Missing error detection for exception and interrupt circuitry
- Use multi-processor aware memory checker to build self-