Argus: Low-cost, Comprehensive Error-Detection for Simple Cores



slide-1
SLIDE 1

Argus: Low-cost, Comprehensive Error-Detection for Simple Cores

Albert Meixner, Michael Bauer, Daniel Sorin
Duke University

slide-2
SLIDE 2

Introduction

Hardware error rates are expected to rise as CMOS shrinks

Online error detection techniques can keep errors from propagating to the application

  • Dual Modular Redundancy
  • Redundant Multithreading
  • Checker cores (DIVA)

Existing techniques are overly expensive for small, simple cores

  • Simple cores dominate embedded market
  • Throughput-oriented CMPs utilize simple cores

Sun UltraSPARC T1

slide-3
SLIDE 3

Argus Goals and Approach

Goal: Detect both transient and permanent errors in simple cores at low cost

Approach: Decompose program execution into four high-level tasks and check them independently

  • Control Flow, Data Flow, Computation, Memory Access

Advantages of high-level decomposition

  • Checkers exploit task-specific properties to reduce cost
  • Unlike per-component checkers, tasks are abstract and implementation-independent

slide-4
SLIDE 4

Task Decomposition

Dynamic instruction from the instruction stream: r1←r3+r4

  • Control Flow: correct instruction selected for execution
  • Data Flow: correct inputs selected and data passed correctly
  • Computation: operation result computed correctly
  • Memory: data transferred correctly from and to memory

slide-5
SLIDE 5

From Theory to Practice

[Figure: the four tasks (Control Flow, Data Flow, Computation, Memory) lead via formal definition to ideal checkers and via hardware design to hardware checkers. Checker labels in the figure: CFZ and CFA (CF Checker), DFZ and DFA (DF Checker), CCZ and CCA (Computation Checker), MCZ and MCA (Memory Checker).]

  • Completeness Proof: “Checkers ensure correct execution”
  • Equivalence Proof: “Argus-1 checkers are equivalent to ideal checkers”

slide-6
SLIDE 6

Limitations

Completeness Proof assumes no interrupts, exceptions, or I/O

  • Single fault assumption

Equivalence Proof holds under limiting assumptions

  • Equivalence only holds at block boundary
  • Perfect checksums (no aliasing)
  • Known coverage hole in memory checker

When assumptions are violated, errors can go undetected

slide-7
SLIDE 7

Outline

  • Introduction
  • Basic Argus concept
  • Argus-1 checker designs
  • Argus-1 implementation and evaluation
  • Conclusions

slide-8
SLIDE 8

Control Flow Checker

Similar to prior control flow checkers

Assign each basic block an address-independent ID

  • ID computed from block contents

Embed IDs of legal successors in each block

  • Most blocks have one or two legal successors
  • Pick correct ID at runtime

Indirect branch addresses are more challenging

  • See paper

[Figure: control flow graph of basic blocks A through E for an example code fragment with labels loop, L1, and L2 (loop: … bnez r1, L1; L1: … j L2 …; L2: … bez r2, loop; ret)]
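
The ID scheme on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions: the CRC-32 hash, the `Block` class, the helper names, and the instruction text inside each block are all hypothetical stand-ins, and the real checker is hardware that uses data flow signatures as IDs.

```python
import zlib

def block_id(instructions):
    # Address-independent ID: depends only on the block's contents.
    # CRC-32 is an assumed stand-in for the real signature function.
    return zlib.crc32("\n".join(instructions).encode())

class Block:
    def __init__(self, name, instructions):
        self.name = name
        self.instructions = instructions
        self.id = block_id(instructions)
        self.successor_ids = {}  # filled in by embed_successors

def embed_successors(blocks, edges):
    # "Embed IDs of legal successors in each block".
    by_name = {b.name: b for b in blocks}
    for src, targets in edges.items():
        by_name[src].successor_ids = {t: by_name[t].id for t in targets}
    return by_name

def check_transition(prev_block, chosen_successor, executed_block):
    # The branch outcome picks which embedded ID to expect; the ID
    # recomputed from the block that actually ran must match it.
    expected = prev_block.successor_ids.get(chosen_successor)
    return expected is not None and expected == block_id(executed_block.instructions)

# Hypothetical block contents shaped like the A-E graph in the figure.
blocks = [
    Block("A", ["add r1, r1, r2", "bnez r1, L1"]),
    Block("B", ["sub r2, r2, r3"]),
    Block("C", ["mul r4, r2, r2", "j L2"]),
    Block("D", ["add r2, r2, r4", "bez r2, loop"]),
    Block("E", ["ret"]),
]
edges = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A", "E"]}
cfg = embed_successors(blocks, edges)
```

In this model a branch from A that should reach C but lands in D is flagged, because the ID recomputed from D's contents does not match the ID A embedded for C.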

slide-9
SLIDE 9

Data Flow Checker

Based on “Dynamic Dataflow Verification”

  • Presented at PACT 2007
  • Compiler computes reference data flow signatures for basic blocks
  • Data flow checker tracks actual data flow and compares to reference

Values protected with EDC

Data flow signatures are used as block IDs for control flow checker

[Figure: basic block (sub r4, r2, r3; mul r5, r2, r6; add r3, r5, r4), its dataflow graph with inputs r2, r3, r6 and outputs r4, r3, r5, and the resulting dataflow signature]
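
A rough software analogue of a data flow signature, using the basic block from the figure. The per-register tag scheme, the CRC-32 mixing function, and every name below are assumptions made for illustration; the slides do not specify how the hardware computes the signature.

```python
import zlib

REGS = ["r2", "r3", "r4", "r5", "r6"]

def mix(*parts):
    # Assumed mixing function; any hash over the parts would do here.
    return zlib.crc32("|".join(str(p) for p in parts).encode())

def dataflow_signature(block):
    # Each register starts the block with a tag that names only itself.
    tags = {r: mix("init", r) for r in REGS}
    for op, dst, src1, src2 in block:
        # The destination tag records the operation and where its inputs came from.
        tags[dst] = mix(op, tags[src1], tags[src2])
    # The block signature summarizes the final tag of every register.
    return mix(*(tags[r] for r in REGS))

# Basic block from the slide: (op, dst, src1, src2)
reference_block = [
    ("sub", "r4", "r2", "r3"),
    ("mul", "r5", "r2", "r6"),
    ("add", "r3", "r5", "r4"),
]
reference = dataflow_signature(reference_block)

# Fault model: the multiply reads r3 instead of r6.
faulty_block = [
    ("sub", "r4", "r2", "r3"),
    ("mul", "r5", "r2", "r3"),
    ("add", "r3", "r5", "r4"),
]
error_detected = dataflow_signature(faulty_block) != reference
```

Because the tags depend only on which values feed which operations, swapping the independent sub and mul leaves the signature unchanged in this sketch, while a wrong source register changes it.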

slide-10
SLIDE 10

Computation Checkers

Not a single monolithic checker, but multiple sub-checkers for different operations

  • Large amounts of prior work on computation checking

Operations are checked using redundant hardware

  • Exploit that checking computation is often easier than performing it

Multiply checker trades coverage for cost

  • Replay modulo 31
  • Non-zero probability of missing errors due to aliasing
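
The modulo-31 idea can be shown in a few lines. This is a minimal sketch of a residue check, assuming the checker compares the product's residue against the product of the operand residues; the function name is hypothetical and the slides do not give the hardware details.

```python
def mul_check_mod31(a, b, product):
    # Residue check: (a * b) mod 31 must equal ((a mod 31) * (b mod 31)) mod 31.
    # The check multiplies two 5-bit residues instead of two full operands.
    return (a % 31) * (b % 31) % 31 == product % 31

a, b = 1234, 5678
good = a * b

ok_correct = mul_check_mod31(a, b, good)        # correct product passes
ok_off_by_one = mul_check_mod31(a, b, good + 1) # corrupted product is flagged
ok_aliased = mul_check_mod31(a, b, good + 31)   # error that is a multiple of 31 slips through
```

The last case is the aliasing mentioned on the slide: any error that changes the product by a multiple of 31 leaves the residue unchanged. A single flipped bit changes the product by a power of two, which is never a multiple of 31, so this check always catches single-bit errors in the product.
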
slide-11
SLIDE 11

Memory Checker

Data corruption detected using parity

Addressing errors are transformed into data corruption

  • Error in cache logic transforms access to address A into access to address B
  • No storage overhead, addresses are embedded into data words

Address computation and alignment errors are detected by redundant computation checkers

Stores that don’t update the cache are not detected

  • Unlikely error scenario, high-level fixes are expensive
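
One way to picture how an addressing error becomes data corruption is the toy model below. It assumes the address is embedded by XORing it into the stored word and that a single even-parity bit protects the data; the slides specify neither, and `CheckedMemory` and `actual_addr` are hypothetical names.

```python
def parity(word):
    # Single parity bit over the word (assumed EDC for this sketch).
    return bin(word).count("1") & 1

class CheckedMemory:
    def __init__(self):
        self.cells = {}

    def store(self, addr, data):
        # Parity covers the original data; the address is XORed into the
        # stored word, so the word does not grow (no storage overhead).
        self.cells[addr] = (data ^ addr, parity(data))

    def load(self, addr, actual_addr=None):
        # actual_addr models a cache-logic fault that redirects the access.
        stored, p = self.cells[addr if actual_addr is None else actual_addr]
        data = stored ^ addr  # undo the embedding with the intended address
        return data, parity(data) == p

mem = CheckedMemory()
mem.store(0x10, 0xABCD)
mem.store(0x11, 0x1234)
mem.store(0x13, 0x1234)

clean = mem.load(0x10)                          # intended access
redirected = mem.load(0x10, actual_addr=0x11)   # addresses differ in one bit
aliased = mem.load(0x10, actual_addr=0x13)      # addresses differ in two bits
```

With a single parity bit, this sketch only catches redirections where the two addresses differ in an odd number of bits, which is one concrete form of the aliasing mentioned elsewhere in the talk. A real design would need a stronger code or a different embedding than the one assumed here.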

slide-12
SLIDE 12

Outline

  • Introduction
  • Basic Argus concept
  • Argus-1 checker designs
  • Argus-1 implementation and evaluation
  • Conclusions

slide-13
SLIDE 13

Argus-1 Core Specs

Based on Verilog model of OpenRISC 1200 core

  • 4-stage, single-issue, 32-bit RISC CPU
  • Fully functional, open source core from opencores.org

Removed unnecessary features to obtain a minimal core

  • TLBs, advanced interrupt controller, debug unit
  • Worst case for Argus-1 area overhead

GCC 3.4 used to compile benchmarks

  • Patch from opencores.org adds OpenRISC support

slide-14
SLIDE 14

Argus-1 Pipeline Overview

[Figure: pipeline diagram; legend: Original, Argus]

slide-15
SLIDE 15

Argus-1 Compilation Tool Chain

[Figure: tool chain with stages labeled compile, assemble, link, sign, and pad; legend: Original, Argus]

Embed signatures used for data and control flow checking

To minimize code bloat, signatures are embedded in unused instruction bits

  • Blocks with insufficient unused bits padded with NOPs

Signatures are embedded after linking

  • Compute data flow signatures for each block
  • Determine legal successor blocks
  • Embed signatures of legal successors into unused bits
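
The padding decision can be expressed as simple arithmetic. The sketch below assumes each successor signature takes a fixed number of bits and each NOP contributes a fixed number of free bits; the function name and the default values of `sig_bits` and `bits_per_nop` are placeholders, not figures from the talk.

```python
import math

def nops_needed(unused_bits, num_successors, sig_bits=8, bits_per_nop=16):
    # Bits required to embed one signature per legal successor.
    required = num_successors * sig_bits
    # Bits still missing after using the block's own unused instruction bits.
    shortfall = max(0, required - unused_bits)
    # Each NOP adds bits_per_nop free bits (assumed value).
    return math.ceil(shortfall / bits_per_nop)

no_padding = nops_needed(unused_bits=20, num_successors=2)   # 16 bits needed, 20 available
one_nop = nops_needed(unused_bits=4, num_successors=2)       # 12 bits short
three_nops = nops_needed(unused_bits=0, num_successors=2, sig_bits=20)  # 40 bits short
```

Under these assumptions, a block needs padding only when its successor signatures do not fit in the bits its instructions already leave unused, which is why most of the cost lands on short blocks with two successors.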

slide-16
SLIDE 16

Argus-1 Error Coverage

Coverage results based on error injection experiments

  • 5000 test-runs, each with a single fault injected into a different randomly selected gate
  • Compare test program run to known correct execution
  • Test does not use configuration registers, interrupts, and exception logic

Argus detects 98.0% of transient and 98.8% of permanent errors that affected test program

Most undetected errors due to aliasing in operand parity

slide-17
SLIDE 17

Argus-1 Area Overhead

Synthesized with Synopsys Design Compiler using 250nm VTVT standard cell library

Laid out with Cadence Silicon Ensemble

Cache overhead estimated with CACTI

Component                 Overhead
Core                      16.6%
8KB, 2-way D-Cache        5.1%
8KB, 2-way I-Cache        0%
Argus-1 (Core+Caches)     10.6%

slide-18
SLIDE 18

Argus-1 Performance Overhead

No direct impact from checkers

  • Checkers work in parallel with regular execution and never stall the pipeline
  • CAD tools showed no increase in cycle time

Only impact is from padding blocks to embed signatures

  • One cycle penalty for each embedded NOP
  • Increased pressure on instruction cache

Performance results obtained by running MediaBench on the OR1K simulator

slide-19
SLIDE 19

Performance Overhead Graph

slide-20
SLIDE 20

Conclusions

Self-checking core can be built using a high-level “divide and conquer” approach

  • Correctness of this approach can be shown formally

Individual tasks can be checked using existing checkers with slight alterations

  • Result is a self-checking core with very low area and performance overhead

Not yet a complete solution for a self-checking chip

  • Missing error detection for exception and interrupt circuitry
  • Use multi-processor aware memory checker to build self-checking CMP