The Case for Run-Time Error Checking





Advanced Computer Architecture Lab University of Michigan The Case for Run-Time Correction Todd Austin

The Case for Run-Time Error Checking

Todd Austin Advanced Computer Architecture Lab University of Michigan



Classic Wins in Run-Time Error Correction


Run-Time Error Correction

[Figure: a block labeled "BFD" paired with a block labeled "Checker"]

Benefits of Run-Time Error Correction

  • Reduce design complexity/cost

– E.g., checker covers hard-to-find bugs, reduces time-to-market
– E.g., checker lends itself to full formal verification

  • Reduce manufacturing complexity/cost

– E.g., fully-testable checker covers defects missed in checked components
– E.g., on-chip checkers can be used as a high-B/W tester

  • Optimize a design by eliminating fault-avoidance margins/complexity

– E.g., Razor circuit operation at subcritical voltages
– E.g., iA32 checker covers partial memory forwards through virtual aliases

  • Correct run-time upsets

– E.g., cover SER events
– E.g., design for unlikely noise events (rather than "possible" noise events)


Deployment Challenges

  • Designer mind-set

– “This is a step backwards to go forward.”
– “This is a ‘sloppy’ approach.”

  • Remedies

– Growing GSRC mindshare
– Generate value-added applications
– Define desirable “baby steps” to ultimate goals


Functional Correction: DIVA Checker Processor

  • The fatalist’s approach to microprocessor verification!
  • Core technology: dynamic verification

– Simple (and correct) checker processor verifies all results before retirement
– Reduces the burden of correctness on the core processor design
– Checker can be simple by relying on core for branch/address predictions

  • Fundamentally changes the design of a complex microprocessor

– Beta release processors
– Low-cost SER protection

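[Figure: DIVA pipeline, labeled "Fault Tolerant". The core stages (IF, ID, REN, REG, SCHEDULER, EX/MEM) pass speculative instructions, in order, with their inputs and outputs, to the checker stage (CHK), which takes non-speculative inputs from Reg/Mem & Caches and is followed by CT.]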
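The dynamic-verification idea on this slide can be sketched in a few lines of Python. This is a toy model, not the DIVA implementation: the three-operation ISA, the `CoreResult` record, and the `check_and_retire` function are all hypothetical names introduced for illustration. The checker recomputes each result from committed (non-speculative) register state and lets only its own value retire.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical three-operation ISA, for illustration only.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

@dataclass
class CoreResult:
    """What the (possibly buggy) core hands to the checker for one instruction."""
    op: str
    dest: str
    srcs: Tuple[str, str]
    src_values: Tuple[int, int]  # operand values the core claims it used
    value: int                   # result the core claims it computed

def check_and_retire(stream: List[CoreResult], regs: Dict[str, int]) -> int:
    """Verify each result in program order before committing it.

    Returns the number of core errors that were caught and corrected.
    """
    corrected = 0
    for r in stream:
        # Inputs are read from committed state, never trusted from the core.
        actual_inputs = (regs[r.srcs[0]], regs[r.srcs[1]])
        expected = OPS[r.op](*actual_inputs)
        if r.src_values != actual_inputs or r.value != expected:
            corrected += 1  # a real design would also flush and restart the core
        regs[r.dest] = expected  # only the checker's value reaches committed state
    return corrected

regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 0}
stream = [
    CoreResult("add", "r3", ("r1", "r2"), (2, 3), 5),   # core is correct
    CoreResult("mul", "r4", ("r3", "r1"), (5, 2), 11),  # core bug: 5 * 2 is 10
]
errors = check_and_retire(stream, regs)
```

In this sketch the core's wrong result for `r4` never reaches committed state, which is the sense in which the checker reduces the burden of correctness on the core.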

Timing Correction: Razor Low-Power Pipeline

  • In-situ error detection and correction

– Delayed shadow latch secures a “second opinion” on all stage computation
– Detected errors corrected using microarchitecture speculation recovery mechanism
– Tune processor voltage based on error rate

  • Eliminate process, temperature, and safety margins
  • Purposely run below critical operation voltage to capture data margins
[Figure: percentage errors versus supply voltage (0.8 to 2.0), with the traditional DVS, zero margin, and sub-critical operating points marked.]

[Figure: Razor flip-flop. A main flip-flop (clocked by clk, input D1, output Q1) and a shadow latch (clocked by clk_del) feed a comparator that raises Error_L, combined into the Error signal.]
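The two mechanisms on this slide, shadow-latch detection and error-rate-driven voltage tuning, can be modeled roughly as follows. This is a behavioral sketch under stated assumptions, not the Razor circuit: `razor_stage` and `tune_voltage` are hypothetical names, the shadow latch is assumed to always meet timing, and the target error rate and voltage step are made-up values. Only the 0.8 to 2.0 voltage range is borrowed from the plot axis.

```python
def razor_stage(correct, stale, logic_delay, clock_period, shadow_delay):
    """Model one Razor flip-flop for one cycle.

    Returns (value_passed_downstream, error_flag).
    """
    # Assumption of this model: the delayed clock always gives the logic
    # enough time, so the shadow latch holds the correct value.
    if logic_delay > clock_period + shadow_delay:
        raise ValueError("shadow latch would also miss timing in this model")
    # The main flip-flop samples at the clock edge; if the logic has not
    # settled yet, it latches the stale value.
    main = correct if logic_delay <= clock_period else stale
    shadow = correct
    error = main != shadow
    # On an error the shadow value is restored; otherwise main equals shadow.
    return shadow, error

def tune_voltage(voltage, error_rate, target_rate=0.01, step=0.01,
                 v_min=0.8, v_max=2.0):
    """Raise the supply when errors are too frequent, lower it otherwise.

    target_rate and step are hypothetical tuning constants.
    """
    if error_rate > target_rate:
        voltage += step
    else:
        voltage -= step
    return min(v_max, max(v_min, voltage))

late = razor_stage(correct=42, stale=7, logic_delay=1.2,
                   clock_period=1.0, shadow_delay=0.5)
on_time = razor_stage(correct=42, stale=7, logic_delay=0.9,
                      clock_period=1.0, shadow_delay=0.5)
raised = tune_voltage(1.5, error_rate=0.05)
floored = tune_voltage(0.8, error_rate=0.0)
```

A real controller would likely filter the error rate over many cycles; the single-step rule here only illustrates the direction of the feedback.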

How Can the Simple Checker Keep Up?

Slipstream

  • Slipstream reduces power requirements of trailing car
  • Checker processor executes inside core processor’s slipstream

– fast moving air ⇒ branch/value predictions and cache prefetches
– Core processor slipstream reduces complexity requirements of checker
– Checker rarely sees branch mispredictions, data hazards, or cache misses
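A toy cycle count can make the slipstream argument concrete. The sketch below compares a simple in-order pipeline running alone with the same pipeline fed by the core's predictions and prefetches. Everything here is an assumption for illustration: the `Event` record, both functions, and the penalty values are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Hazards attached to one instruction, as a standalone pipeline sees them."""
    mispredicted_branch: bool = False
    cache_miss: bool = False
    core_hint_wrong: bool = False  # the core's prediction or prefetch failed too

# Hypothetical penalties in cycles.
BRANCH_PENALTY = 5
MISS_PENALTY = 20

def hazard_cost(e):
    """Stall cycles this instruction would cost an unassisted pipeline."""
    return BRANCH_PENALTY * e.mispredicted_branch + MISS_PENALTY * e.cache_miss

def cycles_standalone(trace):
    """Simple in-order pipeline with no help: pays for every hazard."""
    return sum(1 + hazard_cost(e) for e in trace)

def cycles_in_slipstream(trace):
    """Same pipeline behind the core: pays only when the core's hint was wrong."""
    return sum(1 + (hazard_cost(e) if e.core_hint_wrong else 0) for e in trace)

trace = [
    Event(),
    Event(mispredicted_branch=True),
    Event(cache_miss=True),
    Event(cache_miss=True, core_hint_wrong=True),
]
alone = cycles_standalone(trace)
drafting = cycles_in_slipstream(trace)
```

With these made-up penalties the unassisted pipeline needs 49 cycles for the four-instruction trace and the slipstreamed one needs 24; the absolute numbers mean nothing, only the fact that the checker stalls solely on the core's failures.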
