Argus: Low-cost, Comprehensive Error-Detection for Simple Cores - PowerPoint PPT Presentation

Argus: Low-cost, Comprehensive Error-Detection for Simple Cores
Albert Meixner, Michael Bauer, Daniel Sorin
Duke University

Introduction
• Hardware error rates are expected to rise as CMOS shrinks
• Online error detection techniques can keep errors from propagating to the application
  - Dual Modular Redundancy
  - Redundant Multithreading
  - Checker cores (DIVA)
• Existing techniques are overly expensive for small, simple cores
  - Simple cores dominate embedded market
  - Throughput-oriented CMPs utilize simple cores (Sun UltraSPARC T1)

Argus Goals and Approach
• Goal: Detect both transient and permanent errors in simple cores at low cost
• Approach: Decompose program execution into four high-level tasks and check them independently
  - Control Flow, Data Flow, Computation, Memory Access
• Advantages of high-level decomposition
  - Checkers exploit task-specific properties to reduce cost
  - Unlike per-component checkers, tasks are abstract and implementation-independent

Task Decomposition
[Figure: the dynamic instruction stream (example instruction: r1 ← r3+r4) decomposed into four tasks]
• Control Flow: correct instruction selected for execution
• Data Flow: inputs and result data passed correctly
• Computation: given correct inputs, operation selected and result computed correctly
• Memory: data transferred correctly from and to memory

From Theory to Practice
[Figure: Tasks (Control Flow, Data Flow, Computation, Memory) map through Formal Definition to Ideal Checkers (CF, DF, CC, MC), which map through Hardware Design to Hardware Checkers (CF Checker, DF Checker, Computation Checker, Memory Checker)]
• Completeness Proof: “Checkers ensure correct execution”
• Equivalence Proof: “Argus-1 checkers are equivalent to ideal checkers”

Limitations
• Completeness Proof assumes no interrupts, exceptions, or I/O
  - Single fault assumption
• Equivalence Proof holds under limiting assumptions
  - Equivalence only holds at block boundary
  - Perfect checksums (no aliasing)
  - Known coverage hole in memory checker
• When assumptions are violated, errors can go undetected

Outline
• Introduction
• Basic Argus concept
• Argus-1 checker designs
• Argus-1 implementation and evaluation
• Conclusions

Control Flow Checker
• Similar to prior control flow checkers
• Assign each basic block an address-independent ID
  - ID computed from block contents
• Embed IDs of legal successors in each block
  - Most blocks have one or two legal successors
  - Pick correct ID at runtime
• Indirect branch addresses are more challenging
  - See paper
[Figure: example code and its control flow graph. Block A (label loop) ends with "bnez r1, L1"; block B ends with "j L2"; block C starts at L1; block D starts at L2 and ends with "bez r2, loop"; block E ends with "ret". Edges: A → B, A → C, B → D, C → D, D → A, D → E.]
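The successor-ID scheme on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Argus-1 hardware: the instruction text inside each block is made up, a CRC-32 of the block text stands in for the real ID function (on the slides the ID is the block's data flow signature), and the names block_id, embed_successor_ids, and check_trace are hypothetical.

```python
import zlib

# Hypothetical program shaped like the slide's example (blocks A to E).
# The instructions inside each block are invented for illustration.
BLOCKS = {
    "A": ["add r3, r1, r2", "bnez r1, L1"],
    "B": ["sub r4, r3, r1", "j L2"],
    "C": ["mul r5, r3, r2"],
    "D": ["add r2, r4, r5", "bez r2, loop"],
    "E": ["ret"],
}
# Legal successors per block, listed as [fall-through, branch taken].
SUCCESSORS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "A"], "E": []}


def block_id(instructions):
    """ID computed from block contents only (CRC-32 is an assumed stand-in)."""
    return zlib.crc32("\n".join(instructions).encode())


def embed_successor_ids(blocks, successors):
    """Compile-time step: each block carries the IDs of its legal successors."""
    return {name: [block_id(blocks[s]) for s in successors[name]]
            for name in blocks}


def check_trace(blocks, embedded, trace):
    """Runtime step. trace is a list of (block name, successor choice) pairs.

    The choice index picks which embedded ID is expected next; None marks
    the last block. Returns False as soon as an executed block's ID differs
    from the ID its predecessor announced.
    """
    expected = None
    for name, choice in trace:
        actual = block_id(blocks[name])
        if expected is not None and actual != expected:
            return False
        expected = embedded[name][choice] if choice is not None else None
    return True
```

For example, the path A, C, D, E passes the check, while a faulty jump from A (not taken, so B is expected) into C is flagged because C's ID does not match the ID that A announced.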

Data Flow Checker
• Based on “Dynamic Dataflow Verification”
  - Presented at PACT 2007
  - Compiler computes reference data flow signatures for basic blocks
  - Data flow checker tracks actual data flow and compares to reference
• Data flow signatures are used as block IDs for control flow checker
[Figure: a basic block (sub r4, r2, r3; mul r5, r2, r6; add r3, r5, r4) and its dataflow graph from inputs r2, r3, r6 to outputs r4, r3, r5, which yields the dataflow signature. Values are protected with EDC.]
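To make the idea of a data flow signature concrete, the following sketch computes one for the slide's three-instruction block. It is an assumption-laden illustration, not the signature function used by Argus-1 or by the PACT 2007 work: each register carries a "history" value, every instruction replaces its destination's history with a hash of the opcode and the source histories, and the block signature hashes all final histories. NUM_REGS, the CRC-32 hash, and the function name dataflow_signature are all hypothetical.

```python
import zlib

NUM_REGS = 8  # hypothetical small register file (r0 to r7)


def dataflow_signature(block):
    """Signature of a basic block's data flow.

    block is a list of (opcode, dest, src1, src2) tuples using register
    numbers. The result depends on which values flow into which registers,
    not on the order of independent instructions.
    """
    # Every register starts the block with a distinct initial history.
    history = [zlib.crc32(f"init r{r}".encode()) for r in range(NUM_REGS)]
    for opcode, dest, src1, src2 in block:
        text = f"{opcode} {history[src1]} {history[src2]}"
        history[dest] = zlib.crc32(text.encode())
    # Fold all final register histories into one signature.
    signature = 0
    for reg, value in enumerate(history):
        signature = zlib.crc32(f"{reg}:{value}".encode(), signature)
    return signature


# The basic block from the slide.
REFERENCE_BLOCK = [
    ("sub", 4, 2, 3),  # sub r4, r2, r3
    ("mul", 5, 2, 6),  # mul r5, r2, r6
    ("add", 3, 5, 4),  # add r3, r5, r4
]
```

In this model the compiler would store dataflow_signature(REFERENCE_BLOCK) as the reference, and the checker would recompute the signature from the registers actually read and written. A fault that makes mul read r3 instead of r6 changes the signature, whereas swapping the independent sub and mul instructions leaves it unchanged.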

Computation Checkers
• Not a single monolithic checker, but multiple sub-checkers for different operations
  - Large amounts of prior work on computation checking
• Operations are checked using redundant hardware
  - Exploit that checking computation is often easier than performing it
• Multiply checker trades coverage for cost
  - Replay modulo 31
  - Non-zero probability of missing errors due to aliasing
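The "replay modulo 31" bullet can be read as a standard residue check, which the sketch below models in software. The slides do not give the hardware details, so this is an assumed illustration: the function names residue and multiply_checks_out are hypothetical, and the digit-summing trick is just one common way to compute a residue modulo 2^5 - 1 cheaply.

```python
MODULUS = 31  # 2**5 - 1, the modulus named on the slide


def residue(x: int) -> int:
    """x mod 31 for a non-negative integer, by summing 5-bit digits.

    Because 32 is congruent to 1 modulo 31, the sum of the base-32 digits
    of x has the same residue as x, so no divider is needed.
    """
    while x > MODULUS:
        digit_sum = 0
        while x:
            digit_sum += x & 0x1F
            x >>= 5
        x = digit_sum
    return 0 if x == MODULUS else x


def multiply_checks_out(a: int, b: int, product: int) -> bool:
    """Replay the multiplication on residues and compare with the result."""
    return residue(product) == (residue(a) * residue(b)) % MODULUS
```

A single flipped bit changes the product by a power of two, which is never a multiple of 31, so the check catches it. An error that happens to be a multiple of 31 leaves the residue unchanged and goes unnoticed, which is the aliasing the slide refers to.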

Memory Checker
• Data corruption detected using parity
• Addressing errors are transformed into data corruption
  - Error in cache logic transforms access to address A into access to address B
  - No storage overhead, addresses are embedded into data words
• Address computation and alignment errors are detected by redundant computation checkers
• Stores that don’t update the cache are not detected
  - Unlikely error scenario, high-level fixes are expensive
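One way to picture "addresses are embedded into data words" is to XOR the address into the stored word while keeping a parity bit over the original data. The slides do not specify the exact encoding, so the sketch below is an assumption: the class CheckedMemory, its methods, and the actual_addr fault-injection parameter are hypothetical, and a single parity bit is used for simplicity.

```python
def parity(x: int) -> int:
    """Parity (0 or 1) of the set bits in x."""
    return bin(x).count("1") & 1


class CheckedMemory:
    """Toy memory that turns addressing errors into parity errors."""

    def __init__(self):
        self.cells = {}  # address -> (data XOR address, parity of data)

    def store(self, addr: int, data: int) -> None:
        # The address is folded into the word, so no extra storage is used.
        self.cells[addr] = (data ^ addr, parity(data))

    def load(self, addr: int, actual_addr=None) -> int:
        """Load from addr; actual_addr models a fault that reads elsewhere."""
        location = addr if actual_addr is None else actual_addr
        word, stored_parity = self.cells[location]
        data = word ^ addr  # undo the embedding with the intended address
        if parity(data) != stored_parity:
            raise RuntimeError("memory check failed")
        return data
```

With this simplified encoding, reading the word stored at B while intending A is detected only when A and B differ in an odd number of bits; otherwise the parity still matches and wrong data is returned, which is one concrete form of aliasing.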

Argus-1 Core Specs
• Based on Verilog model of OpenRISC 1200 core
  - 4-stage, single-issue, 32-bit RISC CPU
  - Fully functional, open source core from opencores.org
• Removed unnecessary features to obtain a minimal core
  - TLBs, advanced interrupt controller, debug unit
  - Worst case for Argus-1 area overhead
• GCC 3.4 used to compile benchmarks
  - Patch from opencores.org adds OpenRISC support

Argus-1 Pipeline Overview
[Figure: pipeline diagram; legend: Original, Argus]

Argus-1 Compilation Tool Chain
[Figure: tool chain stages: compile, pad, assemble, link, sign; legend: Original, Argus]
• Embed signatures used for data and control flow checking
• To minimize code bloat, signatures are embedded in unused instruction bits
  - Blocks with insufficient unused bits padded with NOPs
• Signatures are embedded after linking
  - Compute data flow signatures for each block
  - Determine legal successor blocks
  - Embed signatures of legal successors into unused bits

Argus-1 Error Coverage
• Coverage results based on error injection experiments
  - 5000 test runs, each with a single fault injected into a different randomly selected gate
  - Compare test program run to known correct execution
  - Test does not use configuration registers, interrupts, and exception logic
• Argus detects 98.0% of transient and 98.8% of permanent errors that affected test program
• Most undetected errors due to aliasing in operand parity

Argus-1 Area Overhead
• Synthesized with Synopsys Design Compiler using 250nm VTVT standard cell library
• Laid out with Cadence Silicon Ensemble
• Cache overhead estimated with CACTI

Component                 Overhead
Core                      16.6%
8KB, 2-way D-Cache        5.1%
8KB, 2-way I-Cache        0%
Argus-1 (Core+Caches)     10.6%

Argus-1 Performance Overhead
• No direct impact from checkers
  - Checkers work in parallel with regular execution and never stall the pipeline
  - CAD tools showed no increase in cycle time
• Only impact is from padding blocks to embed signatures
  - One cycle penalty for each embedded NOP
  - Increased pressure on instruction cache
• Performance results obtained by running MediaBench on the OR1K simulator

Performance Overhead Graph
[Figure: performance overhead graph]

Conclusions
• Self-checking core can be built using a high-level “divide and conquer” approach
  - Correctness of this approach can be shown formally
• Individual tasks can be checked using existing checkers with slight alterations
  - Result is a self-checking core with very low area and performance overhead
• Not a complete solution for self-checking chip, yet
  - Missing error detection for exception and interrupt circuitry
  - Use multi-processor aware memory checker to build self-checking CMP
