
A Software Approach for the Detection of Transient Errors Occurring in Processor-based Digital Architectures: Principles and Experimental Results Dr. Raoul Velazco Director of Researches at CNRS Co-leader of ARIS (Architectures Robust of


SLIDE 1

Raoul Velazco – TIMA - ARIS 1

A Software Approach for the Detection of Transient Errors Occurring in Processor-based Digital Architectures: Principles and Experimental Results

Dr. Raoul Velazco
Director of Research at CNRS
Co-leader of ARIS (Architectures Robust of Integrated circuits and Systems)
TIMA Labs, Grenoble, France

SLIDE 2

Outline

  • Introduction
  • State of the Art
  • Methodology for error detection
  • Formal evaluation of proposed error detection technique
  • Automatic Generation of Hardened Programs
  • Experimental Results
  • Conclusion and Perspectives
SLIDE 3

Context & Motivation - 1/2

  • Miniaturization due to the constant improvements achieved in microelectronics technology
    – Increased sensitivity to environmental effects (e.g. radiation, EMC, temperature)
  • Processors operating in space are subject to different radiation phenomena:
    – Permanent: dose effects
      • caused by the cumulated charges trapped in the oxide
    – Transient: SEE (Single Event Effects)
      • caused by the impact of a charged particle on a sensitive area of an integrated circuit

SLIDE 4

Context & Motivation - 2/2

  • SEL (Single Event Latchup): provokes short-circuits between power supply and ground
    – destructive if not detected in time
  • SEU (Single Event Upset): provokes an unexpected modification of a memory cell's content
    – non-destructive
    – consequences depend on the nature of the perturbed cell and the time of occurrence
  • SEU effects
    – incorrect computation
    – system crash

SLIDE 5

Objective & Contributions

Objective

  – SIFT (Software Implemented Fault Tolerance) technique
    • Efficient: high error detection capacity
    • Generic: not hardware dependent
    • Automatable: fast generation of hardened applications
    • Application domain: high-level SW specification

Contributions

  – Improvement of an existing SIFT technique
  – Automatic flow to generate hardened applications
  – Validation for different applications on several processors
    • Fault injection experiments
    • Radiation campaigns
SLIDE 6

Outline

  • Introduction
  • State of the Art
  • Methodology for error detection
  • Formal evaluation of proposed error detection technique
  • Automatic Generation of Hardened Programs
  • Experimental Results
  • Conclusion and Perspectives
SLIDE 7

State of the Art

  • Hardware approaches:
    – hardware implementation of the detection mechanism
  • Hardware/Software approaches:
    – hardware and software implementation of the detection mechanism
  • Software approaches:
    – software implementation of the detection mechanism

SLIDE 8

Hardware Approaches

  • Design Hardening: modification of the design by suitable techniques to allow the manufacturing of reliable circuits
    – Logic Gates
    – Hardened Memory Cells
  • Error Correction Code: adding dedicated circuits for error detection or/and correction for memory cells
    – Hamming Code
    – CRC Code
  • Limitations:
    – need hardware modification
    – not systematic
    – expensive

SLIDE 9

SW/HW approaches

  • Recovery Block:
    – a primary software module
    – alternative software modules (having the same functionality as the primary module)
    – an acceptance test
  • N Version Programming:
    – N versions of the same application
      • running in parallel
    – a voter to decide the correct output
      • the majority of the outputs
  • Limitations:
    – need hardware channels
    – application dependent
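
The voter used in N Version Programming can be illustrated with a minimal sketch for N = 3. The function name `vote3` and the choice of integer outputs are assumptions made for illustration; the slide does not describe a concrete voter.

```c
#include <stdbool.h>

/*
 * Majority voter for three versions of the same computation (hypothetical
 * helper, not taken from the slides). Returns true and stores the majority
 * value in *out when at least two outputs agree; returns false otherwise.
 */
bool vote3(int a, int b, int c, int *out)
{
    if (a == b || a == c) {   /* a agrees with at least one other version */
        *out = a;
        return true;
    }
    if (b == c) {             /* a is the outlier */
        *out = b;
        return true;
    }
    return false;             /* no majority: the error cannot be masked */
}
```

With outputs 7, 7 and 9 the voter selects 7; with three different outputs it reports that no majority exists.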

SLIDE 10

Software Approaches

  • ABFT (Algorithm-Based Fault Tolerance): well suited for applications using regular structures
  • Assertions: insertion of logic statements at different points of the program
  • Control Flow Checking: based on signature analysis
    – the program is decomposed into branch-free blocks
    – online check of the signature against a golden (pre-computed) one
  • PdT Error Detection Technique:
    – transformation rules for error detection introducing redundancy at:
      • data segment
      • program code
SLIDE 11

Outline

  • Introduction
  • State of the Art
  • Methodology for error detection
  • Formal evaluation of proposed error detection technique
  • Automatic Generation of Hardened Programs
  • Experimental Results
  • Conclusion and Perspectives
SLIDE 12

Proposed Approach

  • Purely software approach based on a set of transformation rules
  • The set of rules is derived from:
    – the PdT Error Detection Technique
  • The proposed set of rules allows:
    – improvement of the error detection capacity
    – reduction of the time penalty and the memory space overhead
  • Three sets of rules are applied to the target program:
    – Data Duplication: targets errors affecting data
    – Global Execution Flow: targets errors affecting basic instructions
    – Branching Duplication: targets errors affecting control instructions
SLIDE 13

Basic Concepts

  • Program: set of specific operations executed by a processor
  • Characteristic elements of a program:
    – Data:
      • input
      • intermediary
      • output
    – Instructions:
      • basic: they do not change the execution flow
        – logic operations (e.g. OR, AND, XOR, NOT)
        – arithmetic operations (e.g. addition, multiplication, division)
        – data transfer (e.g. MOV Reg,Mem and vice-versa)
      • control: they allow modification of the execution flow
        – conditional (e.g. test instructions)
        – unconditional (e.g. calls and returns from procedures)

SLIDE 14

PdT - Data Duplication

  • Every variable is duplicated
  • Every write operation performed on the original variable is repeated for its replica
  • After each read operation on a variable, a consistency check is introduced between the values of the two variables (original and duplicated)
  • Limitations:
    – output variables are not checked for consistency
      • errors may not be detected
    – time and memory overhead increase in direct proportion to the complexity of the operations
SLIDE 15

Improved Rules - Data Duplication

  • Identification of the relationships between the variables
  • Classification of the variables according to their role in the program
    – intermediary variables: they are used for the calculation of other variables
    – final variables: they do not take part in the calculation of any other variable
  • Every variable is duplicated
  • Every write operation performed on the original variable is repeated for its replica
  • After any write operation on a final variable, a consistency check is introduced between the values of the two variables (original and duplicated)

Legend

  • Added rules
  • Modified rules
SLIDE 16

Data Duplication - Example

Example of program:

a = b + 2
c = a + b*6

[Figure: data-flow graph of the example, linking b, 2, a, 6 and c through the operations +, * and +]

a and b are intermediary variables; c is a final variable.

Applying the proposed rules:

a1 = b1 + 2
a2 = b2 + 2
c1 = a1 + b1*6
c2 = a2 + b2*6
if(c1 != c2) error()
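
The hardened example above can be turned into a small compilable sketch. The function name `hardened_compute`, the flag `dd_error_detected` and the `flip_mask` parameter (which emulates an upset in the replica of `a`) are assumptions added for illustration; they are not part of the slide.

```c
#include <stdbool.h>

/* Hypothetical flag standing in for the slide's error() routine. */
static bool dd_error_detected = false;

static void dd_error(void)
{
    dd_error_detected = true;
}

/*
 * Hardened version of:  a = b + 2;  c = a + b*6;
 * flip_mask is XORed into the replica a2 to emulate a bit flip
 * (0 means fault-free execution).
 */
int hardened_compute(int b, int flip_mask)
{
    int b1 = b, b2 = b;      /* every variable is duplicated            */
    int a1 = b1 + 2;         /* write on the original ...               */
    int a2 = b2 + 2;         /* ... repeated for its replica            */

    a2 ^= flip_mask;         /* emulated upset in an intermediary value */

    int c1 = a1 + b1 * 6;
    int c2 = a2 + b2 * 6;

    if (c1 != c2)            /* check only after writing the final variable */
        dd_error();
    return c1;
}
```

No check is placed on `a`, because it is an intermediary variable: a corrupted `a2` propagates to `c2` and is caught by the single comparison on `c`. In a real build, the two copies would typically need protection against compiler optimizations that merge redundant computations (for example `volatile` qualifiers or disabled optimization).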

SLIDE 17

PdT - Global Execution Flow

  • An integer value ki is associated with every block i in the code
  • A global execution check flag variable gef is defined
  • A statement assigning to gef the value of ki is introduced at the beginning of every block i
  • A test on the value of gef is also introduced at the end of the block
  • Limitations:
    – incorrect jumps into the same block are not detected
    – incorrect jumps to the beginning of another block are not detected
    – abnormal "resets" of the application are not detected

SLIDE 18

Improved Rules - Global Execution Flow

  • Identification of the maximum size blocks in the program
  • Decomposition of the maximum size blocks into basic blocks, according to the number of instructions and the instructions' complexity (computation volume)
  • A global execution check flag gef is defined, in order to associate an identifier with each basic block
  • A boolean variable status_block is defined
  • An integer value ki is associated with every basic block i
  • A statement assigning to gef the value of ki XOR status_block is introduced at the beginning of every basic block i
  • A test on the value of gef is also introduced at the end of each basic block

Legend

  • Added rules
  • Modified rules
SLIDE 19

Global Execution Flow - Example

Example of program (statement order as recovered from the flattened slide figure):

i = 10
j = 2
k = 3
n = (i + j*6)/(k + 6)
m = i*5 - j*6
n = (i + 4)*k + j/5
Goto Label
m = n*3 + i*j
k = i + 3
Label: i = 2

Applying the proposed rules, the program is decomposed into five basic blocks (the assignment of statements to blocks is not recoverable from the figure). Each basic block i starts with an assignment to gef and ends with a test:

gef = 1 ^(status_block ^= 1) // ki = 1
if(gef != 1 && status_block ^= 1) error()
gef = 2 ^(status_block ^= 1) // ki = 2
if(gef != 2 && status_block ^= 1) error()
gef = 3 ^(status_block ^= 1) // ki = 3
if(gef != 3 && status_block ^= 1) error()
gef = 4 ^(status_block ^= 1) // ki = 4
if(gef != 4 && status_block ^= 1) error()
gef = 5 ^(status_block ^= 1) // ki = 5
if(gef != 5 && status_block ^= 1) error()
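
The gef / status_block mechanism can be sketched in compilable C as follows. This is one possible reading of the slide's pseudo-code, not the author's exact rule: it assumes status_block is 0 between blocks, toggles to 1 on block entry, and toggles back to 0 after the exit test. The helper names `block_enter`, `block_exit`, `run_blocks` and the `wrong_jump` parameter are hypothetical.

```c
#include <stdbool.h>

static int  gef = 0;                 /* global execution check flag          */
static int  status_block = 0;        /* assumed: 0 between blocks, 1 inside  */
static bool gef_error_detected = false;

/* Entry of basic block ki: toggle status_block, then gef = ki XOR status_block. */
static void block_enter(int ki)
{
    status_block ^= 1;
    gef = ki ^ status_block;
}

/* Exit of basic block ki: test gef, then toggle status_block back. */
static void block_exit(int ki)
{
    if (gef != (ki ^ 1) || status_block != 1)
        gef_error_detected = true;
    status_block ^= 1;
}

/*
 * Two basic blocks. wrong_jump emulates control leaving block 1 before its
 * exit test and landing on the first instruction of block 2.
 */
int run_blocks(bool wrong_jump)
{
    int i, j, m;

    gef = 0;
    status_block = 0;
    gef_error_detected = false;

    block_enter(1);
    i = 10;
    j = 2;
    if (!wrong_jump)
        block_exit(1);

    block_enter(2);
    m = i * 5 - j * 6;
    block_exit(2);

    return m;
}
```

Under this reading, an erroneous jump to the beginning of another block is caught because the extra toggle leaves status_block at 0 inside block 2, so gef no longer matches the value expected at the exit test. A plain `gef = ki` assignment, as in the original PdT rule, would not catch this case.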

SLIDE 20

PdT - Branching Duplication

Conditional Branching

  • For every test statement, the test condition is repeated at the beginning of the target block for both the "true" and (possible) "false" clause

Unconditional Branching

  • An integer value kj is associated with any procedure j in the code
  • Immediately before every return statement of the procedure, the value kj is assigned to gef
  • A test on the value of gef is also introduced after any call to the procedure

SLIDE 21

Improved Rules - Branching Duplication

Conditional Branching

  • For every test statement, the test condition is repeated at the beginning of the target block for both the "true" and (possible) "false" clause

Unconditional Branching

  • A global flag carf is defined in the program
  • An integer value kj is associated with any procedure j in the code
  • At the beginning of each procedure, the value kj is assigned to carf
  • Before each function call, a check on the carf value of the calling function is introduced
  • After each function call, a check on the carf value of the called function is introduced

Legend

  • Added rules
  • Modified rules
SLIDE 22

Conditional Branching Duplication - Example

[Figure: flowcharts of the original and hardened programs. Original program: a condition selects Block A (TRUE) or Block B (FALSE). Hardened program: the condition is evaluated again at the entry of Block A and of Block B; a result that contradicts the branch taken leads to ERROR.]
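
A minimal sketch of conditional branching duplication, written for a hypothetical absolute-value function. The names `hardened_abs` and `bd_error_detected` and the `force_wrong_branch` parameter (which emulates an upset that sends control to the wrong clause) are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical flag standing in for the ERROR exit of the flowchart. */
static bool bd_error_detected = false;

int hardened_abs(int x, bool force_wrong_branch)
{
    bool taken = (x < 0);          /* original test                        */
    if (force_wrong_branch)
        taken = !taken;            /* emulated upset on the branch outcome */

    if (taken) {
        if (!(x < 0)) {            /* condition repeated in the TRUE clause  */
            bd_error_detected = true;
            return x;
        }
        return -x;                 /* Block A */
    } else {
        if (x < 0) {               /* condition repeated in the FALSE clause */
            bd_error_detected = true;
            return x;
        }
        return x;                  /* Block B */
    }
}
```

The repeated test costs one extra comparison per executed branch, which is why the formal evaluation counts the conditional control instructions twice.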

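SLIDE 23

Unconditional Branching Duplication - Example

[Figure: call sequence in the original and hardened programs. Original program: the calling function calls the called function, which returns. Hardened program, with Kj = 12 for the calling function and Kj = 56 for the called function: the calling function sets carf = 12 and performs check(carf=12) before the call; the called function sets carf = 56; after the return, the calling function performs check(carf=56) and sets carf = 12 again.]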
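
The call sequence of the figure can be sketched as below, reusing its identifiers 12 and 56. The function names, the `carf_check` helper and the `skip_call` parameter (which emulates a call that never reaches the called function) are hypothetical.

```c
#include <stdbool.h>

enum { K_CALLER = 12, K_CALLED = 56 };   /* kj values from the slide */

static int  carf = 0;                    /* global call/return flag  */
static bool carf_error_detected = false;

static void carf_check(int expected)
{
    if (carf != expected)
        carf_error_detected = true;
}

static int called_function(int x)
{
    carf = K_CALLED;          /* at the beginning of each procedure */
    return x + 1;
}

int calling_function(int x, bool skip_call)
{
    int y = x;

    carf = K_CALLER;          /* at the beginning of each procedure     */
    carf_check(K_CALLER);     /* before the call: id of the caller      */
    if (!skip_call)
        y = called_function(x);
    carf_check(K_CALLED);     /* after the call: id of the called one   */
    carf = K_CALLER;          /* restore the caller's id, as in the figure */
    return y;
}
```

If the call is skipped, carf still holds 12 when the post-call check expects 56, so the incorrect branching is flagged.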

SLIDE 24

Outline

  • Introduction
  • State of the Art
  • Methodology for error detection
  • Formal evaluation of proposed error detection technique
  • Automatic Generation of Hardened Programs
  • Experimental Results
  • Conclusion and Perspectives
SLIDE 25

Formal Evaluation - Basic Concepts

  • Program execution time:

$$T_P = \sum_{i=1}^{n} k_i\, t_i \qquad (1)$$

$$T_P = \sum_{i=1}^{n} k_i\, T_{li} + \sum_{i=1}^{m} k_i\, T_{ci} \qquad (2)$$

  • $\varphi = dd \cup bd \cup gef$: the set of proposed rules
  • $\Psi$: the hardening function; transformation of a target program into a hardened one
  • $\theta$: penalty factor; the time overhead introduced by applying a set of rules to the original program

SLIDE 26

Application of hardening function Ψ|dd

  • Affects the read/write operations on the variables
    – duplication of all the operations
    – consistency check performed only for final variables

$T_o$: execution time of the original program

$$T_o = \sum_{i=1}^{n} k_i\, t_i$$

$T_{dd}$: execution time of the program obtained by applying the Ψ|dd transformation

$$T_{dd} = 2 \sum_{i=1}^{n} k_i\, t_i + \sum_{i=1}^{m} k_i\, t_{cvi}$$

Penalty factor θ

$$T_{dd} = T_o \left( 2 + \frac{1}{\mu} \right), \quad \text{where } \mu = \frac{T_o}{T_V} > 1 \text{ and } T_V = \sum_{i=1}^{m} k_i\, t_{cvi}$$

$$T_{dd} = T_o\,(2 + \varepsilon_{dd}), \quad \text{where } \varepsilon_{dd} \in (0, 1)$$

Concrete case:

  • Intel 80C51: ε = 0.10
  • DSP32C Lucent: ε = 0.09
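
The relation between the original execution time, the data-duplication execution time and the penalty term ε can be checked numerically with a short sketch. The function names and the sample timings are hypothetical; only the formulas (original time as the sum of k_i t_i, hardened time as twice that sum plus the consistency-check time) come from the slide.

```c
#include <stddef.h>

/* Sum of k[i] * t[i]: execution counts times per-operation durations. */
double total_time(const unsigned *k, const double *t, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += k[i] * t[i];
    return sum;
}

/*
 * Penalty term epsilon_dd such that T_dd = T_o * (2 + epsilon_dd),
 * with T_dd = 2*T_o + T_V and T_V the total consistency-check time.
 */
double dd_epsilon(const unsigned *k, const double *t, size_t n,
                  const unsigned *kc, const double *tcv, size_t m)
{
    double t_o  = total_time(k, t, n);      /* original program         */
    double t_v  = total_time(kc, tcv, m);   /* checks on final variables */
    double t_dd = 2.0 * t_o + t_v;          /* hardened program         */
    return t_dd / t_o - 2.0;
}
```

For example, with two operations executed 10 and 5 times taking 2 and 4 time units, and one check executed twice taking 2 units, the original time is 40, the check time is 4, and ε is 0.1, the same order of magnitude as the values reported for the two processors.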

SLIDE 27

Application of hardening function Ψ|gef

  • Decomposition of the program into basic blocks
    – introduction of $T_{cgef}$, the time spent to check the correct execution flow

$T_o$: execution time of the original program

$$T_o = \sum_{i=1}^{n} k_i\, T_{li} + \sum_{j=1}^{m} k_j\, T_{cj}$$

where n is the number of basic blocks and m the number of control instructions.

$T_{gef}$: execution time of the program obtained by applying the Ψ|gef transformation

$$T_{gef} = \sum_{i=1}^{n} k_i\,(T_{li} + T_{cgef}) + \sum_{j=1}^{m} k_j\, T_{cj}$$

Penalty factor θ

$$T_{gef} = T_o \left( 1 + \frac{T_{CB}}{T_o} \right), \quad \text{where } T_{CB} = \sum_{i=1}^{n} k_i\, T_{cgef}$$

$$T_{gef} = T_o\,(1 + \varepsilon_{gef}), \quad \text{where } \varepsilon_{gef} \in (0, 1)$$

Concrete case:

  • Intel 80C51: ε = 0.11
  • DSP32C Lucent: ε = 0.13

SLIDE 28

Application of hardening function Ψ|bd

  • Affects the control instructions
    – repeating the test condition (conditional branch)
    – introduction of $T_{carf}$, the time spent to check the correct branching (unconditional branch)

$T_o$: execution time of the original program

$$T_o = \sum_{i=1}^{n} k_i\, T_{li} + \sum_{j=1}^{l} k_j\, T_{ccj} + \sum_{j=1}^{k} k_j\, T_{cij}$$

where n is the number of basic blocks, l the number of conditional control instructions and k the number of unconditional control instructions.

$T_{bd}$: execution time of the program obtained by applying the Ψ|bd transformation

$$T_{bd} = \sum_{i=1}^{n} k_i\, T_{li} + 2 \sum_{j=1}^{l} k_j\, T_{ccj} + \sum_{j=1}^{k} k_j\,(T_{cij} + T_{carf})$$

Penalty factor θ

$$T_{bd} = T_o\,(1 + B_c + B_i), \quad \text{where } B_c = \frac{1}{T_o} \sum_{j=1}^{l} k_j\, T_{ccj} \text{ and } B_i = \frac{1}{T_o} \sum_{j=1}^{k} k_j\, T_{carf}$$

$$T_{bd} = T_o\,(1 + \varepsilon_{bd}), \quad \text{where } \varepsilon_{bd} \in (0, 1)$$

Concrete case:

  • Intel 80C51: ε = 0.46
  • DSP32C Lucent: ε = 0.46

SLIDE 29

Outline

  • Introduction
  • State of the Art
  • Methodology for error detection
  • Formal evaluation of proposed error detection technique
  • Automatic Generation of Hardened Programs
  • Experimental Results
  • Conclusion and Perspectives
SLIDE 30

C2C Translator

Software Tool

  • High-level specification program
  • Parameters:
    – Input: target program code
    – Output: the hardened version of the target program
    – Options:
      • hardening type
      • optimization type
  • Flexible
    – the sets of rules are implemented independently
      • Data Duplication (DD)
      • Global Execution Flow (GEF)
      • Branching Duplication (BD)

[Figure: the Original Program and the Options feed the C2C Translator, which outputs the Hardened Program]

SLIDE 31

Automatic generation flow of hardened programs

[Figure: flow chart from the Input Program to the Hardened Program, driven by the Hardening Options (DD, BD, GEF) and the Optimization Options (check all variables, granularity size)]

  • DD: identification of the variables, then identification of the variables' relationships and variable classification, then insertion rules (DD)
  • BD: identification of the control instructions, then insertion rules (BD)
  • GEF: identification of the maximum block size, then decomposition in basic blocks, then insertion rules (GEF)

SLIDE 32

Outline

  • Introduction
  • State of the Art
  • Methodology for error detection
  • Formal evaluation of proposed error detection technique
  • Automatic Generation of Hardened Programs
  • Experimental Results
  • Conclusion and Perspectives
SLIDE 33

Experimental Results

  • Evaluation of the proposed approach
    – Fault injection experiments
    – Radiation testing campaigns
  • THESIC System
    – 80C51 microcontroller based architecture (motherboard)
    – DUT (Device Under Test) daughterboard
  • Different target processors
    – Intel 80C51
    – DSP32C Lucent

SLIDE 34

Errors Classification

No Effect

  • Effect Less: the error has no effect on the program behavior

Detected

  • SW Detection: the error is detected by the implemented error detection mechanism (the transformation rules)
  • HW Detection: the error triggers some hardware mechanism (e.g. illegal instruction, division by zero)

Undetected

  • Loss Sequence: the program triggers a time-out condition (e.g. endless loop)
  • Incorrect Results: the error is not detected in any way and the final result differs from the expected one

SLIDE 35

Detection Efficiency & Error Rate

  • Detection Efficiency

$$\xi = \frac{D}{D + E + L + H}$$

  • Error Rate

$$\tau = \frac{E + L}{\#\,\text{Injected Faults}}$$

where:

  – D: the number of errors detected by the proposed rules (SW Detection)
  – H: the number of hardware detected errors (HW Detection)
  – E: the number of errors escaping the detection rules (Incorrect Results)
  – L: the number of errors producing a system crash (Loss Sequence)
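
The two metrics are simple ratios and can be computed as in the sketch below. The function names are hypothetical. As a sanity check, reading the hardened-CMA register injection figures reported in this deck as D = 3371, H = 591, E = 109, L = 9 with 52800 injected faults gives a detection efficiency of about 82.6 % and an error rate of about 0.22 %, which matches the reported percentages.

```c
/*
 * Detection efficiency: share of software-detected errors among all
 * faults that had an effect (D + H + E + L).
 */
double detection_efficiency(unsigned d, unsigned h, unsigned e, unsigned l)
{
    unsigned total = d + h + e + l;
    return total ? (double)d / total : 0.0;
}

/* Error rate: undetected errors (E + L) over the number of injected faults. */
double error_rate(unsigned e, unsigned l, unsigned injected)
{
    return injected ? (double)(e + l) / injected : 0.0;
}
```

Faults classified as Effect Less do not enter the detection efficiency, so a program that masks most faults can still show a low efficiency if the few effective faults escape detection.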

SLIDE 36

Experimental Setup

  • DSP32C Lucent
    – RISC processor
    – 32 bits
    – 3 internal RAMs (3 x 2 KB)
    – 32 KB RAM/EPROM
  • Benchmark program
    – CMA (Constant Modulus Algorithm)
    – Hardened CMA obtained by using the C2C Translator
  • General characteristics of the studied programs

Program's Characteristics   CMA Original   CMA Hardened   Overhead
Execution Time (cycles)     1,231,162      3,251,254      2.64x
Code Size (bytes)           1,104          4,100          3.71x
Data Size (bytes)           1,996          4,032          2.02x

SLIDE 37

Fault Injection Results - DSP32C's registers

Classification of fault effects on the benchmark (Software and Hardware Detection are detected errors; Incorrect Results and Loss Sequence are undetected errors):

Benchmark      Injected   Effect   Software    Hardware    Incorrect   Loss
Application    Faults     Less     Detection   Detection   Results     Sequence
CMA Hardened   52800      48720    3371        591         109         9
CMA Original   20000      18810    -           223         490         477

Benchmark program   Detection Efficiency (ξ)   Error Rate (τ)
CMA Original        -                          4.83 %
CMA Hardened        82.62 %                    0.22 %

  • Number of injected faults ~ the program execution time
    – A program is affected by errors only during its execution
  • More faults were injected for the hardened version
    – according to the time penalty factor
  • Drastic reduction of the error rate
    – 22 times
  • Very good error detection capacity
    – 80 %

SLIDE 38

Fault Injection Results - application program code

Classification of fault effects on the benchmark (Software and Hardware Detection are detected errors; Incorrect Results and Loss Sequence are undetected errors):

Benchmark      Injected   Effect   Software    Hardware    Incorrect   Loss
Application    Faults     Less     Detection   Detection   Results     Sequence
CMA Hardened   32800      20328    7137        5015        160         160
CMA Original   8832       5951     -           1416        753         721

Benchmark program   Detection Efficiency (ξ)   Error Rate (τ)
CMA Original        -                          16.58 %
CMA Hardened        57.22 %                    0.97 %

  • Number of injected faults ~ the code size of the program
  • More faults were injected for the hardened version
    – according to the memory area overhead factor
  • Drastic reduction of the error rate
    – ~ 17 times
  • Good error detection capacity
    – ~ 60 %

SLIDE 39

Fault Injection Results - application workspace

Classification of fault effects on the benchmark:

Benchmark      Injected   Effect   Detected   Undetected
Application    Faults     Less     Errors     Errors
CMA Hardened   10000      6021     3979       -
CMA Original   5000       4072     -          928

Benchmark program   Detection Efficiency (ξ)   Error Rate (τ)
CMA Original        -                          18.56 %
CMA Hardened        100 %                      0.00 %

  • More faults were injected for the hardened version
    – according to the size of the data workspace
  • No error escapes the detection mechanism

SLIDE 40

Radiation Test Experiment

Benchmark program   Program executions   Flux   N     Exposure Time
CMA Original        351                  285    387   525
CMA Hardened        194                  285    506   660

  • Radioactive element
    – californium 252
  • Flux = the number of particles reaching the processor per surface unit and time unit (#particles/cm²·s)
  • N = estimated number of upsets
  • Exposure Time = duration of the experiment (s)
  • Ta = application execution time

$$N_{upset} = S \cdot \Phi \cdot T_a \cdot N_{execution}$$

SLIDE 41

Radiation Test Results

Classification of fault effects on the benchmark:

Benchmark      Observed   Software    Undetected
Application    Upsets     Detection   Errors
CMA Hardened   99         84          15
CMA Original   48         -           48 (47 + 1)

Benchmark program   Detection Efficiency (ξ)   Error Rate (τ)
CMA Original        -                          12.40 %
CMA Hardened        84.85 %                    2.96 %

  • Reduction of the error rate
    – ~ 4.20 times
  • Detection efficiency
    – ~ 85 %
SLIDE 42

Outline

  • Introduction
  • State of the Art
  • Methodology for error detection
  • Formal evaluation of proposed error detection technique
  • Automatic Generation of Hardened Programs
  • Experimental Results
  • Conclusion and Perspectives
SLIDE 43

Conclusions - 1/2

  • SIFT technique
    – derived from the PdT Error Detection Technique
    – addresses transient errors
      • a consequence of environmental effects
    – suitable for safety-critical applications
    – Advantages:
      • high error detection capacity
      • completely generic and automatable
      • no hardware dependence
    – Disadvantages:
      • ~2.6x execution time penalty
      • ~4x memory space occupied
SLIDE 44

Conclusions - 2/2

  • Automatic flow to generate hardened programs
    – C2C Translator
  • Flexible protection
    – Data
    – Execution Flow
    – Branching
  • Formal estimation of the cost of hardened applications
  • Evaluation of the proposed approach
    – Different processors
    – Fault injection experiments
    – Radiation testing campaign

SLIDE 45

Perspectives

  • Further investigation of the proposed approach in order to:
    – reduce the error rate
    – reduce the execution time and memory overhead
  • LWS project
    – to find the best compromise between:
      • SW Approach
      • HW Approach

[Figure: satellite panel comparing the SW approach (unhardened LEON processor running a hardened application) vs. the HW approach (hardened LEON processor running an unhardened application)]