Eliminating Single Points of Failure in Software-Based Redundancy


SLIDE 1

SYSTEM SOFTWARE GROUP

Eliminating Single Points of Failure in Software-Based Redundancy

http://www4.cs.fau.de

Peter Ulbrich, Martin Hoffmann, Rüdiger Kapitza, Daniel Lohmann, Reiner Schmid and Wolfgang Schröder-Preikschat
EDCC, May 9, 2012

SLIDE 2

Transient Hardware Faults – A Growing Problem

■ Transient hardware faults (soft errors)
  ■ Induced by, e.g., radiation, glitches, insufficient signal integrity
  ■ Increasingly affecting microcontroller logic
■ Future hardware designs:
  ■ Even more performance and parallelism
  ■ At the price of being less and less reliable

[3] (Shivakumar, 2002)

Peter Ulbrich – ulbrich@cs.fau.de

SLIDE 3

Countermeasures - Hardware

■ Hardware-based countermeasures
  ■ Application-specific design or specialised hardware
  ■ For example ECC, lock-step
  ■ Pro: pragmatic approach (tackles the problem right at its source)
  ■ Con: hardware costs (e.g., redundancy, checker, …)
  ■ Con: selectivity (e.g., multi-application systems)
  ■ Con: development costs (diverse safety concepts and HW, (re-)certification)

[Figure: safety-critical system: sensors, safety-critical application, actuators]

SLIDE 5

Countermeasures - Software

■ Different approaches to address transient hardware faults
  ■ Hardware vs. software measures
  ■ Applicability and costs
■ Software-based triple modular redundancy (TMR)
  ■ Accepted and proven (e.g., recommended for ASIL D error handling)
  ■ Selective (e.g., multi-application systems)

[Figure: safety-critical system: sensors, safety-critical application replicas (1), (2), (3), actuators; a fault (↯) strikes one replica (✗)]

SLIDE 8

Software-Based Redundancy in Detail

■ Software-based TMR requires:
  ■ Temporal and spatial isolation (isolation domains)
  ■ Interface and majority voter
■ Single points of failure
  ■ No error detection
  ■ Very small → certain probability
■ Risk analysis
  ■ Inherently complex
  ■ Random error distribution? (Nightingale, 2011)

[Figure: safety-critical system with sensors, interface, replicas 1–3 inside the sphere of replication (SOR), each in its own isolation domain, majority voter, and actuators. Faults (↯) strike the interface and the majority voter. Illustrative fault probabilities: P = 1/1000 (interface), P = 998/1000 (replicas), P = 1/1000 (majority voter); once the risk analysis is questioned: P = ?, P = ?, P = ?]
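The illustrative probabilities in the figure can be turned into a small calculation. The sketch below assumes, purely for illustration, that a fault hits the interface, the replicas, or the voter with the shares shown on the slide, and that plain TMR masks every fault inside the replicas. The names `fault_share`, `protected`, and `unprotected_share` are hypothetical. As the risk-analysis bullet points out, real fault distributions may not follow such a simple model.

```python
from fractions import Fraction

# Illustrative fault shares taken from the figure; the simple random
# fault model behind them is an assumption, not a measured distribution.
fault_share = {
    "interface": Fraction(1, 1000),
    "replicas": Fraction(998, 1000),
    "voter": Fraction(1, 1000),
}

# Assumption: plain TMR masks faults in the replicas only.
protected = {"replicas"}


def unprotected_share(shares, protected_parts):
    """Sum the shares of all parts that TMR does not cover."""
    return sum(
        (share for part, share in shares.items() if part not in protected_parts),
        Fraction(0),
    )


spof_share = unprotected_share(fault_share, protected)
print(f"Faults hitting a single point of failure: {float(spof_share):.1%}")
```

Under these assumptions the unprotected share is small but not zero, which is the point of the "very small → certain probability" bullet.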

SLIDE 10

Agenda

■ Introduction
■ The Combined Redundancy Approach
  ■ Eliminating Vulnerabilities
  ■ High-Reliability Voters
■ Example: UAV Flight Control
  ■ CoRed Implementation
  ■ Target System: I4Copter
■ Evaluation
  ■ Experimental Setup
  ■ Results
■ Conclusion

SLIDE 14

CoRed Overview – Holistic Protection Approach

■ The Combined Redundancy Approach (CoRed)
  ■ TMR + data-flow encoding + high-reliability voters
■ Holistic protection approach
  ■ Input-to-output protection
    1 Reading inputs
    2 Processing
    3 Distributing outputs
  ■ Composability: on application and system level

[Figure: sphere of replication (SOR) with isolation domains, extended by encoded operation; steps 1, 2, 3 marked]

SLIDE 18

Eliminating Input and Output Vulnerabilities

■ Inter-domain data-flow protection
  ■ Checksum vs. arithmetic code (AN code)
  ■ AN code → encoded data operations
  ■ Enabler for high-reliability voter
■ CoRed: Extended AN code (EAN code)
  ■ Based on VCP (Forin, 1989)
  ■ Data integrity: prime (A)
  ■ Address integrity: per-variable signature (B_X)
  ■ Outdated data: timestamp (D)
  ■ Set of arithmetic operators (+, -, *, =, …)
  ■ Tailored for efficient encoded data voting

X’ = X × A + B_X + D

[Figure: values X and Y are encoded on entering the SOR (X’, Y’), combined by an encoded operation into Z’, and decoded to Z on leaving it]
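The encoding rule X’ = X × A + B_X + D can be illustrated with a minimal sketch. The constant `A = 65521`, the signatures, the timestamp, and all function names are assumptions chosen for illustration; they are not taken from the CoRed implementation. A word is treated as valid when removing the expected signature and timestamp leaves a multiple of A.

```python
A = 65521  # assumed prime; being odd, it never divides a single-bit error of +/- 2**k


def encode(value, signature, timestamp):
    """Return the EAN code word X' = X * A + B_X + D."""
    return value * A + signature + timestamp


def is_valid(encoded, signature, timestamp):
    """A code word is valid if removing B_X and D leaves a multiple of A."""
    return (encoded - signature - timestamp) % A == 0


def decode(encoded, signature, timestamp):
    """Check the code word and return the plain value X."""
    if not is_valid(encoded, signature, timestamp):
        raise ValueError("EAN check failed: corrupted, misaddressed, or outdated data")
    return (encoded - signature - timestamp) // A


# Hypothetical signatures (0 < B < A, distinct per variable) and timestamp.
B_X, B_Y, D = 1234, 4321, 7

x_enc = encode(42, B_X, D)
assert decode(x_enc, B_X, D) == 42             # intact word decodes
assert not is_valid(x_enc ^ (1 << 5), B_X, D)  # data error: single bit flip
assert not is_valid(x_enc, B_Y, D)             # address error: wrong variable's signature
assert not is_valid(x_enc, B_X, D + 1)         # outdated data: timestamp has moved on
```

Each of the three failing checks corresponds to one of the integrity properties listed on the slide: data, address, and freshness.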

SLIDE 19

High-Reliability Voter – Basics (1)

■ CoRed Encoded Voter
  ■ Input: variants (X’, Y’, Z’)
  ■ Output: equality set (E) and winner (W)
  ■ Based on EAN operations → no decoding necessary
■ Branch decisions (equality) on encoded data
  ■ IFF the difference of the encoded values equals the difference of the static signatures:
    X = Y ⇔ X’ – Y’ = B_X – B_Y
■ Each branch decision → unique signature

[Figure: provider side: replicas 1–3 produce X, Y, Z, which are encoded to X’, Y’, Z’; the encoded voter outputs {E, W}]
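The equality rule follows from the encoding: X’ – Y’ = (X – Y) × A + (B_X – B_Y), so the A-term vanishes exactly when X = Y. The sketch below shows this comparison without decoding. It assumes that both variants carry the same timestamp D (so D cancels) and uses hypothetical constants and function names.

```python
A = 65521  # assumed prime, as in any AN-style code


def encode(value, signature, timestamp):
    """Return the EAN code word X' = X * A + B_X + D."""
    return value * A + signature + timestamp


def encoded_equal(x_enc, y_enc, sig_x, sig_y):
    """X = Y holds iff X' - Y' equals the signature difference B_X - B_Y."""
    return x_enc - y_enc == sig_x - sig_y


# Hypothetical per-variable signatures and a shared timestamp.
B_X, B_Y, D = 1234, 4321, 7

same = encoded_equal(encode(5, B_X, D), encode(5, B_Y, D), B_X, B_Y)
different = encoded_equal(encode(5, B_X, D), encode(6, B_Y, D), B_X, B_Y)
print(same, different)
```

The comparison works on the encoded words only, which is why the voter never has to expose unprotected plain values.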

SLIDE 20

High-Reliability Voter – Basics (2)

■ Correct control flow
  ■ Valid decision → unique control-flow path
  ■ Each path → unique signature
■ Control-flow signatures
  ■ Static signature (expected value): compile time
    → used as return value E
  ■ Dynamic signature (actual value): runtime, computed from variants
    → applied to winner W
  ■ Validation: subsequent check (decode)

[Figure: provider (replicas 1–3, encode), encoded voter ({E, W}), consumer (check/decode); e.g., X’ is the winner]

SLIDE 21

CoRed Encoded Voter – Example

■ Control-flow monitoring
  ■ Finding quorum → static signature
  ■ Reapply path-specific EAN operations → sign winner with dynamic signature
  ■ Check → subsequent decode

[Figure: voter decision tree]

X' = Y' ?
  true:  X' = Z' ?
    true:  apply(X', sigdyn{X',Y',Z'}); return sigstatic{X',Y',Z'}
    false: apply(X', sigdyn{X',Y'}); return sigstatic{X',Y'}
  false: Y' = Z' ?
    true:  apply(Y', sigdyn{Y',Z'}); return sigstatic{Y',Z'}
    false: X' = Z' ?
      true:  apply(X', sigdyn{X',Z'}); return sigstatic{X',Z'}
      false: return sigstatic{}
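The decision tree can be sketched as a small function over encoded values. This is a simplified illustration, not the CoRed voter: the static path signatures are hypothetical constants, all variants are assumed to share one timestamp, and the dynamic signing of the winner (the `apply` step) is deliberately left out.

```python
A = 65521  # assumed prime


def encode(value, signature, timestamp):
    """Return the EAN code word X' = X * A + B_X + D."""
    return value * A + signature + timestamp


def equal(a_enc, b_enc, sig_a, sig_b):
    """Encoded equality: compare the word difference with the signature difference."""
    return a_enc - b_enc == sig_a - sig_b


# Hypothetical static path signatures: one distinct constant per leaf.
SIG_XYZ, SIG_XY, SIG_YZ, SIG_XZ, SIG_NONE = 11, 23, 37, 41, 53


def vote(x, y, z, b_x, b_y, b_z):
    """Return (E, W): the static signature of the path taken and the encoded winner."""
    if equal(x, y, b_x, b_y):
        if equal(x, z, b_x, b_z):
            return SIG_XYZ, x
        return SIG_XY, x
    if equal(y, z, b_y, b_z):
        return SIG_YZ, y
    if equal(x, z, b_x, b_z):
        return SIG_XZ, x
    return SIG_NONE, None


B_X, B_Y, B_Z, D = 1234, 4321, 2468, 7  # hypothetical signatures and timestamp
x, y, z = encode(9, B_X, D), encode(8, B_Y, D), encode(9, B_Z, D)
print(vote(x, y, z, B_X, B_Y, B_Z) == (SIG_XZ, x))  # replica 2 is outvoted
```

Because every leaf returns a different constant, the caller can tell which path the voter claims to have taken.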

SLIDE 23

CoRed Encoded Voter – Example

■ 1. Improper branch decision: Y’ ≠ Z’
  ■ Voter elects Y’ as winner (which is incorrect)
  ■ Returns E and W correctly
  ■ Subsequent decode will fail: sigstatic ≠ sigdyn
■ 2. Faulty jump
  ■ Voter elects X’ and computes W correctly
  ■ Returns incorrect E → again, subsequent decode will fail

[Figure: voter decision tree with two injected faults. Fault 1 affects the branch Y' = Z' and the leaf apply(Y', sigdyn{Y',Z'}); return sigstatic{Y',Z'}. Fault 2 affects the leaves apply(X', sigdyn{X',Y'}); return sigstatic{X',Y'} and apply(X', sigdyn{X',Z'}); return sigstatic{X',Z'}.]
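The faulty-jump case (2) can be made concrete with a small sketch. It assumes one simple, additive construction: the dynamic signature is the observed difference X' - Y', the static signature is the expected difference B_X - B_Y, and the consumer subtracts the returned static signature before decoding. This construction is an illustrative assumption and not necessarily the one CoRed uses. It models only the faulty jump, not the improper branch decision (1), and all names and constants are hypothetical.

```python
A = 65521                                # assumed prime
B_X, B_Y, B_Z, D = 1234, 4321, 2468, 7   # hypothetical signatures and timestamp


def encode(value, signature):
    """Return the EAN code word X' = X * A + B_X + D."""
    return value * A + signature + D


def is_valid(encoded, signature):
    """Valid if removing signature and timestamp leaves a multiple of A."""
    return (encoded - signature - D) % A == 0


# Static signatures, fixed ahead of time for two leaves of the tree.
STATIC = {"XY": B_X - B_Y, "XZ": B_X - B_Z}


def leaf_xy(x_enc, y_enc, faulty_jump=False):
    """Leaf reached after X' = Y' held: sign the winner and return (E, W)."""
    sig_dyn = x_enc - y_enc   # runtime value; equals STATIC["XY"] when X = Y
    winner = x_enc + sig_dyn  # winner signed with the dynamic signature
    # A faulty jump lands on the return statement of a different leaf.
    static = STATIC["XZ"] if faulty_jump else STATIC["XY"]
    return static, winner


def consumer_check(static, winner):
    """Remove the expected signature; what remains must be a valid word for X."""
    return is_valid(winner - static, B_X)


x_enc, y_enc = encode(42, B_X), encode(42, B_Y)
print(consumer_check(*leaf_xy(x_enc, y_enc)))                    # correct path
print(consumer_check(*leaf_xy(x_enc, y_enc, faulty_jump=True)))  # E from the wrong leaf
```

On the correct path the two signatures cancel and the winner decodes; after the simulated jump they differ by B_Z - B_Y, which is not a multiple of A, so the check fails.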

SLIDE 24

Implementation

■ CoRed implementation
  ■ Easy-to-use C++ templates and libraries
  ■ Hardware independent: EAN code and Encoded Voter
  ■ Thin OS integration layer
    ■ PXROS-HR (industry-strength commercial RTOS)
    ■ CiAO (AUTOSAR-OS compatible)
  ■ CoRed artefacts → real-time tasks and jobs
■ Runtime-environment requirements
  ■ Temporal isolation → static schedule (time triggered)
  ■ Spatial isolation → HW-based memory protection

SLIDE 25

CoRed Protected Flight Control

■ Target system: I4Copter quadrotor platform
  ■ Industry-grade hardware and software
  ■ Triple redundant sensor setting
  ■ Multi-application system
■ Flight control application
  ■ Safety-critical
  ■ Model-based: MATLAB Simulink
  ■ Embedded Coder → C++ code

[Figure: Infineon TriCore TC1796; redundant sensor setting]

SLIDE 27

Evaluation – Experimental Setup

■ Fault injection → using hardware debugger
  ■ Injection of arbitrary fault patterns
  ■ Minimally intrusive → minimizing probe effects
■ Fault list generation (Rebaudengo, 1999)
  ■ Bits × registers × instructions → potentially huge fault space
  ■ Vast majority of faults are non-effective → systematic elimination
■ Outcome: 401,592 experiments
■ Effective: 67,617 errors
■ Categories: fail silent, masked, hardware detected, EAN code, control flow, silent data corruption

[Figure: a host computer (fault-injection campaign manager, fault DB, results DB) drives the system under test through a hardware debugger [Rebaudengo, 1999]. Components of the system under test: sensor system (sensors 1–3, each with EAN encode), flight-control application (replicas 1–3, each with EAN decode and EAN encode), CoRed encoded tolerance voter, network interface, and remote node (EAN decode, CoRed encoded (exact) voter, actuator)]
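The size of the fault space (bits × registers × instructions) and the share of effective experiments can be illustrated with a short sketch. The register names and the tiny ranges are hypothetical placeholders, and the systematic elimination of non-effective faults mentioned on the slide is not modelled. Only the two experiment counts come from the slide.

```python
from itertools import product

# Hypothetical, tiny target: real campaigns cover far more registers and instructions.
BITS = range(32)
REGISTERS = ["d0", "d1", "a0"]
INSTRUCTIONS = range(4)  # indices into the executed instruction trace


def fault_list(bits, registers, instructions):
    """Enumerate every (instruction, register, bit) single-bit-flip candidate."""
    return list(product(instructions, registers, bits))


faults = fault_list(BITS, REGISTERS, INSTRUCTIONS)
print(len(faults))  # 4 instructions * 3 registers * 32 bits

# Share of the experiments reported on the slide that were effective.
effective_share = 67_617 / 401_592
print(f"{effective_share:.1%}")
```

Even this toy configuration yields hundreds of candidates, which indicates why pruning non-effective faults before running the campaign matters.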

SLIDE 31

Evaluation – Experimental Results (1)

■ Redundant execution campaign (interface)
  ■ Total: ~45,000 errors
  ■ Unprotected: suffers from 3,622 corruptions
  ■ TMR: suffers from 71 corruptions
  ■ CoRed: remaining corruptions are covered → 0 corruptions

[Figure: bar chart "Distribution of Effective Faults" (0 % to 90 %) for data and address faults, comparing Unprotected, Plain TMR, and CoRed TMR; categories: masked, hardware detected, EAN-code detected, silent data corruptions (SDC). Inset: replicas 1–3 and interface.]

SLIDE 34

Evaluation – Experimental Results (2)

■ Voter campaign
  ■ Plain voter: total ~11,000; 2,465 masked; 7,245 retry; 1,223 corruptions
  ■ CoRed Encoded Voter: total ~26,000; 1,228 masked; 24,682 retry; 0 corruptions

[Figure: bar chart (0 % to 90 %) for data and address faults, comparing Plain Voter and CoRed Encoded Voter; categories: masked, control-flow monitoring (CFM), hardware detected, EAN-code detected, silent data corruptions (SDC). Inset: replicas 1–3 and voter.]

SLIDE 35

Evaluation – Overhead

■ Overhead analysis
  ■ I4Copter flight control: 7.1% overhead (compared to plain TMR)
  ■ Absolute numbers: 1,842 µs (application); 10.2 µs (plain voter) vs. 77.6 µs (CoRed voter)
■ Selectivity
  ■ I4Copter system CPU utilisation: 41% → full replication impossible, CPU: 120%
  ■ Mission-critical replication of flight control possible with CoRed, CPU: 60%

[Figure: flight-control execution time relative to plain TMR WCET (axes 100 % to 120 % and 0 µs to 100 µs), broken down into replication, voting, EAN interface, and state recovery. Bars: Plain TMR, CoRed TMR, CoRed Pair, CoRed Pair + Spare; values as extracted: 100.0%, 107.1%, 103.2%, 115.1%]
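The absolute numbers can be put in relation with a few lines of arithmetic. The sketch below uses only the three timings from the slide; the variable names are hypothetical, and comparing the voter cost with the 1,842 µs application figure is an illustrative choice rather than the deck's own overhead calculation.

```python
# Timings from the slide, in microseconds.
APPLICATION_US = 1842.0
PLAIN_VOTER_US = 10.2
CORED_VOTER_US = 77.6

voter_slowdown = CORED_VOTER_US / PLAIN_VOTER_US
extra_voting_us = CORED_VOTER_US - PLAIN_VOTER_US
extra_vs_application = extra_voting_us / APPLICATION_US

print(f"CoRed voter takes {voter_slowdown:.1f}x the time of the plain voter")
print(f"Extra voting time: {extra_voting_us:.1f} us "
      f"({extra_vs_application:.1%} of the application figure)")
```

The encoded voter is several times slower than the plain one, yet the added time stays small next to the application's execution time, which fits the single-digit overall overhead reported on the slide.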

SLIDE 40

Conclusion

■ The Combined Software Redundancy Approach (CoRed)
  ■ Eliminates single points of failure in software-based TMR
  ■ No specific application knowledge necessary
  ■ Holistic approach: input-to-output protection
■ Applicability: flight control
  ■ I4Copter MAV
  ■ Selective and composable
■ Experimental results
  ■ CoRed is effective → silent data corruptions can be eliminated
  ■ Only 7.1% overhead (flight control example)

SLIDE 41

Thank you!

SLIDE 42

References

(1) International Roadmap for Semiconductors, 2001
(2) “Implications of microcontroller software on safety-critical automotive systems,” Infineon, 2008
(3) P. Shivakumar, M. Kistler, S. W. Keckler, D. Burger, and L. Alvisi, “Modelling the effect of technology trends on the soft error rate of combinational logic,” in DSN ’02: Proceedings of the 2002 International Conference on Dependable Systems and Networks
(4) E. B. Nightingale, J. R. Douceur, and V. Orgovan, “Cycles, cells and platters: An empirical analysis of hardware failures on a million consumer PCs,” in Proceedings of EuroSys 2011 (awarded “Best Paper”), ACM, April 2011
(5) M. Rebaudengo and M. S. Reorda, “Evaluating the fault tolerance capabilities of embedded systems via BDM,” VTS 1999
(6) Forin, “Vital coded microprocessor principles and application for various transit systems,” 1989