
Eliminating Single Points of Failure in Software-Based Redundancy

Eliminating Single Points of Failure in Software-Based Redundancy. Peter Ulbrich, Martin Hoffmann, Rüdiger Kapitza, Daniel Lohmann, Reiner Schmid and Wolfgang Schröder-Preikschat. EDCC, May 9, 2012. SYSTEM SOFTWARE GROUP


  1. Eliminating Single Points of Failure in Software-Based Redundancy
 Peter Ulbrich, Martin Hoffmann, Rüdiger Kapitza, Daniel Lohmann, Reiner Schmid and Wolfgang Schröder-Preikschat
 EDCC, May 9, 2012
 SYSTEM SOFTWARE GROUP, http://www4.cs.fau.de

  2. Transient Hardware Faults – A Growing Problem
 [3] (Shivakumar, 2002)
 ■ Transient hardware faults (soft errors)
   ■ Induced by, e.g., radiation, glitches, insufficient signal integrity
   ■ Increasingly affecting microcontroller logic
 ■ Future hardware designs: even more performance and parallelism
   → At the price of being less and less reliable
 Peter Ulbrich – ulbrich@cs.fau.de

  3. Countermeasures - Hardware
 [Figure: Safety-Critical System with Sensors, Safety-Critical Application, and Actuators]
 ■ Hardware-based countermeasures
   ■ Application-specific design or specialised hardware
   ■ For example ECC, lock-step
 ■ Advantage: pragmatic approach (tackles the problem right at the source)
 ■ Drawbacks:
   ■ Hardware costs (e.g., redundancy, checker, …)
   ■ Selectivity (e.g., multi-application systems)
   ■ Development costs (diverse safety concepts and HW, (re-)certification)


  5. Countermeasures - Software
 [Figure: Safety-Critical System with Sensors, three replicas of the Safety-Critical Application (1), (2), (3), and Actuators; a transient fault (↯) hits one replica]
 ■ Different approaches to address transient hardware faults
   ■ Hardware vs. software measures
   ■ Applicability and costs
 ■ Software-based triple modular redundancy (TMR)
   ■ Accepted and proven (e.g., recommended for ASIL D error handling)
   ■ Selective (e.g., multi-application systems)
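 The idea behind software-based TMR can be sketched in a few lines. This is a minimal illustration only, not the implementation used in the talk: the names `majority_vote`, `run_tmr`, `healthy`, and `faulty` are hypothetical, and the replicas are plain Python functions rather than isolated tasks.

```python
from typing import Callable, Sequence


def majority_vote(a: int, b: int, c: int) -> int:
    """Return the value that at least two of the three replicas agree on."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: all three replicas disagree")


def run_tmr(replicas: Sequence[Callable[[int], int]], sensor_value: int) -> int:
    """Run three replicas on the same input and vote on their results."""
    r1, r2, r3 = (replica(sensor_value) for replica in replicas)
    return majority_vote(r1, r2, r3)


def healthy(x: int) -> int:
    # Hypothetical application logic.
    return x + 1


def faulty(x: int) -> int:
    # Same logic, but with one bit of the result flipped (simulated soft error).
    return (x + 1) ^ 0x10


# The two healthy replicas outvote the faulty one.
result = run_tmr([healthy, faulty, healthy], 41)
```

 A single corrupted replica is masked by the vote; if all three results differ, the voter can only report the disagreement.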


  7. Software-Based Redundancy in Detail
 [Figure: Safety-Critical System with Sensors, Interface, Replica 1 to Replica 3 (each in its own isolation domain, together forming the sphere of replication, SOR), Majority Voter, and Actuators; transient faults (↯) shown; annotations P = 998/1000, P = 1/1000, P = 1/1000]
 ■ Software-based TMR requires:
   ■ Temporal and spatial isolation (isolation domains)
   ■ Interface and Majority Voter
 ■ Single points of failure
   ■ No error detection
   ■ Very small → certain probability

  8. Software-Based Redundancy in Detail
 [Figure: Safety-Critical System with Sensors, Interface, Replica 1 to Replica 3 (each in its own isolation domain, together forming the sphere of replication, SOR), Majority Voter, and Actuators; transient faults (↯) shown; annotations P = ?, P = ?, P = ?]
 ■ Software-based TMR requires:
   ■ Temporal and spatial isolation (isolation domains)
   ■ Interface and Majority Voter
 ■ Single points of failure
   ■ No error detection
   ■ Very small → certain probability
   ■ Risk analysis
     ■ Inherently complex
     ■ Random error distribution? (Nightingale, 2011)
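 Why the interface and the voter are single points of failure can be made concrete with a small fault-injection sketch. Everything here is an assumption for illustration: `control_law` stands in for the replicated application, `flip_bit` simulates a soft error, and the `fault_at` parameter selects where the fault strikes.

```python
from typing import Optional


def control_law(sensor: int) -> int:
    # Hypothetical application logic executed by every replica.
    return 2 * sensor + 1


def vote(a: int, b: int, c: int) -> int:
    # Plain majority voter without any protection of its own.
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority")


def flip_bit(value: int, bit: int = 0) -> int:
    # Simulated transient fault: invert one bit.
    return value ^ (1 << bit)


def tmr_step(sensor: int, fault_at: Optional[str] = None) -> int:
    value = sensor
    if fault_at == "interface":   # fault before the input is replicated
        value = flip_bit(value)
    outputs = [control_law(value) for _ in range(3)]
    if fault_at == "replica":     # fault inside the sphere of replication
        outputs[1] = flip_bit(outputs[1])
    result = vote(*outputs)
    if fault_at == "voter":       # fault after the vote has been taken
        result = flip_bit(result)
    return result


expected = control_law(10)
masked = tmr_step(10, "replica") == expected        # True: outvoted
interface_ok = tmr_step(10, "interface") == expected  # False: silent corruption
voter_ok = tmr_step(10, "voter") == expected          # False: silent corruption
```

 In this sketch only the fault inside a replica is masked. Faults in the interface or the voter produce a wrong output without raising any error, which is the "no error detection" problem named above.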


  10. Agenda
 ■ Introduction
 ■ The Combined Redundancy Approach
   ■ Eliminating Vulnerabilities
   ■ High-Reliability Voters
 ■ Example: UAV Flight Control
   ■ CoRed Implementation
   ■ Target System: I4 Copter
 ■ Evaluation
   ■ Experimental Setup
   ■ Results
 ■ Conclusion




  14. CoRed Overview – Holistic Protection Approach
 [Figure: isolation domains, encoded operation, and sphere of replication (SOR), with stages 1, 2, and 3 marked]
 ■ The Combined Redundancy Approach (CoRed)
   ■ TMR combined with data-flow encoding and high-reliability voters
 ■ Holistic Protection Approach
   ■ Input-to-output protection: (1) reading inputs → (2) processing → (3) distributing outputs
   ■ Composability → on application and system level



  17. Eliminating Input and Output Vulnerabilities
 [Figure: at the SOR boundary, values X and Y are encoded into X' and Y' (encoded values) and decoded back into X and Y; an encoded value is built from A (prime), B_X (signature), and D (timestamp)]
 ■ Inter-domain data-flow protection
   ■ Checksum vs. arithmetic code (AN code)
   ■ AN code → encoded data operations
   ■ Enabler for high-reliability voter
 ■ CoRed: Extended AN code (EAN code)
   ■ Based on VCP (Forin, 1989)
   ■ $X' = X \times A + B_X + D$
   ■ Data integrity: prime A
   ■ Address integrity: per-variable signature B_X
   ■ Outdated data: timestamp D
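 The EAN encoding rule $X' = X \times A + B_X + D$ can be tried out with a short sketch. It is an illustration under stated assumptions, not the CoRed implementation: the constant `A = 65521` (a prime), the function names, and the example signatures and timestamps are all chosen here for demonstration, and the sketch assumes that signature plus timestamp stays below `A`.

```python
A = 65521  # assumed prime constant (largest prime below 2**16), not from the slides


def ean_encode(value: int, signature: int, timestamp: int) -> int:
    """Encode a value as X' = X * A + B_X + D."""
    return value * A + signature + timestamp


def ean_check(encoded: int, signature: int, timestamp: int) -> bool:
    """A code word is valid if removing B_X and D leaves a multiple of A."""
    return (encoded - signature - timestamp) % A == 0


def ean_decode(encoded: int, signature: int, timestamp: int) -> int:
    """Recover X, refusing to decode an invalid code word."""
    if not ean_check(encoded, signature, timestamp):
        raise ValueError("invalid EAN code word")
    return (encoded - signature - timestamp) // A


SIG_X, SIG_Y, NOW = 17, 23, 3          # hypothetical signatures and timestamp
x_enc = ean_encode(42, SIG_X, NOW)
y_enc = ean_encode(8, SIG_Y, NOW)

decoded = ean_decode(x_enc, SIG_X, NOW)               # round trip
bit_flip_ok = ean_check(x_enc ^ (1 << 5), SIG_X, NOW)  # data integrity
wrong_var_ok = ean_check(x_enc, SIG_Y, NOW)            # address integrity
stale_ok = ean_check(x_enc, SIG_X, NOW + 1)            # outdated data

# Encoded addition: the sum is a valid code word for X + Y with
# signature B_X + B_Y and timestamp 2 * D, so it never has to be decoded.
sum_enc = x_enc + y_enc
sum_decoded = ean_decode(sum_enc, SIG_X + SIG_Y, 2 * NOW)
```

 Because `A` is an odd prime, flipping any single bit changes the code word by a power of two, which is never a multiple of `A`, so the check fails. A mismatched signature or timestamp shifts the residue in the same way, which is how the three listed properties map onto A, B_X, and D in this sketch.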
