Adaptability and Fault Tolerance
Rogério de Lemos
University of Kent, UK

• Context: self-* and dependability;
• Focus: adaptability and fault tolerance;
• State of the art;
• Conclusions;

ICSE 2006 SEAMS – May 2006 – Rogério de Lemos

Self-* and Dependability
• Dependability:
  - the ability to deliver service that can justifiably be trusted;
• Self-* properties of systems:
  - the support for autonomy;
  - self-adaptable, self-managing, self-optimising, self-healing, self-repairing, self-configuring, etc.
• Adaptability:
  - the ability of a system to accommodate changes while providing its specified services;
  - run-time changes;

Dependability
Dependability - the ability to avoid service failures that are more frequent and more severe than is acceptable;
• threats - undesired, but in principle expected, circumstances:
  - faults, errors and failures;
• attributes - properties of the system:
  - reliability, availability, integrity, confidentiality, and safety;
• technologies - methods and techniques for providing, and reaching confidence in, the ability to attain dependability:
  - rigorous design, validation & verification, fault tolerance, and system evaluation;

Dependability - Threats
(Yves Deswarte & David Powell)
• Fault: adjudged or hypothesized cause of an error;
• Error: that part of system state which may lead to a failure;
• Failure: occurs when delivered service deviates from implementing the system function;
[Figure: chain of threats: Fault -> (activation) -> Error -> (propagation) -> Failure -> (causation) -> Fault -> (activation) -> Error]

Adaptability - Initiators
• Changes:
  - the act, process, or result of altering or modifying;
  - internal changes: component failures, overload of resources, etc.
  - external changes: environmental, requirements, etc.
• There is no fundamental chain of adaptability initiators;

Threats and Initiators
• Changes correspond to events (faults):
  - changes can be dormant if not activated;
• What is the consequence of change (errors)?
  - what would be the equivalent of error-free and erroneous states?
  - these states are created when changes are activated and can remain latent until detected;
• What is the equivalent of failure?
  - unsuccessful adaptation?
  - the system might continue to provide its services, but ignoring the change;

Dependability - Technologies
Fault avoidance: build a system with no faults:
• rigorous design - fault prevention;
  - formal and rigorous notations, processes, adapters, etc.
• verification & validation - fault removal;
  - model checking, fault injection, testing, simulation, etc.
Fault acceptance: impossible to rid the system of faults:
• fault tolerance;
• system evaluation - fault forecasting;
  - empirical approaches, Markov models, etc.
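
As one hedged illustration of the Markov models mentioned under fault forecasting, consider a two-state model (up, down) with a constant failure rate λ and a constant repair rate μ. Both symbols and the two-state simplification are assumptions introduced here, not taken from the slides. Balancing the flow between the two states and normalising the probabilities gives the steady-state availability A:

```latex
\begin{align}
\pi_{\text{up}}\,\lambda &= \pi_{\text{down}}\,\mu, \\
\pi_{\text{up}} + \pi_{\text{down}} &= 1, \\
A = \pi_{\text{up}} &= \frac{\mu}{\lambda + \mu}.
\end{align}
```

For example, with λ = 0.001 per hour and μ = 0.1 per hour, A = 0.1 / 0.101 ≈ 0.990.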

Fault Tolerance
Fault tolerance aims at avoiding the failure of the system:
• error detection:
  - detects the presence of errors;
• recovery:
  - transforms a system state that contains errors or faults into a state free of errors and of faults that can be re-activated;
  - error handling: eliminates errors from the system state;
  - fault handling: prevents faults from being activated again;
    diagnosis, isolation and reconfiguration;
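
The error detection and error handling steps above can be sketched as a checkpoint followed by a rollback. This is a minimal sketch, not a method from the slides: the `Account` component, its invariant (`balance >= 0`), and the function names are all hypothetical.

```python
import copy


class Account:
    """Hypothetical component whose state must satisfy balance >= 0."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Deliberately unguarded, so it can drive the state into error.
        self.balance -= amount


def detect_error(account):
    """Error detection: check the state against its invariant."""
    return account.balance < 0


def run_with_rollback(account, operation):
    """Error handling by rollback: restore the checkpoint if an error is detected."""
    checkpoint = copy.deepcopy(account.__dict__)  # save the error-free state
    operation(account)
    if detect_error(account):
        account.__dict__.update(checkpoint)  # rollback to the saved state
        return False
    return True


acct = Account(100)
ok_small = run_with_rollback(acct, lambda a: a.withdraw(30))
ok_large = run_with_rollback(acct, lambda a: a.withdraw(500))
print(ok_small, ok_large, acct.balance)
```

Rollback only eliminates the error from the state. Preventing the same fault from being activated again (fault handling) would need a separate step, such as guarding or replacing `withdraw`.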

Fault Tolerance
(Yves Deswarte & David Powell)
• Fault: adjudged or hypothesized cause of an error;
• Error: that part of system state which may lead to a failure;
• Failure: occurs when delivered service deviates from implementing the system function;
[Figure: Fault, Error, Failure chain annotated with fault tolerance activities: Error Detection; Error Handling (Rollback, Rollforward, Compensation); Fault Handling (Diagnosis, Isolation, Reconfiguration, Reinitialization)]

System Structure
Fault tolerance is about system structuring;
• structure is what enables the system to generate the behaviour;
• it determines how effectively structuring can be used to provide means of error confinement:
  - avoid the propagation of errors;
  - what interactions can exist and at what rate;
• it is not restricted to system architecture;
Structural flexibility is the basis for adaptation;

Fault Assumptions
Faults are undesirable, though expected, circumstances:
• systems can fail in many different ways;
In the design of fault-tolerant systems, it is essential to define assumptions:
• nature of faults - dictates the type of redundancy that must be implemented:
  - space or time;
  - replication or diversification;
• rate of faults - influences the amount of redundancy needed to attain a given dependability;
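
To make the space versus time distinction concrete, here is a hedged sketch of both kinds of redundancy. The helpers `majority_vote` and `retry`, and the toy replicas, are hypothetical names chosen for illustration. Voting over three replicas masks one replica that returns a wrong value, while retrying masks a transient fault that disappears on a later attempt.

```python
from collections import Counter


def majority_vote(replicas, x):
    """Space redundancy: run every replica and return the majority result."""
    results = [replica(x) for replica in replicas]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among replicas")
    return value


def retry(operation, attempts):
    """Time redundancy: repeat the same operation to mask transient faults."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except IOError as exc:
            last_error = exc
    raise last_error


def good(x):
    return x * x


def faulty(x):
    return x * x + 1  # replica affected by a value fault


voted = majority_vote([good, good, faulty], 4)

calls = {"n": 0}


def flaky():
    # Fails on the first two calls, then succeeds (transient fault).
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient fault")
    return "ok"


retried = retry(flaky, attempts=5)
print(voted, retried, calls["n"])
```

Identical replicas only help against faults that do not affect all copies in the same way. A design fault shared by every replica would call for diversification instead.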

Fault Assumptions
How a component behaves when it fails:
• crash fault being the simplest and most restrictive (or well-defined) type;
• Byzantine being the least restrictive;
[Figure: failure modes, from most to least restrictive: crash, omission, timing, Byzantine]
The different types of changes need to be classified;
• behavioural assumptions;
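
One way to encode such a classification is as an ordered enumeration. The sketch below assumes that the four failure modes are nested, so that each less restrictive mode includes the more restrictive ones; that nesting is the commonly used reading of this hierarchy and is an assumption here. `FailureMode` and `covers` are hypothetical names.

```python
from enum import IntEnum


class FailureMode(IntEnum):
    """Failure modes ordered from most restrictive to least restrictive."""
    CRASH = 1
    OMISSION = 2
    TIMING = 3
    BYZANTINE = 4


def covers(assumed, observed):
    """True if a design built for `assumed` also covers `observed` behaviour."""
    return observed <= assumed


print(covers(FailureMode.TIMING, FailureMode.CRASH))
print(covers(FailureMode.CRASH, FailureMode.BYZANTINE))
```

A comparable ordering for types of change would let an adaptation mechanism state which changes it is designed to accommodate.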

State of the Art
Adaptive fault tolerance
• property that enables a system to maintain and improve fault tolerance by adapting to changes in environment and policy;
  - monitor the system;
  - reconfigure the application when its configuration is not appropriate for the dependability requirements;
• distributed systems:
  - different layers: middleware / fault tolerance / adaptation;
  - consensus problem;
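
The monitor and reconfigure steps can be sketched as follows. This is an illustrative model only, under stated assumptions: replicas fail independently, only by crashing, and the service is available while at least one replica is up, so availability with n replicas is 1 - (1 - p)^n. `AdaptiveReplicaManager` and `required_replicas` are hypothetical names and do not correspond to any of the systems cited in these slides.

```python
def required_replicas(replica_availability, target, max_replicas=10):
    """Smallest n such that 1 - (1 - p)^n meets the availability target."""
    for n in range(1, max_replicas + 1):
        if 1 - (1 - replica_availability) ** n >= target:
            return n
    raise ValueError("target not reachable with max_replicas")


class AdaptiveReplicaManager:
    """Adjusts the replication degree to the observed replica availability."""

    def __init__(self, target, replicas=1):
        self.target = target
        self.replicas = replicas

    def adapt(self, observations):
        """observations: list of booleans, True when a replica probe succeeded."""
        estimate = sum(observations) / len(observations)  # monitor
        needed = required_replicas(estimate, self.target)
        if needed != self.replicas:
            self.replicas = needed  # reconfigure
        return self.replicas


manager = AdaptiveReplicaManager(target=0.99)
degraded = manager.adapt([True] * 8 + [False] * 2)   # estimated p = 0.8
improved = manager.adapt([True] * 19 + [False] * 1)  # estimated p = 0.95
print(degraded, improved)
```

The same loop both adds and removes redundancy, which reflects the point that the configuration should match the dependability requirements rather than simply be maximal.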

State of the Art
• AQuA - CORBA-based operating system;
  - dynamic replication of objects;
  - Proteus: dynamic fault tolerance through adaptive reconfiguration;
  - allows the degree of dependability to be specified at the application level;

State of the Art
• Chameleon - adaptive infrastructure;
  - allows different levels of availability requirements;
  - explicit representation of adaptive policies;
  - provides dependability through the use of ARMORs (Adaptive, Reconfigurable, and Mobile Objects for Reliability):
    managers for monitoring and recovering resources;
    daemons for providing communication;
    common ARMORs for providing application-required dependability;
  - enables multiple fault tolerance strategies to co-exist;

State of the Art
Architectural fault tolerance
• Error detection and recovery;
  - techniques based on exception handling;
  - application dependent;
  - iC2C and iFTE;
• Fault handling;
  - system reconfiguration;
  - replacement of components, connectors and configurations;
  - dynamic reconfiguration;
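
A minimal sketch of how exception handling and component replacement can combine at the architectural level is given below. It is not the iC2C or iFTE design; `Connector`, `Primary`, `Spare` and `ComponentFailure` are hypothetical names used only to show the idea of a connector that reacts to an exception by reconfiguring.

```python
class ComponentFailure(Exception):
    """Raised by a component that detects it cannot deliver its service."""


class Primary:
    def handle(self, request):
        raise ComponentFailure("primary unavailable")


class Spare:
    def handle(self, request):
        return f"spare handled {request}"


class Connector:
    """Hypothetical connector that routes requests and swaps failed components."""

    def __init__(self, active, spares):
        self.active = active
        self.spares = list(spares)

    def send(self, request):
        while True:
            try:
                return self.active.handle(request)
            except ComponentFailure:
                if not self.spares:
                    raise  # no spare left: propagate the exception
                self.active = self.spares.pop(0)  # replace the failed component


connector = Connector(Primary(), [Spare()])
reply = connector.send("r1")
print(reply)
```

Placing the handler in the connector keeps the components themselves unaware of the reconfiguration.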

State of the Art
Bio-inspired computing and statistical methods:
• data-oriented approaches;
  - data mining large quantities of observations for identifying patterns;
  - anomaly (fault and intrusion) detection;
  - neural networks, genetic algorithms, etc.;
• adaptive error detection using artificial immune systems:
  - problem: how to learn from rare events!
• statistical learning techniques (SLT) applied to system recovery;
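
As a deliberately simple stand-in for the data-oriented approaches above, the following sketch flags observations that lie far from the mean of fault-free training data. It is a basic statistical threshold, not a neural network or an artificial immune system. `find_anomalies`, the threshold of three standard deviations, and the sample values are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(training, observations, threshold=3.0):
    """Return observations more than `threshold` standard deviations from the training mean."""
    mu = mean(training)
    sigma = stdev(training)
    return [x for x in observations if abs(x - mu) > threshold * sigma]


# Hypothetical response times (ms) recorded during fault-free operation.
training = [100, 102, 98, 101, 99, 100, 103, 97]
anomalies = find_anomalies(training, [101, 99, 140, 95, 60])
print(anomalies)
```

The detector learns only from normal behaviour, which sidesteps the shortage of examples of rare events but means it cannot tell a fault from a legitimate change in workload.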

Conclusions
• Changes are like faults, though:
  - they might be desired/undesired and expected/unexpected;
• Classification of the types of changes:
  - otherwise adaptation becomes application dependent;
  - e.g., exception handling for the support of fault tolerance;
• How does system structuring affect adaptability?
  - is software that flexible for supporting run-time change?
  - impact of design-time change;
  - to scope the impact of change;
  - confinement of the consequence of change;