  1. Adaptability and Fault Tolerance
  Rogério de Lemos, University of Kent, UK
  • Context: self-* and dependability;
  • Focus: adaptability and fault tolerance;
  • State of the art;
  • Conclusions;
  ICSE 2006 SEAMS – May 2006 – 1 Rogério de Lemos

  2. Self-* and Dependability
  • Dependability:
    - the ability to deliver service that can justifiably be trusted;
  • Self-* properties of systems:
    - the support for autonomy;
    - self-adaptable, self-managing, self-optimising, self-healing, self-repairing, self-configuring, etc.
  • Adaptability:
    - the ability of a system to accommodate changes while providing its specified services;
    - run-time changes;

  3. Dependability
  • Dependability: the ability to avoid service failures that are more frequent and more severe than is acceptable;
  • Threats: undesired, but in principle expected, circumstances:
    - faults, errors and failures;
  • Attributes: properties of the system:
    - reliability, availability, integrity, confidentiality, and safety;
  • Technologies: methods and techniques for providing the ability to attain dependability, and for reaching confidence in that ability:
    - rigorous design, validation & verification, fault tolerance, and system evaluation;

  4. Dependability – Threats (Yves Deswarte & David Powell)
  • Fault: adjudged or hypothesized cause of an error;
  • Error: that part of the system state which may lead to a failure;
  • Failure: occurs when the delivered service deviates from implementing the system function;
  [Figure: … → Fault → (activation) → Error → (propagation) → Failure → (causation) → Fault → (activation) → Error → …]
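The threat chain can be illustrated with a minimal Python sketch. The functions `largest`, `spread` and `deviates` are hypothetical, invented only for illustration: the fault is a wrong initial value that stays dormant whenever the largest reading is non-negative, is activated by all-negative inputs to produce an erroneous internal state, and propagates into a failure when the delivered result deviates from the specification.

```python
def largest(values):
    """Intended to return the largest element of a non-empty list."""
    best = 0  # fault: should start at values[0]; dormant while max(values) >= 0
    for v in values:
        if v > best:
            best = v
    return best


def spread(values):
    """The service: difference between the largest and smallest reading."""
    top = largest(values)      # erroneous internal state if the fault was activated
    return top - min(values)   # the error propagates into the delivered result


def deviates(values):
    """Failure detection: compare the delivered service with its specification."""
    return spread(values) != max(values) - min(values)
```

For `[3, 1, 4]` the fault stays dormant and `deviates` is False; for `[-5, -2]` the erroneous state `largest(...) == 0` propagates to a delivered spread of 5 instead of 3, which is a failure.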

  5. Adaptability – Initiators
  • Changes:
    - the act, process, or result of altering or modifying;
    - internal changes: component failures, overload of resources, etc.
    - external changes: environmental, requirements, etc.
  • There is no fundamental chain of adaptability initiators;

  6. Threats and Initiators
  • Changes correspond to events (faults):
    - changes can be dormant if not activated;
  • What is the consequence of change (errors)?
    - what would be the equivalent of error-free and erroneous states?
    - these states are created when changes are activated, and can remain latent until detected;
  • What is the equivalent of failure?
    - unsuccessful adaptation?
    - the system might continue to provide its services, but ignoring the change;

  7. Dependability – Technologies
  • Fault avoidance: build a system with no faults:
    - rigorous design (fault prevention): formal and rigorous notations, processes, adapters, etc.;
    - verification & validation (fault removal): model checking, fault injection, testing, simulation, etc.;
  • Fault acceptance: it is impossible to rid the system of faults:
    - fault tolerance;
    - system evaluation (fault forecasting): empirical approaches, Markov models, etc.;
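For the Markov-model approach to fault forecasting, the simplest case is a two-state (up/down) continuous-time Markov chain. Assuming a constant failure rate λ and a constant repair rate μ (exponentially distributed times to failure and to repair), the balance equation λ·P(up) = μ·P(down) gives a steady-state availability of μ/(λ + μ), i.e. MTTF/(MTTF + MTTR); starting in the up state, the availability at time t is μ/(λ + μ) + λ/(λ + μ)·e^(−(λ + μ)t). A sketch with hypothetical function names:

```python
import math


def steady_state_availability(failure_rate, repair_rate):
    """Long-run probability of the 'up' state in a two-state Markov chain."""
    return repair_rate / (failure_rate + repair_rate)


def availability_at(t, failure_rate, repair_rate):
    """Probability of being 'up' at time t, assuming the system starts 'up'."""
    total = failure_rate + repair_rate
    # Steady-state term plus a transient term that decays at rate (lambda + mu).
    return repair_rate / total + (failure_rate / total) * math.exp(-total * t)
```

For example, with an MTTF of 1000 hours (λ = 0.001 per hour) and an MTTR of 10 hours (μ = 0.1 per hour), the steady-state availability is 1000/1010, about 0.990.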

  8. Fault Tolerance
  • Fault tolerance aims at avoiding the failure of the system:
    - error detection: detects the presence of errors;
    - recovery: transforms a system state that contains errors, and possibly faults, into a state without detected errors and without faults that can be activated again; it comprises:
      - error handling: eliminates errors from the system state;
      - fault handling: prevents faults from being activated again; diagnosis, isolation and reconfiguration;
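Error detection followed by error handling through rollback can be sketched as follows, assuming the state is a plain dictionary and that a hypothetical acceptance test on the balance is an adequate error detector. The sketch only handles the error: the fault that produced it (here, an invalid request) is not removed, which is the role of fault handling.

```python
import copy


def acceptance_test(state):
    """Error detection: a hypothetical invariant the state must satisfy."""
    return state["balance"] >= 0


def run_with_rollback(state, operation):
    """Apply `operation` to `state`, rolling back if an error is detected.

    Returns True if the new state is kept, False if it was rolled back.
    """
    checkpoint = copy.deepcopy(state)   # recovery point taken before the operation
    try:
        operation(state)
        if acceptance_test(state):
            return True                 # no error detected: keep the new state
    except Exception:
        pass                            # a raised exception also counts as a detected error
    state.clear()                       # error handling by rollback:
    state.update(checkpoint)            # restore the recovery point
    return False


def withdraw(amount):
    """Hypothetical operation that logs the request, then updates the balance."""
    def operation(state):
        state["log"].append(amount)
        state["balance"] -= amount
    return operation
```

Starting from `{"balance": 100, "log": []}`, `withdraw(30)` is kept while `withdraw(500)` fails the acceptance test and is rolled back, leaving `{"balance": 70, "log": [30]}`; the deep copy is what restores the log list as well as the balance.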

  9. Fault Tolerance (Yves Deswarte & David Powell)
  [Figure: the fault → error → failure chain, annotated with the fault tolerance techniques that act on each stage]
  • Fault: adjudged or hypothesized cause of an error;
    - fault handling: diagnosis, isolation, reconfiguration, reinitialization;
  • Error: that part of the system state which may lead to a failure;
    - error detection;
    - error handling: rollback, rollforward, compensation;
  • Failure: occurs when the delivered service deviates from implementing the system function;
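Compensation (the erroneous state contains enough redundancy for the error to be masked) and fault handling can be combined in a replicated service. The sketch below is a hypothetical illustration, not taken from the slides: majority voting over the active replicas masks a wrong output, the dissenting replica is diagnosed as faulty and isolated, and the configuration is restored by promoting a spare. Reinitialization is omitted because the replicas are stateless functions; `ReplicaGroup` is an invented name.

```python
from collections import Counter


class ReplicaGroup:
    """Hypothetical replicated service: replicas and spares are plain callables."""

    def __init__(self, replicas, spares):
        self.active = list(replicas)
        self.spares = list(spares)
        self.isolated = []

    def call(self, x):
        outputs = [replica(x) for replica in self.active]
        value, votes = Counter(outputs).most_common(1)[0]
        if votes <= len(outputs) // 2:
            raise RuntimeError("no majority: the error cannot be masked")
        # Compensation: the majority value masks any minority error.
        # Fault handling: diagnose dissenting replicas, isolate them,
        # and reconfigure by promoting spares into the active set.
        for replica, out in zip(list(self.active), outputs):
            if out != value:
                self.active.remove(replica)
                self.isolated.append(replica)
                if self.spares:
                    self.active.append(self.spares.pop(0))
        return value
```

With three active replicas, one wrong output is outvoted and replaced; two conflicting wrong outputs and one correct one leave no majority, so the error is signalled instead of masked.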

  10. System Structure
  • Fault tolerance is about system structuring:
    - structure is what enables the system to generate its behaviour;
    - it determines how effectively the structuring can be used to provide means of error confinement, i.e., to avoid the propagation of errors;
    - it determines what interactions can exist and at what rate;
    - it is not restricted to system architecture;
  • Structural flexibility is the basis for adaptation;

  11. Fault Assumptions
  • Faults are undesirable, though expected, circumstances:
    - systems can fail in many different ways;
  • In the design of fault-tolerant systems, it is essential to define assumptions about the:
    - nature of faults: dictates the type of redundancy that must be implemented (space or time; replication or diversification);
    - rate of faults: influences the amount of redundancy needed to attain a given dependability;
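Time redundancy can be sketched as re-execution. This assumes the fault is transient and that attempts fail independently; a permanent fault would fail on every attempt and needs space redundancy instead (replication, or diversification against design faults). Under the independence assumption, an attempt that fails with probability p leaves a residual failure probability of p^n after n attempts, so the assumed fault rate determines how much redundancy is needed. The function names are hypothetical.

```python
def retry(operation, attempts):
    """Time redundancy: re-execute `operation` up to `attempts` times.

    Assumes any exception signals a transient error; re-raises the last
    exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
    raise last_error


def attempts_needed(p_fail, target):
    """Smallest n such that p_fail ** n <= target (independent attempts)."""
    if not (0 < p_fail < 1) or not (0 < target < 1):
        raise ValueError("p_fail and target must lie strictly between 0 and 1")
    n, residual = 1, p_fail
    while residual > target:
        n += 1
        residual *= p_fail
    return n
```

For instance, if each attempt fails with probability 0.5, reaching a residual failure probability of at most 0.01 takes 7 attempts (0.5^7 ≈ 0.0078).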

  12. Fault Assumptions
  • How a component behaves when it fails:
    - crash fault being the simplest and most restrictive (or well-defined) type;
    - Byzantine being the least restrictive;
  [Figure: nested failure classes, crash ⊂ omission ⊂ timing ⊂ Byzantine]
  • The different types of changes need to be classified:
    - behavioural assumptions;
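Because the classes are nested, an observed behaviour can be mapped to the most restrictive class that still covers it. A hypothetical sketch, assuming a trace of (expected value, received value or None, latency) tuples and a fixed deadline: a wrong value needs the Byzantine assumption, a correct but late response the timing assumption, a response after a missing one the omission assumption, and missing responses only at the end of the trace are consistent with a crash.

```python
from enum import IntEnum


class FailureMode(IntEnum):
    """Failure classes, from most restrictive (CRASH) to least (BYZANTINE)."""
    NONE = 0
    CRASH = 1
    OMISSION = 2
    TIMING = 3
    BYZANTINE = 4


def classify(trace, deadline):
    """Return the most restrictive failure class that still covers `trace`.

    `trace` is a list of (expected, received, latency) tuples; `received`
    is None when no response arrived (a hypothetical trace format).
    """
    mode = FailureMode.NONE
    missing_seen = False
    for expected, received, latency in trace:
        if received is None:
            missing_seen = True
            mode = max(mode, FailureMode.CRASH)      # so far, consistent with a crash
            continue
        if missing_seen:
            mode = max(mode, FailureMode.OMISSION)   # responded again: not a crash
        if received != expected:
            mode = max(mode, FailureMode.BYZANTINE)  # wrong value: arbitrary behaviour
        elif latency > deadline:
            mode = max(mode, FailureMode.TIMING)     # correct value, but too late
    return mode
```

A finite trace cannot prove a crash, since the component might respond later; the classification only reflects what has been observed so far.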

  13. State of the Art
  • Adaptive fault tolerance:
    - property that enables a system to maintain and improve fault tolerance by adapting to changes in environment and policy;
    - monitor the system;
    - reconfigure the application when its configuration is not appropriate for the dependability requirements;
  • Distributed systems:
    - different layers: middleware / fault tolerance / adaptation;
    - consensus problem;
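The monitor-and-reconfigure cycle of adaptive fault tolerance can be sketched with a hypothetical policy that adjusts the replication degree. It assumes replicas are voted on by majority, so tolerating f faulty replicas requires 2f + 1 of them, and it uses the number of failures observed in the last window as a crude estimate of f, bounded by the policy. `Policy` and `AdaptiveReplication` are invented names.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical dependability policy."""
    min_faults: int = 1   # always tolerate at least this many faulty replicas
    max_faults: int = 3   # resource limit on the replication degree


@dataclass
class AdaptiveReplication:
    policy: Policy
    faults_to_tolerate: int = 1

    def replicas(self):
        # Majority voting masks f faulty replicas with 2f + 1 replicas.
        return 2 * self.faults_to_tolerate + 1

    def adapt(self, failures_in_window):
        """Monitor, decide and reconfigure once per observation window."""
        needed = max(self.policy.min_faults, failures_in_window)
        self.faults_to_tolerate = min(needed, self.policy.max_faults)
        return self.replicas()
```

A window with two observed failures raises the degree from 3 to 5 replicas; a burst beyond the policy limit is capped at 7, and quiet windows bring it back down to 3.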

  14. State of the Art
  • AQuA – CORBA-based middleware:
    - dynamic replication of objects;
  • Proteus: dynamic fault tolerance through adaptive reconfiguration:
    - allows the degree of dependability to be specified at the application level;

  15. State of the Art
  • Chameleon – adaptive infrastructure:
    - allows different levels of availability requirements;
    - explicit representation of adaptive policies;
    - provides dependability through the use of ARMORs (Adaptive, Reconfigurable, and Mobile Objects for Reliability):
      - managers for monitoring and recovering resources;
      - daemons for providing communication;
      - common ARMORs for providing application-required dependability;
    - enables multiple fault tolerance strategies to co-exist;

  16. State of the Art
  • Architectural fault tolerance
  • Error detection and recovery:
    - techniques based on exception handling; application dependent;
    - iC2C and iFTE;
  • Fault handling:
    - system reconfiguration: replacement of components, connectors and configurations;
    - dynamic reconfiguration;
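Exception-based error detection and recovery, combined with reconfiguration as fault handling, can be sketched in Python. The classes and the helper `call_with_reconfiguration` are hypothetical and only follow the general idea of an idealised fault-tolerant component (normal activity plus exceptional activity, signalling interface and failure exceptions to the caller); they are not the iC2C or iFTE implementations.

```python
class InterfaceException(Exception):
    """Raised when a request violates the component's interface."""


class FailureException(Exception):
    """Signalled to the caller when the component cannot provide its service."""


class IdealisedComponent:
    def __init__(self, normal, handler):
        self.normal = normal     # normal activity: callable(request) -> response
        self.handler = handler   # exceptional activity: callable(request, error) -> response

    def service(self, request):
        if request is None:
            raise InterfaceException("invalid request")
        try:
            return self.normal(request)
        except Exception as internal_error:      # error detected internally
            try:
                return self.handler(request, internal_error)
            except Exception as handler_error:   # recovery failed: signal failure
                raise FailureException("service unavailable") from handler_error


def call_with_reconfiguration(config, role, replacement, request):
    """Fault handling at the architectural level: replace a failed component."""
    try:
        return config[role].service(request)
    except FailureException:
        config[role] = replacement
        return config[role].service(request)
```

The failure exception is what lets the architecture, rather than the component, decide on reconfiguration: the caller swaps the component bound to the role and retries the request.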

  17. State of the Art
  • Bio-inspired computing and statistical methods:
    - data-oriented approaches;
    - data mining large quantities of observations to identify patterns;
    - anomaly (fault and intrusion) detection;
    - neural networks, genetic algorithms, etc.;
    - adaptive error detection using artificial immune systems; problem: how to learn from rare events!
    - statistical learning techniques (SLT) applied to system recovery;
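As a deliberately simple statistical stand-in for the data-oriented approaches listed above (not an artificial immune system or a neural network), an anomaly detector can learn a profile of normal behaviour and flag observations that lie far from it. The function names and the threshold of three standard deviations are assumptions.

```python
import statistics


def train(normal_observations):
    """Learn a profile of normal behaviour: sample mean and standard deviation."""
    return statistics.mean(normal_observations), statistics.stdev(normal_observations)


def is_anomalous(value, profile, threshold=3.0):
    """Flag observations more than `threshold` standard deviations from the mean."""
    mean, stdev = profile
    return abs(value - mean) > threshold * stdev
```

Because the profile is learned from normal observations only, no examples of the rare faulty behaviour are needed for training, although the threshold still trades false alarms against missed detections.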

  18. Conclusions
  • Changes are like faults, though:
    - they might be desired or undesired, and expected or unexpected;
  • Classification of the types of changes:
    - otherwise it becomes application dependent, e.g., exception handling for the support of fault tolerance;
  • How does system structuring affect adaptability?
    - is software flexible enough to support run-time change?
    - impact of design-time change;
    - to scope the impact of change: confinement of the consequences of change;