
ECE 753: Fault-Tolerant Computing, Lecture Set 1: Motivation and Introduction - PDF document



  1. 1/20/2014

     ECE 753: FAULT-TOLERANT COMPUTING
     Kewal K. Saluja
     Department of Electrical and Computer Engineering
     Motivation and Introduction
     Lecture Set 1

     Overview
     • Motivation
     • About the Course and the Instructor
       – Conduct, Outline, Coursepack
     • Introduction
     • Terminology and definitions
       – Sources, Overview and Comments
       – System defined
     • Dependability/Security and their attributes
     • Threats to dependability and modeling the FEF (fault-error-failure) chain
     • Means to attain dependability
     • Fundamental Principles

     Motivation
     • Informal Definition
     • Key Attributes
     • Who, What and Why Study
     • Examples

     Motivation
     • What is Fault-Tolerance?
       A “fault-tolerant system” is one that continues to perform at the desired level of service in spite of failures in some of the components that constitute the system.

     Motivation (contd.)
     • Who is concerned about fault-tolerance?
       – System users: irrespective of the application, but some are a lot more concerned than others
     • Who is concerned at design stages?
       – Universities
         • R, d, and a (Research, development, applications)
       – Industry
         • r, D, and A (research, Development, Applications)
     • Issues
       – Design, Analysis/Validation, Implementation, Testing/Validation, Evaluation

     Motivation (contd.)
     • Key attributes
       – Fault - Error - Failure
       – Performance - Availability - Reliability
       – More recently, the concept of “survivability”
       – Including these constraints at the design stage is likely to be more cost-effective.

     ECE 753 Fault Tolerant Computing

  2. Motivation (contd.): Examples
     • General Purpose Systems
       – PCs: RAMs with parity checks and possibly ECC (consideration of re-execution on failure detection is being investigated)
       – Workstations/Servers: error detection (HW), occasional corrective action (SW), even ECC (HW), keeping log (SW)

     Motivation (contd.): Examples
     • Reliable Systems
       – Telephone systems
       – Banking systems, e.g. ATM
       – Stock market
       – CAE: exams/projects
       – Football games: display/ticketing

     Motivation (contd.): Examples
     • Critical and Life Critical Systems
       – Manned and unmanned space borne systems
       – Aircraft control systems
       – Nuclear reactor control systems
       – Life support systems

     Motivation (contd.): Examples
     • Reliable -> Critical Systems
       – 911 telephone switching system
       – Traffic light control system
       – Automotive control systems (ABS, fuel injection system)

     About the Course and the Instructor
     • Conduct
       – homeworks, exam, project, grading
     • Outline
     • Coursepack
       – references and reading list

     Introduction
     – Historical perspective and major push
     – New initiatives
     – Goals of fault-tolerance
     – Applications of fault-tolerance
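The parity check listed for PC RAMs above can be illustrated with a minimal sketch. It assumes a single even-parity bit per stored word; the helper names (`parity_bit`, `write_word`, `read_ok`) and the 8-bit example word are hypothetical and are not taken from the slides. The sketch shows why parity detects a single-bit error but misses a double-bit error, which is one reason ECC appears on the same slide.

```python
def parity_bit(word):
    """Even-parity bit: 1 if the word has an odd number of 1 bits, else 0."""
    return bin(word).count("1") % 2


def write_word(word):
    """Model a memory write: store the data word together with its parity bit."""
    return word, parity_bit(word)


def read_ok(word, stored_parity):
    """Model a memory read: recompute parity and compare with the stored bit."""
    return parity_bit(word) == stored_parity


data, p = write_word(0b10110010)

single_flip = data ^ 0b00001000   # one bit upset after the write
double_flip = data ^ 0b00000011   # two bits upset after the write

print(read_ok(data, p))           # True: no error
print(read_ok(single_flip, p))    # False: single-bit error detected
print(read_ok(double_flip, p))    # True: double-bit error goes unnoticed
```

Parity only detects; it cannot say which bit flipped, so any corrective action (such as the re-execution mentioned on the slide) has to come from elsewhere.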

  3. Introduction (contd.)
     • Historical Perspective
       – not a new concept
       – first use by J. von Neumann, 1956
         • “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components”, Annals of Mathematics Studies, Princeton University Press
     • Major push
       – Space program
       – HW fault tolerance first
       – SW fault tolerance later
       – Merge the two

     Introduction (contd.)
     • New initiatives
       – Density of devices: more failures likely
       – Power issue: scheduler, on-chip sensors
       – Failures due to soft errors, lifetime degradations
         - hardening, re-execution
         - on-chip ECC
         - reconfiguration
         - microarchitectural solutions
         - architectural solutions

     Introduction (contd.)
     • New initiatives (contd.)
       – Deep submicron technology and time-to-market pressure: designs not fully verified
       – Implementation of numerous functionalities on chip/board/system: possibility of system hang-up
       – Speculative execution: results may need to be re-checked
       – Low cost of HW and SW: affordable/economical
     • Hot issues: soft errors, lifetime failures, power and thermal management

     Introduction (contd.)
     • Goals: different goals for different applications
       The key word is “reliability”, which has a different meaning for different users and applications
     • Intuitive explanations
       – Dependability
       – Service
       – Specification

     Introduction (contd.)
     • Intuitive concepts
       – Reliability: continues to work
       – Availability: works when I need it
       – Safety: does not put me in jeopardy
       – Performability
       – Maintainability
       – Testability
       – Survivability: will the system survive catastrophic events?
       – Security

     Introduction (contd.)
     • Applications
       – Space borne system
         • long life system
       – Airplane control system
         • critical system
       – Transaction processing system
         • high availability system
       – Switching system
         • high availability over a certain level of performance

  4. Terminology and definitions
     • Reliability and concept of probability
       – R(t): conditional probability that a system provides continuous proper service in the interval [0,t], given that it provided the desired service at time 0.
     • Availability
     • Performability
       – An Example
     • Dependability
     • Security

     Sources, Overview and Comments (1/4)
     Key reference:
     • Algirdas Avizienis, Jean-Claude Laprie, Brian Randell, and Carl Landwehr, Basic Concepts and Taxonomy of Dependable and Secure Computing, IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, Jan-Mar 2004.
     Other references:
     • Israel Koren and C. Mani Krishna, Fault Tolerant Systems, Elsevier, 2007.
     • D. K. Pradhan, editor, Fault-Tolerant Computer System Design, Prentice-Hall, 1996.
     • B. W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison-Wesley, First edition, 1989.
     • My course (Fault-Tolerant Computing) URL: http://homepages.cae.wisc.edu/~ece753/INFO.html

     Sources, Overview and Comments (2/4)
     • What does the paper cover?
       – Very basic definitions of the terminology used in dependable computing
       – It categorizes definitions in three groups
         • System, attributes of dependability, threats to dependability
       – Covers very briefly methods to attain dependability

     Sources, Overview and Comments (3/4)
     • How to read the paper?
       – It is easy to read: scan it first and then read it
       – I have organized the material differently; you may find it helpful
     • What is not covered?
       – One attribute almost missing: survivability
       – Basic methods of fault tolerance and their characterization

     Sources, Overview and Comments (4/4)
     • Chronology of Developments
       – Need for fault-tolerance: inception of the space program (recall that “Voyager”, launched in 1977, is still sending signals)
       – First standard glossary in 1985
       – Integration of performance etc. into fault tolerance, and hence the term “Dependability”: book published in 1992
       – Recognition of “Security” as a basic attribute of dependability: this paper in 2004

     System Defined (1/4)
     • “. . . an entity that interacts with other entities”
       – First entity (system): limited to be “electronic (mostly digital)” or “computer based”
       – Second entity
         • Hardware, software, human, other systems, .. (can also be called “environment”)
     • Characterization and fundamental properties
       – Functionality
       – Performance
       – Dependability and security
       – Cost (usability, manageability, adaptability: not directly included in the paper)
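To make the R(t) definition under "Terminology and definitions" above concrete, here is a minimal numeric sketch. It rests on two assumptions that the slides do not state: a constant failure rate (so that R(t) = exp(-λt)) and the common steady-state availability formula MTTF / (MTTF + MTTR). The function names and the example numbers are hypothetical.

```python
import math


def reliability(t_hours, failure_rate_per_hour):
    """R(t) under an ASSUMED constant failure rate: probability of
    continuous proper service over [0, t], given proper service at time 0."""
    return math.exp(-failure_rate_per_hour * t_hours)


def steady_state_availability(mttf_hours, mttr_hours):
    """ASSUMED steady-state model: long-run fraction of time the system is up,
    given mean time to failure and mean time to repair."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical numbers: one failure per 10,000 hours, 10 hours to repair.
failure_rate = 1e-4
print(round(reliability(1000, failure_rate), 4))          # 0.9048
print(round(steady_state_availability(10_000, 10), 6))    # 0.999001
```

The two numbers answer different questions: reliability asks whether service was uninterrupted over the whole interval, while availability ("works when I need it") tolerates failures as long as repair is quick.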
