An Overview of Human Error
Drawn from J. Reason, Human Error, Cambridge, 1990
Aaron Brown
CS 294-4 ROC Seminar
Slide 2
Outline
- Human error and computer system failures
- A theory of human error
- Human error and accident theory
- Addressing human error
Slide 3
Dependability and human error
- Industry data shows that human error is the largest contributor to reduced dependability
– HP HA labs: human error is #1 cause of failures (2001)
– Oracle: half of DB failures due to human error (1999)
– Gray/Tandem: 42% of failures from human administrator errors (1986)
– Murphy/Gent study of VAX systems (1993):
[Chart: causes of system crashes over time, 1985–1993 (% of system crashes): system management 53%, software failure 18%, hardware failure 18%, other 10%]
Slide 4
Learning from other fields: PSTN
- FCC-collected data on outages in the US public-switched telephone network
– metric: breakdown of customer calls blocked by system outages (excluding natural disasters), Jan–June 2001
[Pie chart: blocked calls by cause — human-company and human-external (56% combined), hardware failure, software failure, overload, vandalism; individual slices 47%, 22%, 17%, 9%, 5%]
Human error accounts for 56% of all blocked calls
– comparison with 1992–94 data shows that human error is the only factor that is not improving over time
Slide 5
Learning from other fields: PSTN
- PSTN trends: 1992–94 vs. 2001 (millions of customer minutes/month)

Cause                   1992–94   2001
Overload                  314       60
Vandalism                   5        3
Software                   15       12
Hardware                   49       49
Human error: external     100       75
Human error: company       98      176
Slide 6
Learning from experiments
- Human error rates during maintenance of software RAID system
– participants attempt to repair RAID disk failures
» by replacing broken disk and reconstructing data
– each participant repeated task several times
– data aggregated across 5 participants
Error types tracked (per OS: Linux / Solaris / Windows):
- User Error – User Recovered
- User Error – Intervention Required
- System ignored fatal input
- Unsuccessful Repair
- Fatal Data Loss
Total number of trials: Linux 31, Solaris 33, Windows 35
[per-error-type counts not recoverable]
Slide 7
Learning from experiments
- Errors occur despite experience:
[Chart: number of errors vs. iteration (1–9) for Windows, Solaris, and Linux]
- Training and familiarity don't eliminate errors
– types of errors change: mistakes vs. slips/lapses
- System design affects error-susceptibility
Slide 8
Outline
- Human error and computer system failures
- A theory of human error
- Human error and accident theory
- Addressing human error
Slide 9
A theory of human error
(distilled from J. Reason, Human Error, 1990)
- Preliminaries: the three stages of cognitive processing for tasks
1) planning
» a goal is identified and a sequence of actions is selected to reach the goal
2) storage
» the selected plan is stored in memory until it is appropriate to carry it out
3) execution
» the plan is implemented by the process of carrying out the actions specified by the plan
Slide 10
A theory of human error (2)
- Each cognitive stage has an associated form of error
– slips: execution stage
» incorrect execution of a planned action
» example: miskeyed command
– lapses: storage stage
» incorrect omission of a stored, planned action
» examples: skipping a step on a checklist, forgetting to restore normal valve settings after maintenance
– mistakes: planning stage
» the plan is not suitable for achieving the desired goal
» example: TMI operators prematurely disabling HPI pumps
Slide 11
Origins of error: the GEMS model
- GEMS: Generic Error-Modeling System
– an attempt to understand the origins of human error
- GEMS identifies three levels of cognitive task processing
– skill-based: familiar, automatic procedural tasks
» usually low-level, like knowing to type "ls" to list files
– rule-based: tasks approached by pattern-matching from a set of internal problem-solving rules
» "observed symptoms X mean system is in state Y"
» "if system state is Y, I should probably do Z to fix it"
– knowledge-based: tasks approached by reasoning from first principles
» when rules and experience don't apply
Slide 12
GEMS and errors
- Errors can occur at each level
– skill-based: slips and lapses
» usually errors of inattention or misplaced attention
– rule-based: mistakes
» usually a result of picking an inappropriate rule
» caused by misconstrued view of state, over-zealous pattern matching, frequency gambling, deficient rules
– knowledge-based: mistakes
» due to incomplete/inaccurate understanding of system, confirmation bias, overconfidence, cognitive strain, ...
- Errors can result from operating at wrong level
– humans are reluctant to move from RB to KB level even if rules aren't working
Slide 13
Error frequencies
- In raw frequencies, SB >> RB > KB
– 61% of errors are at skill-based level
– 27% of errors are at rule-based level
– 11% of errors are at knowledge-based level
- But if we look at opportunities for error, the order reverses
– humans perform vastly more SB tasks than RB, and vastly more RB than KB
» so a given KB task is more likely to result in error than a given RB or SB task
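The reversal can be illustrated with a toy calculation. The 61/27/11% error shares come from the slide; the daily task counts are invented assumptions purely for illustration:

```python
# Toy illustration of the raw-vs-per-opportunity reversal.
# error_share values are from the slide; tasks_per_day is assumed.
error_share = {"SB": 0.61, "RB": 0.27, "KB": 0.11}
tasks_per_day = {"SB": 10_000, "RB": 500, "KB": 20}  # assumed counts

total_errors = 100  # assume 100 observed errors in the sample period
per_opportunity = {
    level: error_share[level] * total_errors / tasks_per_day[level]
    for level in error_share
}
for level, rate in sorted(per_opportunity.items(), key=lambda kv: kv[1]):
    print(f"{level}: {rate:.4f} errors per task")
# KB ends up with the highest per-task rate despite the fewest raw errors
```

Any plausible counts with SB >> RB >> KB opportunities produce the same reversal.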
Slide 14
Error detection and correction
- Basic detection mechanism is self-monitoring
– periodic attentional checks, measurement of progress toward goal, discovery of surprise inconsistencies, ...
- Effectiveness of self-detection of errors
– SB errors: 75–95% detected, avg 86%
» but some lapse-type errors were resistant to detection
– RB errors: 50–90% detected, avg 73%
– KB errors: 50–80% detected, avg 70%
- Including correction tells a different story:
– SB: ~70% of all errors detected and corrected
– RB: ~50% detected and corrected
– KB: ~25% detected and corrected
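Combining the averages above shows where the levels really diverge: detection rates are similar across levels, but correction given detection collapses at the KB level. A quick calculation using only the slide's numbers:

```python
# Combine the slide's averages: P(corrected) = P(detected) * P(corrected | detected),
# so P(corrected | detected) = P(corrected) / P(detected).
detected = {"SB": 0.86, "RB": 0.73, "KB": 0.70}
corrected = {"SB": 0.70, "RB": 0.50, "KB": 0.25}

for level in ("SB", "RB", "KB"):
    given_detected = corrected[level] / detected[level]
    print(f"{level}: P(corrected | detected) ~ {given_detected:.0%}")
# → SB ~81%, RB ~68%, KB ~36%
```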
Slide 15
Outline
- Human error and computer system failures
- A theory of human error
- Human error and accident theory
- Addressing human error
Slide 16
Human error and accident theory
- Major systems accidents (“normal accidents”)
start with an accumulation of latent errors
– most of those latent errors are human errors
» latent slips/lapses, particularly in maintenance
- example: misconfigured valves in TMI
» latent mistakes in system design, organization, and planning, particularly of emergency procedures
- example: flowcharts that omit unforeseen paths
– invisible latent errors change system reality without altering operator's models
» seemingly-correct actions can then trigger accidents
Slide 17
Accident theory (2)
- Accidents are exacerbated by human errors made during operator response
– RB errors made due to lack of experience with system in failure states
» training is rarely sufficient to develop a rule base that captures system response outside of normal bounds
– KB reasoning is hindered by system complexity and cognitive strain
» system complexity prohibits mental modeling
» stress of an emergency encourages RB approaches and diminishes KB effectiveness
– system visibility limited by automation and "defense in depth"
» results in improper rule choices and KB reasoning
Slide 18
Outline
- Human error and computer system failures
- A theory of human error
- Human error and accident theory
- Addressing human error
– general guidelines
– the ROC approach: system-level undo
Slide 19
Addressing human error
- Challenges
– humans are inherently fallible and errors are inevitable
– hard-to-detect latent errors can be more troublesome than front-line errors
– human psychology must not be ignored
» especially the SB/RB/KB distinction and human behavior at each level
- General approach: error-tolerance rather than error-avoidance
"It is now widely held among human reliability specialists that the most productive strategy for dealing with active errors is to focus upon controlling their consequences rather than upon striving for their elimination." (Reason, p. 246)
Slide 20
The Automation Irony
- Automation is not the cure for human error
– automation addresses the easy SB/RB tasks, leaving the complex KB tasks for the human
» humans are ill-suited to KB tasks, especially under stress
– automation hinders understanding and mental modeling
» decreases system visibility and increases complexity
» operators don't get hands-on control experience
» rule-set for RB tasks and models for KB tasks are weak
– automation shifts the error source from operator errors to design errors
» harder to detect/tolerate/fix design errors
Slide 21
Building robustness to human error
- Discover and correct latent errors
– must overcome human nature to wait until emergency to respond
- Increase system visibility
– don't hide complexity behind automated mechanisms
- Take errors into account in operator training
– include error scenarios
– promote exploratory trial & error approaches
– emphasize positive side of errors: learning from mistakes
Slide 22
Building robustness to human error
- Reduce opportunities for error (Don Norman):
– get good conceptual model to user by consistent design
– design tasks to match human limits: working memory, problem-solving abilities
– make visible what the options are, and what the consequences of actions are
– exploit natural mappings: between intentions and possible actions, actual state and what is perceived, ...
– use constraints to guide user to next action/decision
– design for errors: assume their occurrence, plan for error recovery, make it easy to reverse actions and hard to perform irreversible ones
– when all else fails, standardize: ease of use is more important, so standardize only as a last resort
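Norman's "design for errors" guideline can be sketched in code. The `SafeDelete` class below is a hypothetical illustration (not from the slides) of making a destructive action reversible by staging it instead of applying it immediately:

```python
# Sketch of "design for errors": stage destructive actions so a slip
# (deleting the wrong file) can be undone. Class and method names are
# invented for illustration.
import shutil
import tempfile
from pathlib import Path

class SafeDelete:
    """Move files to a trash directory instead of unlinking them."""

    def __init__(self):
        self.trash = Path(tempfile.mkdtemp(prefix="trash-"))
        self.history = []  # list of (original path, trashed path)

    def delete(self, path):
        path = Path(path)
        target = self.trash / f"{len(self.history)}-{path.name}"
        shutil.move(str(path), str(target))  # reversible move, not unlink
        self.history.append((path, target))

    def undo(self):
        path, target = self.history.pop()    # most recent deletion
        shutil.move(str(target), str(path))
        return path
```

The irreversible step (emptying the trash) would be a separate, deliberate action, matching the guideline "make it easy to reverse actions and hard to perform irreversible ones."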
Slide 23
Building robustness to human error
- Acknowledge human behavior in system design:
– interfaces should allow user to explore via experimentation
– to help at KB level, provide tools to do experiments/test hypotheses without having to do them on a high-risk irreversible plant, or make system state always reversible
– provide feedback to increase error observability (RB level)
– at RB level, provide symbolic cues and confidence measures
– for RB, try to give more elaborate, integrated cues to avoid "strong-but-wrong" RB errors
– provide overview displays at edge of periphery to avoid attentional capture at SB level
– simultaneously present data in forms useful for SB/RB/KB
– provide external memory aids to help at KB level, including externalized representation of different options/schemas
Slide 24
Human error: the ROC approach
- ROC is focusing on system-level techniques for human error tolerance
– complementary to UI innovations
- Goal: provide forgiving operator environment
– expect human error and tolerate it
– allow operator to experiment safely, test hypotheses
– make it possible to detect and fix latent errors
- Approach: undo f or system administration
Slide 25
Repairing the Past with Undo
- The Three R's: undo meets time travel
– Rewind: roll system state backwards in time
– Repair: fix latent or active error
» automatically or via human intervention
– Redo: roll system state forward, replaying user interactions lost during rewind
- This is not your ordinary word-processor undo!
– allows sysadmin to go back in time to fix latent errors after they're manifested
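A toy sketch of the Three R's on a key-value "system state" (illustrative only; the actual ROC work operates on whole-system state, not a dictionary):

```python
# Minimal Rewind/Repair/Redo sketch: user interactions are logged so
# state can be rolled back, fixed, and replayed. Names are invented.

class ThreeRStore:
    def __init__(self):
        self.state = {}
        self.log = []  # every user interaction, in order

    def apply(self, key, value):
        self.log.append((key, value))
        self.state[key] = value

    def rewind(self, n):
        """Roll back the last n interactions; return them for redo."""
        undone = self.log[-n:]
        self.log = self.log[:-n]
        self.state = {}
        for key, value in self.log:  # rebuild state from remaining log
            self.state[key] = value
        return undone

    def redo(self, interactions, repair=lambda op: op):
        """Replay interactions, letting a repair function fix or drop each."""
        for op in interactions:
            fixed = repair(op)
            if fixed is not None:    # a repair may drop a bad interaction
                self.apply(*fixed)
```

For example, rewinding past a bad write, dropping it during repair, and redoing the rest preserves the later, legitimate interactions — the property that distinguishes the 3 R's from plain checkpoint rollback.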
Slide 26
Undo details
- Examples where Undo would help:
– reverse the effects of a mistyped command (rm -rf *)
– roll back a software upgrade without losing user data
– retroactively install virus filter on email server; effects of virus are squashed on redo
- The 3 R's vs. checkpointing, reboot, logging
– checkpointing gives Rewind only
– reboot may give Repair, but only for "Heisenbugs"
– logging can give all 3 R's
» but need more than RDBMS logging, since system state changes are interdependent and non-transactional
» 3R-logging requires careful dependency tracking, and attention to state granularity and externalized events
Slide 27
Summary
- Humans are critical to system dependability
– human error is the single largest cause of failures
- Human error is inescapable: "to err is human"
– yet we blame the operator instead of fixing systems
- Human error comes in many forms
– mistakes, slips, lapses at KB/RB/SB levels of operation
– but is nearly always detectable
- Best way to address human error is tolerance