
An Overview of Human Error

Drawn from J. Reason, Human Error, Cambridge, 1990. Aaron Brown, CS 294-4 ROC Seminar.


  1. An Overview of Human Error
     Drawn from J. Reason, Human Error, Cambridge, 1990
     Aaron Brown, CS 294-4 ROC Seminar

  2. Outline
     • Human error and computer system failures
     • A theory of human error
     • Human error and accident theory
     • Addressing human error

  3. Dependability and human error
     • Industry data shows that human error is the largest contributor to reduced dependability
       – HP HA labs: human error is #1 cause of failures (2001)
       – Oracle: half of DB failures due to human error (1999)
       – Gray/Tandem: 42% of failures from human administrator errors (1986)
       – Murphy/Gent study of VAX systems (1993):
         [Chart: causes of system crashes (hardware failure, software failure, system management, other) as a percentage of all crashes, 1985 to 1993]

  4. Learning from other fields: PSTN
     • FCC-collected data on outages in the US public-switched telephone network
       – metric: breakdown of customer calls blocked by system outages (excluding natural disasters), Jan-June 2001
       – human error accounts for 56% of all blocked calls
         [Pie chart: blocked calls by cause (human error: company, human error: external, hardware failure, software failure, overload, vandalism)]
       – comparison with 1992-94 data shows that human error is the only factor that is not improving over time

  5. Learning from other fields: PSTN
     • PSTN trends: 1992-1994 vs. 2001
       Minutes (millions of customer minutes/month)

       Cause                     1992-94    2001
       Human error: company          98      176
       Human error: external        100       75
       Hardware                      49       49
       Software                      15       12
       Overload                     314       60
       Vandalism                      5        3

  6. Learning from experiments
     • Human error rates during maintenance of software RAID system
       – participants attempt to repair RAID disk failures
         » by replacing broken disk and reconstructing data
       – each participant repeated task several times
       – data aggregated across 5 participants
     • [Table: occurrences of each error type on Windows, Solaris, and Linux]
       – error types: fatal data loss; unsuccessful repair; system ignored fatal input; user error, intervention required; user error, user recovered
       – total number of trials: Windows 35, Solaris 33, Linux 31

  7. Learning from experiments
     • Errors occur despite experience:
       [Chart: number of errors per iteration (1 to 9) for Windows, Solaris, and Linux]
     • Training and familiarity don’t eliminate errors
       – types of errors change: mistakes vs. slips/lapses
     • System design affects error-susceptibility

  9. A theory of human error (distilled from J. Reason, Human Error, 1990)
     • Preliminaries: the three stages of cognitive processing for tasks
       1) planning
          » a goal is identified and a sequence of actions is selected to reach the goal
       2) storage
          » the selected plan is stored in memory until it is appropriate to carry it out
       3) execution
          » the plan is implemented by the process of carrying out the actions specified by the plan

  10. A theory of human error (2)
     • Each cognitive stage has an associated form of error
       – slips: execution stage
         » incorrect execution of a planned action
         » example: miskeyed command
       – lapses: storage stage
         » incorrect omission of a stored, planned action
         » examples: skipping a step on a checklist, forgetting to restore normal valve settings after maintenance
       – mistakes: planning stage
         » the plan is not suitable for achieving the desired goal
         » example: TMI (Three Mile Island) operators prematurely disabling HPI (high-pressure injection) pumps
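
The stage-to-error mapping above can be sketched as a tiny classifier. This is a hypothetical illustration, not something from Reason's text: it assumes each task step can be summarized by three yes/no observations (was the plan suitable, was the step performed, was it performed correctly), and the names `Stage`, `ERROR_FORM`, and `classify` are invented for the example.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The three stages of cognitive processing for a task."""
    PLANNING = "planning"
    STORAGE = "storage"
    EXECUTION = "execution"


# Each stage has its own characteristic form of error.
ERROR_FORM = {
    Stage.EXECUTION: "slip",    # a planned action is carried out incorrectly
    Stage.STORAGE: "lapse",     # a stored, planned action is omitted
    Stage.PLANNING: "mistake",  # the plan itself cannot achieve the goal
}


def classify(plan_suits_goal: bool, step_performed: bool,
             step_correct: bool) -> Optional[str]:
    """Return the error form for one task step, or None if no error.

    Hypothetical encoding: an unsuitable plan is checked first, because
    correctly executing a bad plan is still a mistake.
    """
    if not plan_suits_goal:
        return ERROR_FORM[Stage.PLANNING]
    if not step_performed:
        return ERROR_FORM[Stage.STORAGE]
    if not step_correct:
        return ERROR_FORM[Stage.EXECUTION]
    return None


# The slide's examples, encoded under the assumptions above.
print(classify(True, True, False))   # miskeyed command: slip
print(classify(True, False, True))   # skipped checklist step: lapse
print(classify(False, True, True))   # TMI operators disabling HPI pumps: mistake
```

Checking the plan before the step reflects the point of the slide: a mistake originates before any action is taken, so flawless execution cannot rescue it.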

  11. Origins of error: the GEMS model
     • GEMS: Generic Error-Modeling System
       – an attempt to understand the origins of human error
     • GEMS identifies three levels of cognitive task processing
       – skill-based: familiar, automatic procedural tasks
         » usually low-level, like knowing to type “ls” to list files
       – rule-based: tasks approached by pattern-matching from a set of internal problem-solving rules
         » “observed symptoms X mean system is in state Y”
         » “if system state is Y, I should probably do Z to fix it”
       – knowledge-based: tasks approached by reasoning from first principles
         » when rules and experience don’t apply
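
Rule-based processing, as described above, is a two-step pattern match: symptoms to a believed system state, then state to an action, with knowledge-based reasoning as the fallback when no rule applies. A minimal sketch, assuming a toy rule base for the RAID repair task from the experiments earlier; the symptom strings, the rule tables, and the `respond` function are hypothetical, and skill-based processing is not modeled.

```python
# Hypothetical rule base, in the two forms the slide describes:
# "observed symptoms X mean system is in state Y" ...
SYMPTOM_TO_STATE = {
    "disk reported failed": "disk failed",
    "array degraded": "disk failed",
}
# ... and "if system state is Y, I should probably do Z to fix it".
STATE_TO_ACTION = {
    "disk failed": "replace broken disk and reconstruct data",
}


def respond(symptom: str) -> tuple[str, str]:
    """Return (processing level, action) for an observed symptom."""
    state = SYMPTOM_TO_STATE.get(symptom)
    if state is not None and state in STATE_TO_ACTION:
        # Rule-based: a stored pattern matched, so apply its fix.
        return "rule-based", STATE_TO_ACTION[state]
    # No rule applies: fall back to reasoning from first principles.
    return "knowledge-based", "diagnose from first principles"


print(respond("array degraded"))
print(respond("controller timeout"))
```

The sketch also shows where rule-based mistakes come from: if "array degraded" were actually caused by something other than a failed disk, the rule would still fire and confidently prescribe the wrong fix, which is the over-zealous pattern matching and deficient rules listed on the next slide.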

  12. GEMS and errors
     • Errors can occur at each level
       – skill-based: slips and lapses
         » usually errors of inattention or misplaced attention
       – rule-based: mistakes
         » usually a result of picking an inappropriate rule
         » caused by misconstrued view of state, over-zealous pattern matching, frequency gambling, deficient rules
       – knowledge-based: mistakes
         » due to incomplete/inaccurate understanding of system, confirmation bias, overconfidence, cognitive strain, ...
     • Errors can result from operating at the wrong level
       – humans are reluctant to move from the rule-based to the knowledge-based level even if rules aren’t working

  13. Error frequencies
     • In raw frequencies, skill-based (SB) >> rule-based (RB) > knowledge-based (KB)
       – 61% of errors are at skill-based level
       – 27% of errors are at rule-based level
       – 11% of errors are at knowledge-based level
     • But if we look at opportunities for error, the order reverses
       – humans perform vastly more SB tasks than RB, and vastly more RB than KB
         » so a given KB task is more likely to result in error than a given RB or SB task
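
The reversal comes from dividing error counts by opportunities. A minimal sketch: the error shares are the slide's figures, but the task counts in `OPPORTUNITIES` are hypothetical, chosen only to reflect "vastly more SB tasks than RB, and vastly more RB than KB"; they are not data from Reason.

```python
# Share of observed errors at each level (slide figures).
ERROR_SHARE = {"SB": 0.61, "RB": 0.27, "KB": 0.11}

# Hypothetical relative number of tasks performed at each level.
# These counts are illustrative assumptions, not measured data.
OPPORTUNITIES = {"SB": 10_000, "RB": 1_000, "KB": 50}


def error_rate_per_task(total_errors: int) -> dict[str, float]:
    """Errors per task performed at each level, given a total error count."""
    return {
        level: total_errors * share / OPPORTUNITIES[level]
        for level, share in ERROR_SHARE.items()
    }


rates = error_rate_per_task(total_errors=100)
# Order levels from most to least error-prone per task.
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

With these assumed counts a KB task errs about 22% of the time versus under 1% for an SB task. The exact counts do not matter much: the per-task order reverses whenever SB tasks outnumber RB tasks by more than 61/27 (about 2.3 times) and RB tasks outnumber KB tasks by more than 27/11 (about 2.5 times).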

  14. Error detection and correction
     • Basic detection mechanism is self-monitoring
       – periodic attentional checks, measurement of progress toward goal, discovery of surprise inconsistencies, ...
     • Effectiveness of self-detection of errors
       – SB errors: 75-95% detected, avg 86%
         » but some lapse-type errors were resistant to detection
       – RB errors: 50-90% detected, avg 73%
       – KB errors: 50-80% detected, avg 70%
     • Including correction tells a different story:
       – SB: ~70% of all errors detected and corrected
       – RB: ~50% detected and corrected
       – KB: ~25% detected and corrected
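
Reading the two sets of averages together shows where errors escape at each level. A minimal sketch, assuming the "detected and corrected" figures are fractions of all errors at that level, measured on the same basis as the detection averages (the slide's wording suggests this, but the underlying studies may differ); `outcome_breakdown` is a hypothetical helper.

```python
# Average self-detection rates per level (slide figures).
DETECTED = {"SB": 0.86, "RB": 0.73, "KB": 0.70}
# Approximate detected-and-corrected rates per level (slide figures).
CORRECTED = {"SB": 0.70, "RB": 0.50, "KB": 0.25}


def outcome_breakdown(level: str) -> dict[str, float]:
    """Split all errors at one level into three disjoint outcomes."""
    detected = DETECTED[level]
    corrected = CORRECTED[level]
    return {
        "corrected": corrected,
        "detected_uncorrected": detected - corrected,
        "undetected": 1.0 - detected,
    }


for level in DETECTED:
    parts = outcome_breakdown(level)
    print(level, {name: f"{share:.0%}" for name, share in parts.items()})
```

Under that assumption, KB errors are noticed nearly as often as RB errors (70% vs. 73%), but about 45% of all KB errors are detected and still not corrected, compared with about 16% for SB errors.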

  16. Human error and accident theory
     • Major systems accidents (“normal accidents”) start with an accumulation of latent errors
       – most of those latent errors are human errors
         » latent slips/lapses, particularly in maintenance
           • example: misconfigured valves in TMI
         » latent mistakes in system design, organization, and planning, particularly of emergency procedures
           • example: flowcharts that omit unforeseen paths
       – invisible latent errors change system reality without altering operators’ models
         » seemingly-correct actions can then trigger accidents

  17. Accident theory (2)
     • Accidents are exacerbated by human errors made during operator response
       – RB errors made due to lack of experience with system in failure states
         » training is rarely sufficient to develop a rule base that captures system response outside of normal bounds
       – KB reasoning is hindered by system complexity and cognitive strain
         » system complexity prohibits mental modeling
         » stress of an emergency encourages RB approaches and diminishes KB effectiveness
       – system visibility limited by automation and “defense in depth”
         » results in improper rule choices and KB reasoning

  18. Outline
     • Human error and computer system failures
     • A theory of human error
     • Human error and accident theory
     • Addressing human error
       – general guidelines
       – the ROC approach: system-level undo

  19. Addressing human error
     • Challenges
       – humans are inherently fallible and errors are inevitable
       – hard-to-detect latent errors can be more troublesome than front-line errors
       – human psychology must not be ignored
         » especially the SB/RB/KB distinction and human behavior at each level
     • General approach: error-tolerance rather than error-avoidance
       “It is now widely held among human reliability specialists that the most productive strategy for dealing with active errors is to focus upon controlling their consequences rather than upon striving for their elimination.” (Reason, p. 246)

  20. The Automation Irony
     • Automation is not the cure for human error
       – automation addresses the easy SB/RB tasks, leaving the complex KB tasks for the human
         » humans are ill-suited to KB tasks, especially under stress
       – automation hinders understanding and mental modeling
         » decreases system visibility and increases complexity
         » operators don’t get hands-on control experience
         » rule-set for RB tasks and models for KB tasks are weak
       – automation shifts the error source from operator errors to design errors
         » harder to detect/tolerate/fix design errors
