What can nuclear engineering teach us about software? Todd Lewis - - PowerPoint PPT Presentation

SLIDE 1

What can nuclear engineering teach us about software?

Todd Lewis & Eduardo Bellani tlewis@brickabode.com emb@brickabode.com 24 April 2017

SLIDE 2

Read every word of Lamport

  • Leslie Lamport (1977): "Proving the Correctness of Multiprocess Programs"
  • This paper is amazing
  • Leslie Lamport is amazing
  • He did "Time, Clocks, and the Ordering of Events in a Distributed System" only a year later
  • (Has there ever been a computer science decade as great as the 1970s?)

SLIDE 3

System properties come in two kinds!

  • Computing is great at liveness: lots of features!
  • The benefit of features often outweighs the cost of failure, so “Move Fast & Break Things”
  • However, we often do safety so badly that there is opportunity there: lots of low-hanging fruit

         Liveness     Safety
When     Sometimes    Always
Where    Somewhere    Everywhere
Nature   Good thing   Bad thing
Action   Happens      Does not happen
Means    Feature      Control
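The two columns of the table can be made concrete by checking a recorded run of a system. This is a minimal sketch, assuming behaviour is logged as a finite list of states; the `temp` and `power` fields, the thresholds, and the function names are hypothetical illustrations, not something from the talk.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical state representation: each observed state is a dict of readings.
State = Dict[str, float]


def check_safety(trace: Sequence[State], bad: Callable[[State], bool]) -> Optional[int]:
    """Safety: a bad thing does not happen, always and everywhere.

    Returns the index of the first state where `bad` holds, or None if the
    trace never reaches a bad state. Any violation is witnessed by a finite
    prefix of the trace, so a finite check can refute safety.
    """
    for i, state in enumerate(trace):
        if bad(state):
            return i
    return None


def check_liveness(trace: Sequence[State], good: Callable[[State], bool]) -> bool:
    """Liveness: a good thing happens, sometimes and somewhere.

    Returns True if `good` holds in at least one state. For a system that
    runs forever, a finite trace in which nothing good has happened yet does
    not refute liveness; this only checks the prefix we observed.
    """
    return any(good(state) for state in trace)


# Hypothetical trace of a plant: core temperature (deg C) and power output (MW).
trace = [
    {"temp": 280.0, "power": 0.0},
    {"temp": 300.0, "power": 50.0},
    {"temp": 310.0, "power": 120.0},
]

print(check_safety(trace, lambda s: s["temp"] > 350))     # None: never overheated
print(check_liveness(trace, lambda s: s["power"] > 100))  # True: power was delivered
```

Note the asymmetry: a single bad state is enough to show a safety failure, while a finite log with no good state is only a warning sign for liveness.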

SLIDE 4

Let’s talk about saving lives

  • Starting in the 1970s, aviation began applying human factors analysis
  • What used to be called “pilot error” is now recognized as “bad interface design”
  • Hundreds of thousands of people are alive today who would otherwise be dead because of this advance

SLIDE 5

Compare and contrast

SLIDE 6

Let’s design a nuclear plant!

  • We are putting a nuclear plant next to the ocean
  • Your mother lives next door
  • What failures would you want the designers to care about?

SLIDE 7

Multi-system failures (Oceanic edition)

Bad outcome           Cause                                  Control
Multi-system failure  Tsunami                                Put critical infrastructure up high
Multi-system failure  Corrosion                              Annual inspections
Multi-system failure  Flooding                               Sea wall and drainage
Multi-system failure  Loss of coolant (biomass clogs pipes)  Inspect & clean pipes
Multi-system failure  Sea-borne attack                       Sea walls
Multi-system failure  Erosion kills plant                    Sea walls
Multi-system failure  Sedimentation blocks coolant           Inspect & dredge

SLIDE 8

We can do this systematically

1) What failures matter? (“Bad business outcome” is a useful criterion)
2) For each failure, what can cause it?
3) How do you address each cause?

  • Gives you a finite list of the hazards you handle
  • Gives you a clear model to give to your operators: here are the risks we manage, and how
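The three questions amount to filling in a table like the oceanic one, then looking for causes with no answer. This is a minimal sketch, assuming each failure maps its causes to a control (or to nothing yet); `Hazard`, `uncontrolled`, `operator_summary`, and the `Earthquake` row are hypothetical illustrations, not part of the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Hazard:
    """One failure that matters, with its known causes.

    `controls` maps each cause to the control that addresses it,
    or to None if nothing addresses it yet.
    """
    failure: str
    controls: Dict[str, Optional[str]] = field(default_factory=dict)


def uncontrolled(hazards: List[Hazard]) -> List[Tuple[str, str]]:
    """Return the (failure, cause) pairs that have no control: the gaps."""
    return [
        (h.failure, cause)
        for h in hazards
        for cause, control in h.controls.items()
        if control is None
    ]


def operator_summary(hazards: List[Hazard]) -> str:
    """Render "here are the risks we manage, and how" as plain text."""
    lines = []
    for h in hazards:
        lines.append(f"{h.failure}:")
        for cause, control in h.controls.items():
            lines.append(f"  {cause} -> {control or 'UNCONTROLLED'}")
    return "\n".join(lines)


# Rows taken from the oceanic example; "Earthquake" is a hypothetical extra
# cause with no control yet, included only to show how a gap surfaces.
plant = [
    Hazard("Multi-system failure", {
        "Tsunami": "Put critical infrastructure up high",
        "Corrosion": "Annual inspections",
        "Flooding": "Sea wall and drainage",
        "Earthquake": None,
    }),
]

print(uncontrolled(plant))  # [('Multi-system failure', 'Earthquake')]
print(operator_summary(plant))
```

Keeping the analysis as data rather than prose makes the finite list explicit: any cause mapped to None is a hazard you have named but not yet handled.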

SLIDE 9

Pro tip: Create a Red Team

  • It is psychologically difficult to look at your own designs critically
  • You need distance in order to tease out assumptions and blind spots
  • Bring an outsider into your analysis, and encourage them to ask “dumb questions”

SLIDE 10

How to do this

1) Get a few hours of whiteboard time: your team, plus a smart outsider
2) Failures → Causes → Controls
3) Write it up
4) Start sharing it with others: here are some new options to improve our system

SLIDE 11

Where to find more

  • Engineering a Safer World, by Nancy Leveson
  • Resilience Engineering, by Hollnagel, Woods and Leveson
  • Drift into Failure, by Sidney Dekker

SLIDE 12

“Correct, On-Time, On-Budget”

  • Do you like building systems that work?
  • Are you a Haskell, ML, or Lisp programmer?
  • Meet us after the talk!
  • jobs@brickabode.com