a culture of failure, mathias meyer, @roidrage, travis-ci.org / travis-ci.com

SLIDE 1

a culture of failure

mathias meyer, @roidrage

SLIDE 2

travis-ci.org / travis-ci.com

SLIDE 3

failure

SLIDE 4
SLIDE 5

risk

SLIDE 6

28 january 1986

SLIDE 7

sts-51-l

SLIDE 8
SLIDE 9

73 seconds

SLIDE 10
SLIDE 11

there is no root cause

"What you call root cause is simply the place where you stop looking any further" -- Sidney Dekker

"Overt catastrophic failure occurs when small, apparently innocuous failures join to create opportunity for a systemic accident. Each of these small failures is necessary to cause catastrophe but only the combination is sufficient to permit failure." -- Richard Cook

SLIDE 12
SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17

normalization of deviance

SLIDE 18
SLIDE 19
SLIDE 20

practical drift

SLIDE 21
SLIDE 22

unknown unknowns

SLIDE 23
SLIDE 24
SLIDE 25

things we do not know we don't know

SLIDE 26

risks we do not know we don't know

SLIDE 27
SLIDE 28

28 november 2007

SLIDE 29

http://xrscorp.com/blog/industry-news/unsafe-driving-basic/

SLIDE 30

http://www.datacenterknowledge.com/archives/2012/01/06/rackspace-cloud-will-expand-in-dallas/

SLIDE 31

redundancy

SLIDE 32

redundant redundancy

SLIDE 33
optimize for

mean time to recovery
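Mean time to recovery (MTTR) is an average: total time spent recovering from incidents divided by the number of incidents. The Python sketch below computes it from a hypothetical incident log. The `incidents` data and the `mean_time_to_recovery` helper are illustrative assumptions, not part of the talk, and measuring each incident from detection to recovery is one convention among several.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (detected_at, recovered_at) pairs, illustrative data only.
incidents = [
    (datetime(2013, 1, 1, 10, 0), datetime(2013, 1, 1, 10, 45)),  # 45 minutes
    (datetime(2013, 1, 5, 22, 0), datetime(2013, 1, 5, 22, 15)),  # 15 minutes
    (datetime(2013, 1, 9, 3, 30), datetime(2013, 1, 9, 5, 0)),    # 90 minutes
]


def mean_time_to_recovery(incidents):
    """Average of (recovered_at - detected_at) over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR")
    # Sum the recovery durations, starting from a zero timedelta.
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


print(mean_time_to_recovery(incidents))  # prints 0:50:00
```

Optimizing for MTTR means shrinking each of those durations (faster detection, rehearsed recovery) rather than only trying to make the incident list shorter.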

SLIDE 34

risk

SLIDE 35

acceptable risk

SLIDE 36

safety, $$ (economics), workload

SLIDE 37

cross boundaries

SLIDE 38

http://www.flickr.com/photos/rossbelmont/8014054698/

SLIDE 39
SLIDE 40

verify extreme risks

Bring down your site, kill your servers.

SLIDE 41

verify assumptions

SLIDE 42
SLIDE 43

http://www.bloomberg.com/news/2012-01-13/with-safety-alcoa-showed-mettle-part-3-commentary-by-bratton-and-tumin.html

SLIDE 44
SLIDE 45

learn from failure

SLIDE 46

transform your company

SLIDE 47

resilience

SLIDE 48

thank you