Kolton Andrus (@deelyle) Overview 1. Why is Failure Testing - - PowerPoint PPT Presentation



SLIDE 1

Kolton Andrus (@deelyle)

SLIDE 2

Overview

  • 1. Why is Failure Testing Important?
  • 2. How did we build Failure as a Service?
  • 3. How has this made our systems more resilient?

SLIDE 3
SLIDE 4
SLIDE 5

Why Failure Testing?

  • 1. Makes our systems immune to failure
  • 2. Prevents larger outages
  • 3. Production verification is requisite
SLIDE 6
SLIDE 7

Failure testing is a form of hormesis: we imbibe the poison to become immune.

SLIDE 8
SLIDE 9
SLIDE 10

Validating that our defenses will work when called upon - by exercising them at scale in production.

SLIDE 11

Building Failure as a Service

FIT - Failure Injection Testing

SLIDE 12
SLIDE 13

What about the monkeys?

SLIDE 14

The 5 W’s

  • 1. Why
  • 2. Who - Failure Scope
  • 3. Where - Injection Point
  • 4. What - Injected Failure
  • 5. When - Ad-hoc & Automated
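
The Who / Where / What / When dimensions above could be captured in a small data model. The sketch below is illustrative only: every class, field, and enum value (`FailureScope`, `InjectionPoint`, `FailureKind`, `FailureScenario`, `percent_of_traffic`, and so on) is an assumption made for this example and is not taken from FIT itself. The injection point names loosely follow labels that appear in the deck's diagrams.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InjectionPoint(Enum):
    """Where: hypothetical names for places a failure can be injected."""
    CACHE = "cache"
    CASSANDRA = "cassandra"
    CIRCUIT_BREAKER = "circuit_breaker"
    NETWORK_CALL = "network_call"


class FailureKind(Enum):
    """What: assumed failure types for this sketch."""
    ERROR = "error"
    LATENCY = "latency"


@dataclass(frozen=True)
class FailureScope:
    """Who: which requests are affected (fields are assumptions)."""
    customer_id: Optional[str] = None  # limit to a single customer or test account
    percent_of_traffic: float = 0.0    # otherwise, a slice of traffic in [0, 100]

    def matches(self, customer_id: str, bucket: float) -> bool:
        # `bucket` is a caller-supplied value in [0, 100) for this request.
        if self.customer_id is not None:
            return customer_id == self.customer_id
        return bucket < self.percent_of_traffic


@dataclass(frozen=True)
class FailureScenario:
    scope: FailureScope     # Who
    point: InjectionPoint   # Where
    kind: FailureKind       # What
    delay_ms: int = 0       # only meaningful for LATENCY
    automated: bool = False  # When: ad-hoc (False) or automated (True)

    def applies_to(self, customer_id: str, bucket: float,
                   point: InjectionPoint) -> bool:
        return point is self.point and self.scope.matches(customer_id, bucket)


# Ad-hoc scenario: fail cache reads for one test account only.
scenario = FailureScenario(
    scope=FailureScope(customer_id="test-user-1"),
    point=InjectionPoint.CACHE,
    kind=FailureKind.ERROR,
)
```

Keeping the scope separate from the failure itself means the same failure can be run first against a single test account and later widened to a percentage of traffic without changing anything else.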
SLIDE 15
SLIDE 16
SLIDE 17

[Diagram: Injection Points. Labels: Zuul (Proxy), API, Critical Service, Secondary Service, Cache, C*, Circuit Breaker, Network Calls]

SLIDE 18
SLIDE 19
SLIDE 20
SLIDE 21

“Knowing how the system behaves in the face of failure is invaluable - our assumptions are often incomplete”

SLIDE 22
SLIDE 23

[Diagram: Injected Failure. Labels: Zuul (Proxy), API, Critical, Secondary, Cache, C*, Circuit Breaker, Network Calls, Failure Metadata, FIT, Failure Scope, Decorated Request]
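
The labels on this slide (Failure Metadata, Failure Scope, Decorated Request) suggest a flow in which the proxy tags in-scope requests with failure metadata and each injection point downstream checks that metadata before doing real work. The following is a minimal sketch of that idea under stated assumptions: the `Request` type, the `fail_at` metadata key, and all function names are hypothetical and are not FIT's real interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class InjectedFailureError(Exception):
    """Raised at an injection point when the request carries a matching failure."""


@dataclass
class Request:
    customer_id: str
    # Failure metadata travels with the request; empty means "no failure".
    failure_metadata: Dict[str, str] = field(default_factory=dict)


def decorate_at_proxy(request: Request, scoped_customer: str,
                      point_name: str) -> Request:
    """Edge step: decorate only requests that fall inside the failure scope."""
    if request.customer_id == scoped_customer:
        request.failure_metadata["fail_at"] = point_name  # assumed key name
    return request


def injection_point(name: str, request: Request,
                    call: Callable[[], str]) -> str:
    """Wrap a dependency call; fail instead of calling if the request says so."""
    if request.failure_metadata.get("fail_at") == name:
        raise InjectedFailureError(f"injected failure at {name}")
    return call()


def get_recommendations(request: Request) -> str:
    """A service call with a fallback: the defense being exercised."""
    try:
        return injection_point("cache", request, lambda: "personalized")
    except InjectedFailureError:
        return "fallback"


in_scope = decorate_at_proxy(Request("test-user-1"), "test-user-1", "cache")
out_of_scope = decorate_at_proxy(Request("someone-else"), "test-user-1", "cache")
print(get_recommendations(in_scope))      # takes the fallback path
print(get_recommendations(out_of_scope))  # unaffected by the injected failure
```

Because the failure decision rides along with the request, only the scoped request sees the failure; every other request passes through the same injection point untouched.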

SLIDE 24
SLIDE 25

Great, does it work?

SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29

Aggressive failure testing creates not just robust programs, but an antifragile programming culture.

SLIDE 30

Takeaways

  • 1. Failure Testing is a worthwhile investment
  • 2. Testing in Production is sustainable
  • 3. It can harden your systems against failure

Kolton Andrus (@deelyle)

SLIDE 31

Resources

  • Netflix Techblog - FIT
  • “On Designing and Deploying Internet-Scale Services” - James Hamilton

  • Drift into Failure - Sidney Dekker
  • Antifragile - Nassim Nicholas Taleb
SLIDE 32

Photo Credits

  • Nuclear Blast - Mark Waldrep
  • Forest Fire
  • Poison
  • Needle
  • Explosion
  • Robot
SLIDE 33

Demo Slides

SLIDE 34
SLIDE 35
SLIDE 36
SLIDE 37
SLIDE 38