Towards a resilience pattern language or how to get resilient software design right


Towards a resilience pattern language or how to get resilient software design right. Uwe Friedrichsen (codecentric AG), Berlin Expert Days, Berlin, 16. September 2016. @ufried, Uwe Friedrichsen | uwe.friedrichsen@codecentric.de


slide-1
SLIDE 1

Towards a resilience pattern language

or how to get resilient software design right

Uwe Friedrichsen (codecentric AG) – Berlin Expert Days – Berlin, 16. September 2016

slide-2
SLIDE 2

@ufried

Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com

slide-3
SLIDE 3

Previously on “Resilience” …

slide-4
SLIDE 4

Why resilience?

slide-5
SLIDE 5

It's all about production!

slide-6
SLIDE 6

Business Production Availability

slide-7
SLIDE 7

Availability ≔ MTTF / (MTTF + MTTR)

MTTF: Mean Time To Failure

MTTR: Mean Time To Recovery
slide-8
SLIDE 8

Traditional stability approach

Availability ≔ MTTF / (MTTF + MTTR)

Maximize MTTF

slide-9
SLIDE 9

(Almost) every system is a distributed system

Chas Emerick

slide-10
SLIDE 10

The Eight Fallacies of Distributed Computing

  • 1. The network is reliable
  • 2. Latency is zero
  • 3. Bandwidth is infinite
  • 4. The network is secure
  • 5. Topology doesn't change
  • 6. There is one administrator
  • 7. Transport cost is zero
  • 8. The network is homogeneous

Peter Deutsch

https://blogs.oracle.com/jag/resource/Fallacies.html

slide-11
SLIDE 11

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

Leslie Lamport

slide-12
SLIDE 12

Failures in today's complex, distributed and interconnected systems are not the exception.

  • They are the normal case
  • They are not predictable
  • They are not avoidable
slide-13
SLIDE 13

Do not try to avoid failures. Embrace them.

slide-14
SLIDE 14

Resilience approach

Availability ≔ MTTF / (MTTF + MTTR)

Minimize MTTR
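
To see why shortening recovery can pay off more than stretching the time between failures, the availability formula can be evaluated for a few numbers. The following is a minimal sketch; the class name and all figures (hours) are made up for illustration and are not taken from the talk.

```java
class AvailabilityExample {
    /** Availability = MTTF / (MTTF + MTTR); both values in the same time unit. */
    public static double availability(double mttf, double mttr) {
        return mttf / (mttf + mttr);
    }

    public static void main(String[] args) {
        // Hypothetical baseline: a failure every 1000 h, 10 h to recover
        double baseline = availability(1000.0, 10.0);
        // Traditional approach: double the time between failures
        double doubledMttf = availability(2000.0, 10.0);
        // Resilience approach: cut recovery time to a tenth
        double fasterRecovery = availability(1000.0, 1.0);

        System.out.println("baseline:        " + baseline);
        System.out.println("doubled MTTF:    " + doubledMttf);
        System.out.println("faster recovery: " + fasterRecovery);
    }
}
```

With these assumed numbers, the baseline is roughly 99.0 %, doubling MTTF yields roughly 99.5 %, and cutting MTTR to one hour yields roughly 99.9 %.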

slide-15
SLIDE 15

resilience (IT): the ability of a system to handle unexpected situations

  • without the user noticing it (best case)
  • with a graceful degradation of service (worst case)
slide-16
SLIDE 16

Do not fall for the “100% available” trap!

slide-17
SLIDE 17

Isolation Latency Control

Fail Fast Circuit Breaker Timeouts Fan out & quickest reply Bounded Queues Shed Load Bulkheads

Loose Coupling

Asynchronous Communication Event-Driven Idempotency Self-Containment Relaxed Temporal Constraints Location Transparency Stateless

Supervision

Monitor Complete Parameter Checking Error Handler Escalation

slide-18
SLIDE 18

… and there is more

  • Recovery & mitigation patterns
  • More supervision patterns
  • Architectural patterns
  • Anti-fragility patterns
  • Fault treatment & prevention patterns

A rich pattern family

slide-19
SLIDE 19

(Title music starts & opening credits shown)

slide-20
SLIDE 20

Let’s complete the picture first …

slide-21
SLIDE 21

Isolation Latency Control Loose Coupling Supervision

slide-22
SLIDE 22

Core

(Architectural)

Detection Treatment Prevention Recovery Mitigation

slide-23
SLIDE 23

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Isolation Loose Coupling Latency Control

Node level

Supervision

System level

slide-24
SLIDE 24

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Isolation Redundancy Communication paradigm Supporting patterns

slide-25
SLIDE 25

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Supporting patterns Communication paradigm Redundancy Isolation

slide-26
SLIDE 26

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Supporting patterns Communication paradigm Redundancy Isolation Bulkhead

slide-27
SLIDE 27

Bulkheads

  • Core isolation pattern (a.k.a. “failure units” or “units of mitigation”)
  • Shaping good bulkheads is extremely hard (pure design issue)
  • Diverse implementation choices available, e.g., microservice, actor, SCS (self-contained system), ...
  • Implementation choice impacts system and resilience design a lot
slide-28
SLIDE 28

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Supporting patterns Communication paradigm Redundancy Isolation

slide-29
SLIDE 29

Communication paradigm

  • Request-response <-> messaging <-> events
  • Not a pattern, but heavily influences resilience patterns to be used
  • Also heavily influences functional bulkhead design
  • Very fundamental decision which is often underestimated
slide-30
SLIDE 30

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Supporting patterns Communication paradigm Redundancy Isolation

slide-31
SLIDE 31

Redundancy

  • Core resilience concept
  • Applicable to all failure types
  • Basis for many recovery and mitigation patterns
  • Often different variants implemented in a system
slide-32
SLIDE 32

Failure types

  • Crash failure
  • Omission failure
  • Timing failure
  • Response failure
  • Byzantine failure
slide-33
SLIDE 33

Failure types

  • Crash failure
  • Omission failure
  • Timing failure
  • Response failure
  • Byzantine failure

Usage of redundancy

  • Patterns
  • Failover
  • Schemes
  • Active/Passive
  • Active/Active
  • N+M Redundancy
  • Implementation examples
  • Load balancer + health check (e.g., HAProxy)
  • Dynamic routing + health check (e.g., Consul, ZooKeeper)
  • Cluster manager with shared IP (e.g., Pacemaker & Corosync)

slide-34
SLIDE 34

Failure types

  • Crash failure
  • Omission failure
  • Timing failure
  • Response failure
  • Byzantine failure

Usage of redundancy

  • Patterns
  • Retry (to different replica)
  • Failover
  • Backup Request
  • Schemes
  • Identical replicas
  • Failover schemes (for failover)
  • Implementation examples
  • Client-based routing
  • Load balancer
  • Leaky bucket + dynamic routing
slide-35
SLIDE 35

Failure types

  • Crash failure
  • Omission failure
  • Timing failure
  • Response failure
  • Byzantine failure

Usage of redundancy

  • Patterns
  • Timeout + retry to different replica
  • Timeout + failover
  • Backup Request
  • Schemes
  • Identical replicas
  • Failover schemes (for failover)
  • Implementation examples
  • Client-based routing
  • Load balancer
  • Circuit breaker + dynamic routing
slide-36
SLIDE 36

Failure types

  • Crash failure
  • Omission failure
  • Timing failure
  • Response failure
  • Byzantine failure

Usage of redundancy

  • Patterns
  • Voting
  • Recovery blocks
  • Routine exercise
  • Schemes
  • Identical replicas
  • Different replicas (recovery blocks)
  • Implementation examples
  • Majority based quorum
  • Adaptive weighted sum
  • Synthetic computation
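
A majority-based quorum, as listed under the implementation examples above, could be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, replies are compared with `equals`, and null replies are not supported.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class MajorityVote {
    /** Returns the value reported by a strict majority of replicas, if there is one. */
    public static <T> Optional<T> vote(List<T> replies) {
        Map<T, Integer> counts = new HashMap<>();
        for (T reply : replies) {
            int count = counts.merge(reply, 1, Integer::sum);
            // Strict majority: more than half of all replies agree
            if (count > replies.size() / 2) {
                return Optional.of(reply);
            }
        }
        return Optional.empty(); // no majority, caller has to escalate
    }
}
```

A caller would collect one reply per replica and treat an empty result as a detected response failure that needs another treatment, for example a retry or an escalation.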
slide-37
SLIDE 37

Failure types

  • Crash failure
  • Omission failure
  • Timing failure
  • Response failure
  • Byzantine failure

Usage of redundancy

  • Patterns
  • Voting
  • Recovery blocks
  • Routine exercise
  • Schemes
  • Identical replicas
  • Different replicas (recovery blocks)
  • Implementation examples
  • n > 3t quorum
  • Adaptive weighted sum
  • Synthetic computation
slide-38
SLIDE 38

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Supporting patterns Communication paradigm Redundancy Isolation Stateless Idempotency Escalation

Structural Behavioral

Zero downtime deployment Location transparency Relaxed temporal constraints

slide-39
SLIDE 39

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Node level Supporting patterns System level

slide-40
SLIDE 40

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Node level Supporting patterns System level Timeout Circuit breaker Complete parameter checking Checksum

slide-41
SLIDE 41

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Node level Supporting patterns System level Monitor Watchdog Heartbeat Acknowledgement

slide-42
SLIDE 42

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Node level Supporting patterns System level Voting Synthetic transaction Leaky bucket Routine checks Health check Fail fast

slide-43
SLIDE 43

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

slide-44
SLIDE 44

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Retry Limit retries

slide-45
SLIDE 45

Retry

  • Very basic recovery pattern
  • Recover from omission or other transient errors
  • Limit retries to minimize extra load on an already loaded resource
  • Limit retries to avoid recurring errors
slide-46
SLIDE 46

Retry example

// doAction returns true if successful, false otherwise
boolean doAction(...) { ... }

// General pattern
boolean success = false;
int tries = 0;
while (!success && (tries < MAX_TRIES)) {
    success = doAction(...);
    tries++;
}

// Alternative one-retry-only variant
success = doAction(...) || doAction(...);

slide-47
SLIDE 47

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Retry Limit retries Rollback Checkpoint Safe point

slide-48
SLIDE 48

Rollback

  • Roll back state and/or execution path to a defined safe state
  • Recover from internal errors caused by external failures
  • Use checkpoints and safe points to provide safe rollback points
  • Limit retries to avoid recurring errors
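
The interplay of checkpoint and rollback could look like the following minimal in-memory sketch. The `CheckpointedLedger` class, its validation rule (empty entries are rejected) and its method names are assumptions made for illustration; a real system would typically checkpoint to durable storage.

```java
import java.util.ArrayList;
import java.util.List;

class CheckpointedLedger {
    private List<String> entries = new ArrayList<>();
    private List<String> checkpoint = new ArrayList<>();

    /** Records the current state as the safe point to return to. */
    public void checkpoint() {
        checkpoint = new ArrayList<>(entries);
    }

    /** Discards every change made since the last checkpoint. */
    public void rollback() {
        entries = new ArrayList<>(checkpoint);
    }

    public void add(String entry) {
        if (entry == null || entry.isEmpty()) {
            throw new IllegalArgumentException("invalid entry");
        }
        entries.add(entry);
    }

    /** Applies a batch: either all entries are added or the state is rolled back. */
    public boolean addAll(List<String> batch) {
        checkpoint();
        try {
            for (String entry : batch) {
                add(entry);
            }
            return true;
        } catch (RuntimeException e) {
            rollback(); // return to the defined safe state
            return false;
        }
    }

    public List<String> entries() {
        return new ArrayList<>(entries);
    }
}
```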
slide-49
SLIDE 49

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Retry Limit retries Rollback Checkpoint Safe point Roll-forward

slide-50
SLIDE 50

Roll-forward

  • Advance execution past the point of error
  • Often used as escalation if retry or rollback do not succeed
  • Not applicable if skipped activity is essential
  • Use checkpoints and safe points to provide safe roll-forward points
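
As a rough sketch of roll-forward, the loop below skips a failing record and continues with the next one instead of retrying or rolling back. It assumes that each record is non-essential and can be skipped safely; the class and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class RollForward {
    /**
     * Applies step to every input. A failing input is recorded in skipped
     * and processing advances past it.
     */
    public static <I, O> List<O> processAll(List<I> inputs, Function<I, O> step, List<I> skipped) {
        List<O> results = new ArrayList<>();
        for (I input : inputs) {
            try {
                results.add(step.apply(input));
            } catch (RuntimeException e) {
                skipped.add(input); // advance execution past the point of error
            }
        }
        return results;
    }
}
```

The skipped records remain available afterwards, for example for a routine maintenance job that corrects them later.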
slide-51
SLIDE 51

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Retry Limit retries Rollback Roll-forward Checkpoint Safe point Restart Reconnect Data Reset Startup consistency Reset

slide-52
SLIDE 52

Reset

  • Often used as radical escalation if all other measures failed
  • Restart service – do not forget to provide a consistent startup state
  • Reset data to a guaranteed consistent state if nothing else helps
  • Sometimes simply trying to reconnect helps (often forgotten)
slide-53
SLIDE 53

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Retry Limit retries Rollback Restart Roll-forward Reconnect Checkpoint Safe point Data Reset Startup consistency Reset Failover

slide-54
SLIDE 54

Failover

  • Used as escalation if other measures failed or would take too long
  • Requires redundancy – trades resources for availability
  • Many implementation variants available, incl. out-of-the-box solutions
  • Usually implemented as a monitor-dynamic router combination
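
The monitor and dynamic router combination mentioned above might be sketched like this. The health check is passed in as a `Predicate` that stands in for the monitor; the class name and the replica naming are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FailoverRouter {
    private final List<String> replicas;        // ordered by preference, primary first
    private final Predicate<String> isHealthy;  // stands in for the monitor

    public FailoverRouter(List<String> replicas, Predicate<String> isHealthy) {
        this.replicas = new ArrayList<>(replicas);
        this.isHealthy = isHealthy;
    }

    /** Returns the first healthy replica, failing over down the list. */
    public String route() {
        for (String replica : replicas) {
            if (isHealthy.test(replica)) {
                return replica;
            }
        }
        throw new IllegalStateException("no healthy replica available");
    }
}
```

In this sketch the primary is used as long as the monitor reports it healthy, which roughly corresponds to an active/passive scheme.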
slide-55
SLIDE 55

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Retry Limit retries Rollback Restart Roll-forward Reconnect Checkpoint Safe point Data Reset Startup consistency Failover Reset Read repair

slide-56
SLIDE 56

Read repair

  • Handle response failures due to relaxed temporal constraints
  • Requires redundancy – trades resources for availability
  • Decides correct state based on conflicting siblings
  • Often implemented in NoSQL databases (but not always accessible)
slide-57
SLIDE 57

Read repair example (Riak, Java) 1/2

public class FooResolver implements ConflictResolver<Foo> {
    @Override
    public Foo resolve(List<Foo> siblings) {
        // Insert your sibling resolution logic here
    }
}

public class Buddy {
    public String name;
    public Set<String> nicknames;

    public Buddy(String name, Set<String> nicknames) {
        this.name = name;
        this.nicknames = nicknames;
    }
}

slide-58
SLIDE 58

Read repair example (Riak, Java) 2/2

public class BuddyResolver implements ConflictResolver<Buddy> {
    @Override
    public Buddy resolve(List<Buddy> siblings) {
        if (siblings.size() == 0) {
            return null;
        } else if (siblings.size() == 1) {
            return siblings.get(0);
        } else {
            // Name is also used as key. Thus, all siblings have the same name
            String name = siblings.get(0).name;
            // Merge conflicting siblings by taking the union of all nicknames
            Set<String> mergedNicknames = new HashSet<String>();
            for (Buddy buddy : siblings) {
                mergedNicknames.addAll(buddy.nicknames);
            }
            return new Buddy(name, mergedNicknames);
        }
    }
}

slide-59
SLIDE 59

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Retry Limit retries Rollback Restart Roll-forward Reconnect Checkpoint Safe point Data Reset Startup consistency Failover Read repair Reset Error handler

slide-60
SLIDE 60

Error Handler

  • Separate business logic and error handling
  • Business logic just focuses on getting the task done
  • Error handler focuses on recovering from errors
  • Easier to maintain – can be extended to structural escalation
slide-61
SLIDE 61

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Retry Limit retries Rollback Restart Roll-forward Reconnect Checkpoint Safe point Data Reset Startup consistency Failover Read repair Error handler Reset

slide-62
SLIDE 62

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

slide-63
SLIDE 63

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Fallback Fail silently Alternative action Default value

slide-64
SLIDE 64

Fallback

  • Execute an alternative action if the original action fails
  • Basis for most mitigation patterns
  • Fail silently – silently ignore the error and continue processing
  • Default value – return a predefined default value if an error occurs
slide-65
SLIDE 65

Fail silently example (Hystrix, Java) 1/2

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class FailSilentlyCommand extends HystrixCommand<String> {
    private static final String COMMAND_GROUP = "default";
    private final boolean preCondition;

    public FailSilentlyCommand(boolean preCondition) {
        super(HystrixCommandGroupKey.Factory.asKey(COMMAND_GROUP));
        this.preCondition = preCondition;
    }

    @Override
    protected String run() throws Exception {
        if (!preCondition)
            throw new RuntimeException("Action failed");
        return "I am a result";
    }

    @Override
    protected String getFallback() {
        return null; // Turn into silent failure
    }
}

slide-66
SLIDE 66

Fail silently example (Hystrix, Java) 2/2

@Test
public void shouldSucceed() {
    FailSilentlyCommand command = new FailSilentlyCommand(true);
    String s = command.execute();
    assertEquals("I am a result", s);
}

@Test
public void shouldFailSilently() {
    FailSilentlyCommand command = new FailSilentlyCommand(false);
    String s = "Dummy";
    try {
        s = command.execute();
    } catch (Exception e) {
        fail("Did not fail silently");
    }
    assertNull(s);
}

slide-67
SLIDE 67

Default value example (Hystrix, Java) 1/2

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class DefaultValueCommand extends HystrixCommand<String> {
    private static final String COMMAND_GROUP = "default";
    private final boolean preCondition;

    public DefaultValueCommand(boolean preCondition) {
        super(HystrixCommandGroupKey.Factory.asKey(COMMAND_GROUP));
        this.preCondition = preCondition;
    }

    @Override
    protected String run() throws Exception {
        if (!preCondition)
            throw new RuntimeException("Action failed");
        return "I am a smart result";
    }

    @Override
    protected String getFallback() {
        return "I am a default value"; // Return default value if action fails
    }
}

slide-68
SLIDE 68

Default value example (Hystrix, Java) 2/2

@Test
public void shouldSucceed() {
    DefaultValueCommand command = new DefaultValueCommand(true);
    String s = command.execute();
    assertEquals("I am a smart result", s);
}

@Test
public void shouldProvideDefaultValue() {
    DefaultValueCommand command = new DefaultValueCommand(false);
    String s = null;
    try {
        s = command.execute();
    } catch (Exception e) {
        fail("Did not return default value");
    }
    assertEquals("I am a default value", s);
}

slide-69
SLIDE 69

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Fallback Fail silently Alternative action Default value Queue for resources Bounded queue Finish work in progress Fresh work before stale

slide-70
SLIDE 70

Queues for resources

  • Protect resource from temporary overload situations
  • Limit queue size to limit latency at longer-lasting overload
  • Finish work in progress – Create pushback on the callers
  • Fresh work before stale – Discard old entries
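
The "fresh work before stale" variant of a bounded queue could be sketched as below: when the queue is full, the oldest entry is discarded to make room for the new one. `FreshFirstQueue` is a hypothetical helper written for this illustration, not a library class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class FreshFirstQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;

    public FreshFirstQueue(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Adds an item. If the queue is full, the oldest entry is dropped and returned. */
    public synchronized T offer(T item) {
        T dropped = null;
        if (items.size() == capacity) {
            dropped = items.pollFirst(); // discard the stalest entry
        }
        items.addLast(item);
        return dropped;
    }

    /** Removes and returns the oldest remaining entry, or null if empty. */
    public synchronized T poll() {
        return items.pollFirst();
    }

    public synchronized int size() {
        return items.size();
    }
}
```

For the "finish work in progress" variant, a standard `java.util.concurrent.ArrayBlockingQueue` may be sufficient: its `offer` method returns `false` when the queue is full, which the caller can turn into pushback.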
slide-71
SLIDE 71

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Fallback Fail silently Alternative action Default value Queue for resources Bounded queue Finish work in progress Fresh work before stale Shed load

slide-72
SLIDE 72

Shed Load

  • Use if overload will lead to unacceptable throughput of resource
  • Shed requests in order to keep throughput of resource acceptable
  • Shed load at periphery – Minimize impact on resource itself
  • Usually combined with monitor to watch load of resource
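
One possible way to shed load at the periphery is a concurrency limit in front of the resource: requests beyond the limit are rejected immediately and never reach the resource. The sketch below uses `java.util.concurrent.Semaphore`; the `LoadShedder` class, the limit and the use of `Optional` to signal a shed request are assumptions for illustration, and the wrapped call is expected to return a non-null value.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class LoadShedder {
    private final Semaphore permits;

    public LoadShedder(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    /**
     * Runs the request if capacity is left. Otherwise returns empty at once,
     * without queueing and without touching the protected resource.
     */
    public <T> Optional<T> call(Supplier<T> request) {
        if (!permits.tryAcquire()) {
            return Optional.empty(); // shed this request
        }
        try {
            return Optional.of(request.get());
        } finally {
            permits.release();
        }
    }
}
```

A monitor-driven variant would adjust the limit based on the observed load of the resource instead of using a fixed number.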
slide-73
SLIDE 73

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Fallback Fail silently Alternative action Default value Shed load Queue for resources Bounded queue Finish work in progress Fresh work before stale Share load

Statically Dynamically
slide-74
SLIDE 74

Share Load

  • Use if overload will lead to unacceptable throughput of resource
  • Share load between (added) resources to keep throughput good
  • Minimize amount of synchronization needed between resources
  • Usually combined with monitor to watch load of resource(s)
slide-75
SLIDE 75

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Fallback Fail silently Alternative action Default value Shed load Share load Queue for resources Bounded queue Finish work in progress Fresh work before stale

Statically Dynamically

Deferrable work

slide-76
SLIDE 76

Deferrable work

  • Maximize resources for online request processing under high load
  • Pause or slow down routine and batch jobs
  • Provide a means to pause routine and batch jobs from outside
  • Alternatively use a scheduler with dynamic resource allocation
slide-77
SLIDE 77

Deferrable work example 1/2

// Do or wait variant
<init batch>
while(<more to process>) {
    int load = getLoad();
    if (load > THRESHOLD) {
        waitFixedDuration();
    } else {
        <process next batch of work>
    }
}

void waitFixedDuration() {
    Thread.sleep(DELAY); // try-catch left out for better readability
}

slide-78
SLIDE 78

Deferrable work example 2/2

// Adaptive load variant
<init batch>
while(<more to process>) {
    waitLoadBased();
    <process next batch of work>
}

void waitLoadBased() {
    int load = getLoad();
    long delay = calcDelay(load);
    Thread.sleep(delay); // try-catch left out for better readability
}

long calcDelay(int load) {
    // Simple example implementation
    if (load < THRESHOLD) {
        return 0L;
    }
    return (load - THRESHOLD) * DELAY_FACTOR;
}

slide-79
SLIDE 79

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Fallback Fail silently Alternative action Default value Shed load Share load Queue for resources Bounded queue Finish work in progress Fresh work before stale Marked data

Statically Dynamically

Deferrable work

slide-80
SLIDE 80

Marked data

  • Avoid repeated and/or spreading errors due to erroneous data
  • Use if time or information to correct data immediately is missing
  • Mark data as being erroneous – check flag before processing data
  • Use routine maintenance job to correct data
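
The flag check described above might look like the following sketch. The `Order` record type and the validation rule (a negative quantity counts as erroneous) are invented for illustration.

```java
import java.util.List;

class MarkedData {
    public static final class Order {
        public final String id;
        public final int quantity;
        public boolean erroneous; // the mark

        public Order(String id, int quantity) {
            this.id = id;
            this.quantity = quantity;
        }
    }

    /** Sums the quantities of all valid orders; invalid ones are marked and skipped. */
    public static int totalQuantity(List<Order> orders) {
        int total = 0;
        for (Order order : orders) {
            if (order.erroneous) {
                continue; // check flag before processing data
            }
            if (order.quantity < 0) {
                order.erroneous = true; // mark now, correct later in a maintenance job
                continue;
            }
            total += order.quantity;
        }
        return total;
    }
}
```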
slide-81
SLIDE 81

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Fallback Fail silently Alternative action Default value Shed load Share load Marked data Queue for resources Bounded queue Finish work in progress Fresh work before stale

Statically Dynamically

Deferrable work

slide-82
SLIDE 82

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

slide-83
SLIDE 83

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Let sleeping dogs lie Small releases Hot deployments

slide-84
SLIDE 84

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

slide-85
SLIDE 85

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Routine maintenance Anti-entropy

slide-86
SLIDE 86

Routine maintenance

  • Reduce system entropy – keep preventable errors from occurring
  • Especially important if errors were only mitigated, not corrected
  • Check system periodically and fix detected faults and errors
  • Balance benefits, costs and additional system load
slide-87
SLIDE 87

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Routine maintenance Spread the news Anti-entropy

slide-88
SLIDE 88

Spread the news

  • Pro-actively spread information about changes in system state
  • Use a gossip or epidemic protocol for robustness and efficiency
  • Can also be used for data reconciliation
  • Balance benefits, costs and additional network load
slide-89
SLIDE 89

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Routine maintenance Backup request Spread the news Anti-entropy

slide-90
SLIDE 90

Backup request

  • Send request to multiple workers (optionally a bit offset)
  • Use quickest reply and discard all other responses
  • Prevents latent responses (or at least reduces probability)
  • Requires redundancy – trades resources for availability
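
A simple form of the backup request can be sketched with the standard `ExecutorService.invokeAny` method, which returns the result of one task that completed successfully and cancels the tasks that have not completed. The `BackupRequest` class is a hypothetical helper; it assumes a non-empty worker list and does not implement the optional offset between requests.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BackupRequest {
    /** Sends the same request to all workers and returns the first successful reply. */
    public static <T> T firstReply(List<Callable<T>> workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers.size());
        try {
            // Unfinished tasks are cancelled once one task has succeeded
            return pool.invokeAny(workers);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Each `Callable` would wrap the call to one replica. The extra requests are the resource cost that the pattern trades for lower latency.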
slide-91
SLIDE 91

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Routine maintenance Backup request Anti-fragility Diversity Jitter Spread the news Anti-entropy

slide-92
SLIDE 92

Anti-fragility

  • Avoid fragility caused by homogenization and standardization
  • Protect against disastrous failures by using diverse solutions
  • Protect against cumulating effects by introducing jitter
  • Balance risks, benefits and added costs and efforts carefully
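
Jitter can be as simple as adding a random offset to an otherwise fixed delay, so that many clients do not retry or poll at exactly the same moment. The following is a minimal sketch; the class name, the parameter names and the uniform distribution are assumptions.

```java
import java.util.Random;

class Jitter {
    /** Returns a delay drawn uniformly from [baseMillis, baseMillis + maxJitterMillis]. */
    public static long jittered(long baseMillis, long maxJitterMillis, Random random) {
        if (baseMillis < 0 || maxJitterMillis < 0) {
            throw new IllegalArgumentException("delays must not be negative");
        }
        long jitter = (long) (random.nextDouble() * (maxJitterMillis + 1));
        // Guard against rounding at the upper end of the range
        return baseMillis + Math.min(jitter, maxJitterMillis);
    }
}
```

A caller could, for example, sleep for `Jitter.jittered(1000, 250, new Random())` milliseconds between two retries.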
slide-93
SLIDE 93

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Routine maintenance Backup request Anti-fragility Diversity Jitter Error injection Spread the news Anti-entropy

slide-94
SLIDE 94

Error injection

  • Make resilient software design sustainable
  • Inject errors at runtime and observe how the system reacts
  • Can also be used to detect yet unknown failure modes
  • Make sure to inject errors of all types
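
At the smallest scale, error injection can be a wrapper that makes a call fail artificially with a configurable probability. The sketch below only injects exceptions; latency or wrong responses would need additional variants. `FaultInjector` and its parameters are hypothetical and unrelated to the Netflix tools listed on the next slide.

```java
import java.util.Random;
import java.util.function.Supplier;

class FaultInjector {
    private final double failureRate; // 0.0 = never fail, 1.0 = always fail
    private final Random random;

    public FaultInjector(double failureRate, Random random) {
        if (failureRate < 0.0 || failureRate > 1.0) {
            throw new IllegalArgumentException("failureRate must be in [0, 1]");
        }
        this.failureRate = failureRate;
        this.random = random;
    }

    /** Wraps a call so that it throws an injected failure with the configured probability. */
    public <T> Supplier<T> wrap(Supplier<T> call) {
        return () -> {
            if (random.nextDouble() < failureRate) {
                throw new IllegalStateException("injected failure");
            }
            return call.get();
        };
    }
}
```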
slide-95
SLIDE 95
  • Chaos Monkey
  • Chaos Gorilla
  • Chaos Kong
  • Latency Monkey
  • Compliance Monkey
  • Security Monkey
  • Janitor Monkey
  • Doctor Monkey

https://github.com/Netflix/SimianArmy

slide-96
SLIDE 96

Prevention Detection Core

(Architectural)

Recovery Mitigation Treatment

Routine maintenance Backup request Anti-fragility Diversity Jitter Error injection Spread the news Anti-entropy

slide-97
SLIDE 97

Towards a pattern language …
slide-98
SLIDE 98

Decisions to make

  • General decisions
  • Bulkhead type
  • Communication paradigm
  • Decisions per failure scenario (repeat)
  • Error detection on node & system level
  • Recovery/mitigation mechanism
  • Supporting treatment mechanism
  • Supporting prevention mechanism
  • Complementing decisions
  • Complementing redundancy mechanism(s)
  • Complementing architectural patterns
slide-99
SLIDE 99

Core

(Architectural)

Detection Treatment Prevention Recovery Mitigation

Isolation Redundancy Communication paradigm Supporting patterns Node level System level

1

Decide core system properties

2

Choose patterns per failure scenario (Have the different failure types in mind)

3

Decide complementing patterns

Ongoing

Create and refine system design and functional decomposition. Functionally decouple bulkheads (a good functional decomposition at the business level is the prerequisite for effective resilience)

slide-100
SLIDE 100

Core

(Architectural)

Detection Treatment Prevention Recovery Mitigation

Isolation Redundancy Communication paradigm Supporting patterns Node level System level

Example: Erlang (Akka)

Monitor Messaging Actor Escalation Heartbeat Restart

(Let it crash)

Hot deployments

slide-101
SLIDE 101

Core

(Architectural)

Detection Treatment Prevention Recovery Mitigation

Isolation Redundancy Communication paradigm Supporting patterns Node level System level

Example: Netflix

Monitor Request/ response (Micro)Service Retry Zero downtime deployment

(Canary releases)

Fallback Share load Bounded queue Timeout Circuit breaker Several variants Error injection

slide-102
SLIDE 102

Core

(Architectural)

Detection Treatment Prevention Recovery Mitigation

Isolation Redundancy Communication paradigm Supporting patterns Node level System level

What is your pattern language?

slide-103
SLIDE 103

Wrap-up

  • Today’s systems are distributed
  • Failures are not avoidable
  • Failures are not predictable
  • Resilient software design needed
  • Rich pattern language
  • Start with core system properties
  • Choose patterns based on failure scenarios
  • Complement with careful functional design
slide-104
SLIDE 104

Further reading

1. Michael T. Nygard, Release It!, Pragmatic Bookshelf, 2007
2. Robert S. Hanmer, Patterns for Fault Tolerant Software, Wiley, 2007
3. Andrew Tanenbaum, Maarten van Steen, Distributed Systems – Principles and Paradigms, Prentice Hall, 2nd Edition, 2006
4. Hystrix Wiki, https://github.com/Netflix/Hystrix/wiki
5. Uwe Friedrichsen, Patterns of resilience, http://de.slideshare.net/ufried/patterns-of-resilience

slide-105
SLIDE 105

Do not avoid failures. Embrace them!

slide-106
SLIDE 106

@ufried

Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com

slide-107
SLIDE 107