Fault tolerance made easy: A head-start to resilient software design


Fault tolerance made easy A head-start to resilient software design Uwe Friedrichsen (codecentric AG) QCon London 5. March 2014 @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried |


slide-1
SLIDE 1

Fault tolerance made easy

A head-start to resilient software design

Uwe Friedrichsen (codecentric AG) – QCon London – 5. March 2014

slide-2
SLIDE 2

@ufried

Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com

slide-3
SLIDE 3

It's all about production!

slide-4
SLIDE 4

Production Availability Resilience Fault Tolerance

slide-5
SLIDE 5

Your web server doesn't look good …

slide-6
SLIDE 6

Pattern #1

Timeouts

slide-7
SLIDE 7

Timeouts (1)

// Basics
myObject.wait();          // Do not use this by default
myObject.wait(TIMEOUT);   // Better use this

// Some more basics
myThread.join();          // Do not use this by default
myThread.join(TIMEOUT);   // Better use this

slide-8
SLIDE 8

Timeouts (2)

// Using the Java concurrent library
Callable<MyActionResult> myAction = <My Blocking Action>
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<MyActionResult> future = executor.submit(myAction);
MyActionResult result = null;
try {
    result = future.get();                    // Do not use this by default
    result = future.get(TIMEOUT, TIMEUNIT);   // Better use this
} catch (TimeoutException e) {                // Only thrown if timeouts are used
    ...
} catch (...) {
    ...
}
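The placeholders in the snippet above can be filled in to give a runnable version. What follows is a minimal sketch, not the speaker's code: the class name `TimeoutExample`, the helper `callWithTimeout`, and the decision to return an `Optional` are assumptions made for illustration. It also cancels the task when the timeout fires and shuts the executor down, two steps the slide leaves out.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of the original slides.
final class TimeoutExample {

    // Runs the action on a separate thread and waits at most `timeout` for it.
    // Returns Optional.empty() if the action timed out, threw, or returned null.
    static <T> Optional<T> callWithTimeout(Callable<T> action, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(action);
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            future.cancel(true);                 // interrupt the blocked worker thread
            return Optional.empty();
        } catch (ExecutionException e) {
            return Optional.empty();             // the action itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt flag
            future.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();              // do not leak the worker thread
        }
    }

    public static void main(String[] args) {
        Optional<String> fast = callWithTimeout(() -> "done", 1, TimeUnit.SECONDS);
        Optional<String> slow = callWithTimeout(() -> {
            Thread.sleep(5_000);
            return "too late";
        }, 100, TimeUnit.MILLISECONDS);
        System.out.println("fast: " + fast + ", slow: " + slow);
    }
}
```

Creating one executor per call keeps the sketch short; a real service would more likely share a bounded pool.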

slide-9
SLIDE 9

Timeouts (3)

// Using Guava SimpleTimeLimiter
Callable<MyActionResult> myAction = <My Blocking Action>
SimpleTimeLimiter limiter = new SimpleTimeLimiter();
MyActionResult result = null;
try {
    result = limiter.callWithTimeout(myAction, TIMEOUT, TIMEUNIT, false);
} catch (UncheckedTimeoutException e) {
    ...
} catch (...) {
    ...
}

slide-10
SLIDE 10

  • Determining Timeout Duration
  • Configurable Timeouts
  • Self-Adapting Timeouts
  • Timeouts in JavaEE Containers
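The slide only names these topics. As one possible reading of "configurable" and "self-adapting" timeouts, the sketch below starts from a configured value and then tracks a smoothed average of observed latencies, using a multiple of it (clamped to fixed bounds) as the next timeout. The class `AdaptiveTimeout`, the smoothing formula, the parameter values, and the system property name are all assumptions for illustration, not something taken from the talk.

```java
// Hypothetical sketch; names, formula, and values are illustrative assumptions.
final class AdaptiveTimeout {
    private final long minMillis;
    private final long maxMillis;
    private final double factor;   // safety margin on top of the average latency
    private final double alpha;    // weight of the newest sample, between 0 and 1
    private double averageMillis;

    AdaptiveTimeout(long initialMillis, long minMillis, long maxMillis,
                    double factor, double alpha) {
        this.averageMillis = initialMillis;
        this.minMillis = minMillis;
        this.maxMillis = maxMillis;
        this.factor = factor;
        this.alpha = alpha;
    }

    // Record the latency of a completed call (exponentially weighted average).
    synchronized void record(long latencyMillis) {
        averageMillis = alpha * latencyMillis + (1.0 - alpha) * averageMillis;
    }

    // Timeout for the next call: average latency times the factor, clamped.
    synchronized long currentTimeoutMillis() {
        long candidate = Math.round(averageMillis * factor);
        return Math.max(minMillis, Math.min(maxMillis, candidate));
    }

    public static void main(String[] args) {
        // "example.timeout.initialMillis" is a made-up system property name.
        long initial = Long.getLong("example.timeout.initialMillis", 100L);
        AdaptiveTimeout timeout = new AdaptiveTimeout(initial, 50L, 2_000L, 3.0, 0.5);
        timeout.record(300L);
        System.out.println("next timeout: " + timeout.currentTimeoutMillis() + " ms");
    }
}
```

The clamp matters: without an upper bound, a slow resource would keep raising its own timeout and defeat the purpose of having one.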

slide-11
SLIDE 11

Pattern #2

Circuit Breaker

slide-12
SLIDE 12

Circuit Breaker (1)

[Diagram: Client → Circuit Breaker → Resource. Labels: Request, Resource available, Resource unavailable. Lifecycle states: Closed, Open, Half-Open.]

slide-13
SLIDE 13

Circuit Breaker (2)

Closed
  • on call / pass through
  • call succeeds / reset count
  • call fails / count failure
  • threshold reached / trip breaker

Open
  • on call / fail
  • on timeout / attempt reset

Half-Open
  • on call / pass through
  • call succeeds / reset
  • call fails / trip breaker

Transitions: Closed → Open (trip breaker), Open → Half-Open (attempt reset), Half-Open → Closed (reset), Half-Open → Open (trip breaker)

Source: M. Nygard, "Release It!"

slide-14
SLIDE 14

Circuit Breaker (3)

public class CircuitBreaker implements MyResource {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    final MyResource resource;
    State state;
    int counter;
    long tripTime;

    public CircuitBreaker(MyResource r) {
        resource = r;
        state = State.CLOSED;
        counter = 0;
        tripTime = 0L;
    }
    ...

slide-15
SLIDE 15

Circuit Breaker (4)

    ...
    public Result access(...) {   // resource access
        Result r = null;
        if (state == State.OPEN) {
            checkTimeout();       // may move to HALF_OPEN for the next call
            throw new ResourceUnavailableException();
        }
        try {
            r = resource.access(...);   // should use timeout
        } catch (Exception e) {
            fail();
            throw e;
        }
        success();
        return r;
    }
    ...

slide-16
SLIDE 16

Circuit Breaker (5)

    ...
    private void success() {
        reset();
    }

    private void fail() {
        counter++;
        if (counter > THRESHOLD) {
            tripBreaker();
        }
    }

    private void reset() {
        state = State.CLOSED;
        counter = 0;
    }
    ...

slide-17
SLIDE 17

Circuit Breaker (6)

    ...
    private void tripBreaker() {
        state = State.OPEN;
        tripTime = System.currentTimeMillis();
    }

    private void checkTimeout() {
        if ((System.currentTimeMillis() - tripTime) > TIMEOUT) {
            state = State.HALF_OPEN;
            counter = THRESHOLD;   // a single failure trips the breaker again
        }
    }

    public State getState() {
        return state;
    }
}

slide-18
SLIDE 18

  • Thread-Safe Circuit Breaker
  • Failure Types
  • Tuning Circuit Breakers
  • Available Implementations
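The circuit breaker on the preceding slides is not thread-safe. One way to approach the first topic is to guard every state change with a lock while running the protected call outside it. The sketch below is an assumption-laden illustration, not the speaker's implementation: `SafeCircuitBreaker`, its constructor parameters, the injectable clock, and the use of `IllegalStateException` are all invented here. Unlike the slide version, the call that notices the elapsed timeout is itself let through as the trial call.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical thread-safe variant for illustration; not the speaker's code.
final class SafeCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int threshold;          // failures tolerated before tripping
    private final long retryAfterMillis;  // time to stay OPEN before a trial call
    private final LongSupplier clock;     // injectable time source, eases testing
    private State state = State.CLOSED;
    private int failures = 0;
    private long tripTime = 0L;

    SafeCircuitBreaker(int threshold, long retryAfterMillis, LongSupplier clock) {
        this.threshold = threshold;
        this.retryAfterMillis = retryAfterMillis;
        this.clock = clock;
    }

    // The action runs outside the lock, so a slow resource does not block
    // other callers from checking the breaker state.
    <T> T call(Supplier<T> action) {
        if (!allowCall()) {
            throw new IllegalStateException("circuit breaker is open");
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private synchronized boolean allowCall() {
        if (state == State.OPEN && clock.getAsLong() - tripTime > retryAfterMillis) {
            state = State.HALF_OPEN;      // timeout elapsed: allow a trial call
        }
        return state != State.OPEN;
    }

    private synchronized void onSuccess() {
        state = State.CLOSED;
        failures = 0;
    }

    private synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures > threshold) {
            state = State.OPEN;
            tripTime = clock.getAsLong();
        }
    }

    synchronized State getState() {
        return state;
    }

    public static void main(String[] args) {
        SafeCircuitBreaker breaker =
                new SafeCircuitBreaker(2, 1_000L, System::currentTimeMillis);
        String answer = breaker.call(() -> "resource answered");
        System.out.println(answer + ", state=" + breaker.getState());
    }
}
```

This sketch lets every concurrent caller through while the breaker is half-open; limiting that to a single trial call would need an extra flag.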

slide-19
SLIDE 19

Pattern #3

Fail Fast

slide-20
SLIDE 20

Fail Fast (1)

[Diagram: the Client sends a Request to an Expensive Action, which uses Resources.]

slide-21
SLIDE 21

Fail Fast (2)

[Diagram: the Client sends a Request to a Fail Fast Guard. The guard checks the availability of the Resources and forwards the request to the Expensive Action, which uses the Resources.]

slide-22
SLIDE 22

Fail Fast (3)

public class FailFastGuard {
    private FailFastGuard() {}

    public static void checkResources(Set<CircuitBreaker> resources) {
        for (CircuitBreaker r : resources) {
            if (r.getState() != CircuitBreaker.State.CLOSED) {
                throw new ResourceUnavailableException(r);
            }
        }
    }
}

slide-23
SLIDE 23

Fail Fast (4)

public class MyService {
    Set<CircuitBreaker> requiredResources;

    // Initialize resources
    ...

    public Result myExpensiveAction(...) {
        FailFastGuard.checkResources(requiredResources);
        // Execute core action
        ...
    }
}
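The guard and service above depend on types the slides do not define. A self-contained sketch of the same idea is given here, with availability checks modelled as plain `BooleanSupplier`s. `FailFastExample`, `guarded`, the resource names, and the use of `IllegalStateException` are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical stand-alone illustration of the fail-fast guard.
final class FailFastExample {

    // Throws before any expensive work starts if a required resource is down.
    static void checkResources(Map<String, BooleanSupplier> required) {
        for (Map.Entry<String, BooleanSupplier> entry : required.entrySet()) {
            if (!entry.getValue().getAsBoolean()) {
                throw new IllegalStateException("Resource unavailable: " + entry.getKey());
            }
        }
    }

    // Runs the expensive action only after all availability checks passed.
    static <T> T guarded(Map<String, BooleanSupplier> required, Supplier<T> expensiveAction) {
        checkResources(required);
        return expensiveAction.get();
    }

    public static void main(String[] args) {
        Map<String, BooleanSupplier> required = new LinkedHashMap<>();
        required.put("database", () -> true);
        required.put("paymentGateway", () -> false);   // pretend this one is down
        try {
            String result = guarded(required, () -> "expensive result");
            System.out.println(result);
        } catch (IllegalStateException e) {
            System.out.println("Failed fast: " + e.getMessage());
        }
    }
}
```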

slide-24
SLIDE 24

The dreaded SiteTooSuccessfulException …

slide-25
SLIDE 25

Pattern #4

Shed Load

slide-26
SLIDE 26

Shed Load (1)

[Diagram: Clients send too many Requests to the Server.]

slide-27
SLIDE 27

Shed Load (2)

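[Diagram: Clients send too many Requests. A Gate Keeper in front of the Server requests load data from a Monitor, which monitors the Server's load. The Gate Keeper forwards Requests to the Server and drops the Shedded Requests.]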

slide-28
SLIDE 28

Shed Load (3)

public class ShedLoadFilter implements Filter {
    Random random;

    public void init(FilterConfig fc) throws ServletException {
        random = new Random(System.currentTimeMillis());
    }

    public void destroy() {
        random = null;
    }
    ...

slide-29
SLIDE 29

Shed Load (4)

    ...
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws java.io.IOException, ServletException {
        int load = getLoad();
        if (shouldShed(load)) {
            HttpServletResponse res = (HttpServletResponse) response;
            res.setIntHeader("Retry-After", RECOMMENDATION);
            res.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
            return;
        }
        chain.doFilter(request, response);
    }
    ...

slide-30
SLIDE 30

Shed Load (5)

    ...
    private boolean shouldShed(int load) {   // Example implementation
        if (load < THRESHOLD) {
            return false;
        }
        double shedBoundary = ((double) (load - THRESHOLD)) /
                              ((double) (MAX_LOAD - THRESHOLD));
        return random.nextDouble() < shedBoundary;
    }
}
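To see what the formula in `shouldShed` does, it helps to compute the shedding probability directly. The sketch below uses the same expression but returns the probability instead of flipping a coin. The values 60 and 100 for the threshold and maximum load are assumptions chosen for the example, and the class name `ShedProbability` is invented.

```java
// Hypothetical worked example of the shedding formula; values are assumptions.
final class ShedProbability {

    // Probability that a request is shed at the given load.
    // Below the threshold nothing is shed; at or above maxLoad everything is.
    static double shedProbability(int load, int threshold, int maxLoad) {
        if (load < threshold) {
            return 0.0;
        }
        double boundary = ((double) (load - threshold)) / ((double) (maxLoad - threshold));
        return Math.min(1.0, boundary);   // nextDouble() < boundary is always true beyond 1.0
    }

    public static void main(String[] args) {
        int threshold = 60;   // assumed value
        int maxLoad = 100;    // assumed value
        for (int load = 50; load <= 110; load += 10) {
            System.out.println("load " + load + " -> shed probability "
                    + shedProbability(load, threshold, maxLoad));
        }
    }
}
```

With these values the probability rises linearly from 0 at load 60 to 1 at load 100, so the server sheds gradually rather than switching abruptly between accepting and rejecting everything.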

slide-31
SLIDE 31

Shed Load (6)

slide-32
SLIDE 32

Shed Load (7)

slide-33
SLIDE 33

  • Shedding Strategy
  • Retrieving Load
  • Tuning Load Shedders
  • Alternative Strategies
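The slides call `getLoad()` without showing it. One possible source of load data on the JVM is the operating system's load average, sketched below. Treating the one-minute load average per processor as a percentage is an assumption of this example, as are the names `LoadProbe` and `toPercent`; the talk may well have had a different metric in mind.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical getLoad() based on the system load average.
final class LoadProbe {

    // Returns the load as a percentage of the available processors,
    // or -1 if the platform does not report a load average.
    static int getLoad() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double loadAverage = os.getSystemLoadAverage();   // negative if unavailable
        if (loadAverage < 0) {
            return -1;
        }
        return toPercent(loadAverage, os.getAvailableProcessors());
    }

    // A load average equal to the processor count counts as 100 percent.
    static int toPercent(double loadAverage, int processors) {
        return (int) Math.round(100.0 * loadAverage / processors);
    }

    public static void main(String[] args) {
        System.out.println("current load: " + getLoad());
    }
}
```

A caller has to decide what to do with -1; treating "unknown" as "do not shed" is the conservative choice.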

slide-34
SLIDE 34

Pattern #5

Deferrable Work

slide-35
SLIDE 35

Deferrable Work (1)

[Diagram: Client Requests go to Request Processing, which uses Resources. Routine Work uses the same Resources.]

slide-36
SLIDE 36

Deferrable Work (2)

[Chart: two panels, "Without Deferrable Work" and "With Deferrable Work". Each shows the load from Request Processing and Routine Work against the 100% capacity line, with the OVERLOAD region marked.]

slide-37
SLIDE 37

// Do or wait variant
ProcessingState state = initBatch();
while (!state.done()) {
    int load = getLoad();
    if (load > THRESHOLD) {
        waitFixedDuration();
    } else {
        state = processNext(state);
    }
}

void waitFixedDuration() {
    Thread.sleep(DELAY);   // try-catch left out for better readability
}

Deferrable Work (3)

slide-38
SLIDE 38

// Adaptive load variant
ProcessingState state = initBatch();
while (!state.done()) {
    waitLoadBased();
    state = processNext(state);
}

void waitLoadBased() {
    int load = getLoad();
    long delay = calcDelay(load);
    Thread.sleep(delay);   // try-catch left out for better readability
}

long calcDelay(int load) {   // Simple example implementation
    if (load < THRESHOLD) {
        return 0L;
    }
    return (load - THRESHOLD) * DELAY_FACTOR;
}

Deferrable Work (4)

slide-39
SLIDE 39

  • Delay Strategy
  • Retrieving Load
  • Tuning Deferrable Work
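The adaptive variant can be made runnable by passing in the load source and the pause function, which also makes the delay strategy easy to tune and test. This is a sketch under stated assumptions: `DeferrableBatch`, the values of `THRESHOLD` and `DELAY_FACTOR`, and the idea of injecting the load and pause functions are illustrative choices, not taken from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;

// Hypothetical runnable version of the adaptive load variant.
final class DeferrableBatch {
    static final int THRESHOLD = 70;        // assumed load above which work is delayed
    static final long DELAY_FACTOR = 10L;   // assumed milliseconds per load point

    // Same rule as calcDelay on the slide: no delay below the threshold,
    // then a delay that grows linearly with the load.
    static long calcDelay(int load) {
        if (load < THRESHOLD) {
            return 0L;
        }
        return (load - THRESHOLD) * DELAY_FACTOR;
    }

    // Processes all items, pausing before each one according to the current load.
    static <T> List<T> run(List<T> items, IntSupplier load, LongConsumer pause) {
        List<T> processed = new ArrayList<>();
        for (T item : items) {
            long delay = calcDelay(load.getAsInt());
            if (delay > 0) {
                pause.accept(delay);
            }
            processed.add(item);   // stand-in for the real routine work
        }
        return processed;
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt flag
        }
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("a", "b", "c");
        List<String> done = run(batch, () -> 75, DeferrableBatch::sleepQuietly);
        System.out.println("processed " + done.size() + " items");
    }
}
```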

slide-40
SLIDE 40

I can hardly hear you …

slide-41
SLIDE 41

Pattern #6

Leaky Bucket

slide-42
SLIDE 42

Leaky Bucket (1)

[Diagram: each time a problem occurred, the Leaky Bucket is filled. The bucket leaks periodically. If it has overflowed, error handling is triggered.]

slide-43
SLIDE 43

public class LeakyBucket {   // Very simple implementation
    private final int capacity;
    private int level;
    private boolean overflow;

    public LeakyBucket(int capacity) {
        this.capacity = capacity;
        drain();
    }

    public void drain() {
        this.level = 0;
        this.overflow = false;
    }
    ...

Leaky Bucket (2)

slide-44
SLIDE 44

    ...
    public void fill() {
        level++;
        if (level > capacity) {
            overflow = true;
        }
    }

    public void leak() {
        level--;
        if (level < 0) {
            level = 0;
        }
    }

    public boolean overflowed() {
        return overflow;
    }
}

Leaky Bucket (3)

slide-45
SLIDE 45

  • Thread-Safe Leaky Bucket
  • Leaking strategies
  • Tuning Leaky Bucket
  • Available Implementations
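For the first two topics, a simple approach is to synchronize every method of the bucket and let a scheduler call `leak()` at a fixed rate. The sketch below assumes exactly that; `SafeLeakyBucket`, `startLeaking`, and the chosen period are invented for illustration and are only one of several possible leaking strategies.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical thread-safe bucket with a fixed-rate leak.
final class SafeLeakyBucket {
    private final int capacity;
    private int level = 0;
    private boolean overflow = false;

    SafeLeakyBucket(int capacity) {
        this.capacity = capacity;
    }

    // Called whenever a problem occurred.
    synchronized void fill() {
        level++;
        if (level > capacity) {
            overflow = true;
        }
    }

    // Removes one unit, never going below zero.
    synchronized void leak() {
        if (level > 0) {
            level--;
        }
    }

    synchronized boolean overflowed() {
        return overflow;
    }

    synchronized void drain() {
        level = 0;
        overflow = false;
    }

    synchronized int level() {
        return level;
    }

    // Leaks one unit every `period`; the caller shuts the returned scheduler down.
    ScheduledExecutorService startLeaking(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::leak, period, period, unit);
        return scheduler;
    }

    public static void main(String[] args) {
        SafeLeakyBucket bucket = new SafeLeakyBucket(3);
        ScheduledExecutorService scheduler = bucket.startLeaking(50, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 5; i++) {
            bucket.fill();   // five problems in quick succession
        }
        System.out.println("overflowed: " + bucket.overflowed());
        scheduler.shutdownNow();
    }
}
```

As in the slide version, the overflow flag stays set until `drain()` is called, so error handling decides when the bucket starts counting afresh.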

slide-46
SLIDE 46

Pattern #7

Limited Retries

slide-47
SLIDE 47

// doAction returns true if successful, false otherwise

// General pattern
boolean success = false;
int tries = 0;
while (!success && (tries < MAX_TRIES)) {
    success = doAction(...);
    tries++;
}

// Alternative one-retry-only variant
success = doAction(...) || doAction(...);

Limited Retries (1)

slide-48
SLIDE 48

  • Idempotent Actions
  • Closures / Lambdas
  • Tuning Retries
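With lambdas, the general retry loop can be wrapped once and reused for any action. The sketch below is one way to do that, assuming the action reports success as a boolean like `doAction` on the previous slide. `LimitedRetries`, `retry`, and the fixed pause between attempts are illustrative assumptions. It is only safe for idempotent actions, since a failed attempt may already have had side effects.

```java
import java.util.function.BooleanSupplier;

// Hypothetical reusable retry helper; only use it with idempotent actions.
final class LimitedRetries {

    // Calls the action until it reports success or maxTries attempts were made.
    // Pauses pauseMillis between attempts, but not after the last one.
    static boolean retry(BooleanSupplier action, int maxTries, long pauseMillis) {
        for (int tries = 0; tries < maxTries; tries++) {
            if (action.getAsBoolean()) {
                return true;
            }
            boolean lastTry = (tries == maxTries - 1);
            if (!lastTry && pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;   // stop retrying when interrupted
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The action succeeds on its third invocation.
        boolean success = retry(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("success=" + success + " after " + calls[0] + " calls");
    }
}
```

The number of tries and the pause are the tuning knobs: too many retries against a struggling resource add load exactly when it can least afford it.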

slide-49
SLIDE 49

More Patterns

  • Complete Parameter Checking
  • Marked Data
  • Routine Audits
slide-50
SLIDE 50

Further reading

  • 1. Michael T. Nygard, Release It!, Pragmatic Bookshelf, 2007
  • 2. Robert S. Hanmer, Patterns for Fault Tolerant Software, Wiley, 2007
  • 3. James Hamilton, On Designing and Deploying Internet-Scale Services, 21st LISA Conference 2007
  • 4. Andrew Tanenbaum, Maarten van Steen, Distributed Systems – Principles and Paradigms, Prentice Hall, 2nd Edition, 2006

slide-51
SLIDE 51

It's all about production!

slide-52
SLIDE 52

@ufried

Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com

slide-53
SLIDE 53