Architecting for Failure in a Containerized World Tom Faulhaber - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Architecting for Failure in a Containerized World

Tom Faulhaber Infolace

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5

How can container tech help us build robust systems?

slide-6
SLIDE 6

Key takeaway: an architectural toolkit for building robust systems with containers

slide-7
SLIDE 7

The Rules

Decomposition
Orchestration and Synchronization
Managing Stateful Apps

slide-8
SLIDE 8

Simplicity

slide-9
SLIDE 9

Simple means: “Do one thing!”

slide-10
SLIDE 10

The opposite of simple is complex

slide-11
SLIDE 11

Complexity exists within components

slide-12
SLIDE 12

Complexity exists between components

slide-13
SLIDE 13

Example: a counter

[Diagram: three Counter Service boxes. The first emits 1 2 3 4 5. The second emits 1 2 3 4 5 … and is marked with an x. The third emits 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5.]

slide-14
SLIDE 14

Example: a counter

[Diagram: two Counter Service boxes, each emitting 1 2 3 4 5, behind a Load Balancer whose output is 1 2 3 4 5 1 2 3 4 5.]

slide-15
SLIDE 15

State + composition = complexity
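
The counter example can be made concrete with a small sketch. It is not code from the talk: `CounterService` and `RoundRobinBalancer` are hypothetical names, and the balancer is assumed to alternate strictly between backends. Each instance holds its count in memory, so composing two of them, or restarting one, breaks the guarantee a single instance gives.

```python
import itertools


class CounterService:
    """A stateful service: each instance keeps its own count in memory."""

    def __init__(self):
        self._count = 0

    def next_value(self):
        self._count += 1
        return self._count


class RoundRobinBalancer:
    """Hypothetical load balancer that alternates between backends."""

    def __init__(self, backends):
        self._backends = itertools.cycle(backends)

    def next_value(self):
        return next(self._backends).next_value()


# One instance: callers see a strictly increasing sequence.
single = CounterService()
single_values = [single.next_value() for _ in range(5)]

# Two instances behind a balancer: each has private state, so values repeat.
balanced = RoundRobinBalancer([CounterService(), CounterService()])
balanced_values = [balanced.next_value() for _ in range(6)]

# A replacement instance started after a crash has forgotten the old count.
restarted = CounterService()
restarted_values = [restarted.next_value() for _ in range(3)]
```

Neither class is complicated on its own. The duplicates appear only once the stateful pieces are composed.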

slide-16
SLIDE 16

Part 1: Decomposition

slide-17
SLIDE 17

Rule: Decompose vertically

slide-18
SLIDE 18

[Diagram: App Server, Service #1, Service #2, Service #3]

slide-19
SLIDE 19

App Server

slide-20
SLIDE 20

Rule: Separation of concerns

slide-21
SLIDE 21

Example: Logging

[Diagram: an App containing Core Code and a Logging Driver, with Config, connected to a Logging Server]

slide-22
SLIDE 22

Example: Logging

[Diagram: an App containing Core Code that writes to StdOut; a separate Logger containing the Logging Driver, with Config, connected to a Logging Server]
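
One way to read the second logging diagram is that the app only writes lines to standard output and knows nothing about the logging server. The sketch below assumes that reading. `build_logger` and `handle_request` are hypothetical names, and the `stream` parameter exists only so the output can be captured in a test.

```python
import logging
import sys


def build_logger(name="app", stream=None):
    """Return a logger that writes plain lines to stdout and nothing else."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of any root-logger handlers
    if not logger.handlers:
        handler = logging.StreamHandler(stream or sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(levelname)s %(name)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def handle_request(logger, user_id):
    """Stand-in for core application code (hypothetical)."""
    logger.info("handled request for user %s", user_id)
    return "ok"
```

Shipping those lines to a logging server, along with the driver and its configuration, is then the job of a separate logger process or container. It can be changed or restarted without touching the app.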

slide-23
SLIDE 23

Aspect-oriented programming

slide-24
SLIDE 24

Rule: Constrain state

slide-25
SLIDE 25

Relational DB Session Store

slide-26
SLIDE 26

Rule: Battle-tested tools

slide-27
SLIDE 27

Redis MySQL

slide-28
SLIDE 28

Rule: High code churn → Easy restart

slide-29
SLIDE 29

Rule: No start-up order!

slide-30
SLIDE 30

time

a b c d

slide-31
SLIDE 31

time

x a b c d

slide-32
SLIDE 32

time

x a b c d x x x

slide-33
SLIDE 33

time

x a b c d x x x

slide-34
SLIDE 34

time

a b c d

slide-35
SLIDE 35

time

a b c d

slide-36
SLIDE 36

time

a b c d
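
A common way to avoid depending on start-up order is for each component to retry its connections until the dependency is available. The sketch below is one such approach, not something shown in the slides. `connect_with_retry` is a hypothetical helper, and `connect` is assumed to be any zero-argument callable that raises `ConnectionError` while the dependency is down.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially.

    connect is a zero-argument callable that raises ConnectionError while
    the dependency is unavailable (a hypothetical client factory).
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                # Out of attempts: re-raise and let the caller (or the
                # orchestration framework) decide what happens next.
                raise
            sleep(base_delay * (2 ** attempt))
```

With this in place, components a, b, c, and d can start in any order. A component that comes up before its dependency waits and retries instead of failing permanently.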

slide-37
SLIDE 37

Rule: Consider higher-order failure

slide-38
SLIDE 38

The Rules

Decomposition

  • Decompose vertically
  • Separation of concerns
  • Constrain state
  • Battle-tested tools
  • High code churn, easy restart
  • No start-up order!
  • Consider higher-order failure

Orchestration and Synchronization

Managing Stateful Apps

slide-39
SLIDE 39

Part 2: Orchestration and Synchronization

slide-40
SLIDE 40

Rule: Use Framework Restarts

slide-41
SLIDE 41
  • Mesos: Marathon always restarts
  • Kubernetes: RestartPolicy=Always
  • Docker: Swarm always restarts
slide-42
SLIDE 42

Rule: Create your own framework

slide-43
SLIDE 43

[Diagram: a Mesos Master with a Framework Driver, and three Mesos Agents, each with a Framework Executor]

slide-44
SLIDE 44

Rule: Use Synchronized State

slide-45
SLIDE 45

Synchronized State

Tools:

  • ZooKeeper
  • etcd
  • Consul

Patterns:

  • leader election
  • shared counters
  • peer awareness
  • work partitioning
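
As an illustration of the peer awareness and work partitioning patterns, the sketch below assumes every worker can read the same list of live peers from the synchronized store. Here that list is a plain Python list rather than a real ZooKeeper, etcd, or Consul client. Rendezvous hashing is one possible partitioning scheme chosen for the example, not necessarily the one the talk had in mind, and `owner_of` and `my_share` are hypothetical helpers.

```python
import hashlib


def owner_of(item_key, peers):
    """Pick the peer responsible for item_key using rendezvous hashing."""

    def score(peer):
        # Every worker computes the same score for a (peer, item) pair.
        return hashlib.sha256(f"{peer}:{item_key}".encode()).hexdigest()

    return max(sorted(peers), key=score)


def my_share(me, peers, item_keys):
    """Return the items this worker should process, given the peer list."""
    return [key for key in item_keys if owner_of(key, peers) == me]
```

Because each worker derives its share from the peer list alone, the only synchronized state is the list itself. When a peer disappears, only that peer's items move to other workers.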
slide-46
SLIDE 46

Rule: Minimize Synchronized State

slide-47
SLIDE 47

Even battle-tested state management is a headache.

(Source: http://blog.cloudera.com/blog/2014/03/zookeeper-resilience-at-pinterest/)

slide-48
SLIDE 48

The Rules

Decomposition

  • Decompose vertically
  • Separation of concerns
  • Constrain state
  • Battle-tested tools
  • High code churn, easy restart
  • No start-up order!
  • Consider higher-order failure

Orchestration and Synchronization

  • Use framework restarts
  • Create your own framework
  • Use synchronized state
  • Minimize synchronized state

Managing Stateful Apps

slide-49
SLIDE 49

Part 3: Managing Stateful Apps

slide-50
SLIDE 50

Rule (repeat!): Always use battle-tested tools!

(State is the weak point)

slide-51
SLIDE 51

Rule: Choose the DB architecture

slide-52
SLIDE 52

Option 1: External DB

Execution cluster Database cluster

slide-53
SLIDE 53

Option 1: External DB

Pros

  • Somebody else’s problem!
  • Can use a DB designed for clustering directly
  • Can use DB as a service

Cons

  • Not really somebody else’s problem!
  • Higher latency/no reference locality
  • Can’t leverage orchestration, etc.

slide-54
SLIDE 54

Option 2: Run on Raw HW

[Diagram: three nodes, each running HDFS, Mesos, Marathon, and an App]

slide-55
SLIDE 55

Option 2: Run on Raw HW

Pros

  • Use existing recipes
  • Have local data
  • Manage a single cluster

Cons

  • Orchestration doesn’t help with failure
  • Increased management complexity

slide-56
SLIDE 56

Option 3: In-memory DB

[Diagram: three nodes, each running Mesos, Marathon, an App, and MemSQL]

slide-57
SLIDE 57

Option 3: In-memory DB

Pros

  • No need for volume tracking
  • Fast
  • Have local data
  • Manage a single cluster

Cons

  • Bets all machines won’t go down
  • Bets on orchestration framework

slide-58
SLIDE 58

Option 4: Use Orchestration

[Diagram: three nodes, each running Mesos, Marathon, an App, and Cassandra]

slide-59
SLIDE 59

Option 4: Use Orchestration

Pros

  • Orchestration manages volumes
  • One model for all programs
  • Have local data
  • Single cluster

Cons

  • Currently the least mature
  • Not well supported by vendors
slide-60
SLIDE 60

Option 5: Roll Your Own

[Diagram: a Mesos Master with a Framework, and three nodes, each running Mesos, Marathon, an App, and ImageMgr]

slide-61
SLIDE 61

Option 5: Roll Your Own

Pros

  • Very precise control
  • You decide whether to use containers
  • Have local data
  • Can be system aware

Cons

  • You’re on your own!
  • Wedded to a single orchestration platform
  • Not battle tested
slide-62
SLIDE 62

Rule: Have replication

slide-63
SLIDE 63

The Rules

Decomposition

  • Decompose vertically
  • Separation of concerns
  • Constrain state
  • Battle-tested tools
  • High code churn, easy restart
  • No start-up order!
  • Consider higher-order failure

Orchestration and Synchronization

  • Use framework restarts
  • Create your own framework
  • Use synchronized state
  • Minimize synchronized state

Managing Stateful Apps

  • Battle-tested tools
  • Choose the DB architecture
  • Have replication

slide-64
SLIDE 64

Fin

slide-65
SLIDE 65

References

  • Rich Hickey, “Are We There Yet?” (https://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey)
  • Rich Hickey, “Simple Made Easy” (https://www.infoq.com/presentations/Simple-Made-Easy-QCon-London-2012)
  • David Greenberg, Building Applications on Mesos, O’Reilly, 2016
  • Joe Johnston, et al., Docker in Production: Lessons from the Trenches, Bleeding Edge Press, 2015
