Architecting for Failure in a Containerized World Tom Faulhaber Infolace

How can container tech help us build robust systems?

Key takeaway: an architectural toolkit for building robust systems with containers

The Rules
• Decomposition
• Orchestration and Synchronization
• Managing Stateful Apps

Simplicity

Simple means: “Do one thing!”

The opposite of simple is complex

Complexity exists within components

Complexity exists between components

Example: a counter (figure: Counter Service instances handing out the sequence 0, 1, 2, 3, 4, 5, …)

Example: a counter (figure: Counter Service instances behind a Load Balancer, with the sequence 0, 1, 2, 3, 4, 5)

State + composition = complexity
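One possible reading of the counter example, as a minimal sketch: a single stateful counter is trivial, but composing two copies behind a load balancer changes what callers see. The class names and the round-robin policy here are assumptions for illustration, not taken from the slides.

```python
from itertools import cycle


class CounterService:
    """A stateful service: each instance keeps its own private count."""

    def __init__(self) -> None:
        self._next = 0

    def increment(self) -> int:
        value = self._next
        self._next += 1
        return value


class RoundRobinBalancer:
    """Hypothetical load balancer that alternates between backends."""

    def __init__(self, backends) -> None:
        self._backends = cycle(backends)

    def increment(self) -> int:
        return next(self._backends).increment()


# One instance: callers see a clean sequence.
single = CounterService()
single_values = [single.increment() for _ in range(6)]

# Two instances behind a balancer: each keeps its own state,
# so callers see every value twice.
balanced = RoundRobinBalancer([CounterService(), CounterService()])
balanced_values = [balanced.increment() for _ in range(6)]

print(single_values)    # [0, 1, 2, 3, 4, 5]
print(balanced_values)  # [0, 0, 1, 1, 2, 2]
```

Neither component is complicated on its own; the surprising behavior only appears once state and composition are combined.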

Part 1: Decomposition

Rule: Decompose vertically

(diagram: App Server alongside Service #1, Service #2 and Service #3)

App Server

Rule: Separation of concerns

Example: Logging (diagram: the App contains Core Code, a Logging Driver and Config, and sends to a Logging Server)

Example: Logging (diagram: the App contains only Core Code and writes to StdOut; a separate Logger holds the Logging Driver and Config and sends to the Logging Server)
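A minimal sketch of the application side of this split, assuming the app's only logging responsibility is to write lines to stdout. The function names are hypothetical; shipping the lines to a logging server is left to a separate component that is not shown.

```python
import logging
import sys


def build_app_logger(name: str = "app", stream=None) -> logging.Logger:
    """Return a logger that only writes lines to a stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output on our single handler
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, user: str) -> str:
    """Core code: logs an event but knows nothing about where logs end up."""
    logger.info("handled request for %s", user)
    return f"hello {user}"


if __name__ == "__main__":
    handle_request(build_app_logger(), "alice")
```

With this shape, the logging driver and its configuration can change or crash without touching the core code.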

Aspect-oriented programming

Rule: Constrain state

Session Store, Relational DB

Rule: Battle-tested tools

Redis, MySQL

Rule: High code churn → Easy restart

Rule: No start-up order!

(figures: four timelines showing components a, b, c and d starting over time; some timelines carry x marks)
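One common way to avoid depending on start-up order is for each component to retry its dependencies instead of assuming they are already running. The sketch below is an illustration under that assumption; `connect_with_retry` and `flaky_connect` are hypothetical names, not something from the talk.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially between tries."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # give up; a restart by the orchestrator can try again
            sleep(base_delay * (2 ** attempt))


# Hypothetical dependency that only becomes reachable on the third attempt.
calls = {"n": 0}


def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("dependency not up yet")
    return "connected"


delays = []  # record the requested delays instead of really sleeping
result = connect_with_retry(flaky_connect, sleep=delays.append)
print(result, calls["n"], len(delays))
```

Because the component tolerates a missing dependency, a, b, c and d can be started (or restarted) in any order.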

Rule: Consider higher-order failure

The Rules
• Decomposition: Decompose vertically; Separation of concerns; Constrain state; Battle-tested tools; High code churn, easy restart; No start-up order!; Consider higher-order failure
• Orchestration and Synchronization
• Managing Stateful Apps

Part 2: Orchestration and Synchronization

Rule: Use Framework Restarts

• Mesos: Marathon always restarts
• Kubernetes: RestartPolicy=Always
• Docker: Swarm always restarts
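To make the restart behavior concrete, here is a toy supervisor loop. It is only a sketch of the idea, not how Marathon, Kubernetes or Swarm are implemented; the `supervise` function and the `max_restarts` cap are assumptions added so the example terminates.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it after each non-zero exit.

    Returns (number_of_runs, last_exit_code). Stops on a clean exit or
    once the restart budget is used up.
    """
    runs = 0
    while True:
        runs += 1
        exit_code = subprocess.run(cmd).returncode
        if exit_code == 0 or runs > max_restarts:
            return runs, exit_code


# Hypothetical task that always crashes with exit code 1.
crashing_task = [sys.executable, "-c", "import sys; sys.exit(1)"]
runs, last_code = supervise(crashing_task, max_restarts=2)
print(runs, last_code)  # 3 1  (one initial run plus two restarts)
```

The point of the rule is that this loop already exists in the framework, so the application can simply exit on an unrecoverable error instead of carrying its own recovery logic.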

Rule: Create your own framework

(diagram: a Mesos Master with the Framework Driver, and three Mesos Agents, each running a Framework Executor)

Rule: Use Synchronized State

Synchronized State
Tools:
• zookeeper
• etcd
• consul
Patterns:
• leader election
• shared counters
• peer awareness
• work partitioning
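As an illustration of the leader election pattern, the sketch below uses an in-memory stand-in for a coordination service. `FakeKeyValueStore` and its methods are hypothetical and are not the client API of zookeeper, etcd or consul; the only property assumed is an atomic "create if absent" operation.

```python
import threading


class FakeKeyValueStore:
    """In-memory stand-in for a coordination service (not a real client API)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data = {}

    def create_if_absent(self, key: str, value: str) -> bool:
        """Atomically set key only if nobody holds it; True means we won."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key: str):
        with self._lock:
            return self._data.get(key)


def try_become_leader(store: FakeKeyValueStore, node_id: str) -> bool:
    """Every node races to create the same key; exactly one succeeds."""
    return store.create_if_absent("/leader", node_id)


store = FakeKeyValueStore()
results = {}


def campaign(node_id: str) -> None:
    results[node_id] = try_become_leader(store, node_id)


threads = [threading.Thread(target=campaign, args=(n,))
           for n in ("node-a", "node-b", "node-c")]
for t in threads:
    t.start()
for t in threads:
    t.join()

leader = store.get("/leader")
print(leader, results)
```

A real deployment also needs the key to disappear when the leader dies (for example through an ephemeral node or a lease with a time-to-live), which this sketch leaves out.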

Rule: Minimize Synchronized State

Even battle-tested state management is a headache. (Source: http://blog.cloudera.com/blog/2014/03/zookeeper-resilience-at-pinterest/)

The Rules
• Decomposition: Decompose vertically; Separation of concerns; Constrain state; Battle-tested tools; High code churn, easy restart; No start-up order!; Consider higher-order failure
• Orchestration and Synchronization: Use framework restarts; Create your own framework; Use synchronized state; Minimize synchronized state
• Managing Stateful Apps

Part 3: Managing Stateful Apps

Rule (repeat!): Always use battle-tested tools! (State is the weak point)

Rule: Choose the DB architecture

Option 1: External DB (diagram: an Execution cluster and a separate Database cluster)

Option 1: External DB
Pros:
• Somebody else’s problem!
• Can use a DB designed for clustering directly
• Can use DB as a service
Cons:
• Not really somebody else’s problem!
• Higher latency/no reference locality
• Can’t leverage orchestration, etc.

Option 2: Run on Raw HW (diagram: three nodes, each running App, Marathon, Mesos and HDFS)

Option 2: Run on Raw HW
Pros:
• Use existing recipes
• Have local data
• Manage a single cluster
Cons:
• Orchestration doesn’t help with failure
• Increased management complexity

Option 3: In-memory DB (diagram: three nodes, each running App, MemSQL, Marathon and Mesos)

Option 3: In-memory DB
Pros:
• No need for volume tracking
• Fast
• Have local data
• Manage a single cluster
Cons:
• Bets all machines won’t go down
• Bets on orchestration framework

Option 4: Use Orchestration (diagram: three Mesos nodes, each running App, Marathon and Cassandra)

Option 4: Use Orchestration
Pros:
• Orchestration manages volumes
• One model for all programs
• Have local data
• Single cluster
Cons:
• Currently the least mature
• Not well supported by vendors

Option 5: Roll Your Own (diagram: a Mesos Master with a Framework, and three Mesos nodes, each running App, Marathon and ImageMgr)

Option 5: Roll Your Own
Pros:
• Very precise control
• You decide whether to use containers
• Have local data
• Can be system aware
Cons:
• You’re on your own!
• Wedded to a single orchestration platform
• Not battle tested

Rule: Have replication

The Rules
• Decomposition: Decompose vertically; Separation of concerns; Constrain state; Battle-tested tools; High code churn, easy restart; No start-up order!; Consider higher-order failure
• Orchestration and Synchronization: Use framework restarts; Create your own framework; Use synchronized state; Minimize synchronized state
• Managing Stateful Apps: Battle-tested tools; Choose the DB architecture; Have replication

References
• Rich Hickey, “Are We There Yet?” (https://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey)
• Rich Hickey, “Simple Made Easy” (https://www.infoq.com/presentations/Simple-Made-Easy-QCon-London-2012)
• David Greenberg, Building Applications on Mesos, O’Reilly, 2016
• Joe Johnston, et al., Docker in Production: Lessons from the Trenches, Bleeding Edge Press, 2015