BORING IS AWESOME! ARCHITECTING FOR OPERATIONS



slide-1
SLIDE 1

BORING IS AWESOME!

ARCHITECTING FOR OPERATIONS

slide-2
SLIDE 2

INTRODUCTION

TWITCH

▸ Feel free to ask questions in the chat at any time!
▸ Giving feedback through the chat helps me read the audience.
▸ Please respond to each other in chat as well.
▸ Big thanks to the volunteer student moderators:
  ▸ Wolgo
  ▸ CptWesley
▸ We are going to have a break at around 16:30.

slide-3
SLIDE 3

INTRODUCTION

AUDIENCE

▸ Are you working in the industry?
▸ Are you operating infrastructure?
▸ What do you expect from this lecture?

slide-4
SLIDE 4

MINDSET

slide-5
SLIDE 5

COLLABORATION

slide-6
SLIDE 6

SERVING THE CUSTOMER

slide-7
SLIDE 7

IMPACT

slide-8
SLIDE 8

INTRODUCTION

STEFFAN NORBERHUIS

▸ Freelance Cloud & DevOps Consultant
▸ Twitter: @SNorberhuis
▸ steffan@norberhuis.nl
▸ Feel free to contact me!

slide-9
SLIDE 9

INTRODUCTION

OVERVIEW

▸ Disruption
▸ Infrastructure as Code
▸ Failure is inevitable
▸ Building for Failure

slide-10
SLIDE 10

DISRUPTION

ARCHITECTING FOR OPERATIONS

slide-11
SLIDE 11

YOU BUILD IT, YOU RUN IT

slide-12
SLIDE 12

DISRUPTION

DEVOPS OWNERSHIP

slide-13
SLIDE 13

CLOUD

slide-14
SLIDE 14

DISRUPTION

CLOUD

▸ Operate technology without owning technology
▸ Agility with no planning
▸ Focus on your business

Source: What is Cloud Computing by AWS https://www.twitch.tv/videos/477810350?

slide-15
SLIDE 15

INFRASTRUCTURE AS CODE

ARCHITECTING FOR OPERATIONS

slide-16
SLIDE 16

INFRASTRUCTURE AS CODE

BENEFITS

▸ Automation
▸ Version control
▸ Code Review
▸ Testing
▸ Documentation
▸ Reuse

Source: 5 Lessons Learned From Writing Over 300,000 Lines of Infrastructure Code by Yevgeniy Brikman https://www.youtube.com/watch?v=RTEgE2lcyk4

slide-17
SLIDE 17

INFRASTRUCTURE AS CODE

INFRASTRUCTURE AS CODE

Approaches shown: JSON / YAML, Declarative, DOM, Componentized
Tools shown: AWS CloudFormation, Azure Resource Manager, HashiCorp Terraform, Troposphere, AWS Cloud Development Kit, Pulumi

Source: AWS CDK by AWS re:Invent https://www.youtube.com/watch?v=Lh-kVC2r2AU

slide-18
SLIDE 18

INFRASTRUCTURE AS CODE

SLOW FEEDBACK LOOP

SERVICES

Jack

slide-19
SLIDE 19

INFRASTRUCTURE AS CODE

SLOW FEEDBACK LOOP

SERVICES A

Jack

slide-20
SLIDE 20

INFRASTRUCTURE AS CODE

SLOW FEEDBACK LOOP

SERVICES B A

Jack

slide-21
SLIDE 21

INFRASTRUCTURE AS CODE

SLOW FEEDBACK LOOP

SERVICES B A C

Jack

slide-22
SLIDE 22

INFRASTRUCTURE AS CODE

SLOW FEEDBACK LOOP

SERVICES B A C

Jack

slide-23
SLIDE 23

INFRASTRUCTURE AS CODE

SLOW FEEDBACK LOOP

A

Jack

slide-24
SLIDE 24

INFRASTRUCTURE AS CODE

COLLISIONS DURING DEVELOPMENT

SERVICES

Jack

slide-25
SLIDE 25

INFRASTRUCTURE AS CODE

COLLISIONS DURING DEVELOPMENT

SERVICES

Susan Jack

slide-26
SLIDE 26

INFRASTRUCTURE AS CODE

COLLISIONS DURING DEVELOPMENT

A B

Susan Jack

slide-27
SLIDE 27

INFRASTRUCTURE AS CODE

COLLISIONS DURING DEVELOPMENT

Jack Susan

A

slide-28
SLIDE 28

INFRASTRUCTURE AS CODE

COLLISIONS DURING DEVELOPMENT

JACK-A SUSAN-A

Jack Susan

slide-29
SLIDE 29

INFRASTRUCTURE AS CODE

PROBLEMS

SERVICES B A C

Jack

slide-30
SLIDE 30

INFRASTRUCTURE AS CODE

PROBLEMS: BUGS

SERVICES B A C

Jack

slide-31
SLIDE 31

INFRASTRUCTURE AS CODE

PROBLEMS: DRIFT

SERVICES B A C

Jack

slide-32
SLIDE 32

INFRASTRUCTURE AS CODE

PROBLEMS

A

Jack

slide-33
SLIDE 33

INFRASTRUCTURE AS CODE

TAGGING/BRANCH DEADLOCKS

SERVICES B A C

1.60 Jack

slide-34
SLIDE 34

INFRASTRUCTURE AS CODE

TAGGING DEADLOCKS

SERVICES B A C

1.60 Jack

slide-35
SLIDE 35

INFRASTRUCTURE AS CODE

TAGGING DEADLOCKS

SERVICES B A C

1.60 1.60 1.60 Jack

slide-36
SLIDE 36

INFRASTRUCTURE AS CODE

TAGGING DEADLOCKS

SERVICES B A C

1.60 1.61? 1.60 Jack

slide-37
SLIDE 37

INFRASTRUCTURE AS CODE

TAGGING DEADLOCKS

A

1.60 Jack

slide-38
SLIDE 38

INFRASTRUCTURE AS CODE

CYCLIC DEPENDENCY

FOO

Jack

BAR

slide-39
SLIDE 39

INFRASTRUCTURE AS CODE

CYCLIC DEPENDENCY

FOO

Jack

BAR

slide-40
SLIDE 40

INFRASTRUCTURE AS CODE

CHALLENGES

▸ Feedback speed
▸ Parallel development
▸ Complexity
▸ Different lifecycles
▸ Different teams

Source: Happy Terraforming! by Armin Coralic: https://www.youtube.com/watch?v=G06j6HLWyYo

slide-41
SLIDE 41

INFRASTRUCTURE AS CODE

GUIDELINES

▸ Put less frequently changed, higher-risk resources in the lower layers
▸ Small blocks
▸ No cyclic dependencies
▸ Decouple independent services
▸ Deploy only the pipelines manually
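
The rule against circular dependencies between stacks can be checked mechanically. A minimal sketch, assuming each stack declares the stacks it depends on in a plain object; the function name `findCycle` and the example stack names are illustrative, not taken from the talk:

```javascript
// Hypothetical dependency map: each key is a stack, each value lists the
// stacks it depends on. Returns the first loop found, or null if there is none.
function findCycle(dependencies) {
  const VISITING = 1;
  const DONE = 2;
  const state = new Map();
  const path = [];

  function visit(stack) {
    if (state.get(stack) === DONE) return null;
    if (state.get(stack) === VISITING) {
      // Back edge: the loop is the part of the current path starting at `stack`.
      return path.slice(path.indexOf(stack)).concat(stack);
    }
    state.set(stack, VISITING);
    path.push(stack);
    for (const dep of dependencies[stack] || []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(stack, DONE);
    return null;
  }

  for (const stack of Object.keys(dependencies)) {
    const cycle = visit(stack);
    if (cycle) return cycle;
  }
  return null;
}

// Layered stacks only point downwards; FOO and BAR point at each other.
const layered = { application: ["network"], network: ["bootstrap"], bootstrap: [] };
const tangled = { foo: ["bar"], bar: ["foo"] };
```

Running such a check in the pipeline before deployment would reject `tangled` and accept `layered`.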

slide-42
SLIDE 42

NEW APPROACH

ARCHITECTURE

CONFIG CONFIG APPLICATION DATA ROLES NETWORK ALERTING VPC SECRETS

INFRA DOMAIN GLOBAL

BOOTSTRAP

SECURITY ACCOUNTS

slide-43
SLIDE 43

NEW APPROACH

ARCHITECTURE

CONFIG CONFIG APPLICATION DATA ROLES NETWORK ALERTING VPC SECRETS

INFRA DOMAIN GLOBAL

BOOTSTRAP

ACCOUNTS SECURITY

slide-48
SLIDE 48

INFRASTRUCTURE AS CODE

NAMING STANDARDISATION

▸ Environment
▸ Application
▸ Component
▸ Examples:
  ▸ /prod/billing/foo
  ▸ /dev-susan/billing/foo
  ▸ staging-billing-foo
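
A naming standard is easiest to keep when a single helper builds every name. A minimal sketch, assuming the three parts from the slide; the function name `resourceName` and the allowed-character rule are assumptions for illustration:

```javascript
// Builds a resource name from environment, application and component.
// The validation rule (lowercase letters, digits, dashes) is an assumption.
function resourceName(environment, application, component, separator = "/") {
  const parts = [environment, application, component];
  for (const part of parts) {
    if (!/^[a-z0-9-]+$/.test(part)) {
      throw new Error(`invalid name part: "${part}"`);
    }
  }
  // Path style gets a leading slash; dash style is joined as-is.
  return separator === "/" ? "/" + parts.join("/") : parts.join(separator);
}
```

With this helper, `resourceName("dev-susan", "billing", "foo")` gives the per-developer path from the examples, and passing `"-"` as separator gives the dash style.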

slide-49
SLIDE 49

INFRASTRUCTURE AS CODE

CODE TRACEABILITY

▸ Tag:
  ▸ github.com/org/teamA/billing-infrastructure/stackA
▸ Naming:
  ▸ billing-application-foo -> github.com/org/billing/infrastructure/src/application/foo
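
The naming example suggests that the source location can be derived from the resource name alone. A minimal sketch, assuming the convention `<application>-<layer>-<component>` and the repository layout shown in the example; `sourcePathFor` and the default `org` value are hypothetical:

```javascript
// Assumed convention: <application>-<layer>-<component> maps to
// github.com/<org>/<application>/infrastructure/src/<layer>/<component>.
function sourcePathFor(resource, org = "org") {
  const [application, layer, ...rest] = resource.toLowerCase().split("-");
  if (!application || !layer || rest.length === 0) {
    throw new Error(`cannot derive source path from "${resource}"`);
  }
  // Components may themselves contain dashes, so the remainder is re-joined.
  const component = rest.join("-");
  return `github.com/${org}/${application}/infrastructure/src/${layer}/${component}`;
}
```

An engineer looking at a misbehaving resource can then jump from its name to the code that created it without searching.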

slide-50
SLIDE 50

INFRASTRUCTURE AS CODE

IDENTICAL ENVIRONMENTS

▸ Scaling

DEVELOPMENT ACCEPTANCE PRODUCTION

slide-51
SLIDE 51

INFRASTRUCTURE AS CODE

IDENTICAL ENVIRONMENTS

▸ Scaling
▸ Multiple environments

DEVELOPMENT ACCEPTANCE PRODUCTION

0.05 0.05

CUSTOMER TESTING

0.05

slide-52
SLIDE 52

INFRASTRUCTURE AS CODE

IDENTICAL ENVIRONMENTS

▸ Scaling
▸ Multiple environments
▸ Acceptance tests everything

DEVELOPMENT ACCEPTANCE PRODUCTION

slide-53
SLIDE 53

INFRASTRUCTURE AS CODE

OPEN SOURCE

▸ Terraform: https://github.com/terraform-community-modules
▸ AWS CDK: https://cdkpatterns.com/
▸ AWS CloudFormation: https://aws.amazon.com/quickstart/
▸ Gruntwork*: https://www.gruntwork.io/

slide-54
SLIDE 54

PATH OF ENLIGHTENMENT

slide-55
SLIDE 55

INFRASTRUCTURE AS CODE

DEVOPS METRICS

Source: Accelerate!

LEAD TIME # DEPLOYS CHANGE FAILURE RATE MEAN TIME TO RECOVERY
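
The four metrics can be derived from a plain deployment log. A minimal sketch, assuming each record carries a commit time, a deploy time, a failure flag and a restore time; the record shape and field names are assumptions, and times are expressed in hours for brevity:

```javascript
// Hypothetical deployment records; all timestamps are in hours.
function devopsMetrics(deployments, periodDays) {
  const mean = (values) =>
    values.length === 0 ? 0 : values.reduce((sum, v) => sum + v, 0) / values.length;
  const failures = deployments.filter((d) => d.failed);
  return {
    // Lead time: from commit to running in production.
    leadTimeHours: mean(deployments.map((d) => d.deployedAt - d.committedAt)),
    // Number of deploys, normalised per day.
    deploysPerDay: deployments.length / periodDays,
    // Share of deployments that caused a failure.
    changeFailureRate: deployments.length === 0 ? 0 : failures.length / deployments.length,
    // Mean time to recovery, over failed deployments only.
    meanTimeToRecoveryHours: mean(failures.map((d) => d.restoredAt - d.deployedAt)),
  };
}

const sample = [
  { committedAt: 0, deployedAt: 4, failed: false },
  { committedAt: 10, deployedAt: 12, failed: true, restoredAt: 13 },
  { committedAt: 20, deployedAt: 26, failed: false },
  { committedAt: 30, deployedAt: 34, failed: true, restoredAt: 37 },
];
const metrics = devopsMetrics(sample, 2);
```

Tracking these numbers over time shows whether a change to the infrastructure code base made delivery faster or safer.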

slide-59
SLIDE 59

TEST DRIVEN DEVELOPMENT

slide-60
SLIDE 60

FAILURE IS INEVITABLE

ARCHITECTING FOR OPERATIONS

slide-61
SLIDE 61

RUNTIME DATA

slide-62
SLIDE 62

COMPLEX SYSTEMS

slide-63
SLIDE 63

FAILURE IS INEVITABLE

slide-64
SLIDE 64

FAILURE IS INEVITABLE

COMMUNICATION

New Relic CloudWatch OpsGenie Statuspage Customer On Call Incident Captain Business Manager

slide-65
SLIDE 65

SCIENTIFIC APPROACH

slide-66
SLIDE 66

OPERATIONS DETECTIVE

SCIENTIFIC APPROACH

▸ Describe objectively
▸ Formulate a hypothesis
▸ Derive an experiment
▸ Observe outcomes

slide-67
SLIDE 67

LOOK FOR CHANGE

slide-68
SLIDE 68

POST MORTEM

slide-69
SLIDE 69

TRANSPARENCY

slide-70
SLIDE 70

FAILURE IS INEVITABLE

slide-71
SLIDE 71

FAILURE IS INEVITABLE

COMMUNICATION

New Relic CloudWatch OpsGenie Statuspage Customer On Call Incident Captain Business Manager

slide-72
SLIDE 72

OPERATIONS DETECTIVE

POST MORTEM TEMPLATE

▸ Timeline: What happened?
▸ Impact
▸ Resolutions
▸ Root Cause
▸ Follow up
▸ Public Communication
▸ Improvements
  ▸ Organisational
  ▸ Technical

Source: https://response.pagerduty.com/after/post_mortem_process/

slide-73
SLIDE 73

UNIQUE FAILURES

slide-74
SLIDE 74

NO WORK AROUND

slide-75
SLIDE 75

BUILDING FOR FAILURE

QUALITY

Source: Is High Quality Software Worth the Cost? by Martin Fowler https://www.martinfowler.com/articles/is-quality-worth-cost.html

slide-76
SLIDE 76

BROKEN WINDOWS THEORY

slide-77
SLIDE 77

STAGING

slide-78
SLIDE 78

BUILDING FOR FAILURE

ARCHITECTING FOR OPERATIONS

slide-79
SLIDE 79

PESSIMIST

slide-80
SLIDE 80

DEPLOYMENT STRATEGY

slide-81
SLIDE 81

BIG BANG RELEASES

slide-82
SLIDE 82

UNSTOPPABLE RELEASES

slide-83
SLIDE 83

IF IT HURTS, DO IT MORE OFTEN

slide-84
SLIDE 84

A FEDEX EXECUTIVE

BACKWARDS COMPATIBLE

ADDITION CHANGE DELETION

Monday Tuesday Wednesday

slide-85
SLIDE 85

A FEDEX EXECUTIVE

BACKWARDS COMPATIBLE

ADDITION CHANGE DELETION

January February March
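
One way to read the addition, change, deletion sequence is as three separate releases, each of which stays compatible with the one before. A minimal sketch, assuming a field `name` is being renamed to `fullName`; the field names and function names are hypothetical:

```javascript
// Step 1 (addition): the producer writes the new field next to the old one.
function serializeUser(user) {
  return { id: user.id, name: user.fullName, fullName: user.fullName };
}

// Step 2 (change): consumers prefer the new field but still accept the old one.
function readFullName(payload) {
  return payload.fullName ?? payload.name;
}

// Step 3 (deletion): once no consumer reads `name`, the producer drops it.
function serializeUserAfterCleanup(user) {
  return { id: user.id, fullName: user.fullName };
}
```

Because each step can be deployed and rolled back on its own, no release requires producer and consumer to change at the same moment.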

slide-86
SLIDE 86

ALWAYS PUSH TO PRODUCTION

slide-87
SLIDE 87

FEATURE TOGGLES

slide-88
SLIDE 88

A FEDEX EXECUTIVE

SIMPLE FEATURE TOGGLES

function calculate() {
  if (featureToggle("use-new-algorithm")) {
    return newCalculation();
  } else {
    return oldCalculation();
  }
}
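
The slide's snippet leaves `featureToggle`, `newCalculation` and `oldCalculation` undefined. A runnable sketch, assuming toggle state lives in a plain object (in practice it would come from configuration or a toggle service) and using placeholder calculations:

```javascript
// Stand-in toggle store; an assumption, not part of the original slide.
const toggles = { "use-new-algorithm": false };

function featureToggle(name) {
  // Unknown toggles default to off so that new code paths stay dark.
  return toggles[name] === true;
}

// Placeholder calculations so the example runs on its own.
function oldCalculation() {
  return "old";
}

function newCalculation() {
  return "new";
}

function calculate() {
  if (featureToggle("use-new-algorithm")) {
    return newCalculation();
  }
  return oldCalculation();
}
```

Flipping `toggles["use-new-algorithm"]` to `true` switches the code path without a new deployment, which is what makes pushing unfinished work to production safe.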

slide-89
SLIDE 89

HOPE IS NOT A STRATEGY

slide-90
SLIDE 90

GRACEFUL DEGRADATION

slide-91
SLIDE 91

PHD IN FAILURE

GRACEFUL DEGRADATION

▸ Return less precise data
▸ Incomplete data
▸ Cached data
▸ Preset data
▸ No data
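
The options above form a fallback chain: try the best source first and step down when it fails. A minimal sketch, assuming each source is a function that may throw or return nothing; `withFallbacks`, the quality labels and the recommendation example are illustrative assumptions:

```javascript
// Tries each source in order and reports how degraded the answer is.
function withFallbacks(sources) {
  for (const { quality, load } of sources) {
    try {
      const data = load();
      if (data !== undefined && data !== null) {
        return { quality, data };
      }
    } catch (error) {
      // A failing source is skipped; the next, less precise one gets a turn.
    }
  }
  return { quality: "none", data: null };
}

const cache = new Map([["recommendations", ["book-1", "book-2"]]]);

function recommendations() {
  return withFallbacks([
    { quality: "live", load: () => { throw new Error("recommendation service down"); } },
    { quality: "cached", load: () => cache.get("recommendations") },
    { quality: "preset", load: () => ["bestseller-1"] },
  ]);
}
```

Returning the quality level alongside the data lets the caller decide whether to show a notice to the user.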

slide-92
SLIDE 92

CIRCUIT BREAKER

slide-93
SLIDE 93

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

NORMAL

slide-94
SLIDE 94

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

TIMEOUT

slide-95
SLIDE 95

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

CIRCUIT OPEN

slide-96
SLIDE 96

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

TRIAL

slide-97
SLIDE 97

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

RECOVERY
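
The states walked through on these slides (normal, timeout, circuit open, trial, recovery) can be sketched as a small state machine. This is a minimal synchronous version under stated assumptions: the class name, the thresholds and the injectable clock are all illustrative, not taken from the talk or from any particular library:

```javascript
// Minimal circuit breaker around a synchronous call.
class CircuitBreaker {
  constructor(call, { maxFailures = 3, resetAfterMs = 1000, now = Date.now } = {}) {
    this.call = call;
    this.maxFailures = maxFailures;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed (normal operation)
  }

  state() {
    if (this.openedAt === null) return "closed";
    // After the cool-down, one trial request is allowed through.
    return this.now() - this.openedAt >= this.resetAfterMs ? "trial" : "open";
  }

  execute(...args) {
    if (this.state() === "open") {
      throw new Error("circuit open: failing fast");
    }
    try {
      const result = this.call(...args);
      // Success in closed or trial state: recover fully.
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (error) {
      this.failures += 1;
      if (this.state() === "trial" || this.failures >= this.maxFailures) {
        this.openedAt = this.now(); // open (or re-open) the circuit
      }
      throw error;
    }
  }
}
```

While the circuit is open the client fails fast instead of piling more requests onto a provider that is already struggling.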

slide-98
SLIDE 98

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

NORMAL

slide-99
SLIDE 99

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

TIMEOUT

slide-100
SLIDE 100

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

CIRCUIT OPEN

slide-101
SLIDE 101

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

TRIAL

slide-102
SLIDE 102

A FEDEX EXECUTIVE

CIRCUIT BREAKER

CLIENT BLUE PROVIDER CIRCUIT BREAKER

RECOVERY

slide-103
SLIDE 103

ASYNCHRONOUS

slide-104
SLIDE 104

BUILDING FOR FAILURE

ASYNCHRONOUS

Source: Asynchronous patterns for Cloud Functions by Preston Holmes https://cloud.google.com/community/tutorials/cloud-functions-async

CLIENT API

slide-105
SLIDE 105

BUILDING FOR FAILURE

ASYNCHRONOUS

CLIENT API WORKER QUEUE

Source: Asynchronous patterns for Cloud Functions by Preston Holmes https://cloud.google.com/community/tutorials/cloud-functions-async

slide-106
SLIDE 106

BUILDING FOR FAILURE

ASYNCHRONOUS

CLIENT API WORKER QUEUE

v2 v1

Source: Asynchronous patterns for Cloud Functions by Preston Holmes https://cloud.google.com/community/tutorials/cloud-functions-async

slide-107
SLIDE 107

BUILDING FOR FAILURE

ASYNCHRONOUS

CLIENT API WORKER QUEUE JOBSTATE

Source: Asynchronous patterns for Cloud Functions by Preston Holmes https://cloud.google.com/community/tutorials/cloud-functions-async
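
The client, API, queue, worker and job state pieces fit together roughly as follows. A minimal in-memory sketch, assuming an array stands in for the queue and a `Map` for the job state store; all function names are hypothetical, and a real system would use a managed queue and a database:

```javascript
// In-memory stand-ins for the queue and the job state store.
const queue = [];
const jobState = new Map();
let nextId = 1;

// API: accept the request, record the job and return immediately.
function submitJob(payload) {
  const id = String(nextId++);
  jobState.set(id, { status: "queued", result: null });
  queue.push({ id, payload });
  return { id, status: "queued" };
}

// API: the client polls for the outcome instead of waiting on the request.
function getJob(id) {
  return jobState.get(id) ?? { status: "unknown", result: null };
}

// Worker: takes one job from the queue and stores the outcome.
function workOnce(handler) {
  const job = queue.shift();
  if (!job) return false;
  try {
    jobState.set(job.id, { status: "done", result: handler(job.payload) });
  } catch (error) {
    jobState.set(job.id, { status: "failed", result: error.message });
  }
  return true;
}
```

Because the API only writes to the queue, a slow or crashed worker delays results but does not make the API itself fail.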

slide-108
SLIDE 108

EVENT DRIVEN

slide-109
SLIDE 109

BUILDING FOR FAILURE

EVENT DRIVEN

PRODUCER CONSUMER

Source: Event Driven by Martin Fowler https://martinfowler.com/articles/201701-event-driven.html

slide-110
SLIDE 110

BUILDING FOR FAILURE

EVENT DRIVEN

PRODUCER CONSUMER

v1 v2

Source: Event Driven by Martin Fowler https://martinfowler.com/articles/201701-event-driven.html

slide-111
SLIDE 111

BUILDING FOR FAILURE

EVENT DRIVEN

PRODUCER CONSUMER

Command Expected

Source: Event Driven by Martin Fowler https://martinfowler.com/articles/201701-event-driven.html

slide-112
SLIDE 112

QUALITY VS INNOVATION

slide-113
SLIDE 113

BUILDING FOR FAILURE

SITE RELIABILITY ENGINEERING

▸ How much quality have we agreed upon?
▸ How much quality do we provide?
▸ How much quality do we want?

Source: Site Reliability Engineering https://landing.google.com/sre/sre-book/toc/

slide-114
SLIDE 114

QUANTIFY QUALITY

slide-115
SLIDE 115

ERROR BUDGET
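
An error budget turns an availability target into an amount of downtime that may be spent. A minimal sketch, assuming availability is measured in minutes over a fixed window; the 99.9 % target, the 30-day window and the function name are example values, not from the slides:

```javascript
// Error budget for an availability target over a period.
function errorBudget(sloPercent, periodMinutes, downtimeMinutes) {
  const allowedMinutes = periodMinutes * (1 - sloPercent / 100);
  return {
    allowedMinutes,
    remainingMinutes: allowedMinutes - downtimeMinutes,
    // Once the budget is spent, reliability work takes priority over features.
    exhausted: downtimeMinutes >= allowedMinutes,
  };
}

const thirtyDays = 30 * 24 * 60; // minutes in a 30-day window
const budget = errorBudget(99.9, thirtyDays, 30);
```

With these example values the budget is roughly 43 minutes per window, of which 30 have been used.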

slide-116
SLIDE 116

TOIL

slide-117
SLIDE 117

BUILDING FOR FAILURE

TOIL

▸ Designated engineer
▸ Focuses on incidents
▸ Shields the team
▸ Engineers solutions
▸ Close collaboration with Product Owner

slide-118
SLIDE 118

OBSERVABILITY

slide-119
SLIDE 119

FALSE POSITIVES

slide-120
SLIDE 120

USER NOTIFICATIONS

slide-121
SLIDE 121

CONCLUSION

ATTRIBUTION

▸ Sources are at the bottom of the slides
▸ All pictures are from unsplash.com and their creators

slide-122
SLIDE 122

CONCLUSION

FURTHER READING

Source: AWS Well-Architected Framework Whitepaper https://aws.amazon.com/architecture/well-architected/

slide-123
SLIDE 123

ARCHITECTING FOR OPERATIONS

STEFFAN NORBERHUIS

▸ Freelance Cloud & DevOps Consultant
▸ Twitter: @SNorberhuis
▸ steffan@norberhuis.nl

ANY QUESTIONS?