Architectures that Scale Deep: Regaining Control in Deep Systems


Ben Sigelman (@el_bhs, bhs@lightstep.com), Co-founder & CEO: LightStep; Co-creator: OpenTracing, OpenTelemetry, Google Dapper, Google Monarch. QCon SF, November 2019.


slide-1
SLIDE 1

Ben Sigelman (@el_bhs, bhs@lightstep.com)

Co-founder & CEO: LightStep

Co-creator: OpenTracing, OpenTelemetry, Google Dapper, Google Monarch

Architectures that Scale Deep:

Regaining Control in Deep Systems

QCon SF, November 2019

slide-2
SLIDE 2

Part I

Scaling, and Deep Systems

slide-3
SLIDE 3

What is scale, anyway?

slide-4
SLIDE 4

Scaling wide

slide-5
SLIDE 5

Scaling wide

slide-6
SLIDE 6

Scaling wide

slide-7
SLIDE 7

Scaling wide

slide-8
SLIDE 8

Scaling wide

slide-9
SLIDE 9

Scaling deep

slide-10
SLIDE 10

Scaling deep

slide-11
SLIDE 11

Scaling deep

slide-12
SLIDE 12

Scaling deep

slide-13
SLIDE 13

Scaling deep

slide-14
SLIDE 14

How does this look for software?

slide-15
SLIDE 15

Software: Scaling wide

slide-16
SLIDE 16

Software: Scaling deep

slide-17
SLIDE 17

How do real-world systems look?

slide-18
SLIDE 18

Microservices at scale aren’t just wide systems, they’re deep systems

slide-19
SLIDE 19

Deep Systems

Architectures with ≥ 4 layers of independently operated services

(including external/cloud dependencies)
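
As a rough illustration of the definition above, the depth of a system can be read off its call graph as the number of services on the longest call path. This is a minimal sketch: the service names, the `CALLS` graph, and the helpers `depth` and `is_deep` are all hypothetical, and the graph is assumed to be acyclic.

```python
from functools import lru_cache

# Hypothetical call graph: each service maps to the services it calls.
# Names are illustrative only; none of them come from the talk.
CALLS = {
    "web-frontend": ["api-gateway"],
    "api-gateway": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "search": ["inventory"],
    "payments": ["cloud-kms"],         # external/cloud dependency
    "inventory": ["cloud-database"],   # external/cloud dependency
    "cloud-kms": [],
    "cloud-database": [],
}


def depth(calls, root):
    """Count the service layers on the longest call path starting at root.

    Assumes the call graph has no cycles.
    """
    @lru_cache(maxsize=None)
    def layers(service):
        callees = calls.get(service, [])
        # A leaf service counts as one layer; otherwise add the deepest callee.
        return 1 + max((layers(c) for c in callees), default=0)

    return layers(root)


def is_deep(calls, root, threshold=4):
    """Apply the slide's ">= 4 layers" rule, external dependencies included."""
    return depth(calls, root) >= threshold


print(depth(CALLS, "web-frontend"))    # layers below and including the frontend
print(is_deep(CALLS, "web-frontend"))  # deep when seen from the frontend
print(is_deep(CALLS, "search"))        # shallower when seen from search
```

Whether a system counts as deep depends on where you stand: the same graph is five layers from the frontend but only three from the hypothetical `search` service.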

slide-20
SLIDE 20

What do deep systems sound like?

slide-21
SLIDE 21

What do deep systems sound like?

“Don’t deploy on Fridays”

slide-22
SLIDE 22

What do deep systems sound like?

“Where’s Chris?! I’m dealing with a P0 and they’re the only one who knows how to debug this.”

slide-23
SLIDE 23

What do deep systems sound like?

“It can’t be our fault, our dashboard says we’re healthy”

slide-24
SLIDE 24

What do deep systems sound like?

“Kafka is on fire”

slide-25
SLIDE 25

What do deep systems sound like?

“I need 100% availability from your team. One hundred percent.”

slide-26
SLIDE 26

What do deep systems sound like?

“I didn’t know I depended on that region”

slide-27
SLIDE 27

What do deep systems sound like?

“That was on a dashboard but I can’t find it”

slide-28
SLIDE 28

What do deep systems sound like?

Lots of challenges:

  • People-management
  • Security
  • Multi-tenancy
  • “Big-customer” success
  • Performance
  • Observability

slide-29
SLIDE 29

Part II

Control Theory: TL;DR Edition

slide-30
SLIDE 30

Why do we care so much about observability, anyway?

slide-31
SLIDE 31
slide-32
SLIDE 32

Inputs → A System → Outputs

… and its state vector,

slide-33
SLIDE 33

Inputs → A System → Outputs

… and its state vector,

Observability

How well can you infer internal state using only the outputs?

slide-34
SLIDE 34

Inputs → A System → Outputs

… and its state vector,

Controllability

How well can you control internal state using only the inputs?
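
The slides give these two definitions only in words. In the standard linear time-invariant setting of control theory, both properties have exact rank tests. This is an assumption here: the talk does not commit to a particular model, and the symbols x, u, y, A, B, C, D are the usual textbook ones rather than the speaker's.

```latex
\begin{aligned}
\dot{x}(t) &= A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t), \qquad x \in \mathbb{R}^{n} \\
\text{observable} &\iff \operatorname{rank}\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} = n \\
\text{controllable} &\iff \operatorname{rank}\begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix} = n
\end{aligned}
```

Here x is the state vector, u the inputs, and y the outputs. Observability depends only on the pair (A, C) and controllability only on the pair (A, B).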

slide-35
SLIDE 35

Controllability is the dual of Observability
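
One way to make the duality concrete, assuming the textbook linear model x' = Ax + Bu, y = Cx (which the slides do not spell out): the pair (A, B) is controllable exactly when the transposed pair (Aᵀ, Bᵀ) is observable. The sketch below checks both rank conditions in plain Python. The helper names and the double-integrator example are illustrative choices, not taken from the talk.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def rank(M):
    """Matrix rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) vertically."""
    stacked, block = [], C
    for _ in range(len(A)):
        stacked.extend(block)
        block = matmul(block, A)
    return stacked


def controllability_matrix(A, B):
    """Place B, AB, ..., A^(n-1)B side by side."""
    columns, block = [], B
    for _ in range(len(A)):
        columns.extend(transpose(block))  # collect the columns of this block
        block = matmul(A, block)
    return transpose(columns)


def is_observable(A, C):
    return rank(observability_matrix(A, C)) == len(A)


def is_controllable(A, B):
    return rank(controllability_matrix(A, B)) == len(A)


# Illustrative system: a double integrator with state (position, velocity).
A = [[0, 1], [0, 0]]
B = [[0], [1]]             # the input acts as a force on the velocity
C_position = [[1, 0]]      # output exposes position only
C_velocity = [[0, 1]]      # output exposes velocity only

print(is_observable(A, C_position))   # position readings reveal velocity over time
print(is_observable(A, C_velocity))   # velocity readings never reveal position
print(is_controllable(A, B) == is_observable(transpose(A), transpose(B)))
```

The second case is the one that maps onto operations: a system can emit plenty of output and still hide part of its state if the outputs are the wrong ones.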

slide-36
SLIDE 36

Controllability is the dual of Observability

slide-37
SLIDE 37

Part III

What Deep Systems Mean for Observability

slide-38
SLIDE 38

[Figure: “Architectural evolution”. Axes: # of services, developers per service. Labeled regions: Pure Monoliths, Deep Systems.]

slide-39
SLIDE 39

Stress (n): responsibility without control

[Diagram: “Stress” labels the gap between what you can control and what you are responsible for.]

slide-40
SLIDE 40
slide-41
SLIDE 41

Observability: Shrink This Gap

slide-42
SLIDE 42

Mental models

A System

slide-43
SLIDE 43

Managing Deep Systems

Services must have SLOs

(“Service Level Objectives”: latency, errors, etc)

For effective service management, only three things matter:

  0. Releasing service functionality
  1. Gradually improving SLOs
  2. Rapidly restoring SLOs

In a deep system, we must control the entire “triangle” to maintain our SLOs
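
To make “services must have SLOs” concrete, here is a minimal sketch of checking one window of requests against a latency objective and an error objective. The `SLO` class, the thresholds, and the sample data are hypothetical; the slide names only “latency, errors, etc” and does not prescribe any of this.

```python
import math
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical objectives for one service; the thresholds are illustrative."""
    p99_latency_ms: float
    max_error_rate: float


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(slo, latencies_ms, errors):
    """Compare one window of requests against the SLO.

    latencies_ms holds one latency per request; errors holds one bool per request.
    """
    p99 = percentile(latencies_ms, 99)
    error_rate = sum(errors) / len(errors)
    return {
        "p99_latency_ms": p99,
        "error_rate": error_rate,
        "latency_ok": p99 <= slo.p99_latency_ms,
        "errors_ok": error_rate <= slo.max_error_rate,
    }


# Made-up window of 100 requests: latency is within target, errors are not.
slo = SLO(p99_latency_ms=300.0, max_error_rate=0.01)
latencies = [50.0] * 98 + [250.0, 900.0]
errors = [False] * 98 + [True, True]

report = evaluate(slo, latencies, errors)
print(report)
```

In the slide's terms, a failing check like `errors_ok` is the trigger for “rapidly restoring SLOs”, while the trend of the measured values over many windows is what “gradually improving SLOs” would track.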

slide-44
SLIDE 44

Controllability == Observability

There’s that word again…

slide-45
SLIDE 45

Observability: “The Conventional Wisdom”

  • Observing microservices is hard
  • Google and Facebook solved this (right???)
  • They used Metrics, Logging, and Distributed Tracing…
  • … So we should, too.

slide-46
SLIDE 46

3 Pillars, 3 Experiences

Metrics Logs Traces

slide-47
SLIDE 47
slide-48
SLIDE 48

Three Pillars? Two giant pipes…

Logs Metrics

Without Traces: Cognitive Load ≈ O(depth²)

slide-49
SLIDE 49

Three Pillars? Two giant pipes…

Logs Metrics

slide-50
SLIDE 50
slide-51
SLIDE 51

Two giant pipes…

Logs Metrics

Without Traces: Cognitive Load ≈ O(depth²)

slide-52
SLIDE 52
slide-53
SLIDE 53

Traces

slide-54
SLIDE 54

Traces provide Context

slide-55
SLIDE 55

Traces provide Context

And context rules out invalid hypotheses

slide-56
SLIDE 56

Two giant pipes and a filter

Logs Metrics

Context

(from traces)

slide-57
SLIDE 57

Context

(from traces)

Context reduces cognitive load

With Traces: Cognitive Load ≈ O(depth)

Relevant Metrics Relevant Logs
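
The “filter” idea can be sketched as follows: a trace identifies which services a request actually touched, and that set is used to narrow the metrics and logs worth reading. This is a minimal sketch with made-up data. The `Span` and `LogLine` types, the metric keys, and the assumption that log lines carry a trace ID are all hypothetical and do not describe any particular tracing product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    trace_id: str
    service: str


@dataclass(frozen=True)
class LogLine:
    service: str
    trace_id: str
    message: str


def services_on_trace(spans, trace_id):
    """Services that took part in the given trace."""
    return {s.service for s in spans if s.trace_id == trace_id}


def relevant_metrics(metrics, spans, trace_id):
    """Keep only metrics from services the trace passed through.

    metrics maps (service, metric_name) to a value.
    """
    on_path = services_on_trace(spans, trace_id)
    return {key: value for key, value in metrics.items() if key[0] in on_path}


def relevant_logs(logs, trace_id):
    """Keep only log lines emitted while handling the given trace."""
    return [line for line in logs if line.trace_id == trace_id]


# Made-up data: trace t1 is a slow checkout request, t2 an unrelated search.
spans = [
    Span("t1", "frontend"), Span("t1", "checkout"), Span("t1", "payments"),
    Span("t2", "frontend"), Span("t2", "search"),
]
metrics = {
    ("checkout", "p99_ms"): 840.0,
    ("payments", "error_rate"): 0.04,
    ("search", "p99_ms"): 35.0,
    ("inventory", "p99_ms"): 20.0,
}
logs = [
    LogLine("payments", "t1", "retrying card authorization"),
    LogLine("search", "t2", "cache hit"),
    LogLine("checkout", "t1", "waiting on payments"),
]

print(sorted(relevant_metrics(metrics, spans, "t1")))
print([line.message for line in relevant_logs(logs, "t1")])
```

The two “pipes” are unchanged; the trace only decides which part of each is worth looking at for this request, which is how context rules out hypotheses about services that were never involved.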

slide-58
SLIDE 58

Observability: Shrink This Gap

slide-59
SLIDE 59
slide-60
SLIDE 60

Let’s Review

slide-61
SLIDE 61

Recognize deep systems

Microservices don’t just scale wide, they scale deep

slide-62
SLIDE 62

Stress (n): responsibility without control

[Diagram: “Stress” labels the gap between what you can control and what you are responsible for.]

slide-63
SLIDE 63

“Controllability” (of SLOs) depends on observability

slide-64
SLIDE 64

“The Three Pillars of Observability” is a lousy metaphor

… and traces are not sprinkles

slide-65
SLIDE 65

Tracing can reduce cognitive load from O(depth²) to O(depth)

slide-66
SLIDE 66

Tracing is the backbone of simple observability in deep systems

slide-67
SLIDE 67

Thank You

Feedback always welcome:

twitter → @el_bhs
the emails → bhs@lightstep.com

Play with LightStep, for free, anytime:

(no email address required!)

lightstep.com/play