High Availability at Heroku - Mark McGranaghan



SLIDE 1

High Availability at Heroku

Mark McGranaghan

SLIDE 2
SLIDE 3
SLIDE 4
SLIDE 5
SLIDE 6
SLIDE 7
SLIDE 8
SLIDE 9
SLIDE 10

Scale & Scope

SLIDE 11

O(1,000) instances
O(1,000,000) apps

SLIDE 12

Success & Failure

SLIDE 13

Architecture
Execution

SLIDE 14

Architecture
Execution

SLIDE 15
SLIDE 16

Platform-Enabled HA Routing

SLIDE 17
SLIDE 18
SLIDE 19
SLIDE 20
SLIDE 21

Crashes & Supervision
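
Supervision, as named on this slide, can be illustrated with a small restart loop. This is a minimal sketch and not Heroku's implementation: `supervise` and `max_restarts` are hypothetical names, and it assumes the worker is a plain Ruby block that signals a crash by raising.

```ruby
# Hypothetical supervisor: runs a worker block and restarts it when it crashes.
# The block receives the number of restarts performed so far.
def supervise(max_restarts: 3)
  restarts = 0
  begin
    yield restarts
  rescue StandardError
    restarts += 1
    # Once the restart budget is spent, let the crash propagate upward.
    raise if restarts > max_restarts
    retry
  end
end

attempts = []
result = supervise do |restarts|
  attempts << restarts
  raise "worker crashed" if restarts < 2 # simulate two crashes
  :ok
end
```

Here the worker fails twice and succeeds on the third run. A supervisor that exhausts its budget re-raises, so the failure reaches whatever is supervising it.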

SLIDE 22
SLIDE 23
SLIDE 24
SLIDE 25

Crashes as the only code path

SLIDE 26
SLIDE 27
SLIDE 28

Crashes as a hot code path

SLIDE 29
SLIDE 30
SLIDE 31

Error Kernel

SLIDE 32
SLIDE 33

Layered design

SLIDE 34
SLIDE 35

Message passing...

SLIDE 36
SLIDE 37

{slug: "https://aws...",
 cmd: "java ...",
 env: {"JAVA_OPTS": ...,
       "DATABASE_URL": ...,
       "SESSION_SECRET": ...}}

SLIDE 38

...of narrow, versioned values

SLIDE 39

{slug: "https://s3...",
 cmd: "java -cp ...",
 env: {"JAVA_OPTS": ...,
       "DATABASE_URL": ...,
       "SESSION_SECRET": ...},
 flag: "extra_cpu"}
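
One way to read the two message examples: the second adds a `flag` field, and a receiver written against the first should keep working. The sketch below shows such a tolerant receiver. It is an interpretation and not the talk's code: `parse_launch` and `KNOWN_KEYS` are hypothetical names, and the field values are placeholders because the slides elide them.

```ruby
require "json"

# Keys this hypothetical receiver understands, taken from the slide's example.
KNOWN_KEYS = %w[slug cmd env].freeze

# Parses a launch message, keeping the known keys and ignoring newer ones.
def parse_launch(json)
  data = JSON.parse(json)
  missing = KNOWN_KEYS - data.keys
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  data.slice(*KNOWN_KEYS).freeze
end

# Placeholder values; the real ones are not shown in the deck.
old_msg = '{"slug": "s", "cmd": "c", "env": {}}'
new_msg = '{"slug": "s", "cmd": "c", "env": {}, "flag": "extra_cpu"}'

same_view = parse_launch(old_msg) == parse_launch(new_msg)
```

Because the receiver ignores keys it does not know, a sender can start emitting `flag` before every receiver has been upgraded.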

SLIDE 40
SLIDE 41

No Stopping the World

SLIDE 42
SLIDE 43

Load balancing
Supervision
Crash-only
Error kernels
Layered design
Message-passing

SLIDE 44
SLIDE 45

Erlang

Designed for granular failure

SLIDE 46

Distributed Systems

Defined as granular failure

SLIDE 47

Brokered Queueing

SLIDE 48
SLIDE 49
SLIDE 50
SLIDE 51
SLIDE 52

Publish one / Subscribe many
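
The publish-one, subscribe-many shape can be sketched in a few lines. This is an in-process illustration only: `InMemoryBus` is a hypothetical class, whereas a brokered queue would sit between separate processes and is not modeled here.

```ruby
# Hypothetical in-process bus; a real broker would sit between processes.
class InMemoryBus
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  # Registers a handler block for a topic.
  def subscribe(topic, &handler)
    @subscribers[topic] << handler
    self
  end

  # Delivers one message to every subscriber of the topic.
  # Returns the number of handlers that received it.
  def publish(topic, msg)
    handlers = @subscribers.fetch(topic, [])
    handlers.each { |handler| handler.call(msg) }
    handlers.size
  end
end

bus = InMemoryBus.new
logged = []
billed = []
bus.subscribe("launch") { |msg| logged << msg }
bus.subscribe("launch") { |msg| billed << msg }
delivered = bus.publish("launch", "app-1 started")
```

The publisher does not know who is listening, so a new subscriber can be added without changing the publishing code.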

SLIDE 53
SLIDE 54
SLIDE 55
SLIDE 56

Distributed call graphs

SLIDE 57
SLIDE 58
SLIDE 59
SLIDE 60
SLIDE 61

Read call graph
Partial failure
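
One reading of partial failure on the read path: when a read fans out to several services, a failed or slow dependency degrades its part of the response and the rest is still served. The sketch below assumes that reading. `with_fallback` and `app_page` are hypothetical names, and the half-second default is an arbitrary choice.

```ruby
require "timeout"

# Runs the block; on error or timeout, returns `fallback` so the read survives.
def with_fallback(fallback, seconds: 0.5)
  Timeout.timeout(seconds) { yield }
rescue StandardError
  fallback
end

# Hypothetical read that fans out to two dependencies.
def app_page
  {
    info: with_fallback(nil) { { name: "example" } },
    metrics: with_fallback([]) { raise "metrics service down" }
  }
end

page = app_page
```

The page still carries `info` even though the metrics dependency failed; the caller sees an empty metrics list, not an error.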

SLIDE 62
SLIDE 63

Write call graph
de-synchronizing
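
De-synchronizing a write can be read as taking it off the caller's synchronous path: the caller enqueues the write and returns, and a worker applies it later. The sketch below assumes that reading. `AsyncWriter` is a hypothetical class, and an in-memory `Queue` stands in for whatever durable queue a real system would use.

```ruby
# Hypothetical asynchronous writer: callers enqueue, a worker thread applies.
class AsyncWriter
  def initialize(&sink)
    @queue = Queue.new
    @worker = Thread.new do
      # Drain items in order until the stop marker arrives.
      while (item = @queue.pop) != :stop
        sink.call(item)
      end
    end
  end

  # Returns immediately; the write is applied later by the worker.
  def write(item)
    @queue << item
    nil
  end

  # Flushes pending writes and stops the worker.
  def close
    @queue << :stop
    @worker.join
  end
end

stored = []
writer = AsyncWriter.new { |item| stored << item }
writer.write("event-1")
writer.write("event-2")
writer.close
```

After `close` returns, both writes have been applied in the order they were enqueued. In this sketch a crash before the worker runs would lose queued items, which is why the in-memory queue is only a stand-in.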

SLIDE 64
SLIDE 65
SLIDE 66

Architecture
Execution

SLIDE 67

Architecture
Execution

SLIDE 68

“...we deployed a code change...
...visible under unusual...conditions...
...introduced a...problem...
...engineers noticed a deviation...
...began to escalate...
...system...entered into a feedback loop...
...engineers...deactivated the feedback...”

SLIDE 69

Evolving Socio-Technical Systems

SLIDE 70
SLIDE 71
SLIDE 72

Availability >> Architecture

SLIDE 73

Failed deploys
Bad visibility
Cascading feedback

SLIDE 74

Evolving Socio-Technical Systems

SLIDE 75

Failed deploys
Bad visibility
Cascading feedback

SLIDE 76

Deploy tooling
Visibility services
Feedback controls

SLIDE 77

bin/ship

SLIDE 78

bin/ship \
  --component api \
  --version v408

SLIDE 79

Incremental deploys

SLIDE 80
SLIDE 81
SLIDE 82

Incremental rollouts

SLIDE 83

  prep_launch
- launch_without_lxc
+ launch_with_lxc
  monitor_launch

SLIDE 84

if flag_on?("lxc")
  launch_with_lxc
else
  launch_without_lxc
end

SLIDE 85

app.flag_on("lxc")
app.flag_off("lxc")
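
The flag calls on these slides suggest a per-app switch that selects a code path at run time. A minimal sketch of such a switch follows. The `App` class and its in-memory `Set` are assumptions made for illustration; how the flags are actually stored is not shown in the deck.

```ruby
require "set"

# Hypothetical app record holding a set of enabled feature flags.
class App
  def initialize
    @flags = Set.new
  end

  def flag_on(name)
    @flags.add(name)
  end

  def flag_off(name)
    @flags.delete(name)
  end

  def flag_on?(name)
    @flags.include?(name)
  end

  # Chooses the launch path from the flag, as in the conditional on the slide.
  def launch
    flag_on?("lxc") ? :launch_with_lxc : :launch_without_lxc
  end
end

app = App.new
before = app.launch
app.flag_on("lxc")
during = app.launch
app.flag_off("lxc")
after = app.launch
```

Turning the flag on for one app at a time gives an incremental rollout, and turning it off is the rollback; neither needs a deploy.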

SLIDE 86
SLIDE 87
SLIDE 88

Real-time visibility

SLIDE 89
SLIDE 90

Service-level assertions

SLIDE 91
SLIDE 92

assert(index > 0)

SLIDE 93

assert(index > 0)
objects[index]

SLIDE 94

assert(p99_latency < 50)

SLIDE 95

assert(p99_latency < 50)

SLIDE 96

assert(active_cons > 10)

SLIDE 97

assert(active_cons > 10)
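
The assertions on these slides apply the in-code `assert` idea to service metrics. The sketch below evaluates the same two conditions against a metrics hash. `p99`, `failed_assertions`, and `CHECKS` are hypothetical names, the nearest-rank percentile is one common definition chosen here, and the sample values are made up because the deck gives no data or units.

```ruby
# Nearest-rank 99th percentile; assumes a non-empty array of numbers.
def p99(samples)
  sorted = samples.sort
  rank = (99 * sorted.length + 99) / 100 # integer ceiling of 0.99 * n
  sorted[rank - 1]
end

# Evaluates named checks against a metrics hash; returns the failed names.
def failed_assertions(metrics, checks)
  checks.reject { |_name, check| check.call(metrics) }.keys
end

# The two conditions shown on the slides, expressed as lambdas.
CHECKS = {
  "p99_latency < 50" => ->(m) { m[:p99_latency] < 50 },
  "active_cons > 10" => ->(m) { m[:active_cons] > 10 }
}.freeze

# Made-up sample data for illustration.
latencies = (1..100).to_a
metrics = { p99_latency: p99(latencies), active_cons: 12 }
failed = failed_assertions(metrics, CHECKS)
```

With this sample data the latency assertion fails and the connection assertion holds, so `failed` names only the latency check. A monitoring loop could alert on any non-empty result.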

SLIDE 98

Flow control & Backpressure

SLIDE 99
SLIDE 100
SLIDE 101
SLIDE 102
SLIDE 103

limit("publish", 10) do
  Bus.publish(msg)
end

SLIDE 104

limit("publish", prate) do
  publish(msg)
end

echo 0 >/etc/rates/publish
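
The deck does not show how `limit` works. One plausible shape is a per-name counter that runs the block at most `rate` times per second and sheds the rest, so that a rate of 0 switches the operation off. The sketch below assumes those semantics. The `Limiter` class, the one-second window, and the injectable clock are all assumptions, and reading the rate from a file such as `/etc/rates/publish` is left out.

```ruby
# Hypothetical fixed-window limiter; the semantics are an assumption.
class Limiter
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @windows = {} # name => [window start in whole seconds, calls so far]
  end

  # Runs the block at most `rate` times per one-second window for `name`.
  # Returns true if the block ran, false if the call was shed.
  def limit(name, rate)
    second = @clock.call.floor
    window = @windows[name]
    window = @windows[name] = [second, 0] if window.nil? || window[0] != second
    return false if window[1] >= rate

    window[1] += 1
    yield
    true
  end
end

# A fake clock keeps the example deterministic.
now = 0.0
limiter = Limiter.new(clock: -> { now })
sent = []
15.times { |i| limiter.limit("publish", 10) { sent << i } }
```

Of the 15 attempts in the same second, the first 10 run and the remaining 5 are shed. Passing a rate of 0 sheds every call, which matches the effect of the `echo 0` line on the slide under this reading.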

SLIDE 105

Evolving Socio-Technical Systems

SLIDE 106

Architecture
Execution

SLIDE 107
SLIDE 108

Thanks

SLIDE 109

Questions?