The Paved PaaS to Microservices Yunong Xiao, Principal Software - - PowerPoint PPT Presentation

slide-1
SLIDE 1

The Paved PaaS to Microservices

Yunong Xiao, Principal Software Engineer, Netflix yunong@netflix.com, @yunongx, http://yunong.io

June 26, 2017

slide-2
SLIDE 2

100 million customers in over 190 countries streaming 125 million hrs/day

slide-3
SLIDE 3

What is a Platform as a Service (PaaS), Anyway?

“Platform as a service (PaaS)… allows customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure and platform…”

–Wikipedia

slide-4
SLIDE 4

Our Use Case

slide-5
SLIDE 5
slide-6
SLIDE 6

Netflix Edge API

Scripts: Script A, Script B, Script C, Script D, … Script N

Client libraries: Client Library A, Client Library B, Client Library C, … Client Library N

Backend services: Backend Service A, Backend Service B, Backend Service C, … Backend Service N

1000+ 100

slide-7
SLIDE 7

Clients: TV, iOS, Android, Windows, Browsers

Edge API (standalone services): Discovery, Playback, Non-member, …

Backend services: Backend Service A, Backend Service B, Backend Service C, … Backend Service N

Standalone services: owned by client teams (mostly)

slide-8
SLIDE 8

Know Your Customer

slide-9
SLIDE 9

Goals

Velocity & Reliability

slide-10
SLIDE 10

Our PaaS to Achieving Both

  • 1. Standardized components
  • 2. Preassembled platform
  • 3. Automation and tooling
slide-11
SLIDE 11
  • 1. Standardized Components
slide-12
SLIDE 12

What’s Inside a Microservice?

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

slide-13
SLIDE 13

Why Standards?

slide-14
SLIDE 14

Mélange of RPC

slide-15
SLIDE 15

Microservice Interactions

(Diagram: a client calling into a graph of interconnected µServices.)

slide-16
SLIDE 16

Failure: When, Not If

slide-17
SLIDE 17

“Is it fixed yet?”

– Managers everywhere

Mean Time to Detection (MTTD), Mean Time to Repair (MTTR)
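MTTD and MTTR are both simple averages over incidents. The TypeScript sketch below computes them from a hypothetical `Incident` record with start, detection, and resolution timestamps in minutes. It assumes MTTR is measured from detection to resolution. Some teams measure it from failure start, so treat that choice as an assumption.

```typescript
// Hypothetical incident record; all times are minutes on a shared clock.
interface Incident {
  startedAt: number; // when the failure began
  detectedAt: number; // when an alert or a person noticed it
  resolvedAt: number; // when service was restored
}

// Arithmetic mean; NaN for an empty list so "no data" is not mistaken for 0.
function mean(values: number[]): number {
  if (values.length === 0) return NaN;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Mean Time to Detection: average gap between failure start and detection.
function mttd(incidents: Incident[]): number {
  return mean(incidents.map((i) => i.detectedAt - i.startedAt));
}

// Mean Time to Repair, measured here from detection to resolution (an assumption).
function mttr(incidents: Incident[]): number {
  return mean(incidents.map((i) => i.resolvedAt - i.detectedAt));
}

const incidents: Incident[] = [
  { startedAt: 0, detectedAt: 4, resolvedAt: 30 },
  { startedAt: 100, detectedAt: 102, resolvedAt: 112 },
];
console.log(mttd(incidents), mttr(incidents)); // 3 18
```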

slide-18
SLIDE 18

N Flavors of RPC

(Diagram: the same client and µService graph, with each interaction using a different flavor of RPC.)

slide-19
SLIDE 19

One Standard RPC

(Diagram: the same client and µService graph, with every interaction using one standard RPC.)

slide-20
SLIDE 20

Benefits of Standardizing

Consistency, Leverage, Interoperability, Quality, Support

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

slide-21
SLIDE 21

But I'm a Snowflake...

Freedom & Responsibility

Off-road, Innovation, Reintegrate, Burden, Vacuum, New

slide-22
SLIDE 22
  • 2. Preassembled Platform
slide-23
SLIDE 23

Assembly Required

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

slide-24
SLIDE 24

Getting Out of the Blocks

Docs, Copy/paste, Which versions?, Initialization, Configuration, Missing components, Days or weeks

slide-25
SLIDE 25

Velocity, Reliability, Product Innovation

Not a single line of business logic!

slide-26
SLIDE 26

Assembly Required

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

slide-27
SLIDE 27

Preassembled Platform

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

slide-28
SLIDE 28

Out of the Box

slide-29
SLIDE 29

Component Management

slide-30
SLIDE 30
slide-31
SLIDE 31

Insights

slide-32
SLIDE 32

Application, system, and runtime metrics & logs

slide-33
SLIDE 33

Consistent application, system, and runtime metrics & logs

Reduces MTTD & MTTR

slide-34
SLIDE 34

Configures and Initializes Correctly

(Image: Boeing 747-400 normal procedures checklist, with Power Up / Safety Check, Preflight, and Before Starting items for the First Officer and Captain.)
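A preassembled platform can encode a checklist like this as dependencies. Each component declares what must already be running before it starts, and the platform derives a safe boot order from those declarations. This is a minimal sketch. It assumes a hypothetical `ComponentSpec` shape and made-up dependencies between components named on the slides, and it uses a depth-first topological sort that fails fast on cycles or unknown components.

```typescript
// Hypothetical descriptor: a component and the components it needs first.
interface ComponentSpec {
  name: string;
  dependsOn: string[];
}

// Depth-first topological sort: every component appears after its
// dependencies. Unknown names and cycles throw, so a misconfigured
// platform fails at boot instead of half-starting.
function initOrder(specs: ComponentSpec[]): string[] {
  const byName = new Map<string, ComponentSpec>();
  for (const spec of specs) byName.set(spec.name, spec);
  const state = new Map<string, "visiting" | "done">();
  const order: string[] = [];

  function visit(name: string): void {
    const spec = byName.get(name);
    if (spec === undefined) throw new Error(`unknown component: ${name}`);
    const seen = state.get(name);
    if (seen === "done") return;
    if (seen === "visiting") throw new Error(`dependency cycle at: ${name}`);
    state.set(name, "visiting");
    for (const dep of spec.dependsOn) visit(dep);
    state.set(name, "done");
    order.push(name);
  }

  for (const spec of specs) visit(spec.name);
  return order;
}

// Illustrative dependencies only; the slides do not specify them.
const bootOrder = initOrder([
  { name: "rpc", dependsOn: ["config", "metrics", "discovery"] },
  { name: "metrics", dependsOn: ["config"] },
  { name: "discovery", dependsOn: ["config"] },
  { name: "config", dependsOn: [] },
]);
console.log(bootOrder.join(" -> ")); // config -> metrics -> discovery -> rpc
```

Shutting down in the reverse of the boot order (`bootOrder.slice().reverse()`) stops each component before the components it depends on.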

slide-35
SLIDE 35

Versions, Updates, Compatibility

slide-36
SLIDE 36

Important Questions

slide-37
SLIDE 37

What’s in and out?

slide-38
SLIDE 38

Maintenance vs Convenience

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

Base Platform

slide-39
SLIDE 39

Solution: Layers & Flavors

Base platform, Data access, Rendering, Backend, …
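One way to read layers and flavors is this: the base platform is a fixed component list, and each flavor adds to it rather than forking it, so base updates reach every flavor. The sketch assumes a hypothetical `Flavor` shape. The flavor's extra components are invented for illustration, because the slides name the flavors but not their contents.

```typescript
// Components every service gets, as listed on the slides.
const basePlatform: readonly string[] = [
  "rpc", "discovery", "registration", "runtime", "os", "configuration",
  "metrics", "logging", "tracing", "dashboards", "alerts", "stream-processing",
];

// Hypothetical flavor: a name plus components layered on top of the base.
interface Flavor {
  name: string;
  extras: string[];
}

// Base first, then the flavor's extras, dropping anything already present,
// so a flavor can only add to the base and never fork it.
function composePlatform(flavor: Flavor): string[] {
  return basePlatform
    .concat(flavor.extras)
    .filter((component, index, all) => all.indexOf(component) === index);
}

// Invented extras for a "Data access" flavor, for illustration only.
const dataAccess: Flavor = { name: "data-access", extras: ["cache-client", "db-client"] };
console.log(composePlatform(dataAccess).length); // 14
```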

slide-40
SLIDE 40

How to Ensure Platform Correctness?

slide-41
SLIDE 41

Test, Test, Test

Unit, Integration, Functional, Cloud

Every PR

slide-42
SLIDE 42

Dogfood with Your Own Service

slide-43
SLIDE 43

Component Correctness

  • Lock down versions
  • Test components
  • Updates require PRs
  • Gatekeeper

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

Customers
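A gatekeeper check for locked-down versions can run in CI on every PR that touches the platform's dependencies. This sketch assumes hypothetical name-to-version maps for the pinned and installed components. It rejects range specifiers such as `^1.0.0` in the pin list and reports any drift between pinned and installed versions.

```typescript
// Hypothetical manifests: component name -> version string.
type VersionMap = Record<string, string>;

// An exact pin is a bare MAJOR.MINOR.PATCH with no range operator.
const EXACT_VERSION = /^\d+\.\d+\.\d+$/;

// Returns a list of problems; an empty list means every pin holds.
function checkLockedVersions(pinned: VersionMap, installed: VersionMap): string[] {
  const problems: string[] = [];
  for (const name of Object.keys(pinned)) {
    const want = pinned[name];
    const have = Object.prototype.hasOwnProperty.call(installed, name)
      ? installed[name]
      : undefined;
    if (!EXACT_VERSION.test(want)) {
      problems.push(`${name}: pin "${want}" is a range, not an exact version`);
    } else if (have === undefined) {
      problems.push(`${name}: pinned but not installed`);
    } else if (have !== want) {
      problems.push(`${name}: expected ${want}, found ${have}`);
    }
  }
  return problems;
}

const report = checkLockedVersions(
  { metrics: "2.1.0", logging: "^1.0.0" },
  { metrics: "2.1.3", logging: "1.4.0" },
);
console.log(report.join("\n"));
```

A CI job would fail the PR whenever the returned list is non-empty.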

slide-44
SLIDE 44

How Locked Down is it?

slide-45
SLIDE 45

Tradeoffs

Flexibility vs Reliability, Consistency, Support

slide-46
SLIDE 46

Stay on the paved path!

slide-47
SLIDE 47

Season to Taste

  • Config overrides
  • Startup & shutdown hooks
  • Access to 3rd-party libs
  • Swap, disable, or configure components
  • Raw component access
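Config overrides and disabled components can both be expressed as a customization layered over the platform defaults, without modifying the defaults themselves. The sketch below covers only those two escape hatches out of the five listed. `PlatformConfig`, `Customization`, and their fields are hypothetical.

```typescript
// Hypothetical platform config; defaults are owned by the platform team.
interface PlatformConfig {
  port: number;
  logLevel: "debug" | "info" | "warn" | "error";
  components: Record<string, { enabled: boolean }>;
}

// Hypothetical service-owned customization layered over the defaults.
interface Customization {
  overrides?: Partial<Omit<PlatformConfig, "components">>;
  disable?: string[]; // names of components to switch off
}

// Builds the effective config without mutating the defaults.
function applyCustomization(defaults: PlatformConfig, custom: Customization): PlatformConfig {
  const disabled = new Set<string>(custom.disable ?? []);
  const components: PlatformConfig["components"] = {};
  for (const name of Object.keys(defaults.components)) {
    components[name] = { enabled: defaults.components[name].enabled && !disabled.has(name) };
  }
  return { ...defaults, ...custom.overrides, components };
}

const platformDefaults: PlatformConfig = {
  port: 8080,
  logLevel: "info",
  components: { metrics: { enabled: true }, tracing: { enabled: true } },
};
const tuned = applyCustomization(platformDefaults, {
  overrides: { logLevel: "debug" },
  disable: ["tracing"],
});
console.log(tuned.logLevel, tuned.components.tracing.enabled); // debug false
```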

slide-48
SLIDE 48

Platform Versioning?

slide-49
SLIDE 49

API Semantic Versioning

1.2.3 ^1.0.0 ~1.3.0
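Under semantic versioning, `1.2.3` pins an exact release, `^1.0.0` accepts any 1.x.y release at or above 1.0.0, and `~1.3.0` accepts only 1.3.x patch releases. The sketch reimplements just those three forms for plain MAJOR.MINOR.PATCH versions, with no pre-release tags. It is not the `semver` npm package, which handles far more range syntax.

```typescript
type Version = [number, number, number];

// Parses a plain "MAJOR.MINOR.PATCH"; pre-release and build tags are out of scope.
function parseVersion(text: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (m === null) throw new Error(`not a plain semver version: ${text}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative, zero, or positive, comparing major, then minor, then patch.
function compareVersions(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Supports exact ("1.2.3"), caret ("^1.0.0"), and tilde ("~1.3.0") ranges.
function satisfies(version: string, range: string): boolean {
  const v = parseVersion(version);
  const op = range.charAt(0);
  if (op !== "^" && op !== "~") return compareVersions(v, parseVersion(range)) === 0;
  const base = parseVersion(range.slice(1));
  if (compareVersions(v, base) < 0) return false;
  // Tilde: same major and minor, so only patch releases are accepted.
  if (op === "~") return v[0] === base[0] && v[1] === base[1];
  // Caret: the left-most non-zero component of the range may not change.
  if (base[0] > 0) return v[0] === base[0];
  if (base[1] > 0) return v[0] === 0 && v[1] === base[1];
  return v[0] === 0 && v[1] === 0 && v[2] === base[2];
}

console.log(satisfies("1.9.0", "^1.0.0"), satisfies("2.0.0", "^1.0.0"), satisfies("1.3.4", "~1.3.0"));
// true false true
```

For a platform, this means a service that depends on `^1.0.0` picks up minor and patch releases automatically, while a breaking 2.0.0 release requires an explicit upgrade.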

slide-50
SLIDE 50

Use Conventional Changelog
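Conventional Changelog tooling generates changelogs from commit messages written in the Conventional Commits format, `type(scope): subject`. Under that specification, a `fix` maps to a patch release and a `feat` to a minor release. A `!` after the type, or a `BREAKING CHANGE:` footer, maps to a major release. The sketch derives the release bump from a list of messages. It is a simplified parser, not the API of the conventional-changelog packages.

```typescript
type Bump = "major" | "minor" | "patch" | null;

// Conventional Commits header: type, optional (scope), optional "!", ": subject".
const HEADER = /^(\w+)(?:\([^)]*\))?(!)?: \S/;
// A breaking-change footer on its own line anywhere in the message.
const BREAKING_FOOTER = /^BREAKING[ -]CHANGE: /m;

function bumpFor(message: string): Bump {
  const m = HEADER.exec(message);
  if (m === null) return null; // not a conventional commit
  if (m[2] === "!" || BREAKING_FOOTER.test(message)) return "major";
  if (m[1] === "feat") return "minor";
  if (m[1] === "fix") return "patch";
  return null; // docs, chore, refactor, ... imply no release on their own
}

const RANK = { patch: 1, minor: 2, major: 3 };

// A release takes the largest bump implied by any commit since the last one.
function releaseBump(messages: string[]): Bump {
  let best: Bump = null;
  for (const message of messages) {
    const bump = bumpFor(message);
    if (bump !== null && (best === null || RANK[bump] > RANK[best])) best = bump;
  }
  return best;
}

console.log(releaseBump(["docs: fix typo", "fix(rpc): retry on timeout", "feat: add tracing"]));
// minor
```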

slide-51
SLIDE 51
  • 3. Automation and Tooling
slide-52
SLIDE 52

Ship a Feature

slide-53
SLIDE 53

Steps: Development, Testing, Deployment, Operations

slide-54
SLIDE 54

Development

slide-55
SLIDE 55
slide-56
SLIDE 56

  • Env bootstrap
  • Integrate tooling & services
  • Run local & cloud

A CLI for a common dev experience

slide-57
SLIDE 57

Local Development

Live reload, Attach debugger, localhost

PROD

slide-58
SLIDE 58

Testing

slide-59
SLIDE 59

Testing

slide-60
SLIDE 60

Testing

Preassembled Platform

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

slide-61
SLIDE 61

Preassembled Platform

Pre-Prod

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

slide-62
SLIDE 62

Provide First-Class Mocks

RPC, Discovery, Registration, Runtime, OS, Configuration, Metrics, Logging, Tracing, Dashboards, Alerts, Stream Processing

Preassembled Platform

slide-63
SLIDE 63

Mock Data Generation

RPC

+

slide-64
SLIDE 64

Mock Ownership

RPC, Discovery, Registration, Configuration, Metrics, Logging, Tracing, Alerts, Stream Processing

slide-65
SLIDE 65

Platform Testing API

  • Just like a runtime API, the platform needs a testing API
  • Provide a mock interface for each component
  • Gets the platform team out of the loop for providing mocks
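A testing API along these lines could let each component team publish a mock next to its component, with a helper that defaults every component to its mock. Service tests then override only the pieces they care about. All interfaces, classes, and functions here (`MetricsClient`, `ConfigClient`, `MockMetrics`, `MockConfig`, `createTestPlatform`, `handleRequest`) are hypothetical.

```typescript
// Hypothetical component contracts shared by real and mock implementations.
interface MetricsClient {
  increment(name: string, by?: number): void;
}
interface ConfigClient {
  get(key: string): string | undefined;
}
interface Components {
  metrics: MetricsClient;
  config: ConfigClient;
}

// Mocks published by each component's owners alongside the component itself.
class MockMetrics implements MetricsClient {
  readonly counts = new Map<string, number>();
  increment(name: string, by = 1): void {
    this.counts.set(name, (this.counts.get(name) ?? 0) + by);
  }
}

class MockConfig implements ConfigClient {
  private readonly values: Record<string, string>;
  constructor(values: Record<string, string> = {}) {
    this.values = values;
  }
  get(key: string): string | undefined {
    return Object.prototype.hasOwnProperty.call(this.values, key) ? this.values[key] : undefined;
  }
}

// Testing API: every component defaults to its mock; a test overrides any piece.
function createTestPlatform(overrides: Partial<Components> = {}): Components {
  return { metrics: new MockMetrics(), config: new MockConfig(), ...overrides };
}

// Service code sees only the Components contract, real or mocked.
function handleRequest(platform: Components): string {
  platform.metrics.increment("requests");
  return platform.config.get("greeting") ?? "hello";
}

const metrics = new MockMetrics();
const greeting = handleRequest(
  createTestPlatform({ metrics, config: new MockConfig({ greeting: "hi" }) }),
);
console.log(greeting, metrics.counts.get("requests")); // hi 1
```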
slide-66
SLIDE 66
slide-67
SLIDE 67

Deployment

slide-68
SLIDE 68

“Production is war!”

slide-69
SLIDE 69

Experience Differences

slide-70
SLIDE 70

Deploy and Manage Services

  • Pre-configured pipelines for deployment and rollback
  • Single command deploy to any stack
  • Integration for automated canary analysis
  • Pre-configured autoscaling
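Automated canary analysis compares a small canary cluster running the new build against a baseline cluster running the current one. The sketch is a deliberately simplified threshold check over a hypothetical `ClusterStats` snapshot. Production canary analysis typically compares many metrics with statistical tests rather than fixed tolerances, and the tolerances below are only illustrative.

```typescript
// Hypothetical metrics snapshot for one cluster over the analysis window.
interface ClusterStats {
  requests: number;
  errors: number;
  p99LatencyMs: number;
}

interface CanaryVerdict {
  pass: boolean;
  reasons: string[];
}

function errorRate(stats: ClusterStats): number {
  return stats.requests === 0 ? 0 : stats.errors / stats.requests;
}

// Compares a canary against a baseline on the current build. The default
// tolerances are illustrative, not recommendations.
function judgeCanary(
  baseline: ClusterStats,
  canary: ClusterStats,
  maxErrorRateIncrease = 0.01, // absolute increase, i.e. 1 percentage point
  maxLatencyRatio = 1.2, // canary p99 may be at most 20% above baseline
): CanaryVerdict {
  const reasons: string[] = [];
  if (canary.requests === 0) reasons.push("canary served no traffic");
  if (errorRate(canary) - errorRate(baseline) > maxErrorRateIncrease) {
    reasons.push("error rate regressed");
  }
  if (canary.p99LatencyMs > baseline.p99LatencyMs * maxLatencyRatio) {
    reasons.push("p99 latency regressed");
  }
  return { pass: reasons.length === 0, reasons };
}

const baseline: ClusterStats = { requests: 1000, errors: 5, p99LatencyMs: 100 };
const canary: ClusterStats = { requests: 1000, errors: 40, p99LatencyMs: 150 };
const verdict = judgeCanary(baseline, canary);
console.log(verdict.pass, verdict.reasons.join("; "));
// false error rate regressed; p99 latency regressed
```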
slide-71
SLIDE 71

Operations

slide-72
SLIDE 72

Consolidated View

slide-73
SLIDE 73

RPS, Latency, Error rates, CPU, Memory

Generated Dashboards & Alerts
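Because every service emits the same standard metrics, the platform can generate the same alert set for each service from a single template. This sketch assumes a hypothetical `AlertRule` format, metric naming scheme, and default thresholds. A real system would translate the rules into its monitoring backend's own format.

```typescript
// Hypothetical alert rule format.
interface AlertRule {
  name: string;
  metric: string;
  comparison: ">" | "<";
  threshold: number;
}

interface AlertThresholds {
  minRps: number;
  maxP99LatencyMs: number;
  maxErrorRate: number;
  maxCpuPercent: number;
  maxMemoryPercent: number;
}

// Illustrative platform defaults, not recommendations.
const DEFAULT_THRESHOLDS: AlertThresholds = {
  minRps: 1,
  maxP99LatencyMs: 500,
  maxErrorRate: 0.01,
  maxCpuPercent: 80,
  maxMemoryPercent: 85,
};

// Same five alerts for every service, covering RPS, latency, error rate,
// CPU, and memory; a service may tune thresholds but not drop alerts.
function generateAlerts(service: string, overrides: Partial<AlertThresholds> = {}): AlertRule[] {
  const t: AlertThresholds = { ...DEFAULT_THRESHOLDS, ...overrides };
  const rule = (suffix: string, metric: string, comparison: ">" | "<", threshold: number): AlertRule => ({
    name: `${service}-${suffix}`,
    metric: `${service}.${metric}`, // hypothetical metric naming scheme
    comparison,
    threshold,
  });
  return [
    rule("low-rps", "rps", "<", t.minRps),
    rule("high-latency", "latency.p99_ms", ">", t.maxP99LatencyMs),
    rule("high-error-rate", "error_rate", ">", t.maxErrorRate),
    rule("high-cpu", "cpu.percent", ">", t.maxCpuPercent),
    rule("high-memory", "memory.percent", ">", t.maxMemoryPercent),
  ];
}

const alerts = generateAlerts("example-service", { maxP99LatencyMs: 250 });
console.log(alerts.map((a) => `${a.name} ${a.comparison} ${a.threshold}`).join("\n"));
```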

slide-74
SLIDE 74

CPU profiling, Core dump analysis

Automated Analytics & Tooling

slide-75
SLIDE 75

Our PaaS to Velocity & Reliability

  • 1. Standardized components
  • 2. Preassembled platform
  • 3. Automation and tooling