How Netflix directs 1/3rd of (Haley Tucker, QCon San Francisco)

SLIDE 1

How Netflix directs 1/3rd of

Haley Tucker, Mohit Vora

QCon San Francisco, Nov 16, 2015

SLIDE 2
SLIDE 3
SLIDE 4

Playback Overview

SLIDE 5

[Diagram: the NETFLIX DEVICE talks to the CONTROL PLANE and receives the STREAM from the DATA PLANE (CDN)]

SLIDE 6
SLIDE 7

Project 366 #59; 280212 Days Gone By..., CC BY-SA, Pete 2012, Flickr

SLIDE 8

STREAMS: AUDIO, VIDEO, TEXT

SLIDE 9

How do we build a streaming “tape”?

SLIDE 10

Determine the preferred experience from:

  • CUSTOMER
  • DEVICE
  • TITLE
  • COUNTRY
  • NETWORK (Broadband: wired or Wi-Fi; Cellular: Edge, 3G, LTE, ...)
  • CONNECTIONS

SLIDE 11

That’s exactly what I want... now where can I get it?

SLIDE 12

Point the device to appropriate locations

Steering

SLIDE 13

GENERATE PLAYBACK MANIFEST → PLAYBACK MANIFEST

SLIDE 14

Uh-oh, the content is encrypted!

Keymaster, CC BY-SA, Sean McGrath 2007, Flickr

SLIDE 15

[Diagram: the device requests a LICENSE and receives a LICENSE]

SLIDE 16

And...Action!

SLIDE 17

SESSION (START, STOP, PAUSE, RESUME, KEEPALIVE)

SESSION EVENTS
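A minimal sketch of how a server might track one playback session from the events listed above. It assumes, hypothetically, that KEEPALIVE only refreshes liveness and that an active session with no event for 120 seconds counts as abandoned; the class, state names and timeout are illustrative, not Netflix's actual session protocol.

```python
from enum import Enum


class State(Enum):
    PLAYING = "playing"
    PAUSED = "paused"
    STOPPED = "stopped"


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (None, "START"): State.PLAYING,
    (State.PLAYING, "PAUSE"): State.PAUSED,
    (State.PAUSED, "RESUME"): State.PLAYING,
    (State.PLAYING, "STOP"): State.STOPPED,
    (State.PAUSED, "STOP"): State.STOPPED,
}


class PlaybackSession:
    """Illustrative server-side view of one playback session (hypothetical)."""

    def __init__(self, keepalive_timeout_s: float = 120.0):  # assumed timeout
        self.state = None
        self.last_seen = None
        self.keepalive_timeout_s = keepalive_timeout_s

    def on_event(self, event: str, now: float) -> State:
        if event == "KEEPALIVE":
            # KEEPALIVE only proves the device is still there; it never changes state.
            if self.state not in (State.PLAYING, State.PAUSED):
                raise ValueError("KEEPALIVE for an inactive session")
        else:
            key = (self.state, event)
            if key not in TRANSITIONS:
                raise ValueError(f"{event} is not valid in state {self.state}")
            self.state = TRANSITIONS[key]
        self.last_seen = now
        return self.state

    def is_expired(self, now: float) -> bool:
        # An active session that has been silent too long is treated as abandoned.
        return (self.state in (State.PLAYING, State.PAUSED)
                and now - self.last_seen > self.keepalive_timeout_s)
```

Keeping KEEPALIVE separate from the state transitions lets the server tell "paused for a while" apart from "device disappeared".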

SLIDE 18

PLAYBACK LIFECYCLE

  • GENERATE PLAYBACK MANIFEST → PLAYBACK MANIFEST
  • LICENSE
  • SESSION (START, STOP, PAUSE, RESUME, KEEPALIVE)

SLIDE 19
SLIDE 20

Data Plane (CDN)

SLIDE 21

What is a Content Delivery Network?

SLIDE 22

Open Connect

A NETFLIX ORIGINAL

SLIDE 23

[Chart: BYTES STREAMED vs. CONTENT RANK]

PREDICTABLE VIEWING PATTERNS

SLIDE 24

FILLING WHEN YOU SLEEP

Dreaming…, CC BY-SA, Eleni Boulsaiki 2009, Flickr

SLIDE 25

FILLING WHEN YOU SLEEP

Open Connect

A NETFLIX ORIGINAL

READ XOR WRITE

ONE WAY, CC BY-SA, Kenny Louie 2010, Flickr

SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29
SLIDE 30
SLIDE 31

Content Delivery Mechanisms

SLIDE 32

[Diagram: the NETFLIX DEVICE talks to the CONTROL PLANE and receives the STREAM from the DATA PLANE (CDN)]

SLIDE 33

[Diagram: STREAM served from the ISP DATA CENTER through the ISP ROUTER to the NETFLIX DEVICE]

SLIDE 34

[Diagram: STREAM served from the ISP DATA CENTER through the ISP ROUTER to the NETFLIX DEVICE]

ISP CO-LOCATION

SLIDE 35

[Diagram: STREAM served from the ISP DATA CENTER through the ISP ROUTER to the NETFLIX DEVICE]

SLIDE 36

[Diagram: STREAM served by NETFLIX servers in the IXP DATA CENTER (NFLX ROUTER) through ISP ROUTERs and the ISP DATA CENTER to the NETFLIX DEVICE]

SLIDE 37

[Diagram: STREAM served by NETFLIX servers in the IXP DATA CENTER (NFLX ROUTER) through ISP ROUTERs and the ISP DATA CENTER to the NETFLIX DEVICE]

SLIDE 38

[Diagram: STREAM served by NETFLIX servers in the IXP DATA CENTER (NFLX ROUTER) through ISP ROUTERs and the ISP DATA CENTER to the NETFLIX DEVICE]

IXP INTERCONNECTION

SLIDE 39

Control Plane

SLIDE 40

[Diagram: OPEN CONNECT, with the NETFLIX DEVICE, the DEVICE CONTROL PLANE, the CDN CONTROL PLANE, and the STREAM to the device]

DON’T KEEP SECRETS

  • Network Proximity
  • Content Positioning
  • Load Distribution

SLIDE 41

Network Proximity

Social Network in a Course, CC BY-SA, Hans Põldoja 2010, Flickr

SLIDE 42

By Specification?

SLIDE 43

By Specification?

Doesn’t scale

SLIDE 44

Border Gateway Protocol

BGP ROUTE 175.231.128.0/24 (+ proximity attributes)

TAKEAWAY

Use BGP
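A minimal sketch of BGP-based proximity, assuming each cluster's learned routes are available as (prefix, AS-path length) pairs and using AS-path length as a stand-in for the slide's unspecified "proximity attributes". Only 175.231.128.0/24 comes from the slide; the other prefixes and the cluster names are hypothetical. A client is matched to each cluster's most specific covering prefix (longest prefix match), and clusters are ordered by prefix length, then AS-path length.

```python
import ipaddress

# Hypothetical routes per cluster: (prefix, AS-path length) as learned over BGP.
ROUTES = {
    "isp1-embedded": [("175.231.128.0/24", 1)],
    "ixp-cluster": [("175.231.0.0/16", 2), ("10.0.0.0/8", 3)],
}


def rank_clusters(client_ip, routes=ROUTES):
    """Return clusters that can reach client_ip, most specific / shortest route first."""
    ip = ipaddress.ip_address(client_ip)
    candidates = []
    for cluster, prefixes in routes.items():
        best = None
        for prefix, as_path_len in prefixes:
            net = ipaddress.ip_network(prefix)
            if ip in net:
                # Longer prefix (more specific) wins; ties go to the shorter AS path.
                key = (-net.prefixlen, as_path_len)
                if best is None or key < best:
                    best = key
        if best is not None:
            candidates.append((best, cluster))
    return [cluster for _, cluster in sorted(candidates)]
```

The appeal of this approach is that ISPs already announce these routes, so no one has to hand-maintain a client-to-cluster mapping.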

SLIDE 45

[Diagram: the ISP1 DATA CENTER (with ISP1 NFLX servers) and the ISP2 DATA CENTER send ISP1 BGP ROUTES and ISP2 BGP ROUTES to the CONTROL PLANE in the IXP DATA CENTER; example: BGP ROUTE 175.231.128.0/24 (+ proximity attributes)]

SLIDE 46

Content Positioning

SLIDE 47

LOCALIZE TRAFFIC

[Diagram: ISP DATA CENTER, SERVE vs. CACHE MISS]

SLIDE 48

HOW DO WE DETERMINE WHAT CONTENT WILL BE POPULAR TOMORROW?

SLIDE 49

CHANGING CATALOG

SLIDE 50

EVOLVING MEMBER TASTES

SLIDE 51

MINIMIZE FILL CHURN

[Diagram: OFF PEAK FILL into the ISP DATA CENTER]

SLIDE 52

USE HISTORICAL DATA

[Chart: BYTES STREAMED vs. CONTENT RANK]

bytesStreamed/bytesStored

SLIDE 53

IS ONE DAY OF HISTORY ENOUGH?

SLIDE 54

EXPONENTIALLY WEIGHTED MOVING AVERAGE

[Chart: WEIGHT vs. DAYS AGO (10 to 40), α = 0.9]

TAKEAWAY Weigh Recent Data Higher
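A minimal sketch combining the two ideas above: a title's daily popularity is bytesStreamed/bytesStored, and daily values are blended with exponentially decaying weights. We read α = 0.9 as the per-day decay factor (weight proportional to 0.9^k for data k days old), which fits a weight curve that stretches over weeks; that reading, the function names and the sample data are assumptions.

```python
def daily_ratio(bytes_streamed, bytes_stored):
    """One day's popularity: bytes served per byte of storage the title occupies."""
    return bytes_streamed / bytes_stored


def ewma_score(ratios, alpha=0.9):
    """Weighted average where ratios[k] is from k days ago and gets weight alpha**k.

    Treating alpha as a per-day decay factor is an assumed reading of the slide.
    """
    weights = [alpha ** k for k in range(len(ratios))]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def rank_titles(history, alpha=0.9):
    """history: title -> daily ratios, today first. Returns most popular first."""
    scores = {title: ewma_score(r, alpha) for title, r in history.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A title that was hot three days ago but quiet since ranks below one that just became hot, which is the "weigh recent data higher" takeaway.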

SLIDE 55

HOW SHOULD CONTENT BE ALLOCATED?

SLIDE 56

MILLIONS OF FILES, THOUSANDS OF SERVERS

HOW SHOULD CONTENT BE ALLOCATED?

SLIDE 57

[Diagram: files (FILE1, FILE3) allocated across servers SVR1, SVR2, SVR3 and SVR4, with FILE1 on more than one server]

TAKEAWAY

  • ALLOCATE MULTIPLE REPLICAS
  • RESILIENT TO CLUSTER CHANGES
  • REPEATABLE

Consistent Hashing
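A minimal consistent-hashing sketch with replicas, assuming MD5-derived ring positions and 100 virtual nodes per server (illustrative values, not Open Connect's configuration). A file is placed on the first N distinct servers found clockwise from its hash, which gives the three properties listed: multiple replicas, only local reshuffling when a server joins or leaves, and the same answer every time.

```python
import bisect
import hashlib


def ring_position(key):
    # MD5 gives positions that are stable across processes, unlike built-in hash().
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):  # virtual nodes per server: illustrative
        self.vnodes = vnodes
        self.ring = []  # sorted (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_position(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [(pos, s) for pos, s in self.ring if s != server]

    def replicas(self, file_id, n):
        """First n distinct servers clockwise from the file's ring position."""
        n = min(n, len({s for _, s in self.ring}))
        chosen = []
        if n <= 0:
            return chosen
        start = bisect.bisect(self.ring, (ring_position(file_id),))
        for i in range(len(self.ring)):
            _, server = self.ring[(start + i) % len(self.ring)]
            if server not in chosen:
                chosen.append(server)
                if len(chosen) == n:
                    break
        return chosen
```

Removing a server that does not hold a file leaves that file's replicas unchanged, which is what keeps refills small when clusters change.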

SLIDE 58

[Diagram: the CONTROL PLANE in the IXP DATA CENTER decides WHAT TO FILL? and WHERE TO FILL FROM? for servers in the ISP1 and ISP2 DATA CENTERs; sources include S3; servers FILL OVER HTTP]
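A minimal sketch of the two questions on the slide, assuming the control plane already knows each server's desired file set, which servers hold each file, and which data center each server is in. The preference order (a peer in the same data center, then any peer, then S3 as the origin) and all names are hypothetical; the transfer itself would then be a fill over HTTP from the chosen source.

```python
def plan_fill(server, desired, present, holders, location):
    """Return (file_id, source) pairs this server should fetch (hypothetical helper).

    desired:  set of file ids the server should hold
    present:  set of file ids it already holds
    holders:  dict file_id -> set of servers that already hold it
    location: dict server -> data center name
    """
    plan = []
    for file_id in sorted(desired - present):              # WHAT TO FILL?
        peers = sorted(holders.get(file_id, set()) - {server})
        nearby = [p for p in peers if location.get(p) == location[server]]
        if nearby:
            source = nearby[0]                              # same data center first
        elif peers:
            source = peers[0]                               # then any peer
        else:
            source = "s3"                                   # origin as last resort
        plan.append((file_id, source))                      # WHERE TO FILL FROM?
    return plan
```

Preferring nearby peers keeps fill traffic local, which matters when thousands of servers refill at the same off-peak time.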

SLIDE 59

Load Distribution

SLIDE 60

[Chart: BYTES STREAMED vs. CONTENT RANK; popular titles need LOTS OF THROUGHPUT, the long tail needs LOTS OF STORAGE]

CONTENT WITH CONFLICTING CONSTRAINTS

SLIDE 61

[Chart: BYTES STREAMED vs. CONTENT RANK, split into an SSD tier and a SPINNING DISK tier]

  • WITHIN CLUSTERS: SSD BASED and SPINNING DISK BASED servers
  • ON EACH SERVER: MEMORY

TAKEAWAY Tier Infrastructure
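A minimal sketch of tiering by popularity, assuming titles arrive ranked most popular first and each tier has a fixed byte capacity; the greedy "fastest tier with room" rule, the tier names and the sizes are illustrative assumptions.

```python
def assign_tiers(ranked_titles, sizes, tiers):
    """Greedily place popular titles on the fastest tier that still has room.

    ranked_titles: title ids, most popular first
    sizes: dict title -> bytes
    tiers: list of (tier_name, capacity_bytes), fastest first (hypothetical names)
    Returns dict title -> tier_name; titles that fit nowhere are omitted.
    """
    remaining = {name: capacity for name, capacity in tiers}
    placement = {}
    for title in ranked_titles:
        for name, _ in tiers:
            if sizes[title] <= remaining[name]:
                remaining[name] -= sizes[title]
                placement[title] = name
                break
    return placement
```

Greedy packing can let a small, less popular title take SSD space that a larger, more popular one could not fit into; a production allocator would have to weigh that trade-off.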

SLIDE 62

HOW DO WE BALANCE LOAD?

  • BALANCE ACROSS SERVERS WITHIN CLUSTERS
  • BALANCE ACROSS EQUIDISTANT CLUSTERS

SLIDE 63

[Diagram: OPEN CONNECT, with the NETFLIX DEVICE, the DEVICE CONTROL PLANE, the CDN CONTROL PLANE, a LOAD BALANCER, and the STREAM to the device]

SLIDE 64

HOW DO WE BALANCE LOAD?

USING CONTENT DISTRIBUTION

SLIDE 65

AND WHEN WE HAVE EQUALLY ATTRACTIVE LOCATIONS TO SERVE FROM: FLIP A COIN
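A minimal sketch of the coin flip, assuming each candidate location carries a comparable score where lower is more attractive (for example, a proximity rank). Ties at the best score are broken uniformly at random so equally good locations share the load; the function name and score scale are hypothetical.

```python
import random


def pick_location(scored, rng=random):
    """scored: list of (score, location); lower score = more attractive (hypothetical)."""
    if not scored:
        raise ValueError("no candidate locations")
    best = min(score for score, _ in scored)
    tied = [location for score, location in scored if score == best]
    return rng.choice(tied)  # flip a coin among equally attractive locations
```

Because each request flips independently, traffic splits roughly evenly across tied locations without any shared counter.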

SLIDE 66

[Chart: SYSTEM METRICS vs. LOAD, with a MAX threshold separating SANE from INSANE and an INCIDENT zone]

HOW DO WE LOAD SERVERS OPTIMALLY?

SLIDE 67

… AMIDST EVER CHANGING INTERNET WEATHER

SLIDE 68

[Chart: TRAFFIC over time t]

… AND DAILY TRAFFIC EBBS AND FLOWS

SLIDE 69

[Diagram: CONTROL sets the TRAFFIC; servers SERVE STREAMS; the EFFECT ON SYSTEM METRICS is fed back (+/−) to CONTROL]

WE INTRODUCE A FEEDBACK LOOP

SLIDE 70

TAKEAWAY PID CONTROLLER

SLIDE 71

TAKEAWAY PID CONTROLLER

DC MOTOR

  • Process Variable: Current RPM
  • Set Point: Desired RPM
  • Control Variable: Input Voltage

SLIDE 72

TAKEAWAY PID CONTROLLER

LOADING SERVERS

  • Process Variable: System Metrics (DC motor: Current RPM)
  • Set Point: System Metrics Max (DC motor: Desired RPM)
  • Control Variable: Controlled Traffic (DC motor: Input Voltage)

SLIDE 73

[Diagram: the CONTROL PLANE in the IXP DATA CENTER, with servers in the ISP1 DATA CENTER and the ISP2 DATA CENTER; labels: CONTROL TO 80% and NO CONTROL]

0.0 < CONTROL VAR < 1.0

SLIDE 74

[Chart: TRAFFIC over time t, with the excess moving to the NEXT HOP]

TRAFFIC SHIFTS TO NEXT HOP LOCATION
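A minimal discrete PID sketch of this loop, assuming the process variable is a utilization figure standing in for "System Metrics", the set point is the slide's 80%, and the control variable is the fraction of eligible traffic a location accepts, clamped to the slide's 0.0 to 1.0 range, with the remainder shifting to the next hop. The gains, the toy plant (utilization = accepted fraction × offered load) and all names are assumptions for illustration, not Netflix's tuning.

```python
class PIDController:
    """Discrete PID controller with output clamping and simple anti-windup."""

    def __init__(self, kp, ki, kd, set_point, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.set_point = set_point
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured, dt=1.0):
        error = self.set_point - measured  # positive when below target
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        output = self.kp * error + self.ki * candidate + self.kd * derivative
        if self.out_min < output < self.out_max:
            self.integral = candidate  # only integrate while the output is unclamped
        return min(self.out_max, max(self.out_min, output))


def simulate(offered_load, steps=50, set_point=0.8):
    """Toy plant (hypothetical): utilization = accepted fraction x offered load."""
    pid = PIDController(kp=0.1, ki=0.4, kd=0.0, set_point=set_point)  # assumed gains
    accepted = 1.0  # start by accepting all eligible traffic
    for _ in range(steps):
        utilization = accepted * offered_load
        accepted = pid.update(utilization)
    # Whatever this location does not accept shifts to the next-hop location.
    return accepted * offered_load, 1.0 - accepted
```

With offered load at 150% of capacity the loop settles at accepting about 53% of traffic (0.8 / 1.5) and shifting the rest; at 50% it saturates at accepting everything. Integrating only while unclamped is one simple anti-windup choice among several.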

SLIDE 75

Steering

SLIDE 76

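[Diagram: NETFLIX DEVICE → PLAYBACK SERVICES (STEERING) → CDN CONTROL PLANE (PROXIMITY, HEALTH, CONTENT; backed by CASS and KAFKA); the device receives the STREAM from OPEN CONNECT]

STEERING: “Got URLs for f1, f2, …, fn?”
CDN CONTROL PLANE: “Yes, here are the URLs.”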
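A minimal sketch of the steering exchange, assuming the control plane holds per-server health, content and proximity data: for each requested file, keep healthy servers that have the file, order them by proximity, and return URLs. The data shapes, the URL format, the example.net host names and the three-URL limit are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    host: str
    proximity_rank: int  # lower = closer (e.g. derived from BGP routes)
    healthy: bool = True
    files: set = field(default_factory=set)


def steer(file_ids, servers, max_urls=3):
    """Answer 'Got URLs for f1, ..., fn?' with ranked URLs per file (hypothetical)."""
    answer = {}
    for file_id in file_ids:
        # HEALTH and CONTENT: only healthy servers that actually hold the file.
        candidates = [s for s in servers if s.healthy and file_id in s.files]
        # PROXIMITY: closest first; host name breaks ties deterministically.
        candidates.sort(key=lambda s: (s.proximity_rank, s.host))
        answer[file_id] = [f"https://{s.host}/{file_id}" for s in candidates[:max_urls]]
    return answer
```

Returning several URLs per file gives the device fallbacks if its first choice degrades mid-stream.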

SLIDE 77

Architecture Evolution

5 CHALLENGES

SLIDE 78

[Diagram: the API in front of the STEERING, SESSION, MANIFEST, and DRM LICENSE services]

How did we evolve from here...

SLIDE 79

[Diagram: API → CLIENT SCRIPTS → SERVICE LAYER (STEERING, SESSION, MANIFEST, DRM LICENSE), plus RULES, INSIGHTS, and CACHE]

...to here.

5 SOLUTIONS

SLIDE 80

  • DEVICE
  • CUSTOMER
  • TITLE
  • NETWORK (Broadband: wired or Wi-Fi; Cellular: Edge, 3G, LTE, ...)
  • CONNECTIONS
  • COUNTRY

High dimensionality

CHALLENGE

SLIDE 81
SLIDE 82

How can we quickly alter the playback experience in a targeted manner?

SLIDE 83

USE CASE: Stream Filtering

ALL STREAMS FOR CONTENT → ENGINE (+ RULES) → BEST STREAMS FOR SESSION
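A minimal sketch of the filtering step, assuming a rule is a targeting predicate over the session plus a filter over streams, and a stream survives only if every rule that targets the session allows it. The session fields, the two example rules and their thresholds are hypothetical; they are not the slide's example rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stream:
    bitrate_kbps: int
    height: int  # vertical resolution, e.g. 1080


@dataclass(frozen=True)
class Session:
    device: str
    network: str
    country: str


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[Session], bool]          # targeting: which sessions
    allows: Callable[[Session, Stream], bool]   # filtering: which streams


def filter_streams(session, streams, rules):
    """Return the streams that every rule targeting this session allows."""
    active = [r for r in rules if r.applies(session)]
    return [s for s in streams if all(r.allows(session, s) for r in active)]


# Hypothetical example rules.
RULES = [
    Rule("cap-cellular-bitrate",
         applies=lambda sess: sess.network == "cellular",
         allows=lambda sess, st: st.bitrate_kbps <= 1500),
    Rule("legacy-device-sd-only",
         applies=lambda sess: sess.device == "legacy-tv",
         allows=lambda sess, st: st.height <= 480),
]
```

Splitting "which sessions" from "which streams" is what makes targeted changes cheap: a new rule narrows one segment without touching the others.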

SLIDE 84

EXAMPLE RULES

SLIDE 85

UPDATING RULES

[Diagram: the CONFIGURATION MANAGEMENT UI PUBLISHes RULES to a TOPIC; each ENGINE SUBSCRIBEs to the TOPIC]
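A minimal in-process sketch of this flow, assuming the configuration UI publishes a complete, versioned rule set and each engine swaps in the newest version it sees while ignoring stale deliveries. `Topic`, `RuleEngine` and the version scheme are hypothetical stand-ins; a real deployment would put a durable message bus where the toy `Topic` is.

```python
import threading


class Topic:
    """Toy topic (hypothetical): delivers each message to every subscriber callback."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        for callback in self._subscribers:
            callback(message)


class RuleEngine:
    """Holds the current rule set and replaces it wholesale on newer versions."""

    def __init__(self, topic):
        self._lock = threading.Lock()
        self.version = 0
        self.rules = ()
        topic.subscribe(self._on_rules)

    def _on_rules(self, message):
        version, rules = message
        with self._lock:
            # Ignore stale or duplicate deliveries; swap the whole set at once.
            if version > self.version:
                self.version, self.rules = version, tuple(rules)

    def current_rules(self):
        with self._lock:
            return self.version, self.rules
```

Publishing whole versioned sets rather than individual edits means an engine never evaluates a half-applied update.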

SLIDE 86
SLIDE 87
SLIDE 88

Dynamic Business Rules

[Diagram: the API with the STEERING, SESSION, MANIFEST, and DRM LICENSE services, now with RULES]

TAKEAWAY

SLIDE 89

Pinpoint what is broken

CHALLENGE

Haystacks, CC BY-SA, John Pavelka 2008, Flickr

SLIDE 90

3:00 AM: Pager goes off

SLIDE 91

METRICS AND ALERTING

SLIDE 92

OK... error code 105 is elevated. But why?

SLIDE 93

Indexed Logging
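A minimal sketch of the question that indexed, dimension-tagged playback logs make cheap to answer: among sessions that hit error code 105, which value of a dimension (device, ISP, country, ...) is most over-represented relative to all sessions? The record fields, the "lift" measure and the sample data are assumptions.

```python
from collections import Counter


def top_suspects(records, error_code, dimension, limit=3):
    """Rank values of `dimension` by over-representation among failing sessions.

    records: dicts such as {"device": ..., "error_code": ...} (hypothetical schema)
    Returns (value, lift) pairs, lift = share among errors / share overall.
    """
    records = list(records)
    overall = Counter(r[dimension] for r in records)
    failing = Counter(r[dimension] for r in records if r.get("error_code") == error_code)
    total_failing = sum(failing.values())
    if total_failing == 0:
        return []
    lifts = {
        value: (count / total_failing) / (overall[value] / len(records))
        for value, count in failing.items()
    }
    return sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```

In a real system the logs would be indexed by error code and dimension so this runs as a query rather than a full scan.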

SLIDE 94

Detailed Domain Insights

[Diagram: the API with the STEERING, SESSION, MANIFEST, and DRM LICENSE services, RULES, and INSIGHTS]

TAKEAWAY

SLIDE 95

Large amount of state

CHALLENGE

SLIDE 96

How can we enable faster UIs and low-end devices?

SLIDE 97

We introduced a server-side caching tier

[Diagram: a server-side cache of MANIFESTS for CUSTOMER A, CUSTOMER A, and CUSTOMER B]
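A minimal sketch of a server-side manifest cache, assuming each generated manifest is stored under an opaque key with a time-to-live and the device keeps only that key instead of the whole manifest. The class name, key format and TTL are hypothetical. In line with the next slide's warning about resiliency, a miss (expired entry or cache outage) should fall back to regenerating the manifest rather than failing playback.

```python
import time
import uuid


class ManifestCache:
    """Hypothetical cache: opaque key -> (expiry, owning customer, manifest)."""

    def __init__(self, ttl_s=3600.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries = {}

    def put(self, customer_id, manifest):
        key = uuid.uuid4().hex  # opaque reference handed to the device
        self._entries[key] = (self.clock() + self.ttl_s, customer_id, manifest)
        return key

    def get(self, key, customer_id):
        entry = self._entries.get(key)
        if entry is None:
            return None  # unknown key: caller regenerates
        expires_at, owner, manifest = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: caller regenerates
            return None
        if owner != customer_id:
            return None  # a key is only valid for the customer it was issued to
        return manifest
```

The device now holds a short key rather than a large manifest, which is the point for low-end devices.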

SLIDE 98

Watch out for resiliency issues!!

Ping Pong project, CC BY-SA, Michael Knowles 2008, Flickr

SLIDE 99

Reduce client state

[Diagram: the API with the STEERING, SESSION, MANIFEST, and DRM LICENSE services, RULES, INSIGHTS, and CACHE]

TAKEAWAY

SLIDE 100

Managing device protocols

CHALLENGE

Square peg, round hole, CC BY-SA, Simon Law 2006, Flickr

SLIDE 101

Can we allow devices to define their own protocols?
SLIDE 102

DYNAMIC SCRIPTING PLATFORM

[Diagram: XBOX, iPHONE, and HTML5 PLAYER clients call the API, which runs per-device scripts (xbox.groovy, iphone.groovy, html5.groovy) on the JAVA SERVICE LAYER (SESSION, LICENSE, MANIFEST)]
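A minimal sketch of client-driven protocols, with Python functions standing in for the per-device Groovy scripts: each device type registers an adapter that parses its own request format, calls the shared service layer, and shapes the reply its own way. The `ServiceLayer.get_manifest` method, the request formats and the bitrate ladder are hypothetical.

```python
import json


class ServiceLayer:
    """Hypothetical stand-in for the shared service layer."""
    BITRATES = (235, 1050, 3000, 5800)  # illustrative ladder, kbps

    def get_manifest(self, title_id, max_bitrate):
        return {"title": title_id,
                "bitrates": [b for b in self.BITRATES if b <= max_bitrate]}


ADAPTERS = {}  # device type -> script


def adapter(device_type):
    """Register a device-specific protocol script."""
    def register(script):
        ADAPTERS[device_type] = script
        return script
    return register


@adapter("xbox")
def xbox_script(raw_request, services):
    req = json.loads(raw_request)  # JSON in, JSON out
    return json.dumps(services.get_manifest(req["titleId"], req["maxBitrate"]))


@adapter("html5")
def html5_script(raw_request, services):
    params = dict(pair.split("=", 1) for pair in raw_request.split("&"))  # query-string style
    manifest = services.get_manifest(params["title"], int(params["cap"]))
    return ",".join(str(b) for b in manifest["bitrates"])  # compact text reply


def handle(device_type, raw_request, services=None):
    services = services or ServiceLayer()
    return ADAPTERS[device_type](raw_request, services)
```

Device teams can change their wire format by editing their own script, while the service layer keeps a single stable interface.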

SLIDE 103

Client-driven protocols

[Diagram: API, CLIENT SCRIPTS, and SERVICE LAYER (STEERING, SESSION, MANIFEST, DRM LICENSE), with RULES, INSIGHTS, and CACHE]

TAKEAWAY

SLIDE 104

Enabling high-velocity innovation

CHALLENGE

CC BY-SA, Nathan E Photography 2008, Flickr

SLIDE 105

How can we expose new data with the least amount of churn?

SLIDE 106

MANIFEST: Stream

  • Bitrate
  • Framerate
  • Dynamic Data

API: Stream’

  • Bitrate
  • Dynamic Data

This works from API:

  • stream.getBitrate()
  • stream.getDynamicData().get("FRAME_RATE")

Works both ways!

SLIDE 107

MANIFEST: Stream (Bitrate, Framerate, Dynamic Data)
API: Stream’ (Bitrate, Dynamic Data)
CLIENT SCRIPT: Stream’’ (Dynamic Data)

This works from CLIENT SCRIPT!

  • stream.getDynamicData().get("BIT_RATE")
  • stream.getDynamicData().get("FRAME_RATE")

Works both ways!
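A minimal sketch of the pass-through pattern, assuming each layer's Stream type has a few typed fields plus a dynamic-data map, and that a field the next layer has no slot for is copied into dynamic data under an upper-case key (FRAME_RATE, BIT_RATE, as on the slides) rather than dropped. The class and function names are hypothetical, and snake_case replaces the slides' Java-style getters.

```python
from dataclasses import dataclass, field


@dataclass
class ManifestStream:  # "Stream": Bitrate, Framerate, Dynamic Data
    bitrate: int
    frame_rate: float
    dynamic_data: dict = field(default_factory=dict)


@dataclass
class ApiStream:  # "Stream'": Bitrate, Dynamic Data
    bitrate: int
    dynamic_data: dict = field(default_factory=dict)


@dataclass
class ScriptStream:  # "Stream''": Dynamic Data only
    dynamic_data: dict = field(default_factory=dict)


def to_api(s: ManifestStream) -> ApiStream:
    # The API type has no frame-rate field, so the value rides along in dynamic data.
    return ApiStream(s.bitrate, {**s.dynamic_data, "FRAME_RATE": s.frame_rate})


def to_script(s: ApiStream) -> ScriptStream:
    return ScriptStream({**s.dynamic_data, "BIT_RATE": s.bitrate})


def from_script(s: ScriptStream) -> ApiStream:
    # "Works both ways": promote BIT_RATE back to a typed field, keep everything else.
    data = dict(s.dynamic_data)
    return ApiStream(data.pop("BIT_RATE"), data)
```

A new field added in the manifest layer reaches client scripts without any change to the API's types, which is the low-churn property the slide asks for.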

SLIDE 108

Data pass-thru

[Diagram: API, CLIENT SCRIPTS, and SERVICE LAYER (STEERING, SESSION, MANIFEST, DRM LICENSE), with RULES, INSIGHTS, and CACHE]

TAKEAWAY

SLIDE 109

TAKEAWAYS

  • BGP-based proximity
  • Tiered Infrastructure
  • PID Controller
  • EWMA for historical data
  • Consistent Hashing
  • Dynamic business rules
  • Detailed domain insights
  • Reduce client state
  • Client-driven protocols
  • Data pass-thru
SLIDE 110


Questions?

Haley Tucker (@hwilson1204), Mohit Vora (@mohitvora)

SLIDE 111

[Summary diagram: NETFLIX DEVICEs receive STREAMs from SSD SERVERS and SPINNING DISK SERVERS in ISP1, ISP2, and the IXP DATA CENTER; ISP1 BGP ROUTES and ISP2 BGP ROUTES; WHAT TO FILL? and WHERE TO FILL FROM?; CONTROL TO 80%; API, CLIENT SCRIPTS, and SERVICE LAYER (STEERING, SESSION, MANIFEST, DRM LICENSE) with RULES, CACHE, and INSIGHTS; DON’T KEEP SECRETS]

SLIDE 112
Image Attributions

  • Background image from https://www.flickr.com/photos/centralasian/4099515384, Image was cropped and red lines and dots were drawn on top, https://creativecommons.org/licenses/by/2.0/.
  • Image from https://www.flickr.com/photos/28705377@N04/4142872268, No modifications made, https://creativecommons.org/licenses/by/2.0/.
  • Image of cassette is from https://www.flickr.com/photos/comedynose/6939206771, Image was cropped, https://creativecommons.org/licenses/by/2.0/.
  • Image of speaker is from https://www.flickr.com/photos/av_hire_london/5578975575, No changes made, https://creativecommons.org/licenses/by/2.0/.
  • Image of television is from https://www.flickr.com/photos/jvcamerica/3660897684/, No changes made, https://creativecommons.org/licenses/by/2.0/.
  • Image of text is from https://www.flickr.com/photos/dno1967b/5754743006, No changes made, https://creativecommons.org/licenses/by/2.0/.
  • Background image from https://www.flickr.com/photos/mcgraths/866572532, Image was cropped, https://creativecommons.org/licenses/by/2.0/.
  • Image from https://www.flickr.com/photos/thatguyfromcchs08/2300190277, Image is dimmed, https://creativecommons.org/licenses/by/2.0/.
  • Image from https://www.flickr.com/photos/mknowles/3134373590, Image was cropped, https://creativecommons.org/licenses/by-sa/2.0/.