Practical, Real-time Centralized Control for CDN-based Live Video Delivery



slide-1
SLIDE 1

Practical, Real-time Centralized Control for CDN-based Live Video Delivery

Matt Mukerjee, David Naylor, Junchen Jiang, Dongsu Han, Srini Seshan, Hui Zhang

slide-2
SLIDE 2

Live Video is Becoming Wildly Popular

  • Commercial sports streams
  • User-generated streams
slide-3
SLIDE 3

Live Video is Becoming Wildly Popular

  • Commercial sports streams
  • Single World Cup stream = 40% of global Internet traffic

  • User-generated streams (e.g., Twitch)
  • Users watch 150b min of live video per month
  • Amazon buys Twitch for ~$1Billion
slide-4
SLIDE 4

Our Contributions

  • We design a video delivery network (VDN) to efficiently manage quality and cost, with high responsiveness

Hybrid Control combines:
  • Central Optimization: quality and cost management
  • Distributed Control: responsiveness to joins and failures

slide-5
SLIDE 5

Outline

  • Problems with Live Video Today
  • Centralized Control
  • Distributed Control
  • Hybrid Control
  • Putting it all Together

slide-6
SLIDE 6

CDN Live Video Delivery Background

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), and edge clusters (E, F, G). Arrows: control ▶︎, ◀ data. Legend: video requests (HTTP GET) and responses (HTTP RESPONSE) for Video 1 and Video 2.]

slide-7
SLIDE 7

CDN Live Video Delivery Background

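[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and DNS. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Link cost labels: 100, 100, 120, 25, 20, 15, 15, 1, 10, 1.]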

slide-8
SLIDE 8

CDN Live Video Delivery Background

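[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and DNS. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Link cost labels: 100, 100, 120, 25, 20, 15, 15, 1, 10, 1.]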

Objective: Maximize service quality & Minimize delivery cost

slide-9
SLIDE 9

Problems with CDNs Today

[Figure, Service Quality: avg. bitrate (Kbps) vs. # of videos (thousands), Optimal vs. CDN. Simulation using Conviva traces, modeling user-generated content.]

[Figure, Delivery Cost (per request): CDN 2.0x vs. OPTIMAL 1.0x. Simulation using Conviva traces, modeling large sports events.]

slide-10
SLIDE 10

Problems with CDNs Today

QUANTITATIVE
[Figure: Service Quality (avg. bitrate in Kbps vs. # of videos in thousands, Optimal vs. CDN) and Delivery Cost (CDN 2.0x vs. OPTIMAL 1.0x).]

QUALITATIVE
  • Not fine-grained: videos aggregated into large groups
  • Slow DNS updates: can’t push updates; DNS entries get cached

slide-11
SLIDE 11

Goals

QUANTITATIVE
[Figure: Service Quality (avg. bitrate in Kbps vs. # of videos in thousands, Optimal vs. CDN) and Delivery Cost (CDN 2.0x vs. OPTIMAL 1.0x).]
Room for improvement, but Internet latency / loss

QUALITATIVE
  • Fine-Grained Control: per-video control
  • Real-time Response: sub-second response to failures and joins

slide-12
SLIDE 12

Goals

QUANTITATIVE
[Figure: Service Quality (avg. bitrate in Kbps vs. # of videos in thousands, Optimal vs. CDN) and Delivery Cost (CDN 2.0x vs. OPTIMAL 1.0x).]
Room for improvement, but Internet latency / loss

QUALITATIVE
  • Fine-Grained Control: per-video control
  • Real-time Response: sub-second response to failures and joins

Centralization!

[Liu, Xi et al. A Case for a Coordinated Video Control Plane. SIGCOMM 2012]

slide-13
SLIDE 13

Outline

  • Problems with Live Video Today
  • Centralized Control
  • Distributed Control
  • Hybrid Control
  • Putting it all Together

slide-14
SLIDE 14

Motivating Centralized Optimization

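[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and DNS. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Flow labels: 300, 200.]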

slide-15
SLIDE 15

Motivating Centralized Optimization

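[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and DNS. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Flow labels: 300, 200, 300.]

Congestion!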

slide-16
SLIDE 16

Motivating Centralized Optimization

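[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and DNS. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Flow labels: 300, 200, 300, 200.]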

slide-17
SLIDE 17

Motivating Centralized Optimization

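[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and DNS. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Flow labels: 300, 200, 300, 200.]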

Needs global view to coordinate videos and network resources

slide-18
SLIDE 18

Motivating Centralized Optimization

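[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and DNS. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Flow labels: 300, 200, 300, 200.]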

slide-19
SLIDE 19

Motivating Centralized Optimization

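[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and a Central Controller in place of DNS. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Flow labels: 300, 200, 300, 200.]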

slide-20
SLIDE 20

Solving Centralized Optimization

  • MAXIMIZE: SERVICE QUALITY
  • MINIMIZE: DELIVERY COST
  • SUBJECT TO: DON’T EXCEED LINK CAPACITY; SENDER MUST HAVE RECEIVED VIDEO

slide-21
SLIDE 21

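Solving Centralized Optimization

\[
\max \;\; \underbrace{w_s \cdot \sum_{l \in L_{AS},\, o \in O} \mathrm{Priority}_o \cdot \mathrm{Request}_{l,o} \cdot \mathrm{Serves}_{l,o}}_{\text{SERVICE QUALITY}} \;-\; \underbrace{w_c \cdot \sum_{l \in L,\, o \in O} \mathrm{Cost}(l) \cdot \mathrm{Bitrate}(o) \cdot \mathrm{Serves}_{l,o}}_{\text{DELIVERY COST}}
\]

subject to:

\[
\forall l \in L,\ o \in O: \quad \mathrm{Serves}_{l,o} \in \{0, 1\}
\]

\[
\forall l \in L: \quad \sum_{o \in O} \mathrm{Bitrate}(o) \cdot \mathrm{Serves}_{l,o} \le \mathrm{Capacity}(l) \qquad \text{(DON'T EXCEED LINK CAPACITY)}
\]

\[
\forall l \in L,\ o \in O: \quad \sum_{l' \in \mathrm{InLinks}(l)} \mathrm{Serves}_{l',o} \ge \mathrm{Serves}_{l,o} \qquad \text{(SENDER MUST HAVE RECEIVED VIDEO)}
\]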
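To make the objective and constraints on this slide concrete, here is a minimal brute-force sketch in Python. It is not the solver used by the authors: the three-link topology, the names `LINKS`, `VIDEOS`, `objective`, `feasible`, and `solve`, the weights, and the choice to exempt source links from the "sender must have received video" constraint are all assumptions made for illustration. Exhaustive search only works for toy instances like this one.

```python
from itertools import product

# Hypothetical toy instance (not from the slides): a source link, a
# reflector-to-edge link, and a client-facing link, each with cost 1.
# in_links=None marks a source link (assumed exempt from the InLinks rule).
LINKS = {
    "A->C": {"capacity": 2000, "cost": 1, "in_links": None, "requests": {}},
    "C->E": {"capacity": 2000, "cost": 1, "in_links": ["A->C"], "requests": {}},
    "E->H": {"capacity": 1000, "cost": 1, "in_links": ["C->E"],
             "requests": {"v1": 1, "v2": 1}},
}
VIDEOS = {
    "v1": {"bitrate": 800, "priority": 1},
    "v2": {"bitrate": 300, "priority": 1},
}


def objective(serves, links, videos, w_s, w_c):
    """w_s * service quality - w_c * delivery cost."""
    # Request is zero off the client-facing links, so summing over all
    # links gives the same value as summing over L_AS only.
    quality = sum(videos[o]["priority"] * links[l]["requests"].get(o, 0) * serves[l, o]
                  for l in links for o in videos)
    cost = sum(links[l]["cost"] * videos[o]["bitrate"] * serves[l, o]
               for l in links for o in videos)
    return w_s * quality - w_c * cost


def feasible(serves, links, videos):
    """Check the link-capacity and sender-has-video constraints."""
    for l, info in links.items():
        if sum(videos[o]["bitrate"] * serves[l, o] for o in videos) > info["capacity"]:
            return False
        if info["in_links"] is None:
            continue
        for o in videos:
            if serves[l, o] > sum(serves[u, o] for u in info["in_links"]):
                return False
    return True


def solve(links, videos, w_s=1000.0, w_c=1.0):
    """Try every 0/1 assignment of Serves and keep the best feasible one."""
    keys = [(l, o) for l in links for o in videos]
    best, best_val = None, float("-inf")
    for bits in product((0, 1), repeat=len(keys)):
        serves = dict(zip(keys, bits))
        if feasible(serves, links, videos):
            val = objective(serves, links, videos, w_s, w_c)
            if val > best_val:
                best, best_val = serves, val
    return best, best_val


best, best_val = solve(LINKS, VIDEOS)
```

In this toy instance the client-facing link cannot carry both videos, so the search has to choose; raising the priority of one video in `VIDEOS` shifts the choice toward it, which is the kind of operator knob the priority term provides.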

slide-22
SLIDE 22

Flexibility of Centralized Optimization

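[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 2K, 2K, 2K, 1K, 2K, 1K, 800, 1K, 800; additional labels: 800, 2K. Link cost labels: 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 1. Flow labels: 800, 800, with one decision marked “?”.]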

slide-23
SLIDE 23

Flexibility of Centralized Optimization

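[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 2K, 2K, 2K, 1K, 2K, 1K, 800, 1K, 800; additional labels: 800, 900, 2K. Link cost labels: 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 1. Video priority labels: 100, 1. Flow labels: 800, 800, 900.]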

slide-24
SLIDE 24

Centralized Optimization

[Figure, Service Quality: avg. bitrate (Kbps) vs. # of videos (thousands), Optimal vs. CDN. Simulation using Conviva traces, modeling user-generated content.]

[Figure, Delivery Cost (per request): CDN 2.0x vs. OPTIMAL 1.0x. Simulation using Conviva traces, modeling large sports events.]

slide-25
SLIDE 25

Centralized Optimization

[Figure, Service Quality: avg. bitrate (Kbps) vs. # of videos (thousands), VDN vs. CDN. Simulation using Conviva traces, modeling user-generated content.]

[Figure, Delivery Cost (per request): CDN 2.0x vs. VDN 1.0x. Simulation using Conviva traces, modeling large sports events.]

slide-26
SLIDE 26

Unfortunately… No Free Lunch

[Figure: join time (seconds) vs. # of videos for the Fully Centralized design under light, medium, and heavy load.]

Slow join times!
Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-27
SLIDE 27

Problems with Centralization

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J), with the Central Controller reached across the Internet over HIGH LATENCY paths. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 control traffic. Flow label: 800.]

slide-28
SLIDE 28

Outline

  • Problems with Live Video Today
  • Centralized Control: slow join times
  • Distributed Control
  • Hybrid Control
  • Putting it all Together

slide-29
SLIDE 29

Alternate Approach: Distributed

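[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, with the next hop marked “?”.]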

slide-30
SLIDE 30

Alternate Approach: Distributed

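[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 2K, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, with the next hop marked “?”.]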

Build “distance-to-video” tables at each cluster, for each video

slide-31
SLIDE 31

Alternate Approach: Distributed

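[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, with the next hop marked “?”.]

DISTANCE AT CLUSTER F, VIDEO 1:
Entry shown in the figure: 1; (B, 2K)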

slide-32
SLIDE 32

Alternate Approach: Distributed

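[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, with the next hop marked “?”.]

DISTANCE AT CLUSTER F, VIDEO 1 (format: # OF HOPS TO VIDEO; PATH BOTTLENECK):
  • VIA C: 2; (B, 1K)
Entry shown in the figure: 1; (B, 2K)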

slide-33
SLIDE 33

Alternate Approach: Distributed

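[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, with the next hop marked “?”.]

DISTANCE AT CLUSTER F, VIDEO 1:
  • VIA C: 2; (B, 1K)
  • VIA D: 1; (D, 800)
Entry shown in the figure: 1; (B, 2K)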

slide-34
SLIDE 34

Alternate Approach: Distributed

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, with the next hop marked “?”.]

DISTANCE AT CLUSTER F, VIDEO 1:
  • VIA C: 2; (B, 1K)
  • VIA D: 1; (D, 800)
Entry shown in the figure: 1; (B, 2K)

PICK SHORTEST PATH WITH ENOUGH CAPACITY

slide-35
SLIDE 35

Alternate Approach: Distributed

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, 800.]

DISTANCE AT CLUSTER F, VIDEO 1:
  • VIA C: 2; (B, 1K)
  • VIA D: 1; (D, 800)

PICK SHORTEST PATH WITH ENOUGH CAPACITY

slide-36
SLIDE 36

Alternate Approach: Distributed

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, 800.]

DISTANCE AT CLUSTER F, VIDEO 1:
  • VIA C: 2; (B, 1K)
  • VIA D: 1; (D, 800)

PICK SHORTEST PATH WITH ENOUGH CAPACITY

Distributed decisions fast (ms) but sub-optimal
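The "pick shortest path with enough capacity" rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Entry` fields, the function name `pick_next_hop`, the Kbps units, and the reading of each table entry as (hops to video; bottleneck location and capacity) are inferred from the slide labels, not taken from an actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    via: str         # neighbor cluster the request would be forwarded to
    hops: int        # number of hops to a copy of the video
    at: str          # where the path bottleneck sits (as labeled on the slide)
    bottleneck: int  # bottleneck capacity on that path, assumed Kbps


def pick_next_hop(entries: List[Entry], bitrate: int) -> Optional[str]:
    """Return the neighbor on the shortest path whose bottleneck fits the video."""
    usable = [e for e in entries if e.bottleneck >= bitrate]
    if not usable:
        return None  # no path has enough capacity
    return min(usable, key=lambda e: e.hops).via


# Distance table at cluster F for Video 1, copied from the slide.
table_f_video1 = [
    Entry(via="C", hops=2, at="B", bottleneck=1000),
    Entry(via="D", hops=1, at="D", bottleneck=800),
]

choice_800 = pick_next_hop(table_f_video1, 800)    # fits via D, the shorter path
choice_900 = pick_next_hop(table_f_video1, 900)    # only the path via C fits
choice_1500 = pick_next_hop(table_f_video1, 1500)  # nothing fits
```

Because each cluster decides from its own table, the choice is fast, but nothing prevents two clusters from making individually reasonable choices that waste bandwidth overall.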

slide-37
SLIDE 37

Alternate Approach: Distributed

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), clients (H, I, J), and the Central Controller. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, 800.]

DISTANCE AT CLUSTER F, VIDEO 1:
  • VIA C: 2; (B, 1K)
  • VIA D: 1; (D, 800)

PICK SHORTEST PATH WITH ENOUGH CAPACITY

Combine approaches? “Hybrid Control”

slide-38
SLIDE 38

Outline

  • Problems with Live Video Today
  • Centralized Control: slow join times
  • Distributed Control: low bitrate
  • Hybrid Control
  • Putting it all Together

slide-39
SLIDE 39

Hybrid Control

Hybrid Control combines:
  • Central Optimization: quality and cost management (minutes)
  • Distributed Control: responsiveness to joins and failures (milliseconds)

slide-40
SLIDE 40

Challenges of Hybrid Control

  • Forwarding loops (TRIVIAL): always forward requests upwards
  • State transitions (PRIOR WORK): versioning and “shadow FIBs”
  • Avoid bad control loop interactions (CHALLENGING)
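The slide only names versioning and "shadow FIBs" as the answer to state transitions, so the following Python sketch is an assumption about how such a scheme could look, not the mechanism used in VDN. The class `VersionedFib` and its methods are hypothetical: a new forwarding table is installed next to the active one under a higher version number, and the agent switches over only when told to activate it.

```python
class VersionedFib:
    """Hypothetical per-cluster forwarding state with one active table
    and zero or more not-yet-active ("shadow") tables."""

    def __init__(self):
        self.active_version = 0
        self.tables = {0: {}}  # version -> {video: next-hop cluster}

    def install_shadow(self, version, table):
        """Store a newer table without affecting current forwarding."""
        if version <= self.active_version:
            raise ValueError("shadow version must be newer than the active one")
        self.tables[version] = dict(table)

    def activate(self, version):
        """Switch forwarding to an installed version and drop older ones."""
        if version not in self.tables:
            raise KeyError(f"version {version} was never installed")
        self.active_version = version
        for old in [v for v in self.tables if v < version]:
            del self.tables[old]

    def lookup(self, video, version=None):
        """Next hop for a video, using the active table unless a version is given."""
        chosen = self.active_version if version is None else version
        return self.tables[chosen].get(video)


fib = VersionedFib()
fib.install_shadow(1, {"video1": "D"})
before = fib.lookup("video1")            # active table (version 0) has no entry yet
shadow = fib.lookup("video1", version=1)
fib.activate(1)
after = fib.lookup("video1")
```

Keeping the old table in place until activation means requests in flight are never forwarded using a half-updated table.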

slide-41
SLIDE 41

Combining Approaches: Hybrid

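[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J), with the Central Controller reached across the Internet over HIGH LATENCY paths. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, with the next hop marked “?”.]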

slide-42
SLIDE 42

Combining Approaches: Hybrid

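[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J), with the Central Controller reached across the Internet over HIGH LATENCY paths. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, 800.]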

slide-43
SLIDE 43

Combining Approaches: Hybrid

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J), with the Central Controller reached across the Internet over HIGH LATENCY paths. Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests, Video 1 responses, and Video 1 control traffic. Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, 800.]

slide-44
SLIDE 44

Challenges of Hybrid Control

  • Forwarding loops (TRIVIAL): always forward requests upwards
  • State transitions (PRIOR WORK): versioning and “shadow FIBs”
  • Avoid bad control loop interactions (CHALLENGING)

slide-45
SLIDE 45

Challenges of Hybrid Control

Avoid bad control loop interactions (CHALLENGING)

  • 1. Centralized decision has priority
  • 2. Distributed uses residual after centralized
  • 3. Distributed has no impact on current/future centralized decisions
  • 4. Distributed’s changes don’t propagate
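Rules 1 to 3 can be illustrated with a small capacity-accounting sketch in Python. The function names, the per-link dictionaries, the Kbps units, and the link name "D->F" are assumptions for illustration; the slides state the rules but not how they are implemented.

```python
def residual(link, capacity, central_alloc, local_alloc):
    """Capacity left on a link after centralized and earlier local allocations."""
    return capacity[link] - central_alloc.get(link, 0) - local_alloc.get(link, 0)


def distributed_admit(link, bitrate, capacity, central_alloc, local_alloc):
    """Admit a locally decided stream only if it fits in the residual capacity.

    The centralized allocation is read but never modified here, so
    distributed decisions cannot displace or alter centralized ones.
    """
    if bitrate > residual(link, capacity, central_alloc, local_alloc):
        return False
    local_alloc[link] = local_alloc.get(link, 0) + bitrate
    return True


# Hypothetical numbers: a 2000 Kbps link with 1000 Kbps already assigned centrally.
capacity = {"D->F": 2000}
central_alloc = {"D->F": 1000}
local_alloc = {}

first = distributed_admit("D->F", 800, capacity, central_alloc, local_alloc)
second = distributed_admit("D->F", 300, capacity, central_alloc, local_alloc)
```

Here the first local stream fits in the 1000 Kbps residual, while the second does not fit in the remaining 200 Kbps and is refused rather than eating into the centrally assigned share.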

slide-46
SLIDE 46

Hybrid Control and Responsiveness

[Figure: join time (seconds) vs. # of videos under light, medium, and heavy load, comparing Hybrid Control, Fully Centralized, and Fully Distributed.]

Annotation: Slow join times!
Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-47
SLIDE 47

Hybrid Control and Responsiveness

[Figure: join time (seconds) vs. # of videos under light, medium, and heavy load, comparing Hybrid Control, Fully Centralized, and Fully Distributed.]

Annotations: Slow join times!; Not stable
Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-48
SLIDE 48

Hybrid Control and Responsiveness

[Figure: join time (seconds) vs. # of videos under light, medium, and heavy load, comparing Hybrid Control, Fully Centralized, and Fully Distributed.]

Annotations: Slow join times!; Not stable; Great join times and more stable
Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-49
SLIDE 49

Outline

  • Problems with Live Video Today
  • Centralized Control: slow join times
  • Distributed Control: low bitrate
  • Hybrid Control: “Better than both”
  • Putting it all Together

slide-50
SLIDE 50

Putting it all Together

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J). Arrows: control ▶︎, ◀ data.]

  • Logically centralized controller
  • “Local Agent” per cluster

slide-51
SLIDE 51

Putting it all Together

[Figure: system architecture. Components labeled: CENTRAL CONTROLLER (DISCOVERY, CONTROL), LOCAL AGENT (DISCOVERY, CONTROL), HYBRID CONTROL (CENTRALIZED, DISTRIBUTED), and DATA PLANE (HTTP servers). Information exchanged: TOPOLOGY AND VIDEO INFO, DISTRIBUTION TREES.]

slide-52
SLIDE 52

Key Results

  • Trace-driven eval - centralized optimization
  • High quality & low delivery cost? 1.7x / 2x
  • Scalable / fine grain? 10K videos; 2K clusters
  • End-to-end eval - hybrid control
  • Responsive? 200ms
  • More results in paper
  • Operator Control? Failures? Partitions?
slide-53
SLIDE 53

Conclusion

  • VDN presents a new approach for CDN-based live video delivery

Hybrid Control combines:
  • Central Optimization: quality and cost management
  • Distributed Control: responsiveness to joins and failures

slide-54
SLIDE 54

Practical, Real-time Centralized Control for CDN-based Live Video Delivery

Matt Mukerjee, David Naylor, Junchen Jiang, Dongsu Han, Srini Seshan, Hui Zhang

slide-55
SLIDE 55

Backup slides…

slide-56
SLIDE 56

Problems with Traffic Engineering

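[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J). Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Flow labels: 300, 200, 1.5K, 1.5K; 300, 200.]

Even Split (1K)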

slide-57
SLIDE 57

Problems with Traffic Engineering

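[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J). Arrows: control ▶︎, ◀ data. Link capacity labels: 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700; additional label: 1K. Flow labels: 300, 200, 1.5K, 1.5K; 200, 1.5K.]

Uneven Split (1.5K / 500)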

slide-58
SLIDE 58

Distributed: Example of Sub-optimal

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J). Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 2K, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, 800.]

Wasting bandwidth

slide-59
SLIDE 59

Distributed: Example of Sub-optimal

[Figure: CDN topology with video sources (A, B), reflector clusters (C, D), edge clusters (E, F, G), and clients (H, I, J). Arrows: control ▶︎, ◀ data. Legend: Video 1 data requests and Video 1 responses. Link capacity labels: 2K, 3K, 200, 2K, 2K, 800, 500, 300, 750, 700, 1K, 1K. Flow labels: 800, 800, 800.]

Coordination difficult without centralization

slide-60
SLIDE 60

Trace-Driven Eval

  • 3 Traces
  • Avg Day: raw trace of music video provider
  • Large Event: synthesized basketball game
  • Heavy Tail: synthesized Twitch/Ustream-like workload

  • 4 Systems
  • Everything Everywhere: all vids to all servers
  • Overlay Multicast: globally optimal; no coordination
  • CDN: greedy distribution scheme w/ DNS
  • VDN: our system
slide-61
SLIDE 61

Trace-Driven Eval

slide-62
SLIDE 62

Existing Solutions

  • Traffic Engineering (SWAN, B4, …)
  • Works on aggregates at coarse timescales
  • Overlay Multicast (Overcast, Bullet, …)
  • Not designed for coordinating across streams
  • Modern CDNs
  • Previous work shows a centralized system could greatly improve user experience but would be difficult to design over the Internet