Practical, Real-time Centralized Control for CDN-based Live Video Delivery



slide-1
SLIDE 1

Practical, Real-time Centralized Control for CDN-based Live Video Delivery

Matt Mukerjee, David Naylor, Junchen Jiang, Dongsu Han, Srini Seshan, Hui Zhang

slide-2
SLIDE 2

Combating Latency in Wide Area Control Planes

  • Centralization can provide major benefits
      • e.g., better performance, reliability, policy management, …
  • Scalability is hard on WAN due to latency
slide-3
SLIDE 3

Control Planes in the 4D* Model

[Diagram: the four planes of the 4D model: DECISION (the central controller), DISSEMINATION, DISCOVERY, and DATA (HTTP servers, routers, etc.).]

*Yan, Hong, et al. "Tesseract: A 4D Network Control Plane." NSDI. Vol. 7. 2007.

slide-4
SLIDE 4

WAN Control Plane Latency

[Diagram: the 4D control plane again: central controller (DECISION), DISSEMINATION, DISCOVERY, and DATA (HTTP servers, routers, etc.).]

slide-8
SLIDE 8

WAN Problems and Decision Planes

  • Traffic Engineering: solve with LP. Low-latency decision plane
  • Live Video Delivery: solve with ILP? High-latency decision plane!

slide-9
SLIDE 9

Attacking Decision Plane Latency

  • Central Optimization: quality and cost management
  • Distributed Control: responsiveness to joins and failures
  • Hybrid Control: both combined

slide-10
SLIDE 10

Outline

  • Problems with Live Video Today
      • Centralized Control
      • Distributed Control
  • Putting it all Together
      • Hybrid Control

slide-11
SLIDE 11

CDN Live Video Delivery Background

[Figure: CDN topology. Video sources A, B; reflector clusters C, D; edge clusters E, F, G. Control and data flow in opposite directions. Legend: Video 1 and Video 2 requests (HTTP GET) and responses (HTTP RESPONSE).]

slide-12
SLIDE 12

CDN Live Video Delivery Background

[Figure: CDN topology. Video sources A, B; reflector clusters C, D; edge clusters E, F, G; clients H, I, J; DNS. Control and data flow in opposite directions.]

Link capacities shown: 1K, 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700

Link costs shown: 100, 100, 120, 25, 20, 15, 15, 1, 10, 1

slide-13
SLIDE 13

CDN Live Video Delivery Background

[Figure: the same CDN topology (video sources A, B; reflector clusters C, D; edge clusters E, F, G; clients H, I, J; DNS), annotated with link capacities (200 to 3K) and link costs (1 to 120).]

Objective: Maximize service quality & Minimize delivery cost

slide-14
SLIDE 14

Outline

  • Problems with Live Video Today
      • Centralized Control
      • Distributed Control
  • Putting it all Together
      • Hybrid Control

slide-15
SLIDE 15

Motivating Centralized Optimization

[Figure: CDN topology (video sources A, B; reflector clusters C, D; edge clusters E, F, G; clients H, I, J; DNS) with link capacities from 200 to 3K. Annotated values: 300, 200.]

slide-16
SLIDE 16

Motivating Centralized Optimization

[Figure: the same CDN topology with link capacities. Annotated values: 300, 200, 300. With DNS-based assignment, the figure is marked "Congestion!".]

slide-17
SLIDE 17

Motivating Centralized Optimization

[Figure: the same CDN topology with link capacities and DNS. Annotated values: 300, 200, 300, 200.]

slide-18
SLIDE 18

Motivating Centralized Optimization

[Figure: the same CDN topology with link capacities and DNS. Annotated values: 300, 200, 300, 200.]

Needs global view to coordinate videos and network resources

slide-20
SLIDE 20

Motivating Centralized Optimization

[Figure: the same CDN topology with link capacities. Annotated values: 300, 200, 300, 200. A Central Controller takes the place of DNS.]

slide-21
SLIDE 21

Solving Centralized Optimization

MAXIMIZE: SERVICE QUALITY
MINIMIZE: DELIVERY COST
SUBJECT TO: DON’T EXCEED LINK CAPACITY; SENDER MUST HAVE RECEIVED VIDEO

slide-22
SLIDE 22

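Solving Centralized Optimization

$$
\max \; w_s \cdot \sum_{l \in L_{AS},\, o \in O} \mathrm{Priority}_o \cdot \mathrm{Request}_{l,o} \cdot \mathrm{Serves}_{l,o}
\;-\; w_c \cdot \sum_{l \in L,\, o \in O} \mathrm{Cost}(l) \cdot \mathrm{Bitrate}(o) \cdot \mathrm{Serves}_{l,o}
$$

The first term is SERVICE QUALITY; the second is DELIVERY COST.

subject to:

$$
\begin{aligned}
&\forall l \in L,\ o \in O: && \mathrm{Serves}_{l,o} \in \{0, 1\} \\
&\forall l \in L: && \sum_{o \in O} \mathrm{Bitrate}(o) \cdot \mathrm{Serves}_{l,o} \le \mathrm{Capacity}(l) && \text{(DON'T EXCEED LINK CAPACITY)} \\
&\forall l \in L,\ o \in O: && \sum_{l' \in \mathrm{InLinks}(l)} \mathrm{Serves}_{l',o} \ge \mathrm{Serves}_{l,o} && \text{(SENDER MUST HAVE RECEIVED VIDEO)}
\end{aligned}
$$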
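To make the formulation concrete, here is a minimal brute-force sketch in Python. Everything in it is an assumption for illustration: the three-link topology, the capacities, costs, priorities, and the weights `W_S` and `W_C` are invented, and exhaustive enumeration stands in for a real ILP solver. Links leaving the video source are exempted from the "sender must have received video" constraint, which is also an assumption of this sketch.

```python
from itertools import product

# Hypothetical toy instance (not from the slides): S -> R -> E -> client AS "H".
SOURCE = "S"
LINKS = {
    "S-R": {"src": "S", "dst": "R", "cap": 1000, "cost": 10},
    "R-E": {"src": "R", "dst": "E", "cap": 500, "cost": 5},
    "E-H": {"src": "E", "dst": "H", "cap": 500, "cost": 1},
}
LAST_HOP = ["E-H"]  # plays the role of L_AS in the objective
VIDEOS = {
    "v1": {"bitrate": 300, "priority": 1},
    "v2": {"bitrate": 300, "priority": 2},
}
REQUESTS = {("E-H", "v1"): 1, ("E-H", "v2"): 1}
W_S, W_C = 10_000, 1  # assumed weights for quality and cost


def in_links(link):
    """Links that deliver into the sending node of `link`."""
    return [k for k, v in LINKS.items() if v["dst"] == LINKS[link]["src"]]


def feasible(serves):
    for link, info in LINKS.items():
        # Don't exceed link capacity.
        load = sum(VIDEOS[o]["bitrate"] * serves[link, o] for o in VIDEOS)
        if load > info["cap"]:
            return False
        if info["src"] == SOURCE:
            continue  # assumption: the source already holds every video
        # Sender must have received the video on some incoming link.
        for o in VIDEOS:
            if sum(serves[lp, o] for lp in in_links(link)) < serves[link, o]:
                return False
    return True


def objective(serves):
    quality = sum(VIDEOS[o]["priority"] * REQUESTS.get((l, o), 0) * serves[l, o]
                  for l in LAST_HOP for o in VIDEOS)
    cost = sum(LINKS[l]["cost"] * VIDEOS[o]["bitrate"] * serves[l, o]
               for l in LINKS for o in VIDEOS)
    return W_S * quality - W_C * cost


def solve():
    """Enumerate every 0/1 assignment of Serves and keep the best feasible one."""
    keys = [(l, o) for l in LINKS for o in VIDEOS]
    best_value, best_serves = None, None
    for bits in product((0, 1), repeat=len(keys)):
        serves = dict(zip(keys, bits))
        if not feasible(serves):
            continue
        value = objective(serves)
        if best_value is None or value > best_value:
            best_value, best_serves = value, serves
    return best_value, best_serves


best_value, best_serves = solve()
```

In this toy instance the 500-capacity links cannot carry both 300-bitrate videos, so the higher-priority video is the one delivered. Enumeration grows as 2^(links × videos), so it is only useful for building intuition.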

slide-23
SLIDE 23

Centralized Optimization

[Figure, two panels. Service Quality: Avg. Bitrate (Kbps) versus # of Videos (Thousands), Optimal versus CDN. Delivery Cost (per request): CDN 2.0x, OPTIMAL 1.0x.]

Simulation using Conviva traces, modeling user-generated content

Simulation using Conviva traces, modeling large sports events

slide-24
SLIDE 24

Effects of Latency in Decision Plane

[Plot: Join Time (Seconds) versus # of videos at light, medium, and heavy load, for Fully Centralized control.]

Slow join times!

Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-25
SLIDE 25

Problems with Centralization

[Figure: CDN topology (video sources A, B; reflector clusters C, D; edge clusters E, F, G; clients H, I, J) with a Central Controller reached across the Internet over HIGH LATENCY paths. Legend: Video 1 data requests (800) and Video 1 control traffic.]

slide-26
SLIDE 26

Outline

  • Problems with Live Video Today
      • Centralized Control: slow join times
      • Distributed Control
  • Putting it all Together
      • Hybrid Control

slide-27
SLIDE 27

Alternate Approach: Distributed

[Figure: CDN topology (video sources A, B; reflector clusters C, D; edge clusters E, F, G; clients H, I, J) with link capacities from 200 to 3K and a Central Controller. A Video 1 data request (800) arrives; where to forward it is marked "?".]

slide-28
SLIDE 28

Alternate Approach: Distributed

[Figure: the same CDN topology with link capacities and a Central Controller. A Video 1 data request (800) arrives; where to forward it is marked "?".]

Build “distance-to-video” tables at each cluster, for each video
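As a rough illustration of how such a table could be assembled, the sketch below derives one cluster's entries from what its upstream neighbours advertise. The advertisement format, the function name `build_table`, and the capacity values are assumptions for illustration, not taken from the slides.

```python
def build_table(link_capacity, adverts):
    """Build one cluster's distance-to-video table for a single video.

    link_capacity: neighbour -> capacity of the link to that neighbour.
    adverts: neighbour -> (hops, bottleneck) that neighbour reports for the
             video; a neighbour that already has the video reports 0 hops.
    Returns neighbour -> (hops to video, path bottleneck) via that neighbour.
    """
    table = {}
    for neighbour, (hops, bottleneck) in adverts.items():
        if neighbour not in link_capacity:
            continue  # no direct link to this neighbour
        # One more hop, and the path is limited by its narrowest link.
        table[neighbour] = (hops + 1, min(bottleneck, link_capacity[neighbour]))
    return table


# Hypothetical inputs for an edge cluster with two upstream reflectors.
table_f = build_table(
    link_capacity={"C": 1000, "D": 800},
    adverts={"C": (1, 2000), "D": (0, float("inf"))},
)
```

Each entry records only a hop count and a bottleneck, so a cluster can build its table from neighbour reports without any global view.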

slide-29
SLIDE 29

Alternate Approach: Distributed

[Figure: the same CDN topology with link capacities and a Central Controller. A Video 1 data request (800) arrives; where to forward it is marked "?". A table labeled DISTANCE AT CLUSTER F, VIDEO 1 appears, alongside the annotation "1; (B, 2K)".]

slide-30
SLIDE 30

Alternate Approach: Distributed

[Figure: the same CDN topology with link capacities and a Central Controller. A Video 1 data request (800) arrives; where to forward it is marked "?". The annotation "1; (B, 2K)" remains.]

DISTANCE AT CLUSTER F, VIDEO 1:
  VIA C: 2; (B, 1K)

Entry format: # OF HOPS TO VIDEO; PATH BOTTLENECK

slide-31
SLIDE 31

Alternate Approach: Distributed

[Figure: the same CDN topology with link capacities and a Central Controller. A Video 1 data request (800) arrives; where to forward it is marked "?". The annotation "1; (B, 2K)" remains.]

DISTANCE AT CLUSTER F, VIDEO 1:
  VIA C: 2; (B, 1K)
  VIA D: 1; (D, 800)

slide-32
SLIDE 32

Alternate Approach: Distributed

[Figure: the same CDN topology with link capacities and a Central Controller. A Video 1 data request (800) arrives; where to forward it is marked "?". The annotation "1; (B, 2K)" remains.]

DISTANCE AT CLUSTER F, VIDEO 1:
  VIA C: 2; (B, 1K)
  VIA D: 1; (D, 800)

PICK SHORTEST PATH WITH ENOUGH CAPACITY
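The selection rule can be sketched in a few lines of Python. The `Entry` type and `pick_next_hop` function are hypothetical names, and reading "1K" as 1000 is an assumption; the two table rows are the ones shown for cluster F.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    via: str         # next-hop cluster
    hops: int        # number of hops to the video
    bottleneck: int  # smallest capacity along the path


def pick_next_hop(table: list, bitrate: int) -> Optional[str]:
    """Pick the shortest path whose bottleneck can carry the requested bitrate."""
    candidates = [e for e in table if e.bottleneck >= bitrate]
    if not candidates:
        return None  # no path has enough capacity
    return min(candidates, key=lambda e: e.hops).via


# Distance table at cluster F for Video 1, as shown on the slide.
table_f_video1 = [Entry("C", 2, 1000), Entry("D", 1, 800)]
choice = pick_next_hop(table_f_video1, bitrate=800)
```

For the 800 request this picks D, the one-hop path whose bottleneck is exactly 800. A request of 900 would fall back to the longer path via C.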

slide-33
SLIDE 33

Alternate Approach: Distributed

[Figure: the same CDN topology with link capacities and a Central Controller. The Video 1 data request (800) is now answered with a Video 1 response (800).]

DISTANCE AT CLUSTER F, VIDEO 1:
  VIA C: 2; (B, 1K)
  VIA D: 1; (D, 800)

PICK SHORTEST PATH WITH ENOUGH CAPACITY

slide-34
SLIDE 34

Alternate Approach: Distributed

[Figure: the same CDN topology with link capacities and a Central Controller. The Video 1 data request (800) is answered with a Video 1 response (800).]

DISTANCE AT CLUSTER F, VIDEO 1:
  VIA C: 2; (B, 1K)
  VIA D: 1; (D, 800)

PICK SHORTEST PATH WITH ENOUGH CAPACITY

Distributed decisions fast (ms) but sub-optimal

slide-35
SLIDE 35

Alternate Approach: Distributed

[Figure: the same CDN topology with link capacities and a Central Controller. The Video 1 data request (800) is answered with a Video 1 response (800).]

DISTANCE AT CLUSTER F, VIDEO 1:
  VIA C: 2; (B, 1K)
  VIA D: 1; (D, 800)

PICK SHORTEST PATH WITH ENOUGH CAPACITY

Combine approaches? “Hybrid Control”

slide-36
SLIDE 36

Outline

  • Problems with Live Video Today
      • Centralized Control: slow join times
      • Distributed Control: low bitrate
  • Putting it all Together
      • Hybrid Control

slide-37
SLIDE 37

Combining Approaches: Hybrid

[Figure: CDN topology (video sources A, B; reflector clusters C, D; edge clusters E, F, G; clients H, I, J) with link capacities from 200 to 3K. The Central Controller is reached across the Internet over HIGH LATENCY paths. A Video 1 data request (800) arrives; where to forward it is marked "?".]

slide-38
SLIDE 38

Combining Approaches: Hybrid

[Figure: the same CDN topology with link capacities and a Central Controller across HIGH LATENCY Internet paths. The Video 1 data request (800) is answered with a Video 1 response (800).]

slide-39
SLIDE 39

Combining Approaches: Hybrid

[Figure: the same CDN topology with link capacities and a Central Controller across HIGH LATENCY Internet paths. The Video 1 data request (800) is answered with a Video 1 response (800), and Video 1 control traffic is now shown in the legend.]

slide-40
SLIDE 40
Challenges of Hybrid Control

  • Forwarding loops (TRIVIAL)
      • Always forward requests upwards
  • State transitions (PRIOR WORK)
      • Versioning and “shadow FIBs”
  • Avoid bad control loop interactions (CHALLENGING)
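A short sketch of why always forwarding requests upwards rules out forwarding loops. It assumes "upwards" means toward the video sources, and it assigns tiers to the cluster labels used in the topology figures; the `TIER` mapping and function names are hypothetical.

```python
# Assumed tiers: edge clusters lowest, reflectors in the middle, sources on top.
TIER = {"E": 0, "F": 0, "G": 0, "C": 1, "D": 1, "A": 2, "B": 2}


def may_forward(src: str, dst: str) -> bool:
    """A request may only move to a strictly higher tier."""
    return TIER[dst] > TIER[src]


def is_valid_path(path: list) -> bool:
    """Every hop must go upwards; the tier rises each hop, so no node repeats."""
    return all(may_forward(a, b) for a, b in zip(path, path[1:]))
```

Because the tier strictly increases on every hop, a request can never return to a cluster it has already visited.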

slide-41
SLIDE 41

Challenges of Hybrid Control

  • Avoid bad control loop interactions (CHALLENGING)
      • 1. Centralized decision has priority
      • 2. Distributed uses residual after centralized
      • 3. Distributed has no impact on current/future centralized decisions
      • 4. Distributed’s changes don’t propagate
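One way to picture rules 1 to 3 on a single link is the sketch below. It is an illustration under stated assumptions, not the system's implementation: the `Link` class, its method names, and the policy of evicting the newest local flows when a new centralized allocation no longer leaves room are all invented here. Rule 4 (distributed changes don't propagate) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    capacity: int
    central: dict = field(default_factory=dict)      # video -> bitrate, set by controller
    distributed: dict = field(default_factory=dict)  # video -> bitrate, set locally

    def residual(self) -> int:
        used = sum(self.central.values()) + sum(self.distributed.values())
        return self.capacity - used

    def try_distributed(self, video: str, bitrate: int) -> bool:
        """Rule 2: a local decision may only use what the controller left over."""
        if video in self.central or video in self.distributed:
            return True  # already carried on this link
        if bitrate > self.residual():
            return False
        self.distributed[video] = bitrate
        return True

    def apply_central(self, allocation: dict) -> None:
        """Rules 1 and 3: install the controller's allocation exactly as given."""
        self.central = dict(allocation)
        for video in list(self.distributed):
            if video in self.central:
                del self.distributed[video]  # controller now owns this video
        # Assumed eviction policy: drop the newest local flows until things fit.
        while self.distributed and self.residual() < 0:
            self.distributed.popitem()
```

For example, on a 1000-capacity link with a centralized allocation of 600, a local 300 flow fits but a second one does not. If the controller later allocates 900, the local flow is evicted.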

slide-42
SLIDE 42

Hybrid Control and Responsiveness

[Plot: Join Time (Seconds) versus # of videos at light, medium, and heavy load, for Hybrid Control, Fully Centralized, and Fully Distributed.]

Slow join times!

Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-43
SLIDE 43

Hybrid Control and Responsiveness

[Plot: Join Time (Seconds) versus # of videos at light, medium, and heavy load, for Hybrid Control, Fully Centralized, and Fully Distributed.]

Slow join times!

Not stable

Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-44
SLIDE 44

Hybrid Control and Responsiveness

[Plot: Join Time (Seconds) versus # of videos at light, medium, and heavy load, for Hybrid Control, Fully Centralized, and Fully Distributed.]

Slow join times!

Not stable

Great join times and more stable

Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-45
SLIDE 45

Conclusion

  • We present a possible solution for combating decision plane latency
  • Central Optimization: quality and cost management
  • Distributed Control: responsiveness to joins and failures
  • Hybrid Control: both combined

slide-46
SLIDE 46

Conclusion

  • Traffic Engineering: solve with LP
  • Live Video Delivery: solve with ILP?
  • Two further, unlabeled entries: solve with X?

slide-48
SLIDE 48

Backup slides…

slide-49
SLIDE 49

Problems with Traffic Engineering

[Figure: CDN topology (video sources A, B; reflector clusters C, D; edge clusters E, F, G; clients H, I, J) with link capacities from 200 to 3K. Annotated values: 300, 200, 1.5K, 1.5K.]

Even Split (1K)

slide-50
SLIDE 50

Problems with Traffic Engineering

[Figure: the same CDN topology with link capacities from 200 to 3K. Annotated values: 300, 200, 1.5K, 1.5K.]

Uneven Split (1.5K / 500)

slide-51
SLIDE 51

Distributed: Example of Sub-optimal

[Figure: CDN topology (video sources A, B; reflector clusters C, D; edge clusters E, F, G; clients H, I, J) with link capacities from 200 to 3K. Video 1 data requests and Video 1 responses are each marked 800.]

Wasting bandwidth

slide-52
SLIDE 52

Distributed: Example of Sub-optimal

[Figure: the same CDN topology with link capacities from 200 to 3K. Video 1 data requests and Video 1 responses are each marked 800.]

Coordination difficult without centralization

slide-53
SLIDE 53

Trace-Driven Eval

  • 3 Traces
      • Avg Day: raw trace of music video provider
      • Large Event: synthesized basketball game
      • Heavy Tail: synthesized Twitch/Ustream-like workload
  • 4 Systems
      • Everything Everywhere: all videos to all servers
      • Overlay Multicast: globally optimal; no coordination
      • CDN: greedy distribution scheme w/ DNS
      • VDN: our system
slide-54
SLIDE 54

Trace-Driven Eval

slide-55
SLIDE 55

Existing Solutions

  • Traffic Engineering (SWAN, B4, …)
      • Works on aggregates at coarse timescales
  • Overlay Multicast (Overcast, Bullet, …)
      • Not designed for coordinating across streams
  • Modern CDNs
      • Previous work shows a centralized system could greatly improve user experience but would be difficult to design over the Internet