Practical, Real-time Centralized Control for CDN-based Live Video Delivery
Matt Mukerjee, David Naylor, Junchen Jiang, Dongsu Han, Srini Seshan, Hui Zhang
Combating Latency in Wide Area Control Planes
- Centralization can provide major benefits
- e.g., better performance, reliability, policy management, …
- Scalability is hard on WAN due to latency
Control Planes in the 4D* Model
[Diagram: the planes of the 4D model. DECISION: central controller; DISSEMINATION; DISCOVERY; DATA: HTTP servers, routers, etc.]
*Yan, Hong, et al. "Tesseract: A 4D Network Control Plane." NSDI. Vol. 7. 2007.
WAN Control Plane Latency
[Diagram sequence (animation builds): the same planes as above. DECISION: central controller; DISSEMINATION; DISCOVERY; DATA: HTTP servers, routers, etc.]
WAN Problems and Decision Planes
- Traffic Engineering: solve with LP. Low-latency decision plane
- Live Video Delivery: solve with ILP? High-latency decision plane!
Attacking Decision Plane Latency
- Central Optimization: quality and cost management
- Distributed Control: responsiveness to joins and failures
- Hybrid Control: combines the two
Outline
- Problems with Live Video Today
  - Centralized Control
  - Distributed Control
- Putting it all Together
  - Hybrid Control
CDN Live Video Delivery Background
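[Diagram: CDN topology. Video Sources: A, B. Reflector Clusters: C, D. Edge Clusters: E, F, G. Control flows in one direction (▶︎) and data in the other (◀). Video requests arrive as HTTP GET and are answered with HTTP RESPONSE. Legend: Video 1 and Video 2 requests and responses.]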
CDN Live Video Delivery Background
[Diagram: the same topology with Clients H, I, J added below the Edge Clusters and mapped to them by DNS. Link capacity labels: 1K, 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700. Link cost labels: 100, 100, 120, 25, 20, 15, 15, 1, 10, 1.]
Objective: Maximize service quality & Minimize delivery cost
Motivating Centralized Optimization
[Diagram sequence (animation builds): the same topology (Video Sources A, B; Reflector Clusters C, D; Edge Clusters E, F, G; Clients H, I, J) with link capacity labels 1K, 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700. With DNS-based assignment, flows labeled 300, 200, 300, 200 are added one build at a time; one build is marked "Congestion!". In the final build a Central Controller takes the place of DNS.]
Needs global view to coordinate videos and network resources
Solving Centralized Optimization
MAXIMIZE service quality, MINIMIZE delivery cost:
$$\max \; w_s \cdot \sum_{l \in L_{AS},\, o \in O} \mathrm{Priority}_o \cdot \mathrm{Request}_{l,o} \cdot \mathrm{Serves}_{l,o} \;-\; w_c \cdot \sum_{l \in L,\, o \in O} \mathrm{Cost}(l) \cdot \mathrm{Bitrate}(o) \cdot \mathrm{Serves}_{l,o}$$
SUBJECT TO:
$$\forall l \in L,\ o \in O:\ \mathrm{Serves}_{l,o} \in \{0, 1\}$$
DON’T EXCEED LINK CAPACITY:
$$\forall l \in L:\ \sum_{o \in O} \mathrm{Bitrate}(o) \cdot \mathrm{Serves}_{l,o} \le \mathrm{Capacity}(l)$$
SENDER MUST HAVE RECEIVED VIDEO:
$$\forall l \in L,\ o \in O:\ \sum_{l' \in \mathrm{InLinks}(l)} \mathrm{Serves}_{l',o} \ge \mathrm{Serves}_{l,o}$$
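To make the formulation concrete, here is a minimal sketch that solves a toy instance by brute-force enumeration of the binary Serves variables. The three-link topology, every number, and the names (`LINKS`, `SOURCES`, `CLIENT_LINKS`, `solve`) are illustrative assumptions, not taken from the talk. The sketch also assumes that links leaving a video source are exempt from the "sender must have received video" constraint. A real controller would hand the same model to an ILP solver instead of enumerating.

```python
from itertools import product

# Hypothetical toy instance; all names and numbers are illustrative assumptions.
# Each link maps to (sender, receiver, capacity, cost per unit of bitrate).
LINKS = {
    "S->R": ("S", "R", 1000, 10),
    "R->E": ("R", "E", 1000, 5),
    "E->AS": ("E", "AS", 500, 1),
}
SOURCES = {"S"}             # assumed: source nodes already hold every video
CLIENT_LINKS = {"E->AS"}    # plays the role of L_AS
BITRATE = {"v1": 300, "v2": 400}
PRIORITY = {"v1": 1.0, "v2": 1.0}
REQUEST = {("E->AS", "v1"): 1, ("E->AS", "v2"): 1}
W_S, W_C = 1000.0, 0.01     # weights w_s and w_c


def in_links(link):
    """Links whose receiver is the sender of `link` (InLinks(l))."""
    sender = LINKS[link][0]
    return [k for k, (_, recv, _, _) in LINKS.items() if recv == sender]


def feasible(serves):
    for l, (sender, _, cap, _) in LINKS.items():
        # Don't exceed link capacity.
        if sum(BITRATE[o] * serves[l, o] for o in BITRATE) > cap:
            return False
        if sender in SOURCES:
            continue
        # Sender must have received the video on some incoming link.
        for o in BITRATE:
            if sum(serves[k, o] for k in in_links(l)) < serves[l, o]:
                return False
    return True


def objective(serves):
    quality = sum(PRIORITY[o] * REQUEST.get((l, o), 0) * serves[l, o]
                  for l in CLIENT_LINKS for o in BITRATE)
    cost = sum(LINKS[l][3] * BITRATE[o] * serves[l, o]
               for l in LINKS for o in BITRATE)
    return W_S * quality - W_C * cost


def solve():
    """Enumerate every 0/1 assignment and keep the best feasible one."""
    keys = [(l, o) for l in LINKS for o in BITRATE]
    best, best_val = None, float("-inf")
    for bits in product((0, 1), repeat=len(keys)):
        serves = dict(zip(keys, bits))
        if feasible(serves):
            val = objective(serves)
            if val > best_val:
                best, best_val = serves, val
    return best, best_val


best, best_val = solve()
```

In this toy instance the client-facing link cannot carry both videos (300 + 400 exceeds 500), so the search serves only the cheaper one end to end. Enumeration grows as 2^(|L|·|O|), which is one way to see why the decision plane becomes slow at scale.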
Centralized Optimization
[Plot: Service Quality. Avg. Bitrate (Kbps) vs. # of Videos (Thousands); series: Optimal, CDN.]
[Plot: Delivery Cost (per request). CDN: 2.0x; OPTIMAL: 1.0x.]
Simulation using Conviva traces, modeling user-generated content
Simulation using Conviva traces, modeling large sports events
Effects of Latency in Decision Plane
[Plot: Join Time (Seconds) vs. # of videos for Fully Centralized control, under Light, Med., and Hvy. Load.]
Slow join times!
Experiments on EC2 nodes with a centralized controller at CMU across the Internet
Problems with Centralization
[Diagram: the same topology (Video Sources A, B; Reflector Clusters C, D; Edge Clusters E, F, G; Clients H, I, J) with a request for Video 1 (800). Video 1 control traffic to and from the Central Controller crosses the Internet: HIGH LATENCY in both directions.]
Outline
- Problems with Live Video Today
  - Centralized Control: slow join times
  - Distributed Control
- Putting it all Together
  - Hybrid Control
Alternate Approach: Distributed
[Diagram sequence (animation builds): the same topology (Video Sources A, B; Reflector Clusters C, D; Edge Clusters E, F, G; Clients H, I, J) with a request for Video 1 (800) and an undecided next hop ("?"). Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. The last builds show the Video 1 response (800) flowing back.]
Build “distance-to-video” tables at each cluster, for each video
DISTANCE AT CLUSTER F, VIDEO 1 (# OF HOPS TO VIDEO; PATH BOTTLENECK):
- VIA C: 2; (B, 1K)
- VIA D: 1; (D, 800)
An additional entry, 1; (B, 2K), appears in the early builds.
PICK SHORTEST PATH WITH ENOUGH CAPACITY
Distributed decisions fast (ms) but sub-optimal
Combine approaches? “Hybrid Control”
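The "pick shortest path with enough capacity" rule can be sketched as a lookup in the distance table at cluster F. The `Route` type, the `pick_route` function, and reading "1K" as 1000 in the same unit as the 800 request are assumptions made for illustration; the talk does not show an implementation.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    via: str              # next-hop cluster
    hops: int             # number of hops to the video
    bottleneck_link: str  # label of the tightest link on the path, as on the slide
    bottleneck: int       # capacity of that link (assumed same unit as the bitrate)


# Distance table at cluster F, copied from the slide ("1K" read as 1000).
DISTANCE_AT_F = {
    "video1": [
        Route(via="C", hops=2, bottleneck_link="B", bottleneck=1000),
        Route(via="D", hops=1, bottleneck_link="D", bottleneck=800),
    ],
}


def pick_route(table, video, bitrate) -> Optional[Route]:
    """Return the fewest-hop route whose bottleneck can carry `bitrate`."""
    candidates = [r for r in table.get(video, []) if r.bottleneck >= bitrate]
    return min(candidates, key=lambda r: r.hops, default=None)


choice = pick_route(DISTANCE_AT_F, "video1", 800)
```

For the 800 request on the slide, the one-hop route via D has just enough capacity, so it is chosen. A stream needing more than 800 would fall back to the two-hop route via C, and one needing more than 1000 would find no route at all. The decision uses only local table state, which is why it is fast but blind to other videos competing for the same links.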
Outline
- Problems with Live Video Today
  - Centralized Control: slow join times
  - Distributed Control: low bitrate
- Putting it all Together
  - Hybrid Control
Combining Approaches: Hybrid
[Diagram sequence (animation builds): the same topology (Video Sources A, B; Reflector Clusters C, D; Edge Clusters E, F, G; Clients H, I, J) with a request for Video 1 (800). Link capacity labels: 2K, 3K, 200, 2K, 800, 800, 500, 300, 750, 700, 1K, 1K. The Central Controller sits across the Internet (HIGH LATENCY in both directions). Legend: Video 1 data requests, Video 1 responses, Video 1 control traffic.]
Challenges of Hybrid Control
- Forwarding loops (TRIVIAL)
  - Always forward requests upwards
- State transitions (PRIOR WORK)
  - Versioning and “shadow FIBs”
- Avoid bad control loop interactions (CHALLENGING)
  - 1. Centralized decision has priority
  - 2. Distributed uses residual after centralized
  - 3. Distributed has no impact on current/future centralized decisions
  - 4. Distributed’s changes don’t propagate
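The first three rules for avoiding bad control loop interactions can be illustrated with a per-link bookkeeping sketch. The `LinkState` class and its methods are hypothetical names, and the eviction policy on a new central plan is an assumption. Rule 4 (distributed changes don't propagate) concerns what clusters advertise to each other and is not modeled here.

```python
class LinkState:
    """Hypothetical per-link state kept at a cluster under hybrid control."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.central = {}      # video -> bitrate, installed by the central controller
        self.distributed = {}  # video -> bitrate, admitted by local decisions

    def residual(self):
        # Rule 2: distributed control only sees what centralized left over.
        return self.capacity - sum(self.central.values())

    def admit_distributed(self, video, bitrate):
        if video in self.central:
            return True  # Rule 1: the centralized decision already covers it
        free = self.residual() - sum(self.distributed.values())
        if bitrate <= free:
            self.distributed[video] = bitrate
            return True
        return False

    def install_central(self, assignment):
        # Rule 3: the central plan is checked against raw capacity only,
        # never against streams that distributed control admitted.
        if sum(assignment.values()) > self.capacity:
            raise ValueError("central plan exceeds link capacity")
        self.central = dict(assignment)
        # Rule 1 (assumed policy): evict distributed streams that are now
        # duplicated by the plan or no longer fit in the residual.
        budget = self.residual()
        kept = {}
        for video, rate in self.distributed.items():
            if video not in self.central and rate <= budget:
                kept[video] = rate
                budget -= rate
        self.distributed = kept


link = LinkState(capacity=1000)
link.install_central({"v1": 600})
first = link.admit_distributed("v2", 300)   # fits in the 400 left over
second = link.admit_distributed("v3", 300)  # only 100 left, rejected
link.install_central({"v1": 600, "v4": 300})
```

After the second central plan, only 100 units of residual remain, so the locally admitted stream is dropped rather than allowed to crowd out the centralized decision.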
Hybrid Control and Responsiveness
[Plot: Join Time (Seconds) vs. # of videos under Light, Med., and Hvy. Load; series: Hybrid Control, Fully Centralized, Fully Distributed.]
- Fully Centralized: slow join times!
- Fully Distributed: not stable
- Hybrid Control: great join times and more stable
Experiments on EC2 nodes with a centralized controller at CMU across the Internet
Conclusion
- We present a possible solution for combating decision plane latency
- Central Optimization: quality and cost management
- Distributed Control: responsiveness to joins and failures
- Hybrid Control: combines the two
Conclusion
- Traffic Engineering: solve with LP
- Live Video Delivery: solve with ILP?
- Solve with X?
- Solve with X?
Backup slides…
Problems with Traffic Engineering
[Diagram sequence (animation builds): the same topology (Video Sources A, B; Reflector Clusters C, D; Edge Clusters E, F, G; Clients H, I, J) with link capacity labels 1K, 2K, 3K, 200, 750, 2K, 300, 500, 300, 750, 700 and flows labeled 300, 200, 1.5K, 1.5K.]
- Even Split (1K)
- Uneven Split (1.5K / 500)
Distributed: Example of Sub-optimal
[Diagram sequence (animation builds): the same topology (Video Sources A, B; Reflector Clusters C, D; Edge Clusters E, F, G; Clients H, I, J) with requests for Video 1 (800) and two Video 1 responses (800, 800). Link capacity labels: 2K, 3K, 200, 2K, 2K, 800, 500, 300, 750, 700, 1K, 1K.]
Wasting bandwidth
Coordination difficult without centralization
Trace-Driven Eval
- 3 Traces
  - Avg Day: raw trace of music video provider
  - Large Event: synthesized basketball game
  - Heavy Tail: synthesized Twitch/Ustream-like workload
- 4 Systems
  - Everything Everywhere: all videos to all servers
  - Overlay Multicast: globally optimal; no coordination
  - CDN: greedy distribution scheme w/ DNS
  - VDN: our system
Existing Solutions
- Traffic Engineering (SWAN, B4, …)
  - Works on aggregates at coarse timescales
- Overlay Multicast (Overcast, Bullet, …)
  - Not designed for coordinating across streams
- Modern CDNs
  - Previous work shows a centralized system