Eliminating Adverse Control Plane Interactions in Independent - - PowerPoint PPT Presentation


Eliminating Adverse Control Plane Interactions in Independent Network Systems. Matthew K. Mukerjee. Computer Science PhD Thesis Defense, May 1st, 2018.


slide-1
SLIDE 1

Eliminating Adverse Control Plane Interactions in Independent Network Systems

Matthew K. Mukerjee

Computer Science PhD Thesis Defense

May 1st, 2018

slide-2
SLIDE 2

Network Control

VM migration Routing CDN server selection Congestion Control

slide-3
SLIDE 3

Network Control

VM migration Routing CDN server selection Congestion Control

? ? ? ?

Coordination Coordination Coordination Coordination

slide-4
SLIDE 4

Source → Destination: Which way?

Control Plane
Routing: “figure out” best path (periodically computed)

Data Plane
Forwarding: data transmission (done per-packet)
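
A minimal sketch of the control plane / data plane split on this slide. Routing runs occasionally and produces a next-hop table; forwarding is a cheap per-packet lookup in that table. The topology, link costs, and the names `compute_routes` and `forward` are illustrative assumptions, not from the talk.

```python
import heapq

def compute_routes(graph, source):
    """Control plane: Dijkstra from `source`; returns {destination: next_hop}."""
    dist = {source: 0}
    next_hop = {}
    heap = [(0, source, None)]  # (cost so far, node, first hop on the path)
    while heap:
        cost, node, first = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        if first is not None:
            next_hop.setdefault(node, first)
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                hop = first if first is not None else neighbor
                heapq.heappush(heap, (new_cost, neighbor, hop))
    return next_hop

def forward(table, packet):
    """Data plane: per-packet lookup, no path computation."""
    return table.get(packet["dst"])

# Hypothetical four-node topology with link costs.
graph = {
    "S": {"A": 1, "B": 4},
    "A": {"S": 1, "B": 1, "D": 5},
    "B": {"S": 4, "A": 1, "D": 1},
    "D": {"A": 5, "B": 1},
}
table = compute_routes(graph, "S")       # recomputed periodically
hop = forward(table, {"dst": "D"})       # done for every packet
```

The table is rebuilt only when the control plane reruns, while `forward` stays a constant-time lookup.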

slide-5
SLIDE 5

Controller

Distributed

OSPF

Centralized

SDN

slide-6
SLIDE 6

Controller

Distributed

OSPF

Centralized

SDN

Quick failure response

Good at performance optimization

Bad at performance optimization

Slow failure response

slide-7
SLIDE 7

Controller

Distributed

OSPF

Centralized

SDN

Quick failure response

Good at performance optimization

Bad at performance optimization

Slow failure response

slide-8
SLIDE 8

Bad CDN server selection → ISP paying for costly routes

Coordination?

Cheap Expensive

CDN server A CDN server B

App TE + ISP TE

slide-9
SLIDE 9

User decisions TCP decisions Application decisions

Issues? Issues? Issues?

slide-10
SLIDE 10

Categorizing Control Coordination

App TE + ISP TE

Cheap Expensive

BGP + BGP

Expensive Cheap

OSPF ISP C OSPF ISP B OSPF ISP A

BGP Internet-scale Routing (BGP + OSPF) Coflow (App + DC scheduling) +

slide-11
SLIDE 11

Coflow (App + DC scheduling) + BGP + BGP

Expensive Cheap

Categorizing Control Coordination

App TE + ISP TE

OSPF ISP C OSPF ISP B OSPF ISP A

BGP

Cheap Expensive

Internet-scale Routing (BGP + OSPF)

Reaction Priority Ranking Transparency Hierarchical Partitioning

slide-12
SLIDE 12

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

VDX

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

slide-13
SLIDE 13

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

VDX

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

slide-14
SLIDE 14

Difficult to scale datacenters with demand

P-FatTree: A multi-channel datacenter network topology. HotNets 2016.

Higher Bandwidth

+

Higher Port Count

CMOS limits…

slide-15
SLIDE 15

Use circuits to build bigger + faster networks!

+

Reconfigurable Datacenter Networks (RDCNs)

Packet Switch Circuit Switch

slide-16
SLIDE 16

Network Scheduling End-to-End Challenges

Circuit Switch

Circuit Switch Design

What existing things break? How do you make use of it? How do you physically build it?

slide-17
SLIDE 17

Packet Network

Packet Switch

RDCN switch design

Challenge: Workloads

Rack 1 Server 1 Server 2 Server M

ToR Switch

Packet Switch Circuit Switch

Rack 2 Server 1 Server 2 Server M

ToR Switch

Rack N Server 1 Server 2 Server M

ToR Switch

… … …

App Demand

Circuit Switch
slide-18
SLIDE 18

Packet Network

Packet Switch

RDCN switch design

Challenge: Workloads

Rack 1 Server 1 Server 2 Server M

ToR Switch

Packet Switch Circuit Switch

Rack 2 Server 1 Server 2 Server M

ToR Switch

Rack N Server 1 Server 2 Server M

ToR Switch

App Demand

Circuit Switch

… … …

slide-19
SLIDE 19

Rack 1

Packet Switch Circuit Switch

Rack 2 Rack N

1 —> 2 1 —> 3 1 —> N 1 —> 2 2 —> 3 2 —> 6 2 —> N 2 —> 5 N —> 4 N —> 1 N —> 2 N —> 7

RDCN scheduling

slide-20
SLIDE 20

Rack 1

Packet Switch Circuit Switch

Rack 2 Rack N

1 —> 2 1 —> 3 1 —> N 1 —> 2 2 —> 3 2 —> 6 2 —> N 2 —> 5 N —> 4 N —> 1 N —> 2 N —> 7

RDCN Scheduling Algorithm (e.g., Solstice)

RDCN scheduling

slide-21
SLIDE 21

Rack 1

Packet Switch Circuit Switch

Rack 2 Rack N

1 —> 2 1 —> 3 1 —> N 1 —> 2 2 —> 3 2 —> 6 2 —> N 2 —> 5 N —> 4 N —> 1 N —> 2 N —> 7

RDCN Scheduling Algorithm (e.g., Solstice) For Circuit Switch For Packet Switch

RDCN scheduling

slide-22
SLIDE 22

1 —> 2 1 —> 3 1 —> N 1 —> 2 2 —> 3 2 —> 6 2 —> N 2 —> 5 N —> 4 N —> 1 N —> 2 N —> 7

RDCN Scheduling Algorithm (e.g., Solstice) For Circuit Switch For Packet Switch

Rack 1

Packet Switch Circuit Switch

Rack 2 Rack N

RDCN scheduling

slide-23
SLIDE 23

Config 1 1 —> 2 2 —> 3 … N —> 4 RECONFIG DELAY

Circuit Schedule:

300μs Time (μs) 20μs

Config 2 1 —> N 2 —> 6 … N —> 2 Config 3 1 —> 3 2 —> N … N —> 1 RECONFIG DELAY RECONFIG DELAY

180μs 20μs 20μs 518μs

1 → 2 1 → 3 1 → N 1 → 2 2 → 3 2 → 6 2 → N 2 → 5 N → 4 N → 1 N → 2 N → 7 RDCN Scheduling Algorithm (e.g., Solstice) For Circuit Switch For Packet Switch

Rack 1

Packet Switch Circuit Switch

Rack 2 Rack N

RDCN scheduling
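
A rough sketch of what an RDCN scheduler does with rack-to-rack demand: it picks a sequence of circuit configurations (each rack sends on at most one circuit and receives on at most one) and leaves the remaining demand for the packet switch. This is a simple greedy heuristic for illustration only. It is not Solstice, it ignores configuration durations and the reconfiguration delay, and the name `greedy_circuit_schedule` and the demand values are assumptions.

```python
def greedy_circuit_schedule(demand, max_configs):
    """Split rack-to-rack demand into circuit configurations plus leftovers.

    demand: {(src_rack, dst_rack): bytes_waiting}
    Returns (configs, leftover): configs is a list of {src: dst} matchings
    for the circuit switch; leftover is the demand for the packet switch.
    """
    remaining = {pair: size for pair, size in demand.items() if size > 0}
    configs = []
    for _ in range(max_configs):
        matching, busy_dst = {}, set()
        # Largest demands first; each rack may send on one circuit and
        # receive on one circuit per configuration.
        for (src, dst), _size in sorted(remaining.items(), key=lambda kv: -kv[1]):
            if src not in matching and dst not in busy_dst:
                matching[src] = dst
                busy_dst.add(dst)
        if not matching:
            break
        # Simplification: a configuration is held until its pairs drain.
        for src, dst in matching.items():
            del remaining[(src, dst)]
        configs.append(matching)
    return configs, remaining

# Hypothetical demand between three racks.
demand = {(1, 2): 300, (2, 3): 200, (1, 3): 100, (3, 1): 50}
configs, leftover = greedy_circuit_schedule(demand, max_configs=1)
```

With a single configuration the three largest compatible pairs get circuits and the `1 -> 3` demand falls back to the packet switch.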

slide-24
SLIDE 24

Contributions

End-to-End Challenges

Challenge: Demand Estimation Challenge: Workloads Challenge: BW Fluct.

Solution: Endhost-based Estimation Solution: Dynamic Buffer Resizing Solution: App-specific Modification

>_

Etalon, an RDCN Emulator

slide-25
SLIDE 25

Overview

End-to-End Challenges

Challenge: Demand Estimation Challenge: Workloads Challenge: BW Fluct.

Solution: Endhost-based Estimation Solution: Dynamic Buffer Resizing Solution: App-specific Modification

>_

Etalon, an RDCN Emulator

slide-26
SLIDE 26

Circuit Switch Packet Switch Circuit Switch Sender Receiver High BW Low BW ToR Queue ToR Queue

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

TCP and rapid bw fluctuations

Time (μs) BW

slide-27
SLIDE 27

Circuit Switch Packet Switch Sender Receiver High BW Low BW ToR Queue ToR Queue

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

TCP and rapid bw fluctuations

Time (μs) BW

slide-28
SLIDE 28

Packet Switch

Sender Receiver Circuit Switch High BW Low BW ToR Queue ToR Queue

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

TCP and rapid bw fluctuations

Time (μs) BW

slide-29
SLIDE 29

Packet Switch

Sender Receiver Circuit Switch High BW Low BW ToR Queue ToR Queue

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

TCP and rapid bw fluctuations

Time (μs) BW

slide-30
SLIDE 30

Packet Switch

Sender Receiver Circuit Switch High BW Low BW ToR Queue ToR Queue

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

Circuit Switch

TCP and rapid bw fluctuations

Time (μs) BW

slide-31
SLIDE 31

Circuit Switch Packet Switch Circuit Switch Sender Receiver High BW Low BW ToR Queue ToR Queue

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

TCP and rapid bw fluctuations

Time (μs) BW

slide-32
SLIDE 32

Circuit Switch Packet Switch Circuit Switch Sender Receiver High BW Low BW ToR Queue ToR Queue

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

TCP and rapid bw fluctuations

Time (μs) BW Time (μs) BW

slide-33
SLIDE 33

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

TCP and rapid bw fluctuations

Time (μs) BW Time (μs) BW

What we want What we get

slide-34
SLIDE 34

Packet Switch

Sender Receiver Circuit Switch High BW Low BW ToR Queue ToR Queue

SMALL

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

Latency

TCP and rapid bw fluctuations

Time (μs) BW

slide-35
SLIDE 35

Packet Switch

Sender Receiver Circuit Switch High BW Low BW ToR Queue ToR Queue

BIG

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

Latency

TCP and rapid bw fluctuations

Time (μs) BW

slide-36
SLIDE 36

Circuit Switch

TCP and rapid bw fluctuations

Packet Switch Circuit Switch

Sender Receiver ToR Queue ToR Queue High BW Low BW

BIG

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

Bandwidth Time (μs) BW How Early?

slide-37
SLIDE 37

Static buffers provide good circuit util or latency

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

[Plot: avg. circuit utilization vs. buffer size (packets); x-axis ticks 4, 8, 16, 32, 64, 128; bar labels 99, 92, 61, 39, 25, 16. Annotation: low utilization with SMALL buffers.]

slide-38
SLIDE 38

Static buffers provide good circuit util or latency

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

[Plot: avg. circuit utilization vs. buffer size (packets); x-axis ticks 4, 8, 16, 32, 64, 128; bar labels 99, 92, 61, 39, 25, 16.]

[Plot: median latency (μs), y-axis 0 to 600, vs. buffer size (packets), SMALL to BIG. Annotation: high latency with BIG buffers.]

slide-39
SLIDE 39

Buffer resize provides good circuit util and latency

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

[Plot: avg. circuit utilization vs. early buffer resize (μs); x-axis ticks 200 to 1400 in steps of 200; bar labels 100, 100, 88, 79, 65, 56, 52, 39.]

slide-40
SLIDE 40

Buffer resize provides good circuit util and latency

Challenge: BW Fluct.

Solution: Dynamic Buffer Resizing

[Plot: avg. circuit utilization vs. early buffer resize (μs); x-axis ticks 200 to 1400 in steps of 200; bar labels 100, 100, 88, 79, 65, 56, 52, 39. Annotation: 2x increase in utilization.]

[Plot: median latency (μs), y-axis 0 to 600, vs. early buffer resize (μs), x-axis up to 1400. Annotation: steady latency.]
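
A minimal sketch of the dynamic buffer resizing idea: keep the ToR queue small most of the time (low latency on the packet network) and enlarge it a configurable interval before a circuit comes up, so TCP has time to ramp up and fill the circuit. The function name, the 16 and 128 packet sizes, and the circuit window are illustrative assumptions chosen from the axis ranges on these slides, not the emulator's actual code.

```python
def buffer_size_at(time_us, circuit_windows, early_us,
                   small_pkts=16, big_pkts=128):
    """Return the ToR buffer size (packets) to use at `time_us`.

    circuit_windows: list of (start_us, end_us) intervals when the circuit
    for this queue is up. The buffer is enlarged `early_us` microseconds
    before each window starts and shrunk again when the window ends.
    """
    for start_us, end_us in circuit_windows:
        if start_us - early_us <= time_us < end_us:
            return big_pkts
    return small_pkts

# Hypothetical schedule: one circuit window, resize 600 μs early.
windows = [(2000, 2300)]
sizes = [buffer_size_at(t, windows, early_us=600) for t in (1000, 1400, 2100, 2300)]
```

Setting `early_us=0` reduces this to resizing exactly at circuit start. The plots vary this early-resize interval along the x-axis.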

slide-41
SLIDE 41

Overview

End-to-End Challenges

Challenge: Demand Estimation Challenge: Workloads Challenge: BW Fluct.

Solution: Endhost-based Estimation Solution: Dynamic Buffer Resizing Solution: App-specific Modification

>_

Etalon, an RDCN Emulator

slide-42
SLIDE 42

Difficult to schedule workloads

Name Node Rack A

Data Node Data Node Data Node Data Node

Rack B

Data Node Data Node Data Node Data Node

Rack C

Data Node Data Node Data Node Data Node

Rack D

Data Node Data Node Data Node Data Node

1 1 1 2 2 2 3 3 3

Challenge: Workloads

Solution: App-specific Modification

slide-43
SLIDE 43

Difficult to schedule workloads

Name Node Rack A

Data Node Data Node Data Node Data Node

Rack B

Data Node Data Node Data Node Data Node

Rack C

Data Node Data Node Data Node Data Node

Rack D

Data Node Data Node Data Node Data Node

1 1 1 2 2 2 3 3 3

Challenge: Workloads

Solution: App-specific Modification

slide-44
SLIDE 44

workloads

Config 1 A —> B B —> C C —> D D —> A Config 2 A —> C B —> D C —> A D —> B Config 3 A —> D B —> A C —> B D —> C RECONFIG DELAY RECONFIG DELAY RECONFIG DELAY

Schedule:

Name Node Rack A

Data Node Data Node Data Node Data Node

Rack B

Data Node Data Node Data Node Data Node

Rack C

Data Node Data Node Data Node Data Node

Rack D

Data Node Data Node Data Node Data Node

1 1 1 2 2 2 3 3 3

slide-45
SLIDE 45

Config 1 A —> B B —> C C —> D D —> A Config 2 A —> C B —> D C —> A D —> B Config 3 A —> D B —> A C —> B D —> C RECONFIG DELAY RECONFIG DELAY RECONFIG DELAY

Schedule:

Name Node Rack A

Data Node Data Node Data Node Data Node

Rack B

Data Node Data Node Data Node Data Node

Rack C

Data Node Data Node Data Node Data Node

Rack D

Data Node Data Node Data Node Data Node

1 1 1 2 2 2 3 3 3

slide-46
SLIDE 46

reHDFS

Config 1 A —> C B —> D C —> A D —> B

Schedule:

Name Node Rack A

Data Node Data Node Data Node Data Node

Rack B

Data Node Data Node Data Node Data Node

Rack C

Data Node Data Node Data Node Data Node

Rack D

Data Node Data Node Data Node Data Node

1 1 1 2 2 2 3 3 3

Challenge: Workloads

Solution: App-specific Modification
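
The slides contrast an HDFS workload that needs three circuit configurations with a reHDFS workload served by one (A → C, B → D, C → A, D → B). One way to get that effect, sketched below, is to make cross-rack replica placement deterministic so that each rack always writes to the same remote rack. This is an inference from the slide; the talk does not spell out the reHDFS policy, and the function names are hypothetical.

```python
import random

RACKS = ["A", "B", "C", "D"]

def hdfs_remote_rack(writer_rack, rng):
    """Default-style placement: any rack other than the writer's."""
    return rng.choice([rack for rack in RACKS if rack != writer_rack])

def rehdfs_remote_rack(writer_rack):
    """Fixed pairing matching the slide: A->C, B->D, C->A, D->B."""
    index = RACKS.index(writer_rack)
    return RACKS[(index + len(RACKS) // 2) % len(RACKS)]

def distinct_rack_pairs(placements):
    """Set of (writer rack, remote rack) pairs that need a circuit."""
    return set(placements)

rng = random.Random(0)
writers = [rng.choice(RACKS) for _ in range(200)]
default_pairs = distinct_rack_pairs((w, hdfs_remote_rack(w, rng)) for w in writers)
rehdfs_pairs = distinct_rack_pairs((w, rehdfs_remote_rack(w)) for w in writers)
```

With the fixed pairing, the distinct rack pairs form a single permutation, so one circuit configuration can carry all cross-rack writes without reconfiguration delays.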

slide-47
SLIDE 47

reHDFS reduces tail latency

9x decrease in write time

Challenge: Workloads

Solution: App-specific Modification

[Plot: CDF (%) of HDFS write completion time (ms), x-axis up to 1500; series: HDFS, reHDFS.]

slide-48
SLIDE 48

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

VDX

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

slide-49
SLIDE 49

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

VDX

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

slide-50
SLIDE 50

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

VDX

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

slide-51
SLIDE 51

Traditional Content Delivery

CDN Client Content Provider (CP)

Content Legend:

slide-52
SLIDE 52

Changing Content Delivery

CDN Client Content Provider (CP)

Content Legend:

Client Client CDN

slide-53
SLIDE 53

Brokered Content Delivery

CDN Content Provider (CP)

Content Legend:

Broker

Control

Client Client Client CDN

slide-54
SLIDE 54

Brokered Content Delivery

Content Provider (CP)

Content Legend:

Broker

Control

Client Client Client CDN CDN

Easier for CPs to meet performance and cost goals

slide-55
SLIDE 55

Brokered Content Delivery

Content Provider (CP)

Content Legend:

Broker

Control

Client Client Client B B B CDN CDN

Brokers select “best” CDN for clients to minimize cost and meet performance goals

slide-56
SLIDE 56

Brokered Content Delivery

Content Provider (CP)

Content Legend:

Broker

Control

Client Client Client B B B CDN CDN

How do brokers and CDNs impact each other? (this talk)

slide-57
SLIDE 57

Contributions

  • Identify challenges that brokers and CDNs create for each other by analyzing data from both
  • Examine the design space of CDN-broker interfaces
  • Evaluate the efficacy of different designs
slide-58
SLIDE 58

CDN Cost and Pricing

CDN Client Content Provider (CP)

Legend: Content

Internal Costs: Bandwidth (mostly)

slide-59
SLIDE 59

CDN Cost and Pricing

CDN Client Content Provider (CP)

Legend: Content

Internal Costs: Bandwidth (mostly)

Do bandwidth costs differ across geographic regions?

slide-60
SLIDE 60

CDN Cost / Byte Delivered

30x

difference in cost per byte between the most expensive and least expensive countries

slide-61
SLIDE 61

CDN Internal Cost

CDN Y CDN X CDN X CDN X

slide-62
SLIDE 62

CDN Internal Cost

CDN Y $ CDN X $ CDN X $ CDN X $$$$

slide-63
SLIDE 63

CDN External Price

CDN Y $ CDN X $ CDN X $ CDN X $$$$

Content Provider (CP)

CDN Y CDN X

CDN Pricing

$$ $$$

Client Client Client Client Client Client Client

slide-64
SLIDE 64

CDN External Price

CDN Y $

Client Client Client Client Client Client Client

CDN X $ CDN X $ CDN X $$$$

Content Provider (CP)

CDN Y CDN X

CDN Pricing

$$ $$$

CDN Y makes money, CDN X loses money
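
A small worked example of why flat pricing can leave one CDN profitable and another losing money. Each "$" on the slide is treated as one cost unit; the traffic split (five units served by CDN Y, two by CDN X's expensive cluster) is a made-up assumption, as is the function name.

```python
def flat_price_profit(price_per_unit, deliveries):
    """Profit under a single flat price.

    deliveries: list of (cluster_cost_per_unit, units_served) pairs, one per
    cluster that actually served traffic.
    """
    return sum((price_per_unit - cost) * units for cost, units in deliveries)

# CDN Y: flat price $$ (2), its cluster costs $ (1) per unit.
cdn_y_profit = flat_price_profit(2, [(1, 5)])

# CDN X: flat price $$$ (3), but the broker only sends it the clients near
# its $$$$ (4) cluster; its cheap clusters see no traffic.
cdn_x_profit = flat_price_profit(3, [(4, 2)])
```

Under these assumed numbers CDN Y earns 5 units and CDN X loses 2. The flat price averages over all clusters, but the traffic a broker sends is not averaged.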

slide-65
SLIDE 65

Client Client Client Client Client Client Client

CDN External Price

CDN Y $ CDN X $ CDN X $ CDN X $$$$

Content Provider (CP)

CDN Y CDN X

CDN Pricing

$$ $$$

Do we see traffic patterns like this at the country level?

slide-66
SLIDE 66

Country Level Traffic

[Plot: % used in country (0 to 100) for countries 1 to 15 (anonymized); series: CDN A, CDN B, CDN C, Other.]

slide-67
SLIDE 67

Country Level Traffic

[Plot: % used in country (0 to 100) for countries 1 to 15 (anonymized); series: CDN A, CDN B, CDN C, Other.]

slide-68
SLIDE 68

Country Level Traffic

[Plot: % used in country (0 to 100) for countries 1 to 15 (anonymized); series: CDN A, CDN B, CDN C, Other.]

slide-69
SLIDE 69

Country Level Traffic

[Plot: % used in country (0 to 100) for countries 1 to 15 (anonymized); series: CDN A, CDN B, CDN C, Other.]

Flat pricing makes CDN profits unpredictable with brokers

  • Country 8 costly → CDN B loses money!
  • Country 7 cheap → CDN A profits!

slide-70
SLIDE 70

Contributions

  • Identify challenges that brokers and CDNs create for each other by analyzing data from both
  • Examine the design space of CDN-broker interfaces
  • Evaluate the efficacy of different designs
slide-71
SLIDE 71

Brokered Delivery Today

CDN Content Provider (CP) Broker

Content Legend: Control

Client Client Client CDN

slide-72
SLIDE 72

Brokered Delivery Today

CDN Content Provider (CP) Broker Client Client Client CDN

slide-73
SLIDE 73

Brokered Delivery Today

CDN Content Provider (CP) Broker Client Client Client CDN

slide-74
SLIDE 74

Brokered Delivery Today

CDN Content Provider (CP) Broker Client Client Client CDN

slide-75
SLIDE 75

Brokered Delivery Today

CDN Content Provider (CP) Broker Client Client Client CDN

slide-76
SLIDE 76

Brokered Delivery Today

CDN Content Provider (CP) Broker Client Client Client CDN

slide-77
SLIDE 77

Brokered Delivery Today

CDN Broker Client Client Client CDN

slide-78
SLIDE 78

Brokered Delivery Today

CDN Broker Client Client Client CDN

Latency & loss Measurements ISP, device type, location, …

slide-79
SLIDE 79

Brokered Delivery Today

CDN Broker Client Client Client CDN

Which cluster to receive from Which CDN to use

slide-80
SLIDE 80

Brokered Delivery Today

CDN Broker Client Client Client CDN

CDN

slide-81
SLIDE 81

VDX

CDN Broker Client Client Client CDN

CDN

1 2 3

slide-82
SLIDE 82

Example

Client

CDN Y $

Client Client Client Client Client Client

CDN X $ CDN X $ CDN X $$$$

Content Provider (CP)

CDN Y CDN X

CDN Pricing

$$ $$$

slide-83
SLIDE 83

Example

Client

CDN Y $

Client Client Client Client Client Client

CDN X $ CDN X $ CDN X $$$$

Content Provider (CP)

CDN X

CDN Pricing

$$ $$$$

CDN X

CDN Y $$

slide-84
SLIDE 84

Example

Client

CDN Y $

Client Client Client Client Client Client

CDN X $ CDN X $ CDN X $$$$

Content Provider (CP)

CDN X

CDN Pricing

$$ $$$$

CDN X

CDN Y $$

slide-85
SLIDE 85

Example

Client

CDN Y $

Client Client Client Client Client Client

CDN X $ CDN X $ CDN X $$$$

Content Provider (CP)

CDN X

CDN Pricing

$$ $$$$

CDN X

CDN Y $$

CDN X can compete with other CDNs across regions
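
A sketch of the marketplace idea in this example: instead of one flat price per CDN, each CDN offers a price per cluster, and the broker picks the cheapest offer that still meets the performance goal. The `Bid` type, the latency threshold, and all numbers are illustrative assumptions, not the VDX interface itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bid:
    cdn: str
    cluster: str
    price: float        # price per unit delivered from this cluster
    latency_ms: float   # expected performance for the client in question

def pick_bid(bids, max_latency_ms):
    """Cheapest bid that meets the performance goal, or None if none does."""
    acceptable = [bid for bid in bids if bid.latency_ms <= max_latency_ms]
    return min(acceptable, key=lambda bid: bid.price) if acceptable else None

# Hypothetical per-cluster bids for one client.
bids = [
    Bid("CDN X", "x-cheap", price=2, latency_ms=40),
    Bid("CDN X", "x-costly", price=4, latency_ms=15),
    Bid("CDN Y", "y-1", price=3, latency_ms=25),
]
choice = pick_bid(bids, max_latency_ms=30)
```

Because the costly cluster carries its own higher price, CDN X no longer loses money when the broker selects it, and its cheap clusters can undercut other CDNs where they perform well enough.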
slide-86
SLIDE 86

Evaluation

  • Simulator using data from a broker & CDN, as well as public data from 13 other CDNs
  • CDN data provides cluster locations, cluster-to-client performance, delivery costs, etc.
  • Broker data provides client locations, request distributions, etc.

slide-87
SLIDE 87

Per-CDN Profits

[Plot: profit for each CDN (1 to 14); series: Brokered (today), VDX.]

slide-88
SLIDE 88

Per-CDN Profits

[Plot: profit for each CDN (1 to 14); series: Brokered (today), VDX.]

slide-89
SLIDE 89

Evaluation Takeaways

  • Today’s world (Brokered) is pretty broken (performance can be better; most CDNs lose money on brokered video delivery)
  • Marketplace (VDX) fixes this by exposing clusters and cost

slide-90
SLIDE 90

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

VDX

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

slide-91
SLIDE 91

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

VDX

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

slide-92
SLIDE 92

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

Information Sharing

Some Full

VDX

slide-93
SLIDE 93

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet-scale Routing

Reaction

App TE + ISP TE

Priority Ranking

BGP + BGP

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

Information Sharing

Some Full

VDX

slide-94
SLIDE 94

Live Video is Becoming Wildly Popular

  • Commercial sports streams
  • User-generated streams
slide-95
SLIDE 95

Live Video is Becoming Wildly Popular

  • Commercial sports streams
  • Single World Cup stream = 40% global Internet traffic
  • User-generated streams (e.g., Twitch)
  • Users watch 150 billion minutes of live video per month
  • Amazon buys Twitch for ~$1 billion
slide-96
SLIDE 96

CDN Live Video Delivery Background

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

Video Requests

HTTP GET, HTTP RESPONSE

Legend
Requests: Video 1, Video 2
Responses: Video 1, Video 2

slide-97
SLIDE 97

CDN Live Video Delivery Background

A B

Video Sources

E F G

Edge Clusters DNS

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

1K

Link Capacity

2K 3K 200 750 2K 300 500 300 750 700

Link Cost

100 100 120 25 20 15 15 1 10 1
slide-98
SLIDE 98

CDN Live Video Delivery Background

A B

Video Sources

E F G

Edge Clusters DNS

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients Link Capacity Link Cost

Objective: Reasonable service quality & Minimal delivery cost

slide-99
SLIDE 99

Problems with CDNs Today

Service Quality

[Plot: avg. bitrate (Kbps), up to 8000, vs. # of videos (thousands), 2 to 10; series: Optimal, CDN.]

Simulation using Conviva traces, modeling user-generated content

Delivery Cost (per request)

CDN: 2.0x
OPTIMAL: 1.0x

Simulation using Conviva traces, modeling large sports events

slide-100
SLIDE 100

Problems with CDNs Today

QUANTITATIVE

Service Quality: [Plot: avg. bitrate (Kbps), up to 8000, vs. # of videos (thousands), 2 to 10; series: Optimal, CDN.]

Delivery Cost: CDN 2.0x, OPTIMAL 1.0x

QUALITATIVE

  • Not Fine-Grained: videos aggregated into large groups
  • Slow DNS Updates: can’t push updates; DNS entries get cached

slide-101
SLIDE 101

Solution?

QUANTITATIVE

Service Quality: [Plot: avg. bitrate (Kbps), up to 8000, vs. # of videos (thousands), 2 to 10; series: Optimal, CDN.]

Delivery Cost: CDN 2.0x, OPTIMAL 1.0x

QUALITATIVE

  • Not Fine-Grained: videos aggregated into large groups
  • Slow DNS Updates: can’t push updates; DNS entries get cached

Centralization!

[Liu, Xi, et al. A Case for a Coordinated Video Control Plane. SIGCOMM 2012]

slide-102
SLIDE 102

Outline

Centralized Control Distributed Control Problems with Live Video Today Putting it all Together Hybrid Control

slide-103
SLIDE 103

Motivating Centralized Optimization

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

1K

Link Capacity

2K 3K 200 750 2K 300 500 300 750 700 300 200

DNS

slide-104
SLIDE 104

Motivating Centralized Optimization

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

1K

Link Capacity

2K 3K 200 750 2K 300 500 300 750 700 300 200 300

DNS Congestion!

slide-105
SLIDE 105

Motivating Centralized Optimization

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

1K

Link Capacity

2K 3K 200 750 2K 300 500 300 750 700 300 200 300 200

DNS

slide-106
SLIDE 106

Motivating Centralized Optimization

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients Link Capacity

500 300 750 700 300 200 300 200

DNS

Needs global view to coordinate videos and network resources

slide-107
SLIDE 107

Unfortunately… No Free Lunch

[Plot: join time (seconds), 5 to 25, vs. # of videos, 50 to 200; labels: Light Load, Med. Load, Hvy. Load; series: Fully Centralized.]

Slow join times!

Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-108
SLIDE 108

Outline

Centralized Control Distributed Control Problems with Live Video Today Putting it all Together Hybrid Control Slow join times

slide-109
SLIDE 109

Alternate Approach: Distributed

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

800

Legend Video 1 Data Requests: Link Capacity

2K 3K 200 2K 800 800 500 300 750 700 1K 1K

Video 1 Responses:

800

?

slide-110
SLIDE 110

Alternate Approach: Distributed

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

800

Central Controller Legend Video 1 Data Requests: Link Capacity

2K 3K 500 300 750 700 1K 1K

Video 1 Responses:

800

?

Build “distance-to-video” tables at each cluster, for each video

slide-111
SLIDE 111

Alternate Approach: Distributed

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

800

Central Controller Legend Video 1 Data Requests: Link Capacity

2K 3K 200 2K 800 800 500 300 750 700 1K 1K

Video 1 Responses:

800 800

DISTANCE AT CLUSTER F
VIDEO 1:
VIA C: 2; (B, 1K)
VIA D: 1; (D, 800)

PICK SHORTEST PATH WITH ENOUGH CAPACITY
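
A minimal sketch of the distributed rule on this slide: each cluster keeps a per-video "distance-to-video" table and picks the shortest path that still has enough capacity. Reading the pair in parentheses as (bottleneck, capacity) is an assumption, as are the function name and the demand values.

```python
def pick_next_hop(table, required_capacity):
    """Pick the next hop for one video at one cluster.

    table: list of (via, distance, bottleneck_capacity) entries, in the same
    units as the slide (for example 1000 for "1K").
    Returns the `via` of the shortest entry with enough capacity, or None.
    """
    feasible = [entry for entry in table if entry[2] >= required_capacity]
    if not feasible:
        return None
    return min(feasible, key=lambda entry: entry[1])[0]

# Cluster F's table for Video 1, transcribed from the slide.
table_f_video1 = [("C", 2, 1000), ("D", 1, 800)]
next_hop = pick_next_hop(table_f_video1, required_capacity=900)
```

The decision uses only local state, which is why it is fast, and also why it can be sub-optimal: no cluster sees the global demand.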

slide-112
SLIDE 112

Alternate Approach: Distributed

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

800

Central Controller Legend Video 1 Data Requests: Link Capacity

2K 3K 500 300 750 700 1K 1K

Video 1 Responses:

800 800

DISTANCE AT CLUSTER F
VIDEO 1:
VIA C: 2; (B, 1K)
VIA D: 1; (D, 800)

Distributed decisions fast (ms) but sub-optimal

PICK SHORTEST PATH WITH ENOUGH CAPACITY

slide-113
SLIDE 113

Alternate Approach: Distributed

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

800

Central Controller Legend Video 1 Data Requests: Link Capacity

2K 3K 500 300 750 700 1K 1K

Video 1 Responses:

800 800

DISTANCE AT CLUSTER F
VIDEO 1:
VIA C: 2; (B, 1K)
VIA D: 1; (D, 800)

Combine approaches? “Hybrid Control”

PICK SHORTEST PATH WITH ENOUGH CAPACITY

slide-114
SLIDE 114

Outline

Centralized Control Distributed Control Problems with Live Video Today Putting it all Together Hybrid Control Slow join times Low bitrate

slide-115
SLIDE 115

Combining Approaches: VDN

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

800

Legend Video 1 Data Requests:

2K 3K 200 2K 800 800 500 300 750 700 1K 1K

Video 1 Responses:

800

? Central Controller The Internet HIGH LATENCY HIGH LATENCY

slide-116
SLIDE 116

Combining Approaches: VDN

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

800

Legend Video 1 Data Requests:

2K 3K 200 2K 800 800 500 300 750 700 1K 1K

Video 1 Responses:

800

Central Controller The Internet HIGH LATENCY HIGH LATENCY

800
slide-117
SLIDE 117

Combining Approaches: VDN

A B

Video Sources

E F G

Edge Clusters

C D

Reflector Clusters

Control ▶︎ ◀ Data

H I J

Clients

800

Legend Video 1 Data Requests:

2K 3K 200 2K 800 800 500 300 750 700 1K 1K

Video 1 Responses:

800

Central Controller The Internet HIGH LATENCY HIGH LATENCY

800

Video 1 Control Traffic:

slide-118
SLIDE 118
Challenges of Hybrid Control

  • Forwarding loops (TRIVIAL)
      • Always forward requests upwards
  • State transitions (PRIOR WORK)
      • Versioning and “shadow FIBs”
  • Avoid bad control loop interactions (CHALLENGING)

slide-119
SLIDE 119

Challenges of Hybrid Control

CHALLENGING

  • Avoid bad control loop interactions
  • 1. Centralized decision has priority
  • 2. Distributed uses slack in network
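
A sketch of how the two rules could combine at a single cluster: use the central controller's entry when one exists, and otherwise fall back to the local distance table, restricted to next hops that still have slack. All names and data structures here are hypothetical; this is not VDN's implementation.

```python
def choose_next_hop(video, central_fib, distance_table, slack, demand):
    """Hybrid decision for one video request at one cluster.

    central_fib:    {video: next_hop} installed by the central controller
    distance_table: {video: {next_hop: distance}} built locally
    slack:          {next_hop: capacity not reserved by central decisions}
    demand:         capacity the video needs
    """
    # 1. Centralized decision has priority.
    if video in central_fib:
        return central_fib[video]
    # 2. Distributed control only uses slack in the network.
    candidates = [
        (distance, hop)
        for hop, distance in distance_table.get(video, {}).items()
        if slack.get(hop, 0) >= demand
    ]
    return min(candidates)[1] if candidates else None

central_fib = {"video1": "C"}
distance_table = {"video1": {"C": 2, "D": 1}, "video2": {"C": 2, "D": 1}}
slack = {"C": 1000, "D": 300}
hop_central = choose_next_hop("video1", central_fib, distance_table, slack, 800)
hop_local = choose_next_hop("video2", central_fib, distance_table, slack, 800)
```

A new video gets an immediate local answer, while the controller's later decision simply overrides it, so the two control loops do not compete for the same capacity.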

slide-120
SLIDE 120

Hybrid Control and Responsiveness

[Plot: join time (seconds), 5 to 25, vs. # of videos, 50 to 200; labels: Light Load, Med. Load, Hvy. Load; series: Hybrid Control, Fully Centralized, Fully Distributed.]

Annotation: Slow join times!

Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-121
SLIDE 121

Hybrid Control and Responsiveness

[Plot: join time (seconds), 5 to 25, vs. # of videos, 50 to 200; labels: Light Load, Med. Load, Hvy. Load; series: Hybrid Control, Fully Centralized, Fully Distributed.]

Annotations: Slow join times! Not stable

Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-122
SLIDE 122

Hybrid Control and Responsiveness

[Plot: join time (seconds), 5 to 25, vs. # of videos, 50 to 200; labels: Light Load, Med. Load, Hvy. Load; series: Hybrid Control, Fully Centralized, Fully Distributed.]

Annotations: Slow join times! Not stable. Great join times and more stable

Experiments on EC2 nodes with a centralized controller at CMU across the Internet

slide-123
SLIDE 123

Control Coordination

Hierarchical Partitioning

VDN

Reaction

App TE + ISP TE

Scenario:

Scalability

Internet Routing

Transparency

Coflow

Etalon

Scenario:

Layering

Priority Ranking

Scenario:

Admin

BGP + BGP

VDX

Information Sharing

Some Full

slide-124
SLIDE 124

Control Coordination

Hierarchical Partitioning

VDN

Internet Routing

Reaction

App TE + ISP TE

Scenario:

Scalability

Information Sharing

Some Full

Transparency

Coflow

Etalon

Scenario:

Layering

Priority Ranking

Scenario:

Admin

BGP + BGP

VDX

slide-125
SLIDE 125

Control Coordination

Reaction

App TE + ISP TE

Transparency

Coflow

Etalon

Scenario:

Layering

Priority Ranking

Scenario:

Admin

BGP + BGP

VDX

Hierarchical Partitioning

VDN

Internet Routing

Scenario:

Scalability

Information Sharing

Some Full

slide-126
SLIDE 126

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet Routing

Reaction

App TE + ISP TE

Priority Ranking

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

Information Sharing

Some Full

BGP + BGP

VDX

Shared resources

Yes No

slide-127
SLIDE 127

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet Routing

Reaction

App TE + ISP TE

Priority Ranking

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

Information Sharing

Some Full

BGP + BGP

VDX

Shared resources

Yes No None

slide-128
SLIDE 128

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet Routing

Reaction

App TE + ISP TE

Priority Ranking

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

Information Sharing

Some Full

BGP + BGP

VDX

Shared resources

Yes No None

Route Redistribution Pytheas C3 (Conviva) OSPF Fibbing Bohatei Klein Wiser P4P (vanilla) DASH + HTTP + TCP OSPF Areas Congestion Control CC + AQM

slide-129
SLIDE 129

Future Work

  • Control theory / verification approach
  • Validating VDN
  • Extending VDX to multi-broker
  • Principled approach to reconfigurable datacenters
  • Network / endhost co-design
  • e.g., network-aware applications
slide-130
SLIDE 130

Control Coordination

Hierarchical Partitioning Transparency

Coflow

VDN

Internet Routing

Reaction

App TE + ISP TE

Priority Ranking

Etalon

Scenario:

Scalability

Scenario:

Admin

Scenario:

Layering

Information Sharing

Some Full

BGP + BGP

VDX

Shared resources

Yes No None

slide-131
SLIDE 131

Eliminating Adverse Control Plane Interactions in Independent Network Systems

Matthew K. Mukerjee

Computer Science PhD Thesis Defense

May 1st, 2018