SLIDE 1

DevoFlow: Scaling Flow Management for High-Performance Networks

Andy Curtis, Jeff Mogul, Jean Tourrilhes, Praveen Yalagandula, Puneet Sharma, Sujata Banerjee

SLIDE 2

Software-defined networking

SLIDE 3

Software-defined networking

  • Enables programmable networks

SLIDE 4

Software-defined networking

  • Enables programmable networks
  • Implemented by OpenFlow

SLIDE 5

Software-defined networking

  • Enables programmable networks
  • Implemented by OpenFlow
  • OpenFlow is a great concept, but...
  • its original design imposes excessive overheads

SLIDE 6

[Diagram: a traditional switch, with control-plane and data-plane in the same box]

SLIDE 7

[Diagram: traditional switch; inbound packets enter the data-plane and leave as routed packets]

SLIDE 8

[Diagram: traditional switch; the control-plane and data-plane exchange reachability information]

SLIDE 9

[Diagram: an OpenFlow switch; the data-plane stays on the switch while the control-plane moves to a centralized controller]

SLIDE 10

[Diagram: OpenFlow switch and centralized controller; the switch sends flow setups, link state, and forwarding-rule stats to the controller, which installs forwarding table entries and issues statistics requests]

SLIDE 11

OpenFlow enables innovative management solutions

SLIDE 12

OpenFlow enables innovative management solutions

  • Consistent routing and security policy enforcement [Ethane, SIGCOMM 2007]
  • Data center network architectures like VL2 and PortLand [Tavakoli et al. Hotnets 2009]
  • Client load-balancing with commodity switches [Aster*x, ACLD demo 2010; Wang et al., HotICE 2011]
  • Flow scheduling [Hedera, NSDI 2010]
  • Energy-proportional networking [ElasticTree, NSDI 2010]
  • Automated data center QoS [Kim et al., INM/WREN 2010]

SLIDE 13

But OpenFlow is not perfect...

  • Scaling these solutions to data-center-sized networks is challenging

SLIDE 14

Contributions

  • Characterize overheads of implementing OpenFlow in hardware
  • Propose DevoFlow to enable cost-effective, scalable flow management
  • Evaluate DevoFlow by applying it to data center flow scheduling

SLIDE 15

Contributions

  • Characterize overheads of implementing OpenFlow in hardware
  • Experience drawn from implementing OpenFlow on HP ProCurve switches

SLIDE 16

OpenFlow couples flow setup with visibility

[Diagram: data center network with edge switches, aggregation, and core layers, managed by a central controller]

SLIDE 17

[Diagram: same topology and central controller; a new flow arrives at an edge switch]

SLIDE 18

[Diagram: same topology and central controller]

If no forwarding table rule (exact-match or wildcard) matches at the switch, the flow must be set up via the central controller

SLIDE 19

[Diagram: same topology and central controller]

Two problems arise...

SLIDE 20

Problem 1: bottleneck at the controller

[Diagram: central controller]

SLIDE 21

Problem 1: bottleneck at the controller

[Diagram: central controller]

  • Up to 10 million new flows per second in a data center with 100 edge switches [Benson et al. IMC 2010]

SLIDE 22

Problem 1: bottleneck at the controller

[Diagram: central controller]

  • Up to 10 million new flows per second in a data center with 100 edge switches [Benson et al. IMC 2010]
  • If the controller can handle 30K flow setups/sec., then we need at least 333 controllers! (see the arithmetic below)
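A quick check of that controller count, using only the two figures above (up to 10 million new flows per second, and roughly 30K flow setups per second per controller):

\[
\frac{10{,}000{,}000 \ \text{flows/sec}}{30{,}000 \ \text{setups/sec per controller}} \approx 333 \ \text{controllers}
\]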

SLIDE 23

[Diagram: central controller]

  • Onix [Koponen et al. OSDI 2010]
  • Maestro [Cai et al. Tech Report 2010]
  • HyperFlow [Tootoonchian and Ganjali, WREN 2010]
  • Devolved controller [Tam et al. WCC 2011]

SLIDE 24

Problem 2: stress on the switch control-plane

[Diagram: central controller]

SLIDE 25

[Diagram: switch control-plane and data-plane; inbound packets are routed by the data-plane]

SLIDE 26

[Diagram: the switch CPU runs the switch control-plane; the ASIC implements the data-plane that routes inbound packets]

SLIDE 27

[Diagram: switch CPU (control-plane) and ASIC (data-plane)]

SLIDE 28

Scaling problem: switches

  • Inherent overheads
  • Implementation-imposed overheads

SLIDE 29

Scaling problem: switches

  • Inherent overheads
  • Bandwidth: OpenFlow creates too much control traffic (~1 control packet for every 2–3 data packets)
  • Latency
  • Implementation-imposed overheads

SLIDE 30

Scaling problem: switches

  • Inherent overheads
  • Implementation-imposed overheads
  • Flow setup
  • Statistics gathering
  • State size (see paper)

SLIDE 31

Flow setup

[Diagram: Client A and Client B connected through a ProCurve 5406 zl switch, managed by an OpenFlow controller]

SLIDE 32

Flow setup

[Diagram: Client A and Client B connected through a ProCurve 5406 zl switch, managed by an OpenFlow controller]

We believe our measurement numbers are representative of the current generation of OpenFlow switches

SLIDE 33

Flow setup

[Chart: flow setups per sec.; the 5406 zl achieves 275]

SLIDE 34

Flow setup

[Chart: flow setups per sec.; the 5406 zl versus an expected 10,000]

We can expect up to 10K flow arrivals / sec. [Benson et al. IMC 2010]

40x difference!

SLIDE 35

Flow setup

[Chart: flow setups per sec.; the 5406 zl versus an expected 10,000]

Too much latency: adds 2 ms to flow setup

SLIDE 36

[Diagram: switch data-plane (ASIC) and switch CPU; data-plane forwarding bandwidth: 300 Gbps]

SLIDE 37

[Diagram: same switch; the path between the data-plane and the switch CPU: 80 Mbps]

SLIDE 38

[Diagram: same switch; bandwidth effectively available for control traffic: 17 Mbps]

SLIDE 39

Stats-gathering

  • Flow setups and stat-pulling compete for this bandwidth

SLIDE 40

Stats-gathering

  • Flow setups and stat-pulling compete for this bandwidth

[Chart: flow setups per sec. with stat-pull frequencies of never, 1 s, and 500 ms]

SLIDE 41

Stats-gathering

  • Flow setups and stat-pulling compete for this bandwidth
  • 2.5 sec. to collect stats from the average data center edge switch

SLIDE 42

Can we solve the problem with more hardware?

  • Faster CPU may help, but won’t be enough
  • Control-plane datapath needs at least two orders of magnitude more bandwidth
  • Ethernet speeds accelerating faster than CPU speeds
  • OpenFlow won’t drive chip-area budgets for several generations

SLIDE 43

Contributions

  • Characterize overheads of implementing OpenFlow in hardware
  • Propose DevoFlow to enable cost-effective, scalable flow management
  • Evaluate DevoFlow by applying it to data center flow scheduling

SLIDE 44

Devolved OpenFlow

We devolve control over most flows back to the switches

SLIDE 45

DevoFlow design

  • Keep flows in the data-plane
  • Maintain just enough visibility for effective flow management
  • Simplify the design and implementation of high-performance switches

SLIDE 46

DevoFlow mechanisms

  • Control mechanisms
  • Statistics-gathering mechanisms

SLIDE 47

DevoFlow mechanisms

  • Control mechanisms
  • Rule cloning
  • ASIC clones a wildcard rule as an exact-match rule for new microflows

SLIDE 48

DevoFlow mechanisms

  • Control mechanisms
  • Rule cloning

wildcard rules:
  src   dst           src port   dst port
  *     129.100.1.5   *          *

exact-match rules:
  src   dst           src port   dst port

SLIDE 51

DevoFlow mechanisms

  • Control mechanisms
  • Rule cloning

wildcard rules:
  src           dst           src port   dst port
  *             129.100.1.5   *          *

exact-match rules:
  src           dst           src port   dst port
  129.200.1.1   129.100.1.5   4832       80

SLIDE 52

DevoFlow mechanisms

  • Control mechanisms
  • Rule cloning
  • ASIC clones a wildcard rule as an exact-match rule for new microflows (see the sketch below)
  • Local actions
  • Rapid re-routing
  • Gives fallback paths for when a port fails
  • Multipath support
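To make rule cloning concrete, here is a minimal Python sketch of the table logic the bullet above describes: a wildcard rule carrying a clone flag causes the switch to install an exact-match rule for each new microflow, so later packets of that flow stay entirely in the data-plane. All names and data structures are illustrative assumptions, not DevoFlow's actual switch implementation.

    # Hypothetical illustration of DevoFlow rule cloning (names are invented).
    from __future__ import annotations
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FlowKey:
        src: str
        dst: str
        src_port: int
        dst_port: int

    @dataclass
    class Rule:
        match: dict            # fields to match; an absent field is a wildcard
        actions: list          # e.g., ["forward:port3"]
        clone: bool = False    # DevoFlow's CLONE flag on wildcard rules

    exact_rules: dict[FlowKey, Rule] = {}   # exact-match table
    wildcard_rules: list[Rule] = []         # wildcard table, in priority order

    def lookup(key: FlowKey) -> Rule | None:
        # Exact-match entries are checked before wildcard rules.
        if key in exact_rules:
            return exact_rules[key]
        for rule in wildcard_rules:
            if all(getattr(key, f) == v for f, v in rule.match.items()):
                if rule.clone:
                    # Clone: install a per-microflow exact-match rule that inherits
                    # the wildcard rule's actions, so later packets of this flow
                    # never leave the data-plane.
                    exact_rules[key] = Rule(match=vars(key), actions=list(rule.actions))
                return rule
        return None  # no rule: this is what would trigger a controller flow setup

    # Example: one clone-flagged wildcard rule, as in the tables above.
    wildcard_rules.append(Rule(match={"dst": "129.100.1.5"},
                               actions=["forward:port3"], clone=True))
    lookup(FlowKey("129.200.1.1", "129.100.1.5", 4832, 80))
    print(len(exact_rules))  # 1: the microflow now has its own exact-match rule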

SLIDE 53

Control mechanisms

  • Control mechanisms
  • Rule cloning
  • ASIC clones a wildcard rule as an exact-match rule for new microflows
  • Local actions
  • Rapid re-routing
  • Gives fallback paths for when a port fails
  • Multipath support (see the sketch below)

[Diagram: multipath forwarding, splitting new flows across paths with weights 1/3, 1/6, and 1/2]
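A minimal sketch of what weighted multipath selection amounts to, assuming the 1/3, 1/6, and 1/2 in the diagram above are per-path weights; the hashing scheme and all names below are illustrative assumptions, not the switch's actual implementation:

    # Hypothetical sketch of weighted multipath path selection for new flows.
    # Each flow is hashed to a point in [0, 1) and mapped onto a path in
    # proportion to the configured weights, so a given flow always takes
    # the same path (no per-packet reordering).
    import hashlib
    from bisect import bisect_right
    from itertools import accumulate

    def pick_path(flow_key: tuple, paths: list, weights: list) -> str:
        digest = hashlib.sha256(repr(flow_key).encode()).digest()
        point = int.from_bytes(digest[:8], "big") / 2**64      # uniform in [0, 1)
        cumulative = list(accumulate(weights))                  # e.g., [1/3, 1/2, 1]
        return paths[bisect_right(cumulative, point * cumulative[-1])]

    # Example with the weights shown on the slide.
    paths = ["uplink-A", "uplink-B", "uplink-C"]
    weights = [1/3, 1/6, 1/2]
    print(pick_path(("129.200.1.1", "129.100.1.5", 4832, 80), paths, weights))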

SLIDE 54

Statistics-gathering mechanisms

SLIDE 55

Statistics-gathering mechanisms

  • Sampling
  • Packet header is sent to the controller with 1/1000 probability

SLIDE 56

Statistics-gathering mechanisms

  • Sampling
  • Packet header is sent to the controller with 1/1000 probability
  • Triggers and reports
  • Can set a threshold per rule; when the threshold is reached, the flow is set up at the central controller

SLIDE 57

Statistics-gathering mechanisms

  • Sampling
  • Packet header is sent to the controller with 1/1000 probability
  • Triggers and reports
  • Can set a threshold per rule; when the threshold is reached, the flow is set up at the central controller (see the sketch below)
  • Approximate counters
  • Tracks all flows matching a wildcard rule
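As a rough illustration of how sampling and per-rule triggers devolve statistics gathering to the switch, here is a small Python sketch; the 1/1000 probability and the byte threshold follow the bullets above, while every name, data structure, and the report format are invented for illustration only:

    # Hypothetical sketch of DevoFlow statistics gathering: sampling + triggers.
    import random

    SAMPLE_PROB = 1 / 1000        # probability of copying a header to the controller
    reports_to_controller = []     # stand-in for the switch-to-controller channel

    class CountingRule:
        def __init__(self, name, threshold_bytes):
            self.name = name
            self.threshold_bytes = threshold_bytes   # per-rule trigger threshold
            self.byte_count = 0
            self.reported = False

        def on_packet(self, header, size):
            # Sampling: roughly 1 in 1000 packet headers goes to the controller.
            if random.random() < SAMPLE_PROB:
                reports_to_controller.append(("sample", header))
            # Trigger and report: once the rule's byte counter crosses the
            # threshold, report it so the controller can manage the flow.
            self.byte_count += size
            if not self.reported and self.byte_count >= self.threshold_bytes:
                self.reported = True
                reports_to_controller.append(("threshold", self.name, self.byte_count))

    # Example: a 1 MB threshold on one rule, fed 1000 packets of 1500 bytes.
    rule = CountingRule("dst-129.100.1.5", threshold_bytes=1_000_000)
    for _ in range(1000):
        rule.on_packet({"dst": "129.100.1.5", "dst_port": 80}, size=1500)
    print(len(reports_to_controller), "messages sent to the controller")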

SLIDE 58

Implementing DevoFlow

  • Have not implemented in hardware
  • Can reuse existing functional blocks for most mechanisms

SLIDE 59

Using DevoFlow

  • Provides tools to scale your SDN application, but scaling is still a challenge

  • Example: flow scheduling
  • Follows Hedera’s approach [Al-Fares et al. NSDI 2010]

SLIDE 60

Using DevoFlow: flow scheduling

SLIDE 61

Using DevoFlow: flow scheduling

  • Switches use multipath forwarding rules for new flows

SLIDE 62

Using DevoFlow: flow scheduling

  • Switches use multipath forwarding rules for new flows
  • Central controller uses sampling or triggers to detect elephant flows
  • Elephant flows are dynamically scheduled by the central controller
  • Uses a bin-packing algorithm, see paper (a rough sketch follows this list)
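The scheduler itself is described in the paper; the sketch below is only a hypothetical rendering of the control loop the bullets imply (re-place each reported elephant flow on the least-loaded of its candidate paths, largest first), not the paper's actual bin-packing algorithm or code:

    # Hypothetical sketch of the DevoFlow-style scheduling loop described above.
    # Small flows stay on the switches' multipath rules; flows that sampling or
    # triggers report as elephants are re-routed by the central controller.
    def schedule_elephants(reports, candidate_paths, link_load, demand):
        """Greedy placement of reported elephant flows onto least-loaded paths.

        reports         : flow ids reported by switch triggers/sampling
        candidate_paths : {flow_id: [path ids the flow may use]}
        link_load       : {path_id: bytes already scheduled on that path}
        demand          : {flow_id: estimated bytes remaining}
        """
        placement = {}
        # Place the largest elephants first (a common bin-packing heuristic).
        for flow in sorted(reports, key=lambda f: demand[f], reverse=True):
            best = min(candidate_paths[flow], key=lambda p: link_load[p])
            placement[flow] = best
            link_load[best] += demand[flow]
        return placement

    # Example: two elephants competing for two core paths.
    print(schedule_elephants(
        reports=["f1", "f2"],
        candidate_paths={"f1": ["core-1", "core-2"], "f2": ["core-1", "core-2"]},
        link_load={"core-1": 0, "core-2": 0},
        demand={"f1": 5_000_000, "f2": 2_000_000}))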

SLIDE 63

Evaluation

  • How much can we reduce flow scheduling overheads while still achieving high performance?

SLIDE 64

Evaluation: methodology

  • Custom-built simulator
  • Flow-level model of network traffic
  • Models OpenFlow based on our measurements of the 5406 zl

SLIDE 65

Evaluation: methodology

  • Custom-built simulator
  • Flow-level model of network traffic
  • Models OpenFlow based on our measurements of the 5406 zl (e.g., the 17 Mbps control-plane bandwidth shown earlier)

SLIDE 66

Evaluation: methodology

  • Clos topology
  • 1600 servers
  • 640 Gbps bisection bandwidth
  • 20 servers per rack

  • HyperX topology
  • 1620 servers
  • 405 Gbps bisection bandwidth
  • 20 servers per rack

SLIDE 67

Evaluation: methodology

  • Workloads
  • Shuffle, 128 MB to all servers, five at a time
  • Reverse-engineered MSR workload [Kandula et al. IMC 2009]

SLIDE 68

Evaluation: methodology

  • Workloads
  • Shuffle, 128 MB to all servers, five at a time
  • Reverse-engineered MSR workload [Kandula et al. IMC 2009]
  • Based on two distributions: inter-arrival times and bytes per flow

SLIDE 69

Evaluation: methodology

  • Schedulers
  • ECMP
  • OpenFlow
  • Coarse-grained using wildcard rules
  • Fine-grained using stat-pulling (i.e., Hedera [Al-Fares et al. NSDI 2010])

  • DevoFlow
  • Statistics via sampling
  • Triggers and reports at a specified threshold of bytes transferred

SLIDE 70

Evaluation: metrics

  • Performance
  • Aggregate throughput
  • Overheads
  • Packets/sec. to central controller
  • Forwarding table size

SLIDE 71

Performance: Clos topology

Shuffle with 400 servers

[Chart: aggregate throughput (Gbps); schedulers: ECMP, Wildcard and Stat-pulling (OpenFlow-based), Sampling and Thresholds (DevoFlow-based)]

SLIDE 72

Performance: Clos topology

Shuffle with 400 servers

[Chart: aggregate throughput (Gbps); schedulers: ECMP, Wildcard and Stat-pulling (OpenFlow-based), Sampling and Thresholds (DevoFlow-based)]

37% increase

SLIDE 73

Performance: Clos topology

MSR workload

[Chart: aggregate throughput (Gbps); schedulers: ECMP, Wildcard and Stat-pulling (OpenFlow-based), Sampling and Thresholds (DevoFlow-based)]

SLIDE 74

Performance: HyperX topology

Shuffle with 400 servers

[Chart: aggregate throughput (Gbps); schedulers: ECMP, VLB, Stat-pulling (OpenFlow-based), Sampling and Thresholds (DevoFlow-based)]

SLIDE 75

Performance: HyperX topology

Shuffle with 400 servers

[Chart: aggregate throughput (Gbps); schedulers: ECMP, VLB, Stat-pulling (OpenFlow-based), Sampling and Thresholds (DevoFlow-based)]

55% increase

SLIDE 78

Performance: HyperX topology

Shuffle + MSR workload

[Chart: aggregate throughput (Gbps); schedulers: ECMP, VLB, Stat-pulling (OpenFlow-based), Sampling and Thresholds (DevoFlow-based)]

SLIDE 79

Performance: HyperX topology

Shuffle + MSR workload

[Chart: aggregate throughput (Gbps); schedulers: ECMP, VLB, Stat-pulling (OpenFlow-based), Sampling and Thresholds (DevoFlow-based)]

18% increase

SLIDE 80

Overheads: Control traffic

No. packets / sec. to the controller:

  Wildcard (OpenFlow-based):      483
  Stat-pulling (OpenFlow-based):  7,758
  Sampling (DevoFlow-based):      705
  Threshold (DevoFlow-based):     74

SLIDE 81

Overheads: Control traffic

No. packets / sec. to the controller:

  Wildcard (OpenFlow-based):      483
  Stat-pulling (OpenFlow-based):  7,758
  Sampling (DevoFlow-based):      705    (~10% of stat-pulling)
  Threshold (DevoFlow-based):     74     (~1% of stat-pulling)

SLIDE 82

Overheads: Control traffic

No. packets / sec. to the controller:

  Wildcard (OpenFlow-based):      483
  Stat-pulling (OpenFlow-based):  7,758
  Sampling (DevoFlow-based):      705    (~10% of stat-pulling)
  Threshold (DevoFlow-based):     74     (~1% of stat-pulling)

Multipath wildcard rules and targeted stats collection reduce control traffic (see the arithmetic below)
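Those percentages follow directly from the message rates in the table above:

\[
\frac{705}{7758} \approx 9\% \ \text{(sampling vs. stat-pulling)}, \qquad
\frac{74}{7758} \approx 1\% \ \text{(thresholds vs. stat-pulling)}
\]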

SLIDE 83

Overheads: Flow table entries

Forwarding table entries:

  Wildcard (OpenFlow-based):      71
  Stat-pulling (OpenFlow-based):  926
  Sampling (DevoFlow-based):      6
  Threshold (DevoFlow-based):     13

SLIDE 84

Overheads: Flow table entries

Forwarding table entries:

  Wildcard (OpenFlow-based):      71
  Stat-pulling (OpenFlow-based):  926
  Sampling (DevoFlow-based):      8
  Threshold (DevoFlow-based):     15

70–150x reduction in table entries at average edge switch

SLIDE 85

Evaluation: overheads

Control-plane bandwidth needed

[Chart: control-plane throughput needed (Mbps, and as a percent of forwarding bandwidth) vs. stat-pulling rate (0.1 s, 0.5 s, 1 s, 10 s, 30 s, never); 95th and 99th percentiles]

SLIDE 86

Evaluation: overheads

Control-plane bandwidth needed

[Chart: control-plane throughput needed (Mbps, and as a percent of forwarding bandwidth) vs. stat-pulling rate (0.1 s, 0.5 s, 1 s, 10 s, 30 s, never); 95th and 99th percentiles; the 5406 zl's control-plane bandwidth is marked for comparison]

SLIDE 87

Evaluation: overheads

Control-plane bandwidth needed

[Chart: control-plane throughput needed (Mbps, and as a percent of forwarding bandwidth) vs. stat-pulling rate (0.1 s, 0.5 s, 1 s, 10 s, 30 s, never); 95th and 99th percentiles; the 5406 zl's control-plane bandwidth is marked for comparison]

This is to meet a 2 ms switch-internal queuing deadline

SLIDE 88

Conclusions

  • OpenFlow imposes high overheads on switches
  • Proposed DevoFlow to give tools to reduce reliance on the control-plane
  • DevoFlow can reduce overheads by 10–50x for data center flow scheduling

SLIDE 90

Other uses of DevoFlow

  • Client load-balancing (Similar to Wang et al. HotICE 2011)
  • Network virtualization [Sherwood et al. OSDI 2010]
  • Data center QoS
  • Multicast
  • Routing as a service [Chen et al. INFOCOM 2011]
  • Energy-proportional routing [Heller et al. NSDI 2010]

SLIDE 91

Implementing DevoFlow

  • Rule cloning:
  • May be difficult to implement on the ASIC; can definitely be done with use of the switch CPU

  • Multipath support:
  • Similar to LAG and ECMP
  • Sampling:
  • Already implemented in most switches
  • Triggers:
  • Similar to rate limiters

SLIDE 92

Flow table size

  • Constrained resource
  • Commodity switches: 32K–64K exact-match entries, ~1500 TCAM entries

  • Virtualization may strain table size
  • 10s of VMs per machine implies >100K table entries

SLIDE 93

[Chart: time to pull statistics (s) vs. flow table size (flows); average and maximum reply time]

SLIDE 94

[Chart: time to pull statistics (s) vs. flow table size (flows); average and maximum reply time]

2.5 sec. to pull stats at average edge switch

SLIDE 95

[Chart: TCP connection rate (sockets/s) and stats collected (entries/s) vs. stats request rate (req/s)]

SLIDE 96

Evaluation: methodology

  • Workloads
  • Reverse-engineered MSR workload [Kandula et al. IMC 2009]

[Charts: distributions of flow inter-arrival times and flow sizes]

SLIDE 97

Evaluation: performance, Clos topology

[Chart: aggregate throughput (Gbps) by workload (MSR, 25% inter-rack; MSR, 75% inter-rack; shuffle, n=400; MSR + shuffle, n=400; MSR + shuffle, n=800) for ECMP, Wildcard 1s, Pull-based 5s (OpenFlow-based), Sampling 1/1000, and Threshold 1MB (DevoFlow-based)]

SLIDE 98

Evaluation: overheads Control traffic

No. packets / sec. to the controller (MSR workloads, 25% and 75% inter-rack):

  OpenFlow schedulers:
    Wildcard:    0.1 s: 504      1 s: 483      10 s: 446
    Pull-based:  0.1 s: 29,451   1 s: 7,758    10 s: 4,871
  DevoFlow schedulers:
    Sampling:    1/100: 7,123    1/1000: 709   1/10000: 71
    Threshold:   128 KB: 432     1 MB: 181     10 MB: 74

SLIDE 99

Evaluation: overheads Flow table entries

No. flow table entries (average and maximum, MSR workloads at 25% and 75% inter-rack):

[Chart: flow table entries for the OpenFlow schedulers (Wildcard at 0.1 s / 1 s / 10 s, Pull-based at 0.1 s / 1 s / 10 s) and the DevoFlow schedulers (Sampling at 1/100 / 1/1000 / 1/10000, Threshold at 128 KB / 1 MB / 10 MB); the DevoFlow schedulers use roughly 10x to 50x fewer entries]

DevoFlow aggressively uses multipath wildcard rules
