PAST: Scalable Ethernet for Data Centers

SLIDE 1

PAST

Scalable Ethernet for Data Centers

Brent Stephens †, Alan Cox †, Wes Felter ‡, Colin Dixon ‡, and John Carter ‡

†Rice University ‡IBM Research

December 11th, 2012

Brent Stephens PAST: Scalable Ethernet for Data Centers 1 / 31

SLIDE 2

PAST . . .

. . . is a large, flat L2 network that can use arbitrary topologies
. . . is implementable on existing Ethernet switch hardware and unmodified host network stacks
. . . meets or exceeds the performance of the state of the art

SLIDE 3

Data Center Network Requirements

Host mobility
Effective use of bandwidth
Autonomous
Scalability

SLIDE 4

Additional Design Requirements

No hardware changes
Respects layering
Topology independent

SLIDE 5

PAST

Per-Address Spanning Tree routing algorithm
Unmodified Ethernet switches and hosts
◮ Implementable today
◮ Exploit existing features
Arbitrary topologies
◮ Tens of thousands of hosts
Performance comparable to or greater than ECMP

SLIDE 6

Routing Space

SLIDE 17

PAST Algorithm

Goal: Route using the Ethernet table (DMAC, VLAN)
Constraint 1: Full pair-wise connectivity per VLAN
Constraint 2: Ethernet table forces a tree
Solution: Build a spanning tree rooted at each address
Load balances at the address ((v-)host) granularity
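
The goal and constraints above can be illustrated with a small sketch. It is not the authors' implementation: the names `bfs_next_hops`, `build_tables`, and `route`, the `"local"` marker for the host-facing port, and the use of neighbor names in place of port numbers are all assumptions made for illustration. It shows why one spanning tree rooted at each address fits an exact-match (DMAC, VLAN) table: every switch ends up with exactly one output per destination address.

```python
from collections import deque


def bfs_next_hops(adj, root):
    """Return {switch: neighbor one hop closer to root} for a BFS tree rooted at root."""
    next_hop = {}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                next_hop[v] = u  # v forwards toward the root through u
                queue.append(v)
    return next_hop


def build_tables(adj, hosts, vlan=1):
    """Build one exact-match table per switch.

    adj   -- {switch: [neighbor switches]} (hypothetical topology description)
    hosts -- {mac: switch the host attaches to}
    Each table maps (dmac, vlan) to a single next hop, so the entries for
    one address form a tree rooted at that address's switch.
    """
    tables = {switch: {} for switch in adj}
    for mac, edge_switch in hosts.items():
        tables[edge_switch][(mac, vlan)] = "local"  # deliver on the host port
        for switch, nh in bfs_next_hops(adj, edge_switch).items():
            tables[switch][(mac, vlan)] = nh
    return tables


def route(tables, src_switch, mac, vlan=1):
    """Follow the table entries from src_switch until the host's switch is reached."""
    path = [src_switch]
    while tables[path[-1]][(mac, vlan)] != "local":
        path.append(tables[path[-1]][(mac, vlan)])
    return path
```

Because each address gets its own tree, two hosts on the same switch can be reached over different links, which is where the per-address load balancing comes from.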

SLIDE 20

Tree Construction

Goal: Use efficient paths
Solution: Use a BFS tree for minimal paths
Goal: Load-balance over all links
Solution: Tree selection
◮ Random
◮ Weight links by load
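
A minimal sketch of the two tree-selection options follows. The slide does not say how load is measured, so this sketch assumes it is the number of previously built trees that use a directed link; the function names and the `load` dictionary are hypothetical. Every candidate parent is one BFS level closer to the root, so whichever one is chosen, paths stay minimal.

```python
import random
from collections import deque


def bfs_levels(adj, root):
    """Return {switch: hop distance from root}."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def select_tree(adj, root, load=None, rng=random):
    """Pick one minimal-path tree toward root and return {switch: parent}.

    load -- optional {(switch, parent): trees already using that link}.
            If None, the parent is chosen uniformly at random; otherwise
            the least-loaded candidate wins and its count is incremented.
            (This load metric is an assumption, not taken from the slides.)
    """
    dist = bfs_levels(adj, root)
    parent = {}
    for v in dist:
        if v == root:
            continue
        # Any neighbor one level closer to the root keeps the path minimal.
        candidates = [u for u in adj[v] if dist[u] == dist[v] - 1]
        if load is not None:
            least = min(load.get((v, u), 0) for u in candidates)
            candidates = [u for u in candidates if load.get((v, u), 0) == least]
        choice = rng.choice(candidates)
        if load is not None:
            load[(v, choice)] = load.get((v, choice), 0) + 1
        parent[v] = choice
    return parent
```

Passing the same `load` dictionary to successive calls spreads later trees away from links that earlier trees already use.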

SLIDE 29

Valiant Load Balancing

SLIDE 30

Non-Minimal Tree Construction

NM-PAST
◮ Root the tree for host h at a random intermediate switch i
◮ Inspired by Valiant Load Balancing
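
One plausible reading of the NM-PAST bullets is sketched below. The slide only says the tree is rooted at a random intermediate switch; the step that reverses the tree path between h's switch and i so that all traffic still ends at h is an assumption of this sketch, as are the names `bfs_parents` and `nm_past_tree` and the use of `None` to mark local delivery.

```python
import random
from collections import deque


def bfs_parents(adj, root):
    """Return {switch: parent toward root}; the root maps to None."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def nm_past_tree(adj, host_switch, rng=random):
    """Return (i, next_hop) for one host.

    next_hop[s] is the neighbor that switch s forwards to for this host;
    None marks the host's own switch (local delivery).
    """
    intermediate = rng.choice(sorted(adj))
    next_hop = bfs_parents(adj, intermediate)  # every switch points toward i
    # Collect the tree path from the host's switch up to i ...
    path = [host_switch]
    while next_hop[path[-1]] is not None:
        path.append(next_hop[path[-1]])
    # ... and reverse it, so switches on that path forward toward the host.
    for child, par in zip(path, path[1:]):
        next_hop[par] = child
    next_hop[host_switch] = None
    return intermediate, next_hop
```

Switches off the reversed path still forward toward i first, which is what makes the resulting routes non-minimal in the style of Valiant Load Balancing.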

SLIDE 33

PAST Discussion

Broadcast/Multicast: Unaffected. May be provided through STP or SDN
Security: Use VLANs as normal
Virtualization: Use any higher layer virtualization overlay (NetLord, SecondNet, MOOSE, VXLAN)

SLIDE 34

PAST Implementation

IBM RackSwitch G8264

SLIDE 35

Implementation Scalability

Eliminate Broadcasts: Improve scalability by using controller for address detection and resolution (ARP, DHCP, IPv6 ND, and RS)
Route Computation: 8,000 hosts ⇒ 40µs to 1ms per tree (300ms per network). Trivially parallelizable
Route Installation: Install and forward to 100K addresses. 2-12ms rule install latency ⇒ masked by migration latency
Failure Recovery: Should patch affected portions of trees

SLIDE 36

Simulator

Simulate to evaluate performance at scale

◮ Flow-based simulator assumes max-min fairness
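
For readers unfamiliar with max-min fairness, here is a minimal sketch of the standard progressive-filling computation. It is not the simulator used in the talk; the function name `max_min_rates` and its input format are assumptions, and every flow is assumed to cross at least one link.

```python
def max_min_rates(flows, capacity):
    """Compute max-min fair rates by progressive filling.

    flows    -- {flow_id: [links the flow crosses]} (each flow needs >= 1 link)
    capacity -- {link: capacity}
    Returns {flow_id: rate}.
    """
    remaining = dict(capacity)
    active = {f: set(links) for f, links in flows.items()}
    rates = {}
    while active:
        # Count the still-unfrozen flows on each link.
        users = {}
        for links in active.values():
            for link in links:
                users[link] = users.get(link, 0) + 1
        # The link offering the smallest equal share is the next bottleneck.
        bottleneck = min(users, key=lambda link: remaining[link] / users[link])
        share = remaining[bottleneck] / users[bottleneck]
        # Freeze every flow crossing the bottleneck at that share.
        for f in [f for f, links in active.items() if bottleneck in links]:
            rates[f] = share
            for link in active[f]:
                remaining[link] -= share
            del active[f]
    return rates
```

A flow limited by a congested link is frozen at its share, and the capacity it leaves unused elsewhere is redistributed to the flows that can still grow.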

Workloads

URand-8: i ∈ 1..8, Dst_i = rand() % N. Benign
Stride-64: Dst_n = (n + 64) % N. Adversarial
Shuffle-10: 128MB to all hosts, random order, 10 active connections. More stressful than URand
MSR: Synthetically generated from 1500-server cluster. Light load
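
The destination patterns of the first three workloads can be sketched as follows. Hosts are assumed to be numbered 0..N-1, and the function names are hypothetical. The sketch covers only who sends to whom: the 128MB transfer size and the limit of 10 active connections in Shuffle-10 are not modeled, and excluding a host from its own shuffle list is an assumption.

```python
import random


def urand(n_hosts, k=8, rng=random):
    """URand-k: each host picks k destinations, Dst_i = rand() % N."""
    return {src: [rng.randrange(n_hosts) for _ in range(k)]
            for src in range(n_hosts)}


def stride(n_hosts, s=64):
    """Stride-s: host n sends to host (n + s) % N."""
    return {src: (src + s) % n_hosts for src in range(n_hosts)}


def shuffle_order(n_hosts, rng=random):
    """Shuffle: each host sends to every other host in its own random order."""
    order = {}
    for src in range(n_hosts):
        dsts = [d for d in range(n_hosts) if d != src]
        rng.shuffle(dsts)
        order[src] = dsts
    return order
```

Stride is adversarial because the fixed offset tends to push many flows across the same few links, whereas URand spreads destinations evenly.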

SLIDE 37

Topologies

Compare equal bisection-bandwidth (oversubscription ratio) networks

EGFT (Fat Tree)
HyperX (Flattened Butterfly)
Jellyfish (Random Regular Graph)

SLIDE 38

Evaluation

Demonstrate PAST performance equal to or greater than other routing algorithms
Demonstrate PAST performs well under adversarial workloads
Demonstrate that PAST can effectively use a variety of topologies

SLIDE 39

Evaluation

Demonstrate PAST performance equal to or greater than other routing algorithms
◮ URand-8 on a 1:2 Bisection Bandwidth HyperX
◮ Shuffle-10 on a 1:2 Bisection Bandwidth HyperX
Demonstrate PAST performs well under adversarial workloads
Demonstrate that PAST can effectively use a variety of topologies

SLIDE 40

URand-8 on a 1:2 Bisection Bandwidth HyperX

PAST matches ECMP

[Figure: Aggregate Throughput (0.0 to 1.0) vs. Number of hosts (1K to 8K) for PAST, ECMP, NM-PAST, VAL, EthAir, and STP. Callout: PAST/ECMP]

SLIDE 41

Shuffle-10 on a 1:2 Bisection Bandwidth HyperX

EthAir scales poorly

[Figure: Aggregate Throughput (0.0 to 1.0) vs. Number of hosts (1K to 8K) for PAST, ECMP, NM-PAST, VAL, EthAir, and STP. Callout: EthAir]

SLIDE 42

Evaluation

Demonstrate PAST performance equal to or greater than other routing algorithms
◮ PAST matches ECMP
◮ EthAir scales poorly
Demonstrate PAST performs well under adversarial workloads
Demonstrate that PAST can effectively use a variety of topologies

SLIDE 43

Evaluation

Demonstrate PAST performance equal to or greater than other routing algorithms
Demonstrate PAST performs well under adversarial workloads
◮ Stride-64 on a 1:1 Bisection Bandwidth HyperX
◮ Shuffle-10 on a 1:2 Bisection Bandwidth HyperX
Demonstrate that PAST can effectively use a variety of topologies

SLIDE 44

Stride-64 on a 1:1 Bisection Bandwidth HyperX

NM-PAST can double performance . . .

SLIDE 45

Stride-64 on a 1:1 Bisection Bandwidth HyperX

. . . and NM-PAST matches VAL . . .

SLIDE 46

Stride-64 on a 1:1 Bisection Bandwidth HyperX

. . . although collisions can hurt performance

SLIDE 47

Shuffle-10 on a 1:2 Bisection Bandwidth HyperX

VAL halves performance under uniform workloads

SLIDE 48

Evaluation

Demonstrate PAST performance equal to or greater than other routing algorithms
Demonstrate PAST performs well under adversarial workloads
◮ NM-PAST can double performance
◮ NM-PAST matches VAL
◮ NM-PAST and VAL halve performance under uniform workloads
Demonstrate that PAST can effectively use a variety of topologies

SLIDE 49

Evaluation

Demonstrate PAST performance equal to or greater than other routing algorithms
Demonstrate PAST performs well under adversarial workloads
Demonstrate that PAST can effectively use a variety of topologies
◮ URand-8 on 1:2 Bisection Bandwidth networks
◮ Stride-64 on 1:2 Bisection Bandwidth networks

SLIDE 50

URand-8 on 1:2 Bisection Bandwidth Networks

PAST on HyperX and Jellyfish outperforms EGFT

SLIDE 51

Stride-64 on 1:2 Bisection Bandwidth Networks

NM-PAST on a HyperX matches PAST on an EGFT

[Figure: two panels, PAST and NM-PAST, each plotting Aggregate Throughput (0.0 to 1.0) vs. Number of hosts (1K to 8K) for HyperX, Jellyfish, and EGFT]

SLIDE 52

Evaluation

Demonstrate PAST performance equal to or greater than other routing algorithms
Demonstrate PAST performs well under adversarial workloads
Demonstrate that PAST can effectively use a variety of topologies
◮ PAST on HyperX and Jellyfish outperforms EGFT
◮ NM-PAST enables HyperX to perform well under adversarial workloads

SLIDE 53

Conclusions

PAST is a datacenter network that supports full host mobility, high bandwidth, self-configuration, and tens of thousands of hosts
PAST can provide near optimal throughput
◮ Worst case performance equal to ECMP
◮ Best case performance double ECMP
◮ ECMP is not as useful (or necessary) as previously thought
PAST supports commodity switches and exploits only the most basic Ethernet hardware
◮ PAST is implementable today