PAST
Scalable Ethernet for Data Centers
Brent Stephens †, Alan Cox †, Wes Felter ‡, Colin Dixon ‡, and John Carter ‡
†Rice University ‡IBM Research
December 11th, 2012
Brent Stephens PAST: Scalable Ethernet for Data Centers 1 / 31
◮ Implementable today
◮ Exploit existing features
◮ Tens of thousands of hosts
Goal: Route using the Ethernet table (DMAC, VLAN)
Constraint 1: Full pair-wise connectivity per VLAN
Constraint 2: Ethernet table forces a tree
Solution: Build a spanning tree rooted at each address
Load-balances at the address ((v-)host) granularity
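The per-address tree above can be sketched as table entries. Given a tree rooted at the destination, each switch needs exactly one exact-match entry keyed by (DMAC, VLAN). This is a minimal illustration, not the authors' code: `install_tree`, `walk`, the `next_hop` map, and the table layout are all hypothetical names chosen for the example.

```python
from typing import Dict, Tuple

# Hypothetical representation: a switch's Ethernet table maps
# (destination MAC, VLAN) to the neighbor to forward to.
EthTable = Dict[Tuple[str, int], str]


def install_tree(tables: Dict[str, EthTable], dmac: str, vlan: int,
                 next_hop: Dict[str, str]) -> None:
    """Install one exact-match entry per switch for a single address.

    next_hop maps each switch to its parent in the tree rooted at the
    destination, so following the entries always ends at the root.
    """
    for switch, parent in next_hop.items():
        tables.setdefault(switch, {})[(dmac, vlan)] = parent


def walk(tables: Dict[str, EthTable], src: str, dmac: str, vlan: int,
         limit: int = 64) -> list:
    """Follow the installed entries from src until no entry matches."""
    path = [src]
    while (dmac, vlan) in tables.get(path[-1], {}) and len(path) <= limit:
        path.append(tables[path[-1]][(dmac, vlan)])
    return path


tables: Dict[str, EthTable] = {}
# Tree toward an address attached at switch s0 (made-up topology).
install_tree(tables, "aa:bb:cc:dd:ee:01", 1,
             {"s1": "s0", "s2": "s0", "s3": "s1"})
print(walk(tables, "s3", "aa:bb:cc:dd:ee:01", 1))
```

Because each address gets its own tree, two addresses on the same switch can use different links, which is where the per-address load balancing comes from.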
Goal: Use efficient paths
Solution: Use a BFS tree for minimal paths
Goal: Load-balance
Solution: Tree selection
◮ Random
◮ Weight links by load
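A minimal sketch of the tree selection above, assuming the topology is given as an adjacency map. A BFS from the root gives hop distances; each switch then picks a next hop among the neighbors one hop closer to the root, either uniformly at random or by lowest load. The function name `bfs_tree` and the notion of load as a count of trees already using a link are assumptions made for illustration.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Tuple


def bfs_tree(adj: Dict[str, List[str]], root: str,
             rng: Optional[random.Random] = None,
             load: Optional[Dict[Tuple[str, str], int]] = None
             ) -> Dict[str, str]:
    """Return a next-hop map for a minimum-hop tree toward root.

    With load=None the next hop is chosen uniformly at random among the
    neighbors that are one hop closer to the root. Otherwise the least
    loaded such link is chosen and its count is incremented.
    """
    rng = rng or random.Random()
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    next_hop: Dict[str, str] = {}
    for v, d in dist.items():
        if v == root:
            continue
        closer = [u for u in adj[v] if dist.get(u) == d - 1]
        if load is None:
            parent = rng.choice(closer)
        else:
            parent = min(closer, key=lambda u: load.get((v, u), 0))
            load[(v, parent)] = load.get((v, parent), 0) + 1
        next_hop[v] = parent
    return next_hop
```

Passing the same `load` dictionary across calls spreads successive trees over different equal-length paths.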
NM-PAST
◮ Root the tree for host h at a random intermediate switch i
◮ Inspired by Valiant Load Balancing
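One way to read the NM-PAST bullets is sketched below. It builds a BFS tree rooted at a randomly chosen switch i, then reverses the tree path between i and the destination so that every switch still has a single next hop leading to the destination. Both the uniform choice of i and the path-reversal step are assumptions made for this example; the slide does not spell them out.

```python
import random
from collections import deque
from typing import Dict, Hashable, List, Tuple


def nm_past_tree(adj: Dict[Hashable, List[Hashable]], dst: Hashable,
                 rng: random.Random
                 ) -> Tuple[Hashable, Dict[Hashable, Hashable]]:
    """Return (intermediate switch, next-hop map toward dst)."""
    # Assumption: the intermediate switch is drawn uniformly at random.
    root = rng.choice(list(adj))

    # BFS from the intermediate switch, recording each switch's parent.
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)

    # Start with every switch forwarding toward the intermediate switch.
    next_hop = {v: p for v, p in parent.items() if p is not None}

    # Reverse the dst-to-root path so that it points at dst instead.
    next_hop.pop(dst, None)
    node = dst
    while parent[node] is not None:
        up = parent[node]
        next_hop[up] = node
        node = up
    return root, next_hop
```

Switches off the reversed path forward toward i first and then follow the path down to the destination, so some routes are longer than the minimal ones PAST would pick.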
Eliminate Broadcasts
◮ Improve scalability by using the controller for address detection and resolution (ARP, DHCP, IPv6 ND, and RS)
Route Computation
◮ 8,000 hosts ⇒ 40 µs to 1 ms per tree (300 ms per network)
◮ Trivially parallelizable
Route Installation
◮ Install and forward to 100K addresses
◮ 2-12 ms rule install latency ⇒ masked by migration latency
Failure Recovery
◮ Should patch affected portions of trees
◮ Flow-based simulator assumes max-min fairness
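For readers unfamiliar with max-min fairness, the sketch below shows the standard progressive-filling computation of per-flow rates. It is not the authors' simulator: the function name `max_min_rates` and the input layout are assumptions, and each flow is assumed to cross at least one link.

```python
from typing import Dict, Hashable, List


def max_min_rates(flows: Dict[Hashable, List[Hashable]],
                  capacity: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Compute max-min fair rates by progressive filling.

    flows maps a flow id to the links it traverses (at least one);
    capacity maps a link id to its capacity.
    """
    rates: Dict[Hashable, float] = {}
    remaining = dict(capacity)
    active = {f: set(links) for f, links in flows.items()}

    while active:
        # Fair share on every link that still carries unfrozen flows.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for links in active.values() if link in links)
            if n:
                share[link] = cap / n

        # The link with the smallest share is the next bottleneck.
        bottleneck = min(share, key=share.get)
        rate = share[bottleneck]

        # Freeze its flows at that rate and release their capacity use.
        for f in [f for f, links in active.items() if bottleneck in links]:
            rates[f] = rate
            for link in active[f]:
                remaining[link] -= rate
            del active[f]
    return rates


# Two links; flow f2 crosses both, so link A is the first bottleneck.
print(max_min_rates({"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]},
                    {"A": 1.0, "B": 2.0}))
```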
URand-8: Dst_i = rand() % N for i ∈ 1..8. Benign
Stride-64: Dst_n = (n + 64) % N. Adversarial
Shuffle-10: 128 MB to all hosts, random order, 10 active connections. More stressful than URand
MSR: Synthetically generated from a 1500-server cluster. Light load
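The two formula-defined workloads can be written down directly. The sketch below assumes hosts are numbered 0..N-1; the function names `urand` and `stride` are hypothetical. As in the formula, URand may pick the source itself as a destination.

```python
import random
from typing import Dict, List, Optional


def urand(n_hosts: int, k: int = 8,
          rng: Optional[random.Random] = None) -> Dict[int, List[int]]:
    """URand-k: every host sends to k destinations drawn uniformly."""
    rng = rng or random.Random()
    return {src: [rng.randrange(n_hosts) for _ in range(k)]
            for src in range(n_hosts)}


def stride(n_hosts: int, s: int = 64) -> Dict[int, int]:
    """Stride-s: host n sends to host (n + s) % N."""
    return {src: (src + s) % n_hosts for src in range(n_hosts)}


pairs = stride(128)
print(pairs[100])  # (100 + 64) % 128
```

Stride is a permutation of the hosts, so every host sends and receives exactly one flow, and all traffic from one group of hosts targets the same distant group.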
◮ EGFT (Fat Tree)
◮ HyperX (Flattened Butterfly)
◮ Jellyfish (Random Regular Graph)
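As a concrete picture of one of these topologies, the sketch below builds the switch-level adjacency of a regular HyperX, where switches sit on a multi-dimensional lattice and each switch links to every switch that differs from it in exactly one coordinate. Host attachment and link bandwidths are left out, and the function name `hyperx` is an assumption.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Switch = Tuple[int, ...]


def hyperx(shape: Sequence[int]) -> Dict[Switch, List[Switch]]:
    """Adjacency of a HyperX with shape[d] switches along dimension d."""
    switches = list(product(*(range(s) for s in shape)))
    adj: Dict[Switch, List[Switch]] = {sw: [] for sw in switches}
    for sw in switches:
        for dim, size in enumerate(shape):
            for x in range(size):
                if x != sw[dim]:
                    # Neighbor differs from sw only in coordinate dim.
                    adj[sw].append(sw[:dim] + (x,) + sw[dim + 1:])
    return adj


topo = hyperx((3, 4))
print(len(topo), len(topo[(0, 0)]))  # number of switches, switch degree
```

An adjacency map in this form is the kind of input a per-address tree construction needs.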
◮ URand-8 on a 1:2 Bisection Bandwidth HyperX
◮ Shuffle-10 on a 1:2 Bisection Bandwidth HyperX
[Figure: Aggregate Throughput (0.0 to 1.0) vs. Number of hosts (1K to 8K) for PAST, ECMP, NM-PAST, VAL, EthAir, and STP. Highlighted: PAST/ECMP]
[Figure: Aggregate Throughput (0.0 to 1.0) vs. Number of hosts (1K to 8K) for PAST, ECMP, NM-PAST, VAL, EthAir, and STP. Highlighted: EthAir]
◮ PAST matches ECMP
◮ EthAir scales poorly
◮ Stride-64 on a 1:1 Bisection Bandwidth HyperX
◮ Shuffle-10 on a 1:2 Bisection Bandwidth HyperX
◮ NM-PAST can double performance
◮ NM-PAST matches VAL
◮ NM-PAST and VAL halve performance under uniform workloads
◮ URand-8 on 1:2 Bisection Bandwidth networks
◮ Stride-64 on 1:2 Bisection Bandwidth networks
[Figure: Aggregate Throughput (0.0 to 1.0) vs. Number of hosts (1K to 8K) on HyperX, Jellyfish, and EGFT. Panels: PAST, NM-PAST]
◮ PAST on HyperX and Jellyfish outperforms EGFT
◮ NM-PAST enables HyperX to perform well under adversarial workloads
◮ Worst-case performance equal to ECMP
◮ Best-case performance double ECMP
◮ ECMP is not as useful (or necessary) as previously thought
◮ PAST is implementable today