ProjecToR: Agile Reconfigurable Data Center Interconnect (PowerPoint presentation transcript)


SLIDE 1

ProjecToR: Agile Reconfigurable Data Center Interconnect

Monia Ghobadi

Ratul Mahajan, Amar Phanishayee, Nikhil Devanur, Janardhan Kulkarni, Gireeja Ranade, Pierre Blanche, Houman Rastegarfar, Madeleine Glick, Daniel Kilper

SLIDE 2

Today's data center interconnects

[Figure: four ToRs (A, B, C, D) connected by static 10 Gbps links. An ideal demand matrix is uniform and static, matching the static capacity between ToR pairs; a non-ideal demand matrix is skewed and dynamic, with per-pair demands varying widely (e.g., 2 to 12).]

SLIDE 3

Need for a reconfigurable interconnect

Data:

  • 200K servers across 4 production clusters
  • Cluster sizes: 100–2500 racks

Observation:

  • Many rack pairs exchange little traffic
  • Only some hot rack pairs are active

Implication:

  • Static topology with uniform capacity:
      • Over-provisioned for most rack pairs
      • Under-provisioned for a few others

Reconfigurable interconnect: To dynamically provide additional capacity between hot rack pairs


SLIDE 4

Desirable properties of a reconfigurable interconnect

[Figure: an optical switch adds reconfigurable capacity on top of a static topology between ToRs A–D, forming a seamless interconnect.]

Observation:

  • Traffic matrices differ widely

Implication:

  • Difficult to determine the static vs. reconfigurable divide

SLIDE 5

Desirable properties of a reconfigurable interconnect

Observation:

  • Source racks send large amounts of traffic to many other racks

Implications:

  • Should create direct links to lots of other racks (high fan-out)
  • Should switch quickly among destinations (low switching time)
SLIDE 6

Properties of reconfigurable interconnects

  System                                         Enabler technology
  Helios, Mordia [sigcomm'10, sigcomm'13]        Optical Circuit Switch
  Flyways, 3D Beamforming [hotnets'09,
    sigcomm'11, sigcomm'12]                      60 GHz wireless
  FireFly [sigcomm'14]                           Free-Space Optics
  ProjecToR                                      Free-Space Optics

(Compared along three desired properties: seamless, high fan-out, low switching time.)

SLIDE 7

ProjecToR interconnect

  • Free-space topology (seamless)
  • 18,000 fan-out (60x more than optical circuit switches)
  • 12 us switching time (2500x faster than optical circuit switches)

[Figure: lasers and photodetectors above the racks form free-space links alongside a static topology.]

SLIDE 8

Reconfiguration in a ProjecToR interconnect

  • Digital micromirror device to redirect light
  • Mirror assembly to magnify reach

SLIDE 9

Digital Micromirror Device (DMD)

[Figure: an array of micromirrors (~10 um each), each controlled by an underlying memory cell.]

SLIDE 10

Using DMDs to redirect light

  • Theoretical number of accessible locations: total number of micromirrors
      • 768 x 768 = 589,824
  • Cross-talk between adjacent locations limits this
  • Achievable number of accessible locations:
      • 768 x 768 / 32 = 18,432
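The fan-out arithmetic on this slide can be checked directly; a minimal sketch using the numbers quoted in the talk:

```python
# Fan-out of a DMD-based ProjecToR port (numbers from the slide).
MIRRORS_PER_SIDE = 768
CROSSTALK_FACTOR = 32          # adjacent locations interfere, so they are grouped

theoretical = MIRRORS_PER_SIDE ** 2           # one location per micromirror
achievable = theoretical // CROSSTALK_FACTOR  # usable, interference-free locations

print(theoretical)  # 589824
print(achievable)   # 18432
```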

SLIDE 11

Using mirror assemblies to magnify reach

  • Challenge: DMDs have a narrow angular reach
  • Solution: coupling DMDs with angled mirrors
SLIDE 12

Questions to answer

  • How feasible is a ProjecToR interconnect?
      • Built and micro-benchmarked a small ProjecToR prototype
      • Tested robustness to environmental conditions
  • How should packets be routed in a ProjecToR interconnect?
      • Devised a scheduling algorithm and simulated its performance
  • How much does a ProjecToR interconnect cost?
      • Estimated cost based on the cost breakdown of each component
SLIDE 13

Prototype: a 3-ToR ProjecToR interconnect

[Figure: the prototype connecting ToR1, ToR2, and ToR3.]

SLIDE 14

Prototype: a 3-ToR ProjecToR interconnect

[Figure: close-up of the source laser, the DMD, and the mirrors reflecting to ToR2 and ToR3.]

SLIDE 15

Prototype: a 3-ToR ProjecToR interconnect

[Figure: the assembled prototype with ToR1, ToR2, and ToR3 labeled.]

SLIDE 16

Prototype: throughput

[Figure: CDF of TCP throughput (Gbps), spanning roughly 8.8–9.4 Gbps, comparing a ProjecToR link against a wired link.]

SLIDE 17

Prototype: switching time

[Figure: measurement setup with ToR1, ToR2, and ToR3.]

SLIDE 18

Prototype: switching time

[Figure: receive power (dBm) vs. time (us) while switching from the ToR1 -> ToR2 link to the ToR1 -> ToR3 link; the switch completes in 12 us.]

SLIDE 19

Connecting lasers and photodetectors

  • Two-topology approach:
      • Slow-switching links form a dedicated topology
      • Fast-switching links form opportunistic links

[Figure: lasers and photodetectors on ToR1–ToR3, partitioned between the dedicated topology and opportunistic links.]

SLIDE 20

Routing packets

  • Dedicated topology: k-shortest-paths routing
  • Opportunistic links: virtual output queues

[Figure: ToR1–ToR3 with a dedicated topology carrying multi-hop traffic and per-destination demand held in virtual output queues.]
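A sketch of the k-shortest-paths idea on a toy three-ToR dedicated topology. The link weights and the exhaustive path enumeration here are illustrative only (fine at toy scale), not the paper's routing implementation:

```python
from itertools import islice

# Toy dedicated topology as an adjacency map (hypothetical link weights).
GRAPH = {
    "ToR1": {"ToR2": 1, "ToR3": 2},
    "ToR2": {"ToR1": 1, "ToR3": 1},
    "ToR3": {"ToR1": 2, "ToR2": 1},
}

def simple_paths(graph, src, dst, path=None, cost=0):
    """Yield (cost, path) for every loop-free path from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        yield cost, path
        return
    for nxt, w in graph[src].items():
        if nxt not in path:
            yield from simple_paths(graph, nxt, dst, path, cost + w)

def k_shortest_paths(graph, src, dst, k):
    """Return the k lowest-cost loop-free paths (exhaustive enumeration)."""
    return list(islice(sorted(simple_paths(graph, src, dst)), k))

print(k_shortest_paths(GRAPH, "ToR1", "ToR3", 2))
# [(2, ['ToR1', 'ToR2', 'ToR3']), (2, ['ToR1', 'ToR3'])]
```

Both candidate paths cost 2 here, so traffic between a hot pair could be spread across them; a production implementation would use Yen's algorithm or similar rather than enumerating all simple paths.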

SLIDE 21

Scheduling opportunistic links

  • Given a set of potential links and the current traffic demand, find a set of active opportunistic links

[Figure: a source x destination demand matrix over ToR1–ToR3, with hot entries (e.g., 100) to be served by opportunistic links.]

SLIDE 22

Scheduling opportunistic links

  • Standard switch scheduling approaches:
      • Blossom matching
      • Matrix decomposition
  • Limitations: centralized scheduler, single-tiered matching

[Figure: the demand matrix viewed as an input/output matching problem, as in switch scheduling.]

SLIDE 23

Scheduling opportunistic links

  • ProjecToR's scheduler, in contrast to centralized single-tiered matching:
      • Extends the Gale-Shapley algorithm for finding stable matches [GS-1962]
      • Constant competitive against an offline optimal allocation
      • Two-tiered and decentralized

[Figure: source ToRs matched to destination ToRs.]
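A minimal Gale-Shapley-style sketch, with source ToRs proposing links to the destinations they have the most queued traffic for, and each destination keeping its highest-demand proposer. The demand values are hypothetical, and the paper's two-tiered decentralized variant is considerably more involved:

```python
def stable_match(demand):
    """demand[src][dst] = queued bytes; return a stable {src: dst} matching."""
    # Each source prefers destinations with more queued demand.
    prefs = {s: sorted(d, key=d.get, reverse=True) for s, d in demand.items()}
    next_choice = {s: 0 for s in demand}  # next destination each source proposes to
    engaged = {}                          # dst -> src currently holding the link
    free = list(demand)
    while free:
        src = free.pop()
        if next_choice[src] >= len(prefs[src]):
            continue                      # src has exhausted its preference list
        dst = prefs[src][next_choice[src]]
        next_choice[src] += 1
        rival = engaged.get(dst)
        if rival is None:
            engaged[dst] = src            # destination was free: accept
        elif demand[src][dst] > demand[rival][dst]:
            engaged[dst] = src            # destination prefers the larger demand
            free.append(rival)            # previous holder proposes elsewhere
        else:
            free.append(src)              # rejected: try the next preference
    return {s: d for d, s in engaged.items()}

demand = {
    "ToR1": {"ToRa": 100, "ToRb": 10},
    "ToR2": {"ToRa": 50,  "ToRb": 80},
}
print(stable_match(demand))
```

Proposals here run until every source is matched or exhausted, which is what makes the result stable: no source/destination pair would both prefer each other over their assigned partners.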

SLIDE 24

Simulations

  • Compared: fat tree, FireFly, ProjecToR
  • 128 ToRs (1024 servers), each ToR with 16 lasers and photodetectors
  • Day-long traffic matrix: used to build the dedicated topology
  • 5-min traffic matrices: used to generate the probability of ToR-pair communication
  • TCP flow arrivals with a Poisson arrival rate and realistic flow sizes
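The Poisson flow-arrival process in this setup can be sketched with exponential inter-arrival gaps. The rate, duration, and seed below are made-up illustration values, not the simulation's parameters:

```python
import random

# Sketch: TCP flow arrival times drawn from a Poisson process.
random.seed(0)
RATE = 1000.0      # flows per second (hypothetical load)
DURATION = 0.01    # seconds of simulated time

arrivals, t = [], 0.0
while True:
    t += random.expovariate(RATE)   # exponential gaps => Poisson arrivals
    if t > DURATION:
        break
    arrivals.append(t)

print(len(arrivals), "flow arrivals in", DURATION, "s")
```

Each flow would then be assigned a size from the realistic flow-size distribution and a source/destination ToR pair drawn with the measured communication probabilities.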
SLIDE 25

Simulation results

  • Tail flow completion time
  • Different traffic matrices
  • Impact of fan-out
  • Impact of switching time

[Figure: average flow completion time (ms) vs. average load (%), comparing ProjecToR against a fat tree (no reconfigurability) and FireFly (slow switching time, low fan-out); ProjecToR is reconfigurable with 12 us switching time and high fan-out. A 95% annotation marks the plot.]

SLIDE 26

ProjecToR: a reconfigurable data center interconnect

  • Seamless, high fan-out, low switching time interconnect
  • Small prototype demonstrates feasibility
  • Decentralized flow scheduling algorithm