
ProjecToR: Agile Reconfigurable Data Center Interconnect



  1. ProjecToR: Agile Reconfigurable Data Center Interconnect
     Monia Ghobadi, Ratul Mahajan, Amar Phanishayee, Pierre Blanche, Houman Rastegarfar, Nikhil Devanur, Janardhan Kulkarni, Madeleine Glick, Daniel Kilper, Gireeja Ranade

  2. Today’s data center interconnects
     [Figure: ToRs A, B, C, and D connected by 10 Gbps links, shown with two demand matrices]
     • Ideal demand matrix: uniform and static (0 on the diagonal, 3 for every other ToR pair)
     • Non-ideal demand matrix: skewed and dynamic
     • Static capacity between ToR pairs

  3. Need for a reconfigurable interconnect
     Data:
     • 200K servers across 4 production clusters
     • Cluster sizes: 100 to 2,500 racks
     Observation:
     • Many rack pairs exchange little traffic
     • Only some hot rack pairs are active
     Implication:
     • Static topology with uniform capacity:
       • Over-provisioned for most rack pairs
       • Under-provisioned for a few others
     Reconfigurable interconnect: dynamically provides additional capacity between hot rack pairs

  4. Desirable properties of a reconfigurable interconnect
     [Figure labels: Static, Reconfigurable, Optical switch; ToRs A, B, C, D]
     • Observation: Traffic matrices differ widely
     • Implication: Difficult to determine the static vs. reconfigurable divide
     • (Seamless interconnect)

  5. Desirable properties of a reconfigurable interconnect
     • Observation: Source racks send large amounts of traffic to many other racks
     • Implications:
       • Should create direct links to many other racks (high fan-out)
       • Should switch quickly among destinations (low switching time)

  6. Properties of reconfigurable interconnects
     Compared on: seamless, high fan-out, low switching time
     • Optical circuit switch: Helios, Mordia [sigcomm’10, sigcomm’13]
     • 60GHz: Flyways, 3D Beam forming [hotnets’09, sigcomm’11, sigcomm’12]
     • Free-space optics: FireFly [sigcomm’14]
     • Free-space optics: ProjecToR

  7. ProjecToR interconnect
     • Free-space topology (seamless)
     • 18,000 fan-out (60x more than optical circuit switches)
     • 12 us switching time (2,500x faster than optical circuit switches)
     [Figure labels: Laser, Photodetector, Static topology]
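
The two ratios on this slide imply rough baseline figures for optical circuit switches. The sketch below only inverts the slide's own numbers; the variable names are illustrative, and the implied baselines are derived values, not figures stated in the deck.

```python
# Figures stated on the slide.
projector_fanout = 18_000        # accessible destinations
fanout_ratio = 60                # "60x more than optical circuit switches"
projector_switch_us = 12         # switching time in microseconds
speedup = 2_500                  # "2,500x faster than optical circuit switches"

# Derived (assumed) baselines: these are implied by the ratios, not quoted.
implied_ocs_fanout = projector_fanout / fanout_ratio
implied_ocs_switch_ms = projector_switch_us * speedup / 1000

print(implied_ocs_fanout)        # 300.0
print(implied_ocs_switch_ms)     # 30.0
```

Read this way, the comparison is against a switch with roughly 300 ports and a switching time of about 30 ms.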

  8. Reconfiguration in a ProjecToR interconnect
     • Digital micromirror device to redirect light
     • Mirror assembly to magnify reach

  9. Digital Micromirror Device (DMD)
     [Figure labels: Array of micromirrors (10 um), Memory cell]

  10. Using DMDs to redirect light
     • Theoretical number of accessible locations: total number of micromirrors
       • 768 x 768 = 589,824
     • Cross-talk between adjacent locations
     • Achievable number of accessible locations
       • 768 x 768 / 32 = 18,432
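
The fan-out arithmetic above can be checked in a few lines. This is a minimal sketch using only the numbers on the slide; the constant names are illustrative, and the factor of 32 is taken as given from the slide rather than derived from the optics.

```python
# Values taken from the slide; names are illustrative.
MIRRORS_PER_SIDE = 768      # the DMD is a 768 x 768 array of micromirrors
CROSSTALK_FACTOR = 32       # reduction the slide applies to avoid cross-talk

theoretical_locations = MIRRORS_PER_SIDE * MIRRORS_PER_SIDE
achievable_locations = theoretical_locations // CROSSTALK_FACTOR

print(theoretical_locations)  # 589824
print(achievable_locations)   # 18432
```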

  11. Using mirror assemblies to magnify reach
     • Challenge: DMDs have a narrow angular reach
     • Solution: Coupling DMDs with angled mirrors

  12. Questions to answer
     • How feasible is a ProjecToR interconnect?
       • Built and micro-benchmarked a small ProjecToR prototype
       • Robustness to environmental conditions
     • How should packets be routed in a ProjecToR interconnect?
       • Devised a scheduling algorithm and simulated its performance
     • How much does a ProjecToR interconnect cost?
       • Estimated cost based on the cost breakdown of each component

  13. Prototype: A 3-ToR ProjecToR interconnect
     [Figure labels: ToR 1, ToR 2, ToR 3]

  14. Prototype: A 3-ToR ProjecToR interconnect
     [Figure labels: Source laser, DMD, Mirrors reflecting to ToR 2 and ToR 3]

  15. Prototype: A 3-ToR ProjecToR interconnect
     [Figure labels: ToR 1, ToR 2, ToR 3]

  16. Prototype: throughput
     [Figure: CDF of TCP throughput (Gbps), x-axis from 8.8 to 9.4, comparing the ProjecToR link with a wired link]

  17. Prototype: switching time
     [Figure labels: ToR 1, ToR 2, ToR 3]

  18. Prototype: switching time
     [Figure: receive power (dBm) vs. time (us) for ToR 1 -> ToR 2 and ToR 1 -> ToR 3; annotated switching time: 12 us]

  19. Connecting lasers and photodetectors
     [Figure: lasers and photodetectors of ToR 1, ToR 2, and ToR 3, joined by a dedicated topology and by opportunistic links]
     • Two-topology approach
       • Slow-switching topology, or dedicated topology
       • Fast-switching links, or opportunistic links

  20. Routing packets
     [Figure labels: Virtual output queues, opportunistic link, dedicated topology; ToR 1, ToR 2, ToR 3]
     • K-shortest paths routing
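
The slide names K-shortest paths routing without giving details. As a rough illustration only, the sketch below enumerates loop-free paths in order of hop count on a small 3-ToR graph. The graph, the hop-count metric, and the function name `k_shortest_paths` are assumptions made for this example; the deck does not specify the metric or the procedure the authors use.

```python
from collections import deque

def k_shortest_paths(graph, src, dst, k):
    """Return up to k loop-free paths from src to dst, fewest hops first."""
    paths = []
    queue = deque([[src]])           # breadth-first search over partial paths
    while queue and len(paths) < k:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            paths.append(path)       # BFS order means shorter paths come first
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in path: # skip nodes already visited (no loops)
                queue.append(path + [neighbor])
    return paths

# Hypothetical dedicated topology: every ToR has a link to the other two.
graph = {
    "ToR1": ["ToR2", "ToR3"],
    "ToR2": ["ToR1", "ToR3"],
    "ToR3": ["ToR1", "ToR2"],
}

routes = k_shortest_paths(graph, "ToR1", "ToR3", k=2)
print(routes)  # [['ToR1', 'ToR3'], ['ToR1', 'ToR2', 'ToR3']]
```

Enumerating partial paths this way is only practical for small graphs; it is meant to show the idea, not to scale to a 128-ToR topology.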

  21. Scheduling opportunistic links
     • Given a set of potential links and current traffic demand, find a set of active opportunistic links
     Demand matrix (rows: source, columns: destination):
         0    0  100
       100    0    0
         0    0    0

  22. Scheduling opportunistic links
     • Standard switch scheduling problem (input, output)
       • Blossom matching
       • Matrix decomposition
       • Centralized scheduler
       • Single-tiered matching
     Demand matrix (rows: source, columns: destination):
         0  100    0
         0    0  100
       100    0    0
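
In the standard switch scheduling view, each input is matched to at most one output so that the served demand is as large as possible. The sketch below illustrates that on the 3x3 demand matrix from this slide. It uses brute-force enumeration of permutations as a stand-in; it is not blossom matching or matrix decomposition, and the function name `max_weight_matching` is hypothetical.

```python
from itertools import permutations

def max_weight_matching(demand):
    """Try every one-to-one input/output assignment and keep the heaviest."""
    n = len(demand)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        # perm[src] is the output matched to input src
        weight = sum(demand[src][perm[src]] for src in range(n))
        if weight > best_weight:
            best_perm, best_weight = perm, weight
    return list(best_perm), best_weight

# Demand matrix from the slide (rows: source, columns: destination).
demand = [
    [0, 100, 0],
    [0, 0, 100],
    [100, 0, 0],
]

assignment, served = max_weight_matching(demand)
print(assignment, served)  # [1, 2, 0] 300
```

Brute force grows factorially with the number of ports, which is one reason practical schedulers rely on dedicated matching algorithms.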

  23. Scheduling opportunistic links
     • Standard switch scheduling problem (input, output)
       • Blossom matching
       • Matrix decomposition
       • Centralized scheduler
       • Single-tiered matching
     • ProjecToR (Src ToRs, Dst ToRs): decentralized, two-tiered
     • Extended the Gale-Shapley algorithm for finding stable matches [GS-1962]
     • Constant competitive against an offline optimal allocation
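
For reference, here is a minimal sketch of the textbook Gale-Shapley stable matching algorithm that the slide says was extended. It is not the authors' decentralized, two-tiered variant. The preference lists, the names `S1`, `S2`, `D1`, `D2`, and the function name are hypothetical, and the sketch assumes every destination ranks every source.

```python
def gale_shapley(src_prefs, dst_prefs):
    """Textbook Gale-Shapley: sources propose, destinations keep the best offer."""
    # rank[d][s] = position of source s in destination d's preference list
    rank = {d: {s: i for i, s in enumerate(prefs)} for d, prefs in dst_prefs.items()}
    next_choice = {s: 0 for s in src_prefs}  # next destination each source will try
    match = {}                               # destination -> currently accepted source
    free = list(src_prefs)
    while free:
        s = free.pop()
        if next_choice[s] >= len(src_prefs[s]):
            continue                         # s exhausted its list and stays unmatched
        d = src_prefs[s][next_choice[s]]
        next_choice[s] += 1
        current = match.get(d)
        if current is None:
            match[d] = s                     # d was free, so it accepts
        elif rank[d][s] < rank[d][current]:
            match[d] = s                     # d prefers s and releases its old match
            free.append(current)
        else:
            free.append(s)                   # d rejects s, who will try the next choice
    return {s: d for d, s in match.items()}

# Hypothetical preferences: both sources want D1, and D1 prefers S2.
src_prefs = {"S1": ["D1", "D2"], "S2": ["D1", "D2"]}
dst_prefs = {"D1": ["S2", "S1"], "D2": ["S1", "S2"]}

stable = gale_shapley(src_prefs, dst_prefs)
print(stable == {"S1": "D2", "S2": "D1"})  # True
```

The result is stable because no source and destination both prefer each other over their assigned partners.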

  24. Simulations
     Compared: Fat tree, FireFly, ProjecToR
     • 128-ToR (1024 servers) with 16 lasers and photodetectors
     • Day-long traffic matrix: to build the dedicated topology
     • 5-min traffic matrix: to generate the probability of ToR pair communication
     • TCP flow arrivals with a Poisson arrival rate and realistic flow sizes
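
A Poisson arrival process can be generated by drawing exponentially distributed gaps between consecutive flows. The sketch below shows only that step. The rate, duration, seed, and function name are assumptions chosen for illustration, and flow sizes are not modeled here because the slide does not give the distribution used.

```python
import random

def poisson_arrival_times(rate_per_sec, duration_sec, seed=0):
    """Generate flow arrival times for a Poisson process with the given rate."""
    rng = random.Random(seed)        # seeded so the run is repeatable
    t, arrivals = 0.0, []
    while True:
        # Inter-arrival gaps of a Poisson process are exponentially distributed.
        t += rng.expovariate(rate_per_sec)
        if t >= duration_sec:
            return arrivals
        arrivals.append(t)

# Hypothetical load: about 1,000 flows per second for one second.
arrivals = poisson_arrival_times(rate_per_sec=1000.0, duration_sec=1.0)
print(len(arrivals))                 # close to 1000; the exact count depends on the seed
```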

  25. Simulation results
     [Figure: average flow completion time (ms) vs. average load (%), x-axis from 20 to 80; annotation: 95%]
     • FireFly: slow switching time (-), low fan-out (-)
     • Fat tree: no reconfigurability (-)
     • ProjecToR: reconfigurable (+), switching time 12 us (+), high fan-out (+)
     Further results:
     • Tail flow completion time
     • Different traffic matrices
     • Impact of fan-out
     • Impact of switching time

  26. ProjecToR: A reconfigurable data center
     [Figure labels: ToR 1, ToR 2, ToR 3]
     • Seamless, high fan-out, low switching time interconnect
     • Small prototype demonstrates feasibility
     • Decentralized flow scheduling algorithm
