
Photonic Networks-on-Chip for Maximizing Performance and Improving - PowerPoint PPT Presentation

Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance. Randy Morris, Avinash Kodi and Ahmed Louri. School of Electrical Engineering and Computer Science, Ohio University


  1. Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance • Randy Morris†, Avinash Kodi† and Ahmed Louri‡ • †School of Electrical Engineering and Computer Science, Ohio University • ‡Department of Electrical and Computer Engineering, University of Arizona • E-mail: kodi@ohio.edu, louri@email.arizona.edu • 45th International Symposium on Microarchitecture (MICRO), December 1-5, 2012, Vancouver BC, Canada

  2. Talk Outline • Motivation & Background • R-3PO: Architecture & Reconfiguration • Performance Analysis • Conclusions

  3. Multicores & Network-on-Chips • Example multicores: Tilera-64 [1], 80-core Intel TeraFlops [2], 512-core FERMI (Nvidia) [3] • With increasing cores, the communication-centric design paradigm (Network-on-Chips) is becoming important • Energy for communication is increasing • Delivered throughput is decreasing • References: [1] http://www.tilera.com/products/processors/TILE64 [2] http://techresearch.intel.com/ProjectDetails.aspx?Id=151 [3] http://www.nvidia.com/object/fermi_architecture.html

  4. Energy Discrepancy & Throughput • Energy discrepancy between computation and global communication with technology scaling => Need to reduce global communication energy • [Figure: On-die energy; relative compute energy and interconnect energy vs. technology node (45, 32, 22, 14, 10, 7 nm). Source: Shekar Borkar, Intel] • Reduced throughput due to aggressive voltage and clock scaling => Need to provide scalable bandwidth without sacrificing performance • [Figure: Tile power, Intel Tera-Flops (65 nm) [1]; power (watts) vs. voltage, with 1 Tflops at 97 W and 1.33 Tflops at 230 W] • Potential solutions: Nanophotonics, 3D Stacking • Reference: [1] Y. Hoskote, "A 5-GHz Mesh Interconnect for A Teraflops Processor," IEEE Computer Society, 2007, pp. 51-61
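
The two operating points quoted on the tile-power figure (1 Tflops at 97 W and 1.33 Tflops at 230 W) can be turned into an energy-efficiency comparison. The short Python sketch below does only that arithmetic; the function name is illustrative and not part of the talk.

```python
def gflops_per_watt(tflops, watts):
    """Energy efficiency of one operating point, in Gflops per watt."""
    return tflops * 1000.0 / watts

# Operating points read from the slide's tile-power figure.
low = gflops_per_watt(1.0, 97.0)
high = gflops_per_watt(1.33, 230.0)

throughput_gain = 1.33 / 1.0   # relative throughput increase
power_gain = 230.0 / 97.0      # relative power increase

print(f"{low:.2f} Gflops/W at 97 W, {high:.2f} Gflops/W at 230 W")
print(f"throughput x{throughput_gain:.2f} costs power x{power_gain:.2f}")
```

Under these numbers, a 33% gain in throughput costs more than twice the power, which is the scaling problem the slide points at.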

  5. Nanophotonics & Optical 3D Stacking • Nanophotonics offers several advantages: • Low energy (7.9 fJ/bit) • Small footprint (~2.5 µm) • High bandwidth (~40 Gbps) • CMOS compatibility • Optical 3D stacking offers several advantages: • Shorter interconnect length • Higher bandwidth density • Optical vias create power-efficient inter-layer communication (figure labels: Layer 1, Layer 2) • References: 1. L. Xu, W. Zhang, Q. Li, J. Chan, H. L. R. Lira, M. Lipson, K. Bergman, "40-Gb/s DPSK Data Transmission Through a Silicon Microring Switch," IEEE Photonics Technology Letters 24. 2. S. Manipatruni, K. Preston, L. Chen, and M. Lipson, "Ultra-low voltage, ultra-small mode volume silicon microring modulator," Opt. Express 18, 18235-18242 (2010). 3. P. Koonath and B. Jalali, "Multilayer 3-D photonics in silicon," Opt. Express, vol. 15, pp. 12686-12691, 2007. 4. A. Biberman, K. Preston, G. Hendry, N. Sherwood-Droz, J. Chan, J. S. Levy, M. Lipson, and K. Bergman, "Photonic network-on-chip architectures using multilayer deposited silicon materials for high performance chip multiprocessors," J. Emerg. Technol. Comput. Syst., vol. 7, pp. 1-25, July 2011.
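
To give a feel for the quoted figures, the sketch below multiplies energy per bit by bit rate to get the modulation power of one channel. It assumes, purely for illustration, that the 7.9 fJ/bit and ~40 Gbps values apply to the same channel; the slide lists them as separate advantages and they may come from different devices. The function name is hypothetical.

```python
def channel_power_watts(energy_per_bit_joules, bit_rate_bps):
    """Power = energy per bit x bits per second."""
    return energy_per_bit_joules * bit_rate_bps

# Assumed pairing of the two slide figures (see the caveat above).
power = channel_power_watts(7.9e-15, 40e9)
print(f"{power * 1e6:.0f} microwatts per channel")
```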

  6. Recent Work on Photonic NoC, among others • Shared-Bus [Cornell, MICRO'06] • Circuit Switch [Columbia, NoCs'07] • CORONA [HP/Wisconsin, ISCA'08] • Processor-DRAM [MIT, Hot Int'08] • Firefly [Northwestern, ISCA'09] • Phastlane [Cornell, ISCA'09] • Flexishare [Northwestern, HPCA'10] • Oblivious Router [Cornell, ASPLOS'10] • ATAC [MIT, PACT'10] • MPNoC [Arizona, DAC'10] • Free-Space Architecture [ISCA'10] • Optical Proximity [Sun, ISCA'10] • PROPEL [Ohio, NoCs'10] • System Level Trimming [UC Davis, HPCA'11] • Atomic Coherence [Wisconsin/HP, HPCA'11] • FeatherWeight [Northwestern/KAIST, MICRO'11] • Resilient Microring Design [UC Davis, MICRO'11] • Tolerating Process Variations [Pittsburgh, ISCA'12] • However, there are several issues not addressed: • 2D planar connections have waveguide crossings • Static network resource allocation • Lack of fault tolerance

  7. Talk Outline • Motivation & Background • R-3PO: Architecture & Reconfiguration • Performance Analysis • Conclusions

  8. R-3PO Architecture • Decomposed optical crossbar: reduces optical hardware complexity by having smaller crossbars; reduces crossover losses (~0.05 dB/crossing) • Optical vias: light is switched via photonic rings (reduces electrical power); eases fabrication, as optical and electrical dies can be separately grown • Reconfiguration of network resources by re-allocating bandwidth: reduces application execution time by monitoring link and buffer utilization; provides fault tolerance, as faulty channels are bypassed
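
The benefit of fewer waveguide crossings can be quantified from the ~0.05 dB/crossing figure above. The sketch below accumulates crossing loss in dB and converts it to the fraction of optical power that survives; the crossing counts in the example are hypothetical and are not taken from the talk.

```python
def crossing_loss_db(num_crossings, loss_per_crossing_db=0.05):
    """Total loss from waveguide crossings; dB losses add linearly."""
    return num_crossings * loss_per_crossing_db

def fraction_remaining(loss_db):
    """Fraction of optical power left after a loss given in dB."""
    return 10 ** (-loss_db / 10)

# Hypothetical crossing counts, chosen only to show the trend.
for crossings in (8, 32, 64):
    loss = crossing_loss_db(crossings)
    print(f"{crossings} crossings: {loss:.2f} dB, "
          f"{fraction_remaining(loss):.1%} of the power remains")
```

Because the loss in dB grows linearly with the number of crossings, the surviving power falls exponentially, so every crossing avoided reduces the laser power the network needs.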

  9. R-3PO Architecture (1/6) [Figure: 3D chip stack. Labels: Electrical Contact; Optical Die (Optical Layers 3, 2, 1, 0); Electro-Optic Transceivers; External Laser; TSVs; Electrical Die (Core + Cache + MC); Heat Sink]

  10. R-3PO Architecture (1/6) [Figure: electrical die detail. Labels: Core 0, Core 1, Core 2, Core 3, each with an L1 Cache; Shared L2; Electrical Die (Core + Cache + MC); Heat Sink]

  11. R-3PO Architecture (2/6) [Figure: electro-optic transceivers between Core A and Core B. Labels: Driver for Electronics; Buffer Chain; Photodetector; TIA; Limiting Amplifier; four transmitters (Tx) and four receivers (Rx); Micro-ring resonator; wavelengths λ1, λ2, λ3, λ4 on each side; Off-Chip Laser; Electro-Optic Transceivers; External Laser; TSVs; Electrical Die (Core + Cache + MC); Heat Sink]

  12. R-3PO Architecture (3/6) [Figure: Groups 0 to 3 with Optical Layer 0. Other labels: Electro-Optic Transceivers; External Laser; TSVs; Electrical Die (Core + Cache + MC); Heat Sink]

  13. R-3PO Architecture (4/6) [Figure: Groups 0 to 3 with Optical Layers 0 and 1. Other labels: Electro-Optic Transceivers; External Laser; TSVs; Electrical Die (Core + Cache + MC); Heat Sink]

  14. R-3PO Architecture (5/6) [Figure: Groups 0 to 3 with Optical Layers 0, 1 and 2. Other labels: Electro-Optic Transceivers; External Laser; TSVs; Electrical Die (Core + Cache + MC); Heat Sink]

  15. R-3PO Architecture (6/6) [Figure: Groups 0 to 3 with the full Optical Die (Optical Layers 0, 1, 2 and 3). Other labels: Electrical Contact; Electro-Optic Transceivers; External Laser; TSVs; Electrical Die (Core + Cache + MC); Heat Sink]

  16. Router Microarchitecture [Figure: router for Tile 0. Labels: Header; Route Computation (RC); IB 0 to IB 3; Token capture, Token release (Req + Rel); demux; E/O Tx with MRR Modulators, to Optical Layer 0 through Optical Layer 3; L2 Shared Cache; Switch Allocator (SA); Token Control and Token Re-generation; OB 0 to OB 3; mux; O/E Rx with MRR Filters, from Optical Layer 0 through Optical Layer 3; pipeline timing diagram between source (S) and destination (D)] • Legend: • RC: Route Computation • BWS: Buffer Write (Source) • SA: Switch Allocation • EO: Electrical to Optical Driver • OL: Optical link latency (1-3 cycles) • OE: Optical to Electrical (Dest) • BWD: Buffer Write (Dest)

  17. Static Communication • Communication demand between Tile 0 and Tile 15 is high based on the application • If there are under-utilized links, then the bandwidth can be re-allocated to improve the performance • [Figure: Layer 2, with Source and Groups 0 to 3]

  18. Network Reconfiguration • 2x increase in bandwidth is obtained by routing half the data through two other nanophotonic channels • [Figure: Layer 0 and Layer 1, each with Groups 0 to 3. Labels: Source; Switch point; Combine point; Destination]
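
As a rough illustration of the 2x claim, the sketch below compares the cycles needed to serialize a message over the static channel alone with the cycles needed when half the data takes a second, reconfigured path in parallel. The function name, channel width and message size are hypothetical values chosen only for the arithmetic, and the model ignores any extra latency at the switch and combine points.

```python
import math

def serialization_cycles(message_bits, bits_per_cycle_per_path, num_paths):
    """Cycles to send a message split evenly over num_paths equal parallel paths."""
    total_bits_per_cycle = bits_per_cycle_per_path * num_paths
    return math.ceil(message_bits / total_bits_per_cycle)

# Hypothetical 512-bit message over paths that each carry 64 bits per cycle.
static_cycles = serialization_cycles(512, 64, num_paths=1)
reconfigured_cycles = serialization_cycles(512, 64, num_paths=2)

print(static_cycles, reconfigured_cycles)
```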

  19. Reconfiguration • Reconfiguration in R-3PO takes place between the different layers as follows: • R-3PO-L1: Reconfiguration between Layer 0/Layer 1 & Layer 2/Layer 3 • R-3PO-LA: Reconfiguration between adjacent layers • R-3PO-L2: Reconfiguration between two adjacent layers • R-3PO-L3: Reconfiguration between all layers • Reconfiguration algorithm monitors network resources: link & buffer utilization • Accomplished with hardware counters & electrical circuitry

  20. Reconfiguration Algorithm
      Step 1: Wait for reconfiguration window R_W^t
      Step 2: RC_i sends a request packet to all local tiles requesting Link_util and Buffer_util for the previous window R_W^(t-1)
      Step 3: Each hardware counter sends Link_util and Buffer_util statistics from the previous window R_W^(t-1) to RC_i
      Step 4: RC_i classifies the link statistic for each hardware counter as:
          If Link_util = 0.0: Not-Utilized, use β4
          If Link_util ≤ L_min: Under-Utilized, use β3
          If Link_util ≥ L_min and Buffer_util < B_con: Normal-Utilized, use β2
          If Buffer_util > B_con: Over-Utilized, use β1
      Step 5: Each RC_i sends bandwidth-available information to RC_j (i ≠ j)
      Step 6: If RC_j can use any of the free links, it notifies RC_i of their use; otherwise RC_j forwards the information to the next RC_j
      Step 7a: RC_i receives the response back from RC_j and activates the corresponding microrings
      Step 7b: RC_j notifies the tiles of additional bandwidth, and RC_i notifies RC_j that the additional bandwidth is now available
      Step 8: Go to Step 1
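
The classification in Step 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' hardware logic: the threshold values for L_min and B_con are hypothetical (the slide names them but gives no numbers), the rules are tested in the order the slide lists them, and the boundary cases the slide leaves open (Link_util exactly equal to L_min, Buffer_util exactly equal to B_con) are resolved here by assumption.

```python
from enum import Enum

class LinkState(Enum):
    NOT_UTILIZED = "beta4"
    UNDER_UTILIZED = "beta3"
    NORMAL_UTILIZED = "beta2"
    OVER_UTILIZED = "beta1"

# Hypothetical thresholds; the slide does not give values for L_min or B_con.
L_MIN = 0.10
B_CON = 0.50

def classify_link(link_util, buffer_util, l_min=L_MIN, b_con=B_CON):
    """Classify one link from its counters for the previous window (Step 4)."""
    if link_util == 0.0:
        return LinkState.NOT_UTILIZED
    if link_util <= l_min:           # assumption: link_util == l_min counts as under-utilized
        return LinkState.UNDER_UTILIZED
    if buffer_util > b_con:
        return LinkState.OVER_UTILIZED
    return LinkState.NORMAL_UTILIZED  # assumption: buffer_util == b_con counts as normal

def free_links(stats):
    """Return ids of links whose bandwidth could be offered to other groups (Step 5).

    Assumption: only not-utilized and under-utilized links are offered.
    """
    offered = (LinkState.NOT_UTILIZED, LinkState.UNDER_UTILIZED)
    return [link_id for link_id, (lu, bu) in stats.items()
            if classify_link(lu, bu) in offered]

# Example counters for one window: link id -> (link_util, buffer_util).
window_stats = {0: (0.0, 0.0), 1: (0.05, 0.1), 2: (0.6, 0.2), 3: (0.9, 0.8)}
print(free_links(window_stats))
```

In this example links 0 and 1 are reported as free, link 2 is left alone, and link 3 is the over-utilized link that would benefit from the extra bandwidth.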
