
HPEC2008 1 NTBliss 9/29/2008

MIT Lincoln Laboratory

Photonic Many-Core Architecture Study

Nadya Bliss1, Krste Asanović2, Keren Bergman3, Luca Carloni3, Jeremy Kepner1, Sanjeev Mohindra1, Vladimir Stojanović4

1MIT Lincoln Laboratory, 2University of California Berkeley, 3Columbia University, 4MIT Research Laboratory of Electronics

September 23rd, 2008

This work is sponsored by DARPA under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government. PM: Jagdeep Shah


Outline

  • Introduction
  • Logical Architecture Abstraction
  • Modeling and Mapping
  • Experiments and Results
  • Summary

Emerging Device Trends

Emerging device technologies create a large parameter space of possible future architectures.

[Figure: three device trends. Feature size reduction, 1970s to 2008: Intel 4004 (10 microns), Sun Sparc (0.8 microns), AMD Athlon (0.18 microns), STI Cell (65 nm), Intel Core 2 (45 nm); Intel 80486DX2 die shown at 12x6.75 mm. 3D fabrication: reduced path length for accesses across the memory hierarchy. Photonic interconnects.]


Benefits of Photonic Interconnects

Photonics can provide high-bandwidth, low-latency communication while meeting the power requirements of embedded systems.

CORE-TO-CORE

Optics (modulate at TX, detect at RX):

  • Modulate/receive data once per communication
  • Scalable, low power switch fabric
  • Balanced communication and computation

Electronics (TX/RX at every hop):

  • Buffer, receive and re-transmit at every switch
  • Power dissipation grows with data rate

TO MEMORY

Electronics:

  • Communication to memory banks is chip-power and pin/wire-density limited
  • Poor scaling of on-chip memory controllers with cores
  • At most 3-6 Tb/sec in the next few years

Optics:

  • Use optical network as an efficient global crossbar
  • Better scaling with N groups
  • Expected performance: 40-80 Tb/sec

System Level View

  • Photonic Many-core Architecture Network: PhotoMAN

Selecting a system-level architecture allows the parameter space to be narrowed while meeting the requirements of DoD applications.

To evaluate the architecture, develop:

1. An expressive logical abstraction
2. A modeling and mapping framework

  • Many-core processor chip
    – 64-256 cores (at the 22 nm node)
  • Off-chip memory
    – a set of DRAM chips
    – minimum capacity: 128 GB (at 22 nm)
  • Evaluate the interaction of the photonic network and memory hierarchy
  • Board power limit: 500 W
    – consistent with the power constraints of a medium-sized UAV (e.g., the RQ-7 Shadow)


Outline

  • Introduction
  • Logical Architecture Abstraction
  • Modeling and Mapping
  • Experiments and Results
  • Summary

Logical Abstraction

  • Kuck* Memory Hierarchy

The Kuck notation provides a clear way of describing a hardware architecture along with its memory and communication hierarchy.

Legend:

  • P - processor
  • N - inter-processor network
  • M - memory
  • SM - shared memory
  • SMN - shared memory network

2-LEVEL HIERARCHY EXAMPLE

Subscripts indicate the hierarchy level; an x.5 subscript on N indicates indirect memory access.

*High Performance Computing: Challenges for Future Systems, David Kuck, 1996

[Kuck diagram of the 2-level hierarchy: clusters of P0/M0 pairs joined by N0.5 share SM1 over SMN1; the clusters are joined by N1.5 and share SM2 over SMN2.]
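The 2-level hierarchy in the example can be captured in a small tree data structure; a minimal sketch (class and field names are illustrative, not from the study):

```python
# Minimal sketch of a 2-level Kuck hierarchy as a tree. Each node holds
# some P0/M0 pairs on a local network plus a shared memory; names are
# illustrative assumptions, not from the study.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Level:
    processors: int               # P0/M0 pairs directly at this node
    shared_memory: str            # e.g. "SM1", "SM2"
    network: str                  # e.g. "N0.5" (x.5 = indirect access)
    children: List["Level"] = field(default_factory=list)

# Two clusters of two processors each, joined at level 2 (as in the example)
clusters = [Level(2, "SM1", "N0.5"), Level(2, "SM1", "N0.5")]
system = Level(0, "SM2", "N1.5", clusters)

def total_processors(level: Level) -> int:
    """Count P0s across the whole hierarchy."""
    return level.processors + sum(total_processors(c) for c in level.children)
```

For this 2-level example, `total_processors(system)` counts the four P0s across both clusters.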


PhotoMAN Logical Representation

  • MIT/UCB 1-Group Memory Configuration

The Kuck notation is suitable for both high-level and detailed physical descriptions of the architecture, such as groups and access points.

[Figure: system-level, high-level, and detailed Kuck diagrams of the 1-group configuration.]

Legend:

  • AP - access point
  • APG - access point group

PhotoMAN Logical Representation

  • MIT/UCB 4-Group Memory Configuration

[Kuck diagram of the 4-group configuration: 256 P0/M0 pairs (cores 0-255) on a single N0.5 mesh; access points (AP1) in access point groups (APG) on per-group SMN1 networks; crossbar groups (XSG) of XS2 crossbars connecting to the 16 SM2 memory banks.]

While the Kuck representation is flexible, the PhotoMAN study is focused on 1-, 4-, and 16-group memory configurations.

  • SMN0...3 is an electrical mesh connecting only processors within the group
  • SM0...15 are DRAM memory banks, 8 GB each
  • The number of access points per group equals the number of memory banks
  • APN connections are 1-to-(number of groups)
  • N0.5 is a single electrical mesh
  • XS-to-SM connections are 1-to-1

The logical view of the 16-group configuration is similar.

Legend:

  • APN - access point network
  • XS - cross bar
  • XSG - cross bar group

Outline

  • Introduction
  • Logical Architecture Abstraction
  • Modeling and Mapping
  • Experiments and Results
  • Summary

pMapper: Modeling and Mapping

A machine description together with an abstraction layer is used to generate a performance model. An application specification (MATLAB) is used to generate a signal flow graph.

[Figure: application signal flow graph]

Maps (distribution specifications) are generated for the application.

pMapper performs:

  • application-to-architecture mapping
  • application-on-architecture simulation

Results can be used to predict application performance and architecture parameters


PhotoMAN Machine Description

Given a hardware model H and a program parse tree T, pMapper finds maps M that minimize execution latency:
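The objective (rendered as an image in the original slide) can be written, under assumed notation for the feasible map set, as:

```latex
M^{*} \;=\; \operatorname*{arg\,min}_{M \in \mathcal{M}} \; \mathrm{latency}\bigl(T, M, H\bigr)
```

where \(\mathcal{M}\) is the set of candidate maps for the operations in T.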

Focus of the PhotoMAN study


Memory Hierarchy Formulation

  • MIT/UCB 1-Group Memory Configuration

  • Bandwidth and latency matrices have the same pattern of non-zeros
  • The topology for N0.5 and SMN1 is the same for the 1-group configuration
  • Diagonal entries encode:
    – RN: bandwidth to the local store
    – RMon: whether Pi is an access point

[Figures: physical view; non-zero patterns for the core-to-core network N0.5, the shared memory network SMN1, the access points, and the AP-to-SM connections.]
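A toy version of such a matrix can be built for a small electrical mesh; a sketch under assumed bandwidth values (`LINK_BW` and `LOCAL_BW` are illustrative, not the study's numbers):

```python
# Sketch: bandwidth matrix R for a 2x2 mesh of 4 cores. Off-diagonal
# non-zeros follow the mesh topology (a latency matrix would share this
# sparsity pattern); diagonal entries encode bandwidth to the local
# store (the RN role). Numeric values are illustrative assumptions.
N_SIDE = 2
N = N_SIDE * N_SIDE
LINK_BW = 10.0      # assumed inter-core link bandwidth
LOCAL_BW = 100.0    # assumed local-store bandwidth

def mesh_neighbors(i):
    """Indices of the up/down/left/right neighbors of core i."""
    r, c = divmod(i, N_SIDE)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < N_SIDE and 0 <= cc < N_SIDE:
            yield rr * N_SIDE + cc

R = [[0.0] * N for _ in range(N)]
for i in range(N):
    R[i][i] = LOCAL_BW            # diagonal: local-store bandwidth
    for j in mesh_neighbors(i):
        R[i][j] = LINK_BW         # off-diagonal: mesh links
```

An RMon-style access-point flag matrix would reuse the same diagonal positions, which is why the matrices share one pattern of non-zeros.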


Memory Hierarchy Formulation

  • MIT/UCB NG-Group Memory Configuration

[Figures: physical view; AP-XS-memory network; shared memory network SMN1; access points; AP-XS bandwidth; XS-memory bandwidth.]

  • The core-to-core network is not shown; it is the same as in the 1-group case
  • While memory access requires one additional transfer, the topology is represented with a single matrix, RAXSon


Outline

  • Introduction
  • Logical Architecture Abstraction
  • Modeling and Mapping
  • Experiments and Results
  • Summary

Maps

[Figure: example distributions over processors P0-P3: 1D block, 2D block, 1D cyclic, 2D cyclic, 1D hierarchical, ..., in order of increasing programming complexity.]

  • High programmability is a desirable architecture characteristic
  • The complexity of the mapping chosen to optimize performance (minimize execution time) provides insight into the programmability of the hardware
  • The higher the complexity of the mapping, the lower the programmability
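The map families differ only in which processor owns a given global index; a minimal sketch of the 1-D cases (function names are illustrative, not pMapper's API):

```python
# Sketch: 1-D block vs. 1-D cyclic ownership of global index g when n
# elements are distributed over p processors. Names are illustrative
# assumptions, not pMapper's API.
def block_owner(g: int, n: int, p: int) -> int:
    """Contiguous blocks of ceil(n/p) elements per processor."""
    block = -(-n // p)            # ceiling division
    return g // block

def cyclic_owner(g: int, p: int) -> int:
    """Elements dealt round-robin across processors."""
    return g % p
```

For n=8 elements on p=4 processors, index 5 lives on processor 2 under a block map but on processor 1 under a cyclic map; the 2-D and hierarchical maps compose such rules per dimension or per level.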

Synthetic Aperture Radar (SAR)

Typical application

  • SAR spotlight mode
  • Collect raw SAR data
  • Processing chain produces an image
  • Image can then be analyzed

Processing chain simulated

  • FFTs, IFFTs, and data-reorganization

HPC Challenge relevance: FFT

[Figure: processing chain stages - Cross-range Re-sampling, Matched Filter & Interpolation, Pulse Compression, Back-projection, Image Conversion Part 1, Image Conversion Part 2 - with all-to-all, full data redistributions between stages.]

The SAR processing chain is common to many defense applications and requires a significant amount of both computation and communication.
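The all-to-all redistribution between FFT dimensions is a corner turn: each processor trades its rows for columns so that the next FFT dimension becomes local. A toy sketch (the gather is conceptual; a real implementation exchanges blocks pairwise over the network):

```python
# Sketch of a corner turn: blocks[p] holds the rows owned by processor
# p; the result gives each processor an equal share of the columns.
# Assumes the column count divides evenly by the processor count.
def corner_turn(blocks):
    rows = [row for blk in blocks for row in blk]   # conceptual gather
    cols = [list(col) for col in zip(*rows)]        # transpose
    p = len(blocks)
    per = len(cols) // p
    return [cols[i * per:(i + 1) * per] for i in range(p)]
```

After the turn, each processor can run its FFTs on locally resident data, which is why this full redistribution dominates the communication cost of the chain.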


Airborne Video Surveillance

[Figure: SONOMA sensor (LLNL) with GPS/INS; 6 COTS cameras, 66 Mpix.]

Georegistration is a key computational kernel in airborne video surveillance and other image-processing algorithms.

Typical application

  • High data rate imaging sensor
  • Collect data
  • Georegister data
  • Analyze activity

Processing chain simulated

  • projective transform with bilinear interpolation for each pixel

HPC Challenge relevance: STREAM and RandomAccess
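The per-pixel kernel can be sketched as a homography followed by bilinear interpolation (the 3x3 matrix H and the helper names are illustrative assumptions, not the study's code):

```python
# Sketch: warp one output pixel (x, y) through an assumed 3x3 projective
# transform H (row-major), then bilinearly interpolate the source image
# at the resulting fractional coordinate. Out-of-bounds reads return 0.
def warp_pixel(img, H, x, y):
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    sx, sy = u / w, v / w                 # perspective divide
    x0, y0 = int(sx), int(sy)
    fx, fy = sx - x0, sy - y0             # fractional offsets

    def px(i, j):                          # zero-padded source read
        if 0 <= j < len(img) and 0 <= i < len(img[0]):
            return img[j][i]
        return 0.0

    return ((1 - fx) * (1 - fy) * px(x0, y0)
            + fx * (1 - fy) * px(x0 + 1, y0)
            + (1 - fx) * fy * px(x0, y0 + 1)
            + fx * fy * px(x0 + 1, y0 + 1))
```

The scattered source reads are what give the kernel its STREAM/RandomAccess character: the access pattern follows the transform rather than the output order.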


PhotoMAN Performance

[Figures: application performance for SAR and AVS (projective transform) on the optical-to-memory-banks configuration (MIT/UCB) and the optical mesh configuration (Columbia).]

Optical (photonic) interconnects both to memory and between cores yield the best performance.


PhotoMAN Programmability

See J. Kepner and N. Bliss, “Evaluating the Productivity of a Multicore Architecture”

Maps selected:

  • 1D block, hierarchical
  • smallest block fits into a core's local store

  • The architecture is well-balanced
  • Maps with the maximum number of cores are chosen (optical to memory and optical mesh)
  • Requires hierarchical maps
  • Can be improved with a cache architecture

[Figure: scalability with number of cores for SAR and AVS; 1D hierarchical map shown.]
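A 1-D hierarchical map composes two block maps: one over groups, then one over the cores within a group, with the innermost block sized to fit a core's local store. A minimal sketch (names are illustrative assumptions):

```python
# Sketch: two-level (hierarchical) 1-D block map. Global index g over n
# elements is first block-mapped to one of `groups`, then block-mapped
# to a core inside that group. Names are illustrative assumptions.
def hier_owner(g, n, groups, cores_per_group):
    per_group = -(-n // groups)                 # ceil(n / groups)
    grp = g // per_group
    within = g - grp * per_group                # index inside the group
    per_core = -(-per_group // cores_per_group) # ceil per-core block
    return grp, within // per_core
```

Sizing `per_core` against the local-store capacity is what makes the hierarchical map both harder to program and necessary for scaling to all cores.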


Best Performing Architecture

[Figures: logical and physical views of the best performing architecture.]

Best performing configuration:

  • 16 groups
  • Optical to memory
  • Optical mesh
  • 256 cores

Current/future research:

  • Network topology
  • Power optimization
  • Processor characteristics
  • Cache architecture
  • Hierarchical mapping

Summary

  • Emerging device trends are motivating the need for logical architecture abstractions and robust modeling, mapping, and simulation environments
  • PhotoMAN study focus: photonic networks
  • Kuck diagrams provide an expressive logical abstraction
  • A detailed hardware model describes the mapping and modeling optimization space explored by pMapper and allows for architecture evaluation
  • Initial results show over an order of magnitude improvement in application performance with photonics, while maintaining scalability