 
Hermes-A: An Asynchronous NoC Router with Distributed Routing
Julian Pontes, Matheus Moreira, Fernando Moraes, Ney Calazans
Outline
• Introduction
• Related Work
• Architecture
  – Input Port
    • Path Calculation
  – Output Port
    • Output Control
• Results
• Future Work
• Conclusions
Introduction
• Dual Rail Encoding
• Four Phase Protocol
• DIMS Logic

A B | S
0 0 | SF
0 1 | ST
1 0 | ST
1 1 | SF

[Figure: Reg and Logic stages with CD blocks]
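The three ideas on this slide can be sketched together in a small behavioural model. This is a minimal illustration, not the authors' circuit. It assumes that ST and SF are the true and false rails of the output S, so the truth table is a dual-rail XOR, and it assumes an all-zero spacer between data tokens as the four-phase return-to-zero step. The names `encode`, `decode`, `CElement` and `DimsXor` are hypothetical.

```python
from itertools import product

SPACER = (0, 0)  # assumption: both rails low means "no data" (return-to-zero)

def encode(bit):
    """Dual-rail encode one bit as (true_rail, false_rail)."""
    return (1, 0) if bit else (0, 1)

def decode(pair):
    """Return the bit carried by a dual-rail pair, or None for the spacer."""
    if pair == SPACER:
        return None
    if pair == (1, 1):
        raise ValueError("(1, 1) is not a valid dual-rail codeword")
    return pair[0]

class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""
    def __init__(self):
        self.out = 0

    def update(self, x, y):
        if x == y:
            self.out = x
        return self.out

class DimsXor:
    """DIMS-style XOR: one C-element per input minterm, OR-ed onto two rails."""
    def __init__(self):
        # Key = (rail index of A, rail index of B); 0 = true rail, 1 = false rail.
        self.minterms = {key: CElement() for key in product((0, 1), repeat=2)}

    def update(self, a, b):
        fired = {k: c.update(a[k[0]], b[k[1]]) for k, c in self.minterms.items()}
        s_true = fired[(0, 1)] | fired[(1, 0)]   # exactly one input is 1
        s_false = fired[(0, 0)] | fired[(1, 1)]  # inputs are equal
        return (s_true, s_false)

gate = DimsXor()
for a, b in product((0, 1), repeat=2):
    result = gate.update(encode(a), encode(b))   # evaluation phase
    print(a, b, "->", "ST" if decode(result) else "SF")
    gate.update(SPACER, SPACER)                  # reset phase
```

Because every minterm is built from a C-element, the output stays at the spacer until both inputs carry valid data, and it returns to the spacer only after both inputs do.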
Introduction
• Asynchronous Circuits
  – Less Simultaneous Switching ☺
    • Less EMI
    • Less IR Drop (Slight PowerPlan)
    • Less Peak Power (No Decap Cells)
    • Less Crosstalk Problems in Data Links ??? (DI Codes - Four Phase) (Partial Shielding in data links)
  – Average Case Delay ☺
  – Reduced Dynamic Power (5 times, 65 nm comparison) ☺
Introduction
• Asynchronous Circuits
  – Area and Leakage Overhead (~3-5 times more, 65 nm) ☹
  – Lack of CAD Tools and Standards ☹
    • Synthesis Tools
      – Traditional Tools (~45 thousand loop breakers in a 3x3 NoC)
      – Asynchronous Synthesis Tools (Balsa, Teak)
        » Lack of traditional optimizations (Pin Swapping, Reordering, Retime, …)
    • STA
      – Liberty File Support (is_async_reg)
      – New Set of Constraints (Cycle Time Definition)
Introduction
• Networks on Chip
  – Offer large communication parallelism
  – Can provide alternate paths
• Asynchronous Network on Chip
  – Enables the Design of Complex GALS Systems on Chip
Objective
• Design an asynchronous router architecture capable of supporting the design of GALS systems
  – High Throughput
  – Low Power
  – Permit the implementation of fine-grained power control
    » MVS
    » Power Shut-Off
Related Work

| NoC | Topology | Routing / Flow Control | Network Interface | Asynchronous Style | Links and Encoding | Implementation |
|---|---|---|---|---|---|---|
| As. QNoC | 2D Mesh (Irreg/Reg) | Source / wormhole / credit-based with preemption, 8 VCs | N.A. | 4-phase bundled-data | 10-bit flits | 180 nm, 200 Mflits/s, ASIC |
| RasP | Framework / point-to-point (Irreg/Reg) | Source / bit serial | Ad hoc | QDI | Point-to-point pipelined serial links | 180 nm, 700 Mb/s |
| ASPIN | 2D Mesh (Reg) | Distributed XY / wormhole / EOP | A2S, S2A FIFOs | Bundled-data / QDI | Dual-rail, 4-phase, 34-bit flits | 90 nm, 714 Mflits/s |
| ANoC | 2D Mesh | Source / Adaptive, 2 VCs | - | QDI | One of Four | 130 nm, 5 Gb/s (router) |
| Hermes-A | 2D Mesh | Distributed XY / wormhole / BOP-EOP | Dual-Rail SCAFFI, Clock Stretching | QDI | Dual-Rail | 180 nm, 727 Mbits/s (454 Mflits/s, 3.6 Gb/s per router), ASIC |
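The three Hermes-A throughput figures in the table are consistent with one another under two assumptions that the slide does not state: that 727 Mbits/s is a per-link rate, and that the router has five ports carrying 8-bit flits. A quick arithmetic check under those assumptions:

```python
LINK_MBITS_PER_S = 727  # rate quoted in the table, read here as per link
PORTS = 5               # assumption: East, West, North, South, Local
FLIT_BITS = 8           # assumption: flit width is not given on the slide

router_mbits = LINK_MBITS_PER_S * PORTS      # aggregate router bandwidth
router_mflits = router_mbits / FLIT_BITS     # same figure expressed in flits

print(f"{router_mbits / 1000:.1f} Gb/s per router")
print(f"{router_mflits:.0f} Mflits/s per router")
```

If either assumption is wrong, the agreement with the quoted 3.6 Gb/s and 454 Mflits/s is coincidental and the figures should be read directly from the table.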
Router Architecture
• Distributed Routing
• Independent Ports
• Dual Rail Encoding
• Weak Conditioned Half Buffer
• DIMS Logic
Input Port
• Packet
  – First Flit contains the address
  – BOP and EOP delimiters
  – Three main paths
    • First Flit (1), Last Flit (3) and other Flits (2)
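The packet framing above can be sketched as a small classifier that picks one of the three paths from the delimiter bits. This is an illustration only. It assumes that BOP = 1 marks the first flit and EOP = 1 marks the last flit, and it treats BOP = EOP = 1 as the close signal mentioned on the S-Element slide. The `Flit` type and the `select_path` and `packetize` helpers are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flit:
    bop: int   # begin-of-packet delimiter
    eop: int   # end-of-packet delimiter
    data: int  # address in the first flit, payload otherwise

def select_path(flit):
    """Pick the input-port path for a flit (assumed delimiter encoding)."""
    if flit.bop and flit.eop:
        return "close"  # BOP = EOP = 1 closes the connection at the output port
    if flit.bop:
        return 1        # first flit: carries the address
    if flit.eop:
        return 3        # last flit
    return 2            # any other flit

def packetize(address, payload):
    """Frame an address and a non-empty payload as a list of flits."""
    if not payload:
        raise ValueError("this sketch assumes at least one payload flit")
    flits = [Flit(bop=1, eop=0, data=address)]
    flits += [Flit(0, 0, word) for word in payload[:-1]]
    flits.append(Flit(0, 1, payload[-1]))
    return flits

packet = packetize(0x21, [5, 6, 7])
print([select_path(f) for f in packet])
```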
Input Port
[Figure: input port diagram; surviving label: 10]
Path Calculation
• All logic employs Delay Insensitive Minterm Synthesis (DIMS)
• First Flit contains the XY address
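Distributed XY routing means each router decides the next hop locally from the address in the first flit. The sketch below shows the usual XY rule in software form, not the DIMS logic itself. The port names and the coordinate orientation (x grows toward EAST, y toward NORTH) are assumptions, and `xy_route` and `trace` are hypothetical helper names.

```python
def xy_route(current, target):
    """Return the output port for a packet at `current` heading to `target`.

    XY routing resolves the X offset completely before moving in Y.
    """
    cx, cy = current
    tx, ty = target
    if tx > cx:
        return "EAST"
    if tx < cx:
        return "WEST"
    if ty > cy:
        return "NORTH"
    if ty < cy:
        return "SOUTH"
    return "LOCAL"  # already at the target router

def trace(source, target):
    """List the port chosen at every router along the path."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    here, hops = source, []
    while True:
        port = xy_route(here, target)
        hops.append(port)
        if port == "LOCAL":
            return hops
        dx, dy = step[port]
        here = (here[0] + dx, here[1] + dy)

print(trace((0, 0), (2, 1)))
```

Since each decision depends only on the local coordinates and the target address, no routing table or source-computed path is needed.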
Input Port
[Figure sequence: animation frames of the input port diagram; surviving labels: 14, 10, 4, K, S-Control]
S-Element - Enclosure
• Starts with a handshake at the input port
• Performs two handshakes
  – First to send the last flit
  – Second to close the communication section at the output port (EOP = BOP = 1)
• Speed Independent Design
  – Circuit generated with Petrify
S-Control
[Figure: S-Control handshakes]
• Input: LAST FLIT, Input Ack
• Output Port A: LAST FLIT, Ack A
• Output Port B: BOP = EOP = 1, Ack B
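The enclosure behaviour of the S-Control can be sketched as an event trace: one input handshake that wraps two complete output handshakes. This is only an illustration. The exact signal ordering of the Petrify-generated circuit is not given on the slide, so the ordering below is an assumption, and the channel names `in`, `outA` and `outB` are hypothetical.

```python
def four_phase(channel):
    """Yield the four events of one return-to-zero handshake on a channel."""
    yield f"{channel}.req+"
    yield f"{channel}.ack+"
    yield f"{channel}.req-"
    yield f"{channel}.ack-"

def s_control_trace():
    """Event order for the end-of-packet sequence (assumed ordering)."""
    events = ["in.req+"]           # the last flit arrives at the input port
    events += four_phase("outA")   # first handshake: forward the last flit
    events += four_phase("outB")   # second handshake: send BOP = EOP = 1
    events += ["in.ack+", "in.req-", "in.ack-"]  # finish the enclosing handshake
    return events

for event in s_control_trace():
    print(event)
```

The input acknowledge is raised only after both output handshakes have completed, which is what lets a single incoming flit trigger two outgoing transfers.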
Output Port
• Arbitration
• Kill Section
[Figure: output port diagram]