OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

Hyoukjun Kwon and Tushar Krishna, Georgia Institute of Technology, Synergy Lab (http://synergy.ece.gatech.edu), hyoukjun@gatech.edu. April 25, 2017.


  1. OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel
     Hyoukjun Kwon and Tushar Krishna, Georgia Institute of Technology
     Synergy Lab (http://synergy.ece.gatech.edu), hyoukjun@gatech.edu
     April 25, 2017

  2. Hardware Development Cost (source: Todd Austin, Micro-49 keynote)
     • Low cost challenge

  3. Many-IP Heterogeneous System
     [Figure: many IP blocks (CPU1, CPU2, GPU, Memory, Accelerator, Sensor, Sensor2, Wireless Network, …) connected by a Network-on-Chip (NoC)]
     • Scalability challenge
     • Flexibility challenge

  4. Diverse System Requirements
     [Figure: Throughput Critical vs. Latency Critical applications; image sources: MNIST, Engadget, TheStack]

  5. Challenges for NoCs
     • Low-cost
       - Low design/verification costs of custom/generic NoCs
       - Design automation of high-performance, low-energy NoCs
     • Scalability
       - Many-IP heterogeneous system support
       - Low latency
       - Low energy
       - Low area
     • Flexibility
       - Diverse connectivity
       - Diverse latency/throughput requirements

  6. OpenSMART
     • Goals: Low Cost, Flexibility, Scalability
     [Figure labels: User-configurable; SMART NoC (Krishna et al, HPCA 2013; Chen et al, DATE 2013; Krishna et al, IEEE Micro Top Picks 2014); Automatic NoC Generation; High-level HW Language; Arbitrary Topology Support; Area/power-efficient RTL Building Blocks; Verified on FPGA]

  7. Outline
     • Motivation: Scalable, Flexible, and Low-cost NoCs
     • Background: SMART NoCs
     • OpenSMART
       - Design Flow
       - Building Blocks
       - Walk-through Examples
     • Case Studies
       - Mesh vs. SMART
       - High-radix vs. Low-radix
     • Conclusions

  9. SMART NoC
     • Single-cycle Multi-hop Asynchronous Repeated Traversal
     • SMART: achieve the performance of dedicated connections over a network of shared links
     [Figure labels: S (source), D (destination), SSR (SMART Setup Request), HPCmax, 1-cycle (no other traffic)]
     (Krishna et al, HPCA 2013; Chen et al, DATE 2013; Krishna et al, IEEE Micro Top Picks 2014)

  10. Is 1-cycle Network Possible? Yes
      Is wire fast enough to support a 1-cycle network?
      • Wire traversal length within 1 ns (1 GHz): 10-16 mm
      • Wire delay over technology: constant
      • Chip dimension: remains similar (~20 mm)
      • Clock frequency: remains similar (1~3 GHz)
      • Tile dimension: decreases over technology
      On-chip wires are fast enough to transmit across the chip within 1-2 cycles at 1 GHz, even if technology scales.
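
The arithmetic behind this slide can be checked with a short sketch. The 10-16 mm reach per 1 ns cycle and the ~20 mm chip edge come from the slide; the function names and the 1 mm tile size are illustrative assumptions, and the hop estimate ignores any control-path limit on HPCmax.

```python
import math

def cycles_to_cross(distance_mm: float, reach_mm_per_cycle: float) -> int:
    """Clock cycles needed for a signal to cover distance_mm of wire."""
    return math.ceil(distance_mm / reach_mm_per_cycle)

def hops_per_cycle(reach_mm_per_cycle: float, tile_mm: float) -> int:
    """Whole tiles (hops) a signal can cover within one clock cycle."""
    return math.floor(reach_mm_per_cycle / tile_mm)

# Slide figures: 10-16 mm of wire per 1 ns cycle at 1 GHz, chip edge ~20 mm.
# The 1 mm tile edge is an assumed value, not taken from the slides.
for reach in (10.0, 16.0):
    print(reach, cycles_to_cross(20.0, reach), hops_per_cycle(reach, tile_mm=1.0))
```

Because the wire reach stays roughly constant while tiles shrink, the same calculation yields more hops per cycle at smaller tile sizes.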

  11. Features of SMART
      • Low latency network
        - Dynamic bypass of intermediate routers between any two routers
        - Limit: HPCmax (hops per cycle max), the maximum number of "hops" that the underlying wire allows the flit to traverse within a clock cycle
      • Separate control path
        - HPCmax bits from every router along each direction
        - Arbitration of multiple bypass requests on the same link
        - No ACK required
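
As a rough illustration of the HPCmax limit, the sketch below splits a straight path into multi-hop traversals of at most HPCmax hops each. It assumes no competing traffic and a single dimension; the function name is hypothetical and not part of OpenSMART.

```python
from typing import List

def smart_segments(hops: int, hpc_max: int) -> List[int]:
    """Split a straight path of `hops` hops into traversals of at most hpc_max hops.

    Each list entry is the hop count covered by one multi-hop traversal,
    assuming every intermediate router grants the bypass.
    """
    if hops < 0 or hpc_max < 1:
        raise ValueError("hops must be >= 0 and hpc_max must be >= 1")
    segments = []
    remaining = hops
    while remaining > 0:
        step = min(remaining, hpc_max)
        segments.append(step)
        remaining -= step
    return segments

print(smart_segments(7, 3))
```

With HPCmax = 3, a 7-hop path needs three traversals, so the flit is buffered at two intermediate routers instead of six.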

  13. OpenSMART Design Flow
      [Figure labels:
       User specification: Configuration (Topology, Bandwidth, VC, Routing)
       OpenSMART tool chains: OpenSMART Front-end; Building Block Library (RTL): Input Unit, Output Unit, SMART Unit, Switch Unit, …; BSV/Chisel Compiler; HPCmax Analyzer
       External: ASIC/FPGA Synthesis Tool
       Output: Verilog Files]

  14. OpenSMART NoC
      [Figure: IPs attach through NICs to the routers of the NoC generated by OpenSMART; NIC interface options: AMBA, Wishbone, Custom]

  16. OpenSMART Building Blocks
      • Input Unit (Buffer, Arbiter): input buffer + input VC arbitration
      • Output Unit (VC Selector, Arbiter): output VC selection + output port arbitration + credit management
      • Switch Unit (Crossbar, Routing Calculator): switching (via crossbar) + routing calculation
      • SMART Unit (SSR Controller, Bypass Flag): SSR communication & arbitration + bypass flag

  17. OpenSMART Router: Input Unit
      [Figure: OpenSMART Router (Baseline) with Input Units, Switch Unit, and Output Units between incoming and outgoing flits; the Input Unit holds the Input Buffers and an Arbiter; each flit has a Flit Header and Flit Data]
      • Parameters: Flit Size, Number of VCs, VC Depth

  18. OpenSMART Router: Output Unit
      [Figure: OpenSMART Router (Baseline) with Input Units, Switch Unit, and Output Units between incoming and outgoing flits; the Output Unit holds an Arbiter (Output Port Request in, Output Port Grant out), a VC Selector (VC queue, nextVC), and a Credit Manager (Credit in, hasCredit out)]
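
The VC Selector labels above (VC queue, nextVC) suggest a free-VC queue with a precomputed next entry. The following is a minimal behavioral sketch under that assumption; the class and method names are hypothetical and do not come from the OpenSMART sources.

```python
from collections import deque
from typing import Optional

class VCSelector:
    """Behavioral model: free VC ids wait in a FIFO; the next one sits in a register."""

    def __init__(self, num_vcs: int):
        self.free_vcs = deque(range(num_vcs))  # VC queue of free VC ids
        self.next_vc = self._refill()          # models the separate nextVC register

    def _refill(self) -> Optional[int]:
        # Take the oldest free VC, or None when the queue is empty.
        return self.free_vcs.popleft() if self.free_vcs else None

    def allocate(self) -> Optional[int]:
        """Hand out the VC held in next_vc and preload the following one."""
        vc = self.next_vc
        self.next_vc = self._refill()
        return vc

    def release(self, vc: int) -> None:
        """Return a VC once the downstream buffer has drained it."""
        if self.next_vc is None:
            self.next_vc = vc
        else:
            self.free_vcs.append(vc)

sel = VCSelector(num_vcs=2)
print(sel.allocate(), sel.allocate(), sel.allocate())
```

Keeping the next VC in its own register means allocation only reads a register, while the queue lookup happens off the critical path.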

  19. OpenSMART Router: Switch Unit
      [Figure: OpenSMART Router (Baseline) with Input Units, Switch Unit, and Output Units between incoming and outgoing flits; the Switch Unit holds a Crossbar, which carries flits from the Input Units to the Output Units, and a Routing Unit that applies the Routing Algorithm]

  20. OpenSMART Router (SMART): SMART Unit
      [Figure: OpenSMART Router (SMART) with Input Units, Switch Unit, Output Units, and a SMART Unit; the SMART Unit holds an SSR Controller and a SMART Arbiter (parameter: HPCmax), receives incoming SSRs and the SSR from the local router, sends outgoing SSRs, and drives the Bypass Flag for bypass MUX selection]
      • SSR prioritization by distance: an SSR from a nearer router gets higher priority
      • The local router (distance = 0) has the highest priority
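
The distance-based prioritization can be sketched as a fixed-priority encoder over SSR valid bits indexed by distance. This is a behavioral illustration only; the function name and the list-of-booleans input format are assumptions.

```python
from typing import List, Optional

def smart_arbiter(ssr_valid: List[bool]) -> Optional[int]:
    """Return the distance of the winning SSR, or None if there is no request.

    ssr_valid[d] is True when an SSR from the router d hops away is present;
    index 0 is the local router, which therefore always wins when it requests.
    """
    for distance, valid in enumerate(ssr_valid):
        if valid:
            return distance
    return None

# HPCmax = 3: slots for distance 0 (local) through distance 3.
print(smart_arbiter([False, True, False, True]))
```

Because every router applies the same fixed rule to the same broadcast SSRs, the routers along a path can reach consistent decisions without an acknowledgement.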

  21. OpenSMART Router (1-cycle)
      [Figure: OpenSMART Router (Baseline) with Input Units, Switch Unit, and Output Units between incoming and outgoing flits; pipeline labels: Cycle 0, Cycle 1]

  22. OpenSMART Router (2-cycle/SMART)
      [Figure: OpenSMART Router (SMART) with Input Units, SMART Unit, Switch Unit, and Output Units; incoming/outgoing SSRs and flits; pipeline labels: Cycle 0, Cycle 1, Cycle 3]

  24. Walk-through Example 1
      • Router r4 sends a flit to router r7
      • HPCmax = 3
      • Cycle 0: SSR Send (SSR 110 shown at r5, r6, and r7)
      • Cycle 1: Multi-hop Bypass (bypass, bypass, stop)
      [Figure: routers r4, r5, r6, r7 in a row]
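
The outcome of this example (bypass, bypass, stop) can be reproduced with a small sketch. It assumes a single row of routers with increasing indices and no competing traffic, and it does not model the SSR bit encoding; `bypass_plan` is a hypothetical helper name.

```python
from typing import Dict

def bypass_plan(src: int, dst: int, hpc_max: int) -> Dict[int, str]:
    """Action taken by each router after src for one multi-hop traversal.

    Routers strictly between src and the stopping point bypass the flit;
    the stopping point is dst, or the router hpc_max hops away if dst is farther.
    """
    if dst <= src:
        raise ValueError("this sketch only handles dst > src")
    last = min(dst, src + hpc_max)
    plan = {router: "bypass" for router in range(src + 1, last)}
    plan[last] = "stop"
    return plan

# r4 -> r7 with HPCmax = 3, as in the walk-through.
print(bypass_plan(4, 7, 3))
```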

  25. Walk-through Example 2
      • Router r4 sends a flit to router r7
      • Router r5 sends a flit to router r7
      • HPCmax = 3
      • Cycle 0: SSR Send (SSR 110 from r4, SSR 100 from r5)
      • Cycle 1: Multi-hop Bypass
      [Figure: SMART Unit in r5; the SMART Arbiter takes incoming SSRs at priority slots Dist = 3, Dist = 2, Dist = 1 and the SSR from the local router at Dist = 0, and outputs the Bypass Flag; winner: Dist = 0, from r5]
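
The contention in this example can be traced with a simplified model. The sketch assumes a single row of routers, that each router grants the nearest requester (the local router first), and that a flit losing arbitration at a router is buffered there; the function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

def resolve(requests: List[Tuple[int, int]], hpc_max: int) -> Dict[int, int]:
    """Map each source router to the router where its flit ends this traversal.

    requests holds (src, dst) pairs with dst > src on one row of routers.
    """
    result = {}
    for src, dst in requests:
        last = min(dst, src + hpc_max)
        stop = last
        for router in range(src + 1, last + 1):
            # Sources whose SSR covers this router, including the local one.
            contenders = [s for s, d in requests
                          if s <= router <= min(d, s + hpc_max)]
            # Nearest requester wins: the largest source index not past router.
            if max(contenders) != src:
                stop = router  # lost arbitration: the flit is buffered here
                break
        result[src] = stop
    return result

# r4 -> r7 and r5 -> r7 with HPCmax = 3, as in the walk-through.
print(resolve([(4, 7), (5, 7)], hpc_max=3))
```

In this model the flit from r5 reaches r7 in one traversal, while the flit from r4 stops at r5 because the local SSR at r5 (distance 0) outranks it.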

  26. OpenSMART: Features
      • Language
        - BSV and Chisel
      • Flow control
        - VC and SMART
      • Buffer Management
        - Credit-based buffer management
      • Router Microarchitecture
        - 1- and 2-cycle state-of-the-art packet switching router
        - SMART router
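
Credit-based buffer management can be illustrated with a per-VC credit counter kept by the upstream router. This is a generic sketch of the technique, not OpenSMART code; the class name and methods are assumptions.

```python
class CreditCounter:
    """Tracks free slots in one downstream VC buffer from the sender's side."""

    def __init__(self, vc_depth: int):
        self.max_credits = vc_depth
        self.credits = vc_depth  # downstream buffer starts empty

    def has_credit(self) -> bool:
        return self.credits > 0

    def send_flit(self) -> None:
        """Consume one credit when a flit is sent downstream."""
        if not self.has_credit():
            raise RuntimeError("no credit: downstream buffer is full")
        self.credits -= 1

    def receive_credit(self) -> None:
        """Restore one credit when downstream frees a buffer slot."""
        if self.credits >= self.max_credits:
            raise RuntimeError("credit overflow: more credits than buffer slots")
        self.credits += 1

counter = CreditCounter(vc_depth=2)
counter.send_flit()
counter.send_flit()
print(counter.has_credit())
```

The sender never transmits without a credit, so the downstream buffer cannot overflow and no flit has to be dropped or retransmitted.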

  27. OpenSMART: Features
      • Routing Calculation
        - XY, YX, and source-routing
        - One-hot encoding hop count + shift-based routing calculation
        - For SMART, routing calculation is done during bypasses
      • VC Selection
        - FIFO-based dynamic VC selection
        - Next VC is stored in a separate register
        - For SMART, VC selection is done during bypasses
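
For reference, XY and YX dimension-order routing on a mesh can be sketched as below. This is the textbook coordinate-comparison form, not the one-hot, shift-based calculation named on the slide; the port names and the convention that y grows toward north are assumptions.

```python
from typing import Tuple

Coord = Tuple[int, int]

def next_port(cur: Coord, dst: Coord, order: str = "XY") -> str:
    """Output port for the next hop: E, W, N, S, or L (local ejection)."""
    if order not in ("XY", "YX"):
        raise ValueError("order must be 'XY' or 'YX'")
    (cx, cy), (dx, dy) = cur, dst
    x_port = "E" if dx > cx else "W" if dx < cx else None
    y_port = "N" if dy > cy else "S" if dy < cy else None
    # XY finishes the X dimension before moving in Y; YX does the reverse.
    first, second = (x_port, y_port) if order == "XY" else (y_port, x_port)
    return first or second or "L"

print(next_port((0, 0), (2, 1), "XY"), next_port((0, 0), (2, 1), "YX"))
```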

  29. Latency
      [Figure: (a) Uniform Random, (b) Bit-complement; annotations: 5X, 4X]

  30. Energy Consumption
      • Repeaters require less energy than clocked latches

  31. HPCmax
      [Figure: (a) HPCmax on ASIC, (b) HPCmax on FPGA]

  33. Router Area
      [Figure: (a) ASIC area, (b) FPGA LUTs, (c) FPGA FFs; x-axis: Number of Ports]

  34. Router Power
      [Figure: (a) ASIC, (b) FPGA; x-axis: Number of Ports]

  35. Maximum Clock Frequency
      [Figure]

  37. Conclusion
      • NoCs are crucial components to support many-IP heterogeneous systems
        - Providing connectivity while satisfying their diverse requirements
      • OpenSMART provides automatic generation of NoCs for many-IP heterogeneous systems
        - Supports the recent low-latency SMART NoC as well as highly optimized 1-cycle routers
        - Written in high-level HDLs
