Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks

  1. Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks
     John Kim, Hanjoon Kim
     Department of Computer Science, KAIST
     Ring Router Microarchitecture, NoCArc’09

  2. Topology
     • Topology efficiently exploits the available packaging technology to meet the requirements at a minimum cost
     [Figure labels: saturation throughput, zero-load latency]

  3. On-chip networks are different
     [Figure: On-Chip Networks vs. Off-Chip Networks; sources: Scott et al. ISCA’06, Intel Developers Forum]

  4. Topologies for On-Chip Networks
     • Crossbar is often sufficient
       – if it can be done efficiently
     • 2D mesh topology commonly assumed
     • Many different topologies recently proposed
       – CMESH [ICS’06]
       – Flattened butterfly [Micro’07]
       – Express Cubes [HPCA’09]
       – Hierarchical Network [HPCA’09]
       – …
     • Recent multicore architectures have used the ring topology
       – Cell processor, Intel processors, …

  5. Why Ring Topology?
     • Routing
       – route clockwise or counterclockwise
       – route until the destination is reached
     • Low-radix router
       – each “router” only requires 3 ports (local port, left and right ports)
     • Flow control
       – Arbitration can be simplified
       – 3 ports, but at most two requests
     • Can be implemented without “routers”
       – Bufferless router
       – Simple topology
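
The routing rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a bidirectional ring with nodes numbered 0..n-1, a shortest-direction choice, and ties broken toward clockwise. The name `ring_route` is hypothetical.

```python
from typing import Tuple


def ring_route(src: int, dst: int, n: int) -> Tuple[str, int]:
    """Return (direction, hop_count) for the shorter way around an n-node ring.

    Assumption: node indices increase in the clockwise direction.
    """
    cw = (dst - src) % n    # hops when travelling clockwise
    ccw = (src - dst) % n   # hops when travelling counterclockwise
    if cw <= ccw:           # tie goes to clockwise (an arbitrary choice)
        return ("cw", cw)
    return ("ccw", ccw)
```

Once the direction is chosen at injection, every intermediate node only has to compare the destination with its own index, which is why no routing table is needed.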

  6. Today’s Talk
     • Background in On-Chip Networks and Topology
     • Router Microarchitecture for Ring Topology
     • Scalability of Ring Topology
     • Summary

  7. Bufferless router in ring topology
     • Simplified arbitration
       – Priority to packets already in flight
       – Guaranteed (deterministic) latency to destination
     • No buffers needed
       – No misrouting [Bufferless router ISCA’09]
       – No packet dropping [SCARAB Micro’09]
     • Only two-input muxes
     • No routing deadlock

  8. Conventional Router Microarchitecture

  9. Bufferless Ring Topology Router Microarchitecture

  10. No buffers needed

  11. Bufferless router in ring topology
      • Simplified arbitration
        – Priority to packets already in flight
        – Guaranteed (deterministic) latency to destination
      • No buffers needed
        – No misrouting [Bufferless router ISCA’09]
        – No packet dropping [SCARAB Micro’09]
      • Only two-input muxes
      • No routing deadlock
      • However…
        – Requires reserving the path to destination
        – Can reduce performance/throughput
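
The prioritized arbitration above can be illustrated with a small cycle-level sketch. It is a simplification under stated assumptions, not the router from the talk: the ring is unidirectional, each link holds at most one flit, a flit is a hypothetical `(src, dst)` tuple, and path reservation is reduced to "inject only when the passing slot is empty". The function name `step` is likewise an assumption.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Flit = Tuple[int, int]  # (source node, destination node); illustrative only


def step(slots: List[Optional[Flit]],
         queues: List[Deque[Flit]]) -> Tuple[List[Optional[Flit]], List[Flit]]:
    """Advance a unidirectional bufferless ring by one cycle.

    slots[i] is the flit arriving at node i this cycle (None if the slot is empty).
    queues[i] holds flits waiting at node i's local injection port.
    Returns the slot contents for the next cycle and the flits ejected this cycle.
    """
    n = len(slots)
    next_slots: List[Optional[Flit]] = [None] * n
    ejected: List[Flit] = []
    for i in range(n):
        flit = slots[i]
        if flit is not None and flit[1] == i:
            ejected.append(flit)          # reached its destination: eject
            flit = None                   # the slot is now free
        if flit is None and queues[i]:
            flit = queues[i].popleft()    # inject only into an empty slot
        # An in-flight flit always wins over local injection, so the
        # "arbiter" is just a two-input mux selecting ring vs. local input.
        next_slots[(i + 1) % n] = flit
    return next_slots, ejected
```

Because an in-flight flit is never stalled or deflected in this model, its latency after injection is simply the hop count, which matches the "guaranteed (deterministic) latency" bullet; the cost is that a node may wait arbitrarily long for an empty slot.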

  12. Lightweight Router Microarchitecture
      • Add buffering (2 buffer entries per input port)
      • Credit-based flow control for backpressure
      • Maintain the same prioritized arbitration for packets in flight
      • Arbitration needed when ejecting packets
      [Figure: lightweight vs. bufferless router]
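
Credit-based backpressure with a two-entry input buffer can be sketched as follows. This is a minimal model under assumptions: the class name `CreditLink` and its methods are hypothetical, and credit return is treated as instantaneous, whereas real hardware has a credit round-trip delay.

```python
from collections import deque


class CreditLink:
    """Upstream credit counter paired with a downstream input buffer."""

    def __init__(self, depth: int = 2):
        self.credits = depth      # free downstream buffer entries
        self.buffer = deque()     # downstream input buffer

    def can_send(self) -> bool:
        return self.credits > 0

    def send(self, flit) -> None:
        """Upstream side: consume one credit and deliver the flit."""
        if not self.can_send():
            raise RuntimeError("no credit: upstream must stall")
        self.credits -= 1
        self.buffer.append(flit)

    def drain(self):
        """Downstream side: forward the oldest flit and return its credit."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit
```

With `depth=2`, the sender can push two flits back to back and then stalls until the receiver drains one, which is the backpressure behavior the slide describes.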

  13. Lightweight Router Microarchitecture
      • No predetermined routing
        – Bufferless: a packet is injected into the network only in the appropriate slot
        – Lightweight: a packet can be injected at any time
      • Deadlock
        – Packets in the bufferless router were guaranteed to make progress
        – Routing deadlock is still avoided without additional virtual channels (see paper for details)

  14. Evaluation
      • Cycle-accurate simulator used to compare ring router microarchitectures
      • Simulator parameters include
        – N = 16
        – single-flit packets (1 flit = 512 bits)
        – synthetic traffic patterns
      • Orion 2.0 used to model area / power (results in paper)
      • The following microarchitectures are compared:
        – baseline (3 cycle)
        – bufferless (1 cycle)
        – lightweight (1 cycle)

  15. Performance Comparison
      [Figure: latency (cycles) vs. offered load (fraction of capacity) for bufferless, lightweight, baseline (b=2), and baseline (b=8); panels: uniform random, bit complement]

  16. Impact of Prioritized Arbitration
      [Figure: latency (cycles) vs. offered load (fraction of capacity) for baseline (b=1), baseline (b=2), and lightweight]

  17. Today’s Talk
      • Background in On-Chip Networks and Topology
      • Router Microarchitecture for Ring Topology
      • Scalability of Ring Topology
      • Summary

  18. How Scalable is the Ring Topology?
      • Assumption: same bisection bandwidth when comparing ring and 2D mesh
        → The bandwidth per channel is higher for the ring than for the 2D mesh
        → Trade-off of hop count vs. serialization latency
        → Per-hop latency can be higher with the 2D mesh
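
The trade-off on this slide can be made concrete with a short calculation. The sketch below rests on assumptions not stated in the slides: a bidirectional ring, a k x k mesh with k even, uniform traffic between distinct nodes, minimal routing, and a simple zero-load model (hops times per-hop delay plus serialization). All function names are hypothetical.

```python
import math
from itertools import product


def ring_avg_hops(n: int) -> float:
    """Average shortest-path hops between distinct nodes of a bidirectional ring."""
    return sum(min(d, n - d) for d in range(1, n)) / (n - 1)


def mesh_avg_hops(k: int) -> float:
    """Average Manhattan distance between distinct nodes of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for (ax, ay), (bx, by) in product(nodes, repeat=2))
    return total / (len(nodes) * (len(nodes) - 1))


def ring_to_mesh_width_ratio(k: int) -> float:
    """Channel-width ratio at equal bisection bandwidth.

    A bisection cut severs 2 bidirectional links in a ring and k in a
    k x k mesh, so each ring channel can be k / 2 times wider.
    """
    return k / 2


def zero_load_latency(hops: float, per_hop_cycles: int,
                      packet_bits: int, channel_bits: int) -> float:
    """Head latency (hops * per-hop delay) plus serialization latency in cycles."""
    return hops * per_hop_cycles + math.ceil(packet_bits / channel_bits)
```

For N = 16 (k = 4) this gives an average of 64/15 ≈ 4.27 hops on the ring against 8/3 ≈ 2.67 on the mesh, while each ring channel is 2x wider; at N = 64 (k = 8) the ring channel is 4x wider. The ring pays in hop count and gains in serialization latency and per-hop delay.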

  19. Synthetic Workload
      [Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64 and max outstanding requests r = 2, 4, 8, 16]

  20. Bandwidth Fragmentation
      • 2D mesh:
        – short packets (req) = 1 flit
        – long packets (reply) = 4 flits
      • ring:
        – short packets (req) = 1 flit
        – long packets (reply) = 1 flit
      → Wide channels result in high bandwidth for the ring
      → However, for short packets, the ring only utilizes ¼ of the channel bandwidth
      → Ring topology is inefficient for short packets
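
The ¼ figure follows from simple arithmetic, sketched here under assumed widths: a 512-bit ring channel (the flit size given on the Evaluation slide), a 128-bit mesh channel and 128-bit short packet (inferred from the 4:1 flit counts above, not stated in the slides). The function names are hypothetical.

```python
import math


def flits(packet_bits: int, channel_bits: int) -> int:
    """Number of flits needed to carry a packet over a channel of the given width."""
    return math.ceil(packet_bits / channel_bits)


def utilization(packet_bits: int, channel_bits: int) -> float:
    """Fraction of the occupied channel bandwidth that carries packet bits."""
    return packet_bits / (flits(packet_bits, channel_bits) * channel_bits)


RING_BITS = 512    # from "1 flit = 512 bits"
MESH_BITS = 128    # assumption: one quarter of the ring width
SHORT_BITS = 128   # assumption: a request fits in one mesh flit
LONG_BITS = 512    # assumption: a reply fills one ring flit

short_on_ring = utilization(SHORT_BITS, RING_BITS)   # 128 / 512
short_on_mesh = utilization(SHORT_BITS, MESH_BITS)   # 128 / 128
```

Under these widths a short packet fills its mesh flit completely but leaves three quarters of a ring flit empty, so a traffic mix with many requests wastes most of the ring's extra channel width.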

  21. Bandwidth Fragmentation
      [Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64 and max outstanding requests r = 2, 4, 8, 16; panels: single-flit packets, bimodal packets]

  22. Limitations of this study
      • “Packaging” of on-chip network topology = 2D layout of the topology
      • Layout of the topology can impact performance
        – 2D mesh: only requires communicating with neighbors
        – Ring: long links can be needed as the network scales
      • Hierarchical rings not investigated.
      • Router complexity (for mesh) not properly modeled.

  23. Summary
      • On-chip networks present different constraints compared to off-chip networks
        – can exploit different router microarchitectures.
      • The ring is a simple topology, and a bufferless router microarchitecture can be implemented for it.
      • Lightweight router microarchitecture proposed to increase performance with minimal additional complexity.
      • Ring topology can scale, but bandwidth fragmentation can limit its scalability, especially under high traffic.
      • Can we scale this router microarchitecture to the 2D mesh topology?

  24. Low-Cost Router Microarchitecture (Micro’09)

  25. Thank you. Questions?
