NoCArc’09 Ring Router Microarchitecture
Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks
John Kim, Hanjoon Kim
Department of Computer Science, KAIST
Topology
- Topology efficiently exploits the available packaging technology to meet the requirements at a minimum cost
Figure: zero-load latency and saturation throughput
On-chip networks are different
Figure: Off-Chip Networks [Scott et al. ISCA06] vs. On-Chip Networks [src: Intel Developers Forum]
Topologies for On-Chip Networks
- Crossbar is often sufficient – if it can be done efficiently
- 2D mesh topology commonly assumed
- Many different topologies recently proposed
– CMESH [ICS’06]
– Flattened butterfly [Micro’07]
– Express Cubes [HPCA’09]
– Hierarchical Network [HPCA’09]
– …
- Recent multicore architectures have used the ring topology
– Cell processor, Intel processors, …
Why Ring Topology?
- Routing
– route clockwise or counterclockwise
– route until the destination is reached
- Low-radix router
– each “router” only requires 3 ports (local port, left & right port)
- Flow control
– Arbitration can be simplified
– 3 ports, but at most two requests per output port
- Can be implemented without “routers”
– Bufferless router
– Simple topology
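The routing rule above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: it assumes nodes are numbered 0 to N-1 in clockwise order, that the shorter direction is chosen once at the source (ties broken clockwise), and the helper names `ring_route` and `next_node` are our own.

```python
def ring_route(src: int, dst: int, n: int) -> tuple[str, int]:
    """Pick the shorter direction around an n-node bidirectional ring.

    Returns (direction, hop_count). Nodes are assumed to be numbered
    0..n-1 clockwise; ties are broken clockwise (an arbitrary choice).
    """
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("node id out of range")
    cw = (dst - src) % n   # hops when travelling clockwise
    ccw = (src - dst) % n  # hops when travelling counterclockwise
    if cw == 0:
        return ("local", 0)  # already at the destination: use the local port
    return ("cw", cw) if cw <= ccw else ("ccw", ccw)


def next_node(node: int, direction: str, n: int) -> int:
    """Per-hop step: keep moving in the chosen direction."""
    return (node + 1) % n if direction == "cw" else (node - 1) % n


if __name__ == "__main__":
    n, src, dst = 16, 2, 13
    direction, hops = ring_route(src, dst, n)
    node, path = src, [src]
    while node != dst:  # route until the destination is reached
        node = next_node(node, direction, n)
        path.append(node)
    print(direction, hops, path)  # ccw 5 [2, 1, 0, 15, 14, 13]
```

Because the direction is fixed at injection, each router in this sketch only decides whether to eject a packet or pass it on, which is why the per-hop logic stays so small.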
Today’s Talk
- Background in On-Chip Networks and Topology
- Router Microarchitecture for Ring Topology
- Scalability of Ring Topology
- Summary
Bufferless router in ring topology
- Simplified arbitration
– Priority to packets already in flight
– Guaranteed (deterministic) latency to destination
- No buffers needed
– No misrouting (unlike the bufferless router [ISCA’09])
– No packet dropping (unlike SCARAB [Micro’09])
- Only two-input muxes
- No routing deadlock
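A minimal cycle-level sketch of the arbitration rule above, assuming a unidirectional (clockwise) ring, single-flit packets, one flit per link per cycle, and a one-cycle hop. `simulate_bufferless_ring` and `Flit` are hypothetical names, not the authors' simulator. In-flight flits always win the 2:1 mux, so a source injects only when an empty slot passes; once injected, a flit's latency equals its hop count.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Flit:
    src: int
    dst: int
    inject_cycle: int


def simulate_bufferless_ring(n, requests, cycles):
    """Cycle-level sketch of a unidirectional (clockwise) bufferless ring.

    requests: (cycle, src, dst) tuples with src != dst. A request waits in
    its source's injection queue (outside the ring) until an empty slot
    passes. Returns (flit, arrival_cycle) pairs in delivery order.
    """
    slots = [None] * n                  # slots[i]: flit arriving at router i
    queues = [deque() for _ in range(n)]
    pending = deque(sorted(requests))
    delivered = []
    for cycle in range(cycles):
        while pending and pending[0][0] <= cycle:
            _, src, dst = pending.popleft()
            queues[src].append(dst)
        next_slots = [None] * n
        for i in range(n):
            flit = slots[i]
            if flit is not None and flit.dst == i:
                delivered.append((flit, cycle))  # eject at the destination
                flit = None
            if flit is None and queues[i]:
                # Slot is empty: the local packet may enter the ring.
                flit = Flit(i, queues[i].popleft(), cycle)
            # 2:1 mux: in-flight flit if present, otherwise the injected one.
            next_slots[(i + 1) % n] = flit  # one hop per cycle
        slots = next_slots
    return delivered


if __name__ == "__main__":
    # Node 1's packet arrives while node 0's flit is passing through node 1,
    # so it waits one cycle; once injected, both take exactly 4 hops.
    for flit, arrived in simulate_bufferless_ring(8, [(0, 0, 4), (1, 1, 5)], 10):
        print(flit.src, flit.dst, "injected", flit.inject_cycle,
              "latency", arrived - flit.inject_cycle)
    # prints:
    # 0 4 injected 0 latency 4
    # 1 5 injected 2 latency 4
```

Flits on the ring never wait, so the sketch needs no buffers and cannot deadlock; the only variable delay is the wait at the source for a free slot.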
Conventional Router Microarchitecture
Bufferless Ring Topology Router Microarchitecture
No buffers needed
Bufferless router in ring topology
- However, the bufferless router:
– Requires reserving the path to the destination
– Can reduce performance/throughput
Lightweight Router Microarchitecture
- Add input buffering (2 buffer entries per input port)
- Credit-based flow control for backpressure
- Maintain same prioritized arbitration for packets in flight
- Arbitration needed when ejecting packets
Figure: bufferless vs. lightweight router microarchitecture
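A minimal sketch of credit-based backpressure on a single ring link, assuming the 2-entry input buffer from the slide, one flit per cycle, and a credit that returns to the sender in the same cycle the downstream entry is freed (real routers add a credit round-trip delay). `credit_link` and its `drain_every` parameter are hypothetical: `drain_every` stands in for a downstream router that can only forward or eject a flit on some cycles, for example because it lost ejection arbitration.

```python
from collections import deque

BUFFER_DEPTH = 2  # input-buffer entries per port, as on the slide


def credit_link(num_flits: int, drain_every: int, cycles: int):
    """Sketch of credit-based flow control on one ring link.

    The upstream router holds one credit per free entry in the downstream
    input buffer and may send a flit only while it holds a credit.
    The downstream router (hypothetically) drains one flit every
    `drain_every` cycles. Returns (per-cycle buffer occupancy, stall cycles).
    """
    credits = BUFFER_DEPTH
    buffer = deque()
    to_send = deque(range(num_flits))
    occupancy, stalls = [], 0
    for cycle in range(cycles):
        # Downstream: forward/eject one flit on some cycles; return its credit.
        if buffer and cycle % drain_every == 0:
            buffer.popleft()
            credits += 1  # simplification: zero-delay credit return
        # Upstream: send only with a credit; otherwise backpressure stalls it.
        if to_send:
            if credits > 0:
                buffer.append(to_send.popleft())
                credits -= 1
            else:
                stalls += 1
        assert len(buffer) <= BUFFER_DEPTH  # credits prevent overflow
        occupancy.append(len(buffer))
    return occupancy, stalls


if __name__ == "__main__":
    occupancy, stalls = credit_link(num_flits=5, drain_every=3, cycles=18)
    print(occupancy, "stalled cycles:", stalls)
```

With these inputs the downstream buffer fills to two entries and stays full while the sender stalls, then drains once the sender runs out of flits; the assertion confirms the buffer never overflows.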
Lightweight Router Microarchitecture
- No predetermined routing
– Bufferless: a packet can be injected into the network only in the appropriate slot
– Lightweight: a packet can be injected at any time
- Deadlock
– Packets in the bufferless router are guaranteed to make progress
– In the lightweight router, routing deadlock is still avoided without additional virtual channels (see paper for details)
Evaluation
- Cycle-accurate simulator used to compare ring router microarchitectures
- Simulator parameters include:
– N = 16
– single-flit packets (1 flit = 512 bits)
– synthetic traffic patterns
- Orion2.0 used to model area / power (results in paper)
- The following microarchitectures were compared:
– baseline (3 cycles)
– bufferless (1 cycle)
– lightweight (1 cycle)
Performance Comparison
Figure: latency (cycles) vs. offered load (fraction of capacity) for bufferless, lightweight, baseline (b=2), and baseline (b=8); panels: uniform random, bit complement
Impact of Prioritized Arbitration
Figure: latency (cycles) vs. offered load (fraction of capacity) for baseline (b=1), baseline (b=2), and lightweight
How Scalable is the Ring Topology?
- Assumption: same bisection bandwidth when comparing ring and 2D mesh
– The bandwidth PER channel is higher for the ring than for the 2D mesh
– Trade-off of hop count vs. serialization latency
– Per-hop latency can be higher with the 2D mesh
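The hop-count vs. serialization trade-off can be made concrete with the textbook zero-load latency model T0 = H · t_hop + ceil(L / b). The sketch below is illustrative only: the channel widths, packet size, and per-hop latencies are assumptions rather than the paper's parameters, the helper names are hypothetical, and the model ignores contention, which the runtime results include. Under equal bisection bandwidth, a ring bisection cuts 2 bidirectional links while a k × k mesh bisection cuts k, so each ring channel is k/2 times wider.

```python
import math


def ring_avg_hops(n: int) -> float:
    """Average minimal hop count on a bidirectional ring (uniform random, src != dst)."""
    total = sum(min((d - s) % n, (s - d) % n)
                for s in range(n) for d in range(n) if s != d)
    return total / (n * (n - 1))


def mesh_avg_hops(k: int) -> float:
    """Average Manhattan hop count on a k x k mesh (uniform random, src != dst)."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    total = sum(abs(x1 - x2) + abs(y1 - y2)
                for (x1, y1) in nodes for (x2, y2) in nodes)
    n = k * k
    return total / (n * (n - 1))  # self-pairs add 0 hops to the total


def ring_channel_width(mesh_width: int, k: int) -> int:
    """Equal bisection bandwidth: ring bisection = 2 links, mesh bisection = k links.

    Assumes k is even (true for N = 16, 36, 64).
    """
    return mesh_width * k // 2


def zero_load_latency(avg_hops: float, per_hop: float, packet_bits: int, width: int) -> float:
    """T0 = H * t_hop + serialization, with serialization = ceil(L / b) cycles."""
    return avg_hops * per_hop + math.ceil(packet_bits / width)


if __name__ == "__main__":
    MESH_WIDTH = 128       # bits per mesh channel (hypothetical)
    PACKET_BITS = 512      # packet size in bits (hypothetical)
    T_RING, T_MESH = 1, 2  # cycles per hop (illustrative; mesh routers have more ports)
    for n in (16, 36, 64):
        k = math.isqrt(n)
        b_ring = ring_channel_width(MESH_WIDTH, k)
        t_ring = zero_load_latency(ring_avg_hops(n), T_RING, PACKET_BITS, b_ring)
        t_mesh = zero_load_latency(mesh_avg_hops(k), T_MESH, PACKET_BITS, MESH_WIDTH)
        print(f"N={n}: ring {t_ring:.1f} cycles (b={b_ring}), "
              f"mesh {t_mesh:.1f} cycles (b={MESH_WIDTH})")
```

The ring's average hop count grows roughly as N/4 while the mesh's grows roughly as 2√N/3, so under these assumptions the ring's wider channels offset its longer paths only up to some network size.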
Synthetic Workload
Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64 and maximum outstanding requests r = 2, 4, 8, 16
Bandwidth Fragmentation
- 2D mesh:
– short packets (requests) = 1 flit
– long packets (replies) = 4 flits
- Ring:
– short packets (requests) = 1 flit
– long packets (replies) = 1 flit
- Wide channels result in high bandwidth for the ring
- However, for short packets, the ring utilizes only ¼ of the channel bandwidth
- The ring topology is inefficient for short packets
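The ¼ figure follows from simple arithmetic. The sizes below are assumptions chosen to match the flit counts on the slide: a 128-bit mesh channel, a 512-bit ring channel (the flit size used in the evaluation setup), a 128-bit request, and a 512-bit reply. The constants and helper names are hypothetical.

```python
import math

MESH_FLIT_BITS = 128  # hypothetical mesh channel width
RING_FLIT_BITS = 512  # ring channel width, 4x the mesh width in this sketch

REQUEST_BITS = 128    # hypothetical short (request) packet
REPLY_BITS = 512      # hypothetical long (reply) packet


def flits_needed(packet_bits: int, flit_bits: int) -> int:
    """Number of flits a packet occupies (the last flit may be partly empty)."""
    return math.ceil(packet_bits / flit_bits)


def channel_utilization(packet_bits: int, flit_bits: int) -> float:
    """Fraction of the transmitted channel bits that carry payload."""
    return packet_bits / (flits_needed(packet_bits, flit_bits) * flit_bits)


if __name__ == "__main__":
    for name, flit in (("2D mesh", MESH_FLIT_BITS), ("ring", RING_FLIT_BITS)):
        for kind, bits in (("request", REQUEST_BITS), ("reply", REPLY_BITS)):
            print(f"{name:8s} {kind:8s} flits={flits_needed(bits, flit)} "
                  f"utilization={channel_utilization(bits, flit):.2f}")
```

Under these assumptions every mesh packet fills its flits completely, while a ring request uses one 512-bit flit to carry 128 bits, which is the fragmentation the slide describes.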
Bandwidth Fragmentation
Figure: normalized runtime of ring vs. mesh for N = 16, 36, 64 and r = 2, 4, 8, 16; panels: bimodal packets, single-flit packets
Limitations of this study
- “Packaging” of on-chip network topology = 2D layout of the
topology
- Layout of topology can impact the performance
– 2D mesh: only requires communication with neighbors
– Ring: long links can be needed as the network scales
- Hierarchical rings not investigated.
- Router complexity (for mesh) not properly modeled.
Summary
- On-chip networks present different constraints compared to off-chip networks, so different router microarchitectures can be exploited.
- The ring is a simple topology for which a bufferless router microarchitecture can be implemented.
- Lightweight router microarchitecture proposed to increase
performance with minimal additional complexity.
- The ring topology can scale, but bandwidth fragmentation can limit its scalability, especially under high traffic.
- Can we scale this router microarchitecture to a 2D mesh topology?
Low-Cost Router Microarchitecture (Micro’09)