SLIDE 1

1 NoCArc’09 Ring Router Microarchitecture

Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks

John Kim, Hanjoon Kim
Department of Computer Science, KAIST

SLIDE 2

Topology

  • A good topology efficiently exploits the available packaging technology to meet the requirements at minimum cost

[Figure: latency vs. offered load, marking zero-load latency and saturation throughput]
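For reference, a common first-order model of zero-load latency is given below. This is a standard textbook form, not taken from these slides, and the symbols are introduced here for illustration: H is the average hop count, t_r the per-hop router delay, D the total wire length traversed, v the signal propagation velocity, L the packet length, and b the channel bandwidth.

```latex
T_0 = H \, t_r + \frac{D}{v} + \frac{L}{b}
```

Saturation throughput is the offered load at which average latency grows without bound.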

SLIDE 3

On-chip networks are different

[Figure: Off-Chip Networks vs. On-Chip Networks; sources: Scott et al. ISCA’06, Intel Developers Forum]

SLIDE 4

Topologies for On-Chip Networks

  • A crossbar is often sufficient, if it can be implemented efficiently
  • The 2D mesh topology is commonly assumed
  • Many different topologies have recently been proposed:

– CMESH [ICS’06]
– Flattened butterfly [Micro’07]
– Express Cubes [HPCA’09]
– Hierarchical Network [HPCA’09]
– …

  • Recent multicore architectures have used the ring topology

– Cell processor, Intel processors, …

SLIDE 5

Why Ring Topology?

  • Routing

– Route clockwise or counterclockwise
– Keep routing in that direction until the destination is reached

  • Low-radix router

– Each “router” requires only 3 ports (local, left, and right)

  • Flow control

– Arbitration can be simplified
– 3 ports, but at most two competing requests

  • Can be implemented without “routers”

– Bufferless router
– Simple topology
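The routing rule above can be sketched in a few lines. This is a minimal illustration assuming a bidirectional ring whose nodes are numbered 0 to N-1 in clockwise order and a shortest-direction policy with ties broken toward clockwise; the names `ring_route` and `next_node` are hypothetical, not from the talk.

```python
def ring_route(src: int, dst: int, n: int) -> tuple[str, int]:
    """Pick a direction on an n-node bidirectional ring; return (direction, hops).

    Assumption: nodes are numbered 0..n-1 clockwise and packets take the
    shorter direction, breaking ties toward clockwise.
    """
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("node id out of range")
    cw = (dst - src) % n   # hops if travelling clockwise
    ccw = (src - dst) % n  # hops if travelling counterclockwise
    if cw == 0:
        return ("local", 0)  # destination is the source itself
    if cw <= ccw:
        return ("cw", cw)
    return ("ccw", ccw)


def next_node(cur: int, direction: str, n: int) -> int:
    """Forward one hop in the chosen direction (no per-hop route computation)."""
    return (cur + 1) % n if direction == "cw" else (cur - 1) % n
```

Once the direction is chosen at injection, each intermediate node only checks whether it is the destination and otherwise forwards the packet to `next_node`, which is why the per-router logic stays so small.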

SLIDE 6

Today’s Talk

  • Background in On-Chip Networks and Topology
  • Router Microarchitecture for Ring Topology
  • Scalability of Ring Topology
  • Summary

SLIDE 7

Bufferless router in ring topology

  • Simplified arbitration

– Priority to packets already in flight
– Guaranteed (deterministic) latency to destination

  • No buffers needed

– No misrouting [Bufferless router ISCA’09]
– No packet dropping [SCARAB Micro’09]

  • Only two-input muxes
  • No routing deadlock
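The arbitration rule above can be illustrated with a small cycle-level sketch. It assumes a unidirectional (clockwise) ring, single-flit packets, one cycle per hop, and one flit per link per cycle; `simulate_bufferless_ring` and `Flit` are hypothetical names, not the authors' simulator. In-flight flits always keep the outgoing link, so a new flit waits only at its source, never inside the network.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Flit:
    src: int
    dst: int
    injected_at: int = -1


def simulate_bufferless_ring(n: int, requests: list[tuple[int, int]], cycles: int):
    """Cycle-level sketch of a unidirectional bufferless ring.

    requests: (src, dst) pairs queued at their source nodes before cycle 1.
    Returns (flit, network_latency) for every ejected flit.
    """
    queues = [deque() for _ in range(n)]  # per-node injection queues
    for src, dst in requests:
        queues[src].append(Flit(src, dst))
    links = [None] * n  # links[i] holds the flit on link i -> (i + 1) % n
    delivered = []
    for cycle in range(1, cycles + 1):
        nxt = [None] * n
        # 1) In-flight flits advance one hop; there is no buffer to stop in.
        for i, flit in enumerate(links):
            if flit is None:
                continue
            node = (i + 1) % n
            if flit.dst == node:
                delivered.append((flit, cycle - flit.injected_at))  # eject
            else:
                nxt[node] = flit  # in-flight flit keeps the outgoing link
        # 2) A node injects only if its outgoing slot is still free.
        for node in range(n):
            if queues[node] and nxt[node] is None:
                flit = queues[node].popleft()
                flit.injected_at = cycle
                nxt[node] = flit
        links = nxt
    return delivered
```

In this sketch every delivered flit's network latency equals its hop count; only the time spent waiting in the source queue varies with load.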

SLIDE 8

Conventional Router Microarchitecture

SLIDE 9

Bufferless Ring Topology Router Microarchitecture

SLIDE 10

No buffers needed

SLIDE 11

Bufferless router in ring topology

  • Simplified arbitration

– Priority to packets already in flight
– Guaranteed (deterministic) latency to destination

  • No buffers needed

– No misrouting [Bufferless router ISCA’09]
– No packet dropping [SCARAB Micro’09]

  • Only two-input muxes
  • No routing deadlock
  • However…

– Requires reserving the path to the destination
– Can reduce performance/throughput

SLIDE 12

Lightweight Router Microarchitecture

  • Add a buffer entry (2 buffer entries per input port)
  • Credit-based flow control for backpressure
  • Maintain the same prioritized arbitration for packets in flight
  • Arbitration is needed when ejecting packets

[Figure: bufferless vs. lightweight router microarchitecture]
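A minimal sketch of credit-based backpressure combined with prioritized arbitration, as listed above. It assumes zero credit-return delay and a 2-entry downstream input buffer; `CreditLink` and `arbitrate` are hypothetical names, and ejection arbitration is not modeled.

```python
from collections import deque
from typing import Optional


class CreditLink:
    """One upstream output port feeding a downstream input buffer.

    Sketch assumptions: credits return instantly (zero round-trip delay)
    and the downstream buffer holds `depth` flits (2 by default).
    """

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.credits = depth   # upstream's count of free downstream slots
        self.buffer = deque()  # downstream input buffer

    def can_send(self) -> bool:
        return self.credits > 0

    def send(self, flit) -> None:
        if not self.can_send():
            raise RuntimeError("no credit: upstream must stall (backpressure)")
        self.credits -= 1
        self.buffer.append(flit)

    def drain(self):
        """Downstream forwards one flit and returns a credit upstream."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


def arbitrate(in_flight_waiting: bool, local_waiting: bool,
              link: CreditLink) -> Optional[str]:
    """Grant the output: in-flight flits beat local injection, given a credit."""
    if not link.can_send():
        return None  # backpressure: nobody may send this cycle
    if in_flight_waiting:
        return "in_flight"
    if local_waiting:
        return "local"
    return None
```

The credit counter replaces the bufferless design's path reservation: a flit may leave whenever the next hop has a free entry, while in-flight traffic still wins over new injections.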

SLIDE 13

Lightweight Router Microarchitecture

  • No predetermined routing

– Bufferless: a packet can be injected into the network only in the appropriate slot
– Lightweight: a packet can be injected at any time

  • Deadlock

– Packets in the bufferless router are guaranteed to make progress
– Routing deadlock is still avoided without additional virtual channels (see the paper for details)

SLIDE 14

Evaluation

  • A cycle-accurate simulator is used to compare the ring router microarchitectures
  • Simulator parameters include:

– N = 16
– Single-flit packets (1 flit = 512 bits)
– Synthetic traffic patterns

  • Orion 2.0 is used to model area and power (results in the paper)
  • The following microarchitectures are compared:

– Baseline (3 cycles)
– Bufferless (1 cycle)
– Lightweight (1 cycle)

SLIDE 15

Performance Comparison

[Figure: latency (cycles) vs. offered load (fraction of capacity) for bufferless, lightweight, baseline (b=2), and baseline (b=8) routers; panels: uniform random, bit complement]
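The two synthetic traffic patterns in the figure can be sketched as destination generators. This assumes the usual definitions (uniform random over all nodes except the source; bit complement flips every address bit, so N must be a power of two) and measures distance as shortest-direction hops on a bidirectional ring; all function names are hypothetical.

```python
import random


def uniform_random_dest(src: int, n: int, rng: random.Random) -> int:
    """Uniform random traffic: every node other than the source is equally likely."""
    dst = rng.randrange(n - 1)         # pick among the n - 1 other nodes
    return dst if dst < src else dst + 1  # skip over the source id


def bit_complement_dest(src: int, n: int) -> int:
    """Bit-complement traffic: invert every address bit (n must be a power of two)."""
    if n & (n - 1):
        raise ValueError("n must be a power of two")
    return ~src & (n - 1)


def ring_hops(src: int, dst: int, n: int) -> int:
    """Shortest-direction hop count on a bidirectional ring."""
    d = (dst - src) % n
    return min(d, n - d)
```

Under these assumptions, bit complement on a 16-node ring averages exactly 4 hops per packet, but individual sources range from 1 hop (node 0 to node 15) to 7 hops (node 3 to node 12).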

SLIDE 16

Impact of Prioritized Arbitration

[Figure: latency (cycles) vs. offered load (fraction of capacity) for baseline (b=1), baseline (b=2), and lightweight routers]

SLIDE 17

Today’s Talk

  • Background in On-Chip Networks and Topology
  • Router Microarchitecture for Ring Topology
  • Scalability of Ring Topology
  • Summary

SLIDE 18

How Scalable is the Ring Topology?

  • Assumption: the ring and the 2D mesh are compared at the same bisection bandwidth

– The bandwidth per channel is therefore higher for the ring than for the 2D mesh
– Trade-off of hop count vs. serialization latency
– Per-hop latency can be higher with the 2D mesh

SLIDE 19

Synthetic Workload

[Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64 and maximum outstanding requests r = 2, 4, 8, 16]

SLIDE 20

Bandwidth Fragmentation

  • 2D mesh:

– Short packets (requests) = 1 flit
– Long packets (replies) = 4 flits

  • Ring:

– Short packets (requests) = 1 flit
– Long packets (replies) = 1 flit

  • Wide channels result in high bandwidth for the ring
  • However, a short packet uses only ¼ of a ring channel’s bandwidth
  • The ring topology is therefore inefficient for short packets
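Bandwidth fragmentation can be quantified as the fraction of transmitted flit capacity that actually carries payload. This sketch assumes, for illustration only, a 128-bit mesh flit, so a request is 128 bits and a reply is 512 bits, and a 512-bit ring flit (4 mesh flits wide, matching the 1-flit vs. 4-flit replies above); `channel_utilization` is a hypothetical name.

```python
import math


def channel_utilization(packet_bits: list[int], flit_bits: int) -> float:
    """Fraction of sent flit capacity that carries payload.

    Each packet occupies ceil(size / flit_bits) whole flits; the unused
    tail of its last flit is wasted (bandwidth fragmentation).
    """
    used = sum(packet_bits)
    sent = sum(math.ceil(p / flit_bits) * flit_bits for p in packet_bits)
    return used / sent
```

With these assumed sizes, a request alone fills only 0.25 of a ring flit, and an even mix of one request and one reply gives a ring utilization of 0.625, while the narrower mesh flits stay fully used (1.0).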

SLIDE 21

Bandwidth Fragmentation

[Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64 and maximum outstanding requests r = 2, 4, 8, 16; panels: bimodal packets, single-flit packets]

SLIDE 22

Limitations of this study

  • “Packaging” of an on-chip network topology = the 2D layout of the topology
  • The layout of the topology can impact performance

– 2D mesh: only requires communication between neighbors
– Ring: long links may be needed as the network scales

  • Hierarchical rings were not investigated.
  • Router complexity (for the mesh) was not properly modeled.

SLIDE 23

Summary

  • On-chip networks present different constraints than off-chip networks, so they can exploit different router microarchitectures.
  • The ring is a simple topology for which a bufferless router microarchitecture can be implemented.
  • A lightweight router microarchitecture is proposed to increase performance with minimal additional complexity.
  • The ring topology can scale, but bandwidth fragmentation can limit its scalability, especially under high traffic.
  • Can we scale this router microarchitecture to the 2D mesh topology?

SLIDE 24

Low-Cost Router Microarchitecture (Micro’09)

SLIDE 25

Thank you

Questions?