A Wormhole Router Design progress report, Wei Song, 30/07/2009


SLIDE 1

2014/5/13 Advanced Processor Technology Group The School of Computer Science

A Wormhole Router Design progress report

Wei Song 30/07/2009

SLIDE 2

Content

  • Motive and Plans
  • Wormhole router
    – Channel Slicing, motivation
    – Lookahead, critical cycle
    – Implementation
  • XY/Stochastic routing scheme
SLIDE 3

Why a wormhole router now

  • Easy
  • Smallest cycle period
  • Early performance estimation
  • Proof of channel slicing

[Figure: design roadmap. Labels: wormhole, SDM, QoS, fault-tolerance; larger crossbar / switch network; larger route scheduler; larger input/output buffer controller.]

SLIDE 4

Plan for Router Design (1)

  • Wormhole router
    – Speed estimation, basic design flow
    – Channel slicing, lookahead pipeline
  • Spatial Division Multiplexing (SDM) router
    – Utilizing channel slicing (provides virtual circuits)
    – M sub-channels on a port, crossbar*M
    – Benes, Clos switch network (ATM)
    – Route scheduling in the multi-stage switch

SLIDE 5

Plan for Router Design (2)

  • Dynamic Link Allocation
    – Allocate idle sub-channels to active virtual circuits to reduce frame latency
    – Arbitration planning, crossbar reconfiguration and buffer planning
  • Fault-tolerance
    – Error detection, deadlock recovery, route scheduling algorithm

SLIDE 6

Plan for Router Design (3)

  • QoS
    – Virtual circuits have guaranteed latency and bandwidth (the guarantee is weak if dynamic link allocation is used)
    – Best-effort traffic is a problem
    – Priorities for virtual circuit setup (reduce circuit setup time for high-priority services)

SLIDE 7

Content

  • Motive and Plans
  • Wormhole router
    – Channel Slicing, motivation
    – Lookahead, critical cycle
    – Implementation
  • XY/Stochastic routing scheme
SLIDE 8

ChSlice: motive

[Figure: a 16-sub-channel link of 2-bit sub-channels in which blocks labelled C and CD combine the 16-bit ack of the sub-channels into a single ack; signals d_i, ack_i, d_o, ack_o.]

Advantages: data on all sub-channels is synchronized, which eases time-division techniques such as virtual channels and TDMA.

Drawbacks: low speed (66% on CD).

SLIDE 9

ChSlice: implementation

[Figure: frames on the sub-channels over time. Each sub-channel carries its own sequence of H, D and T symbols (legend: head, routing, data). Each of the 16 2-bit sub-channels has its own data/ack pair: d_i0/ack_i0 to d_i15/ack_i15 on the input side and d_o0/ack_o0 to d_o15/ack_o15 on the output side.]
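
To make the slicing concrete, the following is a minimal Python sketch of how one 32-bit word could be split across 16 independent 2-bit sub-channels, each using a 1-of-4 code (the encoding named later in the router comparison). The bit ordering (least significant slice first), the wire ordering of the one-hot code, and the function names `slice_word`, `encode_1of4` and `merge_word` are assumptions made for illustration; the slides do not specify them.

```python
SUB_CHANNELS = 16   # from the slides: 16 sub-channels per port
BITS_PER_SUB = 2    # from the slides: 2 bits per sub-channel


def slice_word(word: int) -> list[int]:
    """Split a 32-bit word into 16 two-bit values (assumed LSB slice first)."""
    if not 0 <= word < 1 << (SUB_CHANNELS * BITS_PER_SUB):
        raise ValueError("word does not fit in 32 bits")
    return [(word >> (BITS_PER_SUB * i)) & 0b11 for i in range(SUB_CHANNELS)]


def encode_1of4(value: int) -> tuple:
    """One-hot 1-of-4 code: exactly one of four wires is high (assumed wire order)."""
    if not 0 <= value < 4:
        raise ValueError("a 1-of-4 symbol carries exactly 2 bits")
    return tuple(1 if i == value else 0 for i in range(4))


def merge_word(slices: list[int]) -> int:
    """Reassemble the word at the receiver from the 16 two-bit values."""
    return sum(v << (BITS_PER_SUB * i) for i, v in enumerate(slices))


word = 0xDEADBEEF
symbols = [encode_1of4(v) for v in slice_word(word)]
print(len(symbols), "sub-channel symbols,", sum(len(s) for s in symbols), "data wires")
```

Because each sub-channel has its own ack in the sliced design, each symbol in `symbols` could be acknowledged independently rather than waiting for completion across all 16 slices.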

SLIDE 10

ChSlice: conclusion

  • Advantage
    – fast
  • Overhead
    – extra controllers
    – larger wire count
  • TDMA techniques cannot be used, but SDM is easy.
SLIDE 11

Content

  • Motive and Plans
  • Wormhole router
    – Channel Slicing, motivation
    – Lookahead, critical cycle
    – Implementation
  • XY/Stochastic routing scheme
SLIDE 12

Lookahead: pipeline style

[Figure: lookahead pipeline style. Pipeline stages N, N+1 and N+2, each with a CD block, passing data; signals Di, Ai, DN, AN, DN+1, AN+1, Do, Ao.]

SLIDE 13

Lookahead: conclusion

  • Advantage
    – fast
  • Disadvantage
    – not QDI
    – a small area overhead

[Figure: lookahead pipeline. Stages N, N+1 and N+2 with CD blocks; signals Di, Ai, DN, AN, DN+1, AN+1, Do, Ao.]

SLIDE 14

Lookahead: critical cycle

[Figure: critical cycle through two neighbouring routers. Labels: pipeline stages, input buffer, crossbar, arbiter, output buffer, data from other inputs, ack from other outputs, long line between routers; signals d_i, ack_i, d_o, ack_o.]

SLIDE 15

Lookahead: implementation

[Figure: lookahead implementation. Stages N, N+1 and N+2 with CD blocks, and the crossbar.]

SLIDE 16

Content

  • Motive and Plans
  • Wormhole router
    – Channel Slicing, motivation
    – Lookahead, critical cycle
    – Implementation
  • XY/Stochastic routing scheme
SLIDE 17

Router: structure

[Figure: router structure with 5 input ports (d_i_0/ack_i_0 to d_i_4/ack_i_4) and 5 output ports (d_o_0/ack_o_0 to d_o_4/ack_o_4), arbiters and control (ctl) blocks; bus widths labelled 80 and 16.]

SLIDE 18

Router: data path

[Figure: router data path from input buffer through crossbar to output buffer, with the rising (+) and falling (-) transitions of its handshake signals. Signals: ip_d, ip_a, ib_d, ib_a, ib_pa, ic_d, ic_a, ic_da, oc_d, oc_a, ob_d, ob_a, ob_pa, op_d, op_a, rt_err, acken, gnt, eof, acki.]

SLIDE 19

Router: layout

  • Faraday 130nm Technology
  • 32-bit, 5 ports, XY routing algorithm
  • 0.3 mm × 0.3 mm (14.3K gates, 0.057 mm²)
  • Typical corner (25 °C, 1.2 V)
  • Cycle period 1.7 ns (2.35 GByte/s per port)
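
The per-port throughput figure follows directly from the data width and the cycle period. The short Python check below reproduces it, assuming one 32-bit word is transferred per 1.7 ns cycle; the function name `port_throughput_gbyte_per_s` is only illustrative.

```python
DATA_WIDTH_BITS = 32     # from the slide: 32-bit data path
CYCLE_PERIOD_NS = 1.7    # from the slide: cycle period at the typical corner


def port_throughput_gbyte_per_s(width_bits: int, period_ns: float) -> float:
    """Assumes one data word per cycle; bytes per nanosecond equals GByte/s."""
    return (width_bits / 8) / period_ns


throughput = port_throughput_gbyte_per_s(DATA_WIDTH_BITS, CYCLE_PERIOD_NS)
print(f"{throughput:.2f} GByte/s per port")  # prints "2.35 GByte/s per port"
```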
SLIDE 20

ChSlice and Lookahead

Speed: ChSlice 24.1%, LH 17.2%
Area: ChSlice 23.0%, LH 5.3%

SLIDE 21

Compare to other routers

  • MANGO: 1.26ns; 0.12um; bundled data
  • ANoC: 4ns; 0.13um; 1-of-4
  • QNoC: 4.8ns; 0.18um; bundled data
  • ASPIN: 0.88ns; 90nm; dual-rail & bundled data
  • Ours: 1.7ns; 0.13um; 1-of-4 & lookahead
SLIDE 22

Speed vs. data width

[Figure: speed vs. data width for QNoC and the wormhole router.]

SLIDE 23

Content

  • Motive and Plans
  • Wormhole router
    – Channel Slicing, motivation
    – Lookahead, critical cycle
    – Implementation
  • XY/Stochastic routing scheme
SLIDE 24

XY/Stochastic

  • Motive
    – Supporting two routing algorithms is complicated
    – The deadlock problem
    – The involvement of network interfaces
    – Keep the router simple
  • Solution
    – Router: XY
    – Network interface: generate, consume, or forward (random)
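
The split of responsibilities can be sketched in Python as follows. The router part is standard dimension-ordered XY routing on a 2D mesh. The network-interface part is only one possible reading of "forward (random)": a frame that is not consumed is re-sent to a randomly chosen other node. The coordinate convention (north is +y), the port names, and the helpers `xy_next_port`, `xy_path` and `random_forward_target` are assumptions, not taken from the slides.

```python
import random

# Assumed mesh convention: east is +x, north is +y.
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def xy_next_port(cur, dst):
    """Dimension-ordered XY routing: correct the X offset first, then Y."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return "east" if dx > cx else "west"
    if cy != dy:
        return "north" if dy > cy else "south"
    return "local"  # arrived: deliver to the local network interface


def xy_path(src, dst):
    """List of routers visited from src to dst under XY routing."""
    path, cur = [src], src
    while (port := xy_next_port(cur, dst)) != "local":
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)
    return path


def random_forward_target(here, width, height, rng=random):
    """Hypothetical NI 'forward' step: pick any other node uniformly at random."""
    nodes = [(x, y) for x in range(width) for y in range(height) if (x, y) != here]
    return rng.choice(nodes)


print(xy_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

In this sketch the routers only ever run `xy_next_port`, so they stay simple, while any randomness lives in the network interface, which matches the motive of keeping the router simple.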

SLIDE 25

XY/Stochastic (request path)

[Figure: request path between two nodes labelled M; legend: XY/Stochastic, Router Only.]

SLIDE 26

XY/Stochastic (ack path)

[Figure: ack path between two nodes labelled M; legend: Same Path, Single Jump.]

SLIDE 27

Compare

  • Rely on routers
    – A larger router (special router design)
    – Longer routing overhead
    – Deadlocks
    – Shorter search time
  • XY/Stochastic routing
    – Smaller router (normal router design)
    – Shorter routing time
    – Only deadlocks caused by errors
    – Longer search time (higher priority by QoS)

SLIDE 28

Conclusion

  • Router design plan
    – Wormhole router is the first step
  • Wormhole router
    – Channel slicing
    – SDM is better than TDMA for asynchronous routers
  • XY/Stochastic routing
SLIDE 29

Questions?