2014/5/13 Advanced Processor Technology Group The School of Computer Science
Channel Slicing: a Way to Build Fast Routers for Asynchronous NoCs
Wei Song and Doug Edwards
The University of Manchester
15/09/2009
Content
- Asynchronous NoCs
- Channel Slicing
  – Motivation
  – Sliced sub-channels
  – Flow control
- An asynchronous wormhole router
  – Implementation details
  – Performance
Network-on-Chip (NoC)
Figure: 4x4 mesh NoC with nodes numbered (0,0) to (3,3); the blocks in the diagram are labelled RT, NI and PE.
PE: Processor Element; NI: Network Interface; RT: Router
Synchronous/Asynchronous
- Synchronous
  – Fast
    - Intel 80-tile 4GHz 65nm
    - DSPIN 408MHz 130nm
  – Small
    - DSPIN 0.161mm²
  – Power consuming
    - 10.39mW (250MHz)
  – Sensitive to variation
  – Complex clock tree
- Asynchronous
  – Slow!!
    - ASPIN 714MHz 90nm
    - ANoC 220MHz 130nm
  – Large
    - ANoC 0.211mm²
  – Power efficient
    - 3.69mW (160MHz)
  – Tolerant to variation
  – No clock tree
Asynchronous Pipelines
- CHAIN (Bainbridge’02)
  – 4-phase 1-of-4 pipelines
- QoS NoC (Felicijan’04)
  – 8-bit, four 4-phase 1-of-4 pipelines
- ANoC (Beigne’05)
  – 32-bit, sixteen 4-phase 1-of-4 pipelines
- SpiNNaker (Plana’07)
  – Several 1-of-4/2-of-7 pipelines
- ASPIN (Sheibanyrad’08)
  – 32-bit, sixteen dual-rail pipelines / bundled-data
- MANGO (Bjerregaard’05) & QNoC (Dobkin’09)
  – Bundled-data
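As a rough illustration of the 1-of-4 code used by several of the pipelines listed above: one 1-of-4 pipeline carries 2 bits on four wires, with exactly one wire high per symbol, so a 32-bit channel needs sixteen of them. The Python sketch below is only a software model of that encoding. The function names and the least-significant-first ordering are assumptions made for the example, not taken from any of the cited designs.

```python
def encode_1of4(two_bits: int) -> tuple:
    """Map a 2-bit value (0..3) to a one-hot code word on four wires."""
    if not 0 <= two_bits <= 3:
        raise ValueError("a 1-of-4 symbol carries exactly 2 bits")
    return tuple(1 if wire == two_bits else 0 for wire in range(4))


def encode_word(value: int, width: int = 32) -> list:
    """Split a `width`-bit word into 2-bit symbols, least significant first."""
    return [encode_1of4((value >> shift) & 0b11) for shift in range(0, width, 2)]


def decode_word(symbols) -> int:
    """Inverse of encode_word: find the single high wire of every symbol."""
    value = 0
    for index, wires in enumerate(symbols):
        if sum(wires) != 1:
            raise ValueError("invalid 1-of-4 code word")
        value |= wires.index(1) << (2 * index)
    return value


symbols = encode_word(0xDEADBEEF)
print(len(symbols))                      # sub-channels needed for 32 bits
print(hex(decode_word(symbols)))         # round trip back to the word
```

Each entry of `symbols` corresponds to one 2-bit sub-channel of the kind drawn in the following slides.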
Completion Detection
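Figure: pipeline with completion detection. Labels: C-element latches (C) on 2-bit sub-channels, completion detectors (CD), data d_i/d_o, acknowledge ack_i/ack_o, bus widths 16, 8 and 4 leading to a single ack, and "16-bit ack of sub-channels".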
Advantages: data on all sub-channels is synchronized, which eases time-division techniques such as virtual channels and time division multiple access (TDMA).
Drawbacks: low speed (66% on CD).
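The cost of this synchronization can be made concrete with a small behavioural model. The Python sketch below assumes 4-phase (return-to-zero) 1-of-4 sub-channels, a per-sub-channel completion detector that ORs the four wires, and a single 16-input C-element standing in for the whole CD tree. The class and function names are hypothetical, and no timing is modelled.

```python
class CElement:
    """Muller C-element: the output changes only when all inputs agree."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, inputs) -> int:
        inputs = list(inputs)
        if all(inputs):
            self.out = 1
        elif not any(inputs):
            self.out = 0
        return self.out  # in every other case the previous value is held


def sub_channel_valid(wires) -> int:
    """Completion detection of one 1-of-4 sub-channel: OR of its four wires."""
    return int(any(wires))


def shared_ack(c_element, channel) -> int:
    """Ack of a synchronized channel, driven by all sub-channels together."""
    return c_element.update(sub_channel_valid(w) for w in channel)


SPACER = (0, 0, 0, 0)   # all wires low between symbols
VALID = (0, 1, 0, 0)    # any one-hot code word

ack = CElement()
trace = [
    shared_ack(ack, [VALID] * 15 + [SPACER]),  # one sub-channel still late
    shared_ack(ack, [VALID] * 16),             # every sub-channel has data
    shared_ack(ack, [SPACER] * 15 + [VALID]),  # return-to-zero not finished
    shared_ack(ack, [SPACER] * 16),            # all wires back to zero
]
print(trace)  # [0, 1, 1, 0]
```

In this model the shared ack always waits for the slowest of the 16 sub-channels, in both the data phase and the return-to-zero phase.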
ChSlice: implementation
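Figure: channel-sliced pipeline drawn beside the synchronized one. Sliced: each of the 16 2-bit sub-channels has its own C-element latches and its own handshake (d_i0/ack_i0 ... d_i15/ack_i15 in, d_o0/ack_o0 ... d_o15/ack_o15 out). Synchronized: C-element latches with completion detection (CD) on d_i/ack_i and d_o/ack_o.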
How to do it in a router?
Figure: two router diagrams, each showing arbiters (with connections to other ports), crossbars, the data-path and the ack path.
Flow control
Figure: frame format over time. In the first diagram six sub-channels are drawn, and every sub-channel carries its own sequence H D D D D D D T. The second diagram shows a single sequence H D D D D D D T. Axis and phase labels: sub-channels, time, head, routing, data.
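The idea that every sub-channel carries its own complete frame can be sketched in software. The Python model below assumes that T marks the end of a frame on a sub-channel, that each sub-channel carries a 2-bit slice of every word, and that symbols of different sub-channels may arrive in any relative order. The function names, the 12-bit example frame and the random interleaving are all illustrative assumptions, not the router's actual behaviour.

```python
import random

TAIL = "T"  # stands in for the end-of-frame marker on one sub-channel


def slice_frame(words, n_sub):
    """One symbol stream per sub-channel: 2-bit slices of every word, then a tail."""
    return [[(w >> (2 * i)) & 0b11 for w in words] + [TAIL] for i in range(n_sub)]


def interleave(streams, rng):
    """Deliver symbols in a random global order, keeping each sub-channel in order."""
    cursors = [0] * len(streams)
    pending = list(range(len(streams)))
    events = []
    while pending:
        i = rng.choice(pending)
        events.append((i, streams[i][cursors[i]]))
        cursors[i] += 1
        if cursors[i] == len(streams[i]):
            pending.remove(i)
    return events


def reassemble(events, n_sub):
    """Rebuild the words once every sub-channel has delivered its tail."""
    received = [[] for _ in range(n_sub)]
    finished = [False] * n_sub
    for i, symbol in events:
        if symbol == TAIL:
            finished[i] = True
        else:
            received[i].append(symbol)
    if not all(finished):
        raise ValueError("frame incomplete: a sub-channel has not sent its tail")
    n_words = len(received[0])
    return [sum(received[i][k] << (2 * i) for i in range(n_sub)) for k in range(n_words)]


frame = [0xA5C, 0x123, 0xFFF, 0x000]  # head word followed by data words, 12 bits each
events = interleave(slice_frame(frame, n_sub=6), random.Random(0))
print(reassemble(events, n_sub=6) == frame)
```

Because ordering is preserved only within a sub-channel, the receiver in this sketch has to wait for all tails before it can treat the frame as finished.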
Router: structure
Figure: router structure with 5 input ports (d_i_0/ack_i_0 ... d_i_4/ack_i_4), 5 output ports (d_o_0/ack_o_0 ... d_o_4/ack_o_4), control blocks (ctl) and arbiters. Each port is drawn with bus widths 80 (data) and 16 (ack).
Router: data path
Figure: data path through input buffer, crossbar and output buffer. Signals: ip_d, ip_a, ib_d, ib_a, ib_pa, ic_d, ic_a, oc_d, oc_a, ob_d, ob_a, op_d, op_a, rt_err, acken, gnt. Each stage is drawn with wires labelled eof, 3, 2, 1.
Re-Synchronization
Figure: data path through input buffer, crossbar and output buffer (signals ip_d, ip_a, ib_d, ib_a, ib_pa, ic_d, ic_a, oc_d, oc_a, ob_d, ob_a, op_d, op_a, rt_err, acken, gnt; wires eof, 3, 2, 1), together with the signals eof, acken, ch_fin, ic_a, rt_err and rt_dec.
Figure: signal transition graph for a normal frame and a faulty frame. Transitions: rt_dec+, eof+/1, acken+/1, eof-/1, ch_fin-/1, ic_a+, ic_a-, acken-/1, ch_fin+/1, rt_dec+, rt_err+, acken-/2, eof+/2, acken+/2, eof-/2, ch_fin+/2, rt_err-, ch_fin-/2.
Routing Decision
Figure: signal transition graph of the routing decision for a normal frame and a faulty frame. Signals: rt_dec, ch_fin0 ... ch_fin15, ch_fin_a, rt_en, rt_err. Transitions: rt_dec+, rt_en-/1, ch_fin_a+/1, ch_fin_a-, rt_en+, rt_err+, rt_en-/2, ch_fin_a+/2, rt_dec-, rt_err-.
Figure: routing decision circuit. Inputs ib_a0 ... ib_a3 and ib_d0[0..3] ... ib_d3[0..3] (bus widths 4 and 8); two 4-bit (1-of-4) comparators compare target_x and target_y with local_x and local_y (>, <, =); further signals ch_fin_a, rt_dec, rt_err.
Figure: east arbiter. Blocks labelled ME; signals ir_n, ir_e, ir_w, ir_l, or_s, or_w, or_n, or_l, gnt_s, gnt_l, gnt_n, gnt_w, rt_en; gnts from other ports.
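The comparison of target_x/target_y against local_x/local_y corresponds to XY (dimension-ordered) routing, which the layout slide names as the routing algorithm. A minimal Python sketch of that decision follows. The mapping of comparison results to ports (x increasing towards east, y increasing towards north) and the function names are assumptions; the slides do not state the orientation.

```python
# Assumed orientation: x grows towards east, y grows towards north.
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def xy_route(target_x, target_y, local_x, local_y):
    """Pick an output port: resolve the x offset first, then the y offset."""
    if target_x > local_x:
        return "east"
    if target_x < local_x:
        return "west"
    if target_y > local_y:
        return "north"
    if target_y < local_y:
        return "south"
    return "local"  # both comparators report "=": deliver to the local port


def route_path(src, dst):
    """List the routers a frame visits on its way from src to dst."""
    x, y = src
    path = [(x, y)]
    while True:
        port = xy_route(dst[0], dst[1], x, y)
        if port == "local":
            return path
        dx, dy = STEP[port]
        x, y = x + dx, y + dy
        path.append((x, y))


print(route_path((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The five return values match the five router ports (north, east, south, west, local).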
Router: layout
- Faraday 130nm Technology
- 32-bit, 5 ports, XY routing algorithm
- 0.3 mm x 0.3 mm (12.6K gates, 0.050 mm²)
- Typical corner (25 °C, 1.2 V)
- Cycle period 2.2 ns (1.82GByte/s per port)
- Equivalent to 450MHz
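The last two figures follow from the cycle period and the data width. The short Python check below assumes that one 32-bit word crosses a port per cycle; the variable names are illustrative.

```python
period_ns = 2.2      # cycle period from the slide
width_bits = 32      # data width of one port

# One word per cycle: bytes per nanosecond equals GByte/s.
bandwidth_gbyte_s = (width_bits / 8) / period_ns

# Equivalent clock frequency of a synchronous router with the same period.
frequency_mhz = 1e3 / period_ns

print(f"{bandwidth_gbyte_s:.2f} GByte/s per port")   # 1.82 GByte/s per port
print(f"{frequency_mhz:.0f} MHz equivalent")         # 455 MHz equivalent
```

The exact reciprocal of 2.2 ns is about 455 MHz, which the slide rounds to 450 MHz.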
Compare with other routers
Router                 Tech (nm)  Period (ns)  Frequency (Hz)  Pipeline style            Other
Sliced Wormhole        130        2.2          450M            4-phase 1-of-4            Standard cell
Synchronized Wormhole  130        2.8          360M            4-phase 1-of-4            Standard cell
ANoC                   130        4.0          250M            4-phase 1-of-4            Customized cell lib
ASPIN                  90         0.88         1.13G           Dual-rail / bundled-data  Customized FIFO
QNoC                   180        4.8          208M            Bundled-data              Delay line
MANGO                  120        1.26         790M            Bundled-data              Delay line
DSPIN                  130        2.45         408M            Synchronous circuit       -
Speed vs. Data Width
Figure: plot of speed against data width for QNoC and the Sliced Wormhole router.
Speed and Area