
A Wormhole Router Design: progress report, Wei Song, 30/07/2009



  1. A Wormhole Router Design progress report Wei Song 30/07/2009 Advanced Processor Technology Group 2014/5/13 The School of Computer Science

  2. Content • Motive and Plans • Wormhole router – Channel Slicing, motivation – Lookahead, critical cycle – Implementation • XY/Stochastic routing scheme

  3. Why a wormhole router now • Easy • Smallest cycle period • Early performance estimation • Proof of channel slicing [Figure: roadmap from the wormhole router to an SDM router (larger crossbar / switch network, larger route scheduler, larger input/output buffer controller), with QoS and fault tolerance as further steps]

  4. Plan for Router Design (1) • Wormhole router – Speed estimation, basic design flow – Channel slicing, lookahead pipeline • Spatial Division Multiplexing (SDM) router – Uses channel slicing (provides virtual circuits) – M sub-channels per port, crossbar × M – Beneš and Clos switch networks (as used in ATM switches) – Route scheduling in the multi-stage switch
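Multiplying the crossbar by M is what makes multi-stage networks attractive for the SDM router. A rough sketch comparing switch cost with the textbook formulas for a flat crossbar, a three-stage Clos network and a Beneš network. The choice of M = 4 sub-channels per port and the Clos parameters n = 4, m = 4, r = 5 are illustrative assumptions, not figures from these slides.

```python
def crossbar_crosspoints(n_ports: int) -> int:
    """A flat N x N crossbar needs one crosspoint per input/output pair."""
    return n_ports * n_ports


def clos_crosspoints(n: int, m: int, r: int) -> int:
    """Three-stage Clos network: r ingress switches (n x m), m middle
    switches (r x r) and r egress switches (m x n), for N = n * r ports."""
    return 2 * r * n * m + m * r * r


def clos_blocking_class(n: int, m: int) -> str:
    """Classic results: strict-sense nonblocking if m >= 2n - 1 (Clos),
    rearrangeably nonblocking if m >= n (Slepian-Duguid)."""
    if m >= 2 * n - 1:
        return "strict"
    if m >= n:
        return "rearrangeable"
    return "blocking"


def benes_switches(n_ports: int) -> int:
    """A Benes network on N = 2^k ports has 2k - 1 stages of N/2 2x2 switches."""
    if n_ports < 2 or n_ports & (n_ports - 1):
        raise ValueError("a Benes network needs a power-of-two port count >= 2")
    k = n_ports.bit_length() - 1
    return (2 * k - 1) * (n_ports // 2)


if __name__ == "__main__":
    ports, m_sub = 5, 4              # hypothetical: M = 4 sub-channels per port
    terminals = ports * m_sub        # 20 sub-channel inputs and 20 outputs
    print("flat crossbar:", crossbar_crosspoints(terminals))
    print("Clos n=4, m=4, r=5:", clos_crosspoints(4, 4, 5), clos_blocking_class(4, 4))
    print("Benes, 16 ports:", benes_switches(16), "2x2 switches")
```

The rearrangeable Clos configuration uses fewer crosspoints than the flat crossbar (260 against 400 here), but, as the slide notes, it then needs route scheduling inside the multi-stage switch.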

  5. Plan for Router Design (2) • Dynamic Link Allocation – Allocate idle sub-channels to active virtual circuits to reduce frame latency – Arbitration planning, crossbar reconfiguration and buffer planning • Fault-tolerance – Error detection, deadlock recovery, route scheduling algorithm
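The latency benefit of dynamic link allocation comes from spreading a frame over more sub-channels. A minimal sketch, assuming a frame's symbols can be spread evenly over whatever sub-channels a virtual circuit holds, and a hypothetical round-robin lending policy (`lend_idle`) that is not taken from the slides. Reclaiming lent sub-channels when their owner becomes active again, the reason the guarantee becomes weak, is not modelled.

```python
import math


def frame_cycles(frame_symbols: int, subchannels: int) -> int:
    """Handshake cycles to move a frame whose symbols are spread evenly
    over the given number of sub-channels."""
    return math.ceil(frame_symbols / subchannels)


def lend_idle(own: dict, active: list, idle: int) -> dict:
    """Hypothetical policy: give each idle sub-channel, round-robin,
    to one of the currently active virtual circuits."""
    if not active:
        return {}
    alloc = {vc: own[vc] for vc in active}
    for i in range(idle):
        alloc[active[i % len(active)]] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical port with 16 sub-channels; vc_c owns 8 but is idle.
    own = {"vc_a": 4, "vc_b": 4, "vc_c": 8}
    alloc = lend_idle(own, ["vc_a", "vc_b"], idle=8)
    print(alloc, frame_cycles(32, own["vc_a"]), "->", frame_cycles(32, alloc["vc_a"]))
```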

  6. Plan for Router Design (3) • QoS – Virtual circuits are latency- and bandwidth-guaranteed (only weakly if dynamic link allocation is used) – Best effort is a problem – Priorities for virtual circuit setup (reduce circuit setup time for high-priority services)

  7. Content (agenda repeated from slide 2)

  8. ChSlice: motive [Figure: 16 2-bit sub-channels between d_i and d_o, each with C-element (C) stages; completion detection (CD) merges the 16 sub-channel acks into a single ack_i/ack_o] Advantage: data on all sub-channels are synchronized, which eases time-division techniques such as virtual channels and TDMA. Drawback: low speed (66% on CD, completion detection).
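The cost of the shared ack is easier to see with the encoding in view. A behavioural sketch, assuming the 1-of-4 data encoding listed for this router in slide 21 and a 32-bit channel split into 16 2-bit sub-channels: a single ack can fire only after the completion of all 16 sub-channels has been merged, here by a tree of C-elements, whereas with channel slicing each sub-channel needs only its own 4-input OR. Function names are illustrative, and only the set (data-valid) phase of the four-phase handshake is modelled.

```python
def encode_1of4(word: int, width: int = 32) -> list:
    """Split a word into 2-bit digits (least significant first) and encode
    each digit one-hot on four wires: digit d raises wire d."""
    symbols = []
    for i in range(width // 2):
        digit = (word >> (2 * i)) & 0b11
        symbols.append(tuple(1 if wire == digit else 0 for wire in range(4)))
    return symbols


def decode_1of4(symbols) -> int:
    """Inverse of encode_1of4; assumes every symbol is a valid code word."""
    return sum(s.index(1) << (2 * i) for i, s in enumerate(symbols))


def subchannel_complete(wires) -> bool:
    """Per-sub-channel completion detection: an OR of the four rails."""
    return any(wires)


def channel_complete(symbols) -> bool:
    """Whole-channel completion (set phase only): every sub-channel must be
    valid. In hardware this is a C-element tree over the per-symbol ORs."""
    return all(subchannel_complete(s) for s in symbols)


def c_tree_depth(n_inputs: int, fan_in: int = 2) -> int:
    """Levels of fan_in-input C-elements needed to merge n_inputs signals."""
    depth = 0
    while n_inputs > 1:
        n_inputs = -(-n_inputs // fan_in)    # ceiling division
        depth += 1
    return depth


if __name__ == "__main__":
    symbols = encode_1of4(0xDEADBEEF)
    print(len(symbols), "sub-channels,", c_tree_depth(len(symbols)), "C-element levels")
```

Merging 16 completion signals with 2-input C-elements takes four levels, and that tree sits in every handshake cycle of the unsliced channel.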

  9. ChSlice: implementation [Figure: with channel slicing, each of the 16 2-bit sub-channels (d_i 0 to d_i 15 into d_o 0 to d_o 15) has its own C-element stage and its own acknowledgement (ack_i 0 to ack_i 15, ack_o 0 to ack_o 15); over time, every sub-channel carries the frame H D D D D D D T (head used for routing, data, tail)]
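One way to see why per-sub-channel acks help: a toy timing model, assuming each flit on each sub-channel has an independent, randomly varying handshake delay, and ignoring both the completion-detection delay itself and any coordination the head flit needs for routing. With a shared ack the frame time is a sum of per-flit maxima; with slicing it is the maximum of per-sub-channel sums, which can never be larger. The delay range is a placeholder.

```python
import random


def synchronized_time(delays):
    """Shared ack: each flit waits for the slowest sub-channel before the next
    flit may start, so every flit costs the maximum over sub-channels."""
    n_flits = len(delays[0])
    return sum(max(ch[f] for ch in delays) for f in range(n_flits))


def sliced_time(delays):
    """Channel slicing: each sub-channel has its own ack and runs at its own
    pace; the frame is delivered when the slowest sub-channel finishes."""
    return max(sum(ch) for ch in delays)


if __name__ == "__main__":
    rng = random.Random(1)
    # 16 sub-channels, each carrying the 8-flit frame H D D D D D D T,
    # with hypothetical per-flit handshake delays between 0.8 and 1.2 ns
    delays = [[rng.uniform(0.8, 1.2) for _ in range(8)] for _ in range(16)]
    print(f"synchronized: {synchronized_time(delays):.2f} ns")
    print(f"sliced:       {sliced_time(delays):.2f} ns")
```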

  10. ChSlice: conclusion • Advantage – fast • Overhead – extra controllers – larger wire count • TDMA techniques cannot be used, but SDM becomes easy.

  11. Content (agenda repeated from slide 2)

  12. Lookahead: pipeline style [Figure: pipeline stages N, N+1 and N+2, each with completion detection (CD), between input (Di, Ai) and output (Do, Ao); waveforms of the data D N, D N+1 and acknowledgements A N, A N+1]

  13. Lookahead: conclusion • Advantage – fast • Disadvantage – not QDI (quasi-delay-insensitive) – a small area overhead [Figure: stages N, N+1, N+2 with CD; data and ack waveforms]

  14. Lookahead: critical cycle [Figure: critical cycle across two adjacent routers; in each router, data pass through pipeline stages in the input buffer, the crossbar (which also takes data from other inputs) and the output buffer, with a long line between routers carrying d_i/d_o; ack_i/ack_o return through the pipeline stages, and arbiters receive acks from other outputs]
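A "critical cycle" is the slowest closed handshake loop, and it sets the cycle period. A minimal sketch, assuming each loop is a four-phase handshake between two adjacent pipeline stages with equal rising and falling delays. The loop names follow the labels on this slide; every delay is an invented placeholder, not a figure from the design.

```python
from dataclasses import dataclass


@dataclass
class HandshakeLoop:
    name: str
    forward: float   # data from stage N to stage N+1, ns (hypothetical)
    detect: float    # completion detection at stage N+1, ns (hypothetical)
    ack: float       # acknowledgement from stage N+1 back to stage N, ns

    def period(self) -> float:
        # Four-phase: set phase (data valid, ack high) then reset phase
        # (data null, ack low); rise and fall delays assumed equal.
        return 2 * (self.forward + self.detect + self.ack)


def critical_cycle(loops):
    """Return the slowest loop and its period; in a pipeline whose loops are
    local to adjacent stages, this loop bounds the cycle period."""
    slowest = max(loops, key=HandshakeLoop.period)
    return slowest.name, slowest.period()


if __name__ == "__main__":
    loops = [
        HandshakeLoop("input buffer stage", 0.10, 0.15, 0.10),
        HandshakeLoop("crossbar + arbiter", 0.20, 0.20, 0.25),
        HandshakeLoop("long line between routers", 0.30, 0.15, 0.30),
    ]
    name, period = critical_cycle(loops)
    print(f"critical cycle: {name}, {period:.2f} ns")
```

Speeding up any loop other than the critical one leaves the period unchanged, which is why the slide focuses on a single cycle.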

  15. Lookahead: implementation [Figure: lookahead pipeline stages N, N+1 and N+2, each with CD, around the crossbar]

  16. Content (agenda repeated from slide 2)

  17. Router: structure [Figure: 5 input ports and 5 output ports (0 to 4); each port has a data bus of width 80 (d_i_k, d_o_k) and an ack bus of width 16 (ack_i_k, ack_o_k), with arbiter and controller (ctl) blocks]

  18. Router: data path [Figure: input buffer, crossbar and output buffer, each with wires 0 to 3 and an eof line; signals ip_d, ib_d, ic_d, oc_d, ob_d, op_d, ip_a, ib_a, ib_pa, ic_a, ic_da, oc_a, ob_a, ob_pa, op_a, gnt, rt_err, acki and acken; plus a signal transition graph of the rising (+) and falling (-) transitions of these signals]

  19. Router: layout • Faraday 130 nm technology • 32-bit, 5 ports, XY routing algorithm • 0.3 × 0.3 mm (14.3K gates, 0.057 mm²) • Typical corner (25 °C, 1.2 V) • Cycle period 1.7 ns (2.35 GByte/s per port)
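The per-port figure follows from the cycle period, assuming one 32-bit flit crosses a port per handshake cycle. A small arithmetic check, which also reads 0.057 mm² as cell area inside the 0.3 × 0.3 mm layout outline (an interpretation, not stated on the slide).

```python
def port_throughput_gbyte_s(width_bits: int, period_ns: float) -> float:
    """One width_bits flit per handshake cycle: bytes per ns equals GByte/s."""
    return (width_bits / 8) / period_ns


def cell_utilization(cell_area_mm2: float, width_mm: float, height_mm: float) -> float:
    """Fraction of the layout footprint occupied by cell area."""
    return cell_area_mm2 / (width_mm * height_mm)


if __name__ == "__main__":
    print(f"{port_throughput_gbyte_s(32, 1.7):.2f} GByte/s per port")
    print(f"{cell_utilization(0.057, 0.3, 0.3):.0%} of the 0.3 x 0.3 mm footprint")
```

Four bytes every 1.7 ns gives about 2.35 GByte/s, matching the slide.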

  20. ChSlice and Lookahead • Speed: ChSlice 24.1%, LH (lookahead) 17.2% • Area: ChSlice 23.0%, LH 5.3%

  21. Comparison with other routers • MANGO: 1.26 ns; 0.12 µm; bundled data • ANoC: 4 ns; 0.13 µm; 1-of-4 • QNoC: 4.8 ns; 0.18 µm; bundled data • ASPIN: 0.88 ns; 90 nm; dual-rail & bundled data • Ours: 1.7 ns; 0.13 µm; 1-of-4 & lookahead

  22. Speed vs. data width [Figure: speed against data width for the wormhole router and QNoC]

  23. Content (agenda repeated from slide 2)

  24. XY/Stochastic • Motive – Supporting two routing algorithms is complicated – The deadlock problem – The involvement of network interfaces – Keep the router simple • Solution – Router: XY – Network interface: generate, consume, or forward (randomly)
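A sketch of the split described here: routers apply only XY routing, and any randomness lives in the network interfaces. The XY function is standard dimension-order routing on a 2D mesh. `stochastic_search` is a hypothetical reading of "generate, consume, or forward (random)": the request looks for a node able to serve it, and each NI either consumes it or forwards it to a randomly chosen node. Coordinates, port names and the north = +y convention are assumptions.

```python
import random

# Unit steps per output port; north is taken as +y (an assumption).
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_route(cur, dst):
    """Dimension-order (XY) routing: resolve the X offset first, then Y.
    Never turning from Y back to X is what makes XY deadlock-free on a mesh."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"


def xy_path(src, dst):
    """Routers visited from src to dst when every router applies xy_route."""
    path, cur = [src], src
    while (port := xy_route(cur, dst)) != "LOCAL":
        cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])
        path.append(cur)
    return path


def stochastic_search(src, can_serve, size, rng, max_forwards=100):
    """Hypothetical NI behaviour: the source NI generates a request and sends
    it to a random node; each receiving NI consumes it if it can serve it,
    otherwise forwards it to another random node. Routers only do XY."""
    cur, hops, visits = src, 0, [src]
    for _ in range(max_forwards):
        nxt = (rng.randrange(size), rng.randrange(size))
        hops += len(xy_path(cur, nxt)) - 1     # router-to-router hops
        cur = nxt
        visits.append(cur)
        if can_serve(cur):                     # consumed by this NI
            return visits, hops, True
    return visits, hops, False                 # give up after max_forwards


if __name__ == "__main__":
    rng = random.Random(7)
    visits, hops, found = stochastic_search((0, 0), lambda n: n == (3, 3), 4, rng)
    print(found, len(visits) - 1, "forwards,", hops, "router hops")
```

The router stays a plain XY router; the price, as slide 27 notes, is a longer search time.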

  25. XY/Stochastic (request path) [Figure: request paths between two nodes marked M, with routing done by the routers only versus XY/Stochastic routing]

  26. XY/Stochastic (ack path) [Figure: two options for the ack path between the nodes marked M: same path, or single jump]
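The two ack options can be compared by hop count. A sketch under an interpretation the slide does not spell out: "same path" is read as the ack retracing each NI-to-NI leg of the request, and "single jump" as the ack travelling straight back to the source by XY routing. Mesh coordinates and the example request path are hypothetical.

```python
def xy_hops(a, b):
    """Router hops between two mesh nodes under XY routing (Manhattan distance);
    the count is the same in either direction."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def ack_hops_same_path(visits):
    """Ack retraces every NI-to-NI leg of the request path in reverse."""
    return sum(xy_hops(a, b) for a, b in zip(visits, visits[1:]))


def ack_hops_single_jump(visits):
    """Ack is XY-routed directly from the consuming node back to the source."""
    return xy_hops(visits[-1], visits[0])


if __name__ == "__main__":
    request = [(0, 0), (3, 1), (1, 2), (2, 2)]   # hypothetical request path
    print(ack_hops_same_path(request), ack_hops_single_jump(request))
```

By the triangle inequality, a single jump is never longer than retracing the request path.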

  27. Comparison • Rely on routers – A larger router (special router design) – Longer routing overhead – Deadlocks – Shorter search time • XY/Stochastic routing – Smaller router (normal router design) – Shorter routing time – Only deadlocks caused by errors – Longer search time (mitigated by higher QoS priority)

  28. Conclusion • Router design plan – Wormhole router is the first step • Wormhole router – Channel slicing – SDM is better than TDMA for asynchronous routers • XY/Stochastic routing

  29. Questions?
