
From Channel Slicing to Spatial Division Multiplexing

From Channel Slicing to Spatial Division Multiplexing -- the asynchronous router design -- Wei Song, 03/12/2009, Advanced Processor Technology Group


  1. From Channel Slicing to Spatial Division Multiplexing -- the asynchronous router design -- Wei Song, 03/12/2009. Advanced Processor Technology Group, 2009-12-2, The School of Computer Science

  2. Index • Channel Slicing – Asynchronous NoCs and routers – Channel Slicing – A wormhole router design • Spatial Division Multiplexing (SDM) – Motives – Switching networks – 2-stage Clos network – The distributed scheduler – Implementation results

  3. Asynchronous NoCs • GALS • Full async comm fabric • QDI pipelines • Low dynamic power • Tolerance to variation • Fast prototype

  4. Synchronised QDI Pipelines [figure: pipeline diagram with labels 8, 4, and 1-of-4; Nangate Cell Lib 65nm]

  5. Channel Slicing (1) • Remove the C-element tree • Sub-channels run independently
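The difference between a synchronised pipeline and a channel-sliced one can be sketched in a few lines of Python. This is only a behavioural illustration, not the router's circuit: the function names are hypothetical, the C-element tree is modelled as a plain "all sub-channels hold data" check, and the only fact used about 1-of-4 encoding is that each 2-bit group is carried one-hot on four wires.

```python
def encode_1of4(word: int, width: int) -> list[int]:
    """Encode `width` bits as 1-of-4 symbols, least significant pair first.

    Each 2-bit group becomes a 4-wire one-hot code: 00 -> 0b0001, ..., 11 -> 0b1000.
    """
    if width % 2 != 0:
        raise ValueError("width must be a multiple of 2")
    return [1 << ((word >> shift) & 0b11) for shift in range(0, width, 2)]


def decode_1of4(symbols: list[int]) -> int:
    """Inverse of encode_1of4; rejects symbols that are not one-hot."""
    word = 0
    for index, symbol in enumerate(symbols):
        if symbol not in (0b0001, 0b0010, 0b0100, 0b1000):
            raise ValueError(f"sub-channel {index} is not a valid 1-of-4 code")
        word |= (symbol.bit_length() - 1) << (2 * index)
    return word


def synchronised_ack(symbols: list[int]) -> bool:
    """Model of one shared acknowledge: true only when every sub-channel holds data."""
    return all(symbol != 0 for symbol in symbols)


def sliced_acks(symbols: list[int]) -> list[bool]:
    """Model of per-sub-channel acknowledges: each depends only on its own data."""
    return [symbol != 0 for symbol in symbols]


# A 32-bit word occupies 16 sub-channels of 4 wires each.
symbols = encode_1of4(0xDEADBEEF, 32)
assert len(symbols) == 16 and decode_1of4(symbols) == 0xDEADBEEF

# Data has arrived on sub-channels 0 and 2 only (0 means "no data yet").
partial = [0b0010, 0, 0b1000, 0]
print(synchronised_ack(partial))  # the shared acknowledge must wait
print(sliced_acks(partial))       # sub-channels 0 and 2 can proceed
```

In the sliced model the slowest sub-channel no longer holds back the others, which is the intuition behind removing the C-element tree.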

  6. Channel Slicing (2)

  7. Channel Slicing (3) [figure: sub-channels]

  8. The Wormhole Router • Faraday 130 nm • 5 32-bit ports • 3 routers: – Synchronised – Channel Sliced – Plus lookahead [figure: 5 input ports and 5 output ports; data buses d_i_0 … d_i_4 and d_o_0 … d_o_4 labelled 80; acknowledge buses ack_i_0 … ack_i_4 and ack_o_0 … ack_o_4 labelled 16; arbiter and ctl blocks; stages N, N+1, N+2]

  9. Area Results • Channel Slicing: 23% – extra controllers in input buffer – increased wire count in crossbar • Lookahead: 5.3% – extra AND gates and C2P elements on critical path

  10. Speed Results • Synchronised: 345 MHz • Channel Slicing: 450 MHz • ChSlice+LH: 590 MHz
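As a quick check on what these three figures imply, the sketch below converts each speed to a cycle period and a speed-up over the synchronised router. It assumes the reported MHz values are cycle frequencies (so period = 1 / frequency); the helper names are illustrative.

```python
# Speeds reported on the slide, in MHz.
speeds_mhz = {"Synchronised": 345, "Channel Slicing": 450, "ChSlice+LH": 590}


def cycle_period_ns(freq_mhz: float) -> float:
    """Cycle period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def speedup(freq_mhz: float, baseline_mhz: float) -> float:
    """Throughput ratio relative to the baseline router."""
    return freq_mhz / baseline_mhz


baseline = speeds_mhz["Synchronised"]
for name, freq in speeds_mhz.items():
    print(f"{name:16s} {cycle_period_ns(freq):.2f} ns  x{speedup(freq, baseline):.2f}")
```

Under that assumption, channel slicing alone is about 30% faster than the synchronised router, and adding lookahead brings the gain to about 71%.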

  11. Compare with Other Routers • Asynchronous cell library: constrains the adaptation to other projects (ANoC, ASPIN) • Bundled-data: less tolerant to variation (MANGO, QNoC, ASPIN) • Customized design: design complexity (ASPIN)

  12. Data Width Effect [plot: Cycle Period (ns) against Data Width of Ports (bit) for the Synchronised, Channel Slicing, and ChSlice + LH routers]

  13. Index • Channel Slicing – Asynchronous NoCs and routers – Channel Slicing – A wormhole router design • Spatial Division Multiplexing (SDM) – Motives – Switching networks – 2-stage Clos network – The distributed scheduler – Implementation results

  14. SDM: Motivation (1) • The problems that the wormhole router cannot handle: – QoS, delay and throughput guaranteed services – Fault-tolerance – Network efficiency

  15. Motivation (2) [figure: router architectures labelled Wormhole, Virtual Channel, and SDM. Wormhole: input ports 0 … P-1 and output ports 0 … P-1 of width W, a switch allocator, and a P×P crossbar. SDM: input buffers, a switch scheduler, M sub-links of width W/M per port, and an MP×MP switching network]

  16. Motivation (3) – Problems of VC • Pipelines are synchronised • Area overhead • QoS (complicated arbiters) • TDMA (time slot definition) • Fault-tolerance (partial faulty link) [figure: VC router with input buffers, a VC allocator, a switch allocator, M virtual channels per port of width W, and a P×P crossbar]

  17. Motivation (4) – Benefits of SDM • Delay and throughput guarantee • Fault-tolerance • Speed (Channel slicing) • Area • Link efficiency – interrupts [figure: SDM router with input buffers, a switch scheduler, M sub-links of width W/M per port, and an MP×MP switching network]

  18. Motivation (4) – Problems of SDM • Area overhead: $C_{CB} = P^2 \times W$, $C_{SDM} = M \times P^2 \times W$ • Scheduling Algorithm – Wormhole (P to 1) – SDM (MP to M)
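The area overhead on this slide follows from the two architectures: a wormhole router uses a P×P crossbar of width W, giving a cost of P² × W, while SDM uses an MP×MP switch of width W/M, giving (MP)² × (W/M) = M × P² × W. The sketch below checks that arithmetic; the function names are hypothetical and the cost unit is an abstract crosspoint-times-wire count, not a silicon area.

```python
def wormhole_crossbar_cost(ports: int, width: int) -> int:
    """Cost of a P x P crossbar whose links are `width` wires wide: P^2 * W."""
    return ports * ports * width


def sdm_crossbar_cost(ports: int, width: int, sublinks: int) -> int:
    """Cost of an MP x MP crossbar whose links are W/M wires wide."""
    if width % sublinks != 0:
        raise ValueError("width must be divisible by the number of sub-links")
    switch_ports = sublinks * ports
    return switch_ports * switch_ports * (width // sublinks)


# Example: 5 ports of 32 bits, each split into M = 4 sub-links.
P, W, M = 5, 32, 4
wormhole = wormhole_crossbar_cost(P, W)
sdm = sdm_crossbar_cost(P, W, M)
assert sdm == M * wormhole  # (MP)^2 * (W/M) == M * P^2 * W
print(wormhole, sdm)
```

The factor of M is what motivates looking for switching networks cheaper than a full crossbar.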

  19. Index • Channel Slicing – Asynchronous NoCs and routers – Channel Slicing – A wormhole router design • Spatial Division Multiplexing (SDM) – Motives – Switching networks – 2-stage Clos network – The distributed scheduler – Implementation results

  20. SDM: Switching Networks • Strict Non-Blocking (SNB) – An idle input port and an idle output port are always connectable • Rearrangeable Non-Blocking (RNB) – An idle input port and an idle output port are connectable, possibly after changing existing connections • Blocking – Some input and output ports cannot be connected in certain cases

  21. Crossbar • SNB • $C_{CB} = N^2 \times W$

  22. Clos Network • SNB/RNB • C(m, n, k), N = nk • SNB: m >= 2n-1 • RNB: m = n
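The conditions on this slide can be turned into a small classifier. The sketch assumes the usual reading of C(m, n, k): k input switches of size n×m, m central switches of size k×k, and k output switches of size m×n, so that N = nk. The crosspoint formula 2nmk + mk² is the standard three-stage count and is not taken from the slides. The slide lists m = n for RNB, which is the minimal case; the code accepts any m >= n.

```python
def clos_ports(n: int, k: int) -> int:
    """Total number of network inputs: N = n * k."""
    return n * k


def clos_crosspoints(m: int, n: int, k: int) -> int:
    """Crosspoints of a three-stage Clos network C(m, n, k).

    k input switches (n x m) + m central switches (k x k) + k output switches (m x n).
    """
    return 2 * k * n * m + m * k * k


def clos_blocking_class(m: int, n: int) -> str:
    """Classify a Clos network by its number of central switches."""
    if m >= 2 * n - 1:
        return "SNB"
    if m >= n:
        return "RNB"
    return "blocking"


# Example: N = 16 ports built from n = 4, k = 4.
for m in (3, 4, 7):
    print(m, clos_blocking_class(m, 4), clos_crosspoints(m, 4, 4))
```

For comparison, a full crossbar with the same 16 ports needs 16² = 256 crosspoints.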

  23. Benes Network • Multi-stage Clos • C(2,2,4) + 2C(2,2,2) • RNB

  24. Area of Switching Networks

  25. Problems of all Switching Networks • Crossbar – Area ~ $N^2$ – Easy to schedule • Clos – Area ~ $N^{1.5}$ – Difficult but possible to schedule by hardware – Optimal area is reached when • Benes – Area ~ $N \log N$ – Impossible to schedule by hardware (microprocessor) – Optimal area is reached when $N = 2^n$

  26. Index • Channel Slicing, the wormhole router • Spatial Division Multiplexing (SDM) – Motives – Switching networks – 2-stage Clos network – The distributed scheduler – Implementation results

  27. SDM: 2-stage Clos Network [figure: input modules SIM, WIM, NIM, EIM, LIM and output modules SOM, WOM, NOM, EOM, LOM, each labelled M, connected through central modules CM(0) … CM(r) … CM(M-1), each labelled 5]

  28. Area Comparison

  29. Benefits of the 2-stage Clos Network • Minimal area when M <= 16 • Only 2 stages, so latency is reduced • Latency is bounded • The scheduling algorithm is also simplified • The CMs could be further reduced • It is an RNB network; an SNB network requires 3 stages

  30. Index • Channel Slicing, the wormhole router • Spatial Division Multiplexing (SDM) – Motives – Switching networks – 2-stage Clos network – The distributed scheduler – Implementation results

  31. SDM: Scheduling Algorithms • Optimized algorithms – Always reach the optimal configuration in which every possible connection is configured – Time complexity $O(N^2)$ – Normally software based ([Leroy 2008]: microprocessor, 64 ports, 50 us) • Heuristic algorithms – Capable of configuring part of the possible connections with less time and area – Time complexity $O(N)$ ~ $O(\log N)$ – Normally hardware implementable, distributed, and scalable
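The trade-off between the two families can be illustrated with a toy matching problem. The sketch below is not the scheduler described in the talk: it compares a single-pass greedy grant (standing in for a heuristic algorithm) with an augmenting-path maximum matching (standing in for an optimized one). The request format, a mapping from each input to the outputs it could use, is an assumption made for the example.

```python
def greedy_match(requests: dict[int, list[int]]) -> dict[int, int]:
    """One pass over the inputs: grant the first output that is still free."""
    granted: dict[int, int] = {}
    taken: set[int] = set()
    for inp, outputs in requests.items():
        for out in outputs:
            if out not in taken:
                taken.add(out)
                granted[inp] = out
                break
    return granted


def maximum_match(requests: dict[int, list[int]]) -> dict[int, int]:
    """Augmenting-path matching: reroutes earlier grants to fit more requests."""
    owner: dict[int, int] = {}  # output -> input currently holding it

    def try_assign(inp: int, seen: set[int]) -> bool:
        for out in requests[inp]:
            if out in seen:
                continue
            seen.add(out)
            # Take a free output, or move its current owner elsewhere.
            if out not in owner or try_assign(owner[out], seen):
                owner[out] = inp
                return True
        return False

    for inp in requests:
        try_assign(inp, set())
    return {inp: out for out, inp in owner.items()}


# Input 0 can use outputs 0 or 1; input 1 can only use output 0.
requests = {0: [0, 1], 1: [0]}
print(greedy_match(requests))   # input 1 is left unconnected
print(maximum_match(requests))  # both inputs are connected
```

The greedy pass never revisits a grant, which is what makes this style of algorithm cheap and easy to distribute, at the cost of leaving some feasible connections unconfigured.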

  32. Synchronous Dispatch Algs.

  33. Synchronous Dispatch Algs.

  34. Synchronous Dispatch Algs.
