Multi-core Architectures: Interconnect Technology (Virendra Singh)


  1. Multi-core Architectures: Interconnect Technology • Virendra Singh, Associate Professor, Computer Architecture and Dependable Systems Lab, Department of Electrical Engineering, Indian Institute of Technology Bombay • http://www.ee.iitb.ac.in/~viren/ • E-mail: viren@ee.iitb.ac.in • CS-683: Advanced Computer Architecture, Lecture 29 (30 Oct 2013) • CADSL

  2. Topology Summary • First network design decision • Critical impact on network latency and throughput – Hop count provides first-order approximation of message latency – Bottleneck channels determine saturation throughput

  3. Routing Summary • Latency paramount concern – Minimal routing most common for NoC – Non-minimal can avoid congestion and deliver low latency • To date: NoC research favors dimension-order routing (DOR) for simplicity and deadlock freedom – On-chip networks often lightly loaded • Only covered unicast routing – Recent work on extending on-chip routing to support multicast
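
As an illustration of DOR, here is a minimal sketch of XY routing. It assumes a 2D mesh with nodes addressed as `(x, y)` coordinates; the slides do not fix a topology, and the function name `xy_route` is hypothetical. The packet travels fully along X, then along Y, so the path is minimal and its hop count equals the Manhattan distance.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst on a 2D mesh (assumed topology).

    Dimension-order routing: correct the X coordinate first, then Y.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:                 # travel along the X dimension first
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:                 # then travel along the Y dimension
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
hops = len(route) - 1                 # hop count: first-order latency estimate
print(route, hops)
```

Because every packet turns from X to Y at most once and never from Y back to X, no cyclic channel dependency can form, which is the usual argument for why DOR is deadlock free on a mesh.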

  4. Switching/Flow Control Overview • Topology: determines connectivity of network • Routing: determines paths through network • Flow Control: determines allocation of resources to messages as they traverse network – Buffers and links – Significant impact on throughput and latency of network

  5. Packets • Messages: composed of one or more packets – If message size is <= maximum packet size, only one packet is created • Packets: composed of one or more flits • Flit: flow control digit • Phit: physical digit – Subdivides flit into chunks equal to link width – In on-chip networks, flit size == phit size ● Due to very wide on-chip channels
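
The message, packet, flit, phit hierarchy can be sketched as simple arithmetic. This is a simplified model under stated assumptions: sizes are in bits, header overhead is ignored, and the function names and the example sizes are hypothetical, not taken from the lecture.

```python
import math


def packetize(message_bits, max_packet_bits, flit_bits):
    """Return the number of flits in each packet of a message.

    Assumes no per-packet header overhead (a simplification).
    """
    n_packets = max(1, math.ceil(message_bits / max_packet_bits))
    flits_per_packet = []
    remaining = message_bits
    for _ in range(n_packets):
        size = min(remaining, max_packet_bits)
        flits_per_packet.append(max(1, math.ceil(size / flit_bits)))
        remaining -= size
    return flits_per_packet


def phits_per_flit(flit_bits, link_width_bits):
    """Number of link-width chunks (phits) needed to carry one flit."""
    return math.ceil(flit_bits / link_width_bits)


print(packetize(512, 1024, 128))    # message fits in one packet
print(packetize(1536, 1024, 128))   # message needs two packets
print(phits_per_flit(128, 128))     # wide on-chip link: flit size == phit size
```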

  6. Switching • Different flow control techniques based on granularity • Circuit-switching: operates at the granularity of messages • Packet-based: allocation made to whole packets • Flit-based: allocation made on a flit-by-flit basis

  7. Virtual Cut Through • Packet-based: similar to Store and Forward (SAF) • Links and buffers allocated to entire packets • Flits can proceed to next hop before tail flit has been received by current router – But only if next router has enough buffer space for entire packet • Reduces the latency significantly compared to SAF • But still requires large buffers
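
The latency gap between SAF and virtual cut-through (VCT) can be seen with a common contention-free textbook model. This is a sketch, not a formula from the slides: it assumes each link moves one flit per cycle, a fixed per-hop router delay, and no blocking. Under SAF the whole packet is serialized at every hop; under VCT serialization is paid only once.

```python
def saf_latency(hops, router_delay, packet_flits):
    """Store and forward: every hop waits for the full packet to arrive."""
    return hops * (router_delay + packet_flits)


def vct_latency(hops, router_delay, packet_flits):
    """Virtual cut-through: the head moves on immediately, so the packet's
    serialization delay is paid once instead of once per hop."""
    return hops * router_delay + packet_flits


# Hypothetical example: 3 hops, 2-cycle routers, 5-flit packet.
print(saf_latency(3, 2, 5))
print(vct_latency(3, 2, 5))
```

The difference grows with both hop count and packet length, while the buffering requirement per router is the same in this model: one full packet.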

  8. Virtual Cut Through Example • Lower per-hop latency • Larger buffering required

  9. Flit Level Flow Control • Wormhole flow control • Flit can proceed to next router when there is buffer space available for that flit – Improved over SAF and VCT by allocating buffers on a flit basis • Pros – More efficient buffer utilization (good for on-chip) – Low latency • Cons – Poor link utilization: if head flit becomes …

  10. Wormhole Example [Figure labels] • Red holds this channel: channel remains idle until red proceeds • Channel idle but red packet blocked behind blue • Buffer full: blue cannot proceed • Blocked by other packets • 6 flit buffers/input port

  11. Virtual Channel Flow Control • Virtual channels (VCs) used to combat head-of-line (HOL) blocking in wormhole • Virtual channels: multiple flit queues per input port – Share same physical link (channel) • Link utilization improved – Flits on different VCs can pass blocked packet

  12. Virtual Channel Example [Figure labels] • Buffer full: blue cannot proceed • Blocked by other packets • 6 flit buffers/input port • 3 flit buffers/VC

  13. Deadlock • Using flow control to guarantee deadlock freedom gives more flexible routing • Escape Virtual Channels – If routing algorithm is not deadlock free – VCs can break resource cycle – Place restriction on VC allocation or require one VC to be DOR • Assign different message classes to different VCs to prevent protocol-level deadlock – Prevent req-ack message cycles

  14. Buffer Backpressure • Need mechanism to prevent buffer overflow – Avoid dropping packets – Upstream nodes need to know buffer availability at downstream routers • Significant impact on throughput achieved by flow control • Credits • On-off

  15. Credit-Based Flow Control • Upstream router stores credit counts for each downstream VC • Upstream router – When flit forwarded ● Decrement credit count – Count == 0, buffer full, stop sending • Downstream router – When flit forwarded and buffer freed ● Send credit to upstream router ● Upstream increments credit count
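
The credit protocol above can be sketched for a single downstream VC. This is a minimal model under assumptions not stated in the slides: credits return with zero delay, and the class and method names (`CreditLink`, `send`, `forward_downstream`) are hypothetical.

```python
from collections import deque


class CreditLink:
    """One upstream credit counter paired with one downstream VC buffer."""

    def __init__(self, buffer_depth):
        self.credits = buffer_depth   # upstream's count of free downstream slots
        self.downstream = deque()     # downstream VC buffer (FIFO)

    def send(self, flit):
        """Upstream side: forward a flit only if a credit is available."""
        if self.credits == 0:         # count == 0: buffer full, stop sending
            return False
        self.credits -= 1             # decrement on every forwarded flit
        self.downstream.append(flit)
        return True

    def forward_downstream(self):
        """Downstream side: forward the oldest flit and return a credit."""
        if not self.downstream:
            return None
        flit = self.downstream.popleft()
        self.credits += 1             # credit arrives upstream (zero delay assumed)
        return flit


link = CreditLink(buffer_depth=2)
print(link.send("a"), link.send("b"), link.send("c"))  # third send is refused
print(link.forward_downstream())                       # frees one slot
print(link.send("c"))                                  # now accepted
```

Because the upstream router never sends without a credit, the downstream buffer cannot overflow and no flit is ever dropped.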

  16. Credit Timeline [Figure: timeline between Node 1 and Node 2 over t1 to t5 (flit departs router, credit sent, credit processed, next flit sent, flit processed), marking the credit round-trip delay] • Round-trip credit delay: – Time between when buffer empties and when next flit can be processed from that buffer entry – If only single-entry buffer, would result in significant throughput degradation – Important to size buffers to tolerate credit turn-around
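
The effect of credit turn-around on throughput can be estimated with an idealized model: with `B` flit buffers and a credit round trip of `T` cycles, at most `B` flits can be in flight per round trip, so link utilization is bounded by `min(1, B / T)`. This is a sketch assuming one flit per cycle at full rate; the function names and example numbers are hypothetical.

```python
def max_link_utilization(buffer_flits, credit_round_trip_cycles):
    """Upper bound on the fraction of cycles the link can carry a flit."""
    return min(1.0, buffer_flits / credit_round_trip_cycles)


def min_buffers_for_full_rate(credit_round_trip_cycles):
    """Smallest buffer depth that hides the credit round trip in this model."""
    return credit_round_trip_cycles


# Hypothetical 4-cycle credit round trip.
print(max_link_utilization(1, 4))      # single-entry buffer: badly throttled
print(max_link_utilization(4, 4))      # enough buffers to cover the round trip
print(min_buffers_for_full_rate(4))
```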

  17. On-Off Flow Control • Credit: requires upstream signaling for every flit • On-off: decreases upstream signaling • Off signal – Sent when number of free buffers falls below threshold F_off • On signal – Sent when number of free buffers rises above threshold F_on
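
A minimal sketch of the downstream side of on-off flow control follows. The class name `OnOffReceiver`, the buffer capacity, and the threshold values are assumptions chosen for illustration; signal propagation delay is not modeled. A signal is emitted only when a threshold is crossed, rather than once per flit.

```python
class OnOffReceiver:
    """Downstream buffer that emits 'off'/'on' only on threshold crossings."""

    def __init__(self, capacity, f_off, f_on):
        self.capacity = capacity
        self.f_off = f_off            # send 'off' when free buffers < f_off
        self.f_on = f_on              # send 'on' when free buffers > f_on
        self.occupied = 0
        self.sender_on = True

    def free(self):
        return self.capacity - self.occupied

    def accept(self):
        """A flit arrives; returns the signal to send upstream, if any."""
        if self.occupied == self.capacity:
            raise OverflowError("flit arrived with no free buffer")
        self.occupied += 1
        return self._signal()

    def drain(self):
        """A flit leaves the buffer; returns the signal to send, if any."""
        if self.occupied > 0:
            self.occupied -= 1
        return self._signal()

    def _signal(self):
        if self.sender_on and self.free() < self.f_off:
            self.sender_on = False
            return "off"
        if not self.sender_on and self.free() > self.f_on:
            self.sender_on = True
            return "on"
        return None


rx = OnOffReceiver(capacity=8, f_off=3, f_on=5)
arrivals = [rx.accept() for _ in range(6)]
departures = [rx.drain() for _ in range(4)]
print(arrivals, departures)
```

In this run ten buffer events produce only two upstream signals, whereas a credit scheme would send one credit per forwarded flit. In a real design F_off must leave enough free slots to absorb flits already in flight when the off signal is sent.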

  18. On-Off Timeline [Figure: timeline of flits and on/off signals between Node 1 and Node 2 over t1 to t8] • F_off threshold reached: F_off set to prevent flits arriving before t4 from overflowing • F_on threshold reached: F_on set so that Node 2 does not run out of flits between t5 and t8 • Less signaling but more buffering – On-chip buffers more expensive than wires

  19. Flow Control Summary • On-chip networks require techniques with lower buffering requirements – Wormhole or Virtual Channel flow control • Dropping packets unacceptable in on-chip environment – Requires buffer backpressure mechanism • Complexity of flow control impacts router microarchitecture (next)

  20. Router Microarchitecture Overview • Consists of buffers, switches, functional units, and control logic to implement routing algorithm and flow control • Focus on microarchitecture of Virtual Channel router • Router is pipelined to reduce cycle time

  21. Virtual Channel Router [Figure: block diagram of a virtual channel router showing Routing Computation, Virtual Channel Allocator, Switch Allocator, and input ports each holding queues VC 0 through VC x]

  22. Baseline Router Pipeline • Stages: BW RC VA SA ST LT • Canonical 5-stage (+link) pipeline – BW: Buffer Write – RC: Routing Computation – VA: Virtual Channel Allocation – SA: Switch Allocation – ST: Switch Traversal – LT: Link Traversal

  23. Baseline Router Pipeline [Figure: pipeline timing over cycles 1 to 9] • Head: BW RC VA SA ST LT • Body 1: BW SA ST LT • Body 2: BW SA ST LT • Tail: BW SA ST LT • Routing computation performed once per packet • Virtual channel allocated once per packet • Body and tail flits inherit this info from head flit
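
The timing of a packet through the baseline pipeline can be sketched as follows. The assumptions here are mine, not the slide's: one flit arrives per cycle, only the head flit performs RC and VA, and each later flit performs SA one cycle after its predecessor. The function name `pipeline_schedule` is hypothetical. For a four-flit packet this gives a nine-cycle span, consistent with the cycles 1 to 9 shown on the slide.

```python
def pipeline_schedule(num_flits):
    """Return, per flit, the cycle in which each pipeline stage occurs."""
    schedule = []
    prev_sa = None
    for i in range(num_flits):
        bw = i + 1                                  # one flit buffered per cycle
        if i == 0:
            # Head flit: routing and VC allocation happen once per packet.
            stages = {"BW": bw, "RC": bw + 1, "VA": bw + 2, "SA": bw + 3}
        else:
            # Body/tail flits inherit route and VC, but wait behind the
            # previous flit for the switch.
            stages = {"BW": bw, "SA": max(bw + 1, prev_sa + 1)}
        stages["ST"] = stages["SA"] + 1
        stages["LT"] = stages["SA"] + 2
        prev_sa = stages["SA"]
        schedule.append(stages)
    return schedule


for name, stages in zip(["Head", "Body 1", "Body 2", "Tail"], pipeline_schedule(4)):
    print(name, stages)
```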

  24. Router Pipeline Optimizations • Baseline (no load) delay = $(5\ \text{cycles} + \text{link delay}) \times \text{hops} + t_{\text{serialization}}$ • Ideally, only pay link delay • Techniques to reduce pipeline stages – Lookahead routing: at current router, perform routing computation for next router (NRC) ● Overlap with BW ● Resulting pipeline: BW+NRC, VA, SA, ST, LT
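
Reading the baseline delay as (router pipeline cycles + link delay) × hops + serialization time, the benefit of removing a stage can be computed directly. This is a sketch: the function name, the default parameters, and the example numbers are hypothetical, and the four-stage figure for lookahead routing assumes RC is fully overlapped with BW.

```python
def no_load_delay(hops, link_delay=1, pipeline_stages=5, serialization=0):
    """Zero-load packet delay in cycles: per-hop cost times hops, plus serialization."""
    return (pipeline_stages + link_delay) * hops + serialization


# Hypothetical example: 3 hops, 1-cycle links, 3 cycles of serialization.
baseline = no_load_delay(3, link_delay=1, pipeline_stages=5, serialization=3)
lookahead = no_load_delay(3, link_delay=1, pipeline_stages=4, serialization=3)
ideal = no_load_delay(3, link_delay=1, pipeline_stages=0, serialization=3)
print(baseline, lookahead, ideal)
```

Each stage removed saves one cycle per hop, so the saving scales with path length; the `ideal` case corresponds to paying only link delay.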

  25. Router Pipeline Optimizations • Speculation – Assume that Virtual Channel Allocation stage will be successful ● Valid under low to moderate loads – Entire VA and SA in parallel ● Resulting pipeline: BW+NRC, VA+SA, ST, LT – If VA unsuccessful (no virtual channel returned) ● Must repeat VA/SA in next cycle

  26. Router Pipeline Optimizations • Bypassing: when no flits in input buffer – Speculatively enter ST – On port conflict, speculation aborted – Resulting pipeline: Setup (VA, NRC), ST, LT – In the first stage, a free VC is allocated, next routing is performed, and the crossbar is set up

  27. Buffer Organization • Physical channels: single buffer per input • Virtual channels: multiple fixed-length queues per physical channel

  28. Arbiters and Allocators • Allocator matches N requests to M resources • Arbiter matches N requests to 1 resource • Resources are VCs (for virtual channel routers) and crossbar switch ports • Virtual-channel allocator (VA) – Resolves contention for output virtual channels – Grants them to input virtual channels • Switch allocator (SA) – Grants crossbar switch ports to input virtual channels

  29. Round Robin Arbiter • Last request serviced given lowest priority • Generate the next priority vector from current grant vector • Exhibits fairness
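
A round-robin arbiter can be sketched in software as a rotating priority pointer. This is a behavioral model, not the hardware priority-vector circuit; the class name `RoundRobinArbiter` and the request pattern are hypothetical. After each grant, the requester just served becomes lowest priority.

```python
class RoundRobinArbiter:
    """Grant one of n requesters; the last one served gets lowest priority."""

    def __init__(self, n):
        self.n = n
        self.next_priority = 0        # index that currently has highest priority

    def arbitrate(self, requests):
        """requests: list of n booleans. Returns the granted index or None."""
        for offset in range(self.n):
            i = (self.next_priority + offset) % self.n
            if requests[i]:
                # Rotate priority so the winner is checked last next time.
                self.next_priority = (i + 1) % self.n
                return i
        return None                   # no requests this cycle


arb = RoundRobinArbiter(4)
grants = [arb.arbitrate([True, True, False, True]) for _ in range(4)]
print(grants)
```

With requesters 0, 1, and 3 asserting continuously, grants cycle among them in order, so no requester can be starved.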
