
Synchronized Progress in Interconnection Networks (SPIN): A new theory for deadlock freedom



  1. ISCA 2018, Session 8B: Interconnection Networks
     Synchronized Progress in Interconnection Networks (SPIN): A new theory for deadlock freedom
     Aniruddh Ramrakhyani, Georgia Tech (aniruddh@gatech.edu)
     Tushar Krishna, Georgia Tech (tushar@ece.gatech.edu)
     Paul V. Gratz, Texas A&M University (pgratz@tamu.edu)

  2. Network Routing
     [Figure: example network of nodes A through K in which packets form a deadlock]

  3. Routing Deadlocks
     • A routing deadlock is a cyclic buffer dependency chain that renders forward progress impossible.
     [Figure: deadlocked loop of routers A, B, C, D, E, F]

  4. Routing Deadlocks
     • A routing deadlock is a cyclic buffer dependency chain that renders forward progress impossible.
     • Deadlocks are a fundamental problem in both off-chip and on-chip interconnection networks.
       • They cause system breakdown and kill chips.
     • Deadlocks are hard to detect during functional verification. They:
       • manifest only after a long period of use;
       • depend on the traffic pattern, injection rate, and congestion;
       • show up due to system wear-out faults and power-gating of network elements, which are hard to simulate.
     • We need a solution for functional correctness!

  5. Solution I: Dally’s Theory
     • Defines a strict order in the acquisition of links and/or buffers, which ensures that a cyclic dependency is never created.
     [Figure: loop A to F with links numbered 1 to 6; moving from a higher-numbered to a lower-numbered link is not allowed]

  6. Solution I: Dally’s Theory
     • Defines a strict order in the acquisition of links and/or buffers, which ensures that a cyclic dependency is never created.
     • Implementations: Turn model [5], XY routing, Up-Down routing [20].
     • Limitations:
       • Routing restrictions: increased latency, throughput loss, energy overhead.
       • Requires a large number of VCs for fully adaptive routing.
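     A minimal sketch of the ordering rule, assuming a hypothetical numbering of the six links of the loop A to F (the `link_rank` table and the `route_is_legal` helper are illustrative, not taken from the paper): a route is legal only if it acquires links in strictly increasing rank, so the wrap-around from link 6 back to link 1 is forbidden and no cycle of waiting packets can form.

```python
def route_is_legal(route, link_rank):
    """Return True if `route` acquires links in strictly increasing rank order."""
    ranks = [link_rank[link] for link in route]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


# Hypothetical ranks for the unidirectional loop A -> B -> C -> D -> E -> F -> A.
link_rank = {
    ("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 3,
    ("D", "E"): 4, ("E", "F"): 5, ("F", "A"): 6,
}

print(route_is_legal([("B", "C"), ("C", "D")], link_rank))              # True
print(route_is_legal([("E", "F"), ("F", "A"), ("A", "B")], link_rank))  # False: 6 -> 1
```

     The rejected route is exactly the kind of routing restriction listed under Limitations: some paths are simply unavailable to packets.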

  7. Solution II: Duato’s Theory
     • Adds buffers to create a deadlock-free escape path that can be used to avoid or recover from deadlocks.
     • Implementation: turn restrictions in the escape VC.
     [Figure: loop A to F in which each router has a regular VC0 and an Escape-VC]

  8. Solution II: Duato’s Theory
     • Adds buffers to create a deadlock-free escape path that can be used to avoid or recover from deadlocks.
     • Implementation: turn restrictions in the escape VC.
     • Limitations:
       • Energy and area overhead of escape VCs.
       • Additional routing tables/logic for routing within the escape VC.
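     A hypothetical sketch of the escape-VC idea on a 2D mesh, assuming one adaptive VC (VC0), one escape VC, and XY (dimension-order) routing as the turn-restricted escape function. The port names and helper functions are illustrative; the sketch does not model a packet waiting for a busy escape VC or re-entering VC0 later.

```python
def minimal_ports(cur, dst):
    """All output ports that move a packet closer to `dst` in a 2D mesh."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append("E")
    if dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    if dy < cy:
        ports.append("S")
    return ports or ["LOCAL"]


def xy_port(cur, dst):
    """Dimension-order (XY) output port: the turn-restricted escape routing."""
    (cx, cy), (dx, dy) = cur, dst
    if dx != cx:
        return "E" if dx > cx else "W"
    if dy != cy:
        return "N" if dy > cy else "S"
    return "LOCAL"


def select_route(cur, dst, vc0_has_space):
    """Prefer any minimal port whose adaptive VC0 has space; otherwise fall
    back to the escape VC, which may only follow the XY route."""
    for port in minimal_ports(cur, dst):
        if vc0_has_space.get(port, False):
            return port, "VC0"
    return xy_port(cur, dst), "ESCAPE"


print(select_route((0, 0), (2, 1), {"E": False, "N": True}))  # ('N', 'VC0')
print(select_route((0, 0), (2, 1), {}))                       # ('E', 'ESCAPE')
```

     The extra escape VC at every port and the second routing function (`xy_port`) are the area, energy, and logic overheads listed above.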

  9. Other Solutions
     • Solution III: Flow Control
       • Restrict injection when the number of empty buffers falls below a threshold.
       • Implementation: Bubble Flow Control [9].
       • Limitations: implementation complexity, throughput loss.
     • Solution IV: Deflection Routing
       • Assign every flit to some output port, even if it gets misrouted.
       • Implementation: BLESS [10], CHIPPER [35].
       • Limitations: livelocks, non-minimal routing.
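     A minimal sketch of the flow-control rule, assuming packet-sized buffer slots and a threshold of one spare slot (the "bubble"). The function names and the threshold value are hypothetical; the exact conditions used by Bubble Flow Control are given in [9].

```python
def can_advance(free_slots_downstream: int) -> bool:
    """A packet already in the network may move if the next buffer has room."""
    return free_slots_downstream >= 1


def can_inject(free_slots_downstream: int, bubble: int = 1) -> bool:
    """A new packet may enter only if, after it is placed, at least `bubble`
    free slot(s) remain, so in-flight packets can always keep moving."""
    return free_slots_downstream >= 1 + bubble


print(can_advance(1), can_inject(1))  # True False
print(can_inject(2))                  # True
```

     The throughput loss listed above comes from `can_inject`: a new packet is refused even when a slot is free.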

  10. Comparison of Deadlock Freedom Theories
      [Table: Dally, Duato, Flow Control, Deflection Routing, and SPIN compared on: acyclic CDG not required, no routing restrictions, packet injection, topology independence, livelock freedom, and VC cost for a mesh (minimal / adaptive routing)]
      • Can we do better?

  11. Outline
      • Routing Deadlocks
      • State of the Art
        • Dally’s Theory
        • Duato’s Theory
        • Flow Control
        • Deflection Routing
      • SPIN: Synchronized Progress in Interconnection Networks
      • Evaluations
      • Conclusion

  12. SPIN: Key Idea
      • The simultaneous, synchronized movement of all deadlocked packets in the loop is called a spin.
      • What if we coordinate the movement of every packet to its next hop at a given time?
      [Figure: deadlocked loop A to F; one simultaneous synchronized movement completes a spin]

  13. SPIN: Key Idea
      • The simultaneous, synchronized movement of all deadlocked packets in the loop is called a spin.
      • Each spin moves every deadlocked packet one hop forward.
      • One spin may not resolve the deadlock; if it does not, the spin can be repeated.
      • The deadlock is guaranteed to be resolved in a finite number of spins [proof in paper, Sec. III].
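      The sketch below simulates spins on the loop A to F under simplified, hypothetical assumptions: each router's buffer holds exactly one deadlocked packet, every packet moves one hop per spin, and each packet needs a known number of further in-loop hops (`hops_left`, with values chosen here so that E and B exit after two spins, as in the example) before it can turn out of the loop. It is not the proof from Sec. III, only an illustration of why synchronized movement needs no free buffer.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    hops_left: int  # hypothetical: in-loop hops still needed before turning out


def spin(loop):
    """One spin: every packet moves to the next router's buffer in the same cycle.
    No free buffer is needed, because each buffer is vacated and refilled at once."""
    moved = [None] * len(loop)
    for i, pkt in enumerate(loop):
        pkt.hops_left -= 1
        moved[(i + 1) % len(loop)] = pkt
    return moved


def resolve(loop):
    """Spin until at least one packet can leave the loop.
    Returns the number of spins and the packets that exit (in loop order)."""
    spins = 0
    while all(p.hops_left > 0 for p in loop):
        loop = spin(loop)
        spins += 1
    exited = [p.name for p in loop if p.hops_left == 0]
    return spins, exited


loop = [Packet(n, h) for n, h in
        [("A", 3), ("B", 2), ("C", 4), ("D", 3), ("E", 2), ("F", 5)]]
print(resolve(loop))  # (2, ['E', 'B'])
```

      In this model every `hops_left` is finite and drops by one per spin, so the cycle is broken after min(hops_left) spins, when the first packet exits and frees its buffer.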

  14. SPIN: Key Idea
      [Figure: loop A to F after the first spin and after the second spin]

  15. SPIN: Key Idea
      [Figure: after the second spin, packets E and B exit the loop and the deadlock is resolved]

  16. Outline
      • Routing Deadlocks
      • State of the Art
      • SPIN: Synchronized Progress in Interconnection Networks
        • Key Idea
        • Implementation Example
        • Micro-architecture
        • FAvORS
      • Evaluations
      • Conclusion

  17. SPIN: Implementation Example
      • SPIN is a generic deadlock freedom theory that can have multiple implementations.
      • We choose a recovery approach, as deadlocks are rare (see Sec. II-F).
      • Our implementation:
        • Detect the deadlock.
        • Coordinate a time for the spin.
        • Execute the spin.

  18. Implementation Example: Detect Deadlocks
      • Use counters.
        • Placed at every node at design time.
        • Optimized by exploiting topology symmetry (see Static Bubble [6]).
      • If a packet does not leave within a (configurable) threshold time, this indicates a potential deadlock.
      • Counter expired? Send a probe to verify the deadlock.
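      A minimal per-router sketch of the detection counter, assuming it counts the cycles during which the packet at the head of a buffer has not left and fires once the configurable threshold is reached. The class and method names are hypothetical.

```python
class DeadlockDetector:
    """Hypothetical per-router counter for potential-deadlock detection."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # configurable, in cycles
        self.count = 0

    def tick(self, packet_left: bool) -> bool:
        """Call once per cycle; returns True when a probe should be sent."""
        if packet_left:
            self.count = 0          # progress was made: restart the count
            return False
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0          # fire once, then start counting again
            return True
        return False


d = DeadlockDetector(threshold=3)
print([d.tick(packet_left=False) for _ in range(3)])  # [False, False, True]
```

      Expiry only signals a potential deadlock: heavy congestion can also hold a packet past the threshold, which is why a probe is sent to verify it.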

  19. Implementation Example: Probe Msg.
      • Steps: (1) deadlock detection, (2) coordinating the spin, (3) executing the spin.
      [Figure: the deadlock detection counter expires at a node in loop A to F, which sends a probe; the probe returns, confirming the deadlock]

  20. Implementation Example: Probe Msg.
      • A probe is a special message that tracks the buffer dependency.
      • If the probe returns to its sender:
        • there is a cyclic buffer dependence, hence a deadlock;
        • next, the sender sends a move msg. to convey the spin time.
      • Upon receiving the move msg., a router sets its counter to count to the spin cycle.
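      A sketch of what the probe determines, assuming the buffer dependencies are available as a hypothetical `waits_on` map from each buffer to the buffer its packet is waiting for. In hardware the probe is forwarded hop by hop along these dependencies rather than computed centrally; this only models the outcome.

```python
def send_probe(sender, waits_on, max_hops=64):
    """Follow the buffer dependency chain starting at `sender`.

    waits_on[b] is the buffer that the packet in buffer b is waiting for
    (absent or None if that packet is not blocked).
    Returns the dependency loop if the probe comes back to `sender`, else None."""
    path = [sender]
    cur = waits_on.get(sender)
    while cur is not None and len(path) <= max_hops:
        if cur == sender:
            return path          # probe returned: cyclic dependency, deadlock
        if cur in path:
            return None          # reached a loop that excludes the sender: drop
        path.append(cur)
        cur = waits_on.get(cur)
    return None                  # chain ended at an unblocked packet: no deadlock


ring = {"A": "B", "B": "C", "C": "D", "D": "E", "E": "F", "F": "A"}
print(send_probe("A", ring))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

      The returned path is the set of routers that must later take part in the spin.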

  21. Implementation Example: Move Msg.
      • Steps: (1) deadlock detection, (2) coordinating the spin, (3) executing the spin.
      [Figure: the sender sends a move around loop A to F; each router sets its counter to count to the spin cycle; the move returns to the sender]
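      A hypothetical timing sketch for the move message, assuming it advances one router every `hop_latency` cycles and that the sender picks a spin cycle just after the move has traveled the whole loop and returned. Each router then loads a countdown equal to the spin cycle minus the cycle at which the move reached it, so all counters expire together. The function name and the `margin` parameter are assumptions; the paper's actual timing is in Sec. IV.

```python
def schedule_spin(send_cycle, loop_len, hop_latency=1, margin=1):
    """Return the chosen spin cycle and the countdown each router loads.

    Router k (k = 1..loop_len) is the k-th hop of the move; the last one is
    the sender itself, when the move returns."""
    spin_cycle = send_cycle + loop_len * hop_latency + margin
    countdowns = []
    for hop in range(1, loop_len + 1):
        arrival = send_cycle + hop * hop_latency  # cycle the move reaches this router
        countdowns.append(spin_cycle - arrival)
    return spin_cycle, countdowns


print(schedule_spin(send_cycle=100, loop_len=6))  # (107, [6, 5, 4, 3, 2, 1])
```

      Routers farther along the loop receive the move later and therefore load a smaller countdown.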

  22. Implementation Example: spin
      • Steps: (1) deadlock detection, (2) coordinating the spin, (3) executing the spin.
      [Figure: the counters in loop A to F expire together in the spin cycle]

  23. Implementation Example: spin
      • Steps: (1) deadlock detection, (2) coordinating the spin, (3) executing the spin.
      [Figure: after the spin, every packet in the loop has moved one hop forward]

  24. Multiple SPIN Optimization
      • Resolving a deadlock may require multiple spins.
        • After a spin, the router can resume normal operation.
        • When the counter expires again, the process is repeated.
      • Optimization: send a probe_move after the spin is complete.
        • probe_move checks whether the deadlock still exists and, if so, sets the time for the next spin.
      • Details in paper (Sec. IV-B).
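      A hypothetical control-flow sketch of the initiator once the first spin has been scheduled: after every spin it relies on a probe_move to decide whether another spin is needed. The callbacks `do_spin` and `probe_move_returns` are stand-ins introduced for illustration, not interfaces from the paper.

```python
def run_recovery(do_spin, probe_move_returns, max_spins=1000):
    """Execute spins until the dependency cycle is gone.

    do_spin():            executes one synchronized spin.
    probe_move_returns(): True if the probe_move sent after the spin came back,
                          i.e. the cycle still exists (it has then also fixed
                          the time of the next spin).
    Returns the number of spins executed."""
    for spins in range(1, max_spins + 1):
        do_spin()
        if not probe_move_returns():
            return spins  # cycle broken: resume normal operation
    raise RuntimeError("deadlock not resolved within max_spins")


# Toy stand-in: the cycle breaks after three spins.
state = {"spins_needed": 3}


def toy_spin():
    state["spins_needed"] -= 1


def toy_probe_move():
    return state["spins_needed"] > 0


print(run_recovery(toy_spin, toy_probe_move))  # 3
```

      Without the optimization, each additional spin would wait for the counter to expire again and then send a fresh probe and move; probe_move folds the check and the scheduling into one message.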

  25. Outline
      • Routing Deadlocks
      • State of the Art
      • SPIN: Synchronized Progress in Interconnection Networks
        • Key Idea
        • Implementation Example
        • Micro-architecture
        • FAvORS
      • Evaluations
      • Conclusion

  26. Implementation Micro-architecture
      • No additional links: special messages use the same links as regular flits.
        • Special messages have higher priority in link usage than regular flits.
        • Links are idle during deadlocks anyway.
      • Bufferless forwarding: special messages are never buffered (they are either forwarded or dropped).
      • Distributed design: any router can initiate the recovery.
      • 4% area overhead compared to a traditional mesh router in 15nm Nangate [42].
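      A hypothetical sketch of one cycle of output-link arbitration under the two rules above: special messages beat regular flits, and special messages are never buffered. Which special message wins when several compete is an assumption here (the first in the list); the slide does not specify it.

```python
def arbitrate(special_msgs, regular_flit):
    """Decide what uses one output link in this cycle.

    special_msgs: special messages (e.g. probe, move) requesting this link now.
    regular_flit: a regular flit requesting this link, or None.
    Returns (winner, dropped). A losing regular flit simply stays in its
    buffer; a losing special message has no buffer, so it is dropped."""
    if special_msgs:
        winner, *dropped = special_msgs
        return winner, dropped
    return regular_flit, []


print(arbitrate(["probe"], "flit"))        # ('probe', [])
print(arbitrate(["probe", "move"], None))  # ('probe', ['move'])
print(arbitrate([], "flit"))               # ('flit', [])
```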

  27. Outline
      • Routing Deadlocks
      • State of the Art
      • SPIN: Synchronized Progress in Interconnection Networks
        • Key Idea
        • Walkthrough Example
        • Micro-architecture
        • FAvORS
      • Evaluations
      • Conclusion

  28. FAvORS Routing Algorithm
      • SPIN is the first scheme that enables true one-VC fully adaptive deadlock-free routing for any topology.
      • FAvORS: Fully Adaptive One-vc Routing with SPIN.
      • The algorithm has two flavors:
        • Minimal adaptive
        • Non-minimal adaptive
      • Route selection metrics:
        • Credit turn-around time
        • Hop count
      • More details in paper (Sec. V).
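      A hypothetical sketch of output-port selection that uses the two metrics above, combined here as a weighted sum (`hop_weight` is an assumed parameter). The `Candidate` type and the cost function are illustrative only; the actual FAvORS selection function is described in Sec. V of the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    port: str
    hops_to_dest: int       # hop count to the destination via this port
    credit_turnaround: int  # recent credit turn-around time on this port (cycles)


def select_port(cands, allow_nonminimal, hop_weight=1):
    """Pick an output port for a single-VC network (hypothetical cost model).
    Minimal flavor: only ports on a shortest path are eligible.
    Non-minimal flavor: every port is eligible; congestion can outweigh distance."""
    shortest = min(c.hops_to_dest for c in cands)
    pool = [c for c in cands if allow_nonminimal or c.hops_to_dest == shortest]
    best = min(pool, key=lambda c: hop_weight * c.hops_to_dest + c.credit_turnaround)
    return best.port


cands = [Candidate("E", 3, 10), Candidate("N", 3, 4), Candidate("W", 5, 0)]
print(select_port(cands, allow_nonminimal=False))  # N
print(select_port(cands, allow_nonminimal=True))   # W
```

      With a single VC, unrestricted choices like this would normally risk deadlock; SPIN's recovery is what makes them safe.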

  29. Outline
      • Routing Deadlocks
      • State of the Art
      • SPIN: Synchronized Progress in Interconnection Networks
      • Evaluations
      • Conclusion

  30. Evaluations
      • Network configuration:
        Simulator:    gem5 simulator + Garnet 2.0 network model
        Topology      | 8x8 Mesh                            | 1024-node Off-chip Dragon-fly
        Link latency  | 1 cycle                             | Inter-group: 3 cycles; Intra-group: 1 cycle
        Traffic       | Synthetic + Multi-threaded (PARSEC) | Synthetic

  31. Evaluations: Baselines
      • 8x8 Mesh:
        Design             | Routing adaptivity | Minimal | Theory       | Deadlock freedom type
        West-first routing | Partial            | Yes     | Dally        | Avoidance
        Escape-VC          | Full               | Yes     | Duato        | Avoidance
        Static-Bubble [6]  | Full               | Yes     | Flow-Control | Recovery
      • 1024-node Off-chip Dragon-fly:
        Design             | Routing adaptivity | Minimal | Theory       | Deadlock freedom type
        UGAL [37]          | Full               | No      | Dally        | Avoidance

  32. Saturation Throughput
      • 1024-node Off-chip Dragon-fly
      [Figure: latency (cycles) vs. injection rate (flits/node/cycle) for Neighbor and Bit-complement traffic, comparing the FAvORS_NMin_1VC, Minimal_1VC, and UGAL_3VC configurations (SPIN- and Dally-based designs)]
      • Annotated results: SPIN achieves 62%, 50%, and 25% higher throughput, compared against UGAL_Dally (two comparisons) and Minimal Routing 1-VC.
