Deadlock-Recovery Support for Fault-tolerant Routing Algorithms in 3D-NoC Architectures


  1. Deadlock-Recovery Support for Fault-tolerant Routing Algorithms in 3D-NoC Architectures
     MCSoC-13, National Institute of Informatics, Tokyo, Japan, September 26-28, 2013
     Akram Ben Ahmed, Achraf Ben Ahmed, Abderazek Ben Abdallah
     The University of Aizu, Graduate School of Computer Science and Engineering, Adaptive Systems Laboratory, Aizu-Wakamatsu, Japan
     Email: d8141104@u-aizu.ac.jp
     The University of Aizu Adaptive systems lab

  2. Outline
     • Background
     • Motivation and goal
     • Look-Ahead-Fault-Tolerant routing
     • RAB mechanism for deadlock-recovery
     • Evaluation
     • Conclusion and future work

  4. Background: 3D-NoC systems
     • 2D-NoC limitations:
       – Large diameter
       – High power
     • 3D-NoC merits:
       – High scalability
       – Low interconnect power
       – Heterogeneous integration
     [Figure: Typical 3D-NoC structure (3D-OASIS-NoC)*, with X, Y, Z axes and dimension labels 4 mm** and 200 µm**]
     * A. Ben Ahmed and A. Ben Abdallah, "Architecture and Design of High-throughput, Low-latency, and Fault-Tolerant Routing Algorithm for 3D-Network-on-Chip (3D-NoC)," The Journal of Supercomputing, Apr. 2013, DOI: 10.1007/s11227-013-0940-9.
     ** S. Lakhani, Y. Wang, A. Milenkovic, and V. Milutinovic, "2-D matrix multiplication on a 3-D systolic array," Microelectron. Journal, vol. 27, no. 1, pp. 11-22, Feb. 1996.

  5. Background: Fault Tolerance
     • 3D-NoC systems are complex and susceptible to a variety of faults, which can be caused by different factors:
       – Physical damage
       – Crosstalk
       – Thermal power, etc.
     • Fault types: permanent, transient, and intermittent.
     • Faults can cause information corruption or failure of the entire system.

  6. Background: Fault Tolerant Routing Algorithm
     [Figure: 3D mesh NoC with node addresses from 000 to 233 and X, Y, Z axes]
     • Fault-tolerant routing algorithms can be used to redirect the flits to non-faulty links.
     • Since fault-tolerant routing algorithms are adaptive, the deadlock problem is one of the main concerns.

  7. Background: Deadlock (Example)
     [Figure: deadlock example on a mesh with nodes labeled 200 to 222; legend: Permanent Fault link, Valid link]

  8. Background: Deadlock (Example)
     [Figure: same example, next frame; legend: Permanent Fault link, Valid link]

  9. Background: Deadlock (Example)
     [Figure: same example with four BLOCK markers and a DEADLOCK label; legend: Permanent Fault link, Valid link]
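
The deadlock in this example can be read as a circular wait: each blocked flit waits for a buffer held by another blocked flit. The Python sketch below only illustrates that idea and is not taken from the slides. The `waits_for` map, the function name, and the node labels are hypothetical, and it assumes each blocked flit waits on exactly one other flit.

```python
from typing import Dict, Hashable, List, Optional


def find_wait_cycle(
    waits_for: Dict[Hashable, Hashable], start: Hashable
) -> Optional[List[Hashable]]:
    """Follow 'waits for' edges from `start`.

    Returns the list of nodes forming a cycle if the chain of waits
    comes back to a node already visited, otherwise None.
    """
    path: List[Hashable] = []          # nodes in visiting order
    position: Dict[Hashable, int] = {}  # node -> index in `path`
    node = start
    while node in waits_for:
        if node in position:
            # We returned to a node on the path: circular wait.
            return path[position[node]:]
        position[node] = len(path)
        path.append(node)
        node = waits_for[node]
    # The chain ended at a node that is not waiting on anything.
    return None


# Hypothetical example: four blocked flits, each waiting on the next one.
blocked = {"A": "B", "B": "C", "C": "D", "D": "A"}
cycle = find_wait_cycle(blocked, "A")

# If one of them can make progress, the chain ends and there is no deadlock.
not_blocked = {"A": "B", "B": "C"}
no_cycle = find_wait_cycle(not_blocked, "A")
```

A hardware router cannot usually afford this kind of global view, which is why a local timeout is a common substitute for explicit cycle detection.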

  10. Background: Virtual Channels
     [Figure: nodes 201, 202, 211, 212 with flits tagged by destination (Dest 200, 201, 202, 203, 211, 212, 222); legend: Permanent Fault link, Valid link]

  11. Background: Virtual Channels
     [Figure: same nodes, next frame, flits tagged by destination; legend: Permanent Fault link, Valid link]

  12. Background: Virtual Channels
     [Figure: same nodes, next frame, flits tagged by destination; legend: Permanent Fault link, Valid link]

  13. Background: Virtual Channels
     [Figure: same nodes, next frame, remaining flits tagged Dest 201, 202, 211, 212; legend: Permanent Fault link, Valid link]

  14. Background: Virtual Channels
     [Figure: same nodes, next frame, remaining flits tagged Dest 201, 202, 211, 212; legend: Permanent Fault link, Valid link]

  15. Outline
     • Background
     • Motivation and goal
     • Look-Ahead-Fault-Tolerant routing
     • RAB mechanism for deadlock-recovery
     • Evaluation
     • Conclusion and future work

  16. Motivation and Goal
     • Previously, we presented a high-throughput fault-tolerant routing algorithm named Look-Ahead-Fault-Tolerant (LAFT).
     • LAFT is an adaptive routing algorithm that takes advantage of look-ahead routing to enhance system performance while guaranteeing fault tolerance.
     • However, LAFT is susceptible to deadlock.

  17. Motivation and Goal
     • Virtual Channels (VCs) are used in most systems to solve the deadlock problem, but they:
       – Are expensive to implement
       – Require additional clock cycles for arbitration
     • We present the Random-Access-Buffer (RAB) mechanism to solve the deadlock problem at very low cost.

  18. Outline
     • Background
     • Motivation and goal
     • Look-Ahead-Fault-Tolerant routing
     • RAB mechanism for deadlock-recovery
     • Evaluation
     • Conclusion and future work

  19. Look-Ahead-Fault-Tolerant: Example
     [Figure legend: Fault link, Valid link, Source node S, Destination node D, Current node C, Next node N, Current out-port, Next out-port]
     1- The current out-port is read from the flit and the next-node address is computed.

  20. Look-Ahead-Fault-Tolerant: Example
     2- The three possible directions are calculated: North, East, and Up.

  21. Look-Ahead-Fault-Tolerant: Example
     3- After the link status of the three directions is verified, two possible directions remain: North and Up (East is faulty).

  22. Look-Ahead-Fault-Tolerant: Example
     4- The diversity value of each remaining direction is calculated, and North has the highest one: North = 3 (North, East, and Up); Up = 2 (North and East).

  23. Look-Ahead-Fault-Tolerant: Example
     5- North is selected as the Next out-port and is embedded in the flit to be used in the next downstream node.
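
The selection walked through in steps 2 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The coordinate convention (East = +x, North = +y, Up = +z), the function names, the `(node, direction)` encoding of faulty links, and the example coordinates are all assumptions. Diversity is taken here to mean the number of non-faulty minimal directions still available at the node reached through a candidate direction, which is consistent with the counts on the slide (North = 3, Up = 2). The sketch leaves out what happens when every minimal direction is faulty.

```python
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int, int]   # (x, y, z), assumed convention
Link = Tuple[Node, str]       # (node, outgoing direction)

STEP = {
    "East": (1, 0, 0), "West": (-1, 0, 0),
    "North": (0, 1, 0), "South": (0, -1, 0),
    "Up": (0, 0, 1), "Down": (0, 0, -1),
}


def move(node: Node, direction: str) -> Node:
    """Return the neighbor reached from `node` through `direction`."""
    dx, dy, dz = STEP[direction]
    return (node[0] + dx, node[1] + dy, node[2] + dz)


def minimal_directions(node: Node, dest: Node) -> List[str]:
    """Directions that bring `node` closer to `dest` (step 2)."""
    dirs = []
    if dest[0] != node[0]:
        dirs.append("East" if dest[0] > node[0] else "West")
    if dest[1] != node[1]:
        dirs.append("North" if dest[1] > node[1] else "South")
    if dest[2] != node[2]:
        dirs.append("Up" if dest[2] > node[2] else "Down")
    return dirs


def valid_directions(node: Node, dest: Node, faulty: Set[Link]) -> List[str]:
    """Minimal directions whose outgoing link is not faulty (step 3)."""
    return [d for d in minimal_directions(node, dest) if (node, d) not in faulty]


def diversity(node: Node, direction: str, dest: Node, faulty: Set[Link]) -> int:
    """Number of valid directions at the neighbor behind `direction` (step 4)."""
    return len(valid_directions(move(node, direction), dest, faulty))


def select_next_out_port(next_node: Node, dest: Node, faulty: Set[Link]) -> Optional[str]:
    """Pick the valid direction with the highest diversity (step 5)."""
    candidates = valid_directions(next_node, dest, faulty)
    if not candidates:
        return None  # non-minimal fallback is outside this sketch
    return max(candidates, key=lambda d: diversity(next_node, d, dest, faulty))


# Hypothetical coordinates chosen to mirror the slide: East link of N is faulty.
N: Node = (0, 0, 0)
D: Node = (2, 2, 1)
faulty_links: Set[Link] = {(N, "East")}
choice = select_next_out_port(N, D, faulty_links)
```

With these example coordinates, `diversity` gives 3 for North and 2 for Up, so `choice` is North, as in step 5.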

  24. Outline
     • Background
     • Motivation and goal
     • Look-Ahead-Fault-Tolerant routing
     • RAB mechanism for deadlock-recovery
     • Evaluation
     • Conclusion and future work

  25. Random-Access-Buffer mechanism: Architecture
     [Figure: input buffer with slot next-ports South, North, North and packets P2, P1, P1; signals data_in, data_out, Next_port, Wr_adr, Rd_adr, RAB_Wr_adr, RAB_Rd_adr, Select_Wr, Select_Rd, head, tail, sw_grnt, deadlock_flag; blocks RAB_cntrl, FIFO manager, Timer, RAB manager]
     • FIFO manager: manages the input buffer when no deadlock is detected.
     • Timer: if the flit's request is not served after a period of time, a flag is issued.
     • RAB manager: when receiving the flag, it drops the request of the blocking flit and searches for another one.
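
The Timer behavior described on this slide can be sketched as a small cycle counter. The slide only says "a period of time", so the `timeout` parameter, the class name, and the per-cycle `tick` interface are assumptions made for illustration. The signal names `sw_grnt` and `deadlock_flag` follow the figure.

```python
class DeadlockTimer:
    """Counts cycles during which a pending request is not granted."""

    def __init__(self, timeout: int) -> None:
        self.timeout = timeout  # hypothetical threshold, in cycles
        self.count = 0

    def tick(self, requesting: bool, sw_grnt: bool) -> bool:
        """Advance one cycle and return the deadlock_flag for this cycle."""
        if not requesting or sw_grnt:
            # No pending request, or the switch granted it: start over.
            self.count = 0
            return False
        self.count += 1
        if self.count >= self.timeout:
            # The request waited too long: raise the flag and restart.
            self.count = 0
            return True
        return False


# A request that is never granted raises the flag on the third cycle.
timer = DeadlockTimer(timeout=3)
flags = [timer.tick(requesting=True, sw_grnt=False) for _ in range(3)]
```

A timeout cannot tell a true deadlock from heavy congestion, so choosing the threshold is a trade-off between recovery latency and false positives.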

  26. Random-Access-Buffer mechanism: Example
     [Figure: buffer slots with next-ports North, North, East, Up and packets P3, P2, P1, P1; signals data-out, Next-port, Wrt_adr, Rd_adr, sw-gr; blocks RAB cntrl, Timer, Status-register. Values shown: Next-port = North, sw-gr = 0, Status-register = 0 0 0 0]
     • The status register is used to keep the status of the blocking flits.

  27. Random-Access-Buffer mechanism: Example
     [Figure: same buffer. Values shown: Next-port = North, sw-gr = 0, Status-register = 0 0 0 0]
     • The Timer signals that the flit being processed did not get the grant and is blocked.

  28. Random-Access-Buffer mechanism: Example
     [Figure: same buffer. Values shown: Next-port = North, sw-gr = 0, Status-register values 0 0 0 0 and 0 0 1 1]
     • The request is dropped and the status register is updated for the entire packet.
     • RAB cntrl reads the Next-port of the next packet.

  29. Random-Access-Buffer mechanism: Example
     [Figure: same buffer. Values shown: Next-port = East, sw-gr = 1, Status-register values 0 0 1 1 and 0 0 0 0]
     • The next flit is checked and served.

  30. Random-Access-Buffer mechanism: Example
     [Figure: buffer slots with next-ports North, North, Up and packets P3, P1, P1. Values shown: Next-port = Up, sw-gr = 1, Status-register = 0 0 1 1]
     • When assigning the Wrt_adr, RAB cntrl checks the status register and assigns an unoccupied slot to avoid overwriting flits.
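
The example above can be approximated in software as a buffer whose read and write addresses are chosen by a controller instead of fixed head and tail pointers. The Python sketch below is one way to model that behavior under stated assumptions, not the authors' hardware design. The class and method names, the `Flit` fields, the arrival-order list, and the slot numbering are hypothetical. The slides do not say when a blocked packet is retried, so that part is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flit:
    packet: str     # packet identifier such as "P1" (hypothetical field)
    next_port: str  # out-port requested by this flit


class RandomAccessBuffer:
    def __init__(self, depth: int) -> None:
        self.slots: List[Optional[Flit]] = [None] * depth
        self.status: List[int] = [0] * depth  # 1 = slot holds a blocked flit
        self.order: List[int] = []            # slot addresses in arrival order

    def write(self, flit: Flit) -> int:
        """Store the flit in the first unoccupied slot and return its address."""
        for adr, slot in enumerate(self.slots):
            if slot is None:
                self.slots[adr] = flit
                self.order.append(adr)
                return adr
        raise BufferError("no unoccupied slot")

    def read_address(self) -> Optional[int]:
        """Address of the oldest flit that is not marked as blocked."""
        for adr in self.order:
            if self.status[adr] == 0:
                return adr
        return None

    def on_deadlock_flag(self, adr: int) -> None:
        """Mark every flit of the blocked flit's packet in the status register."""
        blocked = self.slots[adr]
        if blocked is None:
            return
        for i, slot in enumerate(self.slots):
            if slot is not None and slot.packet == blocked.packet:
                self.status[i] = 1

    def serve(self, adr: int) -> Flit:
        """Remove the granted flit from the buffer and free its slot."""
        flit = self.slots[adr]
        if flit is None:
            raise ValueError("empty slot")
        self.slots[adr] = None
        self.status[adr] = 0
        self.order.remove(adr)
        return flit


# Walk through a scenario similar to the slides (slot numbering is assumed).
rab = RandomAccessBuffer(depth=4)
for packet, port in [("P1", "North"), ("P1", "North"), ("P2", "East"), ("P3", "Up")]:
    rab.write(Flit(packet, port))

head = rab.read_address()               # oldest flit: P1, requesting North
rab.on_deadlock_flag(head)              # timer fired: mark all of P1 as blocked
served = rab.serve(rab.read_address())  # next candidate, P2 (East), is served
new_adr = rab.write(Flit("P4", "East"))  # reuses the freed slot, P1 is untouched
```

In this run the two P1 flits stay in place while P2 leaves through East, and the incoming flit lands in the slot P2 freed rather than on top of a blocked flit.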
