Priority-based Wormhole Networks-on-Chip: challenges and opportunities
Leandro Soares Indrusiak, Real-Time Systems Group, Department of Computer Science, University of York, United Kingdom. RTN 2017.


  1. Wormhole Networks-on-Chip [figure: 3x3 grid of PEs and routers (R); labels: Router, Core, Link]

  2. Wormhole Networks-on-Chip [figure: 3x3 grid of PEs and routers (R); label: Link]

  3. Wormhole Networks-on-Chip [figure: 3x3 grid of PEs and routers, with a router expanded to show arbitration, routing and transmission control, and data in / data out ports]

  4. Wormhole Networks-on-Chip [figure]

  5. NoC parallelism and scalability: multiple connections simultaneously [figure: CPUs, RAM and I/O connected through the NoC]

  6. NoC performance: link contention leads to latency variability; task contention leads to latency variability [figure: CPUs, RAM and I/O connected through the NoC]

  7. Time predictability in embedded NoCs
     - Ability to guarantee an upper bound on the system's temporal behaviour
       - worst-case response time of each task
       - worst-case latency of each NoC packet
       - worst-case end-to-end latencies of communicating task chains
     - Ability to constrain the variability of the system's temporal behaviour
       - limited best/worst case difference
     [figure: two frequency-versus-time plots, the first marked with an upper bound]

  8-12. Wormhole Networks-on-Chip [animation frames: routers (R) between two PEs; legend: Packet Header, Packet Data]

  13. Wormhole Networks-on-Chip [animation frame: packet is blocked]

  14-15. Wormhole Networks-on-Chip [animation frames, same layout]

  16. Wormhole Networks-on-Chip [animation frame: new packet released]

  17-20. Wormhole Networks-on-Chip [animation frames, same layout]

  21. Performance guarantees in embedded NoCs
      - As the core counts increase, NoC link contention tends to be the dominant source of latency variability
      - Current solutions
        - Full traffic separation (i.e. no link contention)
          - deterministic routing, fully disjoint routes (e.g. Hermes)
          - multiple overlay networks (e.g. Tilera): contention over NIs and memory still possible
          - circuit switching (e.g. PNoC): unpredictable circuit setup time
          - very low utilisation
          - state of the art: mixed criticality, virtual traffic separation

  22. Performance guarantees in embedded NoCs
      - As the core counts increase, NoC link contention tends to be the dominant source of latency variability
      - Current solutions
        - Virtual traffic separation
          - time-division multiplexing (TDM): fixed traffic slotting (e.g. Aethereal, AElite)
          - round-robin (RR): rate controlling (e.g. Kalray, Nostrum, IDAMC)
          - fixed-priority (FP): priority-arbitrated virtual channels (e.g. QNoC)

  23. Priority preemptive virtual channels
      - Wormhole NoCs using virtual channels with priority preemptive arbitration can discriminate packets of different levels of urgency
      - Matches previous work on schedulability analysis in priority-preemptive wormhole networks

  24. Priority preemptive virtual channels [figure: 3x3 grid of PEs and routers, with a router expanded to show virtual channels, routing and transmission control, and the signals data_in, data_out, credit_in, credit_out; the arbiter selects the highest priority ID with remaining credit]
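The arbitration rule named on this slide (highest priority with remaining credit) can be sketched in a few lines. This is a minimal illustration, not the router's actual design: the `VirtualChannel` class, the `arbitrate` and `send_one_flit` helpers, and the convention that a lower number means a higher priority are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Tuple


@dataclass
class VirtualChannel:
    """Hypothetical model of one virtual channel at a router output port."""
    priority: int   # assumed convention: lower number = higher priority
    credits: int    # free buffer slots advertised by the downstream router
    flits: Deque[str] = field(default_factory=deque)


def arbitrate(vcs: List[VirtualChannel]) -> Optional[VirtualChannel]:
    """Return the highest-priority VC that has a flit waiting and remaining credit."""
    ready = [vc for vc in vcs if vc.flits and vc.credits > 0]
    return min(ready, key=lambda vc: vc.priority) if ready else None


def send_one_flit(vcs: List[VirtualChannel]) -> Optional[Tuple[int, str]]:
    """Forward one flit over the link for this cycle, consuming one credit."""
    vc = arbitrate(vcs)
    if vc is None:
        return None  # nothing can be sent in this cycle
    vc.credits -= 1
    return vc.priority, vc.flits.popleft()
```

Because the decision is taken again for every flit, a higher-priority packet that arrives mid-transfer takes over the link at the next flit boundary, and a high-priority channel that has run out of credit does not stall lower-priority traffic.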

  25-26. Priority preemptive virtual channels [animation frames: wormhole NoC with priority preemptive virtual channels; routers (R) between two PEs; legend: Packet Header, Packet Data]

  27. Priority preemptive virtual channels [animation frame: high priority packet released]

  28-29. Priority preemptive virtual channels [animation frames, same layout]

  30. Priority preemptive virtual channels [animation frame: first packet is preempted]

  31-37. Priority preemptive virtual channels [animation frames, same layout]

  38. Priority-preemptive wormhole NoCs: Pros vs Cons

  39. Priority-preemptive wormhole NoCs
      - Cons
        - not available as COTS

  40. Priority-preemptive wormhole NoCs
      - Cons
        - hardware overhead related to virtual channel buffering and arbitration
      [figure: Xilinx Artix FPGA]
      B. Sudev, L. S. Indrusiak: Low overhead predictability enhancement in non-preemptive network-on-chip routers using Priority Forwarded Packet Splitting. ReCoSoC 2014.

  41. Priority-preemptive wormhole NoCs
      - Cons
        - hardware overhead related to virtual channel buffering and arbitration
      [figure: Xilinx Artix FPGA; simple round-robin, no traffic shaping] [Sudev & Indrusiak, ReCoSoC 2014]

  42. Priority-preemptive wormhole NoCs
      - Cons
        - hardware overhead related to virtual channel buffering and arbitration
      [figure: Xilinx Artix FPGA; priority non-preemptive arbitration] [Sudev & Indrusiak, ReCoSoC 2014]
      OPEN PROBLEM ALERT

  43. Priority-preemptive wormhole NoCs
      - Cons
        - hardware overhead related to virtual channel buffering and arbitration
      [figure: Xilinx Artix FPGA; priority preemptive arbitration, 4 VCs with 2 position buffers each] [Sudev & Indrusiak, ReCoSoC 2014]

  44. Priority-preemptive wormhole NoCs
      - Pros
        - notion of priorities is very intuitive and natural
        - no waste of bandwidth through reservation mechanisms
        - amenable to tight analysis methods (more on this later)
        - virtual separation of traffic
        - accommodates change in traffic properties (periods, packet sizes, jitter)

  45. Priority-preemptive wormhole NoCs
      - Pros
        - simple protocols to handle mixed-criticality traffic
        - after a mode change, routers arbitrate links in criticality order, and in priority order within the same criticality
      [figure: grid of routers (R) and cores (C) propagating a mode change notification]
      L. S. Indrusiak, J. Harbin, A. Burns: Average and Worst-Case Latency Improvements in Mixed-Criticality Wormhole Networks-on-Chip. ECRTS 2015.
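The arbitration order described on this slide can be written as a sort key. The sketch below is only an illustration: the `Request` tuple, the numeric encodings (higher `criticality` value means more critical, lower `priority` number means higher priority) and the assumption that plain priority order applies before the mode change are not taken from the slides.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    """Hypothetical link request issued by a packet flow at a router."""
    flow: str
    criticality: int  # assumed: higher value = more critical
    priority: int     # assumed: lower number = higher priority


def arbitration_order(requests: List[Request], mode_changed: bool) -> List[Request]:
    """Order competing requests the way a router would serve them."""
    if mode_changed:
        # After the mode change: criticality first, priority within the same criticality.
        return sorted(requests, key=lambda r: (-r.criticality, r.priority))
    # Assumed normal mode: plain priority order.
    return sorted(requests, key=lambda r: r.priority)
```

With this encoding, a low-criticality flow that normally wins arbitration drops behind every high-criticality flow once the mode change notification has reached the router.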

  46. Priority-preemptive wormhole NoCs vs [figure]

  47. Outline
      - Wormhole Networks
      - Networks-on-Chip
      - Real-Time Analysis
      - Resource Management

  48. Performance evaluation
      - How to estimate performance figures for a particular application mapped to a Network-on-Chip?
        - full system prototyping
          - cores + NoC in FPGA, running OS + application
          - extremely costly setup time, can only explore few design alternatives
        - accurate system simulation
          - cycle-accurate model of cores + NoC, running OS + application
          - extremely long simulation time, can only explore few design alternatives
        - approximately-timed system simulation
          - approximately-timed model of cores + NoC, executing an abstract model of the OS + application
        - analytical system performance models
          - average or worst-case latency estimation for restricted application styles (periodic independent tasks, synchronous dataflow, etc.)

  49. Performance evaluation [slide text repeats the four evaluation approaches of slide 48]

  50. Real-Time Analysis
      - First approaches to analyse priority-preemptive wormhole networks came during the 90s
        - Mutka (1994)
        - Hary and Ozguner (1997)
      - Key idea is to consider the entire path of a packet as a single shared resource
        - worst-case latency bound of a packet flow can be found by analysing the higher priority packet flows that share at least one link of its route
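The key idea on this slide amounts to a set computation: for a given flow, collect the higher priority flows whose routes share at least one link with it. The sketch below assumes routes are given as sets of directed links and that a lower number means a higher priority; the function name and data layout are illustrative only.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # hypothetical encoding: (source router, destination router)


def direct_interference_set(
    flow: str,
    routes: Dict[str, Set[Link]],
    priority: Dict[str, int],
) -> Set[str]:
    """Higher priority flows that share at least one link with `flow`.

    Assumed convention: a lower number means a higher priority.
    """
    return {
        other
        for other in routes
        if other != flow
        and priority[other] < priority[flow]
        and routes[other] & routes[flow]  # non-empty intersection = shared link
    }
```

Treating the whole route as one shared resource means a single shared link is enough to put a higher priority flow into this set, regardless of where on the route the overlap occurs.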

  51. Real-Time Analysis [figure: flows t1 to t4 routed over a 3x3 grid of PEs and routers, with the corresponding interference graph; pri(t1)>pri(t2)>pri(t3)>pri(t4)]

  52. Real-Time Analysis
      - Kim et al (1998) recognised that direct interferences are not enough to produce correct upper bounds
      - Indirect interference must be considered, in order to take into account back-to-back hits caused by upstream indirect interference
      [figure: flows t1, t2 and t3 routed over a 3x3 grid of PEs and routers; pri(t1)>pri(t2)>pri(t3)]
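One common way to formalise indirect interference, assumed here rather than taken from the slide, is: a flow k interferes indirectly with flow i when k directly interferes with some direct interferer of i but shares no link with i itself. The sketch below uses hypothetical helper names, routes as sets of links, and lower numbers for higher priorities.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # hypothetical encoding: (source router, destination router)


def direct_set(flow: str, routes: Dict[str, Set[Link]], priority: Dict[str, int]) -> Set[str]:
    """Higher priority flows sharing at least one link with `flow` (lower number = higher priority)."""
    return {
        j for j in routes
        if j != flow and priority[j] < priority[flow] and routes[j] & routes[flow]
    }


def indirect_set(flow: str, routes: Dict[str, Set[Link]], priority: Dict[str, int]) -> Set[str]:
    """Flows that interfere with a direct interferer of `flow` but not with `flow` itself."""
    direct = direct_set(flow, routes, priority)
    via_interferers: Set[str] = set()
    for j in direct:
        via_interferers |= direct_set(j, routes, priority)
    return via_interferers - direct - {flow}
```

In a three-flow arrangement where t1 shares a link with t2, t2 shares a link with t3, and t1 never meets t3, this yields t1 as an indirect interferer of t3: t1 can delay t2, so that successive t2 packets reach t3 closer together than their period suggests.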

  53. Real-Time Analysis
      - With the introduction of Networks-on-Chip in the 2000s, the approach of Kim et al was revisited by Lu et al (ASP-DAC 2005)
        - aiming to provide upper bounds to sporadic packets over NoCs with priority preemptive virtual channels
        - flawed assumption of a critical instant where all packets start flowing simultaneously

  54. Real-Time Analysis
      - Shi and Burns (NOCS 2008) corrected the flaw in Lu et al and produced a response time formulation that uses a conservative approach to upstream indirect interference
      - interference jitter: $J_j^I = R_j - L_j$
      OPEN PROBLEM ALERT
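A response time formulation of this kind is usually solved by fixed-point iteration. The sketch below assumes the commonly quoted form $R_i = L_i + \sum_{j \in S_D(i)} \lceil (R_i + J_j + J_j^I) / T_j \rceil \, L_j$, where $L$ is the no-contention latency, $T$ the period, $J$ the release jitter, $J^I$ the interference jitter and $S_D(i)$ the direct interference set of flow i. The function name, the argument layout and the integer time units are assumptions, and later slides report that this analysis can be optimistic under downstream indirect interference, so it is shown for illustration only.

```python
from typing import Dict, Optional, Set


def response_time(
    i: str,
    L: Dict[str, int],         # no-contention latency of each flow
    T: Dict[str, int],         # period (minimum inter-release time)
    J: Dict[str, int],         # release jitter
    J_I: Dict[str, int],       # interference jitter, e.g. R_j - L_j (0 if not applicable)
    S_D: Dict[str, Set[str]],  # direct interference set of each flow
    deadline: int,
) -> Optional[int]:
    """Iterate the assumed recurrence to a fixed point; None if the deadline is exceeded."""
    r = L[i]
    while True:
        interference = sum(
            -(-(r + J[j] + J_I[j]) // T[j]) * L[j]  # integer ceiling division
            for j in S_D[i]
        )
        nxt = L[i] + interference
        if nxt == r:
            return r      # converged
        if nxt > deadline:
            return None   # deemed unschedulable by this test
        r = nxt
```

Flows are analysed in priority order, so that $R_j$ is already known when $J_j^I$ is needed for a lower priority flow. For example, with `L = {"t1": 10, "t2": 20, "t3": 30}`, `T = {"t1": 100, "t2": 55, "t3": 300}`, zero release jitter, and t1 interfering only with t2 and t2 only with t3, the bound for t3 grows once the interference jitter of t2 is taken into account.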

  55. Real-Time Analysis
      - Several lines of work were derived from Shi and Burns 2008
        - highly cited: 145 (Google Scholar)
        - many works on priority assignment and task mapping
        - a few on analysis improvement, aiming to make it tighter
          - Nikolic et al (arXiv 2016) considered that the interference should not be calculated based on the full path, but on the contention domain
          - Kashif et al (Trans Comp 2015) attempted to analyse packet paths in a link-by-link manner, but assumed infinite buffering (i.e. did not consider backpressure)
          - Kashif and Patel (RTAS 2016) attempted to consider buffering and backpressure effects
          - all of them upper-bounded by Shi and Burns 2008

  56. Real-Time Analysis
      - Xiong et al (GLSVLSI 2016) made two key contributions
        - new formulation of the upstream indirect interference problem, aiming to be tighter than Shi and Burns 2008
        - new formulation of the downstream indirect interference problem, aiming to capture a previously unseen issue, and showing that Shi and Burns 2008 is optimistic and unsafe (and so are all the analyses upper-bounded by it)

  57. Real-Time Analysis
      - Indrusiak et al (arXiv 2016) showed that
        - Xiong et al's formulation of the upstream indirect interference problem was flawed
        - Xiong et al's formulation of the downstream indirect interference problem was correct, but unnecessarily pessimistic (i.e. it treated all indirect interference as if it were direct interference)
        - a tighter upper bound that considers the downstream indirect interference problem is possible
      - Xiong et al published a corrected analysis in IEEE Trans Comp in 2017
      OPEN PROBLEM ALERT
