
Finding a Needle in the Haystack of Hardened Interconnect Patterns

Finding a Needle in the Haystack of Hardened Interconnect Patterns. S. Nikolić, G. Zgheib*, and P. Ienne. FPL19, Barcelona, 09.09.2019. École Polytechnique Fédérale de Lausanne, *Intel Corporation.


  1. Finding a Needle in the Haystack of Hardened Interconnect Patterns S. Nikolić, G. Zgheib*, and P. Ienne FPL19, Barcelona, 09.09.2019 École Polytechnique Fédérale de Lausanne *Intel Corporation

  2. Why harden connections? (Figure: LUTs connected through a crossbar.)

  9. What is the price? (Figure: the circuit to be mapped, next to the cluster architecture with its crossbar and LUTs.)

  10. XC4000 [1], UTFPGA1 [2], Triptych [3]
      [1] H.-C. Hsieh, W. S. Carter, J. Ja, E. Cheung, S. Schreifels, C. Erickson, P. Freidin, L. Tinkey, and R. Kanazawa. Third-generation architecture boosts speed and density of field-programmable gate arrays, 1990.
      [2] P. Chow, S. O. Seo, D. Au, B. Fallah, C. Li, and J. Rose. A 1.2 µm CMOS FPGA using cascaded logic blocks and segmented routing, 1991.
      [3] C. Ebeling, G. Borriello, S. A. Hauck, D. Song, and E. A. Walkup. TRIPTYCH: A New FPGA Architecture, 1991.

  11. Challenges
      How to design the patterns?
      • Intuition?
      • Enumeration
      How to map on patterns? (CAD tool scalability)
      (Figure: five LUTs, labeled 12; 5 × 5-LUT, on the order of 10^8 patterns.)

  16. Enumeration

  17. Representation
      • represent each LUT by a node (circles)
      • only represent shared inputs (triangles)
      • each edge is a hardened connection
      (Figure: LUTs with shared inputs labeled I, redrawn as a graph.)
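
The representation on this slide can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name `Pattern`, its fields, and the example pattern are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    """A hardened interconnect pattern drawn as a directed graph (hypothetical type)."""
    luts: tuple            # LUT nodes (the circles)
    shared_inputs: tuple   # shared input nodes (the triangles)
    edges: frozenset       # hardened connections as (source, sink) pairs

    def fanin(self, lut):
        """Return the sorted sources that are hardwired into the given LUT."""
        return sorted(src for src, dst in self.edges if dst == lut)

# Made-up example: input I feeds LUTs a and b, and LUT a feeds LUT c directly.
p = Pattern(
    luts=("a", "b", "c"),
    shared_inputs=("I",),
    edges=frozenset({("I", "a"), ("I", "b"), ("a", "c")}),
)
print(p.fanin("c"))
```

Inputs that are not shared are left out of the graph, as the slide suggests, so only connections that constrain the mapping appear as edges.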

  22. Enumeration (no input sharing for now)
      // V - vertex set
      G = (V, {})
      expandable = (G)
      while expandable {
          G = pop(expandable)
          for e in V x V {
              if keep(G + e) {
                  push(G + e, expandable)
              }
          }
      }
      (Figure: graphs on vertices a, b, c, grown one edge at a time.)
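
A runnable sketch of the worklist loop above, under stated assumptions: the slides do not define `keep`, so an acyclicity test stands in for it here (the experiments restrict the search to acyclic patterns), and the `seen` set is an addition of this sketch so that the same labeled graph is not expanded twice. The function names are hypothetical.

```python
from itertools import permutations

def is_acyclic(vertices, edges):
    """Kahn-style check: repeatedly remove vertices with no incoming edge."""
    remaining = set(vertices)
    live = set(edges)
    while remaining:
        sources = {v for v in remaining if all(dst != v for _, dst in live)}
        if not sources:
            return False  # every remaining vertex has a predecessor: a cycle
        remaining -= sources
        live = {(s, d) for s, d in live if s in remaining}
    return True

def enumerate_patterns(vertices, keep):
    """Grow graphs one edge at a time; keep() decides which ones survive."""
    candidates = list(permutations(vertices, 2))  # V x V without self-loops
    start = frozenset()                           # G = (V, {})
    seen = {start}                                # assumption: dedupe labeled graphs
    expandable = [start]
    while expandable:
        g = expandable.pop()
        for e in candidates:
            if e in g:
                continue
            h = g | {e}                           # G + e
            if h not in seen and keep(vertices, h):
                seen.add(h)
                expandable.append(h)
    return seen

patterns = enumerate_patterns(("a", "b", "c"), is_acyclic)
print(len(patterns))
```

With the acyclicity predicate, every labeled DAG is reachable from the empty graph because removing an edge from a DAG leaves a DAG, so the loop visits all of them.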

  26. When to stop? When area or delay stops decreasing? When area or delay starts increasing?

  32. When to stop? When area or delay stops decreasing? When area or delay starts increasing?
      (Figure: the circuit to be mapped; a cluster with no hardened connections; clusters with hardened connections. Each cluster is drawn with LUTs and the label 7.)

  33. When to stop?

  34. Other issues: avoiding listing duplicates (Figure: two graphs on nodes A, B, C.)
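
The slides do not say how duplicates are avoided. One common approach, sketched here as an assumption rather than as the authors' method, is to reduce every graph to a canonical form that is identical for all relabelings of its nodes and to compare those forms. The brute-force search over all permutations below is only practical for small patterns (five nodes give 120 relabelings); `canonical_form` and the example graphs are hypothetical.

```python
from itertools import permutations

def canonical_form(vertices, edges):
    """Smallest sorted edge list over all relabelings of the vertices."""
    best = None
    for perm in permutations(vertices):
        relabel = dict(zip(vertices, perm))
        candidate = tuple(sorted((relabel[s], relabel[d]) for s, d in edges))
        if best is None or candidate < best:
            best = candidate
    return best

V = ("A", "B", "C")
g1 = {("A", "B"), ("A", "C")}   # A drives B and C
g2 = {("B", "A"), ("B", "C")}   # same shape, with B as the driver
g3 = {("A", "B"), ("B", "C")}   # a chain, which is a different shape

print(canonical_form(V, g1) == canonical_form(V, g2))
print(canonical_form(V, g1) == canonical_form(V, g3))
```

Storing canonical forms in a set instead of raw edge sets would make the enumeration list each shape once.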

  35. Other issues: maintaining subgraph relations
      (Figure: graphs G, H1, and H2, with a lattice of labels:
      x, y, z;
      xx, xy, xz, yy, yz, zz;
      xxx, xxy, xxz, xyy, xyz, xzz, yyy, yyz, yzz, zzz.)

  36. Challenges
      How to design the patterns?
      • Intuition?
      • Enumeration
      How to map on patterns? (CAD tool scalability)
      (Figure: five LUTs, labeled 12; 5 × 5-LUT, on the order of 10^8 patterns.)

  37. Experiments

  38. Setup
      • Search space: acyclic patterns of five 5-LUTs (on the order of 10^8 patterns)
      • Architecture: 4x the pattern with a shared crossbar (clusters of twenty 5-LUTs)

  41. Results
      Found 261 patterns with only 12 external inputs achieving around 80 % packing density.
      Some examples: (Figure: four example patterns on nodes a to e.)

  42. Results
      (Figure: utilization [%], axis from 65 to 95, for the benchmarks blob_merge, boundtop, ch_intrinsics, mkDelayWorker32B, diffeq1, diffeq2, mkPktMerge, mkSMAdapter4B, or1200, raygentop, stereovision0, sha, stereovision1, stereovision3.)

  43. Conclusions
      Numerical results are not satisfactory (18-29% critical path delay increase). But...
      We have an efficient way of searching for good patterns:
      • searched a space of on the order of 10^8 patterns in < 12 h
      • search techniques are completely independent of the mapping algorithms
      In the future, this should help us understand what makes a good pattern and profit from connection hardening to the fullest.

  44. Thank you for your attention. For questions, please see the poster.
