
FPGA Architecture Support for Heterogeneous, Relocatable Partial Bitstreams

24th International Conference on Field Programmable Logic and Applications, September 3rd, 2014. FPGA Architecture Support for Heterogeneous, Relocatable Partial Bitstreams. Christophe Huriaux, Olivier Sentieys, Russell Tessier.


  1. FPGA Architecture Support for Heterogeneous, Relocatable Partial Bitstreams
     24th International Conference on Field Programmable Logic and Applications, September 3rd, 2014
     § Christophe Huriaux (University of Rennes 1, France)
     § Olivier Sentieys (University of Rennes 1, France; Inria, France)
     § Russell Tessier (University of Massachusetts, USA)

  2. Outline
     § Introduction
     § Overview of the FlexTiles project
     § Architecture Overview
     § Advantages of 3-D Stacking
     § Principles
     § Task Migration in an FPGA
     § Task Migration in FlexTiles
     § Heterogeneous case
     § Approach
     § Coping with Heterogeneity
     § Design Constraints
     § Results
     § Implementation in VPR
     § Conclusion

  3. FP7 FlexTiles Project
     § FlexTiles: Self-adaptive heterogeneous manycore based on Flexible Tiles
     § Provide a heterogeneous many-core architecture offering:
       § Large flexibility
       § High performance, energy efficiency
       § Raised programming efficiency
       § Self-adaptation through virtualization

  4. Architecture Overview
     § 3D-stacked heterogeneous manycore
     § General Purpose Processors (GPP) for flexibility and programming homogeneity
     § Network on Chip
     § Dedicated hardware accelerators mapped at run-time on a reconfigurable layer
     § Reconfigurable layer with seamless task migration capabilities
     § Virtualization layer to provide an abstraction of the manycore and self-adaptive services
     § Tool-chain for parallelization and compilation

  5. Architecture Overview
     [Figure labels: 3D interface to the NoC; DSP blocks; Memory blocks]

  6. Task migration
     § Classical problem in dynamic reconfiguration [1]
     § Enhance resource usage
     [Figure label: 4x4]
     [1] K. Compton, Z. Li, J. Cooley, S. Knol, and S. Hauck, "Configuration relocation and defragmentation for run-time reconfigurable computing," IEEE Transactions on VLSI Systems, vol. 10, no. 3, pp. 209-220, 2002.

  7. 3D Stacking
     § 3D-stacked reconfigurable accelerators
     § Improved resource usage
     § Improved bandwidth/latency
     § Improved performance and energy efficiency
     [Figure labels: reconfigurable layer; multicore layer (array of cores)]

  8. Task Migration in an FPGA
     § Predefined reconfigurable regions
     § Bit-stream depends on task location
     [Figure labels: I/O blocks; HW Accelerator #1 with BS #1; HW Accelerator #1 with BS #2]

  9. Task Migration in FlexTiles
     § A task is synthesized, placed, and routed into a Virtual Bit-Stream (VBS)
     § The VBS is independent of the task's physical location in the fabric
     § No predefined configuration domains
     § Easier resource sharing and distribution, simplified task migration
     § The reconfiguration controller generates the final BS at run-time
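
One way to picture the run-time step on this slide: the VBS stores each configured resource at coordinates relative to the task's own origin, and the controller adds the chosen origin to obtain absolute fabric locations. The sketch below is a minimal illustration under that assumption. `VirtualBitstream`, `relocate`, and the word layout are hypothetical names chosen for the example, not the FlexTiles VBS format, which the slides do not describe.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Coord = Tuple[int, int]


@dataclass(frozen=True)
class VirtualBitstream:
    """Hypothetical VBS: configuration words keyed by task-relative (x, y)."""
    width: int
    height: int
    words: Dict[Coord, int]


def relocate(vbs: VirtualBitstream, origin: Coord, fabric: Coord) -> Dict[Coord, int]:
    """Translate a location-independent VBS to absolute fabric coordinates.

    `origin` is the lower-left tile chosen at run-time and `fabric` is the
    (width, height) of the reconfigurable layer. Both are assumptions of
    this sketch.
    """
    ox, oy = origin
    fw, fh = fabric
    if ox < 0 or oy < 0 or ox + vbs.width > fw or oy + vbs.height > fh:
        raise ValueError("task does not fit in the fabric at this origin")
    # The same words are reused; only their addresses change.
    return {(ox + x, oy + y): word for (x, y), word in vbs.words.items()}


task = VirtualBitstream(width=2, height=2, words={(0, 0): 0xA5, (1, 1): 0x3C})
first = relocate(task, origin=(0, 0), fabric=(8, 8))
moved = relocate(task, origin=(4, 3), fabric=(8, 8))
```

Here `first` and `moved` hold the same configuration words at different addresses, which is the property that lets one stored VBS serve any task position.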

  10. Task Migration in FlexTiles
     [Figure labels: 3D NI; RAM; DSP; HW Accelerator #1 with VBS #1; HW Accelerator #2 with VBS #2]

  11. Heterogeneity
     § Homogeneous case
       § No constraint on task placement
       § Regular routing architecture
     § Cope with heterogeneity
       § RAM, DSP, 3D I/Os
     § Migration is limited
       § vertically to the same column
       § to the next column containing the same complex blocks
     [Figure legend: Logic Element (LE); Configured LE; Task]
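
The migration limit on this slide can be illustrated with a column-typed fabric: a task can only move horizontally to an offset where the column types under its footprint are the same as at its original position. The sketch below assumes each fabric column holds a single block type and that a task is described by the sequence of column types it spans. `valid_column_offsets` and the type strings are hypothetical, chosen only for this example.

```python
from typing import List


def valid_column_offsets(fabric_cols: List[str], task_cols: List[str]) -> List[int]:
    """Return every column offset where the task footprint matches the fabric.

    A position is valid only if each column the task covers has the block
    type the task was placed and routed for (an assumption of this sketch).
    """
    n, m = len(fabric_cols), len(task_cols)
    return [x for x in range(n - m + 1) if fabric_cols[x:x + m] == task_cols]


# Hypothetical fabric: logic columns with a RAM or DSP column every third slot.
fabric = ["LE", "LE", "RAM", "LE", "LE", "DSP",
          "LE", "LE", "RAM", "LE", "LE", "DSP"]

ram_task_offsets = valid_column_offsets(fabric, ["LE", "RAM", "LE"])
logic_task_offsets = valid_column_offsets(fabric, ["LE", "LE"])
```

In this made-up fabric the task that uses a RAM column has only two legal horizontal positions, while the logic-only task has four, which is the loss of placement freedom the slide describes.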

  12. Proposed architecture
     § Routing of heterogeneous blocks is abstracted from logic routing
     § Long lines allow a trade-off between placement flexibility and routing complexity
     § Two-level routing is performed at run-time:
       § Logic routing (as in the homogeneous case)
       § Heterogeneous block routing through long lines

  13. Design Constraints
     § I/Os are made through 3D Network Interfaces, spread over the reconfigurable fabric
     [Figure labels: 3D NI; AI; MEM; DSP; Reconfiguration RAM; Reconfiguration CTRL]

  14. Implementation in VPR
     § Versatile Place and Route (VPR), open-source CAD tool for placement and routing
     § Part of the Verilog To Routing (VTR) framework
     § Source code modified to implement our techniques and deal with our constraints
       § Horizontal long lines spread over partitions
       § Separate homogeneous and heterogeneous routing
     VPR and VTR: https://code.google.com/p/vtr-verilog-to-routing/

  15. Implementation in VPR
     § Logic grid
     § Block placement
       § X: simple block
       § Y: 2 blocks tall
     § Mesh routing lines
     § Switch boxes
     § Interconnect
     [Figure: VPR Original Routing Model, with labels X, Y, Fc = 0.5, Fc = 1]

  16. Implementation in VPR
     § Logic grid
     § Block placement
     § Block typing
       § X: homogeneous
       § Y: heterogeneous
     § Mesh routing lines
     § Long lines
     § Switch boxes
     § Interconnect
       § Homogeneous
       § Heterogeneous
     [Figure: Enhanced Routing Model, with labels X, Y]

  17. Results
     § Architecture based on a simplified Stratix IV with:
       § Dual-port 144k memories
       § Fracturable 36x36 multipliers
     § Evaluation on two criteria
       § Delay of the critical path
       § Minimum channel width
         § Number of tracks in the homogeneous routing channels
         § Minimum channel width determined by VPR
         § Not directly related to silicon area
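
The minimum channel width metric above is typically found by routing the same placed design at several candidate widths and keeping the smallest one that succeeds. The sketch below shows a binary search of that kind. `min_channel_width` and the `routes` predicate are hypothetical stand-ins for a full routing attempt, and the search assumes routability is monotonic in the width, which real routers only approximate.

```python
from typing import Callable


def min_channel_width(routes: Callable[[int], bool], upper: int = 512) -> int:
    """Smallest W in [1, upper] for which routes(W) is true.

    Assumes that if a design routes at width W it also routes at any
    larger width (an assumption of this sketch).
    """
    if not routes(upper):
        raise ValueError("design is unroutable even at the upper bound")
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if routes(mid):
            hi = mid       # mid works, so the answer is mid or smaller
        else:
            lo = mid + 1   # mid fails, so the answer is larger
    return lo


# Toy predicate standing in for a router: succeeds from 37 tracks upward.
found = min_channel_width(lambda w: w >= 37)
```

Each call to `routes` stands for a complete routing run, so the logarithmic number of attempts matters in practice.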

  18. Results
     § Benchmark set: VTR framework circuits [1]

     Circuit             # Mem   # Mult    # LB
     bgm                     0       11   2,174
     boundtop                1        0   2,977
     ch_intrinsics           1        0     272
     diffeq1                 0        5      41
     diffeq2                 0        5      43
     LU8PEEng               45        8      30
     mkDelayWorker32B       41        0     497
     mkPktMerge             15        0      17
     mkSMAdapter4B           5        0     181
     or1200                  2        1     273
     raygentop               1        7     192
     stereovision1           0       38     990

     [1] J. Rose, J. Luu, C. W. Yu, et al., "The VTR project: architecture and CAD for FPGAs from Verilog to routing," in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, ACM, 2012, pp. 77-86.

  19. Results: Delay
     § Estimation of the worst-case delay
       § It is impossible to predict where connections to long lines will be made
       § Some channels crossing fixed-function blocks are longer

  20. Results: Delay
     [Chart: critical path delay in ns per circuit, classic vs. enhanced, with the proposed/classic ratio on a second axis]
     § Only 2% delay increase on average

  21. Results: Min. Channel Width
     [Chart: minimum channel width W in number of tracks per circuit, classic vs. enhanced, with the proposed/classic ratio on a second axis]
     § 1.8X channel width increase on average
     § Need for specific routing algorithms to deal with the heterogeneous interconnection network
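
The proposed/classic ratio plotted on this slide is a per-circuit quotient, and the headline figure is an average over circuits. The sketch below shows that arithmetic. The slides do not say which mean is used, so the arithmetic mean here is an assumption, and the two input rows are placeholder values invented for the example, not measurements from the presentation.

```python
from statistics import mean
from typing import Dict, Tuple


def width_ratios(results: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each circuit to enhanced_width / classic_width."""
    return {name: enhanced / classic
            for name, (classic, enhanced) in results.items()}


# Placeholder (classic, enhanced) channel widths; NOT data from the slides.
placeholder = {"circuit_a": (40, 60), "circuit_b": (50, 100)}

ratios = width_ratios(placeholder)
average = mean(list(ratios.values()))
```

The same function applies unchanged to the critical path delays of the previous slide, since that ratio is defined the same way.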

  22. Conclusion
     § FPGA embedded in a 3D architecture
     § More flexibility for task placement and/or relocation
     § Low impact on delay, but a cost in routing resources
     § Need to find a trade-off between flexibility and the area increase caused by the additional connections

  23. Thank you for your attention
     More info on FlexTiles: http://www.flextiles.eu

