


  1. Modeling Post-Techmapping and Post-Clustering FPGA Circuit Depth
     Joydip Das (1), Steven J.E. Wilton (1), Philip Leong (2), Wayne Luk (3)
     (1) The University of British Columbia, (2) The Chinese University of Hong Kong, (3) Imperial College London
     Funded by Altera and NSERC

     FPGA Architecture Design
     Architectures are usually evaluated using experimental methods, with tools like VPR.
     Problems:
     - The optimization space is multi-dimensional, so exploring it takes too much time
     - Each architecture requires its own CAD tools, or "tuning" of a generic tool like VPR
     - Experiments give no insight into what makes a good architecture

  2. Can we supplement the experimental approach with analytical techniques?
     This talk: a model that makes it possible.

     This Talk
     1. Motivation: speeding up architecture design
     2. The model:
        - What makes a good model
        - What makes it hard
        - Overview
     3. Details on the depth/delay model
     4. Example of our model's application

  3. Accelerating FPGA Architecture Investigation
     [Figure labels: Early Architecture Evaluation; Insight to Guide Experimentation]

     Analytical Model
     The key is an analytical model that relates architecture parameters (lookup-table size, routing parameters, etc.) to the efficiency of the FPGA:
     - $\text{Area} = f_A(K, N, F_c, \ldots)$
     - $\text{Delay} = f_D(K, N, F_c, \ldots)$
     - $\text{Power} = f_P(K, N, F_c, \ldots)$
     [Figure labels: Delay of FPGA Implementation; Depth of Critical Path in 2-LUTs]

  4. Challenge: capturing the "essence" of programmable logic in a set of simple equations.

     What makes a good model?
     1. Analytical model: no curve-fitting or expensive experimental techniques
     2. Balancing complexity and accuracy: simpler equations provide significantly more insight into architectural tradeoffs
     3. Architectural relationships: should be as independent of the user circuit as possible

  5. Estimation vs. Modeling
     Estimation: What is the performance or density of a given user circuit on an FPGA?
     - Useful in CAD tools to predict long paths, congested regions, etc.
     Modeling: On average, how does an architecture parameter affect the expected speed or density of an FPGA?
     - Can we answer this independently of the user circuit?

     What makes it hard:
     - Many parameters that interact in complex ways
     How we make it feasible:
     - Break the model into stages, analogous to the CAD flow

  6. Breaking it up: Delay
     [Figure, built up over several slides: a flow from "Depth of c.p. in 2-LUTs" through "Depth of c.p. in k-LUTs", "# Intra-cluster connections on c.p." and "# Inter-cluster connections on c.p.", "Average Post-Placement Wirelength", "Post-Placement Wirelength along c.p." and "Post-Routed Wirelength along c.p." to "Delay". The first two builds add the parameter labels "K, p" and "N, I, p".]

  7. Breaking it up: Delay
     [Figure: the same flow; this build adds the parameter label "Fc, Fs, etc."]

  8. Breaking it up: Delay
     [Figure: the same flow; this build adds the parameter label "All arch. params"]
     Each part is simple, but together they relate delay-efficiency to architectural parameters.

  9. This Paper:
     [Figure: the delay-model flow diagram, with stage labels Tech Mapping, Clustering, Routing, and Physical Models]

     Technology Mapping Model
     [Figure: the delay-model flow diagram, with the parameter label "K, p"]

  10. Review: Technology Mapping
      Map logic gates to lookup-tables. Most algorithms give implementations of minimum depth.

      Modeling Technology Mapping
      Intuitively, a bigger LUT size means a smaller circuit depth.
      [Figure: the same circuit mapped to 4-LUTs (depth = 2) and to 2-LUTs (depth = 4)]
      So: larger LUT size -> fewer nets -> lower depth

  11. Mapping with K = 4: Two Extremes
      The number of 2-LUT levels that a single K-LUT can cover lies between two extremes:
      - Depth = $K - 1 = 3$ [maximum possible]
      - Depth = $\log_2(K) = 2$ [minimum possible]
      Simple approach: take the average.

      Technology Mapping Model
      $$ d_k = \frac{2\, d_2}{(K - 1 - \gamma) + \log_2(K - \gamma)} $$
      where
      - $d_k$: depth after techmapping
      - $d_2$: depth before techmapping
      - $K$: LUT size
      - $\gamma$: average unused inputs in each LUT
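
The "take the average" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `techmapped_depth` and its argument names are hypothetical, and the formula is the averaged form d_k = 2*d_2 / ((K - 1 - gamma) + log2(K - gamma)) as read from the slide.

```python
import math


def techmapped_depth(d2: float, K: int, gamma: float = 0.0) -> float:
    """Estimate critical-path depth in K-LUTs from the depth d2 in 2-LUTs.

    gamma is the average number of unused inputs per LUT, so (K - gamma)
    is the average number of inputs actually used.
    """
    used = K - gamma
    if used < 2:
        raise ValueError("need at least two used inputs per LUT")
    most_levels = used - 1           # chain of 2-LUTs: maximum levels covered
    fewest_levels = math.log2(used)  # balanced tree: minimum levels covered
    average_levels = (most_levels + fewest_levels) / 2.0
    return d2 / average_levels


if __name__ == "__main__":
    # Sweep the LUT size for a hypothetical circuit that is 20 levels deep in 2-LUTs.
    for K in range(2, 8):
        print(K, round(techmapped_depth(20, K), 2))
```

With K = 2 and no unused inputs both extremes equal 1, so the depth is unchanged, which is a useful sanity check on the formula.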

  12. Techmapping Model: Validation
      [Figure: validation plot for the techmapping model, annotated "Over-estimation"]

      Clustering Model
      [Figure: the delay-model flow diagram, with the parameter label "N, I, p"]

  13. Review: Clustering / Packing
      FPGA logic blocks usually contain several LUTs:
      - Altera: LABs
      - Xilinx: CLBs
      Goal of clustering algorithms: group LUTs into LAB-sized clusters
      - Connections between LUTs within a cluster are fast

      Clustering Model
      Clustering does not eliminate nets:
      - It just makes some nets local (intra-cluster) and some global (inter-cluster)
      Intuitively: the larger the cluster size, the more nets are made local.
      Goal: derive an equation for the proportion of nets along the critical path that are made local.

  14. Clustering Model
      Sketch of derivation:
      1. Some nets are made local "on purpose"
      2. Some nets are made local "by chance"
         - These are nets that are not specifically targeted by the clustering algorithm
      Work out the proportion of each and combine them.

      Connections on Critical Path: Primary Goal
      [Figure: LUT-1 to LUT-7 before and after clustering; cluster size c = 5]

  15. Connections on Critical Path: Primary Goal
      [Figure: LUT-1 to LUT-7 before and after clustering; cluster size c = 5]
      - Cluster size $= c$
      - Absorbed / local $= (c - 1)$

      Connections Absorbed: Not on Critical Path, by chance
      [Figure: LUT-1 to LUT-7 before and after clustering; cluster size c = 5]

  16. Connections Absorbed: Not on Critical Path, by chance
      [Figure: LUT-1 to LUT-7 before and after clustering; cluster size c = 5]
      Absorbed by chance:
      $$ \frac{c}{n_k}\left[\, c\,(K - \gamma) - c + 1 \,\right] $$

      Clustering Model
      Which leads to
      $$ \frac{d_c}{d_k} = \left[ \frac{(c - 1) + \frac{c}{n_k}\left[\, c\,(K - \gamma) - c + 1 \,\right]}{c\,(K - \gamma)} \right], \quad \text{where } n_c = \frac{n_k}{c} $$
      Details of the derivation can be found in our paper.
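
The way the "on purpose" and "by chance" terms combine can be sketched as below. This is an illustration only, based on a reading of the slide's formula: the function name `local_fraction` is hypothetical, and `n_k` is assumed here to be the number of K-LUTs in the circuit, which the slide does not state explicitly.

```python
def local_fraction(c: int, K: int, n_k: int, gamma: float = 0.0) -> float:
    """Proportion of connections made local (intra-cluster) by clustering.

    c     : cluster size (LUTs per cluster)
    K     : LUT size
    n_k   : assumed to be the number of K-LUTs in the circuit
    gamma : average number of unused inputs per LUT
    """
    if c < 1 or n_k < c:
        raise ValueError("need 1 <= c <= n_k")
    used_inputs = c * (K - gamma)   # LUT inputs in use within one cluster
    on_purpose = c - 1              # connections absorbed deliberately
    # Each remaining input is driven from inside the cluster with probability c / n_k.
    by_chance = (c / n_k) * (used_inputs - c + 1)
    return (on_purpose + by_chance) / used_inputs


if __name__ == "__main__":
    # Hypothetical 1000-LUT circuit with K = 4, sweeping the cluster size.
    for c in (1, 2, 4, 8, 16):
        print(c, round(local_fraction(c, K=4, n_k=1000), 3))
```

When the whole circuit fits in one cluster (c equal to n_k) the expression evaluates to 1, matching the intuition that every net is then local.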

  17. Clustering Model: Validation
      [Figure: validation plots for LUT size K = 4 and LUT size K = 6]

      Is our Model Actually Useful?

  18. Example of Model's Application
      We considered two flows.
      [Figure: the two flows]
      The "shape" of the results is more important than the actual values.

      Critical Path Delay from Analytical Model
      [Figure: analytical flow]

  19. Intra-Cluster & Inter-Cluster Delay
      - Intra-cluster delay ($t_{intra}$)
      - Inter-cluster delay ($t_{inter}$)

      Critical Path Delay from Analytical Model
      [Figure: analytical flow]
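
One plausible way to turn connection counts into a delay figure is to weight each kind of connection by its delay. The slides do not give this formula, so the sketch below is an assumption made for illustration: the function name `critical_path_delay`, the split of `d_k` connections by a local fraction, and the example delay values are all hypothetical.

```python
def critical_path_delay(d_k: float, local_frac: float,
                        t_intra: float, t_inter: float) -> float:
    """Assumed delay model: sum of per-connection delays along the critical path.

    d_k        : critical-path depth in K-LUTs
    local_frac : proportion of critical-path connections that are intra-cluster
    t_intra    : delay of one intra-cluster connection
    t_inter    : delay of one inter-cluster connection
    """
    if not 0.0 <= local_frac <= 1.0:
        raise ValueError("local_frac must lie in [0, 1]")
    n_intra = local_frac * d_k           # connections that stay inside a cluster
    n_inter = (1.0 - local_frac) * d_k   # connections that cross between clusters
    return n_intra * t_intra + n_inter * t_inter


if __name__ == "__main__":
    # Placeholder numbers only; they are not taken from the talk.
    print(critical_path_delay(d_k=8, local_frac=0.3, t_intra=0.5, t_inter=2.0))
```

In a sketch like this, `t_intra` and `t_inter` would have to come from elsewhere, such as measurements on routed designs.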

  20. Important caveat: we do not yet have a model for post-routing delay.
      [Figure: the delay-model flow diagram]
      For now, we use experimental results for this part.

      Overall Results: Delay
      [Figure: delay results]

  21. Overall Results: Delay
      Same conclusion in both cases: K = 4.

      Summary
      Key result: it is possible to describe an FPGA architecture using a set of simple equations.
      This talk:
      - Analytical model for techmapped and clustered depth
      - Example of the model's application for early-stage architecture evaluation
      Ongoing work:
      - Analytical model for post-routing delay
      - Investigation of "discrete effects"
      - Analytical model for the whole design flow
