On the Sensitivity of FPGA Architectural Conclusions to Experimental Assumptions, Tools, and Techniques
Andy Yan, Rebecca Cheng, Steve Wilton
University of British Columbia, Vancouver, B.C.
stevew@ece.ubc.ca
FPGA Experiments:
Impressive improvement in FPGA technology:
- 1994: 25,000 gates was good
- 2001: 6,000,000 system gates
How did this happen?
- Improvements in process technology
- Improvements in CAD Tools
- Improvements in Architectures
The key behind this: Experimentation
The Danger of Experimentation:
No matter how careful you are:
- You will have to make some assumptions
- You will have to settle on an experimental technique
- You will have to settle on a CAD tool
But what if these assumptions, techniques, and tools impact the conclusions?
Can we believe any of these results?
This Talk:
Take a step back and look at some basic experiments:
- What is the best LUT size?
- What is the best switch block topology?
- What is the best cluster size?
- What is the best memory size?
The answers have all been published.
But how sensitive are they to the assumptions, tools, and techniques?
Question 1: What is the best LUT Size?
What is the best LUT size?
Intuitively, in terms of area:
- A smaller LUT takes up less chip area
- But more of them are required for a circuit
Intuitively, in terms of delay:
- A smaller LUT is faster
- But the critical path passes through more of them (and also through the routing!)
Published results: 4-6 inputs in each LUT is a good choice
Baseline Experiment:
[Figure: experimental flow. Inputs: Benchmark Circuits, Architectures. Stages: Optimize (e.g. SIS), Technology Map (e.g. Flowmap), Place and Route (e.g. VPR). Outputs: Area, Delay.]
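The baseline experiment can be read as a sweep: push the benchmarks through the same flow once per candidate architecture, then compare area and delay. A minimal Python sketch of that loop follows. The `run_flow` callable, the `toy_flow` stand-in, and every number in it are hypothetical placeholders; they are not the SIS, Flowmap, or VPR interfaces and not measured data.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical interface: given a LUT size, run the whole flow
# (optimize, technology map, place and route) and return (area, delay).
Flow = Callable[[int], Tuple[float, float]]


def sweep_lut_size(run_flow: Flow, lut_sizes: Iterable[int]) -> Dict[int, float]:
    """Return the area * delay product obtained for each LUT size."""
    products: Dict[int, float] = {}
    for k in lut_sizes:
        area, delay = run_flow(k)
        products[k] = area * delay
    return products


def best_lut_size(products: Dict[int, float]) -> int:
    """Pick the LUT size with the smallest area * delay product."""
    return min(products, key=lambda k: products[k])


# Invented placeholder numbers, NOT measurements from the talk.
_TOY_RESULTS = {2: (10.0, 8.0), 3: (9.0, 6.0), 4: (9.0, 5.0), 5: (10.0, 5.0)}


def toy_flow(k: int) -> Tuple[float, float]:
    """Stand-in for a real tool chain; looks up the invented numbers above."""
    return _TOY_RESULTS[k]


products = sweep_lut_size(toy_flow, [2, 3, 4, 5])
print(best_lut_size(products))
```

Swapping `run_flow` for a different mapper or router while keeping the rest of the loop fixed is the kind of one-variable change whose effect the talk measures.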
Technology-Mapping Tool
How sensitive is this to the tools?
Sensitivity to Technology Mapper
[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7); curves: Flowmap (Baseline), Cutmap, Chortle]
Conclusion depends on technology-mapper
Place and Route Tool
How Sensitive are these results?
Sensitivity to Place and Route Tool:
[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7); curves: Normal VPR (Baseline), Fast, UFP, Routability-Driven]
Optimization
How sensitive is this to the tools?
Optimization Scripts:
[Plot: Area (MTE's) vs. LUT Size (2 to 7); curves: SIS + Flowmap, (SIS + Flowmap)*2]
Optimization of circuits is important!
Circuits
How sensitive is this to the circuits?
Benchmark Circuits:
[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7); curves: MCNC, Synthesized]
MCNC Circuits behave differently than “real” circuits
Quantifying our Results
Want a number that indicates how strongly our conclusions are affected by an experimental variation.
Consider an experiment to find the best value of an architectural parameter:
- Run 1: Baseline
- Run 2: Same experiment with one experimental parameter varied
Margin = the difference in conclusion between Run 1 and Run 2
Margin: Case 1
[Figure: Area * Delay vs. sweep of an architectural parameter; curves RUN 1 and RUN 2 with a single Best Architecture marked; labels X% and Y%]
Margin = | X - Y |
Margin: Case 2
[Figure: Area * Delay vs. sweep of an architectural parameter; curves RUN 1 and RUN 2, each with its own Best Architecture marked; labels X% and Y%]
Margin = MAX( X , Y )
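The two margin cases can be written as one small function. This is only a sketch: it takes X and Y as already-measured percentages read off the figures (the slides do not spell out how they are computed), and the function name, the `same_best` flag, and the example numbers are assumptions made for illustration.

```python
def margin(x_pct: float, y_pct: float, same_best: bool) -> float:
    """Combine the two percentages X and Y into a single margin.

    same_best=True  -> Case 1: Margin = |X - Y|
    same_best=False -> Case 2: Margin = MAX(X, Y)
    """
    if same_best:
        return abs(x_pct - y_pct)
    return max(x_pct, y_pct)


# Illustrative inputs only; these are not values from the talk.
print(margin(12.0, 4.5, same_best=True))
print(margin(12.0, 4.5, same_best=False))
```

Treating `same_best` as "both runs pick the same best architecture" is an interpretation of the two figures, where Case 1 marks one best architecture and Case 2 marks two.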
Quantifying the Sensitivity:
Categorize Experimental Variations by their Margin:
- 0%-2%: Not Sensitive
- 2%-5%: Slightly Sensitive
- 5%-10%: Sensitive
- 10%-100%: Very Sensitive
- > 100%: Extremely Sensitive
We can have area margins, delay margins, and area * delay margins.
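The categories map directly onto a lookup function. In this sketch the function name is made up, and the handling of boundary values is an assumption: the slide's ranges share their endpoints, so a margin of exactly 2%, 5%, or 10% is placed in the higher category here, while exactly 100% stays "Very Sensitive" because the top category is stated as "> 100%".

```python
def classify_margin(margin_pct: float) -> str:
    """Map a margin (in percent) to the sensitivity category from the talk.

    Assumption: shared boundaries (2, 5, 10) go to the higher category.
    """
    if margin_pct < 0:
        raise ValueError("margin must be non-negative")
    if margin_pct > 100:
        return "Extremely Sensitive"
    if margin_pct >= 10:
        return "Very Sensitive"
    if margin_pct >= 5:
        return "Sensitive"
    if margin_pct >= 2:
        return "Slightly Sensitive"
    return "Not Sensitive"


# Margins reported in the talk for the LUT-size experiment.
print(classify_margin(76))   # using Chortle instead of Flowmap
print(classify_margin(301))  # routability-driven place and route
```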
Margin Results: Summary
I’ll leave a paper with tabulated results, but here are the variations that had a margin > 5%:
- Using Chortle instead of Flowmap: 76%
- Optimize and Tech Map circuits twice: 8.5%
- Use Routability-Driven Place and Route: 301%
- Synthesized circuits rather than MCNC circuits: 11%
- Multiply Minimum Channel width by 1.1: 5.4%
- Use Fc=0.3 rather than Fc=0.6: 5.7%
- Use Fc=0.4 rather than Fc=0.6: 11%
- Use Fc=0.7 rather than Fc=0.6: 5.5%
- Use Fc=0.8 rather than Fc=0.6: 11%
- Use Segments of Length 1 instead of 4: 8.5%
Question 2: What is the best Switch Block Topology?
What is the best Switch Block?
Published Switch Blocks
- Disjoint switch block (Xilinx)
- Universal switch block
- Wilton switch block
- Imran switch block (combination of Wilton and Disjoint block)
Our FPL paper showed the Imran block was good:
- Unlike Disjoint, it does not divide routing fabric into segments
- Unlike Wilton, it does not suffer from extra transistors in segmented architectures
Sensitivity to Place and Route Tool:
[Plot: Critical Path Delay (s) * Area (MTE's) for each switch block (Disjoint, Wilton, Universal, Imran); series: VPR (Baseline), UFP, Routability-Driven, Fast]
Margin Results: Summary
We did many experiments, but here are the variations that had a margin > 5%:
- Use Fast Option of VPR: 6.8%
- Use Routability-Driven Place and Route: 320%
- Synthesized circuits rather than MCNC circuits: 7.5%
- Implement on Double-Sized FPGA: 7.5%
- Use Segments of Length 1 instead of 4: 33%
- All switches buffered (instead of 50/50): 6.8%
Question 3: How Big should each cluster be?
What is the best Cluster (LAB) size?
Intuitively:
- A larger cluster (LAB) means more local connections
- But a larger cluster is slower and has area overhead
Previous Published Results:
- Between 4 and 10 LUTs per cluster seems to work well
Sensitivity to Place and Route Tool:
[Plot: Critical Path Delay (s) * Area (MTE's) vs. Cluster Size (1 to 10); curves: VPR (Baseline), Fast, UFP, Routability]
The Main Message is This:
Experimental results can be significantly influenced by the assumptions, tools, and techniques used in experimentation.
There are many architecture papers out there:
- Very few really address how sensitive their results are to the experimental assumptions (at UBC, we are guilty of this too)
- The results in this talk show that they should
Orthogonal Architecture Assumptions
How sensitive is this to the architecture?
Orthogonal Architecture Assumptions:
[Plot: Area (MTE's) vs. LUT Size (2 to 7); curves: Fc=0.3, Fc=0.6 (baseline), Fc=1.0]
Conclusion does depend on Fc
Sensitivity to Fc:
[Plot: Critical Path Delay (s) * Area (MTE) vs. Cluster Size (2 to 10); curves: Fc=0.3, Fc=0.5, Fc=0.7, Fc=0.9]
Conclusion does depend on Fc
Question 4: What is the best Memory Array Size?
What is the best Memory Array Size?
Focus on one previous study which investigated the best memory size when memories are used to implement logic.
Intuitively:
- A larger memory can implement more logic
- But a larger memory is slower and takes up more area
Previous Published Results:
- A 2Kbit memory seems to work well
Sensitivity to Packing Tool:
[Plot: Packing Ratio vs. Bits Per Array (256 to 4096); curves: SMAP (Baseline), SMAP-d, EMBPACK]
Margin (EMBPACK) = 53%