On the Sensitivity of FPGA Architectural Conclusions to Experimental - - PowerPoint PPT Presentation




slide-1
SLIDE 1

On the Sensitivity of FPGA Architectural Conclusions to Experimental Assumptions, Tools, and Techniques

Andy Yan, Rebecca Cheng, Steve Wilton University of British Columbia Vancouver, B.C. stevew@ece.ubc.ca

slide-2
SLIDE 2

FPGA Experiments:

Impressive improvement in FPGA Technology:

  • 1994: 25,000 Gates was good
  • 2001: 6,000,000 System Gates

How did this happen?

  • Improvements in process technology
  • Improvements in CAD Tools
  • Improvements in Architectures

The key behind this: Experimentation

slide-3
SLIDE 3

The Danger of Experimentation:

No matter how careful you are:

  • You will have to make some assumptions
  • You will have to settle on an experimental technique
  • You will have to settle on a CAD tool

But what if these assumptions, techniques, and tools impact the conclusions?

Can we believe any of these results?

slide-4
SLIDE 4

This Talk:

Take a step back and look at some basic experiments:

  • What is the best LUT size?
  • What is the best switch block topology?
  • What is the best cluster size?
  • What is the best memory size?

The answers have all been published…

But how sensitive are they to the assumptions, tools, and techniques?

slide-5
SLIDE 5

Question 1: What is the best LUT Size?

slide-6
SLIDE 6

What is the best LUT size?

Intuitively, in terms of area:

  • A smaller LUT takes up less chip area
  • But more of them are required for a circuit

Intuitively, in terms of delay:

  • A smaller LUT is faster
  • But the critical path passes through more of them (and also through the routing!)

Published results: 4-6 inputs in each LUT is a good choice

slide-7
SLIDE 7

Baseline Experiment:

Flow: Optimize (e.g. SIS) → Technology Map (e.g. Flowmap) → Place and Route (e.g. VPR)

Inputs: Benchmark Circuits, Architectures
Outputs: Area, Delay
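The baseline experiment can be sketched as a small sweep harness. This is only an illustration under stated assumptions: `Flow`, `sweep_lut_size`, `best_architecture`, and `placeholder_flow` are hypothetical names, the real tools (SIS, Flowmap, VPR) are not invoked, and averaging the area * delay product arithmetically over circuits is an assumption rather than something the slides specify.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical interface: one call runs optimize -> technology map -> place and
# route for a circuit on an architecture with k-input LUTs, and returns
# (area in MTEs, critical path delay in seconds).
Flow = Callable[[str, int], Tuple[float, float]]


def sweep_lut_size(flow: Flow, circuits: Iterable[str],
                   lut_sizes: Iterable[int]) -> Dict[int, float]:
    """Average area * delay over the benchmark circuits for each LUT size."""
    circuits = list(circuits)
    results = {}
    for k in lut_sizes:
        products = [area * delay for area, delay in (flow(c, k) for c in circuits)]
        results[k] = sum(products) / len(products)  # assumption: arithmetic mean
    return results


def best_architecture(results: Dict[int, float]) -> int:
    """The experiment's conclusion: the LUT size with the lowest area * delay."""
    return min(results, key=results.get)


def placeholder_flow(circuit: str, k: int) -> Tuple[float, float]:
    """Stand-in so the sketch runs. Arbitrary formulas, not measured data."""
    area = (1000 / k) * (2 ** k + 50)
    delay = (20 / k) * (1 + 0.3 * k)
    return area, delay


results = sweep_lut_size(placeholder_flow, ["circuit_a", "circuit_b"], range(2, 8))
print(best_architecture(results))
```

Swapping `placeholder_flow` for a wrapper around a different mapper or place-and-route tool, while keeping everything else fixed, is the kind of single experimental variation the rest of the talk examines.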

slide-8
SLIDE 8

Technology-Mapping Tool

Flow: Optimize (e.g. SIS) → Technology Map (e.g. Flowmap) → Place and Route (e.g. VPR)

Inputs: Benchmark Circuits, Architectures
Outputs: Area, Delay

How sensitive is this to the tools?

slide-9
SLIDE 9

Sensitivity to Technology Mapper

[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7). Curve: Flowmap (Baseline).]

slide-10
SLIDE 10

Sensitivity to Technology Mapper

[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7). Curves: Flowmap (Baseline), Cutmap.]

slide-11
SLIDE 11

Sensitivity to Technology Mapper

[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7). Curves: Chortle, Flowmap (Baseline), Cutmap.]

Conclusion depends on technology-mapper

slide-12
SLIDE 12

Place and Route Tool

Flow: Optimize (e.g. SIS) → Technology Map (e.g. Flowmap) → Place and Route (e.g. VPR)

Inputs: Benchmark Circuits, Architectures
Outputs: Area, Delay

How Sensitive are these results?

slide-13
SLIDE 13

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7). Curve: Normal VPR (Baseline).]

slide-14
SLIDE 14

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7). Curves: Normal VPR (Baseline), Fast.]

slide-15
SLIDE 15

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7). Curves: UFP, Normal VPR (Baseline), Fast.]

slide-16
SLIDE 16

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7). Curves: Routability-Driven, UFP, Normal VPR (Baseline), Fast.]

slide-17
SLIDE 17

Optimization

Flow: Optimize (e.g. SIS) → Technology Map (e.g. Flowmap) → Place and Route (e.g. VPR)

Inputs: Benchmark Circuits, Architectures
Outputs: Area, Delay

How sensitive is this to the tools?

slide-18
SLIDE 18

Optimization Scripts:

[Plot: Area (MTE's) vs. LUT Size (2 to 7). Curve: SIS + Flowmap.]

slide-19
SLIDE 19

Optimization Scripts:

[Plot: Area (MTE's) vs. LUT Size (2 to 7). Curves: SIS + Flowmap, (SIS + Flowmap)*2.]

Optimization of circuits is important!

slide-20
SLIDE 20

Circuits

Flow: Optimize (e.g. SIS) → Technology Map (e.g. Flowmap) → Place and Route (e.g. VPR)

Inputs: Benchmark Circuits, Architectures
Outputs: Area, Delay

How sensitive is this to the circuits?

slide-21
SLIDE 21

Benchmark Circuits:

[Plot: Critical Path Delay (s) * Area (MTE's) vs. LUT Size (2 to 7). Curves: MCNC, Synthesized.]

MCNC Circuits behave differently than “real” circuits

slide-22
SLIDE 22

Quantifying our Results

Want a number that indicates how strongly our conclusions are affected by an experimental variation.

Consider an experiment to find the best value of an architectural parameter:

  • Run 1: Baseline
  • Run 2: Same experiment with one experimental parameter varied

Margin = The difference in conclusion between Run 1 and Run 2

slide-23
SLIDE 23

Margin: Case 1

[Plot: Area * Delay vs. sweep of an architectural parameter. Curve: RUN 1, with one point labeled Best Architecture.]

slide-24
SLIDE 24

Margin: Case 1

[Plot: Area * Delay vs. sweep of an architectural parameter. Curves: RUN 1 and RUN 2, with one point labeled Best Architecture and two gaps annotated X% and Y%.]

Margin = | X – Y |

slide-25
SLIDE 25

Margin: Case 2

[Plot: Area * Delay vs. sweep of an architectural parameter. Curves: RUN 1 and RUN 2, each with its own point labeled Best Architecture and two gaps annotated X% and Y%.]

Margin = MAX( X , Y )
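The two margin cases can be written as one small function. This is a sketch, not the authors' implementation: `margin` and its parameter names are hypothetical, X and Y are taken as given percentages because the slide text does not spell out how they are measured, and reading Case 1 as "both runs pick the same best architecture" and Case 2 as "the runs pick different ones" is an interpretation of the plot labels.

```python
def margin(x_percent: float, y_percent: float, same_best: bool) -> float:
    """Margin between a baseline run and a varied run.

    x_percent, y_percent: the gaps labeled X% and Y% on the plots (given inputs).
    same_best: True if both runs select the same best architecture
               (assumed to correspond to Case 1), False otherwise (Case 2).
    """
    if same_best:
        return abs(x_percent - y_percent)  # Case 1: | X - Y |
    return max(x_percent, y_percent)       # Case 2: MAX( X , Y )


print(margin(12.0, 4.5, same_best=True))
print(margin(12.0, 4.5, same_best=False))
```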

slide-26
SLIDE 26

Quantifying the Sensitivity:

Categorize Experimental Variations by their Margin:

  • 0%-2%: Not Sensitive
  • 2%-5%: Slightly Sensitive
  • 5%-10%: Sensitive
  • 10%-100%: Very Sensitive
  • > 100%: Extremely Sensitive

We can have area margins, delay margins, and area * delay margins.
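The categories map directly onto a threshold lookup. In this sketch, `classify_margin` is a hypothetical name, and treating each range's upper bound as inclusive is an assumption, chosen because the top category is written as "> 100%".

```python
# (upper bound in percent, category); upper bounds are treated as inclusive,
# which is an assumption consistent with the "> 100%" top category.
CATEGORIES = [
    (2.0, "Not Sensitive"),
    (5.0, "Slightly Sensitive"),
    (10.0, "Sensitive"),
    (100.0, "Very Sensitive"),
]


def classify_margin(margin_percent: float) -> str:
    """Return the sensitivity category for a margin given in percent."""
    if margin_percent < 0:
        raise ValueError("margin must be non-negative")
    for upper, label in CATEGORIES:
        if margin_percent <= upper:
            return label
    return "Extremely Sensitive"


print(classify_margin(1.0))
print(classify_margin(250.0))
```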

slide-27
SLIDE 27

Margin Results: Summary

I’ll leave a paper with tabulated results, but here are the variations that had a margin > 5%:

  • Using Chortle instead of Flowmap: 76%
  • Optimize and Tech Map circuits twice: 8.5%
  • Use Routability-Driven Place and Route: 301%
  • Synthesized Circuits rather than MCNC ccts: 11%
  • Multiply Minimum Channel width by 1.1: 5.4%
  • Use Fc=0.3 rather than Fc=0.6: 5.7%
  • Use Fc=0.4 rather than Fc=0.6: 11%
  • Use Fc=0.7 rather than Fc=0.6: 5.5%
  • Use Fc=0.8 rather than Fc=0.6: 11%
  • Use Segments of Length 1 instead of 4: 8.5%

slide-28
SLIDE 28

Question 2: What is the best Switch Block Topology?

slide-29
SLIDE 29

What is the best Switch Block?

Published Switch Blocks

  • Disjoint switch block (Xilinx)
  • Universal switch block
  • Wilton switch block
  • Imran switch block (combination of Wilton and Disjoint block)

Our FPL paper showed the Imran block was good:

  • Unlike disjoint, it does not divide routing fabric into segments
  • Unlike Wilton, it does not suffer from extra transistors in segmented architectures

slide-30
SLIDE 30

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) for each switch block (Disjoint, Wilton, Universal, Imran). Series: VPR (Baseline).]

slide-31
SLIDE 31

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) for each switch block (Disjoint, Wilton, Universal, Imran). Series: VPR (Baseline), UFP.]

slide-32
SLIDE 32

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) for each switch block (Disjoint, Wilton, Universal, Imran). Series: VPR (Baseline), UFP, Routability-Driven.]

slide-33
SLIDE 33

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) for each switch block (Disjoint, Wilton, Universal, Imran). Series: VPR (Baseline), UFP, Routability-Driven, Fast.]

slide-34
SLIDE 34

Margin Results: Summary

We did many experiments, but here are the variations that had a margin > 5%:

  • Use Fast Option of VPR: 6.8%
  • Use Routability-Driven Place and Route: 320%
  • Synthesized Circuits rather than MCNC ccts: 7.5%
  • Implement on Double-Sized FPGA: 7.5%
  • Use Segments of Length 1 instead of 4: 33%
  • All switches buffered (instead of 50/50): 6.8%
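One way to read the two summary slides together is to look for variations that exceeded 5% in both the LUT-size and the switch-block experiments. The sketch below uses only the margins listed on those two slides; `shared_variations` and the dictionary names are hypothetical, and ordering by the larger of the two margins is an arbitrary choice.

```python
# Margins (percent) reported on the LUT-size summary slide.
lut_size_margins = {
    "Using Chortle instead of Flowmap": 76.0,
    "Optimize and Tech Map circuits twice": 8.5,
    "Use Routability-Driven Place and Route": 301.0,
    "Synthesized Circuits rather than MCNC ccts": 11.0,
    "Multiply Minimum Channel width by 1.1": 5.4,
    "Use Fc=0.3 rather than Fc=0.6": 5.7,
    "Use Fc=0.4 rather than Fc=0.6": 11.0,
    "Use Fc=0.7 rather than Fc=0.6": 5.5,
    "Use Fc=0.8 rather than Fc=0.6": 11.0,
    "Use Segments of Length 1 instead of 4": 8.5,
}

# Margins (percent) reported on the switch-block summary slide.
switch_block_margins = {
    "Use Fast Option of VPR": 6.8,
    "Use Routability-Driven Place and Route": 320.0,
    "Synthesized Circuits rather than MCNC ccts": 7.5,
    "Implement on Double-Sized FPGA": 7.5,
    "Use Segments of Length 1 instead of 4": 33.0,
    "All switches buffered (instead of 50/50)": 6.8,
}


def shared_variations(a, b):
    """Variations present in both summaries, largest margin first."""
    rows = [(name, a[name], b[name]) for name in set(a) & set(b)]
    return sorted(rows, key=lambda row: max(row[1], row[2]), reverse=True)


for name, lut, sb in shared_variations(lut_size_margins, switch_block_margins):
    print(f"{name}: LUT size {lut}%, switch block {sb}%")
```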

slide-35
SLIDE 35

Question 3: How Big should each cluster be?

slide-36
SLIDE 36

What is the best Cluster (LAB) size?

Intuitively:

  • A larger cluster (LAB) means more local connections
  • But a larger cluster is slower and has area overhead

Previous Published Results:

  • Between 4 and 10 LUTs per cluster seems to work well
slide-37
SLIDE 37

Sensitivity to Place and Route Tool:

[Plot: Critical Path Delay (s) * Area (MTE's) vs. Cluster Size (1 to 10). Curves: Fast, UFP, VPR (Baseline), Routability.]

slide-38
SLIDE 38

The Main Message is This:

Experimental results can be significantly influenced by the assumptions, tools, and techniques used in experimentation.

There are many architecture papers out there:

  • Very few really address how sensitive their results are to the experimental assumptions (at UBC, we are guilty of this too)
  • The results in this talk show that they should
slide-39
SLIDE 39

Orthogonal Architecture Assumptions

Flow: Optimize (e.g. SIS) → Technology Map (e.g. Flowmap) → Place and Route (e.g. VPR)

Inputs: Benchmark Circuits, Architectures
Outputs: Area, Delay

How sensitive is this to the architecture?

slide-40
SLIDE 40

Orthogonal Architecture Assumptions:

[Plot: Area (MTE's) vs. LUT Size (2 to 7). Curve: Fc=0.6 (baseline).]

slide-41
SLIDE 41

Orthogonal Architecture Assumptions:

[Plot: Area (MTE's) vs. LUT Size (2 to 7). Curves: Fc=1.0, Fc=0.6 (baseline), Fc=0.3.]

Conclusion does depend on Fc

slide-42
SLIDE 42

Sensitivity to Fc:

[Plot: Critical Path Delay (s) * Area (MTE) vs. Cluster Size (2 to 10). Curve: Fc=0.5.]

slide-43
SLIDE 43

Sensitivity to Fc:

[Plot: Critical Path Delay (s) * Area (MTE) vs. Cluster Size (2 to 10). Curves: Fc=0.3, Fc=0.5.]

slide-44
SLIDE 44

Sensitivity to Fc:

[Plot: Critical Path Delay (s) * Area (MTE) vs. Cluster Size (2 to 10). Curves: Fc=0.3, Fc=0.5, Fc=0.7.]

slide-45
SLIDE 45

Sensitivity to Fc:

[Plot: Critical Path Delay (s) * Area (MTE) vs. Cluster Size (2 to 10). Curves: Fc=0.3, Fc=0.5, Fc=0.7, Fc=0.9.]

Conclusion does depend on Fc

slide-46
SLIDE 46

Question 4: What is the best Memory Array Size?

slide-47
SLIDE 47

What is the best Memory Array Size?

Focus on one previous study which investigated the best memory size when memories are used to implement logic.

Intuitively:

  • A larger memory can implement more logic
  • But a larger memory is slower and takes up more area

Previous Published Results:

  • A 2Kbit memory seems to work well
slide-48
SLIDE 48

Sensitivity to Packing Tool:

[Plot: Packing Ratio vs. Bits Per Array (256 to 4096). Curves: SMAP (Baseline), SMAP-d, EMBPACK.]

Margin (EMBPACK) = 53%

slide-49
SLIDE 49

Sensitivity to Technology-Mapping Tool:

[Plot: Packing Ratio vs. Bits Per Array (256 to 4096). Curves: Flowmap (Baseline), Chortle.]

Sensitivity = 17%