Modeling Post-Techmapping and Post-Clustering FPGA Circuit Depth

Joydip Das1, Steven J.E. Wilton1, Philip Leong2, Wayne Luk3

1The University of British Columbia, 2The Chinese University of Hong Kong, 3Imperial College London

Funded by Altera and NSERC

2

FPGA Architecture Design

Architectures are usually evaluated using experimental methods

  • Using tools like VPR

Problems:

  • Multi-dimensional optimization space – too much time
  • Require CAD tools for each architecture, or “tuning” of a generic tool like VPR
  • No insight into what makes a good architecture

3

Can we supplement the experimental approach with analytical techniques?

This talk: A model that makes it possible

4

This Talk

  • 1. Motivation: Speeding up architecture design
  • 2. The model:
      – What makes a good model
      – What makes it hard
      – Overview
  • 3. Details on Depth/Delay Model
  • 4. Example of our Model’s Application

5

Accelerating FPGA Architecture Investigation

  • Early Architecture Evaluation
  • Insight to Guide Experimentation

6

Analytical Model

The key is an analytical model that relates architecture parameters to efficiency of the FPGA:

Area = fA(K, N, Fc, ...)
Delay = fD(K, N, Fc, ...)
Power = fP(K, N, Fc, ...)

  • Inputs: Lookup-table size, Routing parameters, etc.
  • Labels on the slide: Delay of FPGA Implementation; Depth of Critical Path in 2-LUTs

7

Challenge: Capturing the “essence” of programmable logic in a set of simple equations

8

What makes a good model?

  • 1. Analytical Model

– No curve-fitting or expensive experimental techniques

  • 2. Balancing Complexity and Accuracy

– Simpler equations provide significantly more insight into architectural tradeoffs

  • 3. Architectural Relationships

– Should be as independent of user circuit as possible


9

Estimation vs. Modeling

Estimation: What is the performance or density of a given user circuit on an FPGA?

  • Useful in CAD tools to predict long paths, congested regions, etc.

Modeling: On average, how does an architecture parameter affect the expected speed or density of an FPGA?

  • Can we answer this independent of the user circuit?

10

What makes it hard:

  • Many parameters that interact in complex ways

How we make it feasible:

  • Break the model into stages, analogous to CAD flow

11

Breaking it up: Delay

Quantities in the flow (c.p. = critical path):

  • Depth of c.p. in 2-LUTs
  • Depth of c.p. in k-LUTs
  • # Inter-cluster connections on c.p.
  • # Intra-cluster connections on c.p.
  • Average Post-Placement Wirelength
  • Post-Placement Wirelength along c.p.
  • Post-Routed Wirelength along c.p.
  • Delay

Architecture parameters added step by step: K, p; N, I, p; Fc, Fs, etc.; All arch. params

Each part is simple, but together they relate delay-efficiency to architectural parameters

17

Flow stages: Tech Mapping, Clustering, Routing, Physical Models

This Paper: the Tech Mapping and Clustering stages

18

Technology Mapping Model

Relates depth of c.p. in 2-LUTs to depth of c.p. in k-LUTs. Parameters: K, p

19

Review: Technology Mapping

Map logic gates to lookup-tables.

Most algorithms give implementations of minimum depth.

20

Modeling Technology Mapping:

Intuitively, Bigger LUT Size Means Smaller Depth

  • Circuit mapped to 2-LUTs: Depth = 4
  • Same circuit mapped to 4-LUTs: Depth = 2

So: Larger LUT Size → Fewer Nets → Lower Depth

21

Mapping with K=4 : Two Extremes

  • Depth = (K – 1) = 3 [Maximum Possible]
  • Depth = log2(K) = 2 [Minimum Possible]

Simple Approach: Take the average

22

Technology Mapping Model

\[ d_k = \frac{2\, d_2}{(K - \gamma - 1) + \log_2(K - \gamma)} \]

  • d_k: Depth after Techmapping
  • d_2: Depth before Techmapping
  • K: LUT Size
  • γ: Average Unused Inputs in Each LUT
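The "take the average" idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the model divides the 2-LUT depth by the average of the two extremes, (K − γ − 1) and log2(K − γ). The function names `levels_per_lut` and `techmapped_depth` are hypothetical.

```python
import math


def levels_per_lut(k_eff: float) -> float:
    """Average number of 2-LUT levels covered by one LUT with k_eff used inputs.

    Assumption: average of the two extremes, (k_eff - 1) levels
    and log2(k_eff) levels.
    """
    return ((k_eff - 1) + math.log2(k_eff)) / 2


def techmapped_depth(d2: float, K: int, gamma: float = 0.0) -> float:
    """Estimated depth in K-LUTs, given depth d2 in 2-LUTs.

    gamma is the average number of unused inputs per LUT.
    """
    k_eff = K - gamma  # inputs actually used, on average
    if k_eff < 2:
        raise ValueError("need at least two used inputs per LUT")
    return d2 / levels_per_lut(k_eff)
```

For example, with K = 4 and γ = 0 each LUT covers (3 + 2) / 2 = 2.5 levels on average, so `techmapped_depth(10, 4)` gives 4.0. With K = 2 the depth is unchanged.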


23

Techmapping Model : Validation

Over-estimation

24

Clustering Model

Relates depth of c.p. in k-LUTs to # inter-cluster and # intra-cluster connections on c.p. Parameters: N, I, p

Review: Clustering / Packing

25

FPGA logic blocks usually contain several LUTs:

  • Altera: LABs
  • Xilinx: CLBs

Goal of Clustering Algorithms: Group LUTs into LAB-sized clusters

  • Connections between LUTs within a cluster are fast

26

Clustering Model

Clustering does not eliminate nets:

  • It just makes some nets local (intra-cluster) and some global (inter-cluster)

Intuitively: the larger the cluster size, the more nets are made local.

Goal: derive an equation for the proportion of nets along the critical path that are made local

27

Clustering Model

Sketch of derivation:

  • 1. Some nets are made local “on purpose”
  • 2. Some nets are made local “by chance”
      – These are not nets that are specifically targeted by the cluster algorithm

Work out proportion of each and combine them

28

Connections on Critical Path – Primary Goal

[Figure: LUT-1 to LUT-7; Cluster Size, c = 5]

\[ \text{Cluster size} = c, \qquad \text{Absorbed (local)} = (c - 1) \]

30

Connections Absorbed: Not on Critical Path – by chance

[Figure: LUT-1 to LUT-7; Cluster Size, c = 5]

\[ \text{Absorbed by chance:}\quad \frac{c}{n_k}\,\bigl[\, c\,(K - \gamma) - c + 1 \,\bigr] \]

Details of derivation can be found in our paper

32

Which leads to

Clustering Model:

\[ d_c = d_k \left[ \frac{(c - 1) + \frac{c}{n_k}\bigl[\, c\,(K - \gamma) - c + 1 \,\bigr]}{c\,(K - \gamma)} \right], \quad \text{where } c = \frac{n_k}{n_c} \]

Details of derivation can be found in our paper
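A rough Python sketch of how the two kinds of absorbed connections might be combined. It rests on assumptions, because the slide text is damaged: the "by chance" term is read as (c / n_k)·[c(K − γ) − c + 1], the bracketed factor in the model is read as the absorbed connections divided by the c(K − γ) used cluster inputs, and n_k is taken to be the number of K-LUTs in the circuit. All function names are hypothetical, and the paper remains the authority on the exact form.

```python
def absorbed_on_purpose(c: int) -> int:
    """Critical-path connections made local on purpose: c - 1 for cluster size c."""
    return c - 1


def absorbed_by_chance(c: int, K: int, gamma: float, n_k: int) -> float:
    """Connections made local by chance (assumed reading of the slide).

    The remaining used inputs of the cluster, c*(K - gamma) - (c - 1),
    are scaled by c / n_k.
    """
    remaining_inputs = c * (K - gamma) - (c - 1)
    return (c / n_k) * remaining_inputs


def local_fraction(c: int, K: int, gamma: float, n_k: int) -> float:
    """Bracketed factor of the model: absorbed connections over used cluster inputs."""
    total_inputs = c * (K - gamma)
    absorbed = absorbed_on_purpose(c) + absorbed_by_chance(c, K, gamma, n_k)
    return absorbed / total_inputs


def clustering_model_dc(d_k: float, c: int, K: int, gamma: float, n_k: int) -> float:
    """d_c as written on the slide: d_k times the bracketed factor."""
    return d_k * local_fraction(c, K, gamma, n_k)
```

As a check on the arithmetic, with c = 5, K = 4, γ = 0 and n_k = 100, the on-purpose term is 4, the by-chance term is 0.05 × 16 = 0.8, and the bracketed factor is 4.8 / 20 = 0.24.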


33

Clustering Model : Validation

[Plots: LUT Size, K=4; LUT Size, K=6]

34

Is our Model Actually Useful?


35

Example of Model’s Application

We considered two flows.

The “shape” of the results is more important than the actual values

36

Critical Path Delay from Analytical Model

Analytical Flow


37

Intra-Cluster & Inter-Cluster Delay:

  • Intra-Cluster Delay (t_intra)
  • Inter-Cluster Delay (t_inter)
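One way the two delays could feed a critical-path estimate is sketched below. The slides do not give this formula. The sketch assumes a simple weighted sum of the intra-cluster and inter-cluster connection counts along the critical path, and the function name `critical_path_delay` is hypothetical.

```python
def critical_path_delay(n_intra: float, n_inter: float,
                        t_intra: float, t_inter: float) -> float:
    """Assumed estimate: each connection on the critical path adds its own delay.

    n_intra, n_inter: intra-/inter-cluster connections on the critical path
    t_intra, t_inter: delay per intra-/inter-cluster connection (same time unit)
    """
    return n_intra * t_intra + n_inter * t_inter
```

For instance, 3 intra-cluster connections at 0.5 ns and 2 inter-cluster connections at 2.0 ns give `critical_path_delay(3, 2, 0.5, 2.0)` = 5.5 ns under this assumption.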

38

Critical Path Delay from Analytical Model

Analytical Flow


39

Important caveat: We do not yet have a model for post-routing delay. For now, we use experimental results for this part.

40

Overall Results: Delay


41

Overall Results: Delay

Same conclusion in both cases: K=4.

42

Summary

Key Result: It is possible to describe an FPGA architecture using a set of simple equations

This Talk:

  • Analytical model for techmapped & clustered depth
  • Example of model’s application for early-stage architecture evaluation

Ongoing Work:

  • Analytical model for post-routing delay
  • Investigation of "Discrete Effects"
  • Analytical model for the whole design flow