Reconfigurable Computing – Partitioning
Chapter 5
Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design


SLIDE 1

Reconfigurable Computing – Partitioning (Chapter 5)
Prof. Dr.-Ing. Jürgen Teich, Lehrstuhl für Hardware-Software-Co-Design

SLIDE 2

Partitioning – Motivation

  • A design is often too big to be implemented on a single FPGA.
  • Possible solutions are:
    Spatial partitioning: The design is partitioned into several blocks, each of which is implemented on its own FPGA. All the FPGAs are used simultaneously.
    Temporal partitioning: The design is partitioned into blocks, each of which is executed on one FPGA at a given time.
  • The first part of the chapter gives a short overview of spatial partitioning; the second part treats temporal partitioning algorithms in detail.

SLIDE 3

Partitioning – Definitions

  • Dataflow graph: A dataflow (sequencing or task) graph G = (V, E) is a directed acyclic graph in which each node vi ∈ V represents a task with execution time di, and each edge e = (u, v) represents a data dependency between the nodes u and v.
  • Scheduling and ordering relation: Given a DFG G = (V, E) with a precedence relation among the nodes:
    A schedule is a function s: V → N that defines, for each node, the time at which the node is executed on the reconfigurable device.
    A schedule is feasible iff ∀(u, v) ∈ E: s(u) ≤ s(v).
    Any schedule s induces an ordering relation ≤ as follows: u ≤ v ↔ s(u) ≤ s(v).
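The feasibility condition above can be sketched in a few lines of Python; the node names and edge list are made-up example data:

```python
# Sketch of the feasibility test: a schedule s is feasible
# iff s(u) <= s(v) for every edge (u, v) of the DFG.

def is_feasible(edges, s):
    return all(s[u] <= s[v] for (u, v) in edges)

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(is_feasible(edges, {"a": 0, "b": 1, "c": 1, "d": 2}))  # True
print(is_feasible(edges, {"a": 0, "b": 2, "c": 1, "d": 1}))  # False: (b, d) violated
```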

SLIDE 4

Partitioning – Definitions

  • The relation ≤ can be extended to sets as follows: (A ≤ B) ↔ ∀a ∈ A, b ∈ B: either a is not in relation with b, or a ≤ b.
  • Partition: Given a DFG G = (V, E) and a set R = {R1, R2, …, Rk} of reconfigurable devices, a partition P of G toward R is its division into disjoint subsets P1, P2, …, Pr such that ∀Pi ∃Rj: S(Pi) ≤ S(Rj) ∧ T(Pi) ≤ T(Rj), where S(X) = size of X and T(X) = number of terminals of X.
  • With pij = 1 iff Pi is implemented on Rj, a partition is called spatial iff ∀Rj ∈ R: |{Pi ∈ P: pij = 1}| = 1.
  • A partition is temporal iff ∃Rj ∈ R: |{Pi ∈ P: pij = 1}| > 1.
  • If all the devices in R are of the same type, the partition is said to be uniform.
  • If |R| = 1, we have a single-device partition.

SLIDE 5

Spatial partitioning

SLIDE 6

Spatial partitioning – Problem

  • Partitioning constraints: Each FPGA is characterized by:
    its size, i.e., the number of LUTs and FFs available, and
    its terminals, i.e., the number of I/O pins available on the device.
  • A partition is valid iff, for each block B produced by the partition:
    S(B) ≤ S(device), where S(X) = size of X
    T(B) ≤ T(device), where T(X) = number of terminals of X

[Figure: an example DFG with nodes a–f and its partition into two blocks]

SLIDE 7

Spatial partitioning – Problem

  • Objectives: The following objectives are possible:
    Minimize the number of cut nets
    Minimize the number of produced blocks
    Minimize the delay
  • The problem is difficult because the constraints are not always compatible.
  • Solution approaches:
    Heuristics for automatic partitioning
    Manual intervention

[Figure: the example DFG with nodes a–f and its partition into two blocks]

SLIDE 8

Spatial partitioning – Approaches – Hierarchical

  • Motivation:
    Reduce the problem complexity
    Keep the global view during partitioning
    Improve the final result in terms of the number of devices
    Respecting the design hierarchy facilitates debugging
    Performance improvement
  • Approach:
    Apply a clustering algorithm to the flat netlist (creates "red envelopes")
    Flatten the hierarchy except for the created (red) clusters
    Partition the resulting flat netlist (reduced problem size)

[Figure: hierarchical spatial partitioning of a design B]

SLIDE 9

Spatial partitioning – Approaches – Hierarchical

  • Removing all non-valid blocks may produce a large amount of glue logic in the final result.
  • Some non-valid blocks may instead be partitioned separately, by applying a divide-and-conquer strategy.
  • The ST-quality is used to determine how good a partition block is: ST = S/T (S = size, T = number of terminals), the size/terminal ratio.
  • Poor ST-quality: blocks having many connections with other hierarchy blocks (small size, big I/O pin count). Removing the hierarchy (flattening) before partitioning is preferable.

SLIDE 10

Spatial partitioning – Approaches – Hierarchical

  • Good ST-quality: blocks having few connections with other hierarchy blocks (big size, small I/O pin count). Splitting is preferable.
  • Average ST-quality: calculated recursively in a bottom-up fashion (for a global view).
  • Device ST-quality ST(D): device filling is good when the ST-quality of the assigned block is larger than or equal to the device quality.
  • Decision per block:

    ST-quality of the block            Leaf block   Non-leaf block,    Non-leaf block,
                                                    much glue logic    little glue logic
    ST < ST(D)                         Remove       Remove             Remove
    ST >= ST(D) and ST >= average ST   Split        Split              Split
    ST >= ST(D) and ST < average ST    Split        Split              Remove

SLIDE 11

Spatial partitioning – User intervention

  • Fully automatic partitioning never satisfies designers.
  • User intervention may lead to more efficient results.
  • A mixture of manual and automatic strategies is therefore common.
  • User intervention includes:
    Assignment of hierarchy blocks to devices
    Hierarchy modification
    Manual guidance of the automatic partitioning
    Invoking automatic partitioning on selected blocks (splitting)

[Figure: pre-assignment of hierarchy blocks A–H to FPGA 1–3, followed by flattening]

SLIDE 12

Spatial partitioning – User intervention

[Figure: hierarchy operations on blocks A–H: ungrouping and splitting]

SLIDE 13

Spatial partitioning – Timing – Block replication

  • Critical path optimization and reduction of the number of I/O pins by block replication.

[Figure: replicating block B2 across devices shortens the critical path from 70 ns to 50 ns and reduces the number of signals crossing device boundaries]

SLIDE 14

Temporal partitioning

SLIDE 15

Temporal partitioning – Problem definition

  • We consider single-device temporal partitioning of a DFG G = (V, E) for a device R.
  • A temporal partition can also be defined as an ordered partition of G with the constraints imposed by R. The ordering relation imposed on the partition reduces the solution space to only those partitions which can be scheduled on the device for execution.
  • Cycles are therefore not allowed in the dataflow graph; otherwise the resulting partition may not be schedulable on the device.

[Figure: a partition of the example DFG whose blocks form a cycle in the configuration graph]
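Since only an acyclic DFG is temporally partitionable, an acyclicity test is the natural first step; a minimal sketch using Kahn's topological-sort algorithm, with made-up example graphs:

```python
def is_acyclic(nodes, edges):
    # Kahn's algorithm: repeatedly remove nodes with in-degree 0;
    # the graph is acyclic iff every node can be removed.
    indeg = {v: 0 for v in nodes}
    for (_, v) in edges:
        indeg[v] += 1
    ready = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for (a, b) in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return removed == len(nodes)

print(is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c")]))  # True
print(is_acyclic(["a", "b"], [("a", "b"), ("b", "a")]))       # False: a and b form a cycle
```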

SLIDE 16

Temporal partitioning – Problem

  • Goal: computation and scheduling of a configuration graph.
  • In a configuration graph:
    Nodes are partitions (bitstreams)
    Edges reflect the precedence in the given DFG
    The partition blocks communicate by means of inter-configuration registers, usually mapped into the processor address space
    The configuration sequence is controlled by a host processor
    On reconfiguration, register values are saved (this requires a given amount of memory) and copied back afterwards

[Figure: a configuration graph P1–P5 with inter-configuration registers; the FPGA I/O registers are mapped into the address space of the processor via a bus]

SLIDE 17

Temporal partitioning – Problem

  • Objectives:
    Minimize the number of interconnections. This is one of the most important objectives, since it minimizes both the amount of exchanged data and the amount of memory for temporarily storing the data.
    Minimize the number of produced blocks
    Minimize the overall computation delay
  • Quality of the result: provides a means to measure how well an algorithm performs.
    Connectivity of a graph G = (V, E): con(G) = 2·|E| / (|V|² − |V|)
    Quality of a partitioning P = {P1, …, Pn}: average connectivity over P
    High (low) quality means the algorithm performs well (poorly).

[Figure: a 9-node example graph with connectivity 0.24, and two partitionings of it with quality 0.2 and 0.45]
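The connectivity and quality measures can be sketched directly from the formulas above; the example block sizes are made up:

```python
def connectivity(num_nodes, num_edges):
    # con(G) = 2*|E| / (|V|^2 - |V|)
    return 2.0 * num_edges / (num_nodes * num_nodes - num_nodes)

def quality(blocks):
    # Quality of a partitioning = average connectivity over its blocks;
    # each block is given as a (node count, edge count) pair.
    return sum(connectivity(n, e) for (n, e) in blocks) / len(blocks)

print(connectivity(4, 3))         # 0.5
print(quality([(4, 3), (3, 3)]))  # (0.5 + 1.0) / 2 = 0.75
```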

SLIDE 18

Temporal partitioning vs scheduling

  • Scheduling: Given a DFG and an architecture (a set of resources), compute the starting time of each node on a given resource.
  • Temporal partitioning: Given a DFG and a reconfigurable device, compute the starting time of each node on the device; the starting time of a node is the starting time of the partition to which it belongs.
  • Solution approaches:
    List scheduling
    Integer Linear Programming
    Network flow
    Spectral method

SLIDE 19

Unconstrained scheduling

  • ASAP (as soon as possible):
    Defines the earliest starting time for each node in the DFG
    Computes the minimal latency (lower bound)
  • ALAP (as late as possible):
    Defines the latest starting time for each node in the DFG according to a given latency
  • The mobility of a node is the difference between its ALAP starting time and its ASAP starting time.
    Mobility 0 means the node is on a critical path.

SLIDE 20

ASAP – Example

Unconstrained scheduling with optimal latency: L = 4

[Figure: an example DFG of multiplications (*) and ALU operations (+, −, <), ASAP-scheduled over time steps 0–4]

SLIDE 21

ASAP – Algorithm

ASAP(G(V,E), d) {
  FOREACH (vi without predecessor)
    s(vi) := 0;
  REPEAT {
    choose a node vi whose predecessors are all planned;
    s(vi) := max{j:(vj,vi)∈E} {s(vj) + dj};
  } UNTIL (all nodes vi are planned);
  RETURN s;
}
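A direct Python sketch of the pseudocode, assuming the DFG is acyclic; the node names and durations are made-up example data:

```python
def asap(nodes, edges, d):
    # ASAP as in the pseudocode: nodes without predecessors start at 0;
    # otherwise s(v) = max over all predecessors u of s(u) + d(u).
    preds = {v: [u for (u, w) in edges if w == v] for v in nodes}
    s = {}
    while len(s) < len(nodes):
        for v in nodes:
            if v not in s and all(u in s for u in preds[v]):
                s[v] = max((s[u] + d[u] for u in preds[v]), default=0)
    return s

# Chain a -> b -> c plus a shortcut a -> c, unit execution times:
print(asap(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], {"a": 1, "b": 1, "c": 1}))
# {'a': 0, 'b': 1, 'c': 2}
```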

SLIDE 22

ALAP – Example

Unconstrained scheduling with optimal latency: L = 4

[Figure: the same example DFG, ALAP-scheduled over time steps 0–4]

SLIDE 23

ALAP – Algorithm

ALAP(G(V,E), d, L) {
  FOREACH (vi without successor)
    s(vi) := L − di;
  REPEAT {
    choose a node vi whose successors are all planned;
    s(vi) := min{j:(vi,vj)∈E} {s(vj)} − di;
  } UNTIL (all nodes vi are planned);
  RETURN s;
}
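The ALAP pseudocode translates the same way, and together with an ASAP result it yields the mobility of each node; graph data and the inlined ASAP values are made-up examples:

```python
def alap(nodes, edges, d, L):
    # ALAP as in the pseudocode: nodes without successors get s(v) = L - d(v);
    # otherwise s(v) = min over all successors w of s(w), minus d(v).
    succs = {v: [w for (u, w) in edges if u == v] for v in nodes}
    s = {}
    while len(s) < len(nodes):
        for v in nodes:
            if v not in s and all(w in s for w in succs[v]):
                s[v] = min((s[w] for w in succs[v]), default=L) - d[v]
    return s

# Diamond a -> {b, c} -> d plus a side node e fed only by a; unit times, L = 3.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d"), ("a", "e")]
d = {v: 1 for v in nodes}
al = alap(nodes, edges, d, L=3)
print(dict(sorted(al.items())))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 2}

# Mobility = ALAP start - ASAP start; here only e is off the critical path.
asap_start = {"a": 0, "b": 1, "c": 1, "d": 2, "e": 1}  # ASAP result for this graph
print({v: al[v] - asap_start[v] for v in nodes})  # {'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 1}
```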

SLIDE 24

Mobility

[Figure: ASAP and ALAP schedules of the example DFG side by side over time steps 0–4; the operations on the critical path have mobility 0, the remaining operations have mobility 1 or 2]

SLIDE 25

Constrained scheduling

  • Extended ASAP/ALAP:
    Compute ASAP or ALAP
    Shift tasks later (ASAP) or earlier (ALAP) such that the resource constraints are always fulfilled by construction
  • List scheduling:
    A list L of ready-to-run tasks is created
    Tasks are placed in L in decreasing priority order
    At each step, the task with the highest priority is assigned to a free resource. Priority criteria can be: number of successors, mobility, connectivity, etc.

SLIDE 26

Extended ASAP/ALAP – Example

[Figure: the example DFG scheduled over time steps 0–4 under a resource constraint of 2 multipliers and 2 ALUs (+, −, <)]

SLIDE 27

Constrained scheduling – Example

Criterion: number of successors
Resources: 1 multiplier, 1 ALU (+, −, <)

[Figure: the example DFG with node priorities (number of successors): 3, 3, 2, 2, 1, 1, 1, 1]

SLIDE 28

Constrained scheduling – Example (continued)

[Figure: the resulting list schedule of the example DFG over time steps 0–7]

SLIDE 29

Temporal partitioning vs constrained scheduling

  • List scheduling (LS) for partitioning:
    1. Construct a list L of all nodes with priorities
    2. Create a new empty partition Pact
       2.1 Remove a node from the list and place it in the partition
       2.2 If size(Pact) ≤ size(R) and T(Pact) ≤ T(R), goto 2.1; else goto 2.3
       2.3 If the list is empty, stop; else goto 2
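The steps above can be sketched in Python. This is a simplified sketch: the terminal count of a partition is approximated by summing per-node pin estimates rather than counting cut nets, and the node names, pin counts, and sizes (taken from the later example slide) are illustrative:

```python
def ls_partition(node_list, size, pins, max_size, max_pins):
    # node_list must already be sorted by decreasing priority.
    partitions, cur, cur_size, cur_pins = [], [], 0, 0
    for v in node_list:
        if cur and (cur_size + size[v] > max_size or cur_pins + pins[v] > max_pins):
            partitions.append(cur)  # close Pact and open a new partition
            cur, cur_size, cur_pins = [], 0, 0
        cur.append(v)
        cur_size += size[v]
        cur_pins += pins[v]
    if cur:
        partitions.append(cur)
    return partitions

# Sizes as in the later example: size(FPGA) = 250, size(mult) = 100, size(add) = 20
size = {"m1": 100, "m2": 100, "m3": 100, "a1": 20}
pins = {v: 3 for v in size}
print(ls_partition(["m1", "m2", "m3", "a1"], size, pins, 250, 100))
# [['m1', 'm2'], ['m3', 'a1']]
```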

SLIDE 30

Temporal partitioning vs constrained scheduling

Criterion: number of successors
size(FPGA) = 250, size(mult) = 100, size(add) = size(sub) = 20, size(comp) = 10

[Figure: the example DFG with node priorities (number of successors): 3, 3, 2, 2, 1, 1, 1, 1]

SLIDE 31

Temporal partitioning vs constrained scheduling

[Figure: the resulting temporal partition of the example DFG into blocks P1, P2, P3]

Connectivity: c(P1) = 1/6, c(P2) = 1/3, c(P3) = 2/6; Quality: 5/6

SLIDE 32

Temporal partitioning vs constrained scheduling

[Figure: an alternative temporal partition of the example DFG into blocks P1, P2, P3]

Connectivity: c(P1) = 2/10, c(P2) = 2/3, c(P3) = 2/3; Quality: 1.2 (the connectivity is better)

SLIDE 33

Temporal partitioning – List scheduling – List construction

  • ASAP-based list construction:
    Place the currently processed node in the list if all its predecessors are already in the list. This corresponds to assigning level numbers to the nodes and scheduling the nodes for execution according to their level number.
  • Drawback:
    "Levelization": nodes are assigned to partitions only on the basis of their level number (increased data exchange)
  • Advantages:
    Fast (linear run time)
    Local optimization possible

[Figure: a DFG whose nodes are assigned to levels 0–3]

SLIDE 34

Temporal partitioning – List scheduling – Improvement

  • Local optimization by configuration switching (Bobda):
  • If two consecutive partitions P1 and P2 share a common set of operators, then:
    We implement the minimal set of operators needed for the two partitions
    We use signal multiplexing to switch from one partition to the next one
  • Drawback: more resources are needed to implement the signal switching
  • Advantages:
    Reconfiguration time is reduced
    Device operation is not interrupted

SLIDE 35

Temporal partitioning – List scheduling – Configuration switching

[Figure: two configurations sharing operators (Add, Sub, Mult); the values a–j are exchanged between configuration 1 and configuration 2 through inter-configuration registers]

SLIDE 36

Temporal partitioning – List scheduling – Configuration switching (continued)

[Figure: the same example, switching from configuration 1 to configuration 2 by multiplexing]

SLIDE 37

Temporal partitioning – List scheduling – Configuration switching (continued)

[Figure: the same example, showing operator reuse across the two configurations]

SLIDE 38

Temporal partitioning – List scheduling – Improvement

  • Improved list scheduling algorithm:
    1. Generate the list of nodes node_list
    2. Build a first partition P1
    3. While (!node_list.empty())
    4.   Build a new partition P2
    5.   If union(P1, P2) fits on the device, implement configuration switching with P1 and P2
    6.   Else set P1 = P2 and goto 3
    7. Exit
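The "minimal set of operators" test behind step 5 can be sketched as a per-type maximum over the two partitions' demands; the operator names are illustrative:

```python
from collections import Counter

def minimal_operator_set(ops1, ops2):
    # Minimal set of operators implementing two consecutive partitions:
    # per operator type, the larger of the two demands; shared operators
    # are reused by multiplexing their input signals.
    c1, c2 = Counter(ops1), Counter(ops2)
    return {op: max(c1[op], c2[op]) for op in set(c1) | set(c2)}

# Configuration 1 needs Add, Add, Sub, Mult; configuration 2 needs Add, Mult:
merged = minimal_operator_set(["add", "add", "sub", "mult"], ["add", "mult"])
print(sorted(merged.items()))  # [('add', 2), ('mult', 1), ('sub', 1)]
```

Configuration switching is then applied when the summed sizes of this merged operator set fit on the device.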
SLIDE 39

Temporal partitioning – ILP

  • With ILP (Integer Linear Programming), the temporal partitioning constraints are formulated as inequalities; the system of inequalities is then solved by an ILP solver.
  • The constraints usually considered are:
    Uniqueness constraint
    Precedence (temporal order) constraint
    Memory constraint
    Resource constraint
    Latency constraint
  • Notation: y_vi = 1 ↔ v ∈ Pi
    w_uv = 1 ↔ u ∈ Pi ∧ v ∈ Pj ∧ Pi ≠ Pj

SLIDE 40

Temporal partitioning – ILP

  • Unique assignment constraint: each task is placed in exactly one partition:
    ∀v ∈ V: ∑_{i=1,…,m} y_vi = 1
  • Precedence constraint: for each edge (u, v) in the graph, node u must be placed either in the same partition as v, or in an earlier partition than that in which v is placed:
    ∀(u, v) ∈ E: ∑_{i=1,…,m} i·y_ui ≤ ∑_{i=1,…,m} i·y_vi
  • Resource constraint: the sum of the resources needed to implement the modules in one partition should not exceed the total amount of available resources:
    Device area constraint: ∀Pi ∈ P: ∑_{u∈V} y_ui·s(u) ≤ S(device)
    Device terminal constraint: ∀Pi ∈ P: ∑_{u∈Pi, v∉Pi} w_uv ≤ T(device) ∧ ∑_{u∉Pi, v∈Pi} w_uv ≤ T(device)

SLIDE 41

Temporal partitioning – Network-flow approach

  • Recursive bipartitioning:
    The goal at each step is to compute a unidirectional bipartition which minimizes the edge-cut size between the two partition blocks.
    Network-flow methods are used to compute the bipartition with minimal edge-cut size.
    Directly applying the min-cut max-flow theorem may lead to non-unidirectional cuts. Therefore the original graph G is first transformed into a new graph G' in which each cut is unidirectional in an optimal solution.

[Figure: unidirectional recursive bipartitioning vs. a bidirectional cut]

SLIDE 42

Temporal partitioning – Network flow – Graph transformations

  • Two-terminal net transformation:
    Replace an edge (v1, v2) by two edges: (v1, v2) with capacity 1 and (v2, v1) with infinite capacity.
  • Multi-terminal net transformation:
    For a multi-terminal net {v1, v2, …, vn}, introduce a dummy node v with no weight and a bridging edge (v1, v) with capacity 1.
    Introduce the edges (v, v2), …, (v, vn), each of which is assigned a capacity of 1.
    Introduce the edges (v2, v1), …, (vn, v1), each of which is assigned an infinite capacity.
  • Having computed a min-cut in the transformed graph G', a min-cut can be derived in G: for each node of G' assigned to a partition, its counterpart in G is assigned to the corresponding partition.
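Both transformations can be sketched as operations on a capacity map; the node names are made-up examples:

```python
INF = float("inf")

def two_terminal(cap, u, v):
    # Edge (u, v) -> forward capacity 1, reverse capacity infinity,
    # so a finite min-cut can never cut the net "backwards".
    cap[(u, v)] = 1
    cap[(v, u)] = INF
    return cap

def multi_terminal(cap, v1, sinks, dummy):
    # Net {v1, ..., vn}: dummy node with a capacity-1 bridging edge (v1, dummy),
    # capacity-1 edges (dummy, vi), and infinite-capacity edges (vi, v1).
    cap[(v1, dummy)] = 1
    for vi in sinks:
        cap[(dummy, vi)] = 1
        cap[(vi, v1)] = INF
    return cap

cap = two_terminal({}, "a", "b")
print(cap)  # {('a', 'b'): 1, ('b', 'a'): inf}
cap = multi_terminal({}, "a", ["b", "c"], "n1")
print(cap[("a", "n1")], cap[("n1", "b")], cap[("b", "a")])  # 1 1 inf
```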


SLIDE 43

Temporal partitioning – Spectral method – Approach

  • Goal: increase the quality of a partitioning.
  • 2-step method:
    1. Place the connected components of the graph in the same area before partitioning
    2. Use a partitioning strategy to compute the configuration graph
  • To solve step 1:
    Use the wire-length model
    Place the components in a k-dimensional vector space in such a way that the sum of the distances among connected components is minimized (k-D spectral embedding)

SLIDE 44

Temporal partitioning – Spectral approach – 3-D spectral embedding

  • Given a graph G = (V, E), a placement vector X1 = (x11, x12, …, x1|V|) is a vector in which the i-th entry describes the coordinate of the i-th node in one dimension.
  • Find the k placement vectors X1 = (x1i), …, Xk = (xki), i = 1, …, |V|, which minimize:

    Z = (1/2) ∑_{i=1,…,n} ∑_{j=1,…,n} ((x1i − x1j)² + (x2i − x2j)² + … + (xki − xkj)²) · cij

  • To avoid the trivial case where all components have the same position, the following constraints are additionally imposed:

    X1ᵀX1 = X2ᵀX2 = … = XkᵀXk = 1
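The wire-length objective and its quadratic-form rewriting (used on the next slide) can be checked numerically on a tiny made-up graph:

```python
def laplacian(n, edges):
    # B = D - C: C is the symmetric connection matrix, D the degree matrix.
    B = [[0] * n for _ in range(n)]
    for (i, j) in edges:
        B[i][j] -= 1
        B[j][i] -= 1
        B[i][i] += 1
        B[j][j] += 1
    return B

def wire_length_1d(x, edges):
    # One-dimensional contribution to Z: (1/2) sum_ij c_ij (x_i - x_j)^2,
    # i.e. the sum of (x_i - x_j)^2 over the undirected edges.
    return sum((x[i] - x[j]) ** 2 for (i, j) in edges)

def quadratic_form(x, B):
    # Hall's identity: the same quantity equals X^T B X.
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

edges = [(0, 1), (1, 2)]  # a 3-node path graph (made-up example)
x = [0.0, 1.0, 3.0]       # a 1-D placement of its nodes
B = laplacian(3, edges)
print(wire_length_1d(x, edges), quadratic_form(x, B))  # 5.0 5.0
```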
SLIDE 45

Temporal partitioning – Spectral approach – 3-D spectral embedding

  • Hall (1970) proved that, with B = D − C being the Laplacian matrix of G:

    Z = X1ᵀBX1 + X2ᵀBX2 + … + XkᵀBXk

    Connection matrix C: C(i,j) = 1 if nodes i and j are connected, 0 otherwise (symmetric matrix).
    Degree matrix D: D(i,i) = sum of the entries of column i of C; D(i,j) = 0 otherwise.
  • To solve this optimization problem, introduce the Lagrange multipliers α1, α2, …, αk and the Lagrangian:

    L = X1ᵀBX1 + … + XkᵀBXk − α1(X1ᵀX1 − 1) − … − αk(XkᵀXk − 1)

  • Setting the first partial derivatives of L with respect to the vectors X1, X2, …, Xk to zero leads to:

    2(BX1 − α1X1) = 2(BX2 − α2X2) = … = 2(BXk − αkXk) = 0

  • The solutions are the eigenvectors X1, X2, …, Xk of B related to the eigenvalues α1, α2, …, αk.

SLIDE 46

Temporal partitioning – Spectral approach – 3-D spectral embedding

[Figure: numeric example showing the Laplacian matrix B of an example graph, three of its eigenvectors, and the corresponding eigenvalues, which yield the 3-D coordinates of the nodes]

SLIDE 47

Temporal partitioning – Spectral approach – Generation of partitions

  • Incremental generation of P0, P1, …, Pr:
    Select components along the z-axis and put them into the actual partition block until the size of the block exceeds the size of the FPGA.
    Initially, P̃0 = V.
    At each step i, P̃i is partitioned into Pi and P̃i+1 = P̃i − Pi.
    This process is repeated until P̃i = ∅.
    We thus have successive bipartitions (Pi, P̃i+1).

SLIDE 48

Temporal partitioning – Spectral approach – Generation of partitions – Elimination of cycles

  • Because the spectral method works only on undirected graphs, the computed temporal partitioning may violate the order constraints; post-processing is required to eliminate cycles in the configuration graph.
  • Post-processing is done via a modified version of the Kernighan-Lin (KL) algorithm.

[Figure: two partition blocks P0 and P1 whose connecting edges form a cycle]

SLIDE 49

Temporal partitioning – Spectral approach – Generation of partitions – Elimination of cycles

  • Iterative improvement using the Kernighan-Lin (KL) algorithm:
    Improve the cut by moving a component v from one partition block to the other.
    Gain of a move: D(v) = I_Pi(v) − E_P̃i+1(v)
  • Problem: how should the cut and the gain be defined for directed graphs?

[Figure: moving a component between Pi and P̃i+1 reduces the cut from 4 to 3; with directed edges it is unclear which edges count toward the cut]

SLIDE 50

Temporal partitioning – Spectral approach – Generation of partitions – Elimination of cycles

  • Definitions:
    E_P(vi) = ∑ {w_ij : (vi, vj) ∈ E, vj ∈ P}   (weight of the edges leaving vi into block P)
    I_P(vi) = ∑ {w_ij : (vj, vi) ∈ E, vj ∈ P}   (weight of the edges entering vi from block P)
    E_PQ = {(vi, vj) ∈ E | vi ∈ P ∧ vj ∈ Q}   (set of edges from block P to block Q)

[Figure: the edge weights E_Pi(v), E_P̃i+1(v), I_Pi(v), I_P̃i+1(v) of a component v with respect to the blocks Pi and P̃i+1]

SLIDE 51

Temporal partitioning – Spectral approach – Generation of partitions – Elimination of cycles

  • Goal: E_P̃i+1,Pi = ∅, i.e., no edges from the later block back into the earlier one.
  • Use the KL algorithm on two instances of the same problem:
    In the first instance, the cut to minimize is E_Pi,P̃i+1 and the gain of moving a component v is D(v) = E_Pi(v) − I_P̃i+1(v).
    In the second instance, the cut to minimize is E_P̃i+1,Pi and the gain of moving a component v is D(v) = I_Pi(v) − E_P̃i+1(v).
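The edge-weight sums and the gain of the second instance can be sketched directly from the definitions; the edge list (triples of source, target, weight) is made-up example data:

```python
def ext_w(v, P, edges):
    # E_P(v): total weight of the edges leaving v into block P.
    return sum(w for (u, t, w) in edges if u == v and t in P)

def int_w(v, P, edges):
    # I_P(v): total weight of the edges entering v from block P.
    return sum(w for (u, t, w) in edges if t == v and u in P)

def gain_second_instance(v, Pi, Pnext, edges):
    # D(v) = I_Pi(v) - E_P~i+1(v), the gain used in the second instance.
    return int_w(v, Pi, edges) - ext_w(v, Pnext, edges)

# v receives one unit-weight edge from a in Pi and sends a weight-2 edge
# to b in P~i+1, so moving v has negative gain:
edges = [("a", "v", 1), ("v", "b", 2)]
print(gain_second_instance("v", {"a"}, {"b"}, edges))  # -1
```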