[PPT] - Reconfigurable Computing: Design and Implementation (PowerPoint Presentation)

SLIDE 1

Reconfigurable Computing

Reconfigurable Computing Design and Implementation, Chapter 4.2

Prof. Dr.-Ing. Jürgen Teich

Lehrstuhl für Hardware-Software-Co-Design (Chair of Hardware-Software Co-Design)

SLIDE 2

Brief tour through logic synthesis

SLIDE 3

Logic synthesis - Goal

A digital system consists of combinational parts separated by memory elements.

The goal of logic synthesis is to provide an implementation of a digital system for a given platform or a given target library.

FPGA goal: generation of configuration data.

The implementation must be optimized according to factors such as area, delay, power consumption, and testability.

[Figure: digital system]

SLIDE 4

Logic synthesis - Two-level logic

Logic synthesis is usually divided into two parts. Two-level logic synthesis:

Designs are represented in two-level logic (a sum of product terms: the product terms are implemented on the first level and their sum on the second level).

Advantages: natural representation of Boolean functions; well understood and easy to manipulate.

Drawbacks: not representative of the actual logic complexity, and therefore a poor estimator of complexity during logic optimization.

Initially developed for PALs and PLAs.

[Figure: two-level logic, product terms over inputs X1 to X4 feeding a single sum F]
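A minimal sketch of the two-level view, assuming a hypothetical encoding (each product term is a list of (variable, polarity) literals) and a made-up example function: evaluation mirrors the two levels, one AND per product term followed by a single OR, and the literal count is the usual cost measure for two-level logic.

```python
from itertools import product

# A literal is (variable, polarity); a product term (cube) is a list of literals.
# Hypothetical example function: F = X1*X2 + X2*X3 + X3*X4.
SOP = [[("X1", 1), ("X2", 1)], [("X2", 1), ("X3", 1)], [("X3", 1), ("X4", 1)]]

def eval_sop(sop, assignment):
    """First level: AND inside each product term; second level: OR of the terms."""
    return int(any(all(assignment[var] == pol for var, pol in cube) for cube in sop))

def literal_count(sop):
    """Total number of literals, a common cost measure for two-level logic."""
    return sum(len(cube) for cube in sop)

variables = ["X1", "X2", "X3", "X4"]
on_set = [bits for bits in product([0, 1], repeat=len(variables))
          if eval_sop(SOP, dict(zip(variables, bits)))]
print(literal_count(SOP), len(on_set))   # 6 literals, 8 minterms
```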

SLIDE 5

Logic synthesis - Multi-level logic

Multi-level logic synthesis targets multi-level designs (many Boolean functions on the path from the inputs to the outputs).

Advantages: smaller, faster, and lower power in most cases.

Drawbacks: difficult to manipulate; few manipulation algorithms exist.

Appropriate for mask-programmable or field-programmable devices. Multi-level logic will therefore be considered in this course.

[Figure: multi-level logic network with functions F1, F2, F3 over inputs X1 to X6]

SLIDE 6

Logic synthesis - Boolean networks

Multi-level logic is usually represented using Boolean networks (BN). A BN is a directed acyclic graph (DAG) in which:
- a node represents an arbitrary Boolean function;
- an edge represents the (data) dependency between nodes.

In order to manipulate a Boolean network efficiently, a suitable representation of the nodes is necessary. The important factors considered are:
- memory efficiency;
- correlation with the final representation.
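To make the DAG view concrete, here is a small sketch (Python 3.9+ for `graphlib`); the network, the node names and the per-node functions are hypothetical. Each node stores its fan-in list and an arbitrary Boolean function, and nodes are evaluated in topological order so that every node sees the values of its predecessors; `TopologicalSorter` raises `CycleError` if the graph is not acyclic.

```python
from graphlib import TopologicalSorter

# Hypothetical Boolean network: node -> (fan-in nodes, Boolean function of the fan-ins).
network = {
    "g1": (["x1", "x2"], lambda a, b: a & b),
    "g2": (["x2", "x3"], lambda a, b: a | b),
    "f":  (["g1", "g2"], lambda a, b: a ^ b),
}

def evaluate(network, inputs):
    """Evaluate every node once, after all of its predecessors (topological order)."""
    deps = {node: fanins for node, (fanins, _) in network.items()}
    values = dict(inputs)
    for node in TopologicalSorter(deps).static_order():
        if node in network:               # primary inputs have no function attached
            fanins, func = network[node]
            values[node] = func(*(values[v] for v in fanins))
    return values

print(evaluate(network, {"x1": 1, "x2": 1, "x3": 0})["f"])   # 0
```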

SLIDE 7

Logic synthesis - Node representation

The choices usually made for node representation are:
- Sum-of-Products (SOP)
- Factored Form (FF)
- Binary Decision Diagram (BDD)

Sum-of-Products: a sum of product terms.

Factored form (FF): defined recursively as follows:
- an FF is a product or a sum;
- a product is a literal or FF1 * FF2;
- a sum is a literal or FF1 + FF2.

Example: $c(a + b(d + e))$ is a product of the factored forms $c$ and $a + b(d + e)$; the latter is in turn a sum of the factored forms $a$ and $b(d + e)$.
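The recursive definition maps naturally onto an expression tree. The sketch below uses a hypothetical tuple encoding (not a representation prescribed by the slides) to count the literals of the example c(a + b(d + e)) and to multiply it out into SOP form, which shows why the factored form is the more compact node representation.

```python
# Factored form as a tree: ("lit", name), ("and", ff1, ff2) or ("or", ff1, ff2).
def lit(name):
    return ("lit", name)

def AND(a, b):
    return ("and", a, b)

def OR(a, b):
    return ("or", a, b)

def literals(ff):
    """Literal count of a factored form (each leaf counts once)."""
    return 1 if ff[0] == "lit" else literals(ff[1]) + literals(ff[2])

def to_sop(ff):
    """Multiply out a factored form into a set of product terms (cubes)."""
    if ff[0] == "lit":
        return {frozenset([ff[1]])}
    left, right = to_sop(ff[1]), to_sop(ff[2])
    if ff[0] == "or":
        return left | right
    return {a | b for a in left for b in right}   # distribute the product

# c(a + b(d + e))
ff = AND(lit("c"), OR(lit("a"), AND(lit("b"), OR(lit("d"), lit("e")))))
sop = to_sop(ff)
print(literals(ff), sum(len(cube) for cube in sop))   # 5 literals vs. 8 in SOP form
```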

SLIDE 8

Logic synthesis - Node representation

Binary Decision Diagram (BDD): a BDD is a rooted DAG with two kinds of nodes:
- Variable nodes: a variable node v is a non-terminal node with the following attributes: an index index(v) ∈ {1, …, n} (index i denotes the variable $x_i$) and two children low(v) and high(v).
- Constant nodes: a constant node v is a terminal node with value(v) ∈ {0, 1}.

The nodes are ordered from the root to the terminal nodes: for each non-terminal node v, if low(v) is non-terminal, then index(low(v)) < index(v). Similarly, if high(v) is non-terminal, then index(high(v)) < index(v).

SLIDE 9

Logic synthesis - Node representation

Correspondence between a BDD with root v and a Boolean function:
- The root v represents the Boolean function $f_v$.
- If v is a terminal node, then $f_v = value(v)$.
- If v is a non-terminal node with index i, the Shannon expansion theorem is used:

$$f_v = \bar{x}_i \, f_{low(v)} + x_i \, f_{high(v)}$$

The value of $f_v$ for a given assignment is obtained by traversing the graph from the root to a terminal node, following at each node the branch selected by the value of its variable. The figure aside shows the optimal BDD representation of the function $f = abc + bd + cd$.
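A minimal sketch of the BDD definitions from the last two slides, assuming a hypothetical `Const`/`Var` class pair: `is_ordered` checks the slides' ordering rule (indices decrease from the root toward the terminals; many texts use the opposite direction, which is equivalent up to renumbering the variables), and `evaluate` walks from the root to a terminal, which is exactly the Shannon expansion applied node by node. The example BDD for x1 AND x2 is made up for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union

@dataclass(frozen=True)
class Const:
    value: int                    # terminal node: value(v) in {0, 1}

@dataclass(frozen=True)
class Var:
    index: int                    # index(v) = i selects variable x_i
    low: "Node"                   # child followed when x_i = 0
    high: "Node"                  # child followed when x_i = 1

Node = Union[Const, Var]

def is_ordered(v):
    """Ordering rule from the slides: a non-terminal child has a smaller index."""
    if isinstance(v, Const):
        return True
    children_ok = all(isinstance(c, Const) or c.index < v.index for c in (v.low, v.high))
    return children_ok and is_ordered(v.low) and is_ordered(v.high)

def evaluate(v, x):
    """Traverse from the root to a terminal; x maps index i to the value of x_i."""
    while isinstance(v, Var):
        v = v.high if x[v.index] else v.low
    return v.value

# Hypothetical BDD for f = x1 AND x2, with x2 at the root.
zero, one = Const(0), Const(1)
root = Var(2, low=zero, high=Var(1, low=zero, high=one))
assert is_ordered(root)
for x1, x2 in product([0, 1], repeat=2):
    assert evaluate(root, {1: x1, 2: x2}) == (x1 & x2)
```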

SLIDE 10

Logic synthesis - Node manipulation

Given a suitable node representation, operations are performed on the Boolean network. The goal is to generate an equivalent, cost-effective, simplified function.

The operations usually applied to reduce Boolean networks are:

Decomposition: replace a Boolean expression with a collection of new expressions. A Boolean function f(X) is decomposable if we can find a function g(X) such that f(X) = f'(g(X), X).
Example: $f = abc + abd + \bar{a}\bar{c}\bar{d} + \bar{b}\bar{c}\bar{d}$ (12 literals)
Decomposed: $f = ab(c + d) + (\bar{a} + \bar{b})\bar{c}\bar{d} = ab(c + d) + \overline{ab}\;\overline{(c + d)}$ (8 literals)

Extraction: used to identify common intermediate sub-functions in a set of given functions.
Example: $f = (a + bc)d + e$, $g = (a + bc)e$ can be rewritten as $f = xd + e$, $g = xe$ with $x = a + bc$.
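For a handful of inputs, an exhaustive truth-table check is an easy way to confirm that a decomposition preserves the function. This sketch is our own illustration, not a synthesis algorithm: it compares the 12-literal SOP f = abc + abd + a'c'd' + b'c'd' (a' denotes the complement of a) with its 8-literal decomposition f = xy + x'y', where x = ab and y = c + d.

```python
from itertools import product

def f_sop(a, b, c, d):
    """12-literal SOP: abc + abd + a'c'd' + b'c'd'."""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (a & b & c) | (a & b & d) | (na & nc & nd) | (nb & nc & nd)

def f_decomposed(a, b, c, d):
    """8-literal form f = x*y + x'*y' built on the sub-functions x = ab, y = c + d."""
    x, y = a & b, c | d
    return (x & y) | ((1 - x) & (1 - y))

# Compare both forms on all 16 input combinations.
mismatches = [v for v in product([0, 1], repeat=4) if f_sop(*v) != f_decomposed(*v)]
print(mismatches)   # []
```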

SLIDE 11

Logic synthesis - Node manipulation

Factoring: transformation of SOP expressions into factored form.
Example: $f = ac + ad + bc + bd + e$ can be rewritten as $f = (a + b)(c + d) + e$.

Substitution: replace an expression e within a function f by the value of an equivalent expression g(X) = e.
Example: $f = (a + bc)(d + e)$ can be rewritten as $f = g(d + e)$ with $g = a + bc$.

Collapsing or elimination: the reverse operation to substitution. It is used to eliminate levels in order to meet timing constraints.
Example: $f = ga + \bar{g}b$ with $g = c + d$ will be replaced by $f = ac + ad + b\bar{c}\bar{d}$.
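The slides do not say how factors are found; one standard technique is algebraic (weak) division, which computes f = q·d + r for a divisor d. The sketch below is an illustrative implementation under that assumption, with cubes encoded as frozensets of literal names and complemented literals not modelled. Dividing f = ac + ad + bc + bd + e by the divisor c + d recovers the factored form (a + b)(c + d) + e.

```python
def cube(literals):
    """A cube (product term) as a frozenset of literal names."""
    return frozenset(literals)

def weak_divide(f, divisor):
    """Algebraic (weak) division: return (q, r) with f = q * divisor + r."""
    if not divisor:
        raise ValueError("divisor must contain at least one cube")
    quotient = None
    for dc in divisor:
        # cubes of f that contain dc, with dc's literals removed
        partial = {c - dc for c in f if dc <= c}
        quotient = partial if quotient is None else quotient & partial
    product_terms = {q | dc for q in quotient for dc in divisor}
    remainder = f - product_terms
    return quotient, remainder

f = {cube("ac"), cube("ad"), cube("bc"), cube("bd"), cube("e")}
q, r = weak_divide(f, {cube("c"), cube("d")})
print(sorted("".join(sorted(c)) for c in q), sorted("".join(sorted(c)) for c in r))
# ['a', 'b'] ['e']  ->  f = (a + b)(c + d) + e
```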

SLIDE 12

Logic synthesis - LUT-Technology mapping

Technology mapping binds the optimized nodes of the Boolean network to the target device library.

In the FPGA case, the library elements are LUTs. Therefore, this process is called LUT-based technology mapping.

LUT-based technology mapping is an optimization process whose goal is usually:
- minimizing the number of LUTs used (device area);
- minimizing the signal delay (speed);
- optimizing routability or minimizing power (very little work exists).

In this chapter, we will study two LUT technology mapping algorithms:
- Chortle-crf for area minimization;
- FlowMap for delay minimization.

SLIDE 13

Logic synthesis - LUT-Technology mapping - definitions

Given a Boolean network:
- A primary input (PI) node is a node with no predecessor.
- A primary output (PO) node is a node with no successor.
- The level of a node is the length of the longest path from a primary input to that node.
- The depth of a graph is the largest level of a node in the graph.
- For a node v, input(v) is the set of fan-in nodes of v.
- A Boolean network is K-bounded if |input(v)| ≤ K for all nodes v in the graph.
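The definitions above translate directly into a few lines of code. In this sketch the network is a hypothetical dictionary mapping each node to its fan-in list input(v); levels are computed recursively as the longest path from a primary input, and the network is K-bounded when every fan-in list has at most K entries.

```python
from functools import lru_cache

# Hypothetical Boolean network: node -> list of fan-in nodes, i.e. input(v).
fanins = {
    "a": [], "b": [], "c": [], "d": [],          # primary inputs: no predecessor
    "x": ["a", "b"], "y": ["b", "c", "d"],
    "z": ["x", "y"], "w": ["y", "d"],
}

def analyse(fanins, K):
    """Return (PIs, POs, level of each node, depth, K-bounded?)."""
    fanouts = {v: [] for v in fanins}
    for v, ins in fanins.items():
        for u in ins:
            fanouts[u].append(v)

    @lru_cache(maxsize=None)
    def level(v):
        # length of the longest path from a primary input to v
        return 0 if not fanins[v] else 1 + max(level(u) for u in fanins[v])

    pis = sorted(v for v in fanins if not fanins[v])
    pos = sorted(v for v in fanins if not fanouts[v])     # no successor
    depth = max(level(v) for v in fanins)
    k_bounded = all(len(ins) <= K for ins in fanins.values())
    return pis, pos, {v: level(v) for v in fanins}, depth, k_bounded

print(analyse(fanins, K=3))
```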

SLIDE 14

Logic synthesis - LUT-Technology mapping - definitions

A cone $C_v$ at a node v is the tree with root v which spans from v to the primary inputs.

A cone $C_v$ at a node v is K-feasible if:
- $|input(C_v)| \le K$, where $input(C_v)$ is the set of nodes outside $C_v$ that feed a node of $C_v$;
- any path connecting a node in $C_v$ and v lies entirely in $C_v$.

The LUT technology mapping problem can be defined as the problem of covering a Boolean network with a set of K-feasible cones.

[Figure: a K-feasible cone at v; graph covering with cones; LUT mapping]
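The two K-feasibility conditions can be checked mechanically. The sketch below assumes a hypothetical encoding (a fan-in dictionary, and the cone given as a set of nodes that includes v) and interprets input(C_v) as the set of nodes outside the cone that feed it. The path condition is checked by rejecting any cone node whose fan-out leaves the cone and still reaches v.

```python
def reaches(fanouts, src, dst):
    """True if there is a directed path from src to dst (iterative DFS)."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(fanouts[u])
    return False

def is_k_feasible_cone(fanins, v, cone, K):
    """Check both conditions for a cone (a set of nodes containing v)."""
    fanouts = {u: [] for u in fanins}
    for w, ins in fanins.items():
        for u in ins:
            fanouts[u].append(w)
    if v not in cone:
        return False
    # input(C_v): nodes outside the cone that feed some node of the cone
    cone_inputs = {u for w in cone for u in fanins[w] if u not in cone}
    if len(cone_inputs) > K:
        return False
    for u in cone - {v}:
        if not reaches(fanouts, u, v):
            return False              # u does not belong to a cone rooted at v
        for w in fanouts[u]:
            if w not in cone and reaches(fanouts, w, v):
                return False          # some path from u to v leaves the cone
    return True

fanins = {"a": [], "b": [], "c": [],
          "x": ["a", "b"], "y": ["x", "c"], "z": ["x", "y"]}
print(is_k_feasible_cone(fanins, "z", {"z", "y"}, K=2),
      is_k_feasible_cone(fanins, "z", {"z", "x"}, K=3))   # True False
```

In the second call the cone has only three inputs, but the path x → y → z leaves the cone, so it is rejected.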

SLIDE 15

Logic synthesis - LUT-Techmap - The Chortle-crf algorithm

Developed by Francis et al., University of Toronto, in 1991.

A bin-packing approach which traverses the nodes from the primary inputs (PIs) to the primary outputs (POs).

At each node, the best circuit implementing the K-feasible cone at that node is searched for.

The two main goals are:
- minimizing the number of LUTs, and therefore the device area;
- minimizing the number of used pins at the output LUTs.

Approach: at each node, construct a tree of LUTs that implements the function of the node, consisting of the fan-in LUTs and a LUT that implements the decomposition of the node.

SLIDE 16

Logic synthesis - LUT-Techmap - The Chortle-crf algorithm

First step: Two-level decomposition

The two levels consist of a single first-level node and several second-level nodes (the fan-in). Each second-level node implements the operation of the node being decomposed over a set of fan-in LUTs. The first-level node will be implemented in the second phase.

The construction is done using a bin-packing approach. The goal of bin packing is to find the minimum number of bins of a given capacity into which a set of boxes can be packed. Here, the boxes are the second-level (fan-in) LUTs and the bins are the resulting LUTs. The capacity of a bin is the number of LUT inputs.

SLIDE 17

Logic synthesis - LUT-Techmap - The Chortle-crf algorithm

Packing two two-input LUTs, LUT1 and LUT2, into a new LUTr means ORing the outputs of LUT1 and LUT2 and implementing the resulting circuitry in LUTr.
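As a sketch of what packing means at the truth-table level, assume two hypothetical fan-in LUTs with disjoint inputs (an AND of a, b and an XOR of c, d) and K = 4. The packed LUTr simply stores the OR of both outputs for all 16 input combinations.

```python
from itertools import product

# Hypothetical fan-in LUTs as truth tables indexed by their input bits.
lut1 = {bits: bits[0] & bits[1] for bits in product([0, 1], repeat=2)}   # a AND b
lut2 = {bits: bits[0] ^ bits[1] for bits in product([0, 1], repeat=2)}   # c XOR d

def pack_or(t1, t2):
    """Truth table of LUTr(a, b, c, d) = LUT1(a, b) OR LUT2(c, d)."""
    return {(a, b, c, d): t1[(a, b)] | t2[(c, d)]
            for a, b, c, d in product([0, 1], repeat=4)}

lutr = pack_or(lut1, lut2)
print(len(lutr), sum(lutr.values()))   # 16 entries, 10 of them equal to 1
```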

SLIDE 18

Logic synthesis - LUT-Techmap - The Chortle-crf algorithm

First step: Two-level decomposition

Algorithm Two-Level-decomposition {
    start with an empty list of LUTs;
    while there are unpacked fan-in LUTs {
        if the largest unpacked fan-in LUT will not fit within any LUT in the list {
            create an empty LUT and add it to the end of the LUT list;
        }
        pack the largest unpacked fan-in LUT into the first LUT it will fit in;
    }
}
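The pseudocode above is first-fit decreasing bin packing. A runnable sketch, assuming each fan-in LUT is abstracted by its number of used inputs and that inputs of different boxes are disjoint (so packing boxes of sizes i and j needs i + j inputs); the box sizes in the example are made up.

```python
def two_level_decomposition(fanin_sizes, K):
    """First-fit decreasing packing of fan-in LUTs (boxes) into K-input LUTs (bins).

    fanin_sizes: number of inputs used by each unpacked fan-in LUT.
    Returns a list of bins, each a list of the box sizes packed into it.
    """
    luts = []                                        # start with an empty list of LUTs
    for size in sorted(fanin_sizes, reverse=True):   # largest unpacked fan-in LUT first
        if size > K:
            raise ValueError("a fan-in LUT cannot use more than K inputs")
        for lut in luts:
            if sum(lut) + size <= K:                 # first LUT it fits in
                lut.append(size)
                break
        else:                                        # fits nowhere: create a new LUT
            luts.append([size])
    return luts

print(two_level_decomposition([3, 2, 2, 2, 1, 1], K=4))   # [[3, 1], [2, 2], [2, 1]]
```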

SLIDE 19

Logic synthesis - LUT-Techmap - The Chortle-crf algorithm

Second step: Multi-level decomposition

The first-level node is implemented using a tree of LUTs. The number of LUTs is reduced by using LUTs with unused pins to implement a portion of the first-level LUTs.

Algorithm MultiLevel {
    while there is more than one unconnected LUT {
        if there are no free inputs among the remaining unconnected LUTs {
            create an empty LUT and add it to the end of the LUT list;
        }
        connect the most filled unconnected LUT to the next unconnected LUT with a free input;
    }
}
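A sketch of the MultiLevel loop under a simplifying abstraction: each LUT is represented only by its number of used inputs, and connecting the output of LUT i to LUT j consumes one input of j. This assumes the node's operation is associative (for example OR), so that chaining LUTs preserves the function, and it interprets "next unconnected LUT" as the next one in fill order. The example sizes [4, 4, 3] with K = 4 are hypothetical.

```python
def multi_level(lut_sizes, K):
    """Connect LUTs into a tree until only one unconnected LUT (the root) is left.

    lut_sizes: number of used inputs of each LUT from the two-level step.
    Returns (used inputs per LUT, edges); edge (i, j) means LUT i drives an input of LUT j.
    """
    used = list(lut_sizes)
    unconnected = list(range(len(used)))          # LUTs whose output is still free
    edges = []
    while len(unconnected) > 1:
        unconnected.sort(key=lambda i: -used[i])  # most filled first (stable for ties)
        if all(used[i] >= K for i in unconnected):
            used.append(0)                        # no free input: create an empty LUT
            unconnected.append(len(used) - 1)
        src = unconnected[0]                      # most filled unconnected LUT
        dst = next(i for i in unconnected if i != src and used[i] < K)
        used[dst] += 1                            # src's output occupies one input of dst
        edges.append((src, dst))
        unconnected.remove(src)
    return used, edges

print(multi_level([4, 4, 3], K=4))   # ([4, 4, 4, 2], [(0, 2), (1, 3), (2, 3)])
```

Here LUT 2 absorbs the output of LUT 0, and because LUTs 1 and 2 are then both full, a new root LUT 3 is created for them.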

SLIDE 20

Design Flow - LUT-Techmap - The FlowMap algorithm

The FlowMap algorithm is a network-flow-based method aimed at minimizing the signal delays of mapped designs. We first recall some basics of network flow. Given is a network N = (V, E) (a graph with node set V and edge set E) with a source s and a sink t.
- A cut $(X, \bar{X})$ is a partition of N with $s \in X$ and $t \in \bar{X}$.
- The node cut-size $n(X, \bar{X})$ of a cut is the number of nodes in X adjacent to some node in $\bar{X}$.
- A cut is K-feasible iff $n(X, \bar{X}) \le K$.
- The edge cut-size $e(X, \bar{X})$ is the weighted sum of the edges crossing from X to $\bar{X}$.
- For each node t, the label l(t) is the depth of the LUT that implements t in an optimal mapping of the subgraph $N_t$ of N (where $N_t$ is the cone at t).
- The height $h(X, \bar{X})$ of a cut is the maximum label in X.
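These definitions can be evaluated directly for a given partition. In this sketch the graph, the unit edge weights and the labels are hypothetical; `X_bar` is the sink side of the cut and X is every other node, including the source s, which carries no label.

```python
def cut_metrics(edges, capacity, labels, X_bar):
    """Node cut-size n, edge cut-size e and height h of the cut (X, X_bar)."""
    nodes = {u for edge in edges for u in edge}
    X = nodes - X_bar
    crossing = [(u, v) for (u, v) in edges if u in X and v in X_bar]
    n = len({u for u, _ in crossing})              # nodes in X adjacent to X_bar
    e = sum(capacity[edge] for edge in crossing)   # weighted crossing edges
    h = max((labels[x] for x in X if x in labels), default=0)   # maximum label in X
    return n, e, h

edges = [("s", "a"), ("s", "b"), ("s", "c"),
         ("a", "x"), ("b", "x"), ("x", "t"), ("c", "t")]
capacity = {edge: 1 for edge in edges}
labels = {"a": 0, "b": 0, "c": 0, "x": 1}
print(cut_metrics(edges, capacity, labels, {"t"}))        # (2, 2, 1)
print(cut_metrics(edges, capacity, labels, {"t", "x"}))   # (3, 3, 0)
```

With K = 2 only the first cut is K-feasible; the second has a lower height but needs three inputs.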

SLIDE 21

Design Flow - LUT-Techmap - The FlowMap algorithm

The objective of the FlowMap algorithm is the minimization of the signal delays, which are determined by:
- the delay in the LUTs;
- the interconnection delay.

Because the LUT placement is not yet known, only the LUT delay is considered; the interconnection delay is assumed to be the same for all signals. The delay of a signal is therefore the number of LUTs that the signal traverses on a path from input to output, and the objective is the minimization of the depth of the resulting DAG. The FlowMap algorithm is a two-step method:
- node labeling phase;
- node mapping phase.

SLIDE 22

Design Flow - LUT-Techmap - The FlowMap algorithm

The first phase of the algorithm computes the labels of the nodes in topological order. This ensures that each node is processed after all its predecessors. The labeling is done as follows:
- Each primary input is assigned the label 0.
- For a node t to be processed, the cone $N_t$ is transformed into a flow network, also denoted $N_t$, by inserting a source node s whose output is connected to all inputs of $N_t$; t acts as the sink.
- Assuming that LUT(t) implements t in an optimal mapping of $N_t$, the cut $(X(t), \bar{X}(t))$, where $\bar{X}(t)$ is the set of nodes in LUT(t) and $X(t) = N_t - \bar{X}(t)$, is K-feasible. The label l(t) of t is then given by:

$$l(t) = \min_{(X(t), \bar{X}(t)) \text{ is } K\text{-feasible}} h(X(t), \bar{X}(t)) + 1$$

Lemma 1: If p is the maximum label in input(t), then $l(t) = p \lor l(t) = p + 1$.

[Figure: network transformation]
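Before turning to the flow-based test, it may help to see the label definition executed literally. The following exhaustive sketch is our own reference for tiny networks, not the FlowMap procedure. For each node t it enumerates candidate sets X̄(t): t plus gates of its cone, never primary inputs. It keeps the sets that contain every path from their members to t and have at most K inputs, and sets l(t) to the minimum height plus one. It assumes a K-bounded network; the example network is hypothetical. With K = 4 the output node z gets label 1, with K = 3 it gets label 2, consistent with Lemma 1 (p = 1 here).

```python
from itertools import combinations

def flowmap_labels_bruteforce(fanins, K):
    """l(t) = min over K-feasible cuts of h(X, X_bar) + 1, by exhaustive search.

    fanins: node -> list of fan-in nodes; primary inputs have an empty list.
    Exponential in the cone size: only meant for tiny illustrative networks.
    """
    fanouts = {v: [] for v in fanins}
    for v, ins in fanins.items():
        for u in ins:
            fanouts[u].append(v)

    def cone(t):                                  # t and all of its predecessors
        seen, stack = set(), [t]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(fanins[v])
        return seen

    labels = {v: 0 for v in fanins if not fanins[v]}   # each PI gets label 0
    pending = [v for v in fanins if fanins[v]]
    while pending:                                # topological processing order
        t = next(v for v in pending if all(u in labels for u in fanins[v]))
        pending.remove(t)
        nt = cone(t)
        gates = [v for v in nt if fanins[v] and v != t]
        best = None
        for r in range(len(gates) + 1):
            for extra in combinations(gates, r):
                X_bar = {t, *extra}
                # every path from a member of X_bar to t must stay inside X_bar
                if any(w in nt and w not in X_bar for u in extra for w in fanouts[u]):
                    continue
                inputs = {u for v in X_bar for u in fanins[v] if u not in X_bar}
                if len(inputs) <= K:
                    h = max(labels[x] for x in nt - X_bar)   # height of the cut
                    best = h if best is None else min(best, h)
        if best is None:
            raise ValueError(f"node {t} has more than K fan-ins")
        labels[t] = best + 1
    return labels

fanins = {"a": [], "b": [], "c": [], "d": [],
          "x": ["a", "b"], "y": ["c", "d"], "z": ["x", "y"]}
print(flowmap_labels_bruteforce(fanins, 4)["z"], flowmap_labels_bruteforce(fanins, 3)["z"])
```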

SLIDE 23

Design Flow - LUT-Techmap - The FlowMap algorithm

Lemma 2: Let $N_t^1$ be the network obtained from $N_t$ by collapsing t and all nodes in $N_t$ with label p into a single node $t^1$. Then $N_t$ has a K-feasible cut of height p - 1 iff $N_t^1$ has a K-feasible cut.

According to Lemmas 1 and 2, the FlowMap algorithm first checks whether there is a K-feasible cut $(X(t), \bar{X}(t))$ of height p - 1 in $N_t$:
- If such a cut exists, then l(t) = p, and node t will be packed in the second phase into the same LUT as the nodes in $\bar{X}(t)$.
- Otherwise, l(t) = p + 1, since the minimum height of a K-feasible cut in $N_t$ is p and $(V(N_t) - \{t\}, \{t\})$ is such a cut. A new LUT will be created for t in the second phase.

Testing whether a K-feasible cut of height p - 1 exists is done by first transforming $N_t$ into $N_t^1$.

[Figure: network collapsing]
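A sketch of the collapsing step, assuming a hypothetical fan-in dictionary for the whole network and already computed labels for the cone of t. The cone is collected, t and every node with label p are renamed to a single node `t1`, a source `s` is connected to the primary inputs, and edges that end up inside the merged node are dropped. The names `s` and `t1` are assumptions of this sketch.

```python
def collapse(fanins, labels, t, p, s="s", t1="t1"):
    """Build the edge set of N_t^1 from the cone N_t of node t."""
    cone, stack = set(), [t]
    while stack:                                  # collect the cone N_t
        v = stack.pop()
        if v not in cone:
            cone.add(v)
            stack.extend(fanins[v])
    merged = {v for v in cone if v == t or labels.get(v) == p}

    def rename(v):
        return t1 if v in merged else v

    edges = {(s, rename(v)) for v in cone if not fanins[v]}   # source feeds the PIs
    for v in cone:
        for u in fanins[v]:
            a, b = rename(u), rename(v)
            if a != b:                            # drop edges inside the merged node
                edges.add((a, b))
    return edges

fanins = {"a": [], "b": [], "c": [], "d": [],
          "x": ["a", "b"], "y": ["c", "d"], "z": ["x", "y"]}
labels = {"a": 0, "b": 0, "c": 0, "d": 0, "x": 1, "y": 1}
print(sorted(collapse(fanins, labels, "z", p=1)))
```

In this example x, y and z all collapse into t1, so the only way to cut N_t^1 is through the four primary inputs. That is why a K-feasible cut of height p - 1 = 0 exists for K = 4 but not for K = 3.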

SLIDE 24

Design Flow - LUT-Techmap - The FlowMap algorithm

A second transformation turns $N_t^1$ into a new network $N_t^2$:
- For each node v in $N_t^1$ other than s and $t^1$, two new nodes $v_1$ and $v_2$ are introduced and connected by a bridging edge $(v_1, v_2)$.
- The source and sink are also inserted in $N_t^2$. For each edge $(s, v)$ (respectively $(v, t^1)$) in $N_t^1$, an edge $(s, v_1)$ (respectively $(v_2, t^1)$) is inserted in $N_t^2$.
- For each other edge $(u, v)$ in $N_t^1$, a new edge $(u_2, v_1)$ is introduced in $N_t^2$.
- The capacity of each bridging edge is set to 1 and that of each non-bridging edge to ∞.

The goal of this step is to:
- reduce the node cut-size problem in $N_t^1$ to an edge cut-size problem in $N_t^2$;
- apply well-known methods to solve the edge cut-size problem in $N_t^2$;
- finally, derive the equivalent solution in $N_t^1$.

This is done using the following lemma:

[Figure: second transformation]
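The node-splitting construction can be written as a small function. In this sketch the node naming `(v, 1)` / `(v, 2)` and the default names `s` and `t1` are assumptions. Each internal node becomes a bridging edge of capacity 1 and every original edge gets infinite capacity, so a finite cut can only cut bridging edges, that is, nodes.

```python
import math

def split_nodes(edges, s="s", t1="t1"):
    """Build N_t^2 from the edges of N_t^1; returns a dict (u, v) -> capacity."""
    def out_end(v):                        # where edges leaving v start
        return v if v in (s, t1) else (v, 2)

    def in_end(v):                         # where edges entering v end
        return v if v in (s, t1) else (v, 1)

    cap = {}
    for u, v in edges:
        for w in (u, v):
            if w not in (s, t1):
                cap[((w, 1), (w, 2))] = 1              # bridging edge (w1, w2)
        cap[(out_end(u), in_end(v))] = math.inf        # (s, v1), (u2, v1) or (v2, t1)
    return cap

# N_t^1 with four primary inputs feeding the collapsed node t1 (hypothetical example)
edges_n1 = {("s", v) for v in "abcd"} | {(v, "t1") for v in "abcd"}
cap = split_nodes(edges_n1)
print(len(cap), sum(1 for c in cap.values() if c == 1))   # 12 edges, 4 bridging
```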

SLIDE 25

Design Flow - LUT-Techmap - The FlowMap algorithm

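Lemma 3: $N_t^1$ has a K-feasible cut iff $N_t^2$ has a cut whose edge cut-size is no more than K.

Testing whether such a cut exists in $N_t^2$ is done using the max-flow min-cut theorem (the value of a maximum flow from source to sink equals the capacity of a minimum cut). The augmenting path method is then used to find augmenting paths one at a time. If a (K + 1)-th augmenting path is found, the flow in $N_t^2$ exceeds K and no K-feasible cut exists. Otherwise, the nodes still reachable from the source in the residual network define a minimum cut of size at most K.

[Figure: second transformation; derived solution]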
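A compact sketch of the augmenting-path test. Breadth-first search finds augmenting paths in the residual network, one unit of flow is pushed per path, and the search stops as soon as a (K + 1)-th path exists. The capacity-dictionary format and the example network (four primary inputs feeding the collapsed node t1, already split into bridging edges) are assumptions for illustration. When the test succeeds, the nodes not reachable from s form the sink side of a minimum cut.

```python
import math
from collections import deque

def k_feasible_cut(cap, s, t, K):
    """Return (True, sink_side) if the max s-t flow is at most K, else (False, None).

    cap: dict (u, v) -> capacity, e.g. 1 for bridging edges and math.inf otherwise.
    Unit flow is pushed along BFS augmenting paths (integer capacities assumed).
    """
    residual = dict(cap)
    adj = {}
    for u, v in cap:
        residual.setdefault((v, u), 0)               # reverse residual edge
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def bfs():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = bfs()
        if t not in parent:
            # no augmenting path left: nodes reachable from s form the source side
            return True, set(adj) - set(parent)
        if flow == K:
            return False, None                       # a (K+1)-th augmenting path exists
        v = t
        while parent[v] is not None:                 # push one unit along the path
            u = parent[v]
            residual[(u, v)] -= 1
            residual[(v, u)] += 1
            v = u
        flow += 1

cap = {}
for v in "abcd":                                     # hypothetical N_t^2
    cap[("s", (v, 1))] = math.inf
    cap[((v, 1), (v, 2))] = 1
    cap[((v, 2), "t1")] = math.inf
print(k_feasible_cut(cap, "s", "t1", 3)[0], k_feasible_cut(cap, "s", "t1", 4)[0])   # False True
```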

SLIDE 26

Design Flow - LUT-Techmap - The FlowMap algorithm

In the second phase of the FlowMap algorithm, nodes are mapped to K-LUTs.

The algorithm works on a set L of nodes to be implemented; initially, L contains all primary outputs of the Boolean network.

For each node v ∈ L that is not a primary input, it is assumed that a K-feasible cut $(X(v), \bar{X}(v))$ of minimum height has been computed in the first phase. A K-LUT $LUT_v$ is created to implement the function of v as well as that of all nodes in $\bar{X}(v)$. L is then updated to $(L - \{v\}) \cup input(\bar{X}(v))$. Nodes belonging to two different cut sets $\bar{X}(u)$ and $\bar{X}(v)$ are automatically duplicated.
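A sketch of the mapping phase. It assumes the labelling phase has already produced, for every gate v, the sink side X̄(v) of its chosen cut; the `cuts` dictionary and the example network are hypothetical. Each LUT is reported by its input list input(X̄(v)). In the example, node y lies in the cut sets of both w and z and is therefore implicitly duplicated into both LUTs.

```python
def flowmap_mapping(fanins, cuts, primary_outputs):
    """Generate one K-LUT per visited node from the cuts of the labelling phase.

    fanins: node -> fan-in list (empty for primary inputs).
    cuts: node -> X_bar(node), the sink side of its minimum-height K-feasible cut.
    Returns node -> sorted list of LUT inputs, i.e. input(X_bar(node)).
    """
    luts = {}
    L = [v for v in primary_outputs if fanins[v]]    # primary inputs need no LUT
    while L:
        v = L.pop()
        if v in luts:
            continue                                 # LUT for v already created
        X_bar = cuts[v]
        inputs = {u for w in X_bar for u in fanins[w] if u not in X_bar}
        luts[v] = sorted(inputs)
        # L := (L - {v}) U input(X_bar(v)), skipping primary inputs
        L.extend(u for u in inputs if fanins[u])
    return luts

fanins = {"a": [], "b": [], "c": [], "d": [],
          "x": ["a", "b"], "y": ["c", "d"], "z": ["x", "y"], "w": ["y", "d"]}
cuts = {"x": {"x"}, "y": {"y"}, "z": {"z", "x", "y"}, "w": {"w", "y"}}   # K = 4
print(flowmap_mapping(fanins, cuts, ["z", "w"]))
```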