Reconfigurable Computing: Applications



slide-1
SLIDE 1

Reconfigurable Computing: Applications
Chapter 9

Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design

Reconfigurable Computing

slide-2
SLIDE 2

Overview

  • In the past, FPGAs have mostly been used in:
      Rapid prototyping
      Infrequently reconfigured systems
  • Hardware implementations are sometimes specific to the FPGA architecture
  • The most important application areas are:
      Searching (text, genetic databases, etc.)
      Image processing
      Mechanical control
      Etc.

slide-3
SLIDE 3

Searching – pattern matching

  • Pattern matching is the basis of search engines
  • The purpose is to find (and count) the occurrences of a given pattern in a given text
  • Useful in:
      Dictionaries
      Document collection indexing
      Document filtering and classification
      Spam avoidance
      Content surveillance

slide-4
SLIDE 4

Searching – pattern matching – sliding windows

  • Sliding windows (Cockshott & Foulk)
      Keywords are kept in registers, one character (byte) per register
      A set of comparators is used, one comparator per byte
      The hit signal is set whenever the text segment matches the corresponding word
      Advantage:
        Easy to replace old patterns
      Drawbacks:
        Not flexible: fixed register length
        Redundancy: more comparators than necessary for words with the same prefix
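The sliding-window scheme can be emulated in software. The following is a minimal sketch, not the original hardware design: the function name `sliding_window_hits` and the `width` parameter (the fixed register length) are assumptions made for illustration.

```python
def sliding_window_hits(text, keywords, width):
    """Return (start, keyword) pairs found while the text slides through a window.

    `width` models the fixed register length (an assumed parameter name);
    keywords longer than `width` do not fit and are ignored.
    """
    registers = [kw for kw in keywords if 0 < len(kw) <= width]
    window = ""
    hits = []
    for pos, ch in enumerate(text):
        # Shift the new character in; the oldest one falls out of the window.
        window = (window + ch)[-width:]
        for kw in registers:
            segment = window[-len(kw):]
            # One comparator per byte; the hit signal is the AND of all of them.
            if len(segment) == len(kw) and all(a == b for a, b in zip(segment, kw)):
                hits.append((pos - len(kw) + 1, kw))
    return hits


print(sliding_window_hits("a context", ["conte", "text"], 8))
```

Every keyword gets its own comparators here, so "conte" and "context" would duplicate the comparison of their common prefix. This is the redundancy listed as a drawback above.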

slide-5
SLIDE 5

Searching – pattern matching – sliding windows

  • Avoid redundancy
      Use only one comparator for common characters in different words
  • Data folding (Foulk)
      Fold the data into the circuit
      Consider the bit representation of each character
      Generate a comparator circuit for each character in the words to be searched for

[Figure: a generic 8-bit comparator with two 8-bit inputs, next to a comparator specialized for the constant 01001110 with a single 8-bit input]

slide-6
SLIDE 6

Searching – pattern matching – FSM-based

  • FSM-based pattern matcher
      Each regular grammar can be recognized by an FSM
      In pattern matching, the target words define the regular grammar
      The target words are compiled into the automaton
      Each word defines a unique path from the start state to an end state
      When scanning a text, the automaton changes its state with each incoming character
      Reaching a final state corresponds to the appearance of a word
      Redundancy is avoided by sharing common prefixes

[Figure: FSM recognizer and corresponding state transition table for the word "conte"]

slide-7
SLIDE 7

Searching – pattern matching – FSM-based

  • FSM-based pattern matcher: RAM implementation
      One RAM or ROM for storing the state transition table
      One state register
      One character register
      A hit detector
      The input character and the state register are used to determine the next state
      The hit detector checks whether the current state is a hit state and sets a hit for the corresponding word
      Advantage:
        Simple to implement
      Drawback:
        Expensive in terms of flip-flops

[Figure: RAM/ROM implementation of the word recognizer (blocks: character stream, character register, RAM/ROM, state register, next state, hit detector)]
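The table-driven recognizer can be sketched in a few lines. This is an illustrative model under stated assumptions: the names `build_transition_table` and `scan` are hypothetical, the table is a list of dictionaries rather than a RAM image, and an unknown transition simply restarts at the start state (a simplification that can miss some overlapping matches).

```python
def build_transition_table(words):
    """Compile words into a prefix-sharing table: table[state][char] -> next state."""
    table = [{}]       # state 0 is the start state
    hit_states = {}    # final state -> recognized word
    for word in words:
        state = 0
        for ch in word:
            if ch not in table[state]:
                table.append({})
                table[state][ch] = len(table) - 1
            state = table[state][ch]
        hit_states[state] = word
    return table, hit_states


def scan(text, table, hit_states):
    """Emulate the state register: one table lookup per input character."""
    state, hits = 0, []
    for pos, ch in enumerate(text):
        # Assumed fallback: on a missing transition, retry the character from the start state.
        state = table[state].get(ch, table[0].get(ch, 0))
        if state in hit_states:
            hits.append((pos, hit_states[state]))   # the hit detector fires
    return hits


table, hit_states = build_transition_table(["conte", "content", "context"])
print(len(table), scan("a content", table, hit_states))
```

Because the three example words share the prefix "conte", the table needs 10 states instead of the 20 that three separate chains (each with its own start state) would use.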
slide-8
SLIDE 8

Searching – pattern matching – FSM-based

  • FSM-based pattern matcher: one-hot implementation
      Each state is coded in one flip-flop
      The D-input of a flip-flop is obtained by an AND of the output of the previous flip-flop with the result of the comparator
      The comparator is character-specific
      Only n flip-flops are used to implement a word of length n
      Advantage:
        Low cost
        Reflects the structure of the grammar
      Drawback:
        Not easy to build
        Redundancy in the comparators

[Figure: one-hot recognizer built from character-specific comparators for the characters c, o, n, t, e]
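The one-hot structure can be simulated cycle by cycle. A minimal sketch, assuming that the first flip-flop is fed by a constant 1 so that a match may start at any character; the function name `one_hot_match` is hypothetical.

```python
def one_hot_match(text, word):
    """Simulate a one-hot recognizer with one flip-flop per character of `word`.

    Returns the text positions at which the last flip-flop is set,
    i.e. the positions where an occurrence of `word` ends.
    """
    ff = [False] * len(word)
    hits = []
    for pos, ch in enumerate(text):
        # Character-specific comparators, one per flip-flop.
        compare = [ch == c for c in word]
        # D-input of FF i = output of FF i-1 AND comparator i.
        # All flip-flops update synchronously from the old `ff` values.
        ff = [(ff[i - 1] if i > 0 else True) and compare[i]
              for i in range(len(word))]
        if ff[-1]:
            hits.append(pos)
    return hits


print(one_hot_match("contconte", "conte"))
```

The list comprehension reads only the previous `ff` values, which mirrors the synchronous update of the flip-flop chain on a clock edge.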

slide-9
SLIDE 9

Searching – pattern matching – FSM-based

  • FSM-based pattern matcher: exploiting common prefixes
      For words with a common prefix, only one common starting path, whose length corresponds to that of the common prefix, is used
      Redundancy of comparators can be avoided by implementing only one comparator for each character
      The result of the comparison is then provided to all gates that use it

[Figure: words with a common prefix and the corresponding FSM]

slide-10
SLIDE 10

Searching – pattern matching – FSM-based

  • FSM-based pattern matcher: optimized architecture
      Implement the common prefix
      Redundancy of comparators is removed: each character of the character set is represented in a position vector, with pos(i) = 1 iff character i is detected

[Figure: block diagram of the optimal pattern matcher]
[Figure: detailed structure of the optimal pattern matcher]

slide-11
SLIDE 11

Searching – pattern matching – use of reconfiguration

  • FSM-based pattern matcher: use of reconfiguration
      Replace the character comparators
      Replace the FSM for a set of words

[Figure: reconfiguration swaps in a new character comparator and a new set of words]

slide-12
SLIDE 12

Signal processing – distributed arithmetic – Motivation

  • Signal processing applications (FFT, convolution, filter algorithms) are characterized by MAC-intensive (multiply-accumulate) computations
  • Signal processing functions are usually implemented on special processors:
      DSPs
      ASICs
  • FPGAs provide the advantage of reconfigurability, but MAC-intensive applications are expensive
  • However, for MAC computations involving one constant vector, FPGAs are one of the best alternatives to DSPs

slide-13
SLIDE 13

Signal processing – distributed arithmetic – Basics

Solution of the following equation, with A a constant row vector and X a column vector:

$$Z = A \cdot X = \sum_i A_i \cdot X_i$$

With the binary representation for $X_i$:

$$X_i = \sum_j X_{ij} \cdot 2^j$$

we obtain:

$$Z = \sum_i A_i \cdot \sum_j X_{ij} \cdot 2^j = \sum_j 2^j \cdot \Big( \sum_i A_i \cdot X_{ij} \Big)$$

Because the $A_i$ are constant, there exist $2^n$ possible values for $\sum_i A_i \cdot X_{ij}$. We can pre-compute the possible values, store them in a LUT (DALUT) and retrieve them on demand at run-time.

$$Z = \sum_j 2^j \cdot \Big( \sum_i A_i \cdot X_{ij} \Big)$$

is the classical form of distributed arithmetic.

FPGA advantage: the computation is memory-based (use of LUTs)

slide-14
SLIDE 14

Signal processing – distributed arithmetic – Basics

To better understand, we expand the DA equation:

$$\begin{aligned}
Z ={} & [A_1 X_{10} + A_2 X_{20} + \dots + A_{n-1} X_{(n-1)0} + A_n X_{n0}] \cdot 2^0 \\
 {}+{} & [A_1 X_{11} + A_2 X_{21} + \dots + A_{n-1} X_{(n-1)1} + A_n X_{n1}] \cdot 2^1 \\
 {}+{} & \dots \\
 {}+{} & [A_1 X_{1W} + A_2 X_{2W} + \dots + A_{n-1} X_{(n-1)W} + A_n X_{nW}] \cdot 2^W
\end{aligned}$$

The bits of the variables are used to address the memory and retrieve the required values in a bit-serial way. The DA datapath implementation is straightforward.
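The expanded equation maps directly onto a look-up table plus a shift-and-accumulate loop. The sketch below assumes unsigned inputs and uses hypothetical names (`build_dalut`, `da_dot`, `width`); a real datapath would also handle the sign bit.

```python
def build_dalut(coeffs):
    """Precompute sum_i A_i * b_i for every bit pattern b; address bit i selects A_i."""
    n = len(coeffs)
    return [sum(a for i, a in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_dot(coeffs, xs, width):
    """Bit-serial dot product of constant `coeffs` with unsigned `width`-bit inputs."""
    lut = build_dalut(coeffs)
    z = 0
    for j in range(width):
        # Bit j of every input forms the LUT address for this cycle.
        addr = sum(((x >> j) & 1) << i for i, x in enumerate(xs))
        z += lut[addr] << j      # shift-and-accumulate applies the 2^j weight
    return z


coeffs, xs = [3, -1, 4, 2], [5, 7, 1, 12]
print(da_dot(coeffs, xs, 4), sum(a * x for a, x in zip(coeffs, xs)))
```

No multiplier appears in `da_dot`: each of the `width` cycles costs one table read and one addition, which is why the approach suits LUT-based FPGAs when the coefficient vector is constant.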

slide-15
SLIDE 15

Signal processing – distributed arithmetic – Datapath

[Figure: DA datapath. The inputs X_1 ... X_n arrive as a parallel bit-serial input (bits X_i0, X_i1, ..., X_i(W-1), X_iW) and form the DA-LUT address. The DA-LUT holds the precomputed sums A_1, A_2, A_1 + A_2, A_3, A_1 + A_3, A_2 + A_3, A_1 + A_2 + A_3, A_4, ... The LUT output feeds an add/subtract (+/-) unit with a j-shift that produces Z.]

slide-16
SLIDE 16

Signal processing – distributed arithmetic – Datapath

k-parallel

[Figure: k-parallel DA datapath. The input bits of X_1 ... X_n are distributed over k look-up tables (DA-LUT 1, DA-LUT 2, ..., DA-LUT k), each followed by an accumulator (ACC 1, ACC 2, ..., ACC k); an adder tree combines the accumulator outputs into Z.]

slide-17
SLIDE 17

Signal processing – distributed arithmetic – Example

Recursive convolution for the time-domain simulation of optical multimode intra-system interconnects

( ) ( )

3 53 2 24 1 5 4 1 n n

x f + x f + x f x f + t y f = t y ∗ ∗ ∗ − ∗ ∗

  • Recursive formula to be implemented on 3 intervals
  • Comparison of different implementations
  • Virtex 2000E implementation on the Celoxica RC1000-PP board

slide-18
SLIDE 18

Signal processing – Fast Fourier Transform (FFT)

  • Fourier series were developed by the French mathematician Joseph Fourier (1807)
      The initial idea was applied in the field of heat diffusion
  • The advent of digital computation and the "discovery" of the Fast Fourier Transform (FFT) in the 1950s and 1960s revolutionized the field of signal processing
      Practical processing and meaningful interpretation of signals of exceptional importance for people and industry:
        Medical monitors and scanners
        Modern electronic communications
        Image processing, etc.

slide-19
SLIDE 19

Signal processing – FFT – Basics

  • Fourier transform F(u) of a continuous function f(x):

$$F(u) = \int_{-\infty}^{\infty} f(x)\, e^{-j 2\pi u x}\, dx$$

  • Given F(u), we can obtain f(x) by means of the inverse Fourier transform:

$$f(x) = \int_{-\infty}^{\infty} F(u)\, e^{j 2\pi u x}\, du$$

  • Extension to 2-D, F(u,v):

$$F(u,v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\, e^{-j 2\pi (ux + vy)}\, dx\, dy$$

  • Corresponding inverse in 2-D:

$$f(x,y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F(u,v)\, e^{j 2\pi (ux + vy)}\, du\, dv$$

slide-20
SLIDE 20

Signal processing – FFT – Basics

  • Fourier transform F(u) of a discrete function of one variable:

$$F(u) = \frac{1}{M} \sum_{x=0}^{M-1} f(x)\, e^{-j 2\pi u x / M}, \qquad u = 0, \dots, M-1$$

  • Given F(u), we can obtain f(x) by means of the inverse Fourier transform:

$$f(x) = \sum_{u=0}^{M-1} F(u)\, e^{j 2\pi u x / M}, \qquad x = 0, \dots, M-1$$

  • The brute-force computation of the Fourier transform requires $M^2$ multiplications and additions
  • The performance can be improved to $O(M \log M)$ using the successive doubling method

slide-21
SLIDE 21

Signal processing – FFT – Basics

  • For notational convenience, we rewrite the previous equation as:

$$F(u) = \frac{1}{M} \sum_{x=0}^{M-1} f(x)\, W_M^{ux}, \qquad W_M = e^{-j 2\pi / M}$$

  • We assume M to be of the form

$$M = 2^n = 2K$$

  • With n and K being positive integers, we have:

$$F(u) = \frac{1}{2K} \sum_{x=0}^{2K-1} f(x)\, W_{2K}^{ux}
= \frac{1}{2} \Big[ \underbrace{\frac{1}{K} \sum_{x=0}^{K-1} f(2x)\, W_{2K}^{u(2x)}}_{F_{even}(u)} + \underbrace{\frac{1}{K} \sum_{x=0}^{K-1} f(2x+1)\, W_{K}^{ux}}_{F_{odd}(u)}\, W_{2K}^{u} \Big]$$

slide-22
SLIDE 22

Signal processing – FFT – Basics

  • Since $W_M^{u+M} = W_M^{u}$ and $W_{2M}^{u+M} = -W_{2M}^{u}$, we have

$$F(u) = \frac{1}{2} \big[ F_{even}(u) + F_{odd}(u)\, W_{2K}^{u} \big]$$

$$F(u+K) = \frac{1}{2} \big[ F_{even}(u) - F_{odd}(u)\, W_{2K}^{u} \big]$$

  • The FFT is a fast implementation of the Discrete Fourier Transform (DFT)
      Based on a divide-and-conquer model, it needs $M \log_2 M$ computations
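The two recombination equations translate into a short recursive routine. This is a sketch for illustration, keeping the 1/M normalization used above; the function names are hypothetical, and the brute-force DFT is included only as a reference for checking.

```python
import cmath


def fft_successive_doubling(f):
    """DFT with 1/M normalization; len(f) must be a power of two."""
    M = len(f)
    if M == 1:
        return [complex(f[0])]
    K = M // 2
    F_even = fft_successive_doubling(f[0::2])   # transform of f(2x)
    F_odd = fft_successive_doubling(f[1::2])    # transform of f(2x + 1)
    F = [0j] * M
    for u in range(K):
        w = cmath.exp(-2j * cmath.pi * u / M)   # W_{2K}^u
        F[u] = 0.5 * (F_even[u] + F_odd[u] * w)
        F[u + K] = 0.5 * (F_even[u] - F_odd[u] * w)
    return F


def dft_brute_force(f):
    """Reference M^2 implementation of the same normalized DFT."""
    M = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / M) for x in range(M)) / M
            for u in range(M)]


samples = [1, 2, 3, 4, 0, -1, 2, 5]
fast, slow = fft_successive_doubling(samples), dft_brute_force(samples)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

Each recursion level halves the problem and does M/2 complex multiplications, which gives the M log2 M behavior stated above.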

slide-23
SLIDE 23

Signal processing – FFT – Algorithm

  • The N-point DFT computation can be divided into two N/2-point DFT computations. These N/2-point DFT computations can in turn be divided into two N/4-point DFT computations, and so on.
  • There are log2(N) stages
  • After the division and the DFT computation, a merging process is performed, in which the transforms are reassembled

[Figure: merging process]

slide-24
SLIDE 24

Signal processing – FFT – Algorithm – butterfly unit

  • The butterfly unit
      The reassembling uses the complex elements, say g and h, from the previous stage
      Current elements = $g - h \cdot W_N^k$ and $g + h \cdot W_N^k$
  • The twiddle factor
      The twiddle factor terms $W_N^k = \exp(-j\, 2\pi k / N)$ must be available
      The real and imaginary parts of these factors are stored in a ROM
      The factors correspond to sine (imaginary part) and cosine (real part) functions over a set of N/2 equally spaced angles in the interval from 0 to (N-2)π/N
      Therefore, only an N/2-position memory is needed

[Figure: butterfly unit combining g and h * W_N^k into a sum (+) and a difference (-) output]
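A software view of the twiddle ROM and the butterfly helps to see why N/2 entries suffice. The names `twiddle_rom` and `butterfly` are hypothetical, and the sign convention $W_N^k = \exp(-j 2\pi k/N)$ is an assumption taken from the definition of $W_M$ in the Basics slides.

```python
import cmath


def twiddle_rom(N):
    """The N/2 stored factors W_N^k for k = 0 .. N/2 - 1 (angles 0 .. (N-2)*pi/N)."""
    return [cmath.exp(-2j * cmath.pi * k / N) for k in range(N // 2)]


def butterfly(g, h, w):
    """Return (g + h*w, g - h*w) using a single complex multiplication."""
    t = h * w
    return g + t, g - t


rom = twiddle_rom(8)
print(len(rom), butterfly(1 + 0j, 1 + 0j, rom[0]))
```

The factors for k >= N/2 are the negatives of the stored ones, which the butterfly already covers through its difference output.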
slide-25
SLIDE 25

Signal processing – FFT – FPGA implementation

  • 16-point FFT
  • Pipelined READ, EXECUTE and OUTPUT stages
      Read one complex input (32 bits) in every cycle
      The 16-point input is read in 16 cycles
      Output one complex result (32 bits) in every cycle
      The EXECUTE stage also takes 16 cycles
  • Performance
      Latency: 17 cycles
      Throughput: 1 transform per 16 cycles

[Figure: pipeline diagram with overlapping READ, EXEC and OUTPUT stages]

slide-26
SLIDE 26

Signal processing – FFT – FPGA implementation

  • Completing the execution stage in 16 cycles results in large fan-outs for the internal registers, and therefore in very poor timing characteristics
  • By trial and error, the execution stage is divided into 4 pipelined sub-stages
  • This requires 5x the internal storage (5 x 16 points x 32 bits = 2560 bits), which the XC2V2000 accommodates easily
  • Final performance:
      Latency: 65 cycles
      Throughput: 1 transform per 16 cycles (same as before)

[Figure: pipeline diagram with stages R, E1, E2, E3, E4, O]

slide-27
SLIDE 27

Signal processing – FFT – FPGA implementation

  • Resource requirements:
      1432 out of 21504 flip-flops (6%)
      1037 out of 21504 LUTs (4%)
      65 out of 624 I/O pins (10%)
  • Timing
      Minimum clock period: 5.899 ns (maximum frequency: 169.520 MHz)
  • Power estimation
      440 mW

slide-28
SLIDE 28

Image processing – Basic operators

  • Image processing algorithms usually process an image (a set of points with given characteristics such as color, gray level, luminance, etc.) point by point
  • The resulting pixels depend only on the pixels in the original picture
  • A sequential processor needs quadratic run-time to process a complete image. By using parallelism, each pixel can be computed independently.
  • Many image processing systems are based on the following operators:
      Median filtering
      Basic morphological operations
      Convolution
      Edge detection
  • Algorithms are often based on the moving window operator
slide-29
SLIDE 29

Image processing – Moving window

  • The moving (or sliding) window algorithm usually processes one pixel of the image at a time
  • The value of the pixel is changed by a function of the local pixel region covered by the window
  • The operator moves over the image to cover all pixels
  • For a pipelined implementation, all pixels of the window must be accessed at the same time in each clock cycle
  • The FPGA implementation uses FIFO buffers

[Figure: 3x3 and 5x5 moving windows. The 3x3 window (W11 ... W33) is fed from RAM and uses FIFO1 and FIFO2; the 5x5 window (W11 ... W55) uses FIFO1 to FIFO4. Pixels leaving the window are disposed.]

slide-30
SLIDE 30

Image processing – Moving window – FPGA

  • FIFO implementation
      FIFOs are implemented using circular buffers constructed from multi-ported RAMs (available in, e.g., Virtex FPGAs)
      Indexes keep track of the front and tail items in the buffer
      Block RAMs are readable and writable in one clock cycle. This allows a throughput of one pixel per cycle.
  • 3x3 windows
      2 buffers of size W-3 (W = image width) are used
      The two FIFO buffers must be full to access all the window pixels in one cycle
      In each clock cycle, a pixel is read from the memory and placed into the bottom left corner
      The content of the window is shifted to the right, with the rightmost member being added to the tail of the FIFO
      The top right pixel is disposed after the computation, since it is not used in future computations
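The window registers and the two FIFOs together behave like one shift chain of length 2W + 3. The sketch below models that chain with a bounded deque; the generator name `windows_3x3` is hypothetical, and the window is returned in image order rather than in the mirrored register order of the hardware.

```python
from collections import deque


def windows_3x3(pixels, width):
    """Yield (cx, cy, window) for each full 3x3 window of a row-major pixel stream.

    The chain models 9 window registers plus 2 FIFOs of (width - 3) entries.
    """
    chain = deque(maxlen=2 * width + 3)
    for p, value in enumerate(pixels):
        chain.appendleft(value)          # new pixel enters; the oldest is disposed
        y, x = divmod(p, width)
        if y >= 2 and x >= 2:            # chain is full and the window lies inside the image
            window = [[chain[r * width + c] for c in (2, 1, 0)] for r in (2, 1, 0)]
            yield x - 1, y - 1, window   # coordinates of the window center


for cx, cy, win in windows_3x3(range(12), 4):
    print(cx, cy, win)
```

One pixel is consumed per loop iteration and, once the chain is full, one complete window is available per iteration, matching the one-pixel-per-cycle throughput described above.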

slide-31
SLIDE 31

Image processing – Median filtering

  • Basics
      Impulse noise (or salt-and-pepper noise) in an image has a gray level that is much higher or lower than that of the neighboring points
      Linear filters are not able to remove this type of noise
      Median filters are remarkably effective at removing this type of noise
      Often used in digital signal and image/video applications
  • Implementation
      Use a sliding window of odd size (e.g., 3x3) over an image
      At each window position, the median of the sample values is taken to replace the value at the center of the window
      High computational cost, O(N log N), even with the most efficient sorting algorithms
      General-purpose processors are not a good solution for real-time implementation. This justifies the use of FPGAs.
slide-32
SLIDE 32

Image processing – Median filtering

  • Sequential implementation (pseudo code)

      For x = 1 to #rows
        For y = 1 to #cols
          Build window array
          pixel(x,y) = Median(window array)
        End
      End

  • Complexity O(#rows x #cols x N log N) (N = 3)

[Figure: hardware sorting implementation. The window values 10, 5, 20, 14, 3, 11, 15, 25, 2 are sorted to 2, 3, 5, 10, 11, 14, 15, 20, 25; the median is 11.]
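The pseudo code can be made runnable as follows. This is a minimal sketch with an assumed border policy (border pixels are copied unchanged) and a hypothetical function name.

```python
def median_filter_3x3(image):
    """Return a copy of `image` (a list of rows) with interior pixels median-filtered."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]      # assumption: border pixels stay unchanged
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Build the window array from the 3x3 neighborhood.
            window = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[len(window) // 2]
    return out


print(median_filter_3x3([[10, 5, 20], [14, 3, 11], [15, 25, 2]])[1][1])
```

The example input is the window from the figure, so the filtered center value is the median shown there.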

slide-33
SLIDE 33

Image processing – Median filtering – result

[Figure: original image and filtered image]

slide-34
SLIDE 34

Image processing – Basic Morphological Operators

  • Morphology in image processing studies the appearance of objects
  • Useful, for example, in:
      Skeletonization
      Edge detection
      Restoration
  • Processing
      The image is processed pixel by pixel using a structuring element (the sliding window)
      The window may or may not fit the image
  • Most basic building blocks:
      Erosion (shrinks or erodes an object in the image)
      Dilation (grows the image)
      Operations like opening and closing of an image can be derived by performing erosion and dilation in different orders

slide-35
SLIDE 35

Image processing – Basic Morphological Operators

  • Erosion
      Replaces the center pixel in the sliding window by the smallest pixel value in the window array
      The bright area of the image shrinks, or erodes
  • Dilation
      Replaces the center pixel in the sliding window by the greatest pixel value in the window array
      The bright area of the image grows
  • Algorithm
      Same as the median
      Instead of selecting the median element, the minimum is selected for erosion and the maximum is selected for dilation
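Since only the selection rule differs from the median filter, one routine parameterized by the selector covers both operators. The names `morph_3x3`, `erode` and `dilate` are hypothetical, and leaving border pixels unchanged is an assumption of this sketch.

```python
def morph_3x3(image, select):
    """Apply `select` (min for erosion, max for dilation) over each 3x3 window."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]      # assumption: border pixels stay unchanged
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            out[y][x] = select(image[y + dy][x + dx]
                               for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out


def erode(image):
    return morph_3x3(image, min)


def dilate(image):
    return morph_3x3(image, max)


spot = [[0] * 5 for _ in range(5)]
spot[2][2] = 9
print(dilate(spot)[1][1], erode(spot)[2][2])
```

An opening can then be written as `dilate(erode(image))` and a closing as `erode(dilate(image))`.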

slide-36
SLIDE 36

Image processing – Basic Morphological Operators – result

[Figure: dilation, original image, erosion]

slide-37
SLIDE 37

Image processing – Convolution – Basics

  • Convolution multiplies two arrays of numbers of different sizes and produces a third array of numbers
  • In image processing, convolution implements operators whose output pixels are computed as linear combinations of certain input pixel values
  • 1-D convolution
      Formally, convolution takes two input functions f(x) and g(x) and generates h(x) = f(x) * g(x), where g(x) is referred to as the filter:

$$h(x) = \int_{-\infty}^{\infty} f(\tau)\, g(x - \tau)\, d\tau$$

  • 2-D convolution
      Most important in modern image processing
      A finite-size window (convolution mask) is scanned over the image
      The output pixel value is the weighted sum of the input pixels within the window
      The weight is the value of the filter assigned to each pixel in the window

slide-38
SLIDE 38

Image processing – Convolution – Basics

  • 2-D convolution
      Mathematically represented by the following equation, where x is the input image, h is the filter and y is the output image (the sums over i and j run over the image height and width):

$$y(m, n) = \sum_i \sum_j h(i, j)\, x(m - i, n - j)$$

      Supports a virtually infinite variety of masks, each with its own feature
      3x3 convolutions are most commonly used and operate only on a pixel and its directly adjacent neighbours

[Figure: a 3x3 pixel window P1 ... P9 is combined with the weights W1 ... W9 to give the output pixel P]

$$P = \frac{\sum_{i=1}^{9} W_i P_i}{\sum_{i=1}^{9} W_i}$$
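The normalized weighted sum over a 3x3 window can be sketched as follows. The function name `convolve_3x3` is hypothetical; the mask is applied without flipping, which equals a true convolution only for symmetric masks, and border pixels are left unchanged by assumption.

```python
def convolve_3x3(image, mask):
    """Replace each interior pixel by sum(W_i * P_i) / sum(W_i) over its 3x3 window."""
    rows, cols = len(image), len(image[0])
    norm = sum(sum(row) for row in mask) or 1   # avoid dividing by zero for zero-sum masks
    out = [row[:] for row in image]             # assumption: border pixels stay unchanged
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            acc = sum(mask[i][j] * image[y + i - 1][x + j - 1]
                      for i in range(3) for j in range(3))
            out[y][x] = acc / norm
    return out


smooth = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
flat = [[7] * 3 for _ in range(3)]
print(convolve_3x3(flat, smooth)[1][1])
```

Dividing by the sum of the weights keeps the overall brightness unchanged, so a uniform region stays uniform after filtering.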

slide-39
SLIDE 39

Image processing – Convolution – Gaussian filters

  • Gaussian convolution filters
      1-D:

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}$$

      2-D:

$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

      The idea is to use the 2-D distribution as a point spread function. This is achieved by convolution.
      A discrete approximation of the Gauss function is required to perform the convolution
      In theory, the Gauss distribution is nonzero everywhere. Therefore, an infinitely large convolution kernel would be required.
      In practice, the convolution kernel is truncated, as shown in the figures

3x3 Gaussian smooth filter:

$$\frac{1}{256} \begin{bmatrix} 21 & 31 & 21 \\ 31 & 48 & 31 \\ 21 & 31 & 21 \end{bmatrix}$$

5x5 Gaussian smooth filter (σ = 1.4; the printed scale factor is 1/115, although the entries sum to 159):

$$\frac{1}{115} \begin{bmatrix} 2 & 4 & 5 & 4 & 2 \\ 4 & 9 & 12 & 9 & 4 \\ 5 & 12 & 15 & 12 & 5 \\ 4 & 9 & 12 & 9 & 4 \\ 2 & 4 & 5 & 4 & 2 \end{bmatrix}$$
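A truncated kernel can be obtained by sampling the 2-D Gauss function on a small grid. This sketch uses a hypothetical function name and normalizes the samples to sum to 1, so the constant factor 1/(2πσ²) cancels and is omitted; it does not reproduce the integer kernels shown above.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample G(x, y) on a size x size grid (size odd) and normalize to sum 1."""
    half = size // 2
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            for x in range(-half, half + 1)]
           for y in range(-half, half + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


kernel = gaussian_kernel(5, 1.4)
for row in kernel:
    print(" ".join(f"{v:.3f}" for v in row))
```

Integer kernels for hardware are typically derived by scaling such samples and rounding, which trades a small approximation error for multiplier-free or narrow-multiplier arithmetic.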

slide-40
SLIDE 40

Image processing – Convolution – Gaussian filters

[Figure: original image and result of the convolution]

slide-41
SLIDE 41

Image processing – Edge detection – Basics

  • Edges
      Located where the image has a strong intensity contrast
      Often occur at image locations representing object boundaries
  • Edge detection
      Extensively used in image segmentation, i.e., dividing an image into areas corresponding to different objects
      Representing an image by its edges significantly reduces the amount of data
      Since edges correspond to strong illumination gradients, the derivatives of the image are used to compute the edges
      Operators often used are:
        Laplace operator
        Sobel operator
        Canny edge detection algorithm

slide-42
SLIDE 42

Image processing – Edge detection – operators

  • Laplace
      Gradient operator
      Intensity differences are enhanced, so the edges are more pronounced
      For each pixel, the gray values of its four neighbours (top, left, bottom, right) are subtracted from four times its own value

Laplace operator:

$$\begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$

  • Sobel operator
      Combination of two 1-D operators
      One for detecting horizontal edges
      One for detecting vertical edges

Sobel-x:

$$\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$

Sobel-y:

$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$$
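Applying the masks at a pixel is a plain weighted sum. In the sketch below the helper names are hypothetical, the mask values follow the common definitions of the Laplace and Sobel operators (zeros included), and combining the two Sobel responses as |gx| + |gy| is an assumed approximation of the gradient magnitude.

```python
LAPLACE = [[0, -1, 0], [-1, 4, -1], [0, -1, 0]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def apply_mask(image, mask, y, x):
    """Weighted sum of the 3x3 neighborhood of the interior pixel (y, x)."""
    return sum(mask[i][j] * image[y + i - 1][x + j - 1]
               for i in range(3) for j in range(3))


def sobel_magnitude(image, y, x):
    """Approximate gradient magnitude as |gx| + |gy| (an assumed simplification)."""
    gx = apply_mask(image, SOBEL_X, y, x)
    gy = apply_mask(image, SOBEL_Y, y, x)
    return abs(gx) + abs(gy)


step = [[0, 0, 9, 9] for _ in range(3)]   # vertical edge between columns 1 and 2
print(sobel_magnitude(step, 1, 1), apply_mask(step, LAPLACE, 1, 1))
```

On a uniform region all three masks return 0, because the weights of each mask sum to zero.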

slide-43
SLIDE 43

Image processing – Edge detection – result

[Figure: original image and result of the edge detection]

slide-44
SLIDE 44

Image processing – Use of reconfiguration

  • Intelligent image processing system
      According to the input image and other conditions:
        Some operations are performed to improve the image
          Filtering (the correct filter is chosen)
          Smoothing
          Segmentation (edge detection)
          Skeletonization
        Some adjustments are made to the image input hardware
          Calibration
          Focusing
      Everything is done while the system keeps running
        Fixed parts of the system run continuously
        Reconfigurable parts must be replaced at run-time

slide-45
SLIDE 45

Mechanical Control – Basics

  • The controller's task is to influence the dynamic behavior of a plant
  • The input values for the plant depend on the plant's outputs (feedback)
  • The plant is modeled as a linear time-invariant (LTI) system
  • The controller is modeled as an LTI system
  • Time discretization
      Scaling to fixed-point
      k, k+1, k+2, ...: sample points
      T: sample period
      t_c: calculation time of the controller

[Figure: control loop with plant, controller, converters (D/A-C) and reference value; timing diagram with sample points k, k+1, k+2, sample period T and calculation time t_c]

slide-46
SLIDE 46

Mechanical Control – Basics

$$\vec{x}_{k+1} = A\,\vec{x}_k + B\,\vec{u}_k$$

$$\vec{y}_k = C\,\vec{x}_k + D\,\vec{u}_k$$

u: controller input, x: state, y: output
A, B, C, D: constant coefficient matrices

Combine A, B, C, D into M and represent the computation as a set of scalar products of each row of M with $\vec{v}$, where $\vec{v}$ stacks $\vec{x}_k$ and $\vec{u}_k$ and $\vec{z}$ stacks $\vec{x}_{k+1}$ and $\vec{y}_k$:

$$\vec{z} = M\,\vec{v} = \begin{bmatrix} M_{row\,1} \cdot \vec{v} \\ M_{row\,2} \cdot \vec{v} \\ \dots \end{bmatrix}$$
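One sample step of such a controller reduces to a handful of scalar products. The sketch assumes M is assembled as the block matrix [[A, B], [C, D]], with v the concatenation of state and input; the function name `controller_step` and the example matrices are hypothetical.

```python
def controller_step(M, x, u):
    """Compute z = M v with v = x followed by u; return (x_next, y)."""
    v = list(x) + list(u)
    # One scalar product per row of M; these are independent of each other.
    z = [sum(m * vi for m, vi in zip(row, v)) for row in M]
    n = len(x)
    return z[:n], z[n:]


# Assumed example: A = [[1, 1], [0, 1]], B = [[0], [1]], C = [[1, 0]], D = [[0]]
M = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 0]]
print(controller_step(M, [2, 3], [4]))
```

Because every row of M is constant, each scalar product is a candidate for a distributed-arithmetic implementation.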

slide-47
SLIDE 47

Mechanical Control – DA Implementation

Scalar product of a constant vector $\vec{a}$ and a variable vector $\vec{x}$:

$$z = \vec{a} \cdot \vec{x} = \sum_{i=1}^{n} a_i x_i \qquad (1)$$

  • $x_i$ is a w-bit fixed-point number (here x is simply unsigned in [0, 2)), with bit weights $2^0, 2^{-1}, 2^{-2}, 2^{-3}, \dots, 2^{-(w-1)}$:

$$x_i = \sum_{j=0}^{w-1} x_{ij}\, 2^{-j} \qquad (2)$$

  • Substitute (2) into (1) and swap the sums:

$$z = \sum_{i=1}^{n} a_i \sum_{j=0}^{w-1} x_{ij}\, 2^{-j} = \sum_{j=0}^{w-1} 2^{-j} \sum_{i=1}^{n} x_{ij}\, a_i \qquad (3)$$

  • Since $x_{ij} \in \{0, 1\}$, the right sum can take just $2^n$ values
  • Pre-compute it and store it in a $2^n \times w$ ROM as LUT

slide-48
SLIDE 48

Mechanical Control – DA Implementation (Parallel)

$$z = \sum_{j=0}^{w-1} 2^{-j}\, \mathrm{DALUT}(x_{[1..n],j})$$

[Figure: c-bit-at-a-time architecture. The input register holds x_1 ... x_n (w bits each) and is shifted by c bits (>> c) per step. The coefficient look-up tables DA LUT 1, DA LUT 2, ..., DA LUT c are addressed by the n-bit slices x[1..n],w-1, x[1..n],w-2, ..., x[1..n],w-c. The data path (DP) is a (c + 1)-input adder with shifts >> 0 ... >> c-2, >> c-1 that accumulates the result z, which is fed back shifted by >> c.]
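The fixed-point form of the equation can be checked with exact rational arithmetic. This sketch is an illustration rather than the architecture itself: the function name `da_scalar_product` is hypothetical, `x_bits[i][j]` is assumed to hold bit j (weight 2^-j) of input i, and the c table reads per step, which happen in parallel in hardware, are modeled by an inner loop.

```python
from fractions import Fraction


def da_scalar_product(a, x_bits, c=1):
    """z = sum_j 2^-j * DALUT(x[1..n],j), consuming c bit positions per step."""
    n, w = len(a), len(x_bits[0])
    # DALUT entry for address `addr`: sum of the a_i whose address bit is set.
    dalut = [sum(a[i] for i in range(n) if (addr >> i) & 1)
             for addr in range(1 << n)]
    z = Fraction(0)
    for start in range(0, w, c):                    # one step per group of c bits
        for j in range(start, min(start + c, w)):   # c LUT reads (parallel in hardware)
            addr = sum(x_bits[i][j] << i for i in range(n))
            z += Fraction(dalut[addr], 2 ** j)
    return z


a = [3, -2]
x_bits = [[1, 0, 1],    # x_1 = 1 + 1/4 = 1.25
          [0, 1, 1]]    # x_2 = 1/2 + 1/4 = 0.75
print(da_scalar_product(a, x_bits, c=1), da_scalar_product(a, x_bits, c=3))
```

The result does not depend on c; a larger c only reduces the number of steps from w to about w/c at the cost of c look-up tables and a wider adder.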

slide-49
SLIDE 49

Mechanical Control – Multi controller system

  • Many controller modules, optimized for different operating regimes
  • Controllers have different structures, not only different coefficients
  • A supervisor observes the plant and determines the best controller module
  • A multiplexer switches the controller outputs

slide-50
SLIDE 50

Mechanical Control – Multi controller system

  • Periodic execution of the task graph
  • Conditional branching to controller modules (CM)
  • Various area/time trade-offs are possible for implementing the CMs

[Figure: execution time t_ex versus area]

slide-51
SLIDE 51

Mechanical Control – Multi controller system – Use of reconfiguration

  • One-slot solution

slide-52
SLIDE 52

Mechanical Control – Multi controller system – Use of reconfiguration

  • Two-slot solution

slide-53
SLIDE 53

Mechanical Control – Multi controller system – Use of reconfiguration

Execution time:

$$t_{ex}^{CM} = f(A_{CM})$$

Reconfiguration time:

$$t_{reconfig}^{CM} = g(A_{CM}) \approx r \times A_{CM}$$

Two slots:

$$A_{CM}^{(2)} = A/2, \qquad T_{min}^{(2)} = t_{ex}^{(2)} = f(A/2)$$

One slot:

$$A_{CM}^{(1)} = A, \qquad T_{min}^{(1)} = t_{ex}^{(1)} + t_{rec}^{(1)} = f(A) + g(A)$$
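The trade-off between the two solutions can be explored numerically. Everything specific in this sketch is an assumption made for illustration: the function names, the reading that the two-slot solution gives each module half the area while hiding reconfiguration, and the example models f(A) = 1000/A and g(A) = 0.01 * A.

```python
def min_period_two_slots(f, area):
    """Two slots: each CM gets half the area; reconfiguration overlaps execution."""
    return f(area / 2)


def min_period_one_slot(f, g, area):
    """One slot: the CM gets the full area, but reconfiguration adds to the period."""
    return f(area) + g(area)


def f(a):
    return 1000 / a      # hypothetical execution-time model


def g(a):
    return 0.01 * a      # hypothetical reconfiguration-time model (r = 0.01)


print(min_period_two_slots(f, 100), min_period_one_slot(f, g, 100))
```

With these assumed models the one-slot solution achieves the shorter period; a larger r, i.e. slower reconfiguration, shifts the balance toward the two-slot solution.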

slide-54
SLIDE 54

Mechanical Control – Multi controller system – FPGA Implementation – inverted pendulum

Synthesis results of the controller:

  • FPGA: Virtex 800
  • Area: 1003 slices (ca. 10%)
  • Clock rate: > 70 MHz
  • Computation time: t_c < 1 µs

Controller characteristics:

  • Dimensions: p = 3, n = 2, q = 3
  • Word width: input 16 bit, internal 32 bit

[Figure: the Raptor 2000 board. Host: simulates the plant, host configurations. RAPTOR 2000: communication resources, configuration manager. FPGA: controller module, supervisor, communication.]

slide-55
SLIDE 55

Mechanical Control – Multi controller system – FPGA Implementation – Architecture

[Figure: timelines of the two-slot implementation and the one-slot implementation, showing the controller modules CM 1, CM 2 and CM 3 and the "reconfiguring…" phases]

slide-56
SLIDE 56

Mechanical Control – Multi controller system – FPGA Implementation – Architecture

[Figure: FPGA floorplan with the reconfigurable module slot (controller module), the fixed module slot (supervisor) and bus macros]