Institut für Technische Informatik Chair for Embedded Systems - Prof. Dr. J. Henkel

Lars Bauer, Jörg Henkel

Lecture in Summer Semester (SS) 2014

Reconfigurable and Adaptive Systems (RAS)


RAS Topic Overview

  • 1. Introduction
  • 2. Overview
  • 3. Special Instructions
  • 4. Fine-Grained Reconfigurable Processors
  • 5. Configuration Prefetching
  • 6. Coarse-Grained Reconfigurable Processors
  • 7. Adaptive Reconfigurable Processors
  • 8. Fault-tolerance by Reconfiguration

Chapter 5, Configuration Prefetching, covers:

  • Motivation and Definition
  • Static Prefetching
  • Clock Frequency Variation
  • Dynamic Prefetching
  • Area Models

5.1 Motivation and Definition

Recall: Performing Run-time Reconfigurations

PRISC
  • Reconfiguration is triggered implicitly by SI execution
  • Reconfiguration time: 100-600 cycles
  • Fast reconfiguration time at the cost of very limited SI complexity

XiRISC
  • pGA-load: load a configuration into the array
  • pGA-free: remove a configuration
  • 16 cycles to receive a complete configuration if it is available in the 2nd-level configuration cache
  • Approx. '128 + startup' cycles (not explicitly stated by the authors) to receive it from external memory; 8 times slower because the 2nd-level cache bus is 8 times wider (256 bit) than the memory bus (32 bit)

Recall: Performing Run-time Reconfigurations (cont'd)

Garp
  • gaconf reg: load (or switch to) the configuration at the address given by reg
  • gasave reg: save all array data state to memory at the address given by reg
  • garestore reg: restore the previously saved data state from memory at the address given by reg
  • Approx. 50 μs to reconfigure 32 rows (12 bus cycles per row plus some startup time)

MOLEN
  • p-set address: reconfigure those parts that seldom change
  • c-set address: reconfigure those parts not addressed by p-set
  • set-prefetch address: prefetches the microcode that is responsible for a p-set or c-set operation
  • Reconfiguration time between 2 and 12 ms

Recall: Performing Run-time Reconfigurations (cont'd)

Reconfiguration can last from a few cycles (if available in cache) over microseconds (for limited SI complexity) to milliseconds (powerful FPGA fabrics).

If a configuration is not available when the SI shall execute, then the system performance is significantly affected:
  • Either stall the execution until the reconfiguration completes
  • Or use the core ISA to implement the SI functionality (trap handler or conditional branch)

Solution: if some region of the reconfigurable fabric is currently free (i.e. not occupied by another configuration), then configuration prefetching can be used to perform the reconfiguration before the SI is needed.

Configuration Prefetching

Definition: start loading the configuration data of a particular SI before that SI is actually used.

Goal: minimize the performance loss due to pending reconfigurations.
  • Typically: try to finish the reconfiguration before the SI is executed
  • Note: sometimes it is better to avoid reconfiguration for an SI and to execute it with the core ISA instead (to avoid thrashing)
  • Configuration prefetching can be used to transfer configuration data from external memory to the configuration cache (preparing a reconfiguration) or to perform the reconfiguration right away

Configuration Prefetching (cont'd)

Example control-flow graph (figure; node annotations include 'Execute SI', 'prefetch SI', 'Return from subroutine', and the time for reconfiguration):
  • Each node is a Basic Block (BB)
  • The color indicates the execution frequency
  • The edges show the control flow
  • Red edges are function calls (dashed lines) or returns (solid lines)


Relevant Parameters for Prefetching

Temporal distance between starting the prefetching operation and the SI execution
  • Depends on control flow
  • Starting too late: the SI is demanded before prefetching completes
  • Starting too early: potential conflicts between currently demanded SIs and SIs that shall be prefetched

Probability that the SI executions are reached
  • Depends on control flow
  • Typically: when prefetching is started earlier, the uncertainty whether the SI execution is eventually reached is higher

Number of SI executions
  • Depends on control flow
  • If the SI is executed rather seldom, it might be better to execute it using the core ISA rather than speculating on a prefetch operation

Aborting prefetching operations

False prefetching: due to control-flow uncertainty, it can happen that prefetching for an SI was triggered and, even before it finishes, it becomes clear that the SI is not going to execute at all.

'Still pending' false prefetching:
  • The prefetching was triggered to a queue and did not start yet
  • Simply remove it from that queue

'Already running' false prefetching:
  • The prefetching operation is currently running
  • For line-based reconfigurable fabrics (e.g. Garp or XiRisc): finish prefetching the current line and abort afterwards (short delay)
  • For FPGA-based reconfigurable fabrics, aborting may not be possible (unless prefetching to a cache)
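The two abort cases above can be sketched in a few lines. This is a toy model, not part of any real prefetch controller: the queue interface, SI names, and return strings are illustrative assumptions.

```python
from collections import deque

pending = deque(["SI3", "SI4"])   # prefetches triggered but not yet started
running = "SI1"                   # prefetch currently reconfiguring the fabric

def cancel_prefetch(si, line_based_fabric):
    """Handle a false prefetch for `si` according to its current state."""
    if si in pending:             # 'still pending': simply drop it from the queue
        pending.remove(si)
        return "removed from queue"
    if si == running:             # 'already running'
        if line_based_fabric:     # e.g. Garp/XiRisc: finish the current line
            return "aborted after current line"
        return "cannot abort (FPGA-based fabric)"
    return "not queued"

print(cancel_prefetch("SI3", line_based_fabric=True))  # removed from queue
```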

Aborting prefetching operations (cont'd)

Figure: SI-centric control-flow graph with the nodes I1-I4 (src: [LH02]). Rectangles denote the usage of an SI; circles denote sets of instructions (potentially with embedded control flow, subroutine calls etc.).

  • In the nodes I4 and I3, all 4 SIs may be prefetched
  • When the control flow moves to I1, it is clear that SIs 3 and 4 are not demanded
  • Their prefetching may be stopped (if possible)


5.2 Static Prefetching

Static Prefetching

Idea: at compile time, analyze the control-flow graph and embed prefetching instructions into the code that statically decide which SIs shall be prefetched (src: [LH02]).

Required: the probability of which branch will be taken
  • From profiling; shown as edge labels (in percent)

Next step: at each node, establish a list of all reachable SIs, sorted by the probability to reach them.

Static Prefetching (cont'd)

Probability of a node n to reach SI s: sum over all paths p from n to s of the product of the edge probabilities along the path (src: [LH02]):

  P(n,s) = Σ over paths p from n to s of ( Π over edges e on path p of prob(e) )

Example: 3 paths to reach SI 3 from node I10:
  • 0.3 * 0.4 * 0.4 + 0.3 * 0.6 * 0.4 + 0.2 * 0.8 * 0.4 = 0.184
  • Probability to reach SI 3 from node I10: 18.4%

Static Prefetching (cont'd)

Figure: the control-flow graph annotated per node with all SI nodes that can be reached, in decreasing probability (prefetch lists such as P1, P2, P3, P4, P4,3, P1,2, P4,3,2, P1,3,2, P3,4,1,2, P1,4,3,2). src: [LH02]

Algorithm moving backwards through the graph:
  • Initialize a queue with all 'SI-nodes' (squares, 100% reachability)
  • Remove a node n from the queue and update the probability information of all its predecessors that they can also reach the SIs that can be reached from node n
  • When all successors of a node are processed, add it to the queue
  • Iterate until the queue is empty
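This reachability computation can be sketched compactly, here as memoized recursion over a small hypothetical DAG whose edge probabilities reproduce the 18.4% example from the previous slide (the node names A-D and the exit edges are illustrative assumptions, not the graph from [LH02]):

```python
from collections import defaultdict

# Hypothetical control-flow DAG: node -> [(successor, branch probability)].
# Chosen so that reaching SI 3 from "I10" yields
# 0.3*0.4*0.4 + 0.3*0.6*0.4 + 0.2*0.8*0.4 = 0.184.
EDGES = {
    "I10": [("A", 0.3), ("B", 0.2), ("exit", 0.5)],
    "A":   [("C", 0.4), ("D", 0.6)],
    "B":   [("D", 0.8), ("exit", 0.2)],
    "C":   [("SI3", 0.4), ("exit", 0.6)],
    "D":   [("SI3", 0.4), ("exit", 0.6)],
}
SI_NODES = {"SI3"}
_cache = {}

def reach_probabilities(node):
    """For each SI: sum over all paths from `node` of the product of the
    branch probabilities along the path (the backward pass of the slide's
    queue-based algorithm, expressed as memoized recursion)."""
    if node in _cache:
        return _cache[node]
    probs = defaultdict(float)
    if node in SI_NODES:
        probs[node] = 1.0                      # SI nodes: 100% reachability
    for succ, p in EDGES.get(node, []):
        for si, q in reach_probabilities(succ).items():
            probs[si] += p * q                 # accumulate path products
    _cache[node] = dict(probs)
    return _cache[node]

# Prefetch candidates for a node, most probable SI first
candidates = sorted(reach_probabilities("I10").items(), key=lambda kv: -kv[1])
```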
Static Prefetching (cont'd)

Figure: the annotated graph with the prefetch lists per node (src: [LH02]).

Depending on the capacity of the FPGA, limit the prefetches to e.g. the 2 most probable SIs (this affects I7, I8, I9, I10).

Some prefetch instructions are redundant (e.g. due to previously executed prefetches; I1, I4)
  • The prefetch at I2 may not be removed, because the control flow may come from node I8 (no SI 2 prefetched at I8) or from I5 (SI 1 might have started first and needs to be aborted now)
  • The prefetch at I6 may be removed, even though I9 prefetches P3 first; here it is not beneficial to abort P3 to start P4 from scratch


5.3 Clock Frequency Variation

Clock Frequency Variation

Static prefetching works well when the control flow is known at compile time
  • i.e. the edge probabilities are rather constant at run time

In this case, both the reconfiguration time (RT) and the application execution time (ET) until the next SI are known with good probability.

Figure: timeline of alternating RT and ET phases. src: [DH06]

Clock Frequency Variation (cont'd)

Observation: performance of the application is (rather) bad until the reconfiguration is completed
  • Either stall or use the core ISA

Idea: slow down the execution to match the reconfiguration time
  • Advantage: reduced energy consumption at minor performance degradation
  • No potential for slowing down where the execution was too slow anyway

Figure: timeline of RT and ET phases with the ET phases stretched to the length of the overlapping RT phases. src: [DH06]

Clock Frequency Variation (cont'd)

Problem: potentially each RT/ET block has its own optimal ET clock frequency
  • The number of different clocks that can be statically provided is limited

Idea: sort all the different frequencies and cluster them (src: [DH06])

Clock Frequency Variation (cont'd)

Resulting reconfiguration/execution schedule (figure: timeline of RT and slowed-down ET phases), potentially tolerating a small performance loss for further power reduction.
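The slow-down and clustering idea can be sketched as follows. The RT/ET values, the number of clock domains, and the contiguous-grouping heuristic are illustrative assumptions; [DH06] describes the actual clustering method:

```python
# Per RT/ET block: reconfiguration time and execution time (at the nominal
# clock), in microseconds (assumed example values).
RT = [400, 300, 500]
ET = [100, 240, 600]

def ideal_factors(rt, et):
    """Ideal clock scaling per block: stretch ET to match RT (factor et/rt).
    If ET is already longer than RT, there is no potential for slowing down."""
    return [min(1.0, e / r) for r, e in zip(rt, et)]

def cluster_factors(factors, k):
    """Sort the ideal factors and split them into k contiguous groups; each
    group runs at its maximum member, so no ET phase overruns its RT."""
    s = sorted(set(factors))
    groups = [s[i * len(s) // k:(i + 1) * len(s) // k] for i in range(k)]
    reps = [max(g) for g in groups if g]
    return [min(f for f in reps if f >= x) for x in factors]

factors = ideal_factors(RT, ET)         # [0.25, 0.8, 1.0]
assigned = cluster_factors(factors, 2)  # with 2 clock domains: [0.25, 1.0, 1.0]
```

Rounding a block up to the next provided frequency only costs energy savings; rounding down would stretch ET beyond RT and cost performance, which is why each cluster uses its maximum.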


5.4 Dynamic Prefetching


Problems of Static Prefetching

Fixed compile-time decisions for prefetching and clock frequency rely on a fixed control flow
  • Actually, the control-flow graph can be considered constant, but the edge probabilities may change over time

Recall: H.264 video encoder

Problems of Static Prefetching (cont'd)

Figure: H.264 MB encoding loop (block diagram): MB-type decision (I or P) and mode decision (for I or P); ME: SAD, SATD, RD; if MB_Type = P_MB then MC, else IPRED; DCT/HT/Q; IDCT/IHT/IQ; CAVLC; in-loop de-blocking filter; each stage of the encoding engine loops over the MBs.

The decision whether a MacroBlock (MB) is encoded as I-MB or as P-MB depends only on the input video frame
  • The edge probability (in the control-flow graph) of a frame is only known after the frame was encoded

Problems of Static Prefetching (cont'd)

Figure: distribution of I-MBs [%] per frame in a CIF (352x288: 396 MBs) video scene, frames 50-700 [BSTH07]. The strongly varying distribution raises the questions: is HW for I-MB needed at all? Is HW for P-MB needed at all?


Dynamic Prefetching

Figure: execution timeline with prefetching points P (PME, PEE, PLF) preceding the kernels ME (Motion Estimation), EE (Encoding Engine), and LF (Loop Filter).

Problem: static prefetching cannot reflect dynamic execution behavior.

Observation: the execution behavior does not change randomly (it could, but this rarely happens).

Idea: maybe a kernel (e.g. EE) can learn from its previous executions and provide information for better prefetching in the next iteration.


Dynamic Prefetching (cont'd)

Classification:
  • Static prefetching: purely based on compiler-determined decisions
  • Dynamic prefetching: based on run-time adaptation (except the initial predictions); requires online monitoring to obtain feedback on the actual execution
  • Hybrid prefetching: combines both types, i.e. some parts are kept adaptive and others are fixed; sometimes the compiler/programmer simply knows it better


Case Study: Hybrid Prefetching for RISPP

Prefetching is partitioned into 2 parts:
  • 1. Forecasting the expected SI execution frequencies (i.e. how often they are expected to be executed)
  • 2. Selecting SI implementations (e.g. core ISA or HW) according to the forecast
  • Here: focus on forecasting (selection comes in a later slide set)

Figure: control-flow graph (base blocks and Forecast Blocks, FB) of an outer loop: FB1 forecasts the future SI execution frequency before an inner loop that executes the SI (leaving time for reconfiguration); FB2 forecasts that the SI is no longer required in this loop and potentially forecasts other SIs; potentially other inner loops follow.

Execution Sequence of Forecast Blocks

Figure: timeline over two iterations of the outer loop, each with FB1, time for reconfiguration, SI executions, FB2, and other inner loops. The FB executions form a sequence FBt, FBt+1, FBt+2, FBt+3, ...; each FBt+i eventually corresponds to FB1 or FB2 (as in this example only 2 different FBs are used).

An error back-propagation approach changes the prediction for the future:
  • The error is back-propagated to the location of a previous FB
  • The resulting fine-tuning indirectly changes the Forecast Value for a future execution of FB1


Error Back-Propagation

Consider the forecast blocks of a particular SI independently of the forecast blocks of other SIs.

After the execution of a hot spot, there are 3 parameters available:
  • 1. The initial forecast for the SI execution frequency
  • 2. The actual monitored SI execution frequency
  • 3. The forecasted future SI execution frequency

In general, there could be multiple subsequent forecasts for one hot spot/SI execution before it is actually executed. In our example, there is a dedicated non-modifiable ('the programmer knows it better') Forecast Block after the hot spot that informs that the hot spot is finished (and that the next one starts).

Figure: timeline "PEE start, EE, PEE end, ..., PEE start, EE, PEE end" with fine-tuning after each hot-spot execution.

Error Back-Propagation (cont'd)

Figure: sliding window over the Forecast Block sequence FBt-2, FBt-1, FBt, FBt+1, FBt+2 over time, each FB with its Forecast Value FV(FBt+i) and Monitoring Value M(FBt+i); at the current point in time, the forecast error E(FBt+1) and the back-propagation value are calculated using the parameters γ and α.

Error:

  E(FBt+1) := M(FBt+1) + γ·FV(FBt+1) − FV(FBt)

New Forecast Value:

  FV(FBt) := FV(FBt) + α·E(FBt+1)

Error Back-Propagation (cont'd)

Parameter α determines the strength of the back-propagation
  • α=0: no back-propagation, i.e. static prefetching
  • α=1: completely overwrite the previous value; can lead to a very bad prediction in case of thrashing

Examples

With α=1 (columns: i-th fine-tuning | old Forecast Value FV(FBt) | monitored value M(FBt+1) | error E(FBt+1) | new Forecast Value FV(FBt)):

  1 | 100  | 0   | 0-100 = -100    | 100+1*(-100) = 0
  2 | 0    | 100 | 100-0 = 100     | 0+1*(100) = 100
  3 | 100  | 0   | 0-100 = -100    | 100+1*(-100) = 0
  4 | 0    | 100 | 100-0 = 100     | 0+1*(100) = 100

With α=0.5 (same columns):

  1 | 100  | 0   | 0-100 = -100    | 100+0.5*(-100) = 50
  2 | 50   | 100 | 100-50 = 50     | 50+0.5*(50) = 75
  3 | 75   | 0   | 0-75 = -75      | 75+0.5*(-75) = 37.5
  4 | 37.5 | 100 | 100-37.5 = 62.5 | 37.5+0.5*(62.5) = 68.75
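The α=0.5 fine-tuning chain can be reproduced with a few lines (γ is omitted here, i.e. set to 0, matching the example; variable names are illustrative):

```python
def fine_tune(fv, monitored, alpha):
    """One fine-tuning step: error = M(FBt+1) - FV(FBt), then
    FV(FBt) := FV(FBt) + alpha * error (gamma omitted, i.e. gamma = 0)."""
    return fv + alpha * (monitored - fv)

fv, history = 100.0, []
for m in [0, 100, 0, 100]:    # the monitored values alternate between 0 and 100
    fv = fine_tune(fv, m, alpha=0.5)
    history.append(fv)

# history == [50.0, 75.0, 37.5, 68.75]
```

With α=0.5 this is plain exponential smoothing: the forecast converges toward a weighted average instead of oscillating as in the α=1 case.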

Error Back-Propagation (cont'd)

Parameter γ determines how strongly the future prediction should be considered for back-propagation
  • This shifts the forecast to an earlier point in time
  • Could lead to too early forecasts
  • Example (figure: chain FB1 → FB2 → FB3 → FB4 → SI over time): after multiple iterations of that chain, the forecast for the SI (at the right end) in FB4 will eventually shift back to FB3 and finally to FB1

Error Back-Propagation (cont'd)

One may update more than the directly preceding FBt
  • Use the parameter λ to weight the updates: the directly preceding FB receives the weight (1-λ), the earlier ones (1-λ)λ, (1-λ)λ², (1-λ)λ³, ...
  • For a large λ (note: all parameters are between 0 and 1), a rapid change in the execution frequency is back-propagated rather fast

Potentially all (!) preceding FBt need to be updated
  • Furthermore, FBt-i and FBt-j (i≠j) may denote the same prefetching block (from different loop iterations)
  • Practically infeasible for a hardware implementation

Figure: sliding window over FBt-2 ... FBt+2 with the weights (1-λ), (1-λ)λ, (1-λ)λ², (1-λ)λ³ applied backwards in time.
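The λ-weighted variant can be sketched as follows (an illustrative software model of the geometric weighting; the list-based interface is an assumption):

```python
def back_propagate(fvs, error, alpha, lam):
    """Distribute alpha*error over the preceding FBs: the most recent one
    (last list entry) gets the weight (1-lam), the i-th one before it
    (1-lam)*lam**i, so the weights form a geometric series."""
    for i in range(len(fvs)):
        fvs[-1 - i] += alpha * (1 - lam) * lam ** i * error
    return fvs

# Three preceding FBs, error 100, alpha=1, lambda=0.5:
fvs = back_propagate([0.0, 0.0, 0.0], error=100, alpha=1.0, lam=0.5)
# fvs == [12.5, 25.0, 50.0]
```

The geometric decay also shows why this is hardware-unfriendly: the update has to walk an unbounded history of FBs instead of touching only the directly preceding one.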

Evaluating Dynamic Forecasting

Figure: SI executions per frame (frames 150-500) for 'MC Hz 4' (P-MB) and 'IPred VDC & HDC 16x16' (I-MB) [BSTH07].

Evaluating Dynamic Forecasting (cont'd)

Figure: actually executed I-MBs vs. predicted I-MBs (Forecast Value, i.e. expected amount of I-MBs) over frames 150-500 for α = 0.6, 0.3, and 0.1.

α determines the adaptation rate: a low rate provides a good average in hectically changing scenarios, but fails to adapt to drastic (but then steady) changes.

Evaluating Dynamic Forecasting (cont'd)

Figure: accumulated absolute forecast error over frames 50-700 for α = 0.6, 0.3, and 0.1.

Depending on the frequency of so-called phase changes (e.g. moving from smooth to high motion and then staying at high motion for a while), a medium to high α value is beneficial.

Summary: Dynamic Prefetching

  • Typically based on compile-time profiling information; can be used without profiling, but then it typically performs badly at the start of the application execution
  • Dynamically updates the profiling information (online monitoring) and performs prefetching based on these monitoring results
  • Some light-weight update mechanisms exist that do not imply noticeable hardware or performance overhead
  • For some applications static prefetching is sufficient, but other applications (e.g. the H.264 video encoder) demand dynamic/hybrid prefetching


5.5 Area Models

Configuration Area Model

Prefetching considers which SIs are predicted to be used in the near future.

Additionally, it needs to be considered whether the hardware implementations of these SIs have a conflict
  • Definition "Placement Conflict" of n SIs: the hardware areas of the SIs are not pairwise disjoint, i.e. the intersection of at least two areas is non-empty

A conflict between SIs affects the prefetching decision (independent of whether it is static, dynamic, or hybrid).

Conflict Example [PBV07]

  • Conflict between OP1 and OP2, as well as between OP2 and OP3
  • But no conflict between OP1 and OP3

Goal of prefetching: start the reconfiguration at an early point in time
  • But: avoid creating a conflict by starting it too early
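The pairwise-area check behind the placement-conflict definition can be sketched in a few lines; the rectangle representation and the OP coordinates are illustrative assumptions (not the actual layout from [PBV07]):

```python
def overlaps(a, b):
    """True if two axis-aligned areas (x0, y0, x1, y1) have a non-empty
    intersection."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def placement_conflict(areas):
    """A set of SIs has a placement conflict if any two areas intersect
    (i.e. the areas are not pairwise disjoint)."""
    return any(overlaps(areas[i], areas[j])
               for i in range(len(areas)) for j in range(i + 1, len(areas)))

# OP2 overlaps both neighbors, OP1 and OP3 are disjoint (hypothetical layout)
OP1, OP2, OP3 = (0, 0, 2, 2), (1, 0, 3, 2), (2, 0, 4, 2)
print(placement_conflict([OP1, OP3]))       # False
print(placement_conflict([OP1, OP2, OP3]))  # True
```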

FPGA Area Models

The amount of conflicts also depends on the FPGA area model.

Some models can reduce the potential for conflicts
  • Typically at the cost of reduced efficiency/flexibility

The conflict potential can also be reduced by introducing
  • relocatable implementations (i.e. the position within the reconfigurable fabric is not fixed)
  • configuration caches

2D Area Model [KPR04]

  • Highest flexibility
  • Complex synthesis of communication at run time; sometimes based on reconfigurable NoCs (Networks on Chip)
  • Complex placement problem
  • External fragmentation problem (i.e. unusable gaps between the modules)
  • High conflict potential

1D Area Model [KPR04]

  • Reduced external fragmentation problem and placement complexity
  • Has internal fragmentation, i.e. unusable space within the region reserved for a module
  • Typically dedicated routing channels for inter-module communication
  • Easy relocation
  • For instance used in Garp and XiRisc
  • Potentially complicated to establish routing within the reconfigurable modules (due to the rather slim shape)

1D Horizon Scanning [SWP04]

A 1-dimensional area model still needs 2-dimensional placement
  • 'Time' is the extra dimension

Simple approach: only memorize the front-line, i.e. for each area block remember the maximum time until it is occupied.

Example: the SIs T1, T2, ..., T7 are requested one after the other and need to be placed
  • Problem: due to the 'front-line' limitation, T7 is placed suboptimally

1D Stuffing [SWP04]

Better solution: consider 'gaps' in the placement/schedule.

Requires a representation of the occupied regions
  • Typically represented as (overlapping) rectangles of occupied regions
  • Alternative: (overlapping) rectangles of free regions, so-called 'free-space management'
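The front-line ('horizon') idea can be sketched as a toy model; column and time units are illustrative, and the real algorithms in [SWP04] are more involved. The third placement shows exactly the limitation that stuffing removes: the idle time of columns 2-3 below the front-line is lost.

```python
def place_horizon(horizon, width, duration):
    """Place a task needing `width` adjacent columns for `duration` time
    units. Only the front-line is memorized: per column, the time until
    which it is occupied. Returns (leftmost column, start time)."""
    best_col, best_start = 0, max(horizon[0:width])
    for c in range(1, len(horizon) - width + 1):
        start = max(horizon[c:c + width])   # must wait for the latest column
        if start < best_start:
            best_col, best_start = c, start
    for c in range(best_col, best_col + width):
        horizon[c] = best_start + duration  # raise the front-line
    return best_col, best_start

horizon = [0, 0, 0, 0]               # 4-column fabric, all free at t=0
print(place_horizon(horizon, 2, 5))  # (0, 0): cols 0-1 busy until t=5
print(place_horizon(horizon, 4, 2))  # (0, 5): must wait for cols 0-1
print(place_horizon(horizon, 2, 3))  # (0, 7): the idle time of cols 2-3
                                     # before t=5 is lost -> stuffing helps
```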
1D Fixed-Size Area Model [KPR04]

No external fragmentation, but high internal fragmentation
  • Reduced area efficiency when modules differ in size significantly

Conflicts only occur when too many modules are required at the same time.

Easy communication connection.

FPGA Area Models (cont'd)

  • For Xilinx FPGAs, only the 1D Fixed-Size Area Model is supported by the Xilinx tools (different containers can have different sizes, but over time the size of a container is fixed)
  • It is possible to realize the other models as well, using self-developed tool chains; case studies exist from different university projects
  • The 1D Fixed-Size model is the simplest model considering placement and communication; however, it has the largest internal fragmentation
  • The 2D model is the most complicated one; often Networks on Chip are used to establish the communication (high communication latency), or busses are established at run time (high overhead)
  • The 2D model and the 1D (flexible) model demand defragmentation
  • Which model is actually used is to a large degree independent of the underlying Special Instruction composition (format, parameters etc.); clear separation between ISA and microarchitecture

Prefetching Summary

  • Long reconfiguration times demand configuration prefetching
  • Static prefetching: the compiler decides
  • Clock frequency variation: slow down until the reconfiguration completes
  • Adaptivity within the application demands dynamic/hybrid prefetching
  • Placement conflicts: area models

References and Sources

[LH02] Z. Li, S. Hauck: "Configuration Prefetching Techniques for Partial Reconfigurable Coprocessors with Relocation and Defragmentation", International Symposium on Field-Programmable Gate Arrays (FPGA), pp. 187-195, 2002.
[DH06] F. Dittmann, T. Heimfarth: "Clock Frequency Variation of Partially Reconfigurable System", 19th International Conference on Architecture of Computing Systems (ARCS), pp. 195-204, 2006.
[BSTH07] L. Bauer, M. Shafique, D. Teufel, J. Henkel: "A Self-Adaptive Extensible Embedded Processor", First International Conference on Self-Adaptive and Self-Organizing Systems (SASO), pp. 344-347, 2007.
[PBV07] E. M. Panainte, K. Bertels, S. Vassiliadis: "The Molen Compiler for Reconfigurable Processors", ACM Transactions on Embedded Computing Systems, vol. 6, no. 1, article 6, 2007.
[KPR04] H. Kalte, M. Porrmann, U. Rückert: "System-on-Programmable-Chip Approach Enabling Online Fine-Grained 1D-Placement", International Parallel and Distributed Processing Symposium (IPDPS), p. 141a, 2004.
[SWP04] C. Steiger, H. Walder, M. Platzner: "Operating Systems for Reconfigurable Embedded Platforms: Online Scheduling of Real-Time Tasks", IEEE Transactions on Computers, vol. 53, no. 11, pp. 1392-1407, 2004.