SLIDE 1

Reconfigurable Computing
Partial reconfiguration design
Chapter 8

Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design

SLIDE 2

Partial Reconfiguration Design - Introduction

Reconfiguration advantages

  • Fast computation compared to GPP
  • Flexible computation compared to ASIC

Partial device re-use allows

  • Space saving
  • Power saving

[Figure: device area shared over time by modules such as video in, video out, MP3, MPEG2, PS2 and control]

SLIDE 3

Partial Reconfiguration Design - Introduction

A partially reconfigurable design consists of:

  • A set of full reconfigurable designs
  • A set of partial designs which can be downloaded separately

The full designs and the partial modules are each available as a full or partial bitstream used to configure the device. The partially reconfigurable modules are used to move the system from one configuration to the next.
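The relationship between full designs and partial designs can be sketched as a small model. This is a hypothetical illustration and not part of any Xilinx tool: the names `PartialDesign` and `apply_partial`, the region name `slot0` and the module names are made up, and a configuration is reduced to a mapping from region name to the module loaded there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialDesign:
    """A partial bitstream in this toy model: one module for one region."""
    region: str
    module: str


def apply_partial(config: dict, partial: PartialDesign) -> dict:
    """Return the configuration reached after downloading `partial`.

    Only the targeted region changes; every other region keeps its module.
    """
    if partial.region not in config:
        raise KeyError(f"unknown region: {partial.region}")
    new_config = dict(config)
    new_config[partial.region] = partial.module
    return new_config


# Full design 1: what a full bitstream would put on the device.
full_1 = {"static": "control", "slot0": "mp3"}

# Downloading the partial design moves the system to another full design.
full_2 = apply_partial(full_1, PartialDesign("slot0", "mpeg2"))
print(full_2)
```

In this reading, each full design is a node and each partial design is an edge between two nodes that differ in one region.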

SLIDE 4

Partial reconfiguration design - Introduction

  • The purpose of this section is to learn how to design a partially reconfigurable system using the current CAD tools and devices
  • The Xilinx FPGAs (Virtex-II and Spartan-II) are some of the few devices on the market allowing partial reconfiguration
  • This section will focus on two Xilinx-based methodologies for designing a partially reconfigurable system:
      • The Xilinx partial design flow
      • The Xilinx small bit modification using JBits

SLIDE 5

Partial reconfiguration design - Approach

  • Traditional design flow
      • Full circuit
      • Constraints define: placement constraints, relative location constraints, timing constraints
  • Only one full circuit is generated

[Figure: traditional flow. VHDL + basic constraints (pins, timing, …) -> netlist -> technology mapping -> place and route -> full bitstream. All constraints provided earlier.]

SLIDE 6

Partial reconfiguration design - Approach

  • Partial reconfiguration flow:
      • Placement constraints must be provided
      • Modules are compiled separately
      • The result is a set of full and partial implementations (EDIF and bitstream)
      • The partially reconfigurable bitstreams are used to move the device from one configuration to another

[Figure: partial reconfiguration flow. VHDL + basic constraints (pins, timing, …) + placement constraints (block positions and area) -> netlists 1 to 3 -> technology mapping -> place and route -> full bitstreams 1 to 3 and partial bitstreams 1 to 3.]

SLIDE 7

Partial reconfiguration design - Approach

  • Delaying placement constraints increases the degree of freedom
  • Area constraints are provided after a first evaluation
  • Run-time relocation

[Figure: flow with delayed placement constraints. VHDL, SystemC or HandelC + basic constraints -> netlists 1 to 3 -> technology mapping -> place and route, where the placement constraints (block positions and area) are added -> full bitstreams 1 to 3 and partial bitstreams 1 to 3.]

SLIDE 8

Partial reconfiguration on Xilinx Virtex FPGAs

Create a bitstream database for full and partial modules to be used at run-time for device reconfiguration
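A minimal sketch of such a database, assuming an in-memory store keyed by design name (full bitstreams) and by region and module name (partial bitstreams). The class `BitstreamDatabase`, its methods and the placeholder byte strings are hypothetical; a real system would load the files produced by the tool flow.

```python
class BitstreamDatabase:
    """Toy run-time store for full and partial bitstreams (names hypothetical)."""

    def __init__(self):
        self._full = {}      # design name -> bitstream bytes
        self._partial = {}   # (region, module) -> bitstream bytes

    def add_full(self, design, data):
        self._full[design] = data

    def add_partial(self, region, module, data):
        self._partial[(region, module)] = data

    def select(self, device_configured, design, region, module):
        """Pick what to download: a full bitstream for an unconfigured
        device, otherwise the partial bitstream for the requested module."""
        if not device_configured:
            return "full", self._full[design]
        return "partial", self._partial[(region, module)]


db = BitstreamDatabase()
db.add_full("top1", b"\x00" * 8)              # placeholder bytes, not real data
db.add_partial("slot0", "lowpass", b"\x01" * 2)
db.add_partial("slot0", "highpass", b"\x02" * 2)

kind, data = db.select(True, "top1", "slot0", "highpass")
print(kind, len(data))
```

The lookup key reflects the idea that a partial bitstream is only meaningful for the region it was implemented for.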

SLIDE 9

The partial design flow

  • Modular implementation of a large project
  • The team manager defines the structure of the overall project (top-level)
  • Each designer or team of designers implements and tests each module separately
  • The implemented modules are integrated in the final design
  • A top-level consists of
      • A set of independent modules
      • Interfaces between the modules
      • Interfaces with the pins
  • Each module is assigned a given position and area on the device by means of area constraints

[Figure: top-level design Top1 containing mod 1, mod 2 and mod 3 connected through interfaces]

SLIDE 10

The partial design flow

  • For partial reconfiguration, the goal is to generate
      • A set of full designs
      • A set of partial designs
    The partial designs are used to move from one full design to another
  • The input is structured as follows:

    Top_level
      Module_1
      Module_2
      …
      Module_N

  • The input language can be any HDL

[Figure: three top-level designs Top1, Top2 and Top3, each containing mod 1, mod 2 and mod 3]

SLIDE 11

The partial design flow - Example

  • Modular design
      • The static module is fixed for all times
      • Only the partial module can be reconfigured
  • Insertion of communication macros at fixed positions

[Figure: static module and partial module connected through macros; signals x, y and t1 to t4]

SLIDE 12

The partial design flow - Example VHDL

    library IEEE;
    use IEEE.std_logic_1164.all;

    entity TOP is
      port ( x : in  std_logic;
             y : out std_logic);
    end TOP;

    architecture ARCH of TOP is

      component MACRO
        port ( in1, in2   : in  std_logic;
               out1, out2 : out std_logic);
      end component;

      component STATIC_MODULE
        port ( x_in, d_in : in  std_logic;
               d_out      : out std_logic);
      end component;

      component PARTIAL_MODULE
        port ( d_in         : in  std_logic;
               y_out, d_out : out std_logic);
      end component;

      signal t1, t2, t3, t4 : std_logic;

SLIDE 13

The partial design flow - Example VHDL

    begin
      static: STATIC_MODULE
        port map (x_in => x, d_in => t4, d_out => t1);
      macro_right2left: MACRO
        port map (in1 => t1, in2 => open, out1 => t2, out2 => open);
      macro_left2right: MACRO
        port map (in1 => t3, in2 => open, out1 => t4, out2 => open);
      partial: PARTIAL_MODULE
        port map (y_out => y, d_in => t2, d_out => t3);
    end ARCH;

  • The macro component is provided by Xilinx and must be inserted in the TOP level for the communication with partial modules
  • Static and partial modules do not require any additional changes

SLIDE 14

The partial design flow - Four Steps

  • 1) Build the top level context
      • Slice macros at fixed positions are used to communicate between static design logic and reconfigurable logic.
      • Note that there is NO logic except IOs and clocks in the top level design. All logic is contained in one or more 'modules' (e.g. AREA_GROUP).
      • Required files: top.ngc and top.ucf (for constraints)
      • Output file: top.ngo
  • 2) Build the static modules
      • These are the modules (e.g. logic and routing) that will NOT be dynamically reconfigured.
      • Required files: .ngc for each static 'module'
      • Output files: top_routed.ncd (without reconfigurable logic)

SLIDE 15

The partial design flow - Four Steps

  • 3) Build the dynamic modules
      • Build each flavor of each dynamically reconfigurable module.
      • Required files: .ngc for each dynamic 'module'
      • Output files: a routed .ncd file for each flavor of a dynamically reconfigurable module, WITHOUT logic and routing from static modules

SLIDE 16

The partial design flow - Four Steps

  • 4) Assemble the full design with each flavor of each dynamically reconfigurable module. Generate the required bitstreams.
      • Required files: .ncd for the static modules and for each flavor of each reconfigurable module
      • Output files:
          • a bitstream for the full design with each flavor of each reconfigurable module
          • a partial bitstream for each flavor of each reconfigurable module
          • a report file

SLIDE 17

The partial design flow - directory structure

  • Synth: contains the VHDL code and synthesised netlists, with subdirectories top, static, Mod1, …, ModN
  • Top/Initial: built top level context
  • Static: built static modules
  • ReconfigModules: built reconfigurable modules
  • Merges: contains assembled bitstreams
SLIDE 18

The partial design flow - Top level context

  • Instantiate static and reconfigurable modules as "black boxes"
  • Connect the modules at the top level using slice macros between reconfigurable and fixed modules
  • Estimate a rectangular bounding region for each module and constrain it to this area
  • Constrain top-level I/O ports and slice macros to fixed locations
  • The following commands must be run in each initial directory under the corresponding top level:

      cd Top/Initial/
      ngdbuild -modular initial top.ngc
SLIDE 19

The partial design flow - Static modules

  • Slice macros must be located correctly on the boundaries.
  • Par will automatically exclude logic from being placed in reconfiguration areas.
  • Par will exclude all glitch-prone logic (RAMs / shift registers) from being placed in reconfiguration zones.
  • Par will generate a file called static.used. This is a list of routing resources utilized in the reconfiguration areas by the static design.
  • Map, place and route the static modules:

      cd Static
      ngdbuild -modular initial ../Top/Initial/top.ngo
      map top.ngd
      par -w top.ncd top_routed.ncd
SLIDE 20

The modular design flow - Dynamic modules

  • Copy the static.used file from the static area to "arcs.exclude" in the module directory. This prevents par of the active module design from using routing utilized by the static design in the reconfiguration area.
  • Par generates a file called "dynamic.used", which is a list of routing resources it utilizes in the reconfiguration area.
  • Map, place and route the dynamic modules:

      cp Static/static.used ReconfigModules/ModN/arcs.exclude
      cd ReconfigModules/ModN
      ngdbuild -modular module -active rmodule ../../Top/Initial/top.ngo
      map top.ngd
      par -w top.ncd top_routed.ncd
SLIDE 21

The modular design flow - Create all bitstreams

  • Create all bitstreams:

      cp Static/top_routed.ncd Merges/static.ncd
      cp ReconfigModules/ModN/top_routed.ncd Merges/modN.ncd
      cp ReconfigModules/ModM/top_routed.ncd Merges/modM.ncd
      PR_verifydesign.bat Merges/static.ncd Merges/modN.ncd Merges/modM.ncd
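The command sequence of the four steps can be collected in one place. The sketch below only builds the command strings shown on the slides and does not run any tool. The function `flow_commands` is hypothetical, and it assumes that the argument of `-active` names the module being built and that the merged file name is the directory name with a lower-case first letter, as in the `ModN` to `modN.ncd` example.

```python
def flow_commands(modules):
    """Return (working directory, command) pairs for the four steps.

    `modules` maps a directory under ReconfigModules/ to the name passed
    to `-active` (assumed here to identify the module being built).
    """
    steps = [
        ("Top/Initial", "ngdbuild -modular initial top.ngc"),
        ("Static", "ngdbuild -modular initial ../Top/Initial/top.ngo"),
        ("Static", "map top.ngd"),
        ("Static", "par -w top.ncd top_routed.ncd"),
    ]
    merged = ["Merges/static.ncd"]
    copies = [(".", "cp Static/top_routed.ncd Merges/static.ncd")]
    for directory, active in modules.items():
        workdir = f"ReconfigModules/{directory}"
        steps += [
            (".", f"cp Static/static.used {workdir}/arcs.exclude"),
            (workdir, f"ngdbuild -modular module -active {active} "
                      "../../Top/Initial/top.ngo"),
            (workdir, "map top.ngd"),
            (workdir, "par -w top.ncd top_routed.ncd"),
        ]
        target = f"Merges/{directory[:1].lower()}{directory[1:]}.ncd"
        copies.append((".", f"cp {workdir}/top_routed.ncd {target}"))
        merged.append(target)
    steps += copies
    steps.append((".", "PR_verifydesign.bat " + " ".join(merged)))
    return steps


commands = flow_commands({"ModN": "rmodule"})
for workdir, command in commands:
    print(f"[{workdir}] {command}")
```

Keeping the working directory next to each command makes the dependency on the directory structure explicit, since the relative paths in the `ngdbuild` calls only resolve from the right directory.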

SLIDE 22

The partial design flow - Area constraints

The leftmost boundary of each module and of each bus macro must be a multiple of 4.

    INST "Instance0" AREA_GROUP = "AG_Instance0";
    AREA_GROUP "AG_Instance0" RANGE = SLICE_X0Y0:SLICE_X3Y29;
    AREA_GROUP "AG_Instance0" MODE=RECONFIG;

  • "Instance0": instance name of the module in the top-level
  • RANGE: bounding box of the module on the chip
  • MODE=RECONFIG: states that the module can be reconfigured
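The boundary rule and the three constraint lines lend themselves to a small checker and generator. This is a sketch under assumptions: the helpers `parse_range`, `left_boundary_ok` and `area_group_ucf` are hypothetical, and the rule is applied here to the slice X coordinate of the range, which may not be the exact unit the tools use.

```python
import re

RANGE_RE = re.compile(r"SLICE_X(\d+)Y(\d+):SLICE_X(\d+)Y(\d+)")


def parse_range(text):
    """Parse 'SLICE_X0Y0:SLICE_X3Y29' into (x_min, y_min, x_max, y_max)."""
    match = RANGE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a slice range: {text!r}")
    x0, y0, x1, y1 = map(int, match.groups())
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)


def left_boundary_ok(text, multiple=4):
    """Check the 'leftmost boundary is a multiple of 4' rule.

    Assumption: the rule is applied to the slice X coordinate.
    """
    x_min, _, _, _ = parse_range(text)
    return x_min % multiple == 0


def area_group_ucf(instance, slice_range):
    """Emit the three UCF lines shown on the slide for one module."""
    group = f"AG_{instance}"
    return [
        f'INST "{instance}" AREA_GROUP = "{group}";',
        f'AREA_GROUP "{group}" RANGE = {slice_range};',
        f'AREA_GROUP "{group}" MODE=RECONFIG;',
    ]


slice_range = "SLICE_X0Y0:SLICE_X3Y29"
if left_boundary_ok(slice_range):
    print("\n".join(area_group_ucf("Instance0", slice_range)))
```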

SLIDE 23

The partial design flow - Use of Slice Macros

  • Routing two designs separately creates unpredictable paths
  • Signals connecting two reconfigurable modules in two different designs can be routed in different ways
  • This can cause the design to malfunction after reconfiguration
  • This can be avoided by providing fixed communication channels (slice macros) among reconfigurable modules
  • Bus macros are tri-state lines running over 4 CLBs in the FPGA
  • They must be placed only at the top level!

[Figure: bus macros]

SLIDE 24

The partial design flow - Bus Macro constraints

  • Slice macros use the LUTs for the connection
  • Each slice macro provides 8 unidirectional signals
  • It is possible to disable the connection temporarily

    INST "macro_1" LOC = "SLICE_X34Y40";
    INST "macro_2" LOC = "SLICE_X34Y24";
    INST "macro_3" LOC = "SLICE_X34Y8";
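Since each slice macro carries 8 signals in one direction, the number of macros for an interface follows from a ceiling division per direction. The sketch below assumes exactly that; the helpers `macros_needed` and `loc_constraints` are hypothetical, and the column, start row and spacing passed to `loc_constraints` are illustrative inputs, since legal sites depend on the device and the floorplan.

```python
import math

SIGNALS_PER_MACRO = 8  # from the slide: 8 unidirectional signals per macro


def macros_needed(to_module, from_module):
    """Macros required for the given signal counts in each direction.

    Directions are counted separately because each macro is unidirectional.
    """
    return (math.ceil(to_module / SIGNALS_PER_MACRO)
            + math.ceil(from_module / SIGNALS_PER_MACRO))


def loc_constraints(count, x, y_top, y_step):
    """Generate LOC lines in the style of the slide (hypothetical sites)."""
    return [
        f'INST "macro_{i + 1}" LOC = "SLICE_X{x}Y{y_top - i * y_step}";'
        for i in range(count)
    ]


count = macros_needed(to_module=10, from_module=3)
print(count)
print("\n".join(loc_constraints(count, x=34, y_top=40, y_step=16)))
```

With 10 signals into the module and 3 out of it, two macros are needed in one direction and one in the other.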

SLIDE 25

The partial design flow - Example

  • Two modules
      • One VGA controller
      • One colour generator
  • The two modules can be partially reconfigured

[Figure: overall design with VGA controller and colour generator]

SLIDE 26

Small bits manipulation - JBits

  • Java API for the Xilinx configuration bitstream
  • Provides functions for off-line modification of Xilinx Virtex bitstreams, with
      • Modification of CLBs, IOBs, Block RAM or PIPs (programmable interconnect points)
      • Access to LUTs, MUXes and flip-flops within a CLB
  • Run-time manipulation by readback/modify/writeback is possible

SLIDE 27

Small bits manipulation - JBits

JBits flow: source configuration bitstream -> bitstream manipulation using JBits -> generation of a new (partial) bitstream

Reading the source bitfile:

    jbits.read(infileName)

Some modifications:

    AND_F[] = Expr.F_LUT("F1 & F2 & F3 & F4")
    LUTContents = Util.InvertIntArray(AND_F)
    jbits.setCLBBits( clbRow, clbCol, BX0.BX0, BX0.BY0 )
    . . .

Writing the modified (partial) bitstream (only changes will be saved):

    jbits.writePartial(outFileName)

To write the full bitstream instead:

    jbits.write(outfileName, JBits.FULL)

SLIDE 28

Small bits manipulation - JBits

Accessing the LUTs:

[Figure: LUT truth table with inputs G1, G2, G3, G4 and the LUT output]

Reading the LUT contents (returns an array of integer values):

    lut = jbits.getCLBBits(row, col, LUT.CONTENTS[0][LUT.G])

Writing the LUT contents:

    int[] LutArray = new int[] { 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1 };
    jbits.setCLBBits( clbRow, clbCol, LUT.CONTENTS[0][LUT.G], LutArray );
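A 16-entry array like the one above is the truth table of a 4-input function. The sketch below shows one way to derive such a table outside of JBits. The helpers `lut_table` and `invert` are hypothetical, and the index ordering (first argument as most significant bit) is an assumption; the bit order and polarity that JBits expects should be checked against its documentation.

```python
from itertools import product


def lut_table(func):
    """Return the 16 output bits of a 4-input boolean function.

    Assumed ordering: entry i holds the output for the inputs whose binary
    encoding is i, with the first argument as the most significant bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=4)]


def invert(bits):
    """Invert every entry, which is presumably what the call to
    Util.InvertIntArray on the earlier slide is for."""
    return [1 - b for b in bits]


and4 = lut_table(lambda a, b, c, d: a and b and c and d)
print(and4)
print(invert(and4))
```

For the 4-input AND, only the entry for all inputs high is set, so the table holds a single 1 in the last position under the assumed ordering.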

SLIDE 29

Small bits manipulation - JBits

Column extraction:

    System.out.println("Touching frames ... ");
    for (int x = startColumn; x < endColumn; x++) {
        column = x;
        System.out.println("Column :" + Integer.toString(x));
        /* touch the frames inside the column */
        for (int y = 0; y < 22; y++) {
            minorFrame = y;
            /* add minorFrame to the tmp resource */
            //res[0][0] = 0;
            res[0][1] = minorFrame;
            System.out.println("Touching frame :" + Integer.toString(y));
            /* record value of top most bit in minorFrame */
            value = jbits.getCLBBits(0, column, res);
            /* write bit back to touch frame */
            jbits.setCLBBits(0, column, res, value);
        }
    }
    jbits.enableCrc(true);
    /* Generate Partial */
    System.out.println("Writing partial bitstream to " + outFileName);
    jbits.writePartial(outFileName);
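The idea behind the column extraction is that writing a bit back unchanged still marks its frame as modified, so a partial write then includes every frame of the chosen columns. The toy model below illustrates this with a hypothetical `ToyBitstream` class; it is not the JBits API and only mimics the dirty-frame bookkeeping. The 22 frames per column come from the loop bound in the slide.

```python
class ToyBitstream:
    """Minimal stand-in for a configuration memory with dirty tracking."""

    def __init__(self, columns, frames_per_column):
        self.bits = [[0] * frames_per_column for _ in range(columns)]
        self.dirty = set()

    def get(self, column, frame):
        return self.bits[column][frame]

    def set(self, column, frame, value):
        self.bits[column][frame] = value
        self.dirty.add((column, frame))   # marked even if value is unchanged

    def partial_frames(self):
        """Frames a partial write would contain in this model."""
        return sorted(self.dirty)


def touch_columns(bitstream, start_column, end_column, frames_per_column):
    """Read each frame's bit and write it back unchanged to mark it dirty."""
    for column in range(start_column, end_column):
        for frame in range(frames_per_column):
            bitstream.set(column, frame, bitstream.get(column, frame))


bs = ToyBitstream(columns=8, frames_per_column=22)
touch_columns(bs, 2, 5, 22)
print(len(bs.partial_frames()))
```

Touching three columns of 22 frames each marks 66 frames while leaving the stored configuration data unchanged.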

SLIDE 30

Things to know

  • A frame is the smallest unit of configuration. It spans the height of the FPGA.
  • Configuration is glitchless. If you reconfigure a frame with the same data, no glitches appear.
  • Reconfiguration "reinitializes" SRL16s and LUT RAM, but not Block RAM.

SLIDE 31

Case study

  • Two independent systems on the same FPGA:
      • Embedded Linux (PPC + logic)
      • Audio filter application (logic)
  • The audio filters can be reconfigured under OS control
  • A modified partial flow is used to create the filters

SLIDE 32

Reconfigurable DSP Demo

[Figure: XC2VP50 FPGA containing the Linux system and the AC97 core & filter; labels: MP3 player, speakers, FS on Compact Flash, Telnet login, reconfiguration, FPGA VFS]

SLIDE 33

Initial Linux System Floorplan

SLIDE 34

Physical Structure

  • DSP core: AC97 controller, audio filter
  • Linux system: Ethernet, LCD, RAM controller, ICAP, SystemACE, UART16550, JTAG, GPIO, PLB2OPB
  • Linux system: Ethernet, SystemACE
  • Prohibit area: only OPB routing
SLIDE 35

Initial Budgeting

  • The design is split into two parts: the Linux system and the reconfigurable DSP region.
  • The reconfigurable region is above and between the two PowerPCs. It does not span the whole height of the device.
  • The Linux system occupies the lower half of the FPGA.
  • The Linux system is partitioned into a left and a right half to avoid SRL16s under the reconfigurable region.
  • Only OPB routing runs between the left and right halves.
  • No bus macros were used.
SLIDE 36

Active Module Phase

  • Create static and reconfigurable (partial) modules
  • Each module was created separately
  • Each module was constrained to the initial budget constraints

[Figure: static module, lowpass filter, highpass filter]

SLIDE 37

Final Assembly Phase

  • Merge each reconfigurable module with the static system.
  • From each merged design, generate a partial bitstream for the DSP filter.
  • The partial bitstream contains the DSP module plus the part of the static system located in the same columns.
  • As long as the static system is implemented in exactly the same way in each partial bitstream, everything is okay.
  • Use the difference-based partial flow to generate the partial bitstreams.

SLIDE 38

Merging Designs

  • The XDL tool is used in this modified partial flow:

      xdl -ncd2xdl {static.ncd, lowpass.ncd} -> {static.xdl, lowpass.xdl}
      cat static.xdl lowpass.xdl > merged_lowpass.xdl

  • Resolve conflicts in merged_lowpass.xdl
      • All nets and instances need to be unique or merged
      • GLOBAL_LOGIC*, PWR_VCC*, PWR_GND*, external IO, clk net, and other shared nets need to be merged

      xdl -xdl2ncd merged_lowpass.xdl -> merged_lowpass.ncd
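The conflict-resolution rule (names are either unique, or belong to a shared net that must be merged) can be sketched on a simplified data model. The function `merge_nets` and the representation of a design as a map from net name to a set of pins are hypothetical; the sketch does not parse or write real XDL, and the pin names are placeholders.

```python
SHARED_PREFIXES = ("GLOBAL_LOGIC", "PWR_VCC", "PWR_GND")


def merge_nets(static_nets, module_nets, shared_names=()):
    """Merge two {net name: set of pins} maps.

    Nets whose name starts with a shared prefix, or is listed in
    `shared_names` (for example a clock net), are merged by uniting their
    pins. Any other name clash is reported as a conflict to resolve by hand.
    """
    merged = {name: set(pins) for name, pins in static_nets.items()}
    conflicts = []
    for name, pins in module_nets.items():
        if name not in merged:
            merged[name] = set(pins)
        elif name.startswith(SHARED_PREFIXES) or name in shared_names:
            merged[name] |= set(pins)
        else:
            conflicts.append(name)
    return merged, conflicts


static = {"PWR_VCC_0": {"a.F1"}, "clk": {"a.CLK"}, "n1": {"a.X"}}
lowpass = {"PWR_VCC_0": {"b.F1"}, "clk": {"b.CLK"}, "n1": {"b.X"}}
merged, conflicts = merge_nets(static, lowpass, shared_names={"clk"})
print(conflicts)
```

Reporting clashes instead of renaming them automatically mirrors the manual "resolve conflicts" step of the slide.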

SLIDE 39

The Last Step: Generating Partial Bitstreams

  • The difference-based partial flow is documented in XAPP 290
  • The bitgen -r option is used to create a partial bitstream of the difference between the static design and the merged design
  • Example:

      bitgen -g ActiveReconfig:Yes -r static.bit merged_lowpass.ncd lowpass.bit
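Conceptually, a difference-based partial bitstream holds the frames in which two designs differ. The sketch below shows that comparison on a toy model in which each design is a list of frame contents. The function `differing_frames` and the byte values are hypothetical, and this is not how bitgen is implemented.

```python
def differing_frames(static_frames, merged_frames):
    """Return indices of frames whose contents differ between two designs."""
    if len(static_frames) != len(merged_frames):
        raise ValueError("both designs must target the same device")
    return [
        index
        for index, (a, b) in enumerate(zip(static_frames, merged_frames))
        if a != b
    ]


# Placeholder frame contents: frames 2 and 3 change when the filter is added.
static = [b"\x00\x00", b"\x10\x01", b"\x00\x00", b"\x00\x00"]
merged = [b"\x00\x00", b"\x10\x01", b"\xa5\x3c", b"\x0f\x00"]
print(differing_frames(static, merged))
```

Because a frame is the smallest unit of configuration, a changed frame is included whole, including any static logic that shares its columns.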