

SLIDE 1
  • M. Scheer, Univ. of Karlsruhe, WS04/05, 2005

http://ces.univ-karlsruhe.de

Design and Architectures for Embedded Systems

Maik Scheer (Lehrstuhl Prof. Dr. J. Henkel)
CES - Chair for Embedded Systems
University of Karlsruhe, Germany

Today: Reconfigurable Computing

SLIDE 2

Where are we?

[Design-flow diagram:]
  • System specification / design space exploration: low power, performance, area
  • System partitioning: models of computation, specification languages
  • Estimation & simulation: low power, performance, area, …
  • Embedded software — optimization for: low power, performance, area, …
  • Embedded processor design: instruction-set extension, parameterization
  • Middleware, RTOS: scheduling
  • Integration, hardware design: synthesis
  • Embedded IP: PEs, memories, communication, peripherals
  • IC technology — optimization: low power, performance, area, …
  • Refine, tape out, prototyping

SLIDE 3

Outline

  • Computation in Hardware vs. Computation in Software
  • Reconfigurable Computing
  • Technologies for (Re-)Configurable Hardware
  • Granularity
  • Fine-grained (Re-)Configurable Hardware
      • PLDs
      • FPGAs
  • Coarse-grained Reconfigurable Hardware
      • XPP
      • Dynamically Reconfigurable SoC Based on LEON + XPP
  • Multi-grained Reconfigurable Hardware
      • HoneyComb

SLIDE 4

Computation in Software

  • NOR-function with 8 input signals in software
  • 8 inputs at address 0x1000
  • Output at bit 0 of address 0x2000

#define INPORT_ADR  0x1000
#define OUTPORT_ADR 0x2000

volatile unsigned char *a = (volatile unsigned char *)INPORT_ADR;  /* 8 input bits  */
volatile unsigned char *b = (volatile unsigned char *)OUTPORT_ADR; /* output, bit 0 */

int main(void)
{
    while (1) {
        if (*a == 0)      /* NOR: output is 1 only if all 8 inputs are 0 */
            *b = 1;
        else
            *b = 0;
    }
}

SLIDE 5

Computation in Software

  • Assembly code of the NOR-function (RISC)
  • Minimum 6 cycles for computation (ELSE branch)
  • Maximum 7 cycles for computation (IF branch)
  • Input changes are provided to the output in minimum 5, maximum 12 cycles

L0: MOV R1, #a      ; address of a in R1,
    MOV R2, #b      ; address of b in R2
L1: LD  R3, (R1)    ; inport bits in R3
    CMP R3, #0      ; IF-condition
    BNE L3
L2: MOV R4, #1      ; IF-branch
    JMP L4
L3: MOV R4, #0      ; ELSE-branch
L4: ST  (R2), R4
    JMP L1
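The IF/ELSE cycle difference can be avoided in C as well: a branch-free formulation computes the NOR as a single comparison. A minimal sketch (the pure helper function `nor8` is hypothetical, introduced here only for illustration):

```c
/* NOR of 8 input bits: the result is 1 only when every input bit is 0.
   In C, (in == 0) evaluates to exactly 0 or 1, so no branch is needed. */
unsigned char nor8(unsigned char in)
{
    return (unsigned char)(in == 0);
}
```

On the memory-mapped ports of the previous slide this would read `*b = (*a == 0);`, making the loop body take the same number of cycles on every iteration.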

SLIDE 6

Computation in Hardware

  • NOR-function with 8 input signals in hardware
  • Computation time depends on propagation delay time
  • Computation time is constant and predictable

[Diagram: 8-input NOR shown with AND (&) and OR (≥1) gate symbols; inputs Inport_0 … Inport_7, output Outport_0]

SLIDE 7

Computation in Hardware vs. Computation in Software

  • Software
      • Computation time depends on algorithm + available instruction set
      • Computation time can vary depending on the actual program path
      • New algorithms can be implemented as software functions
      • Available instruction set is fixed
      • Used silicon area is fixed
  • Hardware
      • Each algorithm is implemented in hardware
      • Modifications of algorithms are not possible
      • Computation time depends on critical path
      • No fixed instruction set
      • Used silicon area depends on the number of implemented instructions (algorithms) and on the instructions themselves

SLIDE 8

Computation in Space vs. Computation in Time

  • Computation in time
      • The dimension of algorithms is time
      • Complex algorithms need more computation time than simple ones
  • Computation in space
      • The dimension of algorithms is space (silicon area)
      • Complex algorithms require more silicon area than simple ones

[Diagram: two axes — "Algorithm is implemented during:" design time (hardware) vs. computation time (software), and "Computation is distributed in:" space vs. time. ASIC: design time / space; general purpose processor: computation time / time; reconfigurable computing combines computation-time implementation with distribution in space.]

SLIDE 9

Reconfigurable Computing

  • Definition: Reconfigurable computing involves chips or systems capable of modifying themselves on the fly, while running, to meet different application needs.

[Diagram: flexibility vs. performance — ASICs offer the highest performance, CPUs & DSPs the highest flexibility; reconfigurable computing lies in between.]

SLIDE 10

Reconfigurable Computing (cont'd)

  • Pros
      • High computation power (near ASIC)
      • Better ratio between power consumption and computing power compared to general purpose processors
      • Flexible like general purpose processors
      • Suitable for data-flow-oriented algorithms
  • Cons
      • Reconfiguration overhead
      • Utilization of hardware may be low, depending on actual configuration
      • Difficult to map control-flow-dominant structures

SLIDE 11

Technologies for Configurable Hardware

  • Fuse technology
      • Silicon bridges can be destroyed through current
      • One-time programmable
  • Anti-fuse technology
      • Two metal layers with dielectric in between (capacitor)
      • High resistance when not programmed
      • Dielectric layer can be destroyed through high voltage -> the metal layers are connected
      • One-time programmable
  • Floating-gate technology (EPROM)
      • CMOS transistor with isolated gate
      • High voltage tunnels electrons to the gate
      • Erasure is done through UV light
      • About 100 times reprogrammable

SLIDE 12

Technologies for Configurable Hardware

  • EEPROM / Flash technology
      • Same technique as EPROM
      • Erasure is done electrically
      • About 10,000 times reprogrammable
  • SRAM technology
      • Flip-flops are used for storage
      • Stored configuration/data is lost when the system is powered down -> external memory needed to store the configuration
      • Unlimited times reprogrammable

SLIDE 13

Requirements on Devices for Reconfigurable Computing

  • Reconfigurable computing requires unlimited reprogrammability
  • Only SRAM-based devices are suitable for reconfigurable computing
  • Configuration must be changeable in-system, on the fly

SLIDE 14

Granularity

  • Definition: The granularity of reconfigurable logic is the size of the smallest functional unit that is addressed by the mapping tools.
  • Mapping tools are used to generate configurations for the reconfigurable device
  • Granularity is often defined through the data-path width

[Diagram: 8-bit adder, fine-grained vs. coarse-grained implementation]
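The adder example can be made concrete with a small C model (illustrative only; the function names are hypothetical): in a fine-grained fabric the mapping tool composes the 8-bit adder from bit-level cells — modeled here as 1-bit full adders — whereas in a coarse-grained fabric it addresses a complete 8-bit adder as one functional unit.

```c
#include <stdint.h>

/* Fine-grained view: one 1-bit full adder per logic cell. */
static uint8_t full_adder(uint8_t a, uint8_t b, uint8_t cin, uint8_t *cout)
{
    uint8_t sum = a ^ b ^ cin;
    *cout = (uint8_t)((a & b) | (cin & (a ^ b)));
    return sum;
}

/* Eight full adders chained into a ripple-carry 8-bit adder. */
uint8_t add8_fine(uint8_t a, uint8_t b)
{
    uint8_t sum = 0, carry = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t s = full_adder((a >> i) & 1, (b >> i) & 1, carry, &carry);
        sum |= (uint8_t)(s << i);
    }
    return sum;
}

/* Coarse-grained view: the mapping tool addresses a whole 8-bit adder. */
uint8_t add8_coarse(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

Both functions compute the same result; the difference is the size of the unit the mapping tool must place and route — eight small cells versus one word-wide unit.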

SLIDE 15

Granularity (cont'd)

  • Fine-grained hardware
      • High reconfiguration overhead
      • Provides more flexibility in adapting it to the algorithm structure
  • Coarse-grained hardware
      • Low reconfiguration overhead
  • Multi-grained hardware
      • Combination of coarse- and fine-grained functional units
  • Hybrid hardware
      • Combination of microprocessor(s) and reconfigurable logic

SLIDE 16

Fine-grained CPLD: Altera MAX 7000

  • EEPROM-based
  • 600 – 5000 usable gates
  • Up to 175 MHz
  • Up to 164 I/O pins
  • Up to 100 times reprogrammable
  • Logic Array Blocks (LABs)
      • Each LAB consists of 16 macrocells
  • Programmable Interconnect Array (PIA)
  • I/O blocks

[MAX03]

SLIDE 17

Fine-grained CPLD: Altera MAX 7000 (cont'd)

  • Programmable AND array (5 product terms per macrocell)
  • Fixed OR array
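The AND/OR structure can be modeled in a few lines of C — a minimal sketch under stated assumptions (the bit-vector encoding is hypothetical and not Altera's configuration format; only true literals are modeled, while real macrocells also allow inverted inputs): each of the 5 product terms ANDs a programmable selection of inputs, and the fixed OR combines the terms.

```c
#include <stdint.h>

#define NUM_TERMS 5   /* product terms per macrocell */

/* Macrocell output: fixed OR over up to 5 programmable AND terms.
   term_mask[t] selects which input bits term t ANDs together;
   a mask of 0 marks an unused (disabled) term. */
int macrocell(uint8_t inputs, const uint8_t term_mask[NUM_TERMS])
{
    for (int t = 0; t < NUM_TERMS; t++) {
        if (term_mask[t] == 0)
            continue;                              /* unused term      */
        if ((inputs & term_mask[t]) == term_mask[t])
            return 1;                              /* this AND term fires */
    }
    return 0;                                      /* fixed OR of terms */
}
```

Programming the device corresponds to choosing the `term_mask` values; the OR stage itself is not configurable.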

SLIDE 18

Fine-grained FPGA: Xilinx Spartan

  • SRAM-based
  • 5000 – 40,000 usable gates
  • Up to 224 I/O pins
  • Up to 784 CLBs
  • Configurable logic blocks (CLBs)
      • Logic implemented in look-up tables (LUTs)
  • Programmable switch matrix (PSM)
  • I/O blocks

[Spa02]
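A LUT simply stores the truth table of the implemented function and is indexed by the input bits. A minimal C sketch of a 4-input LUT (illustrative only; this is not Xilinx's bitstream encoding):

```c
#include <stdint.h>

/* A 4-input LUT: 16 configuration bits hold the complete truth table.
   The four input bits form the index into the table. */
int lut4(uint16_t config, int in3, int in2, int in1, int in0)
{
    int index = (in3 << 3) | (in2 << 2) | (in1 << 1) | in0;
    return (config >> index) & 1;
}
```

Reconfiguration is just a RAM write of the 16 bits: for example, `config = 0x8000` implements a 4-input AND (only index 15 yields 1) and `config = 0x0001` a 4-input NOR (only index 0 yields 1).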

SLIDE 19

Fine-grained FPGA: Xilinx Spartan (cont'd)

  • Six transistors required per interconnect point
  • Ten programmable interconnect points per PSM
  • Transistor state is stored in the configuration RAM
  • Single lines: routing between adjacent CLBs
  • Double lines: routing between next-but-one CLBs
  • Long lines: routing across the entire length or width

SLIDE 20

PACT's Coarse-grained eXtreme Processing Platform (XPP)

  • XPP is built from a scalable number of standardized Processing Array Elements (PAEs)
      • I/O elements
      • RAM-PAEs
      • ALU-PAEs
      • Configuration Manager

[XPP]

SLIDE 21

PACT XPP: Processing Array Element (PAE)

  • Number and width of data buses + number of event buses can be defined at design time
  • Two types of PAEs; shown here: the ALU-PAE
      • A RAM-PAE contains a RAM object instead of an ALU object

SLIDE 22

XPP Architecture Differentiators

  • Multiple parallel processing elements
  • Straightforward communication and data flow between ALL elements
  • IP model very modular and scalable
  • Unique automatic data synchronization
  • Configuration flow replaces instruction flow
  • Events allow intermediate results within an algorithm to determine subsequent configurations

SLIDE 23

XPP Parallel Processor Paradigm

  • On a sequential processor, multiple code sections are computed sequentially
  • Scaling the array size allows the computation of several code sections simultaneously
  • Performance increases linearly with array size

[Diagram: sections 1–3 executed one after another over time vs. side by side in space on a larger array]

SLIDE 24

XPP Direct Programming Methodology

  • The code sections are mapped directly onto the processing array (static mapping)
  • Code section nodes correspond to XPP ALUs

SLIDE 25

XPP Development Suite (XDS)

  • Native Mapping Language (NML) compiler
      • Language optimized for parallel data flow
      • NML uses constructs similar to C
  • XPP software simulator
      • Simulates functionality of the XPP device
  • Visualizer
      • Graphically shows the status of data, events, and all array elements on a clock-cycle-by-clock-cycle basis
  • NML and XPP tutorials
  • Vectorizing C compiler

[Tool flow: Algorithm Design → NML Coding → XMAP (compiler, router, placer) → XSIM (software simulation) → XVIS (visualizer / debugger) → result OK? → upload to XPP device]

SLIDE 26

XPP Visualizer / Debugger

  • Displays routing of data and events
  • Clock-accurate data-flow visualization
  • Interactively step through the simulated data flow

SLIDE 27

New Approach: Dynamically Reconfigurable SoC

[Diagram: configurable SoC (ASIC) with LEON SPARC core, program ROM, µC RAM, global SoC RAM, AMBA bus, FIFO bridge, and the PACT XPP processing array with local XPP RAM]

  • General purpose processor (LEON)
      • Management and upload of XPP configurations
      • Computation of control-flow-dominant algorithm parts
  • XPP for data-flow-dominant algorithms
  • AMBA AHB on-chip communication

SLIDE 28

New Approach: Dynamically Reconfigurable SoC (cont'd)

  • XPP is integrated as an additional computing unit
  • LEON is not halted during XPP calculation (XPP works decoupled from LEON)
      • Decoupling through dual-clocked FIFOs
      • Different clock frequencies possible
      • XPP clock can be programmed by LEON dynamically
      • System power consumption can be reduced
  • XPP inputs and outputs are represented as new LEON registers
  • Number and width of inputs and outputs are scalable before synthesis

[BTVB03]; [BTS03]; [BTS04]

SLIDE 29

LEON Datapath Extension

[Diagram: LEON integer pipeline (fetch, decode, execute, memory, write stages, with register file, ALU/shift, mul/div, I-cache and D-cache) extended with a 4x4 XPP array; new interfaces: XPP config, XPP-CLK, -IRQ, -Hold, -Trap, -Status, plus XPP data and event ports]

SLIDE 30

LEON Datapath Extension (cont'd)

  • The LEON pipeline is not influenced by the extension
  • New LEON data exchange and control registers added
  • The extension is represented by a sub-instruction set
      • Additional load/store and read/write instructions are added to the pipeline
      • Instructions are implemented in assembler and can be used via C macros
  • The configuration manager can be omitted since configurations are managed by LEON
  • Data streams are managed by LEON
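How such C macros might look — a sketch only, under stated assumptions: the actual register names, opcodes, and inline-assembly syntax of the LEON extension are not given in the slides, so the macros below are hypothetical and model the data-exchange registers with plain arrays instead of the new pipeline instructions.

```c
#include <stdint.h>

/* Host-side model of the XPP data-exchange registers (hypothetical:
   the real extension maps these onto new LEON registers accessed by
   dedicated load/store sub-instructions). */
static uint32_t xpp_in_reg[4];   /* LEON -> XPP input registers  */
static uint32_t xpp_out_reg[4];  /* XPP -> LEON output registers */

/* C macros wrapping the (here simulated) register accesses. */
#define XPP_WRITE(port, val)  (xpp_in_reg[(port)] = (uint32_t)(val))
#define XPP_READ(port)        (xpp_out_reg[(port)])

/* Stand-in for one configured XPP operation; the real array would
   compute this concurrently with the LEON pipeline. */
static void xpp_step(void)
{
    xpp_out_reg[0] = xpp_in_reg[0] + xpp_in_reg[1];
}
```

A LEON program would then write operands with `XPP_WRITE`, let the decoupled array run, and collect results with `XPP_READ` — without stalling the pipeline.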

SLIDE 31

Application Development for LEON/XPP SoC

[Tool flow: the C code is split by annotation — irregular code sections go through a standard C compiler (+ XPP interface calls) to the host system (LEON, with RAM, I/O, host interface); streaming code sections go through the XPP vectorizing C compiler to a sequence of configurations for the XPP]

  • Code selection is done manually
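Manual code selection might look like this — a hypothetical sketch (the annotation shown in the comment is invented for illustration; the slides do not give the actual annotation syntax of the XPP tool flow). A regular, data-flow-dominant loop is marked for the vectorizing compiler, while control-flow-dominant code stays on the LEON core:

```c
#define N 8

/* Streaming section: data-flow dominant, regular access pattern.
   Would carry an annotation for the XPP vectorizing C compiler,
   e.g. something like "#pragma xpp_stream" (hypothetical name). */
void scale_add(const int *a, const int *b, int *out)
{
    for (int i = 0; i < N; i++)
        out[i] = 2 * a[i] + b[i];
}

/* Irregular section: control-flow dominant, left unannotated and
   compiled by the standard C compiler for the LEON core. */
int classify(int x)
{
    if (x < 0)  return -1;
    if (x == 0) return 0;
    return 1;
}
```

The split mirrors the slide's point: the streaming section becomes a sequence of XPP configurations, the irregular section remains ordinary LEON code.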

SLIDE 32

HoneyComb

  • New approach for dynamically reconfigurable systems
  • Provides multi-grained datapaths
  • Support for adaptive runtime routing
  • HoneyComb is a project of the "Institut für Technik der Informationsverarbeitung" (ITIV) of the Universität Karlsruhe
  • Project is currently ongoing (AMURHA)

[TZB04]; [TOB04]; [TBE04]

SLIDE 33

HoneyComb (cont'd)

  • Hexagonal, array-based reconfigurable architecture
  • Increased number of routable directions (vertical, 2x diagonal)
  • Bandwidth increase and reachability gain within the array
  • Three types of multi-grained functional units:
      • DPHC: Datapath HoneyComb cell
      • MEMHC: Memory HoneyComb cell
      • IOHC: Input/Output HoneyComb cell

[Diagram: logical view and technological view of the hexagonal cell array (AMURHA) with cell coordinates; cell types MEMHC, DPHC, IOHC]

SLIDE 34

HoneyComb (cont'd)

  • Mesh of unique routing units, one per HoneyComb cell
  • Multiple point-to-point links between all neighbors
      • Dedicated coarse-grained links
      • Extended multi-grained links
  • Handshake protocol for data consistency and synchronous communication

[Diagram: routing unit]

SLIDE 35

HoneyComb: Routing Unit

  • Modular cell structure
  • Cell type is defined through the functional module
  • Routing task: connecting selected output ports and the input ports of the functional modules

[Diagram: HoneyComb cell structure — routing unit (input links, output links, multiplexers, intermediate registers, control FSM + registers) connected to the functional module's datapath/memory inputs and outputs]

SLIDE 36

HoneyComb: Adaptive Runtime Routing

  • Grouping of physical sub-channels into logical groups is possible (1 up to 32 bits)
  • Remaining sub-channels are still usable
  • Partial usability of the same link (different sub-channels) by different configurations
  • Supported by adaptive runtime routing

[Diagram: coarse-grained link vs. fine-grained link; used output strips mapped to input strips]
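Sub-channel allocation can be modeled as simple bitmask bookkeeping — an illustrative C sketch (the encoding and function name are hypothetical, not HoneyComb's actual configuration format): a 32-bit mask tracks which 1-bit sub-channels of a link are taken, a configuration claims a contiguous logical group, and the remaining sub-channels stay available to other configurations.

```c
#include <stdint.h>

/* Try to claim 'width' contiguous sub-channels (1..32) of a link whose
   occupancy is tracked in *used. Returns the start bit position of the
   claimed group, or -1 if no contiguous group of that width is free. */
int claim_subchannels(uint32_t *used, int width)
{
    uint32_t group = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    for (int start = 0; start + width <= 32; start++) {
        uint32_t mask = group << start;
        if ((*used & mask) == 0) {   /* group is free on this link   */
            *used |= mask;           /* mark sub-channels as taken   */
            return start;
        }
    }
    return -1;                       /* link cannot fit this group   */
}
```

Two configurations sharing one link then simply hold disjoint masks, which is the "partial usability of the same link" mentioned above.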

SLIDE 37

HoneyComb: Adaptive Runtime Routing (cont'd)

  • Routing calculation is based on array coordinates
  • The routing problem is similar to Internet routing
  • Simplified problem because of the static array configuration
  • Intelligent FSMs are able to solve this problem

[Diagram: route from source to sink across the array]
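A coordinate-based routing step can be sketched in a few lines of C — illustrative only: the HoneyComb FSMs work on hexagonal coordinates and must handle blocked or defective cells, which this square-grid greedy sketch omits.

```c
#include <stdlib.h>

/* One greedy routing step on a square grid: move one hop along the
   axis with the larger remaining distance toward the sink. */
static void route_step(int *x, int *y, int sink_x, int sink_y)
{
    int dx = sink_x - *x, dy = sink_y - *y;
    if (abs(dx) >= abs(dy) && dx != 0)
        *x += (dx > 0) ? 1 : -1;
    else if (dy != 0)
        *y += (dy > 0) ? 1 : -1;
}

/* Route from source to sink; returns the number of hops taken. */
int route(int x, int y, int sink_x, int sink_y)
{
    int hops = 0;
    while (x != sink_x || y != sink_y) {
        route_step(&x, &y, sink_x, sink_y);
        hops++;
    }
    return hops;
}
```

Because the array configuration is static while a route is computed, such a local, coordinate-driven decision per hop is enough — this is what makes the problem simpler than general Internet routing.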

SLIDE 38

Adaptability and Fault Tolerance

  • Dynamic routing: bypass of used/defective HoneyComb cells
  • Advantages:
      • No static configurations (at compile time!)
      • Increased chip yield

[Diagram: route without defect vs. route with defect — a faulty or already-used cell is bypassed, and the configuration (functional units A, B, C, D) is re-placed onto free, error-free functional units]

SLIDE 39

Summary

  • Algorithms can be implemented in hardware or software
  • Reconfigurable hardware combines the flexibility of a general purpose processor with the parallelism of an ASIC
  • Reconfigurable hardware closes the gap between computation in time and computation in space
  • Reconfigurable hardware is available in different granularities

SLIDE 40

References and Sources

  • [MAX03] Altera; MAX 7000 Programmable Logic Device Family Data Sheet; Version 6.6; http://www.altera.com/literature/ds/m7000.pdf
  • [Spa02] Xilinx; Spartan-XL (3.3V) and Spartan (5V) FPGAs Complete Data Sheet; Version 1.7; http://direct.xilinx.com/bvdocs/publications/ds060.pdf
  • [XPP] PACT XPP Technologies AG; D-80939 Munich; www.pactcorp.com
  • [BTVB03] Becker, J.; Thomas, A.; Vorbach, M.; Baumgarte, V.; An industrial/academic configurable system-on-chip project (CSoC): coarse-grain XPP-/Leon-based architecture integration; Design, Automation and Test in Europe Conference and Exhibition; 2003
  • [BTS03] Becker, J.; Thomas, A.; Scheer, M.; Datapath and Compiler Integration of Coarse-grain Reconfigurable XPP-Arrays into Pipelined RISC Processors; VLSI-SoC; 2003
  • [BTS04] Becker, J.; Thomas, A.; Scheer, M.; Efficient processor instruction set extension by asynchronous reconfigurable datapath integration; Integrated Circuits and Systems Design (SBCCI); 2003
  • [TZB04] Thomas, A.; Zander, T.; Becker, J.; Adaptive DMA-based I/O Interfaces for Data Stream Handling in Multi-grained Reconfigurable Hardware Architectures; Symposium on Integrated Circuits and Systems Design; 2004

SLIDE 41

References and Sources

  • [TOB04] Thomas, A.; Becker, J.; Dynamic Adaptive Routing Techniques in Multigrain Dynamic Reconfigurable Hardware Architectures; Field-Programmable Logic and its Applications (FPL); 2004
  • [TBE04] Thomas, A.; Becker, J.; Design and structure concepts of an adaptive multi-granular reconfigurable hardware architecture ("Aufbau- und Strukturkonzepte einer adaptiven multigranularen rekonfigurierbaren Hardwarearchitektur"); 17th International Conference on Architecture of Computing Systems (ARCS); 2004