Hardware-Software Codesign 7. Design Space Exploration, Lothar Thiele

SLIDE 1

7 - 1 Swiss Federal Institute of Technology Computer Engineering and Networks Laboratory

Hardware-Software Codesign

  • 7. Design Space Exploration

Lothar Thiele

SLIDE 2

System Design

[Figure: system design flow from specification through system synthesis (supported by estimation) to SW-compilation (instruction set, intellectual-property code, machine code) and HW-synthesis (intellectual-property blocks, net lists).]

SLIDE 3

Optimization-Analysis Cycle

  • the optimization algorithm makes decisions: it selects a decision vector X (e.g., allocation, binding, schedule, memory)
  • an evaluation model (e.g., simulation, analytic) computes the objective vector f(X) (e.g., cost, throughput, delay)
  • the optimization algorithm makes decisions only by knowing (and comparing) f

[Figure: candidate implementations, each with CPU0 to CPU3 connected by a bus and processes p0 to p3 mapped onto them.]
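A minimal sketch of this cycle, assuming every objective is minimized and that "comparing f" means Pareto dominance. The optimizer never inspects X itself; it only keeps the decision vectors whose objective vectors are not dominated. The names `explore`, `propose`, and `evaluate`, and the toy evaluation model (X = number of CPUs, f(X) = (cost, delay)), are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Objectives = Tuple[float, ...]  # objective vector f(X); every entry is minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b: no worse in every objective, strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[tuple], x, fx: Objectives) -> List[tuple]:
    """Keep only non-dominated (X, f(X)) pairs."""
    if any(fy == fx or dominates(fy, fx) for _, fy in archive):
        return archive  # x brings nothing new
    return [(y, fy) for y, fy in archive if not dominates(fx, fy)] + [(x, fx)]


def explore(propose: Callable, evaluate: Callable, iterations: int, seed: int = 0):
    """Optimization-analysis cycle: propose X, evaluate f(X), compare only via f."""
    rng = random.Random(seed)
    archive: List[tuple] = []
    for _ in range(iterations):
        x = propose(rng)   # optimization algorithm: make a decision
        fx = evaluate(x)   # evaluation model: simulation or analytic estimate
        archive = update_archive(archive, x, fx)
    return archive


# Hypothetical toy model: more CPUs cost more but reduce delay.
front = explore(lambda rng: rng.randint(1, 4), lambda n: (n, 8 / n), iterations=50)
```

Because cost rises and delay falls with the number of CPUs, every distinct value found ends up in the archive: none of them dominates another.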

SLIDE 4

Three Examples

SLIDE 5

Example 1: Remember …

  • application data flow graph GP(VP, EP)
  • architecture graph GA(VA, EA)
  • (all possible) mapping relations EM
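The three ingredients can be held in one small data structure. This sketch assumes a binding is feasible when every process vp ∈ VP is bound to an allocated resource along an edge of EM; EP and EA are stored but not checked (communication constraints are left out). The class name `SpecificationGraph` and the example resources `cpu` and `fpga` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class SpecificationGraph:
    VP: Set[str]              # processes of the application graph GP
    EP: Set[Tuple[str, str]]  # data-flow edges of GP
    VA: Set[str]              # resources of the architecture graph GA
    EA: Set[Tuple[str, str]]  # links of GA
    EM: Set[Tuple[str, str]]  # (all possible) mapping relations (vp, va)

    def candidates(self, vp: str) -> Set[str]:
        """Resources that process vp may be bound to."""
        return {va for (p, va) in self.EM if p == vp}

    def is_feasible(self, alpha: Set[str], beta: Dict[str, str]) -> bool:
        """Every process is bound (beta) to an allocated resource (alpha) along an EM edge."""
        return all(
            vp in beta and beta[vp] in alpha and (vp, beta[vp]) in self.EM
            for vp in self.VP
        )


# Hypothetical two-process, two-resource example.
spec = SpecificationGraph(
    VP={"p1", "p2"}, EP={("p1", "p2")},
    VA={"cpu", "fpga"}, EA={("cpu", "fpga")},
    EM={("p1", "cpu"), ("p1", "fpga"), ("p2", "fpga")},
)
```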

SLIDE 6

Example 1: Remember …

SLIDE 7

Example 1: Simple Mapping Model

  • search algorithm: an EA (evolutionary algorithm) with selection, recombination, mutation, and fitness evaluation
  • “chromosome” = encoded allocation + binding
  • decoding a chromosome (decode allocation, decode binding, then scheduling) yields a design point (implementation) with allocation α, binding β, and schedule τ
  • analysis of individual solutions, subject to user constraints, returns their fitness to the search algorithm
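A minimal decoding sketch, assuming the simple encoding of the next slide: one bit per resource for the allocation α, followed by one integer gene per process that selects one of its EM candidates for the binding β. The resource names, candidate lists, and the modulo rule are hypothetical choices for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical resources and, per process, the resources allowed by EM.
RESOURCES: List[str] = ["cpu", "dsp", "fpga"]
CANDIDATES: Dict[str, List[str]] = {"p1": ["cpu", "fpga"], "p2": ["dsp", "fpga"]}
PROCESSES: List[str] = sorted(CANDIDATES)


def decode(chromosome: Sequence[int]) -> Tuple[Set[str], Dict[str, str], bool]:
    """Decode one bit per resource (allocation) plus one gene per process (binding)."""
    if len(chromosome) != len(RESOURCES) + len(PROCESSES):
        raise ValueError("unexpected chromosome length")
    alloc_bits = chromosome[: len(RESOURCES)]
    bind_genes = chromosome[len(RESOURCES):]
    alpha = {r for r, bit in zip(RESOURCES, alloc_bits) if bit}  # allocation
    beta = {p: CANDIDATES[p][g % len(CANDIDATES[p])]             # binding
            for p, g in zip(PROCESSES, bind_genes)}
    feasible = all(beta[p] in alpha for p in PROCESSES)  # bound resource must be allocated
    return alpha, beta, feasible
```

Because each binding gene is taken modulo the candidate count, every decoded binding respects EM; infeasibility can only come from binding to a resource that is not allocated.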

SLIDE 8

Challenges of EAs in DSE

encoding allocation + binding

  • simple encoding, e.g., one bit per resource, one variable per binding
    • easy to implement
    • … however, it may lead to (many) infeasible partitioning solutions
  • encoding + repair, e.g., simple encoding plus a repair of the allocation such that for each vp ∈ VP there exists at least one va ∈ α with (vp, va) ∈ EM
    • reduces the number of infeasible partitioning solutions
  • (“smart”) generation of the initial population
  • (“smart”) neighborhood operations, e.g., mutation, crossover
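A minimal repair sketch for the condition above. When a process has no allocated candidate, this version allocates its cheapest candidate; that heuristic and the cost values are assumptions, not part of the slide. A second, hypothetical step then rebinds processes whose resource is not allocated.

```python
from typing import Dict, Set, Tuple

Mapping = Set[Tuple[str, str]]  # EM as a set of (vp, va) pairs


def repair_allocation(alpha: Set[str], VP: Set[str], EM: Mapping,
                      cost: Dict[str, float]) -> Set[str]:
    """Ensure each vp in VP has at least one va in alpha with (vp, va) in EM."""
    repaired = set(alpha)
    for vp in sorted(VP):
        cands = sorted(va for (p, va) in EM if p == vp)
        if not cands:
            raise ValueError(f"process {vp} has no mapping in EM")
        if not any(va in repaired for va in cands):
            # Heuristic (assumption): allocate the cheapest candidate.
            repaired.add(min(cands, key=lambda va: (cost[va], va)))
    return repaired


def repair_binding(beta: Dict[str, str], alpha: Set[str], EM: Mapping) -> Dict[str, str]:
    """Rebind processes whose resource is not allocated or not allowed by EM.

    Assumes repair_allocation has run, so an allocated candidate always exists.
    """
    fixed = {}
    for vp, va in beta.items():
        if va in alpha and (vp, va) in EM:
            fixed[vp] = va
        else:
            fixed[vp] = min(v for (p, v) in EM if p == vp and v in alpha)
    return fixed


# Hypothetical example data.
EM = {("p1", "cpu"), ("p1", "fpga"), ("p2", "dsp"), ("p2", "fpga")}
cost = {"cpu": 1, "dsp": 2, "fpga": 5}
alpha = repair_allocation(set(), {"p1", "p2"}, EM, cost)
beta = repair_binding({"p1": "fpga", "p2": "fpga"}, alpha, EM)
```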

SLIDE 9

Example 2: Network Processors - Definition

  • implementation: high-performance, programmable devices optimized for (real-time) network packet processing
  • features: complex packet processing capabilities at high line speeds (routing; forwarding; de-/encryption; de-/compression; …) and means to guarantee quality of service
  • typically, network processors serve as a bridge between the network and the source/sink audio/video device (or set of devices)

[Figure: network processor with core1, core2, mem1, mem2, and I/Os on an internal shared bus; multi-tile architecture (Tile 0 to Tile 7) with on-/off-chip links.]

SLIDE 10

Network Processor Architecture (*)

Network processor: heterogeneous hardware/software architecture. The available processing units

  • … are described by the resource set R = {ARM9, PowerPC, DSP, MEngine, Classifier, Cipher, LookUp, CheckSum}
  • … have a relative implementation cost cost(r) ≥ 0, r ∈ R
  • … are selected for a specific architecture during the allocation step, with alloc(r) = 1 if resource r is selected and 0 otherwise

(*) Note: example from Simon Künzli: Efficient Design Space Exploration for Embedded Systems, Shaker Verlag, ISBN 3-8322-5246-0, 2006.
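A minimal sketch of evaluating one allocation, assuming the total cost is additive, i.e., the sum of cost(r) · alloc(r) over r ∈ R. The resource set is taken from the slide; the cost values in `COST` are hypothetical placeholders.

```python
from typing import Dict

R = ["ARM9", "PowerPC", "DSP", "MEngine", "Classifier", "Cipher", "LookUp", "CheckSum"]

# Hypothetical relative implementation costs cost(r) >= 0 (not taken from the slide).
COST: Dict[str, float] = {"ARM9": 2, "PowerPC": 3, "DSP": 3, "MEngine": 4,
                          "Classifier": 1, "Cipher": 2, "LookUp": 1, "CheckSum": 1}


def allocation_cost(alloc: Dict[str, int], cost: Dict[str, float]) -> float:
    """Total cost of an allocation: sum of cost(r) * alloc(r) over r in R."""
    if any(alloc.get(r, 0) not in (0, 1) for r in cost):
        raise ValueError("alloc(r) must be 0 or 1")
    return sum(cost[r] * alloc.get(r, 0) for r in cost)


# Example allocation: only ARM9 and Cipher are selected.
alloc = {r: 0 for r in R}
alloc["ARM9"] = alloc["Cipher"] = 1
```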

SLIDE 11

Network Processor Task Model

application structure: set of streams s∈S and set of tasks t∈T

  • each stream includes an ordered sequence of tasks V(s)=[t0,...,tn]

example: S={RTSend,NRTDecrypt,NRTEncrypt,RTRecv,NRTForward}

SLIDE 12

Problem: Optimal Design of Network Processor

  • mappings M ⊆ T × R: all possible bindings of tasks, i.e., if (t, r) ∈ M, then task t could be executed on resource r
  • request w(r, t) ≥ 0, i.e., execution of one packet in t would use w computing units of r
  • resource allocation cost c(r) ≥ 0
  • binding Z of tasks to resources, Z ⊆ M (leading to the actual implementation): a subset of the mappings M such that every task t ∈ T is bound to exactly one allocated resource r ∈ R, i.e., alloc(r) = 1 and r = bind(t)
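A minimal sketch that checks the binding conditions above and sums the requests w(r, t) per resource. The task names t0 to t2 and all numbers are hypothetical. Summing the requests of all tasks bound to a resource is an illustrative per-packet view, not the full analysis used in the cited work.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (task t, resource r)


def is_valid_binding(Z: Set[Pair], T: Set[str], M: Set[Pair], alloc: Dict[str, int]) -> bool:
    """Z is a subset of M, every task is bound exactly once, to an allocated resource."""
    bind: Dict[str, str] = {}
    for t, r in Z:
        if (t, r) not in M or t in bind:
            return False  # not a possible mapping, or task bound twice
        bind[t] = r
    return set(bind) == T and all(alloc.get(r, 0) == 1 for r in bind.values())


def request_per_resource(Z: Set[Pair], w: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Sum of w(r, t) over the tasks bound to each resource."""
    load: Dict[str, float] = defaultdict(float)
    for t, r in Z:
        load[r] += w[(r, t)]  # note: w is keyed as (r, t), matching w(r, t)
    return dict(load)


# Hypothetical example.
T = {"t0", "t1", "t2"}
M = {("t0", "ARM9"), ("t0", "Classifier"), ("t1", "ARM9"), ("t1", "Cipher"), ("t2", "ARM9")}
w = {("ARM9", "t0"): 2, ("Classifier", "t0"): 1, ("ARM9", "t1"): 6,
     ("Cipher", "t1"): 1, ("ARM9", "t2"): 1}
alloc = {"ARM9": 1, "Cipher": 1, "Classifier": 0}
Z = {("t0", "ARM9"), ("t1", "Cipher"), ("t2", "ARM9")}
```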

SLIDE 13

NP Design Constraints

The design of network processors typically faces conflicting goals:

delay constraints

  • e.g., maximum time a packet spends being processed within the NP

throughput maximization

  • e.g., maximum throughput of the NP (packets per second)

cost minimization

  • implementation with a small amount of resources (e.g., processing units, memory, and communication networks)

… and conflicting usage scenarios

  • usually, a packet processor is used in several different systems (e.g., a router or a consumer multimedia processing device) and might have different implementations with different throughput/delay requirements

SLIDE 14

NP Design Space Exploration

Issues to be considered during system-level design (and synthesis):

allocation

  • determine the hardware components of the network processor

binding

  • for each process of the software application, choose an allocated hardware unit which executes it

scheduling

  • for the set of tasks mapped onto a specific resource, choose the scheduling policy/parameters from the available run-time environment, e.g., a fixed priority for each stream s: prio(s) > 0
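A minimal sketch of the scheduling decision for fixed stream priorities. For every resource, it lists the (stream, task) pairs bound to that resource in priority order. Two assumptions apply: a larger prio(s) means higher priority, and ties keep the stream/task order. The stream names come from the task-model example; the task names and bindings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fixed_priority_queues(streams: Dict[str, List[str]], prio: Dict[str, int],
                          bind: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Per resource, order its (stream, task) pairs by the stream's fixed priority.

    Assumption: a larger prio(s) means higher priority; ties keep insertion order.
    """
    queues: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, tasks in streams.items():
        if prio[s] <= 0:
            raise ValueError("prio(s) must be > 0")
        for t in tasks:  # V(s) = [t0, ..., tn]
            queues[bind[t]].append((s, t))
    for r in queues:
        queues[r].sort(key=lambda st: -prio[st[0]])  # stable sort
    return dict(queues)


# Hypothetical tasks and binding.
streams = {"NRTForward": ["fw_rx", "fw_tx"], "RTSend": ["rt_enc", "rt_tx"]}
prio = {"NRTForward": 1, "RTSend": 2}
bind = {"fw_rx": "ARM9", "fw_tx": "ARM9", "rt_enc": "Cipher", "rt_tx": "ARM9"}
queues = fixed_priority_queues(streams, prio, bind)
```

On the ARM9, the real-time task rt_tx is placed ahead of both forwarding tasks even though its stream was listed second.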

SLIDE 15

Design Space Exploration Flow

[Figure: design space exploration flow over the decision variables alloc(r) ∈ {0, 1}, r = bind(t), and prio(s) > 0.]

SLIDE 16

Tools and a Small Demo

SLIDE 17

… Some Results

[Figure: results plotted against performance of encryption/decryption, performance of RT voice processing, and cost.]

SLIDE 18

Example 3: Wave Field Synthesis

What is wave field synthesis (WFS)?

  • a high-quality spatial sound reproduction system for huge listening areas
  • 32 sound sources and 300 loudspeakers for medium-sized reproduction rooms

[Figure: recording room with microphones, mixing console, WFS processor, and reproduction room.]

SLIDE 19

System Specification: WFS Application

  • parallel application modeled as a Kahn Process Network
  • structure: XML
  • functionality: ANSI C & DOL(*) API

[Figure: WFS process network with source, control, convolution, and loudspeaker processes.]

(*) DOL – distributed operation layer: http://www.tik.ee.ethz.ch/~shapes/dol.html
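To illustrate the Kahn Process Network semantics only, here is a minimal Python sketch: processes communicate over unbounded FIFO channels, writes never block, and reads block until data arrive. This is not the DOL API, which is an ANSI C interface. The three-process pipeline (source, convolution, loudspeaker, omitting control), the end-of-stream marker, and the filter coefficients are hypothetical.

```python
import queue
import threading
from typing import List, Sequence

EOS = None  # end-of-stream marker (a convention of this sketch)


def source(out: queue.Queue, samples: Sequence[float]) -> None:
    for x in samples:
        out.put(x)  # writes never block: unbounded FIFO
    out.put(EOS)


def convolution(inp: queue.Queue, out: queue.Queue, h: Sequence[float]) -> None:
    history = [0.0] * len(h)  # last len(h) input samples, newest first
    while True:
        x = inp.get()  # blocking read, as required in a KPN
        if x is EOS:
            out.put(EOS)
            return
        history = [x] + history[:-1]
        out.put(sum(c * v for c, v in zip(h, history)))


def loudspeaker(inp: queue.Queue, sink: List[float]) -> None:
    while True:
        y = inp.get()
        if y is EOS:
            return
        sink.append(y)


def run(samples: Sequence[float], h: Sequence[float]) -> List[float]:
    c1: queue.Queue = queue.Queue()  # channel source -> convolution
    c2: queue.Queue = queue.Queue()  # channel convolution -> loudspeaker
    sink: List[float] = []
    threads = [
        threading.Thread(target=source, args=(c1, samples)),
        threading.Thread(target=convolution, args=(c1, c2, h)),
        threading.Thread(target=loudspeaker, args=(c2, sink)),
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return sink
```

Because every read blocks and each channel has a single reader, the output is independent of how the threads are scheduled. That determinism is what lets the same process network be mapped to different processors without changing its function.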

SLIDE 20

System Specification: Architecture

The architecture is modeled at an abstract level in XML format. Modeled elements:

  • processors, buses, memories
  • communication paths between these elements
  • … parameters are included in the model

[Figure: example architecture with components RISC, DSP, RDM, DDM, RISC BUS, DSP BUS, DMA, DXM, REG, AHB 1, AHB 2.]

SLIDE 21

Application-to-Architecture Mapping

[Figure: the parallel application (microphones, convolution, sum, loudspeakers) is mapped onto the heterogeneous architecture (RISC, DSP, RDM, DDM, RISC BUS, DSP BUS, DMA, DXM, REG, AHB 1, AHB 2).]

  • design space exploration (performance analysis & mapping optimization)
  • software synthesis

SLIDE 22

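Simple Analysis Model

  • max processor load (processor c with the worst total runtime):

$$\max_{c} \sum_{p \,\text{mapped to}\, c} n(p) \cdot r(p,c)$$

  • max bus load (communication link g with the worst load):

$$\max_{g} \sum_{s \,\text{mapped to}\, g} \frac{b(s)}{t(g)}$$

where n(p) is the number of activations of process p, r(p,c) is the runtime of process p on processor c, b(s) is the communication request from channel s, and t(g) is the bandwidth of communication link g.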
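A minimal sketch of this analysis model. It assumes a processor's load is the sum of n(p) · r(p,c) over the processes mapped to it, and a link's load is the sum of b(s) / t(g) over the channels mapped to it, with each channel mapped to exactly one link. Each function returns the worst element and its load. All numbers are hypothetical; following the slide "Where Are Data Obtained From?", they would come from functional simulation, instruction-set simulation, and static bus parameters.

```python
from collections import defaultdict
from typing import Dict, Tuple


def max_processor_load(n: Dict[str, int], r: Dict[Tuple[str, str], float],
                       proc_of: Dict[str, str]) -> Tuple[str, float]:
    """Worst processor c and its total runtime: sum of n(p) * r(p, c) over p mapped to c."""
    load: Dict[str, float] = defaultdict(float)
    for p, c in proc_of.items():
        load[c] += n[p] * r[(p, c)]
    return max(load.items(), key=lambda kv: kv[1])


def max_bus_load(b: Dict[str, float], t: Dict[str, float],
                 link_of: Dict[str, str]) -> Tuple[str, float]:
    """Worst link g and its load: sum of b(s) / t(g) over channels s mapped to g."""
    load: Dict[str, float] = defaultdict(float)
    for s, g in link_of.items():
        load[g] += b[s] / t[g]
    return max(load.items(), key=lambda kv: kv[1])


# Hypothetical annotated values for one mapping.
n = {"convolution": 100, "sum": 100}                              # activations n(p)
r = {("convolution", "mAgic"): 5.0, ("sum", "ARM"): 2.0}          # runtime r(p, c)
proc_of = {"convolution": "mAgic", "sum": "ARM"}
b = {"conv_to_sum": 4000.0}                                       # data b(s)
t = {"AHB": 1000.0}                                               # bandwidth t(g)
link_of = {"conv_to_sum": "AHB"}
```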

SLIDE 23

SLIDE 24

SLIDE 25

Where Are Data Obtained From?

  • static parameters: bandwidth of buses t(g)
  • functional simulation: number of activations of each process n(p), amount of data for each channel b(s)
  • instruction-set simulation: runtime of each process on different processors r(p,c), obtained by using benchmark mappings

[Figure: benchmark mappings onto ARM and mAgic connected via AHB.]

SLIDE 26

Design Space Exploration Cycle – An Example

[Figure: design space exploration cycle. Inputs: system description consisting of the application XML and the architecture XML, annotated using functional simulation and instruction-level simulation. EXPO performs multi-objective optimization with an evolutionary algorithm (via the PISA interface) and mapping generation & variation (mutation/crossover). Each mapping (mapping XML) is evaluated with the analysis model, yielding performance numbers. Output: designer’s data sheet.]
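A minimal sketch of the "mapping variation (mutation/crossover)" step, assuming a mapping is a dictionary from process to processor. The generic operators shown here are point mutation (move one process to a different allowed processor) and uniform crossover. They are not EXPO's actual operators, and the process and processor names are hypothetical.

```python
import random
from typing import Dict, List


def mutate(mapping: Dict[str, str], allowed: Dict[str, List[str]],
           rng: random.Random) -> Dict[str, str]:
    """Move one randomly chosen process to a different allowed processor."""
    child = dict(mapping)
    p = rng.choice(sorted(child))
    options = [c for c in allowed[p] if c != child[p]]
    if options:
        child[p] = rng.choice(options)
    return child


def uniform_crossover(a: Dict[str, str], b: Dict[str, str],
                      rng: random.Random) -> Dict[str, str]:
    """Each process inherits its processor from parent a or parent b with probability 1/2."""
    return {p: a[p] if rng.random() < 0.5 else b[p] for p in sorted(a)}


# Hypothetical mapping problem.
allowed = {"convolution": ["ARM", "mAgic"], "sum": ["ARM", "mAgic"], "loudspeaker": ["ARM"]}
rng = random.Random(1)
parent_a = {"convolution": "ARM", "sum": "ARM", "loudspeaker": "ARM"}
parent_b = {"convolution": "mAgic", "sum": "mAgic", "loudspeaker": "ARM"}
child = uniform_crossover(parent_a, parent_b, rng)
mutant = mutate(child, allowed, rng)
```

Restricting mutation to the allowed processors keeps every offspring a legal mapping, so the analysis model never has to evaluate infeasible candidates.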

SLIDE 27

Example 3: Exploration

[Figure: exploration results over max. processor load and max. bus load, with the search direction and the single processor mapping marked; example mapping of microphones, convolution, sum, and loudspeakers onto ARM, mAgic, and AHB.]