slide-1
SLIDE 1

Distributed Operation Layer: Efficient and Predictable KPN-Based Design Flow

Iuliana Bacivarov, Wolfgang Haid, Kai Huang, and Lothar Thiele ETH Zürich, Switzerland

slide-2
SLIDE 2

CASA, ESWEEK – DOL: Efficient and Predictable Design Flow Iuliana Bacivarov

Efficiency vs. Predictability?

Efficiency is…

  • … speed-up
  • … scalability
  • … small memory
  • … portability
  • … small effort

Predictability is…

  • … analyzability
  • … guarantees
  • … fast estimates
  • … good estimates
  • … early in design

Distributed Operation Layer (DOL): efficient and predictable system-level MPSoC design flow
slide-3
SLIDE 3


Distributed Operation Layer

Reduce “accidental complexity” in design by raising the level of abstraction and automation

slide-4
SLIDE 4


Distributed Operation Layer

  • System specification: abstract MoC (KPN) vs. BSP
  • Performance analysis: system-level (formal) analysis vs. complete system simulation
  • Design space exploration: automated system-level exploration vs. trial-and-error
  • (Software) synthesis: automated synthesis on various MPSoCs (possible due to formal MoC)

Reduce “accidental complexity” in design by raising the level of abstraction and automation

slide-5
SLIDE 5


Outline

  • Introduction
  • Distributed operation layer design flow
  • Specification
  • Synthesis
  • Design space exploration
  • Performance analysis
  • Some experimental results
  • Conclusions
slide-6
SLIDE 6


DOL Software System-Level Design Flow

Goals

  • Efficiency
  • Predictability

Challenges

  • Scalable specification
  • Automated synthesis
  • System-level design space exploration
  • Analytic performance evaluation

Strengths

  • Abstraction
  • Automation

[Figure: DOL design flow. Inputs: application specification (XML & C), architecture specification (XML), mapping specification (XML). Steps: functional simulation generation, simulation on workstation, system synthesis (HdS generation), simulation on virtual platform, analysis model generation, evaluation on workstation, design space exploration. Data flows: calibration data, back-annotation, performance data, test & debug.]

slide-7
SLIDE 7


System Specification

  • Roles
  • Express data and functional parallelism in application
  • Specify mapping of application on target architecture
  • Challenges
  • Scalability
  • Platform-independence

formal MoC – basis for efficient and predictable design


slide-8
SLIDE 8


Programming Model

  • Model of computation: Kahn process network
  • Coordination: XML with performance annotations
  • Functionality: C/C++ with specific programming DOL API
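
As an illustration of the Kahn process network model named above, the following is a minimal sketch in Python rather than in the DOL C API. The process functions, the `None` end-of-stream marker, and the bounded FIFO capacity are assumptions made for this example; a pure KPN has unbounded FIFOs, and bounded FIFOs with blocking writes are a common implementation refinement.

```python
import queue
import threading

def producer(out_fifo, n):
    """Write n tokens, then an end-of-stream marker (None, an assumption of this sketch)."""
    for i in range(n):
        out_fifo.put(i)            # blocks only while the bounded FIFO is full
    out_fifo.put(None)

def square(in_fifo, out_fifo):
    """Forward the square of every token read from the input FIFO."""
    while True:
        token = in_fifo.get()      # blocking read: the defining KPN rule
        if token is None:
            out_fifo.put(None)
            return
        out_fifo.put(token * token)

def consumer(in_fifo, result):
    """Collect tokens until the end-of-stream marker arrives."""
    while True:
        token = in_fifo.get()
        if token is None:
            return
        result.append(token)

def run_network(n=5, capacity=2):
    """Run producer -> square -> consumer, one thread per process."""
    f1 = queue.Queue(maxsize=capacity)
    f2 = queue.Queue(maxsize=capacity)
    result = []
    threads = [
        threading.Thread(target=producer, args=(f1, n)),
        threading.Thread(target=square, args=(f1, f2)),
        threading.Thread(target=consumer, args=(f2, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

Because processes communicate only through FIFOs with blocking reads, the output does not depend on thread scheduling or on the FIFO capacity chosen here.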
slide-9
SLIDE 9


Programming Model – Scalability

  • Scalability: “iterators” for large, multi-tile descriptions

Single process:

    <process name="src">
      <port type="output" name="out"/>
      <source type="c" location="src.c"/>
    </process>

N processes generated with an iterator:

    <iterator variable="i" range="N">
      <process name="src">
        <append function="i"/>
        <port type="output" name="out"/>
        <source type="c" location="src.c"/>
      </process>
    </iterator>
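
To make the effect of an iterator concrete, here is a hedged sketch that expands the iterator snippet into per-instance process names. It is not the DOL tool itself: the `<processnetwork>` wrapper element, the `name_index` naming scheme, and the `variables` dictionary that supplies `N` are assumptions of this example, and only a single, non-nested iterator level is handled.

```python
import xml.etree.ElementTree as ET

# The iterator from the slide, wrapped in a root element (the wrapper is an assumption).
SPEC = """
<processnetwork>
  <iterator variable="i" range="N">
    <process name="src">
      <append function="i"/>
      <port type="output" name="out"/>
      <source type="c" location="src.c"/>
    </process>
  </iterator>
</processnetwork>
"""

def expand_iterators(spec_xml, variables):
    """Return one process name per iteration of each top-level iterator."""
    root = ET.fromstring(spec_xml.strip())
    names = []
    for iterator in root.findall("iterator"):
        var = iterator.get("variable")
        count = variables[iterator.get("range")]   # e.g. {"N": 3}
        for value in range(count):
            for process in iterator.findall("process"):
                name = process.get("name")
                for append in process.findall("append"):
                    if append.get("function") == var:
                        name += f"_{value}"          # hypothetical naming scheme
                names.append(name)
    return names
```

With `variables={"N": 3}` the seven-line iterator description stands for three processes, which is what keeps large multi-tile specifications short.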

slide-10
SLIDE 10


Abstract Platform Modeling

  • Elements
  • Structure: processors, peripherals, memories, buses, etc.
  • Interconnect: explicit read and write communication paths
  • Performance data: e.g. latency and bandwidth of HW communication
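
One way such performance data could be used is sketched below. The additive-latency, bottleneck-bandwidth cost model, the `Link` class, and all numeric values are assumptions for illustration and are not taken from the DOL architecture specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """One hardware resource on a communication path (hypothetical model)."""
    name: str
    latency_s: float               # fixed delay contributed by this resource
    bandwidth_bytes_per_s: float   # sustained throughput of this resource

def transfer_time(path, n_bytes):
    """Estimate the time to move n_bytes along a path of links.

    Assumed model: latencies add up, and throughput is limited by the
    slowest link on the path.
    """
    total_latency = sum(link.latency_s for link in path)
    bottleneck = min(link.bandwidth_bytes_per_s for link in path)
    return total_latency + n_bytes / bottleneck

# Example path with made-up numbers: a local bus followed by a NoC hop.
EXAMPLE_PATH = [
    Link("bus", latency_s=1e-6, bandwidth_bytes_per_s=100e6),
    Link("noc", latency_s=2e-6, bandwidth_bytes_per_s=50e6),
]
```

For the example path, a 1000-byte transfer costs the 3 µs of accumulated latency plus 20 µs at the 50 MB/s bottleneck.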
slide-11
SLIDE 11


Abstract Platform – Scalability

  • Specification: XML, including “iterators” capability
slide-12
SLIDE 12


Mapping Specification

  • Scheduling
  • Constraints

Mapping

  • Binding
  • Processes to processors
  • SW channels to HW paths
slide-13
SLIDE 13


System Synthesis

  • Role
  • Close the gap between system-level specification and implementation
  • Challenges
  • Achieve desired performance
  • Handle deadlocks, starvation, and data races
  • Preserve KPN semantics

automatic software synthesis – essential for efficient design


slide-14
SLIDE 14


DOL Synthesis

  • Synthesis
  • Functional synthesis: generation of an untimed SystemC model for native execution
  • Software synthesis: HdS generation for MPARM, Atmel DIOPSIS, CELL
  • Strategy
  • Source-to-source code generators from DOL KPN to implementation
  • Automatic generation of “glue code”: process and channel implementations, bootstrapping, and scheduling


slide-15
SLIDE 15


Functional Synthesis

  • Synthesis
  • DOL processes and FIFOs: SystemC threads and channels
  • SystemC main file: bootstrapping and scheduling
  • Features
  • Execution: native, untimed
  • Debugging: standard tools, e.g., gdb
  • Performance data extraction: monitor READ/WRITE/FIRE

Automatic synthesis of DOL KPN in functional SystemC
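
The structure described above (processes with a `fire()` method, FIFO channels, a scheduler, and READ/WRITE/FIRE monitoring) can be sketched as follows. This is a simplified Python illustration, not generated SystemC: the class names are hypothetical, and the scheduler uses a readiness check where the SystemC model would use threads with blocking reads.

```python
from collections import Counter, deque

class Fifo:
    """FIFO channel that records every access in a shared event log."""
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.items = deque()

    def write(self, token):
        self.log.append(("WRITE", self.name))
        self.items.append(token)

    def read(self):
        self.log.append(("READ", self.name))
        return self.items.popleft()

    def __len__(self):
        return len(self.items)

class Source:
    """Process that writes a countdown of n tokens, one per firing."""
    def __init__(self, out, n):
        self.out = out
        self.remaining = n

    def ready(self):
        return self.remaining > 0

    def fire(self):
        self.out.write(self.remaining)
        self.remaining -= 1

class Sink:
    """Process that consumes one token per firing."""
    def __init__(self, inp):
        self.inp = inp
        self.received = []

    def ready(self):
        return len(self.inp) > 0

    def fire(self):
        self.received.append(self.inp.read())

def simulate(processes, log):
    """Fire ready processes round-robin until none can make progress."""
    progress = True
    while progress:
        progress = False
        for name, process in processes:
            if process.ready():
                log.append(("FIRE", name))
                process.fire()
                progress = True

def event_counts(log):
    """Summarize the monitored events by kind."""
    return Counter(kind for kind, _ in log)
```

Counting the logged events per process and channel is the kind of data that can later be back-annotated into the specification as performance parameters.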

[Figure: three processes P1, P2, P3, each an sc thread with a fire() method, connected through sc ports to sc channels via write() and read(), and driven by a scheduler]

slide-16
SLIDE 16


DOL Software Synthesis

  • MPARM: multi-ARM tiles connected by NoC
  • Atmel Diopsis 940: each tile is an ARM9 + DSP connected by an AMBA bus; several tiles connected via NoC
  • Cell BE: PowerPC and 8 SPEs connected via ring bus

[Figure: Cell BE block diagram: PPE (PPU, L1 cache, L2 cache), eight SPEs (each with SPU, LS, and MFC), and MIC to main storage, all connected by the element interconnect bus (EIB)]

Legend: LS: Local Store; MFC: Memory Flow Controller; MIC: Memory Interface Controller; PPE: Power Processor Element; PPU: Power Processor Unit; SPE: Synergistic Processor Element; SPU: Synergistic Processor Unit

[Figure: tiles (ARM core, SP, x-bar, DRAM ctrl, NI) connected by NoC switches]

slide-17
SLIDE 17


Design Space Exploration

  • Role
  • Find Pareto-optimal mappings of an application on target architecture
  • Challenges
  • Multiple contradictory objectives
  • Exhaustive search not feasible
  • Instruction-accurate simulation too slow for design space exploration

system-level automated design space exploration – the key element of an efficient design
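
The notion of Pareto-optimal mappings can be illustrated with a small filter over candidate mappings scored on two objectives that are both minimized, such as end-to-end delay and cost. This is a brute-force sketch for clarity, not the SPEA2 search used in the flow, and the function names are assumptions of this example.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the (delay, cost) pairs `(10, 1), (6, 2), (6, 3), (4, 4), (5, 5), (3, 8)`, the pairs `(6, 3)` and `(5, 5)` are dominated and drop out. The quadratic comparison is fine for a handful of points; it is the number of candidate mappings, not this filter, that makes exhaustive search infeasible.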


slide-18
SLIDE 18


Mapping Optimization Framework

  • Control & GUI: EXPO - https://www.tik.ee.ethz.ch/expo
  • tool to explore the design space for network processor architectures
  • Interface: PISA - https://www.tik.ee.ethz.ch/pisa
  • Platform and language independent Interface for Search Algorithms

  • Search algorithm: SPEA2 (Strength Pareto Evolutionary Algorithm)
  • Evaluation: MPA (Modular Performance Analysis) - http://www.mpa.ethz.ch

slide-19
SLIDE 19


EXPO-PISA Illustration

[Figure: EXPO-PISA plot of candidate mappings; axes: max. processor load and max. bus load]
slide-20
SLIDE 20


Performance Analysis

  • Roles
  • Feedback for developer
  • Verification of single designs
  • Decision basis for design space exploration
  • Challenges
  • Accuracy
  • Speed

formal performance analysis – the key element of a predictable design


slide-21
SLIDE 21


DOL Performance Analysis

  • Goal: design real-time systems (multi-media, signal processing)
  • Method: Modular Performance Analysis (MPA), http://www.mpa.ethz.ch
  • Challenge: integrate MPA in DOL
  • Generate MPA model from high-level spec
  • Calibrate MPA model

[Figure: DOL design flow with the analysis model generation step instantiated as MPA analysis model generation; inset: curve of #events over interval length Δ]

slide-22
SLIDE 22


Modular Performance Analysis (MPA)*

  • Model
  • based on Network Calculus
  • modeling streams and resources based on arrival and service curves
  • Output
  • worst-case bounds on system properties
  • (Large) MPSoC extensions
  • complex activation schemes, timing correlations, blocking semantics, cyclic dependencies
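
To show how arrival and service curves lead to worst-case bounds, the sketch below uses the classical Network Calculus closed forms for a token-bucket arrival curve α(Δ) = burst + rate·Δ and a rate-latency service curve β(Δ) = service_rate·max(0, Δ − latency). It is not the MPA toolbox API; the function names and the sampled cross-check are assumptions of this example.

```python
def delay_bound(burst, rate, service_rate, service_latency):
    """Worst-case delay: maximum horizontal distance between alpha and beta."""
    if rate > service_rate:
        raise ValueError("long-term arrival rate exceeds service rate: no finite bound")
    return service_latency + burst / service_rate

def backlog_bound(burst, rate, service_rate, service_latency):
    """Worst-case backlog: maximum vertical distance between alpha and beta."""
    if rate > service_rate:
        raise ValueError("long-term arrival rate exceeds service rate: no finite bound")
    return burst + rate * service_latency

def sampled_backlog(burst, rate, service_rate, service_latency, horizon, step):
    """Cross-check the backlog bound by sampling alpha - beta on a time grid."""
    worst = 0.0
    for k in range(int(horizon / step) + 1):
        t = k * step
        alpha = burst + rate * t
        beta = service_rate * max(0.0, t - service_latency)
        worst = max(worst, alpha - beta)
    return worst
```

With illustrative values of a 4-token burst, 1 token/ms arrival rate, 2 tokens/ms service rate, and 3 ms service latency, the delay bound is 3 + 4/2 = 5 ms and the backlog bound is 4 + 1·3 = 7 tokens, which the sampled check reproduces.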

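[Figure: MPA model with processes P1, P2, P3 and channels FIFO1, FIFO2; resources RISC, BUS, and DSP with service curves β and remaining service curves β′; stream with input arrival curve α and output arrival curve α′]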

*http://www.mpa.ethz.ch

slide-23
SLIDE 23

Modeling in MPA

[Figure: MPA model components for a process, intra-processor communication, inter-processor communication, and complex computation modeling]

slide-24
SLIDE 24


MPA Model Generation

  • Automatic MPA model generation in 2 steps
  • Framework-independent model (XML format)
  • Framework-specific model (Matlab script)
  • Challenges
  • Relation between DOL spec and MPA model
  • Sequential evaluation of parallel MPA model
  • Accurate parameters


slide-25
SLIDE 25


MPA Model Calibration

  • Goal: collect accurate performance data from simulation
  • Problem: simulation is too slow to run during design space exploration
  • Strategy: collect parameters beforehand, with “calibration mappings”


slide-26
SLIDE 26


… A Few Results

[Figure: MPARM platform with ARM tiles 1 to N connected by a bus; each tile contains an ARM core, instruction and data memory, scratchpad memory, and a DMA controller]

executing MJPEG decoder on MPARM*

*MPARM - virtual simulation platform of U. Bologna

(optimal) mapping

slide-27
SLIDE 27


Design Space Exploration

  • Set-up: PISA* and EXPO* (SPEA2)
  • Objectives
  • 1. end-to-end delay (upper bound in MPA)
  • 2. cost (additive model)
  • Population: 60 individuals x 50 generations
  • Pareto front: 6 solutions
  • Search time: ~2 hours

[Figure: current population plotted as end-to-end delay vs. cost, with groups of mappings using 1, 2, 3, and 4 processors]

*EXPO - https://www.tik.ee.ethz.ch/expo
*PISA - https://www.tik.ee.ethz.ch/pisa

slide-28
SLIDE 28


Performance Analysis

mapping MJPEG decoder on 3-tile MPARM

slide-29
SLIDE 29


… Some Performance Figures: Speed

  • Model calibration: time-expensive (usual for all flows), so it cannot be included in the design space exploration loop
  • Model generation and performance analysis in MPA: seconds, which is reasonable for design space exploration

slide-30
SLIDE 30


… Some Performance Figures: Accuracy

  • Differences between observed values and estimated bounds: ~20%
  • some MPA operators do not produce tight bounds
  • simulation cannot provide actual worst/best-case behavior
  • …but system model and underlying architecture are well suited for analyzing this application!

[Figure: observed values (simulation) vs. estimated bounds (MPA)]

slide-31
SLIDE 31


… Some More Performance Figures

  • The DOL framework is mainly implemented in Java (available at http://www.tik.ee.ethz.ch/~shapes)

  • Code size of different parts of the design flow:
slide-32
SLIDE 32


Conclusions

  • “Accidental complexity” can be considerably reduced, resulting in a design flow that is both efficient and predictable, by
  • …using a fixed MoC (KPN) (vs. BSP approaches)
  • …formal performance analysis (vs. simulation)
  • …automated, system-level design space exploration (vs. ad-hoc, manual techniques that include synthesis)
  • Complete SW design flow (specification, synthesis, design space exploration, performance analysis) available: http://www.tik.ee.ethz.ch/~shapes

slide-33
SLIDE 33


http://www.tik.ee.ethz.ch/~shapes

slide-34
SLIDE 34

iuliana.bacivarov@tik.ee.ethz.ch http://www.tik.ee.ethz.ch/~shapes

Thank You! Questions?