Model-based engineering of high-performance embedded applications on heterogeneous hardware with real-time constraints and energy efficiency

SLIDE 1

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871669

15th Workshop on Virtualization in High-Performance Cloud Computing (VHPC'20, part of ISC 2020)

Model-based engineering of high-performance embedded applications on heterogeneous hardware with real-time constraints and energy efficiency

Tommaso Cucinotta – Scuola Superiore Sant’Anna, Pisa (Italy)

SLIDE 2

Tommaso Cucinotta – Real-Time Systems Laboratory - Scuola Superiore Sant’Anna - VHPC 2020

Introduction & Motivations

  • CPSs have higher and higher computational performance & reliability requirements
  • Use of increasingly heterogeneous & interconnected, battery-operated platforms
    – non-SMP multi-core
    – GP-GPU/TPU acceleration
    – FPGA
  • Heterogeneous platforms needed in soft and hard real-time use-cases
    – automotive, railways, aerospace, robotics, gaming, multimedia, ...

[Figure: Heterogeneous Hardware: Non-SMP multi-core, GP-GPU, TPU, FPGA]

SLIDE 3

Problems & Challenges

  • Development of software for CPSs is cumbersome!
    – Optimum usage of underlying hardware parallelism & acceleration
    – Performance vs energy consumption trade-offs
    – Real-time constraints
    – Safety & certification

[Figure: software stack on Heterogeneous Hardware (Non-SMP multi-core, GP-GPU, TPU, FPGA): Operating System Kernel / Hypervisor (CPU Scheduler, I/O Scheduler, Drivers..., Power Management), Operating System Services & Middleware, App 1 ... App n]

SLIDE 4

MDE & Formalisms in Embedded System Design

  • Model-Driven Engineering (MDE)
    – Fill the gap between high-level specifications and actual system behavior
  • MDE embraces
    – Formal specification language(s)
    – Model transformation engine(s)
    – Model refinements & composability
    – Automatic code generator(s)
    – Model verifiability
  • => Correctness-by-construction

[Figure: a Logic Controller between Sensors and Actuators, designed with MDE (e.g. CAPELLA, AMALTHEA, AUTOSAR); gaps separate Functional & Non-functional Requirements, System Specification, Architecture Definition and Implemented Software]

SLIDE 5

MDE & Formalisms in Embedded System Design

Traditional MDE limitations

  • Single-processor systems or very limited support for multi-core systems
  • Struggles to cope with today’s complex heterogeneous embedded boards

SLIDE 6

AMPERE Project Goal

  • Fill the gap between
    – MDE techniques with no/limited parallelism support
    – Parallel-programming models with efficient HW offloading (OpenMP, CUDA, ...)
    – Heterogeneity in hardware
  • In the presence of non-functional requirements
    – High-Performance
    – Real-Time Constraints
    – Energy Efficiency
    – Fault Tolerance

[Figure: MDE (e.g. CAPELLA, AMALTHEA, AUTOSAR) for a Logic Controller with Sensors and Actuators; Parallel Programming Models and Run-time parallel frameworks (e.g. OpenMP, OpenCL, CUDA, COMPSs); Parallel Execution Model over Parallel Units; Heterogeneous Hardware (Non-SMP multi-core, GP-GPU, TPU, FPGA)]

SLIDE 7

AMPERE Vision

Bridge the gap

  • 1. Synthesis methods for an efficient generation of parallel source code, while keeping non-functional and composability guarantees
  • 2. Run-time parallel frameworks that guarantee system correctness and exploit the performance capabilities of parallel architectures
  • 3. Integration of parallel frameworks into MDE frameworks

[Figure: MDE (e.g. CAPELLA, AMALTHEA, AUTOSAR) for a Logic Controller with Sensors and Actuators; Parallel Programming Models and Run-time parallel frameworks (e.g. OpenMP, OpenCL, CUDA, COMPSs); Parallel Execution Model over Parallel Units]

SLIDE 8

AMPERE Vision

Bridge the gap

[Figure: AMPERE MDE Framework, placed between MDE (e.g. CAPELLA, AMALTHEA, AUTOSAR) and the Parallel Programming Models / Run-time parallel frameworks (e.g. OpenMP, OpenCL, CUDA, COMPSs); its layers are listed below]

  • AUTOSAR: SW-C, Runnables, Client-server, ASIL
  • AMALTHEA: Performance, Tasks, Scheduling, Platform
  • CAPELLA: Functional components, Allocation of resources, Data models, View points validation rules
  • Meta-model Driven Abstractions: Components, Communications, Timing Characteristics, Integrity Assurance, ...
  • Model Transformation Engine
  • Meta-parallel Programming Abstraction: Parallelism, Synchronization, Data Dependencies, Data Attributes, ...
  • Parallel Run-Time Frameworks
    – OpenMP: Task construct, Dependencies, Parallel construct
    – OpenCL: clgetDeviceId, clCreateBuffer, __kernel_exec
    – COMPSs: Compute resource, Data movements, Task annotations

SLIDE 9

AMPERE Software Architecture

Software Layer                          Tool             Owner (License)
DSMLs                                   AUTOSAR          AUTOSAR (Proprietary)
                                        AMALTHEA         BOSCH (Open-source)
                                        CAPELLA          TRT (Open-source)
Parallel programming models             OpenMP           OpenMP ARB (Proprietary)
                                        CUDA             NVIDIA (Proprietary)
                                        OpenCL           Khronos (Proprietary)
                                        COMPSs           BSC (Open-source)
Artificial Intelligence                 TensorFlow       Google (Open-source)
Code synthesis tools                    Synthesis tools  AMPERE (Open-source)
Analysis and testing tools              NFP analysis     AMPERE (Open-source)
Compilers and hardware synthesis tools  Mercurium        BSC (Open-source)
                                        GCC/LLVM         GNU/LLVM (Open-source)
                                        Vivado           Xilinx (Proprietary)
Run-time libraries                      GOMP             GNU-GCC (Open-source)
                                        KMP              LLVM (Open-source)
                                        Vivado           Xilinx (Proprietary)
Operating systems                       Linux            Linux-Foundation (Open-source)
                                        ERIKA Enterp.    EVI (Open-source/commercial)
Hypervisors                             PikeOS           SYSGO (Proprietary)

SLIDE 10

AMPERE Software Development Workflow Overview

[Figure: workflow diagram with the following elements]

  • Run-time framework + OS + Hypervisor
  • Execution Profile
  • Model
  • Platform description
    – Accel. devices
    – Cores/clusters
    – Memory model
    – Etc.
  • System description
    – Components/communication
    – Functional/NFP
    – Etc.
  • Meta MDE abstraction
  • Code Synthesis + Multi-criteria Optimization
    – Performance
    – Time-predictability
    – Energy-efficiency
    – Resiliency
  • Meta PPM abstraction
  • Compiler (Correctness + Refined Parallel Structure)
  • Parallel code (e.g. OpenMP, CUDA graphs)
  • Resource Allocation (i.e., mapping/scheduling)
    – Monitoring
    – Dynamic resource allocation
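The multi-criteria optimization step named in the workflow can be illustrated with a small Python sketch. The slides do not describe how AMPERE actually performs this step; the Pareto filter below is only one common way to compare candidates on several criteria, and the candidate names, the deadline and the time and energy figures are all made up for illustration.

```python
def pareto_front(candidates):
    """Keep the candidates that no other candidate beats on both criteria.

    Each candidate is (name, time_ms, energy_mj); lower is better for both.
    """
    front = []
    for name, time_ms, energy_mj in candidates:
        dominated = any(
            t <= time_ms and e <= energy_mj and (t < time_ms or e < energy_mj)
            for _, t, e in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical mappings of one task onto different processing units.
candidates = [
    ("cpu_only", 40.0, 30.0),
    ("gpu_offload", 12.0, 55.0),
    ("fpga_offload", 15.0, 20.0),
    ("cpu_gpu_split", 14.0, 60.0),
]
front = pareto_front(candidates)

# Among the non-dominated mappings that meet a (hypothetical) deadline,
# pick the one with the lowest energy.
deadline_ms = 20.0
by_name = {name: (t, e) for name, t, e in candidates}
feasible = [name for name in front if by_name[name][0] <= deadline_ms]
chosen = min(feasible, key=lambda name: by_name[name][1])
```

Here "cpu_only" and "cpu_gpu_split" are discarded because another mapping is at least as good on both time and energy; the deadline then acts as the real-time constraint and energy as the remaining objective.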

SLIDE 11

AMPERE Use-Cases

Obstacle Detection and Avoidance System (ODAS)

  • ADAS functionalities based on data fusion coming from tram vehicle sensors

Predictive Cruise Control (PCC)

  • Extends Adaptive Cruise Control (ACC) functionality by calculating the vehicle’s future velocity curve using the data from the electronic horizon
  • Improve fuel efficiency (in cooperation with the powertrain control) by configuring the driving strategy based on data analytics and AI

SLIDE 12

FPGA System-on-Chip (SoC)

FPGA-based system-on-chips are a very promising solution to enable predictable HW acceleration of complex computing workloads

  • Multiprocessors can host multi-OS software systems
  • FPGA fabric can be used to deploy HW accelerators

[Figure: Zynq Ultrascale+ with an asymmetric multiprocessor and a large FPGA fabric]

SLIDE 13

HW Accelerators on FPGAs

  • Programmable logic exhibits very regular, clock-level behavior (unlike other HW accelerators, e.g., GP-GPUs)
  • Internal control logic of several HW accelerators is typically based on state machines

[Figure: FIR and Sobel filters from Xilinx IP library (screenshot from Vivado 2017.4)]

[Figure: FPGA fabric with HW accel #1 and HW accel #2 attached to the PL-PS Interconnect, whose port towards the processing system leads to the DRAM]

  • We can monitor & supervise bus transactions to shield the system from misbehaviors
  • We can realize custom bus arbitration policies that help meet timing constraints
  • Possibility to deploy custom bus logic
  • Bus/memory contention can be made predictable
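To make the idea of a custom bus arbitration policy more concrete, here is a minimal Python simulation. It is not the logic deployed on the FPGA in the slides: the round-robin order, the per-period transaction budgets, the slot-based timing and all names are assumptions chosen only to show how a budget can bound the interference one accelerator causes to another.

```python
from collections import deque


def arbitrate(pending, budgets, period_slots, total_slots):
    """Grant at most one bus transaction per slot, round-robin, within budgets.

    pending: accelerator name -> number of transactions it wants to issue
    budgets: accelerator name -> max transactions granted in each period
    Returns one entry per slot: the granted accelerator, or None if idle.
    """
    remaining = dict(pending)
    order = deque(sorted(pending))
    used = {}
    grants = []
    for slot in range(total_slots):
        if slot % period_slots == 0:
            used = {name: 0 for name in pending}  # replenish all budgets
        granted = None
        for _ in range(len(order)):
            name = order[0]
            order.rotate(-1)  # the next slot starts from the following requester
            if remaining[name] > 0 and used[name] < budgets[name]:
                remaining[name] -= 1
                used[name] += 1
                granted = name
                break
        grants.append(granted)
    return grants


# Hypothetical scenario: accel1 misbehaves and floods the bus with requests.
grants = arbitrate(
    pending={"accel1": 6, "accel2": 2},
    budgets={"accel1": 2, "accel2": 2},
    period_slots=4,
    total_slots=8,
)
```

In this run accel2 gets both of its transactions within the first period, while accel1 is held to two grants per period and leaves two of its six requests unserved after eight slots. The bus stays idle in the last two slots rather than exceeding a budget, which is the price paid for predictability in this sketch.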

SLIDE 14

Dynamic Partial Reconfiguration

  • Modern FPGAs offer dynamic partial reconfiguration (DPR) capabilities
  • DPR allows reconfiguring a portion of the FPGA at runtime, while the rest of the device continues to operate
  • In essence, reconfiguration requires programming a memory
  • Simplifying, an image of the FPGA configuration (bitstream) is copied from one memory to another

[Figure: bitstreams loaded into a reconfigurable region of the FPGA fabric]
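Since reconfiguration boils down to copying a bitstream from one memory to another, a first-order estimate of the reconfiguration time is the bitstream size divided by the copy throughput. The Python sketch below works under that simplifying assumption (constant throughput, no setup overhead); the function name and the numbers are hypothetical and are not measurements from the slides.

```python
def reconfiguration_time_ms(bitstream_bytes, throughput_bytes_per_s):
    """First-order estimate: time to copy the bitstream at a constant throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 * bitstream_bytes / throughput_bytes_per_s


# Hypothetical example: a 4 MB partial bitstream copied at 400 MB/s.
estimate_ms = reconfiguration_time_ms(4_000_000, 400_000_000)
```

Under this model the reconfiguration time grows linearly with the bitstream size, so a smaller reconfigurable region is cheaper to reprogram.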

SLIDE 15

FRED Framework

  • Enable predictable HW acceleration on FPGA system-on-chips
  • Collection of technologies developed at the ReTiS Lab

http://fred.santannapisa.it/

Supported platforms: Zynq-7000 series, Zynq Ultrascale+

SLIDE 16

FRED Programming Model

    TASK(myTask) {
        <prepare input data>
        EXECUTE_HW_TASK(myHWtask);   /* suspend the execution until the completion of the HW-task */
        <retrieve output data>
    }

  • SW-Task (CPU): periodic/sporadic real-time tasks, fixed-priority scheduling
  • HW-Task (FPGA fabric): HW accelerators implemented as programmable logic, non-preemptive execution
  • SW-tasks and HW-tasks work on shared-memory buffers

[Figure: System-on-Chip with a SW-Task on the CPU and a HW-Task on the FPGA fabric]
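The suspend-until-completion behavior of the programming model can be mimicked in plain Python. This is a toy simulation, not the FRED API: the class, the function names and the byte-inverting "accelerator" are all hypothetical, and a worker thread stands in for the FPGA fabric.

```python
import queue
import threading


class FpgaFabric:
    """Toy stand-in for the FPGA fabric: runs one HW-task at a time."""

    def __init__(self):
        self._requests = queue.Queue()
        worker = threading.Thread(target=self._serve, daemon=True)
        worker.start()

    def _serve(self):
        while True:
            hw_task, buffer, done = self._requests.get()
            hw_task(buffer)  # runs to completion, i.e. non-preemptively
            done.set()       # wake up the suspended SW-task

    def execute_hw_task(self, hw_task, buffer):
        """Suspend the calling SW-task until the HW-task has completed."""
        done = threading.Event()
        self._requests.put((hw_task, buffer, done))
        done.wait()


def invert_bytes(buffer):
    """Hypothetical HW-task: invert every byte of the shared buffer in place."""
    for i, value in enumerate(buffer):
        buffer[i] = 255 - value


def my_task(fabric):
    buffer = bytearray([0, 10, 200, 255])         # prepare input data
    fabric.execute_hw_task(invert_bytes, buffer)  # blocks until completion
    return list(buffer)                           # retrieve output data


fabric = FpgaFabric()
result = my_task(fabric)
```

The bytearray plays the role of the shared-memory buffer: the SW-task fills it, the HW-task rewrites it in place, and the SW-task reads the output from the same buffer once it is resumed.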

SLIDE 17

Time-predictable DNN Inference

  • CHaiDNN: HLS based DNN Accelerator Library for Xilinx Ultrascale+
  • Designed for maximum compute efficiency at 6-bit integer data types (it also supports 8-bit integer data types)
  • The inference time in isolation exhibits very small fluctuations
  • The real issue for time predictability is bus/memory contention

Setup: Xilinx ZCU102 (Ultrascale+), Vivado 2018.2, GoogLeNet, DMA from Xilinx IP lib

SLIDE 18

Inside the FRED Framework

The FRED framework is a combination of several technologies:

  • Run-time FPGA manager & scheduler for Linux (both C and Python API)
  • Bus monitors and budget enforcers
  • Automated FPGA floor-planning
  • Automatic synthesis of bus interconnections

SLIDE 19

Conclusions

  • AMPERE aims to bridge the gap between MDE and PPM on HHW by
    1. Providing a development framework for CPS targeting parallel heterogeneous architectures, for increased productivity and compliant with current MDE practices
    2. Providing an execution framework for an efficient exploitation of parallel and heterogeneous architectures, fulfilling functional and non-functional constraints
    3. Integrating AMPERE software solutions into relevant industrial use-cases (automotive and railway) with HPC and real-time requirements

SLIDE 20

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871669

Thanks for Listening
Any Questions?

https://www.linkedin.com/company/ampere-project
https://twitter.com/ampereproject