HW-SW Interfaces Abstraction and Design for Multi-Processor SoC



SLIDE 1

MPSoC’04

HW-SW Interfaces Abstraction and Design for Multi-Processor SoC

Dr. Ahmed Amine JERRAYA
TIMA Laboratory
46 Avenue Felix Viallet, 38031 Grenoble Cedex, France
Tel: +33 476 57 47 59
Fax: +33 476 47 38 14
Email: Ahmed.Jerraya@imag.fr

SLIDE 2

This is Team Work

Staff members:

  • P. Amblard, W. Cesário, X. Chen, F. Rousseau, S. Yoo (left April ’04), N. Zergainoh

Ph.D. students:

  • Y. Atat, I. Bacivarov, M. Bonaciu, A. Bouchhima, A. Grasset, L. Kriaa, Y. Paviot, A. Sarmento, A. Sasongko, W. Youssef

Industrial Ph.D.:

  • A. Blampey, M. Fiandino, F. Hunsinger (STMicroelectronics)
  • L. Pieralisi

Collaborative Ph.D.:

  • Y. Cho, S. Han (Korea)
  • G. Majauskas (Lithuania)
  • I. Petkov (Bulgaria)

Master students and Undergraduates:

  • F. Dumitrascu, S. Hadhri, R. Khrouf, K. Popovici, M. Yahia

SLIDE 3

The SoC Era Challenges

SoC: put on a chip what we used to put on one or several boards (ASIC, CPU, Memories, Analog/RF, MEMS, …)

Facts:

  • 90% of new ASICs in 130nm already include a CPU.
  • Multimedia, network processors, mobile terminals and game applications are already multiprocessors.

Fundamental changes:

  • MPSoC is different from ASIC
  • MPSoC is different from SW
  • MPSoC requires abstract HW-SW interfaces to allow fast integration.

Challenges:

  • Generic MPSoC platform (programmable, reconfigurable, …)
  • Specific MPSoCs using standard IP with specific interconnect.

SLIDE 4

Generic SoC Platform vs. Application-Specific MPSoC
(MPSoC’03 after dinner discussion)

Example: The GSM History/Roadmap

  • 1986: Rack in a van
  • 1990: PCB
  • 1995: Chip set in a hand-set
  • 2002: SoC
  • 2006: SW component on a generic platform, e.g. Nomadic (ST)

Same roadmap for game computers, MP3, STB, NP, DVD

SLIDE 5

Outline

  • 1. HW-SW Interfaces: From Wires to Abstract Interconnect
  • 2. Abstracting HW-SW Interfaces
  • 3. HW-SW Interfaces Design & Debug: The ROSES Environment
  • 4. MPSoC Design
      4.1. MPEG4 Design Example
      4.2. Results Analysis

SLIDE 6

SoC Platform vs. Embedded Software

[Figure: MPSoC design layers. Labels: Application software, Platform_API, HdS, CPU sub-system, HW interfaces, NoC, HW components]

Application SW design:

  • Real time SW Models
  • Platform model, e.g. Sony PlayStation, Nomadic
  • Key issue: Complexity (GB, ms)

Platform_API:

  • Programming model to build software
  • Specific to application
  • Hides HW details

Hardware dependent SW (HdS):

  • Provided by SoC designer in case of specific HW
  • Lower SW layers to access HW
  • Specific SoC function (e.g. DSP SW code)
  • Key issue: Performance (KB and MB, ns)

Hardware sub-system:

  • CPU sub-systems
  • Specific hardware, Analog, memories, Network-on-Chip
  • HW interfaces: required for application specific HW/SW interfaces

SLIDE 7

Hardware dependent SW Design & Debug is The Bottleneck

Example: SW Debug of an MPEG4 CoDec

[Figure: chart of bugs (%) by category. Categories: Application SW, HAL, Parallel Prog. Model, Memory Map, Design Environment, µ-Kernel. Values as extracted from the chart: 5, 12, 13, 5, 5, 30, 5, 12, 13. HdS accounts for 78% of the bugs.]

Bugs encountered:

  • Data dependent computation
  • C library bug
  • Booting is not synchronized among processors.
  • Lost some interrupts
  • Wrong interrupt priority levels
  • Context switch does not work correctly.
  • Incorrect FIFO counter value causes deadlock.
  • Result of compressed video is not correct.
  • Abnormal execution of a portion of C code

SLIDE 8

Outline

  • 1. HW-SW Interfaces: From Wires to Abstract Interconnect
  • 2. Abstracting HW-SW Interfaces
  • 3. HW-SW Interfaces Design & Debug: The ROSES Environment
  • 4. MPSoC Design
      4.1. MPEG4 Design Example
      4.2. Results Analysis

SLIDE 9

Heterogeneous MPSoC Design Space

[Figure: tasks T1 to T6 on CPU 1 and CPU 2 sub-systems, each with API, HdS and HW IF layers, connected through a communication network]

Software programming model on an existing platform

  • Concurrency
  • Decomposition
  • Mapping
  • Communication
  • Synchronisation
  • Interconnect

NoC programming model

  • HW adaptation for application specific communication

Computation subsystem model

  • CPU sub-system for application specific computation
  • SW adaptation

SLIDE 10

Which Parallel Programming Model to Use?

[Figure: axis labelled "More implicit" / "More explicit" over the design aspects: concurrency, decomposition, mapping, communication, synchronization, interconnection, interface]

  • All explicit
      • ISA SW + RTL HW
  • Explicit concurrency, decomposition, mapping, communication, synchronization, interconnection; implicit interface
      • TLM Transaction
  • Explicit concurrency, decomposition, mapping, communication, synchronization; implicit interconnection and interface
      • MPI, TLM Message, thread package, concurrent C
  • Explicit concurrency, decomposition, mapping; implicit communication, synchronization, interconnection and interface
      • SDL, compositional C++
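
The four levels on this slide can be read as a table: each level makes a prefix of the seven design aspects explicit and leaves the rest implicit. The sketch below encodes that reading. The type and function names (`Aspect`, `ModelLevel`, `slideLevels`) are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The seven design aspects from the slide, in the order they are listed.
enum Aspect {
    Concurrency, Decomposition, Mapping, Communication,
    Synchronization, Interconnection, Interface
};

// One abstraction level: the first `explicitCount` aspects are explicit,
// the remaining ones are implicit. (Hypothetical encoding of the slide.)
struct ModelLevel {
    std::string examples;
    int explicitCount;

    bool isExplicit(Aspect a) const {
        return static_cast<int>(a) < explicitCount;
    }
};

// The four levels named on the slide, from most to least explicit.
inline std::vector<ModelLevel> slideLevels() {
    return {
        {"ISA SW + RTL HW", 7},
        {"TLM Transaction", 6},
        {"MPI, TLM Message, thread package, concurrent C", 5},
        {"SDL, compositional C++", 3},
    };
}
```

For example, `slideLevels()[2].isExplicit(Interconnection)` is false: at the MPI level the designer writes communication and synchronization calls, but does not describe the interconnect.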

SLIDE 11

Abstracting HW-SW Interfaces for A Software Sub-system

[Figure: HW-SW interfaces between SW (API-SW) and NoC (API-HW). Labels: SW adaptation (HdS), HAL, Abstract CPU SS, HW services, HW adaptation]

  • API-SW = SW programming model
  • API-HW = NoC programming model
  • Abstract CPU sub-system
  • HAL = HW abstraction layer
  • HW services: local architecture (e.g. bus)
  • SW adaptation: implement programming model on CPU sub-system
  • HW adaptation: adapt CPU sub-system to NoC

SLIDE 12

The Virtual Component Model

Virtual component

  • Component
      • Hardware IP
      • Software IP
      • Functional IP
  • Abstract Interfaces
      • Required Services
      • Provided Services
      • Control Services
      • Synchronization
      • Parameters, …

Execution Environment

  • Abstract Platform (e.g. NoC, Cosimulation backplane, …)

Heterogeneous components thanks to adaptation between different programming models.

[Figure: Component 1 and Component 2 attached to the execution environment through Abstract Interface 1 and Abstract Interface 2]

SLIDE 13

The Virtual Component Model for MPSoC

[Figure: execution environment (e.g. AMBA bus) connecting a virtual processor (SW component with SW task 1 and SW task 2) and a virtual IP (HW component with HW block 1 and HW block 2). Port labels: internal port (comp. prog. model), external port (NoC prog. model). Annotations: abs. level (TLM, RT level), protocol (AMBA rd/wr)]

  • Basic model: a set of hierarchically interconnected virtual modules and an execution environment
  • Virtual Module:
      • Content: Tasks/Instances + communication channels
      • Abstract interface: set of virtual ports
      • Internal/external ports
      • Structure and services
  • Colif: An XML object-oriented database for virtual architectures
  • Components programming models
  • NoC programming models
  • MPSoC programming model is the composition of NoC and components programming models.

SLIDE 14

Outline

  • 1. HW-SW Interfaces: From Wires to Abstract Interconnect
  • 2. Abstracting HW-SW Interfaces
  • 3. HW-SW Interfaces Design & Debug: The ROSES Environment
  • 4. MPSoC Design
      4.1. MPEG4 Design Example
      4.2. Results Analysis

SLIDE 15

System-level SoC Design Flow

  • System specification is a virtual architecture: virtual modules using specific programming models connected through an execution environment.

[Figure: System Specification. Labels: execution environment (e.g. AMBA bus), virtual processor (SW component with SW task 1 and SW task 2), virtual IP (HW component with HW block 1 and HW block 2)]

  • Automatic generation of application-specific HW/SW interface sub-systems from basic interface components and CPU sub-system models.

[Figure: generation inputs and result. Labels: basic SW interface components (API SW comp., API CPU, …), basic HW interface components (API HW comp., API network, …), SW components (tasks), SW interface sub-system (SW wrapper), CPU sub-system, HW interface sub-system (HW wrapper), HW component, communication interconnect (e.g. NoC)]

  • Architecture implementation: heterogeneous components and sophisticated communication interconnect to adapt different programming models.

SLIDE 16

Key Technology: Composing Interfaces

Interface sub-system composition

  • Component interface
      • Required/Provided services
      • Control and Synchronization services
      • Parameters
  • Interface sub-system composition
      • Services matching
      • User-extensible library
      • Code specialization

Works for HW, SW, and Functional interface sub-systems

[Figure: a component with its abstract interface attached to the execution environment through services taken from an interface component library (MPI channel, ARM7 boot, Scheduler, I/O driver, Unix IPC, Data conv.). Labels in the composed example: write, send, Sched., IT, I/O, ISR]
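
Services matching can be pictured as a dependency resolution: start from the services a component requires, pick a library element that provides each one, and repeat for whatever that element itself requires until only services of the execution environment remain. The sketch below is a minimal illustration of that idea, not the ROSES implementation. `InterfaceComponent`, `compose` and `exampleLibrary` are hypothetical names, and the service names (`send`, `io_write`, `schedule`, `write`) are assumptions chosen to echo the slide's labels.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical library element: provides one service, may need others.
struct InterfaceComponent {
    std::string name;
    std::string provides;
    std::vector<std::string> needs;
};

// Selects library elements so that every required service is eventually
// backed by a service of the execution environment. Returns false as soon
// as a service has no provider in the library.
bool compose(const std::vector<std::string>& required,
             const std::set<std::string>& environment,
             const std::vector<InterfaceComponent>& library,
             std::set<std::string>& selected) {
    std::vector<std::string> pending(required.begin(), required.end());
    std::set<std::string> resolved;
    while (!pending.empty()) {
        std::string service = pending.back();
        pending.pop_back();
        if (environment.count(service) || resolved.count(service)) continue;
        const InterfaceComponent* match = nullptr;
        for (const auto& c : library) {
            if (c.provides == service) { match = &c; break; }
        }
        if (!match) return false;
        resolved.insert(service);  // marked first, so cycles terminate
        selected.insert(match->name);
        for (const auto& n : match->needs) pending.push_back(n);
    }
    return true;
}

// Assumed example: an MPI-style `send` built on an I/O driver and a scheduler.
inline std::vector<InterfaceComponent> exampleLibrary() {
    return {
        {"MPI channel", "send", {"io_write", "schedule"}},
        {"I/O driver", "io_write", {"write"}},
        {"Scheduler", "schedule", {}},
    };
}
```

With an execution environment that only offers `write`, a component requiring `send` pulls in all three library elements; a component requiring a service that nothing provides makes the composition fail, which is where a user-extensible library would come in.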

SLIDE 17

Outline

  • 1. HW-SW Interfaces: From Wires to Abstract Interconnect
  • 2. Abstracting HW-SW Interfaces
  • 3. HW-SW Interfaces Design & Debug: The ROSES Environment
  • 4. MPSoC Design
      4.1. MPEG4 Design Example
      4.2. Results Analysis

SLIDE 18

MPSoC Design of a DivX Encoder

OpenDivX

  • Open source MPEG4 encoder/decoder
  • Modified to work concurrently on 1/4th of each frame

Goals

  • Refinement of HW/SW interfaces
  • Multi-level simulation and early validation
  • SW debug before HW platform is ready.

[Figure: encoder blocks. Labels: Movement detection & compensation, fDCT, Quant, i fDCT, DeQuant]

SLIDE 19

DivX Encoder: Overview

[Figure: MPEG video stream through Input, DMA, CPU_0 to CPU_3, VLC and Combiner. Legend: HW IP, SW node, video stream, data flow]

  • INPUT: splits each incoming frame into 4 parts and sends them to the CPUs
  • CPU_#: processes the incoming data and prepares it for compression
  • VLC: finalizes compression and prepares the whole image
  • COMBINER: prepares the output and adjusts the compression parameters
  • DMA: direct access to the local memories of the processors

SLIDE 20

DivX Encoder: Overview

Major architecture specificities

  • Specific Memory Controller: Switch bank service
  • Specific Interface: Core IT + 2 Synchronization Signals
  • Point to Point communication scheme

[Figure: DMA, INPUT and Combiner connected to four CPU nodes. Labels per node: ARM Core, Mem Controller with bank0 and bank1, Interface, RAM/ROM, Add Dec]

SLIDE 21

CPU Sub-system Architecture With An ARM9 Core

[Figure labels: Bus Matrix, Address Decoder, ROM, Memory Controller, SRAM, AMBA AHB, MemCtrl with SRAM0, MemCtrl with SRAM1, Link to DMA]

SLIDE 22

Programming Model for DivX (DMA)

Shared-memory Programming Model

[Figure: virtual architecture. Labels: SW component (P1) encoder with ports p1, p2, p3; SW components (P2), (P3), (P4), each with SW task 1 and SW task 2; virtual IP HW component (DMA) and virtual IP HW component (I/O), each with HW block 1 and HW block 2; Stand by; SystemC transaction level channels; RT-level channels]

Encoder task code:

    memory_bank_struct *memory_io;
    // initialize encoder library
    initialize(5, true, 0, 900);
    // loop forever
    while (1) {
        // waits for data
        p1.WaitEvent();
        // gets the data address
        memory_io = (memory_bank_struct*) p2.switch_banks();
        // signals computation starting
        p3.SendEvent();
        // calls encoding function
        divx_compress(&(memory_io->ins), &memory_io->outs, 1);
        // signals computation ended
        p3.SendEvent();
        wait();
    }

Shared memory and DMA control services:

  • p1.switch_banks()
  • p2.WaitEvent()
  • p3.SendEvent()
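
The slides do not show what `switch_banks()` does internally. One plausible reading, given the two banks per memory controller on the architecture slide, is a ping-pong buffer: the DMA fills one bank while the CPU works on the other, and the call swaps the two. The sketch below illustrates that assumption only. `BankController`, `MemoryBank`, `dmaBank` and `processBank` are hypothetical names, and `processBank` is a trivial stand-in for the encoding function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical bank layout: a few input and output words.
struct MemoryBank {
    std::array<int, 4> ins{};
    std::array<int, 4> outs{};
};

// Assumed behaviour of the "switch bank" service: two banks, one owned by
// the CPU and one by the DMA, exchanged on every call.
class BankController {
public:
    // Bank currently visible to the DMA side.
    MemoryBank& dmaBank() { return banks_[1 - cpuBank_]; }

    // Gives the bank the DMA just filled to the CPU and returns it;
    // the CPU's previous bank becomes the DMA bank.
    MemoryBank* switch_banks() {
        cpuBank_ = 1 - cpuBank_;
        return &banks_[cpuBank_];
    }

private:
    std::array<MemoryBank, 2> banks_{};
    std::size_t cpuBank_ = 0;
};

// Stand-in for the encoding function: doubles every input word.
inline void processBank(MemoryBank& bank) {
    for (std::size_t i = 0; i < bank.ins.size(); ++i) {
        bank.outs[i] = 2 * bank.ins[i];
    }
}
```

In this reading the CPU never copies frame data: after the data-ready event it only obtains a pointer to the freshly filled bank, which matches the "gets the data address" comment in the slide's code.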

SLIDE 23

Programming Model for DivX (DMS)

Message Passing Programming Model

[Figure: virtual architecture. Labels: SW component (P1) encoder with ports p1, p2; SW components (P2), (P3), (P4), each with SW task 1 and SW task 2; virtual IP HW component (DMS) and virtual IP HW component (I/O), each with HW block 1 and HW block 2; Stand by; SystemC transaction level channels; RT-level channels]

Encoder task code:

    memory_bank_struct *memory_io;
    // initialize message structure
    p1.sram_init(&mes);
    // loop forever
    while (1) {
        // input data
        p2.Recv(…);
        p2.PWait(…);
        // gets the data
        while (mes != end_data) {
            memory_io[mes.addr] = mes.data;
        }
        // calls encoding function
        divx_compress(&(memory_io->ins), &memory_io->outs, 1);
        // sends output data
        for (…) {
            p2.Send(…);
            p2.PWait(…);
            …
        }
        wait();
    }

Message passing and DMS control services:

  • p1.sram_init(base_address)
  • p2.Conn_Setup(rmt_id, lch, rch)
  • p2.Send(lch, laddress, size)
  • p2.Recv(lch, laddress, size)
  • p2.RWrite(lch, laddr, raddr, size)
  • p2.RRead(lch, laddr, raddr, size)
  • p2.IWait(lch)
  • p2.PWait(lch)
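
To make the `Send`/`Recv` argument order concrete, here is a small in-process loopback sketch. It only borrows the `(lch, laddress, size)` parameter order from the list above; the class `LoopbackPort`, its return values and its queueing behaviour are assumptions, and the separate completion wait (`PWait`) is left out because a loopback transfer completes immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <map>
#include <vector>

// Hypothetical loopback port: messages sent on a local channel are queued
// and handed back, in order, to whoever receives on the same channel.
class LoopbackPort {
public:
    // Queues `size` bytes starting at `laddress` on local channel `lch`.
    void Send(int lch, const void* laddress, std::size_t size) {
        const auto* bytes = static_cast<const unsigned char*>(laddress);
        channels_[lch].emplace_back(bytes, bytes + size);
    }

    // Copies the oldest queued message on `lch` into `laddress`, truncated
    // to `size` bytes. Returns the number of bytes copied, 0 if none pending.
    std::size_t Recv(int lch, void* laddress, std::size_t size) {
        auto& queue = channels_[lch];
        if (queue.empty()) return 0;
        const std::vector<unsigned char>& msg = queue.front();
        const std::size_t n = msg.size() < size ? msg.size() : size;
        if (n > 0) std::memcpy(laddress, msg.data(), n);
        queue.pop_front();
        return n;
    }

private:
    std::map<int, std::deque<std::vector<unsigned char>>> channels_;
};
```

Unlike the shared-memory variant, the data here is copied between address spaces, so the task code has to move the received words into its own memory before calling the encoder.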

SLIDE 24

The ROSES Environment

SLIDE 25

Outline

  • 1. HW-SW Interfaces: From Wires to Abstract Interconnect
  • 2. Abstracting HW-SW Interfaces
  • 3. HW-SW Interfaces Design & Debug: The ROSES Environment
  • 4. MPSoC Design
      4.1. MPEG4 Design Example
      4.2. Results Analysis

SLIDE 26

Key Results

Early and multi-level simulation allows for:

  • Architecture exploration
  • Debug cost reduction
      • Debug software before hardware is ready
      • Mitigate hardware prototyping step

Automatic generation of HW and SW adaptation layers: a drastic improvement of design productivity.

SLIDE 27

Multi-level Simulation Speed-up and Accuracy

[Figure: three simulation set-ups. Labels: Application SW on an abstract SW interface model, with HW in RTL (or TLM); Application SW and OS on a HAL model, with HW; SW on an ISS, with HW in RTL]

Simulation level                                                 Speed-up     Accuracy
SW simulation at programming model level                         ~500 (>>)    75% (<<)
Native SW simulation with abstract CPU sub-system model (HAL)    ~100         85%
HW/SW co-simulation with ISS                                     1            100%

SLIDE 28

Architecture Exploration for QCIF Resolution, 25 frames/s

QCIF resolution: 176 × 144 pixels
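
A quick way to read the charts that follow is to compute the cycle budget a single processor has per frame. At 60 MHz and 25 frames/s that budget is 2,400,000 clock cycles; presumably this is what the REAL TIME line in the charts represents, although the slides do not say so explicitly. The helper names below are hypothetical, and the macroblock count assumes the usual 16 × 16 MPEG-4 macroblocks.

```cpp
#include <cassert>

// Clock cycles one processor can spend per frame at a given clock rate
// and frame rate.
constexpr long cyclesPerFrame(long clockHz, long framesPerSecond) {
    return clockHz / framesPerSecond;
}

// Number of 16x16 macroblocks in a frame (dimensions assumed to be
// multiples of 16, which holds for QCIF and CIF).
constexpr long macroblocks(long width, long height) {
    return (width / 16) * (height / 16);
}

static_assert(cyclesPerFrame(60000000, 25) == 2400000, "60 MHz at 25 frames/s");
static_assert(macroblocks(176, 144) == 99, "QCIF");
static_assert(macroblocks(352, 288) == 396, "CIF");
```

CIF carries four times as many macroblocks as QCIF for the same per-frame budget, which is consistent with the CIF charts using a larger clock-cycle scale.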

SLIDE 29

Solution 1: QCIF using ARM7 (60MHz) Processors

[Figure: clock cycles (axis up to 8.000.000) per frame (frames 5 to 20) for 1, 2, 4, 8, 12, 16 and 32 processors, compared with the REAL TIME line]

SLIDE 30

Solution 2: QCIF Using ARM9SE46 - 4kI$, 4kD$ (60MHz) Processors

[Figure: clock cycles (axis up to 3.500.000) per frame (frames 5 to 20) for 1, 2, 4, 8, 16 and 32 processors, compared with the REAL TIME line]

SLIDE 31

Architecture Exploration for CIF Resolution, 25 frames/s

CIF resolution: 352 × 288 pixels

SLIDE 32

Performance Results: CIF Using ARM7 (60MHz) Processors (+3 for VLCs)

[Figure: clock cycles (axis up to 30.000.000) per frame (frames 5 to 20) for 1, 2, 4, 8, 12, 16, 20 and 32 processors, compared with the REAL TIME line]

SLIDE 33

Performance Results: CIF Using ARM9SE46 - 4kI$, 4kD$ (60MHz) Processors (+2 for VLC)

[Figure: clock cycles (axis up to 12.000.000) per frame (frames 5 to 20) for 1, 2, 4, 8, 16 and 32 processors, compared with the REAL TIME line]

SLIDE 34

Early Simulation to Reduce HW/SW Interface Debug Cycle

Validate HdS at several levels of abstraction:

[Figure: application SW tasks (T1 … Ti) validated on successively more detailed platforms. Platform labels as extracted: API on MPICH / MPI SC; PPM API on a simulation model of µ-Kernel + HAL + CPU; PPM API and µ-Kernel on a simulation model (HAL + CPU) with an instruction set simulator (ISS); PPM API, µ-Kernel and HAL on a simulation model with CPU ISS; HW prototype with Parallel Prog. Model API, µ-Kernel, Hardware Abstraction Layer, CPU interface, network adaptation and CPU core]

[Figure: % of HdS bugs (scale 0 % to 100 %), applied to the case study. Values as extracted: 17 %; 12+13+5 = 30 %; 5+5+30 = 40 %; on HW prototype 13 %; 83 %]

SLIDE 35

MPSoC Design Issues

Generic MPSoC platform vs. Application specific MPSoC

  • HdS vs. Application specific HW-SW interfaces
  • SW programming model vs. a composition of heterogeneous programming models

Application specific HW-SW interfaces

  • Computation specific CPU sub-system
  • Interconnect
  • SW adaptation: HdS
  • HW adaptation

Early validation to reduce design and debug cost.

SLIDE 36

Thank You