Transactor-based debugging of massively parallel processor array architectures


Slide 1

Markus Blocherer, Srinivas Boppu, Vahid Lari, Frank Hannig, Jürgen Teich Hardware/Software Co-Design University of Erlangen-Nuremberg

Transactor-based debugging of massively parallel processor array architectures

1st International Workshop on Multicore Application Debugging (MAD 2013), November 14-15, 2013 Germany


Agenda

Slide 2

• Motivation
• Invasive Computing
• Hardware Debugging
• Transactor-based Prototyping
• Conclusions


Slide 3

Motivation

• Steady increase in application complexity
• Customization and heterogeneity are the keys to future performance gains
• Steady increase in the number of cores on a chip

[Figure: tiled MPSoC architecture with CPU tiles, i-Core tiles, TCPA tiles, memory tiles, memory/I/O, and a mesh of NoC routers]

  • A resource-aware computing paradigm

− Each application may use available computing resources in 3 phases:

  • Exploring and claiming them (invade)
  • Configuring them for parallel computing (infect)
  • Releasing them (retreat)
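The three phases above can be illustrated as a minimal state machine in C. This is only a sketch of the claim lifecycle; all names (`claim_t`, `claim_invade`, etc.) are hypothetical and not the actual InvasIC run-time API.

```c
#include <assert.h>

/* Hypothetical sketch of the invade/infect/retreat lifecycle.
 * All names are illustrative; the real run-time system API differs. */
typedef enum { FREE, INVADED, INFECTED } claim_state_t;

typedef struct {
    claim_state_t state;
    int num_pes;              /* processing elements held by this claim */
} claim_t;

/* invade: explore and claim resources */
int claim_invade(claim_t *c, int requested_pes) {
    if (c->state != FREE) return -1;
    c->num_pes = requested_pes;   /* assume the request is granted in full */
    c->state = INVADED;
    return 0;
}

/* infect: configure the claimed resources for parallel computation */
int claim_infect(claim_t *c) {
    if (c->state != INVADED) return -1;
    c->state = INFECTED;
    return 0;
}

/* retreat: release the resources again */
int claim_retreat(claim_t *c) {
    if (c->state == FREE) return -1;
    c->num_pes = 0;
    c->state = FREE;
    return 0;
}
```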
  • Support for resource-awareness at various levels

− Application level
− Compiler level
− Run-time system level
− Architecture level

  • Architecture consists of different compute tiles

− RISC CPU tiles
− RISC CPUs with reconfigurable fabrics
− Programmable accelerators (TCPA)

[Figure: tiled architecture]

Invasive Computing

Slide 4

Challenge: simultaneous development of different architecture and software parts, as well as their integration and validation


[Figure: TCPA tile with AHB bus, configuration & communication processor (LEON3), interrupt controller, AHB/APB bridge, APB bus, IM/GC/AG blocks, configuration manager, and I/O buffers]

/* code to be executed sequentially */
...
val constraints = new AND();
constraints.add(new TypeConstraint(PEType.TCPA));
constraints.add(new PEquantity(4));
constraints.add(new Layout(LIN));
val claim = Claim.invade(constraints);
val ilet = (id: IncarnationID) => {
    /* code to be executed in parallel */
    ...
};
claim.infect(ilet);
...
claim.retreat();

Run-time system

Invasion on TCPAs

  • Run-time system interaction with TCPAs
  • Resource requests and releases
  • Application configuration
  • Input/output data streams

How do we prototype TCPAs with tight software/hardware interactions?

Slide 5


InvasIC Prototyping Platform

Slide 6

  • Synopsys FPGA-based prototyping platform

− Up to 12 million ASIC gates of capacity
− Tools for multi-FPGA prototypes (Certify) and RTL debug (Identify)
− UMRBus interface kit for host workstation
− Transactor library for AMBA to support bus-protocol communication
− Portable hardware

[Figure: prototyping setup with the DUT on FPGA-based hardware, a camera sensor interface and DVI extension connected via connectors, and a host running the OS, run-time control, and display driver]


Typical HDL-based Development

Slide 7

[Figure: HDL simulator (ModelSim) running a VHDL testbench that drives the DUT through its I/O buffers]


HDL-Bridge-based Debugging

Slide 8

[Figure: HDL simulator (ModelSim) with a VHDL testbench connected through a software wrapper and a hardware wrapper to the DUT's I/O buffers]


Synopsys Transactor Library

Slide 9

  • Library offers UMRBus-based transactors

− AMBA
− UART
− GPIO
− …

  • C++ and Tcl API
  • Easy to integrate into existing RTL designs

[Figure: host-side API calls (write()/read() on ahb_master, callbacks on ahb_slave) travel over the UMRBus through CAPIMs to read/write initiators on the AHB bus]
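The master/slave transactor pattern sketched in the figure can be illustrated in C. This is a hypothetical stand-in, not the Synopsys transactor library API: the master issues reads and writes into a bus model, and a slave-side callback is notified on each write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the master/slave transactor pattern: a master
 * issues read/write accesses, a slave registers a callback that fires on
 * bus writes. Names are illustrative; the real library API differs. */
typedef void (*bus_callback_t)(uint32_t addr, uint32_t data);

typedef struct {
    uint32_t mem[256];        /* stand-in for the AHB address space */
    bus_callback_t on_write;  /* slave-side callback, may be NULL */
} ahb_transactor_t;

/* master-side write: update the bus model and notify the slave */
void ahb_write(ahb_transactor_t *t, uint32_t addr, uint32_t data) {
    t->mem[addr % 256] = data;
    if (t->on_write) t->on_write(addr, data);
}

/* master-side read */
uint32_t ahb_read(const ahb_transactor_t *t, uint32_t addr) {
    return t->mem[addr % 256];
}
```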


Evaluation

Slide 10

                 Performance | Cycle accuracy | Signal observability | Intended use
HDL-Simulation   slowest     | yes            | high                 | hardware development
HDL-Bridge       slow        | yes            | medium               | hardware debugging
AMBA-Transactor  high        | no             | low                  | integration and extended testing

  • Hardware development and debugging require cycle accuracy and highly flexible means to observe individual signals
  • For software development and testing, performance is a key feature besides observability of registers


The main video-based application (edge detection) then tries to capture the remaining PEs on the TCPA tile while satisfying the following properties:
 Guaranteed constant throughput for a 1024x768 frame resolution
 Dynamic adaptation of quality of service (Laplace or Sobel)

Test Application

Slide 11

A secondary application pre-occupies a number of PEs on the target TCPA tile

[Figure: TCPA tile (AHB bus, configuration & communication processor (LEON3), interrupt controller, AHB/APB bridge, APB bus, IM/GC/AG blocks, configuration manager, I/O buffers) with Rx/Tx links to a DVI extension board]


Hardware/Software Interactions

Slide 12

[Figure: AMBA AHB transactor attached to the TCPA tile's AHB bus; the tile contains the LEON3 processor, interrupt controller, AHB/APB bridge, APB bus, IM blocks, configuration manager, and I/O buffers, with Rx/Tx links to the DVI extension board]

1. LEON3: invade request for n PEs (request an arbitrary number of PEs for a secondary application)
2. TCPA: invasion on the invasion controllers
3. LEON3: response to the invasion request (n PEs)
4. LEON3: invade request for 25 PEs (request 25 PEs for the edge detection application)
5. TCPA: invasion on the invasion controllers
6. LEON3: response to the invasion request; receive the number of invaded PEs (m)
   − If 2 < m < 9: load the Sobel 1x3 configuration
   − If 8 < m < 25: load the Laplace 3x3 configuration
   − If m == 25: load the Laplace 5x5 configuration
7. LEON3: send the configuration stream and start the computation
8. TCPA: application execution
9. Application termination and resource release request
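The quality-of-service decision in step 6 can be sketched directly in C. The conditions mirror the slide; the type and function names are illustrative, not the actual run-time system API.

```c
#include <assert.h>

/* Illustrative sketch: pick a filter configuration from the number of
 * invaded PEs (m), following the decision logic on the slide.
 * Names are hypothetical, not the actual InvasIC run-time API. */
typedef enum {
    CONFIG_NONE,         /* too few PEs invaded */
    CONFIG_SOBEL_1X3,
    CONFIG_LAPLACE_3X3,
    CONFIG_LAPLACE_5X5
} tcpa_config_t;

tcpa_config_t select_config(int m)
{
    if (m == 25)         return CONFIG_LAPLACE_5X5;  /* full 5x5 claim */
    if (m > 8 && m < 25) return CONFIG_LAPLACE_3X3;
    if (m > 2 && m < 9)  return CONFIG_SOBEL_1X3;
    return CONFIG_NONE;
}
```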


Application Scenarios / Results

Slide 13


Experimental Setup

Slide 14

[Figure: experimental setup with four LEON3 cores (0–3), static RAM, and a master transactor on the AHB bus]

  • Step 1

− Write data to the RAM
− Measure the data rate

  • Step 2

− Read data from the RAM
− Measure the data rate
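The measurement in both steps boils down to timing a block transfer of a given size and reporting MBytes/sec. A minimal sketch of that computation, with the transfer and timing mechanism abstracted away (the actual benchmark used the UMRBus master transactor, whose API is not shown here):

```c
#include <assert.h>
#include <stddef.h>

/* Compute the data rate of a transfer in MBytes/sec, given the number
 * of bytes moved and the elapsed time in seconds. The transfer itself
 * (write to / read from the RAM via the master transactor) and the
 * timing source are abstracted away in this sketch. */
double data_rate_mbytes_per_sec(size_t bytes, double seconds)
{
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}
```

In the experiment this would be evaluated once per transfer size, doubling from 128 bytes up to 128 KBytes, for the write step and the read step separately.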


Master Transactor Data Rate

Slide 15

Transfer size | Write (MBytes/sec) | Read (MBytes/sec)
128 B         | 0.261              | 0.331
256 B         | 0.631              | 0.744
512 B         | 1.005              | 1.56
1 KB          | 2.466              | 2.907
2 KB          | 3.487              | 4.584
4 KB          | 6.666              | 6.132
8 KB          | 9.138              | 7.344
16 KB         | 13.388             | 8.576
32 KB         | 17.724             | 8.98
64 KB         | 20.798             | 9.18
128 KB        | 23.174             | 9.458


Software Development

Slide 16

  • GRMON
  • General debug monitor for the LEON3 processor

 Read/write access to all system registers and memory
 Built-in disassembler and trace buffer management
 Downloading and execution of LEON applications
 Breakpoint and watchpoint management
 Support for USB, JTAG, RS232, PCI, and Ethernet debug links
 Tcl interface (scripts, procedures, variables, loops, etc.)

  • Challenges
  • Initial situation offered by GAISLER

 Bus-based MPSoC with up to 16 cores and only one GRMON instance

  • But we need a GRMON instance for each tile

 Each instance needs a separate connection medium to CHIPit
 Synchronization between the tiles


GRMON Debugging

Slide 17

[Figure: tiled architecture (CPU tiles, i-Core tiles, TCPA tiles, memory tiles, memory/I/O, NoC routers) with a dedicated debug unit attached to every tile]

  • Data transfer

 I/O tile
 Direct to the tiles

  • Debug

− Debug unit
− GAISLER (GRMON)



Multiple Transactor-based Debugging

Slide 18

[Figure: tiled architecture (CPU tiles, i-Core tiles, TCPA tiles, memory tiles, memory/I/O, NoC routers) with an AMBA transactor attached to every tile]

  • Data transfer

 I/O tile
 Direct to the tiles

  • Debug

− AMBA transactor
− GAISLER (GRMON)



Conclusions

Slide 19

  • HDL-Bridge-based debugging enables efficient and precise hardware development on multiple FPGAs
  • The AHB transactor interface eased connectivity and control over the FPGA-based prototype
  • Transactor-based debugging offers fast and scalable hardware/software interaction for heterogeneous MPSoCs
  • Our FPGA-based prototyping approach is feasible for MPSoC validation and demonstration


Thank you for your attention!

Slide 20

Transactor-based debugging of massively parallel processor array architectures Contact

Markus Blocherer Hardware/Software Co-Design Universität Erlangen-Nürnberg Cauerstraße 11, 91058 Erlangen, Germany Email: markus.blocherer@fau.de

www.invasive-computing.org