


SLIDE 1
K. Desnos – SPIDER / MPPA

INSTITUT D’ÉLECTRONIQUE ET DE TÉLÉCOMMUNICATIONS DE RENNES


Hugo MIOMANDRE, Julien HASCOËT, Karol DESNOS, Kevin MARTIN, Benoit DUPONT DE DINECHIN, Jean-François NEZAN

Porting the Spider Dataflow Runtime on the Kalray MPPA Manycore Architecture

Dataflow Workshop - 2017.12.12

SLIDE 2

Reconfigurable Dataflow for Manycore

Context > Overview

GdR ISIS Project

[Figure: layers Modeling Framework, Runtime Adaptation Layer and Manycore Architecture, with an architecture model ("Archi model")]

SLIDE 3

The PiSDF Model of Computation

Context > Dataflow input

[Figure: example PiSDF graph. Actors: Read Header, Read Image, Filter, SetNbSlices, Kernel, Send; parameters: N, Size; port rates: Size and Size/N; parameter values 4 and 2 are shown]

Desnos et al., "PiMM: Parameterized and Interfaced Dataflow Meta-Model for MPSoCs Runtime Reconfiguration," SAMOS, 2013.
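In PiSDF, production and consumption rates are expressions of parameters such as Size and Size/N, so the number of firings of each actor (the repetition vector) must be recomputed whenever a parameter value changes at runtime. The minimal sketch below solves the SDF balance equations for a hypothetical chain ReadImage → Filter → Send whose rates are read from the figure labels; the actor names, the Send rate and the value Size = 1024 are assumptions for illustration.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(edges):
    """Solve the balance equations q[src] * prod == q[dst] * cons.

    edges: list of (src, prod, dst, cons) with integer rates already
    evaluated from the current parameter values. Assumes a connected graph.
    """
    rate = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the edges
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    for src, prod, dst, cons in edges:  # consistency check
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src} -> {dst}")
    # Scaling by the lcm of the denominators yields the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(r * scale) for actor, r in rate.items()}


def example_graph(size, n):
    """Hypothetical chain: ReadImage -> Filter -> Send, Filter working on Size/N tokens."""
    return [
        ("ReadImage", size, "Filter", size // n),
        ("Filter", size // n, "Send", size),
    ]


print(repetition_vector(example_graph(1024, 4)))  # {'ReadImage': 1, 'Filter': 4, 'Send': 1}
print(repetition_vector(example_graph(1024, 2)))  # {'ReadImage': 1, 'Filter': 2, 'Send': 1}
```

Changing N from 4 to 2 halves the number of Filter firings without touching the graph topology, which is the kind of reconfiguration the runtime has to absorb.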

SLIDE 4

The SPIDER runtime manager

Context > Runtime

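Heulot et al., "SPIDER: A Synchronous Parameterized and Interfaced Dataflow-Based RTOS for Multicore DSPs," EDERC, 2014.

[Figure: one Master and several Slaves; the Master sends Jobs to the Slaves, Params and Timings flow back to the Master, and Data is exchanged through a pool of data FIFOs]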

Master tasks:

  • 1. Manage graphs
  • 2. Map & Schedule
  • 3. Send Jobs
  • 4. Run jobs
  • 5. Monitor & Trace

Slave task:

  • Run jobs

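The division of labour between master and slaves can be pictured with a minimal thread-based sketch, assuming one job queue per slave and a shared report queue standing in for the timing trace. The function names, the round-robin mapping and the `None` stop signal are illustrative choices, not SPIDER's API, and in this sketch the master does not run jobs itself (master task 4).

```python
import queue
import threading
import time


def slave_task(jobs_in, reports_out):
    """Slave: pop jobs from its own queue and run them until it receives None."""
    while True:
        job = jobs_in.get()
        if job is None:  # stop signal from the master
            break
        name, work = job
        start = time.perf_counter()
        work()  # run the job
        reports_out.put((name, time.perf_counter() - start))  # timing for monitoring


def master_task(jobs, n_slaves):
    """Master: map jobs to slaves, send them, then collect the reported timings."""
    reports = queue.Queue()
    job_queues = [queue.Queue() for _ in range(n_slaves)]
    slaves = [threading.Thread(target=slave_task, args=(q, reports)) for q in job_queues]
    for t in slaves:
        t.start()
    for i, job in enumerate(jobs):  # trivial mapping: round robin over slaves
        job_queues[i % n_slaves].put(job)
    for q in job_queues:
        q.put(None)
    for t in slaves:
        t.join()
    timings = {}
    while not reports.empty():  # safe here: every slave has already finished
        name, elapsed = reports.get()
        timings[name] = elapsed
    return timings


jobs = [(f"job{i}", lambda: sum(range(1000))) for i in range(6)]
print(sorted(master_task(jobs, n_slaves=3)))
```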

SLIDE 5

Manycore Architecture: Kalray’s MPPA256B

Context > The Beast

Challenges

  • Distributed scratchpad memory
  • Massive parallelism
  • NoC-Based communications

SLIDE 6

Contributions

  • Distributed synchronization
  • Lightweight manycore scheduling
  • Dataflow-based distributed memory allocation

Miomandre et al., "Embedded Runtime for Reconfigurable Dataflow Graphs on Manycore Architectures," submitted to PARMA-DITAM 2018.

SLIDE 7

Contributions > Overview

Spider distribution on MPPA

[Figure: one Master and Slaves (×256) distributed over the MPPA]

SLIDE 8

Contributions > Synchro

Previous Synchronization Mechanisms

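Shared-memory based (x86, big.LITTLE)

[Figure: Core1 runs actor A, Core2 runs actor B; Core2 polls a flag in shared memory, then pops its job]

Hardware supported (TI Keystone 2)

[Figure: Core1 runs actor A, Core2 runs actor B; Core2 pops its job from a hardware FIFO]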

SLIDE 9

Contributions > Synchro

Synchronization Mechanisms

Before: centralized polling

[Figure: a Master polls the Slaves over the NoC]

  • Unbounded communications
  • Contention

After: distributed notifications

[Figure: Master and Slaves exchange notifications over the NoC]

  • Notifier/Observer pattern
  • Bounded communications
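The notifier/observer idea can be sketched as follows, assuming each job knows the consumer jobs waiting for its output: when a job finishes, its slave pushes one notification per consumer instead of a central master polling every core. The `Slave` and `Job` classes and the dependency counter are hypothetical; the point is that the number of messages equals the number of dependencies, so communications are bounded.

```python
from dataclasses import dataclass, field


class Slave:
    def __init__(self, name):
        self.name = name
        self.ready = []    # jobs whose inputs are all available
        self.received = 0  # notifications received over the NoC

    def on_notify(self, job):
        """Observer side: one incoming message per finished predecessor."""
        self.received += 1
        job.remaining -= 1
        if job.remaining == 0:
            self.ready.append(job.name)


@dataclass(eq=False)
class Job:
    name: str
    slave: Slave                                   # core the job is mapped on
    remaining: int = 0                             # predecessors not finished yet
    observers: list = field(default_factory=list)  # consumer jobs to notify


def connect(producer, consumer):
    """Register consumer as an observer of producer (one data dependency)."""
    producer.observers.append(consumer)
    consumer.remaining += 1


def finish(job):
    """Notifier side: push one notification per observer, no polling."""
    for consumer in job.observers:
        consumer.slave.on_notify(consumer)
    return len(job.observers)  # messages sent


s0, s1, s2 = Slave("s0"), Slave("s1"), Slave("s2")
a, b, c = Job("A", s0), Job("B", s1), Job("C", s2)
connect(a, c)
connect(b, c)
finish(a)
finish(b)
print(s2.ready, s2.received)  # ['C'] 2
```

With centralized polling, the number of messages grows with how long slaves wait; here it is fixed by the graph: two dependencies, two notifications.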

SLIDE 10

Contributions > Scheduling

Complexity issue for manycore

Good ol' List Scheduling

  • 1. Create a sorted list of all actors to map/schedule.
  • 2. Schedule the first actor of the list on the first available core.
  • 3. Go back to step 2 until the list is empty.

Complexity: $O(A \log A + P \cdot A)$

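A: number of actors; P: number of processors.

Multicore architectures: P is smaller than $\log A$, so the $A \log A$ term dominates.

Manycore architectures: P is large and, by design, $A \propto P$, so the $P \cdot A$ term dominates and the complexity becomes quadratic: $O(A^2)$.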
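A minimal sketch of the list scheduler described above, assuming each actor comes with a priority, an execution time and its predecessors, and that sorting by priority never places an actor before one of its predecessors. The O(P) scan over cores for every actor is the $P \cdot A$ term; reading predecessor finish times adds a term proportional to the number of edges, which the slide's formula does not show, and communication costs are ignored.

```python
def list_schedule(actors, n_cores):
    """Classic list scheduling.

    actors: dict name -> (priority, duration, predecessors); lower priority
    values are scheduled first. Returns dict name -> (core, start_time).
    """
    order = sorted(actors, key=lambda a: actors[a][0])  # step 1: O(A log A)
    core_free = [0] * n_cores                            # time at which each core becomes idle
    finish = {}
    schedule = {}
    for actor in order:                                  # steps 2-3: A iterations
        _, duration, preds = actors[actor]
        data_ready = max((finish[p] for p in preds), default=0)
        # O(P) scan: the core on which the actor can start earliest
        core = min(range(n_cores), key=lambda c: max(core_free[c], data_ready))
        start = max(core_free[core], data_ready)
        finish[actor] = start + duration
        core_free[core] = finish[actor]
        schedule[actor] = (core, start)
    return schedule


graph = {
    "A": (0, 2, []),
    "B": (1, 3, ["A"]),
    "C": (2, 1, ["A"]),
    "D": (3, 1, ["B", "C"]),
}
print(list_schedule(graph, n_cores=2))
# {'A': (0, 0), 'B': (0, 2), 'C': (1, 2), 'D': (0, 5)}
```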

SLIDE 11

Contributions > Scheduling

Specialized Round Robin

  • 1. Create a list of actors in topological order.
  • 2. Schedule the first actor by systematically interleaving clusters and cores.
  • 3. Go back to step 2 until the list is empty.

Lightweight Scheduling

Complexity: $O(A)$

A: number of actors

Advantages:

  • Low complexity
  • Interleaving clusters avoids communication contention between starting actors.
  • The scheduler controls the number of jobs in the queue of a core.
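One plausible reading of "systematically interleaving clusters and cores" is sketched below, assuming the actors are already in topological order: actor i goes to cluster i mod C and, within it, to core (i div C) mod K, so consecutive actors start on different clusters. C, K and the function name are assumptions; on an MPPA with 16 clusters of 16 cores, C = K = 16.

```python
def interleaved_round_robin(actors_topo, n_clusters, cores_per_cluster):
    """Map actors, given in topological order, to (cluster, core) pairs in O(A)."""
    mapping = {}
    for i, actor in enumerate(actors_topo):
        cluster = i % n_clusters                      # change cluster at every step
        core = (i // n_clusters) % cores_per_cluster  # next core once all clusters are used
        mapping[actor] = (cluster, core)
    return mapping


print(interleaved_round_robin(["A", "B", "C", "D", "E"], n_clusters=2, cores_per_cluster=2))
# {'A': (0, 0), 'B': (1, 0), 'C': (0, 1), 'D': (1, 1), 'E': (0, 0)}
```

Because jobs are dealt out cyclically, each core receives exactly one job per C·K consecutive actors, which keeps per-core queue lengths balanced and predictable.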

SLIDE 12

Contributions > Memory Alloc.

Classic Memory Allocation

[Figure: flowchart: Malloc In & Out → Success? → OK: execute actor; KO: crash!]

SLIDE 13

Contributions > Memory Alloc.

Cluster-Level Dataflow Aware Memory Allocation

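[Figure: flowchart with steps "Malloc In & Out", "Success?", "Execute actor", "Notif. Master" and "Other Actor?"; unlike the classic allocation, a failed (KO) allocation does not crash]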
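A toy model of the difference between the two flowcharts, assuming a fixed-size cluster memory in which output buffers stay allocated for their consumers while input buffers are released after execution. On a failed allocation the slave records a notification for the master and moves on to another actor instead of crashing; `ClusterMemory`, `run_queue` and the notification callback are hypothetical names, and deferred actors would be retried later.

```python
class ClusterMemory:
    """Tracks bytes in use in a fixed-size cluster scratchpad (no fragmentation model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_alloc(self, size):
        if self.used + size > self.capacity:
            return False  # would overflow the scratchpad
        self.used += size
        return True

    def free(self, size):
        self.used -= size


def run_queue(jobs, mem, notify_master):
    """jobs: list of (name, in_bytes, out_bytes). Returns (executed, deferred)."""
    executed, deferred = [], []
    for name, in_bytes, out_bytes in jobs:
        if mem.try_alloc(in_bytes + out_bytes):
            executed.append(name)   # execute the actor here
            mem.free(in_bytes)      # inputs consumed; outputs kept for consumers
        else:
            notify_master(name)     # instead of crashing
            deferred.append(name)   # try another actor, retry this one later
    return executed, deferred


messages = []
mem = ClusterMemory(capacity=100)
print(run_queue([("A", 10, 30), ("B", 20, 50), ("C", 10, 20)], mem, messages.append))
# (['A', 'B'], ['C'])
print(messages, mem.used)
# ['C'] 80
```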

SLIDE 14

Results > Application

+ Morphological operator

SLIDE 15

Results > Performance

Maximum speedup: 22 on 256 cores (4K video)

SLIDE 16

Results > Energy

2.5× more energy efficient (4K video)

  • Intel Xeon E5-1650 (6 hyperthreaded x86 cores): 11.40 fps at ~120 W
  • Kalray MPPA256 Bostan (258 RISC cores): 2.81 fps at ~12 W
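The 2.5× figure can be checked from the measured throughputs and the approximate power values, taking energy efficiency $\eta$ as frames per joule (fps/W):

$$\frac{\eta_{\text{MPPA}}}{\eta_{\text{Xeon}}} = \frac{2.81\,\text{fps} / 12\,\text{W}}{11.40\,\text{fps} / 120\,\text{W}} = \frac{0.234\,\text{frames/J}}{0.095\,\text{frames/J}} \approx 2.46$$

Equivalently, the Xeon spends about $120 / 11.40 \approx 10.5$ J per frame against $12 / 2.81 \approx 4.3$ J per frame for the MPPA. Since both power values are approximate, the ratio is an estimate.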

SLIDE 17

Summary

Successful porting of Spider on MPPA

  • Feasible: Reconfigurable Dataflow Runtime for manycore
  • Promising: Energy efficiency beats x86
  • Open source(-ish): On GitHub!

Future Work

  • Better
  • Faster
  • Stronger

Julien, Florian, Alexandre, Hamza, Claudio, …

SLIDE 18

http://preesm.sf.net @PreesmProject

Achievement unlocked: Thanks for your attention

Achievement unlocked: Question time

?