
Porting the Spider Dataflow Runtime on the Kalray MPPA Manycore Architecture


  1. INSTITUT D’ÉLECTRONIQUE ET DE TÉLÉCOMMUNICATIONS DE RENNES
     Porting the Spider Dataflow Runtime on the Kalray MPPA Manycore Architecture
     Hugo Miomandre, Julien Hascoët, Karol Desnos, Kevin Martin, Benoit Dupont de Dinechin, Jean-François Nezan
     Dataflow Workshop - 2017.12.12
     K. Desnos – Spider / MPPA

  2. Context > Overview: Reconfigurable Dataflow for Manycore (GdR ISIS Project)
     [Diagram labels: Modeling Framework, Runtime Adaptation Layer, Manycore Architecture, Archi model]

  3. Context > Dataflow input: The PiSDF Model of Computation
     [Figure: example PiSDF graph. Labels include Read, Header, Image, SetNbSlices, Kernel, Filter, Send, in, out; parameters Size = 4 and N = 2; port rates Size and Size/N]
     Desnos et al. "PiMM: Parameterized and interfaced dataflow meta-model for MPSoCs runtime reconfiguration." SAMOS, 2013

  4. Context > Runtime: The Spider runtime manager
     Master tasks:
     1. Manage graphs
     2. Map & Schedule
     3. Send jobs
     4. Run jobs
     5. Monitor & Trace
     Slave task:
     - Run jobs
     [Diagram labels: Master, Slave, Jobs, Params, Data, Timings, Pool of data FIFOs]
     Heulot et al. "Spider: A synchronous parameterized and interfaced dataflow-based RTOS for multicore DSPs" EDERC, 2014

  5. Context > The Beast: Manycore Architecture: Kalray’s MPPA256B
     Challenges
     • Distributed scratchpad memory
     • Massive parallelism
     • NoC-based communications

  6. Contributions
     • Distributed synchronization
     • Lightweight manycore scheduling
     • Dataflow-based distributed memory allocation
     Miomandre et al. "Embedded Runtime for Reconfigurable Dataflow Graphs on Manycore Architectures" submitted to PARMA-DITAM 18

  7. Contributions > Overview: Spider distribution on MPPA
     [Diagram: Master, Slave × 256]

  8. Contributions > Synchro: Previous Synchronization Mechanisms
     • Shared-memory based (x86, Big.LITTLE) [Diagram: Core 1 (A), Flag, Core 2 (Poll, Pop job B)]
     • Hardware-supported FIFO (TI Keystone 2) [Diagram: Core 1 (A), Core 2 (Pop job B)]

  9. Contributions > Synchro: Synchronization Mechanisms
     Before: Centralized polling [Diagram: Master and Slaves over the NoC]
     • Unbounded communications
     • Contention
     After: Distributed notifications [Diagram: Master and Slaves over the NoC]
     • Notifier/Observer pattern
     • Bounded communications
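
The contrast between polling and notification can be illustrated with a minimal sketch. This is not Spider's code: Python threads stand in for slave cores, `queue.Queue` inboxes stand in for NoC notifications, and the names `slave` and `run` are hypothetical. Each slave blocks on its own inbox instead of repeatedly reading a shared flag, so the master sends exactly one message per job.

```python
import queue
import threading


def slave(inbox, done):
    """Observer: sleeps until the master notifies it with a job."""
    while True:
        job = inbox.get()      # blocks; no busy polling of a shared flag
        if job is None:        # sentinel: no more jobs for this slave
            break
        done.put(job())


def run(jobs, n_slaves=4):
    """Notifier: the master pushes each job to one slave's private inbox."""
    inboxes = [queue.Queue() for _ in range(n_slaves)]
    done = queue.Queue()
    threads = [threading.Thread(target=slave, args=(ib, done)) for ib in inboxes]
    for t in threads:
        t.start()
    for i, job in enumerate(jobs):
        inboxes[i % n_slaves].put(job)   # one notification per job
    for ib in inboxes:
        ib.put(None)
    for t in threads:
        t.join()
    results = []
    while not done.empty():
        results.append(done.get())
    return sorted(results)


if __name__ == "__main__":
    print(run([lambda i=i: i * i for i in range(6)]))
```

The number of messages is bounded by the number of jobs plus one sentinel per slave, whereas a polling loop generates traffic for as long as a slave waits.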

  10. Contributions > Scheduling: Complexity issue for manycore
      Good ol' list scheduling
      1. Create a sorted list of all actors to map/schedule.
      2. Schedule the first actor of the list on the first available core.
      3. Go back to step 2 until the list is empty.
      Complexity: O(A.log(A) + P.A)
      A: # actors
      P: # processors
      Multicore architectures: P smaller than log(A) => A.log(A) dominates
      Manycore architecture: P is large, A ∝ P (by designer) => complexity grows quadratically
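
The two regimes follow directly from the complexity expression. The short derivation below assumes, for illustration, that the proportionality A ∝ P is written A = kP with a constant k that does not appear on the slide.

```latex
% Assumption (not on the slide): A = kP for some constant k > 0.
\begin{align*}
\text{Multicore } (P \le \log A):\quad
  & P \cdot A \le A \log A
  \;\Rightarrow\; O(A \log A + P \cdot A) = O(A \log A) \\
\text{Manycore } (A = kP):\quad
  & O(A \log A + P \cdot A)
  = O\bigl(kP \log(kP) + kP^{2}\bigr)
  = O(P^{2})
\end{align*}
```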

  11. Contributions > Scheduling: Lightweight Scheduling
      Specialized Round Robin
      1. Create a list of actors in topological order.
      2. Schedule the first actor by systematically interleaving clusters and cores.
      3. Go back to step 2 until the list is empty.
      Complexity: O(A)
      A: # actors
      Advantages:
      - Low complexity
      - Interleaving clusters avoids communication contentions between starting actors.
      - The scheduler controls the number of jobs in the queue of a core.
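
The three steps above can be sketched as follows. This is an illustrative reading, not Spider's implementation: the function names, the edge-list graph representation, and the default of 16 clusters with 16 cores each are assumptions. Consecutive actors in topological order land on different clusters first, and the core index only advances once every cluster has received an actor.

```python
from collections import deque


def topological_order(actors, edges):
    """Kahn's algorithm; raises ValueError if the graph has a cycle."""
    indegree = {a: 0 for a in actors}
    successors = {a: [] for a in actors}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = deque(a for a in actors if indegree[a] == 0)
    order = []
    while ready:
        actor = ready.popleft()
        order.append(actor)
        for nxt in successors[actor]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(actors):
        raise ValueError("graph has a cycle")
    return order


def round_robin_schedule(actors, edges, n_clusters=16, cores_per_cluster=16):
    """Map the i-th actor to (cluster, core), cycling over clusters first."""
    mapping = {}
    for i, actor in enumerate(topological_order(actors, edges)):
        cluster = i % n_clusters
        core = (i // n_clusters) % cores_per_cluster
        mapping[actor] = (cluster, core)
    return mapping


if __name__ == "__main__":
    actors = ["a", "b", "c", "d", "e"]
    edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    print(round_robin_schedule(actors, edges, n_clusters=2, cores_per_cluster=2))
```

The mapping loop itself does constant work per actor. The topological sort in this sketch also visits every edge, so the whole function is linear in actors plus edges.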

  12. Contributions > Memory Alloc.: Classic Memory Allocation
      [Flowchart: Malloc In & Out → Success? OK → Execute actor; KO → Crash!]

  13. Contributions > Memory Alloc.: Cluster-Level Dataflow Aware Memory Allocation
      [Flowchart labels: Malloc In & Out, Success? (OK / KO), Other Actor?, Execute actor, Notif. Master]
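
The exact branches of this flowchart are not recoverable from the transcript, so the sketch below is only one plausible reading: when an allocation fails, the cluster lets another in-flight actor finish (which frees its buffers) and retries, and it notifies the master only when nothing is left to free. The function `run_cluster`, the single size per actor, and the byte counts are all hypothetical.

```python
from collections import deque


def run_cluster(actors, capacity):
    """actors: list of (name, bytes_needed) in schedule order.

    Returns (executed, notified): actors that ran, and actors reported
    to the master because their buffers could not be allocated.
    """
    free = capacity
    in_flight = deque()          # actors currently holding buffers
    executed, notified = [], []
    for name, size in actors:
        # Allocation failed: let another actor complete and free memory.
        while size > free and in_flight:
            done_name, done_size = in_flight.popleft()
            free += done_size
            executed.append(done_name)
        if size > free:
            # No other actor can release memory: report instead of crashing.
            notified.append(name)
            continue
        free -= size
        in_flight.append((name, size))
    while in_flight:
        executed.append(in_flight.popleft()[0])
    return executed, notified


if __name__ == "__main__":
    print(run_cluster([("A", 60), ("B", 50), ("C", 200), ("D", 30)], capacity=100))
```

Compared with the classic scheme, a failed allocation here never aborts the run: it either resolves locally once memory is released or is escalated.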

  14. Results > Application
      [Figure: application graph] + Morphological operator

  15. Results > Performance
      Max speedup: 22 on 256 cores (on 4K video)

  16. Results > Energy
      2.5× more energy efficient (on 4K video)
      Intel Xeon E5-1650 (6 hyperthreaded x86 cores): 11.40 fps, ~120 W
      Kalray MPPA256 Bostan (258 RISC cores): 2.81 fps, ~12 W
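
The 2.5× figure can be checked from the numbers on the slide. The sketch below assumes that energy efficiency means frames per joule (fps divided by watts) and treats the approximate power figures as exact; the helper name `frames_per_joule` is made up for illustration.

```python
def frames_per_joule(fps, watts):
    """Frames processed per joule: (frames/s) / (J/s)."""
    return fps / watts


xeon = frames_per_joule(11.40, 120.0)   # Intel Xeon E5-1650
mppa = frames_per_joule(2.81, 12.0)     # Kalray MPPA256 Bostan
ratio = mppa / xeon

print(f"Xeon: {xeon:.3f} frames/J, MPPA: {mppa:.3f} frames/J, ratio: {ratio:.2f}")
```

The ratio comes out at about 2.46, which rounds to the 2.5 quoted on the slide, even though the MPPA is roughly four times slower in absolute frame rate.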

  17. Summary
      Successful porting of Spider on MPPA
      • Feasible: Reconfigurable Dataflow Runtime for manycore
      • Promising: Energy efficiency beats x86
      • Open source(-ish): On GitHub!
      Future Work
      • Better
      • Faster
      • Stronger
      Julien, Florian, Alexandre, Hamza, Claudio, …

  18. Achievement unlocked: Thanks for your attention
      Achievement unlocked: Question time?
      http://preesm.sf.net
      @PreesmProject
