SLIDE 1

Many-Core Scheduling of Data Parallel Applications using SMT Solvers

Pranav Tendulkar, Peter Poplavko, Ioannis Galanommatis, Oded Maler

Verimag, France

August 2014

SLIDE 2

Multi-core Processors Everywhere

Cars, phones, space shuttles, tablets, laptops, cameras, smart TVs

SLIDE 3

Context

[Plot: number of mapping/scheduling solutions versus number of tasks (1–6) for 50–300 processors; the count climbs toward 2×10^14]

The number of mapping and scheduling solutions grows exponentially, and many-core platforms involve extra complexity factors:

  • explicit modeling of network communication is necessary
  • orchestration of processor and network resources is non-trivial
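To make the explosion concrete, here is a back-of-the-envelope count (a sketch; the real search space is even larger, since it also includes task start times and orderings): with T tasks and P processors there are already P^T ways to map tasks to processors.

```python
# Counting processor assignments alone: each of T tasks can run on any
# of P processors, giving P**T mappings (orderings and timing excluded).
for tasks in range(1, 7):
    row = [f"{procs**tasks:.1e}" for procs in (50, 100, 150, 200, 250, 300)]
    print(f"{tasks} task(s):", " ".join(row))
```

For 6 tasks on 300 processors this already gives about 7×10^14 mappings, the order of magnitude shown in the plot.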

SLIDE 4

Design Problems

How to:
  • maximize the performance of the application
  • optimally utilize memory resources
  • orchestrate shared resources such as processors, DMA, etc.
  • load-balance the processors
  • minimize communication costs
  • schedule tasks in parallel while sharing limited resources

SLIDE 5

Outline

1. Motivation
2. Application Model
3. Hardware Platform
4. Scheduling
5. Experiments
6. Conclusions

SLIDE 9

Model of Computation

Synchronous dataflow graphs (SDF):
  • introduced by E. Lee and D. Messerschmitt in 1987
  • task graph + symbolic representation of data parallelism
  • used for signal-processing and video-coding applications
  • a ‘standard’ in academic multicore compilers, e.g. the MIT StreamIt compiler

We use split-join graphs, a restriction of SDF that still covers perhaps 90% of use cases.

Reference: Pranav Tendulkar, Peter Poplavko, and Oded Maler. “Symmetry Breaking for Multi-criteria Mapping and Scheduling on Multicores”. In: Formal Modeling and Analysis of Timed Systems (FORMATS), Lecture Notes in Computer Science, 2013.

SLIDE 10

Split-Join Graphs

A simple split-join graph example: an edge labeled α splits the computation, spawning α parallel task instances; the matching edge labeled 1/α waits for all instances and joins them.
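For concreteness, a minimal sketch of how such a graph might be represented (the class and field names are illustrative, not from the deck): each actor carries an execution time, and each edge carries a parallelization factor α, greater than 1 for a split and fractional for a join.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class Actor:
    name: str
    exec_time: int                     # worst-case execution time in cycles

@dataclass
class Edge:
    src: str
    dst: str
    alpha: Fraction                    # >1: split (spawn alpha instances), <1: join

@dataclass
class SplitJoinGraph:
    actors: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_actor(self, name, exec_time):
        self.actors[name] = Actor(name, exec_time)

    def connect(self, src, dst, alpha=Fraction(1)):
        self.edges.append(Edge(src, dst, alpha))

# Example: A splits into 4 parallel B instances, which join into C.
g = SplitJoinGraph()
g.add_actor("A", 100); g.add_actor("B", 50); g.add_actor("C", 80)
g.connect("A", "B", Fraction(4))       # split: spawn 4 B's
g.connect("B", "C", Fraction(1, 4))    # join: wait for all 4
```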

SLIDE 13

Kalray MPPA-256

Many-core platform = network of clusters

[Block diagram: 4×4 grid of compute clusters, each with cores C0–C15, shared memory, a DMA engine, a system core, a DSU, and D-NoC/C-NoC routers; quad-core I/O subsystems with 512 KB memory on the periphery provide DDR, PCIe, Ethernet, Interlaken, and GPIOs]

Efficient orchestration of network communication and cluster scheduling is non-trivial

SLIDE 14

Platform characteristics

  • 16 symmetric processors per cluster
  • shared memory within a cluster (2 MB)
  • 8 KB data cache per core (disabled)
  • inter-cluster communication using DMA and NoC
  • NoC with 2D torus topology
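A hardware architecture model with these parameters might be captured as plain data (a sketch; the field names and the single-DMA-channel assumption are mine, the values are from the slide):

```python
# Illustrative platform model for the Kalray MPPA-256 (field names assumed).
MPPA256 = {
    "clusters": 16,
    "cores_per_cluster": 16,                      # symmetric processors
    "shared_mem_per_cluster": 2 * 1024 * 1024,    # 2 MB
    "dcache_per_core": 8 * 1024,                  # 8 KB, disabled here
    "noc_topology": "2D torus",
    "dma_channels_per_cluster": 1,                # assumption for the model
}

def torus_distance(a, b, side=4):
    """Hop distance between clusters a and b on a side x side 2D torus."""
    ax, ay, bx, by = a % side, a // side, b % side, b // side
    dx = min(abs(ax - bx), side - abs(ax - bx))
    dy = min(abs(ay - by), side - abs(ay - by))
    return dx + dy
```

The torus distance reappears below when placing groups on clusters.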

SLIDE 20

Design Flow: Partitioning

The application graph is partitioned, exploring 3D Pareto solutions over:
  • estimated communication cost
  • maximum workload per group
  • number of groups

Problem Inputs:
  • Application graph
  • Hardware architecture model

Partitioning Output:
  • Application graph partitioned into groups

Goals:
  • Load-balance the groups
  • Minimize communication between groups

[Figure: example application graph with actors A–F partitioned into groups]
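The flow encodes each step as SMT constraints; below is a minimal z3 sketch of the partitioning step. The actor workloads, edge volumes, and the use of z3's Optimize (instead of the point-by-point cost queries used in the deck's exploration algorithm) are all assumptions of this sketch.

```python
from z3 import Int, If, Sum, Optimize, sat

# Toy instance (invented numbers): actor -> workload, edge -> bytes.
work = {"A": 100, "B": 300, "C": 300, "D": 200}
edges = {("A", "B"): 64, ("B", "C"): 128, ("C", "D"): 64}
n_groups = 2

group = {a: Int(f"g_{a}") for a in work}
C_tau = Int("C_tau")          # max workload per group
C_eta = Int("C_eta")          # total inter-group communication

opt = Optimize()
for g in group.values():
    opt.add(0 <= g, g < n_groups)

# C_tau bounds the workload of every group.
for k in range(n_groups):
    load = Sum([If(group[a] == k, w, 0) for a, w in work.items()])
    opt.add(load <= C_tau)

# C_eta counts bytes on edges whose endpoints land in different groups.
opt.add(C_eta == Sum([If(group[s] != group[d], v, 0)
                      for (s, d), v in edges.items()]))

opt.minimize(C_tau)
opt.minimize(C_eta)
if opt.check() == sat:
    m = opt.model()
    print({a: m[g] for a, g in group.items()}, m[C_tau], m[C_eta])
```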

SLIDE 25

Design Flow: Placement

Given a partitioning, placement selects the solution with minimal communication cost.

Problem Inputs:
  • Application graph
  • Hardware architecture model
  • Partitioning scheme

Placement Output:
  • Group to platform cluster assignment

Goals:
  • Place communicating groups on closely located hardware clusters

[Figure: groups of actors A–F assigned to hardware clusters]
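A corresponding sketch for the placement step (traffic numbers are invented; the 4×4 torus distance matches the 16-cluster NoC, but the exact cost function is an assumption): assign each group to a cluster and minimize distance-weighted inter-group traffic.

```python
from z3 import Int, If, Sum, And, Optimize, sat

def torus_distance(a, b, side=4):
    # Hop distance between clusters a and b on a side x side 2D torus.
    ax, ay, bx, by = a % side, a // side, b % side, b // side
    return (min(abs(ax - bx), side - abs(ax - bx)) +
            min(abs(ay - by), side - abs(ay - by)))

traffic = {(0, 1): 2736, (1, 2): 9648}     # invented inter-group bytes
n_clusters = 16

place = {g: Int(f"c_{g}") for g in (0, 1, 2)}
opt = Optimize()
for c in place.values():
    opt.add(0 <= c, c < n_clusters)

# Cost: traffic volume weighted by the hop distance of the chosen clusters.
# Distances are enumerated because the table is not a native z3 function.
cost = []
for (g1, g2), vol in traffic.items():
    for c1 in range(n_clusters):
        for c2 in range(n_clusters):
            cost.append(If(And(place[g1] == c1, place[g2] == c2),
                           vol * torus_distance(c1, c2), 0))
opt.minimize(Sum(cost))
if opt.check() == sat:
    m = opt.model()
    print({g: m[c].as_long() for g, c in place.items()})
```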

SLIDE 30

Design Flow: Multi-cluster Scheduling

Multi-cluster scheduling explores 2D Pareto solutions over communication buffer size and latency.

Problem Inputs:
  • Partitioning solution
  • Placement solution
  • Hardware architecture model

Scheduling Output:
  • A mapping of every task to a processor or DMA channel
  • A start time for every task
  • A communication buffer size per channel

Goals:
  • Minimize application latency
  • Minimize communication buffer space

[Gantt chart: tasks A0, B0, B1, C0, C1, D0, D1, E0, E1, F0 and the fifotx transfer mapped over time onto processors P1, P2 of Cluster0/Cluster1 and DMA0]
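A minimal z3 sketch of one scheduling query under a latency bound (the toy tasks and durations are invented, and the deck's full encoding also covers DMA channels, buffer sizes, and the placement of tasks across clusters): map each task to a processor, give it a start time, enforce precedences, and forbid overlap on a shared processor.

```python
from z3 import Int, Or, Implies, Solver, sat

dur = {"A0": 5, "B0": 3, "B1": 3, "C0": 4}               # invented durations
prec = [("A0", "B0"), ("A0", "B1"), ("B0", "C0"), ("B1", "C0")]
n_procs, C_L = 2, 15                                     # latency bound C_L

s = Solver()
start = {t: Int(f"s_{t}") for t in dur}
proc = {t: Int(f"p_{t}") for t in dur}

for t in dur:
    s.add(start[t] >= 0, 0 <= proc[t], proc[t] < n_procs)
    s.add(start[t] + dur[t] <= C_L)       # every task meets the bound

for a, b in prec:                         # precedence constraints
    s.add(start[b] >= start[a] + dur[a])

tasks = list(dur)
for i, a in enumerate(tasks):             # no overlap on a shared processor
    for b in tasks[i + 1:]:
        s.add(Implies(proc[a] == proc[b],
                      Or(start[a] + dur[a] <= start[b],
                         start[b] + dur[b] <= start[a])))

if s.check() == sat:                      # SAT: a schedule within C_L exists
    m = s.model()
    print({t: (m[proc[t]].as_long(), m[start[t]].as_long()) for t in dur})
```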

SLIDE 31

Design Space Exploration

[Diagram: the Design Space Exploration algorithm feeds SMT constraints (problem constraints for partitioning, placement, and scheduling, plus cost constraints) to an SMT solver and collects the solutions]

Each query fixes a point in the cost space: the solver returns SAT with a model (e.g. (x1, y1) is SAT) or UNSAT (e.g. (x2, y2) is UNSAT).

SLIDE 35

Exploration Algorithm

One SMT query per point (C_L, C_B) in the cost space, where C_L is the latency bound and C_B the communication-buffer bound.

[Cost-space plot: SAT points, UNSAT points, and unexplored points]
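A sketch of the exploration loop over the (C_L, C_B) grid. The monotonicity pruning is the key idea: any point looser than a SAT point is also SAT, and any point tighter than an UNSAT point is also UNSAT. The stand-in query below replaces the real SMT encoding, and the grid strategy is an assumption, not necessarily the deck's exact algorithm.

```python
from z3 import sat, unsat

def query(C_L, C_B):
    # Stand-in for the real SMT scheduling query with cost bounds
    # C_L (latency) and C_B (buffer size); invented feasibility rule.
    return sat if C_L * C_B >= 600 else unsat

def explore(latencies, buffers):
    sat_pts, unsat_pts = [], []
    for C_L in latencies:
        for C_B in buffers:
            if any(l <= C_L and b <= C_B for l, b in sat_pts):
                continue                  # dominated by a SAT point: SAT too
            if any(l >= C_L and b >= C_B for l, b in unsat_pts):
                continue                  # dominates an UNSAT point: UNSAT too
            (sat_pts if query(C_L, C_B) == sat
             else unsat_pts).append((C_L, C_B))
    return sat_pts, unsat_pts

print(explore([10, 20, 30, 40], [10, 20, 30]))
```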

SLIDE 36

DMA Model

Tasks communicating via DMA:

[Graph: A → I → G → B; writer task A is followed by the DMA initialization task I and the network-transfer task G before reader task B]

Task | Description      | Resources used    | Task duration
 I   | Initialization   | Processor and DMA | Constant
 G   | Network transfer | Only DMA          | Transfer-size dependent
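The duration model can be written down directly (a sketch; the constants are invented, and only the shape, a constant init cost plus a size-dependent transfer cost, comes from the slide):

```python
DMA_INIT_CYCLES = 69            # invented constant: processor+DMA setup cost
CYCLES_PER_BYTE = 0.5           # invented NoC throughput coefficient

def dma_task_durations(transfer_bytes: int) -> tuple[int, int]:
    """Durations of the I (init) and G (network transfer) tasks."""
    d_init = DMA_INIT_CYCLES                        # constant
    d_xfer = int(CYCLES_PER_BYTE * transfer_bytes)  # size dependent
    return d_init, d_xfer
```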

SLIDE 40

Model Transformation

An example application graph:

A → B, with edge ê : [α(ê), ω(ê)]

Partition-Aware graph:

A → I_wr → G_wr → B, with edges e_wt(ê) : [1, w↑(ê)], e_wn(ê) : [1], e_rt(ê) : [α(ê), ω(ê)]

Buffer-Aware graph:

Nodes A, I_wr, G_wr, F_st, I_rd, G_rd, B, with edges e_wt : [1, w↑], e_wn : [1], e_rt : [α, ω], e_ws : [1], e_wb : [1, 0, b(e_wt)], e_rs : [1], e_rn : [1], e_rb : [α⁻¹, 0, b(e_rt)]
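A sketch of the partition-aware step of this transformation (the names mirror the slide, the graph container is the illustrative one from the split-join sketch above): every cross-partition edge A → B is replaced by a chain through an init task I_wr and a network-transfer task G_wr.

```python
def make_partition_aware(edges, group):
    """Rewrite cross-group edges (a, b) into a -> I_wr -> G_wr -> b.

    edges: iterable of (src, dst); group: dict actor -> group id.
    A sketch only: the real transformation also carries the edge
    labels [alpha, omega] and, in the buffer-aware version, adds
    reader-side tasks and buffer-capacity back-edges.
    """
    new_edges, dma_tasks = [], set()
    for a, b in edges:
        if group[a] == group[b]:
            new_edges.append((a, b))            # local edge: keep as is
        else:
            i_wr, g_wr = f"Iwr_{a}_{b}", f"Gwr_{a}_{b}"
            dma_tasks |= {i_wr, g_wr}
            new_edges += [(a, i_wr), (i_wr, g_wr), (g_wr, b)]
    return new_edges, dma_tasks

# Example: B and C sit in different groups.
edges = [("A", "B"), ("B", "C")]
group = {"A": 0, "B": 0, "C": 1}
print(make_partition_aware(edges, group))
```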

SLIDE 42

JPEG Decoder

[Graph: VLD → IQ/IDCT → COLOR; the VLD → IQ/IDCT edge splits 12-way, and the IQ/IDCT → COLOR edge joins the 12 instances]

VLD: Variable Length Decoder
IQ/IDCT: Inverse Quantization / Inverse Discrete Cosine Transform
COLOR: Color Conversion

SLIDE 44

JPEG Decoder

Partitioning Solutions:

Solution | vld | iq | color |   Cτ   |  Cη   | Cz
Ps0      |  0  |  1 |   2   | 424012 | 12384 | 3
Ps1      |  0  |  0 |   1   | 758116 |  2736 | 2
Ps2      |  0  |  0 |   0   | 934288 |     0 | 1
Ps3      |  0  |  1 |   1   | 510276 |  9648 | 2

The vld/iq/color columns give the group to which each actor is allocated.
Cτ: maximum workload per group
Cη: total communication cost
Cz: number of groups

Note that Cη(Ps1) + Cη(Ps3) = 2736 + 9648 = 12384 = Cη(Ps0): Ps1 cuts only the iq→color channel, Ps3 only the vld→iq channel, and Ps0 cuts both.

Scheduling Solutions: [2D Pareto plot for Ps0–Ps3: latency 0.4–1.0 ×10^6 cycles vs. buffer size 1.0–1.2 ×10^4 bytes]

SLIDE 45

Measurements on the Kalray processor

[Four plots, one per partitioning solution Ps0–Ps3: latency (0.4–1.0 ×10^6 cycles) vs. buffer size (1.0–1.2 ×10^4 bytes), comparing the model's predicted latency against the measured minimum and maximum]

JPEG decoder latency measured on Kalray platform

SLIDE 46

Other Applications

Benchmark              #Actors  #Channels  #Tasks  Total Exec. Time (cycles)  Total Comm. Data (bytes)
JPEG Decoder               3        2        25          934288                   12384
Beam Former                8        7        53          342816                     944
Insertion Sort             6        5         6           40033                     320
Merge Sort                12       11        31          102347                     704
Radix Sort                13       12        13           85464                     768
Dct1                       4        3         4          127496                     768
Dct2                       7        6        21          215525                    1536
Dct3                       5        4        12          129105                    1024
Dct4                       7        6        21          183890                    1536
Dct5                       7        6        21          216079                    1536
Dct6                       8        7        36          258304                    1792
Dct7                       8        7        29          218577                    1792
Dct8                      10        9        38          272514                    2304
DctCoarse                  3        2         3           74401                     512
DctFine                    6        5        20          163708                    1280
Comparison Count           5        5        20          141397                    1280
Matrix multiplication     11       11        79         1087840                   10656
Fft                       13       12        96          640109                    6144

SLIDE 47

Other Applications

[Bar chart over all benchmarks (JPEG Dec., Beam Former, Insertion Sort, Merge Sort, Radix Sort, Dct1–Dct8, DctCoarse, DctFine, Comp. count, Matrix Mult., Fft) showing #Solutions and %error per benchmark; the per-benchmark labels read 25, 155, 7, 37, 6, 4, 8, 4, 8, 8, 24, 7, 10, 3, 6, 4, 8, 8]

SLIDE 51

Conclusions

Contributions:
  • Automated design flow using SMT solvers
  • Communication tasks for modeling explicit DMA communication
  • Many-core scheduling of tasks on processors and DMA

Future Work:
  • Spreading task instances of an actor over multiple clusters
  • Network route selection and communication scheduling
  • Pipelined scheduling on the platform

SLIDE 52

Questions?