SLIDE 1

Many-Core Scheduling of Data Parallel Applications using SMT Solvers

Pranav Tendulkar, Peter Poplavko, Ioannis Galanommatis, Oded Maler

Verimag, France

August 2014

SLIDE 2

Multi-core Processors Everywhere

Cars, phones, space shuttles, tablets, laptops, cameras, smart TVs

SLIDE 3

Context

[Plot: number of mapping/scheduling solutions versus number of tasks (1–6) for 50–300 processors; the count climbs toward 2×10^14]

The number of mapping and scheduling solutions grows exponentially, and many-core platforms involve extra complexity factors:

  • explicit modeling of network communication is necessary
  • orchestration of processor and network resources is non-trivial
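To make the explosion concrete, here is a back-of-the-envelope count (a sketch; the real search space is even larger, since it also includes task start times and orderings): with T tasks and P processors there are already P^T ways to map tasks to processors.

```python
# Counting processor assignments alone: each of T tasks can run on any
# of P processors, giving P**T mappings (orderings and timing excluded).
for tasks in range(1, 7):
    row = [f"{procs**tasks:.1e}" for procs in (50, 100, 150, 200, 250, 300)]
    print(f"{tasks} task(s):", " ".join(row))
```

For 6 tasks on 300 processors this already gives about 7×10^14 mappings, the order of magnitude shown in the plot.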

SLIDE 4

Design Problems

How to:
  • maximize the performance of the application
  • optimally utilize memory resources
  • orchestrate shared resources such as processors, DMA, etc.
  • load-balance the processors
  • minimize communication costs
  • schedule tasks in parallel while sharing limited resources

SLIDE 5

Outline

1. Motivation
2. Application Model
3. Hardware Platform
4. Scheduling
5. Experiments
6. Conclusions

SLIDE 9

Model of Computation

Synchronous dataflow graphs (SDF):
  • introduced by E. Lee and D. Messerschmitt in 1987
  • task graph + symbolic representation of data parallelism
  • used for signal-processing and video-coding applications
  • a ‘standard’ in academic multicore compilers, e.g. the MIT StreamIt compiler

We use split-join graphs, a restriction of SDF that still covers perhaps 90% of use cases.

Reference: Pranav Tendulkar, Peter Poplavko, and Oded Maler. “Symmetry Breaking for Multi-criteria Mapping and Scheduling on Multicores”. In: Formal Modeling and Analysis of Timed Systems (FORMATS), Lecture Notes in Computer Science, 2013.

SLIDE 10

Split-Join Graphs

A simple split-join graph example: an edge labeled α splits the computation, spawning α parallel task instances; the matching edge labeled 1/α waits for all instances and joins them.
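For concreteness, a minimal sketch of how such a graph might be represented (the class and field names are illustrative, not from the deck): each actor carries an execution time, and each edge carries a parallelization factor α, greater than 1 for a split and fractional for a join.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class Actor:
    name: str
    exec_time: int                     # worst-case execution time in cycles

@dataclass
class Edge:
    src: str
    dst: str
    alpha: Fraction                    # >1: split (spawn alpha instances), <1: join

@dataclass
class SplitJoinGraph:
    actors: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_actor(self, name, exec_time):
        self.actors[name] = Actor(name, exec_time)

    def connect(self, src, dst, alpha=Fraction(1)):
        self.edges.append(Edge(src, dst, alpha))

# Example: A splits into 4 parallel B instances, which join into C.
g = SplitJoinGraph()
g.add_actor("A", 100); g.add_actor("B", 50); g.add_actor("C", 80)
g.connect("A", "B", Fraction(4))       # split: spawn 4 B's
g.connect("B", "C", Fraction(1, 4))    # join: wait for all 4
```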

SLIDE 13

Kalray MPPA-256

Many-core platform = network of clusters

[Block diagram: 4×4 grid of compute clusters, each with cores C0–C15, shared memory, a DMA engine, a system core, a DSU, and D-NoC/C-NoC routers; quad-core I/O subsystems with 512 KB memory on the periphery provide DDR, PCIe, Ethernet, Interlaken, and GPIOs]

Efficient orchestration of network communication and cluster scheduling is non-trivial

SLIDE 14

Platform characteristics

  • 16 symmetric processors per cluster
  • shared memory within a cluster (2 MB)
  • 8 KB data cache per core (disabled)
  • inter-cluster communication using DMA and NoC
  • NoC with 2D torus topology
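A hardware architecture model with these parameters might be captured as plain data (a sketch; the field names and the single-DMA-channel assumption are mine, the values are from the slide):

```python
# Illustrative platform model for the Kalray MPPA-256 (field names assumed).
MPPA256 = {
    "clusters": 16,
    "cores_per_cluster": 16,                      # symmetric processors
    "shared_mem_per_cluster": 2 * 1024 * 1024,    # 2 MB
    "dcache_per_core": 8 * 1024,                  # 8 KB, disabled here
    "noc_topology": "2D torus",
    "dma_channels_per_cluster": 1,                # assumption for the model
}

def torus_distance(a, b, side=4):
    """Hop distance between clusters a and b on a side x side 2D torus."""
    ax, ay, bx, by = a % side, a // side, b % side, b // side
    dx = min(abs(ax - bx), side - abs(ax - bx))
    dy = min(abs(ay - by), side - abs(ay - by))
    return dx + dy
```

The torus distance reappears below when placing groups on clusters.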

SLIDE 20

Design Flow: Partitioning

The application graph is partitioned, exploring 3D Pareto solutions over:
  • estimated communication cost
  • maximum workload per group
  • number of groups

Problem Inputs:
  • Application graph
  • Hardware architecture model

Partitioning Output:
  • Application graph partitioned into groups

Goals:
  • Load-balance the groups
  • Minimize communication between groups

[Figure: example application graph with actors A–F partitioned into groups]
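The flow encodes each step as SMT constraints; below is a minimal z3 sketch of the partitioning step. The actor workloads, edge volumes, and the use of z3's Optimize (instead of the point-by-point cost queries used in the deck's exploration algorithm) are all assumptions of this sketch.

```python
from z3 import Int, If, Sum, Optimize, sat

# Toy instance (invented numbers): actor -> workload, edge -> bytes.
work = {"A": 100, "B": 300, "C": 300, "D": 200}
edges = {("A", "B"): 64, ("B", "C"): 128, ("C", "D"): 64}
n_groups = 2

group = {a: Int(f"g_{a}") for a in work}
C_tau = Int("C_tau")          # max workload per group
C_eta = Int("C_eta")          # total inter-group communication

opt = Optimize()
for g in group.values():
    opt.add(0 <= g, g < n_groups)

# C_tau bounds the workload of every group.
for k in range(n_groups):
    load = Sum([If(group[a] == k, w, 0) for a, w in work.items()])
    opt.add(load <= C_tau)

# C_eta counts bytes on edges whose endpoints land in different groups.
opt.add(C_eta == Sum([If(group[s] != group[d], v, 0)
                      for (s, d), v in edges.items()]))

opt.minimize(C_tau)
opt.minimize(C_eta)
if opt.check() == sat:
    m = opt.model()
    print({a: m[g] for a, g in group.items()}, m[C_tau], m[C_eta])
```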

SLIDE 25

Design Flow: Placement

Given a partitioning, placement selects the solution with minimal communication cost.

Problem Inputs:
  • Application graph
  • Hardware architecture model
  • Partitioning scheme

Placement Output:
  • Group to platform cluster assignment

Goals:
  • Place communicating groups on closely located hardware clusters

[Figure: groups of actors A–F assigned to hardware clusters]
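A corresponding sketch for the placement step (traffic numbers are invented; the 4×4 torus distance matches the 16-cluster NoC, but the exact cost function is an assumption): assign each group to a cluster and minimize distance-weighted inter-group traffic.

```python
from z3 import Int, If, Sum, And, Optimize, sat

def torus_distance(a, b, side=4):
    # Hop distance between clusters a and b on a side x side 2D torus.
    ax, ay, bx, by = a % side, a // side, b % side, b // side
    return (min(abs(ax - bx), side - abs(ax - bx)) +
            min(abs(ay - by), side - abs(ay - by)))

traffic = {(0, 1): 2736, (1, 2): 9648}     # invented inter-group bytes
n_clusters = 16

place = {g: Int(f"c_{g}") for g in (0, 1, 2)}
opt = Optimize()
for c in place.values():
    opt.add(0 <= c, c < n_clusters)

# Cost: traffic volume weighted by the hop distance of the chosen clusters.
# Distances are enumerated because the table is not a native z3 function.
cost = []
for (g1, g2), vol in traffic.items():
    for c1 in range(n_clusters):
        for c2 in range(n_clusters):
            cost.append(If(And(place[g1] == c1, place[g2] == c2),
                           vol * torus_distance(c1, c2), 0))
opt.minimize(Sum(cost))
if opt.check() == sat:
    m = opt.model()
    print({g: m[c].as_long() for g, c in place.items()})
```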

SLIDE 30

Design Flow: Multi-cluster Scheduling

Multi-cluster scheduling explores 2D Pareto solutions over communication buffer size and latency.

Problem Inputs:
  • Partitioning solution
  • Placement solution
  • Hardware architecture model

Scheduling Output:
  • A mapping of every task to a processor or DMA channel
  • A start time for every task
  • A communication buffer size per channel

Goals:
  • Minimize application latency
  • Minimize communication buffer space

[Gantt chart: tasks A0, B0, B1, C0, C1, D0, D1, E0, E1, F0 and the fifotx transfer mapped over time onto processors P1, P2 of Cluster0/Cluster1 and DMA0]
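A minimal z3 sketch of one scheduling query under a latency bound (the toy tasks and durations are invented, and the deck's full encoding also covers DMA channels, buffer sizes, and the placement of tasks across clusters): map each task to a processor, give it a start time, enforce precedences, and forbid overlap on a shared processor.

```python
from z3 import Int, Or, Implies, Solver, sat

dur = {"A0": 5, "B0": 3, "B1": 3, "C0": 4}               # invented durations
prec = [("A0", "B0"), ("A0", "B1"), ("B0", "C0"), ("B1", "C0")]
n_procs, C_L = 2, 15                                     # latency bound C_L

s = Solver()
start = {t: Int(f"s_{t}") for t in dur}
proc = {t: Int(f"p_{t}") for t in dur}

for t in dur:
    s.add(start[t] >= 0, 0 <= proc[t], proc[t] < n_procs)
    s.add(start[t] + dur[t] <= C_L)       # every task meets the bound

for a, b in prec:                         # precedence constraints
    s.add(start[b] >= start[a] + dur[a])

tasks = list(dur)
for i, a in enumerate(tasks):             # no overlap on a shared processor
    for b in tasks[i + 1:]:
        s.add(Implies(proc[a] == proc[b],
                      Or(start[a] + dur[a] <= start[b],
                         start[b] + dur[b] <= start[a])))

if s.check() == sat:                      # SAT: a schedule within C_L exists
    m = s.model()
    print({t: (m[proc[t]].as_long(), m[start[t]].as_long()) for t in dur})
```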

SLIDE 31

Design Space Exploration

[Diagram: the Design Space Exploration algorithm feeds SMT constraints (problem constraints for partitioning, placement, and scheduling, plus cost constraints) to an SMT solver and collects the solutions]

Each query fixes a point in the cost space: the solver returns SAT with a model (e.g. (x1, y1) is SAT) or UNSAT (e.g. (x2, y2) is UNSAT).

SLIDE 35

Exploration Algorithm

One SMT query per point (C_L, C_B) in the cost space, where C_L is the latency bound and C_B the communication-buffer bound.

[Cost-space plot: SAT points, UNSAT points, and unexplored points]
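A sketch of the exploration loop over the (C_L, C_B) grid. The monotonicity pruning is the key idea: any point looser than a SAT point is also SAT, and any point tighter than an UNSAT point is also UNSAT. The stand-in query below replaces the real SMT encoding, and the grid strategy is an assumption, not necessarily the deck's exact algorithm.

```python
from z3 import sat, unsat

def query(C_L, C_B):
    # Stand-in for the real SMT scheduling query with cost bounds
    # C_L (latency) and C_B (buffer size); invented feasibility rule.
    return sat if C_L * C_B >= 600 else unsat

def explore(latencies, buffers):
    sat_pts, unsat_pts = [], []
    for C_L in latencies:
        for C_B in buffers:
            if any(l <= C_L and b <= C_B for l, b in sat_pts):
                continue                  # dominated by a SAT point: SAT too
            if any(l >= C_L and b >= C_B for l, b in unsat_pts):
                continue                  # dominates an UNSAT point: UNSAT too
            (sat_pts if query(C_L, C_B) == sat
             else unsat_pts).append((C_L, C_B))
    return sat_pts, unsat_pts

print(explore([10, 20, 30, 40], [10, 20, 30]))
```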

SLIDE 36

DMA Model

Tasks communicating via DMA:

[Graph: A → I → G → B; writer task A is followed by the DMA initialization task I and the network-transfer task G before reader task B]

Task | Description      | Resources used    | Task duration
 I   | Initialization   | Processor and DMA | Constant
 G   | Network transfer | Only DMA          | Transfer-size dependent
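The duration model can be written down directly (a sketch; the constants are invented, and only the shape, a constant init cost plus a size-dependent transfer cost, comes from the slide):

```python
DMA_INIT_CYCLES = 69            # invented constant: processor+DMA setup cost
CYCLES_PER_BYTE = 0.5           # invented NoC throughput coefficient

def dma_task_durations(transfer_bytes: int) -> tuple[int, int]:
    """Durations of the I (init) and G (network transfer) tasks."""
    d_init = DMA_INIT_CYCLES                        # constant
    d_xfer = int(CYCLES_PER_BYTE * transfer_bytes)  # size dependent
    return d_init, d_xfer
```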

SLIDE 40

Model Transformation

An example application graph:

A → B, with edge ê : [α(ê), ω(ê)]

Partition-Aware graph:

A → I_wr → G_wr → B, with edges e_wt(ê) : [1, w↑(ê)], e_wn(ê) : [1], e_rt(ê) : [α(ê), ω(ê)]

Buffer-Aware graph:

Nodes A, I_wr, G_wr, F_st, I_rd, G_rd, B, with edges e_wt : [1, w↑], e_wn : [1], e_rt : [α, ω], e_ws : [1], e_wb : [1, 0, b(e_wt)], e_rs : [1], e_rn : [1], e_rb : [α⁻¹, 0, b(e_rt)]
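A sketch of the partition-aware step of this transformation (the names mirror the slide, the graph container is the illustrative one from the split-join sketch above): every cross-partition edge A → B is replaced by a chain through an init task I_wr and a network-transfer task G_wr.

```python
def make_partition_aware(edges, group):
    """Rewrite cross-group edges (a, b) into a -> I_wr -> G_wr -> b.

    edges: iterable of (src, dst); group: dict actor -> group id.
    A sketch only: the real transformation also carries the edge
    labels [alpha, omega] and, in the buffer-aware version, adds
    reader-side tasks and buffer-capacity back-edges.
    """
    new_edges, dma_tasks = [], set()
    for a, b in edges:
        if group[a] == group[b]:
            new_edges.append((a, b))            # local edge: keep as is
        else:
            i_wr, g_wr = f"Iwr_{a}_{b}", f"Gwr_{a}_{b}"
            dma_tasks |= {i_wr, g_wr}
            new_edges += [(a, i_wr), (i_wr, g_wr), (g_wr, b)]
    return new_edges, dma_tasks

# Example: B and C sit in different groups.
edges = [("A", "B"), ("B", "C")]
group = {"A": 0, "B": 0, "C": 1}
print(make_partition_aware(edges, group))
```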

SLIDE 42

JPEG Decoder

[Graph: VLD → IQ/IDCT → COLOR; the VLD → IQ/IDCT edge splits 12-way, and the IQ/IDCT → COLOR edge joins the 12 instances]

VLD: Variable Length Decoder
IQ/IDCT: Inverse Quantization / Inverse Discrete Cosine Transform
COLOR: Color Conversion

SLIDE 44

JPEG Decoder

Partitioning Solutions:

Solution | vld | iq | color |   Cτ   |  Cη   | Cz
Ps0      |  0  |  1 |   2   | 424012 | 12384 | 3
Ps1      |  0  |  0 |   1   | 758116 |  2736 | 2
Ps2      |  0  |  0 |   0   | 934288 |     0 | 1
Ps3      |  0  |  1 |   1   | 510276 |  9648 | 2

The vld/iq/color columns give the group to which each actor is allocated.
Cτ: maximum workload per group
Cη: total communication cost
Cz: number of groups

Note that Cη(Ps1) + Cη(Ps3) = 2736 + 9648 = 12384 = Cη(Ps0): Ps1 cuts only the iq→color channel, Ps3 only the vld→iq channel, and Ps0 cuts both.

Scheduling Solutions: [2D Pareto plot for Ps0–Ps3: latency 0.4–1.0 ×10^6 cycles vs. buffer size 1.0–1.2 ×10^4 bytes]

SLIDE 45

Measurements on the Kalray processor

[Four plots, one per partitioning solution Ps0–Ps3: latency (0.4–1.0 ×10^6 cycles) vs. buffer size (1.0–1.2 ×10^4 bytes), comparing the model's predicted latency against the measured minimum and maximum]

JPEG decoder latency measured on Kalray platform

SLIDE 46

Other Applications

Benchmark              #Actors  #Channels  #Tasks  Total Exec. Time (cycles)  Total Comm. Data (bytes)
JPEG Decoder               3        2        25          934288                   12384
Beam Former                8        7        53          342816                     944
Insertion Sort             6        5         6           40033                     320
Merge Sort                12       11        31          102347                     704
Radix Sort                13       12        13           85464                     768
Dct1                       4        3         4          127496                     768
Dct2                       7        6        21          215525                    1536
Dct3                       5        4        12          129105                    1024
Dct4                       7        6        21          183890                    1536
Dct5                       7        6        21          216079                    1536
Dct6                       8        7        36          258304                    1792
Dct7                       8        7        29          218577                    1792
Dct8                      10        9        38          272514                    2304
DctCoarse                  3        2         3           74401                     512
DctFine                    6        5        20          163708                    1280
Comparison Count           5        5        20          141397                    1280
Matrix multiplication     11       11        79         1087840                   10656
Fft                       13       12        96          640109                    6144

SLIDE 47

Other Applications

[Bar chart over all benchmarks (JPEG Dec., Beam Former, Insertion Sort, Merge Sort, Radix Sort, Dct1–Dct8, DctCoarse, DctFine, Comp. count, Matrix Mult., Fft) showing #Solutions and %error per benchmark; the per-benchmark labels read 25, 155, 7, 37, 6, 4, 8, 4, 8, 8, 24, 7, 10, 3, 6, 4, 8, 8]

SLIDE 51

Conclusions

Contributions:
  • Automated design flow using SMT solvers
  • Communication tasks for modeling explicit DMA communication
  • Many-core scheduling of tasks on processors and DMA

Future Work:
  • Spreading task instances of an actor over multiple clusters
  • Network route selection and communication scheduling
  • Pipelined scheduling on the platform

SLIDE 52

Questions?