
Scheduling tree-shaped task graphs to minimize memory and makespan

Lionel Eyraud-Dubois (INRIA, Bordeaux, France), Loris Marchal (CNRS, Lyon, France), Oliver Sinnen (Univ. Auckland, New Zealand), Frédéric Vivien (INRIA, Lyon, France)

New Challenges in Scheduling Theory Workshop, Aussois, March/April 2014


Introduction

Task graph scheduling
◮ Application modeled as a graph
◮ Map tasks onto processors and schedule them
◮ Usual performance metric: makespan (time)

Today: focus on memory
◮ Workflows with large temporary data
◮ Unfavorable evolution of computation performance vs. communication performance:
  1/Flops ≪ 1/bandwidth ≪ latency
◮ The gap between processing power and communication cost is increasing exponentially; annual improvements:
    Flops rate:      59 %
    mem. bandwidth:  26 %
    mem. latency:     5 %
◮ Avoid communications
◮ Restrict to in-core memory (out-of-core is expensive)


Focus on Task Trees

Motivation:
◮ Arise in multifrontal sparse matrix factorization
◮ Assembly/elimination tree: the application task graph is a tree
◮ Large temporary data
◮ Memory usage becomes a bottleneck


Outline

◮ Introduction and related work
◮ Complexity of parallel tree processing
◮ Heuristics for weighted task trees
◮ Simulations
◮ Summary and perspectives


Related Work: Register Allocation & Pebble Game

How can we efficiently compute the following arithmetic expression with the minimum number of registers?

7 + (1 + x)(5 − z) − ((u − t)/(2 + z)) + v

[Figure: the expression tree of this formula, pebbled step by step in the original animation]

Pebble-game rules:
◮ Inputs (leaves) can be pebbled at any time
◮ A node can be pebbled once all of its children (its operands) are pebbled
◮ A pebble may be removed at any time

Objective: pebble the root node using the minimum number of pebbles


Complexity results:
◮ On trees: polynomial-time algorithm [Sethi & Ullman, 1970]
◮ General problem on DAGs (with common subexpressions): PSPACE-complete [Gilbert, Lengauer & Tarjan, 1980]
◮ On DAGs without re-computation: NP-complete [Sethi, 1973]


Notations: Tree-Shaped Task Graphs

[Figure: an example in-tree with five nodes; each node i is labeled with its execution data n_i and its output data f_i]

◮ In-tree of n nodes
◮ Output data of size f_i
◮ Execution data of size n_i
◮ Input data of leaf nodes have null size
◮ Memory required to process node i:
  MemReq(i) = ( Σ_{j ∈ Children(i)} f_j ) + n_i + f_i
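To make this memory model concrete, here is a minimal Python sketch of MemReq. The tree encoding (plain dicts keyed by integer node ids) is our own assumption, not something fixed by the slides.

```python
# Sketch of the slides' memory model. Assumed encoding (not from the
# slides): children[i] lists node i's children, f[i] is its output
# size, n[i] is its execution-data size.

def mem_req(i, children, f, n):
    """Memory needed while node i executes: its children's outputs
    (its inputs) plus its execution data plus its own output."""
    return sum(f[j] for j in children[i]) + n[i] + f[i]

# Tiny example: a root (1) with two leaves (2, 3).
children = {1: [2, 3], 2: [], 3: []}
f = {1: 2, 2: 3, 3: 1}
n = {1: 4, 2: 1, 3: 2}
print(mem_req(1, children, f, n))  # (3 + 1) + 4 + 2 = 10
```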


Impact of Schedule on Memory Peak

[Animation: an example tree is processed node by node while the memory peak is tracked]

◮ A first traversal reaches intermediate peaks of 4, 6, and 8, and ends with a peak memory of 12
◮ A second traversal of the same tree peaks at 9, then at 11 (which is better than 12)

Two existing optimal sequential schedules:
◮ Best traversal [J. Liu, 1987]
◮ Best post-order traversal [J. Liu, 1986]
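To see how the traversal order drives the peak, the sketch below replays any sequential order under the memory model above and reports its peak; the dict encoding follows the earlier sketch and is our assumption. The small example tree is illustrative, not the one in the slides.

```python
def peak_memory(order, children, f, n):
    """Peak memory of a sequential schedule given as a topological
    order (every child before its parent). While node i runs, memory
    holds all currently live outputs plus n_i and f_i; afterwards its
    inputs are freed and f_i stays live until its parent consumes it."""
    live, peak = 0, 0  # 'live' = sum of outputs currently in memory
    for i in order:
        peak = max(peak, live + n[i] + f[i])           # inputs already in 'live'
        live += f[i] - sum(f[j] for j in children[i])  # keep f_i, free inputs
    return peak

# Two valid orders of the same tree can have different peaks:
children = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: []}
f = {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
n = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
print(peak_memory([4, 2, 5, 3, 1], children, f, n))  # 6: one subtree at a time
print(peak_memory([4, 5, 2, 3, 1], children, f, n))  # 9: both large outputs live at once
```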


Post-Order Traversal for Trees

Post-Order: entirely process one subtree after the other (DFS)

[Figure: a root r whose subtrees P_1, ..., P_n produce outputs f_1, ..., f_n]

Post-order traversals are arbitrarily bad in the general case: there is no constant k such that the best post-order traversal is a k-approximation for peak memory. In practice, however, post-order traversals perform very well.
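For reference, the best post-order traversal of [J. Liu, 1986] admits a compact recursive formulation: at every node, visit the child subtrees in decreasing order of (subtree peak − subtree output). The sketch below follows that classic rule under the slides' memory model; the dict encoding is our assumption, and the recursion is kept for clarity (very deep trees would need an iterative version).

```python
def best_postorder_peak(i, children, f, n):
    """Peak memory of the best post-order traversal of the subtree
    rooted at i: visit children in decreasing order of (peak_j - f_j),
    the classic exchange-argument ordering."""
    subs = [(best_postorder_peak(j, children, f, n), f[j]) for j in children[i]]
    subs.sort(key=lambda s: s[0] - s[1], reverse=True)
    held, peak = 0, 0                    # outputs of already-visited children
    for peak_j, f_j in subs:
        peak = max(peak, held + peak_j)  # peak while inside child j's subtree
        held += f_j                      # child j's output stays live
    return max(peak, held + n[i] + f[i])  # finally process node i itself
```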



Model for Parallel Tree Processing

◮ p identical processors
◮ Shared memory of size M
◮ Task i has execution time p_i
◮ Parallel processing of nodes ⇒ larger memory
◮ Trade-off between time and memory

[Figure: the example in-tree again, with execution data n_i and output sizes f_i]


NP-Completeness in the Pebble Game Model

Background:
◮ Makespan minimization is NP-complete for trees (P|trees|Cmax)
◮ Polynomial for unit-weight tasks (P|p_i = 1, trees|Cmax)
◮ The pebble game is polynomial on trees

Pebble-game model:
◮ Unit execution time: p_i = 1
◮ Unit memory costs: n_i = 0, f_i = 1 (pebble edges; equivalent to the pebble game for trees)

Theorem

Deciding whether a tree can be scheduled using at most B pebbles in at most C steps is NP-complete.


Space-Time Tradeoff

It is not possible to obtain guarantees on both memory and time simultaneously:

Theorem 1

There is no algorithm that is both an α-approximation for makespan minimization and a β-approximation for peak-memory minimization when scheduling tree-shaped task graphs.

For a fixed number of processors:

Theorem 2

Any algorithm that is an α(p)-approximation for makespan and a β(p)-approximation for peak memory with p ≥ 2 processors satisfies
  α(p) · β(p) ≥ 2p / (⌈log(p)⌉ + 2).



InnerFirst: Post-Order in Parallel

Motivation:
◮ Post-order behavior: process inner nodes as soon as possible
◮ Parallel version: give priority to inner nodes
◮ Naturally limits the number of concurrent subtrees
◮ Intuitively good for keeping memory low

Implementation as a list-scheduling heuristic (sketched below):
◮ Put ready nodes in a queue (higher priority for inner nodes)
◮ Schedule them whenever a processor becomes available
◮ Initially, sort the leaf nodes using the best sequential post-order

Performance:
◮ (2 − 1/p)-approximation for makespan
◮ Unbounded ratio for memory
◮ O(n log n) complexity
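Here is a minimal sketch of such a list scheduler. Two simplifications are our own assumptions: the dict encoding, and leaves being consumed in the order given (the slides additionally pre-sort the leaves by the best sequential post-order).

```python
import heapq
from collections import deque

def inner_first_makespan(children, parent, time, p):
    """InnerFirst list scheduling: when a processor frees up, start a
    ready inner node if any, otherwise the next ready leaf."""
    missing = {i: len(children[i]) for i in children}  # unfinished children
    ready_inner = []
    ready_leaves = deque(i for i in children if not children[i])
    running, clock = [], 0.0            # min-heap of (finish_time, node)
    while ready_inner or ready_leaves or running:
        # fill idle processors, inner nodes first
        while len(running) < p and (ready_inner or ready_leaves):
            i = ready_inner.pop() if ready_inner else ready_leaves.popleft()
            heapq.heappush(running, (clock + time[i], i))
        clock, done = heapq.heappop(running)  # advance to next completion
        if done in parent:                    # the root has no parent
            missing[parent[done]] -= 1
            if missing[parent[done]] == 0:
                ready_inner.append(parent[done])
    return clock
```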


DeepestFirst: Approach Optimal Makespan

DeepestFirst:
◮ Compute critical-path values for all tasks (sketched below)
◮ List-scheduling based on these critical-path values

Performance:
◮ Known as a good heuristic for makespan minimization
◮ No guarantee (or intuition) on memory behavior
◮ O(n log n) complexity
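In an in-tree, the critical-path value of a task is the total execution time along its path up to the root; a sketch under the same assumed encoding follows. DeepestFirst then reuses a list-scheduling loop like the one above, picking the ready task with the largest value.

```python
def critical_path(children, time, root):
    """Critical-path value of each task: total execution time on the
    path from the task up to the root (computed top-down)."""
    cp, stack = {root: time[root]}, [root]
    while stack:
        i = stack.pop()
        for j in children[i]:
            cp[j] = cp[i] + time[j]
            stack.append(j)
    return cp
```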


Subtrees: Coarse-Grain Parallelism

Motivation:
◮ Divide the tree into p large subtrees plus a small set of remaining nodes (splitting sketched below)
◮ Each processor works on its own subtree
◮ Locally, use the memory-optimal sequential algorithm
◮ Process all remaining nodes sequentially
◮ Optimization: if the splitting yields more than p subtrees, load-balance the subtrees across the processors

Performance:
◮ O(n log n) complexity
◮ p-approximation algorithm for memory
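The sketch below is our own simplification of the splitting step, stated as an assumption rather than the authors' exact procedure: repeatedly explode the heaviest candidate subtree into its children until at least p subtrees remain; exploded nodes form the sequential remainder.

```python
def split_subtrees(children, time, root, p):
    """Sketch (our simplification): grow a set of candidate subtrees by
    repeatedly replacing the heaviest one with its children, until at
    least p subtrees remain. Exploded nodes are returned separately and
    processed sequentially at the end."""
    work = {}
    def total(i):  # total sequential work of subtree i (memoized recursion)
        if i not in work:
            work[i] = time[i] + sum(total(j) for j in children[i])
        return work[i]
    subtrees, sequential = [root], []
    while len(subtrees) < p:
        subtrees.sort(key=total)
        heaviest = subtrees.pop()
        if not children[heaviest]:      # a leaf cannot be split further
            subtrees.append(heaviest)
            break
        sequential.append(heaviest)
        subtrees.extend(children[heaviest])
    return subtrees, sequential
```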


How to Cope with Limited Memory?

Motivation:
◮ Work with a given quantity of memory
◮ Optimize makespan under this constraint

Stronger assumptions:
◮ Reduction tree: Σ_{j ∈ Children(i)} f_j ≥ f_i
◮ No extra memory cost for task execution

These assumptions are not verified in general, but they can be enforced by adding fictitious nodes (a sketch follows).

[Animated example: a tree is completed with fictitious nodes so that both assumptions hold]
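One plausible way to enforce both assumptions with fictitious nodes is sketched below. This construction is our own assumption and not necessarily the authors' exact transformation; it also assumes integer node ids.

```python
def add_fictitious_nodes(children, f, n):
    """Plausible transformation (an assumption): add zero-work fictitious
    leaves so that (a) execution data n_i becomes an ordinary input of
    size n_i, and (b) every node's children outputs sum to at least its
    own output (the reduction-tree property)."""
    nid = max(children)
    for i in list(children):          # iterate over the original nodes only
        if n[i] > 0:                  # (a) fold n_i into a fictitious input
            nid += 1
            children[nid], f[nid], n[nid] = [], n[i], 0
            children[i].append(nid)
            n[i] = 0
        deficit = f[i] - sum(f[j] for j in children[i])
        if deficit > 0:               # (b) top up the children outputs
            nid += 1
            children[nid], f[nid], n[nid] = [], deficit, 0
            children[i].append(nid)
    return children, f, n
```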


Memory-Bounded Heuristics: Simple Way

First idea: restrict the list-scheduling heuristics (InnerFirst and DeepestFirst)
◮ Choose a feasible amount M of memory
◮ Check that memory stays ≤ M when starting a new leaf (test sketched below)
◮ Guarantee: memory used is at most 2 × M

Proof ideas:
◮ Reduction tree: memory is reduced by processing inner nodes
◮ During processing, usage is at most twice the input memory
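The admission test itself is one line; a sketch under the earlier encoding (recall that leaf inputs have null size, so a leaf's requirement is n + f):

```python
def may_start_leaf(leaf, used, M, f, n):
    """Simple memory-bounded scheme: start a new leaf only if current
    usage plus the leaf's requirement fits the chosen budget M (inner
    nodes are never delayed). Per the slides, actual usage then stays
    within 2 * M."""
    return used + n[leaf] + f[leaf] <= M
```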


Memory-Bounded Heuristics: Complex Way

Second idea: a more elaborate memory-booking scheme
◮ Book memory for parent nodes, ensuring they can be processed later
◮ Test memory (booked + used) when starting a leaf
◮ Never exceeds the given memory M

[Animated example: bookings on a small tree are updated as leaves are processed]



Experimental Testbed

◮ 76 assembly trees of a set of sparse matrices from the University of Florida Sparse Matrix Collection
◮ Metis and AMD orderings
◮ 1, 2, 4, or 16 relaxed amalgamations per node
◮ 608 trees in total, with:
  number of nodes: 2,000 to 1,000,000
  depth: 12 to 70,000
  maximum degree: 2 to 175,000
◮ 2, 4, 8, 16, or 32 processors


Results

Heuristic        Best memory   Avg. normalized   Best makespan   Avg. normalized
                               memory needed                     makespan
Subtrees            81.1 %          2.33              0.2 %          1.35
SubtreesOptim       49.9 %          2.45              1.1 %          1.29
InnerFirst          19.1 %          3.77             37.2 %          1.03
DeepestFirst         3.0 %          4.26             95.7 %          1.00

◮ Memory normalized by the optimal sequential memory
◮ Makespan normalized by the best achieved makespan


Memory-Aware Heuristics: Makespan vs. Memory

[Plot: normalized makespan vs. normalized memory limit (both log scale), 4 processors. Curves are added one by one for Subtrees, SubtreesOptim, InnerFirst, MemLimitInnerFirst, MemLimitInnerFirstOptim, DeepestFirst, MemLimitDeepestFirst, MemLimitDeepestFirstOptim, and MemoryBooking.]


Memory-Aware Heuristics: Memory Usage

[Plot: normalized amount of used memory vs. normalized amount of available memory, for MemoryBooking, MemLimitInnerFirst, MemLimitInnerFirstOptim, and MemLimitDeepestFirst.]


Memory-Aware Heuristics: Makespan vs. memory

[Plot: normalized makespan vs. normalized memory limit (both log scale), for 2, 4, 8, 16, and 32 processors. Heuristics: ParSubtrees, ParSubtreesOptim, ParInnerFirst, ParDeepestFirst, ParMemoryBooking, ParMemLimitInnerFirst, ParMemLimitInnerFirstOptim, ParMemLimitDeepestFirst, ParMemLimitDeepestFirstOptim.]


Summary and Perspectives

◮ Complexity study of parallel tree traversals
◮ Simple heuristics
◮ Memory-bounded heuristics
◮ Simulations on real elimination trees

Future work:
◮ Consider distributed memory
◮ Extend results to other classes of regular graphs (2D grids, etc.)
◮ Minimize I/O volume for out-of-core execution
◮ Consider parallel (malleable) tasks