SLIDE 1

A Moldable Online Scheduling Algorithm and Its Application to Parallel Short Sequence Mapping

Erik Saule, Doruk Bozdağ, Umit V. Catalyurek

Department of Biomedical Informatics, The Ohio State University {esaule,bozdagd,umit}@bmi.osu.edu

JSSPP 2010

Supported by the U.S. DOE SciDAC Institute, the U.S. National Science Foundation and the Ohio Supercomputing Center.

Erik Saule, Ohio State University, Biomedical Informatics, HPC Lab, http://bmi.osu.edu/hpc

SLIDE 2

Motivation

Sequencing

Next generation sequencing instruments (SOLiD, Solexa, 454) can sequence up to 1 billion bases a day

Hundreds of millions of 35-50 base reads

Mapping

  • Map reads to a reference genome efficiently (human genome: 3 Gb).
  • This needs a large parallel computer.
  • Pooling resources will decrease the cost.
  • We study the resulting job scheduling problem.

SLIDE 3

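Parallel Short Sequence Mapping [Bozdag et al., IPDPS 09]

Three partitioning dimensions: $m_g$, $m_r$, $m_s$. The predicted runtime is

$$P(m_g, m_r, m_s) = c_{gs}\frac{G}{m_g} + c_g\frac{G}{m_g m_s} + c_{rs}\frac{R}{m_r} + \left(c_r + c_c\frac{G}{m_g m_s}\right)\frac{R}{m_r m_s}$$

Partitioning on $m$ processors means finding the minimum of $P(m_g, m_r, m_s)$ such that $m_g m_r m_s \le m$.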
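The minimization can be illustrated with a brute-force search. This is only a sketch: it assumes that the flattened terms on the slide are fractions (for example G divided by mg), that G and R are presumably the genome size and the number of reads, and the function name `best_partition` and its keyword arguments are hypothetical, not taken from the talk.

```python
from typing import Optional, Tuple


def best_partition(m: int, G: float, R: float, c_gs: float, c_g: float,
                   c_rs: float, c_r: float, c_c: float
                   ) -> Optional[Tuple[float, int, int, int]]:
    """Enumerate all (mg, mr, ms) with mg * mr * ms <= m and keep the cheapest.

    Assumed cost model (fraction reading of the slide):
    P = c_gs*G/mg + c_g*G/(mg*ms) + c_rs*R/mr + (c_r + c_c*G/(mg*ms)) * R/(mr*ms)
    """
    best = None
    for mg in range(1, m + 1):
        for mr in range(1, m // mg + 1):
            for ms in range(1, m // (mg * mr) + 1):
                cost = (c_gs * G / mg
                        + c_g * G / (mg * ms)
                        + c_rs * R / mr
                        + (c_r + c_c * G / (mg * ms)) * R / (mr * ms))
                if best is None or cost < best[0]:
                    best = (cost, mg, mr, ms)
    return best
```

Because the three factors are integers whose product is bounded by m, the best cost as a function of m changes in steps, which is consistent with the stepped speedup curve mentioned later in the talk.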

SLIDE 4

Outline of the Talk

1. Introduction
2. A Moldable Scheduling Problem
3. Deadline Based Online Scheduler (DBOS)
4. Experiments
5. Conclusion

SLIDE 5

Parallel Short Sequence Mapping

The important facts:

  • can adapt to a different number of processors
  • good runtime prediction function
  • no super-linear speedup
  • non-convex speedup function (steps)
  • no preemption

[Figure: speedup versus number of processors]

SLIDE 6

Moldable Scheduling

Instance

  • $m$ processors
  • $n$ tasks
  • Task $i$ arrives at time $r_i$
  • The execution of task $i$ on $j$ processors takes $p_{i,j}$ time units

Solution

  • Task $i$ is executed on $\pi_i$ processors
  • Task $i$ starts at $\sigma_i$
  • Task $i$ finishes at $C_i = \sigma_i + p_{i,\pi_i}$

SLIDE 7

Objective Function

Flow time

  • The flow time is the time a task spends in the system: $F_i = C_i - r_i$.
  • It does not take task size into account.
  • Optimizing the maximum flow time is unfair to small tasks.
  • Optimizing the average flow time can starve large tasks.

Stretch [Bender et al., SODA 98]

  • The stretch is the flow time normalized by the processing time of the task. In the moldable task context, we define it as $s_i = \frac{C_i - r_i}{p_{i,1}}$.
  • It provides better fairness between tasks.
  • Optimizing the maximum stretch avoids starvation.
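The difference between the two objectives can be seen on a tiny one-processor instance. The sketch below is illustrative only: the helper `run_in_order` and the two task sizes are made up for the example and do not come from the talk.

```python
from typing import List, Tuple


def run_in_order(tasks: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Run tasks back to back on one processor, in the given order.

    Each task is (release time r_i, sequential time p_i1).
    Returns one (flow time, stretch) pair per task.
    """
    clock = 0.0
    result = []
    for release, length in tasks:
        start = max(clock, release)
        clock = start + length              # completion time C_i
        flow = clock - release              # F_i = C_i - r_i
        result.append((flow, flow / length))  # s_i = F_i / p_i1
    return result


large = (0.0, 100.0)
small = (0.0, 1.0)
large_first = run_in_order([large, small])
small_first = run_in_order([small, large])
```

In both orders the maximum flow time is 101, so the maximum flow objective cannot tell the two schedules apart. The maximum stretch is 101 when the large task goes first and 1.01 when the small task goes first, which is why stretch is the fairer measure for small tasks.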

SLIDE 9

Online maximum stretch cannot be approximated

Adversary technique on one processor

  • A large task enters the system.
  • If it is scheduled immediately, a small task is sent. The small task suffers a large delay (and an unbounded stretch).
  • If the large task is scheduled later, the small task is sent accordingly. It again suffers a large delay (and an unbounded stretch).

On several processors

There are similar techniques on several processors, but they are more complicated and thus less likely to appear in practice. The key point: if all processors are busy, a small task entering the system will have a large stretch.
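A numeric version of the one-processor adversary argument may help. This is a sketch under assumptions that are not on the slides: the function name `adversary_stretches`, the parameters `L`, `eps`, `delta` and `t`, and the particular offline schedule used for comparison (which only upper-bounds the true optimum) are all illustrative.

```python
from typing import Tuple


def adversary_stretches(L: float, eps: float, delta: float, t: float
                        ) -> Tuple[float, float]:
    """One processor, no preemption.

    A large task of length L is released at time 0 and started online at
    time t. The adversary releases a small task of length eps at t + delta,
    with 0 < delta < L, so it arrives while the large task is running.
    Returns (online max stretch, max stretch of one offline schedule).
    """
    release_small = t + delta
    # Online: the small task must wait for the large task to finish.
    online_large = (t + L) / L
    online_small = (t + L + eps - release_small) / eps
    online = max(online_large, online_small)
    # Offline alternative that knows the small task in advance.
    if release_small >= L:
        offline = 1.0  # the large task finishes before the small one arrives
    else:
        # Run the small task on arrival, then the large task.
        offline = max(1.0, (release_small + eps + L) / L)
    return online, offline


online, offline = adversary_stretches(L=100.0, eps=1.0, delta=0.5, t=0.0)
```

With these numbers the online schedule has a maximum stretch of 100.5 while the offline schedule stays near 1. Shrinking `eps` makes the online value grow without bound, whereas the offline value remains close to a small constant.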

SLIDE 16

Principle of the Deadline Based Online Scheduler (DBOS)

  • All tasks running concurrently should get the same stretch to maximize efficiency.
  • The optimal maximum stretch is used as an instantaneous measure of the load.
  • Aim at a schedule more efficient than the one that is optimal for the instantaneous maximum stretch, in order to leave room for still-to-arrive tasks.

SLIDE 17

The DBOS Algorithm

Targeting a maximum stretch S

Task $i$ must complete before the deadline $D_i = r_i + p_{i,1} S$.

Moldable Earliest Deadline First (MEDF)

  • Considers tasks in deadline order.
  • Allocates to each task the minimum number of processors that lets it complete before its deadline.
  • Schedules the task as soon as possible without moving any other task.

DBOS(ρ)

  • Estimate the best maximum stretch $S^*$ using a binary search. Each deadline problem is solved by MEDF.
  • Build a schedule of good efficiency with stretch $\rho S^*$.

ρ is the online parameter.
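The three pieces above (deadlines, MEDF, binary search) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it tracks only the time at which each processor becomes idle, so it never fills holes in the schedule, and the names `medf`, `best_stretch`, `dbos` and the task representation are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical task record: (release time r_i, {j: p_ij}); the table must contain j = 1.
Task = Tuple[float, Dict[int, float]]
Schedule = List[Tuple[int, float, int]]  # (task index, start time, processors)


def medf(tasks: List[Task], m: int, S: float, now: float = 0.0) -> Optional[Schedule]:
    """Simplified MEDF for target stretch S; returns None if a deadline is missed."""
    free = [now] * m  # time at which each processor becomes idle
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0] + tasks[i][1][1] * S)
    schedule: Schedule = []
    for i in order:
        release, p = tasks[i]
        deadline = release + p[1] * S  # D_i = r_i + p_i1 * S
        free.sort()
        for j in sorted(p):  # try the smallest allocation first
            if j > m:
                break
            start = max(release, free[j - 1])  # j processors are idle from here on
            if start + p[j] <= deadline + 1e-9:
                free[:j] = [start + p[j]] * j
                schedule.append((i, start, j))
                break
        else:
            return None  # no allocation meets the deadline
    return schedule


def best_stretch(tasks: List[Task], m: int, tol: float = 1e-3) -> float:
    """Binary search for the smallest S that this MEDF sketch can satisfy."""
    lo, hi = 0.0, 1.0
    while medf(tasks, m, hi) is None:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if medf(tasks, m, mid) is None:
            lo = mid
        else:
            hi = mid
    return hi


def dbos(tasks: List[Task], m: int, rho: float) -> Optional[Schedule]:
    """Schedule with the relaxed target rho * S*."""
    return medf(tasks, m, rho * best_stretch(tasks, m))
```

For two identical tasks `(0.0, {1: 4.0, 2: 2.0})` on two processors, this sketch finds S* = 1 and, with ρ = 1.5, runs each task on one processor starting at time 0. Relaxing the target by ρ lets MEDF pick smaller allocations, which is the sense in which the schedule leaves room for tasks that have not arrived yet.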

SLIDE 20

An example

  • A system with two pending tasks.
  • Deadlines are induced by a stretch of 2 (max stretch = 2).
  • A maximum stretch of 2 is reachable.
  • But 1 is not (max stretch = 1).
  • Neither is 1.5 (max stretch = 1.5).
  • The best stretch is 1.6 (max stretch = 1.6).
  • The online parameter ρ = 1.1 (max stretch = 1.75) leaves much more space (thanks to MEDF).

SLIDE 28

An Iterative Process [Sabin et al, JSSPP 06]

The algorithm

  • Processor allocations are evaluated using the flow time of the FCFS schedule.
  • Start with one processor per task.
  • Try to add one processor to the task whose processing time it reduces the most.
  • If the schedule is better, keep the processor.
  • Otherwise, remove the processor and never try that task again.

Properties

  • Optimizes flow time
  • Claimed to outperform fair share
  • Parameter-less

[Figure: speedup versus number of processors]

Improvement

If the speedup function is non-convex or has steps, the algorithm gets stuck. Modification: step to the next point on the convex hull.
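The iterative process can be sketched as below. It is a minimal reading of the bullet points, not the code of Sabin et al.: the FCFS evaluation is a simple no-backfilling model, tasks are assumed to be listed in arrival order, and the names `fcfs_flow` and `iterative_allocation` are hypothetical. The convex-hull modification is not included.

```python
from typing import Dict, List, Tuple

Task = Tuple[float, Dict[int, float]]  # (release time, {j: runtime on j processors})


def fcfs_flow(tasks: List[Task], alloc: List[int], m: int) -> float:
    """Total flow time of an FCFS schedule with fixed allocations (no backfilling)."""
    free = [0.0] * m
    previous_start = 0.0
    total = 0.0
    for (release, p), j in zip(tasks, alloc):
        free.sort()
        start = max(release, previous_start, free[j - 1])
        end = start + p[j]
        free[:j] = [end] * j
        previous_start = start
        total += end - release
    return total


def iterative_allocation(tasks: List[Task], m: int) -> List[int]:
    """Grow allocations one processor at a time while the FCFS flow time improves."""
    alloc = [1] * len(tasks)
    frozen = set()  # tasks that must never be tried again
    best = fcfs_flow(tasks, alloc, m)
    while True:
        # Runtime reduction obtained by giving one more processor to each candidate.
        gains = [(p[alloc[i]] - p[alloc[i] + 1], i)
                 for i, (_, p) in enumerate(tasks)
                 if i not in frozen and alloc[i] < m and alloc[i] + 1 in p]
        if not gains:
            return alloc
        _, i = max(gains)
        alloc[i] += 1
        flow = fcfs_flow(tasks, alloc, m)
        if flow < best:
            best = flow
        else:
            alloc[i] -= 1
            frozen.add(i)
```

On a stepped speedup curve, `p[alloc[i] + 1]` can equal `p[alloc[i]]`, so the trial brings no gain and the task is frozen even though a larger allocation would help. That is the "stuck" behaviour the improvement addresses.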

SLIDE 29

First Experimental Setting

Goal: assess performance on a well known setting

Downey model

Two parameters:

  • Average parallelism
  • Distance to linear speedup

[Figure: speedup versus number of processors for an average parallelism of 50. Different curves give different distances to linear speedup (0.5, 1, 2).]

Generation

  • 512 processors
  • First 5000 tasks of SDSC Par 96 (from the Feitelson archive)
  • Sequential time: total execution time
  • Average parallelism: between the number of used processors and 512
  • Distance to linear speedup: between 0 and 2
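For reference, the speedup function of the Downey model can be written in a few lines. The formula below is the commonly cited form of Downey's model and does not appear on the slides, so it should be checked against the original description before being relied on. `A` is the average parallelism and `sigma` the distance to linear speedup; the function name is an assumption.

```python
def downey_speedup(n: float, A: float, sigma: float) -> float:
    """Speedup on n processors in Downey's model (as commonly stated).

    A     : average parallelism
    sigma : distance to linear speedup (0 gives linear speedup up to A)
    """
    if sigma <= 1.0:
        # Low-variance branch.
        if n <= A:
            return A * n / (A + sigma * (n - 1) / 2.0)
        if n <= 2.0 * A - 1.0:
            return A * n / (sigma * (A - 0.5) + n * (1.0 - sigma / 2.0))
        return A
    # High-variance branch.
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1.0) / (sigma * (n + A - 1.0) + A)
    return A
```

Under this model the runtime of task i on j processors would be its sequential time divided by `downey_speedup(j, A, sigma)`, and the speedup never exceeds `A`.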

SLIDE 30

Downey model results

[Figure: stretch (log scale) of the sorted tasks for Iterative, DBOS ρ=1 and DBOS ρ=1.5]

DBOS generates fewer tasks with high stretch.

SLIDE 31

Downey model results

[Figure: flow time (log scale) of the sorted tasks for Iterative, DBOS ρ=1 and DBOS ρ=1.5]

DBOS leads to better flow time. Iterative could be improved.

SLIDE 32

Second Experimental Setting

Goal: test case reflecting the cluster usage

Generation

  • 512 processors
  • Each task corresponds to one lab studying one genome
  • Speedup according to the runtime prediction function
  • 5000 tasks with exponential inter-arrival times
  • The parameter of the exponential is changed to control the load

Real data

Sequencing machine            Reads
454 GS FLX Genome Analyzer    1 million
Solexa IG sequencer           200 million
SOLiD system                  400 million

Genome         Size
E. coli        4.6 million
Yeast          15 million
A. thaliana    100 million
Mosquito       280 million
Rice           465 million
Chicken        1.2 billion
Human          3.4 billion
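A workload of this kind could be generated along the following lines. The read counts and genome sizes are the ones listed on the slide. Everything else is an assumption made for the sketch: the function name `generate_workload`, the `mean_interarrival` parameter, and the uniform random choice of sequencer and genome, which the slides do not specify.

```python
import random
from typing import List, Tuple

READS = {  # reads per run, from the slide
    "454 GS FLX Genome Analyzer": 1e6,
    "Solexa IG sequencer": 200e6,
    "SOLiD system": 400e6,
}
GENOME_SIZE = {  # bases, from the slide
    "E. coli": 4.6e6, "Yeast": 15e6, "A. thaliana": 100e6, "Mosquito": 280e6,
    "Rice": 465e6, "Chicken": 1.2e9, "Human": 3.4e9,
}


def generate_workload(n_tasks: int, mean_interarrival: float,
                      seed: int = 0) -> List[Tuple[float, float, float]]:
    """Return (arrival time, number of reads, genome size) for each task.

    Inter-arrival times are exponential with the given mean; a smaller mean
    gives a higher load. Sequencer and genome are picked uniformly at random,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    clock = 0.0
    tasks = []
    for _ in range(n_tasks):
        clock += rng.expovariate(1.0 / mean_interarrival)
        reads = rng.choice(list(READS.values()))
        genome = rng.choice(list(GENOME_SIZE.values()))
        tasks.append((clock, reads, genome))
    return tasks


workload = generate_workload(5000, mean_interarrival=600.0)
```

Each (reads, genome size) pair would then be turned into a table of runtimes per processor count using the runtime prediction function, which is not reproduced here.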

SLIDE 33

Mapping : the online parameter (average stretch)

[Figure: average stretch (log scale) versus load, from 100−115 to 640−710, for DBOS with ρ=1, ρ=1.1, ρ=1.3 and ρ=1.5]

The average stretch drops quickly as ρ increases, with a step at ρ = 1.3.

SLIDE 34

Mapping : the online parameter (maximum stretch)

[Figure: maximum stretch (log scale) versus load, from 100−115 to 640−710, for DBOS with ρ=1, ρ=1.1, ρ=1.3 and ρ=1.5]

Max stretch is kept at a reasonable level. The online parameter ρ is very helpful here.

SLIDE 35

Mapping : tuning the online parameter

[Figure: average stretch (log scale) versus ρ, from 1 to 7, with one curve per load range: 640−710, 500−570, 450−500, 330−360, 200−230, 100−115]

In non-overloaded cases, the average stretch is bimonotonic in ρ. A reasonable ρ value is easy to find.

SLIDE 36

Mapping : DBOS vs Iterative (average flow)

[Figure: average flow time (log scale) versus load, from 100−115 to 640−710, for Iterative, Improved Iterative and DBOS ρ=1.5]

DBOS is competitive.

SLIDE 37

Mapping : DBOS vs Iterative (average stretch)

[Figure: average stretch (log scale) versus load, from 100−115 to 640−710, for Iterative, Improved Iterative and DBOS ρ=1.5]

DBOS leads to much better stretch (even when Iterative gets stuck).

SLIDE 39

The end

Conclusion

  • Pooling the resources used for short sequence mapping should lower the costs.
  • To provide fairness, stretch should be considered instead of flow time.
  • A scheduling algorithm is proposed that optimizes stretch and avoids the worst-case online scenario.
  • It performs well on the short sequence mapping application.

Perspective

  • Investigate other ways to avoid worst-case scenarios.
  • Study simpler algorithms and models to get reference points.
