
A Moldable Online Scheduling Algorithm and Its Application to Parallel Short Sequence Mapping
Erik Saule, Doruk Bozdağ, Umit V. Catalyurek
Department of Biomedical Informatics, The Ohio State University
{esaule,bozdagd,umit}@bmi.osu.edu
Scheduling for Large Scale Systems, May 2009
Supported by the U.S. DOE SciDAC Institute, the U.S. National Science Foundation and the Ohio Supercomputing Center
Erik Saule (BMI OSU) Moldable Task Scheduling 1 / 29

Motivation
Sequencing: next-generation sequencing instruments (SOLiD, Solexa, 454) can sequence up to 1 billion bases a day. They produce hundreds of millions of 35-50 base reads.
Mapping: the reads must be mapped efficiently to a reference genome (human genome: 3 Gb). Sequential mapping takes about a day, so we need fast, parallel algorithms that can handle mismatches.

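Parallel Short Sequence Mapping [Bozdag et al., IPDPS 09]
Three partitioning dimensions, with $m_g$, $m_r$ and $m_s$ parts respectively.
The estimated execution time $P(m_g, m_r, m_s)$ sums cost terms with coefficients $c_{gs}$, $c_g$, $c_{rs}$ and $(c_r + c_c)$, each applied to the genome size $G$ and/or the number of reads $R$ divided by the partition counts of the corresponding dimensions.
Partitioning on $m$ processors means finding the $(m_g, m_r, m_s)$ that minimizes $P(m_g, m_r, m_s)$ subject to $m_g m_r m_s \le m$.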
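The minimization over $(m_g, m_r, m_s)$ under $m_g m_r m_s \le m$ can be sketched as an exhaustive search over all valid triples. This is a minimal sketch, assuming the cost model is passed in as a callable: `best_partition` is a hypothetical name, and the `toy` cost below is made up for illustration and does not reproduce the actual $P$.

```python
from typing import Callable, Optional, Tuple

Partition = Tuple[int, int, int]  # (m_g, m_r, m_s)


def best_partition(m: int, cost: Callable[[int, int, int], float]) -> Tuple[float, Partition]:
    """Exhaustively search (m_g, m_r, m_s) with m_g * m_r * m_s <= m for the lowest cost.

    `cost` stands in for P(m_g, m_r, m_s); its exact form is supplied by the caller.
    """
    best: Optional[Tuple[float, Partition]] = None
    for mg in range(1, m + 1):
        for mr in range(1, m // mg + 1):
            for ms in range(1, m // (mg * mr) + 1):
                c = cost(mg, mr, ms)
                if best is None or c < best[0]:  # keep the first minimum found
                    best = (c, (mg, mr, ms))
    assert best is not None  # m >= 1 guarantees (1, 1, 1) is examined
    return best


def toy(g: int, r: int, s: int) -> float:
    """Hypothetical cost for illustration only (not the model from the talk)."""
    return 100.0 / g + 40.0 / (r * s) + 2.0 * (g + r + s)


print(best_partition(16, toy))
```

The enumeration visits only triples whose product fits in $m$, so it stays cheap for cluster-sized processor counts; a real deployment would plug in the calibrated cost coefficients.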

This talk
A cost-efficient approach: to reduce costs, the Ohio Supercomputing Center is building a cluster dedicated to bioscience. It will host a Short Sequence Mapping service: laboratories submit mapping requests over the network, the service computes the mappings using the parallel algorithm, and it sends the results back.
This talk: how should the mapping requests be scheduled?

Outline of the Talk
1. Introduction
2. A Moldable Scheduling Problem
3. Deadline Based Online Scheduler (DBOS)
4. Experiments
5. Conclusion

Parallel Short Sequence Mapping
The important facts:
The algorithm can adapt to different numbers of processors.
It has a good runtime prediction function.
There is no superlinear speedup.
The speedup function is non-convex (it has steps).
There is no preemption.
(Figure: speedup curve, x axis from 0 to 30, y axis from 0 to 22.)
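The speedup facts above can be checked on a runtime profile. A minimal sketch, assuming the profile is a list where `times[j-1]` is the runtime on `j` processors; the helper names and the step-shaped profile are hypothetical illustrations, not measurements from the talk.

```python
from typing import List


def speedups(times: List[float]) -> List[float]:
    """Speedup on j processors: runtime on 1 processor divided by runtime on j."""
    return [times[0] / t for t in times]


def no_superlinear_speedup(times: List[float], tol: float = 1e-9) -> bool:
    """True if the speedup on j processors never exceeds j."""
    return all(s <= j + tol for j, s in enumerate(speedups(times), start=1))


def flat_steps(times: List[float]) -> List[int]:
    """Processor counts j > 1 that run no faster than j - 1 (the 'steps')."""
    return [j for j in range(2, len(times) + 1) if times[j - 1] >= times[j - 2]]


# Hypothetical step-shaped profile for one task on 1..6 processors.
profile = [12.0, 6.0, 6.0, 4.0, 4.0, 4.0]
print(speedups(profile))                # [1.0, 2.0, 2.0, 3.0, 3.0, 3.0]
print(no_superlinear_speedup(profile))  # True
print(flat_steps(profile))              # [3, 5, 6]
```

On a flat step, extra processors add no speed, so allocating more processors than a task needs simply wastes them.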

Moldable Scheduling
Instance: $m$ processors and $n$ tasks. Task $i$ arrives at $r_i$. The execution of $i$ on $j$ processors takes $p_{i,j}$ time units.
Solution: task $i$ is executed on $\pi_i$ processors, starts at $\sigma_i$ and finishes at $C_i = \sigma_i + p_{i,\pi_i}$.
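The instance and solution definitions map directly onto small data types. A minimal sketch, assuming identical processors where any subset can serve a task (no contiguity requirement), so checking that at most $m$ processors are busy at every instant is enough; `Task`, `Placement`, `completion` and `is_feasible` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    release: float      # r_i
    times: List[float]  # times[j - 1] = p_{i,j}, runtime on j processors


@dataclass
class Placement:
    procs: int          # pi_i
    start: float        # sigma_i


def completion(task: Task, pl: Placement) -> float:
    """C_i = sigma_i + p_{i, pi_i}."""
    return pl.start + task.times[pl.procs - 1]


def is_feasible(m: int, tasks: List[Task], sol: List[Placement]) -> bool:
    """Check release times and that at most m processors are busy at any instant."""
    if len(tasks) != len(sol):
        return False
    events = []
    for task, pl in zip(tasks, sol):
        if pl.start < task.release or not 1 <= pl.procs <= min(m, len(task.times)):
            return False
        events.append((pl.start, pl.procs))               # processors acquired
        events.append((completion(task, pl), -pl.procs))  # processors released
    # Tuples sort by time, then delta, so releases come before acquisitions at equal times.
    events.sort()
    busy = 0
    for _, delta in events:
        busy += delta
        if busy > m:
            return False
    return True


tasks = [Task(0.0, [4.0, 2.0]), Task(0.0, [2.0, 1.0])]
print(is_feasible(2, tasks, [Placement(2, 0.0), Placement(2, 2.0)]))  # True
print(is_feasible(2, tasks, [Placement(2, 0.0), Placement(1, 1.0)]))  # False: 3 busy at t = 1
```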

Objective Function
Flow time: the flow time is the time a task spends in the system, $F_i = C_i - r_i$. It does not take task size into account: optimizing the maximum flow time is unfair to small tasks, and optimizing the average flow time can starve large tasks.
Stretch [Bender et al., SODA 98]: the stretch is the flow time normalized by the sequential processing time of the task, $s_i = \frac{C_i - r_i}{p_{i,1}}$. It provides better fairness between tasks, and optimizing the maximum stretch avoids starvation.
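A small worked example shows why the two objectives disagree. The two tasks below are hypothetical, both released at $t = 0$: the large task has the larger flow time, so a maximum-flow-time objective would focus on it, while the small task, delayed ten times its own length, has the larger stretch.

```python
def flow_time(release: float, completion: float) -> float:
    """F_i = C_i - r_i."""
    return completion - release


def stretch(release: float, completion: float, seq_time: float) -> float:
    """s_i = (C_i - r_i) / p_{i,1}, where seq_time is p_{i,1}."""
    return (completion - release) / seq_time


# Hypothetical tasks released at t = 0: (completion C_i, sequential time p_{i,1}).
small = (10.0, 1.0)
large = (60.0, 50.0)
for name, (c, p1) in [("small", small), ("large", large)]:
    print(name, flow_time(0.0, c), stretch(0.0, c, p1))
```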

Online maximum stretch cannot be approximated
Adversary technique on one processor: a large task enters the system. If it is scheduled immediately, a small task is then sent; it suffers a large delay (and an unbounded stretch). If the large task is scheduled later, a small task is sent accordingly, and it also suffers a large delay (and an unbounded stretch).
On several processors: similar techniques exist, but they are more complicated and thus less likely to appear in practice.
The key point: if all processors are busy, a small task entering the system will have a large stretch.
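The single-processor adversary can be made concrete with a tiny non-preemptive simulator. A minimal sketch with hypothetical values `L` (large task length) and `eps` (arrival of the small task): when the large task has already started, the small task waits for almost `L` time units, while a schedule that knows the future runs the small task first and keeps the maximum stretch close to 1.

```python
from typing import List, Tuple


def max_stretch_one_proc(tasks: List[Tuple[float, float]], order: List[int]) -> float:
    """Run tasks (release, length) non-preemptively on one processor in `order`.

    Each task starts once both the processor and the task are available;
    the function returns the maximum stretch (length is the 1-processor time).
    """
    now, worst = 0.0, 0.0
    for i in order:
        release, length = tasks[i]
        start = max(now, release)
        now = start + length
        worst = max(worst, (now - release) / length)
    return worst


L, eps = 1000.0, 0.01
tasks = [(0.0, L), (eps, 1.0)]                 # large task at t = 0, small task just after
online = max_stretch_one_proc(tasks, [0, 1])   # large task already started
offline = max_stretch_one_proc(tasks, [1, 0])  # hindsight: small task first
print(online, offline)
```

Growing `L` makes the online stretch grow without bound while the hindsight schedule stays near 1, which is the gap the adversary exploits.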


Principle of the Deadline Based Online Scheduler (DBOS)
All tasks running concurrently should get the same stretch, to maximize efficiency.
Use the optimal maximum stretch as an instantaneous measure of the load.
Aim for a schedule more efficient than the one achieving the optimal instantaneous maximum stretch, to keep capacity for tasks that have yet to arrive.

The DBOS Algorithm
Targeting a maximum stretch $S$: task $i$ must complete before the deadline $D_i = r_i + p_{i,1} S$.
Moldable Earliest Deadline First (MEDF): consider the tasks in deadline order; allocate to each task the minimum number of processors that lets it complete before its deadline; schedule the task as soon as possible without moving any other task.
DBOS($\rho$): estimate the optimal maximum stretch $S^*$ using a binary search, where each deadline problem is solved by MEDF; then build a schedule of good efficiency with stretch $\rho S^*$. $\rho$ is the online parameter.
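The MEDF and DBOS($\rho$) steps can be sketched in Python. This is a simplified, hypothetical implementation written for illustration, not the authors' code: `medf`, `dbos` and `max_stretch` are made-up names, each task is a release time plus its full list of runtimes $p_{i,j}$, processors only track when they become free (so a task never backfills an idle gap), and the binary search assumes MEDF success is monotone in $S$, which a greedy heuristic does not guarantee.

```python
from typing import List, Optional, Tuple

Task = Tuple[float, List[float]]      # (r_i, times) with times[j - 1] = p_{i,j}
Placement = Tuple[int, float, float]  # (processors, start, completion)


def medf(m: int, tasks: List[Task], S: float) -> Optional[List[Placement]]:
    """Moldable EDF sketch for the deadlines D_i = r_i + p_{i,1} * S.

    Simplification: each processor only tracks the time it becomes free, so a task
    is appended after earlier placements and never backfills idle gaps.
    """
    free = [0.0] * m
    placements: List[Placement] = [(0, 0.0, 0.0)] * len(tasks)
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0] + tasks[i][1][0] * S)
    for i in order:
        release, times = tasks[i]
        deadline = release + times[0] * S
        procs = sorted(range(m), key=lambda k: free[k])  # earliest-free first
        for j in range(1, min(m, len(times)) + 1):       # fewest processors first
            start = max(release, free[procs[j - 1]])     # j-th earliest-free processor
            end = start + times[j - 1]
            if end <= deadline:
                for k in procs[:j]:
                    free[k] = end
                placements[i] = (j, start, end)
                break
        else:
            return None  # no processor count meets the deadline
    return placements


def max_stretch(tasks: List[Task], sched: List[Placement]) -> float:
    return max((end - r) / times[0] for (r, times), (_, _, end) in zip(tasks, sched))


def dbos(m: int, tasks: List[Task], rho: float, iters: int = 50) -> Tuple[float, List[Placement]]:
    """Estimate S* by binary search with MEDF as the deadline solver, then
    schedule with the looser target rho * S* (rho >= 1)."""
    hi = 1.0
    while medf(m, tasks, hi) is None:  # a large enough S always succeeds
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if medf(m, tasks, mid) is None:
            lo = mid
        else:
            hi = mid
    s_star = hi
    sched = medf(m, tasks, rho * s_star)
    if sched is None:  # greedy MEDF is not guaranteed to be monotone in S
        sched = medf(m, tasks, s_star)
    assert sched is not None  # hi is always a value at which MEDF succeeded
    return s_star, sched


tasks = [(0.0, [4.0, 2.0]), (0.0, [2.0, 1.0])]  # hypothetical 2-processor instance
for rho in (1.0, 2.0):
    s_star, sched = dbos(2, tasks, rho)
    print(rho, round(s_star, 6), sched, max_stretch(tasks, sched))
```

With $\rho = 1$ both tasks run on two processors one after the other; with $\rho = 2$ the looser deadlines let MEDF give each task a single processor. This toy instance has linear speedup, so both schedules use the same processor time; with sublinear or step-shaped speedups, the smaller allocations chosen under $\rho S^*$ would waste less.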
