Scheduling Parallel DAG Jobs Online

Ben Moseley (CMU)
Joint work with: Kunal Agrawal (WashU), Jing Li (NJIT), Kefu Lu (WashU/CMU)

Client-Server Scheduling
• Clients send parallel jobs to the server
• Jobs are scheduled on identical processors/machines
• Server processes jobs and provides service guarantees
• Jobs arrive over time (online)
• Jobs can be preempted
• Worst-case setting

Service Guarantees
• Flow time: the difference between a job's arrival time and its completion time
• Common objectives in online scheduling:
  • Average/sum flow time
  • Maximum flow time
  • Throughput
[Figure: timeline marking a job's arrival, its completion, and the job's flow time between them]
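As a small illustration of the flow-time objectives above, the following sketch computes sum, average, and maximum flow time for a handful of jobs. The `Job` class, the `flow_metrics` helper, and the arrival/completion values are hypothetical and not from the talk.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float      # time the job arrives at the server
    completion: float   # time the job's last piece of work finishes

    @property
    def flow_time(self) -> float:
        # Flow time = completion time minus arrival time.
        return self.completion - self.arrival


def flow_metrics(jobs):
    """Return the sum, average, and maximum flow time of a non-empty job list."""
    flows = [job.flow_time for job in jobs]
    return {
        "sum": sum(flows),
        "average": sum(flows) / len(flows),
        "max": max(flows),
    }


# Made-up schedule outcome: (arrival, completion) pairs.
jobs = [Job(0, 4), Job(1, 3), Job(2, 9)]
metrics = flow_metrics(jobs)
print(metrics)
```

Average flow time rewards finishing many jobs quickly, while maximum flow time is driven entirely by the worst-served job (here the third one).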

Parallelism Models
• Speed-up curves
  • Jobs are associated with speed-up functions
• Directed acyclic graph (DAG) model
  • Each job is modeled as a DAG whose nodes make up the job's work
  • A job is completed when the last node of its DAG is completed
  • Processing rate depends on the number of nodes being worked on

Parallelism Models
• Speed-up curves
  • Jobs have total work W divided into phases
  • Each phase has its own amount of work
  • Phases are processed sequentially
  • Processing rate function Γ(m)
    • Function of the number of processors given
    • Function is usually positive and sublinear
    • Function can be different depending on the phase the job is currently in
[Figure: a job's phases]
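A minimal sketch of the speed-up curves model, assuming a job keeps the same m processors for its whole lifetime (a real scheduler may change the allocation over time). The three rate functions and the phase sizes are hypothetical examples of Γ(m), not taken from the talk.

```python
def completion_time(phases, m):
    """Time to finish all phases, one after another, on m processors.

    phases: list of (work, rate) pairs, where rate(m) plays the role of the
    phase's processing rate function Gamma(m).
    """
    total = 0.0
    for work, rate in phases:
        total += work / rate(m)   # a phase with work w at rate r takes w / r
    return total


def sequential(m):
    return 1.0          # no speed-up from extra processors


def sqrt_speedup(m):
    return m ** 0.5     # positive and sublinear


def linear(m):
    return float(m)     # fully parallelizable phase


phases = [(4.0, sequential), (8.0, sqrt_speedup), (16.0, linear)]
print(completion_time(phases, 1))   # one processor
print(completion_time(phases, 4))   # four processors
```

With four processors the sequential phase gains nothing, the sublinear phase halves, and the linear phase shrinks by a factor of four.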

Directed Acyclic Graph Model of Parallelism
• Nodes represent computation
• Arrows represent dependencies
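The DAG model can be sketched as follows, assuming every node takes one time step and that up to m ready nodes run per step. The `run_dag` helper and the example graph are hypothetical; the point is only that a node becomes ready once all of its predecessors have finished, and the job is done when its last node completes.

```python
def run_dag(successors, m):
    """Run a single DAG job with unit-time nodes on m cores.

    successors: dict mapping every node to the list of nodes that depend on it.
    Returns the number of time steps until the last node completes.
    """
    # Count unfinished predecessors of every node.
    indegree = {v: 0 for v in successors}
    for outs in successors.values():
        for w in outs:
            indegree[w] += 1

    ready = [v for v in successors if indegree[v] == 0]
    steps = 0
    while ready:
        # Run at most m ready nodes in this step; the rest wait.
        running, ready = ready[:m], ready[m:]
        for v in running:
            for w in successors[v]:
                indegree[w] -= 1
                if indegree[w] == 0:
                    ready.append(w)   # becomes available in the next step
        steps += 1
    return steps


# Fork-join example: a forks into b, c, d, which all join at e.
dag = {"a": ["b", "c", "d"], "b": ["e"], "c": ["e"], "d": ["e"], "e": []}
for cores in (1, 2, 3):
    print(cores, run_dag(dag, cores))
```

In this example the job can use at most three cores at once, and only during its middle step, which is why the processing rate depends on how many nodes are currently available.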

Online Study of Models
• DAG model
  • Well studied offline
  • Only recently studied online
  • Naturally captures programs generated by languages and libraries such as Cilk, Cilk Plus, Intel TBB, and OpenMP
  • Used by applied communities: the cyber-physical systems (real-time) community is excited (Outstanding Paper Award ECRTS 2013, Best Student Paper Award RTSS 2011)

Results
First results for average flow time in the DAG model
• Average flow time [SODA 2016]
  • LAPS is (1+ε)-speed O(1)-competitive for fixed ε > 0
  • Best theoretically possible
• Throughput [LATIN 2018]
  • A (1+ε)-speed O(1)-competitive algorithm for fixed ε > 0
  • Best theoretically possible
• Maximum flow time [SPAA 2016]
  • A (1+ε)-speed O(1)-competitive algorithm for fixed ε > 0
  • Open whether speed augmentation is needed
  • Algorithm is practical
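For readers unfamiliar with the "speed" terminology, the standard resource-augmentation definition for a minimization objective such as flow time can be written as below. The notation is an assumption and not taken from the slides: A_s(I) is the objective value of algorithm A on instance I when its processors run at speed s, and OPT_1(I) is the optimal value at unit speed.

```latex
\[
A \text{ is } s\text{-speed } c\text{-competitive}
\quad\Longleftrightarrow\quad
A_{s}(I) \;\le\; c \cdot \mathrm{OPT}_{1}(I)
\quad \text{for every instance } I .
\]
```

The results above take s = 1 + ε and c = O(1), where the constant may depend on ε. For a maximization objective such as throughput, the inequality is stated in the other direction.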

Algorithm Development
• DAG model has been popular because of its connection to practice
• Well studied for scheduling a single DAG job to minimize makespan
• Work-stealing algorithm: good practical and theoretical performance
  • Used in numerous systems for scheduling a parallel job
  • Non-clairvoyant
  • Distributed protocol
  • No preemption
• Want to emulate this success and use the theory for FIFO to guide a modification of work stealing

Work-Stealing
[Figure: three cores (Core 1, Core 2, Core 3), each with a double-ended queue, labeled with push, pop, and steal operations]
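A minimal, single-threaded sketch of the push/pop/steal operations in the figure. It assumes the common convention that a core pushes and pops at one end of its own deque while thieves take from the opposite end, and that victims are chosen at random; the slide does not spell out either detail, and the `Core` class and `next_task` helper are hypothetical.

```python
import random
from collections import deque


class Core:
    def __init__(self, name):
        self.name = name
        self.tasks = deque()   # this core's double-ended queue

    def push(self, task):
        self.tasks.append(task)                 # owner adds work at the bottom

    def pop(self):
        return self.tasks.pop() if self.tasks else None   # owner takes from the bottom

    def steal_from(self, victim):
        # A thief takes from the top, the end opposite to the owner's.
        return victim.tasks.popleft() if victim.tasks else None


def next_task(core, cores, rng):
    """Return local work if there is any, otherwise try to steal."""
    task = core.pop()
    if task is not None:
        return task
    victims = [c for c in cores if c is not core and c.tasks]
    if not victims:
        return None
    return core.steal_from(rng.choice(victims))


rng = random.Random(0)
cores = [Core("Core 1"), Core("Core 2"), Core("Core 3")]
for task in ("t1", "t2", "t3"):
    cores[0].push(task)

print(next_task(cores[0], cores, rng))   # Core 1 pops its newest task
print(next_task(cores[1], cores, rng))   # Core 2 is idle, so it steals the oldest task
```

A real implementation would use a concurrent deque; this sketch only shows which end each operation touches.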

Example: FIFO
• FIFO: execute available nodes of the job(s) with the earliest arrival
• Could be more than one job, depending on ready nodes

FIFO: Implementation Challenges
• Job 1 arrives at time 0; Job 2 arrives at time 1
• A global queue Q stores all available nodes
• Each core at each time step executes one node in Q from the job with the earliest arrival time
[Figure: Cores 1-3 and the global queue Q over time steps 0 to 6; each node in Q is labeled with its job's arrival time]
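The global-queue version of FIFO described above can be sketched as a small simulation. It assumes unit-time nodes, integer arrival times, and non-empty jobs; the `fifo_schedule` helper and the two example DAGs are hypothetical. A single priority queue keyed by arrival time stands in for Q, which is also what makes this design costly in practice: every core touches the same shared structure at every step.

```python
import heapq
from itertools import count


def fifo_schedule(jobs, m):
    """Simulate FIFO on m cores.

    jobs: dict name -> (arrival, successors), where successors maps every
    node of the job's DAG to the nodes that depend on it.
    Returns a dict name -> completion time.
    """
    indegree, remaining = {}, {}
    for name, (_, succ) in jobs.items():
        indegree[name] = {v: 0 for v in succ}
        for outs in succ.values():
            for w in outs:
                indegree[name][w] += 1
        remaining[name] = len(succ)

    queue, tie = [], count()   # global queue Q, ordered by job arrival time
    completion, t = {}, 0
    while len(completion) < len(jobs):
        # Jobs arriving now place their source nodes in Q.
        for name, (arrival, succ) in jobs.items():
            if arrival == t:
                for v in succ:
                    if indegree[name][v] == 0:
                        heapq.heappush(queue, (arrival, next(tie), name, v))
        # Each core takes one node from the earliest-arriving job in Q.
        running = [heapq.heappop(queue) for _ in range(min(m, len(queue)))]
        t += 1
        for arrival, _, name, v in running:
            remaining[name] -= 1
            if remaining[name] == 0:
                completion[name] = t
            for w in jobs[name][1][v]:
                indegree[name][w] -= 1
                if indegree[name][w] == 0:
                    heapq.heappush(queue, (arrival, next(tie), name, w))
    return completion


jobs = {
    "job1": (0, {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}),
    "job2": (1, {"x": ["y"], "y": []}),
}
print(fifo_schedule(jobs, 2))
```

In the third step of this run, job1 has only one ready node, so the second core works on job2: FIFO may run more than one job at a time, depending on ready nodes.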

Work Stealing for Multiple Jobs
[Figure: parallel jobs arrive at a global queue in FIFO order; cores execute work, admit jobs from the global queue, and steal]
(1) Each core has a queue and executes work from it
(2) Only when its local queue runs out of work will a core admit a job from the global queue
(3) The algorithm can steal from other queues or from the global queue
Has the same theoretical guarantees as FIFO and gave good practical performance
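Rules (1) to (3) can be sketched as a single decision function for one core. The parameter `k` is an assumption based on the policy names in the results plot: `k = 0` is read here as "admit-first" (admit from the global queue before stealing), and `k > 0` as "steal-k-first" (try k random steals before admitting). The `acquire_work` helper and its data layout are hypothetical.

```python
import random
from collections import deque


def acquire_work(core, local, global_queue, rng, k=0):
    """Pick the next piece of work for core index `core`, or None.

    local: list of deques, one per core.
    global_queue: deque of admitted-later jobs in arrival (FIFO) order.
    """
    if local[core]:
        return local[core].pop()             # (1) local work comes first

    victims = [i for i in range(len(local)) if i != core]

    # Assumed steal-k-first behavior: up to k random steal attempts.
    for _ in range(k if victims else 0):
        victim = rng.choice(victims)
        if local[victim]:
            return local[victim].popleft()

    if global_queue:
        return global_queue.popleft()        # (2) admit the oldest waiting job

    for victim in victims:                   # (3) otherwise steal whatever exists
        if local[victim]:
            return local[victim].popleft()
    return None


rng = random.Random(0)
local = [deque(), deque(["b1", "b2"])]       # core 0 is idle, core 1 has work
global_queue = deque(["J2"])

print(acquire_work(0, local, global_queue, rng, k=0))   # admits J2
print(acquire_work(0, local, global_queue, rng, k=1))   # steals b1 from core 1
```

Admitting eagerly spreads cores across many jobs, while stealing first concentrates cores on jobs already in progress, which is closer to FIFO order.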

Conclusion
New results for scheduling DAG jobs online
• Results have led to practically usable algorithms for minimizing maximum flow time
• Recent results submitted for average flow time
  • Much harder due to the need for preemptions
Open Questions:
• Is resource augmentation needed for maximum flow time in the DAG and speed-up curves models (knowing parallelism)?
• Practical algorithm for throughput maximization?

Thank You! Questions?
[Figure: maximum flow time (sec) versus QPS for OPT, steal-k-first, and admit-first on three workloads: (a) Bing workload, (b) Finance workload, (c) Log-normal workload]

