Decentralized Dynamic Scheduling across Heterogeneous Multi-core Desktop Grids - PowerPoint PPT Presentation
Decentralized Dynamic Scheduling across Heterogeneous Multi-core Desktop Grids
Jaehwan Lee, Pete Keleher, Alan Sussman
Department of Computer Science, University of Maryland
Multi-core is not enough
- Multi-core CPUs are the current trend in desktop computing
- It is not easy to exploit multi-core in a single machine for high-throughput computing
  "Multicore Is Bad News for Supercomputers", S. Moore, IEEE Spectrum, 2008
- We have proposed a decentralized solution for initial job placement in multi-core grids, but...
  Dynamic re-scheduling can surely improve performance even more
Motivation and Challenges
- Why is dynamic scheduling needed?
  Stale load information
  Unpredictable job completion times
  Probabilistic initial job assignment
- Challenges for decentralized dynamic scheduling for multi-core grids
  Multiple resource requirements
  Decentralized algorithm needed
  No job starvation allowed
Our Contribution
- New decentralized dynamic scheduling schemes for multi-core grids
  Intra-node scheduling
  Inter-node scheduling
  Aggressive job migration via queue balancing
- Experimental results via extensive simulation
  Performance better than static scheduling
  Competitive with an online centralized scheduler
Outline
- Background
- Related work
- Our approach
- Experimental Results
- Conclusion & Future Work
Overall System Architecture
- P2P grid
[Diagram: a client inserts Job J at an injection node, which assigns a GUID to the job and routes it through the peer-to-peer network (DHT - CAN) to its owner node; the owner performs matchmaking (scheduling), finds a run node, and sends Job J to that node's FIFO job queue; nodes exchange heartbeats.]
Matchmaking Mechanism in CAN
[Diagram: the CAN space divided among nodes A-I, with CPU and Memory as dimensions. A client sends Job J to its owner node; the job is pushed through the CAN until it reaches a run node satisfying CPU >= CJ && Memory >= MJ, where it is inserted into the FIFO queue; heartbeats report node status.]
Outline
- Background
- Related work
- Our approach
- Experimental Results
- Conclusion & Future Work
Backfilling
- Basic Concept
[Diagram: Jobs 1-4 scheduled on CPUs over time, with later-arriving jobs backfilled into idle slots.]
- Features
  Job running times must be known
  Conservative vs. EASY backfilling
  Inaccurate job running time estimates reduce overall performance
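The basic check above can be sketched in a few lines. This is an illustrative EASY-style feasibility test (not the authors' implementation): a waiting job may jump ahead of the queue head only if it fits in the currently idle cores and is predicted to finish before the queue head's reserved start time.

```python
# Minimal sketch of an EASY-style backfilling check (illustrative names).
# Relies on known job run times, as the slide notes.

def can_backfill(job_cores, job_runtime, idle_cores, head_start_time, now):
    """Return True if the job can start now without delaying the queue head."""
    fits_now = job_cores <= idle_cores
    finishes_in_time = now + job_runtime <= head_start_time
    return fits_now and finishes_in_time

# Example: 2 idle cores, queue head predicted to start at t=100.
print(can_backfill(job_cores=2, job_runtime=30, idle_cores=2,
                   head_start_time=100, now=60))  # True
print(can_backfill(2, 50, 2, 100, 60))            # False: would delay the head
```

Conservative backfilling applies this test against every waiting job's reservation, not just the queue head's, which is why inaccurate run-time estimates hurt it more.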
Approaches for K-resource requirements
- Backfilling with multiple resource requirements (Leinberger:SC'99)
  Backfilling in a single machine
  Heuristic approaches
  Assumption: job running times are known
- Job migration to balance K resources between nodes (Leinberger:HCW'00)
  Reduces local load imbalance by exchanging jobs, but does not consider overall system loads
  No backfilling scheme
  Assumption: near-homogeneous environment
Outline
- Background
- Related work
- Our approach
- Experimental Results
- Conclusion & Future Work
Dynamic Scheduling
- The dynamic scheduling algorithm is invoked periodically after initial job assignment, but before the job starts running
- Costs for dynamic scheduling
  Job migration cost
  - None for intra-node scheduling
  - Minimal for inter-node scheduling & queue balancing
  CPU cost: none
  - No preemptive scheduling: once a job starts running, it won't be stopped due to dynamic scheduling
Intra-Node Scheduling
- Extension of backfilling with multiple resource requirements
- Backfilling Counter (BC)
  Initial value: 0
  Counts the number of other jobs that have bypassed the job
  Only a job whose BC is equal to or greater than the maximum BC of jobs in the queue can be backfilled
  No job starvation
[Diagram: a quad-core node with running job JR (BC 3); the waiting queue holds J1, J2 (BC 1 each), J3, and J4, and J3 is backfilled past the head of the queue.]
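The BC rule above can be sketched as follows (illustrative data structures, not the original code). Each waiting job carries a counter of how many jobs have bypassed it; only jobs already bypassed at least as often as any other waiting job are eligible for backfilling, which bounds how long any job can wait.

```python
# Sketch of the Backfilling Counter (BC) rule; names are ours.

class WaitingJob:
    def __init__(self, name, cores):
        self.name = name
        self.cores = cores
        self.bc = 0  # number of other jobs that have bypassed this one

def backfill_candidates(queue, idle_cores):
    """Jobs that fit in the idle cores AND satisfy the BC condition."""
    max_bc = max(j.bc for j in queue)
    return [j for j in queue if j.cores <= idle_cores and j.bc >= max_bc]

def backfill(queue, job):
    """Remove `job` and bump the BC of the queued jobs ahead of it."""
    idx = queue.index(job)
    for bypassed in queue[:idx]:
        bypassed.bc += 1
    queue.remove(job)
    return job

queue = [WaitingJob("J1", 4), WaitingJob("J2", 1), WaitingJob("J3", 2)]
picked = backfill_candidates(queue, idle_cores=2)
print([j.name for j in picked])  # ['J2', 'J3']: both fit and share the max BC
```

Once any job's BC exceeds the others', no further job can bypass it until it runs, so no job starves.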
Which job should be backfilled?
- If multiple jobs can be backfilled,
  Backfill Balanced (BB) algorithm (Leinberger:SC'99)
  Choose the job with the minimum objective function (= BM x FM)
- Balance Measure (BM)
  BM = Maximum Utilization / Average Utilization
  Minimizes uneven usage across multiple resources
- Fullness Measure (FM)
  FM = 1 - Average Utilization
  Maximizes average utilization
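The BB objective above can be computed directly; here is a small sketch (variable names are ours, following the slide's definitions). For each candidate, we compute the per-resource utilization the node would have if that job started, then score it with BM x FM and pick the minimum.

```python
# Sketch of the Backfill Balanced (BB) objective (Leinberger:SC'99),
# as summarized on this slide; illustrative names.

def bb_objective(used, job_req, capacity):
    """BM * FM for starting a job with requirements `job_req` on a node."""
    util = [(u + r) / c for u, r, c in zip(used, job_req, capacity)]
    avg = sum(util) / len(util)
    bm = max(util) / avg   # Balance Measure: penalizes uneven resource use
    fm = 1.0 - avg         # Fullness Measure: rewards high average utilization
    return bm * fm

def choose_backfill_job(candidates, used, capacity):
    """Among backfillable jobs, pick the one minimizing BM * FM."""
    return min(candidates, key=lambda req: bb_objective(used, req, capacity))

# Two resources (CPU cores, memory GB); the node is half used on each.
capacity = [8.0, 16.0]
used = [4.0, 8.0]
jobs = [[2.0, 1.0],   # CPU-heavy job
        [2.0, 4.0]]   # job balanced with current usage
print(choose_backfill_job(jobs, used, capacity))  # [2.0, 4.0]
```

The balanced job wins: it raises both utilizations evenly (BM = 1) and raises the average more, so both factors of the objective shrink.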
Inter-node Scheduling
- Extension of intra-node scheduling across nodes
- Node Backfilling Counter (NBC)
  Maximum BC of jobs in the node's waiting queue
  Only jobs whose BC is equal to or greater than the NBC of the target node can be migrated
  No job starvation
[Diagram: three nodes A, B, C, each with running jobs and a waiting queue of jobs with BC values; target NBCs of 0 and 2 determine which jobs may migrate where.]
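The NBC test extends the BC rule across nodes; a minimal sketch (illustrative names): a job may move to a target node only if it has been bypassed at least as often as any job already waiting there, so migration can never starve the target's own queue.

```python
# Sketch of the Node Backfilling Counter (NBC) migration test; names are ours.

def nbc(waiting_bcs):
    """NBC of a node: the maximum BC in its waiting queue (0 if empty)."""
    return max(waiting_bcs, default=0)

def can_migrate(job_bc, target_waiting_bcs):
    """A job may migrate only if its BC >= the target node's NBC."""
    return job_bc >= nbc(target_waiting_bcs)

print(can_migrate(job_bc=2, target_waiting_bcs=[1, 1, 2]))  # True
print(can_migrate(job_bc=1, target_waiting_bcs=[2, 0]))     # False
```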
Inter-node Scheduling - PUSH vs. PULL
- PUSH
  A job sender initiates the process
  The sender tries to match every job in its queue with residual resources in its neighbors in the CAN
  If a job can be sent to multiple nodes, pick the node with the minimum objective function, and prefer a node with the fastest CPU
- PULL
  A job receiver initiates the process
  The receiver sends a PULL-Request message to the potential sender (the one with the maximum current queue length)
  The potential sender checks whether it has a job that can be backfilled and that satisfies the BC condition
  If multiple jobs can be sent, choose the job with the minimum objective function (= BM x FM)
  If no job can be found, send a PULL-Reject message to the receiver
  On receiving a PULL-Reject, the receiver looks for another potential sender among its neighbors
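The PULL handshake above can be sketched with message passing simplified to direct calls (a toy model; the real protocol exchanges PULL-Request/PULL-Reject messages over the CAN, and the job choice uses the full BM x FM objective rather than the stand-in here):

```python
# Toy sketch of the PULL protocol; data shapes and names are ours.

def pull(receiver_idle, neighbors):
    """neighbors: list of (queue_length, candidate_jobs) per potential sender.
    Each candidate job is (cores, bc_ok). Returns a pulled job or None."""
    # Try potential senders in order of decreasing queue length.
    for _, jobs in sorted(neighbors, key=lambda n: -n[0]):
        fits = [j for j in jobs if j[0] <= receiver_idle and j[1]]
        if fits:
            return min(fits)  # stand-in for the BM * FM choice
        # Otherwise: a PULL-Reject; move on to the next-longest queue.
    return None

neighbors = [(5, [(4, True)]),             # longest queue, but job too big
             (3, [(2, True), (1, False)])] # fits, and BC condition holds
print(pull(receiver_idle=2, neighbors=neighbors))  # (2, True)
```

This active retrying on rejects is exactly why PULL spreads load information faster, and why it costs more messages, as the results slides note.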
Queue Balancing
- Intra-node & inter-node scheduling look for a job that can start running immediately, to use current residual resources
- Add proactive job migration for queue (load) balancing
  A migrated job does not have to start immediately
- Use a normalized load measure for a node with multiple resources (Leinberger:HCW'00)
  For each resource, sum all jobs' requirements in the queue and normalize with respect to the node's capability for that resource
  The load on a node is defined as the maximum of those
- PUSH & PULL schemes can be used
  Minimize total local load (= sum of loads of neighbors, TLL)
  Minimize maximum local load among neighbors (MLL)
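The normalized load measure above reduces to a few lines; a sketch under the slide's definitions (names are ours):

```python
# Sketch of the normalized multi-resource load measure (Leinberger:HCW'00).

def node_load(queued_jobs, capacity):
    """Per-resource queued demand normalized by node capacity; load = max ratio."""
    totals = [sum(job[i] for job in queued_jobs) for i in range(len(capacity))]
    return max(t / c for t, c in zip(totals, capacity))

# Two resources (CPU cores, memory GB); three queued jobs on an 8-core, 16 GB node.
jobs = [[2.0, 4.0], [1.0, 8.0], [1.0, 2.0]]
print(node_load(jobs, capacity=[8.0, 16.0]))  # max(4/8, 14/16) = 0.875
```

Normalizing by capacity lets heterogeneous nodes be compared directly, and taking the maximum over resources means a node saturated on any one resource counts as loaded.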
Outline
- Background
- Related work
- Our approach
- Experimental Results
- Conclusion & Future Work
Experimental Setup
- Event-driven simulations
  A set of nodes and events
  - 1000 initial nodes and 5000 job submissions
  - Jobs are submitted with average inter-arrival time τ (with a Poisson distribution)
  - A node has 1, 2, 4 or 8 cores
  - Job run times uniformly distributed between 30 and 90 minutes
- Node capabilities and job requirements
  - CPU, memory, disk, and the number of cores
  - A job's requirement for a resource can be omitted (don't care)
  - Job Constraint Ratio: the probability that each resource type for a job is specified
- Steady-state experiments
Comparison Models
- Centralized Scheduler (CENT)
  Online and global scheduling mechanism with a single wait queue
  Not feasible in a complete implementation of a P2P system
[Diagram: the centralized scheduler holds a single wait queue and matches Job J (CPU ≥ 2.0GHz, Mem ≥ 500MB, Disk ≥ 1GB) against all nodes.]
- Tested combinations of our schemes
  Vanilla: no dynamic scheduling (static scheduling only)
  L: intra-node scheduling only
  LI: L + inter-node scheduling
  LIQ: LI + queue balancing
  LI(Q)-PUSH/PULL: LI & LIQ with PUSH/PULL options
Performance varying system load
- LIQ-PULL > LI-PULL > LIQ-PUSH > LI-PUSH > L >= Vanilla
- Inter-node scheduling provides a big improvement
- PULL is better than PUSH
  In an overloaded system, PULL is better at spreading information due to its aggressive attempts at job migration (Demers:PODC'87)
- Intra-node scheduling cannot guarantee better performance than Vanilla
  The Backfilling Counter does not ensure that other waiting jobs will not be delayed (different from conservative backfilling)
Overheads
- PULL has a higher cost than PUSH
  Active search (lots of trials and rejects)
- Other schemes are similar to Vanilla
  No significant additional overhead
Performance varying Job Constraint Ratio
- LIQ-PULL: best
- LIQ == LI
- LIQ-PULL is competitive with CENT
- For an 80% Job Constraint Ratio, LIQ-PULL's performance gets relatively worse
  It is difficult to find a capable neighbor for job migration, because jobs are more highly constrained
Evaluation Summary
- Performance
  LIQ-PULL is competitive with CENT
  Inter-node scheduling has a major impact on performance
  PULL is better than PUSH (more aggressive search)
  Good performance can be achieved regardless of system load and job constraint ratio
  It is worthwhile to do dynamic load balancing
- Overheads
  PULL > PUSH (more aggressive search)
  Competitive with Vanilla
Conclusion and Future Work
- New decentralized dynamic scheduling for multi-core P2P grids
  Extension of backfilling (intra-node / inter-node)
  Backfilling Counter: no job starvation
  Proactive queue balancing
- Performance evaluation via simulation
  Better than static scheduling
  Competitive performance with CENT
  Low overhead
- Future work