Decentralized Dynamic Scheduling across Heterogeneous Multi core - - PowerPoint PPT Presentation



SLIDE 1

Decentralized Dynamic Scheduling across Heterogeneous Multi-core Desktop Grids

Jaehwan Lee, Pete Keleher, Alan Sussman
Department of Computer Science, University of Maryland

SLIDE 2

Multi-core is not enough

  • Multi-core CPUs are the current trend in desktop computing
  • It is not easy to exploit multi-core machines for high-throughput computing
    "Multicore Is Bad News for Supercomputers", S. Moore, IEEE Spectrum, 2008
  • We have proposed a decentralized solution for initial job placement on multi-core grids, but dynamic re-scheduling can improve performance even more

SLIDE 3

Motivation and Challenges

  • Why is dynamic scheduling needed?
    Stale load information
    Unpredictable job completion times
    Probabilistic initial job assignment
  • Challenges for decentralized dynamic scheduling on multi-core grids
    Multiple resource requirements
    A decentralized algorithm is needed
    No job starvation allowed

SLIDE 4

Our Contribution

  • New decentralized dynamic scheduling schemes for multi-core grids
    Intra-node scheduling
    Inter-node scheduling
    Aggressive job migration via queue balancing
  • Experimental results via extensive simulation
    Performance better than static scheduling
    Competitive with an online centralized scheduler

SLIDE 5

Outline

  • Background
  • Related work
  • Our approach
  • Experimental Results
  • Conclusion & Future Work
SLIDE 6

Overall System Architecture

  • P2P grid

[Diagram: a client inserts Job J at an injection node, which assigns the job a GUID and routes it through the peer-to-peer network (a CAN DHT). Matchmaking (scheduling) finds a run node for the job; the run node holds a FIFO job queue, runs Job J, and exchanges heartbeat messages with the job's owner node.]

SLIDE 7

Matchmaking Mechanism in CAN

[Diagram: nodes A–I occupy a 2-D CAN space with CPU and Memory dimensions. A client submits Job J (requiring CPU ≥ CJ and Memory ≥ MJ) to its owner node, which pushes the job through the CAN toward a run node satisfying CPU >= CJ && Memory >= MJ; that node inserts J into its FIFO queue and sends heartbeats back.]
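A minimal sketch of the matchmaking predicate implied by the slide: a node can run a job only if every specified requirement is met, and unspecified requirements are "don't care" (the names `Job`, `Node`, and `can_run` are hypothetical, not from the original system):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Job:
    # Requirements; None means "don't care" for that resource
    cpu: Optional[float] = None     # minimum CPU speed (e.g. GHz)
    memory: Optional[float] = None  # minimum memory (e.g. GB)

@dataclass
class Node:
    cpu: float = 0.0
    memory: float = 0.0

def can_run(node: Node, job: Job) -> bool:
    """Matchmaking test: every specified requirement must be satisfied."""
    if job.cpu is not None and node.cpu < job.cpu:
        return False
    if job.memory is not None and node.memory < job.memory:
        return False
    return True
```

In the CAN, each resource dimension corresponds to one such requirement, so routing a job toward higher coordinates narrows the search to nodes where this predicate holds.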

SLIDE 8

Outline

  • Background
  • Related work
  • Our approach
  • Experimental Results
  • Conclusion & Future Work
SLIDE 9

Backfilling

  • Basic concept

[Diagram: CPUs vs. time chart with Job1–Job4; a later job is moved forward to fill an idle slot without delaying earlier jobs.]

  • Features
    Job running times must be known
    Conservative vs. EASY backfilling
    Inaccurate job running time estimates reduce overall performance

SLIDE 10

Approaches for K-resource requirements

  • Backfilling with multiple resource requirements (Leinberger:SC'99)
    Backfilling in a single machine
    Heuristic approaches
    Assumption: job running times are known
  • Job migration to balance K resources between nodes (Leinberger:HCW'00)
    Reduces local load imbalance by exchanging jobs, but does not consider overall system load
    No backfilling scheme
    Assumption: near-homogeneous environment

SLIDE 11

Outline

  • Background
  • Related work
  • Our approach
  • Experimental Results
  • Conclusion & Future Work
SLIDE 12

Dynamic Scheduling

  • After initial job assignment, but before the job starts running, the dynamic scheduling algorithm is invoked periodically
  • Costs for dynamic scheduling
    Job migration cost
      • None for intra-node scheduling
      • Minimal for inter-node scheduling & queue balancing
    CPU cost: none
      • No preemptive scheduling: once a job starts running, it won't be stopped due to dynamic scheduling

SLIDE 13

Intra-Node Scheduling

  • Extension of backfilling with multiple resource requirements
  • Backfilling Counter (BC)
    Initial value: 0
    Counts the number of other jobs that have bypassed the job
    Only a job whose BC is equal to or greater than the maximum BC of jobs in the queue can be backfilled
    No job starvation

[Diagram: a quad-core node with running job JR; the waiting queue holds J1 and J2 at its head (BC 1 each) followed by J3 and J4; J3 is backfilled, and the jobs it bypasses have their counters incremented.]
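A minimal sketch of the Backfilling Counter rule described above, assuming a FIFO queue of jobs that each track how many other jobs have bypassed them (the names `QueuedJob`, `backfill`, and `fits` are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class QueuedJob:
    name: str
    bc: int = 0  # Backfilling Counter: how many jobs have bypassed this one

def backfill(queue, fits):
    """Pick one job to backfill from a FIFO queue, or None.

    Only a job whose BC is >= the maximum BC in the queue may be
    backfilled, so no job can be bypassed without bound (no starvation).
    `fits` tests whether a job fits the node's current residual resources.
    """
    if not queue:
        return None
    max_bc = max(job.bc for job in queue)
    for idx, job in enumerate(queue):
        if job.bc >= max_bc and fits(job):
            for bypassed in queue[:idx]:  # jobs this one jumps over
                bypassed.bc += 1
            return queue.pop(idx)
    return None
```

Because a bypassed job's counter only grows, it eventually matches the queue maximum and becomes the only kind of job eligible for further backfilling.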

SLIDE 14

Which job should be backfilled?

  • If multiple jobs can be backfilled, use the Backfill Balanced (BB) algorithm (Leinberger:SC'99)
    Choose the job with the minimum objective function (= BM × FM)
  • Balance Measure (BM)
    BM = Maximum Utilization / Average Utilization
    Minimizes uneven usage across multiple resources
  • Fullness Measure (FM)
    FM = 1 − Average Utilization
    Maximizes average utilization
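The BB objective above can be sketched directly from its definition, assuming we can project the node's per-resource utilizations after adding a candidate job (the function names are hypothetical):

```python
def bb_objective(utilizations):
    """Backfill Balanced objective (BM * FM); lower is better.

    utilizations: projected per-resource utilization (0.0-1.0) of the
    node if the candidate job were started.
    """
    avg = sum(utilizations) / len(utilizations)
    if avg == 0.0:
        return 1.0  # degenerate empty node: BM = 1 (balanced), FM = 1
    bm = max(utilizations) / avg  # Balance Measure: penalizes uneven usage
    fm = 1.0 - avg                # Fullness Measure: rewards high utilization
    return bm * fm

def pick_job(candidates, projected_util):
    """Among backfillable candidates, choose the BM * FM minimizer."""
    return min(candidates, key=lambda job: bb_objective(projected_util(job)))
```

Multiplying the two measures favors jobs that both fill the node and keep its resource usage even: a job leaving utilizations at (0.9, 0.9) scores 0.1, beating one that leaves them at (0.5, 0.5), which scores 0.5.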

SLIDE 15

Inter-node Scheduling

  • Extension of intra-node scheduling across nodes
  • Node Backfilling Counter (NBC)
    Maximum BC of jobs in the node's waiting queue
    Only jobs whose BC is equal to or greater than the NBC of the target node can be migrated
    No job starvation

[Diagram: three nodes A, B, and C, each with a running job and a waiting queue of jobs annotated with their BCs; the nodes' NBCs differ (e.g., 2 vs. 0), so a job may only migrate to a target node whose NBC its own BC meets or exceeds.]
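A minimal sketch of the NBC migration check, representing a node's waiting queue simply as the list of its jobs' BC values (the function names are hypothetical):

```python
def node_backfilling_counter(queue_bcs):
    """NBC of a node: the maximum BC among its waiting jobs (0 if empty)."""
    return max(queue_bcs, default=0)

def can_migrate(job_bc, target_queue_bcs):
    """A job may migrate only if its BC >= the target node's NBC,
    so migration never bypasses a job that has already waited longer."""
    return job_bc >= node_backfilling_counter(target_queue_bcs)
```

This is the same starvation-freedom argument as the intra-node case, lifted to node granularity: an arriving migrant counts as bypassing every job in the target queue, so it must have been bypassed at least as often itself.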

SLIDE 16

Inter-node Scheduling – PUSH vs. PULL

  • PUSH
    A job sender initiates the process
    The sender tries to match every job in its queue with residual resources in its neighbors in the CAN
    If a job can be sent to multiple nodes, pick the node with the minimum objective function, and prefer a node with the fastest CPU
  • PULL
    A job receiver initiates the process
    The receiver sends a PULL-Request message to the potential sender (the one with the maximum current queue length)
    The potential sender checks whether it has a job that can be backfilled and that satisfies the BC condition
    If multiple jobs can be sent, choose the job with the minimum objective function (= BM × FM)
    If no job can be found, send a PULL-Reject message to the receiver
    If the receiver gets a PULL-Reject message, it looks for another potential sender among its neighbors

SLIDE 17

Queue Balancing

  • Intra-node and inter-node scheduling look for a job that can start running immediately, to use current residual resources
  • Add proactive job migration for queue (load) balancing
    A migrated job does not have to start immediately
  • Use a normalized load measure for a node with multiple resources (Leinberger:HCW'00)
    For each resource, sum all jobs' requirements in the queue and normalize by the node's capacity for that resource
    The load on a node is defined as the maximum of those values
  • PUSH & PULL schemes can be used
    Minimize total local load (= sum of loads of neighbors, TLL)
    Minimize maximum local load among neighbors (MLL)
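The normalized load measure above can be sketched as follows, assuming each queued job carries a dict of per-resource requirements and the node has known per-resource capacities (the name `node_load` is hypothetical):

```python
def node_load(queue, capacity):
    """Normalized multi-resource load of a node.

    queue: per-job requirement dicts, e.g. {"cpu": 1.0, "mem": 2.0}
    capacity: the node's per-resource capacities, same keys
    For each resource, the total queued demand is normalized by the
    node's capacity; the node's load is the maximum over resources.
    """
    per_resource = []
    for resource, cap in capacity.items():
        demand = sum(job.get(resource, 0.0) for job in queue)
        per_resource.append(demand / cap)
    return max(per_resource)
```

Taking the maximum rather than the average makes the bottleneck resource dominate, so a node saturated in memory reads as fully loaded even if its CPU queue is short.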

SLIDE 18

Outline

  • Background
  • Related work
  • Our approach
  • Experimental Results
  • Conclusion & Future Work
SLIDE 19

Experimental Setup

  • Event-driven simulations
    A set of nodes and events
  • 1000 initial nodes and 5000 job submissions
  • Jobs are submitted with average inter-arrival time τ (Poisson distribution)
  • A node has 1, 2, 4, or 8 cores
  • Job run times uniformly distributed between 30 and 90 minutes
  • Node capabilities and job requirements
    CPU, memory, disk, and the number of cores
    A job's requirement for a resource can be omitted ("don't care")
    Job Constraint Ratio: the probability that each resource type for a job is specified
  • Steady-state experiments
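The Poisson submission process above can be sketched by drawing exponential inter-arrival gaps with mean τ, a standard equivalence for Poisson arrivals (the function name is hypothetical, not from the authors' simulator):

```python
import random

def arrival_times(n_jobs, tau, seed=None):
    """Submission times for a Poisson arrival process with mean
    inter-arrival time tau: gaps are exponential with rate 1/tau."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(1.0 / tau)
        times.append(t)
    return times
```

Varying τ is how such a setup sweeps the system load: smaller τ means jobs arrive faster than they complete, pushing the grid toward overload.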

SLIDE 20

Comparison Models

  • Centralized Scheduler (CENT)
    Online, global scheduling mechanism with a single wait queue
    Not feasible in a complete implementation of a P2P system

[Diagram: a centralized scheduler holding Job J with requirements CPU ≥ 2.0 GHz, Mem ≥ 500 MB, Disk ≥ 1 GB.]

  • Tested combinations of our schemes
    Vanilla: no dynamic scheduling (static scheduling only)
    L: intra-node scheduling only
    LI: L + inter-node scheduling
    LIQ: LI + queue balancing
    LI(Q)-PUSH/PULL: LI & LIQ with PUSH/PULL options

SLIDE 21

Performance varying system load

  • LIQ-PULL > LI-PULL > LIQ-PUSH > LI-PUSH > L >= Vanilla
  • Inter-node scheduling provides a big improvement
  • PULL is better than PUSH
    In an overloaded system, PULL spreads information better due to its aggressive attempts at job migration (Demers:PODC'87)
  • Intra-node scheduling cannot guarantee better performance than Vanilla
    The Backfilling Counter does not ensure that other waiting jobs will not be delayed (different from conservative backfilling)

SLIDE 22

Overheads

  • PULL has a higher cost than PUSH
    Active search (many trials and rejects)
  • Other schemes are similar to Vanilla
    No significant additional overhead

SLIDE 23

Performance varying Job Constraint Ratio

  • LIQ-PULL: best
  • LIQ == LI
  • LIQ-PULL is competitive with CENT
  • For an 80% Job Constraint Ratio, LIQ-PULL performance gets relatively worse
    It is difficult to find a capable neighbor for job migration, because jobs are more highly constrained

SLIDE 24

Evaluation Summary

  • Performance
    LIQ-PULL is competitive with CENT
    Inter-node scheduling has the major impact on performance
    PULL is better than PUSH (more aggressive search)
    Good performance can be achieved regardless of system load and job constraint ratio
    It's worthwhile to do dynamic load balancing
  • Overheads
    PULL > PUSH (more aggressive search)
    Competitive with Vanilla

SLIDE 25

Conclusion and Future Work

  • New decentralized dynamic scheduling for multi-core P2P grids
    Extension of backfilling (intra-node/inter-node)
    Backfilling Counter: no job starvation
    Proactive queue balancing
  • Performance evaluation via simulation
    Better than static scheduling
    Competitive performance with CENT
    Low overhead
  • Future work
    Real grid experiments (in cooperation with the Astronomy Dept.)
    Decentralized resource management for heterogeneous asymmetric multi-processors