Multi-Resource Packing for Cluster Schedulers
Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, Aditya Akella
We find that:
Applications have (very) diverse resource needs
Multiple resources become tight
This matters because there is no single bottleneck resource in the cluster.
1 Makespan: time to finish a set of jobs
Current Schedulers vs. “Packer” Scheduler
[Figure: Machines A and B each have 4 GB of memory. Tasks: T1 (2 GB), T2 (2 GB), T3 (4 GB). A current scheduler spreads T1 and T2 across the two machines, so T3 (4 GB) fits nowhere and must wait; a packer scheduler co-locates T1 and T2 on machine A, leaving machine B free for T3.]
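The two-machine example above can be simulated with a small sketch. This is purely illustrative (the `place` function and its spread/pack policies are mine, not the paper's code): a spread-out, slot-style placement strands the 4 GB task, while a packing placement fits all three tasks.

```python
# Hypothetical simulation of the slide's example, not Tetris code:
# two 4 GB machines; tasks T1 (2 GB), T2 (2 GB), T3 (4 GB).

def place(tasks, machines, spread):
    """Greedy placement. spread=True mimics a slot/fairness scheduler
    that balances load; spread=False packs tasks tightly."""
    free = list(machines)
    placed = []
    for name, demand in tasks:
        fits = [i for i, f in enumerate(free) if f >= demand]
        if not fits:
            placed.append((name, None))  # task must wait (the STOP case)
            continue
        # spread: emptiest machine first; pack: fullest machine that still fits
        i = max(fits, key=lambda i: free[i]) if spread else min(fits, key=lambda i: free[i])
        free[i] -= demand
        placed.append((name, i))
    return placed

tasks = [("T1", 2), ("T2", 2), ("T3", 4)]
spread_result = place(tasks, [4, 4], spread=True)   # T3 cannot fit: fragmentation
packed_result = place(tasks, [4, 4], spread=False)  # all three tasks scheduled
```

With spreading, T1 and T2 land on different machines and T3 is left unplaced; with packing, machine B stays whole and T3 runs immediately.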
Current Schedulers
Allocate resources based on slots and fairness.
Are not explicit about packing.
Resource fragmentation (RF) increases with the number of resources being allocated!
Current Schedulers vs. “Packer” Scheduler
[Figure: Machine A has 4 GB of memory and 20 MB/s of network bandwidth. Tasks T1 and T2 each need 2 GB of memory and 20 MB/s of network; T3 needs only 2 GB of memory. A scheduler that allocates only memory runs T1 and T2 together, over-allocating the network (40 MB/s demanded vs. 20 MB/s available); pairing either with T3 would not.]
Not all of the resources are explicitly allocated. E.g., disk and network can be over-allocated.
Current Schedulers
Work-conserving != no fragmentation, no over-allocation
Hides the impact of resource fragmentation
Different tasks in the same job have different demands
How the job is scheduled impacts jobs’ current resource profiles
Can schedule to create complementarity
Example in paper, Packer vs. DRF: makespan and avg. completion time improve by over 30%
Pareto-efficient1 != performant
1no job can increase its share without decreasing the share of another
Fairness vs. Job completion time vs. Cluster efficiency
Multi-Resource Packing of Tasks is similar to Multi-Dimensional Bin Packing
Balls could be tasks; bins could be (machine, time)
1APX-Hard is a strict subset of NP-hard
APX-Hard1
Existing heuristics do not directly apply:
Assume balls of a fixed size
Assume balls are known a priori (here, arrivals depend on job dependencies and cluster activity)
Avoiding fragmentation looks like tight bin packing: reducing the # of bins reduces makespan
A packing heuristic
Fit: consider only tasks whose demands fit within the machine’s free resources.
Alignment score (A): the dot product between a task’s demand vector and the machine’s free-resource vector, computed over tasks that fit.
A works because it avoids:
Resource fragmentation
Over-allocation
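The alignment score can be sketched in a few lines. This is an illustrative sketch under my own naming (the resource vectors and task names are invented), not the paper's implementation: among tasks that fit, prefer the one whose demand vector best aligns with what the machine has free.

```python
# Hedged sketch of an alignment-style packing score.
# A task scores the dot product of its demand vector with the machine's
# free-resource vector; tasks that do not fit are rejected outright.

def alignment_score(demand, free):
    """Return demand . free, or None if the task would over-allocate."""
    if any(d > f for d, f in zip(demand, free)):
        return None                      # never over-allocate
    return sum(d * f for d, f in zip(demand, free))

free = [8, 16]                           # e.g. [cores, GB memory] available
tasks = {"cpu_heavy": [6, 2], "mem_heavy": [2, 12], "too_big": [10, 1]}
scores = {n: alignment_score(d, free) for n, d in tasks.items()}
# pick the fitting task with the highest alignment score
best = max((n for n, s in scores.items() if s is not None), key=lambda n: scores[n])
```

On this memory-rich machine the memory-heavy task aligns best, which is exactly the complementarity the heuristic rewards.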
CHALLENGE
Shortest Remaining Time First1 (SRTF)
1SRTF – M. Harchol-Balter et al. Connection Scheduling in Web Servers [USITS’99]
schedules jobs in ascending order of their remaining time
Job Completion Time Heuristic
Q: What is the shortest remaining time?
remaining work = remaining # of tasks & tasks’ durations & tasks’ resource demands
= a job completion time heuristic
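One way to combine those three ingredients is sketched below. The exact formula is my assumption (the slide only lists the inputs): estimate remaining work as duration times total resource demand, summed over unfinished tasks, and order jobs SRTF-style by that estimate.

```python
# Hedged sketch of a remaining-work estimate for SRTF-style ordering.
# Field names ("duration", "demand") are illustrative, not from the paper.

def remaining_work(tasks):
    """Sum over unfinished tasks of duration x total resource demand."""
    return sum(t["duration"] * sum(t["demand"]) for t in tasks)

job_a = [{"duration": 10, "demand": [2, 4]}]                   # one big task left
job_b = [{"duration": 2, "demand": [1, 1]} for _ in range(3)]  # three small tasks
jobs = {"A": job_a, "B": job_b}
# schedule the job with the least remaining work first (SRTF)
order = sorted(jobs, key=lambda j: remaining_work(jobs[j]))
```

Job B's three small tasks amount to less remaining work than job A's single large task, so B is scheduled first.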
CHALLENGE
Job Completion Time Heuristic
Combine A and P scores: packing efficiency + completion time
1: among J runnable jobs
2: score(j) = A(t, R) + P(j)
3: max over tasks t in j with demand(t) ≤ R (resources free)
4: pick j*, t* = argmax score(j)
A alone: delays job completion time. P alone: loss in packing efficiency.
Possible to satisfy all three; in fact, this happens often in practice.
Fairness Heuristic
Performance and fairness do not mix well in general. But… we can get perfect fairness and much better performance.
Fairness Heuristic
Fairness is not a tight constraint.
Heuristic: pick the best-for-performance task from among the jobs permitted by a fairness knob F.
F = 0: most unfair, most efficient scheduling
F → 1: close to perfect fairness
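The knob's behavior can be sketched as a shrinking candidate set. The candidate-set rule below is my assumption, not the paper's exact formula: sort jobs by how far they are below their fair share, then consider a fraction of that front that shrinks as F grows, so F = 0 considers every job and F → 1 only the most-deserving one.

```python
# Hedged sketch of a fairness knob F in [0, 1] (rule is illustrative).

def candidates(jobs_by_deficit, f):
    """jobs_by_deficit: job names sorted most-deserving first.
    Returns the subset from which the best-for-performance task is picked."""
    n = len(jobs_by_deficit)
    k = max(1, round((1 - f) * n))   # candidate set shrinks as F grows
    return jobs_by_deficit[:k]

jobs = ["starved", "behind", "ok", "greedy"]
all_jobs = candidates(jobs, 0.0)    # every job: most efficient, least fair
most_jobs = candidates(jobs, 0.25)  # top three of four
one_job = candidates(jobs, 1.0)     # only the most-deserving: perfect fairness
```

Picking the best-for-performance task within this set is why fairness need not be a tight constraint: small relaxations of F open up most of the packing gains.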
We saw:
Other things in the paper:
YARN architecture, with changes to add Tetris shown in orange:
Job Manager: multi-resource asks; barrier hint
Node Manager: tracks resource usage; enforces allocations; reports resource availability
Cluster-wide Resource Manager: new logic to match tasks to machines (+packing, +SRTF, +fairness); exchanges asks, offers, and allocations
Makespan: Tetris vs. a Multi-resource Scheduler; Tetris vs. a Single Resource Scheduler
[Figure: utilization (%) of CPU, memory, network in, and storage over time (s) for both comparisons. Callouts: Tetris gains from avoiding over-allocation; under the single resource scheduler, low utilization values indicate high fragmentation.]
Fairness Knob
[Figure: makespan and job completion time (over impacted jobs) for No Fairness (F = 0), F = 0.25, and Full Fairness (F → 1)]
Pack efficiently along multiple resources
Prefer jobs with less remaining work
Incorporate fairness
→ lower average job completion time and better cluster performance; we show encouraging initial results
http://research.microsoft.com/en-us/UM/redmond/projects/tetris/