Phoenix: A Constraint-aware Scheduler for Heterogeneous Datacenters - - PowerPoint PPT Presentation



SLIDE 1

Phoenix: A Constraint-aware Scheduler for Heterogeneous Datacenters

Prashanth Thinakaran, Jashwant Gunasekaran, Bikash Sharma, Mahmut Kandemir, Chita Das

June 6th, ICDCS 2017

SLIDE 2

Executive Summary

  • Problem: Heterogeneity-agnostic datacenter schedulers make poor job placement choices
  • Such schedulers ignore both hardware- and application-level heterogeneity
  • Constraints are used as a medium:
  • To express task-level heterogeneity (e.g., latency-sensitive, batch)
  • To expose hardware-level heterogeneity (e.g., ISA, clock speed, accelerators)
  • To provide task performance guarantees that ensure QoS
  • Phoenix is a constraint-aware scheduler that is:
  • Heterogeneity-aware and hybrid, hence scalable
  • Uses a real-time CRV (Constraint Resource Vector) metric to reorder tasks during peak congestion, optimizing for tail latency
  • Improves the 99th-percentile (tail) latency by 1.9x across production cluster traces
SLIDE 3

Outline

  • Scheduler Design Paradigm
  • Motivation
  • Modeling and synthesizing task constraints
  • Phoenix architecture
  • Results
SLIDE 4

Scheduler Design Paradigm

[Figure: Design space of cluster schedulers along two axes: task binding to queue (early vs. late) and control plane (centralized vs. distributed), with hybrid schedulers in between; each system is marked constraint-aware or constraint-unaware. Examples: Mesos, Borg, Choosy (centralized); Sparrow, Yaq-d (distributed); Mercury, Hawk, Eagle, Phoenix (hybrid). A scale from 10M to 100B indicates the number of jobs executed per day.]

SLIDE 5

Outline

  • Scheduler Design Paradigm
  • Motivation
  • Modeling and synthesizing task constraints
  • Phoenix architecture
  • Results
SLIDE 6

Constraint share in Google traces

[Pie chart: Share of constraint types in the Google traces. ISA (x86, ARM): 74%; Number of cores: 17%; Maximum Disks: 8%; Minimum Disks: 1%. The legend also lists Number of Nodes, Ethernet Speed, Kernel Version, Platform Family, and CPU Clock speed.]

SLIDE 7

Task placement constraints

  • Constraint-based job requests in cloud schedulers
  • More than 50% of all tasks subscribe to task constraints
  • E.g., a job may request two x86 server nodes with at least 1 Gbps of network speed between them
  • Surges in constraint subscription:
  • Impact other unconstrained tasks
  • Are a root cause of tail latencies
SLIDE 8

Outline

  • Scheduler Design Paradigm
  • Motivation
  • Modeling and synthesizing task constraints
  • Phoenix architecture
  • Results
SLIDE 9

Synthesizing Task Constraints

  • Publicly available Google cluster workload traces [1]
  • Hashed constraint values were correlated with the constraint frequency vector proposed in [2]
  • Yahoo and Cloudera constraints were synthetically generated
  • The benchmarking model proposed in [2] was used to characterize and generate constraints for tasks
  • Cross-validated; accuracy is close to 87%

[1] C. Reiss, J. Wilkes, and J. L. Hellerstein, “Google cluster-usage traces: format + schema,” Google Inc., White Paper, 2011.
[2] B. Sharma, V. Chudnovsky, J. L. Hellerstein, R. Rifaat, and C. R. Das, “Modeling and synthesizing task placement constraints in Google compute clusters,” in Proceedings of the 2nd ACM Symposium on Cloud Computing (SoCC), 2011.
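The synthesis described above, drawing constraints for each task from a measured frequency vector, can be sketched as independent sampling. A minimal sketch follows; the constraint names and frequencies are illustrative stand-ins loosely shaped like the Google-trace shares, not the paper's actual vectors:

```python
import random

# Illustrative constraint-frequency vector: the fraction of tasks that
# request each constraint. Real vectors would be derived from the trace.
CONSTRAINT_FREQ = {
    "ISA(x86)": 0.74,
    "num_cores>=4": 0.17,
    "max_disks": 0.08,
    "min_disks": 0.01,
}

def synthesize_constraints(rng):
    """Independently include each constraint with its measured frequency."""
    return {c for c, p in CONSTRAINT_FREQ.items() if rng.random() < p}

rng = random.Random(42)
tasks = [synthesize_constraints(rng) for _ in range(10_000)]
constrained = sum(1 for t in tasks if t) / len(tasks)
print(f"fraction of constrained tasks: {constrained:.2f}")
```

A real generator would also have to respect correlations between constraints (the frequency-vector approach of [2] captures co-occurrence), which independent sampling deliberately ignores here.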

SLIDE 10

Constraint distribution

  • 50% of tasks are constrained
  • 33% of jobs demand two constraints, but only 12% of them could be satisfied
  • As incoming jobs demand more constraints, it becomes difficult to satisfy all of them

SLIDE 11

Job Queuing Delays

  • High tail latency for resource-constrained tasks
  • On average 2x to 2.5x at the tail in the case of Eagle and Yaq-d
  • The high volume of scheduling requests demands distributed scheduling

[Figures: Job queuing delays for the Yahoo and Cloudera traces]

SLIDE 12

Job response times vs Cluster Load

  • 99th-percentile job response times shoot up for all traces
  • The higher the system utilization, the worse the response time degradation

[Figures: Yahoo, Cloudera, and Google traces; response times normalized to constrained jobs under Eagle-C]

Need for a scalable scheduler that can handle tasks with multiple constraints

SLIDE 13

Outline

  • Scheduler Design Paradigm
  • Motivation
  • Modeling and synthesizing task constraints
  • Phoenix architecture
  • Results
SLIDE 14

Phoenix architectural overview

[Diagram: Phoenix architecture. Distributed schedulers 1 through n dispatch tasks into worker queues 1 through n; a CRV monitor maintains the Constraint Resource Vector lookup table, which the centralized scheduler updates every heartbeat interval.]

SLIDE 15

[Diagram: Four workers, each with a distributed scheduler (DS); while utilization < threshold, the CRV monitor applies SRPT reordering]

SRPT fails at the tail for constrained jobs at higher utilization

SLIDE 16

[Diagram: Once utilization > threshold, the CRV monitor switches the worker queues to CRV-based reordering]
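The two modes pictured on slides 15 and 16, SRPT ordering while utilization stays below the threshold and CRV-based ordering once it crosses it, can be sketched as a single reordering function. The field names, scores, and the 0.8 threshold are illustrative assumptions, not Phoenix's actual parameters:

```python
from dataclasses import dataclass

@dataclass
class Task:
    tid: str
    remaining: float   # estimated remaining processing time (SRPT key)
    crv_score: float   # scarcity of this task's constraints (CRV key)

def reorder_queue(queue, utilization, threshold=0.8):
    """Below the threshold, order by shortest remaining processing time;
    above it, prioritize tasks whose constraints are scarcest (highest CRV)."""
    if utilization < threshold:
        return sorted(queue, key=lambda t: t.remaining)   # SRPT ordering
    return sorted(queue, key=lambda t: -t.crv_score)      # CRV reordering

q = [Task("a", 5.0, 0.9), Task("b", 1.0, 0.1), Task("c", 3.0, 0.5)]
print([t.tid for t in reorder_queue(q, utilization=0.5)])   # → ['b', 'c', 'a']
print([t.tid for t in reorder_queue(q, utilization=0.95)])  # → ['a', 'c', 'b']
```

The point of the switch: under congestion, short tasks with scarce constraints would otherwise starve behind unconstrained short tasks that SRPT favors.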

SLIDE 17

Architecture contd.

  • The CRV monitor keeps track of the Constraint Resource Vector (CRV)
  • The demand-to-supply ratio of every constraint on every machine is updated at every heartbeat interval
  • Pollaczek-Khinchine (P-K) based queue waiting-time estimators are used for admission control
  • When the CRV rises beyond a set threshold, CRV-based reordering is initiated
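The P-K estimator named above is the Pollaczek-Khinchine formula for the mean waiting time of an M/G/1 queue. A minimal sketch of how such an estimator could drive admission control; the SLO-based reject rule is an illustrative assumption, not Phoenix's exact policy:

```python
def pk_wait_time(arrival_rate, service_mean, service_second_moment):
    """Pollaczek-Khinchine mean waiting time for an M/G/1 queue:
    W = lambda * E[S^2] / (2 * (1 - rho)), where rho = lambda * E[S]."""
    rho = arrival_rate * service_mean
    if rho >= 1.0:
        return float("inf")   # queue is unstable, wait grows without bound
    return arrival_rate * service_second_moment / (2.0 * (1.0 - rho))

def admit(task_slo, arrival_rate, service_mean, service_second_moment):
    """Admit a task to this queue only if the estimated wait fits its SLO."""
    return pk_wait_time(arrival_rate, service_mean, service_second_moment) <= task_slo

# Exponential service with mean 1 s (E[S^2] = 2 for exponential), lambda = 0.5/s:
# rho = 0.5, so W = 0.5 * 2 / (2 * 0.5) = 1.0 second.
print(pk_wait_time(0.5, 1.0, 2.0))  # → 1.0
print(admit(task_slo=0.5, arrival_rate=0.5,
            service_mean=1.0, service_second_moment=2.0))  # → False
```

Only per-queue arrival and service moments are needed, which is what makes this practical to refresh at heartbeat granularity.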

SLIDE 18

Outline

  • Scheduler Design Paradigm
  • Motivation
  • Modeling and synthesizing task constraints
  • Phoenix architecture
  • Results
SLIDE 19

Phoenix compared to Eagle

[Figures: Google and Yahoo traces; lower is better]

SLIDE 20

Phoenix compared to Hawk/Sparrow

[Figures: Google trace vs. Hawk, Cloudera, and Google trace vs. Sparrow; lower is better]

SLIDE 21

Response times for long jobs

[Figures: Yahoo, Cloudera, and Google traces]

SLIDE 22

Summary

  • Phoenix is a hybrid, constraint-aware scheduler
  • It dynamically adapts itself under high resource demand using CRV-metric-based reordering
  • It improves tail latency by an average of 1.9x for heavily resource-constrained tasks
  • It does so without affecting long-job response times or the fairness of other unconstrained tasks
SLIDE 23

prashanth@cse.psu.edu http://www.cse.psu.edu/hpcl/index.html

SLIDE 24

CRV statistics

  • The number of task reorderings depends on the inter-arrival pattern of jobs
  • The average utilization of the cluster was 80%
SLIDE 25

Constraint Modeling

  • Types of constraints:
  • Hard constraints -> e.g., minimum memory, number of cores
  • Soft constraints -> e.g., clock speed, network bandwidth
  • Affinity constraints -> e.g., HDFS data locality, MPI tasks
  • Constraint support in existing schedulers:
  • Mesos - locality preferences of tasks
  • Kubernetes - soft and hard constraint support is on the roadmap
  • Affinity constraints increase scheduling delays by 2x to 4x
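The hard/soft split above suggests a simple two-phase matcher: hard constraints filter machines out entirely, while soft constraints only rank the survivors. A minimal sketch; the attribute names, predicates, and scoring rule are illustrative, not any scheduler's actual matching logic:

```python
def machine_score(mach, hard, soft):
    """Return None if any hard constraint fails (machine is filtered out);
    otherwise return the fraction of soft constraints the machine satisfies,
    used only to rank eligible machines."""
    if any(not pred(mach) for pred in hard):
        return None
    if not soft:
        return 1.0
    return sum(1 for pred in soft if pred(mach)) / len(soft)

m = {"cores": 8, "mem_gb": 32, "clock_ghz": 2.4, "net_gbps": 1}
hard = [lambda mach: mach["cores"] >= 4,
        lambda mach: mach["mem_gb"] >= 16]        # must hold
soft = [lambda mach: mach["clock_ghz"] >= 3.0,
        lambda mach: mach["net_gbps"] >= 1]       # preferred, may be relaxed
print(machine_score(m, hard, soft))  # → 0.5 (meets 1 of 2 soft constraints)
```

Hard-to-soft relaxation, mentioned on the optimization-metrics slide, amounts to moving a predicate from the `hard` list to the `soft` list when no machine survives the filter.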
SLIDE 26

[Figures: Eagle on the Yahoo trace; Eagle CC]

SLIDE 27

Scheduling Optimization Metrics

  • Job response times
  • Hybrid schedulers use SRPT to improve job turnaround times
  • This comes at the cost of fairness to other unconstrained jobs
  • Admission control
  • Negotiating for jobs with multiple resource constraints
  • Relaxing constraints from hard to soft
  • Late binding
  • Avoids early commits and reduces queue waiting times, especially for short jobs
  • Load balancing
  • Job-stealing techniques improve overall resource utilization, but not in every case
  • Task migration overheads and constraint-preference violations must be taken into account
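Late binding, listed above, defers the task-to-worker commitment: the scheduler places reservation probes on several sampled workers and binds the task only to whichever one frees up first, as in Sparrow-style batch sampling. A minimal sketch; the probe count and the worker model (a known free-up time per worker) are illustrative simplifications:

```python
import random

def late_binding_schedule(task_id, workers, probes=2, rng=random):
    """Place `probes` reservations on randomly sampled workers; bind the task
    to whichever sampled worker becomes free first and cancel the rest."""
    sampled = rng.sample(list(workers), probes)
    winner = min(sampled, key=lambda w: workers[w])  # earliest free worker wins
    cancelled = [w for w in sampled if w != winner]
    return winner, cancelled

# Time at which each worker frees up (in a real system this is unknown,
# which is exactly why binding is deferred until a worker pulls the task).
workers = {"w1": 4.0, "w2": 1.0, "w3": 9.0}
rng = random.Random(0)
bound, cancelled = late_binding_schedule("t1", workers, probes=2, rng=rng)
print(bound, cancelled)
```

The trade-off the slide hints at: probes consume queue slots on workers that never run the task, so probe count balances short-job wait time against wasted reservations.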
SLIDE 28

Queuing delays of Google jobs using SRPT

  • Sporadic peaks and valleys in the job submission pattern
  • At peaks, heavy tail latency leads to QoS violations for short jobs
  • Queuing delays cascade into other unconstrained tasks
  • Naive SRPT-based queue management fails to deliver
  • Constrained tasks scheduled by Hawk and Yaq-d also experience 2x to 2.5x queuing delays

SLIDE 29

Evaluation Methodology

  • Trace-driven simulator built on top of Eagle and Sparrow
  • Three production datacenter traces were used for evaluation: Yahoo, Cloudera, and Google
  • Cluster inter-arrival rates are bursty and unpredictable, with peak-to-median ratios from 9:1 to 260:1

SLIDE 30

Impact on unconstrained jobs