SLIDE 1

Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks

Sharma Podila

Senior Software Engineer Netflix Aug 20th MesosCon 2015

SLIDE 2

Agenda

  • Context, motivation
  • Fenzo scheduler library
  • Usage at Netflix
  • Future direction
SLIDE 3

Why use Apache Mesos in a cloud?

  • Resource granularity
  • Application start latency

SLIDE 4

A tale of two frameworks

  • Reactive stream processing
  • Container deployment and management

SLIDE 5

Reactive stream processing, Mantis

  • Cloud native
  • Lightweight, dynamic jobs

○ Stateful, multi-stage
○ Real time, anomaly detection, etc.

  • Task placement constraints

○ Cloud constructs
○ Resource utilization

  • Service and batch style
SLIDE 6

Mantis job topology

Diagram: Source App → Stage 1 → Stage 2 → Stage 3 → Sink (each stage runs a set of workers)

A job is a set of one or more stages. A stage is a set of one or more workers. A worker is a Mesos task.

SLIDE 7

Container management, Titan

  • Cloud native
  • Service and batch workloads
  • Jobs with multiple sets of container tasks
  • Container placement constraints

○ Cloud constructs
○ Resource affinity
○ Task locality

SLIDE 8

Container scheduling model

Diagram: Job 1 and Job 2, each with task sets (Set 0, Set 1); co-locate tasks from multiple task sets.

SLIDE 9

Why develop a new framework?

SLIDE 10

Easy to write a new framework?

SLIDE 11

Easy to write a new framework?

What about scale? Performance? Fault tolerance? Availability?

SLIDE 12

Easy to write a new framework?

What about scale? Performance? Fault tolerance? Availability? And scheduling is a hard problem to solve

SLIDE 13

Long-term justification is needed to create a new Mesos framework

SLIDE 14

Our motivations for a new framework

  • Cloud native

(autoscaling)

SLIDE 15

Our motivations for a new framework

  • Cloud native

(autoscaling)

  • Customizable task placement optimizations

(Mix of service, batch, and stream topologies)

SLIDE 16

Cluster autoscaling challenge

Diagram: two task placements across Host 1–Host 4 compared ("vs."), for long-running stateful services.

SLIDE 18

Components of a Mesos framework

API for users to interact

SLIDE 19

Components of a Mesos framework

API for users to interact Be connected to Mesos via the driver

SLIDE 20

Components of a Mesos framework

API for users to interact Be connected to Mesos via the driver Compute resource assignments for tasks

SLIDE 22

Fenzo

A common scheduling library for Mesos frameworks

SLIDE 23

Fenzo usage in frameworks

Diagram: the Mesos framework receives task requests from users and available resource offers from the Mesos master, feeds both to the Fenzo task scheduler, and gets back a task assignment result (e.g., Host1: Task1, Task2; Host2: Task3, Task4), which is persisted.

SLIDE 24

Fenzo scheduling library

  • Heterogeneous resources
  • Autoscaling of cluster
  • Visibility of scheduler actions
  • Plugins for constraints, fitness
  • High speed
  • Heterogeneous task requests

SLIDE 25

Announcing availability of Fenzo in Netflix OSS suite

SLIDE 26

Fenzo details

SLIDE 27

Scheduling problem

Diagram: N pending tasks to assign onto M possible slaves; assignment considers fitness and urgency (pending → assigned).

SLIDE 28

Scheduling optimizations

Speed vs. accuracy: first-fit assignment is ~O(1); optimal assignment is ~O(N * M)¹. Real-world schedulers trade off between the two.

¹ Assuming tasks are not reassigned

SLIDE 29

Scheduling strategy

For each task:
  for each host:
    validate hard constraints
    evaluate fitness and soft constraints
  until fitness is good enough, and a minimum number of hosts has been evaluated
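The evaluation loop on this slide can be sketched as follows (illustrative Python, not Fenzo's actual Java implementation; the task/host shapes, `good_enough` threshold, and `min_evals` floor are assumptions):

```python
# Greedy per-task placement: skip hosts failing hard constraints, score
# the rest by fitness, and stop early once a good-enough host is found
# after a minimum number of evaluations.
def assign_task(task, hosts, fitness_of, good_enough=0.9, min_evals=2):
    best_host, best_fitness = None, -1.0
    evaluated = 0
    for host in hosts:
        # Hard constraints must all pass before fitness is considered.
        if not all(c(task, host) for c in task.get("hard_constraints", [])):
            continue
        evaluated += 1
        fitness = fitness_of(task, host)  # soft constraints fold into fitness
        if fitness > best_fitness:
            best_host, best_fitness = host, fitness
        # Early exit: fitness good enough and enough hosts evaluated.
        if best_fitness >= good_enough and evaluated >= min_evals:
            break
    return best_host
```

The early-exit condition is what keeps per-task cost closer to first-fit than to a full O(M) scan on large clusters.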

SLIDE 30

Task constraints

  • Soft
  • Hard

SLIDE 31

Task constraints

  • Soft
  • Hard
  • Extensible

SLIDE 32

Built-in Constraints

SLIDE 33

Host attribute value constraint

Diagram: a task with HostAttrConstraint:instanceType=r3 is matched by Fenzo to Host2 (Attr:instanceType=r3) rather than Host1 (m3) or Host3 (c3).
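A hard constraint like the slide's host attribute match can be modeled as a simple predicate (illustrative Python sketch, not Fenzo's actual constraint API; the host dictionaries are hypothetical):

```python
# Hard constraint: only hosts whose attribute matches the wanted value pass.
def host_attr_constraint(attr, wanted):
    def check(task, host):
        return host["attrs"].get(attr) == wanted
    return check

hosts = [
    {"name": "Host1", "attrs": {"instanceType": "m3"}},
    {"name": "Host2", "attrs": {"instanceType": "r3"}},
    {"name": "Host3", "attrs": {"instanceType": "c3"}},
]
wants_r3 = host_attr_constraint("instanceType", "r3")
matches = [h["name"] for h in hosts if wants_r3(None, h)]  # only Host2 passes
```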

SLIDE 34

Unique host attribute constraint

Diagram: tasks with a UniqueAttr:zone constraint are placed by Fenzo so that no two tasks land on hosts with the same zone value (Host1: zone=1a, Host2: zone=1a, Host3: zone=1b).

SLIDE 35

Balance host attribute constraint

Diagram: a job with 9 tasks and BalanceAttr:zone is spread evenly by Fenzo across Host1 (zone=1a), Host2 (zone=1b), and Host3 (zone=1c).

Fenzo
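The balance constraint can be expressed as a soft score that prefers the zone currently holding the fewest of the job's tasks (illustrative sketch under assumed host/task shapes, not Fenzo's actual implementation):

```python
from collections import Counter

# Soft constraint: score a host higher when its zone holds fewer of this
# job's already-placed tasks, so placements spread evenly across zones.
def balance_zone_score(host, placed_hosts):
    counts = Counter(h["zone"] for h in placed_hosts)
    return 1.0 / (1.0 + counts[host["zone"]])

host_a = {"zone": "1a"}
host_b = {"zone": "1b"}
placed = [host_a]  # one task already placed in zone 1a
```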

SLIDE 36

Fitness evaluation

  • Degree of fitness
  • Composable

SLIDE 37

Bin packing fitness calculator

fitness = usedCPUs / totalCPUs

Chart: fitness computed for Host1 through Host5.
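The slide's CPU bin-packing fitness is a one-liner; a hedged sketch (the fit check and `None` sentinel are illustrative additions, not from the slide):

```python
# fitness = usedCPUs / totalCPUs, computed as if the task were already
# placed on the host; fuller hosts score higher, driving bin packing.
def cpu_bin_pack_fitness(task_cpus, used_cpus, total_cpus):
    used_after = used_cpus + task_cpus
    if used_after > total_cpus:
        return None  # task does not fit on this host
    return used_after / total_cpus
```

With 8-CPU hosts at various loads this reproduces the 0.0–1.0 spread shown on the next slide.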

SLIDE 38

Bin packing fitness calculator

fitness = usedCPUs / totalCPUs

Chart: fitness for Host1–Host5: 0.25, 0.5, 0.75, 1.0, 0.0.

SLIDE 39

Bin packing fitness calculator

fitness = usedCPUs / totalCPUs

Chart: fitness for Host1–Host5: 0.25, 0.5, 0.75, 1.0, 0.0.

SLIDE 40

Composable fitness calculators

Fitness = (BinPackFitness * BinPackWeight + RuntimePackFitness * RuntimeWeight) / 2.0
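The formula above generalizes to any list of weighted calculators; a minimal sketch (function and variable names are illustrative, not Fenzo's API):

```python
# Compose fitness calculators per the slide's formula: the weighted sum
# of individual scores, divided by the number of calculators.
def compose_fitness(calculators):
    def fitness(task, host):
        weighted = sum(w * fn(task, host) for fn, w in calculators)
        return weighted / len(calculators)
    return fitness

bin_pack = lambda task, host: 0.8      # stand-in fitness scores
runtime_pack = lambda task, host: 0.4

combined = compose_fitness([(bin_pack, 1.0), (runtime_pack, 0.5)])
```

Weighting lets a framework blend, say, CPU packing with runtime-based packing without rewriting either calculator.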

SLIDE 41

Cluster autoscaling in Fenzo

Diagram: per-cluster autoscale rules (e.g., ASG/Cluster: mantisagent, MinIdle: 8, MaxIdle: 20, CooldownSecs: 360; ASG/Cluster: computeCluster, MinIdle: 8, MaxIdle: 20, CooldownSecs: 360) feed into Fenzo, which emits ScaleUp (cluster, N) and ScaleDown (cluster, host list) actions.

SLIDE 42

Rules based cluster autoscaling

  • Set up rules per host attribute value

○ E.g., one autoscale rule per ASG/cluster, one cluster for network-intensive jobs, another for CPU/memory-intensive jobs

  • Sample:

The number of idle hosts drives scaling: below the minimum triggers scale up, above the maximum triggers scale down.

Cluster Name   Min Idle Count   Max Idle Count   Cooldown Secs
NetworkClstr   5                15               360
ComputeClstr   10               20               300
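The rule evaluation described above can be sketched as follows (illustrative Python with field names mirroring the table, not Fenzo's actual autoscaler code):

```python
# Per-cluster rule: scale up when idle hosts fall below min, scale down
# when they exceed max, and take no action during the cooldown window.
def evaluate_rule(rule, idle_count, last_action_at, now):
    if now - last_action_at < rule["cooldown_secs"]:
        return None  # still cooling down from the previous action
    if idle_count < rule["min_idle"]:
        return ("scale_up", rule["min_idle"] - idle_count)
    if idle_count > rule["max_idle"]:
        return ("scale_down", idle_count - rule["max_idle"])
    return None  # idle count within [min, max]: no action
```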

SLIDE 43

Shortfall analysis based autoscaling

  • Rule-based scale up has a cooldown period

○ What if there’s a surge of incoming requests?

  • Pending requests trigger shortfall analysis

○ Scale up happens regardless of cooldown period
○ Remembers which tasks have already been covered
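The "remembers which tasks have already been covered" behavior can be sketched like this (illustrative Python under assumed task shapes, not Fenzo's actual shortfall analyzer):

```python
# Shortfall analysis: only pending tasks not covered by an earlier
# scale-up count toward the host shortfall; coverage persists across calls
# so a surge does not trigger repeated scale-ups for the same tasks.
def shortfall_scale_up(pending_tasks, covered_ids, cpus_per_host):
    new_tasks = [t for t in pending_tasks if t["id"] not in covered_ids]
    covered_ids.update(t["id"] for t in new_tasks)  # remember coverage
    needed_cpus = sum(t["cpus"] for t in new_tasks)
    return -(-needed_cpus // cpus_per_host)  # ceiling division
```

A second evaluation of the same pending set returns zero, which is what lets this path bypass cooldown safely.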

SLIDE 44

Usage at Netflix

SLIDE 45

Cluster autoscaling

Chart: number of Mesos slaves over time.

SLIDE 46

Scheduler run time (milliseconds)

Over a week: average 2 ms, maximum 38 ms.

Note: times can vary depending on # of tasks, # and types of constraints, and # of hosts

SLIDE 47

Experimenting with Fenzo

Note: Experiments can be run without requiring a physical cluster

SLIDE 48

A bin packing experiment

Diagram: starting with an idle cluster of 3,000 Mesos slaves (Host 1 through Host 3000), batches of tasks with cpu=1, cpu=3, and cpu=6 are iteratively assigned through Fenzo.

SLIDE 49

Bin packing sample results

Bin pack tasks using Fenzo’s built-in CPU bin packer

SLIDE 51

Task runtime bin packing sample

Bin pack tasks based on custom fitness calculator to pack short vs. long run time jobs separately

slide-52
SLIDE 52

Scheduler speed experiment

Setup: 8-CPU hosts; task mix of 20% 1-CPU, 40% 4-CPU, and 40% 6-CPU jobs. Goal: starting from an empty cluster, assign tasks to fill all hosts. Scheduling strategy: CPU bin packing.

# of hosts  Tasks per batch  Avg time  Avg/task  Min time  Max time  Total time

slide-53
SLIDE 53

Scheduler speed experiment

Setup: 8-CPU hosts; task mix of 20% 1-CPU, 40% 4-CPU, and 40% 6-CPU jobs. Goal: starting from an empty cluster, assign tasks to fill all hosts. Scheduling strategy: CPU bin packing.

# of hosts  Tasks per batch  Avg time  Avg/task  Min time  Max time  Total time
1,000       1                3 ms      3 ms      1 ms      188 ms    9 s
1,000       200              40 ms     0.2 ms    17 ms     100 ms    0.5 s

slide-54
SLIDE 54

Scheduler speed experiment

Setup: 8-CPU hosts; task mix of 20% 1-CPU, 40% 4-CPU, and 40% 6-CPU jobs. Goal: starting from an empty cluster, assign tasks to fill all hosts. Scheduling strategy: CPU bin packing.

# of hosts  Tasks per batch  Avg time  Avg/task  Min time  Max time  Total time
1,000       1                3 ms      3 ms      1 ms      188 ms    9 s
1,000       200              40 ms     0.2 ms    17 ms     100 ms    0.5 s
10,000      1                29 ms     29 ms     10 ms     240 ms    870 s
10,000      200              132 ms    0.66 ms   22 ms     434 ms    19 s

slide-55
SLIDE 55

Accessing Fenzo

Code at https://github.com/Netflix/Fenzo Wiki at https://github.com/Netflix/Fenzo/wiki

slide-56
SLIDE 56

Future directions

  • Task management SLAs
  • Support for newer Mesos features
  • Collaboration
slide-57
SLIDE 57

To summarize...

slide-58
SLIDE 58

Fenzo: scheduling library for frameworks

  • Heterogeneous resources
  • Autoscaling of cluster
  • Visibility of scheduler actions
  • Plugins for constraints, fitness
  • High speed
  • Heterogeneous task requests

slide-59
SLIDE 59

Fenzo is now available in Netflix OSS suite at

https://github.com/Netflix/Fenzo

slide-60
SLIDE 60

Questions?

Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks

Sharma Podila

spodila@netflix.com  @podila